filter - Best data structure for search?

- May 15, 2011

I have a list of items, each with a number of properties (A, B, C, D), which I want to filter using a template that has the same properties (A, B, C, D). When I use a template, I would like to find all items that match it. An item is considered a match if it is equal to the template or is a smaller subset of it (0 matches any item).

Example Data

  ABCD 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Example templates

  [2 0] will filter {0 0], [2 0]}
  [2 0 2] will filter {[0], [2] 0], [2 0 2]}
  [2 0 2] will filter {{0], [2 2 2 1]}
  [3 4 5 6] will filter [0]}
  [0 0] will filter {[0], [0], [2 0 2], [02 0]}

The problem is that the number of comparisons can easily reach 300k, and sometimes that is slow. What can I use to make things quicker? Any ideas?

Keep all items in 16 buckets, assuming 4 properties.

The first bucket is where no property has a zero value. Selecting from here is a general lookup keyed on ABCD.

The second bucket is where property A == 0. Selecting from here is a lookup on the template's values of BCD.

The third bucket is where B == 0. Selecting from here is a lookup on the template's values of ACD.

The fourth is where A == 0 and B == 0. Selecting from here is a lookup on the template's values of CD.

....

The 15th is where A, B, C == 0. The lookup is on D.

The 16th is where A, B, C, D == 0. It can be a Boolean variable ;-)

Since all 16 buckets are 'exact match', you can use methods like hash tables to search inside them.

(This proposal is based on the assumption that a property value of 0 counts as 'match any' in the item, not in the template, because [2 0 0 0] selecting only one value in your example is clearly wrong if 0 means 'any' in both places.)
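The bucket scheme above could be sketched in Python roughly as follows. This is only a minimal illustration under the stated assumption (a 0 in an item is a wildcard, a 0 in a template is not) and the further assumption that no two items are identical; `build_index` and `lookup` are hypothetical names, not part of the original answer.

```python
from itertools import product

def build_index(items, n_props=4):
    """Create all 2**n_props buckets, then file each item under the bucket
    for its non-zero properties, keyed on those property values."""
    buckets = {mask: {} for mask in product((False, True), repeat=n_props)}
    for item in items:
        mask = tuple(v != 0 for v in item)        # True = concrete, False = wildcard
        key = tuple(v for v in item if v != 0)    # the concrete values, in order
        buckets[mask][key] = item                 # assumes items are unique
    return buckets

def lookup(buckets, template):
    """One exact-match hash lookup per bucket, so 2**n_props lookups in total."""
    results = []
    for mask, table in buckets.items():
        key = tuple(v for v, keep in zip(template, mask) if keep)
        hit = table.get(key)
        if hit is not None:
            results.append(hit)
    return results

items = [(1, 0, 3, 0), (0, 0, 0, 0), (1, 2, 3, 4), (2, 0, 0, 0)]
index = build_index(items)
matches = lookup(index, (1, 2, 3, 4))
```

With these sample items, `(2, 0, 0, 0)` is not returned for the template `(1, 2, 3, 4)` because its A value differs, while the other three items are. The cost of a query depends on the number of buckets, not on the number of items.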

Update: As a result, you can never get more than 2^NProperties matches.

Example:

Let's say we have 3 properties A, B, C and the following items:

  Item X [A = 1, B = 0, C = 1] ---> B is a wildcard, so BucketAC['11'] = Item X
  Item Y [A = 2, B = 0, C = 0] ---> B and C are wildcards, so BucketA['2'] = Item Y
  Item Z [A = 2, B = 1, C = 0] ---> C is a wildcard, so BucketAB['21'] = Item Z

Now, the lookup for a key 'ABC' will be as follows (I also include the contents of the buckets on the right, for ease of reading; '<<' means the bucket's match, if any, is added to the results):

  1. results << BucketA[A]      | '2' => Item Y [A = 2, B = 0, C = 0]
  2. results << BucketB[B]
  3. results << BucketAB[AB]    | '21' => Item Z [A = 2, B = 1, C = 0]
  4. results << BucketC[C]
  5. results << BucketAC[AC]    | '11' => Item X [A = 1, B = 0, C = 1]
  6. results << BucketBC[BC]
  7. results << BucketABC[ABC]
  8. results << Bucket_item_all_wildcards

So if we use the template [2 0 0], we only get a result from the key A = 2 in BucketA. If we use the template [2 1 0], we get one result from the key A = 2 in BucketA and a second result from the key AB = 21 in BucketAB.
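The worked example can be replayed with a short Python sketch. The labels X, Y and Z stand for the three items, and the bucket names and string keys mirror the notation used in the example; the `lookup` helper is a hypothetical name. Keys built by string concatenation are only unambiguous for single-digit values, which is all this example needs.

```python
from itertools import combinations

PROPS = "ABC"
items = {
    "X": {"A": 1, "B": 0, "C": 1},
    "Y": {"A": 2, "B": 0, "C": 0},
    "Z": {"A": 2, "B": 1, "C": 0},
}

# File each item under the bucket named after its non-zero properties.
buckets = {}
for name, props in items.items():
    bucket = "".join(p for p in PROPS if props[p] != 0)
    key = "".join(str(props[p]) for p in PROPS if props[p] != 0)
    buckets.setdefault(bucket, {})[key] = name

def lookup(template):
    """Probe every property combination (A, B, C, AB, AC, BC, ABC, and none)."""
    results = []
    for size in range(len(PROPS) + 1):
        for combo in combinations(PROPS, size):
            bucket = "".join(combo)
            key = "".join(str(template[p]) for p in combo)
            if key in buckets.get(bucket, {}):
                results.append(buckets[bucket][key])
    return results

print(buckets)                               # {'AC': {'11': 'X'}, 'A': {'2': 'Y'}, 'AB': {'21': 'Z'}}
print(lookup({"A": 2, "B": 0, "C": 0}))      # ['Y']
print(lookup({"A": 2, "B": 1, "C": 0}))      # ['Y', 'Z']
```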

NB: Of course, the notation above is only pseudocode for the keys; it simply denotes "hashtable-like access keyed on the named properties".

If you allow multiple items with the same properties, you will need to hold several elements in some slots, and then, obviously, you can get more than 2^NProperties search results. You can still track the maximum number of duplicates and can therefore always calculate the maximum number of items in the worst case.
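One way to handle duplicates, sketched in Python: each slot holds a list, and the index remembers the largest slot so a worst-case result count can be computed. The class name `MultiIndex`, its methods and the item labels are assumptions made for illustration only.

```python
from collections import defaultdict
from itertools import product

class MultiIndex:
    def __init__(self, n_props):
        self.n_props = n_props
        self.slots = defaultdict(list)   # (mask, key) -> labels of items stored there
        self.max_dupes = 0               # size of the fullest slot seen so far

    def add(self, label, props):
        mask = tuple(v != 0 for v in props)       # which properties are concrete
        key = tuple(v for v in props if v != 0)   # their values, in order
        slot = self.slots[(mask, key)]
        slot.append(label)
        self.max_dupes = max(self.max_dupes, len(slot))

    def worst_case_results(self):
        # At most one slot per bucket can match, and no slot exceeds max_dupes.
        return (2 ** self.n_props) * self.max_dupes

    def lookup(self, template):
        results = []
        for mask in product((False, True), repeat=self.n_props):
            key = tuple(v for v, keep in zip(template, mask) if keep)
            results.extend(self.slots.get((mask, key), []))
        return results

idx = MultiIndex(3)
idx.add("a", (2, 0, 0))
idx.add("b", (2, 0, 0))   # same properties as "a", so it shares a slot
idx.add("c", (2, 1, 0))
found = idx.lookup((2, 1, 0))
```

Here `found` contains all three labels, and `idx.worst_case_results()` evaluates to 2^3 * 2 = 16.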

In particular, if the number of properties increases, the total number of buckets explodes in a hurry (for example, 32 properties would mean more than 4 billion buckets), so this idea cannot be implemented directly as is. It would need further optimization around bucket traversal and allocation.
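One possible optimization, which is an assumption on my part rather than something spelled out above, is to allocate only the buckets that actually contain items and to traverse only those at query time. The number of buckets visited is then bounded by the number of distinct wildcard patterns among the items instead of 2^NProperties. A minimal Python sketch with hypothetical function names, again assuming unique items:

```python
def build_sparse_index(items):
    """Allocate a bucket only when an item needs it. A bucket is identified
    by the positions of the item's non-zero properties."""
    index = {}
    for item in items:
        positions = tuple(i for i, v in enumerate(item) if v != 0)
        key = tuple(item[i] for i in positions)
        index.setdefault(positions, {})[key] = item
    return index

def lookup_sparse(index, template):
    """Visit only the buckets that exist, one hash lookup in each."""
    results = []
    for positions, table in index.items():
        key = tuple(template[i] for i in positions)
        hit = table.get(key)
        if hit is not None:
            results.append(hit)
    return results

N = 32
item1 = tuple(5 if i == 0 else 7 if i == 31 else 0 for i in range(N))
item2 = (0,) * N                                   # all wildcards
item3 = tuple(9 if i == 3 else 0 for i in range(N))
sparse = build_sparse_index([item1, item2, item3])

template = tuple(5 if i == 0 else 7 if i == 31 else 1 for i in range(N))
hits = lookup_sparse(sparse, template)
```

With 32 properties this index holds three buckets rather than 2^32, and `hits` contains `item1` and `item2` but not `item3`, whose property at position 3 differs from the template.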
