3 votes
How many hashes will be needed for calculating the Jaccard index with an expected error less than or equal to 0.10?

A. Depends on the data set
B. Fewer than 100 hashes
C. Around 1000 hashes
D. More than 10,000 hashes

User Bwalshy
by
7.5k points

1 Answer

4 votes

Final answer:

To estimate the Jaccard index with an expected error less than or equal to 0.10, at most about 100 hash functions are needed. This follows from the standard error formula SE = 1 / sqrt(k), where k is the number of hash functions, and it leads to answer B.

Step-by-step explanation:

The Jaccard index, also known as the Jaccard similarity coefficient, is a statistic used in understanding the similarity and diversity of sample sets. The coefficient measures similarity between finite sample sets, and is defined as the size of the intersection divided by the size of the union of the sample sets.
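As a small illustration of the definition above, here is a minimal Python sketch. The function name `jaccard` and the convention of returning 1.0 for two empty sets are assumptions made for this example, not part of the original answer.

```python
def jaccard(a, b):
    """Exact Jaccard index |A ∩ B| / |A ∪ B| of two finite sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # assumed convention: two empty sets count as identical
    return len(a & b) / len(union)


# {3, 4} is shared, and the union has 6 elements, so the index is 2 / 6.
print(jaccard({1, 2, 3, 4}, {3, 4, 5, 6}))
```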

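To estimate the Jaccard index with MinHash to within an expected error of 0.10, we can use the usual worst-case formula for the standard error of the estimate: SE = 1 / sqrt(k), where k is the number of hash functions used. This is an upper bound: with k independent hash functions and true similarity J, the exact standard error is sqrt(J(1 - J) / k), which is never larger than 1 / (2 sqrt(k)).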
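To make the role of k concrete, here is a rough sketch of a MinHash estimate, assuming one salted hash per "hash function". The helper names (`_hash`, `minhash_signature`, `estimate_jaccard`) and the choice of `hashlib.blake2b` with a per-seed salt are illustrative assumptions, not something the original answer specifies.

```python
import hashlib


def _hash(item, seed):
    # Assumption: a salted BLAKE2b digest stands in for the seed-th hash function.
    salt = seed.to_bytes(8, "little")
    digest = hashlib.blake2b(repr(item).encode(), digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(items, k):
    # For each of the k hash functions, keep the smallest hash value in the set.
    return [min(_hash(x, seed) for x in items) for seed in range(k)]


def estimate_jaccard(sig_a, sig_b):
    # The fraction of positions where the two signatures agree estimates J.
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


a = set(range(0, 80))
b = set(range(40, 120))  # 40 shared elements out of 120, so the true index is 1/3
k = 100
est = estimate_jaccard(minhash_signature(a, k), minhash_signature(b, k))
print(f"estimate with k={k}: {est:.3f} (true value 0.333)")
```

With k = 100 the printed estimate should usually land within a few hundredths of the true value, which is in line with the error figure discussed in the answer.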

Therefore, setting SE to 0.10 and solving for k gives k = (1 / SE)^2 = (1 / 0.10)^2 = 100, which means 100 hashes are enough.
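The arithmetic above can be checked with a few lines of Python. The helper name `hashes_needed` is hypothetical, and it simply inverts the SE = 1 / sqrt(k) formula used in this answer.

```python
import math


def hashes_needed(target_se):
    """Smallest k with 1 / sqrt(k) <= target_se, i.e. k = ceil((1 / SE)^2)."""
    # round() guards against floating-point noise such as 99.99999999999998.
    return math.ceil(round(1.0 / target_se ** 2, 9))


print(hashes_needed(0.10))  # 100
```

Because k grows with the square of 1 / SE, halving the target error quadruples the number of hashes required.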

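The correct answer is B. With this formula, 100 hashes are already enough for an expected error of 0.10, and because the formula is a worst-case figure, the actual error at k = 100 is smaller still. That is far fewer than the 1000 or 10,000 hashes in options C and D.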

User Lisovaccaro
by
7.1k points