This set of Data Structures & Algorithms Multiple Choice Questions & Answers (MCQs) focuses on “MinHash”.

1. Which technique is used to estimate the similarity between two sets?

a) MinHash

b) Stack

c) Priority Queue

d) PAT Tree

Answer: a

Explanation: In computer science and data mining, MinHash (the min-wise independent permutations scheme) is a technique for quickly estimating how similar two sets are.
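A minimal Python sketch of the k-hash-function MinHash scheme. It assumes each of the k hash functions is a random linear map (a*x + b) mod P, with P = 2**61 - 1, applied to a blake2b digest of each element. This family is only approximately min-wise independent, and all names here (base_hash, make_hash_params, minhash_signature, estimate_jaccard) are hypothetical, chosen for illustration.

```python
import hashlib
import random

# Prime modulus for the assumed hash family h(x) = (a*x + b) mod P.
P = (1 << 61) - 1


def base_hash(item: str) -> int:
    """Stable 64-bit integer for a string (Python's built-in hash() is salted per process)."""
    digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def make_hash_params(k: int, seed: int = 0) -> list[tuple[int, int]]:
    """Draw k random (a, b) pairs; each pair defines one hash function."""
    rng = random.Random(seed)
    return [(rng.randrange(1, P), rng.randrange(0, P)) for _ in range(k)]


def minhash_signature(items: set[str], params: list[tuple[int, int]]) -> list[int]:
    """For each hash function, keep the minimum hash value over a non-empty set."""
    xs = [base_hash(s) for s in items]
    return [min((a * x + b) % P for x in xs) for a, b in params]


def estimate_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """Fraction of hash functions whose minima agree; this estimates J(A, B)."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


a = set("the quick brown fox jumps over the lazy dog".split())
b = set("the quick brown fox leaps over the lazy cat".split())
params = make_hash_params(200)
# The exact Jaccard index here is 6/10 = 0.6; the estimate should land near it.
print(estimate_jaccard(minhash_signature(a, params), minhash_signature(b, params)))
```

The signatures have a fixed length k regardless of set size, which is what makes comparing many sets cheap: two signatures are compared in O(k) time.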

2. Who invented the MinHash technique?

a) Weiner

b) Samuel F. B. Morse

c) Friedrich Clemens Gerke

d) Andrei Broder

Answer: d

Explanation: MinHash, or the min-wise independent permutations scheme, is a technique used in computer science and data mining for quickly estimating the similarity between two sets. It was invented by Andrei Broder in 1997.

3. Which technique was first used to remove duplicate web pages from search results in the AltaVista search engine?

a) MinHash

b) Stack

c) Priority Queue

d) PAT Tree

Answer: a

Explanation: MinHash, or the min-wise independent permutations scheme, quickly estimates the similarity between two sets. It was first used in the AltaVista search engine to detect duplicate web pages and remove them from search results.
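A toy sketch of near-duplicate detection in Python, assuming documents are turned into sets of 3-word shingles and compared with MinHash signatures. Here the k hash functions are simulated by prefixing a function index to each shingle before hashing with blake2b; this construction, the threshold of 0.7, and the names shingles, signature, and near_duplicates are illustrative assumptions, not AltaVista's actual pipeline.

```python
import hashlib


def shingles(text: str, w: int = 3) -> set[str]:
    """Set of contiguous w-word shingles (w = 3 is an illustrative choice)."""
    words = text.lower().split()
    return {" ".join(words[i:i + w]) for i in range(max(len(words) - w + 1, 1))}


def signature(items: set[str], k: int) -> list[int]:
    """k hash functions simulated by prefixing the function index to each item before hashing."""
    def h(i: int, s: str) -> int:
        digest = hashlib.blake2b(f"{i}|{s}".encode("utf-8"), digest_size=8).digest()
        return int.from_bytes(digest, "big")
    return [min(h(i, s) for s in items) for i in range(k)]


def near_duplicates(docs: dict[str, str], threshold: float = 0.7, k: int = 256) -> list[tuple[str, str]]:
    """Pairs of documents whose estimated Jaccard similarity reaches the threshold."""
    sigs = {name: signature(shingles(text), k) for name, text in docs.items()}
    names = sorted(sigs)
    pairs = []
    # All-pairs comparison is fine for a toy corpus; larger systems typically use
    # locality-sensitive hashing to avoid comparing every pair.
    for i, x in enumerate(names):
        for y in names[i + 1:]:
            estimate = sum(p == q for p, q in zip(sigs[x], sigs[y])) / k
            if estimate >= threshold:
                pairs.append((x, y))
    return pairs


docs = {
    "page1": "MinHash estimates the Jaccard similarity of two sets quickly",
    "page2": "MinHash estimates the Jaccard similarity of two sets quickly indeed",
    "page3": "A priority queue returns the element with the highest priority",
}
print(near_duplicates(docs))
```

In this example page1 and page2 share 7 of 8 distinct shingles (exact Jaccard index 0.875), while page3 shares none with either, so only the first pair is expected to pass the threshold.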

4. Which technique has been applied to clustering documents by the similarity of their sets of words?

a) MinHash

b) Stack

c) Priority Queue

d) PAT Tree

Answer: a

Explanation: MinHash, or the min-wise independent permutations scheme, quickly estimates the similarity between two sets. Besides duplicate detection, it has been applied to large-scale clustering problems, such as clustering documents by the similarity of their sets of words.

5. Which measure is commonly used to quantify the similarity between two sets?

a) Rope Tree

b) Jaccard Coefficient

c) Tango Tree

d) MinHash Coefficient

Answer: b

Explanation: MinHash, or the min-wise independent permutations scheme, is a technique for quickly estimating the similarity between two sets. The standard measure of that similarity is the Jaccard coefficient, also called the Jaccard index.

6. Which of the following is defined as the ratio of the number of elements in the intersection of two sets to the number of elements in their union?

a) Rope Tree

b) Jaccard Coefficient Index

c) Tango Tree

d) MinHash Coefficient

Answer: b

Explanation: MinHash quickly estimates the similarity between two sets, and the Jaccard coefficient (Jaccard index) is the standard measure of that similarity. It is defined as the number of elements in the intersection of the two sets divided by the number of elements in their union: J(A, B) = |A ∩ B| / |A ∪ B|.
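The definition translates directly into a few lines of Python. This is a sketch computing the exact index that MinHash approximates; the function name jaccard is hypothetical, and returning 1.0 for two empty sets is a convention assumed here, since the ratio is undefined in that case.

```python
def jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|; two empty sets are treated as identical (an assumed convention)."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


print(jaccard({1, 2, 3, 4}, {3, 4, 5}))  # intersection {3, 4}, union {1, 2, 3, 4, 5}: prints 0.4
print(jaccard({1, 2}, {3, 4}))          # disjoint sets: prints 0.0
print(jaccard({"x", "y"}, {"y", "x"}))  # identical sets: prints 1.0
```

Computing this exactly requires touching every element of both sets, which is the cost MinHash avoids when many pairs must be compared.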

7. What is the value of the Jaccard index when the two sets are disjoint?

a) 1

b) 2

c) 3

d) 0

Answer: d

Explanation: The Jaccard index of two sets is the number of elements in their intersection divided by the number of elements in their union. Disjoint sets have an empty intersection, so their Jaccard index is 0.

8. When do two sets have relatively more members in common?

a) Jaccard Index is Closer to 1

b) Jaccard Index is Closer to 0

c) Jaccard Index is Closer to -1

d) Jaccard Index is Farther from 1

Answer: a

Explanation: The Jaccard index is the size of the intersection of two sets divided by the size of their union, so it ranges from 0 to 1. It is 0 for disjoint sets, and the closer it is to 1, the more members the two sets have in common; it equals 1 exactly when the sets are identical.
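The range and the two extreme cases follow directly from the definition. A short derivation, assuming finite sets A and B with a non-empty union:

```latex
J(A,B) = \frac{|A \cap B|}{|A \cup B|}, \qquad
0 \le |A \cap B| \le |A \cup B| \;\Longrightarrow\; 0 \le J(A,B) \le 1

A \cap B = \varnothing \;\Longrightarrow\; J(A,B) = \frac{0}{|A \cup B|} = 0

J(A,B) = 1 \iff |A \cap B| = |A \cup B| \iff A \cap B = A \cup B \iff A = B
```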

9. What is the expected error when estimating the Jaccard index with the MinHash scheme using k different hash functions?

a) O(log k!)

b) O(k!)

c) O(k²)

d) O(1/√k)

Answer: d

Explanation: With k hash functions, MinHash estimates the Jaccard index as the fraction of hash functions for which the two sets have the same minimum hash value. Each hash function matches with probability equal to the Jaccard index, so averaging over k independent hash functions gives an expected error of O(1/√k).
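One way to see the O(1/√k) rate, sketched under the assumptions that the k hash functions are independent and min-wise independent with no collisions, and that "error" is measured by the standard deviation of the estimate. The element of A ∪ B with the smallest hash value is then uniformly random, and the two minima agree exactly when that element lies in A ∩ B:

```latex
X_i =
\begin{cases}
1 & \text{if } \min_{a \in A} h_i(a) = \min_{b \in B} h_i(b) \\
0 & \text{otherwise}
\end{cases},
\qquad \Pr[X_i = 1] = J(A,B)

\hat{J} = \frac{1}{k} \sum_{i=1}^{k} X_i, \qquad
\mathbb{E}[\hat{J}] = J(A,B), \qquad
\operatorname{Var}[\hat{J}] = \frac{J(A,B)\,\bigl(1 - J(A,B)\bigr)}{k} \le \frac{1}{4k}

\sqrt{\operatorname{Var}[\hat{J}]} \le \frac{1}{2\sqrt{k}} = O\!\left(\frac{1}{\sqrt{k}}\right)
```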

10. How many hash functions are needed to estimate the Jaccard index with an expected error less than or equal to 0.05?

a) 100

b) 200

c) 300

d) 400

Answer: d

Explanation: The expected error for estimating the Jaccard index with the MinHash scheme using k different hash functions is O(1/√k). Treating the error as 1/√k and requiring 1/√k ≤ 0.05 gives √k ≥ 20, so k ≥ 400: 400 hash functions are needed for an expected error of at most 0.05.
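The same arithmetic works for any target error. A small Python helper, assuming the error is exactly 1/√k (constant factor 1, which the O-notation does not guarantee); the name hashes_needed is hypothetical. It takes the target as a decimal string so that fractions.Fraction keeps it exact and floating-point rounding cannot push a boundary case such as 0.05 up to 401.

```python
import math
from fractions import Fraction


def hashes_needed(eps: str) -> int:
    """Smallest k with 1/sqrt(k) <= eps, i.e. k >= 1/eps**2 (assumes error = 1/sqrt(k))."""
    e = Fraction(eps)  # exact rational value of the decimal string
    return math.ceil(1 / (e * e))


print(hashes_needed("0.05"))  # 1 / 0.05**2 = 400
print(hashes_needed("0.1"))   # 1 / 0.1**2 = 100
```

Because k grows with 1/ε², halving the target error roughly quadruples the number of hash functions.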

11. What expected error does a Chernoff bound for samples drawn without replacement give for the MinHash estimator?

a) O(log k!)

b) O(k!)

c) O(k²)

d) O(1/√k)

Answer: d

Explanation: The expected error for estimating the Jaccard index with k different hash functions is O(1/√k). In the variant that uses a single hash function and keeps the k smallest hash values of each set, the estimator is based on a sample drawn without replacement, and a Chernoff bound for such samples gives the same expected error, O(1/√k).

12. In the single-hash-function variant of MinHash, what time is required to maintain the queue of minimum hash values for a set of n elements?

a) O(log n!)

b) O(n!)

c) O(n²)

d) O(n)

Answer: d

Explanation: In the single-hash-function variant, the k smallest hash values of a set are kept in a priority queue during one pass over its n elements, so for a fixed k the time required is linear in n, that is, O(n).
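A Python sketch of the single-hash-function (bottom-k) variant, assuming blake2b stands in for the one hash function and non-empty input sets. A bounded max-heap keeps the k smallest distinct hash values; each element costs O(log k), so one pass is O(n log k), linear in n for a fixed k. The estimator takes X as the k smallest values of A ∪ B (recoverable from the two signatures) and returns |X ∩ sig(A) ∩ sig(B)| / |X|. The names h, bottom_k, and estimate_jaccard are hypothetical.

```python
import hashlib
import heapq


def h(item: str) -> int:
    """The single hash function of this variant, here a stable 64-bit blake2b hash."""
    digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def bottom_k(items, k: int) -> set[int]:
    """One pass keeping the k smallest distinct hash values in a bounded max-heap.

    The heap stores negated values, so -heap[0] is the largest value currently kept.
    """
    heap: list[int] = []
    kept: set[int] = set()  # values currently in the heap, used to skip repeated elements
    for item in items:
        v = h(item)
        if v in kept:
            continue
        if len(heap) < k:
            heapq.heappush(heap, -v)
            kept.add(v)
        elif v < -heap[0]:
            # Push v and evict the current largest kept value in one O(log k) step.
            evicted = -heapq.heappushpop(heap, -v)
            kept.discard(evicted)
            kept.add(v)
    return kept


def estimate_jaccard(sig_a: set[int], sig_b: set[int], k: int) -> float:
    """|Y| / |X|, where X is the k smallest values of A ∪ B and Y = X ∩ sig_a ∩ sig_b."""
    x = set(heapq.nsmallest(k, sig_a | sig_b))  # bottom-k of A ∪ B, from the two signatures
    y = x & sig_a & sig_b
    return len(y) / len(x)


A = {f"w{i}" for i in range(100)}
B = {f"w{i}" for i in range(50, 150)}  # exact Jaccard index: 50 / 150 = 1/3
k = 64
print(estimate_jaccard(bottom_k(A, k), bottom_k(B, k), k))
```

Dividing by |X| rather than k is a small deviation assumed here so that unions with fewer than k elements are still handled.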

13. How many bits are needed to specify a single permutation from a min-wise independent family?

a) O(log n!)

b) O(n!)

c) Ω(n²)

d) Ω(n)

Answer: d

Explanation: Any min-wise independent family of permutations of n elements must contain exponentially many permutations, so Ω(n) bits are needed to specify a single permutation from such a family.
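A sketch of the counting argument, taking as given the known lower bound that a min-wise independent family F of permutations of {1, 2, ..., n} has at least lcm(1, 2, ..., n) members, which grows as e^(n - o(n)). Naming one member of F requires at least log2 |F| bits:

```latex
|\mathcal{F}| \;\ge\; \operatorname{lcm}(1, 2, \dots, n) \;\ge\; e^{\,n - o(n)}

\text{bits to specify one } \pi \in \mathcal{F}
\;\ge\; \log_2 |\mathcal{F}|
\;\ge\; \frac{n - o(n)}{\ln 2}
\;=\; \Omega(n)
```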

14. MinHash has been used as a tool for association rule learning.

a) True

b) False

Answer: a

Explanation: MinHash was originally used to remove duplicate web pages from search engine results. In data mining, Cohen et al. (2001) used MinHash as a tool for association rule learning.

15. Google conducted a large-scale evaluation comparing the performance of the MinHash and SimHash techniques.

a) True

b) False

Answer: a

Explanation: MinHash was originally used to remove duplicate web pages from search engine results, and Cohen et al. (2001) later used it as a tool for association rule learning in data mining. Google conducted a large-scale evaluation comparing the performance of MinHash and SimHash.

**Sanfoundry Global Education & Learning Series – Data Structure.**
