This set of Data Mining Multiple Choice Questions & Answers (MCQs) focuses on “Measuring Data Similarity and Dissimilarity – Set 2”.

1. Which of the following is the correct dissimilarity matrix for the given data?

Object | Color |
---|---|
A | Red |
B | Blue |
C | Blue |

a) \(\begin{bmatrix} 0 & & \\ 1 & 0 & \\ 1 & 0 & 0 \end{bmatrix}\)

b) \(\begin{bmatrix} 0 & & \\ 1 & 0 & \\ 0 & 0 & 0 \end{bmatrix}\)

c) \(\begin{bmatrix} 0 & & \\ 0 & 0 & \\ 1 & 0 & 1 \end{bmatrix}\)

d) \(\begin{bmatrix} 0 & & \\ 1 & 1 & \\ 1 & 0 & 0 \end{bmatrix}\)

Answer: a

Explanation: Objects that have the same color have a dissimilarity of 0. All other pairs have the highest dissimilarity value, 1.

Let d(i, j) denote the dissimilarity between objects i and j. Since d(i, i) = 0 and d(i, j) = d(j, i), only the lower triangle of the dissimilarity matrix needs to be filled in:

\( \begin{array}{c|ccc} & A & B & C \\ \hline A & 0 & & \\ B & & 0 & \\ C & & & 0 \end{array} \)

From the data given above, d(A, B) = 1, d(A, C) = 1, d(B, C) = 0

Hence, dissimilarity matrix is:

\( \begin{array}{c|ccc} & A & B & C \\ \hline A & 0 & & \\ B & 1 & 0 & \\ C & 1 & 0 & 0 \end{array} \).
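The rule used in this explanation (0 for matching values, 1 otherwise) can be sketched in a few lines of Python. The function name `nominal_dissimilarity_matrix` is illustrative and not taken from any library.

```python
def nominal_dissimilarity_matrix(values):
    """Lower-triangular dissimilarity matrix for a single nominal attribute.

    d(i, j) is 0 when the two values match and 1 otherwise.
    """
    return [
        [0 if values[i] == values[j] else 1 for j in range(i + 1)]
        for i in range(len(values))
    ]


colors = ["Red", "Blue", "Blue"]  # objects A, B, C from the question
for label, row in zip("ABC", nominal_dissimilarity_matrix(colors)):
    print(label, row)
```

Each printed row corresponds to one row of the lower triangle, including the zero on the diagonal.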

2. Which of the following is the correct similarity matrix for the given data?

Object | Color |
---|---|
A | Red |
B | Green |
C | Blue |
D | Green |

a) \(\begin{bmatrix} 0 & & & \\ 1 & 0 & & \\ 1 & 0 & 0 & \\ 0 & 1 & 1 & 0 \end{bmatrix}\)

b) \(\begin{bmatrix} 1 & & & \\ 0 & 1 & & \\ 0 & 0 & 1 & \\ 0 & 1 & 0 & 1 \end{bmatrix}\)

c) \(\begin{bmatrix} 0 & & & \\ 1 & 0 & & \\ 1 & 1 & 0 & \\ 0 & 1 & 1 & 0 \end{bmatrix}\)

d) \(\begin{bmatrix} 0 & & & \\ 1 & 1 & & \\ 1 & 0 & 0 & \\ 0 & 1 & 1 & 1 \end{bmatrix}\)

Answer: b

Explanation: Objects that have the same color have a similarity of 1. All other pairs have a similarity of 0.

Let s(i, j) denote the similarity between objects i and j. Every object is fully similar to itself, so the similarity matrix has 1s on the diagonal:

\( \begin{array}{c|cccc} & A & B & C & D \\ \hline A & 1 & & & \\ B & & 1 & & \\ C & & & 1 & \\ D & & & & 1 \end{array} \)

From the data given above, s(A, B) = 0, s(A, C) = 0, s(A, D) = 0, s(B, C) = 0, s(B, D) = 1, s(C, D) = 0

Hence, similarity matrix is:

\( \begin{array}{c|cccc} & A & B & C & D \\ \hline A & 1 & & & \\ B & 0 & 1 & & \\ C & 0 & 0 & 1 & \\ D & 0 & 1 & 0 & 1 \end{array} \)
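For a single nominal attribute, the similarity is simply 1 minus the dissimilarity, so the matrix can be checked with a short Python sketch. The names `nominal_similarity` and `similarity_matrix` are hypothetical helpers chosen for this illustration.

```python
def nominal_similarity(a, b):
    """Return 1 when two nominal values match, 0 otherwise."""
    return 1 if a == b else 0


def similarity_matrix(values):
    """Lower-triangular similarity matrix, diagonal included."""
    return [
        [nominal_similarity(values[i], values[j]) for j in range(i + 1)]
        for i in range(len(values))
    ]


colors = ["Red", "Green", "Blue", "Green"]  # objects A, B, C, D
for label, row in zip("ABCD", similarity_matrix(colors)):
    print(label, row)
```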

3. Which of the following is the correct expression for symmetric binary dissimilarity between two objects, where q and t attributes have value 1 and 0 for both objects respectively, r attributes have value 1 for first object and 0 for the second object and s attributes have value 0 for the first object and 1 for the second object?

a) Dissimilarity = (r + s)/ (q + r + s + t)

b) Dissimilarity = (q – s)/ (q + r + s + r)

c) Dissimilarity = (t + s)/ (q + r + s + t)

d) Dissimilarity = (r – s)/ (q + r + s + t)

Answer: a

Explanation: The symmetric binary dissimilarity between two objects is given by Dissimilarity = (r + s)/ (q + r + s + t), where q is the number of attributes that have value 1 for both objects, r is the number of attributes that have value 1 for the first object and 0 for the second object, s is the number of attributes that have value 0 for the first object and 1 for the second object, and t is the number of attributes that have value 0 for both objects.
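As a minimal sketch, the counts q, r, s and t can be tallied directly from two 0/1 vectors and plugged into the formula. The function names and the sample vectors below are made up for illustration; the vectors are assumed to be non-empty and of equal length.

```python
def binary_counts(x, y):
    """Return (q, r, s, t) for two equal-length 0/1 vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    pairs = list(zip(x, y))
    q = sum(1 for a, b in pairs if a == 1 and b == 1)  # 1 in both objects
    r = sum(1 for a, b in pairs if a == 1 and b == 0)  # 1 in first only
    s = sum(1 for a, b in pairs if a == 0 and b == 1)  # 1 in second only
    t = sum(1 for a, b in pairs if a == 0 and b == 0)  # 0 in both objects
    return q, r, s, t


def symmetric_binary_dissimilarity(x, y):
    """(r + s) / (q + r + s + t); assumes at least one attribute."""
    q, r, s, t = binary_counts(x, y)
    return (r + s) / (q + r + s + t)


x = [1, 0, 1, 0, 0, 0]  # hypothetical object 1
y = [1, 0, 0, 1, 0, 0]  # hypothetical object 2
print(binary_counts(x, y))
print(symmetric_binary_dissimilarity(x, y))
```

Here q = 1, r = 1, s = 1 and t = 3, so the dissimilarity is 2/6.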

4. Given two objects, if q and t attributes have value 1 and 0 for both objects respectively, r attributes have value 1 for first object and 0 for the second object and s attributes have value 0 for the first object and 1 for the second object, which of the following is the correct expression for asymmetric binary dissimilarity between the two objects?

a) Dissimilarity = (r + s)/ (q + r + t)

b) Dissimilarity = (q – s)/ (q + r + s)

c) Dissimilarity = (r + s)/ (q + r + s)

d) Dissimilarity = (t – s)/ ( r + s + t)

Answer: c

Explanation: The asymmetric binary dissimilarity between two objects is given by Dissimilarity = (r + s)/ (q + r + s), where q is the number of attributes that have value 1 for both objects, r is the number of attributes that have value 1 for the first object and 0 for the second object, and s is the number of attributes that have value 0 for the first object and 1 for the second object. The number of attributes that have value 0 for both objects, t, is considered unimportant and is left out of the denominator.
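A small Python sketch of this formula, taking the counts directly as arguments. The function name is illustrative, and raising an error for an all-zero denominator is a design choice made here, not something the source specifies.

```python
def asymmetric_binary_dissimilarity(q, r, s):
    """(r + s) / (q + r + s); the negative-match count t is ignored."""
    denominator = q + r + s
    if denominator == 0:
        # Neither object has any attribute set to 1, so the ratio is undefined.
        raise ValueError("undefined when q + r + s is 0")
    return (r + s) / denominator


# Hypothetical counts: one shared 1, one mismatch in each direction.
print(asymmetric_binary_dissimilarity(q=1, r=1, s=1))
```

With the same r and s, this value is never smaller than the symmetric one, because dropping t can only shrink the denominator.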

5. Which of the following is also referred to as Jaccard coefficient?

a) Symmetric binary similarity

b) Symmetric binary dissimilarity

c) Asymmetric binary similarity

d) Asymmetric binary dissimilarity

Answer: c

Explanation: In a data set with binary attributes, the two states of an attribute may not be equally important. Such attributes are asymmetric, and matches on the less important state (0 for both objects) are ignored. The similarity computed in this way is referred to as the asymmetric binary similarity or Jaccard coefficient.
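In the q, r, s notation of the previous two questions, the Jaccard coefficient is commonly written as q / (q + r + s), which equals 1 minus the asymmetric binary dissimilarity. The sketch below computes it with Python sets; the function name and sample vectors are assumptions made for illustration.

```python
def jaccard_coefficient(x, y):
    """q / (q + r + s) for two equal-length 0/1 vectors."""
    ones_x = {i for i, v in enumerate(x) if v == 1}
    ones_y = {i for i, v in enumerate(y) if v == 1}
    union = ones_x | ones_y  # positions counted by q + r + s
    if not union:
        raise ValueError("undefined when neither vector contains a 1")
    return len(ones_x & ones_y) / len(union)  # the intersection has size q


x = [1, 0, 1, 0, 0, 0]  # hypothetical object 1
y = [1, 0, 0, 1, 0, 0]  # hypothetical object 2
print(jaccard_coefficient(x, y))
```

For these vectors q = 1 and r = s = 1, so the coefficient is 1/3 and the asymmetric dissimilarity is 2/3.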

6. Which of the following is true about the Euclidean distance between the given objects?

Object | Result 1 | Result 2 | Result 3 | Result 4 | Result 5 |
---|---|---|---|---|---|
Object 1 | 1 | 3 | 2 | 3 | 1 |
Object 2 | 3 | 7 | 4 | 8 | 6 |

a) The Euclidean distance between the objects is 7.5

b) The Euclidean distance between the objects is 4.2

c) The Euclidean distance between the objects is 3.7

d) The Euclidean distance between the objects is 8.6

Answer: d

Explanation: The Euclidean distance is given by:

\( D = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + \dots + (x_n - y_n)^2} \)

So, from the given data,

\( D = \sqrt{(1-3)^2 + (3-7)^2 + (2-4)^2 + (3-8)^2 + (1-6)^2} \)

\( D = \sqrt{4 + 16 + 4 + 25 + 25} \)

\( D = \sqrt{74} \approx 8.6 \)
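The arithmetic can be double-checked with a short Python sketch. The helper `euclidean_distance` is written out here for clarity; the standard library's `math.dist` (Python 3.8 and later) computes the same quantity.

```python
import math


def euclidean_distance(x, y):
    """Square root of the sum of squared attribute differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


object_1 = [1, 3, 2, 3, 1]
object_2 = [3, 7, 4, 8, 6]

print(round(euclidean_distance(object_1, object_2), 1))  # 8.6
print(round(math.dist(object_1, object_2), 1))           # same value
```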

7. Which of the following is true about the Manhattan distance between the given grades of the students?

Object | Grade 1 | Grade 2 | Grade 3 |
---|---|---|---|
Student 1 | 3.5 | 3.2 | 2.4 |
Student 2 | 3 | 2.7 | 3.9 |

a) The Manhattan distance between the objects is 4.5

b) The Manhattan distance between the objects is 3.5

c) The Manhattan distance between the objects is 2.5

d) The Manhattan distance between the objects is 1.5

Answer: c

Explanation: The Manhattan distance is given by:

D = |x1-y1| + |x2-y2| + …..+|xn-yn|

So, from the given data,

D = |3.5 – 3| + |3.2 – 2.7| + |2.4 – 3.9|

D = 0.5 + 0.5 + 1.5

D = 2.5
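A minimal Python check of the same sum. The function name `manhattan_distance` is illustrative; the result is rounded because the grades are floating-point values.

```python
def manhattan_distance(x, y):
    """Sum of absolute attribute differences (the L1 distance)."""
    return sum(abs(a - b) for a, b in zip(x, y))


student_1 = [3.5, 3.2, 2.4]
student_2 = [3.0, 2.7, 3.9]

print(round(manhattan_distance(student_1, student_2), 2))
```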

8. Which of the following statements is true regarding the Euclidean distance?

a) The Euclidean distance is equal to the \(L_1\) norm

b) The Euclidean distance is equal to the \(L_2\) norm

c) The Euclidean distance is equal to the \(L_3\) norm

d) The Euclidean distance is equal to the \(L_4\) norm

Answer: b

Explanation: The Euclidean distance between objects is used as a distance measure to find the similarity or dissimilarity between the objects. Euclidean distance is also referred to as the \(L_2\) norm.

9. The \(L^\infty\) norm is also known as _____

a) Uniform norm

b) Non-uniform norm

c) Jaccard norm

d) \(L_1\) norm

Answer: a

Explanation: The supremum distance between objects is the generalization of the Minkowski distance for h → ∞, that is, its limit as the order h grows without bound. The supremum distance is also referred to as the \(L^\infty\) norm. The \(L^\infty\) norm is also known as the uniform norm.

10. Which of the following is true about the supremum distance between the given objects?

Object | Part 1 | Part 2 | Part 3 |
---|---|---|---|
Object 1 | 3 | 4 | 8 |
Object 2 | 2 | 7 | 3 |

a) The supremum distance between the objects is 2

b) The supremum distance between the objects is 6

c) The supremum distance between the objects is 4

d) The supremum distance between the objects is 5

Answer: d

Explanation: The supremum distance is the maximum difference between corresponding attribute values of the two objects.

Supremum distance = max(|x1-y1|, …….. , |xn-yn|)

S = max(|3-2|, |4-7|, |8-3|)

S = max(1, 3, 5) = 5
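The supremum distance is the limit of the Minkowski distance as its order h grows. The sketch below, with illustrative function names, computes both for the data in this question so that the convergence toward the largest single difference can be observed.

```python
def minkowski_distance(x, y, h):
    """Minkowski distance of order h (h = 1 is Manhattan, h = 2 is Euclidean)."""
    return sum(abs(a - b) ** h for a, b in zip(x, y)) ** (1 / h)


def supremum_distance(x, y):
    """Largest absolute difference over all attributes."""
    return max(abs(a - b) for a, b in zip(x, y))


object_1 = [3, 4, 8]
object_2 = [2, 7, 3]

# The Minkowski values shrink toward the supremum distance as h increases.
for h in (1, 2, 10, 50):
    print(h, round(minkowski_distance(object_1, object_2, h), 4))
print("supremum:", supremum_distance(object_1, object_2))
```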

11. Which of the following is true about the Euclidean distance between the grades of the students if Grade 1 carries twice the weight of Grade 2?

Object | Grade 1 | Grade 2 |
---|---|---|
Student 1 | 3 | 2 |
Student 2 | 2 | 2 |

a) The Euclidean distance is 2.7

b) The Euclidean distance is 4.4

c) The Euclidean distance is 2.9

d) The Euclidean distance is 1.4

Answer: d

Explanation: We find the weighted Euclidean distance. It is given by:

\( D = \sqrt{w_1 (x_1 - y_1)^2 + w_2 (x_2 - y_2)^2 + \dots + w_n (x_n - y_n)^2} \)

w1 = 2, w2 = 1

\( D = \sqrt{2 (3-2)^2 + 1 (2-2)^2} \)

\( D = \sqrt{2} \approx 1.4 \)
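The weighted formula differs from the plain Euclidean distance only in the per-attribute multiplier. A minimal Python sketch, with an illustrative function name:

```python
import math


def weighted_euclidean_distance(x, y, weights):
    """Euclidean distance with each squared difference scaled by its weight."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))


student_1 = [3, 2]
student_2 = [2, 2]
weights = [2, 1]  # Grade 1 counts twice as much as Grade 2

print(round(weighted_euclidean_distance(student_1, student_2, weights), 1))  # 1.4
```

With all weights set to 1, the function reduces to the ordinary Euclidean distance.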

12. Which of the following is the dissimilarity matrix for the movie ratings data given in the table, where each movie can have a rating of average, good or excellent and the distance measure is the Euclidean distance?

Survey | Rating |
---|---|
Person 1 | Good |
Person 2 | Excellent |
Person 3 | Good |

a) \(\begin{bmatrix} 0 & & \\ 0.5 & 0 & \\ 0 & 0.5 & 0 \end{bmatrix}\)

b) \(\begin{bmatrix} 0 & & \\ 0.5 & 0 & \\ 0.5 & 0.5 & 0 \end{bmatrix}\)

c) \(\begin{bmatrix} 0 & & \\ 0.5 & 0 & \\ 0 & 1 & 0 \end{bmatrix}\)

d) \(\begin{bmatrix} 0 & & \\ 1 & 0 & \\ 0 & 0.5 & 0 \end{bmatrix}\)

Answer: a

Explanation: There are 3 ordered states for attribute Rating – Average, Good and Excellent. We assign a rank to each rating:

Average – Rank 1

Good – Rank 2

Excellent – Rank 3

Normalizing the ranks using the formula: Normalized rank = (Rank – 1)/ (No of ordered states – 1)

The ranks are normalized as follows:

Rank 1 = (1-1)/ (3-1) = 0.0

Rank 2 = (2-1)/ (3-1) = 0.5

Rank 3 = (3-1)/ (3-1) = 1.0

The dissimilarity matrix for the data has the following form, where A, B and C denote Person 1, Person 2 and Person 3 respectively:

\( \begin{array}{c|ccc} & A & B & C \\ \hline A & 0 & & \\ B & & 0 & \\ C & & & 0 \end{array} \)

We fill the cells of the matrix with the Euclidean distances between the normalized ranks (Good = 0.5, Excellent = 1.0):

\( d(A, B) = \sqrt{(0.5 - 1)^2} = 0.5 \), \( d(A, C) = \sqrt{(0.5 - 0.5)^2} = 0 \), \( d(B, C) = \sqrt{(1 - 0.5)^2} = 0.5 \)

Hence, the dissimilarity matrix is:

\( \begin{array}{c|ccc} & A & B & C \\ \hline A & 0 & & \\ B & 0.5 & 0 & \\ C & 0 & 0.5 & 0 \end{array} \)
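The rank-normalization steps above can be sketched in Python as follows. The names `RATING_ORDER` and `normalized_rank` are hypothetical. For a single attribute, the Euclidean distance reduces to the absolute difference of the normalized ranks.

```python
RATING_ORDER = ["Average", "Good", "Excellent"]  # ordered states, lowest first


def normalized_rank(value, ordered_states):
    """Map an ordinal value onto [0, 1] via (rank - 1) / (number of states - 1)."""
    rank = ordered_states.index(value) + 1
    return (rank - 1) / (len(ordered_states) - 1)


ratings = ["Good", "Excellent", "Good"]  # Person 1, Person 2, Person 3
z = [normalized_rank(r, RATING_ORDER) for r in ratings]

# Lower-triangular dissimilarity matrix on the normalized ranks.
matrix = [[abs(z[i] - z[j]) for j in range(i + 1)] for i in range(len(z))]
for row in matrix:
    print(row)
```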

13. Term frequency vectors are often sparse.

a) True

b) False

Answer: a

Explanation: A document contains terms which occur with some frequency. This information can be represented in the form of term-frequency vector. Term-frequency vectors often have many 0 values and hence, they are often sparse. Also, they are often very long.
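To illustrate the sparsity, here is a small Python sketch that builds a term-frequency vector over a fixed vocabulary. The vocabulary, the sample sentence and the function name are all made up for this example, and whitespace splitting is a simplifying assumption rather than a real tokenizer.

```python
from collections import Counter


def term_frequency_vector(document, vocabulary):
    """Count how often each vocabulary term occurs in the document."""
    counts = Counter(document.lower().split())
    return [counts[term] for term in vocabulary]


vocabulary = ["team", "coach", "hockey", "baseball", "soccer",
              "penalty", "score", "win", "loss", "season"]
document = "the coach said the team will win if the team can score"

vector = term_frequency_vector(document, vocabulary)
print(vector)
print("zero entries:", vector.count(0), "of", len(vector))
```

Even with this tiny vocabulary more than half of the entries are zero. With a realistic vocabulary of many thousands of terms, the proportion of zeros is far higher.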

14. A value 0 for cosine similarity between two objects indicates that the objects are alike.

a) True

b) False

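Answer: b

Explanation: The cosine similarity between two objects is used to find the similarity between the objects. It is often used to find similarity between documents. A value of 0 for cosine similarity means that the two vectors are orthogonal (at 90° to each other) and have no match, so the objects are not alike. The closer the value is to 1, the more alike the objects are.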

15. Which of the following is true about the cosine similarity between A = (1, 2, 4, 2, 6) and B = (2, 4, 6, 5, 3)?

a) The cosine similarity is 0.12

b) The cosine similarity is 0.01

c) The cosine similarity is 0.43

d) The cosine similarity is 0.56

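Answer: None of the listed options is correct. Option b (0.01) is the value obtained when the square roots in the norms are omitted. The correct cosine similarity is approximately 0.84.

Explanation: Given vectors A = (1, 2, 4, 2, 6) and B = (2, 4, 6, 5, 3)

A.B = (1*2 + 2*4 + 4*6 + 2*5 + 6*3) = (2 + 8 + 24 + 10 + 18) = 62

\( \|A\| = \sqrt{1^2 + 2^2 + 4^2 + 2^2 + 6^2} = \sqrt{1 + 4 + 16 + 4 + 36} = \sqrt{61} \approx 7.81 \)

\( \|B\| = \sqrt{2^2 + 4^2 + 6^2 + 5^2 + 3^2} = \sqrt{4 + 16 + 36 + 25 + 9} = \sqrt{90} \approx 9.49 \)

\( M = \|A\| \cdot \|B\| = \sqrt{61 \times 90} = \sqrt{5490} \approx 74.09 \)

Sim = A.B/M

= 62/74.09

≈ 0.84

If the square roots are left out, the denominator becomes 61 * 90 = 5490 and the result is 62/5490 ≈ 0.011, which is how option b arises.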
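The computation can be verified with a short Python sketch. Note that each norm is the square root of the sum of squares, which gives a cosine similarity of approximately 0.84 for these vectors. The function name is illustrative, and raising an error for zero vectors is a choice made for this example.

```python
import math


def cosine_similarity(x, y):
    """Dot product divided by the product of the Euclidean norms."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)


A = [1, 2, 4, 2, 6]
B = [2, 4, 6, 5, 3]

print(round(cosine_similarity(A, B), 2))  # 0.84
```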

**Sanfoundry Global Education & Learning Series – Data Mining.**
