Data Mining Questions and Answers – Measuring Data Similarity and Dissimilarity – Set 2

This set of Data Mining Multiple Choice Questions & Answers (MCQs) focuses on “Measuring Data Similarity and Dissimilarity – Set 2”.

1. Which of the following is the correct dissimilarity matrix for the given data?

Object Color
A Red
B Blue
C Blue

a) \(\begin{bmatrix}
0 & & \\
1 & 0 & \\
1 & 0 & 0
\end{bmatrix}\)
b) \(\begin{bmatrix}
0 & & \\
1 & 0 & \\
0 & 0 & 0
\end{bmatrix}\)
c) \(\begin{bmatrix}
0 & & \\
0 & 0 & \\
1 & 0 & 1
\end{bmatrix}\)
d) \(\begin{bmatrix}
0 & & \\
1 & 1 & \\
1 & 0 & 0
\end{bmatrix}\)

Answer: a
Explanation: Objects with the same color have a dissimilarity of 0; all other pairs have the maximum dissimilarity of 1.
Let d(i, j) denote the dissimilarity between objects i and j. Because d(i, j) = d(j, i), only the lower triangle of the matrix is shown, with zeros on the diagonal:
\( \begin{array}{c|ccc}
 & A & B & C \\ \hline
A & 0 & & \\
B & & 0 & \\
C & & & 0
\end{array} \)
From the data given above, d(A, B) = 1, d(A, C) = 1, d(B, C) = 0.
Hence, the dissimilarity matrix is:
\( \begin{array}{c|ccc}
 & A & B & C \\ \hline
A & 0 & & \\
B & 1 & 0 & \\
C & 1 & 0 & 0
\end{array} \)
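
The construction above can be sketched in Python. This is a minimal illustration rather than part of the original question set; the function name `nominal_dissimilarity_matrix` is an assumption made for the example.

```python
def nominal_dissimilarity_matrix(values):
    """Lower-triangular dissimilarity matrix for one nominal attribute.

    d(i, j) is 0 when the two values match and 1 otherwise.
    """
    return [
        [0 if values[i] == values[j] else 1 for j in range(i + 1)]
        for i in range(len(values))
    ]


colors = ["Red", "Blue", "Blue"]  # objects A, B, C from question 1
matrix = nominal_dissimilarity_matrix(colors)
for row in matrix:
    print(row)
```

Replacing `0 if ... else 1` with `1 if ... else 0` would give the corresponding similarity matrix, since similarity is 1 minus dissimilarity for a nominal attribute.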

2. Which of the following is the correct similarity matrix for the given data?

Object Color
A Red
B Green
C Blue
D Green

a) \(\begin{bmatrix}
0 & & & \\
1 & 0 & & \\
1 & 0 & 0 & \\
0 & 1 & 1 & 0
\end{bmatrix}\)
b) \(\begin{bmatrix}
1 & & & \\
0 & 1 & & \\
0 & 0 & 1 & \\
0 & 1 & 0 & 1
\end{bmatrix}\)
c) \(\begin{bmatrix}
0 & & & \\
1 & 0 & & \\
1 & 1 & 0 & \\
0 & 1 & 1 & 0
\end{bmatrix}\)
d) \(\begin{bmatrix}
0 & & & \\
1 & 1 & & \\
1 & 0 & 0 & \\
0 & 1 & 1 & 1
\end{bmatrix}\)

Answer: b
Explanation: Objects with the same color have a similarity of 1; all other pairs have a similarity of 0.
Let s(i, j) denote the similarity between objects i and j. Because s(i, j) = s(j, i), only the lower triangle of the matrix is shown, with ones on the diagonal:
\( \begin{array}{c|cccc}
 & A & B & C & D \\ \hline
A & 1 & & & \\
B & & 1 & & \\
C & & & 1 & \\
D & & & & 1
\end{array} \)
From the data given above, s(A, B) = 0, s(A, C) = 0, s(A, D) = 0, s(B, C) = 0, s(B, D) = 1, s(C, D) = 0.
Hence, the similarity matrix is:
\( \begin{array}{c|cccc}
 & A & B & C & D \\ \hline
A & 1 & & & \\
B & 0 & 1 & & \\
C & 0 & 0 & 1 & \\
D & 0 & 1 & 0 & 1
\end{array} \)

3. Which of the following is the correct expression for symmetric binary dissimilarity between two objects, where q and t attributes have value 1 and 0 for both objects respectively, r attributes have value 1 for first object and 0 for the second object and s attributes have value 0 for the first object and 1 for the second object?
a) Dissimilarity = (r + s)/ (q + r + s + t)
b) Dissimilarity = (q – s)/ (q + r + s + t)
c) Dissimilarity = (t + s)/ (q + r + s + t)
d) Dissimilarity = (r – s)/ (q + r + s + t)

Answer: a
Explanation: The symmetric binary dissimilarity between two objects is given by Dissimilarity = (r + s)/(q + r + s + t), where q is the number of attributes that are 1 for both objects, t is the number that are 0 for both objects, r is the number that are 1 for the first object and 0 for the second, and s is the number that are 0 for the first object and 1 for the second.

4. Given two objects, if q and t attributes have value 1 and 0 for both objects respectively, r attributes have value 1 for first object and 0 for the second object and s attributes have value 0 for the first object and 1 for the second object, which of the following is the correct expression for asymmetric binary dissimilarity between the two objects?
a) Dissimilarity = (r + s)/ (q + r + t)
b) Dissimilarity = (q – s)/ (q + r + s)
c) Dissimilarity = (r + s)/ (q + r + s)
d) Dissimilarity = (t – s)/ ( r + s + t)

Answer: c
Explanation: The asymmetric binary dissimilarity between two objects is given by Dissimilarity = (r + s)/(q + r + s), where q is the number of attributes that are 1 for both objects, r is the number that are 1 for the first object and 0 for the second, and s is the number that are 0 for the first object and 1 for the second. The t attributes that are 0 for both objects (the negative matches) are treated as unimportant and are left out of the denominator.
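
The two formulas from questions 3 and 4 can be compared side by side in code. The sketch below is illustrative only: the helper names and the two example objects are hypothetical, and it assumes both objects are equal-length lists of 0/1 values with at least one attribute set to 1 in either object (otherwise the asymmetric denominator is zero).

```python
def binary_counts(x, y):
    """Return (q, r, s, t) for two equal-length 0/1 vectors."""
    q = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    r = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    s = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    t = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)
    return q, r, s, t


def symmetric_dissimilarity(x, y):
    q, r, s, t = binary_counts(x, y)
    return (r + s) / (q + r + s + t)


def asymmetric_dissimilarity(x, y):
    q, r, s, _ = binary_counts(x, y)  # negative matches t are ignored
    return (r + s) / (q + r + s)


# Hypothetical objects, not taken from the questions.
first = [1, 0, 1, 0, 0, 0]
second = [1, 1, 0, 0, 0, 0]

print(binary_counts(first, second))            # (1, 1, 1, 3)
print(symmetric_dissimilarity(first, second))  # 2/6
print(asymmetric_dissimilarity(first, second)) # 2/3
```

Because the three shared zeros are dropped, the asymmetric value is larger than the symmetric one for the same pair of objects.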

5. Which of the following is also referred to as Jaccard coefficient?
a) Symmetric binary similarity
b) Symmetric binary dissimilarity
c) Asymmetric binary similarity
d) Asymmetric binary dissimilarity

Answer: c
Explanation: For a given data set with binary attributes, the states of the binary attributes may not have equal importance. The similarity in such cases is referred to as asymmetric binary similarity or Jaccard coefficient.
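In the q, r, s, t notation of questions 3 and 4, the Jaccard coefficient can be written out as follows. This is a short worked identity, assuming the usual definition in which the negative matches t are ignored; d(i, j) here stands for the asymmetric binary dissimilarity.
\( \text{sim}(i, j) = \frac{q}{q + r + s} = 1 - \frac{r + s}{q + r + s} = 1 - d(i, j) \)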

6. Which of the following is true about the Euclidean distance between the given objects?

object Result 1 Result 2 Result 3 Result 4 Result 5
Object 1 1 3 2 3 1
Object 2 3 7 4 8 6

a) The Euclidean distance between the objects is 7.5
b) The Euclidean distance between the objects is 4.2
c) The Euclidean distance between the objects is 3.7
d) The Euclidean distance between the objects is 8.6

Answer: d
Explanation: The Euclidean distance is given by:
\( D = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + \dots + (x_n - y_n)^2} \)
So, from the given data,
\( D = \sqrt{(1-3)^2 + (3-7)^2 + (2-4)^2 + (3-8)^2 + (1-6)^2} \)
\( D = \sqrt{4 + 16 + 4 + 25 + 25} \)
\( D = \sqrt{74} \approx 8.6 \)
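
The same calculation can be checked with a few lines of Python. This is a minimal sketch; the function name `euclidean_distance` is chosen for the example and only the standard library is used.

```python
import math


def euclidean_distance(x, y):
    """Square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


object_1 = [1, 3, 2, 3, 1]
object_2 = [3, 7, 4, 8, 6]
distance = euclidean_distance(object_1, object_2)
print(round(distance, 1))  # 8.6
```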

7. Which of the following is true about the Manhattan distance between the given grades of the students?

object Grade 1 Grade 2 Grade 3
Student 1 3.5 3.2 2.4
Student 2 3 2.7 3.9

a) The Manhattan distance between the objects is 4.5
b) The Manhattan distance between the objects is 3.5
c) The Manhattan distance between the objects is 2.5
d) The Manhattan distance between the objects is 1.5

Answer: c
Explanation: The Manhattan distance is given by:
\( D = |x_1 - y_1| + |x_2 - y_2| + \dots + |x_n - y_n| \)
So, from the given data,
\( D = |3.5 - 3| + |3.2 - 2.7| + |2.4 - 3.9| \)
\( D = 0.5 + 0.5 + 1.5 \)
\( D = 2.5 \)
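
A quick way to verify the sum is the sketch below. The function name `manhattan_distance` is an assumption for the example, and the result is rounded because binary floating point does not represent values such as 3.2 and 2.7 exactly.

```python
def manhattan_distance(x, y):
    """Sum of absolute differences across attributes."""
    return sum(abs(a - b) for a, b in zip(x, y))


student_1 = [3.5, 3.2, 2.4]
student_2 = [3.0, 2.7, 3.9]
distance = manhattan_distance(student_1, student_2)
print(round(distance, 2))  # 2.5
```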

8. Which of the following statements is true regarding the Euclidean distance?
a) The Euclidean distance is equal to the L1 norm
b) The Euclidean distance is equal to the L2 norm
c) The Euclidean distance is equal to the L3 norm
d) The Euclidean distance is equal to the L4 norm

Answer: b
Explanation: The Euclidean distance between objects is used as a distance measure to find the similarity or dissimilarity between the objects. Euclidean distance is also referred to as the L2 norm.

9. The \(L_\infty\) norm is also known as _____
a) Uniform norm
b) Non-uniform norm
c) Jaccard norm
d) L1 norm

Answer: a
Explanation: The supremum distance is a generalization of the Minkowski distance as the order h tends to infinity. It is also referred to as the \(L_{\max}\) or \(L_\infty\) norm, and the \(L_\infty\) norm is also known as the uniform norm.

10. Which of the following is true about the supremum distance between the given objects?

object Part 1 Part 2 Part 3
object 1 3 4 8
object 2 2 7 3

a) The supremum distance between the objects is 2
b) The supremum distance between the objects is 6
c) The supremum distance between the objects is 4
d) The supremum distance between the objects is 5

Answer: d
Explanation: The supremum distance is the maximum absolute difference between the attribute values of two objects.
\( S = \max(|x_1 - y_1|, \dots, |x_n - y_n|) \)
\( S = \max(|3-2|, |4-7|, |8-3|) \)
\( S = \max(1, 3, 5) = 5 \)
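
In code, the supremum distance is a one-line maximum over the per-attribute differences. The function name `supremum_distance` below is an assumption for this sketch.

```python
def supremum_distance(x, y):
    """Largest absolute difference over all attributes."""
    return max(abs(a - b) for a, b in zip(x, y))


object_1 = [3, 4, 8]
object_2 = [2, 7, 3]
distance = supremum_distance(object_1, object_2)
print(distance)  # 5
```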

11. Which of the following is true about the Euclidean distance between the grades of the students if Grade 1 carries twice the weight of Grade 2?

object Grade 1 Grade 2
Student 1 3 2
Student 2 2 2

a) The Euclidean distance is 2.7
b) The Euclidean distance is 4.4
c) The Euclidean distance is 2.9
d) The Euclidean distance is 1.4

Answer: d
Explanation: We find the weighted Euclidean distance. It is given by:
\( D = \sqrt{w_1 (x_1 - y_1)^2 + w_2 (x_2 - y_2)^2 + \dots + w_n (x_n - y_n)^2} \)
With \( w_1 = 2 \) and \( w_2 = 1 \),
\( D = \sqrt{2 (3-2)^2 + 1 (2-2)^2} \)
\( D = \sqrt{2} \approx 1.4 \)
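
The weighted form differs from the plain Euclidean distance only by the per-attribute weight inside the sum. The sketch below assumes the weights are applied to the squared differences, as in the formula above; the function name is hypothetical.

```python
import math


def weighted_euclidean_distance(x, y, weights):
    """Euclidean distance with a weight on each squared difference."""
    return math.sqrt(
        sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y))
    )


student_1 = [3, 2]
student_2 = [2, 2]
weights = [2, 1]  # Grade 1 counts twice as much as Grade 2
distance = weighted_euclidean_distance(student_1, student_2, weights)
print(round(distance, 1))  # 1.4
```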

12. Which of the following is the dissimilarity matrix for the movie ratings given in the table, where each rating is average, good, or excellent and the distance measure is the Euclidean distance?

Survey Rating
Person 1 Good
Person 2 Excellent
Person 3 Good

a) \(\begin{bmatrix}
0 & & \\
0.5 & 0 & \\
0 & 0.5 & 0
\end{bmatrix}\)
b) \(\begin{bmatrix}
0 & & \\
0.5 & 0 & \\
0.5 & 0.5 & 0
\end{bmatrix}\)
c) \(\begin{bmatrix}
0 & & \\
0.5 & 0 & \\
0 & 1 & 0
\end{bmatrix}\)
d) \(\begin{bmatrix}
0 & & \\
1 & 0 & \\
0 & 0.5 & 0
\end{bmatrix}\)

Answer: a
Explanation: There are 3 ordered states for the attribute Rating: Average, Good and Excellent. We assign a rank to each rating:
Average – Rank 1
Good – Rank 2
Excellent – Rank 3
The ranks are normalized using the formula: Normalized rank = (Rank – 1)/(Number of ordered states – 1)
The normalized ranks are:
Rank 1 = (1-1)/(3-1) = 0.0
Rank 2 = (2-1)/(3-1) = 0.5
Rank 3 = (3-1)/(3-1) = 1.0
Let A, B and C denote Person 1, Person 2 and Person 3, whose normalized ranks are 0.5, 1.0 and 0.5 respectively. The dissimilarity matrix has zeros on the diagonal:
\( \begin{array}{c|ccc}
 & A & B & C \\ \hline
A & 0 & & \\
B & & 0 & \\
C & & & 0
\end{array} \)
We fill the remaining cells with the Euclidean distance between the normalized ranks:
\( d(A, B) = \sqrt{(0.5 - 1)^2} = 0.5 \), \( d(A, C) = \sqrt{(0.5 - 0.5)^2} = 0 \), \( d(B, C) = \sqrt{(1 - 0.5)^2} = 0.5 \)
Hence, the dissimilarity matrix is:
\( \begin{array}{c|ccc}
 & A & B & C \\ \hline
A & 0 & & \\
B & 0.5 & 0 & \\
C & 0 & 0.5 & 0
\end{array} \)
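
The rank normalization and the resulting matrix can be reproduced with the short sketch below. The names `ORDER` and `normalized_rank` are assumptions for the example; with a single attribute, the Euclidean distance reduces to the absolute difference of the normalized ranks.

```python
ORDER = ["Average", "Good", "Excellent"]  # lowest to highest


def normalized_rank(state, ordered_states):
    """Map an ordinal state to [0, 1] via (rank - 1) / (M - 1)."""
    rank = ordered_states.index(state) + 1
    return (rank - 1) / (len(ordered_states) - 1)


ratings = ["Good", "Excellent", "Good"]  # Person 1, Person 2, Person 3
z = [normalized_rank(r, ORDER) for r in ratings]

# Lower-triangular dissimilarity matrix on the normalized ranks.
matrix = [[abs(z[i] - z[j]) for j in range(i + 1)] for i in range(len(z))]
for row in matrix:
    print(row)
```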

13. Term frequency vectors are often sparse.
a) True
b) False

Answer: a
Explanation: A document contains terms which occur with some frequency. This information can be represented in the form of term-frequency vector. Term-frequency vectors often have many 0 values and hence, they are often sparse. Also, they are often very long.

14. A value 0 for cosine similarity between two objects indicates that the objects are alike.
a) True
b) False

Answer: b
Explanation: Cosine similarity measures how alike two objects are and is often used to compare documents. A value of 0 means the two vectors are orthogonal and have no match, so the objects are not alike; the closer the value is to 1, the greater the match.
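
The two extremes can be seen with small term-frequency vectors. The sketch below is illustrative: the three document vectors are hypothetical, the function name is an assumption, and it assumes neither vector is all zeros (otherwise the denominator is zero).

```python
import math


def cosine_similarity(x, y):
    """Dot product divided by the product of the Euclidean norms."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


# Hypothetical term-frequency vectors over a four-term vocabulary.
doc_1 = [3, 0, 2, 0]
doc_2 = [0, 4, 0, 1]  # shares no terms with doc_1
doc_3 = [6, 0, 4, 0]  # same term proportions as doc_1

print(cosine_similarity(doc_1, doc_2))  # 0.0, no terms in common
print(round(cosine_similarity(doc_1, doc_3), 6))  # 1.0, same direction
```

Note that `doc_3` is twice as long as `doc_1` but still scores 1, because cosine similarity depends on direction and not on magnitude.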

15. Which of the following is true about the cosine similarity between A = (1, 2, 4, 2, 6) and B = (2, 4, 6, 5, 3)?
a) The cosine similarity is 0.12
b) The cosine similarity is 0.84
c) The cosine similarity is 0.43
d) The cosine similarity is 0.56

Answer: b
Explanation: Given vectors A = (1, 2, 4, 2, 6) and B = (2, 4, 6, 5, 3)
\( A \cdot B = 1 \cdot 2 + 2 \cdot 4 + 4 \cdot 6 + 2 \cdot 5 + 6 \cdot 3 = 2 + 8 + 24 + 10 + 18 = 62 \)
\( \|A\| = \sqrt{1^2 + 2^2 + 4^2 + 2^2 + 6^2} = \sqrt{1 + 4 + 16 + 4 + 36} = \sqrt{61} \approx 7.81 \)
\( \|B\| = \sqrt{2^2 + 4^2 + 6^2 + 5^2 + 3^2} = \sqrt{4 + 16 + 36 + 25 + 9} = \sqrt{90} \approx 9.49 \)
\( \|A\| \, \|B\| = \sqrt{61 \cdot 90} = \sqrt{5490} \approx 74.09 \)
\( \text{Sim} = \frac{A \cdot B}{\|A\| \, \|B\|} = \frac{62}{74.09} \approx 0.84 \)

Sanfoundry Global Education & Learning Series – Data Mining.
