# Data Mining Questions and Answers – Data Cleaning and Data Integration – Set 3

This set of Data Mining Multiple Choice Questions & Answers (MCQs) focuses on “Data Cleaning and Data Integration – Set 3”.

1. What are the expected values for the attributes No. of hours studied and Marks scored?

| Student | No. of hours studied | Marks scored |
|---------|----------------------|--------------|
| 1       | 5                    | 80           |
| 2       | 6                    | 82           |
| 3       | 10                   | 95           |

a) 7, 85.6
b) 8, 87.2
c) 9, 76.9
d) 5, 67.8

Explanation: The expected value of an attribute is the average of its data values.
Expected Value = E = (value1 + value2 + … + value(n)) / n
E (No. of hours studied) = (5 + 6 + 10) / 3 = 7
E (Marks scored) = (80 + 82 + 95) / 3 = 257 / 3 ≈ 85.67, which option (a) lists truncated as 85.6
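The averages above can be checked with a few lines of Python. This is only a minimal sketch using the three rows from the question; the variable and function names are illustrative assumptions, not part of the original quiz.

```python
# Data from the table in question 1.
hours_studied = [5, 6, 10]
marks_scored = [80, 82, 95]

def expected_value(values):
    """Expected value of an attribute, taken here as the plain average."""
    return sum(values) / len(values)

expected_hours = expected_value(hours_studied)
expected_marks = expected_value(marks_scored)

print(expected_hours)            # 7.0
print(round(expected_marks, 2))  # 85.67
```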

2. What is the covariance between the attributes No. of hours studied and Marks scored for the given data?

| Student | No. of hours studied | Marks scored |
|---------|----------------------|--------------|
| 1       | 5                    | 80           |
| 2       | 6                    | 82           |
| 3       | 10                   | 95           |

a) Positive
b) Negative
c) No relation
d) Zero covariance

Explanation: The expected value of an attribute is the average of its data values.
E (No. of hours studied) = (5 + 6 + 10) / 3 = 7
E (Marks scored) = (80 + 82 + 95) / 3 = 257 / 3 ≈ 85.67
Covariance is given by:
C = E(A·B) – Am·Bm, where Am and Bm are the expected values of A and B, and E(A·B) is the expected value of the product A·B
E (No. of hours studied · Marks scored) = (5*80 + 6*82 + 10*95) / 3 = 1842 / 3 = 614
E (No. of hours studied) * E (Marks scored) = 7 * 257 / 3 ≈ 599.67
C = 614 – 599.67 ≈ 14.33
Hence, the covariance between No. of hours studied and Marks scored is positive.
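The same calculation can be sketched in Python, assuming the population form of covariance used in the explanation, E(A·B) minus the product of the two expected values. The helper names are hypothetical. Because the code keeps the unrounded mean of the marks, it prints about 14.33; a hand calculation that rounds that mean first will differ slightly, but the sign is positive either way.

```python
# Data from the table in question 2.
hours = [5, 6, 10]
marks = [80, 82, 95]

def expected(values):
    """Plain average of a list of numbers."""
    return sum(values) / len(values)

def covariance(a, b):
    """Population covariance: E(A*B) - E(A) * E(B)."""
    products = [x * y for x, y in zip(a, b)]
    return expected(products) - expected(a) * expected(b)

cov = covariance(hours, marks)
print(round(cov, 2))                          # 14.33
print("positive" if cov > 0 else "not positive")
```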

3. The covariance between two attributes is termed variance when the two attributes are _____
a) Identical
b) Different
c) Binary
d) Nominal

Explanation: The covariance value between two attributes is called variance when the two attributes are identical. Hence, variance is a special case of covariance, where, for an attribute, the covariance with itself is calculated.
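A quick way to see this is to compute the covariance of an attribute with itself and compare it with the variance. The sketch below reuses the hours data from the earlier questions and assumes the population (divide by n) definitions; `covariance` is a hypothetical helper, while `pvariance` comes from Python's standard `statistics` module.

```python
from statistics import pvariance

def covariance(a, b):
    """Population covariance as the mean product of deviations."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n

hours = [5, 6, 10]

self_covariance = covariance(hours, hours)  # covariance of the attribute with itself
variance = pvariance(hours)                 # population variance of the same attribute

print(round(self_covariance, 4), round(variance, 4))
```

Both printed values agree, which is the identity Cov(a, a) = Var(a) stated in the explanation.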

4. If Sa represents the standard deviation of attribute a and Cov(a, b) represents the covariance between attributes a and b, the correlation coefficient for the two attributes is given by _____
a) Cov(a, b) / (Sa · Sb)
b) Cov²(a, b) / (Sa · Sb)
c) Cov(a, b) / (Sa² · Sb²)
d) Cov²(a, b) / (Sa² · Sb²)

Explanation: The correlation coefficient between two attributes a and b is given by Cov(a, b) / (Sa · Sb), where Cov(a, b) is the covariance between a and b, and Sa and Sb are the standard deviations of a and b, respectively.
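As an illustration, the formula can be applied to the hours and marks data from the earlier questions. This is a minimal sketch that assumes population covariance and population standard deviation throughout; the function names are illustrative only.

```python
from math import sqrt

def covariance(a, b):
    """Population covariance as the mean product of deviations."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n

def std_dev(a):
    """Population standard deviation: square root of Cov(a, a)."""
    return sqrt(covariance(a, a))

def correlation(a, b):
    """Correlation coefficient: Cov(a, b) / (Sa * Sb)."""
    return covariance(a, b) / (std_dev(a) * std_dev(b))

hours = [5, 6, 10]
marks = [80, 82, 95]

r = correlation(hours, marks)
print(round(r, 3))
```

For this data the coefficient comes out close to 1, consistent with the positive covariance found in question 2.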

5. Which of the following is not an example of data value conflicts?
a) Height in meters in one database and in centimeters in the other
b) Currency in dollars in one database and in rupees in the other
c) Weight in kilograms in one database and in kilograms in the other
d) Grading as 1 to 4 in one university database and as A to D in the other

Explanation: Data value conflicts are resolved during data integration. They arise when the same attribute is represented, scaled, or encoded differently in different sources, for example height in meters in one database and in centimeters in the other, or currency in dollars in one database and in rupees in the other. Weight recorded in kilograms in both databases does not cause a data value conflict because both attributes use the same unit.
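One common way to resolve such a conflict is to convert every source to a single agreed unit before merging. The sketch below does this for the height example; the record identifiers, values, and function name are all hypothetical and chosen only for illustration.

```python
# Hypothetical height records from two sources that disagree on units.
heights_db_a_m = {"s1": 1.70, "s2": 1.82}      # stored in meters
heights_db_b_cm = {"s3": 165.0, "s4": 178.0}   # stored in centimeters

CM_PER_M = 100

def integrate_heights(in_meters, in_centimeters):
    """Merge both sources into one dict expressed in centimeters."""
    merged = {key: round(h * CM_PER_M, 1) for key, h in in_meters.items()}
    merged.update(in_centimeters)  # already in the target unit
    return merged

heights_cm = integrate_heights(heights_db_a_m, heights_db_b_cm)
print(heights_cm)
```

Had both sources already stored centimeters, as in the kilograms case of option (c), the merge would need no conversion step at all.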

Sanfoundry Global Education & Learning Series – Data Mining.
