Data Mining Questions and Answers – Data Cleaning and Data Integration – Set 3

This set of Data Mining Multiple Choice Questions & Answers (MCQs) focuses on “Data Cleaning and Data Integration – Set 3”.

1. What are the expected values for the attributes No. of hours studied and Marks scored?

Student   No. of hours studied   Marks scored
1         5                      80
2         6                      82
3         10                     95

a) 7, 85.6
b) 8, 87.2
c) 9, 76.9
d) 5, 67.8

Answer: a
Explanation: The expected value of an attribute is the average of the data values for that attribute.
Expected value = E = (value(1) + value(2) + ... + value(n)) / n
E (No. of hours studied) = (5 + 6 + 10) / 3 = 7
E (Marks scored) = (80 + 82 + 95) / 3 = 257 / 3 ≈ 85.67, which corresponds to option (a), where the value is truncated to 85.6.
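
The averaging above can be checked with a few lines of Python. This is a minimal sketch; the names hours, marks and expected_value are chosen here for illustration and are not part of the original question.

```python
# Data from the table in the question.
hours = [5, 6, 10]
marks = [80, 82, 95]


def expected_value(values):
    """Return the arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


e_hours = expected_value(hours)
e_marks = expected_value(marks)

print(e_hours)            # 7.0
print(round(e_marks, 2))  # 85.67
```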

2. What is the covariance between the attributes No. of hours studied and Marks scored for the given data?

Student   No. of hours studied   Marks scored
1         5                      80
2         6                      82
3         10                     95

a) Positive
b) Negative
c) No relation
d) Zero covariance

Answer: a
Explanation: The expected value of an attribute is the average of the data values for that attribute.
E (No. of hours studied) = (5 + 6 + 10) / 3 = 7
E (Marks scored) = (80 + 82 + 95) / 3 = 257 / 3 ≈ 85.67
Covariance is given by:
Cov (A, B) = E (A.B) - Am * Bm, where Am and Bm are the expected values of A and B, and E (A.B) is the expected value of the product A.B
E (No. of hours studied . Marks scored) = (5*80 + 6*82 + 10*95) / 3 = 1842 / 3 = 614
E (No. of hours studied) * E (Marks scored) = 7 * 85.67 ≈ 599.67
Cov = 614 - 599.67 ≈ 14.33
Since this value is greater than zero, there is a positive covariance between No. of hours studied and Marks scored.
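
The same calculation can be sketched in Python. The function names below are illustrative, and the code uses the population form Cov(A, B) = E(A.B) - E(A) * E(B), dividing by n as in the explanation above.

```python
hours = [5, 6, 10]
marks = [80, 82, 95]


def mean(values):
    """Arithmetic mean (expected value) of a list of numbers."""
    return sum(values) / len(values)


def covariance(a, b):
    """Population covariance: E(A.B) - E(A) * E(B)."""
    mean_of_products = mean([x * y for x, y in zip(a, b)])
    return mean_of_products - mean(a) * mean(b)


cov = covariance(hours, marks)
print(round(cov, 2))  # 14.33
print("Positive" if cov > 0 else "Negative" if cov < 0 else "Zero")  # Positive
```

Note that a sample covariance, which divides by n - 1 instead of n, would give a different magnitude (21.5 for this data) but the same positive sign.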

3. Covariance between two attributes is termed as variance when the two attributes are _____
a) Identical
b) Different
c) Binary
d) Nominal

Answer: a
Explanation: The covariance value between two attributes is called variance when the two attributes are identical. Hence, variance is a special case of covariance, where, for an attribute, the covariance with itself is calculated.
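
As a quick check of this special case, the sketch below computes the covariance of an attribute with itself and compares it with the population variance from Python's standard statistics module. The covariance helper is an illustrative function written for this example, and the data is the No. of hours studied column from the earlier questions.

```python
from statistics import pvariance

hours = [5, 6, 10]


def mean(values):
    """Arithmetic mean (expected value) of a list of numbers."""
    return sum(values) / len(values)


def covariance(a, b):
    """Population covariance: E(A.B) - E(A) * E(B)."""
    return mean([x * y for x, y in zip(a, b)]) - mean(a) * mean(b)


cov_with_itself = covariance(hours, hours)
variance = pvariance(hours)

print(round(cov_with_itself, 4))  # 4.6667
print(round(variance, 4))         # 4.6667
```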

4. If Sa represents the standard deviation of the attribute a, Cov (a, b) represents the covariance between attributes a and b, the correlation coefficient for the two attributes is given by _____
a) Cov (a, b) / (Sa * Sb)
b) Cov² (a, b) / (Sa * Sb)
c) Cov (a, b) / (Sa² * Sb²)
d) Cov² (a, b) / (Sa² * Sb²)

Answer: a
Explanation: The correlation coefficient between two attributes a and b is given by Cov (a, b) / (Sa * Sb), where Cov (a, b) represents the covariance between attributes a and b, and Sa and Sb represent the standard deviations of attributes a and b.
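
The formula can be tried on the hours/marks data from the earlier questions. This is a minimal sketch that assumes population statistics throughout (covariance and standard deviations both divide by n); the helper names are illustrative.

```python
from statistics import pstdev

hours = [5, 6, 10]
marks = [80, 82, 95]


def mean(values):
    """Arithmetic mean (expected value) of a list of numbers."""
    return sum(values) / len(values)


def covariance(a, b):
    """Population covariance: E(A.B) - E(A) * E(B)."""
    return mean([x * y for x, y in zip(a, b)]) - mean(a) * mean(b)


def correlation(a, b):
    """Correlation coefficient: Cov(a, b) / (Sa * Sb)."""
    return covariance(a, b) / (pstdev(a) * pstdev(b))


r = correlation(hours, marks)
print(round(r, 3))  # 0.998
```

A value this close to 1 indicates a strong positive linear relationship between the two attributes in this small data set.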

5. Which of the following is not an example of data value conflicts?
a) Height in meters in one database and in centimeters in the other
b) Currency in dollars in one database and in rupees in the other
c) Weight in kilograms in one database and in kilograms in the other
d) Grading as 1 to 4 in one university database and as A to D in the other

Answer: c
Explanation: Data value conflicts are resolved during data integration. Examples include height in meters in one database and in centimeters in the other, currency in dollars in one database and in rupees in the other, and grades recorded as 1 to 4 in one database and as A to D in the other. Weight in kilograms in both databases does not lead to a data value conflict because both attributes use the same unit, i.e. kilograms.
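
One way such a conflict might be resolved during integration is to convert both sources to a common unit before merging. The sketch below is only an illustration: the record layout, the field names height_m and height_cm, and the sample values are all hypothetical and not taken from any real database.

```python
CM_PER_METER = 100

# Hypothetical records: database A stores height in meters,
# database B stores height in centimeters.
db_a = [{"id": 1, "height_m": 1.75}, {"id": 2, "height_m": 1.5}]
db_b = [{"id": 3, "height_cm": 165.0}]


def unify_heights(meter_records, cm_records):
    """Merge both sources into one list with heights in centimeters."""
    unified = [
        {"id": rec["id"], "height_cm": round(rec["height_m"] * CM_PER_METER, 1)}
        for rec in meter_records
    ]
    unified.extend(
        {"id": rec["id"], "height_cm": rec["height_cm"]} for rec in cm_records
    )
    return unified


merged = unify_heights(db_a, db_b)
for row in merged:
    print(row)
```

After this step every record carries the same attribute in the same unit, so the values can be compared or aggregated directly.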

Sanfoundry Global Education & Learning Series – Data Mining.

Manish Bhojasia - Founder & CTO at Sanfoundry
Manish Bhojasia, a technology veteran with 20+ years at Cisco and Wipro, is Founder and CTO at Sanfoundry. He lives in Bangalore and focuses on the development of Linux Kernel, SAN Technologies, Advanced C, and Data Structures & Algorithms.
