This set of Data Mining Multiple Choice Questions & Answers (MCQs) focuses on “Basic Data Mining Tasks, KDD, Issues”.
1. What is the relationship between data mining and knowledge discovery in databases?
a) Both are the same thing
b) Knowledge discovery in databases (KDD), the process of deriving patterns from data, consists of several steps, including data mining
c) Data mining, the process of deriving patterns from data, consists of several steps, including KDD
d) They are not related to each other
Answer: b
Explanation: Knowledge discovery in databases is a multi-step process used to find patterns in data. Data mining, one of the key steps in this process, is the application of specific algorithms to find these patterns according to the task at hand.
2. Which of the following is not a basic data mining task?
a) Classification
b) Prediction
c) Spooling
d) Clustering
Answer: c
Explanation: Spooling facilitates data exchange between slow peripheral devices and computer applications, and hence is not a data mining task. Classification, which maps data to predefined groups, is a basic data mining task. Similarly, prediction, which predicts data values based on past data, and clustering, which maps data to groups that are not predefined, are also basic data mining tasks.
3. What is the correct order of tasks performed in the KDD process?
a) Selection, Preprocessing, Transformation, Data mining, Interpretation
b) Selection, Preprocessing, Transformation, Interpretation, Data mining
c) Interpretation, Preprocessing, Transformation, Data mining, Selection
d) Data mining, Preprocessing, Transformation, Selection, Interpretation
Answer: a
Explanation: Knowledge discovery in databases (KDD) is the process of extracting useful information from data. It is composed of five major steps: selection of the appropriate data, preprocessing of the selected data, transformation of the preprocessed data into a preferred format, application of a data mining algorithm to the transformed data depending on the task, and finally interpretation of the patterns found.
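The five steps can be illustrated as a chain of small functions. This is only a minimal sketch under stated assumptions: the records, the field names and the "pattern" being mined (average scaled spend per age band) are hypothetical and chosen purely to make each stage visible.

```python
# Hypothetical toy records; the field names are illustrative assumptions.
raw_records = [
    {"customer": "A", "age": 34, "spend": 120.0, "notes": "n/a"},
    {"customer": "B", "age": None, "spend": 80.0, "notes": "n/a"},
    {"customer": "C", "age": 51, "spend": 300.0, "notes": "n/a"},
    {"customer": "D", "age": 29, "spend": 40.0, "notes": "n/a"},
]


def select(records):
    """Selection: keep only the attributes relevant to the task."""
    return [{"age": r["age"], "spend": r["spend"]} for r in records]


def preprocess(records):
    """Preprocessing: drop records that contain missing values."""
    return [r for r in records if all(v is not None for v in r.values())]


def transform(records):
    """Transformation: rescale spend to the range [0, 1]."""
    lo = min(r["spend"] for r in records)
    hi = max(r["spend"] for r in records)
    return [
        {"age": r["age"], "spend": (r["spend"] - lo) / (hi - lo)}
        for r in records
    ]


def mine(records):
    """Data mining: a trivial 'pattern', the mean scaled spend per age band."""
    bands = {}
    for r in records:
        band = "under_40" if r["age"] < 40 else "40_plus"
        bands.setdefault(band, []).append(r["spend"])
    return {band: sum(v) / len(v) for band, v in bands.items()}


def interpret(patterns):
    """Interpretation: turn the mined pattern into a readable statement."""
    top = max(patterns, key=patterns.get)
    return f"Highest average scaled spend: {top}"


patterns = mine(transform(preprocess(select(raw_records))))
report = interpret(patterns)
print(report)
```

Note that the order of the calls mirrors the order of the steps: data mining runs only after selection, preprocessing and transformation, and interpretation comes last.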
4. Which of the following techniques predicts data values using results derived from different data?
a) Regression
b) Summarization
c) Sequence discovery
d) Association rules
Answer: a
Explanation: The technique of predicting data values using results derived from different data comes under predictive modeling. Classification, regression, and time series analysis are some of the key predictive modeling techniques used in data mining. Summarization, clustering, and association rules come under descriptive modeling, which explores the properties of the data rather than predicting data values.
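As an illustration of regression as a predictive technique, the following minimal sketch fits a straight line to known data by ordinary least squares and then uses it to predict a value for an unseen input. The function name `fit_line` and the sample data are assumptions made for this example.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Illustrative data that lies exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)

# Predict the value for an input that was not in the training data.
predicted = slope * 5.0 + intercept
print(slope, intercept, predicted)
```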
5. Which of the following is based on predictive modeling?
a) Time series analysis
b) Summarization
c) Sequence discovery
d) Clustering
Answer: a
Explanation: Predictive modeling, which includes classification, regression, and time series analysis, refers to the prediction of data values based on results deduced from different data. Descriptive modeling techniques such as clustering and association rules, on the other hand, explore the properties of the data rather than predicting data values.
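A very simple way to see time series analysis used for prediction is a moving-average forecast, sketched below. This is one naive technique among many, picked here only for brevity; the function name, window size and sales figures are illustrative assumptions.

```python
def moving_average_forecast(series, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    if len(series) < window:
        raise ValueError("series is shorter than the window")
    recent = series[-window:]
    return sum(recent) / window


# Hypothetical monthly sales figures, ordered by time.
monthly_sales = [100, 110, 120, 130, 140]

forecast = moving_average_forecast(monthly_sales)
print(forecast)
```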
6. Summarization is also known as _____
a) Characterization
b) Link analysis
c) Affinity analysis
d) Series analysis
Answer: a
Explanation: Summarization, which broadly refers to deriving a summary from data, is also known as characterization. Link analysis and affinity analysis both refer to the analysis of data to find relationships among data items. Series analysis is another name for time series analysis, in which data is analyzed with time as an important factor.
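Summarization can be as simple as reducing an attribute to a few descriptive statistics. The sketch below uses Python's standard `statistics` module; the helper name `summarize` and the sample ages are assumptions for illustration.

```python
import statistics


def summarize(values):
    """Characterize a numeric attribute with a few descriptive statistics."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


# Hypothetical ages taken from a customer table.
ages = [23, 35, 31, 44, 27]

age_summary = summarize(ages)
print(age_summary)
```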
7. The analysis of the data points that deviate from the general expected behavior of data in the data set is called _____
a) Cluster analysis
b) Relevance analysis
c) Anomaly analysis
d) Regression analysis
Answer: c
Explanation: The data points that deviate from the general expected behavior of data in the data set are called outliers and their analysis is referred to as outlier analysis or anomaly analysis. Outliers are usually not desired in the data except in certain special tasks where they are the points of interest.
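One common way to flag such points is a z-score test, sketched below: a value is treated as an outlier when it lies more than a chosen number of standard deviations from the mean. The threshold of 2.0, the function name and the readings are assumptions for this example; other criteria (for instance, ones based on the interquartile range) are equally valid.

```python
import statistics


def find_outliers(values, threshold=2.0):
    """Return the values whose z-score magnitude exceeds `threshold`."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)  # population standard deviation
    if spread == 0:
        return []  # all values are identical, so nothing deviates
    return [v for v in values if abs(v - mean) / spread > threshold]


# Hypothetical sensor readings with one value far from the rest.
readings = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50]

outliers = find_outliers(readings)
print(outliers)
```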
8. Which of the following is the basic principle of object grouping?
a) Maximizing the intra-class similarity
b) Minimizing the intra-class similarity
c) Maximizing the inter-class similarity
d) Taxonomy formation
Answer: a
Explanation: In clustering, the objects are clustered or grouped as per the principle of maximization of intra-class similarity and minimization of inter-class similarity. This ensures that similar objects map to the same group and dissimilar objects map to other groups.
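The principle can be checked numerically by treating distance as the inverse of similarity: a good grouping has small distances within each cluster and large distances between clusters. The sketch below compares two hand-made groupings of the same six points; the points, the groupings and the helper names are illustrative assumptions, and `math.dist` requires Python 3.8 or later.

```python
from itertools import combinations, product
from math import dist


def mean_intra_distance(clusters):
    """Average distance between pairs of points in the same cluster."""
    pairs = [dist(p, q) for c in clusters for p, q in combinations(c, 2)]
    return sum(pairs) / len(pairs)


def mean_inter_distance(clusters):
    """Average distance between pairs of points in different clusters."""
    pairs = [
        dist(p, q)
        for a, b in combinations(clusters, 2)
        for p, q in product(a, b)
    ]
    return sum(pairs) / len(pairs)


# Two groupings of the same points: one sensible, one deliberately mixed up.
good = [[(1, 1), (1, 2), (2, 1)], [(8, 8), (8, 9), (9, 8)]]
bad = [[(1, 1), (8, 8), (2, 1)], [(1, 2), (8, 9), (9, 8)]]

for name, grouping in (("good", good), ("bad", bad)):
    print(name, mean_intra_distance(grouping), mean_inter_distance(grouping))
```

The sensible grouping shows a much smaller intra-cluster distance (higher intra-class similarity) and a larger inter-cluster distance (lower inter-class similarity) than the mixed-up one.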
9. Which of the following is not an issue in data mining?
a) Overfitting
b) Outliers
c) High dimensionality
d) Shortage of data
Answer: d
Explanation: Data mining, as a field, evolved because of the availability of large amounts of data. The growing involvement of the internet in our everyday lives generates a huge amount of data, which data mining puts to analytical use, so a shortage of data is not a characteristic issue. On the other hand, overfitting, outliers, and high dimensionality are some of the key problems faced when implementing data mining.
10. Sometimes the data under analysis has too many attributes, some of which may not be useful as per the task being performed. This problem is referred to as _____
a) Overfitting
b) Dimensionality curse
c) Outliers in data
d) Missing data
Answer: b
Explanation: When there are too many attributes in a data set, some of these attributes may be redundant or not useful for the task at hand. Moreover, they increase complexity and so cause problems when implementing the data mining task. This issue of high dimensionality is known as the dimensionality curse (or curse of dimensionality).
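One simple way to reduce the number of attributes is to discard those that carry no information, such as columns whose value never changes. The sketch below does this with a variance filter; it is only one basic approach, and the function name, threshold and sample rows are assumptions for illustration.

```python
import statistics


def drop_low_variance(rows, threshold=0.0):
    """Remove attributes whose population variance is not above `threshold`.

    Assumes every row is a dict with the same numeric attributes.
    """
    names = list(rows[0].keys())
    keep = [
        name
        for name in names
        if statistics.pvariance([row[name] for row in rows]) > threshold
    ]
    return [{name: row[name] for name in keep} for row in rows]


# Hypothetical data set: "planet_id" is constant, so it cannot help any task.
rows = [
    {"height": 1.70, "weight": 65.0, "planet_id": 3},
    {"height": 1.82, "weight": 80.0, "planet_id": 3},
    {"height": 1.65, "weight": 58.0, "planet_id": 3},
]

reduced = drop_low_variance(rows)
print(list(reduced[0].keys()))
```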