Here are Data Mining MCQs (Chapterwise).
1. What is data mining?
a) Deleting unnecessary data
b) Sorting data alphabetically
c) Storing data securely
d) Extracting useful patterns or information from large datasets
Explanation: Data mining is the process of discovering patterns, correlations, or trends by analyzing large datasets. It involves various techniques from statistics, machine learning, and database systems to uncover valuable insights from data. These insights can be used for decision-making, prediction, and optimization in various fields such as business, science, healthcare, and finance.
2. Which of the following is not a basic data mining task?
a) Spooling
b) Prediction
c) Classification
d) Clustering
Explanation: Spooling facilitates data exchange between slow peripheral devices and computer applications, so it is not a data mining task. Classification, which maps data to predefined groups, is a basic data mining task. Similarly, prediction, which estimates data values based on past data, and clustering, which maps data to groups that are not predefined, are also basic data mining tasks.
3. Which of the following is not an issue in data mining?
a) High dimensionality
b) Shortage of data
c) Overfitting
d) Outliers
Explanation: Data mining evolved as a field because of the availability of large amounts of data, so a shortage of data is not one of its issues. The growing involvement of the internet in everyday life generates a huge amount of data, which data mining puts to analytical use. Overfitting, outliers, and high dimensionality, on the other hand, are some of the key problems faced when implementing data mining.
4. Which of the following is not a motivating factor for the development of data mining tools?
a) Data tombs
b) Data rich but information poor situation
c) Data cleaning
d) Dependency on domain experts in expert systems
Explanation: The presence of a huge amount of data combined with the inability to extract information from it, described as a data rich but information poor situation, led to the need for data mining tools. Data that is stored in databases but seldom used forms data tombs. Expert systems built to assist data analysis depend on domain experts for their knowledge, so they are not completely error-free. All these situations motivated the development of data mining tools. Data cleaning, on the other hand, is a preprocessing step performed before data mining, not a motivation for it.
5. Which of the following is a subset of a data warehouse focused on a specific functional area?
a) Data mart
b) Association rules
c) Flat files
d) Database
Explanation: A data mart is a subset of a data warehouse and is oriented to a specific functional area or subject. A data warehouse, on the other hand, spans different functional areas and may have a more complex design than a data mart.
6. Which of the following statements about a knowledge and data discovery management system (KDDMS) is false?
a) It will provide concurrency features
b) It will provide recovery features
c) It will include data mining tools and data management tools
d) It will include data mining tools but not data management tools
Explanation: Knowledge and data discovery management systems (KDDMS) are the upcoming data mining systems that will include both data mining tools and data management tools, provide concurrency and recovery features, and ensure data consistency. The statement that they will exclude data management tools is therefore false.
7. Which field of data mining helps in removing uncertainty, noise, etc.?
a) Data preprocessing
b) Data Mining
c) Outlier detection and removal
d) Uncertainty Reasoning
Explanation: Data mining refers to the process of extracting hidden patterns from data warehouse data. Data preprocessing, outlier detection and removal, and uncertainty reasoning are methods that aim at removing uncertainty, noise, or incompleteness from data.
8. Which of the following is not an operation in data warehousing?
a) Sticking
b) Dice
c) Drill down
d) Roll up
Explanation: Sticking is not a data warehousing operation; the name is a deliberate distortion of slicing, which is one. Drill down increases the level of detail by moving to finer granularity. Roll up decreases the level of detail by aggregating to coarser granularity. Dice defines a subcube by performing a selection on two or more dimensions.
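The operations named in this question can be sketched on a tiny fact table. This is only an illustration under stated assumptions: the `SALES` records, the dimension names (`city`, `quarter`, `item`), and the helper functions `aggregate`, `slice_cube`, and `dice_cube` are all hypothetical, and dice is taken here to mean a selection on two or more dimensions.

```python
from collections import defaultdict

# Hypothetical fact table: three dimensions (city, quarter, item) and one measure (units).
SALES = [
    {"city": "Delhi", "quarter": "Q1", "item": "phone", "units": 10},
    {"city": "Delhi", "quarter": "Q2", "item": "phone", "units": 15},
    {"city": "Delhi", "quarter": "Q1", "item": "laptop", "units": 5},
    {"city": "Mumbai", "quarter": "Q1", "item": "phone", "units": 8},
    {"city": "Mumbai", "quarter": "Q2", "item": "laptop", "units": 12},
]


def aggregate(records, dims):
    """Sum the measure over every combination of the given dimensions."""
    totals = defaultdict(int)
    for record in records:
        key = tuple(record[d] for d in dims)
        totals[key] += record["units"]
    return dict(totals)


def slice_cube(records, dim, value):
    """Slice: keep only the records with one fixed value on a single dimension."""
    return [r for r in records if r[dim] == value]


def dice_cube(records, criteria):
    """Dice: keep the records whose values fall in the allowed sets on several dimensions."""
    return [
        r for r in records
        if all(r[dim] in allowed for dim, allowed in criteria.items())
    ]


# Roll up: aggregate away quarter and item, leaving totals per city (coarser view).
rolled_up = aggregate(SALES, ["city"])
# Drill down: add the quarter dimension back for a finer view.
drilled_down = aggregate(SALES, ["city", "quarter"])
first_quarter = slice_cube(SALES, "quarter", "Q1")
delhi_phones = dice_cube(SALES, {"city": {"Delhi"}, "item": {"phone"}})

print(rolled_up)
print(drilled_down)
```

Rolling up and drilling down use the same aggregation with fewer or more dimensions, which is why a single `aggregate` helper covers both in this sketch.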
9. Which of the following is not a data mining functionality?
a) Classification
b) Clustering
c) Class Description
d) Object Description
Explanation: There are five data mining functionalities: class/concept description; mining frequent patterns, associations, and correlations; classification and regression; clustering; and outlier analysis. Object description is not one of them.
10. Which of the following refers to the set of features that describe a data object?
a) Attribute vector
b) Instance
c) Sample
d) Data point
Explanation: A data object is described by one or more attributes or features. The set of attributes or features that represent the characteristics of a data point is called an attribute vector or a feature vector.
11. Which of the following is the most effective measure of the center of a symmetric data set?
a) Mode
b) Midrange
c) Mean
d) Median
Explanation: In a symmetric data distribution, the values are spread evenly on both sides of the center. The arithmetic mean is the most commonly used measure of central tendency for such data and represents the center of the data set.
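A small sketch can make the comparison between the four options concrete. The two data sets below are made up for illustration, and `midrange` is a hypothetical helper; the other measures come from Python's standard `statistics` module.

```python
import statistics


def midrange(values):
    """Average of the smallest and largest value."""
    return (min(values) + max(values)) / 2


# Made-up symmetric data: values are mirrored around 6.
symmetric = [2, 4, 4, 6, 6, 6, 8, 8, 10]
# Same data with the largest value replaced by an extreme one.
skewed = [2, 4, 4, 6, 6, 6, 8, 8, 100]

for name, data in (("symmetric", symmetric), ("skewed", skewed)):
    print(
        name,
        "mean:", statistics.mean(data),
        "median:", statistics.median(data),
        "mode:", statistics.mode(data),
        "midrange:", midrange(data),
    )
```

On the symmetric set all four measures agree at 6. On the skewed set the mean rises to 16 and the midrange to 51, while the median and mode stay at 6, which is why the mean is recommended specifically for symmetric data.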
12. Which of the following is not true about scatter plots?
a) It is used in the case of univariate distribution
b) It is used to identify relationship between attributes
c) It is used to identify clusters
d) It is used to identify outliers
Explanation: Scatter plots are used with bivariate distributions, not univariate ones. They help identify relationships between two attributes, and they can also reveal clusters and outliers in a data set.
13. Which of the following is not a proximity measure?
a) Dissimilarity measures
b) Similarity measures
c) Probability measures
d) Distance measures
Explanation: Proximity measures evaluate the similarity or dissimilarity between two objects. Similarity measures, dissimilarity measures, and distance measures are the commonly used proximity measures; probability measures are not.
14. Which of the following is true about the supremum distance between the given objects?
| Object | Part 1 | Part 2 | Part 3 |
|---|---|---|---|
| object 1 | 3 | 4 | 8 |
| object 2 | 2 | 7 | 3 |
a) The supremum distance between the objects is 5
b) The supremum distance between the objects is 4
c) The supremum distance between the objects is 6
d) The supremum distance between the objects is 2
Explanation: The supremum distance is the maximum absolute difference between the corresponding attribute values of two objects.
Supremum distance = max(|x1-y1|, ..., |xn-yn|)
S = max(|3-2|, |4-7|, |8-3|)
S = max(1, 3, 5) = 5
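The same calculation can be checked with a few lines of Python. This is a minimal sketch; the function name `supremum_distance` is chosen here for illustration, and it assumes both objects have the same number of numeric attributes.

```python
def supremum_distance(x, y):
    """Return the largest absolute difference between corresponding attributes."""
    if len(x) != len(y):
        raise ValueError("objects must have the same number of attributes")
    return max(abs(a - b) for a, b in zip(x, y))


object_1 = [3, 4, 8]
object_2 = [2, 7, 3]

# Per-attribute differences are 1, 3, and 5, so the result is 5.
print(supremum_distance(object_1, object_2))
```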
15. Which of the following is not true about data reduction?
a) It involves dimensionality reduction
b) It involves numerosity reduction
c) Reduced data strives to give the same analytical results as the original data
d) Reduced data strives to give less accurate analytical results than the original data
Explanation: Data reduction is a part of data preprocessing. It aims to reduce the size of the data while giving the same results on analysis of the reduced data as on the original data. It involves dimensionality reduction and numerosity reduction.
16. What do data auditing tools not do?
a) Detect data that violate certain rules
b) Discover rules and relationships in the data
c) Use parsing to find rules in the data
d) Use statistical analysis to find rules in the data
Explanation: Data auditing tools discover rules and relationships in the data and find the data that violate these rules. They use statistical techniques to find correlations in the data. Parsing is a technique of data scrubbing tools, not of data auditing tools.
17. Which of the following is true about bottom up discretization?
a) Some of the values are treated as potential split points
b) All the values are treated as potential split points
c) Split points are not considered
d) Only one value is treated as a potential split point
Explanation: Bottom up discretization, also known as merging, starts by treating all the values as potential split points. It then removes some of them by merging neighboring values into intervals, and applies the same process recursively to the resulting intervals.
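The merging idea can be sketched as follows. This is only an illustration: the merge criterion used here (always join the two neighboring intervals with the smallest gap) and the function name `bottom_up_discretize` are assumptions made for the example, and practical methods choose which neighbors to merge with a statistical test instead.

```python
def bottom_up_discretize(values, n_intervals):
    """Merge neighboring intervals until at most n_intervals remain."""
    if n_intervals < 1:
        raise ValueError("n_intervals must be at least 1")
    # Start with every distinct value as its own interval, so every boundary
    # between neighboring values is a potential split point.
    intervals = [(v, v) for v in sorted(set(values))]
    while len(intervals) > n_intervals:
        # Gap between the end of each interval and the start of the next one.
        gaps = [
            intervals[i + 1][0] - intervals[i][1]
            for i in range(len(intervals) - 1)
        ]
        # Remove the split point with the smallest gap by merging its neighbors.
        i = gaps.index(min(gaps))
        intervals[i:i + 2] = [(intervals[i][0], intervals[i + 1][1])]
    return intervals


ages = [21, 22, 23, 35, 36, 37, 60, 62]
print(bottom_up_discretize(ages, 3))  # [(21, 23), (35, 37), (60, 62)]
```

Each pass removes exactly one split point, so the loop mirrors the repeated merging described in the explanation.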
Chapterwise Multiple Choice Questions on Data Mining
1. MCQ on Data Mining Basics
The section contains multiple choice questions and answers on basic data mining tasks, KDD, issues, major issues in data mining, types of data that can be mined, and types of patterns that can be mined.
2. Data Exploration and Analysis
The section covers questions and answers on data objects and attribute types, basic statistical descriptions of data, data visualization, and measuring data similarity and dissimilarity.
3. Data Preprocessing
The section contains MCQs on data preprocessing, data cleaning, data integration, data reduction, data transformation, and data discretization.
Wish you the best in your endeavor to learn and master Data Mining!