Data Mining MCQ (Multiple Choice Questions)

Here are Data Mining MCQs (Chapterwise).

1. What is data mining?
a) Deleting unnecessary data
b) Sorting data alphabetically
c) Storing data securely
d) Extracting useful patterns or information from large datasets

Answer: d
Explanation: Data mining is the process of discovering patterns, correlations, or trends by analyzing large datasets. It involves various techniques from statistics, machine learning, and database systems to uncover valuable insights from data. These insights can be used for decision-making, prediction, and optimization in various fields such as business, science, healthcare, and finance.

2. Which of the following is not a basic data mining task?
a) Spooling
b) Prediction
c) Classification
d) Clustering

Answer: a
Explanation: Spooling facilitates data exchange between slow peripheral devices and computer applications, so it is not a data mining task. Classification, which maps data to predefined groups, is a basic data mining task. Similarly, prediction, which estimates future data values based on past data, and clustering, which maps data to groups that are not predefined, are also basic data mining tasks.

3. Which of the following is not an issue in data mining?
a) High dimensionality
b) Shortage of data
c) Overfitting
d) Outliers

Answer: b
Explanation: Data mining, as a field, evolved because of the availability of large amounts of data. With the growing involvement of the internet in our everyday lives, a huge amount of data is generated, and data mining puts it to analytical use. Overfitting, outliers, and high dimensionality, on the other hand, are some of the key problems faced when implementing data mining.

4. Which of the following is not a motivating factor for the development of data mining tools?
a) Data tombs
b) Data rich but information poor situation
c) Data cleaning
d) Dependency on domain experts in expert systems

Answer: c
Explanation: The presence of a huge amount of data combined with the inability to extract information from it, also described as a data rich but information poor situation, led to the need for data mining tools. Data stored in databases that is rarely used forms data tombs. Expert systems built to assist data analysis depend on domain experts for their knowledge, so they were not completely error-free either. All these situations motivated the development of data mining tools. Data cleaning, on the other hand, is a step in the data mining process.

5. Which of the following is a subset of a data warehouse focused on a specific functional area?
a) Data mart
b) Association rules
c) Flat files
d) Database

Answer: a
Explanation: A data mart is a subset of a data warehouse and is oriented to a specific functional area or subject. A data warehouse, on the other hand, covers several functional areas and may have a more complex design than a data mart.

6. Which of the following statements about a knowledge and data discovery management system (KDDMS) is false?
a) It will provide concurrency features
b) It will provide recovery features
c) It will include data mining tools and data management tools
d) It will include data mining tools but not data management tools

Answer: d
Explanation: Knowledge and data discovery management systems (KDDMS) are the upcoming data mining systems that will include data mining tools, data management tools, concurrency features, recovery features, and will also ensure data consistency.

7. Which of the following does not aim at removing uncertainty, noise, etc.?
a) Data preprocessing
b) Data Mining
c) Outlier detection and removal
d) Uncertainty Reasoning

Answer: b
Explanation: Data mining refers to the extraction of hidden patterns from data, such as the data held in a data warehouse. Data preprocessing, outlier detection and removal, and uncertainty reasoning, by contrast, are methods that aim at removing uncertainty, noise, or incompleteness in the data.

8. Which of the following is not an operation in data warehousing?
a) Sticking
b) Dice
c) Drill down
d) Roll up

Answer: a
Explanation: Sticking is not an operation; it is a deliberate misspelling of slicing, included to confuse. Drill down moves from summarized data to more detailed data, while roll up aggregates data to a more summarized level. Dice selects a subcube by applying selections on two or more dimensions.
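The operations named in this question can be sketched on a tiny in-memory data set. The sales records, the tuple layout, and the helper name `aggregate` below are hypothetical and chosen only for illustration; a real data warehouse would run these operations on a multidimensional cube rather than a Python list.

```python
from collections import defaultdict

# Hypothetical sales records: (year, quarter, city, item, units).
SALES = [
    (2023, "Q1", "Pune", "phone", 10),
    (2023, "Q1", "Delhi", "phone", 7),
    (2023, "Q2", "Pune", "laptop", 4),
    (2024, "Q1", "Pune", "phone", 12),
    (2024, "Q2", "Delhi", "laptop", 5),
]


def aggregate(rows, key):
    """Sum the units column, grouped by key(row)."""
    totals = defaultdict(int)
    for row in rows:
        totals[key(row)] += row[4]
    return dict(totals)


# Roll up: summarize at a coarser level (year only).
by_year = aggregate(SALES, lambda r: r[0])

# Drill down: move to a more detailed level (year and quarter).
by_year_quarter = aggregate(SALES, lambda r: (r[0], r[1]))

# Slice: select on a single dimension (city).
pune_slice = [r for r in SALES if r[2] == "Pune"]

# Dice: select on two or more dimensions (city and item).
pune_phone_dice = [r for r in SALES if r[2] == "Pune" and r[3] == "phone"]

print(by_year)
print(by_year_quarter)
print(len(pune_slice), len(pune_phone_dice))
```

Roll up and drill down differ only in the grouping key, which is why a single helper covers both here.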

9. Pick the wrong data mining functionality among the given data mining functionalities.
a) Classification
b) Clustering
c) Class Description
d) Object Description

Answer: d
Explanation: There are five data mining functionalities: class/concept description; mining frequent patterns, associations, and correlations; classification and regression; clustering; and outlier analysis. Object description is not one of them.

10. Which of the following refers to the set of features that describe a data object?
a) Attribute vector
b) Instance
c) Sample
d) Data point

Answer: a
Explanation: A data object is described by one or more attributes or features. The set of attributes or features that represent the characteristics of a data point is called an attribute vector or a feature vector.

11. Which of the following is the most effective measure of the center of a symmetric data set?
a) Mode
b) Midrange
c) Mean
d) Median

Answer: c
Explanation: In a symmetric data distribution, the values are spread evenly on either side of the center. The arithmetic mean is the most commonly used measure of central tendency for symmetric data and represents the center of the data set.
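As a rough illustration of the four options, the sketch below computes each measure with Python's standard `statistics` module. The two sample lists and the `midrange` helper are made up for this example. On the symmetric list all four measures agree, while on the skewed list a single extreme value pulls the mean and midrange away from the median and mode.

```python
import statistics


def midrange(values):
    """Average of the smallest and largest values."""
    return (min(values) + max(values)) / 2


# Hypothetical data: one symmetric sample and one with an extreme value.
symmetric = [1, 3, 5, 5, 5, 7, 9]
skewed = [1, 3, 5, 5, 5, 7, 90]

for name, data in (("symmetric", symmetric), ("skewed", skewed)):
    print(
        name,
        "mean:", statistics.mean(data),
        "median:", statistics.median(data),
        "mode:", statistics.mode(data),
        "midrange:", midrange(data),
    )
```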

12. Which of the following is not true about scatter plots?
a) It is used in the case of univariate distribution
b) It is used to identify relationship between attributes
c) It is used to identify clusters
d) It is used to identify outliers

Answer: a
Explanation: Scatter plots are used for bivariate data, where each point represents a pair of values for two attributes. They are used to identify relationships between the attributes, and also to identify clusters and outliers in a data set.

13. Which of the following is not a proximity measure?
a) Dissimilarity measures
b) Similarity measures
c) Probability measures
d) Distance measures

Answer: c
Explanation: Proximity measures are used to evaluate the similarity or dissimilarity between two objects. Similarity measures, dissimilarity measures, and distance measures are the commonly used proximity measures.

14. Which of the following is true about the supremum distance between the given objects?

Object     Part 1   Part 2   Part 3
Object 1   3        4        8
Object 2   2        7        3

a) The supremum distance between the objects is 5
b) The supremum distance between the objects is 4
c) The supremum distance between the objects is 6
d) The supremum distance between the objects is 2

Answer: a
Explanation: The supremum distance is the maximum difference between the corresponding attribute values of two objects.
Supremum distance = max(|x1 - y1|, ..., |xn - yn|)
S = max(|3 - 2|, |4 - 7|, |8 - 3|)
S = max(1, 3, 5) = 5
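The same calculation can be checked with a few lines of Python. This is a minimal sketch; the function name `supremum_distance` is chosen here for illustration and is not taken from any library.

```python
def supremum_distance(x, y):
    """Largest absolute difference between corresponding attribute values."""
    if len(x) != len(y):
        raise ValueError("objects must have the same number of attributes")
    return max(abs(a - b) for a, b in zip(x, y))


# Attribute values of the two objects from the question.
object_1 = (3, 4, 8)
object_2 = (2, 7, 3)

print(supremum_distance(object_1, object_2))  # prints 5
```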

15. Which of the following is not true about data reduction?
a) It involves dimensionality reduction
b) It involves numerosity reduction
c) Reduced data strives to give the same analytical results as the original data
d) Reduced data strives to give less accurate analytical results than the original data

Answer: d
Explanation: Data reduction is a part of data preprocessing. It aims to reduce the size of the data while giving the same, or nearly the same, results on analysis as the original data. It involves dimensionality reduction and numerosity reduction.
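One simple form of numerosity reduction is to replace raw values with an equal-width histogram. The sketch below is illustrative only: the function name, the sample values, and the choice of three buckets are assumptions made for this example. It stores three bucket counts in place of fifteen values and then estimates the mean from the bucket midpoints, to show how close the reduced data stays to the original.

```python
import statistics


def equal_width_histogram(values, num_buckets):
    """Return (low, high, count) triples for equal-width buckets."""
    low, high = min(values), max(values)
    width = (high - low) / num_buckets
    if width == 0:
        # All values are identical, so a single bucket describes them.
        return [(low, high, len(values))]
    counts = [0] * num_buckets
    for v in values:
        # The maximum value would land one past the end, so clamp it.
        index = min(int((v - low) / width), num_buckets - 1)
        counts[index] += 1
    return [(low + i * width, low + (i + 1) * width, c)
            for i, c in enumerate(counts)]


# Hypothetical attribute values.
values = [0, 5, 5, 8, 10, 12, 14, 15, 18, 20, 21, 25, 25, 28, 30]
buckets = equal_width_histogram(values, 3)

# Estimate the mean from the reduced representation (bucket midpoints).
estimated_mean = sum((lo + hi) / 2 * c for lo, hi, c in buckets) / len(values)

print(buckets)
print(statistics.mean(values), estimated_mean)
```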

16. What do data auditing tools not do?
a) Detect data that violate certain rules
b) Discover rules and relationships in the data
c) Use parsing to find rules in the data
d) Use statistical analysis to find rules in the data

Answer: c
Explanation: Data auditing tools discover rules and relationships in the data and find the data that violate these rules. They make use of statistical techniques to find correlations in the data. Parsing, by contrast, is a technique relied on by data scrubbing tools.
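The idea of discovering a rule statistically and then flagging violations can be sketched as follows. This is a deliberately simplified illustration, not how any particular auditing tool works: the rule (values should lie within two standard deviations of the mean), the function names, and the sample ages are all assumptions made for this example.

```python
import statistics


def discover_range_rule(values, num_std=2):
    """Derive a (low, high) rule from the data: mean +/- num_std * stdev."""
    mean = statistics.mean(values)
    spread = statistics.stdev(values)
    return mean - num_std * spread, mean + num_std * spread


def find_violations(values, rule):
    """Return the values that fall outside the discovered range."""
    low, high = rule
    return [v for v in values if v < low or v > high]


# Hypothetical age column with one likely data-entry error.
ages = [23, 25, 31, 28, 27, 30, 26, 29, 24, 230]

rule = discover_range_rule(ages)
violations = find_violations(ages, rule)
print(violations)  # prints [230]
```

Because the extreme value also inflates the mean and standard deviation, a more robust variant might derive the rule from the median and interquartile range instead.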

17. Which of the following is true about bottom up discretization?
a) Some of the values are treated as potential split points
b) All the values are treated as potential split points
c) Split points are not considered
d) Only one value is treated as a potential split point

Answer: b
Explanation: In bottom up discretization, also known as merging, all the values are considered as potential split points, which are then merged to form intervals recursively.
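A minimal sketch of the merging idea is shown below. Every distinct value starts as its own interval, so every value is a potential split point, and neighbouring intervals are then merged until the requested number of intervals remains. The merge criterion used here (smallest gap between neighbouring intervals) is an assumption chosen for simplicity; methods such as ChiMerge instead use a chi-square test on class labels. The function name and sample values are hypothetical.

```python
def bottom_up_discretize(values, num_intervals):
    """Merge adjacent intervals, closest pair first, until num_intervals remain."""
    if num_intervals < 1:
        raise ValueError("num_intervals must be at least 1")
    # Start with each distinct value as its own interval [v, v].
    intervals = [[v, v] for v in sorted(set(values))]
    while len(intervals) > num_intervals:
        # Gap between the end of each interval and the start of the next one.
        gaps = [intervals[i + 1][0] - intervals[i][1]
                for i in range(len(intervals) - 1)]
        i = gaps.index(min(gaps))
        # Merge interval i with its right-hand neighbour.
        intervals[i] = [intervals[i][0], intervals[i + 1][1]]
        del intervals[i + 1]
    return [tuple(interval) for interval in intervals]


# Hypothetical attribute values.
values = [1, 2, 3, 10, 11, 12, 30, 31]
print(bottom_up_discretize(values, 3))  # prints [(1, 3), (10, 12), (30, 31)]
```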


Chapterwise Multiple Choice Questions on Data Mining

Our MCQs cover all topics of the Data Mining subject. They will help you prepare for exams, contests, online tests, quizzes, viva-voce, interviews, and certifications. You can practice these MCQs chapter by chapter starting from the 1st chapter, or you can jump to any chapter of your choice.
  1. Data Mining Basics
  2. Data Exploration and Analysis
  3. Data Preprocessing

1. MCQ on Data Mining Basics

The section contains multiple choice questions and answers on basic data mining tasks, KDD, issues, major issues in data mining, types of data that can be mined, and types of patterns that can be mined.

  • Basic Data Mining Tasks, KDD, Issues
  • Basic Data Mining Tasks, KDD, Issues – Set 2
  • Major Issues in Data Mining
  • What Kind of Data can be Mined
  • What Kind of Patterns can be Mined
  • What Kind of Patterns can be Mined – Set 2

3. Data Preprocessing

The section contains MCQs on data preprocessing, data cleaning, data integration, data reduction, data transformation, and data discretization.

  • Data Preprocessing
  • Data Cleaning and Data Integration
  • Data Cleaning and Data Integration – Set 2
  • Data Cleaning and Data Integration – Set 3
  • Data Reduction
  • Data Reduction – Set 2
  • Data Transformation and Data Discretization

If you would like to learn "Data Mining" thoroughly, you should attempt to work on the complete set of 1000+ MCQs - multiple choice questions and answers mentioned above. It will immensely help anyone trying to crack an exam or an interview.

Wish you the best in your endeavor to learn and master Data Mining!

    Manish Bhojasia - Founder & CTO at Sanfoundry
    Manish Bhojasia, a technology veteran with 20+ years @ Cisco & Wipro, is Founder and CTO at Sanfoundry. He lives in Bangalore, and focuses on development of Linux Kernel, SAN Technologies, Advanced C, Data Structures & Algorithms. Stay connected with him at LinkedIn.
