Here are 1000 Data Science MCQs (Chapterwise).
1. What is data science primarily concerned with?
a) Analyzing and interpreting data
b) Collecting data only
c) Storing data in databases
d) None of the mentioned above
Explanation: Data science focuses on analyzing and interpreting data to extract insights and knowledge from it.
2. Which of the following is one of the key data science skills?
a) Data Visualization
b) Machine Learning
c) Statistics
d) All of the mentioned
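Explanation: Data visualization, machine learning, and statistics are all key data science skills. Data visualization, for instance, is the presentation of data in a pictorial or graphical format.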
3. Which of the following is NOT a type of machine learning?
a) Computational learning
b) Reinforcement learning
c) Unsupervised learning
d) Supervised learning
Explanation: Computational learning is not a recognized category of machine learning; the main types are supervised, unsupervised, and reinforcement learning.
4. Which of the following characteristics of big data is of relatively greater concern to data science?
a) Volume
b) Velocity
c) Variety
d) None of the mentioned
Explanation: Big data enables organizations to store, manage, and manipulate vast amounts of disparate data at the right speed and at the right time.
5. Which of the following is a good way of performing experiments in data science?
a) Generalize to the problem
b) Have Replication
c) Measure variability
d) All of the mentioned
Explanation: Experiments on causal relationships investigate the effect of one or more variables on one or more outcome variables.
6. What does the term “feature engineering” refer to in data science?
a) The process of transforming raw data into meaningful features
b) The process of gathering more data
c) The process of applying machine learning algorithms
d) The process of splitting data into training and testing sets
Explanation: Feature engineering involves creating new features or modifying existing ones to improve model performance and accuracy.
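As a minimal sketch of this idea, the snippet below derives two new features from one raw record. The field names (signup_date, last_order_date, total_spent, orders) and the helper engineer_features are hypothetical and chosen only for illustration.

```python
from datetime import date


def engineer_features(record):
    """Derive model-friendly features from one raw customer record (hypothetical schema)."""
    tenure_days = (record["last_order_date"] - record["signup_date"]).days
    # Guard against division by zero for customers with no orders.
    avg_order_value = record["total_spent"] / record["orders"] if record["orders"] else 0.0
    return {"tenure_days": tenure_days, "avg_order_value": avg_order_value}


raw = {
    "signup_date": date(2023, 1, 1),
    "last_order_date": date(2023, 1, 31),
    "total_spent": 120.0,
    "orders": 4,
}
features = engineer_features(raw)
print(features)  # {'tenure_days': 30, 'avg_order_value': 30.0}
```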
7. Which of the following is the most important language for Data Science?
a) R
b) Java
c) Ruby
d) None of the mentioned
Explanation: R is free software for statistical computing and analysis.
8. What is the role of processing code in the research pipeline?
a) Transforms the analytical results into figures and tables
b) Transforms the measured data into analytic data
c) Transforms the analytic data into measured data
d) All of the mentioned
Explanation: The data science workflow is a non-linear, iterative process.
9. Which of the following is a common method for data preprocessing?
a) Data normalization
b) Data storage
c) Data aggregation
d) Data visualization
Explanation: Data normalization is a preprocessing technique used to scale the features of a dataset to a uniform range, improving the performance of machine learning models.
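One common form of normalization is min-max scaling, which maps each value to the range [0, 1]. The sketch below assumes a plain list of numbers; the helper name min_max_normalize is hypothetical, and other schemes (for example z-score standardization) are equally valid choices.

```python
def min_max_normalize(values):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature has no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


ages = [20, 30, 40, 60]
scaled = min_max_normalize(ages)
print(scaled)  # [0.0, 0.25, 0.5, 1.0]
```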
10. Which of the following is the most important thing in data science?
a) data
b) question
c) answer
d) none of the mentioned
Explanation: The question comes first; the data is the second most important thing.
11. In supervised learning, which of the following is required?
a) Labeled data
b) Only numerical data
c) Unlabeled data
d) Only categorical data
Explanation: Supervised learning requires labeled data, where the algorithm learns to map inputs to known outputs.
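To make the role of labels concrete, here is a minimal sketch of a 1-nearest-neighbour classifier on a single numeric input. The data and the helper nearest_neighbor_predict are hypothetical; the point is only that every training example pairs an input with a known output.

```python
def nearest_neighbor_predict(labeled, x):
    """Predict the label of x from the closest labeled example (1-NN)."""
    _, closest_label = min(labeled, key=lambda pair: abs(pair[0] - x))
    return closest_label


# Each training example pairs an input (hours studied) with a known label.
labeled = [(1.0, "fail"), (2.0, "fail"), (6.0, "pass"), (8.0, "pass")]
print(nearest_neighbor_predict(labeled, 7.0))  # pass
```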
12. What is the purpose of using the ‘train-test split’ method?
a) To evaluate the performance of a machine learning model
b) To clean the data
c) To visualize the data
d) To reduce the dimensionality of the data
Explanation: The train-test split method is used to divide a dataset into a training set and a testing set to evaluate the performance of a machine learning model on unseen data.
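The split can be sketched with the standard library alone. The helper below is a hypothetical stand-in written for illustration (libraries such as scikit-learn ship their own version); it shuffles a copy of the data and holds out a fraction for testing.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


data = list(range(20))
train, test = train_test_split(data, test_fraction=0.25, seed=42)
print(len(train), len(test))  # 15 5
```

Fixing the seed makes the split reproducible, so a model evaluated on the test list sees the same unseen rows on every run.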
13. What technique is commonly used in data science for making predictions?
a) Data cleaning
b) Data Storage
c) Machine Learning
d) Data Encoding
Explanation: Machine learning techniques are used to make predictions based on historical data.
14. Which of the following tools is widely used for data visualization in data science?
a) Tableau
b) Excel
c) SQL
d) All of the mentioned
Explanation: SQL is used for data querying, Excel for basic analysis, and Tableau is a specialized tool for data visualization. All are used in data science.
15. What type of data is considered unstructured?
a) Data in relational databases
b) Data in spreadsheets
c) Data in CSV files
d) Text documents and images
Explanation: Unstructured data includes text documents, images, and other formats that do not fit neatly into rows and columns.
16. Which of the following statements about big data is true?
a) Big data refers to extremely large datasets that may be analyzed computationally.
b) Big data is always structured data.
c) Big data can only be processed in real-time.
d) Big data is irrelevant in data science.
Explanation: Big data refers to datasets that are so large and complex that traditional data processing applications cannot adequately deal with them.
17. Which of the following is performed by a data scientist?
a) Challenge results
b) Create reproducible code
c) Define the question
d) All of the mentioned
Explanation: A data scientist is a job title for an employee or business intelligence (BI) consultant who excels at analyzing data, particularly large amounts of data.
18. Which of the following is a characteristic of processed data?
a) Data is not ready for analysis
b) All steps should be noted
c) Hard to use for data analysis
d) None of the mentioned
Explanation: Processing includes merging, summarizing and subsetting data.
19. Which of the following approaches should be used to ask a data analysis question?
a) Find out the answer from the dataset without asking a question
b) Find out the question which is to be answered
c) Find only one solution for a particular problem
d) None of the mentioned
Explanation: Data analysis has multiple facets and approaches.
20. Which of the following is not a step in data analysis?
a) Obtain the data
b) Clean the data
c) EDA
d) None of the mentioned
Explanation: EDA stands for Exploratory Data Analysis.
21. Which of the following techniques comes under practical machine learning?
a) Bagging
b) Boosting
c) Forecasting
d) None of the mentioned
Explanation: Boosting is an approach to machine learning based on the idea of creating a highly accurate predictor by combining many relatively weak and inaccurate ones.
22. Which of the following uses data on some object to predict values for another object?
a) Predictive
b) Exploratory
c) Inferential
d) None of the mentioned
Explanation: A prediction is a forecast, but not only about the weather.
23. Which of the following steps is performed by a data scientist after acquiring the data?
a) Data Integration
b) Data Replication
c) Data Cleansing
d) All of the mentioned
Explanation: Data cleansing, data cleaning, or data scrubbing is the process of detecting and correcting (or removing) corrupt or inaccurate records from a record set, table, or database.
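A minimal sketch of detecting and removing bad records might look like the following. The record layout (name, age), the plausible age range of 0 to 120, and the helper clean_records are assumptions made for illustration.

```python
def clean_records(records):
    """Drop records with a missing or implausible 'age' and remove exact duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        age = rec.get("age")
        # Detect inaccurate values: age must be an integer in a plausible range.
        if not isinstance(age, int) or not (0 <= age <= 120):
            continue
        key = (rec.get("name"), age)
        if key in seen:  # detect duplicate records
            continue
        seen.add(key)
        cleaned.append({"name": rec.get("name"), "age": age})
    return cleaned


raw = [
    {"name": "Ana", "age": 34},
    {"name": "Ana", "age": 34},    # duplicate
    {"name": "Ben", "age": -5},    # impossible value
    {"name": "Cho", "age": None},  # missing value
    {"name": "Dev", "age": 51},
]
print(clean_records(raw))
```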
24. Which of the following is commonly referred to as ‘data fishing’?
a) Data dredging
b) Data bagging
c) Data merging
d) Data booting
Explanation: Data dredging is sometimes referred to as “data fishing”.
25. Which of the following is a characteristic of raw data?
a) Data is ready for analysis
b) Original version of data
c) Easy to use for data analysis
d) None of the mentioned
Explanation: Raw data is data that has not been processed for use.
26. Which of the following inputs can be accepted by a DataFrame?
a) DataFrame
b) Series
c) Structured ndarray
d) All of the mentioned
Explanation: DataFrame is a 2-dimensional labeled data structure with columns of potentially different types.
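The three input kinds listed in the options can be tried directly. This sketch assumes pandas and NumPy are installed; the column names and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# From a Series: the Series name becomes the column label.
s = pd.Series([1.0, 2.0, 3.0], name="x")
df_from_series = pd.DataFrame(s)

# From a structured ndarray: field names become column labels.
arr = np.array([(1, 2.5), (2, 3.5)], dtype=[("id", "i4"), ("score", "f8")])
df_from_structured = pd.DataFrame(arr)

# From another DataFrame: the result has the same columns and rows.
df_from_df = pd.DataFrame(df_from_structured)

print(list(df_from_series.columns))      # ['x']
print(list(df_from_structured.columns))  # ['id', 'score']
```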
27. Which type of data is generated by a POS terminal in a busy supermarket each day?
a) Processed
b) Source
c) Synchronized
d) All of the mentioned
Explanation: Raw data is sometimes referred to as source data.
28. Which of the following is a trait of tidy data?
a) Each observation in a different row
b) Each variable in one column
c) One table for each kind of variable
d) None of the mentioned
Explanation: The summary could be the sum of the observations, the number of occurrences, their mean value, and so on.
29. Which of the following data mining techniques is used to uncover patterns in data?
a) Data bagging
b) Data Dredging
c) Data merging
d) Data booting
Explanation: Data dredging, also called data snooping, refers to the practice of misusing data mining techniques to show misleading scientific ‘research’.
30. Which of the following makes use of pandas and returns data in a Series or DataFrame?
a) fredapi
b) pandaSDMX
c) OutPy
d) None of the mentioned
Explanation: The fredapi module requires a FRED API key that you can obtain for free on the FRED website.
31. Which of the following is the most common problem with messy data?
a) Variables are stored in both rows and columns
b) Column headers are values
c) A single observational unit is stored in multiple tables
d) All of the mentioned
Explanation: Real datasets can, and often do, violate the three precepts of tidy data in almost every way imaginable.
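The "column headers are values" problem from option b can be illustrated with a small reshaping sketch. The helper melt and the sample sales figures are hypothetical; pandas offers a built-in melt for the same job, but the plain-Python version shows the mechanics.

```python
def melt(rows, id_key, var_name, value_name):
    """Turn one wide row per id into one long row per (id, variable) pair."""
    tidy = []
    for row in rows:
        for key, value in row.items():
            if key == id_key:
                continue
            tidy.append({id_key: row[id_key], var_name: key, value_name: value})
    return tidy


# Messy: the column headers "2022" and "2023" are values of a "year" variable.
messy = [
    {"country": "A", "2022": 10, "2023": 12},
    {"country": "B", "2022": 7, "2023": 9},
]
tidy = melt(messy, id_key="country", var_name="year", value_name="sales")
for row in tidy:
    print(row)
```

After reshaping, each row holds one observation and each key holds one variable.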
32. Which of the following functions is used for loading flat files?
a) read.sheet
b) read.table
c) read.data
d) none of the mentioned
Explanation: read.table reads the data into RAM.
33. Which of the following is used to extract data from HTML code of websites?
a) Webscraping
b) Webcleaning
c) Webdredging
d) All of the mentioned
Explanation: Webscraping is a great way to get data.
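As a rough sketch of what extracting data from HTML looks like, the example below pulls link targets out of a small in-memory page using only the standard library. The class name LinkExtractor and the sample markup are hypothetical; a real scraper would first download the page and would need to respect the site's terms of use.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect the href attribute of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


html = '<html><body><a href="/data.csv">Data</a> <a href="/about">About</a></body></html>'
parser = LinkExtractor()
parser.feed(html)
print(parser.links)  # ['/data.csv', '/about']
```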
34. Which of the following functions is used for casting data frames?
a) dcast
b) rcast
c) ucast
d) all of the mentioned
Explanation: Use acast or dcast depending on whether you want vector/matrix/array output or data frame output.
35. Which of the following gave rise to the need for graphs in data analysis?
a) Decision making
b) Communicating results
c) Data visualization
d) All of the mentioned
Explanation: A picture can tell a better story than data.
36. Which of the following types of testing is concerned with making decisions using data?
a) Hypothesis
b) Probability
c) Causal
d) None of the mentioned
Explanation: The null hypothesis is assumed true, and statistical evidence is required to reject it in favor of a research or alternative hypothesis.
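One way to see this logic in code is a permutation test, sketched below under the null hypothesis that group labels do not matter. The measurements, the helper permutation_p_value, and the choice of a permutation test (rather than, say, a t-test) are all illustrative assumptions.

```python
import random
from statistics import mean


def permutation_p_value(a, b, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in means."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis, any relabeling of the pooled data is equally likely.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            extreme += 1
    return extreme / n_permutations


control = [12.1, 11.8, 12.4, 12.0, 11.9, 12.2]
treated = [13.0, 13.4, 12.9, 13.3, 13.1, 13.5]
p = permutation_p_value(control, treated)
print(p)
```

A small p-value means a difference this large rarely arises from random relabeling alone, which is the statistical evidence needed to reject the null hypothesis.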
Chapterwise Multiple Choice Questions on Data Science
- Data Science Basics and Data Scientist Toolbox
- Data Analysis with Python
- Getting Data
- Data Analysis and Research
- Statistical Inference and Regression Models
- Machine Learning
- Developing Data Products and Working with NumPy
1. Data Science Basics and Data Scientist Toolbox
The section contains multiple choice questions and answers on the basics of data science and the data scientist toolbox, CLI and git workflow, big data analysis and experimental design.
2. Data Analysis with Python
The section contains questions and answers on pandas, time deltas, python plotting, data structures and computational tools.
3. Getting Data
The section contains MCQs on raw data, processed data, tidy data, web reading, API, data summarization and merging, regular expressions and text variables.
4. Data Analysis and Research
The section contains multiple choice questions and answers on graphical devices and plotting systems, basics of reproducible research, clustering, exploratory graphs and basics of literate statistical programming.
5. Statistical Inference and Regression Models
The section contains questions and answers on probability and statistics, basics of statistical inference, regression models, distributions and likelihood, binary and count outcomes and residual variations.
6. Machine Learning
The section contains MCQs on caret, prediction with motivation, regression and model and cross validation.
7. Developing Data Products and Working with NumPy
The section contains multiple choice questions and answers on shiny, slidify, googleVis and NumPy.
Wish you the best in your endeavor to learn and master Data Science!