This set of tricky Data Science questions and answers focuses on “Tidy Data”.
1. Which of the following is an example of raw data?
a) complicated JSON from the Facebook API
b) complicated JSON from the Twitter API
c) unformatted Excel file
d) all of the mentioned
Answer: d
Explanation: All three are raw data. Tidy data is obtained only after a processing script has been run on the raw data.
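As a rough illustration of what such a processing script might do, the sketch below flattens a nested JSON payload into flat rows, one observation per row. The payload, its field names, and the `tidy_posts` helper are all hypothetical; they do not reflect the actual Facebook or Twitter API formats.

```python
import json

# Hypothetical raw API response: nested JSON, not yet tidy.
RAW = """
{"data": [
  {"id": "1", "user": {"name": "ana"}, "metrics": {"likes": 3, "shares": 1}},
  {"id": "2", "user": {"name": "bo"}, "metrics": {"likes": 5}}
]}
"""

def tidy_posts(raw_text):
    """Flatten each nested post into one flat row (one observation per row)."""
    rows = []
    for post in json.loads(raw_text)["data"]:
        metrics = post.get("metrics", {})
        rows.append({
            "id": post["id"],
            "user_name": post["user"]["name"],
            # Missing counts are kept as None rather than guessed.
            "likes": metrics.get("likes"),
            "shares": metrics.get("shares"),
        })
    return rows

rows = tidy_posts(RAW)
for row in rows:
    print(row)
```

Every row ends up with the same four columns, so the result can be loaded directly into a table or data frame.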
2. Point out the correct statement.
a) Nearly 80% of data analysis is spent on wrangling data
b) Nearly 20% of data analysis is spent on data dredging
c) Nearly 80% of data analysis is spent on cleaning and preparing data
d) None of the mentioned
Answer: c
Explanation: It is often said that about 80% of data analysis is spent on cleaning and preparing data. Data cleansing is the process of detecting and correcting (or removing) corrupt or inaccurate records from a record set, table, or database.
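A minimal sketch of the "detect, then correct or remove" idea is shown below. The records, the field names, and the validity rules (a non-empty name, an integer age of at most 120) are assumptions chosen only for illustration.

```python
# Hypothetical records; the field names and validity rules are illustrative only.
records = [
    {"name": " Alice ", "age": "34"},
    {"name": "Bob", "age": "-5"},      # impossible age: removed
    {"name": "", "age": "41"},         # missing name: removed
    {"name": "carol", "age": "29 "},   # stray whitespace and casing: corrected
    {"name": "Dan", "age": "n/a"},     # not a number: removed
]

def clean(records):
    """Return (cleaned, rejected): fixable records are corrected, the rest set aside."""
    cleaned, rejected = [], []
    for rec in records:
        name = rec["name"].strip().title()
        age_text = rec["age"].strip()
        # isdigit() rejects signs and non-numeric text such as "-5" or "n/a".
        if name and age_text.isdigit() and int(age_text) <= 120:
            cleaned.append({"name": name, "age": int(age_text)})
        else:
            rejected.append(rec)
    return cleaned, rejected

cleaned, rejected = clean(records)
print(cleaned)
print(len(rejected), "records rejected")
```

Keeping the rejected records in a separate list, instead of silently dropping them, makes it possible to review what was removed.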
3. Which of the following is a trait of tidy data?
a) each variable in one column
b) each observation in different row
c) one table for each kind of variable
d) all of the mentioned
Answer: d
Explanation: In tidy data, each variable is in one column, each observation is in a different row, and there is one table for each kind of variable.
4. Which of the following packages is used for tidy data?
a) tidyr
b) souryr
c) NumPy
d) all of the mentioned
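Answer: a
Explanation: tidyr is an R package for tidying data. Its gather() and spread() functions reshape data between wide and long form; since tidyr 1.0.0 they are superseded by pivot_longer() and pivot_wider().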
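To make the two reshaping operations concrete, here is a plain-Python sketch of what gathering and spreading do to a small table. The `gather` and `spread` functions below are hand-written analogs for illustration, not tidyr's API, and the sample table is made up.

```python
def gather(rows, id_col, key, value):
    """Wide -> long: every non-id column becomes a (key, value) row."""
    return [
        {id_col: row[id_col], key: col, value: val}
        for row in rows
        for col, val in row.items()
        if col != id_col
    ]

def spread(rows, id_col, key, value):
    """Long -> wide: the values of `key` become column names again."""
    wide = {}
    for row in rows:
        wide.setdefault(row[id_col], {id_col: row[id_col]})[row[key]] = row[value]
    return list(wide.values())

# Hypothetical wide table: the column headers "1999" and "2000" are really values.
wide = [
    {"country": "A", "1999": 10, "2000": 12},
    {"country": "B", "1999": 7, "2000": 9},
]

long_rows = gather(wide, "country", "year", "cases")
for row in long_rows:
    print(row)

# Spreading the long form restores the original wide table.
print(spread(long_rows, "country", "year", "cases") == wide)
```

The long form has one row per country and year, which is the tidy layout when year is a variable.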
5. Point out the wrong statement.
a) Tidy datasets are all alike but every messy dataset is messy in its own way
b) Most statistical datasets are data frames made up of rows and columns
c) Tidy datasets provide a standardized way to link the structure of a dataset with its semantics
d) None of the mentioned
Answer: d
Explanation: All three statements are correct. The tidy data standard has been designed to simplify the development of data analysis tools that work well together.
6. Which of the following processes involves structuring datasets to facilitate analysis?
a) Data tidying
b) Data mining
c) Data booting
d) All of the mentioned
Answer: a
Explanation: Data tidying is the structuring of datasets to facilitate analysis. The principles of tidy data provide a standard way to organize data values within a dataset.
7. A strange binary file generated by a machine is an example of tidy data.
a) True
b) False
Answer: b
Explanation: A binary file that comes straight from a machine is raw data, not tidy data. Similarly, data sets stored in spreadsheets, such as Microsoft's Excel, are binary, not raw ASCII data files.
8. Which of the following is a common problem with messy data?
a) Column headers are values
b) Variables are stored in both rows and columns
c) A single observational unit is stored in multiple tables
d) All of the mentioned
Answer: d
Explanation: All three are common problems. Real datasets can, and often do, violate the three precepts of tidy data in almost every way imaginable.
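One of the problems listed in the options, a single observational unit stored in multiple tables, can be repaired by stacking the tables and recording where each row came from in a new column. The sketch below assumes hypothetical per-year tables; the `combine` helper and the data are illustrative only.

```python
# Hypothetical per-year tables that all describe the same observational unit.
tables = {
    2019: [{"city": "X", "rain_mm": 500}, {"city": "Y", "rain_mm": 320}],
    2020: [{"city": "X", "rain_mm": 450}],
}

def combine(tables, source_col):
    """Stack all tables into one, storing each table's key in `source_col`."""
    combined = []
    for source, rows in tables.items():
        for row in rows:
            # Assumes the rows do not already contain a column named source_col.
            combined.append({source_col: source, **row})
    return combined

combined = combine(tables, "year")
for row in combined:
    print(row)
```

After combining, the year is an ordinary variable in its own column instead of being hidden in the table name.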
9. tidyr is a reframing of _______ designed to accompany the tidy data framework.
a) reshape5
b) dplyr
c) reshape2
d) all of the mentioned
Answer: c
Explanation: tidyr is a reframing of reshape2, but it does less than reshape2: it is designed specifically for tidying data, not for general reshaping.
10. Raw data in the real world is tidy and properly formatted.
a) True
b) False
Answer: b
Explanation: Real-world raw data is usually messy and has to be cleaned before it can be analyzed. Data analysis is not a goal in itself; the goal is to enable the business to make better decisions.
Sanfoundry Global Education & Learning Series – Data Science.