Hadoop Data Analysis Questions and Answers

This set of Hadoop Multiple Choice Questions & Answers (MCQs) focuses on “Analyzing Data with Hadoop”.

1. Mapper implementations are passed the JobConf for the job via the ________ method.
a) JobConfigure.configure
b) JobConfigurable.configure
c) JobConfigurable.configurable
d) None of the mentioned

Answer: b
Explanation: Mapper implementations override the JobConfigurable.configure(JobConf) method to initialize themselves from the job configuration.

2. Point out the correct statement.
a) Applications can use the Reporter to report progress
b) The Hadoop MapReduce framework spawns one map task for each InputSplit generated by the InputFormat for the job
c) The intermediate, sorted outputs are always stored in a simple (key-len, key, value-len, value) format
d) All of the mentioned

Answer: d
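Explanation: All three statements are correct. Applications use the Reporter to report progress, set application-level status messages and update Counters; the framework spawns one map task for each InputSplit; and the intermediate, sorted outputs are always stored in a simple (key-len, key, value-len, value) format.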
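
To make the (key-len, key, value-len, value) layout from statement c) concrete, here is a minimal Java sketch that writes and reads one record in that order. It is an illustration under stated assumptions, not Hadoop's actual on-disk encoding: the class name KeyLenRecord is hypothetical, and the fixed 4-byte big-endian lengths are an assumption chosen for simplicity.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration: fixed 4-byte big-endian lengths are an assumption,
// not Hadoop's real intermediate-file encoding.
final class KeyLenRecord {
    private KeyLenRecord() {}

    // Lays out one record as (key-len, key, value-len, value).
    static byte[] encode(String key, String value) {
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        byte[] v = value.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(4 + k.length + 4 + v.length)
                .putInt(k.length) // key-len
                .put(k)           // key
                .putInt(v.length) // value-len
                .put(v)           // value
                .array();
    }

    // Reads the record back: each length is consumed before the bytes it describes.
    static String[] decode(byte[] record) {
        ByteBuffer buf = ByteBuffer.wrap(record);
        byte[] k = new byte[buf.getInt()];
        buf.get(k);
        byte[] v = new byte[buf.getInt()];
        buf.get(v);
        return new String[] {
            new String(k, StandardCharsets.UTF_8),
            new String(v, StandardCharsets.UTF_8)
        };
    }
}
```

Encoding the pair ("hadoop", "1") this way takes 4 + 6 + 4 + 1 = 15 bytes; storing each length before its payload is what lets a reader walk a file record by record without delimiters.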

3. Input to the _______ is the sorted output of the mappers.
a) Reducer
b) Mapper
c) Shuffle
d) All of the mentioned

Answer: a
Explanation: Input to the Reducer is the sorted output of the mappers. In the Shuffle phase, the framework fetches the relevant partition of the output of all the mappers via HTTP.
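
Which partition is "relevant" to a reducer is decided by the partitioner. Hadoop's default HashPartitioner computes (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks. The sketch below applies that same formula to plain Java strings and simulates one reducer's shuffle; the class PartitionSketch and the helper fetchPartition are hypothetical, and in-memory lists stand in for the HTTP fetches.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation; no Hadoop classes are used.
final class PartitionSketch {
    private PartitionSketch() {}

    // Same arithmetic as the default HashPartitioner: clear the sign bit, then take the modulus.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // What one reducer collects in the shuffle: its partition of every mapper's output.
    static List<String> fetchPartition(List<List<String>> mapOutputs, int reducer, int numReduceTasks) {
        List<String> fetched = new ArrayList<>();
        for (List<String> mapOutput : mapOutputs) {
            for (String key : mapOutput) {
                if (partitionFor(key, numReduceTasks) == reducer) {
                    fetched.add(key);
                }
            }
        }
        return fetched;
    }
}
```

Because the partition depends only on the key, the same key emitted by different mappers always reaches the same reducer.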

4. The right number of reduces seems to be ____________ or 1.75 multiplied by (number of nodes * mapred.tasktracker.reduce.tasks.maximum).
a) 0.90
b) 0.80
c) 0.36
d) 0.95

Answer: d
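Explanation: The right number of reduces seems to be 0.95 or 1.75 multiplied by (number of nodes * mapred.tasktracker.reduce.tasks.maximum). With 0.95, all of the reduces can launch immediately and start transferring map outputs as the maps finish. With 1.75, the faster nodes finish their first round of reduces and launch a second wave, which does a much better job of load balancing.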
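
The heuristic is plain arithmetic: multiply the total number of reduce slots (nodes times the per-node maximum) by 0.95 or 1.75. A minimal Java sketch follows; the class ReduceCount is hypothetical, and rounding down with a floor of one reducer is an assumption, since the heuristic itself does not specify rounding.

```java
// Hypothetical helper illustrating the 0.95 / 1.75 reduce-count heuristic.
final class ReduceCount {
    private ReduceCount() {}

    // factor * (nodes * maxReduceSlotsPerNode), rounded down, but never fewer than one reducer.
    static int suggested(int nodes, int maxReduceSlotsPerNode, double factor) {
        int slots = nodes * maxReduceSlotsPerNode;
        return Math.max(1, (int) Math.floor(factor * slots));
    }
}
```

For a cluster of 10 nodes with 2 reduce slots each (20 slots), this gives 19 reducers with 0.95, a single wave that leaves a little headroom, and 35 with 1.75, roughly two waves.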

5. Point out the wrong statement.
a) Reducer has 2 primary phases
b) Increasing the number of reduces increases the framework overhead, but increases load balancing and lowers the cost of failures
c) It is legal to set the number of reduce-tasks to zero if no reduction is desired
d) The framework groups Reducer inputs by keys (since different mappers may have output the same key) in the sort stage

Answer: a
Explanation: Reducer has 3 primary phases: shuffle, sort and reduce.
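
The three phases can be traced in a small word-count style simulation. This is a hypothetical in-memory sketch, not Hadoop code: the classes ReducePhases and Pair are invented for illustration, a single reducer is assumed to own every key, and the reduce logic is assumed to be a sum.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical simulation of shuffle, sort and reduce for a single reducer.
final class ReducePhases {
    private ReducePhases() {}

    // A (key, value) pair as a mapper might emit it.
    static final class Pair {
        final String key;
        final int value;

        private Pair(String key, int value) {
            this.key = key;
            this.value = value;
        }

        static Pair of(String key, int value) {
            return new Pair(key, value);
        }
    }

    static Map<String, Integer> run(List<List<Pair>> mapOutputs) {
        // Phase 1, shuffle: collect the pairs from every mapper's output.
        List<Pair> fetched = new ArrayList<>();
        for (List<Pair> mapOutput : mapOutputs) {
            fetched.addAll(mapOutput);
        }

        // Phase 2, sort: group values by key. TreeMap keeps keys sorted, and the same
        // key emitted by different mappers lands in the same group.
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Pair p : fetched) {
            grouped.computeIfAbsent(p.key, k -> new ArrayList<>()).add(p.value);
        }

        // Phase 3, reduce: run the reduce logic once per key over all of its values.
        Map<String, Integer> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) {
                sum += v;
            }
            result.put(e.getKey(), sum);
        }
        return result;
    }
}
```

Feeding it [("b", 1), ("a", 1)] from one mapper and [("a", 1)] from another yields a=2 and b=1, with keys in sorted order.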

6. The output of the _______ is not sorted in the MapReduce framework for Hadoop.
a) Mapper
b) Cascader
c) Scalding
d) None of the mentioned

Answer: d
Explanation: The correct answer, the Reducer, is not among the options. The output of the reduce task is typically written to the FileSystem, and the output of the Reducer is not sorted.

7. Which of the following phases occur simultaneously?
a) Shuffle and Sort
b) Reduce and Sort
c) Shuffle and Map
d) All of the mentioned

Answer: a
Explanation: The shuffle and sort phases occur simultaneously; while map-outputs are being fetched they are merged.
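
Merging while fetching works because every map output arrives already sorted, so each new segment can be folded into the running result with a linear two-way merge instead of waiting for all fetches and sorting at the end. The sketch below is a simplified in-memory illustration; the class MergeWhileFetching is hypothetical, and Hadoop's real merge (multi-pass, spilling to disk) is more involved.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory illustration of merge-as-you-fetch.
final class MergeWhileFetching {
    private MergeWhileFetching() {}

    // Merges two already-sorted lists into one sorted list in linear time.
    static List<String> mergeSorted(List<String> a, List<String> b) {
        List<String> out = new ArrayList<>(a.size() + b.size());
        int i = 0;
        int j = 0;
        while (i < a.size() && j < b.size()) {
            if (a.get(i).compareTo(b.get(j)) <= 0) {
                out.add(a.get(i++));
            } else {
                out.add(b.get(j++));
            }
        }
        while (i < a.size()) {
            out.add(a.get(i++));
        }
        while (j < b.size()) {
            out.add(b.get(j++));
        }
        return out;
    }

    // Each segment is merged as soon as it "arrives", so sorting overlaps with fetching.
    static List<String> shuffleAndSort(Iterable<List<String>> segmentsInArrivalOrder) {
        List<String> merged = new ArrayList<>();
        for (List<String> segment : segmentsInArrivalOrder) {
            merged = mergeSorted(merged, segment);
        }
        return merged;
    }
}
```

With segments [a, c], [b, d] and [a] arriving in that order, the running result is [a, a, b, c, d] once the last segment is merged.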

8. Mapper and Reducer implementations can use the ________ to report progress or just indicate that they are alive.
a) Partitioner
b) OutputCollector
c) Reporter
d) All of the mentioned

Answer: c
Explanation: Reporter is a facility for MapReduce applications to report progress, set application-level status messages and update Counters.

9. __________ is a generalization of the facility provided by the MapReduce framework to collect data output by the Mapper or the Reducer.
a) Partitioner
b) OutputCollector
c) Reporter
d) All of the mentioned

Answer: b
Explanation: OutputCollector is a generalization of the facility provided by the MapReduce framework to collect data output by the Mapper or the Reducer (either the intermediate outputs or the output of the job).

10. _________ is the primary interface for a user to describe a MapReduce job to the Hadoop framework for execution.
a) Map Parameters
b) JobConf
c) MemoryConf
d) None of the mentioned

Answer: b
Explanation: JobConf represents a MapReduce job configuration.

Sanfoundry Global Education & Learning Series – Hadoop.