This set of Questions & Answers focuses on “MapReduce Development – 2”.
1. The Mapper implementation processes one line at a time via _________ method.
a) map
b) reduce
c) mapper
d) reducer
Answer: a
Explanation: The map method is called once for each key/value pair in the InputSplit; with the default TextInputFormat, each line of input is one record. The Mapper outputs are then sorted and partitioned per Reducer.
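As a rough plain-Java sketch of what a line-at-a-time map method does (no Hadoop dependencies; the class and method names here are illustrative, not part of the Hadoop API): one input line in, a list of intermediate (word, 1) pairs out, word-count style.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.AbstractMap.SimpleEntry;

public class MapSketch {
    // Mimics Mapper.map(): invoked once per input line, emits (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                // In real Hadoop code this would be OutputCollector.collect(key, value).
                out.add(new SimpleEntry<>(word, 1));
            }
        }
        return out;
    }
}
```

In the actual API, the framework calls map once per record of the InputSplit and the emitted pairs become the intermediate key/value pairs that are later sorted and partitioned.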
2. Point out the correct statement.
a) Mapper maps input key/value pairs to a set of intermediate key/value pairs
b) Applications typically implement the Mapper and Reducer interfaces to provide the map and reduce methods
c) Mapper and Reducer interfaces form the core of the job
d) None of the mentioned
Answer: a
Explanation: The transformed intermediate records do not need to be of the same type as the input records.
3. The Hadoop MapReduce framework spawns one map task for each __________ generated by the InputFormat for the job.
a) OutputSplit
b) InputSplit
c) InputSplitStream
d) All of the mentioned
Answer: b
Explanation: The framework spawns one map task per InputSplit. Mapper implementations are passed the JobConf for the job via the JobConfigurable.configure(JobConf) method and can override it to initialize themselves.
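Since the framework spawns one map task per InputSplit, and with the default FileInputFormat behavior a file is divided into splits of roughly the HDFS block size, the map-task count can be estimated with simple ceiling division. A small sketch (plain Java, illustrative names):

```java
public class SplitMath {
    // One map task is spawned per InputSplit. Assuming default FileInputFormat
    // splitting at the block size, the number of splits for a single file is
    // approximately ceil(fileBytes / splitBytes).
    static long numSplits(long fileBytes, long splitBytes) {
        return (fileBytes + splitBytes - 1) / splitBytes; // ceiling division
    }
}
```

For example, a 1 GB file with a 128 MiB block size yields 8 splits, and therefore about 8 map tasks.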
4. Users can control which keys (and hence records) go to which Reducer by implementing a custom _________
a) Partitioner
b) OutputSplit
c) Reporter
d) All of the mentioned
Answer: a
Explanation: The Partitioner controls which reduce task each intermediate key (and hence record) is sent to. Users can additionally control the grouping by specifying a Comparator via JobConf.setOutputKeyComparatorClass(Class).
5. Point out the wrong statement.
a) The Mapper outputs are sorted and then partitioned per Reducer
b) The total number of partitions is the same as the number of reduce tasks for the job
c) The intermediate, sorted outputs are always stored in a simple (key-len, key, value-len, value) format
d) None of the mentioned
Answer: d
Explanation: All intermediate values associated with a given output key are subsequently grouped by the framework, and passed to the Reducer(s) to determine the final output.
6. Applications can use the ____________ to report progress and set application-level status messages.
a) Partitioner
b) OutputSplit
c) Reporter
d) All of the mentioned
Answer: c
Explanation: Reporter is also used to update Counters, or just to indicate that the application is alive.
7. The right level of parallelism for maps seems to be around _________ maps per-node.
a) 1-10
b) 10-100
c) 100-150
d) 150-200
Answer: b
Explanation: Task setup takes a while, so it is best if the maps take at least a minute to execute.
8. The number of reduces for the job is set by the user via _________
a) JobConf.setNumTasks(int)
b) JobConf.setNumReduceTasks(int)
c) JobConf.setNumMapTasks(int)
d) All of the mentioned
Answer: b
Explanation: Reducer has 3 primary phases: shuffle, sort and reduce.
9. The framework groups Reducer inputs by key in _________ stage.
a) sort
b) shuffle
c) reduce
d) none of the mentioned
Answer: a
Explanation: The shuffle and sort phases occur simultaneously; while map-outputs are being fetched they are merged.
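What the sort stage produces can be mimicked in plain Java (illustrative names, no Hadoop dependencies): intermediate (key, value) pairs go in, and each key comes out once, in key order, with all of its values collected together — exactly the shape a Reducer consumes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

public class GroupSketch {
    // Mimics the sort stage: group intermediate (key, value) pairs by key,
    // in sorted key order, so the Reducer sees (key, [v1, v2, ...]) once per key.
    static SortedMap<String, List<Integer>> group(
            List<? extends Map.Entry<String, Integer>> pairs) {
        SortedMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }
}
```

In the real framework this grouping happens on sorted, merged map outputs rather than via an in-memory map, but the input/output contract is the same.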
10. The output of the reduce task is typically written to the FileSystem via _____________
a) OutputCollector.collect
b) OutputCollector.get
c) OutputCollector.receive
d) OutputCollector.put
Answer: a
Explanation: The output of the Reducer is not sorted.
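The reduce step itself can be sketched in plain Java (illustrative names, no Hadoop dependencies): one key plus all of its grouped values in, one output pair out — the pair that would be handed to OutputCollector.collect and written to the FileSystem.

```java
import java.util.AbstractMap;
import java.util.Map;

public class ReduceSketch {
    // Mimics Reducer.reduce() for word count: receives a key and all of its
    // grouped values, emits a single (key, sum) pair. In real Hadoop code the
    // result would be passed to OutputCollector.collect(key, sum).
    static Map.Entry<String, Integer> reduce(String key, Iterable<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return new AbstractMap.SimpleEntry<>(key, sum);
    }
}
```

As the explanation above notes, the framework does not sort these reduce outputs; they are written in the order the keys were processed.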
Sanfoundry Global Education & Learning Series – Hadoop.