This set of Questions & Answers focuses on “MapReduce Development – 2”.
1. The Mapper implementation processes one line at a time via _________ method.
a) map
b) reduce
c) mapper
d) reducer
Answer: a
Explanation: The map method is called once for each key/value pair in the InputSplit; with the default TextInputFormat, each line of input is one record. The Mapper outputs are then sorted and partitioned per Reducer.
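As a rough plain-Java sketch of what a line-at-a-time map method does (no Hadoop dependencies; the class and method names here are illustrative, not part of the Hadoop API): one input line in, a list of intermediate (word, 1) pairs out, word-count style.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.AbstractMap.SimpleEntry;

public class MapSketch {
    // Mimics Mapper.map(): invoked once per input line, emits (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                // In real Hadoop code this would be OutputCollector.collect(key, value).
                out.add(new SimpleEntry<>(word, 1));
            }
        }
        return out;
    }
}
```

In the actual API, the framework calls map once per record of the InputSplit and the emitted pairs become the intermediate key/value pairs that are later sorted and partitioned.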
2. Point out the correct statement.
a) Mapper maps input key/value pairs to a set of intermediate key/value pairs
b) Applications typically implement the Mapper and Reducer interfaces to provide the map and reduce methods
c) Mapper and Reducer interfaces form the core of the job
d) None of the mentioned
Answer: a
Explanation: The transformed intermediate records do not need to be of the same type as the input records.
3. The Hadoop MapReduce framework spawns one map task for each __________ generated by the InputFormat for the job.
a) OutputSplit
b) InputSplit
c) InputSplitStream
d) All of the mentioned
Answer: b
Explanation: The framework spawns one map task per InputSplit. Mapper implementations are passed the JobConf for the job via the JobConfigurable.configure(JobConf) method and can override it to initialize themselves.
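Since the framework spawns one map task per InputSplit, and with the default FileInputFormat behavior a file is divided into splits of roughly the HDFS block size, the map-task count can be estimated with simple ceiling division. A small sketch (plain Java, illustrative names):

```java
public class SplitMath {
    // One map task is spawned per InputSplit. Assuming default FileInputFormat
    // splitting at the block size, the number of splits for a single file is
    // approximately ceil(fileBytes / splitBytes).
    static long numSplits(long fileBytes, long splitBytes) {
        return (fileBytes + splitBytes - 1) / splitBytes; // ceiling division
    }
}
```

For example, a 1 GB file with a 128 MiB block size yields 8 splits, and therefore about 8 map tasks.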
4. Users can control which keys (and hence records) go to which Reducer by implementing a custom _________
a) Partitioner
b) OutputSplit
c) Reporter
d) All of the mentioned
Answer: a
Explanation: The Partitioner controls which reduce task each intermediate key (and hence record) is sent to. Users can additionally control the grouping by specifying a Comparator via JobConf.setOutputKeyComparatorClass(Class).
5. Point out the wrong statement.
a) The Mapper outputs are sorted and then partitioned per Reducer
b) The total number of partitions is the same as the number of reduce tasks for the job
c) The intermediate, sorted outputs are always stored in a simple (key-len, key, value-len, value) format
d) None of the mentioned
Answer: d
Explanation: All intermediate values associated with a given output key are subsequently grouped by the framework, and passed to the Reducer(s) to determine the final output.
6. Applications can use the ____________ to report progress and set application-level status messages.
a) Partitioner
b) OutputSplit
c) Reporter
d) All of the mentioned
Answer: c
Explanation: Reporter is also used to update Counters, or just to indicate that the application is alive.
7. The right level of parallelism for maps seems to be around _________ maps per-node.
a) 1-10
b) 10-100
c) 100-150
d) 150-200
Answer: b
Explanation: Task setup takes a while, so it is best if the maps take at least a minute to execute.
8. The number of reduces for the job is set by the user via _________
a) JobConf.setNumTasks(int)
b) JobConf.setNumReduceTasks(int)
c) JobConf.setNumMapTasks(int)
d) All of the mentioned
Answer: b
Explanation: Reducer has 3 primary phases: shuffle, sort and reduce.
9. The framework groups Reducer inputs by key in _________ stage.
a) sort
b) shuffle
c) reduce
d) none of the mentioned
Answer: a
Explanation: The shuffle and sort phases occur simultaneously; while map-outputs are being fetched they are merged.
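What the sort stage produces can be mimicked in plain Java (illustrative names, no Hadoop dependencies): intermediate (key, value) pairs go in, and each key comes out once, in key order, with all of its values collected together — exactly the shape a Reducer consumes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

public class GroupSketch {
    // Mimics the sort stage: group intermediate (key, value) pairs by key,
    // in sorted key order, so the Reducer sees (key, [v1, v2, ...]) once per key.
    static SortedMap<String, List<Integer>> group(
            List<? extends Map.Entry<String, Integer>> pairs) {
        SortedMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }
}
```

In the real framework this grouping happens on sorted, merged map outputs rather than via an in-memory map, but the input/output contract is the same.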
10. The output of the reduce task is typically written to the FileSystem via _____________
a) OutputCollector.collect
b) OutputCollector.get
c) OutputCollector.receive
d) OutputCollector.put
Answer: a
Explanation: The output of the Reducer is not sorted.
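The reduce step itself can be sketched in plain Java (illustrative names, no Hadoop dependencies): one key plus all of its grouped values in, one output pair out — the pair that would be handed to OutputCollector.collect and written to the FileSystem.

```java
import java.util.AbstractMap;
import java.util.Map;

public class ReduceSketch {
    // Mimics Reducer.reduce() for word count: receives a key and all of its
    // grouped values, emits a single (key, sum) pair. In real Hadoop code the
    // result would be passed to OutputCollector.collect(key, sum).
    static Map.Entry<String, Integer> reduce(String key, Iterable<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return new AbstractMap.SimpleEntry<>(key, sum);
    }
}
```

As the explanation above notes, the framework does not sort these reduce outputs; they are written in the order the keys were processed.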
Sanfoundry Global Education & Learning Series – Hadoop.