Hadoop Questions and Answers – Analyzing Data with Hadoop

This set of Hadoop Multiple Choice Questions & Answers (MCQs) focuses on “Analyzing Data with Hadoop”.

1. Mapper implementations are passed the JobConf for the job via the ________ method.
a) JobConfigure.configure
b) JobConfigurable.configure
c) JobConfigurable.configurable
d) None of the mentioned

Answer: b
Explanation: Mapper implementations override the JobConfigurable.configure(JobConf) method to initialize themselves.
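
The lifecycle behind this answer can be modeled in plain Java. The framework calls configure once with the job configuration, and only then calls map for each record. This is a sketch under stated assumptions, not Hadoop code. The `ConfigurableMapper` interface and the `example.case.sensitive` key are hypothetical, and a `Map<String, String>` stands in for `JobConf`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class ConfigureDemo {
    // Hypothetical stand-in for a Mapper that also implements JobConfigurable.
    // A Map<String, String> plays the role of JobConf in this model.
    interface ConfigurableMapper {
        void configure(Map<String, String> conf);

        void map(String record, List<String> out);
    }

    // Example mapper that reads one hypothetical setting in configure().
    static class CaseFoldingMapper implements ConfigurableMapper {
        private boolean caseSensitive = true;

        @Override
        public void configure(Map<String, String> conf) {
            caseSensitive = Boolean.parseBoolean(conf.getOrDefault("example.case.sensitive", "true"));
        }

        @Override
        public void map(String record, List<String> out) {
            out.add(caseSensitive ? record : record.toLowerCase(Locale.ROOT));
        }
    }

    // Models the framework side of one map task: configure once, then map each record.
    static List<String> runTask(ConfigurableMapper mapper, Map<String, String> conf, List<String> records) {
        mapper.configure(conf);
        List<String> out = new ArrayList<>();
        for (String record : records) {
            mapper.map(record, out);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("example.case.sensitive", "false");
        System.out.println(runTask(new CaseFoldingMapper(), conf, List.of("Hadoop", "MapReduce")));
        // prints [hadoop, mapreduce]
    }
}
```

In the real old-style API, a mapper usually extends `MapReduceBase` and overrides `configure(JobConf)` in the same way.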

2. Point out the correct statement.
a) Applications can use the Reporter to report progress
b) The Hadoop MapReduce framework spawns one map task for each InputSplit generated by the InputFormat for the job
c) The intermediate, sorted outputs are always stored in a simple (key-len, key, value-len, value) format
d) All of the mentioned

Answer: d
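Explanation: All three statements are correct. Applications can use the Reporter to report progress, set application-level status messages and update Counters. The framework spawns one map task for each InputSplit. The intermediate, sorted outputs are stored in a simple (key-len, key, value-len, value) format.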
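
The (key-len, key, value-len, value) layout in option c can be pictured with a small plain-Java encoder. This is a simplified model of the layout as the statement describes it. It uses fixed 4-byte lengths written by `DataOutputStream.writeInt`. It does not reproduce Hadoop's actual on-disk encoding details, and the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class KeyLenRecord {
    // Encodes one record as (key-len, key, value-len, value) with 4-byte big-endian lengths.
    static byte[] encode(String key, String value) {
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        byte[] v = value.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(k.length);
            out.write(k);
            out.writeInt(v.length);
            out.write(v);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory stream
        }
        return bytes.toByteArray();
    }

    // Decodes a record produced by encode() back into {key, value}.
    static String[] decode(byte[] record) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(record))) {
            byte[] k = new byte[in.readInt()];
            in.readFully(k);
            byte[] v = new byte[in.readInt()];
            in.readFully(v);
            return new String[] {new String(k, StandardCharsets.UTF_8), new String(v, StandardCharsets.UTF_8)};
        } catch (IOException e) {
            throw new UncheckedIOException(e); // e.g. a truncated record
        }
    }

    public static void main(String[] args) {
        byte[] record = encode("hadoop", "1");
        System.out.println(record.length); // 15 = 4 + 6 + 4 + 1
        String[] kv = decode(record);
        System.out.println(kv[0] + " -> " + kv[1]); // hadoop -> 1
    }
}
```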

3. Input to the _______ is the sorted output of the mappers.
a) Reducer
b) Mapper
c) Shuffle
d) All of the mentioned

Answer: a
Explanation: The Reducer receives the sorted output of the mappers as its input. In the Shuffle phase, the framework fetches the relevant partition of the output of all the mappers via HTTP.
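
The shuffle can be modeled in plain Java. Each map output record is assigned to a reduce partition. Each reducer then collects its own partition from every mapper and sees the result sorted by key. The partition formula matches Hadoop's default HashPartitioner, but here it is applied to `java.lang.String` hash codes, so real partition numbers for Hadoop key types may differ. Everything else is a hypothetical in-memory sketch: the names, and the lists standing in for map output files fetched over HTTP.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class ShuffleSketch {
    // Same formula as Hadoop's default HashPartitioner, applied to String hash codes.
    static int partition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Models one reducer's shuffle: take its partition from every mapper's
    // output, then sort by key so that the reducer's input is sorted.
    static List<Map.Entry<String, Integer>> fetchPartition(
            List<List<Map.Entry<String, Integer>>> mapOutputs, int reducer, int numReduceTasks) {
        List<Map.Entry<String, Integer>> fetched = new ArrayList<>();
        for (List<Map.Entry<String, Integer>> mapperOutput : mapOutputs) {
            for (Map.Entry<String, Integer> kv : mapperOutput) {
                if (partition(kv.getKey(), numReduceTasks) == reducer) {
                    fetched.add(kv);
                }
            }
        }
        fetched.sort(Map.Entry.comparingByKey());
        return fetched;
    }

    public static void main(String[] args) {
        List<List<Map.Entry<String, Integer>>> mapOutputs = List.of(
                List.of(Map.entry("b", 1), Map.entry("a", 1)),  // mapper 0
                List.of(Map.entry("a", 1), Map.entry("c", 1))); // mapper 1
        for (int r = 0; r < 2; r++) {
            System.out.println("reducer " + r + " <- " + fetchPartition(mapOutputs, r, 2));
        }
        // prints:
        // reducer 0 <- [b=1]
        // reducer 1 <- [a=1, a=1, c=1]
    }
}
```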

4. The right number of reduces seems to be ____________ multiplied by (<no. of nodes> * <no. of maximum containers per node>).
a) 0.90
b) 0.80
c) 0.36
d) 0.95

Answer: d
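Explanation: The right number of reduces seems to be 0.95 or 1.75 multiplied by (<no. of nodes> * <no. of maximum containers per node>). With 0.95, all of the reduces can launch immediately and start transferring map outputs as the maps finish. With 1.75, the faster nodes finish their first round of reduces and launch a second wave, which gives much better load balancing.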
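
The two factors are applied to the cluster's total container capacity. The arithmetic below assumes a hypothetical cluster of 10 nodes with at most 8 containers per node. Rounding down is also an assumption of this sketch, because the guideline gives no rounding rule. The class and method names are illustrative only.

```java
class ReduceCount {
    // reduces = factor * (nodes * maximum containers per node), rounded down.
    static int suggest(int nodes, int maxContainersPerNode, double factor) {
        return (int) Math.floor(factor * (nodes * maxContainersPerNode));
    }

    public static void main(String[] args) {
        // Hypothetical cluster: 10 nodes, at most 8 containers each (capacity 80).
        System.out.println(suggest(10, 8, 0.95)); // 76: a single wave of reduces
        System.out.println(suggest(10, 8, 1.75)); // 140: faster nodes run a second wave
    }
}
```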

5. Point out the wrong statement.
a) Reducer has 2 primary phases
b) Increasing the number of reduces increases the framework overhead, but increases load balancing and lowers the cost of failures
c) It is legal to set the number of reduce-tasks to zero if no reduction is desired
d) The framework groups Reducer inputs by keys (since different mappers may have output the same key) in the sort stage

Answer: a
Explanation: Reducer has 3 primary phases: shuffle, sort and reduce.
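
The sort and reduce phases inside one reduce task can be modeled in plain Java. The model assumes that shuffle has already delivered the records. The sort stage groups values by key, since several mappers may emit the same key. Reduce is then called once per key. The class and method names are hypothetical, and a word-count style sum stands in for user reduce logic.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

class ReducePhases {
    // Sort stage: order the fetched records by key and group the values that
    // share a key (different mappers may have emitted the same key).
    static SortedMap<String, List<Integer>> sortAndGroup(List<Map.Entry<String, Integer>> fetched) {
        SortedMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> kv : fetched) {
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        return grouped;
    }

    // Reduce stage: one call per key with all of its values; a sum stands in for user logic.
    static Map<String, Integer> reduce(SortedMap<String, List<Integer>> grouped) {
        Map<String, Integer> output = new LinkedHashMap<>(); // keeps emission order
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) {
                sum += v;
            }
            output.put(e.getKey(), sum);
        }
        return output;
    }

    public static void main(String[] args) {
        // Records as delivered by the shuffle phase, possibly from several mappers.
        List<Map.Entry<String, Integer>> fetched =
                List.of(Map.entry("a", 1), Map.entry("b", 1), Map.entry("a", 1));
        SortedMap<String, List<Integer>> grouped = sortAndGroup(fetched);
        System.out.println(grouped);         // {a=[1, 1], b=[1]}
        System.out.println(reduce(grouped)); // {a=2, b=1}
    }
}
```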

6. The output of the _______ is not sorted in the MapReduce framework for Hadoop.
a) Mapper
b) Cascader
c) Scalding
d) None of the mentioned

Answer: d
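Explanation: It is the output of the Reducer that is not sorted; the output of the reduce task is typically written to the FileSystem. Reducer is not among the options, so the answer is “None of the mentioned”. Mapper output, by contrast, is sorted by the framework before it reaches the reducers.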

7. Which of the following phases occur simultaneously?
a) Shuffle and Sort
b) Reduce and Sort
c) Shuffle and Map
d) All of the mentioned

Answer: a
Explanation: The shuffle and sort phases occur simultaneously; while map-outputs are being fetched they are merged.
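
The overlap of shuffle and sort can be modeled in plain Java. Each fetched map output arrives already sorted, and it is merged into the running result as soon as it arrives rather than after all fetches complete. Hadoop's actual merger is more involved. This repeated two-way merge and its names are a hypothetical simplification.

```java
import java.util.ArrayList;
import java.util.List;

class IncrementalMerge {
    // Merges two key-sorted runs into one key-sorted run in linear time.
    static List<String> mergeRuns(List<String> a, List<String> b) {
        List<String> out = new ArrayList<>(a.size() + b.size());
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            if (a.get(i).compareTo(b.get(j)) <= 0) {
                out.add(a.get(i++));
            } else {
                out.add(b.get(j++));
            }
        }
        while (i < a.size()) out.add(a.get(i++));
        while (j < b.size()) out.add(b.get(j++));
        return out;
    }

    // Models the reduce side: each sorted map output is merged as soon as it is fetched.
    static List<String> fetchAndMerge(List<List<String>> mapOutputsInArrivalOrder) {
        List<String> merged = new ArrayList<>();
        for (List<String> run : mapOutputsInArrivalOrder) {
            merged = mergeRuns(merged, run);
        }
        return merged;
    }

    public static void main(String[] args) {
        List<List<String>> arrivals = List.of(
                List.of("apple", "hadoop"),
                List.of("beta", "zebra"),
                List.of("hadoop", "yarn"));
        System.out.println(fetchAndMerge(arrivals));
        // prints [apple, beta, hadoop, hadoop, yarn, zebra]
    }
}
```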

8. Mapper and Reducer implementations can use the ________ to report progress or just indicate that they are alive.
a) Partitioner
b) OutputCollector
c) Reporter
d) All of the mentioned

Answer: c
Explanation: Reporter is a facility for MapReduce applications to report progress, set application-level status messages and update Counters.

9. __________ is a generalization of the facility provided by the MapReduce framework to collect data output by the Mapper or the Reducer.
a) Partitioner
b) OutputCollector
c) Reporter
d) All of the mentioned

Answer: b
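Explanation: OutputCollector is a generalization of the facility provided by the MapReduce framework to collect data output by the Mapper or the Reducer (either the intermediate outputs or the output of the job). Hadoop MapReduce also comes bundled with a library of generally useful mappers, reducers, and partitioners.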
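
The old-API `OutputCollector<K, V>` exposes `collect(K key, V value)`. Map code written against it does not need to know where the pairs end up. The plain-Java model below mirrors that idea with a hypothetical `Collector` interface. It is not Hadoop's interface: it omits `IOException` and Writable types, and all names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class CollectorDemo {
    // Hypothetical stand-in for OutputCollector<K, V>.
    interface Collector<K, V> {
        void collect(K key, V value);
    }

    // One possible sink: buffers pairs in memory (e.g. intermediate map output).
    static class BufferingCollector<K, V> implements Collector<K, V> {
        final List<Map.Entry<K, V>> pairs = new ArrayList<>();

        @Override
        public void collect(K key, V value) {
            pairs.add(Map.entry(key, value));
        }
    }

    // Word-count style map function that only depends on the Collector interface.
    static void map(String line, Collector<String, Integer> out) {
        for (String word : line.split("\\s+")) {
            if (!word.isEmpty()) {
                out.collect(word, 1);
            }
        }
    }

    public static void main(String[] args) {
        BufferingCollector<String, Integer> buffer = new BufferingCollector<>();
        map("to be or not to be", buffer);
        System.out.println(buffer.pairs.size()); // 6
        // A different sink (here, one that prints) can be swapped in without changing map().
        map("hello hadoop", (k, v) -> System.out.println(k + "\t" + v));
    }
}
```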

10. _________ is the primary interface for a user to describe a MapReduce job to the Hadoop framework for execution.
a) Map Parameters
b) JobConf
c) MemoryConf
d) None of the mentioned

Answer: b
Explanation: JobConf represents a MapReduce job configuration.

Sanfoundry Global Education & Learning Series – Hadoop.

Manish Bhojasia - Founder & CTO at Sanfoundry
Manish Bhojasia, a technology veteran with 20+ years at Cisco and Wipro, is Founder and CTO at Sanfoundry. He lives in Bangalore and focuses on development of the Linux Kernel, SAN Technologies, Advanced C, and Data Structures & Algorithms.
