Hadoop Questions and Answers – Data Flow

This set of Hadoop Multiple Choice Questions & Answers (MCQs) focuses on “Data Flow”.

1. ________ is a programming model designed for processing large volumes of data in parallel by dividing the work into a set of independent tasks.
a) Hive
b) MapReduce
c) Pig
d) Lucene
Answer: b
Explanation: MapReduce is the core processing model of Hadoop: the work is divided into independent map tasks that run in parallel, followed by reduce tasks that aggregate their output.

2. Point out the correct statement.
a) Data locality means movement of the algorithm to the data instead of data to algorithm
b) When processing is done on the data, the algorithm is moved across the Action Nodes rather than the data to the algorithm
c) Moving Computation is more expensive than Moving Data
d) None of the mentioned
Answer: a
Explanation: The MapReduce framework exploits data locality: the computation (the algorithm) is moved to the nodes that store the data, rather than moving the data to the computation.

3. The daemons associated with the MapReduce phase are ________ and task-trackers.
a) job-tracker
b) map-tracker
c) reduce-tracker
d) all of the mentioned
Answer: a
Explanation: MapReduce jobs are submitted to the JobTracker, which schedules their map and reduce tasks on the TaskTrackers.

4. The JobTracker pushes work out to available _______ nodes in the cluster, striving to keep the work as close to the data as possible.
a) DataNodes
b) TaskTracker
c) ActionNodes
d) All of the mentioned
Answer: b
Explanation: Each TaskTracker sends a heartbeat to the JobTracker every few seconds, so the JobTracker knows whether the node is alive and has free task slots.


5. Point out the wrong statement.
a) The map function in Hadoop MapReduce has the following general form: map: (K1, V1) -> list(K2, V2)
b) The reduce function in Hadoop MapReduce has the following general form: reduce: (K2, list(V2)) -> list(K3, V3)
c) MapReduce has a complex model of data processing: inputs and outputs for the map and reduce functions are key-value pairs
d) None of the mentioned
Answer: c
Explanation: MapReduce has a relatively simple model of data processing: the inputs and outputs of both the map and reduce functions are key-value pairs.
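The general forms above can be illustrated with a minimal word-count sketch in plain Python (not Hadoop code; in Hadoop the framework itself performs the grouping between the two phases):

```python
from collections import defaultdict

# map: (K1, V1) -> list(K2, V2)
# Here K1 is a byte offset, V1 a line of text; K2 a word, V2 a count.
def map_fn(offset, line):
    return [(word, 1) for word in line.split()]

# reduce: (K2, list(V2)) -> list(K3, V3)
def reduce_fn(word, counts):
    return [(word, sum(counts))]

def run_job(lines):
    # Map phase: apply map_fn to every input record.
    intermediate = []
    offset = 0
    for line in lines:
        intermediate.extend(map_fn(offset, line))
        offset += len(line) + 1  # +1 for the newline
    # Shuffle: group all values by key (done by the framework in Hadoop).
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)
    # Reduce phase: apply reduce_fn to each (key, list-of-values) group.
    output = []
    for key in sorted(groups):
        output.extend(reduce_fn(key, groups[key]))
    return output

print(run_job(["a b a", "b a"]))  # -> [('a', 3), ('b', 2)]
```

Note how the types line up: the map output key K2 (a word) becomes the reduce input key, and the framework supplies the reducer with the full list of V2 values for that key.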

6. The ________ method of the InputFormat class computes the splits for each file, which are then sent to the jobtracker.
a) puts
b) gets
c) getSplits
d) all of the mentioned
Answer: c
Explanation: The client calls getSplits() to compute the input splits; the JobTracker then uses the splits' storage locations to schedule map tasks to process them on the TaskTrackers.
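Conceptually, a split is just a description of a byte range of an input file. The sketch below shows one way splits might be computed (a simplification: the real getSplits() also records the block locations used for locality-aware scheduling, which are omitted here):

```python
from typing import NamedTuple

class FileSplit(NamedTuple):
    path: str
    start: int   # byte offset of the split within the file
    length: int  # number of bytes in the split

def get_splits(path, file_size, split_size):
    """Divide a file of file_size bytes into splits of at most split_size bytes."""
    splits = []
    offset = 0
    while offset < file_size:
        length = min(split_size, file_size - offset)
        splits.append(FileSplit(path, offset, length))
        offset += length
    return splits

print(get_splits("/data/input.txt", 250, 100))
# three splits: offsets 0, 100, 200 with lengths 100, 100, 50
```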


7. On a tasktracker, the map task passes the split to the createRecordReader() method on InputFormat to obtain a _________ for that split.
a) InputReader
b) RecordReader
c) OutputReader
d) None of the mentioned
Answer: b
Explanation: The RecordReader loads data from its split and converts it into key-value pairs suitable for reading by the mapper.

8. The default InputFormat is __________, which treats each line of the input as a value; the associated key is the line's byte offset in the file.
a) TextFormat
b) TextInputFormat
c) InputFormat
d) All of the mentioned

Answer: b
Explanation: A RecordReader is little more than an iterator over records, and the map task uses one to generate record key-value pairs.
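That iterator behaviour can be sketched in Python: each record is a line, keyed by the offset at which it starts (a simplification of TextInputFormat's line record reader, which additionally handles lines that straddle split boundaries):

```python
import io

def line_records(stream):
    """Yield (offset, line) pairs, mimicking TextInputFormat's keys and values."""
    offset = 0
    for line in stream:
        yield offset, line.rstrip("\n")
        offset += len(line)  # advance by the raw line length, newline included

data = io.StringIO("first\nsecond\nthird\n")
print(list(line_records(data)))
# -> [(0, 'first'), (6, 'second'), (13, 'third')]
```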


9. __________ controls the partitioning of the keys of the intermediate map-outputs.
a) Collector
b) Partitioner
c) InputFormat
d) None of the mentioned
Answer: b
Explanation: The output of each mapper is passed to the Partitioner, which assigns every intermediate key to a reducer; the default HashPartitioner uses the key's hash modulo the number of reducers.
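A Python sketch of that default hashing scheme (Python's hash() stands in for Java's hashCode(); the masking keeps the result non-negative, as the Hadoop implementation does):

```python
def hash_partition(key, num_reducers):
    """Return the reducer index for a key: non-negative hash of the key,
    modulo the number of reducers."""
    return (hash(key) & 0x7FFFFFFF) % num_reducers

# Every occurrence of the same key maps to the same reducer index,
# so all of its values end up at the same reducer.
partitions = [hash_partition(k, 3) for k in ["apple", "apple", "banana"]]
print(partitions)  # first two entries are always equal
```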

10. Output of the mapper is first written to the local disk for the sorting and _________ process.
a) shuffling
b) secondary sorting
c) forking
d) reducing
Answer: a
Explanation: Mapper output is sorted and partitioned on local disk; during shuffling, all values corresponding to the same key are sent to the same reducer.
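The sort-and-group step can be sketched as follows: sorting the intermediate pairs by key makes equal keys adjacent, so each key's values can be consumed as one contiguous run (Hadoop does this with an on-disk merge sort; this in-memory version only illustrates the idea):

```python
from itertools import groupby
from operator import itemgetter

def sort_and_group(pairs):
    """Sort intermediate (key, value) pairs by key, then collect the values
    for each key, as the sort/shuffle stage does before the reduce phase."""
    ordered = sorted(pairs, key=itemgetter(0))
    return [(key, [v for _, v in group])
            for key, group in groupby(ordered, key=itemgetter(0))]

pairs = [("b", 1), ("a", 2), ("b", 3), ("a", 4)]
print(sort_and_group(pairs))  # -> [('a', [2, 4]), ('b', [1, 3])]
```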


Sanfoundry Global Education & Learning Series – Hadoop.

Manish Bhojasia - Founder & CTO at Sanfoundry