Hadoop Questions and Answers – Data Flow

This set of Hadoop Multiple Choice Questions & Answers (MCQs) focuses on “Data Flow”.

1. ________ is a programming model designed for processing large volumes of data in parallel by dividing the work into a set of independent tasks.
a) Hive
b) MapReduce
c) Pig
d) Lucene
Answer: b
Explanation: MapReduce is the core processing model of Hadoop: the work is divided into independent map tasks that run in parallel, followed by reduce tasks that aggregate their output.

2. Point out the correct statement.
a) Data locality means movement of the algorithm to the data instead of data to algorithm
b) When processing is done on the data, the algorithm is moved across the Action Nodes rather than the data to the algorithm
c) Moving Computation is more expensive than Moving Data
d) None of the mentioned
Answer: a
Explanation: The MapReduce framework exploits data locality: the computation (the algorithm) is moved to the nodes that store the data, rather than moving the data to the computation.

3. The daemons associated with the MapReduce phase are ________ and task-trackers.
a) job-tracker
b) map-tracker
c) reduce-tracker
d) all of the mentioned
Answer: a
Explanation: MapReduce jobs are submitted to the JobTracker, which schedules their map and reduce tasks on the TaskTrackers.

4. The JobTracker pushes work out to available _______ nodes in the cluster, striving to keep the work as close to the data as possible.
a) DataNodes
b) TaskTracker
c) ActionNodes
d) All of the mentioned
Answer: b
Explanation: Each TaskTracker sends a heartbeat to the JobTracker every few seconds, so the JobTracker knows whether the node is alive and has free task slots.


5. Point out the wrong statement.
a) The map function in Hadoop MapReduce has the following general form: map: (K1, V1) -> list(K2, V2)
b) The reduce function in Hadoop MapReduce has the following general form: reduce: (K2, list(V2)) -> list(K3, V3)
c) MapReduce has a complex model of data processing: inputs and outputs for the map and reduce functions are key-value pairs
d) None of the mentioned
Answer: c
Explanation: MapReduce has a relatively simple model of data processing: the inputs and outputs of both the map and reduce functions are key-value pairs.
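The general forms above can be illustrated with a minimal word-count sketch in plain Python (not Hadoop code; in Hadoop the framework itself performs the grouping between the two phases):

```python
from collections import defaultdict

# map: (K1, V1) -> list(K2, V2)
# Here K1 is a byte offset, V1 a line of text; K2 a word, V2 a count.
def map_fn(offset, line):
    return [(word, 1) for word in line.split()]

# reduce: (K2, list(V2)) -> list(K3, V3)
def reduce_fn(word, counts):
    return [(word, sum(counts))]

def run_job(lines):
    # Map phase: apply map_fn to every input record.
    intermediate = []
    offset = 0
    for line in lines:
        intermediate.extend(map_fn(offset, line))
        offset += len(line) + 1  # +1 for the newline
    # Shuffle: group all values by key (done by the framework in Hadoop).
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)
    # Reduce phase: apply reduce_fn to each (key, list-of-values) group.
    output = []
    for key in sorted(groups):
        output.extend(reduce_fn(key, groups[key]))
    return output

print(run_job(["a b a", "b a"]))  # -> [('a', 3), ('b', 2)]
```

Note how the types line up: the map output key K2 (a word) becomes the reduce input key, and the framework supplies the reducer with the full list of V2 values for that key.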

6. The ________ method of the InputFormat class computes the splits for each file, which are then sent to the jobtracker.
a) puts
b) gets
c) getSplits
d) all of the mentioned
Answer: c
Explanation: The client calls getSplits() to compute the input splits; the JobTracker then uses the splits' storage locations to schedule map tasks to process them on the TaskTrackers.
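Conceptually, a split is just a description of a byte range of an input file. The sketch below shows one way splits might be computed (a simplification: the real getSplits() also records the block locations used for locality-aware scheduling, which are omitted here):

```python
from typing import NamedTuple

class FileSplit(NamedTuple):
    path: str
    start: int   # byte offset of the split within the file
    length: int  # number of bytes in the split

def get_splits(path, file_size, split_size):
    """Divide a file of file_size bytes into splits of at most split_size bytes."""
    splits = []
    offset = 0
    while offset < file_size:
        length = min(split_size, file_size - offset)
        splits.append(FileSplit(path, offset, length))
        offset += length
    return splits

print(get_splits("/data/input.txt", 250, 100))
# three splits: offsets 0, 100, 200 with lengths 100, 100, 50
```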


7. On a tasktracker, the map task passes the split to the createRecordReader() method on InputFormat to obtain a _________ for that split.
a) InputReader
b) RecordReader
c) OutputReader
d) None of the mentioned
Answer: b
Explanation: The RecordReader loads data from its split and converts it into key-value pairs suitable for reading by the mapper.

8. The default InputFormat is __________, which treats each line of the input as a value; the associated key is the line's byte offset in the file.
a) TextFormat
b) TextInputFormat
c) InputFormat
d) All of the mentioned

Answer: b
Explanation: A RecordReader is little more than an iterator over records, and the map task uses one to generate record key-value pairs.
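That iterator behaviour can be sketched in Python: each record is a line, keyed by the offset at which it starts (a simplification of TextInputFormat's line record reader, which additionally handles lines that straddle split boundaries):

```python
import io

def line_records(stream):
    """Yield (offset, line) pairs, mimicking TextInputFormat's keys and values."""
    offset = 0
    for line in stream:
        yield offset, line.rstrip("\n")
        offset += len(line)  # advance by the raw line length, newline included

data = io.StringIO("first\nsecond\nthird\n")
print(list(line_records(data)))
# -> [(0, 'first'), (6, 'second'), (13, 'third')]
```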


9. __________ controls the partitioning of the keys of the intermediate map-outputs.
a) Collector
b) Partitioner
c) InputFormat
d) None of the mentioned
Answer: b
Explanation: The output of each mapper is passed to the Partitioner, which assigns every intermediate key to a reducer; the default HashPartitioner uses the key's hash modulo the number of reducers.
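A Python sketch of that default hashing scheme (Python's hash() stands in for Java's hashCode(); the masking keeps the result non-negative, as the Hadoop implementation does):

```python
def hash_partition(key, num_reducers):
    """Return the reducer index for a key: non-negative hash of the key,
    modulo the number of reducers."""
    return (hash(key) & 0x7FFFFFFF) % num_reducers

# Every occurrence of the same key maps to the same reducer index,
# so all of its values end up at the same reducer.
partitions = [hash_partition(k, 3) for k in ["apple", "apple", "banana"]]
print(partitions)  # first two entries are always equal
```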

10. Output of the mapper is first written to the local disk for the sorting and _________ process.
a) shuffling
b) secondary sorting
c) forking
d) reducing
Answer: a
Explanation: Mapper output is sorted and partitioned on local disk; during shuffling, all values corresponding to the same key are sent to the same reducer.
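The sort-and-group step can be sketched as follows: sorting the intermediate pairs by key makes equal keys adjacent, so each key's values can be consumed as one contiguous run (Hadoop does this with an on-disk merge sort; this in-memory version only illustrates the idea):

```python
from itertools import groupby
from operator import itemgetter

def sort_and_group(pairs):
    """Sort intermediate (key, value) pairs by key, then collect the values
    for each key, as the sort/shuffle stage does before the reduce phase."""
    ordered = sorted(pairs, key=itemgetter(0))
    return [(key, [v for _, v in group])
            for key, group in groupby(ordered, key=itemgetter(0))]

pairs = [("b", 1), ("a", 2), ("b", 3), ("a", 4)]
print(sort_and_group(pairs))  # -> [('a', [2, 4]), ('b', [1, 3])]
```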


Sanfoundry Global Education & Learning Series – Hadoop.

Manish Bhojasia - Founder & CTO at Sanfoundry