This set of Hadoop Multiple Choice Questions & Answers (MCQs) focuses on “Data Flow”.
1. ________ is a programming model designed for processing large volumes of data in parallel by dividing the work into a set of independent tasks.
a) Hive
b) MapReduce
c) Pig
d) Lucene
Answer: b
Explanation: MapReduce is the heart of Hadoop: it processes large data sets in parallel by dividing the job into a set of independent map and reduce tasks.
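As a minimal sketch of this model, the classic word count fits in two small Java classes (the class and field names here are illustrative, not part of the question):

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {
  // Map: emit (word, 1) for every word in the input line.
  // Each map task works on its own split, independently of the others.
  public static class TokenizerMapper
      extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce: sum the counts collected for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) {
        sum += v.get();
      }
      context.write(key, new IntWritable(sum));
    }
  }
}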
2. Point out the correct statement.
a) Data locality means moving the algorithm to the data instead of the data to the algorithm
b) When processing is done on the data, the algorithm is moved across the Action Nodes rather than the data to the algorithm
c) Moving computation is more expensive than moving data
d) None of the mentioned
Answer: a
Explanation: Hadoop's data-flow framework exploits data locality: the computation is shipped to the node that already holds the data block, rather than moving the data across the network to the computation.
3. The daemons associated with the MapReduce phase are ________ and task-trackers.
a) job-tracker
b) map-tracker
c) reduce-tracker
d) all of the mentioned
Answer: a
Explanation: MapReduce jobs are submitted to the JobTracker, which breaks them into tasks and hands those tasks out to TaskTrackers.
4. The JobTracker pushes work out to available _______ nodes in the cluster, striving to keep the work as close to the data as possible.
a) DataNodes
b) TaskTracker
c) ActionNodes
d) All of the mentioned
Answer: b
Explanation: Each TaskTracker sends a heartbeat to the JobTracker every few seconds; this tells the JobTracker that the node is alive and whether it has free slots for new tasks.
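A minimal driver sketch shows where submission happens: in classic (MR1) MapReduce the submitted job lands on the JobTracker, which then pushes the tasks out to TaskTrackers. The job name and command-line paths below are illustrative, and the mapper and reducer classes are reused from the word-count sketch above:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCountDriver.class);
    job.setMapperClass(WordCount.TokenizerMapper.class);
    job.setReducerClass(WordCount.IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    // waitForCompletion() submits the job; the scheduler then tries to
    // run each map task on a node (or rack) holding its input block.
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}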
5. Point out the wrong statement.
a) The map function in Hadoop MapReduce has the following general form: map: (K1, V1) -> list(K2, V2)
b) The reduce function in Hadoop MapReduce has the following general form: reduce: (K2, list(V2)) -> list(K3, V3)
c) MapReduce has a complex model of data processing: inputs and outputs for the map and reduce functions are key-value pairs
d) None of the mentioned
Answer: c
Explanation: MapReduce has a simple, not complex, model of data processing: the inputs and outputs of both the map and reduce functions are key-value pairs.
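These general forms map directly onto the type parameters of the Java base classes. A hypothetical maximum-temperature job, for example, would declare (class names illustrative, bodies elided):

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT> realizes map: (K1, V1) -> list(K2, V2)
class MaxTempMapper extends Mapper<LongWritable, Text, Text, IntWritable> { /* ... */ }

// Reducer<KEYIN, VALUEIN, KEYOUT, VALUEOUT> realizes reduce: (K2, list(V2)) -> list(K3, V3)
class MaxTempReducer extends Reducer<Text, IntWritable, Text, IntWritable> { /* ... */ }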
6. The ________ method of the InputFormat class computes the splits for each file, which are then sent to the JobTracker.
a) puts
b) gets
c) getSplits
d) all of the mentioned
Answer: c
Explanation: The JobTracker uses the splits' storage locations to schedule map tasks that process them on TaskTrackers close to the data.
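For reference, here is the relevant contract in outline; this is the abstract API of org.apache.hadoop.mapreduce.InputFormat, and it covers both this question and the next:

import java.io.IOException;
import java.util.List;

public abstract class InputFormat<K, V> {
  // Computes the logical splits for the job's input files.
  public abstract List<InputSplit> getSplits(JobContext context)
      throws IOException, InterruptedException;

  // Creates the reader that turns one split into key-value records.
  public abstract RecordReader<K, V> createRecordReader(
      InputSplit split, TaskAttemptContext context)
      throws IOException, InterruptedException;
}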
7. On a TaskTracker, the map task passes the split to the createRecordReader() method on the InputFormat to obtain a _________ for that split.
a) InputReader
b) RecordReader
c) OutputReader
d) None of the mentioned
Answer: b
Explanation: The RecordReader loads data from its source and converts it into key-value pairs suitable for reading by the mapper.
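The map task then drives the RecordReader through a read loop; the sketch below is roughly what Mapper.run() does, simplified rather than the exact Hadoop source:

public void run(Context context) throws IOException, InterruptedException {
  setup(context);
  // The context delegates to the RecordReader obtained for this split.
  while (context.nextKeyValue()) {
    map(context.getCurrentKey(), context.getCurrentValue(), context);
  }
  cleanup(context);
}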
8. The default InputFormat is __________, which treats each line of the input as a separate record; the value is the contents of the line and the associated key is its byte offset within the file.
a) TextFormat
b) TextInputFormat
c) InputFormat
d) All of the mentioned
Answer: b
Explanation: A RecordReader is little more than an iterator over records, and the map task uses one to generate record key-value pairs.
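Concretely, with TextInputFormat a mapper's input types are LongWritable (the byte offset) and Text (the line); the class name in this sketch is illustrative:

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class LineMapper extends Mapper<LongWritable, Text, Text, Text> {
  @Override
  protected void map(LongWritable offset, Text line, Context context) {
    // offset = byte position of this line within the file (the key)
    // line   = the contents of the line (the value)
  }
}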
9. __________ controls the partitioning of the keys of the intermediate map-outputs.
a) Collector
b) Partitioner
c) InputFormat
d) None of the mentioned
Answer: b
Explanation: The output of the mapper is sent to the Partitioner, which decides which reducer each intermediate key-value pair goes to; the default HashPartitioner hashes the key.
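A minimal custom Partitioner sketch; the class name and routing rule are illustrative, and the default HashPartitioner does essentially the same thing using key.hashCode():

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class FirstCharPartitioner extends Partitioner<Text, IntWritable> {
  @Override
  public int getPartition(Text key, IntWritable value, int numPartitions) {
    if (key.getLength() == 0) {
      return 0;
    }
    // Route each key to a reducer based on its first character.
    return (key.toString().charAt(0) & Integer.MAX_VALUE) % numPartitions;
  }
}

It would be registered on the job with job.setPartitionerClass(FirstCharPartitioner.class).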
10. The output of the mapper is first written to the local disk for the sorting and _________ process.
a) shuffling
b) secondary sorting
c) forking
d) reducing
Answer: a
Explanation: All values corresponding to the same key go to the same reducer; the shuffle transfers each sorted partition of map output to its reducer.
Sanfoundry Global Education & Learning Series – Hadoop.