Hadoop Questions and Answers – MapReduce Formats

This set of Hadoop Interview Questions & Answers for experienced professionals focuses on “MapReduce Formats”.

1. ___________ takes node and rack locality into account when deciding which blocks to place in the same split.
a) CombineFileOutputFormat
b) CombineFileInputFormat
c) TextFileInputFormat
d) None of the mentioned
View Answer

Answer: b
Explanation: CombineFileInputFormat packs many files into each split so that each mapper has more to process, and it takes node and rack locality into account when deciding which blocks to place in the same split.

2. Point out the correct statement.
a) With TextInputFormat and KeyValueTextInputFormat, each mapper receives a variable number of lines of input
b) With StreamXmlRecordReader, the page elements can be interpreted as records for processing by a mapper
c) The number depends on the size of the split and the length of the lines.
d) All of the mentioned
View Answer

Answer: d
Explanation: Large XML documents that are composed of a series of “records” can be broken into these records using simple string or regular-expression matching to find start and end tags of records.
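The string-matching idea described above can be illustrated with a short sketch. Hadoop itself is Java, but the concept is easier to show in a few lines of Python (the function name and sample tags are hypothetical, chosen only for illustration): records are extracted by matching each start/end tag pair.

```python
import re

def split_records(xml, start_tag, end_tag):
    """Extract each start_tag...end_tag span in document order.

    A simplified stand-in for what StreamXmlRecordReader does over a
    byte stream; real Hadoop input splits complicate this considerably.
    """
    pattern = re.compile(re.escape(start_tag) + r"(.*?)" + re.escape(end_tag),
                         re.DOTALL)
    return [start_tag + body + end_tag for body in pattern.findall(xml)]

doc = "<page>alpha</page><page>beta</page>"
records = split_records(doc, "<page>", "</page>")
# each element of records is one complete <page>...</page> record
```

Each extracted record can then be handed to a mapper as a single input value.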

3. The key, a ____________, is the byte offset within the file of the beginning of the line.
a) LongReadable
b) LongWritable
c) ShortReadable
d) All of the mentioned
View Answer

Answer: b
Explanation: The value is the contents of the line, excluding any line terminators (newline, carriage return), and is packaged as a Text object.
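To make the key/value pairing concrete, here is a small Python sketch (the function name is hypothetical; Hadoop's actual TextInputFormat is Java) that produces the same (byte offset, line) records described above:

```python
def text_input_records(data: bytes):
    """Yield (byte_offset, line) pairs the way TextInputFormat does:
    the key is the offset of the line's first byte within the file,
    the value is the line without its terminator."""
    offset = 0
    for line in data.splitlines(keepends=True):
        yield offset, line.rstrip(b"\r\n")
        offset += len(line)

recs = list(text_input_records(b"first line\nsecond\n"))
# first record starts at offset 0; the second starts right after
# "first line" plus its newline (11 bytes)
```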

advertisement
advertisement

4. ___________ is appropriate for reading the files produced by TextOutputFormat, Hadoop’s default OutputFormat.
a) KeyValueTextInputFormat
b) KeyValueTextOutputFormat
c) FileValueTextInputFormat
d) All of the mentioned
View Answer

Answer: a
Explanation: TextOutputFormat writes lines of tab-separated key-value pairs; to interpret such files correctly, KeyValueTextInputFormat is appropriate.
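The parsing rule is simple enough to sketch. Assuming the default tab separator, KeyValueTextInputFormat splits each line at the first separator into a key and a value; a Python version of that rule (function name hypothetical) might look like:

```python
def parse_key_value(line: str, separator: str = "\t"):
    """Split a line into (key, value) at the first separator, as
    KeyValueTextInputFormat does with its default tab separator.
    A line with no separator becomes a key with an empty value."""
    key, _sep, value = line.partition(separator)
    return key, value

parse_key_value("line1\tOn the top of the Crumpetty Tree")
# only the FIRST tab separates key from value; later tabs stay in the value
```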

5. Point out the wrong statement.
a) Hadoop sequence file format stores sequences of binary key-value pairs
b) SequenceFileAsBinaryInputFormat is a variant of SequenceFileInputFormat that retrieves the sequence file’s keys and values as opaque binary objects
c) SequenceFileAsTextInputFormat is a variant of SequenceFileInputFormat that retrieves the sequence file’s keys and values as opaque binary objects.
d) None of the mentioned
View Answer

Answer: c
Explanation: SequenceFileAsBinaryInputFormat is used for reading keys, values from SequenceFiles in binary (raw) format.

6. __________ is a variant of SequenceFileInputFormat that converts the sequence file’s keys and values to Text objects.
a) SequenceFile
b) SequenceFileAsTextInputFormat
c) SequenceAsTextInputFormat
d) All of the mentioned
View Answer

Answer: b
Explanation: SequenceFileAsTextInputFormat converts the sequence file’s keys and values to Text objects by calling toString() on them, which makes the format suitable for use with, for example, Streaming.

7. __________ class allows you to specify the InputFormat and Mapper to use on a per-path basis.
a) MultipleOutputs
b) MultipleInputs
c) SingleInputs
d) None of the mentioned
View Answer

Answer: b
Explanation: A job’s inputs may come in different formats: one might be tab-separated plain text, the other a binary sequence file. Even if they are in the same format, they may have different representations and therefore need to be parsed differently.
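The per-path dispatch behind MultipleInputs can be sketched as a registry mapping each input path to its own parser. This is a toy Python analogue (paths and parser names are hypothetical) of what MultipleInputs.addInputPath does when it associates an InputFormat and Mapper with a path:

```python
def tab_parser(line: str):
    """Parser for tab-separated plain-text input."""
    key, _, value = line.partition("\t")
    return key, value

def csv_parser(line: str):
    """Parser for comma-separated input with a different representation."""
    key, _, value = line.partition(",")
    return key, value

# hypothetical registry, loosely analogous to repeated calls of
# MultipleInputs.addInputPath(job, path, inputFormat, mapperClass)
parsers = {
    "/data/tsv": tab_parser,
    "/data/csv": csv_parser,
}

def parse(path: str, line: str):
    """Route a record to the parser registered for its input path."""
    return parsers[path](line)
```

Each path's records are parsed by its own function, just as each path in MultipleInputs gets its own InputFormat and Mapper.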

8. ___________ is an input format for reading data from a relational database, using JDBC.
a) DBInput
b) DBInputFormat
c) DBInpFormat
d) All of the mentioned
View Answer

Answer: b
Explanation: DBInputFormat runs SQL queries against a relational database over JDBC and turns the resulting rows into input records; care is needed not to overwhelm the database by running too many mappers against it.
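As a rough analogue of the idea (not DBInputFormat itself, which is Java over JDBC), here is a Python sketch using the standard-library sqlite3 module: run a query and yield each row as an input record.

```python
import sqlite3

def db_input_records(conn, query):
    """Rough analogue of DBInputFormat's job: execute a query and
    yield each row as one input record for a mapper."""
    for row in conn.execute(query):
        yield row

# in-memory database standing in for a real relational source
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (word TEXT, freq INTEGER)")
conn.executemany("INSERT INTO words VALUES (?, ?)", [("a", 3), ("b", 1)])
rows = list(db_input_records(conn, "SELECT word, freq FROM words ORDER BY word"))
```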

9. Which of the following is the default output format?
a) TextFormat
b) TextOutput
c) TextOutputFormat
d) None of the mentioned
View Answer

Answer: c
Explanation: TextOutputFormat writes records as lines of text; its keys and values may be of any type, since it turns them into strings by calling toString() on them.
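A one-function Python sketch (function name hypothetical) shows why any key and value type works: each is stringified, joined by the separator (tab by default), and newline-terminated, mirroring what TextOutputFormat does via toString():

```python
def text_output_line(key, value, separator="\t"):
    """Render one record the way TextOutputFormat does: keys and
    values of any type are turned into strings (str() here, standing
    in for Java's toString()) and joined by a separator."""
    return f"{key}{separator}{value}\n"

text_output_line(1, [2, 3])  # any types are accepted
```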

10. Which of the following writes MapFiles as output?
a) DBInpFormat
b) MapFileOutputFormat
c) SequenceFileAsBinaryOutputFormat
d) None of the mentioned
View Answer

Answer: b
Explanation: MapFileOutputFormat writes MapFiles as output. Because the keys in a MapFile must be added in order, the reducer must emit keys in sorted order, which the framework guarantees.

11. The split size is normally the size of a ________ block, which is appropriate for most applications.
a) Generic
b) Task
c) Library
d) HDFS
View Answer

Answer: d
Explanation: FileInputFormat splits only large files, where “large” means larger than an HDFS block.

12. Point out the correct statement.
a) The minimum split size is usually 1 byte, although some formats have a lower bound on the split size
b) Applications may impose a minimum split size
c) The maximum split size defaults to the maximum value that can be represented by a Java long type
d) All of the mentioned
View Answer

Answer: d
Explanation: The maximum split size has an effect only when it is less than the block size, forcing splits to be smaller than a block.
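The interaction of minimum size, maximum size, and block size described above reduces to one formula, which FileInputFormat uses to compute the split size: max(minimumSize, min(maximumSize, blockSize)). A minimal Python sketch:

```python
def compute_split_size(min_size, max_size, block_size):
    """FileInputFormat's split-size rule:
    max(minimumSize, min(maximumSize, blockSize))."""
    return max(min_size, min(max_size, block_size))

BLOCK = 128 * 1024 * 1024            # a common HDFS block size
compute_split_size(1, 2**63 - 1, BLOCK)        # defaults: split == block
compute_split_size(1, 64 * 1024 * 1024, BLOCK) # smaller max forces sub-block splits
```

With the default minimum (1 byte) and maximum (the largest Java long), the split size is simply the block size; only a maximum smaller than the block changes that.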

13. Point out the wrong statement.
a) Hadoop works better with a small number of large files than a large number of small files
b) CombineFileInputFormat is designed to work well with small files
c) CombineFileInputFormat does not compromise the speed at which it can process the input in a typical MapReduce job
d) None of the mentioned
View Answer

Answer: d
Explanation: All three statements are true; CombineFileInputFormat packs many small files into each split precisely so that it does not compromise processing speed. Without it, if files are very small (“small” meaning significantly smaller than an HDFS block) and numerous, each map task processes very little input and there are many tasks (one per file), each imposing extra bookkeeping overhead.
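The bookkeeping overhead is easy to quantify. Assuming the default behaviour in which every file smaller than a block gets its own split (a simplification that ignores the split-slop detail), a quick Python estimate of the map-task count:

```python
import math

def num_map_tasks(file_sizes, block_size):
    """Rough map-task count when each file yields at least one split
    and large files yield roughly one split per block (simplified)."""
    return sum(max(1, math.ceil(size / block_size)) for size in file_sizes)

B = 128 * 1024 * 1024
num_map_tasks([1024] * 10000, B)  # 10,000 tiny files -> 10,000 map tasks
num_map_tasks([B * 10], B)        # one 1.28 GB file  -> 10 map tasks
```

Ten thousand 1 KB files cost ten thousand map tasks to read ~10 MB of data, while a single file holding ten blocks needs only ten tasks, which is why Hadoop favours a small number of large files.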

Sanfoundry Global Education & Learning Series – Hadoop.

To practice all areas of Hadoop for interviews, here is the complete set of 1000+ Multiple Choice Questions and Answers.
