Hadoop Questions and Answers – Crunch with Hadoop – 2

This set of Interview Questions & Answers focuses on “Crunch with Hadoop – 2”.

1. PCollection, PTable, and PGroupedTable all support a __________ operation.
a) intersection
b) union
c) OR
d) None of the mentioned

Answer: b
Explanation: The union operation takes a series of distinct PCollections that all have the same data type and treats them as a single virtual PCollection.

2. Point out the correct statement.
a) StreamPipeline executes the pipeline in-memory on the client
b) MemPipeline executes the pipeline by converting it to a series of Spark pipelines
c) MapReduce framework approach makes it easy for the framework to serialize data from the client to the cluster
d) All of the mentioned

Answer: c
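Explanation: Options a and b name the wrong classes. MemPipeline executes the pipeline in-memory on the client, and SparkPipeline executes the pipeline by converting it to a series of Spark pipelines.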

3. Crunch uses Java serialization to serialize the contents of all of the ______ in a pipeline definition.
a) Transient
b) DoFns
c) Configuration
d) All of the mentioned

Answer: b
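Explanation: Crunch uses Java serialization to ship each DoFn from the client to the cluster, so every DoFn and any state it holds must be serializable. Fields that cannot be serialized should be marked transient and set up again when the DoFn is initialized on the cluster.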
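
To make the serialization point concrete, here is a minimal sketch in plain Java that uses only the standard library. WordCounter, Tokenizer, and roundTrip are hypothetical names invented for this illustration and are not part of the Crunch API. WordCounter merely stands in for a DoFn, and its initialize method imitates an initialization hook that rebuilds transient state after deserialization.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class TransientFieldDemo {

    // Hypothetical helper that is deliberately NOT Serializable.
    public static class Tokenizer {
        public String[] split(String line, String separator) {
            return line.split(separator);
        }
    }

    // Hypothetical stand-in for a DoFn: a Serializable function object.
    public static class WordCounter implements Serializable {
        private static final long serialVersionUID = 1L;

        private final String separator;        // serialized with the object
        private transient Tokenizer tokenizer; // skipped by Java serialization

        public WordCounter(String separator) {
            this.separator = separator;
            this.tokenizer = new Tokenizer();
        }

        // Imitates an initialization hook: rebuilds the transient state.
        public void initialize() {
            if (tokenizer == null) {
                tokenizer = new Tokenizer();
            }
        }

        public boolean tokenizerIsNull() {
            return tokenizer == null;
        }

        public int count(String line) {
            return line.isEmpty() ? 0 : tokenizer.split(line, separator).length;
        }
    }

    // Serializes and deserializes a value, roughly what happens when an
    // object travels from the client to a task on the cluster.
    public static <T extends Serializable> T roundTrip(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                @SuppressWarnings("unchecked")
                T copy = (T) in.readObject();
                return copy;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        WordCounter copy = roundTrip(new WordCounter(" "));
        System.out.println(copy.tokenizerIsNull()); // the transient field was not restored
        copy.initialize();                          // rebuild it before use
        System.out.println(copy.count("to be or not"));
    }
}
```

Without the transient keyword, writing the WordCounter would fail because Tokenizer is not serializable. With it, the copy arrives with a null tokenizer that must be rebuilt before the first call to count.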

4. In the example pipeline, the inline DoFn that splits a line up into words is an inner class of ____________
a) Pipeline
b) MyPipeline
c) ReadPipeline
d) WritePipe

Answer: b
Explanation: Inner classes contain references to their parent outer classes, so unless MyPipeline implements the Serializable interface, a NotSerializableException will be thrown when Crunch tries to serialize the inner DoFn.
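
The failure described above can be reproduced with plain Java serialization, without Crunch on the classpath. The sketch below is an illustration under stated assumptions: LineFn is a hypothetical stand-in for DoFn, and MyPipeline, StaticSplitter, and canSerialize are names made up for this example. The inner class reads a field of its outer instance, which guarantees that it holds a reference to MyPipeline regardless of compiler version.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class InnerClassSerializationDemo {

    // Hypothetical stand-in for Crunch's DoFn: a Serializable function type.
    public interface LineFn extends Serializable {
        String[] process(String line);
    }

    // Deliberately NOT Serializable, like the outer class in the question.
    public static class MyPipeline {
        private final String separator = "\\s+";

        // Anonymous inner class: it uses the outer field, so it captures
        // a reference to the enclosing MyPipeline instance.
        public LineFn innerSplitter() {
            return new LineFn() {
                @Override
                public String[] process(String line) {
                    return line.split(separator);
                }
            };
        }
    }

    // Static nested class: holds no reference to any outer instance.
    public static class StaticSplitter implements LineFn {
        private static final long serialVersionUID = 1L;

        @Override
        public String[] process(String line) {
            return line.split("\\s+");
        }
    }

    // Returns true if Java serialization can write the value, false if it
    // hits a non-serializable object anywhere in the object graph.
    public static boolean canSerialize(Object value) {
        try (ObjectOutputStream out =
                new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(value);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Fails: the inner class drags the non-serializable MyPipeline along.
        System.out.println(canSerialize(new MyPipeline().innerSplitter()));
        // Succeeds: the static nested class serializes on its own.
        System.out.println(canSerialize(new StaticSplitter()));
    }
}
```

Two common ways out follow from this: make the outer class implement Serializable, or declare the function as a static nested or top-level class so that it carries no outer reference.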


5. Point out the wrong statement.
a) DoFns also have a number of helper methods for working with Hadoop Counters, all named increment
b) The Crunch APIs contain a number of useful subclasses of DoFn that handle common data processing scenarios and are easier to write and test
c) FilterFn class defines a single abstract method
d) None of the mentioned

Answer: d
Explanation: All three statements are correct. Counters are an incredibly useful way of keeping track of the state of long-running data pipelines and detecting any exceptional conditions that occur during processing.

6. DoFns provide direct access to the __________ object that is used within a given Map or Reduce task via the getContext method.
a) TaskInputContext
b) TaskInputOutputContext
c) TaskOutputContext
d) All of the mentioned

Answer: b
Explanation: There are also a number of helper methods for working with the objects associated with the TaskInputOutputContext.


7. The top-level ___________ package contains three of the most important specializations in Crunch.
a) org.apache.scrunch
b) org.apache.crunch
c) org.apache.kcrunch
d) all of the mentioned

Answer: b
Explanation: The three specializations are the DoFn subclasses FilterFn, MapFn, and CombineFn. Each of these specialized DoFn implementations has associated methods on the PCollection, PTable, and PGroupedTable interfaces to support common data processing steps.

8. The Avros class also has a _____ method for creating PTypes for POJOs using Avro’s reflection-based serialization mechanism.
a) spot
b) reflects
c) gets
d) all of the mentioned

Answer: b
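Explanation: There are a couple of restrictions on the structure of the POJO: it must have a default, no-argument constructor, and all of its fields must be Avro primitive types or collection types that have Avro equivalents.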


9. The ______________ class defines a configuration parameter named LINES_PER_MAP that controls how the input file is split.
a) NLineInputFormat
b) InputLineFormat
c) LineInputFormat
d) None of the mentioned

Answer: a
Explanation: The value of this parameter can be set via the Source interface’s inputConf method.

10. The ________ class allows developers to exercise precise control over how data is partitioned, sorted, and grouped by the underlying execution engine.
a) Grouping
b) GroupingOptions
c) RowGrouping
d) None of the mentioned

Answer: b
Explanation: The GroupingOptions class is immutable; new instances are created through the GroupingOptions.Builder implementation.


Sanfoundry Global Education & Learning Series – Hadoop.
