Hadoop Questions and Answers – Crunch with Hadoop – 1

This set of Hadoop Multiple Choice Questions & Answers (MCQs) focuses on “Crunch with Hadoop – 1”.

1. The Apache Crunch Java library provides a framework for writing, testing, and running ___________ pipelines.
a) MapReduce
b) Pig
c) Hive
d) None of the mentioned

Answer: a
Explanation: The goal of Crunch is to make pipelines that are composed of many user-defined functions simple to write, easy to test, and efficient to run.

2. Point out the correct statement.
a) Scrunch’s Java API is centered around three interfaces that represent distributed datasets
b) All of the other data transformation operations supported by the Crunch APIs are implemented in terms of three primitives
c) A number of common Aggregator<V> implementations are provided in the Aggregators class
d) All of the mentioned

Answer: c
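Explanation: Option a is wrong because it is Crunch's Java API, not Scrunch's, that is centered around three interfaces representing distributed datasets (PCollection, PTable, and PGroupedTable). Option b is wrong because the other data transformation operations are implemented in terms of four primitives (parallelDo, groupByKey, combineValues, and union), not three. Option c is correct: PGroupedTable provides a combineValues operation that allows a commutative and associative Aggregator to be applied to the values of the PGroupedTable instance on both the map and reduce sides of the shuffle, and a number of common Aggregator<V> implementations are provided in the Aggregators class.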
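
The requirement that the aggregator be commutative and associative can be illustrated with a minimal plain-Java sketch. It does not use the Crunch API: the class CombinerSketch, its methods, and the sample data are all hypothetical names invented for this illustration. It sums values per key in two ways, once by pre-combining each simulated map-side partition and then merging the partial results, and once by reducing every pair directly, and the two results should agree.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

// Plain-Java illustration only; these names are hypothetical and not part of Crunch.
class CombinerSketch {

    // Fold every (key, value) pair of one partition into a per-key partial result.
    static Map<String, Long> combine(List<Map.Entry<String, Long>> partition,
                                     BinaryOperator<Long> aggregator) {
        Map<String, Long> partial = new HashMap<>();
        for (Map.Entry<String, Long> pair : partition) {
            partial.merge(pair.getKey(), pair.getValue(), aggregator);
        }
        return partial;
    }

    // Simulated shuffle with a combiner: aggregate each partition first ("map side"),
    // then merge the partial results with the same aggregator ("reduce side").
    static Map<String, Long> combineThenReduce(List<List<Map.Entry<String, Long>>> partitions,
                                               BinaryOperator<Long> aggregator) {
        Map<String, Long> result = new HashMap<>();
        for (List<Map.Entry<String, Long>> partition : partitions) {
            combine(partition, aggregator).forEach((key, value) -> result.merge(key, value, aggregator));
        }
        return result;
    }

    // Baseline without a combiner: every pair is aggregated only on the "reduce side".
    static Map<String, Long> reduceOnly(List<List<Map.Entry<String, Long>>> partitions,
                                        BinaryOperator<Long> aggregator) {
        Map<String, Long> result = new HashMap<>();
        for (List<Map.Entry<String, Long>> partition : partitions) {
            for (Map.Entry<String, Long> pair : partition) {
                result.merge(pair.getKey(), pair.getValue(), aggregator);
            }
        }
        return result;
    }

    // Two made-up partitions of (key, value) pairs.
    static List<List<Map.Entry<String, Long>>> sampleInput() {
        List<Map.Entry<String, Long>> first =
                List.of(Map.entry("a", 1L), Map.entry("b", 2L), Map.entry("a", 3L));
        List<Map.Entry<String, Long>> second =
                List.of(Map.entry("a", 4L), Map.entry("b", 5L));
        return List.of(first, second);
    }

    public static void main(String[] args) {
        Map<String, Long> withCombiner = combineThenReduce(sampleInput(), Long::sum);
        Map<String, Long> withoutCombiner = reduceOnly(sampleInput(), Long::sum);
        // Addition is commutative and associative, so both strategies agree.
        System.out.println(withCombiner.equals(withoutCombiner)); // prints true
    }
}
```

An operation such as subtraction, which is neither commutative nor associative, would not be safe here, because the result would depend on how the pairs happened to be split across partitions.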

3. For Scala users, there is the __________ API, which is built on top of the Java APIs.
a) Prunch
b) Scrunch
c) Hivench
d) All of the mentioned

Answer: b
Explanation: Scrunch includes a REPL (read-eval-print loop) for creating MapReduce pipelines.

4. The Crunch APIs are modeled after _________, which is the library that Google uses for building data pipelines on top of their own implementation of MapReduce.
a) FlagJava
b) FlumeJava
c) FlakeJava
d) All of the mentioned

Answer: b
Explanation: The Apache Crunch project develops and supports Java APIs that simplify the process of creating data pipelines on top of Apache Hadoop.

5. Point out the wrong statement.
a) A Crunch pipeline written by the development team sessionizes a set of user logs, and its output is then processed by a diverse collection of Pig scripts and Hive queries
b) Crunch pipelines provide a thin veneer on top of MapReduce
c) Developers have access to low-level MapReduce APIs
d) None of the mentioned

Answer: d
Explanation: Crunch is extremely fast, only slightly slower than a hand-tuned pipeline developed with the MapReduce APIs.

6. Crunch was designed for developers who understand __________ and want to use MapReduce effectively.
a) Java
b) Python
c) Scala
d) Javascript

Answer: a
Explanation: Crunch is often used in conjunction with Hive and Pig.

7. Hive, Pig, and Cascading all use a _________ data model.
a) value centric
b) columnar
c) tuple-centric
d) none of the mentioned

Answer: c
Explanation: In contrast to these tuple-centric tools, Crunch allows developers considerable flexibility in how they represent their data.

8. A __________ represents a distributed, immutable collection of elements of type T.
a) PCollect<T>
b) PCollection<T>
c) PCol<T>
d) All of the mentioned

Answer: b
Explanation: PCollection<T> provides a method, parallelDo, that applies a DoFn to each element in the PCollection<T>.
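
The idea of applying a function to each element can be sketched in plain Java without the Crunch library. The toy types below (ParallelDoSketch, ToyDoFn, ToyEmitter, SPLIT_WORDS) are hypothetical stand-ins written for this illustration and are not the org.apache.crunch classes; they only mimic the process(input, emitter) shape, run sequentially over an in-memory list, and return a new read-only list so the input stays untouched.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy model written for this example; these are NOT the org.apache.crunch classes.
class ParallelDoSketch {

    // Receives the outputs produced while processing one input element.
    interface ToyEmitter<T> {
        void emit(T value);
    }

    // A function that may emit zero, one, or many outputs per input element.
    interface ToyDoFn<S, T> {
        void process(S input, ToyEmitter<T> emitter);
    }

    // Applies fn to every element (sequentially here) and returns a new read-only list;
    // the input collection is never modified.
    static <S, T> List<T> parallelDo(List<S> input, ToyDoFn<S, T> fn) {
        List<T> output = new ArrayList<>();
        for (S element : input) {
            fn.process(element, output::add);
        }
        return Collections.unmodifiableList(output);
    }

    // Example function: split a line on whitespace and emit each non-empty word.
    static final ToyDoFn<String, String> SPLIT_WORDS = (line, emitter) -> {
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                emitter.emit(word);
            }
        }
    };

    public static void main(String[] args) {
        List<String> lines = List.of("hello crunch", "map reduce");
        List<String> words = parallelDo(lines, SPLIT_WORDS);
        System.out.println(words); // prints [hello, crunch, map, reduce]
    }
}
```

Because the function only sees one element and an emitter, it can be exercised with an in-memory list like this, which is one reason such functions are straightforward to unit test.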

9. ___________ executes the pipeline as a series of MapReduce jobs.
a) SparkPipeline
b) MRPipeline
c) MemPipeline
d) None of the mentioned

Answer: b
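Explanation: Every Crunch data pipeline is coordinated by an instance of the Pipeline interface. MRPipeline is the implementation that executes the pipeline as a series of MapReduce jobs, whereas MemPipeline runs it in memory on the client and SparkPipeline runs it as a series of Spark pipelines.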

10. __________ represent the logical computations of your Crunch pipelines.
a) DoFns
b) DoFn
c) ThreeFns
d) None of the mentioned

Answer: a
Explanation: DoFns are designed to be easy to write, easy to test, and easy to deploy within the context of a MapReduce job.

Sanfoundry Global Education & Learning Series – Hadoop.
