This set of Hadoop Multiple Choice Questions & Answers (MCQs) focuses on “Mahout with Hadoop”.
1. Mahout provides ____________ libraries for common maths operations and primitive Java collections.
a) Java
b) Javascript
c) Perl
d) Python
View Answer
Explanation: Mahout provides Java libraries for common maths operations (focused on linear algebra and statistics) and for primitive Java collections.
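For reference, a minimal sketch using Mahout's math module (assuming mahout-math is on the classpath) shows the kind of linear-algebra primitives the library provides:

```java
import org.apache.mahout.math.DenseVector;
import org.apache.mahout.math.Vector;

public class MahoutMathDemo {
  public static void main(String[] args) {
    Vector a = new DenseVector(new double[] {1.0, 2.0, 3.0});
    Vector b = new DenseVector(new double[] {4.0, 5.0, 6.0});

    // Basic linear-algebra operations from org.apache.mahout.math
    System.out.println("dot(a, b) = " + a.dot(b));   // 32.0
    System.out.println("norm2(a)  = " + a.norm(2));  // Euclidean length of a
    System.out.println("a plus b  = " + a.plus(b));  // element-wise sum
  }
}
```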
2. Point out the correct statement.
a) Mahout is distributed under a commercially friendly Apache Software license
b) Mahout is a library of scalable machine-learning algorithms, implemented on top of Apache Hadoop® and using the MapReduce paradigm
c) Apache Mahout is a project of the Apache Software Foundation to produce free implementations of distributed or otherwise scalable machine learning algorithms
d) None of the mentioned
View Answer
Explanation: The goal of Mahout is to build a vibrant, responsive, diverse community to facilitate discussions not only on the project itself but also on potential use cases.
3. _________ does not restrict contributions to Hadoop-based implementations.
a) Mahout
b) Oozie
c) Impala
d) All of the mentioned
View Answer
Explanation: Mahout is distributed under a commercially friendly Apache Software license.
4. Mahout provides an implementation of a ______________ identification algorithm which scores collocations using log-likelihood ratio.
a) collocation
b) compaction
c) collection
d) none of the mentioned
View Answer
Explanation: The log-likelihood score indicates the relative usefulness of a collocation with regard to other term combinations in the text.
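For reference, a minimal sketch of scoring one candidate collocation with Mahout's LogLikelihood utility, assuming mahout-math is available; the contingency-table counts below are made up for illustration:

```java
import org.apache.mahout.math.stats.LogLikelihood;

public class CollocationScore {
  public static void main(String[] args) {
    // Hypothetical counts for the bigram "machine learning":
    long k11 = 110;      // both tokens occur together
    long k12 = 2440;     // "machine" occurs without "learning"
    long k21 = 1390;     // "learning" occurs without "machine"
    long k22 = 980000;   // ngrams containing neither token

    double llr = LogLikelihood.logLikelihoodRatio(k11, k12, k21, k22);
    System.out.println("LLR = " + llr); // higher scores suggest more useful collocations
  }
}
```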
5. Point out the wrong statement.
a) ‘Taste’ collaborative-filtering recommender component of Mahout was originally a separate project and can run standalone without Hadoop
b) Integration of Mahout with initiatives such as the Pregel-like Giraph are actively under discussion
c) Calculating the LLR is very straightforward
d) None of the mentioned
View Answer
Explanation: There are a couple of ways to run the LLR-based collocation algorithm in Mahout: standalone, or as part of the job that creates vectors from text.
6. The tokens are passed through a Lucene ____________ to produce NGrams of the desired length.
a) ShngleFil
b) ShingleFilter
c) SingleFilter
d) Collfilter
View Answer
Explanation: The tools in which the collocation identification algorithm is embedded either consume tokenized text as input or allow an implementation of the Lucene Analyzer class to be specified to perform tokenization and form ngrams.
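For reference, a minimal sketch of producing shingles (ngrams) with Lucene's ShingleFilter, assuming a Lucene 5+ style API where tokenizers are constructed without a Version argument:

```java
import java.io.StringReader;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.shingle.ShingleFilter;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class ShingleDemo {
  public static void main(String[] args) throws Exception {
    WhitespaceTokenizer tokenizer = new WhitespaceTokenizer();
    tokenizer.setReader(new StringReader("scalable machine learning on hadoop"));

    // Wrap the token stream in a ShingleFilter to emit 2-grams (shingles)
    TokenStream shingles = new ShingleFilter(tokenizer, 2);
    CharTermAttribute term = shingles.addAttribute(CharTermAttribute.class);

    shingles.reset();
    while (shingles.incrementToken()) {
      System.out.println(term.toString()); // unigrams and bigrams such as "machine learning"
    }
    shingles.end();
    shingles.close();
  }
}
```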
7. The _________ collocation identifier is integrated into the process that is used to create vectors from sequence files of text keys and values.
a) lbr
b) lcr
c) llr
d) lar
View Answer
Explanation: The --minLLR option can be used to control the cutoff, preventing collocations below the specified LLR score from being emitted.
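For reference, a sketch of invoking the vectorization job programmatically with a collocation cutoff; the paths are hypothetical, and it assumes Mahout's SparseVectorsFromSequenceFiles driver (the class behind the seq2sparse command) is available on the classpath:

```java
import org.apache.hadoop.util.ToolRunner;
import org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles;

public class VectorizeWithCollocations {
  public static void main(String[] args) throws Exception {
    String[] jobArgs = {
        "-i", "/user/demo/text-seqfiles",  // hypothetical input sequence files
        "-o", "/user/demo/text-vectors",   // hypothetical output directory
        "--maxNGramSize", "2",             // also generate bigram collocations
        "--minLLR", "50"                   // drop collocations scoring below an LLR of 50
    };
    ToolRunner.run(new SparseVectorsFromSequenceFiles(), jobArgs);
  }
}
```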
8. ____________ generates ngrams and counts frequencies for the ngrams and their head and tail subgrams.
a) CollocationDriver
b) CollocDriver
c) CarDriver
d) All of the mentioned
View Answer
Explanation: Each call to the mapper passes in the full set of tokens for the corresponding document using a StringTuple.
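For illustration only (this is not Mahout's actual CollocDriver mapper), a simplified mapper in the same spirit that reads a document's tokens from a StringTuple and emits bigrams together with their head and tail subgrams:

```java
import java.io.IOException;
import java.util.List;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.mahout.common.StringTuple;

public class NGramCountMapper extends Mapper<Text, StringTuple, Text, LongWritable> {
  private static final LongWritable ONE = new LongWritable(1);

  @Override
  protected void map(Text docId, StringTuple tokens, Context ctx)
      throws IOException, InterruptedException {
    List<String> t = tokens.getEntries();
    for (int i = 0; i + 1 < t.size(); i++) {
      String head = t.get(i);
      String tail = t.get(i + 1);
      ctx.write(new Text(head + " " + tail), ONE); // the bigram itself
      ctx.write(new Text("HEAD " + head), ONE);    // head subgram frequency
      ctx.write(new Text("TAIL " + tail), ONE);    // tail subgram frequency
    }
  }
}
```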
9. A key of type ___________ is generated which is used later to join ngrams with their heads and tails in the reducer phase.
a) GramKey
b) Primary
c) Secondary
d) None of the mentioned
View Answer
Explanation: The GramKey is a composite key made up of a string n-gram fragment as the primary key and a secondary key used for grouping and sorting in the reduce phase.
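For illustration only (this is not Mahout's actual GramKey), a simplified composite key showing the same pattern: a primary string fragment plus a secondary field used for grouping and sorting in the reduce phase:

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;

public class CompositeGramKey implements WritableComparable<CompositeGramKey> {
  private String fragment = ""; // primary key: the n-gram fragment
  private int order;            // secondary key: grouping/sort order

  public CompositeGramKey() {}

  public CompositeGramKey(String fragment, int order) {
    this.fragment = fragment;
    this.order = order;
  }

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeUTF(fragment);
    out.writeInt(order);
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    fragment = in.readUTF();
    order = in.readInt();
  }

  @Override
  public int compareTo(CompositeGramKey other) {
    int cmp = fragment.compareTo(other.fragment);               // group by fragment first
    return cmp != 0 ? cmp : Integer.compare(order, other.order); // then by secondary order
  }
}
```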
10. ________ phase merges the counts for unique ngrams or ngram fragments across multiple documents.
a) CollocCombiner
b) CollocReducer
c) CollocMerger
d) None of the mentioned
View Answer
Explanation: The combiner treats the entire GramKey as the key, so identical tuples from separate documents are passed into a single call to the combiner's reduce method, where their frequencies are summed and a single tuple is passed out via the collector.
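For illustration only (this is not Mahout's actual CollocCombiner), a simplified Hadoop combiner that merges the frequencies of identical keys in the same way:

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class FrequencySumCombiner extends Reducer<Text, LongWritable, Text, LongWritable> {
  @Override
  protected void reduce(Text key, Iterable<LongWritable> counts, Context ctx)
      throws IOException, InterruptedException {
    long sum = 0;
    for (LongWritable c : counts) {
      sum += c.get(); // merge frequencies of identical keys from separate documents
    }
    ctx.write(key, new LongWritable(sum)); // emit a single tuple per unique key
  }
}
```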
Sanfoundry Global Education & Learning Series – Hadoop.