This set of Hadoop Multiple Choice Questions & Answers (MCQs) focuses on “Mahout with Hadoop”.
1. Mahout provides ____________ libraries for common maths operations and primitive Java collections.
a) Java
b) Javascript
c) Perl
d) Python
View Answer
Explanation: Mahout provides Java libraries for common maths operations (focused on linear algebra and statistics) and for primitive Java collections.
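For reference, a minimal sketch using Mahout's math module (assuming mahout-math is on the classpath) shows the kind of linear-algebra primitives the library provides:

```java
import org.apache.mahout.math.DenseVector;
import org.apache.mahout.math.Vector;

public class MahoutMathDemo {
  public static void main(String[] args) {
    Vector a = new DenseVector(new double[] {1.0, 2.0, 3.0});
    Vector b = new DenseVector(new double[] {4.0, 5.0, 6.0});

    // Basic linear-algebra operations from org.apache.mahout.math
    System.out.println("dot(a, b) = " + a.dot(b));   // 32.0
    System.out.println("norm2(a)  = " + a.norm(2));  // Euclidean length of a
    System.out.println("a plus b  = " + a.plus(b));  // element-wise sum
  }
}
```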
2. Point out the correct statement.
a) Mahout is distributed under a commercially friendly Apache Software license
b) Mahout is a library of scalable machine-learning algorithms, implemented on top of Apache Hadoop® and using the MapReduce paradigm
c) Apache Mahout is a project of the Apache Software Foundation to produce free implementations of distributed or otherwise scalable machine learning algorithms
d) None of the mentioned
View Answer
Explanation: The goal of Mahout is to build a vibrant, responsive, diverse community to facilitate discussions not only on the project itself but also on potential use cases.
3. _________ does not restrict contributions to Hadoop-based implementations.
a) Mahout
b) Oozie
c) Impala
d) All of the mentioned
View Answer
Explanation: Mahout is distributed under a commercially friendly Apache Software license.
4. Mahout provides an implementation of a ______________ identification algorithm which scores collocations using log-likelihood ratio.
a) collocation
b) compaction
c) collection
d) none of the mentioned
View Answer
Explanation: The log-likelihood score indicates the relative usefulness of a collocation with regard to other term combinations in the text.
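For reference, a minimal sketch of scoring one candidate collocation with Mahout's LogLikelihood utility, assuming mahout-math is available; the contingency-table counts below are made up for illustration:

```java
import org.apache.mahout.math.stats.LogLikelihood;

public class CollocationScore {
  public static void main(String[] args) {
    // Hypothetical counts for the bigram "machine learning":
    long k11 = 110;      // both tokens occur together
    long k12 = 2440;     // "machine" occurs without "learning"
    long k21 = 1390;     // "learning" occurs without "machine"
    long k22 = 980000;   // ngrams containing neither token

    double llr = LogLikelihood.logLikelihoodRatio(k11, k12, k21, k22);
    System.out.println("LLR = " + llr); // higher scores suggest more useful collocations
  }
}
```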
5. Point out the wrong statement.
a) ‘Taste’ collaborative-filtering recommender component of Mahout was originally a separate project and can run standalone without Hadoop
b) Integration of Mahout with initiatives such as the Pregel-like Giraph are actively under discussion
c) Calculating the LLR is very straightforward
d) None of the mentioned
View Answer
Explanation: There are a couple of ways to run the LLR-based collocation algorithm in Mahout: standalone, or as part of the job that creates vectors from text.
6. The tokens are passed through a Lucene ____________ to produce NGrams of the desired length.
a) ShngleFil
b) ShingleFilter
c) SingleFilter
d) Collfilter
View Answer
Explanation: The tools in which the collocation identification algorithm is embedded either consume tokenized text as input or allow an implementation of the Lucene Analyzer class to be specified to perform tokenization and form ngrams.
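For reference, a minimal sketch of producing shingles (ngrams) with Lucene's ShingleFilter, assuming a Lucene 5+ style API where tokenizers are constructed without a Version argument:

```java
import java.io.StringReader;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.shingle.ShingleFilter;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class ShingleDemo {
  public static void main(String[] args) throws Exception {
    WhitespaceTokenizer tokenizer = new WhitespaceTokenizer();
    tokenizer.setReader(new StringReader("scalable machine learning on hadoop"));

    // Wrap the token stream in a ShingleFilter to emit 2-grams (shingles)
    TokenStream shingles = new ShingleFilter(tokenizer, 2);
    CharTermAttribute term = shingles.addAttribute(CharTermAttribute.class);

    shingles.reset();
    while (shingles.incrementToken()) {
      System.out.println(term.toString()); // unigrams and bigrams such as "machine learning"
    }
    shingles.end();
    shingles.close();
  }
}
```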
7. The _________ collocation identifier is integrated into the process that is used to create vectors from sequence files of text keys and values.
a) lbr
b) lcr
c) llr
d) lar
View Answer
Explanation: The --minLLR option can be used to control the cutoff, preventing collocations below the specified LLR score from being emitted.
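For reference, a sketch of invoking the vectorization job programmatically with a collocation cutoff; the paths are hypothetical, and it assumes Mahout's SparseVectorsFromSequenceFiles driver (the class behind the seq2sparse command) is available on the classpath:

```java
import org.apache.hadoop.util.ToolRunner;
import org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles;

public class VectorizeWithCollocations {
  public static void main(String[] args) throws Exception {
    String[] jobArgs = {
        "-i", "/user/demo/text-seqfiles",  // hypothetical input sequence files
        "-o", "/user/demo/text-vectors",   // hypothetical output directory
        "--maxNGramSize", "2",             // also generate bigram collocations
        "--minLLR", "50"                   // drop collocations scoring below an LLR of 50
    };
    ToolRunner.run(new SparseVectorsFromSequenceFiles(), jobArgs);
  }
}
```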
8. ____________ generates ngrams and counts frequencies for the ngrams and their head and tail subgrams.
a) CollocationDriver
b) CollocDriver
c) CarDriver
d) All of the mentioned
View Answer
Explanation: Each call to the mapper passes in the full set of tokens for the corresponding document using a StringTuple.
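For illustration only (this is not Mahout's actual CollocDriver mapper), a simplified mapper in the same spirit that reads a document's tokens from a StringTuple and emits bigrams together with their head and tail subgrams:

```java
import java.io.IOException;
import java.util.List;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.mahout.common.StringTuple;

public class NGramCountMapper extends Mapper<Text, StringTuple, Text, LongWritable> {
  private static final LongWritable ONE = new LongWritable(1);

  @Override
  protected void map(Text docId, StringTuple tokens, Context ctx)
      throws IOException, InterruptedException {
    List<String> t = tokens.getEntries();
    for (int i = 0; i + 1 < t.size(); i++) {
      String head = t.get(i);
      String tail = t.get(i + 1);
      ctx.write(new Text(head + " " + tail), ONE); // the bigram itself
      ctx.write(new Text("HEAD " + head), ONE);    // head subgram frequency
      ctx.write(new Text("TAIL " + tail), ONE);    // tail subgram frequency
    }
  }
}
```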
9. A key of type ___________ is generated which is used later to join ngrams with their heads and tails in the reducer phase.
a) GramKey
b) Primary
c) Secondary
d) None of the mentioned
View Answer
Explanation: The GramKey is a composite key made up of a string n-gram fragment as the primary key and a secondary key used for grouping and sorting in the reduce phase.
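For illustration only (this is not Mahout's actual GramKey), a simplified composite key showing the same pattern: a primary string fragment plus a secondary field used for grouping and sorting in the reduce phase:

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;

public class CompositeGramKey implements WritableComparable<CompositeGramKey> {
  private String fragment = ""; // primary key: the n-gram fragment
  private int order;            // secondary key: grouping/sort order

  public CompositeGramKey() {}

  public CompositeGramKey(String fragment, int order) {
    this.fragment = fragment;
    this.order = order;
  }

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeUTF(fragment);
    out.writeInt(order);
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    fragment = in.readUTF();
    order = in.readInt();
  }

  @Override
  public int compareTo(CompositeGramKey other) {
    int cmp = fragment.compareTo(other.fragment);               // group by fragment first
    return cmp != 0 ? cmp : Integer.compare(order, other.order); // then by secondary order
  }
}
```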
10. ________ phase merges the counts for unique ngrams or ngram fragments across multiple documents.
a) CollocCombiner
b) CollocReducer
c) CollocMerger
d) None of the mentioned
View Answer
Explanation: The combiner treats the entire GramKey as the key, so identical tuples from separate documents are passed into a single call to the combiner's reduce method, where their frequencies are summed and a single tuple is passed out via the collector.
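For illustration only (this is not Mahout's actual CollocCombiner), a simplified Hadoop combiner that merges the frequencies of identical keys in the same way:

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class FrequencySumCombiner extends Reducer<Text, LongWritable, Text, LongWritable> {
  @Override
  protected void reduce(Text key, Iterable<LongWritable> counts, Context ctx)
      throws IOException, InterruptedException {
    long sum = 0;
    for (LongWritable c : counts) {
      sum += c.get(); // merge frequencies of identical keys from separate documents
    }
    ctx.write(key, new LongWritable(sum)); // emit a single tuple per unique key
  }
}
```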
Sanfoundry Global Education & Learning Series – Hadoop.