This set of MCQs focuses on “Amazon Elastic MapReduce”.
1. Amazon EMR lets you run multiple versions concurrently, so that you can control your ___________ version upgrade.
a) Pig
b) Windows Server
c) Hive
d) Ubuntu
Explanation: Amazon EMR supports several versions of Hive, which you can install on any running cluster.
2. Point out the correct statement.
a) Amazon Elastic MapReduce (Amazon EMR) provides support for Apache Hive
b) Pig extends the SQL paradigm by including serialization formats and the ability to invoke mapper and reducer scripts
c) The Amazon Hive default input format is text
d) All of the mentioned
Explanation: With Hive 0.13.1 on Amazon EMR, certain options introduced in previous versions of Hive on EMR have been removed in favor of greater parity with Apache Hive. For example, the -x option was removed.
3. The Amazon EMR default input format for Hive is __________
a) org.apache.hadoop.hive.ql.io.CombineHiveInputFormat
b) org.apache.hadoop.hive.ql.iont.CombineHiveInputFormat
c) org.apache.hadoop.hive.ql.io.CombineFormat
d) All of the mentioned
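Explanation: The Amazon EMR default input format for Hive is org.apache.hadoop.hive.ql.io.CombineHiveInputFormat. You can specify the hive.base.inputformat option in Hive to select a different file format.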
4. Hadoop clusters running on Amazon EMR use ______ instances as virtual Linux servers for the master and slave nodes.
a) EC2
b) EC3
c) EC4
d) None of the mentioned
Explanation: Amazon EMR has made enhancements to Hadoop and other open-source applications to work seamlessly with AWS.
5. Point out the wrong statement.
a) Apache Hive saves Hive log files to /tmp/{user.name}/ in a file named hive.log
b) Amazon EMR saves Hive logs to /mnt/var/log/apps/
c) In order to support concurrent versions of Hive, the version of Hive you run determines the log file name
d) None of the mentioned
Explanation: If you have many GZip files in your Hive cluster, you can optimize performance by passing multiple files to each mapper.
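The idea of passing multiple files to each mapper can be sketched in Python. This is a simplified greedy grouping written for illustration, not the algorithm Hive or Hadoop actually uses; the function name `combine_into_splits`, the file names, and the size limit are all hypothetical.

```python
from typing import Dict, List


def combine_into_splits(file_sizes: Dict[str, int], max_split_bytes: int) -> List[List[str]]:
    """Greedily pack files into groups whose total size stays within max_split_bytes.

    Each returned group stands for the input handed to one mapper. A file that
    is larger than the limit ends up in a group of its own.
    """
    splits: List[List[str]] = []
    current: List[str] = []
    current_bytes = 0
    for name, size in file_sizes.items():
        # Close the current group if adding this file would exceed the limit.
        if current and current_bytes + size > max_split_bytes:
            splits.append(current)
            current, current_bytes = [], 0
        current.append(name)
        current_bytes += size
    if current:
        splits.append(current)
    return splits


# Hypothetical file sizes (arbitrary units) for five small GZip files.
files = {"a.gz": 40, "b.gz": 30, "c.gz": 50, "d.gz": 10, "e.gz": 200}
splits = combine_into_splits(files, max_split_bytes=100)
print(splits)
```

With one mapper per file, the five files above would need five mappers; grouped this way they need three, which is where the saving in per-mapper startup cost would come from.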
6. Amazon EMR uses Hadoop processing combined with several __________ products.
a) AWS
b) ASQ
c) AMR
d) AWES
Explanation: Amazon Elastic MapReduce (Amazon EMR) is a web service that makes it easy to process large amounts of data efficiently.
7. ___________ is an RPC framework that defines a compact binary serialization format used to persist data structures for later analysis.
a) Pig
b) Hive
c) Thrift
d) None of the mentioned
Explanation: Amazon EMR does not support Hive Authorization.
8. Impala on Amazon EMR requires _________ running Hadoop 2.x or greater.
a) AMS
b) AMI
c) AWR
d) All of the mentioned
Explanation: Impala is an open source tool in the Hadoop ecosystem for interactive, ad hoc querying using SQL syntax.
9. Impala executes SQL queries using a _________ engine.
a) MAP
b) MPP
c) MPA
d) None of the mentioned
Explanation: Impala avoids Hive’s overhead from creating MapReduce jobs, giving it faster query times than Hive.
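A massively parallel processing (MPP) engine can be pictured as several workers that each aggregate their own slice of the data, followed by a merge of the partial results. The Python sketch below illustrates only that pattern: threads stand in for cluster nodes, and the names `partial_sum` and `mpp_group_sum` are hypothetical and unrelated to Impala's actual implementation.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Row = Tuple[str, int]  # (group key, value)


def partial_sum(partition: List[Row]) -> Counter:
    """Work done by one worker: aggregate only the rows it holds locally."""
    totals: Counter = Counter()
    for key, value in partition:
        totals[key] += value
    return totals


def mpp_group_sum(partitions: List[List[Row]]) -> Dict[str, int]:
    """Aggregate every partition in parallel, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(pool.map(partial_sum, partitions))
    merged: Counter = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds counts key by key
    return dict(merged)


# Three hypothetical partitions, as if the table were spread over three nodes.
partitions = [
    [("us", 3), ("eu", 1)],
    [("us", 2)],
    [("eu", 4), ("ap", 7)],
]
result = mpp_group_sum(partitions)
print(result)
```

The sketch corresponds roughly to a `SELECT key, SUM(value) ... GROUP BY key` query whose work is split across nodes rather than submitted as a batch job.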
10. Amazon EMR clusters can read and process Amazon _________ streams directly.
a) Kinet
b) Kinematics
c) Kinesis
d) None of the mentioned
Explanation: The Amazon EMR connector for Amazon Kinesis uses the DynamoDB database as its backing for checkpointing metadata.
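Checkpointing of this kind can be sketched in Python as follows. A plain dictionary stands in for the DynamoDB table, sequence numbers are simplified to small integers, and the function `process_shard` is hypothetical; the sketch shows the general idea of resuming from stored metadata, not how the connector is implemented.

```python
from typing import Callable, Dict, List, Tuple

Record = Tuple[int, str]  # (sequence number, payload)


def process_shard(
    shard_id: str,
    records: List[Record],
    checkpoints: Dict[str, int],
    batch_size: int,
    handle: Callable[[str], None],
) -> None:
    """Process records newer than the stored checkpoint, one batch at a time.

    After each batch, the sequence number of its last record is written to
    `checkpoints`, so a later run skips everything already processed.
    """
    last_done = checkpoints.get(shard_id, -1)
    pending = [r for r in records if r[0] > last_done]
    for start in range(0, len(pending), batch_size):
        batch = pending[start:start + batch_size]
        for _, payload in batch:
            handle(payload)
        checkpoints[shard_id] = batch[-1][0]


# Hypothetical stream contents and an in-memory stand-in for the checkpoint store.
stream = [(0, "a"), (1, "b"), (2, "c"), (3, "d")]
checkpoints: Dict[str, int] = {}
seen: List[str] = []

process_shard("shard-0", stream[:2], checkpoints, batch_size=2, handle=seen.append)
# Second run sees the whole stream but resumes after the checkpoint.
process_shard("shard-0", stream, checkpoints, batch_size=2, handle=seen.append)
print(seen, checkpoints)
```

Because the second run starts from the stored checkpoint, records "a" and "b" are not processed twice.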
Sanfoundry Global Education & Learning Series – Hadoop.