Hadoop Questions and Answers – User-defined Functions in Pig

This set of Interview Questions and Answers focuses on “User-defined Functions in Pig”.

1. __________ abstract class has three main methods for loading data and for most use cases it would suffice to extend it.
a) Load
b) LoadFunc
c) FuncLoad
d) None of the mentioned

Answer: b
Explanation: LoadFunc is the abstract base class for loaders. LoadFunc and StoreFunc implementations should use the Hadoop 0.20 API based classes (InputFormat/OutputFormat and related classes).

2. Point out the correct statement.
a) LoadMeta has methods to convert byte arrays to specific types
b) The Pig load/store API is aligned with Hadoop InputFormat class only
c) LoadPushDown has methods to push operations from Pig runtime into loader implementations
d) All of the mentioned

Answer: c
Explanation: Of the LoadPushDown methods, currently only pushProjection() is called by Pig; it tells the loader exactly which fields the Pig script requires.

3. Which of the following has methods to deal with metadata?
a) LoadPushDown
b) LoadMetadata
c) LoadCaster
d) All of the mentioned

Answer: b
Explanation: Most loader implementations don’t need to implement LoadMetadata unless they interact with some metadata system.

4. ____________ method will be called by Pig both in the front end and back end to pass a unique signature to the Loader.
a) relativeToAbsolutePath()
b) setUdfContextSignature()
c) getCacheFiles()
d) getShipFiles()

Answer: b
Explanation: The signature can be used to store into the UDFContext any information which the Loader needs to store between various method invocations in the front end and back end.
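The role of the signature can be illustrated with a minimal Java sketch. It does not use Pig's real classes: `ContextStore` is a hypothetical stand-in for UDFContext, `SketchLoader` is a hypothetical loader, and the shared in-memory map stands in for whatever Pig does to carry the context from the front end to the back end. The only point shown is that two loader instances given the same signature see the same stored properties.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Stand-in sketch: none of these types are Pig's real classes.
final class SignatureSketch {

    // Hypothetical stand-in for UDFContext: one Properties bag per signature.
    static final class ContextStore {
        private final Map<String, Properties> bySignature = new HashMap<>();

        Properties propertiesFor(String signature) {
            return bySignature.computeIfAbsent(signature, s -> new Properties());
        }
    }

    // Hypothetical loader that keys its saved state by the signature it was given.
    static final class SketchLoader {
        private final ContextStore store;
        private String signature;

        SketchLoader(ContextStore store) {
            this.store = store;
        }

        // Same name as the Pig method in the question; the body is an assumption.
        void setUdfContextSignature(String signature) {
            this.signature = signature;
        }

        void rememberColumns(String columns) {
            store.propertiesFor(signature).setProperty("columns", columns);
        }

        String recallColumns() {
            return store.propertiesFor(signature).getProperty("columns");
        }
    }

    // Saves state through a "front end" instance and reads it through a "back end" one.
    static String roundTrip(String signature, String columns) {
        ContextStore store = new ContextStore();

        SketchLoader frontEnd = new SketchLoader(store);
        frontEnd.setUdfContextSignature(signature);
        frontEnd.rememberColumns(columns);

        SketchLoader backEnd = new SketchLoader(store);
        backEnd.setUdfContextSignature(signature);
        return backEnd.recallColumns();
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("loader-1", "id,name"));
    }
}
```

Keying the stored state by signature is what lets two uses of the same loader class in one script keep their information apart.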

5. Point out the wrong statement.
a) The load/store UDFs control how data goes into Pig and comes out of Pig.
b) LoadCaster has methods to convert byte arrays to specific types.
c) The meaning of getNext() has changed and is called by Pig runtime to get the last tuple in the data
d) None of the mentioned

Answer: c
Explanation: The meaning of getNext() has not changed and is called by Pig runtime to get the next tuple in the data.

6. ___________ returns a list of HDFS files to ship to the distributed cache.
a) relativeToAbsolutePath()
b) setUdfContextSignature()
c) getCacheFiles()
d) getShipFiles()

Answer: c
Explanation: getCacheFiles() returns a list of HDFS files to ship to the distributed cache; getShipFiles(), by contrast, returns a list of local files to ship.

7. The loader should use the ______ method to communicate the load information to the underlying InputFormat.
a) relativeToAbsolutePath()
b) setUdfContextSignature()
c) getCacheFiles()
d) setLocation()

Answer: d
Explanation: setLocation() method is called by Pig to communicate the load location to the loader.

8. Through the ____________ method, the RecordReader associated with the InputFormat provided by the LoadFunc is passed to the LoadFunc.
a) getNext()
b) relativeToAbsolutePath()
c) prepareToRead()
d) All of the mentioned

Answer: c
Explanation: The RecordReader can then be used by the implementation in getNext() to return a tuple representing a record of data back to Pig.
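The hand-off described in questions 5 and 8 can be sketched in plain Java. This is an illustration under assumptions, not Pig code: `LineReader` is a hypothetical stand-in for a Hadoop RecordReader, `CsvLoader` is a hypothetical loader, tuples are modelled as `List<String>`, and returning `null` at end of input is the convention chosen for this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Stand-in sketch: none of these types are Pig's or Hadoop's real classes.
final class LoaderLifecycleSketch {

    // Hypothetical stand-in for a RecordReader: yields one raw line per record.
    static final class LineReader {
        private final Iterator<String> lines;

        LineReader(List<String> lines) {
            this.lines = lines.iterator();
        }

        // Returns the next raw line, or null when the input is exhausted.
        String nextLine() {
            return lines.hasNext() ? lines.next() : null;
        }
    }

    // Hypothetical loader that turns comma-separated lines into tuples.
    static final class CsvLoader {
        private LineReader reader;

        // Plays the role of prepareToRead(): the runtime hands the reader to the loader.
        void prepareToRead(LineReader reader) {
            this.reader = reader;
        }

        // Plays the role of getNext(): one tuple per record, null at end of input.
        List<String> getNext() {
            String line = reader.nextLine();
            if (line == null) {
                return null;
            }
            return Arrays.asList(line.split(",", -1));
        }
    }

    // Drives the loader the way a runtime might: prepare once, then pull until null.
    static List<List<String>> loadAll(List<String> rawLines) {
        CsvLoader loader = new CsvLoader();
        loader.prepareToRead(new LineReader(rawLines));

        List<List<String>> tuples = new ArrayList<>();
        for (List<String> t = loader.getNext(); t != null; t = loader.getNext()) {
            tuples.add(t);
        }
        return tuples;
    }

    public static void main(String[] args) {
        System.out.println(loadAll(Arrays.asList("1,alice", "2,bob")));
    }
}
```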

9. __________ method tells LoadFunc which fields are required in the Pig script.
a) pushProjection()
b) relativeToAbsolutePath()
c) prepareToRead()
d) None of the mentioned

Answer: a
Explanation: Pig will use the column index requiredField.index to communicate with the LoadFunc about the fields required by the Pig script.
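The effect of projection push-down can be sketched as follows. The example is a simplification under assumptions: the required fields are passed as a plain list of column indexes rather than Pig's own required-field objects, and `ProjectingLoader` is a hypothetical class, not part of Pig. It only shows a loader keeping the columns the script asked for and skipping the rest.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-in sketch: the required-field list is simplified to plain column indexes.
final class ProjectionSketch {

    // Hypothetical loader that remembers which columns the script needs.
    static final class ProjectingLoader {
        private List<Integer> requiredIndexes; // null means "return every field"

        // Plays the role of pushProjection(): records the required column indexes.
        void pushProjection(List<Integer> indexes) {
            this.requiredIndexes = new ArrayList<>(indexes);
        }

        // Parses one comma-separated record, keeping only the required columns.
        List<String> parse(String line) {
            String[] all = line.split(",", -1);
            if (requiredIndexes == null) {
                return Arrays.asList(all);
            }
            List<String> projected = new ArrayList<>();
            for (int index : requiredIndexes) {
                projected.add(all[index]);
            }
            return projected;
        }
    }

    // Convenience wrapper: push a projection, then parse a single record.
    static List<String> project(String line, List<Integer> indexes) {
        ProjectingLoader loader = new ProjectingLoader();
        loader.pushProjection(indexes);
        return loader.parse(line);
    }

    public static void main(String[] args) {
        System.out.println(project("1,alice,NYC", Arrays.asList(0, 2)));
    }
}
```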

10. A loader implementation should implement __________ if casts (implicit or explicit) from DataByteArray fields to other types need to be supported.
a) LoadPushDown
b) LoadMetadata
c) LoadCaster
d) All of the mentioned

Answer: c
Explanation: LoadCaster has methods to convert byte arrays to specific types.
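What such byte-array conversions might look like is sketched below. The method names mirror the bytesTo... naming style of LoadCaster, but this class does not implement the Pig interface, and two choices are assumptions of the sketch: the bytes are treated as UTF-8 text, and unparsable input yields `null` instead of an exception.

```java
import java.nio.charset.StandardCharsets;

// Stand-in sketch: illustrates byte-array casts, does not implement Pig's LoadCaster.
final class CasterSketch {

    // Decodes raw bytes as UTF-8 text (the encoding is an assumption of this sketch).
    static String bytesToCharArray(byte[] bytes) {
        return bytes == null ? null : new String(bytes, StandardCharsets.UTF_8);
    }

    // Parses UTF-8 text as an integer; returns null when the text is not a number.
    static Integer bytesToInteger(byte[] bytes) {
        if (bytes == null) {
            return null;
        }
        try {
            return Integer.valueOf(new String(bytes, StandardCharsets.UTF_8).trim());
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        byte[] raw = "42".getBytes(StandardCharsets.UTF_8);
        System.out.println(bytesToInteger(raw) + " / " + bytesToCharArray(raw));
    }
}
```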

Sanfoundry Global Education & Learning Series – Hadoop.

Manish Bhojasia - Founder & CTO at Sanfoundry
Manish Bhojasia, a technology veteran with 20+ years @ Cisco & Wipro, is Founder and CTO at Sanfoundry. He lives in Bangalore, and focuses on development of Linux Kernel, SAN Technologies, Advanced C, Data Structures & Algorithms.