Hadoop Developer and Analyst Training


Course Name
Hadoop Training – Developer / Analyst Course

Course Overview
This Hadoop Developer/Analyst training course is designed to provide the knowledge and skills needed to become a successful Hadoop developer. It covers concepts such as HDFS, MapReduce, Hive, Pig, and HBase in depth.

Course Benefits
After the completion of this course, participants will be able to:

  • Gain a clear understanding of Apache Hadoop, HDFS, and single-node Hadoop cluster setup
  • Master the concepts of the Hadoop Distributed File System and the MapReduce framework
  • Learn to write basic and complex MapReduce programs
  • Gain a clear understanding of Pig scripting and Hive QL, and perform data analytics using Pig and Hive
  • Gain a clear understanding of HBase (a NoSQL database) and its API
  • Understand job scheduling in Hadoop using the Oozie framework
  • Understand the Sqoop utility and real-time processing frameworks (Flume and Storm)

Course Duration
4 days (32 Hours)

Target Audience
Software Developers, Software Architects, Data Warehouse Professionals, and IT Managers interested in learning Hadoop

Course Pre-requisite

  • Linux basics, such as using vi and common commands
  • Basic understanding of SQL
  • Basic knowledge of databases (MySQL/Oracle)
  • Basic core Java programming
Lab Requirements

  • Windows/Mac/Linux OS with at least 4 GB RAM
  • VM Player 5.0.2
  • Hands-on exercises are conducted on a Cloudera VM
  • The trainer will share a VM instance, which needs to be copied to all machines; the Cloudera VM has all the required setup for Hadoop
Fee, Schedule & Registration
Click Here for Hadoop Developer course training schedule, fee and registration information

    Hadoop Developer/Analyst Course Outline

    Session – 1 : Big Data, Hadoop and HDFS
    Understanding Big Data
    What is Big Data?
    Big Data Technologies
    Why not RDBMS/EDW? Problems with traditional RDBMS
    How does Hadoop solve the problem?
    What is Apache Hadoop?
    Hadoop & its Ecosystem
    Components of Hadoop (Architecture)
    HDFS – The Hadoop Distributed File System
    What is a Distributed File System?
    HDFS Overview
    Basic Cluster Components
    HDFS Master/Slave Architecture
    Name Node
    Data Node
    HDFS Read mechanism
    Complete dataflow to read data from HDFS
    HDFS Write mechanism
    Complete dataflow to write files on HDFS
    HDFS Operations and Parameters
    HDFS commands:
    mkdir, put, get, ls, cat, chmod, copyFromLocal, cp, mv, rm, touchz
    HDFS Parameters

    Various Failure Scenarios and How HDFS Handles Them Internally
    Data Node Failure
    Name Node Failure
    Communication Failure
    Data Corruption
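HDFS detects data corruption by storing a checksum alongside every block and re-verifying it on each read; on a mismatch, the client reads from another replica. The snippet below is a toy illustration of that idea, not the actual HDFS implementation (which keeps per-chunk CRC checksums in separate metadata).

```python
import zlib

# Toy illustration of HDFS-style block checksumming: a checksum is
# computed when a block is written and verified again on every read.
# (Hypothetical sketch only; real HDFS stores CRC checksums per chunk.)

def write_block(data: bytes):
    """Store a block together with its checksum."""
    return {"data": data, "checksum": zlib.crc32(data)}

def read_block(block) -> bytes:
    """Verify the checksum before returning the data."""
    if zlib.crc32(block["data"]) != block["checksum"]:
        raise IOError("block is corrupt; HDFS would re-read from another replica")
    return block["data"]
```

A reader that hits the corruption error would, in real HDFS, report the bad replica to the Name Node and retry against a healthy Data Node.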

    Session – 2 : HDFS Cluster Setup
    Single Node Cluster Setup on a Local Machine
    Step-by-step understanding of setting up the cluster on one machine
    HDFS environment and config files
    HDFS Browser
    Name Node browser
    Job Tracker Browser

    Hands On : HDFS Commands
    Manipulating Files in HDFS
    Hands on with some commands: Exporting & Importing files to/from HDFS

    Assignment 1: On HDFS shell commands/HDFS API

    Session – 3 : Map Reduce
    MapReduce Basics
    Why MapReduce? With some examples
    What is the MapReduce framework?
    Job Tracker & Task Tracker concepts
    Some real-life examples/scenarios to understand MapReduce
    MapReduce Components
    MapReduce Programming
    Java API
    Data Types
    Input & Output Formats
    Understand the input/output of mappers and reducers
    Thinking in terms of Map Reduce

    Practical Hands On: Writing MapReduce programs for the scenarios mentioned above
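To make "thinking in terms of MapReduce" concrete, here is a minimal local Python simulation of the three phases — map, shuffle/sort, and reduce — for the classic word-count example. This is only a sketch of the programming model; a real job would use the Hadoop Java API covered in class.

```python
from collections import defaultdict

def mapper(line):
    # Map phase: emit a (word, 1) pair for every word in the line.
    for word in line.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle/sort phase: group all values by key, as the framework
    # does between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())

def reducer(key, values):
    # Reduce phase: sum the counts for each word.
    return (key, sum(values))

lines = ["hadoop stores data", "hadoop processes data"]
pairs = [p for line in lines for p in mapper(line)]
counts = dict(reducer(k, v) for k, v in shuffle(pairs))
print(counts)  # {'data': 2, 'hadoop': 2, 'processes': 1, 'stores': 1}
```

The same split into mapper, shuffle, and reducer carries over directly to the Java `Mapper` and `Reducer` classes, with the framework performing the shuffle.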

    Session – 4 : MapReduce Advanced Topics

    MapReduce API Advanced Topics with Examples (Full Hands On)
    Secondary Sort
    Speculative Execution
    Zero & One Reducer
    Distributed Cache
    Hadoop Streaming
    Sorting in Hadoop
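Secondary sort means having the framework deliver each key's values to the reducer in a chosen order, usually by building a composite key. The Python sketch below mimics the effect locally (illustration only, with made-up sample data; in Hadoop Java code this is done with a composite `WritableComparable` plus a custom partitioner and grouping comparator).

```python
from itertools import groupby

# Secondary sort, simulated locally: records are (customer, timestamp, amount)
# and we want each customer's records delivered in timestamp order.
records = [
    ("alice", 3, 30.0),
    ("bob",   1, 15.0),
    ("alice", 1, 10.0),
    ("alice", 2, 20.0),
]

# Sort by the composite key: (natural key, secondary key).
records.sort(key=lambda r: (r[0], r[1]))

# Group by the natural key only, as a grouping comparator would,
# so each "reducer call" sees its values already ordered.
for customer, group in groupby(records, key=lambda r: r[0]):
    amounts = [amount for _, _, amount in group]
    print(customer, amounts)
# alice [10.0, 20.0, 30.0]
# bob [15.0]
```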



    Assignment 2: On MapReduce scenarios

    Session – 5 : Pig Latin
    Introduction to Pig Latin
    What is Pig?
    Pig Use Cases
    Why Pig is required over MapReduce
    Pig Architecture
    Install & Configure Pig
    Running Pig – Grunt Shell
    Run Pig in Local as well as Cluster Mode
    Basic Data Analysis with Pig
    Pig Latin Syntax
    Basic operators in Pig
    Loading data in Pig
    Basic and Complex Data Types in Pig
    Storage Format
    HBaseLoader and HBaseStorage
    Joins in Pig
    Inner Joins and Outer Joins
    Replicated Joins
    Skewed joins
    Pig Built In Functions
    Pig UDFs (User Defined Functions)
    Pig Macros
    Define macros
    Import macros
    Testing Scenarios for Pig
    Using Pig for basic ETL processing
    Hands On: Writing Pig scripts for some scenarios
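A typical "basic ETL" Pig script loads a file, groups, and aggregates, along the lines of `A = LOAD 'sales'; B = GROUP A BY product; C = FOREACH B GENERATE group, SUM(A.amount);`. The Python snippet below computes the same result locally, as a way to check what each Pig operator does (illustrative sketch; the field names and data are made up).

```python
from collections import defaultdict

# Rows as a Pig LOAD would produce them: (product, amount).
# (Hypothetical sample data for illustration.)
sales = [("book", 10), ("pen", 2), ("book", 5), ("pen", 3)]

# GROUP sales BY product
grouped = defaultdict(list)
for product, amount in sales:
    grouped[product].append(amount)

# FOREACH grouped GENERATE group, SUM(amount)
totals = {product: sum(amounts) for product, amounts in grouped.items()}
print(totals)  # {'book': 15, 'pen': 5}
```

The point of Pig is that the two-line script expresses this whole pipeline and runs it as MapReduce jobs on the cluster.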

    Assignment 3: Write Pig scripts for the scenarios/topics mentioned above

    Session – 6 : Hive
    What is Hive?
    Hive Architecture
    Hive Schema and Data Storage
    Comparing Hive to Traditional Databases
    Hive Use Cases
    Hive Data Models
    Hive Metastore
    Partitioning and Bucketing
    Install and configure Hive
    Understand Hive QL (Query Language)
    Explore Hive Databases and Hive Tables
    Basic Hive Syntax
    Hive Operations
    DDL Operations
    DML Operations
    Row-level Operations
    Joins in Hive
    Skewed Join
    Hive Built-In Functions
    Hive UDF
    Hive Client
    Hive CLI
    JDBC Client
    Thrift Java Client
    Storage Formats
    Testing Scenarios in Hive
    Using Hive for basic ETL
    Hands On: Exercises on Hive for the scenarios/topics mentioned above.
    Assignment 4: Case studies and writing some Hive jobs
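Hive QL reads much like SQL. A typical DDL-plus-query pair resembles the following, shown here against Python's built-in sqlite3 so it can be run without a cluster (the table and data are hypothetical, and real Hive DDL adds clauses such as `ROW FORMAT` and `PARTITIONED BY`).

```python
import sqlite3

# Stand-in for a Hive session: the DDL and query below are plain SQL,
# close to what the equivalent Hive QL would look like.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (level TEXT, msg TEXT)")
conn.executemany(
    "INSERT INTO logs VALUES (?, ?)",
    [("ERROR", "disk full"), ("INFO", "started"), ("ERROR", "timeout")],
)

# A typical analytics query: count rows per level.
rows = conn.execute(
    "SELECT level, COUNT(*) FROM logs GROUP BY level ORDER BY level"
).fetchall()
print(rows)  # [('ERROR', 2), ('INFO', 1)]
```

On a cluster, Hive compiles the same kind of query into MapReduce jobs over files in HDFS rather than executing it against a local database file.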

    Session – 7 : HBase

    What are NoSQL & Columnar Databases?
    Introduction to HBase
    HBase vs Other Storage Technologies
    HBase vs HDFS
    HBase vs RDBMS
    HBase Architecture
    HBase Shell
    Starting and Stopping HBase
    HBase Data Model
    Row Key
    Column Family
    HBase Shell API
    DDL in HBase
    Scanning an HBase Table
    DML in HBase
    HBase Java API
    Filters on Row Key and Value
    Hands On: Exercise on HBase with examples.
    Assignment 6: On the HBase API
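HBase's data model — a row key mapping to column families, each holding column qualifiers and values — can be pictured as nested maps. The sketch below models that shape in plain Python (a conceptual illustration with made-up data; real access goes through the HBase shell or Java API listed above).

```python
# Conceptual model of an HBase table: row key -> column family ->
# column qualifier -> value. (Real HBase also versions each cell
# by timestamp; omitted here for brevity.)
table = {
    "user#1001": {
        "info":  {"name": "Asha", "city": "Bangalore"},
        "stats": {"logins": "42"},
    },
    "user#1002": {
        "info": {"name": "Ravi"},
    },
}

def get(table, row_key, family, qualifier):
    """Mimic a Get: fetch one cell, or None if absent."""
    return table.get(row_key, {}).get(family, {}).get(qualifier)

def scan(table, prefix):
    """Mimic a Scan with a row-key prefix filter."""
    return sorted(k for k in table if k.startswith(prefix))

print(get(table, "user#1001", "info", "city"))  # Bangalore
print(scan(table, "user#"))                     # ['user#1001', 'user#1002']
```

Because rows are stored sorted by row key, prefix scans like the one above are cheap in HBase, which is why row-key design matters so much.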

    Session – 8 : Sqoop, Oozie and Real-Time Processing Frameworks
    What is Sqoop?
    Why Sqoop?
    Sqoop Commands
    Loading into HDFS – Importing tables using Sqoop
    Hands On: Exercise on Sqoop with example.

    Oozie Workflow Scheduler
    What is Oozie Workflow Scheduler
    Oozie in Hadoop Ecosystem
    Overview of Oozie Workflow and its components
    Job Submission in Hadoop and How it works
    Hands On: Exercise on Oozie Workflow with example.
    Real-Time Processing using Flume and Storm
    Why Real time data processing?
    What is Apache Flume?
    Flume Use Cases
    What is Storm? Storm Architecture

    Hands-On Lab Details
    This is a hands-on course; participants are expected to run examples and code snippets in each session. Participants will work through problems in every session with on-the-spot guidance.

    Trainer Details
    This course is delivered by a Sanfoundry accredited trainer. The trainer is an IT professional working at a top MNC/IT company in Bangalore, India.

    Manish Bhojasia - Founder & CTO at Sanfoundry
    Manish Bhojasia, a technology veteran with 20+ years @ Cisco & Wipro, is Founder and CTO at Sanfoundry. He lives in Bangalore, and focuses on development of Linux Kernel, SAN Technologies, Advanced C, Data Structures & Algorithms. Stay connected with him at LinkedIn.
