Friday 21 March 2014

Big Data/Hadoop Architect/Developer/Admin Course Outline

Audience: This course is designed for architects, administrators, and developers.
                 Recent college graduates can also attend.

Training mode: We offer face-to-face classroom training, online training, and fast-track training programs.

Prerequisites: No prior Java or administration experience is required. Basic knowledge of writing SQL
                         scripts is sufficient.


For training specific to Hadoop administration, refer to the Hadoop Administrator Course Outline.

Attend once and play any role you wish!


Module 1
Big Data: Getting Started
What is Big Data?
What is Apache Hadoop?
History of Hadoop
Understanding distributed file systems and Hadoop
Hadoop ecosystem components
Hadoop use cases
Ubuntu Installation
JDK Installation
Module 2
Hadoop Distributed File System

Eclipse Installation
Overview of HDFS
Communication Protocols
Rack Awareness
Hadoop cluster Topology
Setting up SSH for Hadoop Cluster
Running Hadoop in pseudo-distributed mode
Linux basic commands
HDFS file commands
Reading and writing to HDFS programmatically
Hands-on Lab Exercises
Module 3
MapReduce Framework

Java Basics
Anatomy of a MapReduce Program
Writables
InputFormat
OutputFormat
Streaming API
Inherent failure handling
Reading and writing
Hands-on Lab Exercises
Module 4
Advanced MapReduce Programming
Input splits, Record Reader, Mapper, Partition & Shuffle, Reduce, OutputFormat
Writing MapReduce program
Streaming in Hadoop
Counters
Performance Tuning
Joins
Sorting
Determining the optimal number of reducers and partitions
Hadoop cluster performance tuning
Hands-on Lab Exercises
Module 5
Apache Hadoop Administration
Best Practices for Hadoop setup and infrastructure

Hadoop cluster installation preparation
   - Cluster network design
   - Installation of Linux operating system
   - Configuring SSH
   - Walkthrough of rack topology and setup

Managing Hadoop cluster
   - HDFS cluster management
   - Secondary NameNode configuration
   - TaskTracker management
   - Configuring the HDFS quota
   - Configuring Fair Scheduler
   - Upgrading Hadoop
   - Deploying and managing Hadoop clusters with Ambari

Monitoring Hadoop cluster
   - Monitoring Hadoop cluster with Ganglia
   - Monitoring Hadoop cluster with Ambari
   - Monitoring Hadoop cluster with Nagios

Hadoop Cluster Performance Tuning
   - Benchmarking and profiling
   - Using compression for input and output
   - Configuring optimal map and reduce slots for the TaskTracker
   - Fine-tuning JobTracker configuration
   - Fine-tuning TaskTracker configuration
   - Tuning shuffle, merge, and sort parameters
Security Implementation
   - Kerberos security implementation
Workflow Scheduler
   - Capacity Scheduler
   - Fair Scheduler

dfsadmin & mradmin commands

Administration of HCatalog and Hive

Backup and Recovery
Scenario-based exercises
- DataNode failure and recovery
- NameNode failure and recovery
- JobTracker and TaskTracker failure and recovery
- Removing DataNodes
- Adding DataNodes
- Commissioning and decommissioning of nodes


Module 6
Pig and Pig Latin
Installation and configuration
Running Pig Latin through Grunt
Writing programs
- Filter, Load, and Store functions
Writing user defined functions

Working with Scripts
Lab Exercises
Module 7
HBase and ZooKeeper
NoSQL vs. SQL
CAP theorem
Architecture
Installation
Configuration
Java API
MR integration
Performance Tuning
Lab Exercises
Module 8
Hive
Features of Hive
Architecture
Installation and configuration
HiveQL

Lab Exercises
Module 9
Other Hadoop ecosystem components
Overview of Ambari, Oozie, and Mahout
Installing and configuring Sqoop and mysql-server
Installing and configuring Flume

Lab Exercises
Module 10
Hadoop on Cloud
Hosting Hadoop on Amazon EC2
EMR Hands-on

Monday 17 March 2014

Hadoop Administrator Course Outline


Big Data/Hadoop Administrator course content for the weekend training program. We also offer weekend, online, and fast-track training programs on Hadoop administration. If interested, please contact 9840014739.

Prerequisites: General administration experience with any RDBMS, Unix, or networking is preferable.

Duration: 5 weekends

Module 1
Big Data: Getting Started
What is Big Data?
What is Apache Hadoop?
History of Hadoop
Understanding distributed file systems and Hadoop
Hadoop ecosystem components
Hadoop use cases
Ubuntu Installation
JDK Installation
Module 2
Hadoop Distributed File System
              
Eclipse Installation
Overview of HDFS
Communication Protocols
Hadoop cluster Topology Overview
Setting up SSH for Hadoop Cluster
Running Hadoop in pseudo-distributed mode
Linux basic admin commands
HDFS file commands
Reading and writing to HDFS programmatically
Hands-on Lab Exercises
Module 3
MapReduce Framework

Java Basics
Anatomy of a MapReduce Program
Writables
InputFormat
OutputFormat
Streaming API
Inherent failure handling
Reading and writing
Hands-on Lab Exercises
Module 4
Advanced MapReduce Programming
Input splits, Record Reader, Mapper, Partition & Shuffle, Reduce, OutputFormat
Writing MapReduce program
Streaming in Hadoop
Counters
Performance Tuning
Joins
Sorting
Determining the optimal number of reducers and partitions
Hadoop cluster performance tuning
Hands-on Lab Exercises
Module 5 - Apache Hadoop Administration

Level 1  
Operating System Preparation
      Deployment Setup
      Software
      Hostname, DNS, and Identification
      Users, Groups and Privileges

Kernel Tuning
      vm.overcommit_memory
      vm.swappiness

Best Practices for Hadoop setup and infrastructure

Hadoop multi-node cluster installation preparation and configuration
   - Cluster network design
   - Installation of Linux operating system
   - Configuring SSH
   - Understanding configuration files
   - Understanding rack topology and implementation

Managing Hadoop cluster
   - HDFS cluster management
   - Secondary NameNode configuration
   - TaskTracker management
   - Configuring the HDFS quota
   - Configuring Fair Scheduler
   - Upgrading Hadoop
   - Deploying and managing Hadoop clusters with Ambari

Monitoring Hadoop cluster
   - Monitoring Hadoop cluster with Ganglia
   - Monitoring Hadoop cluster with Ambari
   - Monitoring Hadoop cluster with Nagios

Hadoop Cluster Performance Tuning
   - Benchmarking and profiling
   - Using compression for input and output
   - Configuring optimal map and reduce slots for the TaskTracker
   - Fine-tuning JobTracker configuration
   - Fine-tuning TaskTracker configuration
   - Tuning shuffle, merge, and sort parameters
Security Implementation
   - Kerberos security implementation
Workflow Scheduler
   - FIFO Scheduler configuration
   - Capacity Scheduler configuration
   - Fair Scheduler configuration

Understanding dfsadmin and mradmin commands

Administration of HCatalog and Hive

Backup and Recovery
Level 2 Cluster Maintenance
Starting and stopping Processes with Init Scripts
Starting and Stopping processes manually

HDFS maintenance tasks
- DataNode failure and recovery
- NameNode failure and recovery
- JobTracker and TaskTracker failure and recovery
- Removing DataNodes
- Adding DataNodes
- Commissioning and decommissioning of nodes
MapReduce maintenance tasks
- Shared upon registration
Level 3  Monitoring
Hadoop Metrics

Health check
- Hadoop processes
- Remaining topics shared upon request
Level 4 Backup and Recovery
Data backup
NameNode backup


Module 6
Pig and Pig Latin
Installation and configuration
Running Pig Latin through Grunt
Working with Scripts
Lab Exercises
Module 7
HBase and ZooKeeper
NoSQL vs. SQL
CAP theorem
Architecture
Installation
Configuration
Java API
Performance Tuning
Lab Exercises
Module 8
Hive
Features of Hive
Architecture
Installation and configuration
HiveQL
HCatalog & Hive Administration 
Lab Exercises
Module 9
Other Hadoop ecosystem components
Overview of Ambari, Oozie, and Mahout
Installing and configuring Sqoop and mysql-server
Installing and configuring Flume

Lab Exercises
Module 10
Hadoop on Cloud
Hadoop Certification
Hosting Hadoop on Amazon EC2
EMR Hands-on
Certification exam oriented tips specific to Hadoop distributions