Audience: This course is designed for architects, administrators, and developers. Recent
college graduates are also welcome to attend.
Training mode: We offer face-to-face classroom training, online training, and fast-track programs.
Prerequisites: No prior Java or administration experience is required. Basic knowledge of writing SQL
scripts is sufficient.
For training specific to Hadoop administration, refer to the Hadoop admin course content on the right side.
Attend once and play any role you wish!
Module 1
Getting Started with Big Data
|
What is Big Data?
What is Apache Hadoop?
History of Hadoop
Understanding distributed file systems and Hadoop
Hadoop ecosystem components
Hadoop use cases
Ubuntu Installation
JDK Installation
|
Module 2
Hadoop Distributed File System
|
Eclipse Installation
Overview of HDFS
Communication Protocols
Rack Awareness
Hadoop cluster Topology
Setting up SSH for Hadoop Cluster
Running Hadoop: pseudo-distributed mode
Linux basic commands
HDFS file commands
Reading and writing to HDFS programmatically
Hands-on Lab Exercises
|
Module 3
MapReduce Framework
|
Java Basics
Anatomy of a MapReduce Program
Writables
InputFormat
OutputFormat
Streaming API
Inherent failure handling
Reading and writing
Hands-on Lab Exercises
|
Module 4
Advanced MapReduce Programming
|
Input splits, Record Reader, Mapper, Partition & Shuffle, Reduce, OutputFormat
Writing MapReduce program
Streaming in Hadoop
Counters
Performance Tuning
Joins
Sorting
Determining Optimal number of reducers, partitions
Hadoop cluster – Performance tuning
Hands-on Lab Exercises
|
Module 5
Apache Hadoop Administration
|
Best Practices for Hadoop setup and infrastructure
Hadoop cluster Installation preparation
- Cluster network design
- Installation of Linux operating system
- Configuring SSH
- Walkthrough on rack topology and setup
Managing Hadoop cluster
- HDFS cluster management
- Secondary NameNode configuration
- TaskTracker management
- Configuring the HDFS quota
- Configuring Fair Scheduler
- Upgrading Hadoop
- Deploying and managing Hadoop clusters with Ambari
Monitoring Hadoop cluster
- Monitoring Hadoop cluster with Ganglia
- Monitoring Hadoop cluster with Ambari
- Monitoring Hadoop cluster with Nagios
Hadoop Cluster Performance Tuning
- Benchmarking and profiling
- Using compression for input and output
- Configuring optimal map and reduce slots for the TaskTracker
- Fine-tuning JobTracker configuration
- Fine-tuning TaskTracker configuration
- Tuning shuffle, merge and sort parameters
Security Implementation
Kerberos security Implementation
Workflow Scheduler
Capacity Scheduler
Fair Scheduler
dfsadmin & mradmin commands
Administration of HCatalog and Hive
Backup and Recovery
Scenario based exercises
- DataNode failure & recovery
- NameNode failure & recovery
- JobTracker & TaskTracker failure & recovery
- Removing DataNodes
- Adding DataNodes
- Commissioning and decommissioning of nodes
|
Module 6
Pig and Pig Latin
|
Installation and configuration
Running Pig Latin through Grunt
Writing programs
- Filter, Load & Store functions
Writing user defined functions
Working with Scripts
Lab Exercises
|
Module 7
HBase and ZooKeeper
|
NoSQL vs. SQL
CAP theorem
Architecture
Installation
Configuration
Java API
MapReduce integration
Performance Tuning
Lab Exercises
|
Module 8
Hive
|
Features of Hive
Architecture
Installation and configuration
HiveQL
Lab Exercises
|
Module 9
Other Hadoop ecosystem components
|
Overview of Ambari, Oozie, Mahout
Installing & configuring Sqoop, mysql-server
Installing & configuring Flume
Lab Exercises
|
Module 10
Hadoop on Cloud
|
Hosting Hadoop on Amazon EC2
EMR Hands-on
|