Cloudera Developer Training for Apache Hadoop


Course Summary

This four-day training course is for developers who want to learn to use Apache Hadoop to build powerful data processing applications.

Duration

4 days.

You Will Learn

  • How MapReduce and the Hadoop Distributed File System work
  • How to write MapReduce code in Java or other programming languages
  • What issues to consider when developing MapReduce jobs
  • How to implement common algorithms in Hadoop
  • Best practices for Hadoop development and debugging
  • How to leverage other projects such as Apache Hive, Apache Pig, Sqoop, and Oozie
  • Advanced Hadoop API topics required for real-world data analysis

Prerequisites

This course is designed for developers with some programming experience (preferably Java). Existing knowledge of Hadoop is not required.

Additional Notes


Hands-On Exercises

Throughout the course, students write Hadoop code and complete other hands-on exercises to solidify their understanding of the concepts presented.

Certification Exam

Following the training, attendees receive a voucher for one attempt at the Cloudera Certified Developer for Apache Hadoop (CCDH) certification exam. Learn more about the CCDH certification exam at http://university.cloudera.com/certification.html

Outline

Introduction

The Motivation For Hadoop

  • Problems with traditional large-scale systems
  • Requirements for a new approach

Hadoop: Basic Concepts

  • An Overview of Hadoop
  • The Hadoop Distributed File System
  • Hands-On Exercise
  • How MapReduce Works
  • Hands-On Exercise
  • Anatomy of a Hadoop Cluster
  • Other Hadoop Ecosystem Components

Writing a MapReduce Program

  • The MapReduce Flow
  • Examining a Sample MapReduce Program
  • Basic MapReduce API Concepts
  • The Driver Code
  • The Mapper
  • The Reducer
  • Hadoop’s Streaming API
  • Using Eclipse for Rapid Development
  • Hands-On Exercise
  • The New MapReduce API

Integrating Hadoop Into The Workflow

  • Relational Database Management Systems
  • Storage Systems
  • Importing Data from RDBMSs With Sqoop
  • Hands-On Exercise
  • Importing Real-Time Data with Flume
  • Accessing HDFS Using FuseDFS and Hoop

Delving Deeper Into The Hadoop API

  • More about ToolRunner
  • Testing with MRUnit
  • Reducing Intermediate Data With Combiners
  • The configure and close Methods for Mapper and Reducer Setup and Teardown
  • Writing Partitioners for Better Load Balancing
  • Hands-On Exercise
  • Directly Accessing HDFS
  • Using the Distributed Cache
  • Hands-On Exercise

Common MapReduce Algorithms

  • Sorting and Searching
  • Indexing
  • Machine Learning With Mahout
  • Term Frequency – Inverse Document Frequency
  • Word Co-Occurrence
  • Hands-On Exercise

Using Hive and Pig

  • Hive Basics
  • Pig Basics
  • Hands-On Exercise

Practical Development Tips and Techniques

  • Debugging MapReduce Code
  • Using LocalJobRunner Mode For Easier Debugging
  • Retrieving Job Information with Counters
  • Logging
  • Splittable File Formats
  • Determining the Optimal Number of Reducers
  • Map-Only MapReduce Jobs
  • Hands-On Exercise

More Advanced MapReduce Programming

  • Custom Writables and WritableComparables
  • Saving Binary Data using SequenceFiles and Avro Files
  • Creating InputFormats and OutputFormats
  • Hands-On Exercise

Joining Data Sets in MapReduce

  • Map-Side Joins
  • The Secondary Sort
  • Reduce-Side Joins

Graph Manipulation in Hadoop

  • Introduction to graph techniques
  • Representing graphs in Hadoop
  • Implementing a sample algorithm: Single Source Shortest Path

Creating Workflows With Oozie

  • The Motivation for Oozie
  • Oozie’s Workflow Definition Format
  • Hands-On Exercise

Training Schedule

Sessions scheduled from May through August 2012, by location:

  • ExitCertified - Edmonton: Jun 11 - Jun 14
  • ExitCertified - Phoenix: Aug 13 - Aug 16
  • ExitCertified - Vancouver: Jun 11 - Jun 14
  • New Horizons - Boston: Jun 26 - Jun 29
  • OSSCube - Dubai: Jul 16 - Jul 19
  • OSSCube - Kuala Lumpur: Aug 14 - Aug 17
  • OSSCube - Moscow: Jun 18 - Jun 21
  • Obsidian - South Africa: Jun 11 - Jun 14
  • MicroTek - Los Angeles: Jul 9 - Jul 12
  • Seaport Conference Center: Jun 18 - Jun 21; Jul 17 - Jul 20; Aug 6 - Aug 9
  • ExitCertified - Sacramento: Aug 13 - Aug 16
  • MicroTek - Washington DC: Jul 24 - Jul 27
  • MicroTek - Chicago: Jul 9 - Jul 12
  • Bridge Education - Columbia: Jun 4 - Jun 7; Jun 19 - Jun 22; Jul 23 - Jul 26; Aug 21 - Aug 24
  • MicroTek - NYC: Jul 30 - Aug 2
  • Nextec - Herndon: Aug 13 - Aug 16
  • Training Choice - Sydney: Jul 24 - Jul 27
  • HTI - Brazil: Jul 10 - Jul 13
  • ExitCertified - Ottawa: Jul 9 - Jul 12
  • ExitCertified - Toronto: Jul 9 - Jul 12
  • ExitCertified - Montreal: Jul 9 - Jul 12
  • BroadenGate - Shanghai, China: Jun 18 - Jun 21
  • BroadenGate - Shenzhen, China: Jun 4 - Jun 7
  • Xebia - Paris: Jul 9 - Jul 12
  • OSSCube Solutions - Bangalore: Jun 26 - Jun 29
  • Xebia - Netherlands: Jun 12 - Jun 15
  • Learning Tree - London: Jun 25 - Jun 28