Cloudera Training for Apache Hive and Pig


Course Summary

Hive and Pig are higher-level abstractions on top of MapReduce that allow people without Java programming knowledge to manage and manipulate data in a Hadoop cluster.

Cloudera's two-day course on Apache Hive and Pig is designed for people who have a basic understanding of how Hadoop works and want to use these languages to analyze their data.

Duration

2 days.

You Will Learn

  • How Hive augments MapReduce
  • How to create and manipulate tables using Hive
  • Hive's basic and advanced data types
  • Partitioning and bucketing data with Hive
  • Advanced features of Hive
  • How to load and manipulate data using Pig
  • Features of the Pig Latin programming language
  • Solving real-world problems with Pig

Audience

Hive makes Hadoop accessible to users who already know SQL; Pig is similar to popular scripting languages. Students should have basic familiarity with SQL and/or a scripting language.

Additional Notes

Hands-On Exercises

In this training course we alternate between instructional sessions and hands-on labs to ensure participants leave ready to import and analyze their own data with Hive and Pig.

Outline

An Introduction to Hive

  • What is Hadoop?
  • Motivation for Hive

Getting Data Into Hive

  • The Hive Architecture
  • Creating Hive Tables
  • Loading Data into Hive
  • Creating Different Databases
  • Hands-on exercise

Manipulating Data with Hive

  • Retrieving Data with the SELECT Statement
  • Joining Tables
  • Storing Query Results in HDFS
  • Basic Hive Functions
  • Hands-on exercise

Partitioning and Bucketing Data

  • Partitioning Data
  • Bucketing Data
  • Hands-on exercise

Advanced Hive Features

  • More Advanced HiveQL Tables
  • Hive Variables
  • Creating User-Defined Functions
  • Debugging and Troubleshooting Hive Queries

Hive Best Practices

  • Configuring a Shared Metastore
  • Handling Dates
  • Dealing with SerDes

Reading and Writing Data with Pig

  • Loading data
  • Pig Schemas
  • Writing data
  • Hands-On Exercise

Pig Latin In Depth

  • FILTERing data
  • Grouping and Sorting Data
  • Pig Expressions and Functions
  • Joining Multiple Datasets
  • Validating Datasets
  • Advanced topics such as COGROUP and STREAM
  • Hands-On Exercise

Debugging Pig Scripts

  • Strategies for debugging Pig programs
  • Handling bad data
  • Using ILLUSTRATE

Best Practices for Pig

  • General best practices
  • Achieving Optimal Pig Performance in Production

Hive and Pig: Bringing It Together

  • When to use Hive?
  • When to use Pig?

Training Schedule

Location                        May 2012   Jun 2012          Jul 2012        Aug 2012
Seaport Conference Center                  Jun 25 - Jun 26
BroadenGate - Shanghai, China              Jun 28 - Jun 29
BroadenGate - Shenzhen, China              Jun 14 - Jun 15
Learning Tree - London                                       Jul 4 - Jul 5