Cloudera Training for Apache Hive and Pig
Course Summary
Hive and Pig are higher-level abstractions on top of MapReduce that allow those without Java programming knowledge to manage and manipulate data in a Hadoop cluster.
Cloudera's two-day course on Apache Hive and Pig is designed for people who have a basic understanding of how Hadoop works and want to use these languages to analyze their data.
Duration
2 days.
You Will Learn
- How Hive augments MapReduce
- How to create and manipulate tables using Hive
- Hive's basic and advanced data types
- Partitioning and bucketing data with Hive
- Advanced features of Hive
- How to load and manipulate data using Pig
- Features of the Pig Latin programming language
- Solving real-world problems with Pig
Audience
Additional Notes
Hands-On Exercises
In this training course, we alternate between instructional sessions and hands-on labs to ensure participants leave ready to import and analyze their own data with Hive and Pig.
Outline
An Introduction to Hive
- What is Hadoop?
- Motivation for Hive
Getting Data Into Hive
- The Hive Architecture
- Creating Hive Tables
- Loading Data into Hive
- Creating Different Databases
- Hands-On Exercise
Manipulating Data with Hive
- Retrieving Data with the SELECT Statement
- Joining Tables
- Storing Query Results in HDFS
- Basic Hive Functions
- Hands-On Exercise
Partitioning and Bucketing Data
- Partitioning Data
- Bucketing Data
- Hands-On Exercise
Advanced Hive Features
- More Advanced HiveQL Tables
- Hive Variables
- Creating User-Defined Functions
- Debugging and Troubleshooting Hive Queries
Hive Best Practices
- Configuring a Shared Metastore
- Handling Dates
- Dealing with SerDes
Reading and Writing Data with Pig
- Loading Data
- Pig Schemas
- Writing Data
- Hands-On Exercise
Pig Latin In Depth
- FILTERing Data
- Grouping and Sorting Data
- Pig Expressions and Functions
- Joining Multiple Datasets
- Validating Datasets
- Advanced Topics Such as COGROUP and STREAM
- Hands-On Exercise
Debugging Pig Scripts
- Strategies for Debugging Pig Programs
- Handling Bad Data
- Using ILLUSTRATE
Best Practices for Pig
- General Best Practices
- Achieving Optimal Pig Performance in Production
Hive and Pig: Bringing It Together
- When to use Hive?
- When to use Pig?
Training Schedule
| Location | May 2012 | Jun 2012 | Jul 2012 | Aug 2012 |
|---|---|---|---|---|
| Seaport Conference Center | | Jun 25 - Jun 26 | | |
| BroadenGate - Shanghai, China | | Jun 28 - Jun 29 | | |
| BroadenGate - Shenzhen, China | | Jun 14 - Jun 15 | | |
| Learning Tree - London | | | Jul 4 - Jul 5 | |
