Cloudera Administrator Training for Apache Hadoop
Course Summary
This three-day hands-on training course is for system administrators and others responsible for managing Apache Hadoop clusters in production or development environments.
Duration
3 days.
You Will Learn
- How the Hadoop Distributed File System and MapReduce work
- What hardware configurations are optimal for Hadoop clusters
- What network considerations to take into account when building out your cluster
- How to configure Hadoop's options for best cluster performance
- How to configure the FairScheduler to provide service-level agreements for multiple users of a cluster
- How to maintain and monitor your cluster
- How to load data into the cluster from dynamically generated files using Flume, and from relational database management systems using Sqoop
- What system administration issues exist with other Hadoop projects such as Hive, Pig, and HBase
Prerequisites
This course is designed for people with at least a basic level of Linux system administration experience. Prior knowledge of Hadoop is not required.
Additional Notes
Download the full agenda for Cloudera's Administrator Training for Apache Hadoop.
Hands-On Exercises
Throughout the course, hands-on labs help students build their knowledge and apply the concepts being discussed.
Certification Exam
Following the training, attendees will be given a voucher good for one certification exam attempt to become a Cloudera Certified Administrator for Apache Hadoop (CCAH). Learn more about the CCAH Certification Exam here: http://university.cloudera.com/certification.html/
Outline
Introduction
An Introduction To Hadoop And HDFS
- Why Hadoop?
- HDFS
- MapReduce
- Hive, Pig, HBase, and Other Ecosystem Projects
- Hands-On Exercise
Planning Your Hadoop Cluster
- General Planning Considerations
- Choosing The Right Hardware
- Network Considerations
- Configuring Nodes
Configuring and Deploying Your Cluster
- Deployment Types
- Installing Hadoop
- Using Cloudera Manager for Easy Installation
- Typical Configuration Parameters
- Configuring Rack Awareness
- Using Configuration Management Tools
- Hands-On Exercise
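Rack awareness is typically configured by pointing Hadoop at an administrator-supplied topology script (the `topology.script.file.name` property in Hadoop 1.x, renamed `net.topology.script.file.name` in later releases). Hadoop invokes the script with one or more IP addresses or hostnames as arguments and reads back one rack path per host from standard output. The following is a minimal sketch, not course material; the subnet-to-rack table, the rack names, and the helper function names are hypothetical placeholders for your own network layout.

```python
#!/usr/bin/env python3
"""Hypothetical rack-awareness topology script (a sketch, not an official tool)."""
import sys

# Hypothetical mapping from IP subnet prefix to rack path; adapt to your network.
# Each prefix ends with "." so that "10.1.1." does not also match "10.1.10.x".
SUBNET_TO_RACK = {
    "10.1.1.": "/dc1/rack1",
    "10.1.2.": "/dc1/rack2",
}

# Rack path returned for any host not covered by the table above.
DEFAULT_RACK = "/default-rack"


def rack_for(host: str) -> str:
    """Return the rack path for one IP address, or the default rack if unknown."""
    for prefix, rack in SUBNET_TO_RACK.items():
        if host.startswith(prefix):
            return rack
    return DEFAULT_RACK


def resolve(hosts: list[str]) -> list[str]:
    """Map each host argument to a rack path, preserving the input order."""
    return [rack_for(h) for h in hosts]


if __name__ == "__main__":
    # Hadoop passes hosts as command-line arguments and reads the
    # whitespace-separated rack paths, in the same order, from stdout.
    print(" ".join(resolve(sys.argv[1:])))
```

Keeping the lookup in a plain table makes the script easy to regenerate from an inventory system, and returning a default rack for unknown hosts avoids failing on a newly added node.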
Managing and Scheduling Jobs
- Managing Running Jobs
- Hands-On Exercise
- The FIFO Scheduler
- The FairScheduler
- Configuring the FairScheduler
- Hands-On Exercise
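The central idea behind the FairScheduler's fair shares is that a pool needing less than an equal share keeps only what it needs, and the unused capacity is divided among the remaining pools (max-min fairness). The sketch below illustrates that idea only; it assumes equal pool weights and a single pool of task slots, and it ignores the minimum shares, weights, and preemption that the real scheduler supports. The function name and its inputs are hypothetical.

```python
def max_min_fair_shares(capacity: int, demands: dict[str, int]) -> dict[str, float]:
    """Split `capacity` slots among pools using max-min fairness (equal weights).

    Pools that demand less than an equal slice receive exactly their demand;
    the leftover capacity is re-divided among the pools that still want more.
    """
    shares = {pool: 0.0 for pool in demands}
    remaining = float(capacity)
    active = {pool for pool, demand in demands.items() if demand > 0}

    while active and remaining > 1e-9:
        # Offer every still-hungry pool an equal slice of what is left.
        slice_ = remaining / len(active)
        satisfied = set()
        for pool in sorted(active):
            need = demands[pool] - shares[pool]
            if need <= slice_:
                # The pool's demand fits in its slice: grant it fully.
                shares[pool] = float(demands[pool])
                remaining -= need
                satisfied.add(pool)
            else:
                # The pool wants more than a slice: grant one slice.
                shares[pool] += slice_
                remaining -= slice_
        active -= satisfied
        if not satisfied:
            # Every active pool took a full slice, so capacity is exhausted.
            break
    return shares
```

For example, with 100 slots and pool demands of 10, 50, and 60, the first pool receives its 10 slots and the other two pools receive 45 each.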
Cluster Maintenance
- Checking HDFS Status
- Hands-On Exercise
- Copying Data Between Clusters
- Adding and Removing Cluster Nodes
- Rebalancing the Cluster
- Hands-On Exercise
- NameNode Metadata Backup
- Cluster Upgrading
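When rebalancing, the HDFS balancer (started with `hadoop balancer -threshold N` on Hadoop 1.x) moves blocks until every DataNode's disk utilization is within N percentage points of the cluster-wide average utilization; the default threshold is 10. The sketch below shows only that classification step, assuming per-node used and total capacity are already known. The function name and its input format are hypothetical, and the code does not move any data.

```python
def classify_datanodes(
    nodes: dict[str, tuple[float, float]], threshold_pct: float = 10.0
) -> dict[str, str]:
    """Label each DataNode as "over", "under", or "balanced".

    `nodes` maps a node name to (used_bytes, capacity_bytes). A node is
    balanced when its utilization lies within `threshold_pct` percentage
    points of the cluster-wide average utilization.
    """
    total_used = sum(used for used, _ in nodes.values())
    total_capacity = sum(cap for _, cap in nodes.values())
    # Cluster average is total used over total capacity, not a mean of ratios.
    average_pct = 100.0 * total_used / total_capacity

    labels = {}
    for name, (used, cap) in nodes.items():
        utilization_pct = 100.0 * used / cap
        if utilization_pct > average_pct + threshold_pct:
            labels[name] = "over"
        elif utilization_pct < average_pct - threshold_pct:
            labels[name] = "under"
        else:
            labels[name] = "balanced"
    return labels
```

A newly added, empty DataNode typically shows up as "under" in this classification, which is why rebalancing is usually run after adding nodes.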
Cluster Monitoring and Troubleshooting
- General System Monitoring
- Managing Hadoop’s Log Files
- Using the NameNode and JobTracker Web UI
- Hands-On Exercise
- Cluster Monitoring with Ganglia
- Common Troubleshooting Issues
- Benchmarking Your Cluster
Populating HDFS From External Sources
- An Overview of Flume
- Hands-On Exercise
- An Overview of Sqoop
- Best Practices for Importing Data
Installing And Managing Other Hadoop Projects
- Hive
- Pig
- HBase
Conclusion
