The Apache Hadoop Ecosystem

Cloudera's tutorial series introduces Apache Hadoop, its components, and related projects. It is aimed at helping systems administrators, developers, data analysts, and data scientists get the most from their data. These videos present a basic overview of the open source tools within the Hadoop ecosystem: what they do, who uses them, how they interact, and the value they can deliver to businesses and their customers.

Introduction to MapReduce and HDFS

MapReduce and HDFS provide Hadoop's core functionality: they let you store both complex and structured data and run sophisticated processing and analysis on it. This video demystifies Hadoop and explains how it works, giving you an understanding of how the components fit together and build on one another to provide a scalable and powerful system.

Apache Hadoop Ecosystem

Learn about the projects surrounding Apache Hadoop, which complete the greater ecosystem of available big data processing tools.

Thinking at Scale

You know your data is BIG – you found Apache Hadoop. Now you need to understand the implications of working at such massive scale. This video addresses common challenges and general best practices for scaling with your data.

Hadoop Tutorial

This document describes the most important user-facing facets of the Apache Hadoop framework. MapReduce consists of client APIs for writing applications and a runtime on which to run them. There are two versions of the API (old and new) and two versions of the runtime (MRv1 and MRv2). This tutorial describes the old API and MRv1.

Productionizing Hadoop

Many Hadoop deployments start small, solving a single business problem, and then grow as organizations find more value in their data. Moving a Hadoop deployment from the proof-of-concept phase into a full production system presents real challenges. Learn how some of the largest Hadoop clusters in the world were successfully productionized and the best practices they applied to running Hadoop.

An Introduction to Impala

Work at the speed of thought! This e-learning course explores Impala's features, architecture, and benefits over legacy Hadoop platforms. Learn how to run interactive queries inside Impala and understand how it optimizes data systems. This free online course includes a training module, homework, and an Impala demo VM download to experiment with this powerful new tool.

Introduction to Apache HBase

This brief introduction to HBase, Hadoop's database, explains HBase usage scenarios, how HBase compares to an RDBMS, and how HBase complements Hadoop.

Introduction to Apache Hive

Hive enables analysis of large data sets using a language very similar to standard ANSI SQL. This means anyone who can write SQL queries can access data stored on the Hadoop cluster. This tutorial introduces the functionality of Hive, as well as its various applications for data analysis and data warehousing.

Introduction to Apache Pig

Pig is a simple-to-understand data flow language used in the analysis of large data sets. Pig scripts are automatically converted into MapReduce jobs by the Pig interpreter, so you can analyze the data in a Hadoop cluster even if you aren't familiar with MapReduce. Find out more about Pig use cases, Pig Latin, and the benefits of using Pig.

MapReduce Algorithms

Algorithms designed for running on MapReduce look a little different from those you've written before. Learn widely used algorithms, common idioms for designing your own, and techniques for implementing them in Java MapReduce and in scripting languages via Hadoop Streaming.
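
To give a feel for the shape such algorithms take, here is a minimal word-count sketch written in the style of a Hadoop Streaming job, where the mapper and reducer exchange tab-separated key/value text records. It is an illustration only and is not taken from the tutorial: the function names (mapper, reducer, run_locally) are hypothetical, and the sort step merely stands in for the shuffle that the Hadoop runtime performs between the map and reduce phases.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' record per word, as a streaming mapper would."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_records):
    """Sum the counts for each key; records must arrive sorted by key."""
    parsed = (record.split("\t", 1) for record in sorted_records)
    for word, group in groupby(parsed, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_locally(lines):
    """Simulate map -> sort (stand-in for the shuffle) -> reduce in one process."""
    return list(reducer(sorted(mapper(lines))))


# Hypothetical sample input, used only to exercise the sketch.
sample = ["the quick brown fox", "the lazy dog", "The end"]
for record in run_locally(sample):
    print(record)
```

In an actual streaming job the mapper and reducer would typically be separate scripts that read from standard input and write to standard output; keeping them as plain functions here makes the data flow easy to test on a single machine.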

Programming with Hadoop

Learn how to get started writing programs against Hadoop's API.


Next Steps