The Apache Hadoop Ecosystem
Cloudera's tutorial series introduces Apache Hadoop, its components, and related projects aimed at helping systems administrators, developers, data analysts, and data scientists get the most from their data. These videos present a basic overview of the open source tools within the Hadoop ecosystem, what they do, who uses them, how they interact, and the value they can deliver to businesses and their customers.
These tools provide the core functionality to allow you to store both complex and structured data and perform sophisticated processing and analysis. This video demystifies Hadoop and explains how it works, giving you an understanding of how components fit together and build on one another to provide a scalable and powerful system.
Learn about the projects surrounding Apache Hadoop, which complete the greater ecosystem of available big data processing tools.
You know your data is BIG – you've found Apache Hadoop. Now you need to understand the implications of working at such massive scale. This video addresses common challenges and general best practices for scaling with your data.
This document describes the most important user-facing facets of the Apache Hadoop framework. MapReduce consists of client APIs for writing applications and a runtime on which to run them. There are two versions of the API, old and new, and two versions of the runtime, MRv1 and MRv2. This tutorial describes the old API and MRv1.
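To give a feel for the programming model those APIs express, the map, shuffle, and reduce phases of the classic word-count example can be sketched in plain Java. This is a toy, single-process simulation, not the Hadoop API itself (the old API lives in `org.apache.hadoop.mapred`); class and method names here are illustrative only.

```java
import java.util.*;

public class WordCountSketch {
    // Map phase: emit a (word, 1) pair for every word in one line of input.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) out.add(new AbstractMap.SimpleEntry<>(word, 1));
        }
        return out;
    }

    // Shuffle phase: group values by key, as the framework does between map and reduce.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            groups.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return groups;
    }

    // Reduce phase: sum the grouped values for each key.
    static Map<String, Integer> reduce(Map<String, List<Integer>> groups) {
        Map<String, Integer> counts = new TreeMap<>();
        groups.forEach((k, vs) -> counts.put(k, vs.stream().mapToInt(Integer::intValue).sum()));
        return counts;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : new String[] {"big data", "big clusters"}) pairs.addAll(map(line));
        System.out.println(reduce(shuffle(pairs))); // {big=2, clusters=1, data=1}
    }
}
```

In the real framework, each phase runs distributed across the cluster: mappers process blocks of HDFS data in parallel, the shuffle moves intermediate pairs over the network, and reducers write final output back to HDFS.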
Many Hadoop deployments start small, solving a single business problem, and then begin to grow as organizations find more value in their data. Moving a Hadoop deployment from the proof-of-concept phase into a full production system presents real challenges. Learn how some of the largest Hadoop clusters in the world were successfully productionized and the best practices they applied to running Hadoop.
Work at the speed of thought! This e-learning course explores Impala's features, architecture, and benefits over legacy Hadoop platforms. Learn how to run interactive queries inside Impala and understand how it optimizes data systems. This free online course includes a training module, homework, and an Impala demo VM download to experiment with this powerful new tool.
This brief introduction to HBase, Hadoop's database, explains HBase usage scenarios, how HBase compares to an RDBMS, and how it complements Hadoop.
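One way the HBase data model is commonly described is as a sparse, sorted map: row keys map to columns, which map to values, with rows kept in sorted order so that range scans by key are cheap. A rough sketch of that model in plain Java follows — this is a conceptual illustration, not the HBase client API, and the table and row names are hypothetical.

```java
import java.util.*;

public class HBaseModelSketch {
    // HBase logically stores data as a sorted map:
    //   row key -> (column -> value), rows ordered by key.
    // A TreeMap of TreeMaps approximates this; the real store also
    // versions each cell by timestamp and persists data to HDFS.
    private final NavigableMap<String, NavigableMap<String, String>> table = new TreeMap<>();

    void put(String rowKey, String column, String value) {
        table.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(column, value);
    }

    String get(String rowKey, String column) {
        NavigableMap<String, String> row = table.get(rowKey);
        return row == null ? null : row.get(column);
    }

    // Range scan over row keys -- the access pattern HBase is designed around.
    SortedMap<String, NavigableMap<String, String>> scan(String startRow, String stopRow) {
        return table.subMap(startRow, true, stopRow, false);
    }
}
```

Unlike an RDBMS, there are no joins or secondary indexes in this model; the choice of row key determines which queries are fast, which is why row-key design features so prominently in HBase schema discussions.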
Hive enables analysis of large data sets using a language very similar to standard ANSI SQL. This means anyone who can write SQL queries can access data stored on the Hadoop cluster. This tutorial introduces the functionality of Hive, as well as its various applications for data analysis and data warehousing.
Pig is a simple-to-understand data flow language used in the analysis of large data sets. Pig scripts are automatically converted into MapReduce jobs by the Pig interpreter, so you can analyze the data in a Hadoop cluster even if you aren't familiar with MapReduce. Find out more about Pig use cases, Pig Latin and the benefits of utilizing Pig.
Algorithms designed for running on MapReduce look a little different from those you've written before. Learn widely used algorithms, common idioms for designing your own, and techniques for implementing them in Java MapReduce and in scripting languages via Hadoop Streaming.
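A common example of such an idiom is local aggregation (sometimes called an in-mapper combiner): rather than emitting one (word, 1) pair per occurrence, the mapper accumulates partial counts in memory and emits one pair per distinct word, shrinking the intermediate data that must be shuffled across the network. The sketch below shows the idea in plain Java, detached from the Hadoop API; the method name is illustrative only.

```java
import java.util.*;

public class InMapperCombiner {
    // Idiom: aggregate counts in memory inside the mapper and emit one
    // (word, partialCount) pair per distinct word, instead of one
    // (word, 1) pair per occurrence. This reduces the volume of
    // intermediate data shuffled over the network -- often the
    // bottleneck at scale.
    static Map<String, Integer> mapWithLocalAggregation(String text) {
        Map<String, Integer> partial = new HashMap<>();
        for (String word : text.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) partial.merge(word, 1, Integer::sum);
        }
        return partial; // one emitted pair per distinct word
    }
}
```

The trade-off is mapper memory: the in-memory map must fit the number of distinct keys seen by one task, which is why the technique suits skewed or low-cardinality key spaces best.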
Learn how to get started writing programs against Hadoop's API.
- Dive deeper into what big data analytics can do for your business by enrolling in a Cloudera Essentials for Apache Hadoop live training near you.
- Learn how Cloudera Manager can increase the performance and decrease the cost of your Apache Hadoop cluster in production.
- Watch Cloudera CEO Mike Olson talk about the big questions society needs to answer at Strata Conference + Hadoop World 2012.
- If you're an administrator, developer, HBase specialist, analyst, or aspiring data scientist, Cloudera offers training and certification to meet your needs.