Developer for Spark and Hadoop - Blended Learning
This training course delivers the key concepts and expertise developers need to use Apache Spark to develop high-performance parallel applications. Participants will learn how to use Spark SQL to query structured data and Spark Streaming to perform real-time processing on streaming data from a variety of sources. Developers will also practice writing applications that use core Spark to perform ETL processing and iterative algorithms. The course covers how to work with “big data” stored in a distributed file system and how to execute Spark applications on a Hadoop cluster. After taking this course, participants will be prepared to face real-world challenges and build applications that support faster decisions, better decisions, and interactive analysis across a wide variety of use cases, architectures, and industries.
Participants enrolled in blended learning will be provided with:
- OnDemand access to Developer Training for Spark and Hadoop.
- Twenty (20) hours of cloud-based lab access.
- Five (5) three-hour live virtual sessions with a senior Cloudera instructor.
Access to self-paced learning, labs, and materials will be available from one week before the first live session through the Friday following the final live session. Live virtual sessions will focus on demonstrating labs and covering select topics from the weekly lessons. The live sessions allow time for students to ask questions, but assume participants have already completed the OnDemand lessons for that week.
This course is designed for developers and engineers who have programming experience; prior knowledge of Spark and Hadoop is not required. Apache Spark examples and hands-on exercises are presented in Scala and Python, and the ability to program in one of those languages is required. Basic familiarity with the Linux command line is assumed. Basic knowledge of SQL is helpful.