Course Prerequisites

Cloudera University Courses

Administrator: This course is best suited to systems administrators and IT managers who have basic Linux experience. Prior knowledge of Apache Hadoop is not required.

Big Data Architecture Workshop: Cloudera highly recommends that participants attend the Cloudera Developer for Spark and Hadoop training course before attending this workshop. Participants should have working knowledge of technologies such as HDFS, Spark, MapReduce, Hive/Impala, data formats, and relational database management systems. Detailed API-level knowledge is not needed, as there will not be any programming activities.

Data Science Workbench Training: This course is designed for learners at organizations using CDSW under a Cloudera Enterprise license or a trial license. The learner must have access to a CDSW environment on a Cloudera cluster running Apache Spark 2. Some experience with data science using Python or R is helpful but not required. No prior knowledge of Spark or other Hadoop ecosystem tools is required.

Data Analyst: This course is designed for data analysts, business intelligence specialists, developers, system architects, and database administrators. Knowledge of SQL is assumed, as is basic Linux command-line familiarity. Knowledge of at least one scripting language (e.g., Bash scripting, Perl, Python, Ruby) would be helpful but is not essential. Prior knowledge of Apache Hadoop is not required.

HBase: This course is best suited to developers and administrators who have experience with databases and data modeling, although such experience is not required. Prior knowledge of Apache Hadoop is not required.

Data Scientist Training: Workshop participants should have a basic understanding of Python or R and some experience exploring and analyzing data and developing statistical or machine learning models. Knowledge of Hadoop or Spark is not required.

Developer for Spark & Hadoop: This course is designed for developers and engineers who have programming experience. Apache Spark examples and hands-on exercises are presented in Scala and Python, so the ability to program in one of those languages is required. Basic knowledge of SQL is helpful; prior knowledge of Hadoop is not required.

Search: This course is intended for developers and data engineers with at least basic familiarity with Hadoop and experience programming in a general-purpose language such as Java, C, C++, Perl, or Python. Participants should be comfortable with the Linux command line and should be able to perform basic tasks such as creating and removing directories, viewing and changing file permissions, executing scripts, and examining file output. No prior experience with Apache Solr or Cloudera Search is required, nor is any experience with HBase or SQL.

Just Enough Python: This course is intended for developers who do not yet have the prerequisite skills writing code in Scala. Basic programming experience in at least one commonly used programming language (ideally Java, but Ruby, Perl, Scala, C, C++, PHP, or JavaScript will suffice) is assumed.

Just Enough Scala: This course is best suited to students with Java programming experience. Those with experience in another language may prefer the Just Enough Python course. Basic knowledge of Linux is assumed.

Introduction to Machine Learning: This course is intended for software engineers who have basic Linux experience in addition to experience with either the Scala or Python programming languages (code examples and exercises are presented in both languages, so students can choose whichever language they prefer). Prior knowledge of Apache Spark is required, so it is expected that students have taken the relevant foundational material from our Developer Training for Spark and Hadoop.

Data Science at Scale: This course is best suited to developers, data analysts, and statisticians with basic knowledge of Apache Hadoop: HDFS, MapReduce, Hadoop Streaming, and Apache Hive. Students should have proficiency in a scripting language; Python is strongly preferred, but familiarity with Perl or Ruby is sufficient.

NOTE: Data Science Workbench Training and Data Science at Scale are offered as OnDemand only.

Services Enablement Training (Cloudera Connect Partners)

Services Enablement Boot Camp (PS): Prior to attending, participants must be able to install and configure a MySQL server, create and manage AWS EC2 instances, use git and GitHub, tune Linux settings, mount disk volumes, recover failed services, adjust network parameters, and administer YUM repos and Linux packages. Participants must have attended the Cloudera Administrator for Apache Hadoop course AND passed the CCA Cloudera Administrator Certification Exam before attending. Please refer to the Cloudera Connect Partner Portal for more information.

Application Architecture Boot Camp (PS): The course focuses on application development. Students must have attended Cloudera’s Developer for Spark and Hadoop course AND passed the CCA Developer certification before attending.

NOTE: The CCA Administrator certification is NOT suitable for this course.

To view course setup requirements, please click here.

If you have any questions, please contact