Introduction to Data Science: Building Recommender Systems

Summary

This hands-on course is suitable for software engineers, data analysts, and statisticians. It is problem-driven and helps participants understand what data scientists do, the problems they solve, and the methods they use. The course takes a practical approach to the subject and includes multiple hands-on exercises, so participants leave with skills they can immediately apply to real-world problems.

Description

Download the full agenda for Cloudera's Introduction to Data Science.

Read the blog post: Training a New Generation of Data Scientists.

Data Science Webinar

Watch the on-demand webinar, Training a New Generation of Data Scientists, to learn what data scientists do, how they think about problems, how data science relates to Hadoop, and how Cloudera training can help you join this growing and increasingly important profession. The webinar ends with a Q&A session with Josh Wills, Cloudera's Senior Director of Data Science.

Duration

3 Days

You Will Learn

  • Describe the role and responsibilities of a data scientist
  • Explain several ways in which data scientists create value for organizations across many industries
  • Locate and acquire data from diverse sources
  • Use transformation and normalization techniques to produce accurate, useful data sets
  • Determine the most appropriate type of analysis to perform for a given problem
  • Implement an automated recommendation system
  • Develop, evaluate and refine scoring systems for recommenders
  • Understand the considerations involved in working at scale
  • Identify meaningful, actionable and business-oriented results from the analysis

Prerequisites

This course is suitable for software engineers, data analysts, and statisticians with basic knowledge of Apache Hadoop (HDFS, MapReduce, and Hadoop Streaming) and Apache Hive. Students should be proficient in a scripting language. Python is strongly preferred, but familiarity with Perl or Ruby is sufficient.

Certification Exam

After successfully completing the training class, attendees receive a Data Science Essentials practice test. Together, Cloudera training and the practice test are the best resources for preparing for the certification exam. The Cloudera Certified Professional: Data Scientist (CCP:DS) certification has two parts: the Data Science Essentials exam and the Data Science Challenge. Participants are encouraged to prepare for and take both parts. Learn more about CCP:DS.

Outline

  • Introduction
  • Data Science Overview
  • Use Cases
  • Project Lifecycle
  • Data Acquisition
  • Evaluating Input Data
  • Data Transformation
  • Data Analysis and Statistical Methods
  • Fundamentals of Machine Learning
  • Recommender Overview
  • Introduction to Apache Mahout
  • Implementing Recommenders with Apache Mahout
  • Experimentation and Evaluation
  • Production Deployment and Beyond
  • Conclusion
  • Appendix A: Hadoop Overview
  • Appendix B: Mathematical Formulas
  • Appendix C: Language and Tool Reference