Introduction to Data Science: Building Recommender Systems


Course Summary

This hands-on course is suitable for software engineers, data analysts and statisticians. It is problem-driven and focuses on helping participants understand what a data scientist does, the problems they typically solve and their approach to doing so. By taking a practical approach to the subject, including multiple hands-on exercises, participants will leave the course with skills they can immediately apply to real-world problems.

Download the full agenda for Cloudera's Introduction to Data Science.

Read the blog post: Training a New Generation of Data Scientists.

Data Science Webinar

Watch the on-demand webinar, Training a New Generation of Data Scientists, to learn what data scientists do, how they think about problems, how data science relates to Hadoop, and how Cloudera training can help you join this growing and increasingly important profession. The webinar is followed by a Q&A with Josh Wills, Cloudera Senior Director of Data Science.

Duration

3 days.

You Will Learn

  • Describe the role and responsibilities of a data scientist
  • Explain several ways in which data scientists create value for organizations across many industries
  • Locate and acquire data from diverse sources
  • Use transformation and normalization techniques to produce accurate, useful data sets
  • Determine the most appropriate type of analysis to perform for a given problem
  • Implement an automated recommendation system
  • Develop, evaluate and refine scoring systems for recommenders
  • Understand the considerations involved in working at scale
  • Identify meaningful, actionable and business-oriented results from the analysis

Prerequisites

This course is suitable for software engineers, data analysts and statisticians with basic knowledge of Apache Hadoop, including HDFS, MapReduce, Hadoop Streaming and Apache Hive. Students should be proficient in a scripting language: Python is strongly preferred, but familiarity with Perl or Ruby is sufficient.

Certification Exam

Following successful completion of the training class, attendees will be given a voucher for one attempt at the written certification exam. This voucher is non-transferable and is given only to individuals who successfully complete the entire training class. Participants are also encouraged to prepare for and take the Data Scientist hands-on lab exam. Both exams will be available soon.

Outline

  • Introduction
  • Data Science Overview
  • Use Cases
  • Project Lifecycle
  • Data Acquisition
  • Evaluating Input Data
  • Data Transformation
  • Data Analysis and Statistical Methods
  • Fundamentals of Machine Learning
  • Recommender Overview
  • Introduction to Apache Mahout
  • Implementing Recommenders with Apache Mahout
  • Experimentation and Evaluation
  • Production Deployment and Beyond
  • Conclusion
  • Appendix A: Hadoop Overview
  • Appendix B: Mathematical Formulas
  • Appendix C: Language and Tool Reference

Training Schedule

United States               Jun 2013         Jul 2013         Aug 2013         Sep 2013
Charlotte, NC                                Jul 16 - Jul 18
Dallas, TX                                                    Aug 21 - Aug 23
Denver, CO                                                    Aug 14 - Aug 16
Los Angeles, CA                              Jul 24 - Jul 26
San Francisco Bay Area, CA                   Jul 10 - Jul 12  Aug 7 - Aug 9    Sep 18 - Sep 20
Washington, DC Metro Area   Jun 26 - Jun 28  Jul 17 - Jul 19                   Sep 11 - Sep 13

International               Jun 2013         Jul 2013         Aug 2013         Sep 2013
London, United Kingdom                       Jul 29 - Jul 31
Tokyo, Japan                Jun 24 - Jun 26