SCI-241 HDP Data Science

Summary

This course provides instruction on the theory and practice of data science, including machine learning and natural language processing. It introduces many of the core concepts behind today’s most commonly used algorithms and demonstrates them in practical applications. We'll discuss concepts and key algorithms in all of the major areas (Classification, Regression, Clustering, and Dimensionality Reduction), including a primer on Neural Networks. We'll focus on both single-server tools and frameworks (Python, NumPy, pandas, SciPy, Scikit-learn, NLTK, TensorFlow, Jupyter) and large-scale tools and frameworks (Spark MLlib, Stanford CoreNLP, TensorFlowOnSpark/Horovod/MLeap, Apache Zeppelin).

Duration

3 Days

Audience

Architects, software developers, analysts and data scientists who need to apply data science and machine learning on Spark/Hadoop.

Prerequisites

Students must have experience with Python, Scala, and Spark, prior exposure to statistics and probability, and a basic understanding of big data and Hadoop principles. While brief reviews of these topics are offered, students new to Hadoop are encouraged to attend the Apache Hadoop Essentials (HDP-123) and HDP Spark Developer (DEV-343) courses, as well as the language-specific introduction courses.


Outline

AGENDA SUMMARY

  • Day 1: Introducing Data Science, Scikit-learn, HDFS, Reviewing Spark apps, DataFrames and NoSQL, Reviewing
    Mathematics, Statistics, and Probability, HDP, HDF, and Apache NiFi, and Kafka with Structured Streaming
  • Day 2: Algorithms in Spark ML and Scikit-learn: Linear Regression, Logistic Regression, Support Vector Machines,
    Decision Trees, Random Forests, KNN, Spam Classifier
  • Day 3: Algorithms in Spark ML and Scikit-learn: K-Means & GMM Clustering, Essential TensorFlow, NLP with
    NLTK, NLP with Stanford CoreNLP, Sentiment Analysis, Dimensionality Reduction
  • Day 4: Algorithms in Spark ML and Scikit-learn: Hyperparameter Tuning, K-Fold Validation, Ensemble Methods,
    ML Pipelines in Spark ML, TensorFlow on Spark, Horovod, MLeap
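
To give a flavor of the K-Fold Validation topic, here is a minimal from-scratch sketch in plain Python. It is not taken from the course materials: the function names `k_fold_indices` and `cross_validate_mean_model` are hypothetical, and the "model" is a deliberately trivial baseline that predicts the training-set mean. In practice, scikit-learn's `KFold` or Spark ML's `CrossValidator` would handle the splitting for you.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Hypothetical helper: samples are split into k contiguous folds (no
    shuffling). The first n_samples % k folds get one extra sample, so
    fold sizes differ by at most one.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test_idx = list(range(start, start + size))
        # Every sample outside the current test fold is used for training.
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def cross_validate_mean_model(y, k):
    """Return per-fold mean squared error of a predict-the-training-mean baseline."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(y), k):
        # "Fit" the baseline on the training folds only.
        mean = sum(y[i] for i in train_idx) / len(train_idx)
        # Evaluate on the held-out fold.
        mse = sum((y[i] - mean) ** 2 for i in test_idx) / len(test_idx)
        scores.append(mse)
    return scores


if __name__ == "__main__":
    print(cross_validate_mean_model([1.0, 2.0, 3.0, 4.0], k=2))  # [4.25, 4.25]
```

Each sample lands in exactly one test fold, so every observation is used for evaluation once and for training k - 1 times. Averaging the per-fold scores gives a less noisy estimate than a single train/test split.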

Upcoming Classes

No classes have been scheduled, but you can always request a quote.

Onsite Training

Request a quote for a private training session.

Public Training

Check out our FAQ page.