SCI-241 HDP Data Science
Architects, software developers, analysts, and data scientists who need to apply data science and machine learning on Spark/Hadoop.
Students must have experience with Python, Scala, and Spark, as well as prior exposure to statistics and probability and a basic understanding of big data and Hadoop principles. While brief reviews of these topics are offered, students new to Hadoop are encouraged to attend the Apache Hadoop Essentials (HDP-123) and HDP Spark Developer (DEV-343) courses, as well as the language-specific introduction courses.
- Day 1: Introducing Data Science, SciKit-Learn, HDFS, Reviewing Spark apps, DataFrames and NoSQL, Reviewing Mathematics, Statistics, and Probability, HDP and HDF and Apache NiFi, and Kafka with Structured
- Day 2: Algorithms in Spark ML and SciKit-Learn: Linear Regression, Logistic Regression, Support Vectors, Decision Trees, Random Forests, KNN, Spam Classifier
- Day 3: Algorithms in Spark ML and SciKit-Learn: K-Means & GMM Clustering, Essential TensorFlow, NLP with NLTK, NLP with Stanford CoreNLP, Sentiment Analysis, Dimensionality Reduction
- Day 4: Algorithms in Spark ML and SciKit-Learn: HyperParameter Tuning, K-Fold Validation, Ensemble Methods, ML Pipelines in SparkML, TensorFlow on Spark, Horovod, MLeap