Overview
Context
Spark is a powerful, general-purpose tool for working with Big Data. Spark transparently handles the distribution of compute tasks across a cluster. This makes operations fast and lets you focus on the analysis rather than worrying about technical details.
In this Machine Learning with PySpark course offered by DataCamp, you'll learn how to get data into Spark and then delve into three fundamental areas of Spark machine learning: linear regression, logistic regression and other classifiers, and building pipelines.
Build and Test Decision Trees
Building your own decision trees is a great way to start exploring machine learning models. You'll use an algorithm called "Recursive Partitioning", which searches your data for the predictor that splits the two classes most informatively, then repeats this search at each new node. You can then use your decision tree to make predictions on new data.
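The recursive partitioning idea can be sketched in plain Python so that it runs without a Spark cluster. This is an illustrative sketch under stated assumptions, not the course's code or Spark's implementation: it scores candidate splits with Gini impurity (one common measure of how "informative" a split is; the course may use another), and the function names `gini`, `best_split`, `build_tree` and `predict`, the `max_depth` limit and the toy data are all hypothetical. In PySpark itself you would more likely reach for `pyspark.ml.classification.DecisionTreeClassifier`.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(rows, labels):
    """Try every feature and every observed threshold (hypothetical helper).

    Returns (feature_index, threshold, weighted_impurity) for the binary split
    with the lowest weighted Gini impurity, or None if no split is possible.
    """
    best = None
    n = len(rows)
    for j in range(len(rows[0])):
        for threshold in sorted({row[j] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[j] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[j] > threshold]
            if not left or not right:
                continue  # a split must send at least one row each way
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (j, threshold, score)
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Recursively partition the data until a node is pure or max_depth is hit."""
    can_split = depth < max_depth and gini(labels) > 0
    split = best_split(rows, labels) if can_split else None
    if split is None:
        # Leaf node: predict the majority class.
        return Counter(labels).most_common(1)[0][0]
    j, threshold, _ = split
    left = [(r, y) for r, y in zip(rows, labels) if r[j] <= threshold]
    right = [(r, y) for r, y in zip(rows, labels) if r[j] > threshold]
    return {
        "feature": j,
        "threshold": threshold,
        "left": build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        "right": build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    }


def predict(tree, row):
    """Walk from the root to a leaf, following the split rules."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree


# Hypothetical toy data: two numeric features, two classes.
X = [[2.0, 1.0], [3.0, 1.5], [1.0, 0.5], [6.0, 3.0], [7.0, 3.5], [8.0, 2.5]]
y = [0, 0, 0, 1, 1, 1]
tree = build_tree(X, y)
print(tree)  # {'feature': 0, 'threshold': 3.0, 'left': 0, 'right': 1}
```

Here a single split on the first feature already separates the two classes perfectly, so both children are pure leaves and the recursion stops after one level.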
Master Logistic and Linear Regression in PySpark
Logistic and linear regression are essential machine learning techniques supported by PySpark. You'll learn to build and evaluate logistic regression models before moving on to linear regression models that help you narrow your predictors down to the most relevant ones.
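To make "build and evaluate" concrete, here is a minimal sketch of a logistic regression model fitted by batch gradient descent on the log-loss and then evaluated by accuracy. It is written in plain Python so it runs without Spark; it is not how PySpark's `pyspark.ml.classification.LogisticRegression` works internally, and the function names, learning rate, epoch count and toy data are assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    """Predicted probability of class 1 for a single feature vector x."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit weights w and bias b by batch gradient descent on the mean log-loss.

    lr and epochs are illustrative defaults, not tuned values.
    """
    n_features = len(rows[0])
    w = [0.0] * n_features
    b = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = predict_proba(w, b, x) - y  # derivative of log-loss w.r.t. the logit
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(w, b, rows, labels, threshold=0.5):
    """Fraction of rows whose thresholded prediction matches the label."""
    correct = sum(
        (predict_proba(w, b, x) >= threshold) == bool(y) for x, y in zip(rows, labels)
    )
    return correct / len(rows)


# Hypothetical toy data: one feature, class 1 when the feature is positive.
X = [[-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(X, y)
print(w, b, accuracy(w, b, X, y))
```

In practice you would evaluate on held-out data rather than the training rows, and accuracy is only one possible metric; it is used here because it is the simplest to read.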
Programme Structure
Chapters include:
- Regression
- Classification
- Ensembles & Pipelines
Key information
Duration
- Part-time
- 1 day
Campus Location
- New York City, United States
Disciplines
- Machine Learning
Academic requirements
We are not aware of any specific GRE, GMAT or GPA grading score requirements for this programme.
English requirements
We are not aware of any English requirements for this programme.
Other requirements
General requirements
- This course is not suitable for complete beginners to PySpark. We recommend taking our Introduction to PySpark and Supervised Learning with scikit-learn courses first, so that you are familiar with both PySpark and supervised machine learning and can fully benefit from this course.
Prerequisites
- Supervised Learning with scikit-learn
- Introduction to PySpark
Tuition Fees
- International
  - Non-residents: Free
  - Out-of-State: Free
- Domestic
  - In-State: Free
Additional Details
This course can be accessed for free with a DataCamp Premium or Teams subscription.