Overview
Context
R is mostly optimized to help you write data analysis code quickly and readably. Apache Spark is designed to analyze huge datasets quickly. The sparklyr package lets you write dplyr R code that runs on a Spark cluster, giving you the best of both worlds. This Introduction to Spark with sparklyr in R course at DataCamp teaches you how to manipulate Spark DataFrames using both the dplyr interface and the native interface to Spark, and lets you try out machine learning techniques.
Load Data into Spark and Manipulate Spark DataFrames
You’ll start this Spark course by investigating how Spark and R work well together and practicing loading data, ready for cleaning, transformation, and analysis. You’ll use Spark DataFrames and dplyr syntax to manipulate your data by filtering and arranging rows, and mutating and summarizing columns.
Delve into Big Data Analysis with Spark MLlib
This course focuses on building your skills and confidence in analyzing huge datasets. The final chapters take you through Spark’s machine learning data transformation features and offer you the chance to practice sparklyr’s machine learning routines by using them to make predictions with gradient boosted trees and random forests.
Programme Structure
Chapters
- Light My Fire: Starting To Use Spark With dplyr Syntax
- Tools of the Trade: Advanced dplyr Usage
- Going Native: Use The Native Interface to Manipulate Spark DataFrames
- Case Study: Learning to be a Machine: Running Machine Learning Models on Spark
Key information
Duration
- Part-time
- 1 day
Start dates & application deadlines
Language
Delivered
Campus Location
- New York City, United States
Disciplines
Data Science & Big Data
Academic requirements
We are not aware of any specific GRE, GMAT or GPA grading score requirements for this programme.
English requirements
We are not aware of any English requirements for this programme.
Other requirements
General requirements
- Prerequisites: Supervised Learning in R: Regression
- No prior knowledge of Apache Spark is required. The course introduces learners to the basics of Apache Spark and how to use Spark with the sparklyr package in R.
- This course can be beneficial for anyone interested in learning how to manipulate large datasets quickly using Apache Spark and the sparklyr package in R. From data engineers to data scientists to analytics professionals and software developers, anyone working with large datasets would benefit from this course.
Tuition Fees
- International (non-residents, out-of-state): Free
- Domestic (in-state): Free
Additional Details
This course can be accessed for free with a DataCamp Premium or Teams subscription.