Overview
Context
There's been a lot of buzz about Big Data over the past few years, and it's finally become mainstream for many companies. But what is this Big Data?
This Big Data Fundamentals with PySpark course, offered by DataCamp, covers the fundamentals of Big Data via PySpark. Spark is a "lightning fast cluster computing" framework for Big Data.
It provides a general-purpose data processing engine and lets you run programs up to 100x faster in memory, or 10x faster on disk, than Hadoop MapReduce.
You’ll use PySpark, a Python package for Spark programming, along with its higher-level libraries such as Spark SQL and MLlib (for machine learning).
You will explore the works of William Shakespeare, analyze FIFA 2018 data, and perform clustering on genomic datasets.
At the end of this course, you will have gained an in-depth understanding of PySpark and its application to general Big Data analysis.
Programme Structure
Chapters include:
- Big Data analysis with Spark
- PySpark SQL & DataFrames
- Programming in PySpark RDDs
- Machine Learning with PySpark MLlib
Key information
Duration
- Part-time
- 1 day
Start dates & application deadlines
Language
Delivered
Campus Location
- New York City, United States
Disciplines
- Data Science & Big Data
Academic requirements
We are not aware of any specific GRE, GMAT or GPA grading score requirements for this programme.
English requirements
We are not aware of any English requirements for this programme.
Other requirements
General requirements
Prerequisites
- Introduction to Python
Tuition Fees
- International (Non-residents): Free
- Out-of-State: Free
- Domestic (In-State): Free
Additional Details
This course can be accessed for free with a DataCamp Premium or Teams subscription.