Overview
Key Facts
In data science, data is called “big” if it cannot fit into the memory of a single standard laptop or workstation.
Analyzing big datasets requires a cluster of tens, hundreds or thousands of computers. Using such clusters effectively requires distributed file systems, such as the Hadoop Distributed File System (HDFS), and corresponding computational models, such as Hadoop, MapReduce and Spark.
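To make the MapReduce model mentioned above more concrete, here is a minimal single-machine sketch in plain Python. It is not Hadoop or Spark code and is not taken from the course. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample data are hypothetical, and each inner list merely stands in for a block of a file stored on a different machine.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit a (word, 1) pair for every word in one line of text."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, as a framework would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values collected for one key into a single count."""
    return key, sum(values)


def word_count(partitions):
    # On a real cluster each partition would be mapped on a different machine;
    # here the partitions are simply processed one after another.
    mapped = chain.from_iterable(
        map_phase(line) for part in partitions for line in part
    )
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical sample data: two partitions of one line each.
partitions = [["big data needs clusters"], ["clusters process big data"]]
counts = word_count(partitions)
print(counts)
```

In a real cluster the shuffle step moves data between machines over the network, which is typically one of the bottlenecks a framework such as Spark tries to minimize.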
In the Big Data Analytics Using Spark Certificate, part of the Data Science MicroMasters Program offered on edX in partnership with the University of California, San Diego (UC San DiegoX), you will learn what the bottlenecks are in massively parallel computation and how to use Spark to minimize them.
You will learn how to perform supervised and unsupervised machine learning on massive datasets using the Machine Learning Library (MLlib).
In this course, as in the others in this MicroMasters program, you will gain hands-on experience using PySpark within the Jupyter Notebook environment.
Programme Structure
What you'll learn
Programming Spark using PySpark
Identifying the computational tradeoffs in a Spark application
Performing data loading and cleaning using Spark and Parquet
Modeling data through statistical and machine learning methods
Key information
Duration
- Part-time
- 70 days
- 9 hrs/week
Start dates & application deadlines
Language
Delivered
Disciplines
- Information Technology (IT)
- Data Science & Big Data
- Machine Learning
Academic requirements
We are not aware of any specific GRE, GMAT or GPA grading score requirements for this programme.
English requirements
We are not aware of any English requirements for this programme.
Other requirements
General requirements
Prerequisites
- The previous courses in the MicroMasters program: Python for Data Science, Probability and Statistics in Data Science using Python, Machine Learning Fundamentals.
Tuition Fee
- International: 350 USD for the full programme (70 days)
- National: 350 USD for the full programme (70 days)
- Unlimited access + verified certificate: $350
- Limited access: free