In data science, data is called “big” if it cannot fit into the memory of a single standard laptop or workstation.
The analysis of big datasets requires a cluster of tens, hundreds or thousands of computers. Using such clusters effectively requires distributed file systems, such as the Hadoop Distributed File System (HDFS), and corresponding computational models, such as Hadoop MapReduce and Spark.
In the Big Data Analytics Using Spark Certificate, part of the Data Science MicroMasters Program from edX in partnership with the University of California, San Diego (UC San DiegoX), you will learn what the bottlenecks are in massively parallel computation and how to use Spark to minimize them.
You will learn how to perform supervised and unsupervised machine learning on massive datasets using the Machine Learning Library (MLlib).
In this course, as in the others in this MicroMasters program, you will gain hands-on experience using PySpark within the Jupyter notebook environment.
Always verify the dates on the programme website.
10 weeks, 10 hours per week
Yoav Freund - Professor of Computer Science and Engineering, UC San Diego
This programme requires students to demonstrate proficiency in English.
Check the programme website for information about funding options.
StudyPortals Tip: Students can search online for independent or external scholarships that can help fund their studies. Check the scholarships to see whether you are eligible to apply. Many scholarships are either merit-based or needs-based.
Together with the ISIC Association and British Council IELTS, StudyPortals offers you the chance to receive up to £10,000 to expand your horizons and study abroad. Our aim is to encourage you to study abroad and to experience and explore new countries, cultures and languages.