Overview

This Distributed Computing with Spark SQL course, offered by Coursera in partnership with UC Davis, is for students with SQL experience who want to take the next step on their data journey by learning distributed computing with Apache Spark. Students will gain a thorough understanding of this open-source standard for working with large datasets, along with the fundamentals of data analysis using SQL on Spark, setting the foundation for combining data with advanced analytics at scale and in production environments. The four modules build on one another, and by the end of the course you will understand the Spark architecture, queries within Spark, common ways to optimize Spark SQL, and how to build reliable data pipelines.

Features

The first module introduces Spark, Spark SQL, and the Databricks environment, including how Spark distributes computation. Module 2 covers the core concepts of Spark, such as storage vs. compute, caching, partitions, and troubleshooting performance issues via the Spark UI. It also covers new features in Apache Spark 3.x, such as Adaptive Query Execution. The third module focuses on engineering data pipelines, including connecting to databases, schemas and data types, file formats, and writing reliable data. The final module covers data lakes, data warehouses, and lakehouses. Students build production-grade data pipelines by combining Spark with the open-source project Delta Lake. By the end of this course, students will have honed their SQL and distributed computing skills, becoming more adept at advanced analysis and setting the stage for a transition to more advanced analytics as Data Scientists.

Programme Structure

Course structure:

- Spark Terminology
- Caching
- Shuffle Partitions
- Spark UI
- Adaptive Query Execution (AQE)
- Accessing Data

Key information

Duration

Part-time
- 1 day

Start dates & application deadlines

You can apply for and start this programme anytime.

Language

English

Delivered

Online

Disciplines

Computer Sciences, Software Engineering

Academic requirements

We are not aware of any specific GRE, GMAT or GPA grading score requirements for this programme.

English requirements

We are not aware of any English requirements for this programme.

Other requirements

General requirements

Intermediate level
No previous experience necessary

Tuition Fee

International

Free

Tuition Fee

Based on a tuition of 0 USD for the full programme, which lasts 1 day.

National

Free

Tuition Fee

Based on a tuition of 0 USD for the full programme, which lasts 1 day.

You can choose from hundreds of free courses, or get a degree or certificate at a breakthrough price. You can now select Coursera Plus, an annual subscription that provides unlimited access.

Funding

Coursera provides financial aid to learners who cannot afford the fee. Apply for it by clicking on the Financial Aid link beneath the "Enroll" button on the left. You'll be prompted to complete an application and will be notified if you are approved. You'll need to complete this step for each course in the Specialization, including the Capstone Project.
