Distributed Computing with Spark SQL, Certificate | Part time online | Coursera | United States
Duration: 1 day
Tuition fee: Free
Apply date: Anytime
Start date: Anytime

About

This Distributed Computing with Spark SQL course offered by Coursera in partnership with UC Davis is all about big data. 

Visit the programme website for more information.

Overview

This Distributed Computing with Spark SQL course, offered by Coursera in partnership with UC Davis, is for students with SQL experience who want to take the next step on their data journey by learning distributed computing using Apache Spark. Students will gain a thorough understanding of this open-source standard for working with large datasets, along with the fundamentals of data analysis using SQL on Spark, setting the foundation for combining data with advanced analytics at scale and in production environments. The four modules build on one another, and by the end of the course you will understand the Spark architecture, queries within Spark, common ways to optimize Spark SQL, and how to build reliable data pipelines.

Features

The first module introduces Spark and the Databricks environment, including Spark SQL and how Spark distributes computation. Module 2 covers the core concepts of Spark, such as storage vs. compute, caching, partitions, and troubleshooting performance issues via the Spark UI. It also covers new features in Apache Spark 3.x, such as Adaptive Query Execution. The third module focuses on Engineering Data Pipelines, including connecting to databases, schemas and data types, file formats, and writing reliable data. The final module covers data lakes, data warehouses, and lakehouses. Students build production-grade data pipelines by combining Spark with the open-source project Delta Lake. By the end of this course, students will have honed their SQL and distributed computing skills, becoming more adept at advanced analysis and setting the stage for a transition to more advanced analytics as Data Scientists.

Programme Structure

Course structure:

  • Spark Terminology
  • Caching
  • Shuffle Partitions
  • Spark UI
  • Adaptive Query Execution (AQE)
  • Accessing Data

Key information

Duration

  • Part-time
    • 1 day

Start dates & application deadlines

You can apply for and start this programme anytime.

Language

English

Delivered

Online

Academic requirements

We are not aware of any specific GRE, GMAT or GPA grading score requirements for this programme.

English requirements

We are not aware of any English requirements for this programme.

Other requirements

General requirements

  • Intermediate level
  • No previous experience necessary

Tuition Fee

  • International: Free, based on a tuition of 0 USD for the full programme, which lasts 1 day.
  • National: Free, based on a tuition of 0 USD for the full programme, which lasts 1 day.

You can choose from hundreds of free courses, or get a degree or certificate at a breakthrough price. You can now select Coursera Plus, an annual subscription that provides unlimited access.

Funding

Coursera provides financial aid to learners who cannot afford the fee. Apply for it by clicking on the Financial Aid link beneath the "Enroll" button on the left. You'll be prompted to complete an application and will be notified if you are approved. You'll need to complete this step for each course in the Specialization, including the Capstone Project.
