Spark Programming in Python for Beginners with Apache Spark 3
67 on-demand videos & exercises
Level: Beginner
English
6hrs 35mins
Access on mobile, web and TV
Who's this course for?
This course is designed for software engineers who want to develop data engineering pipelines and applications using Apache Spark; for data architects and data engineers who are responsible for designing and building the organization’s data-centric infrastructure; and for managers and architects who do not work directly on Spark implementation but work with the people who implement Apache Spark at the ground level.
This course does not require any prior knowledge of Apache Spark or Hadoop; only programming knowledge of Python is required.
What you'll learn
Learn Apache Spark Foundation and Spark architecture.
Learn data engineering and data processing in Spark.
Work with data sources and sinks.
Work with data frames and Spark SQL.
Use PyCharm IDE for Spark development and debugging.
Learn unit testing, managing application logs, and cluster deployment.
Key Features
Build your own data engineering solutions using Spark structured API in Python.
Gain an in-depth understanding of the Apache Hadoop architecture, ecosystem,
and practices.
Learn to apply Spark programming basics.
What to know about this course
If you are looking to expand your knowledge of data engineering or want to level up your portfolio by adding Spark programming to your skill set, then you are in the right place. This course will help you understand Spark programming and apply that knowledge to build data engineering solutions. The course is example-driven and follows a working-session approach. We will take a live coding approach and explain all the concepts needed along the way.
In this course, we will start with a quick introduction to Apache Spark, then set up our environment by installing and using Apache Spark. Next, we will learn about the Spark execution model and architecture, and about the Spark programming model and developer experience. We will then cover the foundations of the Spark structured API and move on to Spark data sources and sinks. After that, we will cover Spark DataFrame and Dataset transformations, followed by aggregations in Apache Spark, and finally Spark DataFrame joins.
By the end of this course, you will be able to build data engineering solutions using the Spark structured API in Python.
All the resources for the course are available at
https://github.com/PacktPublishing/Spark-Programming-in-Python-for-Beginners-with-Apache-Spark-3
About the Author
ScholarNest
ScholarNest is a small team of people passionate about helping others learn and grow in their careers by bridging the gap between their existing and required skills. Together, they have more than 40 years of experience in IT as developers, architects, consultants, trainers, and mentors. They have worked with international software services organizations on various data-centric and Big Data projects. The team firmly believes in lifelong continuous learning and skill development. To popularize the importance of continuous learning, they started publishing free training videos on their YouTube channel. They conceptualized the notion of continuous learning as creating a journal of their learning, under the Learning Journal banner.