Apache Spark Developer


About The Event


This four-day Spark Developer course is for data engineers, analysts, architects, software engineers, IT operations staff, and technical managers interested in a thorough, hands-on overview of Apache Spark.

The course covers the core APIs for using Spark, the fundamental mechanisms and basic internals of the framework, SQL and other high-level data access tools, and Spark’s streaming capabilities and machine learning APIs.


After taking this class you will be able to:

  • Describe Spark’s fundamental mechanics
  • Use the core Spark APIs to operate on data
  • Articulate and implement typical use cases for Spark
  • Build data pipelines with SparkSQL and DataFrames
  • Analyze Spark jobs using the UIs and logs
  • Create Streaming and Machine Learning jobs


Prerequisites:

  • Required
      ◦ Basic to intermediate Linux knowledge, including the ability to use a text editor such as vi and familiarity with basic commands such as mv, cp, ssh, grep, cd, and useradd
      ◦ Knowledge of application development principles

  • Recommended
      ◦ Knowledge of functional programming
      ◦ Knowledge of Scala or Python
      ◦ Beginner fluency with SQL

Course Overview

Lesson 1 – Introduction to Apache Spark 

  • Describe the features of Apache Spark
  • Describe the advantages of Spark
  • Explain how Spark fits in with the Big Data application stack
  • Explain how Spark fits in with Hadoop
  • Define Apache Spark components 

Lesson 2 – Load and Inspect Data in Apache Spark 

  • Describe different ways of getting data into Spark
  • Create and use Resilient Distributed Datasets (RDDs)
  • Apply transformations to RDDs
  • Use actions on RDDs
  • Load and inspect data in RDDs
  • Cache intermediate RDDs
  • Use Spark DataFrames for simple queries
  • Load and inspect data in DataFrames

Lesson 3 – Build a Simple Apache Spark Application

  • Define the lifecycle of a Spark program
  • Define the function of SparkContext
  • Create the application
  • Define different ways to run a Spark application
  • Run your Spark application
  • Launch the application

Lesson 4 – Work with Pair RDDs

  • Review loading and exploring data in RDDs
  • Load and explore data in RDDs
  • Describe and create Pair RDDs
  • Create and explore Pair RDDs
  • Control partitioning across nodes


Lesson 5 – Work with DataFrames 


  • Create DataFrames
      ◦ From existing RDD
      ◦ From data sources
  • Work with data in DataFrames
      ◦ Use DataFrame operations
      ◦ Use SQL
      ◦ Explore data in DataFrames
  • Create user-defined functions (UDF)
      ◦ UDF used with Scala DSL
      ◦ UDF used with SQL
      ◦ Create and use user-defined functions
  • Repartition DataFrames
  • Supplemental Lab: Build a standalone application

Lesson 6 – Monitor Apache Spark Applications 

  • Describe components of the Spark execution model
  • Use Spark Web UI to monitor Spark applications
  • Debug and tune Spark applications
  • Use the Spark Web UI

Lesson 7 – Introduction to Apache Spark Data Pipelines

  • Identify components of the Apache Spark Unified Stack
  • List benefits of Apache Spark over the Hadoop ecosystem
  • Describe data pipeline use cases

Lesson 8 – Create an Apache Spark Streaming Application
