Apache Spark Developer




About The Event


This four-day Apache Spark Developer course is for data engineers, analysts, architects, software engineers, IT operations staff, and technical managers interested in a thorough, hands-on overview of Apache Spark.

The course covers the core APIs for using Spark, the fundamental mechanisms and basic internals of the framework, SQL and other high-level data access tools, and Spark’s streaming capabilities and machine learning APIs.



After taking this class you will be able to:

  • Describe Spark’s fundamental mechanics
  • Use the core Spark APIs to operate on data
  • Articulate and implement typical use cases for Spark
  • Build data pipelines with Spark SQL and DataFrames
  • Analyze Spark jobs using the UIs and logs
  • Create Streaming and Machine Learning jobs


Prerequisites:

  • Required

    ◦   Basic to intermediate Linux knowledge, including the ability to use a text editor such as vi and familiarity with basic commands such as mv, cp, ssh, grep, cd, and useradd

    ◦   Knowledge of application development principles


  • Recommended

    ◦   Knowledge of functional programming

    ◦   Knowledge of Scala or Python

    ◦   Beginner fluency with SQL


 Course Overview

Lesson 1 – Introduction to Apache Spark 

  • Describe the features of Apache Spark
  • Advantages of Spark
  • How Spark fits in with the Big Data application stack
  • How Spark fits in with Hadoop
  • Define Apache Spark components 


Lesson 2 – Load and Inspect Data in Apache Spark 

  • Describe different ways of getting data into Spark
  • Create and use Resilient Distributed Datasets (RDDs)
  • Apply transformations to RDDs
  • Use actions on RDDs
  • Load and inspect data in RDDs
  • Cache intermediate RDDs
  • Use Spark DataFrames for simple queries
  • Load and inspect data in DataFrames


Lesson 3 – Build a Simple Apache Spark Application

  • Define the lifecycle of a Spark program
  • Define the function of SparkContext
  • Create the application
  • Define different ways to run a Spark application
  • Run your Spark application
  • Launch the application


Lesson 4 – Work with PairRDD 

  • Review loading and exploring data in RDDs
  • Load and explore data in RDDs
  • Describe and create PairRDDs
  • Create and explore PairRDDs
  • Control partitioning across nodes


Lesson 5 – Work with DataFrames  

  • Create DataFrames

    ◦   From an existing RDD

    ◦   From data sources

  • Work with data in DataFrames

    ◦   Use DataFrame operations

    ◦   Use SQL

    ◦   Explore data in DataFrames

  • Create user-defined functions (UDF)

    ◦    UDF used with Scala DSL

    ◦    UDF used with SQL

    ◦   Create and use user-defined functions

  • Repartition DataFrames
  • Supplemental Lab: Build a standalone application


Lesson 6 – Monitor Apache Spark Applications 

  • Describe components of the Spark execution model
  • Use Spark Web UI to monitor Spark applications
  • Debug and tune Spark applications
  • Use the Spark Web UI


Lesson 7 – Introduction to Apache Spark Data Pipelines

  • Identify components of the Apache Spark Unified Stack
  • List benefits of Apache Spark over the Hadoop ecosystem
  • Describe data pipeline use cases

Lesson 8 – Create an Apache Spark Streaming Application
