Overview
Apache Spark with Scala training will advance your expertise in distributed programming with Spark and Scala. The skills gained through the course in Core Spark, Scala, SparkSQL and Streaming will help you solve complex problems. Deep knowledge of Spark with Scala helps you stand out and can open up career opportunities.
Objective
Hadoop MapReduce made it possible to solve complex problems on distributed systems, but it has some limitations. This course discusses the limitations of Hadoop MapReduce and how Spark overcomes them. We describe RDDs, which are the core of Spark, and in-memory computation. Understanding persistent RDDs and in-memory computation, and solving Big Data problems using Spark with Scala, is the core of this course. The discussion then moves to SparkSQL and problem solving with SparkSQL DataFrames. Hands-on practice runs in parallel with every discussion. Handling streaming data with Spark Streaming is also an important topic and is included. The last part of the course is Spark program optimization: optimizing Spark core, Spark SQL and Spark Streaming programs, and optimizing the utilization of the cluster. We also discuss running Spark on YARN, Standalone and Mesos clusters.
Detailed Description of Class :
Introduction to Big Data and Distributed Computing :
Big data analysis is the future. This section of the course will help you understand the need for distributed computation.
Hadoop :
◦ mkdir
◦ ls
◦ rmdir and rm
◦ copyFromLocal
◦ put
◦ cat
◦ copyToLocal
◦ get
◦ touchz
◦ mv
◦ cp
◦ distcp
◦ etc.
Scala :
Spark Introduction :
Operations On RDD :
Fault tolerance and Persistence :
Optimizing Spark program
IO in Spark :
Spark Streaming :
Spark Code Deployment and cluster managers.
Note : Every part of the course is accompanied by hands-on exercises. A number of objective questions will help you test your understanding.
Projects :
Project 1 : Spark core will be used for data preparation and aggregation. Aggregation will be implemented using Spark core APIs.
The MovieLens dataset will be used for data aggregation.
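The kind of aggregation described for Project 1 might look roughly like the sketch below. It assumes the ratings have already been prepared as (movieId, rating) pairs; the object name, the `averageByKey` helper and the sample values are hypothetical and are not taken from the course material. It relies only on the standard `mapValues` and `reduceByKey` operations of the Spark core RDD API and runs against a local master.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical example object; not part of the course material.
object RatingAggregationSketch {

  // Average rating per movie, computed with Spark core (RDD) operations only.
  def averageByKey(ratings: RDD[(Int, Double)]): RDD[(Int, Double)] =
    ratings
      .mapValues(r => (r, 1L))                           // pair each rating with a count of 1
      .reduceByKey((a, b) => (a._1 + b._1, a._2 + b._2)) // add up sums and counts per movie
      .mapValues { case (sum, count) => sum / count }    // turn (sum, count) into an average

  def main(args: Array[String]): Unit = {
    // Assumption: a local master is enough for experimenting on a laptop.
    val conf = new SparkConf().setAppName("RatingAggregationSketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Made-up sample data in (movieId, rating) form.
      val ratings = sc.parallelize(Seq((1, 4.0), (1, 5.0), (2, 3.0)))
      averageByKey(ratings).collect().foreach { case (movieId, average) =>
        println(s"movie $movieId -> average rating $average")
      }
    } finally {
      sc.stop()
    }
  }
}
```

Carrying a (sum, count) pair through `reduceByKey` lets Spark combine partial results on each partition before shuffling, which is usually preferable to grouping all ratings per key first.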
Project 2 : Implementing streaming word-frequency visualization using Kafka and Spark Streaming integration.
Project 3 : Implementation of a moving average using SparkSQL.
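One possible way to approach a moving average with Spark SQL is a window function over an ordered frame, sketched below. The column names `day` and `value`, the `withMovingAverage` helper, the window size and the sample rows are all assumptions made for illustration; the project itself may use a different dataset or a different formulation.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{avg, col}

// Hypothetical example object; not part of the course material.
object MovingAverageSketch {

  // Adds a trailing moving average over the current row and the
  // (windowSize - 1) rows before it, ordered by the "day" column.
  def withMovingAverage(df: DataFrame, windowSize: Int): DataFrame = {
    // No partitionBy here, so Spark moves all rows to one partition;
    // acceptable for a small sketch, not for large data.
    val w = Window.orderBy("day").rowsBetween(-(windowSize - 1), Window.currentRow)
    df.withColumn("moving_avg", avg(col("value")).over(w))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("MovingAverageSketch")
      .master("local[*]") // assumption: local mode for experimentation
      .getOrCreate()
    import spark.implicits._
    try {
      // Made-up daily values; ISO dates sort correctly as strings.
      val series = Seq(
        ("2024-01-01", 10.0),
        ("2024-01-02", 20.0),
        ("2024-01-03", 30.0),
        ("2024-01-04", 40.0)
      ).toDF("day", "value")
      withMovingAverage(series, 3).show()
    } finally {
      spark.stop()
    }
  }
}
```

For the first rows, where fewer than `windowSize` values are available, the frame simply averages whatever rows exist so far.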
Project 4 : Data preprocessing, data manipulation and aggregation using SparkSQL. It will be done using Real time
For any queries reach out to info@walsoul.com or +919900498065
Instructor - Raju Mishra (LinkedIn - https://www.linkedin.com/in/raju-mishra-40322a104/)
Detailed Course Content - Walsoul.com
Note: This course is offered both online and offline. Offline classroom sessions will be conducted at BTM, Bangalore.