Conversion of data import/export queries into MapReduce jobs
11. Data Collection using Flume
What is Apache Flume
Flume architecture and aggregation flow
Understanding Flume components like data Sources and Sinks
Flume channels to buffer events
Reliable & scalable data collection tools
Aggregating streams using Fan-in
Separating streams using Fan-out
Internals of the agent architecture
Production architecture of Flume
Collecting data from different sources to Hadoop HDFS
Multi-tier Flume flow for collecting large volumes of data using Avro
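The Flume topics above (sources, channels, sinks, and the agent flow) come together in an agent configuration file. A minimal illustrative sketch follows; the agent and component names (`agent1`, `src1`, `ch1`, `sink1`), the tailed log path, and the HDFS URL are all placeholder assumptions, not values from this course.

```properties
# Hypothetical single-agent flow: tail an application log into HDFS.
# One source -> one memory channel -> one HDFS sink.
agent1.sources  = src1
agent1.channels = ch1
agent1.sinks    = sink1

# Source: follow a local log file (path is an example)
agent1.sources.src1.type = exec
agent1.sources.src1.command = tail -F /var/log/app/app.log
agent1.sources.src1.channels = ch1

# Channel: in-memory buffer between source and sink
agent1.channels.ch1.type = memory
agent1.channels.ch1.capacity = 10000

# Sink: write events to HDFS, partitioned by date
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = hdfs://namenode:8020/flume/events/%Y-%m-%d
agent1.sinks.sink1.hdfs.fileType = DataStream
agent1.sinks.sink1.channel = ch1
```

Fan-in aggregation would add more sources feeding the same channel; fan-out would attach multiple channels (and sinks) to one source.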
12. Apache YARN & advanced concepts in the latest version
The need for and the evolution of YARN
YARN and its ecosystem
YARN daemon architecture
Master of YARN – ResourceManager
Slave of YARN – NodeManager
How the ApplicationMaster requests resources
Dynamic slots (containers)
Application execution flow
MapReduce version 2 applications over YARN
Hadoop Federation and NameNode HA
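The ResourceManager/NodeManager split and container sizing covered above are driven by `yarn-site.xml`. A small illustrative fragment is shown below; the hostname and memory sizes are example assumptions, not recommended values.

```xml
<!-- Illustrative yarn-site.xml fragment; hostname and sizes are examples. -->
<configuration>
  <!-- Where the cluster-wide master (ResourceManager) runs -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>rm-host.example.com</value>
  </property>
  <!-- Memory each NodeManager (per-node slave) offers for containers -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>8192</value>
  </property>
  <!-- Upper bound on a single container ("dynamic slot") allocation -->
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>4096</value>
  </property>
</configuration>
```

With these settings, an ApplicationMaster could be granted at most 4 GB per container, and each node could host containers totalling 8 GB.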
13. Processing data with Apache Spark
Introduction to Apache Spark
Comparison: Hadoop MapReduce vs. Apache Spark
Spark key features
RDD and various RDD operations
RDD abstraction, interfacing, and creation of RDDs
Fault Tolerance in Spark
The Spark Programming Model
Data flow in Spark
The Spark Ecosystem, Hadoop compatibility, & integration
Installation & configuration of Spark
Processing Big Data using Spark
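The RDD topics above (lazy transformations, lineage, and fault tolerance) can be illustrated with a toy model in plain Python. This `ToyRDD` class is a sketch of the idea only, not Spark's actual API: transformations record steps instead of running them, and an action replays the recorded lineage, which is also how Spark recomputes a lost partition.

```python
# Toy model of the RDD abstraction (illustration only, not Spark's API).
class ToyRDD:
    def __init__(self, data, lineage=None):
        self._data = data                 # base data for this "partition"
        self._lineage = lineage or []     # recorded transformations

    def map(self, fn):
        # Lazy transformation: record the step, return a new RDD.
        return ToyRDD(self._data, self._lineage + [("map", fn)])

    def filter(self, pred):
        return ToyRDD(self._data, self._lineage + [("filter", pred)])

    def collect(self):
        # Action: replay the lineage over the base data. A lost partition
        # could be rebuilt the same way -- lineage-based fault tolerance.
        out = list(self._data)
        for op, fn in self._lineage:
            if op == "map":
                out = [fn(x) for x in out]
            else:
                out = [x for x in out if fn(x)]
        return out

rdd = ToyRDD(range(10)).map(lambda x: x * x).filter(lambda x: x % 2 == 0)
print(rdd.collect())  # -> [0, 4, 16, 36, 64]
```

In real Spark the same pipeline would be `sc.parallelize(range(10)).map(lambda x: x * x).filter(lambda x: x % 2 == 0).collect()`, with the data split across partitions on the cluster.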
14. Real-Life Project on Big Data
A live Big Data Hadoop project based on industry use cases, applying Hadoop components such as Pig, HBase, MapReduce, and Hive to solve real-world Big Data Analytics problems