Book Online Tickets for Big Data And Hadoop Training, Indore. 

Big Data And Hadoop Training

 

About The Event

1. The big picture of Big Data
 
  1. What is Big Data
  2. Necessity of Big Data and Hadoop in the industry
  3. Paradigm shift - why the industry is shifting to Big Data tools
  4. Different dimensions of Big Data
  5. Data explosion in the Big Data industry
  6. Various implementations of Big Data
  7. Different technologies to handle Big Data
  8. Traditional systems and associated problems
  9. Future of Big Data in the IT industry
2. Demystifying Hadoop
 
  1. Why Hadoop is at the heart of every Big Data solution
  2. Introduction to the Big Data Hadoop framework
  3. Hadoop architecture and design principles
  4. Ingredients of Hadoop
  5. Hadoop characteristics and data-flow
  6. Components of the Hadoop ecosystem
  7. Hadoop Flavors – Apache, Cloudera, Hortonworks, and more
3. Setup and Installation of Hadoop
SETUP AND INSTALLATION OF SINGLE-NODE HADOOP CLUSTER
  1. Hadoop environment setup and pre-requisites
  2. Hadoop installation and configuration
  3. Working with Hadoop in pseudo-distributed mode
  4. Troubleshooting encountered problems
SETUP AND INSTALLATION OF HADOOP MULTI-NODE CLUSTER
  1. Hadoop environment setup on the cloud (Amazon cloud)
  2. Installation of Hadoop pre-requisites on all nodes
  3. Configuration of masters and slaves on the cluster
  4. Playing with Hadoop in distributed mode
4. HDFS – The Storage Layer
 
  1. What is HDFS (Hadoop Distributed File System)
  2. HDFS daemons and architecture
  3. HDFS data flow and storage mechanism
  4. Hadoop HDFS characteristics and design principles
  5. Responsibility of HDFS Master – NameNode
  6. Storage mechanism of Hadoop meta-data
  7. Work of HDFS Slaves – DataNodes
  8. Data Blocks and distributed storage
  9. Replication of blocks, reliability, and high availability
  10. Rack-awareness, scalability, and other features
  11. Different HDFS APIs and terminologies
  12. Commissioning of nodes and addition of more nodes
  13. Expanding clusters in real-time
  14. Hadoop HDFS Web UI and HDFS explorer
  15. HDFS best practices and hardware discussion
5. A Deep Dive into MapReduce
 
  1. What is MapReduce, the processing layer of Hadoop
  2. The need for a distributed processing framework
  3. Issues before MapReduce and its evolution
  4. List processing concepts
  5. Components of MapReduce – Mapper and Reducer
  6. MapReduce terminology: keys, values, lists, and more
  7. Hadoop MapReduce execution flow
  8. Mapping and reducing data based on keys
  9. MapReduce word-count example to understand the flow
  10. Execution of Map and Reduce together
  11. Controlling the flow of mappers and reducers
  12. Optimization of MapReduce Jobs
  13. Fault-tolerance and data locality
  14. Working with map-only jobs
  15. Introduction to Combiners in MapReduce
  16. How MR jobs can be optimized using combiners
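
The word-count flow and the role of a combiner listed above can be sketched in plain Python. This is a minimal single-process illustration, not Hadoop code: the function names (map_phase, combine, shuffle, reduce_phase, word_count) are hypothetical, and each input line is assumed to stand in for one input split handled by one mapper.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def combine(pairs):
    """Combiner: pre-aggregate one mapper's output before the shuffle."""
    local = defaultdict(int)
    for word, count in pairs:
        local[word] += count
    return list(local.items())


def shuffle(all_pairs):
    """Shuffle and sort: group all values by key, with keys in sorted order."""
    grouped = defaultdict(list)
    for word, count in all_pairs:
        grouped[word].append(count)
    return sorted(grouped.items())


def reduce_phase(word, counts):
    """Reducer: sum the counts collected for one word."""
    return word, sum(counts)


def word_count(lines, use_combiner=True):
    """Run map, optional combine, shuffle, and reduce over the input lines."""
    shuffle_input = []
    for line in lines:  # assumption: one line plays the role of one split
        pairs = map_phase(line)
        if use_combiner:
            pairs = combine(pairs)  # fewer pairs cross the "network"
        shuffle_input.extend(pairs)
    return dict(reduce_phase(word, counts) for word, counts in shuffle(shuffle_input))


if __name__ == "__main__":
    sample = ["big data big cluster", "data flows to the cluster"]
    print(word_count(sample))
```

The combiner does not change the final counts; it only reduces the number of pairs handed to the shuffle step, which is the optimization the last two topics refer to.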
6. MapReduce - Advanced Concepts
 
  1. Anatomy of MapReduce
  2. Hadoop MapReduce data types
  3. Developing custom data types using Writable & WritableComparable
  4. InputFormats in MapReduce
  5. InputSplit as a unit of work
  6. How Partitioners partition data
  7. Customization of RecordReader
  8. Moving data from mapper to reducer – shuffling & sorting
  9. Distributed cache and job chaining
  10. Different Hadoop case-studies to customize each component
  11. Job scheduling in MapReduce
7. Hive – Data Analysis Tool
 
  1. The need for an ad hoc SQL-based solution – Apache Hive
  2. Introduction to and architecture of Hadoop Hive
  3. Playing with the Hive shell and running HQL queries
  4. Hive DDL and DML operations
  5. Hive execution flow
  6. Schema design and other Hive operations
  7. Schema-on-Read vs Schema-on-Write in Hive
  8. Metastore management and the need for an RDBMS
  9. Limitations of the default metastore
  10. Using SerDe to handle different types of data
  11. Optimization of performance using partitioning
  12. Different Hive applications and use cases
8. Pig – Data Analysis Tool
 
  1. The need for a high-level query language – Apache Pig
  2. How Pig complements Hadoop with a scripting language
  3. What is Pig
  4. Pig execution flow
  5. Different Pig operations like filter and join
  6. Compilation of Pig code into MapReduce
  7. Comparison - Pig vs MapReduce
9. NoSQL Database - HBase
 
  1. NoSQL databases and their need in the industry
  2. Introduction to Apache HBase
  3. Internals of the HBase architecture
  4. The HBase Master and Slave Model
  5. Column-oriented, 3-dimensional, schema-less datastores
  6. Data modeling in Hadoop HBase
  7. Storing multiple versions of data
  8. Data high-availability and reliability
  9. Comparison - HBase vs HDFS
  10. Comparison - HBase vs RDBMS
  11. Data access mechanisms
  12. Working with HBase using the shell
10. Data Collection using Sqoop
 
  1. The need for Apache Sqoop
  2. Introduction and working of Sqoop
  3. Importing data from RDBMS to HDFS
  4. Exporting data to RDBMS from HDFS
  5. Conversion of data import/export queries into MapReduce jobs
11. Data Collection using Flume
 
  1. What is Apache Flume
  2. Flume architecture and aggregation flow
  3. Understanding Flume components like data Sources and Sinks
  4. Flume channels to buffer events
  5. Reliable & scalable data collection tools
  6. Aggregating streams using Fan-in
  7. Separating streams using Fan-out
  8. Internals of the agent architecture
  9. Production architecture of Flume
  10. Collecting data from different sources to Hadoop HDFS
  11. Multi-tier Flume flow for collecting large volumes of data using Avro
12. Apache YARN & advanced concepts in the latest version
  1. The need for and the evolution of YARN
  2. YARN and its ecosystem
  3. YARN daemon architecture
  4. Master of YARN – Resource Manager
  5. Slave of YARN – Node Manager
  6. Requesting resources via the Application Master
  7. Dynamic slots (containers)
  8. Application execution flow
  9. MapReduce version 2 application over YARN
  10. Hadoop Federation and NameNode HA
13. Processing data with Apache Spark
  1. Introduction to Apache Spark
  2. Comparison - Hadoop MapReduce vs Apache Spark
  3. Spark key features
  4. RDD and various RDD operations
  5. RDD abstraction, interfacing, and creation of RDDs
  6. Fault Tolerance in Spark
  7. The Spark Programming Model
  8. Data flow in Spark
  9. The Spark Ecosystem, Hadoop compatibility, & integration
  10. Installation & configuration of Spark
  11. Processing Big Data using Spark
14. Real-Life Project on Big Data

A live Big Data Hadoop project based on industry use cases, using Hadoop components such as Pig, HBase, MapReduce, and Hive to solve real-world problems in Big Data analytics.
