Big Data and Hadoop - Geeksdemy
Big Data and Hadoop
Gartner predicts that Big Data demand will generate 4.4 million jobs in the IT industry worldwide.
It is predicted that by 2020 we will have created 35 zettabytes of data, and that Big Data will drive $232 billion in spending through 2016.
About Course

Big data is high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization.

"Big Data" has dramatically increased the demand for information management specialists, and Hadoop’s momentum is unstoppable as its open-source roots spread into enterprises. Its refreshingly unique approach to data management is transforming how companies store, process, analyze, and share big data.

You can start with no prior knowledge of Big Data and Hadoop. The course covers the core concepts in depth, along with their implementation in varied industry case studies. This training will help you master Hadoop to a level where you are ready to take the Cloudera certification exam, Cloudera Certified Developer for Hadoop (CCDH), and will teach you how to work with Hortonworks, the MapR distribution, Amazon EMR and other popular commercial platforms.

What are the learning outcomes?
By the end of the course, you will:

• Understand the complete Apache Hadoop 2.x framework
• Master the concepts of Hadoop Distributed File System (HDFS)
• Learn to set up a Hadoop cluster and write complex MapReduce programs
• Perform Data Analytics using Pig, Hive and YARN
• Acquire in-depth understanding of Big Data and Hadoop Ecosystem
• Learn data loading techniques using Sqoop and Flume
• Implement HBase, Zookeeper and MapReduce Integration
• Implement Advanced Usage and Indexing
• Schedule jobs using the Apache Oozie workflow scheduler
• Get hands-on experience in setting up different configurations of Hadoop cluster
• Manage, Maintain, Monitor and Troubleshoot a Hadoop Cluster
• Implement best practices for Hadoop development
• Work on a real-life, industry-based project on Big Data Analytics
Who should do this course?
With the number of Big Data career opportunities on the rise, Hadoop has become a must-know technology for the following professionals:

• Software Developers and Architects
• Analytics Professionals
• Data Management/Warehouse Professionals
• Business Intelligence/ETL Professionals
• Project Managers
• Testing and Mainframe Professionals
• Aspiring Data Scientists
• Anyone with a genuine interest in Big Data Analytics
• Students aiming to build a successful career around Big Data
What are the Prerequisites?
You can master Hadoop irrespective of your IT background, although basic knowledge of Core Java and SQL scripting will help.
Why should you take this course?
This course could open the door to some of the most in-demand IT jobs today. Right now is the perfect time to enter the field of Big Data. Some of the reasons to learn Hadoop are:
• Hadoop runs applications at very large scale on clusters built from commodity hardware. It helps store and process huge amounts of data quickly and cost-effectively.
• There is strong demand for professionals skilled in Hadoop development, which translates into better salaries and excellent job opportunities.
• The online training here will help you prepare for Hadoop Developer Certification, adding professional credibility to your career.
• Companies Using Hadoop: Amazon Web Services, IBM, Hortonworks, Cloudera, Intel, Microsoft, Pivotal, Twitter, Salesforce, AT&T, StumbleUpon, eBay, Yahoo!, Facebook, Hulu, etc.
• You can find amazing career opportunities. Hadoop job listings on top job websites: Indeed: 11000+, Simplyhired: 12000+, LinkedIn: 4500+, net: 8000+.
Can I take the certification exam after this course?
Certainly! We recommend earning the in-demand certification after this course. For more information on Hadoop certification, search Google for Cloudera CCDH and Hortonworks Certified Hadoop 2.x Developer (HDPCD).
Course Curriculum
About Big Data - A Quick Dive
What is Big Data Technology
Evolution and facts of Big data
Market Trends: Big Data – Big Value
Limitation of Large and complex data sets
Introduction to Big Data and Hadoop
Data Explosion and Need for Big Data
Characteristics of Big Data Technology
Leveraging Multiple Data Sources
Traditional IT Analytics Approach
Handling Limitations of Big Data Technology
Hadoop and Other Solutions
Distributed Architecture – A Brief Overview
Basics of Hadoop
Overview of Hadoop Ecosystem
Compare Hadoop vs traditional systems
VMware Player: Introduction and Requirements
Oracle VirtualBox to Open a Virtual Machine
Hadoop Setup and Deployment
Steps to install Linux (Ubuntu) Server for Hadoop
Ubuntu Server - Introduction
Hadoop Installation - Prerequisites
Hadoop Installation: Tips and Tricks for instant deployment
Hadoop Multi-Node Installation - Prerequisites
Steps involved in single and multi-node Hadoop installation on Ubuntu Server
Setup and Deployment Hadoop
Single-Node Cluster vs. Multi-Node Cluster
Creating the Clone of Hadoop
Steps to perform Clustering of the Hadoop environment
Hadoop Multi Node Cluster Setup using Amazon EC2
Hadoop Architecture and HDFS
Hadoop Architecture
Hadoop Cluster in Commodity Hardware
Hadoop Configuration and services
Apache Hadoop Core Components
Concept of HDFS
Regular File System vs. Hadoop Distributed File System
HDFS – features, Architecture and Operation Principle
HDFS: File System Namespace
HDFS Read and Write
Name Node Operation
Data Block Split
Benefits of Data Block Approach
HDFS - Block Replication Architecture and method
Data Replication Topology
Data Replication Representation
Case Study
MapReduce and YARN
Introduction to YARN and MapReduce
YARN: What and Why, Architecture
Different components of YARN
Resource Manager
Application Master
Applications startup in YARN
Role of AppMaster in Application Startup
Concepts of MapReduce
MapReduce – Analogy, Types and Formats
Map Execution - Distributed Two Node Environment
MapReduce and Associated Tasks
Set up Environment for MapReduce Development
Build MapReduce Program
Hadoop MapReduce Requirements
MapReduce Java Programming in Eclipse
How to develop a MapReduce Application
Best practices for developing, writing and debugging MapReduce applications
Joining Datasets in MapReduce
Checking Hadoop Environment for MapReduce
Practice and Case Study
Advanced HDFS and MapReduce
Advanced HDFS and related concepts
HDFS Benchmarking
Setting Up HDFS Block Size
Steps to decommission a DataNode
Advanced MapReduce Concepts
Interfaces and Data Types in Hadoop
Input and Output Formats in MapReduce
Distributed Cache
Joining Datasets in MapReduce
Various Joins in MapReduce
Reduce Side Join, Replicated Join, Composite Join
Hadoop Streaming and Hadoop Pipes
Practice and Case Study
Deep Dive in Pig
Introduction to Pig: What and Why
Components and features of Pig
Pig working and use cases
Data Model and Nested Data Model
Pig Execution and Interactive Modes
Salient Features
Pig vs. SQL
Basic Data Analysis with Pig
Prerequisites to Setup the environment For Pig Latin
Installation of a Pig Engine
Getting Datasets For Pig Development
Viewing the Schema
Loading, Filtering and Sorting Data
Script Interpretation
Processing Complex Data with Pig
Performing Grouping, Splitting and Joining relations
Filtering, Transforming and shaping relations
Various Pig Commands
Extending Pig: Macros and Imports, UDFs
Using Other Languages to process data with Pig
Practice and Case Studies
Deep Dive in Hive and HiveQL
Use of Hive and its Importance
Hive schema and Data Storage
Hive – Architecture and Components
Comparing Hive to Traditional Databases
Hive vs. Pig
Metastore and Metastore Configuration
Hive Thrift Server and Client Components
Basics of the Hive Query Language
Relational Data Analysis with Hive
Hive Databases and Tables
Data Types
Joining Data Sets
Common Built-in Functions
Running Hive Queries on the Shell, Scripts and Hue
Hive Data Formats and Management
Creating Databases and Hive-Managed Tables
Loading Data into Hive
Altering Databases and Tables
Self-Managed Tables
Simplifying Queries with views and storing Query results
Controlling Access to Data
Data Management with Hive
Data Model - External Tables and Partitions
Hive Optimization
Bucketing in Hive
Serialization and Deserialization
Hive Query Language - Extensibility
User-Defined and Built-in Functions
Practices and Case Study
Apache HBase
HBase Architecture and Characteristics
Companies Using HBase
HBase Components
Storage Model of HBase
Row Distribution of Data between Region Servers
Data Storage in HBase
HBase Data Model
When to Use HBase
HBase vs. RDBMS
Installation and Configuration of HBase
Connecting to HBase
HBase Shell Commands
Concept of NoSQL Database
Practice and Case Study
Working with Major Commercial Hadoop Platforms
Major Commercial distributions of Hadoop
Cloudera CDH
Cloudera Quickstart Virtual Machine
Download, start and work with Cloudera VM
Logging Into Hue Interface
Cloudera Manager
Logging into Cloudera Manager
Eclipse with MapReduce in Cloudera
Hortonworks Data Platform
MapR Data Platform
Pivotal HD
IBM InfoSphere BigInsights
Amazon Web Services EMR
Zookeeper, Sqoop and Flume
ZooKeeper and its role
Features of ZooKeeper
Challenges Faced in Distributed processing
ZooKeeper Entities and Data Model
Install and Configure ZooKeeper
Znode: Types, Operations, watches
Client API Functions
Cluster Management
Leader Election
Distributed Exclusive Lock
View ZooKeeper Nodes Using CLI
Concept of Sqoop
Benefits and Configuration of Sqoop
Sqoop Execution Process
Importing Data Using Sqoop
Importing Data to Hive and HBase
Exporting Data from Hadoop Using Sqoop
Sqoop Connectors
Sample Sqoop Commands
Import Data on Sqoop Using MySQL Database
Introduction and Concept of Flume
Flume Model and Goals
Scalability in Flume
Configure and Run Flume Agents
Ecosystem and its Components
Apache Hadoop Ecosystem Structure
Different Components and their roles in the ecosystem
Hadoop Administration and Maintenance
Namenode/Datanode directory structures and files
Optimizing a Hadoop Cluster
File system image and Edit log
The Checkpoint Procedure
Namenode failure and recovery procedure
Safe Mode
Metadata and Data backup
Potential problems and solutions / what to look for
Adding and removing nodes
Hadoop Monitoring and Troubleshooting
Best practices of monitoring a Hadoop cluster
Different Configuration Files of Hadoop Cluster
Properties of hadoop-default.xml
Hadoop Cluster: Critical Parameters
Hadoop DFS Operation: Critical Parameters
Different parameters for performance monitoring and tuning
Troubleshooting and Log Observation
Troubleshooting a Missing DataNode Issue
Using logs and stack traces for monitoring and troubleshooting
Using open-source tools to monitor Hadoop cluster
Course Duration: 12 weeks
70% learning through hands-on practice
Lab Exercises: 60 Hours
24 x 7
Frequently Asked Questions
How do I enroll for the training?
You can enroll for the online training through our website. You can pay online using any of the following options:
• Visa/MasterCard credit card
• ATM/Debit Card
• Internet Banking
• Mobile Wallets (Paytm, Mobikwik, etc.)

Once the online payment is done, you will automatically receive a payment receipt via email.
What are the opportunities for Hadoopers?
Opportunities for Hadoopers are wide-ranging: Hadoop Developer, Data Scientist, Hadoop Architect, and so on.
The content of the course is fundamental to each of these career paths and necessary for reaching your ultimate goal.
How will my course run?
Once you enrol, our counsellor will talk with you about your current comfort with programming, your goals for this program and your preferred timings. Your training sessions will commence after that. You will have 36 hours of online sessions with the instructor, spread over 20 days (Mon-Fri) or 12 weeks (Sat, Sun).
How will assignments and practicals be done?
We will help you set up a virtual machine on your own system with local access. If your system doesn't meet the prerequisites (e.g. 4GB RAM), you will be given remote access to our cluster for the practicals. You can also create an AWS account and use 'Free tier usage' eligible servers to create your Hadoop cluster on AWS EC2. A team of experts will always be there to assist you with round-the-clock support.
Can I cancel my enrollment? Do I get a refund?
Yes! You can cancel your enrollment. We will provide a complete refund after deducting the administration fee. To know more, please go through our Refund Policy.
Which case studies will be part of the course?
Towards the end of the course, you will work on a live project in which you use Pig, Hive, HBase and MapReduce to perform Big Data analytics. We will also consider various industry-specific (retail, social media, finance, education) Big Data case studies.
When does my course start?
Sessions normally start every Monday and Saturday. If there is a rush, or if a candidate brings colleagues along, a new batch can also start on other weekdays or weekends.
When are the classes held?
Your live classes will be held on either weekdays or weekends, depending on your availability. In addition to live classes, every module has hands-on assignments that you can do on your own schedule with the help of our expert support team.
Who are the instructors?
All our instructors are industry professionals who work in leading organizations and have real-world industrial experience.
How can I request support?
Practice is the best way to master any skill, and it's natural to get stuck when you practice. We acknowledge that and provide round-the-clock help. Experts respond to your query at the earliest and guide you through.
What are the system requirements to install the Hadoop environment?
Your system should have 4GB RAM and a processor better than a Core 2 Duo. If your system falls short of these requirements, we can provide you remote access to our Hadoop cluster.
What if I have queries after completion of the course?
Once you join the course, you receive lifetime support. Even after completing the course, you can get back to the support team with any queries you may have.

Our mission is to provide highly effective, quality education via innovative solutions. Geeksdemy looks forward to bridging the gap between in-demand technology and academics in order to deliver innovative, easy, interesting and affordable learning across the globe.

