BIG DATA HADOOP TRAINING

Batches

Morning Batch

Time of Batch: Slots Available - Call Us

Afternoon Batch

Time of Batch: Slots Available - Call Us

Evening Batch

Time of Batch: Slots Available - Call Us


Features

Duration Of Class

Classroom training with practicals: 48 hours

Live project

Apply the skills you learn on a distributed cluster to solve real-world problems

Case Studies

During the training sessions, relevant case studies will be shared and discussed with students.

Placement Policy

Placement support will be provided for all eligible students of job-oriented courses. Additional support will be provided depending on partner companies.

Expert Support

All technical support team members are committed to helping students with any technical queries that arise during the course. Career guidance will be provided to students, along with resume building, on a need basis.

Certification

Techbodhi certifies you as a Hadoop Developer based on your performance in the exit test.


Course Description

Big Data is a collection of huge or massive amounts of data. We live in the data age, and it is not easy to measure the total volume of data, or to manage and process this enormous data. The flood of Big Data comes from many different sources, such as the New York Stock Exchange, Facebook, Twitter, aircraft, Walmart, etc. Today, the world's information roughly doubles (about 1.8 times) every two years, and about 80% of it is still in unstructured format, which is very difficult to store, process, or retrieve. Loosely speaking, all of this unstructured data is Big Data.
A Hadoop Architect is an individual or a team of experts who manage petabytes of data and provide documentation for Hadoop-based environments around the globe. An even more crucial role of a Hadoop Architect is to oversee administrators and managers and get the best out of their efforts. A Hadoop Architect also needs to govern Hadoop on large clusters. Every Hadoop Architect must have solid experience in Java, MapReduce, Hive, HBase, and Pig.

Project

Your system should have 8 GB of RAM and an i3 processor; we provide a VM image of the Hadoop ecosystem.
For your practical work, the Techbodhi team will provide guidance.

    Towards the end of the course, you will work on a live project where you will use Pig, Hive, HBase, and MapReduce to perform Big Data analytics.
    Following are a few industry-specific Big Data case studies included in our Big Data and Hadoop certification (e.g. Finance, Retail, Media, Aviation), which you can consider for your project work:

    • Project #1: Analyze social bookmarking sites to find insights

Industry: Social Media

Data: Information gathered from bookmarking sites such as reddit.com and stumbleupon.com, which allow you to bookmark, review, rate, and search links on any topic. The data is in XML format and contains the URL of each link/post, the categories defining it, and the ratings linked with it.

Problem Statement: Analyze the data in the Hadoop ecosystem to:

  • Fetch the data into the Hadoop Distributed File System and analyze it with the help of MapReduce, Pig, and Hive to find the top-rated links based on user comments, likes, etc.
  • Using MapReduce, convert the semi-structured XML data into a structured format and categorize the user rating as positive or negative for each of the thousands of links (a minimal mapper sketch for this step follows this list).
  • Push the output to HDFS and then feed it into Pig, which splits the data into two parts: category data and ratings data.
  • Write a Hive query to analyze the data further and push the output into a relational database (RDBMS) using Sqoop.
  • Use a web server running on Grails/Java/Ruby/Python that renders the results on a website in real time.
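
The rating-categorization step above can be sketched as a single Java mapper. This is a minimal, hypothetical sketch: it assumes each input line is a self-contained XML record such as <post url="..." rating="7"/>, and the attribute names and the positive/negative threshold are illustrative assumptions, not part of the actual dataset.

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Emits (link URL, "positive"/"negative") for each XML record.
    public class RatingCategoryMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            String url = extract(line, "url");
            String ratingStr = extract(line, "rating");
            if (url == null || ratingStr == null) return;   // skip malformed records
            int rating;
            try {
                rating = Integer.parseInt(ratingStr);
            } catch (NumberFormatException e) {
                return;                                     // skip non-numeric ratings
            }
            // Assumed threshold on a 1-10 scale; adjust to the real rating range.
            String category = (rating >= 5) ? "positive" : "negative";
            context.write(new Text(url), new Text(category));
        }

        // Minimal attribute extraction; a real job would use a proper XML parser.
        private static String extract(String line, String attr) {
            int i = line.indexOf(attr + "=\"");
            if (i < 0) return null;
            int start = i + attr.length() + 2;
            int end = line.indexOf('"', start);
            return end < 0 ? null : line.substring(start, end);
        }
    }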

    • Project #2: Customer Complaints Analysis

Industry: Retail

Data: A publicly available dataset containing a few hundred thousand observations with attributes like: CustomerId, Payment Mode, Product Details, Complaint, Location, Status of the complaint, etc.

Problem Statement: Analyze the data in the Hadoop ecosystem to:

  • Get the number of complaints filed under each product
  • Get the total number of complaints filed from a particular location
  • Get the list of complaints, grouped by location, that received no timely response

    • Project #3: Tourism Data Analysis

Industry: Tourism

Data: The dataset comprises attributes like: City pair (combination of from and to), adults traveling, seniors traveling, children traveling, air booking price, car booking price, etc.

Problem Statement: Find the following insights from the data:

  • Top 20 destinations people most frequently travel to, based on the number of trips booked for each destination
  • Top 20 locations from which most trips start, based on booked trip count
  • Top 20 high air-revenue destinations, i.e., the 20 cities that generate the highest airline revenues, so that discount offers can be given to attract more bookings for these destinations

    • Project #4: Airline Data Analysis

Industry: Aviation

Data: A publicly available dataset which contains the flight details of various airlines, such as: airport ID, name of the airport, main city served by the airport, country or territory where the airport is located, airport code, coordinates in decimal degrees, hours offset from UTC, timezone, etc.

Problem Statement: Analyze the airlines’ data to:

  • Find the list of airports operating in the country
  • Find the list of airlines having zero stops
  • Find the list of airlines operating with code shares
  • Find which country (or territory) has the highest number of airports
  • Find the list of active airlines in the United States

    • Project #5: Analyze Loan Dataset

Industry: Banking and Finance

Data: Publicly available dataset which contains complete details of all the loans issued, including the current loan status (Current, Late, Fully Paid, etc.) and latest payment information.

Problem Statement:

  • Find the number of cases per location, categorize the count with respect to the reason for taking the loan, and display the average risk score.

    • Project #6: Analyze Movie Ratings

Industry: Media

Data: Publicly available data from sites like Rotten Tomatoes, IMDb, etc.

Problem Statement: Analyze the movie ratings by different users to:

  • Get the user who has rated the highest number of movies
  • Get the user who has rated the lowest number of movies
  • Get the total number of movies rated by users belonging to a specific occupation
  • Get the number of underage users

    • Project #7: Analyze YouTube data

Industry: Social Media

Data: Data about YouTube videos, containing attributes such as: VideoID, Uploader, Age, Category, Length, Views, Ratings, Comments, etc.

Problem Statement:

  • Identify the top 5 categories in which the most number of videos are uploaded, the top 10 rated videos, and the top 10 most viewed videos.

Apart from these, there are some twenty more use cases to choose from, for example:

  • Market data Analysis
  • Twitter Data Analysis

Curriculum

The "Introduction to Big Data and Hadoop" is an ideal course package for individuals who want to understand the basic concepts of Big Data and Hadoop. On completing this course, learners will be able to interpret what goes behind the processing of huge volumes of data as the industry switches over from excel-based analytics to real-time analytics. The course focuses on the basics of Big Data and Hadoop. It further provides an overview of the commercial distributions of Hadoop as well as the components of the Hadoop ecosystem.
  • What is Big Data?
  • Why are all industries talking about Big Data?
  • What are the issues in Big Data?
  • What are the challenges for storing big data?
  • What are the challenges for processing big data?
  • Which technologies support Big Data?

Hadoop

  • What is Hadoop?
  • History of Hadoop
  • Why Hadoop?
  • Hadoop Use cases
  • Advantages and Disadvantages of Hadoop
  • Importance of the Different Ecosystems of Hadoop
  • Importance of Integration with other Big Data solutions
  • Big Data Real-time Use Cases

HDFS

  • Data Storage in HDFS
  • HDFS Block size
  • HDFS Replication factor
  • Accessing HDFS (a short Java sketch follows this list)
  • HDFS Commands
  • Configurations
  • How to overcome the Drawbacks in HDFS
  • How to add new nodes (Commissioning)
  • How to remove existing nodes (Decommissioning)
  • How to verify Dead Nodes
  • How to restart Dead Nodes
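
As a taste of programmatic HDFS access, here is a minimal Java sketch using the FileSystem API, roughly the programmatic counterpart of the hdfs dfs -cat shell command. The file path is hypothetical, and the sketch assumes a cluster configuration (core-site.xml) is available on the classpath.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsReadExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();          // reads core-site.xml from the classpath
            FileSystem fs = FileSystem.get(conf);              // connects to the default file system (HDFS)
            Path path = new Path("/user/training/sample.txt"); // hypothetical file path
            try (BufferedReader reader =
                     new BufferedReader(new InputStreamReader(fs.open(path)))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    System.out.println(line);                  // print each line, like "hdfs dfs -cat"
                }
            }
        }
    }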

JobTracker

  • Importance of JobTracker
  • What are the roles of JobTracker
  • What are the drawbacks in JobTracker

TaskTracker

  • Importance of TaskTracker
  • What are the roles of TaskTracker
  • What are the drawbacks in TaskTracker

 

Data Types in Hadoop

  • What are the Data types in Map Reduce
  • Why they are important in Map Reduce
  • Can we write custom Data Types in MapReduce

Input Formats in Map Reduce

  • Text Input Format
  • Key Value Text Input Format
  • Sequence File Input Format
  • NLine Input Format
  • Importance of Input Format in Map Reduce
  • How to use an Input Format in Map Reduce (a driver sketch follows this list)
  • How to write custom Input Formats and their Record Readers
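
As a small illustration of selecting an input format, here is a hedged driver sketch using NLineInputFormat so that each mapper receives a fixed number of input lines. The job name, input directory, and the figure of 1000 lines per split are arbitrary choices for the example.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;

    public class NLineDriverSketch {
        // Returns a Job configured so each input split (and thus each mapper)
        // gets exactly 1000 lines of input.
        public static Job configure(Configuration conf, String inputDir) throws Exception {
            Job job = Job.getInstance(conf, "nline-example");
            job.setInputFormatClass(NLineInputFormat.class);
            NLineInputFormat.setNumLinesPerSplit(job, 1000);
            FileInputFormat.addInputPath(job, new Path(inputDir));
            return job;
        }
    }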

Output Formats in Map Reduce

  • Text Output Format
  • Sequence File Output Format
  • Importance of Output Format in Map Reduce
  • How to use Output Format in Map Reduce
  • How to write custom Output Formats and its Record Writers

Mapper

  • What is a mapper in a Map Reduce job
  • Why do we need a mapper?
  • What are the Advantages and Disadvantages of a mapper
  • Writing mapper programs (a complete word-count example, combining mapper, reducer, and driver, follows the Driver section below)

Reducer

  • What is a reducer in a Map Reduce job
  • Why do we need a reducer?
  • What are the Advantages and Disadvantages of a reducer
  • Writing reducer programs (see the combined example after the Driver section)

Driver

  • What is the Driver in a Map Reduce job
  • Why do we need a Driver?
  • Writing the Driver program (a complete example is shown below)
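
To tie the mapper, reducer, and driver together, here is the classic word-count job, essentially the canonical Hadoop example: the mapper emits (word, 1) pairs, the reducer sums them, and the driver wires everything up and submits the job.

    import java.io.IOException;
    import java.util.StringTokenizer;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

        // Mapper: emits (word, 1) for every token in the input line.
        public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            @Override
            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, ONE);
                }
            }
        }

        // Reducer: sums the counts for each word.
        public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            private final IntWritable result = new IntWritable();

            @Override
            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                result.set(sum);
                context.write(key, result);
            }
        }

        // Driver: configures the job and submits it to the cluster.
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));    // input directory
            FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output directory (must not exist)
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }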

Input Split

  • InputSplit
  • Need Of Input Split in Map Reduce
  • InputSplit Size
  • InputSplit Size Vs Block Size
  • InputSplit Vs Mappers
  • Map Reduce Job execution flow

Combiner

  • What is a combiner in a Map Reduce job
  • Why do we need a combiner?
  • What are the Advantages and Disadvantages of a combiner
  • Writing combiner programs (a one-line driver change is shown below)
  • Identity Mapper and Identity Reducer
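
Assuming the WordCount job shown earlier, enabling a combiner is a one-line driver change. Because addition is associative and commutative, the reducer class can safely double as the combiner, cutting down the data shuffled across the network:

    // In the WordCount driver, before submitting the job:
    job.setCombinerClass(WordCount.IntSumReducer.class);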

 

Partitioner

  • What is a Partitioner in a Map Reduce job
  • Why do we need a Partitioner?
  • What are the Advantages and Disadvantages of a Partitioner
  • Writing Partitioner programs (a sketch follows this list)
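
Here is a small, hypothetical partitioner sketch that routes keys starting with 'a'-'m' to one reducer and all other keys to another. The split rule is purely illustrative; it assumes the job runs with two reducers.

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Partitioner;

    public class AlphabetPartitioner extends Partitioner<Text, IntWritable> {
        @Override
        public int getPartition(Text key, IntWritable value, int numPartitions) {
            String s = key.toString();
            if (s.isEmpty() || numPartitions < 2) {
                return 0;                       // degenerate cases: single reducer or empty key
            }
            char first = Character.toLowerCase(s.charAt(0));
            return (first >= 'a' && first <= 'm') ? 0 : 1;
        }
    }

It would be enabled in the driver with job.setPartitionerClass(AlphabetPartitioner.class) and job.setNumReduceTasks(2).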

 

Distributed Cache

  • What is the Distributed Cache in a Map Reduce job
  • Importance of the Distributed Cache in a Map Reduce job
  • What are the Advantages and Disadvantages of the Distributed Cache
  • Writing Distributed Cache programs (a sketch follows this list)
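
Below is a hedged sketch of a map-side lookup via the distributed cache, using the Hadoop 2 Job API. The file path, the "#lookup" symlink name, and the tab-separated file format are all illustrative assumptions.

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class LookupMapper extends Mapper<LongWritable, Text, Text, Text> {
        private final Map<String, String> lookup = new HashMap<>();

        // Driver side (for reference): ship the small file to every task with
        //   job.addCacheFile(new java.net.URI("hdfs:///user/training/lookup.txt#lookup"));
        // The "#lookup" fragment makes the framework create a symlink named
        // "lookup" in each task's working directory.

        @Override
        protected void setup(Context context) throws IOException {
            // Load the cached file once, before any records arrive.
            try (BufferedReader reader = new BufferedReader(new FileReader("lookup"))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    String[] parts = line.split("\t", 2);   // assumed tab-separated key/value pairs
                    if (parts.length == 2) {
                        lookup.put(parts[0], parts[1]);
                    }
                }
            }
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String id = value.toString().split(",")[0];     // assumed CSV with the join key first
            context.write(new Text(id), new Text(lookup.getOrDefault(id, "UNKNOWN")));
        }
    }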

 

Counters

  • What is a Counter in a Map Reduce job
  • Why do we need Counters in a production environment?
  • How to write Counters in Map Reduce programs (a sketch follows this list)
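
A minimal counter sketch: counters are usually declared as an enum and incremented from the tasks; the framework aggregates them across all mappers and reducers and prints them with the job summary. The record format (comma-separated, at least three fields) is an assumption for illustration.

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class ParseMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        public enum RecordQuality { GOOD, MALFORMED }       // counter group for data quality

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",");
            if (fields.length < 3) {                        // assumed minimum field count
                context.getCounter(RecordQuality.MALFORMED).increment(1);
                return;                                     // skip bad records instead of failing the job
            }
            context.getCounter(RecordQuality.GOOD).increment(1);
            context.write(new Text(fields[0]), new IntWritable(1));
        }
    }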

 

Importance of the Writable and WritableComparable APIs

  • How to write custom Map Reduce values using Writable
  • How to write custom Map Reduce keys using WritableComparable (a sketch follows this list)
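
A sketch of a custom composite key; the (year, temperature) fields are illustrative. write() and readFields() must serialize the fields in the same order, and compareTo() defines how the framework sorts keys during the shuffle.

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.io.WritableComparable;

    public class YearTempKey implements WritableComparable<YearTempKey> {
        private int year;
        private int temperature;

        public YearTempKey() { }                            // required no-arg constructor

        public YearTempKey(int year, int temperature) {
            this.year = year;
            this.temperature = temperature;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeInt(year);
            out.writeInt(temperature);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            year = in.readInt();                            // must mirror write() exactly
            temperature = in.readInt();
        }

        @Override
        public int compareTo(YearTempKey other) {
            int cmp = Integer.compare(year, other.year);    // sort by year first,
            return cmp != 0 ? cmp : Integer.compare(temperature, other.temperature); // then temperature
        }

        @Override
        public int hashCode() {
            return 31 * year + temperature;                 // used by the default HashPartitioner
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof YearTempKey)) return false;
            YearTempKey k = (YearTempKey) o;
            return year == k.year && temperature == k.temperature;
        }
    }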

 

Joins

Map Side Join

  • What is the importance of a Map Side Join
  • Where it is used

 

Reduce Side Join

  • What is the importance of a Reduce Side Join
  • Where it is used
  • What is the difference between a Map Side Join and a Reduce Side Join?

Compression techniques

  • Importance of Compression techniques in production environment
  • Compression Types
  • NONE, RECORD and BLOCK
  • Compression Codecs
  • Default, Gzip, Bzip, Snappy and LZO
  • Enabling and disabling these techniques for all jobs
  • Enabling and disabling these techniques for a particular job (a driver sketch follows this list)
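
As a sketch of per-job compression settings, here is a driver fragment that enables Snappy for both the intermediate (map) output and the final job output; the property names are the standard Hadoop 2 keys, and the job name is arbitrary.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.SnappyCodec;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

    public class CompressionConfigSketch {
        public static Job configure() throws Exception {
            Configuration conf = new Configuration();
            // Compress the map output that crosses the network during the shuffle.
            conf.setBoolean("mapreduce.map.output.compress", true);
            conf.setClass("mapreduce.map.output.compress.codec",
                          SnappyCodec.class, CompressionCodec.class);

            Job job = Job.getInstance(conf, "compressed-output");
            // Compress the final output; BLOCK compresses groups of records together,
            // which usually gives the best ratio for sequence files.
            FileOutputFormat.setCompressOutput(job, true);
            FileOutputFormat.setOutputCompressorClass(job, SnappyCodec.class);
            SequenceFileOutputFormat.setOutputCompressionType(job, SequenceFile.CompressionType.BLOCK);
            return job;
        }
    }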

 

Map Reduce Programming Model

  • How to write the Map Reduce jobs in Java
  • Running the Map Reduce jobs in local mode
  • Running the Map Reduce jobs in pseudo mode
  • Running the Map Reduce jobs in cluster mode

Debugging Map Reduce Jobs

  • How to debug Map Reduce jobs locally
  • How to debug Map Reduce jobs remotely

YARN

  • What is YARN?
  • What is the importance of YARN?
  • Where the concept of YARN is used in real time
  • What is the difference between YARN and Map Reduce?

Data Locality

  • What is Data Locality?
  • Does Hadoop follow Data Locality?

Speculative Execution

  • What is Speculative Execution?
  • Does Hadoop use Speculative Execution?

Map Reduce Commands

  • Importance of each command
  • How to execute the command
  • Explanation of MapReduce admin-related commands

Configurations

  • Can we change the existing configurations of MapReduce or not?
  • Importance of configurations
  • Writing unit tests for Map Reduce jobs
  • Secondary Sorting: its uses and how to implement it in MapReduce
  • How to identify performance bottlenecks in MR jobs and tune them
  • Map Reduce Streaming and Pipes, with examples
  • Exploring the Apache MapReduce Web UI

Apache Pig

  • Introduction to Apache Pig
  • Map Reduce vs Apache Pig
  • SQL vs Apache Pig
  • Different data types in Pig

Modes of Execution in Pig

  • Local Mode
  • Map Reduce Mode

Execution Mechanism

  • Grunt Shell
  • Script
  • Embedded

UDFs

  • How to write UDFs in Pig
  • How to use UDFs in Pig
  • Importance of UDFs in Pig (a minimal Java UDF follows this list)
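
A minimal Pig eval UDF in Java, upper-casing its first argument; the class name is arbitrary. Pig convention is null in, null out:

    import java.io.IOException;
    import org.apache.pig.EvalFunc;
    import org.apache.pig.data.Tuple;

    public class ToUpper extends EvalFunc<String> {
        @Override
        public String exec(Tuple input) throws IOException {
            if (input == null || input.size() == 0 || input.get(0) == null) {
                return null;                    // null in, null out
            }
            return input.get(0).toString().toUpperCase();
        }
    }

Once packaged into a jar (say myudfs.jar, a hypothetical name), it would be registered in a Pig script with REGISTER myudfs.jar; and then called like a built-in, e.g. B = FOREACH A GENERATE ToUpper(name);.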

Filters

  • How to write the Filters in Pig
  • How to use the Filters in Pig
  • Importance of Filters in Pig

Load Functions

  • How to write the Load Functions in Pig
  • How to use the Load Functions in Pig
  • Importance of Load Functions in Pig

Store Functions

  • How to use the Store Functions in Pig
  • Importance of Store Functions in Pig
  • Transformations in Pig
  • How to write complex Pig scripts
  • How to integrate Pig and HBase

Hive

Introduction to Hive architecture

  • Driver
  • Compiler
  • Semantic Analyzer
  • Hive Integration with Hadoop
  • Hive Query Language (HiveQL)
  • SQL vs HiveQL
  • Hive installation and configuration
  • Hive, Map-Reduce, and Local Mode
  • Hive DDL and DML operations

Hive Services

  • CLI
  • HiveServer
  • HWI (Hive Web Interface)

Metastore

  • embedded metastore configuration
  • external metastore configuration

UDFs

  • How to write UDFs in Hive
  • How to use UDFs in Hive
  • Importance of UDFs in Hive (a minimal Java UDF follows this list)
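
A minimal Hive UDF in Java using the simple UDF API (newer Hive versions prefer GenericUDF, but this is the usual starting point); the class name and behavior are illustrative:

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Trims and lower-cases a string column.
    public final class Normalize extends UDF {
        public Text evaluate(Text input) {
            if (input == null) {
                return null;                    // pass NULLs through, as Hive expects
            }
            return new Text(input.toString().trim().toLowerCase());
        }
    }

After packaging it into a jar, it would be wired up in Hive with ADD JAR and CREATE TEMPORARY FUNCTION normalize AS 'Normalize'; (names are hypothetical).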

UDAFs

  • How to use the UDAFs in Hive
  • Importance of UDAFs in Hive

UDTFs

  • How to use the UDTFs in Hive
  • Importance of UDTFs in Hive
  • How to write complex Hive queries
  • What is Hive Data Model?

Partitions

  • Importance of Hive Partitions in production environment
  • Limitations of Hive Partitions
  • How to write Partitions

Buckets

  • Importance of Hive Buckets in production environment
  • How to write Buckets

SerDe

  • Importance of Hive SerDes in a production environment
  • How to write SerDe programs
  • How to integrate Hive and HBase

ZooKeeper

  • Introduction to ZooKeeper
  • Pseudo-mode installation
  • ZooKeeper cluster installation
  • Basic commands execution

HBase

  • HBase introduction
  • HBase use cases
  • HBase basics
  • Column families
  • Scans

HBase installation

  • Local mode
  • Pseudo mode
  • Cluster mode

HBase Architecture

  • Storage
  • Write Ahead Log
  • Log Structured Merge Trees

MapReduce integration

  • MapReduce over HBase

HBase Usage

  • Key design
  • Bloom Filters
  • Versioning
  • Coprocessors
  • Filters

HBase Clients

  • REST
  • Thrift
  • Hive
  • Web Based UI

HBase Admin

  • Schema definition
  • Basic CRUD operations (a Java client sketch follows this list)
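
As a sketch of basic CRUD with the Java HBase client API (HBase 1.x and later), assuming a hypothetical "users" table with an "info" column family:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Delete;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseCrudExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();   // reads hbase-site.xml
            try (Connection connection = ConnectionFactory.createConnection(conf);
                 Table table = connection.getTable(TableName.valueOf("users"))) {

                // Create/update: a Put writes one or more cells for a row key.
                Put put = new Put(Bytes.toBytes("row1"));
                put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Asha"));
                table.put(put);

                // Read: a Get fetches cells for a single row key.
                Get get = new Get(Bytes.toBytes("row1"));
                Result result = table.get(get);
                byte[] name = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"));
                System.out.println("name = " + Bytes.toString(name));

                // Delete: removes the whole row (or specific cells, if qualified).
                table.delete(new Delete(Bytes.toBytes("row1")));
            }
        }
    }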

Sqoop

  • Introduction to Sqoop
  • MySQL client and server installation
  • Sqoop installation
  • How to connect to a relational database using Sqoop
  • Sqoop commands, with examples of the import and export commands

Flume

  • Introduction to Flume
  • Flume installation
  • Flume agent usage and Flume example execution

MongoDB

  • Introduction to MongoDB
  • MongoDB installation
  • MongoDB examples

FAQs

Tech Bodhi ensures you do not miss any content:
  • If you miss a single session or a couple of sessions, the trainer will manage a catch-up.
  • If a large number of sessions are missed, the Center Head will provide a solution.

Techbodhi is committed to providing you an awesome learning experience through world-class content and best-in-class instructors. Through this training we will create an ecosystem that will enable you to convert opportunities into job offers by presenting your skills well at the time of an interview. We can assist you in resume building and also share important interview questions once you are done with the training. Placement assistance is a complimentary service provided with job-oriented courses.

We can arrange a demo session on request.
All the trainers at Tech Bodhi are practitioners from the industry with 5 to 20 years of relevant IT experience. They are subject-matter experts trained by Tech Bodhi to provide a great learning experience.
You can give us a CALL at +91 9960295908 OR email at info@techbodhi.co.in.

Certification

  • Once you are successfully through the exit test and the project assignment (reviewed by a Techbodhi expert), you will be awarded Techbodhi’s diploma certificate.
  • Techbodhi’s certification has industry recognition, and we are the preferred training partner for many MNCs (names cannot be disclosed as per NDA).