BIG DATA HADOOP TRAINING

Batches

Morning Batch

Time of Batch: Slots Available - Call Us

Afternoon Batch

Time of Batch: Slots Available - Call Us

Evening Batch

Time of Batch: Slots Available - Call Us


Features

Duration Of Class

Classroom training with practicals: 48 hours

Live project

Apply the skills you learn on a distributed cluster to solve real-world problems

Case Studies

During the training sessions, relevant case studies will be shared and discussed with students.

Placement Policy

Placement support will be provided for all eligible students of job-oriented courses. Additional support will be provided depending on partner companies.

Expert Support

All technical support team members are committed to helping students with any technical queries that arise during the course. Career guidance will be provided to students, along with resume building, on a need basis.

Certification

Techbodhi certifies you as a Hadoop Developer based on your performance in the exit test.


Course Description

Big Data is a collection of huge or massive amounts of data. We live in the data age, and it is not easy to measure the total volume of data, or to manage and process this enormous data. The flood of Big Data comes from many different sources, such as the New York Stock Exchange, Facebook, Twitter, aircraft, Walmart, etc. Today, the world's information roughly doubles (about 1.8 times) every two years, and about 80% of it is still in unstructured format, which is very difficult to store, process, or retrieve. Loosely speaking, all of this unstructured data is Big Data.
A Hadoop Architect is an individual or a team of experts who manage petabytes of data and provide documentation for Hadoop-based environments around the globe. An even more crucial role of a Hadoop Architect is to oversee administrators and managers and get the best out of their efforts. A Hadoop Architect also needs to govern Hadoop on large clusters. Every Hadoop Architect must have solid experience in Java, MapReduce, Hive, HBase, and Pig.

Project

Your system should have 8 GB of RAM and an i3 processor; we provide a VM image of the Hadoop ecosystem.
For your practical work, the Techbodhi team will provide guidance.

    Towards the end of the course, you will work on a live project where you will use Pig, Hive, HBase, and MapReduce to perform Big Data analytics.
    Following are a few industry-specific Big Data case studies included in our Big Data and Hadoop certification (e.g. Finance, Retail, Media, Aviation), which you can consider for your project work:

    • Project #1: Analyze social bookmarking sites to find insights

Industry: Social Media

Data: Information gathered from bookmarking sites such as reddit.com and stumbleupon.com, which allow you to bookmark, review, rate, and search links on any topic. The data is in XML format and contains the URL of each link/post, the categories defining it, and the ratings linked with it.

Problem Statement: Analyze the data in the Hadoop ecosystem to:

  • Fetch the data into the Hadoop Distributed File System and analyze it with the help of MapReduce, Pig, and Hive to find the top-rated links based on user comments, likes, etc.
  • Using MapReduce, convert the semi-structured XML data into a structured format and categorize the user rating as positive or negative for each of the thousands of links (a minimal mapper sketch for this step follows this list).
  • Push the output to HDFS and then feed it into Pig, which splits the data into two parts: category data and ratings data.
  • Write a Hive query to analyze the data further and push the output into a relational database (RDBMS) using Sqoop.
  • Use a web server running on Grails/Java/Ruby/Python that renders the results on a website in real time.
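
The rating-categorization step above can be sketched as a single Java mapper. This is a minimal, hypothetical sketch: it assumes each input line is a self-contained XML record such as <post url="..." rating="7"/>, and the attribute names and the positive/negative threshold are illustrative assumptions, not part of the actual dataset.

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Emits (link URL, "positive"/"negative") for each XML record.
    public class RatingCategoryMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            String url = extract(line, "url");
            String ratingStr = extract(line, "rating");
            if (url == null || ratingStr == null) return;   // skip malformed records
            int rating;
            try {
                rating = Integer.parseInt(ratingStr);
            } catch (NumberFormatException e) {
                return;                                     // skip non-numeric ratings
            }
            // Assumed threshold on a 1-10 scale; adjust to the real rating range.
            String category = (rating >= 5) ? "positive" : "negative";
            context.write(new Text(url), new Text(category));
        }

        // Minimal attribute extraction; a real job would use a proper XML parser.
        private static String extract(String line, String attr) {
            int i = line.indexOf(attr + "=\"");
            if (i < 0) return null;
            int start = i + attr.length() + 2;
            int end = line.indexOf('"', start);
            return end < 0 ? null : line.substring(start, end);
        }
    }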

    • Project #2: Customer Complaints Analysis

Industry: Retail

Data: A publicly available dataset containing a few hundred thousand observations with attributes like: CustomerId, Payment Mode, Product Details, Complaint, Location, Status of the complaint, etc.

Problem Statement: Analyze the data in the Hadoop ecosystem to:

  • Get the number of complaints filed under each product
  • Get the total number of complaints filed from a particular location
  • Get the list of complaints, grouped by location, that received no timely response

    • Project #3: Tourism Data Analysis

Industry: Tourism

Data: The dataset comprises attributes like: City pair (combination of from and to), adults traveling, seniors traveling, children traveling, air booking price, car booking price, etc.

Problem Statement: Find the following insights from the data:

  • Top 20 destinations people most frequently travel to, based on the number of trips booked for each destination
  • Top 20 locations from which most trips start, based on booked trip count
  • Top 20 high air-revenue destinations, i.e., the 20 cities that generate the highest airline revenues, so that discount offers can be given to attract more bookings for these destinations

    • Project #4: Airline Data Analysis

Industry: Aviation

Data: A publicly available dataset which contains the flight details of various airlines, such as: airport ID, name of the airport, main city served by the airport, country or territory where the airport is located, airport code, coordinates in decimal degrees, hours offset from UTC, timezone, etc.

Problem Statement: Analyze the airlines’ data to:

  • Find the list of airports operating in the country
  • Find the list of airlines having zero stops
  • Find the list of airlines operating with code shares
  • Find which country (or territory) has the highest number of airports
  • Find the list of active airlines in the United States

    • Project #5: Analyze Loan Dataset

Industry: Banking and Finance

Data: Publicly available dataset which contains complete details of all the loans issued, including the current loan status (Current, Late, Fully Paid, etc.) and latest payment information.

Problem Statement:

  • Find the number of cases per location, categorize the count with respect to the reason for taking the loan, and display the average risk score.

    • Project #6: Analyze Movie Ratings

Industry: Media

Data: Publicly available data from sites like Rotten Tomatoes, IMDb, etc.

Problem Statement: Analyze the movie ratings by different users to:

  • Get the user who has rated the highest number of movies
  • Get the user who has rated the lowest number of movies
  • Get the total number of movies rated by users belonging to a specific occupation
  • Get the number of underage users

    • Project #7: Analyze YouTube data

Industry: Social Media

Data: Data about YouTube videos, containing attributes such as: VideoID, Uploader, Age, Category, Length, Views, Ratings, Comments, etc.

Problem Statement:

  • Identify the top 5 categories in which the most number of videos are uploaded, the top 10 rated videos, and the top 10 most viewed videos.

Apart from these, there are some twenty more use cases to choose from, for example:

  • Market data Analysis
  • Twitter Data Analysis

Curriculum

The "Introduction to Big Data and Hadoop" is an ideal course package for individuals who want to understand the basic concepts of Big Data and Hadoop. On completing this course, learners will be able to interpret what goes behind the processing of huge volumes of data as the industry switches over from excel-based analytics to real-time analytics. The course focuses on the basics of Big Data and Hadoop. It further provides an overview of the commercial distributions of Hadoop as well as the components of the Hadoop ecosystem.
  • What is Big Data?
  • Why are all industries talking about Big Data?
  • What are the issues in Big Data?
  • What are the challenges for storing big data?
  • What are the challenges for processing big data?
  • Which technologies support Big Data?

Hadoop

  • What is Hadoop?
  • History of Hadoop
  • Why Hadoop?
  • Hadoop Use cases
  • Advantages and Disadvantages of Hadoop
  • Importance of the Different Ecosystems of Hadoop
  • Importance of Integration with other Big Data solutions
  • Big Data Real-time Use Cases

HDFS

  • Data Storage in HDFS
  • HDFS Block size
  • HDFS Replication factor
  • Accessing HDFS (a short Java sketch follows this list)
  • HDFS Commands
  • Configurations
  • How to overcome the Drawbacks in HDFS
  • How to add new nodes (Commissioning)
  • How to remove existing nodes (Decommissioning)
  • How to verify Dead Nodes
  • How to restart Dead Nodes
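
As a taste of programmatic HDFS access, here is a minimal Java sketch using the FileSystem API, roughly the programmatic counterpart of the hdfs dfs -cat shell command. The file path is hypothetical, and the sketch assumes a cluster configuration (core-site.xml) is available on the classpath.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsReadExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();          // reads core-site.xml from the classpath
            FileSystem fs = FileSystem.get(conf);              // connects to the default file system (HDFS)
            Path path = new Path("/user/training/sample.txt"); // hypothetical file path
            try (BufferedReader reader =
                     new BufferedReader(new InputStreamReader(fs.open(path)))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    System.out.println(line);                  // print each line, like "hdfs dfs -cat"
                }
            }
        }
    }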

JobTracker

  • Importance of JobTracker
  • What are the roles of JobTracker
  • What are the drawbacks in JobTracker

TaskTracker

  • Importance of TaskTracker
  • What are the roles of TaskTracker
  • What are the drawbacks in TaskTracker

 

Data Types in Hadoop

  • What are the Data types in Map Reduce
  • Why they are important in Map Reduce
  • Can we write custom Data Types in MapReduce

Input Formats in Map Reduce

  • Text Input Format
  • Key Value Text Input Format
  • Sequence File Input Format
  • NLine Input Format
  • Importance of Input Format in Map Reduce
  • How to use an Input Format in Map Reduce (a driver sketch follows this list)
  • How to write custom Input Formats and their Record Readers
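
As a small illustration of selecting an input format, here is a hedged driver sketch using NLineInputFormat so that each mapper receives a fixed number of input lines. The job name, input directory, and the figure of 1000 lines per split are arbitrary choices for the example.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;

    public class NLineDriverSketch {
        // Returns a Job configured so each input split (and thus each mapper)
        // gets exactly 1000 lines of input.
        public static Job configure(Configuration conf, String inputDir) throws Exception {
            Job job = Job.getInstance(conf, "nline-example");
            job.setInputFormatClass(NLineInputFormat.class);
            NLineInputFormat.setNumLinesPerSplit(job, 1000);
            FileInputFormat.addInputPath(job, new Path(inputDir));
            return job;
        }
    }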

Output Formats in Map Reduce

  • Text Output Format
  • Sequence File Output Format
  • Importance of Output Format in Map Reduce
  • How to use Output Format in Map Reduce
  • How to write custom Output Formats and its Record Writers

Mapper

  • What is a mapper in a Map Reduce job
  • Why do we need a mapper?
  • What are the Advantages and Disadvantages of a mapper
  • Writing mapper programs (a complete word-count example, combining mapper, reducer, and driver, follows the Driver section below)

Reducer

  • What is a reducer in a Map Reduce job
  • Why do we need a reducer?
  • What are the Advantages and Disadvantages of a reducer
  • Writing reducer programs (see the combined example after the Driver section)

Driver

  • What is the Driver in a Map Reduce job
  • Why do we need a Driver?
  • Writing the Driver program (a complete example is shown below)
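
To tie the mapper, reducer, and driver together, here is the classic word-count job, essentially the canonical Hadoop example: the mapper emits (word, 1) pairs, the reducer sums them, and the driver wires everything up and submits the job.

    import java.io.IOException;
    import java.util.StringTokenizer;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

        // Mapper: emits (word, 1) for every token in the input line.
        public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            @Override
            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, ONE);
                }
            }
        }

        // Reducer: sums the counts for each word.
        public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            private final IntWritable result = new IntWritable();

            @Override
            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                result.set(sum);
                context.write(key, result);
            }
        }

        // Driver: configures the job and submits it to the cluster.
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));    // input directory
            FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output directory (must not exist)
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }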

Input Split

  • InputSplit
  • Need Of Input Split in Map Reduce
  • InputSplit Size
  • InputSplit Size Vs Block Size
  • InputSplit Vs Mappers
  • Map Reduce Job execution flow

Combiner

  • What is a combiner in a Map Reduce job
  • Why do we need a combiner?
  • What are the Advantages and Disadvantages of a combiner
  • Writing combiner programs (a one-line driver change is shown below)
  • Identity Mapper and Identity Reducer
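
Assuming the WordCount job shown earlier, enabling a combiner is a one-line driver change. Because addition is associative and commutative, the reducer class can safely double as the combiner, cutting down the data shuffled across the network:

    // In the WordCount driver, before submitting the job:
    job.setCombinerClass(WordCount.IntSumReducer.class);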

 

Partitioner

  • What is a Partitioner in a Map Reduce job
  • Why do we need a Partitioner?
  • What are the Advantages and Disadvantages of a Partitioner
  • Writing Partitioner programs (a sketch follows this list)
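
Here is a small, hypothetical partitioner sketch that routes keys starting with 'a'-'m' to one reducer and all other keys to another. The split rule is purely illustrative; it assumes the job runs with two reducers.

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Partitioner;

    public class AlphabetPartitioner extends Partitioner<Text, IntWritable> {
        @Override
        public int getPartition(Text key, IntWritable value, int numPartitions) {
            String s = key.toString();
            if (s.isEmpty() || numPartitions < 2) {
                return 0;                       // degenerate cases: single reducer or empty key
            }
            char first = Character.toLowerCase(s.charAt(0));
            return (first >= 'a' && first <= 'm') ? 0 : 1;
        }
    }

It would be enabled in the driver with job.setPartitionerClass(AlphabetPartitioner.class) and job.setNumReduceTasks(2).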

 

Distributed Cache

  • What is the Distributed Cache in a Map Reduce job
  • Importance of the Distributed Cache in a Map Reduce job
  • What are the Advantages and Disadvantages of the Distributed Cache
  • Writing Distributed Cache programs (a sketch follows this list)
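
Below is a hedged sketch of a map-side lookup via the distributed cache, using the Hadoop 2 Job API. The file path, the "#lookup" symlink name, and the tab-separated file format are all illustrative assumptions.

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class LookupMapper extends Mapper<LongWritable, Text, Text, Text> {
        private final Map<String, String> lookup = new HashMap<>();

        // Driver side (for reference): ship the small file to every task with
        //   job.addCacheFile(new java.net.URI("hdfs:///user/training/lookup.txt#lookup"));
        // The "#lookup" fragment makes the framework create a symlink named
        // "lookup" in each task's working directory.

        @Override
        protected void setup(Context context) throws IOException {
            // Load the cached file once, before any records arrive.
            try (BufferedReader reader = new BufferedReader(new FileReader("lookup"))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    String[] parts = line.split("\t", 2);   // assumed tab-separated key/value pairs
                    if (parts.length == 2) {
                        lookup.put(parts[0], parts[1]);
                    }
                }
            }
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String id = value.toString().split(",")[0];     // assumed CSV with the join key first
            context.write(new Text(id), new Text(lookup.getOrDefault(id, "UNKNOWN")));
        }
    }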

 

Counters

  • What is a Counter in a Map Reduce job
  • Why do we need Counters in a production environment?
  • How to write Counters in Map Reduce programs (a sketch follows this list)
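
A minimal counter sketch: counters are usually declared as an enum and incremented from the tasks; the framework aggregates them across all mappers and reducers and prints them with the job summary. The record format (comma-separated, at least three fields) is an assumption for illustration.

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class ParseMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        public enum RecordQuality { GOOD, MALFORMED }       // counter group for data quality

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",");
            if (fields.length < 3) {                        // assumed minimum field count
                context.getCounter(RecordQuality.MALFORMED).increment(1);
                return;                                     // skip bad records instead of failing the job
            }
            context.getCounter(RecordQuality.GOOD).increment(1);
            context.write(new Text(fields[0]), new IntWritable(1));
        }
    }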

 

Importance of the Writable and WritableComparable APIs

  • How to write custom Map Reduce values using Writable
  • How to write custom Map Reduce keys using WritableComparable (a sketch follows this list)
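
A sketch of a custom composite key; the (year, temperature) fields are illustrative. write() and readFields() must serialize the fields in the same order, and compareTo() defines how the framework sorts keys during the shuffle.

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.io.WritableComparable;

    public class YearTempKey implements WritableComparable<YearTempKey> {
        private int year;
        private int temperature;

        public YearTempKey() { }                            // required no-arg constructor

        public YearTempKey(int year, int temperature) {
            this.year = year;
            this.temperature = temperature;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeInt(year);
            out.writeInt(temperature);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            year = in.readInt();                            // must mirror write() exactly
            temperature = in.readInt();
        }

        @Override
        public int compareTo(YearTempKey other) {
            int cmp = Integer.compare(year, other.year);    // sort by year first,
            return cmp != 0 ? cmp : Integer.compare(temperature, other.temperature); // then temperature
        }

        @Override
        public int hashCode() {
            return 31 * year + temperature;                 // used by the default HashPartitioner
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof YearTempKey)) return false;
            YearTempKey k = (YearTempKey) o;
            return year == k.year && temperature == k.temperature;
        }
    }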

 

Joins

Map Side Join

  • What is the importance of a Map Side Join
  • Where it is used

 

Reduce Side Join

  • What is the importance of a Reduce Side Join
  • Where it is used
  • What is the difference between a Map Side Join and a Reduce Side Join?

Compression techniques

  • Importance of Compression techniques in production environment
  • Compression Types
  • NONE, RECORD and BLOCK
  • Compression Codecs
  • Default, Gzip, Bzip, Snappy and LZO
  • Enabling and disabling these techniques for all jobs
  • Enabling and disabling these techniques for a particular job (a driver sketch follows this list)
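
As a sketch of per-job compression settings, here is a driver fragment that enables Snappy for both the intermediate (map) output and the final job output; the property names are the standard Hadoop 2 keys, and the job name is arbitrary.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.SnappyCodec;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

    public class CompressionConfigSketch {
        public static Job configure() throws Exception {
            Configuration conf = new Configuration();
            // Compress the map output that crosses the network during the shuffle.
            conf.setBoolean("mapreduce.map.output.compress", true);
            conf.setClass("mapreduce.map.output.compress.codec",
                          SnappyCodec.class, CompressionCodec.class);

            Job job = Job.getInstance(conf, "compressed-output");
            // Compress the final output; BLOCK compresses groups of records together,
            // which usually gives the best ratio for sequence files.
            FileOutputFormat.setCompressOutput(job, true);
            FileOutputFormat.setOutputCompressorClass(job, SnappyCodec.class);
            SequenceFileOutputFormat.setOutputCompressionType(job, SequenceFile.CompressionType.BLOCK);
            return job;
        }
    }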

 

Map Reduce Programming Model

  • How to write the Map Reduce jobs in Java
  • Running the Map Reduce jobs in local mode
  • Running the Map Reduce jobs in pseudo mode
  • Running the Map Reduce jobs in cluster mode

Debugging Map Reduce Jobs

  • How to debug Map Reduce jobs locally
  • How to debug Map Reduce jobs remotely

YARN

  • What is YARN?
  • What is the importance of YARN?
  • Where the concept of YARN is used in real time
  • What is the difference between YARN and Map Reduce?

Data Locality

  • What is Data Locality?
  • Does Hadoop follow Data Locality?

Speculative Execution

  • What is Speculative Execution?
  • Does Hadoop use Speculative Execution?

Map Reduce Commands

  • Importance of each command
  • How to execute the command
  • Explanation of MapReduce admin-related commands

Configurations

  • Can we change the existing configurations of MapReduce or not?
  • Importance of configurations
  • Writing unit tests for Map Reduce jobs
  • Secondary Sorting: its uses and how to implement it in MapReduce
  • How to identify performance bottlenecks in MR jobs and tune them
  • Map Reduce Streaming and Pipes, with examples
  • Exploring the Apache MapReduce Web UI

Apache Pig

  • Introduction to Apache Pig
  • Map Reduce vs Apache Pig
  • SQL vs Apache Pig
  • Different data types in Pig

Modes of Execution in Pig

  • Local Mode
  • Map Reduce Mode

Execution Mechanism

  • Grunt Shell
  • Script
  • Embedded

UDFs

  • How to write UDFs in Pig
  • How to use UDFs in Pig
  • Importance of UDFs in Pig (a minimal Java UDF follows this list)
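
A minimal Pig eval UDF in Java, upper-casing its first argument; the class name is arbitrary. Pig convention is null in, null out:

    import java.io.IOException;
    import org.apache.pig.EvalFunc;
    import org.apache.pig.data.Tuple;

    public class ToUpper extends EvalFunc<String> {
        @Override
        public String exec(Tuple input) throws IOException {
            if (input == null || input.size() == 0 || input.get(0) == null) {
                return null;                    // null in, null out
            }
            return input.get(0).toString().toUpperCase();
        }
    }

Once packaged into a jar (say myudfs.jar, a hypothetical name), it would be registered in a Pig script with REGISTER myudfs.jar; and then called like a built-in, e.g. B = FOREACH A GENERATE ToUpper(name);.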

Filters

  • How to write the Filters in Pig
  • How to use the Filters in Pig
  • Importance of Filters in Pig

Load Functions

  • How to write the Load Functions in Pig
  • How to use the Load Functions in Pig
  • Importance of Load Functions in Pig

Store Functions

  • How to use the Store Functions in Pig
  • Importance of Store Functions in Pig
  • Transformations in Pig
  • How to write complex Pig scripts
  • How to integrate Pig and HBase

Hive

Introduction to Hive architecture

  • Driver
  • Compiler
  • Semantic Analyzer
  • Hive Integration with Hadoop
  • Hive Query Language (HiveQL)
  • SQL vs HiveQL
  • Hive installation and configuration
  • Hive, Map-Reduce, and Local Mode
  • Hive DDL and DML operations

Hive Services

  • CLI
  • HiveServer
  • HWI (Hive Web Interface)

Metastore

  • embedded metastore configuration
  • external metastore configuration

UDFs

  • How to write UDFs in Hive
  • How to use UDFs in Hive
  • Importance of UDFs in Hive (a minimal Java UDF follows this list)
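
A minimal Hive UDF in Java using the simple UDF API (newer Hive versions prefer GenericUDF, but this is the usual starting point); the class name and behavior are illustrative:

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Trims and lower-cases a string column.
    public final class Normalize extends UDF {
        public Text evaluate(Text input) {
            if (input == null) {
                return null;                    // pass NULLs through, as Hive expects
            }
            return new Text(input.toString().trim().toLowerCase());
        }
    }

After packaging it into a jar, it would be wired up in Hive with ADD JAR and CREATE TEMPORARY FUNCTION normalize AS 'Normalize'; (names are hypothetical).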

UDAFs

  • How to use the UDAFs in Hive
  • Importance of UDAFs in Hive

UDTFs

  • How to use the UDTFs in Hive
  • Importance of UDTFs in Hive
  • How to write complex Hive queries
  • What is Hive Data Model?

Partitions

  • Importance of Hive Partitions in production environment
  • Limitations of Hive Partitions
  • How to write Partitions

Buckets

  • Importance of Hive Buckets in production environment
  • How to write Buckets

SerDe

  • Importance of Hive SerDes in a production environment
  • How to write SerDe programs
  • How to integrate Hive and HBase

ZooKeeper

  • Introduction to ZooKeeper
  • Pseudo-mode installation
  • ZooKeeper cluster installation
  • Basic commands execution

HBase

  • HBase introduction
  • HBase use cases
  • HBase basics
  • Column families
  • Scans

HBase installation

  • Local mode
  • Pseudo mode
  • Cluster mode

HBase Architecture

  • Storage
  • Write Ahead Log
  • Log Structured Merge Trees

MapReduce integration

  • MapReduce over HBase

HBase Usage

  • Key design
  • Bloom Filters
  • Versioning
  • Coprocessors
  • Filters

HBase Clients

  • REST
  • Thrift
  • Hive
  • Web Based UI

HBase Admin

  • Schema definition
  • Basic CRUD operations (a Java client sketch follows this list)
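
As a sketch of basic CRUD with the Java HBase client API (HBase 1.x and later), assuming a hypothetical "users" table with an "info" column family:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Delete;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseCrudExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();   // reads hbase-site.xml
            try (Connection connection = ConnectionFactory.createConnection(conf);
                 Table table = connection.getTable(TableName.valueOf("users"))) {

                // Create/update: a Put writes one or more cells for a row key.
                Put put = new Put(Bytes.toBytes("row1"));
                put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Asha"));
                table.put(put);

                // Read: a Get fetches cells for a single row key.
                Get get = new Get(Bytes.toBytes("row1"));
                Result result = table.get(get);
                byte[] name = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"));
                System.out.println("name = " + Bytes.toString(name));

                // Delete: removes the whole row (or specific cells, if qualified).
                table.delete(new Delete(Bytes.toBytes("row1")));
            }
        }
    }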

Sqoop

  • Introduction to Sqoop
  • MySQL client and server installation
  • Sqoop installation
  • How to connect to a relational database using Sqoop
  • Sqoop commands, with examples of the import and export commands

Flume

  • Introduction to Flume
  • Flume installation
  • Flume agent usage and Flume example execution

MongoDB

  • Introduction to MongoDB
  • MongoDB installation
  • MongoDB examples

FAQs

Tech Bodhi ensures you do not miss any content:
  • If you miss a single session or a couple of sessions, the trainer will manage a catch-up.
  • If a large number of sessions are missed, the Center Head will provide a solution.

Techbodhi is committed to providing you an awesome learning experience through world-class content and best-in-class instructors. Through this training we will create an ecosystem that will enable you to convert opportunities into job offers by presenting your skills well at the time of an interview. We can assist you in resume building and also share important interview questions once you are done with the training. Placement assistance is a complimentary service provided with job-oriented courses.

We can arrange a demo session on request.
All the trainers at Tech Bodhi are practitioners from the industry with 5 to 20 years of relevant IT experience. They are subject-matter experts trained by Tech Bodhi to provide a great learning experience.
You can give us a CALL at +91 9960295908 OR email at info@techbodhi.co.in.

Certification

  • Once you are successfully through the exit test and the project assignment (reviewed by a Techbodhi expert), you will be awarded Techbodhi’s diploma certificate.
  • Techbodhi’s certification has industry recognition, and we are the preferred training partner for many MNCs (names cannot be disclosed as per NDA).