Proof of concept: Analysis of census data using Hive and Zeppelin on Hadoop
I want to analyze census data for India and its states.
Data: Input format - .csv
Input file (.csv):
The input data set has attributes such as:
1. State
2. District
3. Persons
4. Male population
5. Female population
6. Growth rate
7. Rural population
Problem statement: Analyze the data on Hadoop using Hive and Zeppelin to find:
1. Total male and female population by state
2. Population density
3. Literacy percentage by state
4. States with a literacy percentage of less than 50%
5. Working people percentage by state
Proof of concept coding details:
Hive script: Census.hql
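The contents of Census.hql are not reproduced in this write-up. As a rough sketch only, a script of this kind might create a table and load the CSV along the following lines. The table name `census`, the column names and types, the header-skipping property and the placeholder input path are all assumptions made for illustration, and the output file is named `Census_sketch.hql` so that it cannot be mistaken for the original script. Only the seven attributes listed above are declared; the real file presumably has more columns (the problem statement needs literacy and working-population figures), and they would have to be declared in file order.

```shell
#!/bin/sh
# Sketch only: writes a hypothetical Hive script. Table name, column names,
# types and the CSV path are assumptions, not the original Census.hql.
cat > Census_sketch.hql <<'EOF'
-- Assumed schema, following the attribute list in this write-up.
CREATE TABLE IF NOT EXISTS census (
  state            STRING,
  district         STRING,
  persons          BIGINT,
  males            BIGINT,
  females          BIGINT,
  growth_rate      DOUBLE,
  rural_population BIGINT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
TBLPROPERTIES ("skip.header.line.count"="1");

-- Placeholder path: point this at the real input file.
LOAD DATA LOCAL INPATH '/path/to/census.csv' OVERWRITE INTO TABLE census;
EOF
echo "Wrote Census_sketch.hql"
```

A script like this would be run with `hive -f Census_sketch.hql`, which is the same way Census.sh invokes Census.hql.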

Shell script: Census.sh
#!/bin/sh
###############################################################################
############################## COMPLETE SCRIPT ################################
### HEADER      - PROGRAM NAME - <Census.sh>                                ###
### DATE        - Jan/2017                                                  ###
### VERSION     - 1.0                                                       ###
### DESCRIPTION - Data: Census data processing using Hive                   ###
###############################################################################
# Timestamp that gives each run its own log file.
DATE=$(date +"%Y%m%d_%H%M%S")
LOGFILE="/home/Sachin/POC/Census/${DATE}.log"
echo "Hive script here"
echo "Census data processing started" >> "$LOGFILE"
# Run the Hive script and branch directly on its exit status.
if hive -f Census.hql; then
    echo "Successfully finished processing" >> "$LOGFILE"
else
    echo "Hive processing failed. Please check the log." >> "$LOGFILE"
    exit 1
fi
################################ End of Script ################################
Script execution:
The shell script starts here.
The table is created and the data is loaded into the Hive table.

Run the queries in Zeppelin to create graphs for the problem statement.

1. Total males and females by state:
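The query behind this chart is not shown in the write-up. Since the data set has a District attribute, the rows are presumably one per district, so state totals would need a `SUM` with `GROUP BY`. A minimal sketch, assuming a Hive table named `census` with columns `state`, `males` and `females` (all hypothetical names):

```shell
#!/bin/sh
# Sketch only: assumes a Hive table `census` with columns state, males, females.
cat > gender_by_state.hql <<'EOF'
SELECT state,
       SUM(males)   AS total_males,
       SUM(females) AS total_females
FROM census
GROUP BY state
ORDER BY state;
EOF
# Show the query text. To run it outside Zeppelin: hive -f gender_by_state.hql
cat gender_by_state.hql
```

In Zeppelin, the SELECT statement would go into a paragraph bound to the Hive interpreter (the exact paragraph prefix depends on the Zeppelin version and setup), with `state` as the chart key and the two sums as values.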

2. Population Density:
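The density query is not reproduced here either. One way it could be written, assuming the table has an area column, is total persons divided by total area per state. The table name `census` and the column names `persons` and `area_sq_km` are assumptions; the attribute list above does not mention an area column, and the real data may instead carry a ready-made density value.

```shell
#!/bin/sh
# Sketch only: `census`, `persons` and `area_sq_km` are hypothetical names.
cat > population_density.hql <<'EOF'
SELECT state,
       ROUND(SUM(persons) / SUM(area_sq_km), 2) AS persons_per_sq_km
FROM census
GROUP BY state
ORDER BY persons_per_sq_km DESC;
EOF
# Show the query text. To run it outside Zeppelin: hive -f population_density.hql
cat population_density.hql
```

Dividing the two sums, rather than averaging per-district densities, weights each district by its area.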

The same population density result as a pie chart:

3. Literacy percentage by state:
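A literacy percentage per state could be sketched as literate persons over total persons. The table name `census` and the column `literates` are assumptions, since the attribute list above does not name a literacy column. Note also that this sketch divides by the whole population, whereas the official census literacy rate is computed over the population aged 7 and above, so the numbers would come out lower than the published rates.

```shell
#!/bin/sh
# Sketch only: `census`, `literates` and `persons` are hypothetical names.
cat > literacy_by_state.hql <<'EOF'
SELECT state,
       ROUND(100.0 * SUM(literates) / SUM(persons), 2) AS literacy_pct
FROM census
GROUP BY state
ORDER BY literacy_pct DESC;
EOF
# Show the query text. To run it outside Zeppelin: hive -f literacy_by_state.hql
cat literacy_by_state.hql
```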
4. Literacy percentage less than 50%:
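Filtering on an aggregated value calls for `HAVING` rather than `WHERE`, because the percentage only exists after grouping. A minimal sketch under the same assumptions as before (a hypothetical table `census` with columns `state`, `literates` and `persons`):

```shell
#!/bin/sh
# Sketch only: `census`, `literates` and `persons` are hypothetical names.
cat > literacy_below_50.hql <<'EOF'
SELECT state,
       ROUND(100.0 * SUM(literates) / SUM(persons), 2) AS literacy_pct
FROM census
GROUP BY state
-- HAVING filters groups after aggregation; the expression is repeated
-- instead of relying on the alias.
HAVING 100.0 * SUM(literates) / SUM(persons) < 50
ORDER BY literacy_pct;
EOF
# Show the query text. To run it outside Zeppelin: hive -f literacy_below_50.hql
cat literacy_below_50.hql
```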

The same result as a pie chart:
5. Working people percentage by state:
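The working-population share could be computed in the same way as a ratio of sums. The table name `census` and the column `workers` are assumptions; the write-up does not say which column holds the working population.

```shell
#!/bin/sh
# Sketch only: `census`, `workers` and `persons` are hypothetical names.
cat > workers_by_state.hql <<'EOF'
SELECT state,
       ROUND(100.0 * SUM(workers) / SUM(persons), 2) AS working_pct
FROM census
GROUP BY state
ORDER BY working_pct DESC;
EOF
# Show the query text. To run it outside Zeppelin: hive -f workers_by_state.hql
cat workers_by_state.hql
```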
