Sunday, 19 March 2017

Analysis of Census data

Proof of concept: analysis of census data using Hive and Zeppelin on Hadoop.

I want to analyze census data for India and its states.

Data: input format is .csv

Input file (.csv):



The input data set has attributes such as:

1. State
2. District
3. Persons
4. Male population
5. Female population
6. Growth rate
7. Rural population

Problem statement: analyze the data on Hadoop using Hive and Zeppelin to find:

1. Total male and female population by state
2. Population density
3. Literacy percentage by state
4. Literacy percentage less than 50%
5. Working people percentage by state

Proof of concept coding details:

Hive script : 

Census.hql
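
As a rough sketch only (not the original script), a Hive script for this data might create a table over the CSV and load the file into it. The table name `census`, the column names, the output file name `Census_sketch.hql`, and the HDFS path `/user/hypothetical/census/census.csv` are all assumptions for illustration. Only the attributes listed above are included, so the literacy, working population, and density queries would need further columns.

```shell
#!/bin/sh
# Write a hypothetical Hive script to Census_sketch.hql.
# Table name, column names and the HDFS path are assumptions, not the original.
cat > Census_sketch.hql <<'EOF'
-- Table over the comma-separated census file (assumed column order)
CREATE TABLE IF NOT EXISTS census (
  state        STRING,
  district     STRING,
  persons      BIGINT,
  males        BIGINT,
  females      BIGINT,
  growth_rate  DOUBLE,
  rural        BIGINT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
TBLPROPERTIES ("skip.header.line.count"="1");

-- Load the CSV from a hypothetical HDFS location
LOAD DATA INPATH '/user/hypothetical/census/census.csv'
OVERWRITE INTO TABLE census;
EOF

echo "Wrote Census_sketch.hql"
```

The `skip.header.line.count` property assumes the CSV has a single header row; drop it if the file has none.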



Shell Script

Census.sh

#!/bin/sh
###############################################################################
########################### COMPLETE SCRIPT ##############################
### HEADER - PROGRAM NAME - <Census.sh> ###
### DATE - Jan/2017 ###
### VERSION - 1.0 ###
### DESCRIPTION - Data: Census data processing using Hive ###
##############################################################################

# Timestamped log file, one per run
DATE=$(date +"%Y%m%d_%H%M%S")
LOGFILE="/home/Sachin/POC/Census/${DATE}.log"

echo "Hive script here"
echo "Census data processing started" >> "$LOGFILE"

# Run the Hive script; a zero exit status means it succeeded
if hive -f Census.hql; then
    echo "Successfully finished processing" >> "$LOGFILE"
else
    echo "Hive processing failed. Please check the log" >> "$LOGFILE"
    # Report the failure to the caller as well
    exit 1
fi


#################################End of Script##################################


Script Execution:

The shell script starts here:


The table is created and the data is loaded into the Hive table.



Run the queries in Zeppelin to create a graph for each item in the problem statement.



1. Total males and females by state:
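
One way to sanity-check this chart is to compute the same totals directly from the CSV on the local machine. This is a minimal sketch, assuming the column order listed in the attributes above (state first, males fourth, females fifth), a single header row, and no quoted commas in the fields. The sample rows and the helper name `totals_by_state` are made up for illustration and are not census figures.

```shell
#!/bin/sh
# Hypothetical sample rows (made-up numbers) in the assumed column order:
# State,District,Persons,Males,Females,Growth,Rural
cat > sample_census.csv <<'EOF'
State,District,Persons,Males,Females,Growth,Rural
A,D1,100,60,40,1.5,70
A,D2,50,20,30,2.0,10
B,D3,80,45,35,0.5,60
EOF

# Sum males and females per state, skipping the header row.
# Output format: state,total_males,total_females (sorted by state).
totals_by_state() {
    awk -F',' 'NR > 1 { m[$1] += $4; f[$1] += $5 }
               END { for (s in m) printf "%s,%d,%d\n", s, m[s], f[s] }' "$1" | sort
}

totals_by_state sample_census.csv
```

For the sample rows this prints `A,80,70` and `B,45,35`. Run against the real input file, the per-state totals should match what the Zeppelin query returns.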



2. Population density:

 
 

The same population density result as a pie chart:

 

3. Literacy percentage by state:



4. Literacy percentage less than 50%

 

The same result as a pie chart:



5. Working people percentage by state: