Data Manipulation Case Study -2 (Modified Correct code)

#Domain – HR

#focus – Insights from data

import pandas as pd

# Set the option on how to display float

# To See the impact of this command comment it out and check outputs

pd.set_option('display.float_format', lambda x: '%.3f' % x)

# Read Salaries.csv as a dataframe called salary

salary = pd.read_csv('Salaries.csv')

# Check the basic data

print(salary.head())

# Get more info on data & Verify that 148,648 records are there

print (salary.info())

# Get Some statistical summary of data

print (salary.describe())

# Check the total salary cost per year and see how it has increased over years

sum_year = salary.groupby('Year').sum()['TotalPayBenefits']

print ( sum_year)

# Question 1 : Compute how much total salary cost has increased from 2011 to 2014.

Code:

mean_year = salary.groupby('Year').mean()['TotalPayBenefits']

print ( mean_year)

# Question 2 : Which Job Title in Year 2014 has highest mean salary

grouped_df_JobTitle = salary.query("Year == 2014").groupby(["JobTitle"]).agg("mean").filter(["JobTitle", "TotalPay"])

max_min_salary = grouped_df_JobTitle[grouped_df_JobTitle["TotalPay"] == grouped_df_JobTitle["TotalPay"].max()]

print("highest mean Salary job title is " + str(max_min_salary))

# Question 3 : How much money could have been saved in Year 2014 by stopping OverTimePay

over_time_pay_year = salary.groupby(["Year"]).sum()[["OvertimePay"]].query("Year == 2014")

print("Money could have been saved in Year 2014 by stopping OverTimePay :"+ str(float(over_time_pay_year["OvertimePay"])))

Question 4: Which are the top 5 common job in Year 2014 and how much do they cost SFO

top_job_title = salary[salary['Year'] == 2014]['JobTitle'].value_counts().head(5)

print (top_job_title)

# Uncomment this and check what it prints

#print (type(top_job_title))

# Calculate the Cost

sum_cost = 0

for index,value in top_job_title.iteritems():

print(index,value)

sum_cost += sum(salary[ (salary['Year']== 2014) & (salary['JobTitle'] == index)]['TotalPayBenefits'])

print (" Total Cost of Top 5 Jobs in Year 2014 ", sum_cost)

# Question 5 : Who was the top earning employee across all the years?

grouped_df_5_top_earners = df.groupby(["Year", "TotalPay"]).agg("max").filter(["Year", "TotalPay", "EmployeeName"])

print(grouped_df_5_top_earners)

Regards

Abhilash M

Related Articles
Advanced Encryption Package
Dear Learner, Greetings from edureka! Kindly download the Advanced Encryption Package from the link below http://www.aeppro.com/download/aep_setup.msi Next, you need to create a file on your computer that you need to encrypt using the tool you ...
Cross Validation and different types of splitting techniques
In machine learning, we couldn’t fit the model on the training data and can’t say that the model will work accurately for the real data. For this, we must assure that our model got the correct patterns from the data, and it is not getting up too much ...
Hypothesis Testing(z test,t test with examples)
Hypothesis Testing : Null Hypothesis (H⁰) : It states that there is no difference of certain characteristics between the sample and population data due to random chances. Eg: Assuming the researcher’s predictions are true. Alternate Hypothesis (H1): ...
MEANStack application for Student Records
Kindly refer below MEANStack application which is working end to end. Technology Stack- Angular as Frontend , Nodejs & Expressjs as Backend and MongoDB as database. Please refer the source code of this application- ...
CNN model on Iris images
Convolutional Neural Networks, like neural networks, are made up of neurons with learnable weights and biases. Each neuron receives several inputs, takes a weighted sum over them, passes it through an activation function and responds with an output. ...

Data Manipulation

Data Manipulation Case Study -2 (Modified Correct code)

Related Articles

Advanced Encryption Package

Cross Validation and different types of splitting techniques

Hypothesis Testing(z test,t test with examples)

MEANStack application for Student Records

CNN model on Iris images