Employee Turnover: Data Analysis and Exploration


1 Employee Turnover: Data Analysis and Exploration
Group 7: Bashe Aden, Brian Keenum, Gordon Kelly BIT Spring 2018

2 Business Problem and Questions being Investigated
The most valuable employees are leaving the company.
Turnover is expensive in terms of recruiting and training costs.
A better understanding of this phenomenon could increase retention and reduce costs.
Questions being investigated:
Why are employees leaving, and in what timeframe?
Looking at attributes of employees, perhaps we can predict which employees will leave next.
Adjust the work environment so that these employees will stay with the firm.

3 Description of Data Sources and Preliminary Work
Dataset is called Human Resource Analytics
Available for download at Kaggle: analytics
9 independent variables; the dependent target variable is “employee left”
15,000 observations
CSV format
Data dictionary created (as seen here)
Assessed any formatting errors (none found)
Addressed null values (none found)

Variable Dictionary
Variable Name: Description (Datatype)
satisfaction_level: Employee satisfaction level (continuous)
last_evaluation: Employee evaluation score
number_project: Number of employee projects
average_montly_hours: Average monthly hours per employee
time_spend_company: Time employee spent at company in years
Work_accident: Whether employee has had a work accident, 1 = Yes, 0 = No (Nominal)
left: Whether the employee has left the company, 1 = Yes, 0 = No
promotion_last_5years: Whether employee has been promoted within last 5 years, 1 = Yes, 0 = No
sales: Sales type
salary: Employee salary level (Ordinal)
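The formatting and null-value checks listed on this slide could be scripted roughly as follows. This is a minimal sketch using only the Python standard library; the three sample rows and the helper name `count_missing` are hypothetical stand-ins, since the Kaggle file itself is not reproduced here.

```python
import csv
import io

# Hypothetical sample rows in the layout described by the data dictionary.
# The values are invented for illustration and are not taken from the real file.
SAMPLE_CSV = """satisfaction_level,last_evaluation,number_project,average_montly_hours,time_spend_company,Work_accident,left,promotion_last_5years,sales,salary
0.38,0.53,2,157,3,0,1,0,sales,low
0.80,0.86,5,262,6,0,1,0,sales,medium
0.72,0.87,5,223,5,0,0,0,technical,high
"""


def count_missing(rows):
    """Return {column: number of empty or absent cells} for a list of dict rows."""
    missing = {}
    for row in rows:
        for column, value in row.items():
            missing.setdefault(column, 0)
            if value is None or value.strip() == "":
                missing[column] += 1
    return missing


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
missing = count_missing(rows)
print(len(rows), "rows;", sum(missing.values()), "missing values")
```

With the real file, `io.StringIO(SAMPLE_CSV)` would be replaced by an open file handle; the column names above follow the data dictionary, including its spelling of `average_montly_hours`.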

4 Planned Analysis
Logistic Regression
  Show potential two-way relationships between the response variable and the explanatory variables
  Appropriate since the dependent variable is binary
  Independent variables are a mix of nominal, continuous and ordinal data types
Principal Components Analysis
  Creates a more efficient model; should increase accuracy and reduce errors by removing less correlated variables
  Using the 80% rule: keep the components that account for 80% of the variation in the data set
Classification Tree
  To uncover potential relationships that may have been missed by the logistic regression
  Allows for future prediction of whether or not an employee will leave
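The 80% rule mentioned for the principal components analysis can be expressed as a cumulative sum over eigenvalues. The sketch below assumes the eigenvalues are already available from some PCA routine; the function name `components_needed` and the example eigenvalues are hypothetical and are not the values from this analysis.

```python
def components_needed(eigenvalues, threshold=0.80):
    """Smallest number of leading components whose cumulative share of the
    total variance reaches the threshold."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for count, value in enumerate(ordered, start=1):
        running += value
        if running / total >= threshold:
            return count
    return len(ordered)


# Hypothetical eigenvalues; cumulative shares are 0.4, 0.7, 0.9, 1.0.
example = [4.0, 3.0, 2.0, 1.0]
print(components_needed(example))
```

The threshold is a parameter so that a stricter cutoff (for example 0.85) can be tried without changing the function.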

5 Logistic Regression
We began this exploratory analysis by examining each independent variable, one at a time, against the variable of interest.
First, satisfaction level shows a strong negative relationship with leaving: as satisfaction level increases, the instances of employees leaving decrease.
The contingency analysis of left by salary shows that the higher the salary, the lower the rate of leaving: only 6.6% of employees with a high salary leave, compared with 29.7% of those with low salaries.
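A contingency analysis of left by salary amounts to computing, for each salary level, the share of employees who left. The following sketch shows that calculation on a handful of hypothetical records; the function name and the toy data are assumptions made for illustration and do not reproduce the 6.6% and 29.7% figures.

```python
from collections import Counter


def leave_rate_by_group(records, group_key="salary", target_key="left"):
    """Row percentages of a two-way table: share of leavers within each group."""
    totals = Counter()
    leavers = Counter()
    for record in records:
        totals[record[group_key]] += 1
        leavers[record[group_key]] += record[target_key]
    return {group: leavers[group] / totals[group] for group in totals}


# Hypothetical toy records (left: 1 = Yes, 0 = No).
records = [
    {"salary": "low", "left": 1},
    {"salary": "low", "left": 0},
    {"salary": "low", "left": 0},
    {"salary": "low", "left": 1},
    {"salary": "high", "left": 0},
    {"salary": "high", "left": 0},
    {"salary": "high", "left": 0},
    {"salary": "high", "left": 1},
]

rates = leave_rate_by_group(records)
for group, rate in rates.items():
    print(f"{group}: {rate:.1%} left")
```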

6 Logistic Regression- Continued
Finally, looking at the two-way relationship between “left” and time_spend_company, there is a clear positive trend: the more time an employee spends at the company, the more likely they are to leave. This raises a question for management: could this reflect employee burnout?
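A positive trend of this kind corresponds to a positive slope in a one-predictor logistic regression. The sketch below fits such a model by plain gradient ascent on hypothetical data; it is an illustration of the technique under stated assumptions, not the procedure or software used for the slides, and the names `fit_logistic` and `predict` are made up for this example.

```python
import math


def fit_logistic(xs, ys, learning_rate=0.1, steps=5000):
    """Fit P(y = 1) = sigmoid(b0 + b1 * (x - mean_x)) by gradient ascent on
    the mean log-likelihood. Returns (b0, b1, mean_x)."""
    n = len(xs)
    mean_x = sum(xs) / n
    centered = [x - mean_x for x in xs]  # centering helps the fit converge
    b0 = b1 = 0.0
    for _ in range(steps):
        grad0 = grad1 = 0.0
        for x, y in zip(centered, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            grad0 += y - p
            grad1 += (y - p) * x
        b0 += learning_rate * grad0 / n
        b1 += learning_rate * grad1 / n
    return b0, b1, mean_x


def predict(x, b0, b1, mean_x):
    """Predicted probability of leaving for a given number of years."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * (x - mean_x))))


# Hypothetical toy data: years at the company and whether the employee left.
years = [2, 2, 3, 3, 4, 4, 5, 5, 6, 6]
left = [0, 0, 0, 1, 0, 1, 1, 0, 1, 1]

b0, b1, mean_x = fit_logistic(years, left)
print("slope is positive:", b1 > 0)
```

A positive `b1` means the predicted probability of leaving rises with tenure in the toy data, which is the pattern described on the slide.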

7 Principal Component Analysis
Produced correlations and a summary plot.
Satisfaction and salary are high positive contributors to staying in the organization.
Eigenvalues reveal that satisfaction, last evaluation and number of projects have a higher proportion of effect.
The top 6 components were retained, as together they represent 85.25% of the total variation.

8 Principal Component Analysis- Continued
Regression on Prin1 and Prin2 gives an R² of .1234 and an RMSE of .3988.
Prin1 alone gives an R² of .0103 and an RMSE of .4237.
The principal components model gives an R² of .1813 and an RMSE of .3854, an improvement in R² and a reduction in error over the Prin1, Prin2 model.

Model                   R²      RMSE
Prin1, Prin2            .1234   .3988
Prin1                   .0103   .4237
Principal Components    .1813   .3854
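For reference, the two fit statistics compared here can be computed as in the sketch below. It uses the textbook definitions (R² as one minus the ratio of residual to total sum of squares, RMSE as the root of the mean squared residual); statistical packages sometimes divide by residual degrees of freedom instead, so this should be read as an illustration rather than a reproduction of the reported numbers. The sample values are hypothetical.

```python
import math


def r_squared(actual, predicted):
    """1 - SS_residual / SS_total."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def rmse(actual, predicted):
    """Root of the mean squared residual (divides by n, not by degrees of freedom)."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical 0/1 outcomes and fitted values.
actual = [0, 1, 0, 1]
predicted = [0.25, 0.75, 0.25, 0.75]
print(r_squared(actual, predicted), rmse(actual, predicted))
```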

9 Comparison of Models
Logistic regression and PCA show some differences and some similarities.
Both show that satisfaction level, salary and time spent at the company are significant contributors to whether or not an employee leaves.
PCA also adds number of hours and number of projects to the factors.
Analysis by the second model shows 77% accuracy.
The first model shows, through the contingency table, that nearly 30% of employees with a low salary leave, compared with only 6.6% of those with a high salary.

10 Classification Tree
Purpose: to uncover relationships that may have been missed by the other two models.
The target variable is “left” and the independent variables are:
  Satisfaction level
  Last evaluation
  Number of projects
  Average monthly hours
  Time spent with the company
  Work accidents
  Promotions within last 5 years
  Sales
  Salary
The tree was split 10 times.
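Each split in a classification tree is chosen by scoring candidate cut points on one variable. The slides do not state which splitting criterion the software used, so the sketch below uses entropy-based information gain purely as an illustrative assumption; the function names and the toy values are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction from splitting at value < threshold vs. value >= threshold."""
    below = [y for x, y in zip(values, labels) if x < threshold]
    above = [y for x, y in zip(values, labels) if x >= threshold]
    if not below or not above:
        return 0.0  # the split separates nothing
    n = len(labels)
    weighted = len(below) / n * entropy(below) + len(above) / n * entropy(above)
    return entropy(labels) - weighted


# Hypothetical satisfaction levels and "left" labels.
satisfaction = [0.1, 0.2, 0.8, 0.9]
left = [1, 1, 0, 0]
print(information_gain(satisfaction, left, threshold=0.5))
```

A tree builder would evaluate many thresholds across all nine predictors and pick the highest-scoring one at each step, repeating until a stopping rule (here, 10 splits) is reached.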

11 Classification Tree Continued
The entropy R² was .79 for the training set and .78 for the validation set.
The RMSE was .160 for the training set and .164 for the validation set.
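Entropy R² is commonly defined as one minus the ratio of the model's log-likelihood to the log-likelihood of a constant-probability model. The sketch below assumes that definition, which may differ in detail from what the authors' software reports; the labels and probabilities are hypothetical.

```python
import math


def log_likelihood(labels, probabilities):
    """Sum of log predicted probabilities of the observed classes.
    Probabilities must lie strictly between 0 and 1."""
    return sum(
        math.log(p if y == 1 else 1.0 - p)
        for y, p in zip(labels, probabilities)
    )


def entropy_r_squared(labels, probabilities):
    """1 - loglik(model) / loglik(constant model at the base rate).
    Requires both classes to be present in labels."""
    base_rate = sum(labels) / len(labels)
    null_ll = log_likelihood(labels, [base_rate] * len(labels))
    model_ll = log_likelihood(labels, probabilities)
    return 1.0 - model_ll / null_ll


# Hypothetical labels (1 = left) and predicted probabilities of leaving.
labels = [1, 0, 1, 0]
confident = [0.9, 0.1, 0.9, 0.1]
uninformative = [0.5, 0.5, 0.5, 0.5]
print(entropy_r_squared(labels, confident), entropy_r_squared(labels, uninformative))
```

A model that only predicts the base rate scores 0, and the value approaches 1 as predicted probabilities concentrate on the observed classes.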

12 Classification Tree ROC Curve
Both the training and validation sets show accurate response values.
Results are nearly identical for the two data sets.
The curves for the two response values cross, showing that whether performance is better for yes or for no depends on the cutoff point.
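The dependence on the cutoff point can be made concrete by computing one ROC point per cutoff. This is a minimal sketch on hypothetical scores; `roc_point` is an illustrative helper and not part of the original analysis.

```python
def roc_point(labels, scores, cutoff):
    """Return (false positive rate, true positive rate) when a score at or
    above the cutoff is classified as 'left'."""
    true_pos = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= cutoff)
    false_pos = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= cutoff)
    positives = sum(labels)
    negatives = len(labels) - positives
    return false_pos / negatives, true_pos / positives


# Hypothetical labels (1 = left) and predicted probabilities of leaving.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.6]

for cutoff in (0.25, 0.5, 0.75):
    fpr, tpr = roc_point(labels, scores, cutoff)
    print(f"cutoff {cutoff}: FPR={fpr:.2f}, TPR={tpr:.2f}")
```

Lowering the cutoff catches more leavers (higher true positive rate) at the cost of more false alarms, which is the trade-off an ROC curve traces out.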

13 Summary
Lower satisfaction levels at work lead to higher attrition.
The more satisfied employees who are also more experienced (have worked longer at the company) are leaving if they have high average monthly hours (spend a lot of time at work).
Employees with strong evaluations (good employees) are working long hours and leaving.
Are employees who are satisfied but have fewer projects not feeling challenged?
After comparing the results of all the models, this report concludes that employee satisfaction level, salary, hours worked, number of projects assigned, and time spent with the company are the main contributors to an employee’s decision to leave the company.

