MSDS6306 Doing Data Science


Case Study 02

Due: Sunday, August 7th 11:59pm CST

Description: DDSAnalytics is an analytics company that specializes in talent management solutions for Fortune 100 companies. Talent management is defined as the iterative process of developing and retaining employees. It may include workforce planning, employee training programs, identifying high-potential employees, and reducing or preventing voluntary employee turnover (attrition). To gain a competitive edge, DDSAnalytics plans to leverage data science for talent management. The executive leadership has identified predicting employee turnover as its first data science application for talent management. Before the business green-lights the project, it has tasked your data science team with analyzing existing employee data.

You have been given a dataset (CaseStudy2-data.csv, in the smuddsproject2 bucket on AWS S3) with which to identify factors that lead to attrition. You should identify the top three factors that contribute to turnover, backed by evidence from your analysis. You may or may not need to create derived attributes/variables/features. The business is also interested in any job-role-specific trends that may exist in the data set (e.g., "Data Scientists have the highest job satisfaction"). You may also report any other interesting trends and observations from your analysis. The analysis should be supported by robust experimentation and appropriate visualizations. All experiments and analysis must be conducted in R. You will also be asked to build a model to predict attrition. Finally, you will develop an RShiny app to visualize some of the relationships (or lack thereof) in the data. Details are below.

I have provided an additional data set of 300 observations (also on AWS S3) that lack the labels (attrition or no attrition). We will refer to this data set as the "Competition Set"; it is in the file "CaseStudy2CompSet No Attrition.csv". I have the true labels and will use them to assess the accuracy of your best classification model. 10% of your grade will depend on the sensitivity and specificity of your "best" classification model for identifying attrition. Your model must attain at least 60% sensitivity and at least 60% specificity (60 + 60 = 120 total) on both the training and the validation sets. You must submit the predicted labels (ordered by ID) in a csv file. Include this file in your GitHub repository and name it "Case2PredictionsXXXX Attrition.csv", where XXXX is your last name. (Example: "Case2PredictionsSadler Attrition.csv" would be mine.) An example submission file, Case2PredictionsClassifyEXAMPLE.csv, can be found in the smuddsproject2 bucket on AWS S3.
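For reference, the two classification thresholds can be written out using the standard confusion-matrix definitions. This sketch assumes that attrition ("Yes") is treated as the positive class, which matches "identifying attrition" above but is not stated explicitly; TP, FN, TN, and FP are the usual true-positive, false-negative, true-negative, and false-positive counts.

```latex
\text{Sensitivity} = \frac{TP}{TP + FN} \ge 0.60,
\qquad
\text{Specificity} = \frac{TN}{TN + FP} \ge 0.60
```

Both conditions must hold separately, so a model that trades one rate for the other (for example, predicting "No attrition" for nearly everyone) can fail even with high overall accuracy.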

I have also provided an additional data set of 300 observations that lack the monthly incomes. This data is in the file "CaseStudy2CompSet No Salary.csv". I have the true monthly incomes (salaries) and will use them to assess the RMSE of your regression model. 10% of your grade will depend on the RMSE (root mean square error) of your final model. Your model must attain an RMSE below $3000 on both the training and the validation sets. You must submit the predicted salaries (ordered by ID) in a csv file. Include this file in your GitHub repository and name it "Case2PredictionsXXXX Salary.csv", where XXXX is your last name. (Example: "Case2PredictionsSadler Salary.csv" would be mine.) An example submission file, Case2PredictionsRegressEXAMPLE.csv, can be found in the smuddsproject2 bucket on AWS S3.
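The salary threshold uses the standard definition of root mean square error. In this sketch, n is the number of observations being scored, y_i is the true monthly income of observation i, and \hat{y}_i is the model's predicted monthly income; these symbols are introduced here for illustration.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} < 3000
```

Because the errors are squared before averaging, a few very large salary misses can push the RMSE over the limit even when most predictions are close.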

