
Decision tree classifier
The DecisionTreeClassifier from scikit-learn, available in the tree submodule, is used for modeling:
# Decision Tree Classifier
>>> from sklearn.tree import DecisionTreeClassifier
The parameters selected for the decision tree classifier are shown in the following code: the splitting criterion is Gini, the maximum depth is 5, the minimum number of observations required to qualify a split is 2, and the minimum number of samples that must be present in a terminal node is 1:
>>> import pandas as pd
>>> from sklearn.metrics import accuracy_score, classification_report

# Fit the decision tree on the training data
>>> dt_fit = DecisionTreeClassifier(criterion="gini", max_depth=5,
        min_samples_split=2, min_samples_leaf=1, random_state=42)
>>> dt_fit.fit(x_train, y_train)

# Training performance
>>> print ("\nDecision Tree - Train Confusion Matrix\n\n", pd.crosstab(y_train,
        dt_fit.predict(x_train), rownames=["Actual"], colnames=["Predicted"]))
>>> print ("\nDecision Tree - Train accuracy\n\n", round(accuracy_score(y_train,
        dt_fit.predict(x_train)), 3))
>>> print ("\nDecision Tree - Train Classification Report\n",
        classification_report(y_train, dt_fit.predict(x_train)))

# Test performance
>>> print ("\n\nDecision Tree - Test Confusion Matrix\n\n", pd.crosstab(y_test,
        dt_fit.predict(x_test), rownames=["Actual"], colnames=["Predicted"]))
>>> print ("\nDecision Tree - Test accuracy", round(accuracy_score(y_test,
        dt_fit.predict(x_test)), 3))
>>> print ("\nDecision Tree - Test Classification Report\n",
        classification_report(y_test, dt_fit.predict(x_test)))

By carefully observing the results, we can infer that even though the test accuracy is high (84.6%), the precision and recall of the positive category (Attrition = Yes) are low (precision = 0.39 and recall = 0.20). This could be a serious issue when management tries to use this model to proactively provide extra benefits to employees with a high chance of attrition before they actually leave, as the model is unable to identify most of the employees who will really be leaving. Hence, we need to look for other modifications; one way is to control the model by using class weights. By utilizing class weights, we can increase the importance of a particular class at the cost of an increase in other errors.
For example, by increasing the class weight of category 1, we can identify more employees with the characteristics of actual attrition, but by doing so we will also mark some employees who would not actually churn as potential attriters (which should be acceptable).
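As a quick sketch of this idea (assuming the same x_train, y_train, x_test, and y_test as before, and illustrative rather than tuned weights), scikit-learn's DecisionTreeClassifier accepts a class_weight parameter that rescales each class's contribution when evaluating splits:

# Hypothetical weights: up-weight the attrition class (1) over class 0;
# the 0.3/0.7 split is an illustrative assumption, not a tuned value
>>> dt_wt_fit = DecisionTreeClassifier(criterion="gini", max_depth=5,
        min_samples_split=2, min_samples_leaf=1, random_state=42,
        class_weight={0: 0.3, 1: 0.7})
>>> dt_wt_fit.fit(x_train, y_train)
>>> print ("\nWeighted Decision Tree - Test Classification Report\n",
        classification_report(y_test, dt_wt_fit.predict(x_test)))

Passing class_weight='balanced' instead would weight classes inversely proportional to their frequencies in y_train; either way, recall on the attrition class should improve at the cost of more false positives on the majority class.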
Another classic example of the importance of class weights is in banking: when giving loans, it is better to reject some good applications than to accept bad ones. Hence, in this case too, it is a better idea to assign a higher weight to defaulters than to non-defaulters:
R Code for Decision Tree Classifier Applied to HR Attrition Data:
# Decision Trees using C5.0 package
library(C50)
dtree_fit = C5.0(train_data[-31], train_data$Attrition_ind, costs = NULL,
                 control = C5.0Control(minCases = 1))
summary(dtree_fit)

# Predictions on train and test data
tr_y_pred = predict(dtree_fit, train_data, type = "class")
ts_y_pred = predict(dtree_fit, test_data, type = "class")
tr_y_act = train_data$Attrition_ind; ts_y_act = test_data$Attrition_ind

# Train metrics; accrcy, prec_*/recl_* and the frac_* class proportions
# are the helper functions and values defined earlier in the chapter
tr_tble = table(tr_y_act, tr_y_pred)
print(paste("Train Confusion Matrix"))
print(tr_tble)
tr_acc = accrcy(tr_y_act, tr_y_pred)
trprec_zero = prec_zero(tr_y_act, tr_y_pred); trrecl_zero = recl_zero(tr_y_act, tr_y_pred)
trprec_one = prec_one(tr_y_act, tr_y_pred); trrecl_one = recl_one(tr_y_act, tr_y_pred)
trprec_ovll = trprec_zero * frac_trzero + trprec_one * frac_trone
trrecl_ovll = trrecl_zero * frac_trzero + trrecl_one * frac_trone

print(paste("Decision Tree Train accuracy:", tr_acc))
print(paste("Decision Tree - Train Classification Report"))
print(paste("Zero_Precision", trprec_zero, "Zero_Recall", trrecl_zero))
print(paste("One_Precision", trprec_one, "One_Recall", trrecl_one))
print(paste("Overall_Precision", round(trprec_ovll, 4), "Overall_Recall",
            round(trrecl_ovll, 4)))

# Test metrics
ts_tble = table(ts_y_act, ts_y_pred)
print(paste("Test Confusion Matrix"))
print(ts_tble)
ts_acc = accrcy(ts_y_act, ts_y_pred)
tsprec_zero = prec_zero(ts_y_act, ts_y_pred); tsrecl_zero = recl_zero(ts_y_act, ts_y_pred)
tsprec_one = prec_one(ts_y_act, ts_y_pred); tsrecl_one = recl_one(ts_y_act, ts_y_pred)
tsprec_ovll = tsprec_zero * frac_tszero + tsprec_one * frac_tsone
tsrecl_ovll = tsrecl_zero * frac_tszero + tsrecl_one * frac_tsone

print(paste("Decision Tree Test accuracy:", ts_acc))
print(paste("Decision Tree - Test Classification Report"))
print(paste("Zero_Precision", tsprec_zero, "Zero_Recall", tsrecl_zero))
print(paste("One_Precision", tsprec_one, "One_Recall", tsrecl_one))
print(paste("Overall_Precision", round(tsprec_ovll, 4), "Overall_Recall",
            round(tsrecl_ovll, 4)))