Abstract
Personal credit data usually contain a large number of features, some of which do not significantly contribute to the performance of default prediction models. Screening features through appropriate methods is essential to improve the efficiency of prediction models. However, little attention has been paid to feature selection methods in the area of personal loan default prediction. In this study, we employ random forest (RF), XGBoost, Adaptive Boosting (AdaBoost), Categorical Boosting (CatBoost), and Light Gradient Boosting Machine (LightGBM) as base algorithms of wrapper and embedded methods to select features, and we use the same algorithms as classifiers to predict personal loan default. We find that when classical filter methods are used to select features, the number of selected features needs to be large enough for tree-based classifiers to reach their best performance. However, when a tree-based algorithm is used to select features, only a small number of features is needed to deliver satisfactory classification performance. AdaBoost, Chi2, and F-score are found to be ideal feature selection methods in the area of personal credit default prediction. Moreover, we find that it is better to use different algorithms for feature selection and for classification; AdaBoost and CatBoost perform the best among all classifiers.
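As a rough illustration of what a classical filter method does, the sketch below scores each categorical feature with a Pearson chi-square statistic against the default label and keeps the top k. It is a minimal sketch under stated assumptions, not the procedure used in the study: the helper names `chi2_score` and `select_top_k`, the feature names, and the toy data are all invented for illustration, and the statistic is computed on a plain contingency table.

```python
from collections import Counter


def chi2_score(feature, labels):
    """Pearson chi-square statistic between one categorical feature and the label."""
    n = len(labels)
    observed = Counter(zip(feature, labels))   # joint counts
    feature_totals = Counter(feature)          # marginal counts per feature value
    label_totals = Counter(labels)             # marginal counts per class
    score = 0.0
    for f_val, f_count in feature_totals.items():
        for y_val, y_count in label_totals.items():
            expected = f_count * y_count / n   # count expected under independence
            score += (observed[(f_val, y_val)] - expected) ** 2 / expected
    return score


def select_top_k(columns, labels, k):
    """Rank features by chi-square score and keep the k highest-scoring names."""
    scores = {name: chi2_score(values, labels) for name, values in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Hypothetical toy data: 1 = default, 0 = repaid.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "has_prior_default": [1, 1, 1, 0, 0, 0, 0, 0],
    "owns_home": [1, 0, 1, 0, 1, 0, 1, 0],
    "region": ["a", "b", "a", "b", "a", "a", "b", "b"],
}

selected, scores = select_top_k(columns, labels, k=1)
print(selected, scores)
```

A filter of this kind scores every feature independently of the downstream classifier, which is one plausible reason a filter may need to retain more features than a tree-based selector before a classifier performs well.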
| Field | Value |
|---|---|
| Original language | English |
| Pages (from-to) | 1248-1313 |
| Number of pages | 66 |
| Journal | Journal of Forecasting |
| Volume | 41 |
| Issue number | 6 |
| State | Published - Sep 2022 |
| Externally published | Yes |
Keywords
- credit risk
- feature selection
- machine learning
- personal loan default prediction
Title: A comparative study of combining tree-based feature selection methods and classifiers in personal loan default prediction