
A comparative study of combining tree-based feature selection methods and classifiers in personal loan default prediction

  • Weidong Guo
  • Zach Zhizhong Zhou*
  • *Corresponding author for this work
  • Shanghai Jiao Tong University
  • Tongji University

Research output: Contribution to journal › Article › peer-review

Abstract

Personal credit data usually contain a large number of features, some of which contribute little to the performance of default prediction models. Screening features with appropriate methods is therefore essential for improving the efficiency of prediction models. However, little attention has been paid to feature selection in personal loan default prediction. In this study, we employ random forest (RF), XGBoost, Adaptive Boosting (AdaBoost), Categorical Boosting (CatBoost), and Light Gradient Boosting Machine (LightGBM) as the base algorithms of wrapper and embedded feature selection methods, and we also use these algorithms as classifiers to predict personal loan default. We find that when classical filter methods are used to select features, the number of selected features must be sufficiently large for tree-based classifiers to reach their best performance. By contrast, when a tree-based algorithm is used to select features, only a small number of features is needed to deliver satisfactory classification performance. AdaBoost, Chi2, and F-score are found to be ideal feature selection methods for personal credit default prediction. Moreover, we find that it is better to use different algorithms for feature selection and classification; AdaBoost and CatBoost perform best among all classifiers.
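The abstract does not describe the implementation, so the following is only a minimal sketch, assuming scikit-learn, of how filter selection (Chi2 and an F-score, here taken to be scikit-learn's ANOVA `f_classif`) and tree-based embedded selection (AdaBoost feature importances via `SelectFromModel`) could each be paired with a *different* tree-based classifier. The synthetic dataset, the choice of k = 10 features, the random-forest classifier, and ROC AUC as the metric are all illustrative assumptions, not the authors' setup or data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.feature_selection import SelectFromModel, SelectKBest, chi2, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Hypothetical stand-in for a personal credit dataset (NOT the paper's data):
# 60 features, only 8 informative, with an imbalanced default class.
X, y = make_classification(
    n_samples=2000, n_features=60, n_informative=8, n_redundant=4,
    weights=[0.85, 0.15], random_state=0,
)

K = 10  # hypothetical number of features to keep

selectors = {
    # Filter methods score each feature independently of any classifier.
    # chi2 requires non-negative inputs, so the features are min-max scaled first.
    "chi2": make_pipeline(MinMaxScaler(), SelectKBest(chi2, k=K)),
    "f_score": SelectKBest(f_classif, k=K),
    # Embedded method: keep the K features with the largest AdaBoost importances
    # (threshold=-inf makes max_features the only selection criterion).
    "adaboost": SelectFromModel(
        AdaBoostClassifier(n_estimators=100, random_state=0),
        threshold=-float("inf"), max_features=K,
    ),
}


def evaluate(selector, classifier):
    """Mean 5-fold cross-validated ROC AUC of `selector` followed by `classifier`.

    Selection is refit inside every fold, so no test-fold information leaks
    into the choice of features.
    """
    pipe = make_pipeline(selector, classifier)
    return cross_val_score(pipe, X, y, cv=5, scoring="roc_auc").mean()


results = {}
for name, selector in selectors.items():
    # Classify with a different algorithm (random forest) from the one used
    # for selection, mirroring the abstract's recommendation.
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    results[name] = evaluate(selector, clf)

for name, auc in results.items():
    print(f"{name:>9}: AUC = {auc:.3f}")
```

Wrapping each selector and the classifier in one pipeline is a deliberate choice: it keeps feature selection inside cross-validation, which matters when comparing how many features each method needs. Varying `K` in this sketch is one way to probe the abstract's observation that filter methods need more features than tree-based selectors.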

Original language: English
Pages (from-to): 1248–1313
Number of pages: 66
Journal: Journal of Forecasting
Volume: 41
Issue number: 6
State: Published, Sep 2022
Externally published: Yes

Keywords

  • credit risk
  • feature selection
  • machine learning
  • personal loan default prediction
