Abstract
In many real-world regression tasks, particularly within engineering and science, datasets exhibit imbalanced distributions where data density does not match local function complexity. This mismatch degrades model performance in complex, under-represented regions, as models tend to overfit simple, dense areas while neglecting intricate, sparse ones. This study develops a framework to address this challenge in low-dimensional imbalanced regression. The main contributions are twofold. First, the Complexity to Density Ratio (CDR) is introduced as a metric to formally quantify this type of imbalance, capturing the ratio between local function complexity and data density. Second, a data pruning method, Error Distribution Smoothing (EDS), is proposed. EDS constructs a representative dataset by systematically removing redundant samples from over-represented, low-complexity regions. This process smooths the prediction error distribution and focuses model training on challenging areas of the feature space. The efficacy of the EDS method is validated on nonlinear dynamic systems and real-world robotics datasets. Models trained on EDS-processed data demonstrate superior robustness, achieving up to an order-of-magnitude reduction in maximum prediction error and substantial improvements in computational efficiency.
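The abstract describes EDS at a high level: score each sample by the ratio of local function complexity to local data density, then prune low-scoring samples from dense, simple regions. The paper's actual CDR definition and EDS algorithm are not reproduced here, so the sketch below uses illustrative proxies of my own choosing (inverse mean k-NN distance for density, mean local target variation for complexity); `cdr_prune` and its parameters are hypothetical names, not the authors' implementation.

```python
import numpy as np

def cdr_prune(X, y, k=5, keep_ratio=0.5):
    """Keep the samples with the highest complexity-to-density scores.

    Illustrative proxies only (not the paper's formulation):
    density    = inverse mean distance to the k nearest neighbors,
    complexity = mean absolute target variation among those neighbors.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    X = np.asarray(X, dtype=float).reshape(n, -1)

    # Full pairwise distances; acceptable for the low-dimensional,
    # moderate-size datasets the paper targets.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    nbrs = np.argsort(d, axis=1)[:, 1:k + 1]      # k nearest, excluding self

    knn_dist = d[np.arange(n)[:, None], nbrs].mean(axis=1)
    density = 1.0 / (knn_dist + 1e-12)
    complexity = np.abs(y[nbrs] - y[:, None]).mean(axis=1)

    # Low score = dense and locally simple = redundant; keep the rest.
    score = complexity / density
    n_keep = int(np.ceil(keep_ratio * n))
    return np.sort(np.argsort(score)[::-1][:n_keep])
```

On a toy 1-D set with a dense linear region and a sparse sinusoidal region, pruning to 40% retains the sparse, complex region in full while discarding mostly redundant dense samples, mirroring the error-smoothing behavior the abstract describes.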
| Original language | English |
|---|---|
| Article number | 115299 |
| Journal | Knowledge-Based Systems |
| Volume | 336 |
| DOIs | |
| State | Published - 15 Mar 2026 |
Keywords
- Data pruning
- Imbalanced regression
- Machine learning
Fingerprint
Dive into the research topics of 'Error distribution smoothing for low-dimensional imbalanced regression'. Together they form a unique fingerprint.