Abstract
With the rapid development of data collection techniques in recent years, multiple types of data have emerged, including scalar data, functional data (curve-like), and compositional data (pie-like). While existing studies propose predictive models for multiple types of data, few address the issue of variable selection. The challenge lies in the fact that different data types originate from different vector spaces, making it difficult to conduct selection at the level of whole variables rather than at the level of their sub-components. This study leverages the group selection ability of gPLS (group Partial Least Squares) and gsPLS (group sparse Partial Least Squares) by regarding the functional and compositional variables as natural groups and, after building a vector space for multiple types of data, proposes two variable selection approaches, named MD-gPLS and MD-gsPLS. Numerical studies and real-world examples verify the effectiveness of the proposed approaches. This study broadens the statistical modeling tools available for analyzing multiple types of data in terms of variable selection and also contributes to the literature by introducing the vector space of multiple types of data.
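The idea of treating each functional or compositional variable as one group can be illustrated with a minimal sketch. It is not the authors' MD-gPLS or MD-gsPLS implementation. It assumes that a functional variable is already represented by a few basis coefficients, that a compositional variable is mapped to real coordinates with the centered log-ratio (clr) transform, and that the first PLS weight vector for a single response is obtained by group soft-thresholding of `X.T @ y`. The function names and the `lam_frac` parameter are hypothetical, and the vector space constructed in the paper may differ.

```python
import numpy as np

def clr(parts):
    """Centered log-ratio transform of strictly positive compositions (one per row)."""
    logp = np.log(parts)
    return logp - logp.mean(axis=1, keepdims=True)

def group_soft_threshold(z, groups, lam):
    """Shrink each group of z by its Euclidean norm; groups with norm <= lam become zero."""
    w = np.zeros_like(z)
    for idx in groups:
        norm = np.linalg.norm(z[idx])
        if norm > lam:
            w[idx] = (1.0 - lam / norm) * z[idx]
    return w

def first_group_pls_weights(X, y, groups, lam_frac):
    """First PLS weight vector under a group penalty (illustrative assumption).

    lam_frac is a hypothetical tuning parameter in [0, 1): the threshold is
    lam_frac times the largest group norm of X^T y.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    z = Xc.T @ yc
    lam = lam_frac * max(np.linalg.norm(z[idx]) for idx in groups)
    w = group_soft_threshold(z, groups, lam)
    norm = np.linalg.norm(w)
    return w / norm if norm > 0 else w

# Synthetic data: one scalar, one "functional", one compositional variable.
rng = np.random.default_rng(0)
n = 200
scalar = rng.normal(size=(n, 1))
curve_coefs = rng.normal(size=(n, 4))              # stand-in for basis coefficients
comp = clr(rng.dirichlet(np.ones(3), size=n))      # 3-part composition in clr coordinates
X = np.hstack([scalar, curve_coefs, comp])
groups = [np.arange(0, 1), np.arange(1, 5), np.arange(5, 8)]

# Only the functional variable drives the response.
y = curve_coefs @ np.array([1.0, -1.0, 0.5, 2.0]) + 0.1 * rng.normal(size=n)

w = first_group_pls_weights(X, y, groups, lam_frac=0.5)
selected = [g for g, idx in enumerate(groups) if np.any(w[idx] != 0)]
print("selected groups:", selected)
```

Because the threshold acts on a whole group's norm, a variable is kept or dropped as a unit, which is the variable-level selection the abstract describes, as opposed to zeroing individual basis coefficients or composition parts.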
| Original language | English |
|---|---|
| Article number | 104969 |
| Pages (from-to) | 1369-1387 |
| Number of pages | 19 |
| Journal | Soft Computing |
| Volume | 29 |
| Issue number | 3 |
| State | Published - Feb 2025 |
Keywords
- Compositional data
- Functional data
- Group variable selection
- PLS
- Variable selection
Fingerprint
Dive into the research topics of 'Variable selection of multiple types of data: a PLS approach'. Together they form a unique fingerprint.