Variable selection of multiple types of data: a PLS approach

  • Boao Kong
  • Huiwen Wang
  • Shan Lu*
  • *Corresponding author for this work

Affiliations:

  • Peking University
  • Central University of Finance and Economics

Research output: Contribution to journal › Article › peer-review

Abstract

With the rapid development of data collection techniques in recent years, multiple types of data have emerged, including scalar data, functional data (curve-like), and compositional data (pie-like). While existing studies propose predictive models for multiple types of data, few address the issue of variable selection. The challenge is that different data types originate from different vector spaces, which makes it difficult to select at the level of whole variables rather than at the level of their sub-components. This study leverages the group selection ability of gPLS (group Partial Least Squares) and gsPLS (group sparse Partial Least Squares) by treating the functional and compositional variables as natural groups. After building a vector space for multiple types of data, it proposes two variable selection approaches, named MD-gPLS and MD-gsPLS. Numerical studies and real-world examples verify the effectiveness of the proposed approaches. This study broadens the statistical modeling toolkit for analyzing multiple types of data in terms of variable selection, and also contributes to the literature by introducing the vector space of multiple types of data.
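The idea of treating each functional or compositional variable as a natural group can be illustrated with a minimal sketch. This is not the authors' MD-gPLS or MD-gsPLS procedure. It assumes a centered log-ratio (clr) transform for the compositional variable, a plain discretization of the curves on a grid for the functional variable, and a simplified group soft-thresholding of the first PLS weight vector. All function names, the simulated data, and the penalty value `lam` are hypothetical choices made for illustration only.

```python
import numpy as np

def clr(comp):
    """Centered log-ratio transform; rows must be strictly positive compositions."""
    logc = np.log(comp)
    return logc - logc.mean(axis=1, keepdims=True)

def build_grouped_design(blocks):
    """Stack one block of columns per variable and record each column's group label."""
    cols, groups = [], []
    for g, block in enumerate(blocks):
        block = np.asarray(block, dtype=float)
        if block.ndim == 1:          # a scalar variable becomes a one-column group
            block = block[:, None]
        cols.append(block)
        groups.extend([g] * block.shape[1])
    X = np.hstack(cols)
    return X - X.mean(axis=0), np.array(groups)

def group_thresholded_weights(X, y, groups, lam):
    """First PLS weight vector with whole groups shrunk or zeroed (illustrative rule)."""
    z = X.T @ (y - y.mean())         # covariance direction used by the first PLS component
    w = np.zeros_like(z)
    for g in np.unique(groups):
        idx = groups == g
        norm = np.linalg.norm(z[idx])
        thresh = lam * np.sqrt(idx.sum())   # penalty scaled by group size (assumption)
        if norm > thresh:
            w[idx] = (1.0 - thresh / norm) * z[idx]
    total = np.linalg.norm(w)
    return w / total if total > 0 else w

# Simulated data (hypothetical): only the functional variable drives the response.
rng = np.random.default_rng(0)
n = 200
scalar = rng.normal(size=n)                                  # group 0: scalar
amp = rng.normal(size=n)
grid = np.linspace(0.0, 1.0, 10)
curves = amp[:, None] * np.sin(2 * np.pi * grid) + 0.1 * rng.normal(size=(n, 10))  # group 1
comp = rng.dirichlet([2.0, 3.0, 5.0], size=n)                # group 2: compositional
y = 3.0 * amp + 0.1 * rng.normal(size=n)

X, groups = build_grouped_design([scalar, curves, clr(comp)])

# Set the penalty to half of the largest size-scaled group norm, so at least one group survives.
z = X.T @ (y - y.mean())
scaled = [np.linalg.norm(z[groups == g]) / np.sqrt((groups == g).sum()) for g in np.unique(groups)]
lam = 0.5 * max(scaled)

w = group_thresholded_weights(X, y, groups, lam)
selected = [int(g) for g in np.unique(groups) if np.any(w[groups == g] != 0)]
print("selected variable groups:", selected)
```

Because the threshold acts on the norm of a whole block of columns, a functional or compositional variable is either kept or dropped as a unit, which is the variable-level selection the abstract describes. A sparse-group variant would additionally threshold individual entries within the retained groups.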

Original language: English
Article number: 104969
Pages (from-to): 1369-1387
Number of pages: 19
Journal: Soft Computing
Volume: 29
Issue number: 3
State: Published - Feb 2025

Keywords

  • Compositional data
  • Functional data
  • Group variable selection
  • PLS
  • Variable selection
