TY - GEN
T1 - Dirichlet component analysis
T2 - 25th International Conference on Machine Learning
AU - Wang, Hua Yan
AU - Yang, Qiang
AU - Qin, Hong
AU - Zha, Hongbin
PY - 2008
Y1 - 2008
N2 - We consider feature extraction (dimensionality reduction) for compositional data, where the data vectors are constrained to be positive and constant-sum. In real-world problems, the data components (variables) usually have complicated "correlations" while their total number is huge. Such a scenario demands feature extraction. That is, we shall de-correlate the components and reduce their dimensionality. Traditional techniques such as Principal Component Analysis (PCA) are not suitable for these problems due to unique statistical properties and the need to satisfy the constraints in compositional data. This paper presents a novel approach to feature extraction for compositional data. Our method first identifies a family of dimensionality reduction projections that preserve all relevant constraints, and then finds the optimal projection that maximizes the estimated Dirichlet precision on projected data. It reduces the compositional data to a given lower dimensionality while the components in the lower-dimensional space are de-correlated as much as possible. We develop the theoretical foundation of our approach, and validate its effectiveness on some synthetic and real-world datasets.
AB - We consider feature extraction (dimensionality reduction) for compositional data, where the data vectors are constrained to be positive and constant-sum. In real-world problems, the data components (variables) usually have complicated "correlations" while their total number is huge. Such a scenario demands feature extraction. That is, we shall de-correlate the components and reduce their dimensionality. Traditional techniques such as Principal Component Analysis (PCA) are not suitable for these problems due to unique statistical properties and the need to satisfy the constraints in compositional data. This paper presents a novel approach to feature extraction for compositional data. Our method first identifies a family of dimensionality reduction projections that preserve all relevant constraints, and then finds the optimal projection that maximizes the estimated Dirichlet precision on projected data. It reduces the compositional data to a given lower dimensionality while the components in the lower-dimensional space are de-correlated as much as possible. We develop the theoretical foundation of our approach, and validate its effectiveness on some synthetic and real-world datasets.
UR - https://www.scopus.com/pages/publications/56449088321
M3 - Conference contribution
AN - SCOPUS:56449088321
SN - 9781605582054
T3 - Proceedings of the 25th International Conference on Machine Learning
SP - 1128
EP - 1135
BT - Proceedings of the 25th International Conference on Machine Learning
Y2 - 5 July 2008 through 9 July 2008
ER -