Abstract
The performance of existing unsupervised spectral feature selection algorithms depends on the quality of the eigenvectors. These eigenvectors are computed from the Laplacian matrix of a similarity graph built from the samples. When these algorithms are applied to high-dimensional data, we face a chicken-and-egg problem: the success of feature selection depends on the quality of the indicator vectors, which reflect the structure of the data, yet the purpose of feature selection is to recover that structure more accurately. To alleviate this problem, we propose a graph-based approach that reduces the dimension of the data by automatically finding and removing redundant features. A sparse graph is generated on the feature side and is used to learn the redundancy relationships among features. We name this novel graph the sparse feature graph (SFG). To avoid relying on inaccurate distance information among high-dimensional vectors, the construction of the SFG does not use the pairwise relationships among samples, which means the structural information of the data is not used. The proposed algorithm is also nonparametric, as it makes no assumption about the data distribution. We treat the proposed redundant feature removal algorithm as a data preprocessing step for popular unsupervised spectral feature selection algorithms such as multi-cluster feature selection (MCFS), which require accurate sample-based cluster structure information. Our experimental results on benchmark datasets show that the proposed SFG and redundant feature removal algorithm consistently improve the performance of these unsupervised spectral feature selection algorithms.
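The preprocessing idea described above can be illustrated with a small sketch. The abstract does not give the construction details of the SFG, so this example makes two loudly stated substitutions: absolute Pearson correlation between feature columns stands in for the paper's sparse-representation step, and connected components stand in for whatever grouping the authors use to identify redundant features. The names `build_feature_graph` and `remove_redundant`, and the parameters `k` and `threshold`, are hypothetical. What the sketch does share with the abstract is that the graph is built over features only and never uses pairwise distances between samples.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def build_feature_graph(X, k=2, threshold=0.95):
    """Build a sparse undirected graph over features (columns of X).

    Assumption: each feature proposes edges to at most k other features,
    and an edge is kept only if the absolute correlation reaches
    `threshold`. This is a stand-in for the paper's sparse representation.
    """
    d = len(X[0])
    cols = [[row[j] for row in X] for j in range(d)]
    graph = {j: set() for j in range(d)}
    for i in range(d):
        sims = sorted(
            ((abs(pearson(cols[i], cols[j])), j) for j in range(d) if j != i),
            reverse=True,
        )
        for s, j in sims[:k]:
            if s >= threshold:
                graph[i].add(j)
                graph[j].add(i)
    return graph

def remove_redundant(graph):
    """Keep the lowest-index feature of each connected component.

    Assumption: a connected component is treated as one redundant group.
    """
    seen, keep = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        keep.append(start)
        seen.add(start)
        stack = [start]
        while stack:
            u = stack.pop()
            for v in graph[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
    return keep

# Toy data: feature 1 is roughly 2 * feature 0, and feature 3 is -feature 2.
X = [
    [1.0, 2.1, 5.0, -5.0],
    [2.0, 3.9, 1.0, -1.0],
    [3.0, 6.2, 4.0, -4.0],
    [4.0, 7.8, 2.0, -2.0],
    [5.0, 10.1, 3.0, -3.0],
]
graph = build_feature_graph(X)
kept = remove_redundant(graph)
print(kept)  # [0, 2]
```

In a pipeline like the one the abstract describes, the reduced feature set (here, columns 0 and 2) would then be passed to a spectral method such as MCFS, which builds its sample similarity graph on fewer, less redundant dimensions.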
| Field | Value |
|---|---|
| Original language | English |
| Pages (from-to) | 77-93 |
| Number of pages | 17 |
| Journal | International Journal of Data Science and Analytics |
| Volume | 8 |
| Issue number | 1 |
| State | Published - 1 Jul 2019 |
| Externally published | Yes |
Keywords
- Dense subgraph
- Sparse graph representation
- Unsupervised spectral feature selection
Title
Redundant features removal for unsupervised spectral feature selection algorithms: an empirical study based on nonparametric sparse feature graph