Abstract
Dimension reduction plays an important role in practical big data analysis and data mining applications. However, popular dimension reduction techniques, such as principal component analysis (PCA), are known to be computation-intensive and are considered a computational bottleneck for data processing and mining. In this paper, we propose to reduce the computation of PCA via crowdsourcing, a paradigm that solves hard-to-compute problems by leveraging collective intelligence. We design CPCA (crowd principal component analysis), a novel crowd-based dimension reduction framework. CPCA designs tasks for crowd workers to obtain the relations among features based on their semantics, and builds a weighted graph from the collected answers, from which it derives the covariance matrix and the principal components. We prove the correctness of CPCA and conduct extensive evaluations on real datasets. Experimental results show that CPCA achieves a significant reduction in computational cost in terms of both time and memory, which lowers the bar for learning.
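The pipeline described in the abstract (crowd answers, then a weighted feature graph, then a covariance matrix, then principal components) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the answer format `(i, j, score)`, the mean aggregation rule, and the function names `surrogate_covariance` and `principal_components` are all assumptions made for illustration. The sketch only shows the general idea of eigen-decomposing a feature-relation matrix that was built without scanning the data itself.

```python
import numpy as np


def surrogate_covariance(n_features, answers):
    """Build a symmetric feature-relation matrix from crowd answers.

    ASSUMPTION: each answer is a tuple (i, j, score), where score in [-1, 1]
    is one worker's judgment of how strongly features i and j are related.
    Repeated answers for the same pair are averaged (hypothetical rule).
    """
    sums = np.zeros((n_features, n_features))
    counts = np.zeros((n_features, n_features))
    for i, j, score in answers:
        sums[i, j] += score
        sums[j, i] += score
        counts[i, j] += 1
        counts[j, i] += 1
    # Mean score per pair; pairs nobody answered stay at 0 (no edge).
    weights = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
    # ASSUMPTION: unit self-relation on the diagonal, as in a correlation matrix.
    np.fill_diagonal(weights, 1.0)
    return weights


def principal_components(matrix, k):
    """Return the k largest eigenvalues and their eigenvectors (as columns)."""
    eigvals, eigvecs = np.linalg.eigh(matrix)  # eigh: for symmetric matrices
    order = np.argsort(eigvals)[::-1][:k]      # indices of the k largest
    return eigvals[order], eigvecs[:, order]


if __name__ == "__main__":
    # Three features; two workers judged pair (0, 1), one judged pair (1, 2).
    answers = [(0, 1, 0.9), (0, 1, 0.7), (1, 2, 0.1)]
    cov = surrogate_covariance(3, answers)
    values, components = principal_components(cov, k=2)
    print(cov)
    print(values)
    print(components)
```

A matrix assembled this way is symmetric but not guaranteed to be positive semidefinite, so a real system would need some correction or proof of validity at this step; the abstract states that the paper proves the correctness of CPCA but does not say how.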
| Original language | English |
|---|---|
| Article number | 8519735 |
| Pages (from-to) | 73191-73199 |
| Number of pages | 9 |
| Journal | IEEE Access |
| Volume | 6 |
| State | Published - 2018 |
Keywords
- Dimensionality reduction
- crowdsourcing
- machine learning
- principal component analysis
Title
CPCA: A feature semantics based crowd dimension reduction framework