TY - GEN
T1 - CrowdCleaner
T2 - 30th IEEE International Conference on Data Engineering, ICDE 2014
AU - Tong, Yongxin
AU - Cao, Caleb Chen
AU - Zhang, Chen Jason
AU - Li, Yatao
AU - Chen, Lei
PY - 2014
Y1 - 2014
N2 - Multi-version data is often one of the most concerned information on the Web since this type of data is usually updated frequently. Even though there exist some Web information integration systems that try to maintain the latest update version, the maintained multi-version data usually includes inaccurate and invalid information due to the data integration or update delay errors. In this demo, we present CrowdCleaner, a smart data cleaning system for cleaning multi-version data on the Web, which utilizes crowdsourcing-based approaches for detecting and repairing errors that usually cannot be solved by traditional data integration and cleaning techniques. In particular, CrowdCleaner blends active and passive crowdsourcing methods together for rectifying errors for multi-version data. We demonstrate the following four facilities provided by CrowdCleaner: (1) an error-monitor to find out which items (e.g., submission date, price of real estate, etc.) are wrong versions according to the reports from the crowds, which belongs to a passive crowdsourcing strategy; (2) a task-manager to allocate the tasks to human workers intelligently; (3) a smart-decision-maker to identify which answer from the crowds is correct with active crowdsourcing methods; and (4) a whom-to-ask-finder to discover which users (or human workers) should be the most credible according to their answer records.
AB - Multi-version data is often one of the most concerned information on the Web since this type of data is usually updated frequently. Even though there exist some Web information integration systems that try to maintain the latest update version, the maintained multi-version data usually includes inaccurate and invalid information due to the data integration or update delay errors. In this demo, we present CrowdCleaner, a smart data cleaning system for cleaning multi-version data on the Web, which utilizes crowdsourcing-based approaches for detecting and repairing errors that usually cannot be solved by traditional data integration and cleaning techniques. In particular, CrowdCleaner blends active and passive crowdsourcing methods together for rectifying errors for multi-version data. We demonstrate the following four facilities provided by CrowdCleaner: (1) an error-monitor to find out which items (e.g., submission date, price of real estate, etc.) are wrong versions according to the reports from the crowds, which belongs to a passive crowdsourcing strategy; (2) a task-manager to allocate the tasks to human workers intelligently; (3) a smart-decision-maker to identify which answer from the crowds is correct with active crowdsourcing methods; and (4) a whom-to-ask-finder to discover which users (or human workers) should be the most credible according to their answer records.
UR - https://www.scopus.com/pages/publications/84901749646
U2 - 10.1109/ICDE.2014.6816736
DO - 10.1109/ICDE.2014.6816736
M3 - 会议稿件
AN - SCOPUS:84901749646
SN - 9781479925544
T3 - Proceedings - International Conference on Data Engineering
SP - 1182
EP - 1185
BT - 2014 IEEE 30th International Conference on Data Engineering, ICDE 2014
PB - IEEE Computer Society
Y2 - 31 March 2014 through 4 April 2014
ER -