
Improving data quality: Consistency and accuracy

  • Gao Cong
  • Wenfei Fan
  • Floris Geerts
  • Xibei Jia
  • Shuai Ma
  • Microsoft USA
  • University of Edinburgh
  • Hasselt University
  • Nokia
  • Transnational Univ. Limburg

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-reviewed

Abstract

Two central criteria for data quality are consistency and accuracy. Inconsistencies and errors in a database often emerge as violations of integrity constraints. Given a dirty database D, one needs automated methods to make it consistent, i.e., find a repair D′ that satisfies the constraints and "minimally" differs from D. Equally important is to ensure that the automatically generated repair D′ is accurate, or makes sense, i.e., D′ differs from the "correct" data within a predefined bound. This paper studies effective methods for improving both data consistency and accuracy. To specify the consistency of the data, we employ a class of conditional functional dependencies (CFDs) proposed in [6], which can capture inconsistencies and errors beyond what their traditional counterparts can catch. To improve the consistency of the data, we propose two algorithms: one for automatically computing a repair D′ that satisfies a given set of CFDs, and the other for incrementally finding a repair in response to updates to a clean database. We show that both problems are intractable. Although our algorithms are necessarily heuristic, we experimentally verify that the methods are effective and efficient. Moreover, we develop a statistical method that guarantees that the repairs found by the algorithms are accurate above a predefined rate without incurring excessive user interaction.
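To make concrete what it means for a database to violate a CFD, here is a minimal sketch that checks a relation against a CFD of the form (X → B, Tp) with a single right-hand-side attribute B. A tuple matching a pattern on X must match any constant the pattern fixes for B, and any two matching tuples that agree on X must agree on B. The `CFD` class, `find_violations`, the `WILDCARD` constant, and the customer rows are all hypothetical and chosen for illustration. The sketch only detects violations; it is not the batch or incremental repair algorithm described in the paper.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations

WILDCARD = "_"  # hypothetical encoding of the unnamed variable "_" in a pattern tableau


@dataclass
class CFD:
    """Hypothetical representation of a CFD (X -> B, Tp) with one RHS attribute B."""
    lhs: tuple     # attributes X
    rhs: str       # attribute B
    tableau: list  # pattern tuples: dicts over the attributes in lhs plus rhs


def matches(value, pattern):
    """A value matches a pattern entry if the entry is '_' or equal to the value."""
    return pattern == WILDCARD or value == pattern


def find_violations(rows, cfd):
    """Return (single, pairs): indices of tuples that break a constant on the RHS
    by themselves, and index pairs that agree on the LHS but differ on the RHS."""
    single, pairs = set(), set()
    for tp in cfd.tableau:
        groups = defaultdict(list)
        for i, t in enumerate(rows):
            if not all(matches(t[a], tp[a]) for a in cfd.lhs):
                continue  # this pattern tuple does not apply to t
            if not matches(t[cfd.rhs], tp[cfd.rhs]):
                single.add(i)  # t contradicts a constant fixed by the pattern
            groups[tuple(t[a] for a in cfd.lhs)].append(i)
        # Tuples that agree on the LHS (and match the pattern) must agree on the RHS.
        for members in groups.values():
            for i, j in combinations(members, 2):
                if rows[i][cfd.rhs] != rows[j][cfd.rhs]:
                    pairs.add((i, j))
    return sorted(single), sorted(pairs)


# Hypothetical toy data: CC = country code, AC = area code.
rows = [
    {"CC": "44", "AC": "131", "city": "Edinburgh", "zip": "EH8 9LE", "street": "Mayfield"},
    {"CC": "44", "AC": "131", "city": "London", "zip": "EH8 9LE", "street": "Crichton"},
    {"CC": "01", "AC": "908", "city": "Murray Hill", "zip": "07974", "street": "Mtn Ave"},
    {"CC": "01", "AC": "908", "city": "Murray Hill", "zip": "07974", "street": "Main St"},
]

# ([CC, AC] -> city, (44, 131 || Edinburgh)): a constant CFD.
uk_city = CFD(("CC", "AC"), "city", [{"CC": "44", "AC": "131", "city": "Edinburgh"}])
# ([CC, zip] -> street, (44, _ || _)): zip determines street only when CC = 44.
uk_street = CFD(("CC", "zip"), "street", [{"CC": "44", "zip": WILDCARD, "street": WILDCARD}])
# The plain FD zip -> street, written as a CFD with one all-wildcard pattern tuple.
fd = CFD(("zip",), "street", [{"zip": WILDCARD, "street": WILDCARD}])

print(find_violations(rows, uk_city))    # ([1], [(0, 1)])
print(find_violations(rows, uk_street))  # ([], [(0, 1)])
print(find_violations(rows, fd))         # ([], [(0, 1), (2, 3)])
```

Because `uk_street` applies only to tuples with CC = 44, rows 2 and 3 are never compared under it, whereas the unconditional `fd` also flags them. This illustrates how a pattern tableau lets a dependency hold on only part of a relation, which is what distinguishes CFDs from traditional FDs.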

Original language: English
Host publication title: 33rd International Conference on Very Large Data Bases, VLDB 2007 - Conference Proceedings
Editors: Johannes Gehrke, Christoph Koch, Minos Garofalakis, Karl Aberer, Carl-Christian Kanne, Erich J. Neuhold, Venkatesh Ganti, Wolfgang Klas, Chee-Yong Chan, Divesh Srivastava, Dana Florescu, Anand Deshpande
Publisher: Association for Computing Machinery, Inc
Pages: 315-326
Number of pages: 12
ISBN (Electronic): 9781595936493
Publication status: Published - 2007
Externally published: Yes
Event: 33rd International Conference on Very Large Data Bases, VLDB 2007 - Vienna, Austria
Duration: 23 Sep 2007 to 27 Sep 2007

Publication series

Name: 33rd International Conference on Very Large Data Bases, VLDB 2007 - Conference Proceedings

Conference

Conference: 33rd International Conference on Very Large Data Bases, VLDB 2007
Country/Territory: Austria
City: Vienna
Period: 23/09/07 to 27/09/07
