Discovering Approximate Inclusion Dependencies

  • Qingdong Su
  • Zhikang Wang
  • Zijing Tan
  • Shuai Ma

Research output: Contribution to journal › Conference article › Peer-reviewed

Abstract

Inclusion dependencies (INDs) are widely used in data management tasks, so techniques for discovering the INDs that hold in a dataset have received considerable attention. However, data-quality issues in real-world data may cause INDs to be partially violated. This paper presents the first comprehensive study of discovering approximate INDs (AINDs), that is, INDs whose error rate is below a given threshold. In addition to the existing definition of AIND based on insertion semantics, it introduces a new definition based on deletion semantics, and it develops a discovery method that can be configured to identify AINDs under either semantics. The method combines partitioning techniques, which handle tables that cannot all fit into memory simultaneously, with novel approaches to quantifying AIND violations over the partitioned tables. To improve efficiency, the method employs a novel three-layer filtering structure and techniques that can prune invalid candidate AINDs and identify valid AINDs without necessarily processing all tuples. We conduct an extensive experimental evaluation, which verifies that the proposed method significantly outperforms existing methods for AIND discovery under insertion semantics, that AIND discovery under insertion and deletion semantics can provide complementary results, and that our method can effectively handle dirty datasets containing various types of errors.
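The abstract does not spell out the two error measures, so the following is only a minimal sketch under an assumed reading of them, not the paper's definitions or algorithm. For a candidate IND dep ⊆ ref over single columns, it assumes that "insertion" error is the share of distinct dependent values missing from the referenced column (values one would have to insert into ref), and that "deletion" error is the share of dependent tuples whose value is missing (tuples one would have to delete from dep). The helper names `insertion_error`, `deletion_error`, `hash_partition`, `partitioned_errors` and `partitioned_deletion_check` are hypothetical. The partitioned functions illustrate the general idea of counting violations bucket by bucket. The simple budget check stands in for early pruning and acceptance; it is not the paper's three-layer filtering structure.

```python
from typing import Hashable, List, Sequence, Tuple

# ASSUMPTION: both error measures are an illustrative reading of "insertion"
# and "deletion" semantics; they are not taken from the paper.


def insertion_error(dep: Sequence[Hashable], ref: Sequence[Hashable]) -> float:
    """Share of distinct dependent values absent from ref
    (values that would have to be inserted into ref)."""
    dep_values = set(dep)
    if not dep_values:
        return 0.0
    return len(dep_values - set(ref)) / len(dep_values)


def deletion_error(dep: Sequence[Hashable], ref: Sequence[Hashable]) -> float:
    """Share of dependent tuples whose value is absent from ref
    (tuples that would have to be deleted from dep)."""
    if not dep:
        return 0.0
    ref_values = set(ref)
    return sum(1 for v in dep if v not in ref_values) / len(dep)


def hash_partition(values: Sequence[Hashable], k: int) -> List[List[Hashable]]:
    """Split values into k buckets; equal values always land in the same bucket.
    Python randomizes str hashes per process, but the mapping is stable within
    one run, which is all this partitioning needs."""
    buckets: List[List[Hashable]] = [[] for _ in range(k)]
    for v in values:
        buckets[hash(v) % k].append(v)
    return buckets


def partitioned_errors(dep, ref, k: int = 4) -> Tuple[float, float]:
    """Both errors, computed one bucket pair at a time. Buckets hold disjoint
    value sets, so the per-bucket counts can simply be summed."""
    missing = distinct = violating = 0
    for dep_part, ref_part in zip(hash_partition(dep, k), hash_partition(ref, k)):
        ref_set, dep_set = set(ref_part), set(dep_part)
        missing += len(dep_set - ref_set)
        distinct += len(dep_set)
        violating += sum(1 for v in dep_part if v not in ref_set)
    ins = missing / distinct if distinct else 0.0
    dele = violating / len(dep) if dep else 0.0
    return ins, dele


def partitioned_deletion_check(dep, ref, threshold: float, k: int = 4) -> bool:
    """Decide whether deletion_error(dep, ref) <= threshold, possibly without
    scanning every bucket pair."""
    budget = threshold * len(dep)  # maximum tolerated number of violating tuples
    violating = processed = 0
    for dep_part, ref_part in zip(hash_partition(dep, k), hash_partition(ref, k)):
        ref_set = set(ref_part)
        violating += sum(1 for v in dep_part if v not in ref_set)
        processed += len(dep_part)
        if violating > budget:
            return False  # prune: the budget is already exceeded
        if violating + (len(dep) - processed) <= budget:
            return True  # accept: even if all remaining tuples violate, still within budget
    return violating <= budget
```

For example, with `dep = ["a", "b", "c", "x", "x", "x"]` and `ref = ["a", "b", "c"]`, `insertion_error` returns 0.25 (one of four distinct values is missing) and `deletion_error` returns 0.5 (three of six tuples violate). At a threshold of 0.3, this candidate would be accepted under the assumed insertion reading and rejected under the deletion reading. That is one way the two semantics can give different verdicts on the same data.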

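Original language: English
Pages (from-to): 1210-1222
Number of pages: 13
Journal: Proceedings of the VLDB Endowment
Volume: 18
Issue number: 4
DOI
Publication status: Published - 2025
Event: 51st International Conference on Very Large Data Bases, VLDB 2025 - London, United Kingdom
Duration: 1 Sept 2025 → 5 Sept 2025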
