
Discovery of Approximate Lexicographical Order Dependencies

  • Yifeng Jin
  • Zijing Tan*
  • Jixuan Chen
  • Shuai Ma
  • *Corresponding author for this work
  • Fudan University

Research output: Journal contribution, article, peer-reviewed

Abstract

Lexicographical order dependencies (LODs) specify orders between lists of attributes, and are proven useful in optimizing SQL queries with ORDER BY clauses. To discover hidden dependencies from dirty data in practice, approximate dependency discovery is actively studied, aiming at automatically discovering dependencies that hold on data with some exceptions. In this paper we study the discovery of approximate LODs. (1) We adapt two error measures, namely g1 and g3, to LODs. We prove their desirable properties, present efficient algorithms for computing the measures and related lower and upper bounds, and study the relationship between the two measures. (2) We present an efficient approximate LOD discovery algorithm that is well suited to the two error measures, with a set of pruning rules, optimization techniques and ranking functions. (3) We study techniques for estimating g1 by sampling, with high accuracy and far less time. (4) We conduct extensive experiments to verify the effectiveness and scalability of our methods, using both real-life and synthetic data.
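To make the notion of an LOD "holding with some exceptions" concrete, the following is a minimal Python sketch and not the paper's algorithm. It assumes the common reading that an LOD from attribute list X to attribute list Y holds when every pair of tuples ordered by X is also ordered by Y, and it assumes a g1-style measure defined as the fraction of ordered tuple pairs that violate this condition. The function names, the normalization by n², and the quadratic pair scan are illustrative assumptions; the paper's exact definitions of g1 and g3 and its efficient algorithms may differ.

```python
from itertools import product


def project(row, attrs):
    """Project a row (dict) onto an attribute list; Python compares tuples lexicographically."""
    return tuple(row[a] for a in attrs)


def violating_pairs(rows, lhs, rhs):
    """Count ordered pairs (s, t) with s <= t on lhs but s > t on rhs.

    Assumption: this is the violation condition for an LOD lhs -> rhs.
    Brute-force O(n^2) scan, for illustration only.
    """
    count = 0
    for s, t in product(rows, repeat=2):
        if project(s, lhs) <= project(t, lhs) and project(s, rhs) > project(t, rhs):
            count += 1
    return count


def g1_style_error(rows, lhs, rhs):
    """Hypothetical g1-style error: violating ordered pairs divided by n^2."""
    n = len(rows)
    return violating_pairs(rows, lhs, rhs) / (n * n) if n else 0.0


# Toy relation (made-up data): ordering by "a" almost orders "b".
rows = [
    {"a": 1, "b": 1},
    {"a": 2, "b": 2},
    {"a": 3, "b": 1},  # breaks the order on "b" relative to the second row
]
error = g1_style_error(rows, ["a"], ["b"])
```

In this sketch an approximate LOD would be accepted when `error` falls below a user-chosen threshold. A g3-style measure would instead ask for the smallest fraction of tuples whose removal leaves no violating pair.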

Original language: English
Pages (from-to): 3684-3698
Number of pages: 15
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 35
Issue number: 4
DOI
Publication status: Published - 1 Apr 2023
