Abstract
This paper studies rule-based blocking in Entity Resolution (ER). We propose HyperBlocker, a GPU-accelerated system for blocking in ER. Unlike previous blocking algorithms and parallel blocking solvers, HyperBlocker employs a pipelined architecture that overlaps data transfer with GPU operations. It generates a data-aware and rule-aware execution plan on CPUs that specifies how rules are evaluated, and it applies a number of hardware-aware optimizations to achieve massive parallelism on GPUs. On real-life datasets, HyperBlocker is at least 6.8× faster than prior CPU-powered distributed systems and at least 9.1× faster than GPU-based ER solvers. Moreover, combining HyperBlocker with the state-of-the-art ER matcher speeds up the overall ER process by at least 30% with comparable accuracy.
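The abstract does not spell out the rule language or the plan format. To make the idea of rule-based blocking concrete, the sketch below assumes a blocking rule is a conjunction of attribute predicates, and a record pair survives blocking (stays a candidate for the matcher) if any rule holds. All names here (`Predicate`, `eq`, `sim`, `plan`, `block`) and the cost values are hypothetical. Ordering predicates by an assumed cost is only a simple stand-in for a rule-aware execution plan, not HyperBlocker's actual planner. The sketch runs on the CPU, enumerates all pairs naively, and says nothing about the GPU side.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from itertools import combinations
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]


@dataclass
class Predicate:
    name: str
    cost: float  # hypothetical relative evaluation cost
    test: Callable[[Record, Record], bool]


def eq(attr: str) -> Predicate:
    # Cheap predicate: exact equality on one attribute.
    return Predicate(f"{attr}=", 1.0, lambda a, b: a[attr] == b[attr])


def sim(attr: str, threshold: float) -> Predicate:
    # Expensive predicate: string similarity at or above a threshold.
    def test(a: Record, b: Record) -> bool:
        return SequenceMatcher(None, a[attr], b[attr]).ratio() >= threshold
    return Predicate(f"sim({attr})>={threshold}", 10.0, test)


Rule = List[Predicate]  # a rule is a conjunction of predicates


def plan(rule: Rule) -> Rule:
    # Evaluate cheap predicates first so failing pairs skip costly ones.
    return sorted(rule, key=lambda p: p.cost)


def block(records: List[Record], rules: List[Rule]) -> List[Tuple[int, int]]:
    plans = [plan(r) for r in rules]
    candidates = []
    # Naive all-pairs loop; a real blocker avoids this quadratic scan.
    for i, j in combinations(range(len(records)), 2):
        a, b = records[i], records[j]
        # all() and any() short-circuit, mirroring early rule termination.
        if any(all(p.test(a, b) for p in rule) for rule in plans):
            candidates.append((i, j))
    return candidates


records = [
    {"name": "Alice Smith", "city": "London", "addr": "12 Baker Street"},
    {"name": "Alice Smith", "city": "London", "addr": "12 Baker St"},
    {"name": "Bob Jones", "city": "Paris", "addr": "5 Rue de Rivoli"},
    {"name": "Alice Smyth", "city": "Leeds", "addr": "12 Baker Street"},
]
rules = [
    [sim("addr", 0.8), eq("city")],  # similar address in the same city
    [eq("name"), eq("city")],        # same name in the same city
]

if __name__ == "__main__":
    print(block(records, rules))
```

Here only records 0 and 1 remain a candidate pair. Record 3 shares an address with record 0 but fails the city predicate, which the plan checks before the similarity test.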
| Original language | English |
|---|---|
| Pages (from-to) | 308-321 |
| Number of pages | 14 |
| Journal | Proceedings of the VLDB Endowment |
| Volume | 18 |
| Issue number | 2 |
| Status | Published (2025) |
| Event | 51st International Conference on Very Large Data Bases (VLDB 2025) |
| Location | London, United Kingdom |
| Dates | 1–5 Sep 2025 |
Title: HyperBlocker: Accelerating Rule-based Blocking in Entity Resolution using GPUs