- research-article, October 2024
Enhancing Deep Entity Resolution with Integrated Blocker-Matcher Training: Balancing Consensus and Discrepancy
CIKM '24: Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, Pages 508–518, https://doi.org/10.1145/3627673.3679843
Deep entity resolution (ER) identifies matching entities across data sources using techniques based on deep learning. It involves two steps: a blocker for identifying the potential matches to generate the candidate pairs, and a matcher for accurately ...
- Article, August 2024
Bridging Domains in Chronic Lower Back Pain: Large Language Models and Ontology-Driven Strategies for Knowledge Graph Construction
- Paul Anderson,
- Damon Lin,
- Jean Davidson,
- Theresa Migler,
- Iris Ho,
- Cooper Koenig,
- Madeline Bittner,
- Samuel Kaplan,
- Mayumi Paraiso,
- Nasreen Buhn,
- Emily Stokes,
- C. Anthony Hunt,
- Glen Ropella,
- Jeffrey Lotz
Bioinformatics and Biomedical Engineering, Pages 14–30, https://doi.org/10.1007/978-3-031-64636-2_2
Link prediction and entity resolution play pivotal roles in uncovering hidden relationships within networks and ensuring data quality in the era of heterogeneous data integration. This paper explores the utilization of large language models to ...
- research-article, June 2024
Rock: Cleaning Data by Embedding ML in Logic Rules
- Xianchun Bao,
- Zian Bao,
- Bie Binbin,
- QingSong Duan,
- Wenfei Fan,
- Hui Lei,
- Daji Li,
- Wei Lin,
- Peng Liu,
- Zhicong Lv,
- Mingliang Ouyang,
- Shuai Tang,
- Yaoshu Wang,
- Qiyuan Wei,
- Min Xie,
- Jing Zhang,
- Xin Zhang,
- Runxiao Zhao,
- Shuping Zhou
SIGMOD/PODS '24: Companion of the 2024 International Conference on Management of Data, Pages 106–119, https://doi.org/10.1145/3626246.3653372
We introduce Rock, a system for cleaning relational data. Rock implements a framework that unifies machine learning (ML) and logic deduction by embedding ML classifiers in rules as predicates. In a unified process, it identifies tuples that refer to the ...
- short-paper, May 2024
BoostER: Leveraging Large Language Models for Enhancing Entity Resolution
WWW '24: Companion Proceedings of the ACM Web Conference 2024, Pages 1043–1046, https://doi.org/10.1145/3589335.3651245
Entity resolution, which involves identifying and merging records that refer to the same real-world entity, is a crucial task in areas like Web data integration. This importance is underscored by the presence of numerous duplicated and multi-version data ...
- research-article, February 2024
Using Neural and Graph Neural Recommender Systems to Overcome Choice Overload: Evidence From a Music Education Platform
ACM Transactions on Information Systems (TOIS), Volume 42, Issue 4, Article No.: 92, Pages 1–26, https://doi.org/10.1145/3637873
The application of recommendation technologies has been crucial in the promotion of physical and digital content across numerous global platforms such as Amazon, Apple, and Netflix. Our study aims to investigate the advantages of employing recommendation ...
- research-article, December 2023
Splitting Tuples of Mismatched Entities
Proceedings of the ACM on Management of Data (PACMMOD), Volume 1, Issue 4, Article No.: 269, Pages 1–29, https://doi.org/10.1145/3626763
There has been a host of work on entity resolution (ER), to identify tuples that refer to the same entity. This paper studies the inverse of ER, to identify tuples to which distinct real-world entities are matched by mistake, and split such tuples into a ...
- research-article, December 2023
The Battleship Approach to the Low Resource Entity Matching Problem
Proceedings of the ACM on Management of Data (PACMMOD), Volume 1, Issue 4, Article No.: 224, Pages 1–25, https://doi.org/10.1145/3626711
Entity matching, a core data integration problem, is the task of deciding whether two data tuples refer to the same real-world entity. Recent advances in deep learning methods, using pre-trained language models, were proposed for resolving entity ...
- short-paper, October 2023
Product Entity Matching via Tabular Data
CIKM '23: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, Pages 4215–4219, https://doi.org/10.1145/3583780.3615172
Product Entity Matching (PEM), a subfield of record linkage that focuses on linking records that refer to the same product, is a challenging task for many entity matching models. For example, recent transformer models report a near-perfect performance ...
- research-article, September 2023
Leveraging Semantic Technologies for Collaborative Inference of Threatening IoT Dependencies
ACM SIGAPP Applied Computing Review (SIGAPP), Volume 23, Issue 3, Pages 32–48, https://doi.org/10.1145/3626307.3626310
IoT Device Management (DM) refers to the remote administration of customer devices. In practice, DM is ensured by multiple actors such as operators or device manufacturers, each operating independently via their DM solution. These siloed DM solutions ...
- Article, November 2023
Exploring the Design Space of Unsupervised Blocking with Pre-trained Language Models in Entity Resolution
Advanced Data Mining and Applications, Pages 228–244, https://doi.org/10.1007/978-3-031-46661-8_16
Entity resolution (ER) finds records that refer to the same entities in the real world. Blocking is an important task in ER, filtering out unnecessary comparisons and speeding up ER. Blocking is usually an unsupervised task. In this paper, we ...
- research-article, August 2023
Web-Scale Academic Name Disambiguation: The WhoIsWho Benchmark, Leaderboard, and Toolkit
KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Pages 3817–3828, https://doi.org/10.1145/3580305.3599930
Name disambiguation, a fundamental problem in online academic systems, is now facing greater challenges with the increasing growth of research papers. For example, on AMiner, an online academic search platform, about 10% of names own more than 100 ...
- research-article, August 2023
MedLink: De-Identified Patient Health Record Linkage
KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Pages 2672–2682, https://doi.org/10.1145/3580305.3599427
A comprehensive patient health history is essential for patient care and healthcare research. However, due to the distributed nature of healthcare services, patient health records are often scattered across multiple systems. Existing record linkage ...
- research-article, August 2023
CampER: An Effective Framework for Privacy-Aware Deep Entity Resolution
KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Pages 626–637, https://doi.org/10.1145/3580305.3599266
Entity Resolution (ER) is a fundamental problem in data preparation. Standard deep ER methods have achieved state-of-the-art effectiveness, assuming that relations from different organizations are centrally stored. However, due to privacy concerns, it ...
- short-paper, July 2023
SimTDE: Simple Transformer Distillation for Sentence Embeddings
SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, Pages 2389–2393, https://doi.org/10.1145/3539618.3592063
In this paper we introduce SimTDE, a simple knowledge distillation framework to compress sentence embeddings transformer models with minimal performance loss and significant size and latency reduction. SimTDE effectively distills large and small ...
- short-paper, July 2023
KATIE: A System for Key Attributes Identification in Product Knowledge Graph Construction
SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, Pages 3320–3324, https://doi.org/10.1145/3539618.3591846
We present part of Huawei's efforts in building a Product Knowledge Graph (PKG). We want to identify which product attributes (i.e. properties) are relevant and important in terms of shopping decisions to product categories (i.e. classes). This is ...
- research-article, May 2023
Making It Tractable to Catch Duplicates and Conflicts in Graphs
Proceedings of the ACM on Management of Data (PACMMOD), Volume 1, Issue 1, Article No.: 86, Pages 1–28, https://doi.org/10.1145/3588940
This paper proposes an approach for entity resolution (ER) and conflict resolution (CR) in large-scale graphs. It is based on a class of Graph Cleaning Rules (GCRs), which support the primitives of relational data cleaning rules, and may embed machine ...
- research-article, May 2023
FlexER: Flexible Entity Resolution for Multiple Intents
Proceedings of the ACM on Management of Data (PACMMOD), Volume 1, Issue 1, Article No.: 42, Pages 1–27, https://doi.org/10.1145/3588722
Entity resolution, a longstanding problem of data cleaning and integration, aims at identifying data records that represent the same real-world entity. Existing approaches treat entity resolution as a universal task, assuming the existence of a single ...
- poster, June 2023
On evaluating text similarity measures for customer data deduplication
SAC '23: Proceedings of the 38th ACM/SIGAPP Symposium on Applied Computing, Pages 297–300, https://doi.org/10.1145/3555776.3578724
In this paper, we summarize the results obtained while evaluating 44 similarity measures for text values, which represent real institutional customers data. These data come from a project conducted for a large financial institution in Poland. The ...
- research-article, June 2023
Inferring Threatening IoT Dependencies using Semantic Digital Twins Toward Collaborative IoT Device Management
SAC '23: Proceedings of the 38th ACM/SIGAPP Symposium on Applied Computing, Pages 1732–1741, https://doi.org/10.1145/3555776.3578573
IoT Device Management (DM) refers to registering, configuring, monitoring, and updating IoT devices. DM is facing new challenges as dependencies between IoT devices generate various threats, such as update breaks and cascading failures. Dependencies-...