Cross-source Data Error Detection Approach Based on Federated Learning

International Journal of Software and Informatics, Volume 13, Issue 1, 2023, pp. 27-55. DOI: 10.21655/ijsi.1673-7288.00295


Abstract:

With the emergence and accumulation of massive data, data governance has become an important means of improving data quality and maximizing data value. In particular, data error detection is a crucial step in improving data quality and has attracted wide attention from both industry and academia. Various detection methods tailored to a single data source have been proposed. However, in many real-world scenarios, data are not centrally stored or managed, and highly correlated data from different sources could be used to improve the accuracy of error detection. Unfortunately, due to privacy and security concerns, cross-source data are often not allowed to be integrated centrally. To this end, this paper proposes FeLeDetect, a cross-source data error detection method based on federated learning, which improves error detection accuracy by exploiting cross-source information while preserving data privacy. First, a Graph-based Error Detection Model (GEDM) is presented to capture sufficient data features from each data source. On this basis, the paper designs a federated co-training algorithm (FCTA) that collaboratively trains GEDM on data from different sources without leaking private data. Furthermore, the paper designs a series of optimization methods to reduce both the communication cost of federated learning and the manual labeling effort. Finally, extensive experiments on three real-world datasets demonstrate that (1) GEDM achieves an average improvement of 10.3% and 25.2% in F1 score in the local and centralized scenarios, respectively, outperforming five existing state-of-the-art error detection methods; and (2) the F1 score of FeLeDetect is on average 23.2% higher than that of GEDM in the local scenario.
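The results above are reported as F1 scores, i.e. the harmonic mean of precision and recall over the items flagged as erroneous. The abstract does not state the exact evaluation protocol, so the sketch below assumes cell-level evaluation in which each cell is identified by a hypothetical `(row index, column name)` pair; the function name `f1_score` and the sample cells are illustrative only, not taken from the paper.

```python
from typing import Set, Tuple

# Hypothetical cell identifier: (row index, column name). Assumed, not from the paper.
Cell = Tuple[int, str]


def f1_score(detected: Set[Cell], ground_truth: Set[Cell]) -> float:
    """Cell-level F1 for error detection under an assumed protocol.

    detected:     cells the detector flags as erroneous
    ground_truth: cells that are actually erroneous
    """
    true_positives = len(detected & ground_truth)
    if true_positives == 0:
        # No correctly flagged cells: precision and recall are 0 (or undefined), so F1 is 0.
        return 0.0
    precision = true_positives / len(detected)
    recall = true_positives / len(ground_truth)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    truth = {(0, "city"), (2, "zip"), (5, "phone")}
    flagged = {(0, "city"), (2, "zip"), (3, "state")}
    # 2 of 3 flagged cells are real errors and 2 of 3 real errors are found,
    # so precision = recall = 2/3 and F1 = 2/3.
    print(round(f1_score(flagged, truth), 3))  # prints 0.667
```

Because F1 balances precision against recall, a detector cannot raise its score simply by flagging more cells; extra false positives lower precision.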


Citation:

Lu Chen, Yuxiang Guo, Congcong Ge, Baihua Zheng, Yunjun Gao. Cross-source Data Error Detection Approach Based on Federated Learning. International Journal of Software and Informatics, 2023, 13(1): 27-55.


History

Received: May 13, 2022
Accepted: September 23, 2022
Online: March 30, 2023
