Detect, Distill and Update: Learned DB Systems Facing Out of Distribution Data

Kurmanji, Meghdad; Triantafillou, Peter

arXiv:2210.05508 (cs)

[Submitted on 11 Oct 2022 (v1), last revised 8 Dec 2022 (this version, v2)]

Abstract: Machine Learning (ML) is changing DBs, as many DB components are being replaced by ML models. One open problem in this setting is how to update such ML models in the presence of data updates. We start this investigation by focusing on data insertions (the dominant type of update in analytical DBs). We study how to update neural network (NN) models when new data follows a different distribution (i.e., it is "out-of-distribution", or OOD), rendering previously trained NNs inaccurate. A requirement in our problem setting is that learned DB components should ensure high accuracy for tasks on both old and new data (e.g., approximate query processing (AQP), cardinality estimation (CE), and synthetic data generation (DG)). This paper proposes a novel updatability framework, DDUp. DDUp can provide updatability for different learned DB system components, even ones based on different NNs, without the high cost of retraining the NNs from scratch. DDUp entails two components: first, a novel, efficient, and principled statistical-testing approach to detect OOD data; second, a novel model-updating approach, grounded in the principles of transfer learning with knowledge distillation, to update learned models efficiently while still ensuring high accuracy. We develop and showcase DDUp's applicability for three different learned DB components, AQP, CE, and DG, each employing a different type of NN. A detailed experimental evaluation using real and benchmark datasets for AQP, CE, and DG demonstrates DDUp's performance advantages.
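
The abstract names two components, statistical OOD detection and distillation-based updating, without giving implementation detail. The following is a minimal, generic sketch of those two ideas and is not taken from the paper: the function names (`detect_ood`, `distillation_loss`, `update_loss`), the bootstrap z-score test, the thresholds, and the weighting parameter `alpha` are all illustrative assumptions.

```python
import math
import random
from statistics import mean, stdev


def detect_ood(old_losses, new_losses, n_boot=1000, z_threshold=3.0, seed=0):
    """Hypothetical detector: flag a new batch as OOD when its mean loss lies
    far outside the bootstrap distribution of mean losses on old data."""
    rng = random.Random(seed)
    k = len(new_losses)
    # Bootstrap: resample k old losses with replacement and record each mean.
    boot_means = [mean(rng.choices(old_losses, k=k)) for _ in range(n_boot)]
    mu, sigma = mean(boot_means), stdev(boot_means)
    diff = mean(new_losses) - mu
    if sigma > 0:
        z = diff / sigma
    else:
        z = 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return abs(z) > z_threshold, z


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened output distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def update_loss(student_old, teacher_old, student_new, label_new, alpha=0.5):
    """Assumed combined objective: alpha weights agreement with the old model
    (teacher) on old data; (1 - alpha) weights cross-entropy on new data."""
    distill = distillation_loss(student_old, teacher_old)
    task = -math.log(softmax(student_new)[label_new])
    return alpha * distill + (1 - alpha) * task
```

In this sketch, a batch whose losses resemble the old data's losses is not flagged, while a batch with clearly higher losses is. Setting `alpha` closer to 1 favors retaining the old model's behavior, and setting it closer to 0 favors fitting the new data.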

Comments:	Accepted as a conference paper for SIGMOD 2023
Subjects:	Databases (cs.DB); Machine Learning (cs.LG)
Cite as:	arXiv:2210.05508 [cs.DB]
	(or arXiv:2210.05508v2 [cs.DB] for this version)
	https://doi.org/10.48550/arXiv.2210.05508

Submission history

From: Meghdad Kurmanji
[v1] Tue, 11 Oct 2022 15:00:25 UTC (3,496 KB)
[v2] Thu, 8 Dec 2022 17:45:24 UTC (3,473 KB)