DARKER: Efficient Transformer with Data-Driven Attention Mechanism for Time Series
Proceedings of the VLDB Endowment (PVLDB), Volume 17, Issue 11, Pages 3229–3242
https://doi.org/10.14778/3681954.3681996
Transformer-based models have facilitated numerous applications with superior performance. A key challenge in transformers is the quadratic dependency of its training time complexity on the length of the input sequence. A recent popular solution is using ...
Efficiently Mitigating the Impact of Data Drift on Machine Learning Pipelines
Proceedings of the VLDB Endowment (PVLDB), Volume 17, Issue 11, Pages 3072–3081
https://doi.org/10.14778/3681954.3681984
Despite the increasing success of Machine Learning (ML) techniques in real-world applications, their maintenance over time remains challenging. In particular, the prediction accuracy of deployed ML models can suffer due to significant changes between ...
QCore: Data-Efficient, On-Device Continual Calibration for Quantized Models
Proceedings of the VLDB Endowment (PVLDB), Volume 17, Issue 11, Pages 2708–2721
https://doi.org/10.14778/3681954.3681957
We are witnessing an increasing availability of streaming data that may contain valuable information on the underlying processes. It is thus attractive to be able to deploy machine learning models, e.g., for classification, on edge devices near sensors ...
ADF & TransApp: A Transformer-Based Framework for Appliance Detection Using Smart Meter Consumption Series
Proceedings of the VLDB Endowment (PVLDB), Volume 17, Issue 3, Pages 553–562
https://doi.org/10.14778/3632093.3632115
Over the past decade, millions of smart meters have been installed by electricity suppliers worldwide, allowing them to collect a large amount of electricity consumption data, albeit sampled at a low frequency (one point every 30 min). One of the ...
FormaT5: Abstention and Examples for Conditional Table Formatting with Natural Language
- Mukul Singh,
- José Cambronero,
- Sumit Gulwani,
- Vu Le,
- Carina Negreanu,
- Elnaz Nouri,
- Mohammad Raza,
- Gust Verbruggen
Proceedings of the VLDB Endowment (PVLDB), Volume 17, Issue 3, Pages 497–510
https://doi.org/10.14778/3632093.3632111
Formatting is an important property in tables for visualization, presentation, and analysis. Spreadsheet software allows users to automatically format their tables by writing data-dependent conditional formatting (CF) rules. Writing such rules is often ...
- research-article, August 2022
Demonstrating quest: a query-driven framework to explain classification models on tabular data
Proceedings of the VLDB Endowment (PVLDB), Volume 15, Issue 12, Pages 3722–3725
https://doi.org/10.14778/3554821.3554884
Machine learning models are everywhere now; but only few of them are transparent in how they work. To remedy this, local explanations aim to show users how and why learned models produce a certain output for a given input (data sample). However, most ...
Witan: unsupervised labelling function generation for assisted data programming
Proceedings of the VLDB Endowment (PVLDB), Volume 15, Issue 11, Pages 2334–2347
https://doi.org/10.14778/3551793.3551797
Effective supervised training of modern machine learning models often requires large labelled training datasets, which could be prohibitively costly to acquire for many practical applications. Research addressing this problem has sought ways to leverage ...
xFraud: explainable fraud transaction detection
Proceedings of the VLDB Endowment (PVLDB), Volume 15, Issue 3, Pages 427–436
https://doi.org/10.14778/3494124.3494128
At online retail platforms, it is crucial to actively detect the risks of transactions to improve customer experience and minimize financial loss. In this work, we propose xFraud, an explainable fraud transaction prediction framework which is mainly ...
- research-article, July 2021
How divergent is your data?
Proceedings of the VLDB Endowment (PVLDB), Volume 14, Issue 12, Pages 2835–2838
https://doi.org/10.14778/3476311.3476357
We present DivExplorer, a tool that enables users to explore datasets and find subgroups of data for which a classifier behaves in an anomalous manner. These subgroups, denoted as divergent subgroups, may exhibit, for example, higher-than-normal false ...
- research-article, July 2021
Assassin: an automatic classification system based on algorithm selection
Proceedings of the VLDB Endowment (PVLDB), Volume 14, Issue 12, Pages 2751–2754
https://doi.org/10.14778/3476311.3476336
The increasing complexity of data analysis tasks makes it dependent on human expertise and challenging for non-experts. One of the major challenges faced in data analysis is the selection of the proper algorithm for given tasks and data sets. Motivated ...
- research-article, July 2021
Automatic data acquisition for deep learning
Proceedings of the VLDB Endowment (PVLDB), Volume 14, Issue 12, Pages 2739–2742
https://doi.org/10.14778/3476311.3476333
Deep learning (DL) has widespread applications and has revolutionized many industries. Although automated machine learning (AutoML) can help us away from coding for DL models, the acquisition of lots of high-quality data for model training remains a ...
- research-article, July 2021
CHEF: a cheap and fast pipeline for iteratively cleaning label uncertainties
Proceedings of the VLDB Endowment (PVLDB), Volume 14, Issue 11, Pages 2410–2418
https://doi.org/10.14778/3476249.3476290
High-quality labels are expensive to obtain for many machine learning tasks, such as medical image classification tasks. Therefore, probabilistic (weak) labels produced by weak supervision tools are used to seed a process in which influential samples ...
- research-article, July 2019
Efficient task-specific data valuation for nearest neighbor algorithms
- Ruoxi Jia,
- David Dao,
- Boxin Wang,
- Frances Ann Hubis,
- Nezihe Merve Gurel,
- Bo Li,
- Ce Zhang,
- Costas Spanos,
- Dawn Song
Proceedings of the VLDB Endowment (PVLDB), Volume 12, Issue 11, Pages 1610–1623
https://doi.org/10.14778/3342263.3342637
Given a data set D containing millions of data points and a data consumer who is willing to pay for $X to train a machine learning (ML) model over D, how should we distribute this $X to each data point to reflect its "value"? In this paper, we define the ...
- research-article, August 2015
Vizdom: interactive analytics through pen and touch
Proceedings of the VLDB Endowment (PVLDB), Volume 8, Issue 12, Pages 2024–2027
https://doi.org/10.14778/2824032.2824127
Machine learning (ML) and advanced statistics are important tools for drawing insights from large datasets. However, these techniques often require human intervention to steer computation towards meaningful results. In this demo, we present Vizdom, a new ...
- research-article, October 2014
Scaling up crowd-sourcing to very large datasets: a case for active learning
Proceedings of the VLDB Endowment (PVLDB), Volume 8, Issue 2, Pages 125–136
https://doi.org/10.14778/2735471.2735474
Crowd-sourcing has become a popular means of acquiring labeled data for many tasks where humans are more accurate than computers, such as image tagging, entity resolution, and sentiment analysis. However, due to the time and cost of human labor, ...
- research-article, August 2014
Big data small footprint: the design of a low-power classifier for detecting transportation modes
Proceedings of the VLDB Endowment (PVLDB), Volume 7, Issue 13, Pages 1429–1440
https://doi.org/10.14778/2733004.2733015
Sensors on mobile phones and wearables, and in general sensors on IoT (Internet of Things), bring forth a couple of new challenges to big data research. First, the power consumption for analyzing sensor data must be low, since most wearables and ...
- article, August 2013
Learning and intelligent optimization (LION): one ring to rule them all
Proceedings of the VLDB Endowment (PVLDB), Volume 6, Issue 11, Pages 1176–1177
https://doi.org/10.14778/2536222.2536247
Almost by definition, optimization is a source of a tremendous power for automatically improving processes, decisions, products and services. But its potential is still largely unexploited in most real-world contexts. One of the main reasons blocking its ...
- article, April 2013
Schema extraction for tabular data on the web
Proceedings of the VLDB Endowment (PVLDB), Volume 6, Issue 6, Pages 421–432
https://doi.org/10.14778/2536336.2536343
Tabular data is an abundant source of information on the Web, but remains mostly isolated from the latter's interconnections since tables lack links and computer-accessible descriptions of their structure. In other words, the schemas of these tables -- ...
- research-article, February 2011
Incrementally maintaining classification using an RDBMS
Proceedings of the VLDB Endowment (PVLDB), Volume 4, Issue 5, Pages 302–313
https://doi.org/10.14778/1952376.1952380
The proliferation of imprecise data has motivated both researchers and the database industry to push statistical techniques into relational database management systems (RDBMSes). We study strategies to maintain model-based views for a popular statistical ...
- research-article, August 2009
Publishing naive Bayesian classifiers: privacy without accuracy loss
Proceedings of the VLDB Endowment (PVLDB), Volume 2, Issue 1, Pages 1174–1185
https://doi.org/10.14778/1687627.1687759
We address the problem of publishing a Naïve Bayesian Classifier (NBC) or, equivalently, publishing the necessary views for building an NBC, while protecting privacy of the individuals who provided the training data. Our approach completely preserves ...