- research-article, November 2024
Harmonizing ML and Databases: A Symphony of Data (VLDB 2024 Keynote)
Proceedings of the VLDB Endowment (PVLDB), Volume 17, Issue 12, Page 4556, https://doi.org/10.14778/3685800.3685918
Large language models (LLMs) are rapidly transforming the landscape of computing and daily life, demonstrating immense potential across diverse applications like natural language processing, machine translation, and code generation. This talk delves into ...
- research-article, November 2024
Sharing Information with Differential Privacy: A Database Perspective (VLDB 2024 Keynote)
Proceedings of the VLDB Endowment (PVLDB), Volume 17, Issue 12, Page 4555, https://doi.org/10.14778/3685800.3685917
In the digital age, the widespread collection and analysis of data pose significant privacy challenges. Differential privacy (DP) has emerged as a leading framework for ensuring that information release does not compromise individual privacy. In this ...
- research-article, November 2024
Databases Unbound: Querying All of the World's Bytes with AI
Proceedings of the VLDB Endowment (PVLDB), Volume 17, Issue 12, Pages 4546–4554, https://doi.org/10.14778/3685800.3685916
Over the past five decades, the relational database model has proven to be a scalable and adaptable model for querying a variety of structured data, with use cases in analytics, transactions, graphs, streaming and more. However, most of the world's data ...
- research-article, November 2024
Reimagining Deep Learning Systems through the Lens of Data Systems
Proceedings of the VLDB Endowment (PVLDB), Volume 17, Issue 12, Pages 4531–4535, https://doi.org/10.14778/3685800.3685914
The high-profile success of Deep Learning (DL) at Big Tech companies, including recent Large Language Models (LLMs) such as the GPT and Llama families, has led to high demand among Web companies, consumer app companies, enterprises, healthcare, domain ...
- research-article, November 2024
Intelligent Agents for Data Exploration
Proceedings of the VLDB Endowment (PVLDB), Volume 17, Issue 12, Pages 4521–4530, https://doi.org/10.14778/3685800.3685913
Data Exploration is an incremental process that helps users express what they want through a conversation with the data. Reinforcement Learning (RL) is one of the most notable approaches to automate data exploration and several solutions have been ...
- research-article, November 2024
Vector Databases: What's Really New and What's Next? (VLDB 2024 Panel)
Proceedings of the VLDB Endowment (PVLDB), Volume 17, Issue 12, Pages 4505–4506, https://doi.org/10.14778/3685800.3685911
Vector databases have recently emerged as a hot topic in the field of databases, especially in industry. This is due to the widespread interest in Large Language Models (LLMs), where vector databases provide the relevant context for LLMs to produce more ...
DB-MAGS: Multi-Anomaly Data Generation System for Transactional Databases
Proceedings of the VLDB Endowment (PVLDB), Volume 17, Issue 12, Pages 4497–4500, https://doi.org/10.14778/3685800.3685909
Existing database performance anomaly datasets suffer from incomplete coverage of anomaly types, coarse-grained root causes, and unrealistic simulation when reproducing concurrent anomalies. To address these issues, we propose a data generation ...
PrismX: A Single-Machine System for Querying Big Graphs
Proceedings of the VLDB Endowment (PVLDB), Volume 17, Issue 12, Pages 4485–4488, https://doi.org/10.14778/3685800.3685906
We demonstrate PrismX (PRAM with SSDs as Memory eXtension), a single-machine system for graph analytics. PrismX allows users to make practical use of existing PRAM algorithms without any change. To cope with the limited DRAM capacity, it employs NVMe ...
HocoPG: A Database System with Homomorphic Compression for Text Processing
Proceedings of the VLDB Endowment (PVLDB), Volume 17, Issue 12, Pages 4477–4480, https://doi.org/10.14778/3685800.3685904
Databases employ out-of-line storage and compression strategies to manage extensive text data. However, the growth in both the size of individual data items and overall data volume has significantly increased the burden of decompression, adversely ...
- research-article, November 2024
Pyneapple-G: Scalable Spatial Grouping Queries
Proceedings of the VLDB Endowment (PVLDB), Volume 17, Issue 12, Pages 4469–4472, https://doi.org/10.14778/3685800.3685902
This paper demonstrates Pyneapple-G, an open-source library for scalable spatial grouping queries based on Apache Sedona (formerly known as GeoSpark). We demonstrate two modules, namely, SGPAC and DDCEL, that support grouping points, grouping lines, and ...
- research-article, November 2024
OFL-W3: A One-Shot Federated Learning System on Web 3.0
Proceedings of the VLDB Endowment (PVLDB), Volume 17, Issue 12, Pages 4461–4464, https://doi.org/10.14778/3685800.3685900
Federated Learning (FL) addresses the challenges posed by data silos, which arise from privacy, security regulations, and ownership concerns. Despite these barriers, FL enables these isolated data repositories to participate in collaborative learning ...
- research-article, November 2024
FedSQ: A Secure System for Federated Vector Similarity Queries
Proceedings of the VLDB Endowment (PVLDB), Volume 17, Issue 12, Pages 4441–4444, https://doi.org/10.14778/3685800.3685895
Vector databases have emerged as crucial tools for managing and retrieving representation embeddings of unstructured data. Given the explosive growth of data, vector data is often distributed and stored across multiple organizations. However, privacy ...
Demonstration of the VeriEQL Equivalence Checker for Complex SQL Queries
Proceedings of the VLDB Endowment (PVLDB), Volume 17, Issue 12, Pages 4437–4440, https://doi.org/10.14778/3685800.3685894
Equivalence checking for SQL queries has many real-world applications but typically requires supporting an expressive SQL language in order to be practical. We develop VeriEQL, a system that can prove and disprove equivalence of complex SQL queries. ...
FairEM360: A Suite for Responsible Entity Matching
Proceedings of the VLDB Endowment (PVLDB), Volume 17, Issue 12, Pages 4417–4420, https://doi.org/10.14778/3685800.3685889
Entity matching is one of the earliest tasks that occur in the big data pipeline and is alarmingly exposed to unintentional biases that affect the quality of data. Identifying and mitigating the biases that exist in the data or are introduced by the ...
EncChain: Enhancing Large Language Model Applications with Advanced Privacy Preservation Techniques
Proceedings of the VLDB Endowment (PVLDB), Volume 17, Issue 12, Pages 4413–4416, https://doi.org/10.14778/3685800.3685888
In response to escalating concerns about data privacy in the Large Language Model (LLM) domain, we demonstrate EncChain, a pioneering solution designed to bolster data security in LLM applications. EncChain presents an all-encompassing approach to data ...
CyNetDiff: A Python Library for Accelerated Implementation of Network Diffusion Models
Proceedings of the VLDB Endowment (PVLDB), Volume 17, Issue 12, Pages 4409–4412, https://doi.org/10.14778/3685800.3685887
In recent years, there has been increasing interest in network diffusion models and related problems. The most popular of these are the independent cascade and linear threshold models. Much of the recent experimental work done on these models requires a ...
CMixing: An Efficient Coin Mixing Platform to Enhance Anonymity in Cryptocurrency Transactions
Proceedings of the VLDB Endowment (PVLDB), Volume 17, Issue 12, Pages 4405–4408, https://doi.org/10.14778/3685800.3685886
Coin mixing methods are widely used to enhance anonymity in cryptocurrency transactions by obfuscating the linkages between recipients and senders. Specifically, coin mixing methods combine several users' transactions into a CoinJoin transaction and ...
Catcher: A Cache Analysis System for Top-k Pub/Sub Service
Proceedings of the VLDB Endowment (PVLDB), Volume 17, Issue 12, Pages 4389–4392, https://doi.org/10.14778/3685800.3685882
Top-k Publish/Subscribe (TkPS) service is widely studied in spatial databases, with various cache-based methods proposed to address its efficiency challenge in top-k result maintenance. These methods require in-depth exploration of relationships between ...
- research-article, November 2024
DOP-SQL: A General-Purpose, High-Utility, and Extensible Private SQL System
Proceedings of the VLDB Endowment (PVLDB), Volume 17, Issue 12, Pages 4385–4388, https://doi.org/10.14778/3685800.3685881
Differential privacy (DP) has garnered significant attention from both academia and industry due to its potential in offering robust privacy protection for individual data during analysis. With the increasing volume of sensitive information being ...
- research-article, November 2024
Rock: Cleaning Data with both ML and Logic Rules
Zian Bao, Binbin Bie, Wenfei Fan, Daji Li, Mengyun Li, Kaiwen Lin, Wei Lin, Peijie Liu, Peng Liu, Zhicong Lv, Mingliang Ouyang, Chenyang Sun, Shuai Tang, Yaoshu Wang, Qiyuan Wei, Xiangqian Wu, Min Xie, Jing Zhang, Runxiao Zhao, Jie Zhu, Yilin Zhu
Proceedings of the VLDB Endowment (PVLDB), Volume 17, Issue 12, Pages 4373–4376, https://doi.org/10.14778/3685800.3685878
We demonstrate Rock, a system for cleaning relational data. Rock highlights the following unique features: (1) it extends logic rules by embedding machine learning models as predicates, to benefit from both ML and logic deduction; (2) it supports entity ...