PVLDB: Vol 17, No 6

Volume 17, Issue 6, February 2024

Editors: Meihui Zhang (Beijing Institute of Technology), Cyrus Shahabi (University of Southern California)

Publisher: VLDB Endowment

ISSN: 2150-8097


Front matter (Cover, Contents, Organization, Letter from the editors in chief)


research-article

MisDetect: Iterative Mislabel Detection using Early Loss

Pages 1159–1172, https://doi.org/10.14778/3648160.3648161

Supervised machine learning (ML) models trained on data with mislabeled instances often produce inaccurate results due to label errors. Traditional methods of detecting mislabeled instances rely on data proximity, where an instance is considered ...

research-article

Capturing More Associations by Referencing External Graphs

Pages 1173–1186, https://doi.org/10.14778/3648160.3648162

This paper studies association rule discovery in a graph G₁ by referencing an external graph G₂ with overlapping information. The objective is to enrich G₁ with relevant properties and links from G₂. As a testbed, we consider Graph Association Rules (...

research-article

QTCS: Efficient Query-Centered Temporal Community Search

Pages 1187–1199, https://doi.org/10.14778/3648160.3648163

Temporal community search is an important task in graph analysis, which has been widely used in many practical applications. However, existing methods suffer from two major defects: (i) they only require that the target result contains the query vertex q,...

research-article

DPSUR: Accelerating Differentially Private Stochastic Gradient Descent Using Selective Update and Release

Pages 1200–1213, https://doi.org/10.14778/3648160.3648164

Machine learning models are known to memorize private data to reduce their training loss, which can be inadvertently exploited by privacy attacks such as model inversion and membership inference. To protect against these attacks, differential privacy (DP)...

research-article

How Can We Train Deep Learning Models Across Clouds and Continents? An Experimental Study

Pages 1214–1226, https://doi.org/10.14778/3648160.3648165

This paper aims to answer the question: Can deep learning models be cost-efficiently trained on a global market of spot VMs spanning different data centers and cloud providers? To provide guidance, we extensively evaluate the cost and throughput ...

research-article

Accelerating Sampling and Aggregation Operations in GNN Frameworks with GPU Initiated Direct Storage Accesses

Pages 1227–1240, https://doi.org/10.14778/3648160.3648166

Graph Neural Networks (GNNs) are emerging as a powerful tool for learning from graph-structured data and performing sophisticated inference tasks in various application domains. Although GNNs have been shown to be effective on modest-sized graphs, ...

research-article

Comprehensive Evaluation of GNN Training Systems: A Data Management Perspective

Pages 1241–1254, https://doi.org/10.14778/3648160.3648167

Many Graph Neural Network (GNN) training systems have emerged recently to support efficient GNN training. Since GNNs embody complex data dependencies between training samples, the training of GNNs should address distinct challenges different from DNN ...

research-article

LION: Fast and High-Resolution Network Kernel Density Visualization

Pages 1255–1268, https://doi.org/10.14778/3648160.3648168

Network Kernel Density Visualization (NKDV) has often been used in a wide range of applications, e.g., criminology, transportation science, and urban planning. However, NKDV is computationally expensive and does not scale to large-scale datasets ...

research-article

Performance-Based Pricing for Federated Learning via Auction

Pages 1269–1282, https://doi.org/10.14778/3648160.3648169

Many machine learning techniques rely on plenty of training data. However, data are often possessed unequally by different entities, with a large proportion of data being held by a small number of data-rich entities. It can be challenging to incentivize ...

research-article

OEBench: Investigating Open Environment Challenges in Real-World Relational Data Streams

Pages 1283–1296, https://doi.org/10.14778/3648160.3648170

How to get insights from relational data streams in a timely manner is a hot research topic. Data streams can present unique challenges, such as distribution drifts, outliers, emerging classes, and changing features, which have recently been described as ...

research-article

Influence Maximization via Vertex Countering

Pages 1297–1309, https://doi.org/10.14778/3648160.3648171

Competitive viral marketing considers the product competition of multiple companies, where each user may adopt one product and propagate the product to other users. Existing studies focus on a traditional seeding strategy where a company only selects ...

research-article

Optimizing Data Acquisition to Enhance Machine Learning Performance

Pages 1310–1323, https://doi.org/10.14778/3648160.3648172

In this paper, we study how to acquire labeled data points from a large data pool to enrich a training set for enhancing supervised machine learning (ML) performance. The state-of-the-art solution is the clustering-based training set selection (CTS) ...

research-article

Minimum Strongly Connected Subgraph Collection in Dynamic Graphs

Pages 1324–1336, https://doi.org/10.14778/3648160.3648173

Real-world directed graphs are dynamically changing, and it is important to identify and maintain the strong connectivity information between nodes, which is useful in numerous applications. Given an input graph G, we study a new problem, minimum ...

research-article

FusionQuery: On-demand Fusion Queries over Multi-source Heterogeneous Data

Pages 1337–1349, https://doi.org/10.14778/3648160.3648174

Centralised data management systems (e.g., data lakes) support queries over multi-source heterogeneous data. However, the query results from multiple sources commonly involve between-source conflicts, which makes query results unreliable and confusing ...

research-article

POLAR: Adaptive and Non-invasive Join Order Selection via Plans of Least Resistance

Pages 1350–1363, https://doi.org/10.14778/3648160.3648175

Join ordering and query optimization are crucial for query performance but remain challenging due to unknown or changing characteristics of query intermediates, especially for complex queries with many joins. Over the past two decades, a spectrum of ...

research-article

DAHA: Accelerating GNN Training with Data and Hardware Aware Execution Planning

Pages 1364–1376, https://doi.org/10.14778/3648160.3648176

Graph neural networks (GNNs) have been gaining a reputation for effective modeling of graph data. Yet, it is challenging to train GNNs efficiently. Many frameworks have been proposed but most of them suffer from high batch preparation cost and data ...

research-article

FluidKV: Seamlessly Bridging the Gap between Indexing Performance and Memory-Footprint on Ultra-Fast Storage

Pages 1377–1390, https://doi.org/10.14778/3648160.3648177

Our extensive experiments reveal that existing key-value stores (KVSs) achieve high performance at the expense of a huge memory footprint that is often impractical or unacceptable. Even with the emerging ultra-fast byte-addressable persistent memory (PM),...

research-article

How Do Categorical Duplicates Affect ML? A New Benchmark and Empirical Analyses

Pages 1391–1404, https://doi.org/10.14778/3648160.3648178

The tedious grunt work involved in data preparation (prep) before ML reduces ML user productivity. It is also a roadblock to industrial-scale cloud AutoML workflows that build ML models for millions of datasets. One important data prep step for ML is ...

research-article

CGgraph: An Ultra-Fast Graph Processing System on Modern Commodity CPU-GPU Co-processor

Pages 1405–1417, https://doi.org/10.14778/3648160.3648179

In recent years, many CPU-GPU heterogeneous graph processing systems have been developed in both academia and industry to facilitate large-scale graph processing in various applications, e.g., social networks and biological networks. However, the ...

research-article

FCBench: Cross-Domain Benchmarking of Lossless Compression for Floating-Point Data

Pages 1418–1431, https://doi.org/10.14778/3648160.3648180

While both the database and high-performance computing (HPC) communities utilize lossless compression methods to minimize floating-point data size, a disconnect persists between them. Each community designs and assesses methods in a domain-specific ...

research-article

PairwiseHist: Fast, Accurate and Space-Efficient Approximate Query Processing with Data Compression

Pages 1432–1445, https://doi.org/10.14778/3648160.3648181

Exponential growth in data collection is creating significant challenges for data storage and analytics latency. Approximate Query Processing (AQP) has long been touted as a solution for accelerating analytics on large datasets; however, there is still ...

research-article

MetaStore: Analyzing Deep Learning Meta-Data at Scale

Pages 1446–1459, https://doi.org/10.14778/3648160.3648182

The process of training deep learning models produces a huge amount of meta-data, including but not limited to losses, hidden feature embeddings, and gradients. Model diagnosis tools have been developed to analyze losses and feature embeddings with the ...

research-article

RTScan: Efficient Scan with Ray Tracing Cores

Pages 1460–1472, https://doi.org/10.14778/3648160.3648183

Indexing is a core technique for accelerating predicate evaluation in databases. After many years of effort, the indexing performance has reached its peak on the existing hardware infrastructure. We propose to use ray tracing (RT) cores to move the ...

research-article

FreshGNN: Reducing Memory Access via Stable Historical Embeddings for Graph Neural Network Training

Pages 1473–1486, https://doi.org/10.14778/3648160.3648184

A key performance bottleneck when training graph neural network (GNN) models on large, real-world graphs is loading node features onto a GPU. Due to limited GPU memory, expensive data movement is necessary to facilitate the storage of these features on ...

research-article

Sorting on Byte-Addressable Storage: The Resurgence of Tree Structure

Pages 1487–1500, https://doi.org/10.14778/3648160.3648185

The tree structure is notably popular for storage and indexing; however, tree-based sorting such as tree sort is rarely used in practice. Nevertheless, with the advent of byte-addressable storage (BAS), the tree structure captures our attention with its ...

research-article

Efficient Placement of Decomposable Aggregation Functions for Stream Processing over Large Geo-Distributed Topologies

Pages 1501–1514, https://doi.org/10.14778/3648160.3648186

A recent trend in stream processing is offloading the computation of decomposable aggregation functions (DAF) from cloud nodes to geo-distributed fog/edge devices to decrease latency and improve energy efficiency. However, deploying DAFs on low-end ...

research-article

AeonG: An Efficient Built-in Temporal Support in Graph Databases

Pages 1515–1527, https://doi.org/10.14778/3648160.3648187

Real-world graphs are often dynamic and evolve over time. It is crucial to store and query a graph's evolution in graph databases. However, existing works either suffer from high storage overhead or lack efficient temporal query support, or both. ...


Proceedings of the VLDB Endowment
