Model Reuse in Learned Spatial Indexes
Learned database indexes use machine learning algorithms to directly predict the location of a query key within a sorted key array, thereby eliminating index search overhead. Prior work shows learned indexes outperforming traditional indexes in both ...
The Intersection of Compliance, Databases, and IT Operations
Most organizations rely on relational database(s) for their day-to-day business functions. Data management policies fall under the umbrella of IT Operations, dictated by a combination of internal organizational policies and government regulations. Many ...
Similarity Measures Recommendation for Mixed Data Clustering
Clustering is an important data mining task that is widely used in domains such as biology, finance, marketing, healthcare, and the social sciences. It allows the end user to discover relationships within data through the clusters it builds. Many non-...
CURD: Context-aware Relevance and Urgency Determination
During emergencies where time is of the essence, efficient management of disasters depends on swiftly recognizing relevant and urgent information from online platforms like X (Twitter), which is imperative for augmenting established response frameworks, ...
Imbalanced Graph-Level Anomaly Detection via Counterfactual Augmentation and Feature Learning
Graph-level anomaly detection (GLAD) has already gained significant importance and has become a popular field of study, attracting considerable attention across numerous downstream works. The core focus of this domain is to capture and highlight the ...
A Compact and Efficient Neural Data Structure for Mutual Information Estimation in Large Timeseries
Database systems face challenges when using mutual information (MI) for analyzing non-linear relationships between large timeseries, due to computational and memory requirements. Interactive workflows are especially hindered by long response times. To ...
AI Data Readiness Inspector (AIDRIN) for Quantitative Assessment of Data Readiness for AI
"Garbage in, garbage out" is a maxim widely accepted by computer scientists across domains, including Artificial Intelligence (AI). Because data is the fuel for AI, models trained on low-quality, biased data are often ineffective. Computer scientists ...
Statistical Privacy and Consent in Data Aggregation
As new laws governing management of personal data are introduced, e.g., the European Union’s General Data Protection Regulation of 2016 and the California Consumer Privacy Act of 2018, compliance with data governance legislation is becoming an ...
Scale Fairness on Spectral Clustering
The fairness and bias of spectral clustering algorithms have attracted considerable research interest in recent years. Currently, fair spectral clustering algorithms are based on the notions of group fairness and individual fairness, which effectively ...
How Do Users Design Scientific Workflows? The Case of Snakemake and Nextflow
Scientific workflows automate the analysis of large-scale scientific data, fostering the reuse of data processing operators as well as the reproducibility and traceability of analysis results. In exploratory research, however, workflows are continuously ...
BIT: Using Bitmap Index to Speed Up NCBI Taxonomy Computing
The National Center for Biotechnology Information (NCBI) Taxonomy is extensively used in biomedical and ecological research. Typical demands include computing the lowest common ancestor, determining descendant relationships, and listing the descendants ...
A Model and Query Language for Multi-modal Hybrid Query
As data grows exponentially, its diversity also increases, including both structured forms and unstructured forms like audio, images, and videos. Advances in AI have improved our ability to analyze unstructured data, leading to the use of multimodal ...
Towards a Temporal Graph Query Language for Durable Patterns
Dynamic graphs are often the initial data for scientific analyses. However, existing methods designed for static graphs struggle with efficiency and accuracy when applied dynamically. One challenge occurs when local interactions in dynamic graphs ...
Knowledge Graph Enhancement for Improved Natural Language Health Question Answering using Large Language Models
In this paper we present a method for enhancing Question Answering (QA) systems by iteratively improving Knowledge Graphs (KGs) with a focus on maintaining monotonicity in the enhancement process. We introduce a mathematical framework employing ...
Why Do Scientific Workflows Still Break?
Scientific workflows have established themselves as valuable tools for designing, automating, and sharing scientific experiments and analyses, with the aim of promoting reproducibility and reuse. However, early evidence from a decade-old experiment ...
WebAssembly serverless join: A Study of its Application
Big data’s impact is driving research into efficient solutions for managing growing datasets, with a focus on distributed systems. Recent advancements in query processing, particularly the join operator, have been significant. WebAssembly (Wasm), known ...
On Vulnerability of Access Control Restrictions to Timing Attacks in a Database Management System
Side-channel attacks leverage the implementation of algorithms to bypass security and leak restricted data. A timing attack observes differences in runtime in response to varying inputs to learn restricted information. Most prior work has focused on applying ...
VG-Prefetcher Cache: Towards Edge-Based Time Series Data Management Using Visibility Graph Prefetching
The demand for efficient and reliable cloud computing systems is increasing. However, effectively managing data workloads in edge cloud systems, especially for connected cars, can be challenging. To address this issue, we have developed a new cache ...
Performance-cost trade-offs in service orchestration for edge computing
Low-latency connections and decentralized servers are currently showcasing new potential for distributed computing. By moving away from traditional centralized cloud models and toward edge computing, which allows for more autonomy and decision-...
Index Terms
- Proceedings of the 36th International Conference on Scientific and Statistical Database Management