- research-article, December 2024
Disclosure-Compliant Query Answering
Proceedings of the ACM on Management of Data (PACMMOD), Volume 2, Issue 6, Article No.: 233, Pages 1–28, https://doi.org/10.1145/3698808
In today's data-driven world, organizations face increasing pressure to comply with data disclosure policies, which require data masking measures and robust access control mechanisms. This paper presents Mascara, a middleware for specifying and enforcing ...
Looking Deeply into the Magic Mirror: An Interactive Analysis of Database Index Selection Approaches
Proceedings of the VLDB Endowment (PVLDB), Volume 17, Issue 12, Pages 4301–4304, https://doi.org/10.14778/3685800.3685860
Indexes are important data structures for database tuning. However, finding the best indexes for a given workload is challenging. In this demonstration, we present our extensible open-source index selection evaluation platform and the corresponding ...
Fainder: A Fast and Accurate Index for Distribution-Aware Dataset Search
Proceedings of the VLDB Endowment (PVLDB), Volume 17, Issue 11, Pages 3269–3282, https://doi.org/10.14778/3681954.3681999
Efficient data discovery is crucial in the era of data-driven decision-making. However, current practices face significant challenges due to the intricacies of identifying datasets with specific distributional characteristics, such as percentiles, when ...
- keynote, July 2024
Using and Enhancing NebulaStream - A Tutorial
- Steffen Zeuch,
- Ankit Chaudhary,
- Viktor Rosenfeld,
- Taha Tekdogan,
- Adrian Michalke,
- Matthis Gördel,
- Ariane Ziehn,
- Volker Markl
DEBS '24: Proceedings of the 18th ACM International Conference on Distributed and Event-based Systems, Pages 212–216, https://doi.org/10.1145/3629104.3674126
Soon, more data will be produced outside the cloud than inside. This constantly increasing amount of data requires new systems that can holistically optimize the processing for so-called sensor-fog-cloud environments. In this tutorial, we present ...
- keynote, July 2024
NebulaStream - Data Stream Processing in Massively Distributed, Heterogeneous, Volatile Environments
DEBS '24: Proceedings of the 18th ACM International Conference on Distributed and Event-based Systems, Pages 1–3, https://doi.org/10.1145/3629104.3672505
Modern data-driven applications arising in domains such as smart manufacturing, healthcare, and the Internet of Things pose new challenges to data processing systems. Traditional stream processing systems, such as Flink, Spark, and Kafka Streams, are ill-...
- short-paper, June 2024
Multi-Backend Zonal Statistics Execution with Raven
SIGMOD/PODS '24: Companion of the 2024 International Conference on Management of Data, Pages 532–535, https://doi.org/10.1145/3626246.3654730
The recent explosion in the number and size of spatial remote sensing datasets from satellite missions creates new opportunities for data-driven approaches in domains such as climate change monitoring and disaster management. These approaches typically ...
- research-article, May 2024
Query Compilation Without Regrets
- Philipp M. Grulich,
- Aljoscha P. Lepping,
- Dwi P. A. Nugroho,
- Varun Pandey,
- Bonaventura Del Monte,
- Steffen Zeuch,
- Volker Markl
Proceedings of the ACM on Management of Data (PACMMOD), Volume 2, Issue 3, Article No.: 165, Pages 1–28, https://doi.org/10.1145/3654968
Engineering high-performance query execution engines is a challenging task. Query compilation provides excellent performance, but at the same time introduces significant system complexity, as it makes the engine hard to build, debug, and maintain. To ...
- research-article, May 2024
Fault Tolerance Placement in the Internet of Things
Proceedings of the ACM on Management of Data (PACMMOD), Volume 2, Issue 3, Article No.: 138, Pages 1–29, https://doi.org/10.1145/3654941
Today's IoT applications exploit the capabilities of three different computation environments: sensors, edge, and cloud. Ensuring fault tolerance at the edge level presents unique challenges due to complex network hierarchies and the presence of resource-...
- research-article, February 2024
Assisted design of data science pipelines
The VLDB Journal — The International Journal on Very Large Data Bases (VLDB), Volume 33, Issue 4, Pages 1129–1153, https://doi.org/10.1007/s00778-024-00835-2
When designing data science (DS) pipelines, end-users can get overwhelmed by the large and growing set of available data preprocessing and modeling techniques. Intelligent discovery assistants (IDAs) and automated machine learning (AutoML) ...
Efficient Placement of Decomposable Aggregation Functions for Stream Processing over Large Geo-Distributed Topologies
Proceedings of the VLDB Endowment (PVLDB), Volume 17, Issue 6, Pages 1501–1514, https://doi.org/10.14778/3648160.3648186
A recent trend in stream processing is offloading the computation of decomposable aggregation functions (DAF) from cloud nodes to geo-distributed fog/edge devices to decrease latency and improve energy efficiency. However, deploying DAFs on low-end ...
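For readers unfamiliar with the term, a decomposable aggregation function can be split into a partial step that runs where the data is produced (for example, on an edge device) and an associative merge step that combines partial results further up the topology. The Python sketch below illustrates this for AVG, whose partial state is a (sum, count) pair. It is a minimal illustration of the concept only, not the placement algorithm proposed in the paper; the names AvgPartial, lift, combine, lower, and decomposed_avg are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class AvgPartial:
    """Partial state of AVG: a (sum, count) pair that can be merged."""
    total: float
    count: int


def lift(values: Iterable[float]) -> AvgPartial:
    """Compute a partial aggregate locally, e.g. on one edge node."""
    vals = list(values)
    return AvgPartial(float(sum(vals)), len(vals))


def combine(a: AvgPartial, b: AvgPartial) -> AvgPartial:
    """Merge two partials; associative and commutative, so order does not matter."""
    return AvgPartial(a.total + b.total, a.count + b.count)


def lower(p: AvgPartial) -> float:
    """Turn the merged partial into the final AVG value, e.g. at the sink."""
    if p.count == 0:
        raise ValueError("average of empty input")
    return p.total / p.count


def decomposed_avg(partitions: List[List[float]]) -> float:
    """AVG over data spread across nodes, computed via partial aggregates."""
    merged = AvgPartial(0.0, 0)
    for part in partitions:
        merged = combine(merged, lift(part))
    return lower(merged)


if __name__ == "__main__":
    # Three hypothetical edge nodes, each holding a few sensor readings.
    print(decomposed_avg([[1.0, 2.0, 3.0], [4.0], [5.0, 6.0]]))
```

With the sample input, the result is 21 / 6 = 3.5. Averaging the three per-node averages instead would give (2.0 + 4.0 + 5.5) / 3 ≈ 3.83, which is why the partial state carries the count alongside the sum rather than a local average.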
POLAR: Adaptive and Non-invasive Join Order Selection via Plans of Least Resistance
- David Justen,
- Daniel Ritter,
- Campbell Fraser,
- Andrew Lamb,
- Allison Lee,
- Thomas Bodner,
- Mhd Yamen Haddad,
- Steffen Zeuch,
- Volker Markl,
- Matthias Boehm
Proceedings of the VLDB Endowment (PVLDB), Volume 17, Issue 6, Pages 1350–1363, https://doi.org/10.14778/3648160.3648175
Join ordering and query optimization are crucial for query performance but remain challenging due to unknown or changing characteristics of query intermediates, especially for complex queries with many joins. Over the past two decades, a spectrum of ...
Missing Value Imputation for Multi-Attribute Sensor Data Streams via Message Propagation
Proceedings of the VLDB Endowment (PVLDB), Volume 17, Issue 3, Pages 345–358, https://doi.org/10.14778/3632093.3632100
Sensor data streams occur widely in various real-time applications in the context of the Internet of Things (IoT). However, sensor data streams feature missing values due to factors such as sensor failures, communication errors, or depleted batteries. ...
- article, November 2023
Apache Wayang: A Unified Data Analytics Framework
- Kaustubh Beedkar,
- Bertty Contreras-Rojas,
- Haralampos Gavriilidis,
- Zoi Kaoudi,
- Volker Markl,
- Rodrigo Pardo-Meza,
- Jorge-Arnulfo Quiané-Ruiz
ACM SIGMOD Record (SIGMOD), Volume 52, Issue 3, Pages 30–35, https://doi.org/10.1145/3631504.3631510
The large variety of specialized data processing platforms and the increased complexity of data analytics have led to the need for unifying data analytics within a single framework. Such a framework should free users from the burden of (i) choosing the ...
- research-article, October 2023
Good Intentions: Adaptive Parameter Management via Intent Signaling
CIKM '23: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, Pages 2156–2166, https://doi.org/10.1145/3583780.3614895
Model parameter management is essential for distributed training of large machine learning (ML) tasks. Some ML tasks are hard to distribute because common approaches to parameter management can be highly inefficient. Advanced parameter management ...
- research-article, September 2023
A survey on transactional stream processing
The VLDB Journal — The International Journal on Very Large Data Bases (VLDB), Volume 33, Issue 2, Pages 451–479, https://doi.org/10.1007/s00778-023-00814-z
Transactional stream processing (TSP) strives to create a cohesive model that merges the advantages of both transactional and stream-oriented guarantees. Over the past decade, numerous endeavors have contributed to the evolution of TSP solutions, ...
- research-article, August 2023
XDB in Action: Decentralized Cross-Database Query Processing for Black-Box DBMSes
- Haralampos Gavriilidis,
- Leonhard Rose,
- Joel Ziegler,
- Kaustubh Beedkar,
- Jorge-Arnulfo Quiané-Ruiz,
- Volker Markl
Proceedings of the VLDB Endowment (PVLDB), Volume 16, Issue 12, Pages 4078–4081, https://doi.org/10.14778/3611540.3611625
Data are naturally produced at different locations and hence stored on different DBMSes. To maximize the value of the collected data, today's users combine data from different sources. Research in data integration has proposed the Mediator-Wrapper (MW) ...
- research-article, August 2023
Showcasing Data Management Challenges for Future IoT Applications with NebulaStream
- Aljoscha Lepping,
- Hoang Mi Pham,
- Laura Mons,
- Balint Rueb,
- Philipp M. Grulich,
- Ankit Chaudhary,
- Steffen Zeuch,
- Volker Markl
Proceedings of the VLDB Endowment (PVLDB), Volume 16, Issue 12, Pages 3930–3933, https://doi.org/10.14778/3611540.3611588
Data management systems will face several new challenges in supporting IoT applications during the coming years. These challenges arise from managing large numbers of heterogeneous IoT devices and require combining elastic cloud and fog resources in ...
- Article, July 2023
On Irregularity Localization for Scientific Data Analysis Workflows
Computational Science – ICCS 2023, Pages 336–351, https://doi.org/10.1007/978-3-031-35995-8_24
The paradigm shift towards data-driven science is massively transforming the scientific process. Scientists use exploratory data analysis to arrive at new insights. This requires them to specify complex data analysis workflows, which consist of ...
- research-article, June 2023
P2D: A Transpiler Framework for Optimizing Data Science Pipelines
DEEM '23: Proceedings of the Seventh Workshop on Data Management for End-to-End Machine Learning, Article No.: 3, Pages 1–4, https://doi.org/10.1145/3595360.3595853
In this paper, we propose a transpilation-based approach to optimize data science pipelines that comprise database management systems (DBMSes) and data science runtimes (e.g., Python). Our approach makes it possible to identify DBMS-supported operations and ...
- research-article, June 2023
Exploiting Access Pattern Characteristics for Join Reordering
DaMoN '23: Proceedings of the 19th International Workshop on Data Management on New Hardware, Pages 10–18, https://doi.org/10.1145/3592980.3595304
With increasing main memory sizes, data processing has significantly shifted from secondary storage to main memory. However, choosing a good join order is still very important for efficient query execution in modern DBMSes. This choice is based mainly on ...