Lotaru: Locally Estimating Runtimes of Scientific Workflow Tasks in Heterogeneous Clusters
Many scientific workflow scheduling algorithms need to be informed about task runtimes a priori to conduct efficient scheduling. In heterogeneous cluster infrastructures, this problem is aggravated because these runtimes are required for each task-...
Journey to the center of the words: Word weighting scheme based on the geometry of word embeddings
A notable amount of work has been done to find sentence embeddings using compositional models in recent years. These works have shown that one of the simplest and most effective approaches to obtaining sentence embeddings is simple vector averaging of ...
Data Leakage Mitigation of User-Defined Functions on Secure Personal Data Management Systems
Personal Data Management Systems (PDMSs) are arriving at a rapid pace, providing individuals with appropriate tools to collect, manage, and share their personal data. At the same time, the emergence of Trusted Execution Environments (TEEs) opens new ...
PM-Rtree: A Highly-Efficient Crash-Consistent R-tree for Persistent Memory
Persistent R-trees are important data structures for indexing large-scale spatial datasets using persistent memory (e.g., Intel Optane DIMMs). Existing persistent R-trees (e.g., FBR-tree) suffer from four major issues. (1) Node updates cause unnecessary ...
Region-adaptive, Error-controlled Scientific Data Compression using Multilevel Decomposition
- Qian Gong,
- Ben Whitney,
- Chengzhu Zhang,
- Xin Liang,
- Anand Rangarajan,
- Jieyang Chen,
- Lipeng Wan,
- Paul Ullrich,
- Qing Liu,
- Robert Jacob,
- Sanjay Ranka,
- Scott Klasky
The increase in computer processing speed is significantly outpacing improvements in network and storage bandwidth, leading to the big data challenge in modern science, where scientific applications can quickly generate much more data than can be ...
From Images to Hydrologic Networks - Understanding the Arctic Landscape with Graphs
- Tabea Rettelbach,
- Moritz Langer,
- Ingmar Nitze,
- Benjamin M. Jones,
- Veit Helm,
- Johann-Christoph Freytag,
- Guido Grosse
Remote sensing-based Earth Observation plays an important role in assessing environmental changes throughout our planet. As an image-heavy domain, the evaluation of the data strongly focuses on statistical and pixel-based spatial analysis methods. ...
Incremental Influential Community Detection in Large Networks
The concept of network communities has been studied thoroughly in the network science literature since it has many important applications in diverse fields. Recently, the community concept has been combined with the concept of influence. The aim of ...
Social Spatio-temporal Keyword Pattern (S²KP) Queries in Multiple Aspect Trajectories Databases
The increasing use of devices with GPS capabilities has raised the need for storing and managing large amounts of spatio-temporal data, which can then be used by appropriate services and applications for extracting useful information from movement data. ...
Impact of similarity measures on clustering mixed data
In many domains, we face heterogeneous data with both numeric and categorical attributes. Clustering such data is challenging because the notion of similarity is not well defined due to the multiple data types. Existing clustering algorithms for these ...
Benchmark of DNN Model Search at Deployment Time
Deep learning has become the most popular direction in machine learning and artificial intelligence. However, the preparation of training data, as well as model training, are often time-consuming and become the bottleneck of the end-to-end machine ...
Harmonizing Privacy Regarding Data Retention and Purging
Data privacy requirements are a complex and quickly evolving part of the data management domain. Especially in Healthcare (e.g., United States Health Insurance Portability and Accountability Act and Veterans Affairs requirements), there has been a ...
SciTS: A Benchmark for Time-Series Databases in Scientific Experiments and Industrial Internet of Things
Time-series data is increasingly used in the Industrial Internet of Things (IIoT) and large-scale scientific experiments. Managing time-series data requires a storage engine that can keep up with its constantly growing volume while providing ...
Northlight: Declarative and Optimized Analysis of Atmospheric Datasets in SparkSQL
Research in atmospheric physics, meteorology, and weather prediction requires the processing of very large multi-dimensional observational or modeled datasets on a daily basis. One of the numerous existing array engines looks like the natural choice for ...
Crack Detection and Localization based on Spatio-Temporal Data using Residual Networks
- Fathalla Moreh,
- Hao Lyu,
- Christian Beth,
- Steffen Strohm,
- Zarghaam Haider Rizvi,
- Frank Wuttke,
- Matthias Renz
Damage detection in materials and structures plays a critical role in engineering and science applications like structural health monitoring. A particular challenge is presented by micro-scale cracks, which are imperceptible to the naked eye or in ...
Static-Dynamic Graph Neural Network for Stock Recommendation
Stock prediction is a hot topic of research in the field of Fintech. Stocks are not independent of each other, but existing studies either ignore the relations between stocks or simply utilize stock spatial dependencies based on predefined graphs. The ...
WSSA: Weakly Supervised Semantic-based approach for Sentiment Analysis
In this work, we propose a Weakly Supervised Semantic-based approach for Sentiment Analysis (WSSA), a novel approach that analyzes sentiment by considering weak labels from different sources (sentiment analysis tools) and aggregates them based on features such as ...
How Powerful are Membership Inference Attacks on Graph Neural Networks?
Graph Neural Networks (GNNs) are Machine Learning models that operate on structured graph data. By leveraging their node/graphs classification and link prediction capabilities, they have been successfully applied in several domains such as community ...
In-Place Updates in Tree-Encoded Bitmaps
The Tree-Encoded Bitmap (TEB) is a tree-based bitmap compression scheme that maps runs in a bitmap to leaf nodes in a binary tree. Currently, TEBs perform updates using an auxiliary differential data structure. However, consulting this additional data ...
Bi-directional Log-Structured Merge Tree
The Log-Structured Merge (LSM) Tree has become a popular storage scheme for modern NoSQL and New SQL database systems. The LSM-tree scheme achieves high write throughput by first buffering writes in memory, then flushing them to the disk with ...
Building Natural Language Interfaces for Databases in Practice
Natural language interfaces to databases have recently made substantial progress due to advances in machine learning. Users no longer need technical knowledge to search for insights in their database. However, research is largely focused on increasing ...
Recursive SQL for Data Mining
To implement algorithms within database systems that go beyond the design of SQL as a data query language, library functions or external tools have been used, which require extracting the data first. To eliminate the need for data extraction out of database systems,...
News topic detection based on the principle of minimum entropy
Current topic detection methods generally use different algorithms to aggregate the extracted features to obtain corresponding topics, but do not fully utilize the features of news texts and key elements of news. For this, a news topic detection model ...
Exploring Large All-Flash Storage System with Scientific Simulation
Solid state storage systems have been very effectively used in small devices; however, their effectiveness for large systems such as supercomputers is not yet proven. Recently, for the first time, a new supercomputer is being deployed with an all-flash ...
Facilitating DoS Attack Detection using Unsupervised Anomaly Detection
- Christos Bellas,
- Georgia Kougka,
- Athanasios Naskos,
- Anastasios Gounaris,
- Athena Vakali,
- Christos Xenakis,
- Apostolos Papadopoulos
Modern techniques in intrusion and DoS (Denial of Service) detection tend to be either supervised or semi-supervised, i.e., they require training and labelled data. In this work, we study the problem of correlating security attacks with anomalies ...
Tandem Outlier Detectors for Decentralized Data
Today, the collection of decentralized data is a common scenario: smartphones store users’ messages locally, smart meters collect energy consumption data, and modern power tools monitor operator behavior. We identify different types of outliers in such ...
Digitalization in the Service of Society: The Case of Big Vehicle Trajectory Data
The ongoing, sweeping digitalization of societal processes generates massive volumes of data that capture the underlying processes at an unprecedented level of detail, in turn enabling us to better understand and improve those processes. Put ...
The rasdaman Array DBMS: Concepts, Architecture, and What People Do With It
Arrays as a fundamental data category have found their way into the orchestration of data models supported by databases. While OLAP "datacubes" can be emulated relationally to some extent, it was in particular applications in science and engineering that ...