DOI: 10.1145/2949689
SSDBM '16: Proceedings of the 28th International Conference on Scientific and Statistical Database Management
ACM 2016 Proceeding
Publisher:
  • Association for Computing Machinery
  • New York
  • NY
  • United States
Conference:
SSDBM '16: Conference on Scientific and Statistical Database Management Budapest Hungary July 18 - 20, 2016
ISBN:
978-1-4503-4215-5
Published:
18 July 2016

Abstract

Never before has it been so easy and inexpensive to gather data in amounts that were beyond imagination only a few years ago. However, as we are all aware, this richness in data goes hand in hand with a poverty of insight, because data understanding cannot keep up with the data deluge. Today, this phenomenon is not confined to highly specialized applications like particle physics at CERN; all areas are affected, from science and engineering through business and administration to society at large, with imminent implications for all of us. There was, and still is, a severe need for theories, methods, tools, and best practices that help us cope with the "volume, velocity, variety, and veracity" of data.

The aim of the International Conference on Scientific and Statistical Database Management (SSDBM) series is to bring database researchers, practitioners and developers together with scientific domain experts to exchange the most recent research results on database techniques, concepts, tools and applications for scientific and statistical applications. The 28th SSDBM took place in Budapest, Hungary, on July 18-20, 2016, organized by the Wigner Research Centre for Physics of the Hungarian Academy of Sciences.

The conference this year had ten sessions: five for presenting research papers, two for keynote talks, one dedicated to demonstrations and poster viewing, a tutorial session, and a panel discussion. The research and poster committee enjoyed the variety and interest of the submissions, which came from quite different angles and sub-disciplines of the broad areas of statistical and scientific data management. Altogether, 63 papers were submitted for review, of which 21 were accepted as full papers, three as posters and four as demonstrations. The full paper acceptance rate was thus 33% (21 of 63), and the overall acceptance rate was 44% (28 of 63). An innovation this year was the introduction of a tutorial session to advocate tools and technologies that might be of wide interest to the research community. The tutorial on array databases attracted keen interest, given the prevalence of array data in applications from the scientific domain (think time series or sequences of biological observations) as well as beyond (financial and transport data, etc.). The committee also decided to give students participating in the conference an opportunity to present non-peer-reviewed research posters in a special section, so as to lower the entry barrier to scientific publishing.

The keynote speakers were invited to represent the two main areas of scientific data research: data-intensive system development and scientific applications. Prof. Volker Markl from the Technical University Berlin gave a keynote speech about Apache Flink, an open source scalable system for batch and stream processing developed by a team under his supervision, and how this system helps researchers implement deep data analysis workflows through automatic parallelization, optimization and efficient execution. Prof. István Csabai from the Eötvös Loránd University in Budapest discussed the challenges of massive data analysis in a wide range of scientific fields, from cosmology via genomics to social sciences.

In an interdisciplinary panel titled "Bye Bye Big Data - all problems solved, finally?" a lively discussion took place on the state of affairs in Big Data, how much the database field is contributing visibly, and where future avenues in terms of research areas and technological contributions can be found.

The conference organizers are grateful to all paper authors for the high-quality submissions, and to the research program committee, the demo committee, and the external reviewers for the thorough and timely reviews.

We hope you will find the program of this year's conference, as we do, interesting, inspiring and relevant to the scientific and statistical data management community.

Peter Baumann - General Chair

Ioana Manolescu - Program Chair

SESSION: Research Sessions: Data Exchange, Security and Privacy
research-article
Efficient Feedback Collection for Pay-as-you-go Source Selection

Technical developments, such as the web of data and web data extraction, combined with policy developments such as those relating to open government or open science, are leading to the availability of increasing numbers of data sources. Indeed, given ...

research-article
Functional Dependencies Unleashed for Scalable Data Exchange

We address the problem of efficiently evaluating target functional dependencies (fds) in the Data Exchange (DE) process. Target fds naturally occur in many DE scenarios, including the ones in Life Sciences in which multiple source relations need to be ...

research-article
Graph-based modelling of query sets for differential privacy

Differential privacy has gained attention from the community as the mechanism for privacy protection. Significant effort has focused on its application to data analysis, where statistical queries are submitted in batch and answers to these queries are ...

research-article
PAMPAS: Privacy-Aware Mobile Participatory Sensing Using Secure Probes

Mobile participatory sensing could be used in many applications such as vehicular traffic monitoring, pollution tracking, or even health surveying. However, its success depends on finding a solution for querying large numbers of users which protects ...

SESSION: Research Sessions: Similarity Search and Event Detection
research-article
Geometric Graph Indexing for Similarity Search in Scientific Databases

Searching a database for similar graphs is a critical task in many scientific applications, such as in drug discovery, geoinformatics, or pattern recognition. Typically, graph edit distance is used to estimate the similarity of non-identical graphs, ...

research-article
Efficient Similarity Search across Top-k Lists under the Kendall's Tau Distance

We consider the problem of similarity search in a set of top-k lists under the generalized Kendall's Tau distance. This distance describes how related two rankings are in terms of discordantly ordered items. We consider pair- and triplets-based indices ...

research-article
Monitoring Spatial Coverage of Trending Topics in Twitter

Most messages posted in Twitter usually discuss an ongoing event, triggering a series of tweets that together may constitute a trending topic (e.g., #election2012, #jesuischarlie, #oscars2016). Sometimes, such a topic may be trending only locally, ...

research-article
SPOTHOT: Scalable Detection of Geo-spatial Events in Large Textual Streams

The analysis of social media data poses several challenges: first of all, the data sets are very large, secondly they change constantly, and third they are heterogeneous, consisting of text, images, geographic locations and social connections. In this ...

SESSION: Research Sessions: Massive Data Processing
research-article
Efficient Maintenance of All-Pairs Shortest Distances

Computing shortest distances is a central task in many graph applications. Since it is impractical to recompute shortest distances from scratch every time the graph changes, many algorithms have been proposed to incrementally maintain shortest distances ...

research-article
Bermuda: An Efficient MapReduce Triangle Listing Algorithm for Web-Scale Graphs

Triangle listing plays an important role in graph analysis and has numerous graph mining applications. With the rapid growth of graph data, distributed methods for listing triangles over massive graphs are urgently needed. Therefore, the triangle ...

research-article
PIEJoin: Towards Parallel Set Containment Joins

The efficient computation of set containment joins (SCJ) over set-valued attributes is a well-studied problem with many applications in commercial and scientific fields. Nevertheless, there still exists a number of open questions: An extensive ...

research-article
Multi-Assignment Single Joins for Parallel Cross-Match of Astronomic Catalogs on Heterogeneous Clusters

Cross-match is a central operation in astronomic databases to integrate multiple catalogs of celestial objects. With the rapid development of new astronomy projects, large amounts of astronomic catalogs are generated and require fast cross-match with ...

research-article
Regular Path Queries on Massive Graphs

Regular Path Queries (RPQs) represent a powerful tool for querying graph databases and are of particular interest, because they form the building blocks of other query languages, and because they can be used in many theoretical or practical contexts for ...

SESSION: Research Sessions: Novel Data Management Paradigms
research-article
SolveDB: Integrating Optimization Problem Solvers Into SQL Databases

Many real-world decision problems involve solving optimization problems based on data in an SQL database. Traditionally, solving such problems requires combining a DBMS with optimization software packages for each required class of problems (e.g. linear ...

research-article
Compact and queryable representation of raster datasets

Compact data structures combine in a unique data structure a compressed representation of the data and the structures to access such data. The target is to be able to manage data directly in compressed form, and in this way, to keep data always ...

research-article
Vectorized UDFs in Column-Stores

Data Scientists rely on vector-based scripting languages such as R, Python and MATLAB to perform ad-hoc data analysis on potentially large data sets. When facing large data sets, they are only efficient when data is processed using vectorized or bulk ...

research-article
SPECTRA: Continuous Query Processing for RDF Graph Streams Over Sliding Windows

This paper proposes a new approach for the incremental evaluation of RDF graph streams over sliding windows. Our system, called "SPECTRA", combines a novel form of RDF graph summarisation, a new incremental evaluation method and adaptive indexing ...

SESSION: Research Sessions: Mining and Data Analysis
research-article
Public Access
Pruning Forests to Find the Trees

The vast majority of phylogenetic databases do not support a declarative querying platform using which their contents can be flexibly and conveniently accessed. The template based query interfaces they support do not allow arbitrary speculative queries. ...

research-article
Framework for real-time clustering over sliding windows

Clustering queries over sliding windows require maintaining cluster memberships that change as windows slide. To address this, the Generic 2-phase Continuous Summarization framework (G2CS) utilizes a generation based window maintenance approach where ...

research-article
Fast, Explainable View Detection to Characterize Exploration Queries

The aim of data exploration is to get acquainted with an unfamiliar database. Typically, explorers operate by trial and error: they submit a query, study the result, and refine their query subsequently. In this paper, we investigate how to help them ...

research-article
Novel Data Reduction Based on Statistical Similarity

Applications such as scientific simulations and power grid monitoring are generating so much data quickly that compression is essential to reduce storage requirement or transmission capacity. To achieve better compression, one is often willing to ...

POSTER SESSION: Posters
poster
Data Exchange with MapReduce: A First Cut

Data exchange is one of the oldest database problems, being of both practical and theoretical interest. Given the pace at which heterogeneous data are published on the web, thanks to initiatives such as Linked Data and Open Science, scalability of data ...

poster
Privacy or Security?: Take A Look And Then Decide

The big data paradigm is currently the leading paradigm for data production and management. As a matter of fact, new information is generated at high rates in specialized fields (e.g., cybersecurity scenario). This may cause that the events to be studied ...

poster
SMS: Stable Matching Algorithm using Skylines

In this paper we show how skylines can be used to improve the stable matching algorithm with asymmetric preference sets for men and women. The skyline set of men (or women) in a dataset comprises those who are not worse off in all the qualities in ...

DEMONSTRATION SESSION: System Demonstrations
demonstration
Array Database Scalability: Intercontinental Queries on Petabyte Datasets

With the deluge of scientific big data affecting a large variety of research institutions, support for large multidimensional arrays has gained traction in the database community in the past decade. Array databases aim to cover the gap left by ...

demonstration
Demonstrating KDBMS: A Knowledge-based Database Management System

We demonstrate a KDBMS, a prototype system which seamlessly integrates Knowledge base and DBMS. While state-of-the-art approaches, i.e., Ontology-based data access, denoted as OBDA, use ontologies to only query data stored in relational databases using ...

demonstration
Public Access
SciServer Compute: Bringing Analysis Close to the Data

SciServer Compute uses Jupyter notebooks running within server-side Docker containers attached to large relational databases and file storage to bring advanced analysis capabilities close to the data. SciServer Compute is a component of SciServer, a big-...

demonstration
Selective Scan for Filter Operator of SciDB

Recently there has been an increasing interest in analyzing scientific data generated by observations and scientific experiments. For managing these data efficiently, SciDB, a multi-dimensional array-based DBMS, is suggested. When SciDB processes a ...

Contributors
  • National and Kapodistrian University of Athens
  • Eötvös Loránd University

      Acceptance Rates

      Overall Acceptance Rate: 56 of 146 submissions, 38%

      Year        Submitted  Accepted  Rate
      SSDBM '18   75         30        40%
      SSDBM '14   71         26        37%
      Overall     146        56        38%