COLIN: A Cache-Conscious Dynamic Learned Index with High Read/Write Performance
The recently proposed learned index has higher query performance and space efficiency than the conventional B+-tree. However, the original learned index suffers from insertion failure and unbounded query complexity, meaning that it supports ...
WATuning: A Workload-Aware Tuning System with Attention-Based Deep Reinforcement Learning
Configuration tuning is essential to optimize the performance of systems (e.g., databases, key-value stores). High performance usually indicates high throughput and low latency. At present, most of the tuning tasks of systems are performed ...
Cardinality Estimator: Processing SQL with a Vertical Scanning Convolutional Neural Network
Although popular database systems perform well at query optimization, they still produce poor query execution plans when the join operations across multiple tables are complex. Bad execution planning usually results in bad cardinality ...
TransGPerf: Exploiting Transfer Learning for Modeling Distributed Graph Computation Performance
It is challenging to model the performance of distributed graph computation. Explicit formulation cannot easily capture the diversified factors and complex interactions in the system. Statistical learning methods require a large number of training ...
Efficient Model Store and Reuse in an OLML Database System
Deep learning has shown significant improvements on various machine learning tasks by introducing a wide spectrum of neural network models. Yet, for these neural network models, it is necessary to label a tremendous amount of training data, which ...
Impacts of Dirty Data on Classification and Clustering Models: An Experimental Evaluation
Data quality issues have attracted widespread attention due to the negative impacts of dirty data on data mining and machine learning results. The relationship between data quality and the accuracy of results could be applied to the selection of ...
Mixed Hierarchical Networks for Deep Entity Matching
Entity matching is a fundamental problem of data integration. It groups records according to underlying real-world entities. There is a growing trend of entity matching via deep learning techniques. We design mixed hierarchical deep neural ...
Using Markov Chain Based Estimation of Distribution Algorithm for Model-Based Safety Analysis of Graph Transformation
The ability to assess reliability is one of the most crucial requirements in the design of modern safety-critical systems, where even a minor failure can result in loss of life or irreparable damage to the ...
An Empirical Comparison Between Tutorials and Crowd Documentation of Application Programming Interface
API (application programming interface) documentation is critical for developers to learn APIs. However, it is unclear whether API documentation indeed improves API learnability for developers. In this paper, we focus on two types of API ...
Order-Revealing Encryption: File-Injection Attack and Forward Security
Order-preserving encryption (OPE) and order-revealing encryption (ORE) are among the core ingredients for encrypted databases (EDBs). In this work, we study the leakage of OPE and ORE and their forward security. We propose generic yet powerful ...
A Heuristic Sampling Method for Maintaining the Probability Distribution
Sampling is a fundamental method for generating data subsets. As many data analysis methods are developed based on probability distributions, maintaining distributions when sampling can help to ensure good data analysis performance. However, ...
Method for Processing Graph Degeneracy in Dynamic Geometry Based on Domain Design
A dynamic geometry system, as an important application in the field of geometric constraint solving, is widely used in elementary mathematics education; moreover, the dynamic geometry system is also a fundamental environment for automated theorem ...
Discovering API Directives from API Specifications with Text Classification
Application programming interface (API) libraries are extensively used by developers. To program correctly with APIs and avoid bugs, developers should pay attention to API directives, which describe the constraints of APIs. Unfortunately, API ...
Multi-Attribute Preferences Mining Method for Group Users with the Process of Noise Reduction
Traditional research on user preference mining mainly explores users' overall preferences for a project, but ignores that the fundamental motivation for those preferences comes from users' attitudes toward some attributes of the project. In ...