Efficient computation of Iceberg cubes with complex measures
It is often too expensive to compute and materialize a complete high-dimensional data cube. Computing an iceberg cube, which contains only aggregates above certain thresholds, is an effective way to derive nontrivial multi-dimensional aggregations for ...
On computing correlated aggregates over continual data streams
In many applications from telephone fraud detection to network management, data arrives in a stream, and there is a need to maintain a variety of statistical summary information about a large number of customers in an online fashion. At present, such ...
Iceberg-cube computation with PC clusters
In this paper, we investigate the approach of using a low-cost PC cluster to parallelize the computation of iceberg-cube queries. We concentrate on techniques directed towards online querying of large, high-dimensional datasets where it is assumed that ...
Outlier detection for high dimensional data
The outlier detection problem has important applications in the field of fraud detection, network robustness analysis, and intrusion detection. Most such applications are high dimensional domains in which the data can contain hundreds of dimensions. ...
Bit-sliced index arithmetic
The bit-sliced index (BSI) was originally defined in [ONQ97]. The current paper introduces the concept of BSI arithmetic. For any two BSI's X and Y on a table T, we show how to efficiently generate new BSI's Z, V, and W, such that Z = X + Y, V = X - Y, ...
Space-efficient online computation of quantile summaries
An ε-approximate quantile summary of a sequence of N elements is a data structure that can answer quantile queries about the sequence to within a precision of εN.
We present a new online algorithm for computing ε-approximate quantile summaries of very ...
Probe, count, and classify: categorizing hidden web databases
The contents of many valuable web-accessible databases are only accessible through search interfaces and are hence invisible to traditional web “crawlers.” Recent studies have estimated the size of this “hidden web” to be 500 billion pages, while the ...
Data bubbles: quality preserving performance boosting for hierarchical clustering
In this paper, we investigate how to scale hierarchical clustering methods (such as OPTICS) to extremely large databases by utilizing data compression methods (such as BIRCH or random sampling). We propose a three step procedure: 1) compress the data ...
Mining needle in a haystack: classifying rare classes via two-phase rule induction
Learning models to classify rarely occurring target classes is an important problem with applications in network intrusion detection, fraud detection, or deviation detection in general. In this paper, we analyze our previously proposed two-phase rule ...
Efficient evaluation of XML middle-ware queries
We address the problem of efficiently constructing materialized XML views of relational databases. In our setting, the XML view is specified by a query in the declarative query language of a middle-ware system, called SilkRoute. The middle-ware system ...
Filtering algorithms and implementation for very fast publish/subscribe systems
Publish/Subscribe is the paradigm in which users express long-term interests (“subscriptions”) and some agent “publishes” events (e.g., offers). The job of Publish/Subscribe software is to send events to the owners of subscriptions satisfied by those ...
Adaptable query optimization and evaluation in temporal middleware
Time-referenced data are pervasive in most real-world databases. Recent advances in temporal query languages show that such database applications may benefit substantially from built-in temporal support in the DBMS. To achieve this, temporal query ...
Optimizing multidimensional index trees for main memory access
Recent studies have shown that cache-conscious indexes such as the CSB+-tree outperform conventional main memory indexes such as the T-tree. The key idea of these cache-conscious indexes is to eliminate most of the child pointers from a node to increase the ...
Locally adaptive dimensionality reduction for indexing large time series databases
Similarity search in large time series databases has attracted much research interest recently. It is a difficult problem because of the typically high dimensionality of the data. The most promising solutions involve performing dimensionality reduction ...
Main-memory index structures with fixed-size partial keys
The performance of main-memory index structures is increasingly determined by the number of CPU cache misses incurred when traversing the index. When keys are stored indirectly, as is standard in main-memory databases, the cost of key retrieval in terms ...
Automatic segmentation of text into structured records
In this paper we present a method for automatically segmenting unformatted text records into structured elements. Several useful data sources today are human-generated as continuous text whereas convenient usage requires the data to be organized as ...
Efficient and effective metasearch for text databases incorporating linkages among documents
Linkages among documents have a significant impact on the importance of documents, as it can be argued that important documents are pointed to by many documents or by other important documents. Metasearch engines can be used to facilitate ordinary users ...
Independence is good: dependency-based histogram synopses for high-dimensional data
Approximating the joint data distribution of a multi-dimensional data set through a compact and accurate histogram synopsis is a fundamental problem arising in numerous practical scenarios, including query optimization and approximate query answering. ...
STHoles: a multidimensional workload-aware histogram
Attributes of a relation are not typically independent. Multidimensional histograms can be an effective tool for accurate multiattribute query selectivity estimation. In this paper, we introduce STHoles, a “workload-aware” histogram that allows bucket ...
Global optimization of histograms
Histograms are frequently used to represent the distribution of data values in an attribute of a relation. Most previous work has focused on identifying the optimal histogram (given a limited number of buckets) for a single attribute independent of ...
Improving index performance through prefetching
This paper proposes and evaluates Prefetching B+-Trees (pB+-Trees), which use prefetching to accelerate two important operations on B+-Tree indices: searches and range scans. To accelerate searches, pB+-Trees use prefetching to effectively create wider ...
Efficient and tunable similar set retrieval
Set value attributes are a concise and natural way to model complex data sets. Modern Object Relational systems support set value attributes and allow various query capabilities on them. In this paper we initiate a formal study of indexing techniques ...
PREFER: a system for the efficient execution of multi-parametric ranked queries
Users often need to optimize the selection of objects by appropriately weighting the importance of multiple object attributes. Such optimization problems appear often in operations research and applied mathematics as well as everyday life; e.g., a ...
Query optimization in compressed database systems
Over the last decades, improvements in CPU speed have outpaced improvements in main memory and disk access rates by orders of magnitude, enabling the use of data compression techniques to improve the performance of database systems. Previous work ...
SPARTAN: a model-based semantic compression system for massive data tables
While a variety of lossy compression schemes have been developed for certain forms of digital data (e.g., images, audio, video), the area of lossy compression techniques for arbitrary data tables has been left relatively unexplored. Nevertheless, such ...
A robust, optimization-based approach for approximate answering of aggregate queries
The ability to approximately answer aggregation queries accurately and efficiently is of great benefit for decision support and data mining tools. In contrast to previous sampling-based studies, we treat the problem as an optimization problem whose goal ...
Materialized view selection and maintenance using multi-query optimization
Materialized views have been found to be very effective at speeding up queries, and are increasingly being supported by commercial databases and data warehouse systems. However, whereas the amount of data entering a warehouse and the number of ...
Generating efficient plans for queries using views
We study the problem of generating efficient, equivalent rewritings using views to compute the answer to a query. We take the closed-world assumption, in which views are materialized from base relations, rather than views describing sources in terms of ...
Optimizing queries using materialized views: a practical, scalable solution
Materialized views can provide massive improvements in query processing time, especially for aggregation queries over large tables. To realize this potential, the query optimizer must know how and when to exploit materialized views. This paper presents ...
Dynamic buffer allocation in video-on-demand systems
In video-on-demand (VOD) systems, as the size of the buffer allocated to user requests increases, initial latency and memory requirements increase. Hence, the buffer size must be minimized. The existing static buffer allocation scheme, however, ...
Cited By
- Farnsworth D and Tang N (2023). Modeling and Fitting Two-Way Tables Containing Outliers, International Journal of Mathematics and Mathematical Sciences, 10.1155/2023/6352058, 2023, (1-6), Online publication date: 11-Feb-2023.
- Schlaipfer M, Rajan K, Lal A and Samak M Optimizing Big-Data Queries Using Program Synthesis Proceedings of the 26th Symposium on Operating Systems Principles, (631-646)
Index Terms
- Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Acceptance Rates
Year | Submitted | Accepted | Rate |
---|---|---|---|
SIGMOD '19 | 430 | 88 | 20% |
SIGMOD '18 | 461 | 90 | 20% |
SIGMOD '15 | 415 | 106 | 26% |
SIGMOD '14 | 421 | 107 | 25% |
SIGMOD '13 | 372 | 76 | 20% |
SIGMOD '12 | 289 | 48 | 17% |
SIGMOD '03 | 342 | 53 | 15% |
SIGMOD '02 | 240 | 42 | 18% |
SIGMOD '01 | 293 | 44 | 15% |
SIGMOD '00 | 248 | 42 | 17% |
SIGMOD '97 | 202 | 42 | 21% |
SIGMOD '96 | 290 | 47 | 16% |
Overall | 4,003 | 785 | 20% |