No abstract available.
Mining frequent patterns without candidate generation
Mining frequent patterns in transaction databases, time-series databases, and many other kinds of databases has been widely studied in data mining research. Most of the previous studies adopt an Apriori-like candidate set generation-and-test ...
Data mining on an OLTP system (nearly) for free
This paper proposes a scheme for scheduling disk requests that takes advantage of the ability of high-level functions to operate directly at individual disk drives. We show that such a scheme makes it possible to support a Data Mining workload on an ...
Turbo-charging vertical mining of large databases
In a vertical representation of a market-basket database, each item is associated with a column of values representing the transactions in which it is present. The association-rule mining algorithms that have been recently proposed for this ...
High speed on-line backup when using logical log operations
Media recovery protects a database from failures of the stable medium by maintaining an extra copy of the database, called the backup, and a media recovery log. When a failure occurs, the database is “restored” from the backup, and the media recovery ...
Efficient resumption of interrupted warehouse loads
Data warehouses collect large quantities of data from distributed sources into a single repository. A typical load to create or maintain a warehouse processes GBs of data, takes hours or even days to execute, and involves many complex and user-defined ...
On-line reorganization in object databases
Reorganization of objects in an object database is an important component of several operations like compaction, clustering, and schema evolution. The high availability requirements (24 × 7 operation) of certain application domains require ...
Finding generalized projected clusters in high dimensional spaces
High dimensional data has always been a challenge for clustering algorithms because of the inherent sparsity of the points. Recent research results indicate that in high dimensional data, even the concept of proximity or clustering may not be ...
Density biased sampling: an improved method for data mining and clustering
Data mining in large data sets often requires a sampling or summarization step to form an in-core representation of the data that can be processed more efficiently. Uniform random sampling is frequently used in practice and also frequently criticized ...
LOF: identifying density-based local outliers
For many KDD applications, such as detecting criminal activities in E-commerce, finding the rare instances or the outliers can be more interesting than finding the common patterns. Existing work in outlier detection regards being an outlier as a binary ...
Answering complex SQL queries using automatic summary tables
We investigate the problem of using materialized views to answer SQL queries. We focus on modern decision-support queries, which involve joins, arithmetic operations and other (possibly user-defined) functions, aggregation (often along multiple ...
Synchronizing a database to improve freshness
In this paper we study how to refresh a local copy of an autonomous data source to maintain the copy up-to-date. As the size of the data grows, it becomes more difficult to maintain the copy “fresh,” making it crucial to synchronize the copy ...
How to roll a join: asynchronous incremental view maintenance
Incremental refresh of a materialized join view is often less expensive than a full, non-incremental refresh. However, it is still a potentially costly atomic operation. This paper presents an algorithm that performs incremental view maintenance as a ...
On wrapping query languages and efficient XML integration
Modern applications (Web portals, digital libraries, etc.) require integrated access to various information sources (from traditional DBMS to semistructured Web repositories), fast deployment and low maintenance cost in a rapidly evolving environment. ...
XMill: an efficient compressor for XML data
We describe a tool for compressing XML data, with applications in data exchange and archiving, which usually achieves about twice the compression ratio of gzip at roughly the same speed. The compressor, called XMill, incorporates and combines existing ...
XTRACT: a system for extracting document type descriptors from XML documents
XML is rapidly emerging as the new standard for data representation and exchange on the Web. An XML document can be accompanied by a Document Type Descriptor (DTD) which plays the role of a schema for an XML data collection. DTDs contain valuable ...
Spatial join selectivity using power laws
We discovered a surprising law governing the spatial join selectivity across two sets of points. An example of such a spatial join is “find the libraries that are within 10 miles of schools”. Our law dictates that the number of such qualifying pairs ...
Closest pair queries in spatial databases
This paper addresses the problem of finding the K closest pairs between two spatial data sets, where each set is stored in a structure belonging to the R-tree family. Five different algorithms (four recursive and one iterative) are presented for solving ...
Influence sets based on reverse nearest neighbor queries
Inherent in the operation of many decision support and continuous referral systems is the notion of the “influence” of a data point on the database. This notion arises in examples such as finding the set of customers affected by the opening of a new ...
MOCHA: a self-extensible database middleware system for distributed data sources
We present MOCHA, a new self-extensible database middleware system designed to interconnect distributed data sources. MOCHA is designed to scale to large environments and is based on the idea that some of the user-defined functionality in the system ...
Towards self-tuning data placement in parallel database systems
Parallel database systems are increasingly being deployed to support the performance demands of end-users. While declustering data across multiple nodes facilitates parallelism, initial data placement may not be optimal due to skewed workloads and ...
LH*RS: a high-availability scalable distributed data structure using Reed Solomon Codes
LH*RS is a new high-availability Scalable Distributed Data Structure (SDDS). The data storage scheme and the search performance of LH*RS are basically those of LH*. LH*RS manages in addition the parity information to tolerate the unavailability of k ≥ 1 ...
Efficient and extensible algorithms for multi query optimization
Complex queries are becoming commonplace, with the growing use of decision support systems. These complex queries often have a lot of common sub-expressions, either within a single query, or across multiple such queries run as a batch. Multiquery ...
Eddies: continuously adaptive query processing
In large federated and shared-nothing databases, resources can exhibit widely fluctuating characteristics. Assumptions made at the time a query is submitted will rarely hold throughout the duration of query processing. As a result, traditional static ...
A chase too far?
In a previous paper we proposed a novel method for generating alternative query plans that uses chasing (and back-chasing) with logical constraints. The method brings together use of indexes, use of materialized views, semantic optimization and join ...
WSQ/DSQ: a practical approach for combined querying of databases and the Web
We present WSQ/DSQ (pronounced “wisk-disk”), a new approach for combining the query facilities of traditional databases with existing search engines on the Web. WSQ, for Web-Supported (Database) Queries, leverages results from Web searches to enhance ...
A framework for expressing and combining preferences
The advent of the World Wide Web has created an explosion in the available on-line information. As the range of potential choices expands, the time and effort required to sort through them also expands. We propose a formal framework for expressing and ...
Microsoft TerraServer: a spatial data warehouse
Microsoft® TerraServer stores aerial, satellite, and topographic images of the earth in a SQL database available via the Internet. It is the world's largest online atlas, combining eight terabytes of image data from the United States Geological Survey (...
A data model and data structures for moving objects databases
We consider spatio-temporal databases supporting spatial objects with continuously changing position and extent, termed moving objects databases. We formally define a data model for such databases that includes complex evolving spatial structures such ...
Indexing the positions of continuously moving objects
The coming years will witness dramatic advances in wireless communications as well as positioning technologies. As a result, tracking the changing positions of objects capable of continuous movement is becoming increasingly feasible and necessary. The ...
Adaptive multi-stage distance join processing
A spatial distance join is a relatively new type of operation introduced for spatial and multimedia database applications. Additional requirements for ranking and stopping cardinality are often combined with the spatial distance join in on-line query ...
Index Terms
- Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Acceptance Rates
| Year | Submitted | Accepted | Rate |
| --- | --- | --- | --- |
| SIGMOD '19 | 430 | 88 | 20% |
| SIGMOD '18 | 461 | 90 | 20% |
| SIGMOD '15 | 415 | 106 | 26% |
| SIGMOD '14 | 421 | 107 | 25% |
| SIGMOD '13 | 372 | 76 | 20% |
| SIGMOD '12 | 289 | 48 | 17% |
| SIGMOD '03 | 342 | 53 | 15% |
| SIGMOD '02 | 240 | 42 | 18% |
| SIGMOD '01 | 293 | 44 | 15% |
| SIGMOD '00 | 248 | 42 | 17% |
| SIGMOD '97 | 202 | 42 | 21% |
| SIGMOD '96 | 290 | 47 | 16% |
| Overall | 4,003 | 785 | 20% |