skip to main content
10.1145/1376616.1376685acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

Ranking queries on uncertain data: a probabilistic threshold approach

Published: 09 June 2008 Publication History

Abstract

Uncertain data is inherent in a few important applications such as environmental surveillance and mobile object tracking. Top-k queries (also known as ranking queries) are often natural and useful in analyzing uncertain data in those applications. In this paper, we study the problem of answering probabilistic threshold top-k queries on uncertain data, which computes uncertain records taking a probability of at least p to be in the top-k list where p is a user specified probability threshold. We present an efficient exact algorithm, a fast sampling algorithm, and a Poisson approximation based algorithm. An empirical study using real and synthetic data sets verifies the effectiveness of probabilistic threshold top-k queries and the efficiency of our methods.

References

[1]
S. Abiteboul et al. On the representation and querying of sets of possible worlds. In SIGMOD'87.
[2]
D. Angluin and L. G. Valiant. Fast probabilistic algorithms for hamiltonian circuits and matchings. In STOC'77.
[3]
L. Antova et al. 10106 worlds and beyond: Efficient representation and processing of incomplete information. In ICDE'07.
[4]
O. Benjelloun et al. Uldbs: databases with uncertainty and lineage. In VLDB'06.
[5]
L. Le Cam. An approximation theorem for poisson binomial distribution. Pacific J. Math., 10:1181--1197, 1960.
[6]
R. Cheng et al. Evaluating probabilistic queries over imprecise data. In SIGMOD'03.
[7]
R. Cheng et al. Efficient indexing methods for probabilistic threshold queries over uncertain data. In VLDB'04.
[8]
N. Dalvi and D. Suciu. Management of probabilistic data: foundations and challenges. In PODS'07.
[9]
N. Dalvi and D. Suciu. Efficient query evaluation on probabilistic databases. In VLDB'04.
[10]
N. Dalvi and D. Suciu. The dichotomy of conjunctive queries on probabilistic structures. In PODS'07.
[11]
D. Dey and S. Sarkar. A probabilistic relational model and algebra. ACM Trans. Database Syst., 21(3):339--369, 1996.
[12]
R. Fagin et al. Optimal aggregation algorithms for middleware. In PODS'01.
[13]
J. L. Hodges and Jr. L. Le Cam. The poisson approximation to the poisson binomial distribution. The Annals of Mathematical Statistics, 31(3):737--740, 1960.
[14]
W. Hoeffding. On the distribution of the number of successes in independent trials. Annals of Mathematical Statistics, 27:713--721, 1956.
[15]
M. Hua et al. Efficiently Answering Probabilistic Threshold Top-k Queries on Uncertain Data. In ICDE'08.
[16]
T. Imielinski and Jr. Witold Lipski. Incomplete information in relational databases. Journal of ACM, 31(4):761--791, 1984.
[17]
K. Lange. Numerical analysis for statisticians. Statistics and computing. 1999.
[18]
S. K. Lee. Imprecise and uncertain information in databases: An evidential approach. In ICDE'92.
[19]
X. Lian and L. Chen. Probabilistic Ranked Queries in Uncertain Databases. In EDBT'08.
[20]
R. Motwani and P. Raghavan. Randomized Algorithms. Cambridge University Press, United Kingdom, 1995.
[21]
J. Pei et al. Probabilistic skylines on uncertain data. In VLDB'07, Viena, Austria, September 2007.
[22]
C. Ré et al. Efficient top-k query evaluation on probabilistic data. In ICDE'07.
[23]
A. D. Sarma et al. Working models for uncertain data. In ICDE'06.
[24]
A. Silberstein et al. A Sampling-Based Approach to Optimizing Top-k Queries in Sensor Networks. In ICDE'06.
[25]
S. Singh et al. Indexing uncertain categorical data. In ICDE'07.
[26]
M. A. Soliman et al. Top-k query processing in uncertain databases. In ICDE'07.
[27]
Y. Tao et al. Indexing multi-dimensional uncertain data with arbitrary probability density functions. In VLDB'05.
[28]
Y. H. Wang. On the number of successes in independent trials. Statistica Sinica, 3:295--312, 1993.
[29]
K. Yi et al. Efficient processing of top-k queries in uncertain databases. In ICDE'08.
[30]
X. Zhang and J. Chomicki. On the Semantics and Evaluation of Top-k Queries in Probabilistic Databases. In DBRank'08 Workshop.

Cited By

View all

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
SIGMOD '08: Proceedings of the 2008 ACM SIGMOD international conference on Management of data
June 2008
1396 pages
ISBN:9781605581026
DOI:10.1145/1376616
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 09 June 2008

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. probabilistic threshold top-k queries
  2. query processing
  3. uncertain data

Qualifiers

  • Research-article

Conference

SIGMOD/PODS '08
Sponsor:

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)44
  • Downloads (Last 6 weeks)2
Reflects downloads up to 14 Sep 2024

Other Metrics

Citations

Cited By

View all

View Options

Get Access

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media