research-article

Online Model Management via Temporally Biased Sampling

Published: 05 November 2019

Abstract

To maintain the accuracy of supervised learning models in the presence of evolving data streams, we provide temporally-biased sampling schemes that weight recent data most heavily, with inclusion probabilities for a given data item decaying exponentially over time. We then periodically retrain the models on the current sample. We provide and analyze both a simple sampling scheme (T-TBS) that probabilistically maintains a target sample size and a novel reservoir-based scheme (R-TBS) that is the first to provide both control over the decay rate and a guaranteed upper bound on the sample size. The R-TBS and T-TBS schemes are of independent interest, extending the known set of unequal-probability sampling schemes. We discuss distributed implementation strategies; experiments in Spark show that our approach can increase machine learning accuracy and robustness in the face of evolving data.
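To make the exponential-decay mechanism concrete, here is a minimal Python sketch of a T-TBS-style sampler. It is an illustration under stated assumptions, not the paper's exact algorithm: it assumes one batch arrives per time unit, that the mean batch size is known in advance, and the names `TTBSSampler`, `target_size`, `lam`, and `mean_batch_size` are hypothetical. At each step every item already in the sample survives independently with probability exp(-λ), and every new arrival is accepted with a probability q chosen so that the expected sample size settles at the target. An item that arrived k steps ago is therefore in the sample with probability q·exp(-λk). This sketch does not implement R-TBS: the sample size is controlled only in expectation, with no hard upper bound.

```python
import math
import random
from typing import Any, List, Optional, Sequence


class TTBSSampler:
    """Hypothetical T-TBS-style sampler (illustrative sketch, not the paper's code).

    Assumptions: one batch arrives per time unit, the mean batch size is known
    up front, and the decay rate `lam` is positive.
    """

    def __init__(
        self,
        target_size: int,
        lam: float,
        mean_batch_size: float,
        seed: Optional[int] = None,
    ) -> None:
        if lam <= 0 or mean_batch_size <= 0:
            raise ValueError("lam and mean_batch_size must be positive")
        # Probability that an item already in the sample survives one time step.
        self.retain_p = math.exp(-lam)
        # Acceptance probability q for new arrivals, chosen so that the
        # steady-state expected size q * b / (1 - p) equals target_size.
        self.accept_q = target_size * (1.0 - self.retain_p) / mean_batch_size
        if self.accept_q > 1.0:
            raise ValueError("batches too small for this target size and decay rate")
        self.rng = random.Random(seed)
        self.sample: List[Any] = []

    def process_batch(self, batch: Sequence[Any]) -> List[Any]:
        # Step 1: decay. Each current item survives independently with prob exp(-lam).
        self.sample = [x for x in self.sample if self.rng.random() < self.retain_p]
        # Step 2: accept each new arrival independently with prob q.
        self.sample.extend(x for x in batch if self.rng.random() < self.accept_q)
        return self.sample


# Usage: 300 batches of 100 items, each item tagged (batch_index, item_index).
sampler = TTBSSampler(target_size=50, lam=0.1, mean_batch_size=100, seed=7)
for t in range(300):
    sampler.process_batch([(t, i) for i in range(100)])
print(len(sampler.sample))
```

With `target_size=50`, `lam=0.1`, and batches of 100 items, q is about 0.048. After a warm-up period the sample size fluctuates randomly around 50, and the sample is dominated by items from the most recent batches. The constructor rejects settings where q would exceed 1, because acceptance probabilities of this form cannot then reach the target size.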

Cited By

  • (2024) HyperTime: Hyperparameter Optimization for Combating Temporal Distribution Shifts. Proceedings of the 32nd ACM International Conference on Multimedia, pages 4610-4619. DOI: 10.1145/3664647.3681608. Online publication date: 28-Oct-2024.

Information

Published In

ACM SIGMOD Record, Volume 48, Issue 1
March 2019
81 pages
ISSN:0163-5808
DOI:10.1145/3371316

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 05 November 2019
Published in ACM SIGMOD Record, Volume 48, Issue 1

Article Metrics

  • Downloads (last 12 months): 2
  • Downloads (last 6 weeks): 0
Reflects downloads up to 24 Dec 2024
