A new quantile tracking algorithm using a generalized exponentially weighted average of observations

Published: 01 April 2019

Abstract

The Exponentially Weighted Average (EWA) of observations is known to be a state-of-the-art estimator for tracking expectations of dynamically varying data stream distributions. However, it is not obvious how to devise an EWA estimator that tracks quantiles of data stream distributions. In this paper, we present a lightweight quantile estimator using a generalized form of the EWA. To the best of our knowledge, this work represents the first reported quantile estimator of this form in the literature. An appealing property of the estimator is that the update step size is adjusted online in proportion to the difference between the current observation and the current quantile estimate. Thus, if the estimator is off-track compared to the data stream, large steps are taken to promptly bring it back on track. The convergence of the estimator to the true quantile is proven using the theory of stochastic learning. Extensive experimental results using both synthetic and real-life data show that our estimator clearly outperforms legacy state-of-the-art quantile tracking estimators and achieves faster adaptivity in dynamic environments. The quantile estimator was further tested on real-life data where the objective is efficient online control of indoor climate. We show that the estimator can be incorporated into a concept drift detector to efficiently decide when a machine learning model used to predict future indoor temperature should be retrained or updated.
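The abstract does not state the update rule, so the following is only an illustrative sketch of an EWA-style quantile tracker whose step grows with the distance between the observation and the current estimate. It should not be read as the paper's algorithm. The class name, the parameters (`step`, `init`, `init_gap`) and the normalization by running one-sided mean gaps are all assumptions made for this example. The normalization is chosen so that the expected move of the estimate is proportional to q - F(Q), which vanishes when the estimate Q sits at the q-quantile.

```python
class GeneralizedEwaQuantileTracker:
    """Illustrative sketch only (not the paper's exact estimator).

    The estimate is updated as a weighted average of itself and the new
    observation, Q <- (1 - w) * Q + w * x, where the weight w depends on
    which side of Q the observation falls on.
    """

    def __init__(self, q, step=0.01, init=0.0, init_gap=1.0):
        if not 0.0 < q < 1.0:
            raise ValueError("q must lie strictly between 0 and 1")
        if init_gap <= 0.0:
            raise ValueError("init_gap must be positive")
        self.q = q
        self.step = step            # base learning rate (assumed name)
        self.estimate = init        # current quantile estimate Q
        # Running averages of |x - Q| on each side of Q (assumed helper state).
        self.gap_above = init_gap
        self.gap_below = init_gap

    def update(self, x):
        """Consume one observation and return the updated estimate."""
        diff = x - self.estimate
        if diff > 0.0:
            # Observation above the estimate: weight scaled by q.
            gap = max(self.gap_above, 1e-12)
            weight = min(1.0, self.step * self.q / gap)
            self.gap_above += self.step * (diff - self.gap_above)
        elif diff < 0.0:
            # Observation below the estimate: weight scaled by 1 - q.
            gap = max(self.gap_below, 1e-12)
            weight = min(1.0, self.step * (1.0 - self.q) / gap)
            self.gap_below += self.step * (-diff - self.gap_below)
        else:
            return self.estimate
        # The move is proportional to the distance to the observation,
        # so an off-track estimate takes large corrective steps.
        self.estimate += weight * diff
        return self.estimate
```

Under these assumptions, calling `update` once per observation keeps a constant-memory estimate. A larger `step` reacts faster to distribution changes at the cost of a noisier estimate.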

Published In

Applied Intelligence, Volume 49, Issue 4
April 2019
408 pages

Publisher

Kluwer Academic Publishers

United States

Author Tags

  1. Concept drift detection
  2. Data stream
  3. Generalized exponentially weighted average
  4. Quantile tracking

Qualifiers

  • Article
