DOI: 10.1007/978-3-662-49192-8_6

Big Sequence Management: A glimpse of the Past, the Present, and the Future

Published: 23 January 2016


Applications in diverse domains have an increasingly pressing need for techniques that can index and mine very large collections of sequences, or data series. Examples of such applications come from biology, astronomy, entomology, the web, and other domains. It is not unusual for these applications to involve hundreds of millions to billions of data series, which are often not analyzed in full detail because of their sheer size. In this work, we describe recent efforts in designing techniques for indexing and mining truly massive collections of data series that will enable scientists to easily analyze their data. We show that the main bottleneck in mining such massive datasets is the time taken to build the index, and we introduce solutions to this problem. Furthermore, we discuss novel techniques that create data series indexes adaptively, allowing users to answer queries correctly before the indexing task is finished. We also show how our methods enable mining on datasets that would otherwise be impractical to analyze, including the first published experiments using one billion data series. Finally, we present our vision for the future of big sequence management research.





Published In

Proceedings of the 42nd International Conference on SOFSEM 2016: Theory and Practice of Computer Science - Volume 9587
January 2016
613 pages



Berlin, Heidelberg


Author Tags

  1. Data analytics
  2. Data indexing
  3. Data management
  4. Data series



