skip to main content
10.1145/3274895.3274949acmconferencesArticle/Chapter ViewAbstractPublication PagesgisConference Proceedingsconference-collections
research-article

Scalable hybrid similarity join over geolocated time series

Published: 06 November 2018 Publication History

Abstract

A geolocated time series is a sequence of values associated with a geolocation, such as measurements provided by a sensor installed at a certain location. In this paper, we address the problem of hybrid similarity joins over such geolocated time series. This operation returns all pairs of geolocated time series that exhibit similar behavior in the time series domain while also being closely located in space. First, we propose algorithms for performing such join operations using different types of indices, including spatial-only, time series-only, and hybrid indices. Such centralized indexing schemes can cope well with moderate data volumes but they face scalability issues when the dataset size increases significantly. To overcome this problem, we present a MapReduce-based processing scheme with space-driven partitioning. Our parallel and distributed algorithm leverages our hybrid index for geolocated time series to efficiently execute similarity joins locally within each partition and minimize the amount of data that needs to be shuffled between processing nodes. An extensive experimental evaluation confirms that our approach can efficiently compute all matching pairs even for datasets containing millions of geolocated time series.

References

[1]
N. Beckmann, H.-P. Kriegel, R. Schneider, and B. Seeger. The R*-tree: An efficient and robust access method for points and rectangles. In SIGMOD, pages 322--331, 1990.
[2]
P. Bouros, S. Ge, and N. Mamoulis. Spatio-textual similarity joins. PVLDB, 6(1):1--12, 2012.
[3]
T. Brinkhoff, H. Kriegel, and B. Seeger. Efficient processing of spatial joins using R-trees. In SIGMOD, pages 237--246, 1993.
[4]
A. Camerra, T. Palpanas, J. Shieh, and E. J. Keogh. iSAX 2.0: Indexing and mining one billion time series. In ICDM, pages 58--67, 2010.
[5]
A. Camerra, J. Shieh, T. Palpanas, T. Rakthanmanon, and E. J. Keogh. Beyond one billion time series: indexing and mining very large time series collections with i SAX2+. Knowl. Inf. Syst., 39(1):123--151, 2014.
[6]
K. Chan and A. W. Fu. Efficient time series matching by wavelets. In ICDE, pages 126--133, 1999.
[7]
G. Chatzigeorgakidis, D. Skoutas, K. Patroumpas, S. Athanasiou, and S. Skiadopoulos. Indexing geolocated time series data. In SIGSPATIAL, pages 19:1--19:10, 2017.
[8]
H. Ding, G. Trajcevski, P. Scheuermann, X. Wang, and E. J. Keogh. Querying and mining of time series data: experimental comparison of representations and distance measures. PVLDB, 1(2):1542--1552, 2008.
[9]
D. Q. Goldin and P. C. Kanellakis. On similarity queries for time-series data: Constraint specification and implementation. In CP, pages 137--153, 1995.
[10]
A. Guttman. R-trees: A dynamic index structure for spatial searching. In SIGMOD, pages 47--57, 1984.
[11]
E. J. Keogh, K. Chakrabarti, M. J. Pazzani, and S. Mehrotra. Dimensionality reduction for fast similarity search in large time series databases. Knowl. Inf. Syst., 3(3):263--286, 2001.
[12]
J. Lin, E. J. Keogh, L. Wei, and S. Lonardi. Experiencing SAX: a novel symbolic representation of time series. Data Min. Knowl. Discov., 15(2):107--144, 2007.
[13]
T. Palpanas. Big sequence management: A glimpse of the past, the present, and the future. In SOFSEM, pages 63--80, 2016.
[14]
D. Papadias, N. Mamoulis, and Y. Theodoridis. Processing and optimization of multiway spatial joins using r-trees. In SIGMOD, pages 44--55, 1999.
[15]
I. Popivanov and R. J. Miller. Similarity search over time-series data using wavelets. In ICDE, pages 212--221, 2002.
[16]
S. Qi, P. Bouros, and N. Mamoulis. Efficient top-k spatial distance joins. In SSTD, pages 1--18, 2013.
[17]
N. Roussopoulos, S. Kelley, and F. Vincent. Nearest neighbor queries. In SIGMOD, pages 71--79, 1995.
[18]
J. Shieh and E. J. Keogh. iSAX: indexing and mining terabyte sized time series. In SIGKDD, pages 623--631, 2008.
[19]
B. Yi and C. Faloutsos. Fast time sequence indexing for arbitrary Lp norms. In VLDB, pages 385--394, 2000.
[20]
Y. Zhang, Y. Ma, and X. Meng. Efficient spatio-textual similarity join using mapreduce. In WI, pages 52--59, 2014.
[21]
K. Zoumpatianos, S. Idreos, and T. Palpanas. Indexing for interactive exploration of big data series. In SIGMOD, pages 1555--1566, 2014.

Cited By

View all

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
SIGSPATIAL '18: Proceedings of the 26th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems
November 2018
655 pages
ISBN:9781450358897
DOI:10.1145/3274895
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 06 November 2018

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. big data
  2. similarity joins
  3. spatial indices
  4. time series

Qualifiers

  • Research-article

Funding Sources

  • General Secretariat for Research and Technology (GSRT)
  • Hellenic Foundation for Research and Innovation (HFRI)
  • EU H2020 project SLIPO (731581)
  • NSRF 2014-2020 project HELIX (5002781)

Conference

SIGSPATIAL '18
Sponsor:

Acceptance Rates

SIGSPATIAL '18 Paper Acceptance Rate 30 of 150 submissions, 20%;
Overall Acceptance Rate 257 of 1,238 submissions, 21%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)3
  • Downloads (Last 6 weeks)0
Reflects downloads up to 06 Jan 2025

Other Metrics

Citations

Cited By

View all

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media