Efficient discovery of sequence outlier patterns

Published: 01 April 2019

Abstract

Modern Internet of Things (IoT) applications generate massive amounts of time-stamped data, much of it in the form of discrete, symbolic sequences. In this work, we present a new system called TOP that deTects Outlier Patterns from these sequences. To overcome a fundamental limitation of existing pattern mining semantics, which miss outlier patterns hidden inside larger frequent patterns, TOP offers new pattern semantics based on contextual patterns that distinguish the independent occurrence of a pattern from its occurrence as part of its super-pattern. We present efficient algorithms for mining this new class of contextual patterns. In particular, in contrast to the bottom-up strategy of state-of-the-art pattern mining techniques, our top-down Reduce strategy piggybacks pattern detection on the detection of the context in which a pattern occurs. Our approach achieves linear time complexity in the length of the input sequence. We also propose optimization techniques, such as context-driven search space pruning and inverted index-based outlier pattern detection, to further speed up contextual pattern mining. Our experimental evaluation demonstrates the effectiveness of TOP at capturing meaningful outlier patterns in several real-world IoT use cases. We also demonstrate the efficiency of TOP, showing it to be up to two orders of magnitude faster than adapting state-of-the-art mining techniques to produce this new class of contextual outlier patterns, which allows outlier pattern mining to scale to large sequence datasets.
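The distinction between an independent occurrence of a pattern and an occurrence inside its super-pattern can be illustrated with a toy sketch. This is not TOP's algorithm or its Reduce strategy: it assumes contiguous matching over a single symbol sequence, takes the super-pattern as a given input, and uses hypothetical helper names (`find_occurrences`, `independent_occurrences`) chosen only for this illustration.

```python
def find_occurrences(seq, pattern):
    """Return the start indices of contiguous occurrences of pattern in seq."""
    n, m = len(seq), len(pattern)
    return [i for i in range(n - m + 1) if seq[i:i + m] == pattern]


def independent_occurrences(seq, pattern, super_pattern):
    """Return start indices of pattern occurrences that are NOT fully
    contained in any occurrence of super_pattern (toy assumption:
    'context' means containment in a contiguous super-pattern match)."""
    span = len(super_pattern)
    covered = [(s, s + span) for s in find_occurrences(seq, super_pattern)]
    m = len(pattern)
    return [
        i for i in find_occurrences(seq, pattern)
        if not any(s <= i and i + m <= e for s, e in covered)
    ]


# Hypothetical symbolic sequence: "AB" appears three times, but twice
# only as part of the larger pattern "ABCD".
seq = list("ABCDxxABxxABCD")
total = find_occurrences(seq, list("AB"))
alone = independent_occurrences(seq, list("AB"), list("ABCD"))
print(total, alone)
```

In this toy sequence a plain frequency count credits "AB" with three occurrences, while only one of them stands on its own. Counting that treats the two cases alike is the kind of semantics the abstract describes as hiding outlier patterns inside larger frequent ones.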

Published In

Proceedings of the VLDB Endowment, Volume 12, Issue 8
April 2019
112 pages
ISSN: 2150-8097

Publisher

VLDB Endowment

Qualifiers

  • Research-article