OWSP-Miner: Self-adaptive One-off Weak-gap Strong Pattern Mining

Published: 04 February 2022

Abstract

Gap-constrained sequential pattern mining (SPM), a kind of repetitive SPM, can avoid mining too many useless patterns. However, it is difficult for users to set a suitable gap without prior knowledge, and every character is treated as having the same effect. To tackle these issues, this article addresses self-adaptive One-off Weak-gap Strong Pattern (OWSP) mining, which has three characteristics. First, it determines the gap constraint adaptively according to the sequence. Second, all characters are divided into two groups, strong and weak: a pattern is composed of strong characters, while weak characters are allowed in the gaps. Third, each character can be used at most once when calculating the support (the frequency of a pattern). To solve this problem, this article presents OWSP-Miner, which is built on two key steps: support calculation and candidate pattern generation. A reverse-order filling strategy is employed to calculate the support of a candidate pattern, which reduces the time complexity. OWSP-Miner generates candidate patterns using a pattern join strategy, which effectively reduces the number of candidate patterns. For clarification, time series are employed in the experiments, and the results show that OWSP-Miner is not only more efficient but also makes it easier to mine valuable patterns. In a stock application experiment, we also employ OWSP-Miner to mine OWSPs, and the results show that OWSP mining is more meaningful in real life. The algorithms and data can be downloaded at https://github.com/wuc567/Pattern-Mining/tree/master/OWSP-Miner.
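To make the three characteristics and the two key steps concrete, the following is a minimal Python sketch written from the abstract alone. It is not the authors' implementation, and in particular it does not reproduce the reverse-order filling strategy. It assumes that a gap may contain only weak characters (so a strong character inside a gap breaks the occurrence), that the gap length is otherwise unbounded, and that "one-off" means no sequence position is shared between occurrences. The function names `one_off_weak_gap_support` and `pattern_join` and the `strong` parameter are hypothetical.

```python
def one_off_weak_gap_support(sequence, pattern, strong):
    """Count one-off occurrences of `pattern` whose gaps hold only weak characters.

    Assumption (not from the paper): a strong character inside a gap
    invalidates the occurrence, and the gap length is otherwise unbounded.
    """
    m = len(pattern)
    # A pattern must be non-empty and built from strong characters only.
    if m == 0 or any(c not in strong for c in pattern):
        return 0
    # Dropping weak characters turns every weak-gap occurrence into a
    # contiguous match on the remaining strong characters.
    strong_only = [c for c in sequence if c in strong]
    target = list(pattern)
    count = 0
    i = 0
    while i + m <= len(strong_only):
        if strong_only[i:i + m] == target:
            count += 1
            i += m  # one-off: positions already used cannot be reused
        else:
            i += 1
    return count


def pattern_join(frequent):
    """Join length-m patterns into length-(m+1) candidates.

    Two patterns p and q are joined when p without its first character
    equals q without its last character; the candidate is p + q[-1].
    """
    candidates = []
    for p in frequent:
        for q in frequent:
            if p[1:] == q[:-1]:
                candidates.append(p + q[-1])
    return candidates


if __name__ == "__main__":
    strong = {"a", "b"}      # hypothetical split: 'c' is a weak character
    seq = "acbcabab"
    print(one_off_weak_gap_support(seq, "ab", strong))
    print(pattern_join(["ab", "ba"]))
```

Under these assumptions, a greedy left-to-right scan is enough, because one-off occurrences reduce to non-overlapping contiguous matches once the weak characters are removed. The paper's own support calculation and its complexity analysis may differ.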


    Published In

    ACM Transactions on Management Information Systems  Volume 13, Issue 3
    September 2022
    312 pages
    ISSN:2158-656X
    EISSN:2158-6578
    DOI:10.1145/3512349

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 04 February 2022
    Accepted: 01 July 2021
    Revised: 01 May 2021
    Received: 01 November 2020
    Published in TMIS Volume 13, Issue 3

    Author Tags

    1. Sequential pattern mining
    2. self-adaptive
    3. time series
    4. gap constraint
    5. weak gap

    Qualifiers

    • Research-article
    • Refereed

    Funding Sources

    • National Natural Science Foundation of China
    • National Key Research and Development Program of China
    • Natural Science Foundation of Hebei Province, China
