Efficient Algorithm for Mining Non-Redundant High-Utility Association Rules
Abstract
1. Introduction
1.1. Motivation
1.2. Contributions
- We provide a complete definition of NR-HARs based on HGB [15], following the established practice in association rule mining that every rule must satisfy a minimum confidence condition.
- We propose the LNR-HAR algorithm, which generates all NR-HARs from a lattice of HUIs.
- We evaluate the efficiency of the LNR-HAR algorithm under various experimental conditions, showing that it can serve as the algorithm of choice for real applications that need to generate NR-HARs.
2. Related Work
2.1. High-Utility Itemset Mining
2.2. High-Utility Association Rule Mining
3. Problem Statement
4. Mining NR-HARs from a Lattice of High-Utility Itemsets
4.1. Algorithm
Algorithm 1: LNR-HAR (HUCIL, min-uconf)
Input: HUCIL, min-uconf
Output: Set of NR-HARs: NRs
Methods:
FindNR-HARsFromLattice()
FindNR-HARs(node)
EnumerateNR-HARs(node)
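As a rough, hypothetical illustration of how the three methods could fit together, the Python sketch below assumes that each lattice node stores its itemset, IsClosed/IsGenerator flags and child links. All names in it (Node, enumerate_rules, find_rules_from_lattice and the node fields) are ours, not the authors'. A driver standing in for FindNR-HARsFromLattice and FindNR-HARs visits every node once. For each generator g, a breadth-first walk in the spirit of EnumerateNR-HARs evaluates the rule g → C∖g at every closed descendant C. We assume the utility confidence is u(g, C)/u(g), where u(g, C) sums the utility of g's items over the transactions containing C. We also prune below a closed node whose rule fails, which is safe because u(g, C) can only shrink as C grows. This is a sketch under those assumptions, not the authors' implementation.

```python
from collections import deque
from dataclasses import dataclass, field

# Example database (item -> quantity) and unit utilities from the example tables.
# Assumption: T5 is read as B:3, D:1, E:1 (the reading that matches the utility table).
DB = [
    {"B": 4, "D": 1, "E": 6, "F": 2}, {"C": 1, "E": 4, "F": 5},
    {"A": 4, "C": 1, "E": 5, "F": 1}, {"C": 1, "E": 2, "F": 6},
    {"B": 3, "D": 1, "E": 1}, {"A": 1, "F": 2, "G": 1},
    {"C": 1, "E": 1, "F": 4, "G": 1, "H": 1}, {"C": 7, "E": 3}, {"H": 10},
]
PRICE = {"A": 4, "B": 3, "C": 2, "D": 5, "E": 1, "F": 1, "G": 1, "H": 2}


@dataclass(eq=False)  # identity-based hashing so nodes can be kept in sets
class Node:
    """Hypothetical lattice node; field names are illustrative only."""
    items: frozenset
    is_closed: bool = False
    is_generator: bool = False
    children: list = field(default_factory=list)


def local_utility(x, y):
    """Utility of the items of x, summed over the transactions that contain y."""
    return sum(sum(t[i] * PRICE[i] for i in x) for t in DB if all(i in t for i in y))


def enumerate_rules(gen, min_uconf_pct):
    """Breadth-first walk from generator `gen`, emitting gen -> (C - gen) per closed C."""
    u_gen = local_utility(gen.items, gen.items)
    rules, queue, tracked = [], deque(gen.children), set(gen.children)
    while queue:
        node = queue.popleft()
        process_children = True
        if node.is_closed:
            lu = local_utility(gen.items, node.items)
            if 100 * lu >= min_uconf_pct * u_gen:  # integer test, no float rounding
                rules.append((gen.items, node.items - gen.items))
            else:
                # Descendants are supersets of node, so u(gen, descendant) <= lu:
                # their rules cannot pass either, and the branch is skipped.
                process_children = False
        if process_children:
            for child in node.children:
                if child not in tracked:  # tracked loosely mirrors trackingList
                    tracked.add(child)
                    queue.append(child)
    return rules


def find_rules_from_lattice(roots, min_uconf_pct):
    """Visit each node once and enumerate rules from every generator."""
    rules, visited, stack = [], set(), list(roots)
    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.add(node)  # loosely mirrors setting the node's Flag
        if node.is_generator:
            rules.extend(enumerate_rules(node, min_uconf_pct))
        stack.extend(node.children)
    return rules


# Hand-built sub-lattice of the HUIs containing A (edges = immediate supersets; assumed).
A = Node(frozenset("A"), is_generator=True)
AE = Node(frozenset("AE"), is_generator=True)
AF = Node(frozenset("AF"), is_closed=True)
ACE, AEF = Node(frozenset("ACE")), Node(frozenset("AEF"))
ACEF = Node(frozenset("ACEF"), is_closed=True)
A.children, AE.children, AF.children = [AE, AF], [ACE, AEF], [AEF]
ACE.children, AEF.children = [ACEF], [ACEF]

for lhs, rhs in find_rules_from_lattice([A], 80):
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)))
```

With min-uconf = 80%, this prints A -> F, A -> CEF and AE -> CF, which agree with the corresponding rows of the example rule table.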
4.2. Illustrations
- Enter the FindNR-HARs method with itemset A (node A).
- A.Flag is False and A.IsGenerator is True, so the EnumerateNR-HARs(A) method is called.
- In EnumerateNR-HARs with A as the input parameter, declare the queue Q = ∅ and MarkLNode = ∅.
- Enqueue the child nodes of A (AE and AF) into Q and extend trackingList with them.
- Next, take AE from Q.
- AE is not an HUCI, so its child nodes are pushed into Q and trackingList.
- Next, take AF from Q.
- Since AF.IsClosed = True (AF is an HUCI), the rule A→F is checked. Since this rule is valid, the ProcessChild variable is set to True, and all child nodes of AF are inserted into the queue.
- Next, take ACEF from Q.
- Since ACEF is an HUCI, the rule A→CEF is checked. The child nodes of ACEF would then be pushed into Q and trackingList, but ACEF has no child nodes, so nothing is added.
- FindNR-HARs is called recursively to process the child nodes {AE, AF} of A. The steps for finding rules from these child nodes are similar to those for the processing of node A.
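To see what the validity check in this walkthrough computes, take the utility confidence of a rule X → Y as the share of X's utility that occurs in transactions containing the whole of XY. This is our reading of the definition; it reproduces the confidence column of the example rule table. With quantities q(i, T) and unit utilities p(i) from the example tables, A occurs only in T3 (quantity 4) and T6 (quantity 1), so the two candidate rules met above evaluate as follows:

```latex
\begin{aligned}
\mathrm{uconf}(X \rightarrow Y) &= \frac{u(X, XY)}{u(X)}, \qquad
u(X, XY) = \sum_{T \supseteq XY} \sum_{i \in X} q(i, T)\, p(i) \\
u(A) &= 4 \cdot 4 + 1 \cdot 4 = 20 \quad (T_3, T_6) \\
\mathrm{uconf}(A \rightarrow F) &= \frac{16 + 4}{20} = 100\% \quad (AF \subseteq T_3, T_6) \\
\mathrm{uconf}(A \rightarrow CEF) &= \frac{16}{20} = 80\% \quad (ACEF \subseteq T_3 \text{ only})
\end{aligned}
```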
Algorithm 2: HGB* (HUCI, min-uconf)
Input: HUCI, min-uconf, min-util
Output: Set of non-redundant high-utility association rules: RuleSet
5. Experimental Results
5.1. Runtime for Mining NR-HARs
5.2. Memory Usage for Mining Non-Redundant Association Rules
6. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
- Fagerstrøm, A.; Eriksson, N.; Sigurðsson, V. What’s the “Thing” in Internet of Things in Grocery Shopping? A Customer Approach. Procedia Comput. Sci. 2017, 121, 384–388.
- Dogan, O.; Bayo-Monton, J.L.; Fernandez-Llatas, C.; Oztaysi, B. Analyzing of Gender Behaviors from Paths Using Process Mining: A Shopping Mall Application. Sensors 2019, 19, 557.
- Bok, K.; Jeong, J.; Choi, D.; Yoo, J. Detecting Incremental Frequent Subgraph Patterns in IoT Environments. Sensors 2018, 18, 4020.
- Ismail, W.N.; Hassan, M.M. Mining Productive-Associated Periodic-Frequent Patterns in Body Sensor Data for Smart Home Care. Sensors 2017, 17, 952.
- Xin, L.; Hongxa, M. An Efficient Incremental Mining Algorithm for Discovering Sequential Pattern in Wireless Sensor Network Environments. Sensors 2019, 19, 29.
- Agrawal, R.; Imielinski, T.; Swami, A. Mining association rules between sets of items in large databases. ACM SIGMOD Rec. 1993, 22, 207–216.
- Vo, B.; Nguyen, H.; Le, B. Mining High Utility Itemsets from Vertical Distributed Databases. In Proceedings of the 2009 IEEE-RIVF International Conference on Computing and Communication Technologies, Da Nang, Vietnam, 13–17 July 2009; pp. 1–4.
- Liu, Y.; Liao, W.-K.; Choudhary, A. A Two-Phase Algorithm for Fast Discovery of High Utility Itemsets. Adv. Concepts Intell. Vis. Syst. 2005, 3518, 689–695.
- Liu, M.; Qu, J. Mining high utility itemsets without candidate generation. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management, Maui, HI, USA, 29 October–2 November 2012; pp. 55–64.
- Zida, S.; Fournier-Viger, P.; Lin, J.C.-W.; Wu, C.; Tseng, V.S. EFIM: A Fast and Memory Efficient Algorithm for High-Utility Itemset Mining. Knowl. Inf. Syst. 2017, 51, 595–625.
- Tseng, V.S.; Wu, C.; Fournier-Viger, P.; Yu, P.S. Efficient Algorithms for Mining Top-K High-utility Itemsets. IEEE Trans. Knowl. Data Eng. 2015, 28, 54–67.
- Gan, W.; Lin, J.C.-W.; Fournier-Viger, P.; Chao, H.-C. More Efficient Algorithms for Mining High-Utility Itemsets with Multiple Minimum Utility Thresholds. Comput. Vis. 2016, 9827, 71–87.
- Agrawal, R.; Srikant, R. Fast algorithms for mining association rules. VLDB 1994, 1215, 487–499.
- Nguyen, L.T.; Nguyen, P.; Nguyen, T.D.; Vo, B.; Fournier-Viger, P.; Tseng, V.S. Mining high-utility itemsets in dynamic profit databases. Knowl.-Based Syst. 2019, 175, 130–144.
- Sahoo, J.; Das, A.K.; Goswami, A. An efficient approach for mining association rules from high utility itemsets. Expert Syst. Appl. 2015, 42, 5754–5778.
- Mai, T.; Vo, B.; Nguyen, L.T. A lattice-based approach for mining high utility association rules. Inf. Sci. 2017, 399, 81–97.
- Ahmed, C.F.; Tanbeer, S.K.; Jeong, B.S.; Lee, Y.K. Efficient Tree Structures for High Utility Pattern Mining in Incremental Databases. IEEE Trans. Knowl. Data Eng. 2009, 21, 1708–1721.
- Fournier-Viger, P.; Wu, C.; Zida, S.; Tseng, V.S. FHM: Faster High-Utility Itemset Mining Using Estimated Utility Co-occurrence Pruning. Adv. Concepts Intell. Vis. Syst. 2014, 8502, 83–92.
- Krishnamoorthy, S. HMiner: Efficiently mining high utility itemsets. Expert Syst. Appl. 2017, 90, 168–183.
- Duong, Q.-H.; Fournier-Viger, P.; Ramampiaro, H.; Nørvåg, K.; Dam, T.-L. Efficient High-utility Itemset Mining using Buffered Utility-Lists. Appl. Intell. 2018, 48, 1859–1877.
- Yun, U.; Kim, D.; Yoon, E.; Fujita, H. Damped window based high average utility pattern mining over data streams. Knowl.-Based Syst. 2018, 144, 188–205.
- Kannimuthu, S.; Premalatha, K. Discovery of High Utility Itemsets Using Genetic Algorithm with Ranked Mutation. Appl. Artif. Intell. 2014, 28, 337–359.
- Song, W.; Huang, C. Mining High Utility Itemsets Using Bio-Inspired Algorithms: A Diverse Optimal Value Framework. IEEE Access 2018, 6, 19568–19582.
- Lin, J.C.-W.; Yang, L.; Fournier-Viger, P.; Hong, T.-P.; Voznak, M. A binary PSO approach to mine high-utility itemsets. Soft Comput. 2017, 21, 5103–5121.
- Dawar, S.; Goyal, V.; Bera, D. A hybrid framework for mining high-utility itemsets in a sparse transaction database. Appl. Intell. 2017, 47, 809–827.
- Qu, J.-F.; Liu, M.; Xin, C.; Wu, Z. Fast Identification of High Utility Itemsets from Candidates. Information 2018, 9, 119.
- Wu, J.M.-T.; Lin, J.C.-W.; Tamrakar, A. High-Utility Itemset Mining with Effective Pruning Strategies. ACM Trans. Knowl. Discov. Data 2019, 13, 1–22.
- Gan, W.; Lin, C.-W.; Fournier-Viger, P.; Chao, H.-C.; Yu, P.S. HUOPM: High-utility Occupancy Pattern Mining. IEEE Trans. Cybern. 2020, 50, 1195–1208.
- Gan, W.; Lin, C.-W.; Fournier-Viger, P.; Chao, H.-C.; Tseng, V.; Yu, P. A Survey of Utility-Oriented Pattern Mining. IEEE Trans. Knowl. Data Eng. 2019, 1.
- Lin, J.C.-W.; Li, T.; Fournier-Viger, P.; Hong, T.-P.; Zhan, J.; Voznak, M. An efficient algorithm to mine high average-utility itemsets. Adv. Eng. Inform. 2016, 30, 233–243.
- Lin, J.C.-W.; Ren, S.; Fournier-Viger, P. MEMU: More Efficient Algorithm to Mine High Average-Utility Patterns With Multiple Minimum Average-Utility Thresholds. IEEE Access 2018, 6, 7593–7609.
- Zhang, B.; Lin, J.C.-W.; Shao, Y.; Fournier-Viger, P.; Djenouri, Y. Maintenance of Discovered High Average-Utility Itemsets in Dynamic Databases. Appl. Sci. 2018, 8, 769.
- Lee, N.; Park, S.-H.; Moon, S. Utility-based association rule mining: A marketing solution for cross-selling. Expert Syst. Appl. 2013, 40, 2715–2725.
- Choi, V. Faster Algorithms for Constructing a Concept (Galois) Lattice. arXiv 2006, arXiv:cs.DM/0602069.
- Davey, B.A.; Priestley, H.A. Introduction to Lattices and Order; Cambridge University Press: Cambridge, UK, 1990.
- Mai, T.; Nguyen, L.T.T. An efficient approach for mining closed high utility itemsets and generators. J. Inf. Telecommun. 2017, 1, 193–207.
- Vo, B.; Hong, T.-P.; Le, B. A lattice-based approach for mining most generalization association rules. Knowl.-Based Syst. 2013, 45, 20–30.
- Vo, B.; Le, B. Mining minimal non-redundant association rules using frequent itemsets lattice. Int. J. Intell. Syst. Technol. Appl. 2011, 10, 92–106.
- Vo, B.; Le, B. Interestingness measures for association rules: Combination between lattice and hash tables. Expert Syst. Appl. 2011, 38, 11630–11640.
- Fournier-Viger, P.; Gomariz, A.; Soltani, A.; Gueniche, T. SPMF: A Java open-source pattern mining library. J. Mach. Learn. Res. 2014, 15, 3389–3393.
Tid | Transaction |
---|---|
T1 | B:4, D:1, E:6, F:2 |
T2 | C:1, E:4, F:5 |
T3 | A:4, C:1, E:5, F:1 |
T4 | C:1, E:2, F:6 |
T5 | B:3, D:1, E:1 |
T6 | A:1, F:2, G:1 |
T7 | C:1, E:1, F:4, G:1, H:1 |
T8 | C:7, E:3 |
T9 | H:10 |
Item | Utility |
---|---|
A | 4 |
B | 3 |
C | 2 |
D | 5 |
E | 1 |
F | 1 |
G | 1 |
H | 2 |
Itemset | Utility | Itemset | Utility | Itemset | Utility |
---|---|---|---|---|---|
A | 20 | AF | 23 | ACE | 23 |
B | 21 | BE | 28 | BEF | 20 |
C | 22 | BD | 31 | BDE | 38 |
E | 22 | CE | 37 | CEF | 36 |
F | 20 | CF | 24 | ACEF | 24 |
H | 22 | EF | 36 | BDEF | 25 |
AE | 21 | AEF | 22 |
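The utilities listed above can be checked mechanically. The short Python sketch below is a brute-force check, not the mining algorithm used in the paper. It computes u(X) for every itemset of the example database and keeps those with u(X) ≥ 20. The threshold of 20 is our assumption, chosen because it is the smallest utility in the table. T5 is encoded as B:3, D:1, E:1, the reading that reproduces the listed utilities of C, CE, BD and BDE.

```python
from itertools import combinations

# Example database (item -> purchase quantity), one dict per transaction T1..T9.
# Assumption: T5 is read as B:3, D:1, E:1.
DB = [
    {"B": 4, "D": 1, "E": 6, "F": 2},
    {"C": 1, "E": 4, "F": 5},
    {"A": 4, "C": 1, "E": 5, "F": 1},
    {"C": 1, "E": 2, "F": 6},
    {"B": 3, "D": 1, "E": 1},
    {"A": 1, "F": 2, "G": 1},
    {"C": 1, "E": 1, "F": 4, "G": 1, "H": 1},
    {"C": 7, "E": 3},
    {"H": 10},
]
# Unit utility of each item, from the item-utility table.
PRICE = {"A": 4, "B": 3, "C": 2, "D": 5, "E": 1, "F": 1, "G": 1, "H": 2}


def utility(itemset):
    """u(X): quantity * unit utility of X's items, summed over transactions containing X."""
    return sum(
        sum(t[i] * PRICE[i] for i in itemset)
        for t in DB
        if all(i in t for i in itemset)
    )


def high_utility_itemsets(min_util):
    """Brute-force enumeration of every itemset with u(X) >= min_util (255 candidates)."""
    items = sorted(PRICE)
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            u = utility(combo)
            if u >= min_util:
                result["".join(combo)] = u
    return result


huis = high_utility_itemsets(20)
print(len(huis), huis["CE"], huis["BDE"])  # 20 37 38
```

The check reports exactly the 20 itemsets of the table, with the same utilities.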
Rule (X→Y) | Utility confidence (%) | Utility of X∪Y | Support of X |
---|---|---|---|
A→F | 100 | 23 | 2 |
F→E | 90 | 36 | 6 |
A→CEF | 80 | 24 | 2 |
B→DE | 100 | 38 | 2 |
BEF→D | 100 | 25 | 1 |
C→E | 100 | 37 | 5 |
E→F | 81 | 36 | 7 |
AE→CF | 100 | 24 | 1 |
F→CE | 80 | 36 | 6 |
CF→E | 100 | 36 | 4 |
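The rules listed above can likewise be regenerated by brute force. The Python sketch below is a reference check under our own reading of the definitions, not the LNR-HAR algorithm:

- An HUI is closed if it equals the intersection of the transactions containing it.
- An HUI is a generator if no proper subset that is also an HUI has the same support.
- Each generator g and closed HUI C ⊋ g give a candidate rule g → C∖g with utility confidence u(g, C)/u(g).

With min-util = 20 and min-uconf = 80% (assumed thresholds for the example), it yields exactly the ten rules of the table.

```python
from itertools import combinations

# Example database (item -> quantity); assumption: T5 is read as B:3, D:1, E:1.
DB = [
    {"B": 4, "D": 1, "E": 6, "F": 2}, {"C": 1, "E": 4, "F": 5},
    {"A": 4, "C": 1, "E": 5, "F": 1}, {"C": 1, "E": 2, "F": 6},
    {"B": 3, "D": 1, "E": 1}, {"A": 1, "F": 2, "G": 1},
    {"C": 1, "E": 1, "F": 4, "G": 1, "H": 1}, {"C": 7, "E": 3}, {"H": 10},
]
PRICE = {"A": 4, "B": 3, "C": 2, "D": 5, "E": 1, "F": 1, "G": 1, "H": 2}


def tids(itemset):
    """Indices of the transactions that contain every item of itemset."""
    return frozenset(k for k, t in enumerate(DB) if all(i in t for i in itemset))


def local_utility(x, y):
    """u(x, y): utility of x's items summed over the transactions containing y (x within y)."""
    return sum(DB[k][i] * PRICE[i] for k in tids(y) for i in x)


def closure(x):
    """Intersection of all transactions that contain x."""
    return frozenset.intersection(*(frozenset(DB[k]) for k in tids(x)))


def nr_hars(min_util, min_uconf_pct):
    items = sorted(PRICE)
    huis = [frozenset(c) for n in range(1, len(items) + 1)
            for c in combinations(items, n) if local_utility(c, c) >= min_util]
    closed = [x for x in huis if closure(x) == x]
    # Assumed generator notion: no proper subset among the HUIs has the same support.
    gens = [g for g in huis if not any(y < g and tids(y) == tids(g) for y in huis)]
    rules = []
    for g in gens:
        u_g = local_utility(g, g)
        for c in closed:
            if g < c:
                lu = local_utility(g, c)
                # Integer comparison avoids float rounding at the threshold boundary.
                if 100 * lu >= min_uconf_pct * u_g:
                    rules.append(("".join(sorted(g)), "".join(sorted(c - g)), lu / u_g))
    return rules


for lhs, rhs, conf in nr_hars(20, 80):
    print(f"{lhs} -> {rhs}: {conf:.1%}")
```

Lowering min_uconf_pct to 70 would additionally admit EF → C (28/36 ≈ 77.8%), which illustrates how the rule count grows as min-uconf decreases.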
Dataset | Transactions | Items | Size (MB) | Type |
---|---|---|---|---|
Chess | 3196 | 75 | 0.63 | Dense |
Mushroom | 8124 | 119 | 1.03 | Dense |
Accidents | 340,183 | 468 | 63.1 | Dense |
Retail | 88,162 | 16,470 | 6.42 | Sparse |
Chainstore | 1,112,949 | 46,086 | 79.2 | Sparse |
min-uconf (%) | Chess min-util (%) | Chess # of NR-HARs | Mushroom min-util (%) | Mushroom # of NR-HARs | Retail min-util (%) | Retail # of NR-HARs | Chainstore min-util (%) | Chainstore # of NR-HARs | Accidents min-util (%) | Accidents # of NR-HARs |
---|---|---|---|---|---|---|---|---|---|---|
90 | 25 | 47,622 | 10 | 2405 | 0.01 | 1859 | 0.005 | 49 | 10 | 25,061 |
80 | 25 | 161,631 | 10 | 2774 | 0.01 | 5573 | 0.005 | 69 | 10 | 100,614 |
70 | 25 | 325,207 | 10 | 3178 | 0.01 | 12,819 | 0.005 | 143 | 10 | 232,272 |
60 | 25 | 439,584 | 10 | 3555 | 0.01 | 20,967 | 0.005 | 366 | 10 | 422,415 |
90 | 26 | 19,626 | 11 | 1376 | 0.02 | 437 | 0.01 | 16 | 11 | 6411 |
80 | 26 | 61,469 | 11 | 1481 | 0.02 | 1441 | 0.01 | 19 | 11 | 23,911 |
70 | 26 | 111,304 | 11 | 1616 | 0.02 | 3751 | 0.01 | 31 | 11 | 51,700 |
60 | 26 | 139,006 | 11 | 1721 | 0.02 | 6629 | 0.01 | 67 | 11 | 83,388 |
90 | 27 | 8028 | 12 | 685 | 0.03 | 219 | 0.02 | 9 | 12 | 1657 |
80 | 27 | 22,945 | 12 | 707 | 0.03 | 664 | 0.02 | 11 | 12 | 5568 |
70 | 27 | 36,495 | 12 | 740 | 0.03 | 1733 | 0.02 | 12 | 12 | 11,332 |
60 | 27 | 39,614 | 12 | 757 | 0.03 | 3132 | 0.02 | 15 | 12 | 17,778 |
90 | 28 | 2788 | 13 | 334 | 0.04 | 149 | 0.03 | 5 | 13 | 367 |
80 | 28 | 7215 | 13 | 334 | 0.04 | 394 | 0.03 | 6 | 13 | 1024 |
70 | 28 | 9203 | 13 | 340 | 0.04 | 1025 | 0.03 | 6 | 13 | 1855 |
60 | 28 | 9286 | 13 | 340 | 0.04 | 1862 | 0.03 | 7 | 13 | 2453 |
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Mai, T.; Nguyen, L.T.T.; Vo, B.; Yun, U.; Hong, T.-P. Efficient Algorithm for Mining Non-Redundant High-Utility Association Rules. Sensors 2020, 20, 1078. https://doi.org/10.3390/s20041078