Efficiently mining uncertain high-utility itemsets

Authors: Jerry Chun-Wei Lin, Philippe Fournier-Viger, Tzung-Pei Hong, Vincent S. Tseng

Soft Computing - A Fusion of Foundations, Methodologies and Applications, Volume 21, Issue 11

Pages 2801 - 2820

https://doi.org/10.1007/s00500-016-2159-1

Published: 01 June 2017

Abstract

Data mining consists of deriving implicit, potentially meaningful, and useful knowledge from databases, such as information about the most profitable items. High-utility itemset mining (HUIM) has thus emerged as an important research topic in data mining. However, most HUIM algorithms can only handle precise data, whereas big data collected in real-life applications through experimental measurements or noisy sensors is often uncertain. In this paper, an algorithm named Mining Uncertain High-Utility Itemsets (MUHUI) is proposed to efficiently discover potential high-utility itemsets (PHUIs) in uncertain data. Based on the probability-utility-list (PU-list) structure, MUHUI mines PHUIs directly without generating candidates, and it applies several efficient pruning strategies that avoid constructing PU-lists for numerous unpromising itemsets, which greatly improves its performance. Extensive experiments on both real-life and synthetic datasets show that the proposed algorithm significantly outperforms the state-of-the-art PHUI-List algorithm in terms of efficiency and scalability, and that MUHUI scales well when mining PHUIs in large-scale uncertain datasets.
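The abstract describes MUHUI only at a high level. As a rough illustration of how a probability-utility list can support candidate-free, depth-first mining, the following Python sketch assumes a simplified model that this page does not spell out: each transaction carries a single existence probability, an itemset's probability is the sum of the probabilities of the transactions that contain it, and an itemset counts as a PHUI when its utility reaches `min_util` and its probability reaches `min_prob`. The entry fields (`iutil`, `rutil`), the fixed item order, the toy database and every function name are hypothetical. The two pruning checks are generic utility-list bounds, not the specific pruning strategies of MUHUI.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical uncertain database: each transaction is (existence probability,
# {item: purchase quantity}). Tuple-level uncertainty is an assumption here.
Transaction = Tuple[float, Dict[str, int]]


@dataclass(frozen=True)
class Entry:
    tid: int      # transaction identifier
    prob: float   # existence probability of that transaction
    iutil: float  # utility of the itemset in the transaction
    rutil: float  # utility of the items that follow the itemset in the order


@dataclass
class PUList:
    items: Tuple[str, ...]
    entries: List[Entry]

    def utility(self) -> float:
        return sum(e.iutil for e in self.entries)

    def probability(self) -> float:
        return sum(e.prob for e in self.entries)

    def upper_bound(self) -> float:
        # Bounds the utility of every extension that appends later items.
        return sum(e.iutil + e.rutil for e in self.entries)


def build_singletons(db: List[Transaction], profit: Dict[str, int],
                     order: List[str]) -> List[PUList]:
    """One database scan builds the PU-list of every single item."""
    rank = {item: k for k, item in enumerate(order)}
    lists = {item: PUList((item,), []) for item in order}
    for tid, (prob, txn) in enumerate(db):
        present = sorted((i for i in txn if i in rank), key=rank.__getitem__)
        utils = [txn[i] * profit[i] for i in present]
        remaining = sum(utils)
        for item, u in zip(present, utils):
            remaining -= u  # utility of the items after `item` in this transaction
            lists[item].entries.append(Entry(tid, prob, u, remaining))
    return [lists[item] for item in order]


def join(prefix: Optional[PUList], px: PUList, py: PUList) -> PUList:
    """Build the PU-list of P+x+y from those of P+x and P+y without rescanning."""
    ys = {e.tid: e for e in py.entries}
    ps = {e.tid: e for e in prefix.entries} if prefix else {}
    out = []
    for ex in px.entries:
        ey = ys.get(ex.tid)
        if ey is None:
            continue
        # Both ex and ey already count the prefix utility, so subtract it once.
        shared = ps[ex.tid].iutil if prefix else 0
        out.append(Entry(ex.tid, ex.prob, ex.iutil + ey.iutil - shared, ey.rutil))
    return PUList(px.items + py.items[-1:], out)


def mine(prefix: Optional[PUList], lists: List[PUList], min_util: float,
         min_prob: float, found: Dict[Tuple[str, ...], Tuple[float, float]]) -> None:
    for k, px in enumerate(lists):
        prob = px.probability()
        if prob < min_prob:
            continue  # extensions occur in a subset of these transactions
        if px.utility() >= min_util:
            found[px.items] = (px.utility(), prob)
        if px.upper_bound() < min_util:
            continue  # no extension of px can reach min_util
        extensions = [join(prefix, px, py) for py in lists[k + 1:]]
        mine(px, extensions, min_util, min_prob, found)


# Toy data for illustration only.
profit = {"a": 5, "b": 2, "c": 1, "d": 3}
db = [
    (0.9, {"a": 1, "c": 2}),
    (0.7, {"a": 2, "b": 1, "d": 1}),
    (0.5, {"b": 3, "c": 1}),
    (0.8, {"a": 1, "b": 2, "c": 1, "d": 2}),
]
found: Dict[Tuple[str, ...], Tuple[float, float]] = {}
mine(None, build_singletons(db, profit, ["a", "b", "c", "d"]), 20, 1.4, found)
for items, (util, prob) in sorted(found.items()):
    print("".join(items), util, round(prob, 2))
```

With these toy thresholds (`min_util = 20`, `min_prob = 1.4`) the sketch reports {a}, {a, b}, {a, b, d} and {a, d}. {a, b, c} is skipped because it occurs only in a transaction with probability 0.8, and {a, c} is never extended because its utility upper bound is 19. Joining two lists by transaction identifier is what lets list-based algorithms of this kind evaluate an itemset without a further database scan.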

Cited By

Mathur P, Chand S (2025) Re-induction based mining for high utility item-sets. Applied Intelligence 55(1). doi:10.1007/s10489-024-05855-7. Online publication date: 1 Jan 2025.
https://dl.acm.org/doi/10.1007/s10489-024-05855-7

Kumar R, Singh K (2023) High utility itemsets mining from transactional databases: a survey. Applied Intelligence 53(22):27655-27703. doi:10.1007/s10489-023-04853-5. Online publication date: 16 Sep 2023.
https://dl.acm.org/doi/10.1007/s10489-023-04853-5

Gan W, Chen G, Yin H, Fournier-Viger P, Chen C, Yu P (2022) Towards Revenue Maximization with Popular and Profitable Products. ACM/IMS Transactions on Data Science 2(4):1-21. doi:10.1145/3488058. Online publication date: 24 May 2022.
https://dl.acm.org/doi/10.1145/3488058

Index Terms

Information systems → Information systems applications

Information

Published In

Soft Computing - A Fusion of Foundations, Methodologies and Applications, Volume 21, Issue 11

June 2017

326 pages

ISSN: 1432-7643

EISSN: 1433-7479

Copyright © 2017 Springer-Verlag Berlin Heidelberg.

Publisher

Springer-Verlag

Berlin, Heidelberg

Qualifiers

Article

Bibliometrics

Total Citations: 9
Total Downloads: 0
Downloads (last 12 months): 0
Downloads (last 6 weeks): 0
Reflects downloads up to 25 Jan 2025

Citations

Cited By

Gui Y, Gan W, Chen Y, Wu Y (2022) Mining with Rarity for Web Intelligence. In: Companion Proceedings of the Web Conference 2022, pp 973-981. doi:10.1145/3487553.3524708. Online publication date: 25 Apr 2022.
https://dl.acm.org/doi/10.1145/3487553.3524708

Wu J, Li Z, Srivastava G, Yun U, Lin J (2022) Analytics of high average-utility patterns in the industrial internet of things. Applied Intelligence 52(6):6450-6463. doi:10.1007/s10489-021-02751-2. Online publication date: 1 Apr 2022.
https://dl.acm.org/doi/10.1007/s10489-021-02751-2

Zhang C, Du Z, Yang Y, Gan W, Yu P (2021) On-Shelf Utility Mining of Sequence Data. ACM Transactions on Knowledge Discovery from Data 16(2):1-31. doi:10.1145/3457570. Online publication date: 21 Jul 2021.
https://dl.acm.org/doi/10.1145/3457570

Srivastava G, Lin J, Jolfaei A, Li Y, Djenouri Y (2021) Uncertain-Driven Analytics of Sequence Data in IoCV Environments. IEEE Transactions on Intelligent Transportation Systems 22(8):5403-5414. doi:10.1109/TITS.2020.3012387. Online publication date: 1 Aug 2021.
https://dl.acm.org/doi/10.1109/TITS.2020.3012387

Alhusaini N, Li J, Alhusaini A (2019) FLUI-Growth. In: Proceedings of the 2019 International Conference on Artificial Intelligence and Computer Science, pp 535-541. doi:10.1145/3349341.3349464. Online publication date: 12 Jul 2019.
https://dl.acm.org/doi/10.1145/3349341.3349464

Lin J, Wu J, Fournier-Viger P, Hong T, Li T (2019) Efficient Mining of High Average-Utility Sequential Patterns from Uncertain Databases. In: 2019 IEEE International Conference on Systems, Man and Cybernetics (SMC), pp 1989-1994. doi:10.1109/SMC.2019.8914546. Online publication date: 6 Oct 2019.
https://dl.acm.org/doi/10.1109/SMC.2019.8914546
