research-article

Efficient Algorithms for Mining the Concise and Lossless Representation of High Utility Itemsets

Published: 01 March 2015

Abstract

Mining high utility itemsets (HUIs) from databases is an important data mining task that refers to the discovery of itemsets with high utilities (e.g., high profits). However, the task may present too many HUIs to users, which also degrades the efficiency of the mining process. To achieve high efficiency for the mining task and provide a concise mining result to users, we propose a novel framework for mining closed+ high utility itemsets (CHUIs), which serve as a compact and lossless representation of HUIs. We propose three efficient algorithms to find this representation: AprioriHC (Apriori-based algorithm for mining High utility Closed+ itemsets), AprioriHC-D (AprioriHC algorithm with Discarding unpromising and isolated items), and CHUD (Closed+ High Utility Itemset Discovery). Further, a method called DAHU (Derive All High Utility Itemsets) is proposed to recover all HUIs from the set of CHUIs without accessing the original database. Results on real and synthetic datasets show that the proposed algorithms are very efficient and that our approaches achieve a massive reduction in the number of HUIs. In addition, when all HUIs can be recovered by DAHU, the combination of CHUD and DAHU outperforms the state-of-the-art algorithms for mining HUIs.
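As an illustration of the terms used in the abstract, the following minimal sketch enumerates HUIs by brute force and then keeps only those that are closed, i.e. that have no proper superset with the same support. It is not AprioriHC, AprioriHC-D, CHUD, or DAHU, and it omits the extra utility information the paper attaches to each closed+ itemset for lossless recovery. It assumes the usual definition of utility (purchased quantity times unit profit, summed over the transactions containing the itemset). The toy data and the names `transactions`, `unit_profit`, `utility`, `support`, `high_utility_itemsets`, and `closed_high_utility_itemsets` are hypothetical.

```python
from itertools import combinations

# Hypothetical toy database: each transaction maps item -> purchased quantity.
transactions = [
    {"a": 1, "b": 2},
    {"a": 2, "b": 1, "c": 1},
    {"b": 3, "c": 2},
]
# Hypothetical unit profit of each item.
unit_profit = {"a": 5, "b": 2, "c": 1}


def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if all(i in t for i in itemset))


def utility(itemset, transactions, unit_profit):
    """Sum of quantity * unit profit over the transactions containing `itemset`."""
    total = 0
    for t in transactions:
        if all(i in t for i in itemset):
            total += sum(t[i] * unit_profit[i] for i in itemset)
    return total


def high_utility_itemsets(transactions, unit_profit, min_utility):
    """Brute-force enumeration: every itemset whose utility is >= min_utility."""
    items = sorted(unit_profit)
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            u = utility(combo, transactions, unit_profit)
            if u >= min_utility:
                result[frozenset(combo)] = u
    return result


def closed_high_utility_itemsets(transactions, unit_profit, min_utility):
    """Keep only HUIs that have no proper superset with the same support.

    Checking one-item extensions is enough, because support can only
    stay equal or drop when items are added.
    """
    items = sorted(unit_profit)
    closed = {}
    huis = high_utility_itemsets(transactions, unit_profit, min_utility)
    for itemset, u in huis.items():
        sup = support(itemset, transactions)
        same_support_superset = any(
            support(itemset | {i}, transactions) == sup
            for i in items
            if i not in itemset
        )
        if not same_support_superset:
            closed[itemset] = u
    return closed


if __name__ == "__main__":
    for name, fn in [("HUIs", high_utility_itemsets),
                     ("closed HUIs", closed_high_utility_itemsets)]:
        found = fn(transactions, unit_profit, min_utility=12)
        print(name, {tuple(sorted(s)): u for s, u in found.items()})
```

On this toy input with a minimum utility of 12, the itemset {a} is a HUI but is not closed, because {a, b} appears in exactly the same transactions. The closed set is therefore smaller than the full set of HUIs, which is the kind of reduction the abstract refers to.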

Published In

IEEE Transactions on Knowledge and Data Engineering, Volume 27, Issue 3
March 2015
286 pages

Publisher

IEEE Educational Activities Department, United States

Author Tags

1. data mining
2. frequent itemset
3. closed+ high utility itemset
4. lossless and concise representation
5. utility mining
