Finding efficiencies in frequent pattern mining from big uncertain data

Published: 01 May 2017

Abstract

Many existing data mining algorithms search for interesting patterns in transactional databases of precise data. However, there are situations in which data are uncertain. Each item in a transaction of such a probabilistic database of uncertain data is usually associated with an existential probability, which expresses the likelihood that the item is present in the transaction. Compared with mining precise data, the search space for mining uncertain data is much larger because of these existential probabilities. The problem worsens as we move into the era of Big data. Moreover, in many real-life applications, users may be interested in only a tiny portion of this large search space when mining Big data. Without a way for users to express which patterns they find interesting, many existing data mining algorithms return numerous patterns, only some of which are interesting. In this article, we propose an algorithm that (i) allows users to express their interest in terms of constraints, (ii) uses the MapReduce model to mine uncertain Big data for frequent patterns that satisfy the user-specified anti-monotone and monotone constraints, and (iii) balances the load.
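To make the abstract's terms concrete, here is a minimal Python sketch. It assumes the common expected-support model for uncertain data: the expected support of an itemset X is the sum, over all transactions t, of the product of the existential probabilities P(x, t) of the items x in X. This model assumes that items occur independently. The sketch also illustrates one anti-monotone constraint (every item's price is at most 50) and one monotone constraint (some item's price is at most 20), evaluated in a map/reduce style on a single machine. The database `DB`, the `PRICE` table, the thresholds, and the functions `map_phase`, `reduce_phase`, and `mine` are hypothetical. This is not the authors' algorithm: it omits their load balancing and does not use any real MapReduce framework.

```python
from collections import defaultdict
from itertools import combinations
from math import prod

# Hypothetical uncertain database: each transaction maps item -> existential probability.
DB = [
    {"a": 0.9, "b": 0.5, "c": 0.4},
    {"a": 0.8, "c": 0.6},
    {"b": 0.7, "c": 0.9, "d": 0.2},
]

# Hypothetical item attribute used by the illustrative constraints.
PRICE = {"a": 10, "b": 40, "c": 25, "d": 80}


def anti_monotone_ok(itemset):
    # max(price) <= 50: once an itemset violates this, every superset violates it too.
    return max(PRICE[i] for i in itemset) <= 50


def monotone_ok(itemset):
    # min(price) <= 20: once an itemset satisfies this, every superset satisfies it too.
    return min(PRICE[i] for i in itemset) <= 20


def map_phase(transaction):
    """Emit (itemset, probability) pairs for one transaction."""
    # Push the anti-monotone constraint down: an item that violates it on its own
    # cannot appear in any valid itemset, so it is dropped before enumeration.
    items = sorted(i for i in transaction if anti_monotone_ok((i,)))
    for k in range(1, len(items) + 1):
        for itemset in combinations(items, k):
            # Independence assumption: P(itemset in t) = product of item probabilities.
            yield itemset, prod(transaction[i] for i in itemset)


def reduce_phase(pairs):
    """Sum per-transaction probabilities into expected supports."""
    exp_sup = defaultdict(float)
    for itemset, p in pairs:
        exp_sup[itemset] += p
    return exp_sup


def mine(db, minsup):
    """Return itemsets whose expected support >= minsup and that satisfy both constraints."""
    pairs = (pair for t in db for pair in map_phase(t))
    exp_sup = reduce_phase(pairs)
    return {
        itemset: round(s, 4)
        for itemset, s in exp_sup.items()
        # Re-check the anti-monotone constraint, since item-level pruning alone is
        # only complete for constraints (like max) that are decided item by item.
        if s >= minsup and anti_monotone_ok(itemset) and monotone_ok(itemset)
    }


if __name__ == "__main__":
    for itemset, s in sorted(mine(DB, minsup=0.5).items()):
        print(itemset, s)
    # prints:
    # ('a',) 1.7
    # ('a', 'c') 0.84
```

With minsup = 0.5, the itemsets ('b',), ('c',), and ('b', 'c') are also frequent, but they fail the monotone constraint because none of them contains an item priced at 20 or less. Enumerating every subset of every transaction is exponential in transaction length, which mirrors the search-space growth the abstract describes. A practical miner would rely on stronger pruning and on level-wise or tree-based candidate generation instead.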

Published In

World Wide Web, Volume 20, Issue 3 (May 2017), 155 pages

Publisher

Kluwer Academic Publishers, United States

Author Tags

1. Algorithms and programming techniques for big data processing
2. Big data analytics
3. Big data models and algorithms
4. Big data search and mining
5. Constraints
6. Frequent patterns
7. Uncertain data
