DOI: 10.1145/2492517.2500317
research-article

Enhancing text clustering model based on truncated singular value decomposition, fuzzy ART and cross validation

Published: 25 August 2013

Abstract

Research on numerical schemes for clustering models has been intensive over the past decade. The main difficulties stem from the curse of dimensionality and from the need for cost functions that reflect general knowledge about the internal structure and distribution of the target data. Traditional computational clustering and variable selection schemes struggle to reach high accuracy on this type of problem. Hence, the present study proposes a novel semantic-based scheme to enhance clustering accuracy. The results show that our conceptual model is automatic and optimal. Favorable comparisons with experimental studies demonstrate the multidisciplinary applicability of our approach.
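The abstract names the building blocks (truncated SVD, Fuzzy ART) without showing how they fit together. The following is a minimal sketch, not the authors' implementation, assuming NumPy and a dense document-term matrix whose rows are documents (for example TF-IDF weights). Truncated SVD projects the documents into a k-dimensional latent semantic space, each latent feature is min-max scaled to [0, 1], and Fuzzy ART, as introduced by Carpenter, Grossberg and Rosen (1991), clusters the result using complement coding. The function names `truncated_svd`, `lsa_embed` and `fuzzy_art`, the scaling step, and the default parameters (rank `k`, vigilance `rho`, choice parameter `alpha`, learning rate `beta`) are hypothetical choices for illustration. The cross-validation step named in the title is not reproduced here.

```python
import numpy as np


def truncated_svd(X, k):
    """Rank-k truncated SVD: keep only the k largest singular triplets of X."""
    # Thin SVD followed by truncation; adequate for small illustrative matrices.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]


def lsa_embed(X, k):
    """Map documents (rows of X) into a k-dimensional latent semantic space.

    Each latent feature is then min-max scaled to [0, 1], because Fuzzy ART
    expects inputs in that range (an assumption of this sketch).
    """
    U, s, _ = truncated_svd(X, k)
    Z = U * s  # document coordinates: rows of U_k Sigma_k
    lo, hi = Z.min(axis=0), Z.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero for constant features
    return (Z - lo) / span


def fuzzy_art(data, rho=0.75, alpha=0.001, beta=1.0):
    """Cluster rows of `data` (values in [0, 1]) with Fuzzy ART; returns one label per row."""
    inputs = np.hstack([data, 1.0 - data])  # complement coding keeps |I| constant
    weights = []  # one weight vector per committed category
    labels = []
    for x in inputs:
        # Choice function T_j = |x ^ w_j| / (alpha + |w_j|), where ^ is element-wise min.
        scores = [np.minimum(x, w).sum() / (alpha + w.sum()) for w in weights]
        chosen = None
        for j in np.argsort(scores)[::-1]:  # search categories from highest T_j down
            match = np.minimum(x, weights[j]).sum() / x.sum()
            if match >= rho:  # vigilance test passed: resonance, so update this category
                weights[j] = beta * np.minimum(x, weights[j]) + (1.0 - beta) * weights[j]
                chosen = int(j)
                break
        if chosen is None:  # no category resonates: commit a new one (fast commit)
            weights.append(x.copy())
            chosen = len(weights) - 1
        labels.append(chosen)
    return np.array(labels)


# Toy document-term matrix: documents 0-1 share terms 0-1, documents 2-3 share terms 2-3.
X = np.array([[3, 1, 0, 0],
              [1, 3, 0, 0],
              [0, 0, 2, 2],
              [0, 0, 1, 3]], dtype=float)
labels = fuzzy_art(lsa_embed(X, k=2), rho=0.75)
print(labels)  # [0 0 1 1]
```

Raising `rho` tends to produce more, tighter categories, while lowering it merges documents into fewer ones (with `rho=0.0` all four toy documents fall into a single category). For larger corpora, scikit-learn's `sklearn.decomposition.TruncatedSVD` computes the same rank-k projection without a full decomposition.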

Cited By

  • (2024) Data Mining Efficiency in the ESG Indexes Verbalization Analysis (on the Example of the MSCI Site). In Ecological Footprint of the Modern Economy and the Ways to Reduce It, pages 13-16. DOI: 10.1007/978-3-031-49711-7_3. Online publication date: 27-Feb-2024.
  • (2017) Quality and performance evaluation of the algorithms KMART and FCM for fuzzy clustering and categorization. In 2017 IEEE 21st International Conference on Intelligent Engineering Systems (INES), pages 285-290. DOI: 10.1109/INES.2017.8118571. Online publication date: Oct-2017.
  • (2014) A new conceptual model for dynamic text clustering Using unstructured text as a case. In Proceedings of the 2014 International C* Conference on Computer Science & Software Engineering, pages 1-7. DOI: 10.1145/2641483.2641538. Online publication date: 3-Aug-2014.


Information

Published In

ASONAM '13: Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining
August 2013
1558 pages
ISBN: 9781450322409
DOI: 10.1145/2492517
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. NLP
  2. TSVD
  3. data mining
  4. learning
  5. model selection
  6. semantic analysis
  7. variable selection

Qualifiers

  • Research-article

Conference

ASONAM '13
Sponsor:
ASONAM '13: Advances in Social Networks Analysis and Mining 2013
August 25-28, 2013
Niagara, Ontario, Canada

Acceptance Rates

Overall Acceptance Rate 116 of 549 submissions, 21%

Article Metrics

  • Downloads (last 12 months): 1
  • Downloads (last 6 weeks): 0
Reflects downloads up to 06 Nov 2024