DOI: 10.1145/3448734.3450777

Research on Multi-granularity Ensemble Learning Based on Korean

Published: 17 May 2021

Abstract

Ensemble learning trains and combines multiple classifiers, using their predictions as new features to train a meta-classifier, which improves model accuracy. This paper proposes a multi-granularity model based on Stacking ensemble learning for Korean text classification. First, eojeol and subeojeol granularities are proposed according to the compositional structure of Korean. Since different feature granularities carry different semantic information, six granularities, namely phoneme, syllable, subword, word, subeojeol, and eojeol, are compared on the Korean text classification task. Second, suffix words are constructed based on Korean grammatical morphology, and the effects of the different granularities after suffix preprocessing are compared. Finally, a multi-granularity ensemble learning model for Korean, called MGEL-K, is proposed; the different granularities enrich the diversity of the ensemble by increasing the differences between the base learners. The results show that the proposed MGEL-K model performs best on the Korean text classification task, with an accuracy of 92.33%.




Published In

CONF-CDS 2021: The 2nd International Conference on Computing and Data Science
January 2021
1142 pages
ISBN: 9781450389570
DOI: 10.1145/3448734

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Ensemble learning
  2. Korean natural language processing
  3. multi-granularity segment
  4. text classification

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

CONF-CDS 2021
