DOI: 10.1145/3396452.3396460
research-article

Keywords Extraction Based on Word2Vec and TextRank

Published: 22 May 2020

Abstract

To improve keyword extraction by enriching the semantic representation of documents, we propose a method that combines a document's internal semantic information with word representations pre-trained on a large external corpus. First, we use the word-embedding tool Word2Vec to learn word vectors from external documents and measure word-to-word similarity by cosine similarity, which captures the semantic relations between words in the external corpus. Next, these word-to-word similarities replace the transition probability matrix of the TextRank word graph built for the target document. In addition, the title and abstract of the target document are used to construct the semantic word graph from which keywords are extracted. Experiments use academic papers from AMiner as the data set. The results show that our method outperforms TextRank: when five keywords are extracted, precision, recall, and F-score increase by 28.60%, 10.70%, and 12.90%, respectively, compared with TextRank alone.
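The abstract describes the method only at a high level, so the sketch below is one plausible reading of it, not the authors' implementation. Words from the target document's title and abstract are linked when they co-occur within a small window, each edge is weighted by the cosine similarity of the two words' Word2Vec vectors, and the standard weighted TextRank update WS(V_i) = (1 - d) + d * Σ_j [w_ji / Σ_k w_jk] * WS(V_j) is iterated, so that normalized similarities play the role of transition probabilities. The window size, the damping factor d = 0.85, skipping words without a vector, dropping non-positive similarities, and simply concatenating title and abstract tokens are all assumptions. The 3-dimensional vectors in the demo are hypothetical placeholders for embeddings pre-trained on an external corpus (in practice they might come from gensim's `Word2Vec`, read via `model.wv[word]`). A set-based precision/recall/F-score helper shows how a top-5 keyword list could be scored against gold keywords.

```python
import math
from collections import defaultdict


def cosine_similarity(u, v):
    """Cosine similarity of two dense vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_word_graph(tokens, vectors, window=3):
    """Undirected word graph: words at most window-1 positions apart are linked,
    and each edge is weighted by the cosine similarity of their vectors.
    Assumption: words without a vector are skipped."""
    graph = defaultdict(dict)
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            u = tokens[j]
            if u == w or w not in vectors or u not in vectors:
                continue
            sim = cosine_similarity(vectors[w], vectors[u])
            if sim > 0:  # assumption: keep only positive weights
                graph[w][u] = sim
                graph[u][w] = sim
    return dict(graph)


def weighted_textrank(graph, d=0.85, iterations=100, tol=1e-6):
    """Iterate WS(i) = (1 - d) + d * sum_j [w_ji / sum_k w_jk] * WS(j)."""
    nodes = list(graph)
    if not nodes:
        return {}
    score = {n: 1.0 for n in nodes}
    # Row sums turn similarity weights into transition probabilities.
    out_weight = {n: sum(graph[n].values()) for n in nodes}
    for _ in range(iterations):
        new = {
            n: (1 - d) + d * sum(graph[m][n] / out_weight[m] * score[m] for m in graph[n])
            for n in nodes
        }
        converged = max(abs(new[n] - score[n]) for n in nodes) < tol
        score = new
        if converged:
            break
    return score


def top_k(score, k=5):
    """Highest-scoring words; ties broken alphabetically."""
    ranked = sorted(score.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:k]]


def precision_recall_f1(predicted, gold):
    """Set-based precision, recall, and F-score of predicted vs. gold keywords."""
    hits = len(set(predicted) & set(gold))
    p = hits / len(predicted) if predicted else 0.0
    r = hits / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


if __name__ == "__main__":
    # Hypothetical 3-d vectors standing in for Word2Vec embeddings
    # pre-trained on a large external corpus.
    vectors = {
        "keyword": [0.9, 0.1, 0.0],
        "extraction": [0.8, 0.3, 0.1],
        "textrank": [0.7, 0.2, 0.3],
        "graph": [0.5, 0.5, 0.2],
        "word2vec": [0.6, 0.1, 0.6],
        "semantic": [0.4, 0.2, 0.7],
    }
    # Hypothetical title + abstract tokens after stop-word removal.
    tokens = ["keyword", "extraction", "word2vec", "textrank", "semantic",
              "graph", "keyword", "extraction", "textrank", "graph"]
    scores = weighted_textrank(build_word_graph(tokens, vectors))
    predicted = top_k(scores, k=5)
    print(predicted)
    print(precision_recall_f1(predicted, ["keyword", "textrank", "word2vec"]))
```

Because the graph is undirected and every node has at least one positive-weight edge, each row of normalized weights sums to 1, so the scores always sum to the number of nodes. Plain TextRank corresponds to setting every edge weight to 1.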

Published In

ACM Other conferences
ICBDE '20: Proceedings of the 2020 3rd International Conference on Big Data and Education
April 2020
85 pages
ISBN:9781450374989
DOI:10.1145/3396452

In-Cooperation

  • University of Sunderland, UK
  • City University of Hong Kong

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. TextRank
  2. keyword extraction
  3. word map
  4. word2vec

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICBDE '20

Article Metrics

  • Downloads (last 12 months): 52
  • Downloads (last 6 weeks): 2
Reflects downloads up to 28 Dec 2024
