DOI: 10.1145/3695652.3695671
research-article

A News Keyword Extraction Method Considering both Semantics and Pragmatics

Published: 15 October 2024

Abstract

Drawing on pragmatics theory, this study incorporates information about news users into keyword extraction. It integrates the popularity of news keywords among users into the TextRank model to improve single-document keyword extraction, and uses linguistic theory to examine the role pragmatics plays in the task. The method builds a candidate keyword graph, integrates each candidate's popularity among users into TextRank, and computes the probability transition matrix. Candidate scores are then calculated iteratively, and the top N words are selected as the news keywords. The analysis indicates that, from the standpoint of pragmatics theory, news keyword extraction needs to take user information into account, and that introducing keyword popularity retains both the semantic and the pragmatic information of news texts. On datasets that evaluate semantic information, the resulting Hot-TextRank algorithm outperforms the comparison methods across news domains at Top N=3 and Top N=5; at Top N=3, the average F-score improvement across domains is 2.2%. Experimental results further show that Hot-TextRank retains the pragmatic information of news texts and can extract keywords suited to user characteristics. The paper's contribution is the Hot-TextRank algorithm, which brings the popularity of news keywords among users into keyword extraction and provides technical support for news keyword extraction that accounts for user information.
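The pipeline outlined in the abstract (candidate keyword graph, popularity-aware transition matrix, iterative scoring, top-N selection) can be illustrated with a small sketch. The abstract does not state how popularity enters the transition matrix, so the version below makes an assumption: each co-occurrence edge is scaled by the popularity of the word it points to, and rows are then normalized. The function name `hot_textrank_sketch`, the `popularity` dictionary, and the window, damping, and iteration defaults are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def hot_textrank_sketch(tokens, popularity, window=3, damping=0.85,
                        iterations=50, top_n=3):
    """Rank candidate keywords with a popularity-weighted TextRank variant.

    tokens: candidate words in document order (already segmented/filtered).
    popularity: dict mapping word -> non-negative popularity score; this is
        a hypothetical external signal, and missing words default to 1.0.
    """
    # 1. Candidate keyword graph: undirected co-occurrence counts between
    #    words that appear within the same sliding window.
    cooc = defaultdict(lambda: defaultdict(float))
    for i, w in enumerate(tokens):
        for v in tokens[i + 1:i + window]:
            if v != w:
                cooc[w][v] += 1.0
                cooc[v][w] += 1.0
    words = sorted(set(tokens))

    # 2. Probability transition matrix (assumption, not the paper's formula):
    #    the weight of stepping u -> v is the co-occurrence count scaled by
    #    the popularity of v, normalized so each row sums to 1.
    trans = {}
    for u in words:
        raw = {v: c * popularity.get(v, 1.0) for v, c in cooc[u].items()}
        total = sum(raw.values())
        trans[u] = {v: x / total for v, x in raw.items()} if total > 0 else {}

    # 3. Iterative score calculation (damped power iteration, as in TextRank).
    #    Words without neighbors simply keep the base score in this sketch.
    n = len(words)
    score = {w: 1.0 / n for w in words}
    for _ in range(iterations):
        new = {w: (1.0 - damping) / n for w in words}
        for u in words:
            for v, p in trans[u].items():
                new[v] += damping * score[u] * p
        score = new

    # 4. Select the top N words; ties are broken alphabetically.
    ranked = sorted(words, key=lambda w: (-score[w], w))
    return ranked[:top_n]
```

With uniform popularity this reduces to a plain co-occurrence TextRank; raising the popularity of one word shifts transition probability toward it, which is one simple way a user-side signal could influence the ranking.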

    Published In

    IMMS '24: Proceedings of the 2024 7th International Conference on Information Management and Management Science
    August 2024
    465 pages
    ISBN: 9798400716997
    DOI: 10.1145/3695652
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. External Knowledge
    2. News Keywords
    3. Pragmatics
    4. TextRank

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Funding Sources

    • Technology project of State Grid Shaanxi Electric Power Co., Ltd

    Conference

    IMMS 2024
