research-article

Mulco: Recognizing Chinese Nested Named Entities through Multiple Scopes

Authors:

Yu Xu

CIKM '23: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management

Pages 2980 - 2989

https://doi.org/10.1145/3583780.3615026

Published: 21 October 2023

Abstract

Nested Named Entity Recognition (NNER), a subarea of Named Entity Recognition, has long challenged researchers. In NNER, one entity may be part of a larger entity, and such nesting can occur at multiple levels. These nested structures prevent traditional sequence labeling methods from properly recognizing all entities. While recent research has focused on designing better recognition methods for NNER in various languages, Chinese Nested Named Entity Recognition (CNNER) remains underdeveloped, largely due to a lack of freely available CNNER benchmarks. To support CNNER research, in this paper we introduce ChiNesE, a CNNER dataset comprising 20,000 sentences from online passages in multiple domains and containing 117,284 entities that fall into 10 categories, 43.8% of which are nested named entities. Based on ChiNesE, we propose Mulco, a novel method that recognizes named entities in nested structures through multiple scopes. Within each scope, a scope-based sequence labeling method recognizes a named entity by predicting its anchor and its length. Experimental results show that Mulco outperforms state-of-the-art baseline methods with different recognition schemes on ChiNesE and on the ACE 2005 Chinese corpus.
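The abstract describes scope-based labeling only at a high level, so the following Python sketch illustrates the general idea under explicit assumptions rather than reproducing Mulco. It assumes the anchor is the entity's first character, that each scope is a separate per-character label sequence storing `(type, length)`, and that spans are assigned to scopes by a simple greedy rule. The function names (`bio_tags`, `encode_scopes`, `decode_scopes`), the LOC/ORG type labels, and the assignment rule are all hypothetical and are not taken from the paper.

```python
from typing import List, Optional, Set, Tuple

# A span is (start, end_exclusive, entity_type), with character offsets.
Span = Tuple[int, int, str]
# Hypothetical per-character label for one scope: None means no entity is
# anchored here; otherwise (entity_type, length) of the entity whose anchor
# (assumed here to be its first character) is this position.
Label = Optional[Tuple[str, int]]


def bio_tags(n: int, spans: List[Span]) -> List[str]:
    """Flat BIO encoding with one tag per character.

    Raises ValueError when two spans need different tags at the same
    position, which is what happens with nested entities."""
    tags = ["O"] * n
    for start, end, etype in spans:
        for i in range(start, end):
            tag = ("B-" if i == start else "I-") + etype
            if tags[i] not in ("O", tag):
                raise ValueError(f"conflict at position {i}: {tags[i]} vs {tag}")
            tags[i] = tag
    return tags


def encode_scopes(n: int, spans: List[Span]) -> List[List[Label]]:
    """Assign each span to the first scope whose anchor slot is still free.

    This greedy rule is an illustrative assumption, not the paper's method."""
    scopes: List[List[Label]] = []
    # Shorter spans first, so inner entities land in earlier scopes.
    for start, end, etype in sorted(spans, key=lambda s: (s[1] - s[0], s[0])):
        for labels in scopes:
            if labels[start] is None:
                labels[start] = (etype, end - start)
                break
        else:  # no existing scope has a free slot at this anchor
            new_scope: List[Label] = [None] * n
            new_scope[start] = (etype, end - start)
            scopes.append(new_scope)
    return scopes


def decode_scopes(scopes: List[List[Label]], n: int) -> Set[Span]:
    """Turn every (type, length) label in every scope back into a span."""
    spans: Set[Span] = set()
    for labels in scopes:
        for i, label in enumerate(labels):
            if label is None:
                continue
            etype, length = label
            end = i + length
            if length > 0 and end <= n:  # drop spans that run past the sentence
                spans.add((i, end, etype))
    return spans


if __name__ == "__main__":
    # "我在北京大学读书" ("I study at Peking University"): 北京 (Beijing) is
    # nested inside 北京大学 (Peking University); both start at index 2.
    chars = list("我在北京大学读书")
    gold = [(2, 4, "LOC"), (2, 6, "ORG")]  # illustrative type labels

    try:
        bio_tags(len(chars), gold)
    except ValueError as err:
        print("BIO fails:", err)

    scopes = encode_scopes(len(chars), gold)
    print("scopes needed:", len(scopes))
    for start, end, etype in sorted(decode_scopes(scopes, len(chars))):
        print(etype, "".join(chars[start:end]))
```

Run as a script, the demo prints `BIO fails: conflict at position 2: B-LOC vs B-ORG`, then `scopes needed: 2`, then `LOC 北京` and `ORG 北京大学`. A single flat tag sequence cannot hold both entities, while two scope-level sequences, each storing one anchor and length per character, recover them exactly.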


Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Information extraction
      2. Language resources


Published In


CIKM '23: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management

October 2023

5508 pages

ISBN:9798400701245

DOI:10.1145/3583780

General Chairs:
Ingo Frommholz, University of Wolverhampton, UK
Frank Hopfgartner, University of Koblenz, Germany
Mark Lee, University of Birmingham, UK
Michael Oakes, University of Birmingham, UK

Program Chairs:
Mounia Lalmas, Spotify, UK
Min Zhang, Tsinghua University, China
Rodrygo Santos, Federal University of Minas Gerais, Brazil

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


Publisher

Association for Computing Machinery

New York, NY, United States


Conference

CIKM '23: The 32nd ACM International Conference on Information and Knowledge Management

October 21-25, 2023

Birmingham, United Kingdom

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%


Bibliometrics

Total Citations: 0

Total Downloads: 59

Downloads (Last 12 months): 31

Downloads (Last 6 weeks): 0

Reflects downloads up to 21 Dec 2024
