Is it harder to parse Chinese, or the Chinese Treebank?

Authors: Roger Levy, Christopher Manning

ACL '03: Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1

Pages 439 - 446

https://doi.org/10.3115/1075096.1075152

Published: 07 July 2003

Abstract

We present a detailed investigation of the challenges posed when applying parsing models developed against English corpora to Chinese. We develop a factored-model statistical parser for the Penn Chinese Treebank, showing the implications of gross statistical differences between WSJ and Chinese Treebanks for the most general methods of parser adaptation. We then provide a detailed analysis of the major sources of statistical parse errors for this corpus, showing their causes and relative frequencies, and show that while some types of errors are due to difficult ambiguities inherent in Chinese grammar, others arise due to treebank annotation practices. We show how each type of error can be addressed with simple, targeted changes to the independence assumptions of the maximum likelihood-estimated PCFG factor of the parsing model, which raises our F1 from 80.7% to 82.6% on our development set, and achieves parse accuracy close to the best published figures for Chinese parsing.
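As a rough illustration of what "changing the independence assumptions of a maximum likelihood-estimated PCFG" can mean, the sketch below estimates rule probabilities by relative frequency and optionally annotates each nonterminal with its parent's label. Parent annotation is used here only as a familiar example of such a change; the abstract does not say which specific changes the authors made. The toy English trees, the tuple encoding, and the function names `rules` and `estimate_pcfg` are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical toy treebank. A tree is a nested tuple (label, child, ...);
# a leaf is a plain string (a word). These are not Penn Chinese Treebank data.
TREES = [
    ("S", ("NP", ("PRP", "she")),
          ("VP", ("VBD", "saw"), ("NP", ("DT", "the"), ("NN", "dog")))),
    ("S", ("NP", ("PRP", "he")),
          ("VP", ("VBD", "fed"), ("NP", ("DT", "a"), ("NN", "cat")))),
]


def rules(tree, parent=None, annotate=False):
    """Yield (lhs, rhs) productions; with annotate=True every nonterminal
    below the root carries its parent's label, e.g. NP^VP."""
    label, children = tree[0], tree[1:]
    lhs = f"{label}^{parent}" if annotate and parent else label
    rhs = []
    for child in children:
        if isinstance(child, str):
            rhs.append(child)  # terminal word
        else:
            rhs.append(f"{child[0]}^{label}" if annotate else child[0])
            yield from rules(child, parent=label, annotate=annotate)
    yield lhs, tuple(rhs)


def estimate_pcfg(trees, annotate=False):
    """Maximum likelihood (relative frequency) estimate:
    P(lhs -> rhs) = count(lhs -> rhs) / count(lhs)."""
    rule_counts = Counter()
    lhs_counts = Counter()
    for tree in trees:
        for lhs, rhs in rules(tree, annotate=annotate):
            rule_counts[(lhs, rhs)] += 1
            lhs_counts[lhs] += 1
    return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}


plain = estimate_pcfg(TREES)
annotated = estimate_pcfg(TREES, annotate=True)
```

In this toy data the plain grammar gives `NP -> PRP` probability 0.5, because subject and object NPs share one distribution. With parent annotation, `NP^S -> PRP^NP` gets probability 1.0 and `NP^VP` never rewrites as a pronoun, so the grammar no longer assumes that an NP expands independently of where it sits in the tree.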


Published In

ACL '03 proceedings, Volume 1 (July 2003, 571 pages)

Program Chairs: Erhard W. Hinrichs, Dan Roth

Publisher: Association for Computational Linguistics, United States

Overall Acceptance Rate: 85 of 443 submissions, 19%
