poster

Label correspondence learning for part-of-speech annotation transformation

Authors:

Jingbo Zhu

CIKM '09: Proceedings of the 18th ACM conference on Information and knowledge management

Pages 1461 - 1464

https://doi.org/10.1145/1645953.1646145

Published: 02 November 2009

Abstract

The performance of machine learning methods depends heavily on the volume of training data used. To enlarge datasets, it is therefore of interest to study how to unify multiple labeled datasets that follow different annotation standards. In this paper, we focus on unifying datasets for sequence labeling problems, with natural language part-of-speech (POS) tagging as an exemplar application. To this end, we propose a probabilistic approach to transforming the annotations of one dataset to the standard specified by another dataset. The key component of the approach, called label correspondence learning, serves as a bridge between the annotations of the two datasets. Two methods, designed from distinct perspectives, are proposed to attack this sub-problem. Experiments on two large-scale part-of-speech datasets demonstrate the efficacy of the transformation and label correspondence learning methods.
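The abstract does not spell out either of the two label correspondence learning methods, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes a small set of tokens annotated under both standards is available, estimates P(target tag | source tag) from smoothed co-occurrence counts, and then maps each source tag to its most probable target tag. The function names, the `alpha` smoothing parameter, and the toy tag pairs are all hypothetical.

```python
from collections import Counter, defaultdict


def learn_label_correspondence(pairs, alpha=0.1):
    """Estimate P(target_tag | source_tag) from (source_tag, target_tag) pairs.

    Assumption: `pairs` comes from tokens annotated under both standards.
    `alpha` is an add-alpha smoothing constant (hypothetical choice).
    """
    counts = defaultdict(Counter)
    target_tags = set()
    for src, tgt in pairs:
        counts[src][tgt] += 1
        target_tags.add(tgt)

    table = {}
    for src, tgt_counts in counts.items():
        total = sum(tgt_counts.values()) + alpha * len(target_tags)
        table[src] = {
            t: (tgt_counts[t] + alpha) / total for t in sorted(target_tags)
        }
    return table


def transform(source_tags, table):
    """Map each source tag to its most probable target tag (context-free)."""
    return [max(table[s], key=table[s].get) for s in source_tags]


# Hypothetical toy data: (tag in standard A, tag in standard B) per token.
pairs = [
    ("n", "NN"), ("n", "NN"), ("n", "NR"),
    ("v", "VV"), ("v", "VV"),
    ("a", "JJ"),
]
table = learn_label_correspondence(pairs)
converted = transform(["n", "v", "a"], table)
print(converted)
```

This sketch treats every token independently. A sequence labeling setting such as the one the abstract describes would typically also condition on the surrounding words and tags, which is left out here for brevity.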

Cited By

Zhu M, Zhu J, Xiao T (2011). Automatic Treebank Conversion via Informed Decoding - A Case Study on Chinese Treebanks. ACM Transactions on Asian Language Information Processing 10:3 (1-24). DOI: 10.1145/2002980.2002982. Online publication date: 1-Sep-2011.
https://dl.acm.org/doi/10.1145/2002980.2002982

Zhu M, Zhu J, Joshi A, Huang C, Jurafsky D (2010). Automatic treebank conversion via informed decoding. Proceedings of the 23rd International Conference on Computational Linguistics: Posters (1541-1549). DOI: 10.5555/1944566.1944742. Online publication date: 23-Aug-2010.
https://dl.acm.org/doi/10.5555/1944566.1944742

Zhu M, Zhu J, Xiao T, Joshi A, Huang C, Jurafsky D (2010). Heterogeneous parsing via collaborative decoding. Proceedings of the 23rd International Conference on Computational Linguistics (1344-1352). DOI: 10.5555/1873781.1873932. Online publication date: 23-Aug-2010.
https://dl.acm.org/doi/10.5555/1873781.1873932

Index Terms

Label correspondence learning for part-of-speech annotation transformation
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing

Published In

CIKM '09: Proceedings of the 18th ACM conference on Information and knowledge management

November 2009

2162 pages

ISBN:9781605585123

DOI:10.1145/1645953

General Chairs: David Cheung (University of Hong Kong, Hong Kong), Il-Yeol Song (Drexel University, USA)

Program Chairs: Wesley Chu (UCLA, USA), Xiaohua Hu (Drexel University, USA), Jimmy Lin (University of Maryland, USA)

Copyright © 2009 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

CIKM '09

CIKM '09: Conference on Information and Knowledge Management

November 2 - 6, 2009

Hong Kong, China

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
