Boosting Neural POS Tagger for Farsi Using Morphological Information

Authors: Peyman Passban, Andy Way

ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP), Volume 16, Issue 1

Article No.: 4, Pages 1 - 15

https://doi.org/10.1145/2934676

Published: 22 July 2016

Abstract

Farsi (Persian) is a low-resource language that suffers from data sparsity and a lack of efficient processing tools. Because part-of-speech (POS) taggers are used so broadly in natural language processing tasks, they are among the most important of these tools. Despite recent work on Farsi tagging, there is still room for improvement: the best accuracy reported so far is 96%, which can rise to 96.9% in special cases. The main weakness of existing taggers is their poor handling of out-of-vocabulary (OOV) words. To address both accuracy and OOV words, we developed a neural network-based POS tagger (NPT) that performs efficiently on Farsi. Despite using less data, NPT outperforms state-of-the-art systems. Our proposed tagger achieves an accuracy of 97.4%, and its performance is strongly influenced by morphological features. We carry out a shallow morphological analysis and show considerable improvement over the baseline configuration.

Index Terms

Boosting Neural POS Tagger for Farsi Using Morphological Information
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
  2. Machine learning
    1. Machine learning approaches
      1. Neural networks

Published In

ACM Transactions on Asian and Low-Resource Language Information Processing Volume 16, Issue 1

TALLIP Notes and Regular Papers

March 2017

133 pages

ISSN: 2375-4699

EISSN: 2375-4702

DOI: 10.1145/2961867

Editor:
Nianwen Xue
Brandeis University, Waltham, USA

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 22 July 2016

Accepted: 01 April 2016

Revised: 01 March 2016

Received: 01 January 2016

Published in TALLIP Volume 16, Issue 1

Qualifiers

Note
Research
Refereed

Funding Sources

CNGL Programme
Science Foundation Ireland
ADAPT Centre at Dublin City University

Cited By

Rajani Shree M, Shambhavi B (2022). POS Tagger Model for South Indian Language Using a Deep Learning Approach. ICCCE 2021, 155-167. Online publication date: 16-May-2022.
https://doi.org/10.1007/978-981-16-7985-8_16

Ajees A, Abrar K, Sumam M, Sreenathan M (2020). A Deep Level Tagger for Malayalam, a Morphologically Rich Language. Journal of Intelligent Systems 30:1, 115-129. Online publication date: 5-Jul-2020.
https://doi.org/10.1515/jisys-2019-0070

Koochari A, Alavi Gharahbagh A, Hajihashemi V (2020). A Persian part of speech tagging system using the long short-term memory neural network. 2020 6th Iranian Conference on Signal Processing and Intelligent Systems (ICSPIS), 1-6. Online publication date: 23-Dec-2020.
https://doi.org/10.1109/ICSPIS51611.2020.9349556

Yambao A, Cheng C (2020). Feedforward Approach to Sequential Morphological Analysis in the Tagalog Language. 2020 International Conference on Asian Language Processing (IALP), 81-85. Online publication date: 4-Dec-2020.
https://doi.org/10.1109/IALP51396.2020.9310516

Hadifar A, Momtazi S (2018). The impact of corpus domain on word representation. Language Resources and Evaluation 52:4, 997-1019. Online publication date: 15-Dec-2018.
https://dl.acm.org/doi/10.1007/s10579-018-9419-x
