
Recurrent Neural Hidden Markov Model for High-order Transition

Published: 31 October 2021

Abstract

We propose a method that attends to high-order relations among latent states, improving on conventional HMMs, which consider only the most recent latent state because of the Markov assumption. To capture these high-order relations, we apply an RNN to each sequence of latent states, since an RNN can summarize an arbitrary-length sequence in its cell, a fixed-size vector. However, the simplest approach, which feeds every latent sequence explicitly to the RNN, is intractable because the search space of latent-state sequences grows combinatorially.
We therefore modify the RNN so that the history of latent states from the beginning of the sequence up to the current state is represented with a fixed number of RNN cells, one for each possible state. We conduct experiments on unsupervised POS tagging and on synthetic datasets. The results show that the proposed method outperforms previous methods, and the results on the synthetic dataset indicate that it captures high-order relations.
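To make the idea concrete, the Python/NumPy sketch below is a minimal illustration, not the authors' exact model: it keeps one RNN hidden vector per latent state and runs a forward-style recursion over them, so the history of states is summarized with K cells instead of enumerating all K^T state sequences. The specific names (rnn_step, W_t) and the choice of mixing predecessor hidden vectors by their forward weights are illustrative assumptions.

import numpy as np

# Hypothetical sketch (not the paper's exact formulation): a forward pass that
# keeps one RNN hidden vector per latent state, so the state history is
# represented by K cells rather than by an exponential number of sequences.

K, D, V, T = 5, 16, 100, 8            # states, hidden size, vocab size, length
rng = np.random.default_rng(0)

E    = rng.normal(0, 0.1, (K, D))     # embedding of each latent state
W_h  = rng.normal(0, 0.1, (D, D))     # RNN recurrence weights
W_x  = rng.normal(0, 0.1, (D, D))     # RNN input weights
W_t  = rng.normal(0, 0.1, (K, D))     # maps a hidden vector to transition scores
emit = rng.dirichlet(np.ones(V), K)   # emission distributions p(x | z = k)

def rnn_step(h_prev, state_emb):
    """One vanilla-RNN update (tanh), consuming the embedding of a latent state."""
    return np.tanh(h_prev @ W_h + state_emb @ W_x)

def softmax(a):
    a = a - a.max(-1, keepdims=True)
    e = np.exp(a)
    return e / e.sum(-1, keepdims=True)

x = rng.integers(0, V, T)             # an observed sequence

alpha = np.full(K, 1.0 / K) * emit[:, x[0]]     # forward probabilities at t = 0
H = rnn_step(np.zeros((K, D)), E)               # one hidden vector per state

for t in range(1, T):
    # Transition distribution out of each state, conditioned on its RNN summary
    # of the history rather than on the previous state alone.
    trans = softmax(H @ W_t.T)                  # trans[i, j] = p(z_t = j | history ends in i)
    alpha_new = (alpha[:, None] * trans).sum(0) * emit[:, x[t]]

    # Update each state's hidden vector; predecessors' hidden vectors are mixed
    # with weights proportional to their contribution to the new forward mass
    # (an approximation that keeps exactly K cells).
    w = alpha[:, None] * trans
    w = w / (w.sum(0, keepdims=True) + 1e-12)
    H = rnn_step(w.T @ H, E)                    # new hidden vector for each current state

    alpha = alpha_new / alpha_new.sum()         # renormalize to avoid underflow

print("approximate state posteriors at final step:", np.round(alpha, 3))

Under these assumptions, each time step costs O(K^2) like an ordinary first-order forward pass, while the per-state hidden vectors carry information about earlier transitions.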

    Published In

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 21, Issue 2
    March 2022
    413 pages
    ISSN:2375-4699
    EISSN:2375-4702
    DOI:10.1145/3494070

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 31 October 2021
    Accepted: 01 July 2021
    Revised: 01 May 2021
    Received: 01 October 2020
    Published in TALLIP Volume 21, Issue 2

    Author Tags

    1. Neural networks
    2. POS tagging

    Qualifiers

    • Research-article
    • Refereed
