
Recurrent Neural Hidden Markov Model for High-order Transition

Published: 31 October 2021

Abstract

We propose a method that attends to high-order relations among latent states, improving on conventional HMMs, which consider only the most recent latent state because of the Markov assumption. To capture these high-order relations, we apply an RNN to each sequence of latent states, since an RNN can summarize an arbitrary-length sequence in its cell, a fixed-size vector. However, the simplest approach, which feeds every latent sequence explicitly to the RNN, is intractable because the search space of latent-state sequences grows combinatorially.
We therefore modify the RNN so that the history of latent states from the beginning of the sequence up to the current state is represented with a fixed number of RNN cells, one for each possible state. We conduct experiments on unsupervised POS tagging and on synthetic datasets. The results show that the proposed method outperforms previous methods, and the results on the synthetic dataset indicate that it captures high-order relations.
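To make the idea concrete, the Python/NumPy sketch below is a minimal illustration, not the authors' exact model: it keeps one RNN hidden vector per latent state and runs a forward-style recursion over them, so the history of states is summarized with K cells instead of enumerating all K^T state sequences. The specific names (rnn_step, W_t) and the choice of mixing predecessor hidden vectors by their forward weights are illustrative assumptions.

import numpy as np

# Hypothetical sketch (not the paper's exact formulation): a forward pass that
# keeps one RNN hidden vector per latent state, so the state history is
# represented by K cells rather than by an exponential number of sequences.

K, D, V, T = 5, 16, 100, 8            # states, hidden size, vocab size, length
rng = np.random.default_rng(0)

E    = rng.normal(0, 0.1, (K, D))     # embedding of each latent state
W_h  = rng.normal(0, 0.1, (D, D))     # RNN recurrence weights
W_x  = rng.normal(0, 0.1, (D, D))     # RNN input weights
W_t  = rng.normal(0, 0.1, (K, D))     # maps a hidden vector to transition scores
emit = rng.dirichlet(np.ones(V), K)   # emission distributions p(x | z = k)

def rnn_step(h_prev, state_emb):
    """One vanilla-RNN update (tanh), consuming the embedding of a latent state."""
    return np.tanh(h_prev @ W_h + state_emb @ W_x)

def softmax(a):
    a = a - a.max(-1, keepdims=True)
    e = np.exp(a)
    return e / e.sum(-1, keepdims=True)

x = rng.integers(0, V, T)             # an observed sequence

alpha = np.full(K, 1.0 / K) * emit[:, x[0]]     # forward probabilities at t = 0
H = rnn_step(np.zeros((K, D)), E)               # one hidden vector per state

for t in range(1, T):
    # Transition distribution out of each state, conditioned on its RNN summary
    # of the history rather than on the previous state alone.
    trans = softmax(H @ W_t.T)                  # trans[i, j] = p(z_t = j | history ends in i)
    alpha_new = (alpha[:, None] * trans).sum(0) * emit[:, x[t]]

    # Update each state's hidden vector; predecessors' hidden vectors are mixed
    # with weights proportional to their contribution to the new forward mass
    # (an approximation that keeps exactly K cells).
    w = alpha[:, None] * trans
    w = w / (w.sum(0, keepdims=True) + 1e-12)
    H = rnn_step(w.T @ H, E)                    # new hidden vector for each current state

    alpha = alpha_new / alpha_new.sum()         # renormalize to avoid underflow

print("approximate state posteriors at final step:", np.round(alpha, 3))

Under these assumptions, each time step costs O(K^2) like an ordinary first-order forward pass, while the per-state hidden vectors carry information about earlier transitions.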

    Published In

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 21, Issue 2
    March 2022
    413 pages
    ISSN:2375-4699
    EISSN:2375-4702
    DOI:10.1145/3494070

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 31 October 2021
    Accepted: 01 July 2021
    Revised: 01 May 2021
    Received: 01 October 2020
    Published in TALLIP Volume 21, Issue 2

    Author Tags

    1. Neural networks
    2. POS tagging

    Qualifiers

    • Research-article
    • Refereed
