
Structurally Comparative Hinge Loss for Dependency-Based Neural Text Representation

Published: 18 May 2020

Abstract

Dependency-based graph convolutional networks (DepGCNs) have proven helpful for text representation in many natural language tasks. Almost all previous models are trained with cross-entropy (CE) loss, which directly maximizes the posterior likelihood. However, CE loss does not properly account for the contribution of dependency structures: the performance gain from structure information can be narrow because the model fails to learn to rely on it. To address this challenge, we propose the novel structurally comparative hinge (SCH) loss function for DepGCNs. SCH loss aims to enlarge the margin that structural representations gain over non-structural ones. From the perspective of information theory, this is equivalent to increasing the conditional mutual information between the model decision and the structure information, given the text. Our experimental results on both English and Chinese datasets show that substituting SCH loss for CE loss improves performance on various tasks, for both induced structures and structures from an external parser, without adding learnable parameters. Furthermore, the learned margin directly measures the extent to which certain types of examples rely on the dependency structure, yielding better interpretability. In addition, detailed analysis shows that this structure margin correlates positively with task performance and with the structure induction of DepGCNs, and that SCH loss helps the model focus more on the shortest dependency path between entities. We achieve new state-of-the-art results on the TACRED, IMDB, and Zh. Literature datasets, even compared with ensemble and BERT baselines.
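The abstract does not give the paper's exact formulation, but the core idea of a structurally comparative hinge — penalize the model until its structure-aware prediction beats a structure-free one by a fixed margin — can be sketched as follows. The function name, the use of gold-label log-likelihoods, and the default margin are illustrative assumptions, not the authors' implementation:

```python
import math

def sch_loss(logp_struct: float, logp_plain: float, margin: float = 1.0) -> float:
    """Hinge on the gap between a structural (e.g., DepGCN) model's and a
    non-structural baseline's log-likelihood of the gold label.

    The loss is zero once the structural model outscores the baseline by at
    least `margin`; otherwise it penalizes the shortfall, pushing the model
    to actually exploit the dependency structure rather than ignore it.
    """
    return max(0.0, margin - (logp_struct - logp_plain))

# Structural gap already exceeds the margin: no penalty.
print(sch_loss(math.log(0.9), math.log(0.2)))              # -> 0.0

# Gap smaller than the margin: positive loss equal to the shortfall.
print(round(sch_loss(math.log(0.5), math.log(0.4)), 3))    # -> 0.777
```

In practice such a term would be averaged over a batch and typically combined with the usual CE objective; the abstract only guarantees that no additional learnable parameters are involved.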


      Published In

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 19, Issue 4
July 2020
291 pages
ISSN: 2375-4699
EISSN: 2375-4702
DOI: 10.1145/3391538
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 18 May 2020
      Online AM: 07 May 2020
      Accepted: 01 March 2020
      Revised: 01 December 2019
      Received: 01 August 2019
      Published in TALLIP Volume 19, Issue 4

      Author Tags

      1. Text representation
      2. graph convolutional networks
      3. loss function

      Qualifiers

      • Research-article
      • Research
      • Refereed
