DOI: 10.1145/3340531.3412013
Research Article

Dual Head-wise Coattention Network for Machine Comprehension with Multiple-Choice Questions

Published: 19 October 2020

Abstract

Multiple-choice Machine Comprehension (MC) is an important and challenging natural language processing (NLP) task in which a machine must select the best answer from a set of candidate answers, given a particular passage and question. Existing approaches either rely solely on powerful pre-trained language models or on overly complicated matching networks designed to capture the relationships among the passage, question, and candidate answers. In this paper, we present a novel architecture, the Dual Head-wise Coattention network (DHC), a simple and efficient attention neural network designed for the multiple-choice MC task. DHC not only supports a powerful pre-trained language model as its encoder, but also models the relationships among passage, question, and candidate answers directly as an attention mechanism, using head-wise matching and aggregation over multiple layers; this captures the interactions between question and passage more thoroughly and cooperates with large pre-trained language models more efficiently. To evaluate performance, we test the proposed model on five challenging and well-known multiple-choice MC datasets: RACE, DREAM, SemEval-2018 Task 11, OpenBookQA, and TOEFL. Extensive experimental results demonstrate that our proposal achieves a significant increase in accuracy over existing models on all five datasets and consistently outperforms all tested baselines, including state-of-the-art techniques. More remarkably, our proposal is a pluggable and flexible model that can be attached to any BERT-based pre-trained language model. Ablation studies further confirm its state-of-the-art performance and generalization.
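The abstract does not spell out the layer equations, so the PyTorch sketch below is only a rough illustration of what head-wise coattention between a passage and a question-plus-option sequence could look like: encoder states are split into heads, the passage and the question/option attend to each other head by head, and the fused representations are pooled into a score for each candidate. All module names, dimensions, and the mean-pooling aggregation are assumptions made for illustration, not the authors' implementation.

```python
# Illustrative sketch of head-wise coattention for multiple-choice MC.
# Shapes, projections, and pooling choices are assumptions, not the paper's design.
import torch
import torch.nn as nn
import torch.nn.functional as F


class HeadwiseCoattention(nn.Module):
    """Splits encoder states into heads and lets passage and question+option
    representations attend to each other head by head (illustrative only)."""

    def __init__(self, hidden_size: int = 768, num_heads: int = 12):
        super().__init__()
        assert hidden_size % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = hidden_size // num_heads
        self.proj_p = nn.Linear(hidden_size, hidden_size)
        self.proj_q = nn.Linear(hidden_size, hidden_size)
        self.score = nn.Linear(2 * hidden_size, 1)  # scores one (passage, option) pair

    def _split_heads(self, x: torch.Tensor) -> torch.Tensor:
        # (batch, seq, hidden) -> (batch, heads, seq, head_dim)
        b, s, _ = x.shape
        return x.view(b, s, self.num_heads, self.head_dim).transpose(1, 2)

    def forward(self, passage: torch.Tensor, question_option: torch.Tensor) -> torch.Tensor:
        # passage:         (batch, len_p, hidden) from a pre-trained encoder such as BERT
        # question_option: (batch, len_q, hidden) for ONE candidate answer
        p = self._split_heads(self.proj_p(passage))
        q = self._split_heads(self.proj_q(question_option))

        # Head-wise attention in both directions (the "dual" coattention idea).
        scores = torch.matmul(p, q.transpose(-1, -2)) / self.head_dim ** 0.5
        p_attends_q = torch.matmul(F.softmax(scores, dim=-1), q)
        q_attends_p = torch.matmul(F.softmax(scores.transpose(-1, -2), dim=-1), p)

        # Merge heads back and pool each sequence to one vector (mean pooling is
        # an arbitrary choice here; the paper may aggregate differently).
        def merge_and_pool(x: torch.Tensor) -> torch.Tensor:
            b, h, s, d = x.shape
            return x.transpose(1, 2).reshape(b, s, h * d).mean(dim=1)

        fused = torch.cat([merge_and_pool(p_attends_q), merge_and_pool(q_attends_p)], dim=-1)
        return self.score(fused)  # (batch, 1): unnormalized score for this candidate


# Usage: score each candidate separately and pick the argmax over options.
if __name__ == "__main__":
    layer = HeadwiseCoattention()
    passage = torch.randn(2, 120, 768)                     # dummy encoder outputs
    options = [torch.randn(2, 30, 768) for _ in range(4)]  # question + each candidate
    logits = torch.cat([layer(passage, opt) for opt in options], dim=-1)
    prediction = logits.argmax(dim=-1)                     # chosen option index per example
```

In a full system of this kind, the passage and question/option states would come from a BERT-style encoder, and a matching block like the one above could be applied on top of several encoder layers before the per-option scores are compared, which is how such a module stays pluggable across different pre-trained language models.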

Supplementary Material

MP4 File (3340531.3412013.mp4)




    Published In

    CIKM '20: Proceedings of the 29th ACM International Conference on Information & Knowledge Management
    October 2020
    3619 pages
    ISBN:9781450368599
    DOI:10.1145/3340531


    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 19 October 2020


    Author Tags

    1. attention
    2. multiple-choice machine comprehension
    3. neural network
4. RACE

    Qualifiers

    • Research-article

    Funding Sources

    • National Natural Science Foundation

    Conference

    CIKM '20


