DOI: 10.1145/3539618.3592039

Power Norm Based Lifelong Learning for Paraphrase Generations

Published: 18 July 2023

Abstract

Lifelong seq2seq language generation models are trained on multiple domains in a lifelong learning manner, with data from each domain observed in an online fashion. It is well known that lifelong learning suffers from catastrophic forgetting (CF). To handle this challenge, existing works have leveraged experience replay or dynamic architectures to consolidate past knowledge, which, however, incurs ever-growing memory requirements or high computational cost. In this work, we propose a novel framework named "power norm based lifelong learning" (PNLLL), which aims to remedy catastrophic forgetting by applying power normalization to NLP transformer models. Specifically, PNLLL leverages power normalization to achieve a better balance between past-experience rehearsal and new-knowledge acquisition. These designs enable knowledge adaptation to new tasks while memorizing the experience of past tasks. Our experiments on paraphrase generation tasks show that PNLLL not only outperforms SOTA models by a considerable margin but also largely alleviates forgetting.
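For intuition, below is a minimal PyTorch sketch of the power-normalization idea the abstract builds on (Shen et al.'s PowerNorm, ICML 2020): activations are rescaled by a running estimate of their quadratic mean (the "power") rather than by per-batch variance (BatchNorm) or per-token statistics (LayerNorm). The class name, hyperparameters, and the simplified statistic update are illustrative assumptions, not the paper's implementation; in particular, the original PowerNorm also modifies the backward pass through the running statistics, which is omitted here.

    import torch
    import torch.nn as nn

    class PowerNorm(nn.Module):
        # Minimal sketch of power normalization (after Shen et al., ICML 2020).
        # Features are divided by the root of a quadratic-mean ("power")
        # statistic tracked as a running estimate, which is more stable for
        # the small, shifting batches seen in lifelong learning.
        # alpha and eps are illustrative defaults, not values from the paper.
        def __init__(self, dim, alpha=0.9, eps=1e-5):
            super().__init__()
            self.weight = nn.Parameter(torch.ones(dim))   # learned rescaling
            self.bias = nn.Parameter(torch.zeros(dim))    # learned shift
            self.alpha = alpha                            # running-estimate decay
            self.eps = eps
            self.register_buffer("running_power", torch.ones(dim))

        def forward(self, x):  # x: (batch, seq_len, dim)
            if self.training:
                power = x.pow(2).mean(dim=(0, 1))   # quadratic mean per feature
                # Update the running statistic; detach so the moving
                # average itself receives no gradient.
                self.running_power.mul_(self.alpha).add_(
                    (1.0 - self.alpha) * power.detach())
                x_hat = x / torch.sqrt(power + self.eps)
            else:
                x_hat = x / torch.sqrt(self.running_power + self.eps)
            return self.weight * x_hat + self.bias

In a lifelong setup, such a layer would take the place of LayerNorm inside the transformer blocks, so that a running statistic accumulated across sequentially observed domains, rather than per-sample statistics, mediates between old and new knowledge.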



Published In

SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2023
3567 pages
ISBN: 9781450394086
DOI: 10.1145/3539618

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. lifelong learning
  2. normalized regularization
  3. power normalization

Qualifiers

  • Short-paper

Conference

SIGIR '23

Acceptance Rates

Overall acceptance rate: 792 of 3,983 submissions (20%)
