Continual learning for natural language generations with transformer calibration

P Yang, D Li, P Li - Proceedings of the 26th Conference on …, 2022 - aclanthology.org
Abstract
Conventional natural language processing (NLP) generation models are trained offline on a given dataset for a particular task, which is referred to as isolated learning. Research on sequence-to-sequence language generation aims instead to build continual learning models that constantly learn from sequentially encountered tasks. However, continual learning methods often suffer from catastrophic forgetting, a persistent challenge for lifelong learning. In this paper, we present a novel NLP transformer model that attempts to mitigate catastrophic forgetting in online continual learning from a new perspective, i.e., attention calibration. We model the attention in the transformer as a calibrated unit in a general formulation, where the attention calibration helps balance the stability and plasticity of continual learning algorithms by influencing both their forward inference path and backward optimization path. Our empirical experiments on paraphrase generation and dialog response generation demonstrate that this work outperforms state-of-the-art models by a considerable margin and effectively mitigates forgetting.
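The abstract does not spell out the calibration formulation, so the sketch below is only an illustration of the general idea rather than the paper's method: self-attention in a transformer layer is augmented with a small learned calibration unit (here, a hypothetical per-head temperature on the attention logits and a per-head output gate) that sits on both the forward inference path and the backward optimization path, so continual-learning updates can act through it. The class and parameter names are illustrative assumptions.

```python
# Minimal sketch of attention calibration in a transformer layer.
# Hypothetical illustration only; the paper's exact formulation may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CalibratedSelfAttention(nn.Module):
    """Multi-head self-attention with a small calibration unit.

    Assumed calibration form: a learned per-head temperature on the attention
    logits and a per-head gate on the head outputs, both initialized to be
    neutral so the layer starts as ordinary multi-head attention.
    """

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        # Calibration parameters: exp(0) = 1 and gate = 1 give standard
        # attention at initialization.
        self.log_temp = nn.Parameter(torch.zeros(n_heads, 1, 1))
        self.head_gate = nn.Parameter(torch.ones(n_heads, 1, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Reshape to (batch, heads, time, d_head).
        q = q.view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        k = k.view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        v = v.view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        scores = q @ k.transpose(-2, -1) / self.d_head ** 0.5
        # Calibrate the sharpness of each head's attention distribution.
        attn = F.softmax(scores * torch.exp(self.log_temp), dim=-1)
        # Calibrate how much each head contributes to the output.
        y = self.head_gate * (attn @ v)
        y = y.transpose(1, 2).reshape(b, t, d)
        return self.out(y)


if __name__ == "__main__":
    layer = CalibratedSelfAttention(d_model=64, n_heads=4)
    x = torch.randn(2, 10, 64)
    print(layer(x).shape)  # torch.Size([2, 10, 64])
```

In a continual-learning setup, these few calibration parameters could be the ones updated or regularized per task while the rest of the layer stays comparatively stable, which is one way such a unit can trade off plasticity against forgetting on both the forward and backward paths.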