Chenhui Chu

2024

pdf bib abs
Flexible Weight Tuning and Weight Fusion Strategies for Continual Named Entity Recognition
Yahan Yu | Duzhen Zhang | Xiuyi Chen | Chenhui Chu
Findings of the Association for Computational Linguistics: ACL 2024

Continual Named Entity Recognition (CNER) is dedicated to sequentially learning new entity types while mitigating catastrophic forgetting of old entity types. Traditional CNER approaches commonly employ knowledge distillation to retain old knowledge within the current model. However, because only the representations of old and new models are constrained to be consistent, the reliance solely on distillation in existing methods still suffers from catastrophic forgetting. To further alleviate the forgetting issue of old entity types, this paper introduces flexible Weight Tuning (WT) and Weight Fusion (WF) strategies for CNER. The WT strategy, applied at each training step, employs a learning rate schedule on the parameters of the current model. After learning the current task, the WF strategy dynamically integrates knowledge from both the current and previous models for inference. Notably, these two strategies are model-agnostic and seamlessly integrate with existing State-Of-The-Art (SOTA) models. Extensive experiments demonstrate that the WT and WF strategies consistently enhance the performance of previous SOTA methods across ten CNER settings in three datasets.

Emotion plays a crucial role in human conversation. This paper underscores the significance of considering emotion in speech translation. We present the MELD-ST dataset for the emotion-aware speech translation task, comprising English-to-Japanese and English-to-German language pairs. Each language pair includes about 10,000 utterances annotated with emotion labels from the MELD dataset. Baseline experiments using the SeamlessM4T model on the dataset indicate that fine-tuning with emotion labels can enhance translation performance in some settings, highlighting the need for further research in emotion-aware speech translation systems.

In the past year, MultiModal Large Language Models (MM-LLMs) have undergone substantial advancements, augmenting off-the-shelf LLMs to support MM inputs or outputs via cost-effective training strategies. The resulting models not only preserve the inherent reasoning and decision-making capabilities of LLMs but also empower a diverse range of MM tasks. In this paper, we provide a comprehensive survey aimed at facilitating further research of MM-LLMs. Initially, we outline general design formulations for model architecture and training pipeline. Subsequently, we introduce a taxonomy encompassing 126 MM-LLMs, each characterized by its specific formulations. Furthermore, we review the performance of selected MM-LLMs on mainstream benchmarks and summarize key training recipes to enhance the potency of MM-LLMs. Finally, we explore promising directions for MM-LLMs while concurrently maintaining a [real-time tracking website](https://mm-llms.github.io/) for the latest developments in the field. We hope that this survey contributes to the ongoing advancement of the MM-LLMs domain.

pdf bib abs
StyEmp: Stylizing Empathetic Response Generation via Multi-Grained Prefix Encoder and Personality Reinforcement
Yahui Fu | Chenhui Chu | Tatsuya Kawahara
Proceedings of the 25th Annual Meeting of the Special Interest Group on Discourse and Dialogue

Recent approaches for empathetic response generation mainly focus on emotional resonance and user understanding, without considering the system’s personality. Consistent personality is evident in real human expression and is important for creating trustworthy systems. To address this problem, we propose StyEmp, which aims to stylize the empathetic response generation with a consistent personality. Specifically, it incorporates a multi-grained prefix mechanism designed to capture the intricate relationship between a system’s personality and its empathetic expressions. Furthermore, we introduce a personality reinforcement module that leverages contrastive learning to calibrate the generation model, ensuring that responses are both empathetic and reflective of a distinct personality. Automatic and human evaluations on the EMPATHETICDIALOGUES benchmark show that StyEmp outperforms competitive baselines in terms of both empathy and personality expressions. Our code is available at https://github.com/fuyahuii/StyEmp.

pdf bib abs
SubMerge: Merging Equivalent Subword Tokenizations for Subword Regularized Models in Neural Machine Translation
Haiyue Song | Francois Meyer | Raj Dabre | Hideki Tanaka | Chenhui Chu | Sadao Kurohashi
Proceedings of the 25th Annual Conference of the European Association for Machine Translation (Volume 1)

Subword regularized models leverage multiple subword tokenizations of one target sentence during training. However, selecting one tokenization during inference leads to the underutilization of knowledge learned about multiple tokenizations.We propose the SubMerge algorithm to rescue the ignored Subword tokenizations through merging equivalent ones during inference.SubMerge is a nested search algorithm where the outer beam search treats the word as the minimal unit, and the inner beam search provides a list of word candidates and their probabilities, merging equivalent subword tokenizations. SubMerge estimates the probability of the next word more precisely, providing better guidance during inference.Experimental results on six low-resource to high-resource machine translation datasets show that SubMerge utilizes a greater proportion of a model’s probability weight during decoding (lower word perplexities for hypotheses). It also improves BLEU and chrF++ scores for many translation directions, most reliably for low-resource scenarios. We investigate the effect of different beam sizes, training set sizes, dropout rates, and whether it is effective on non-regularized models.

pdf bib abs
Abstractive Multi-Video Captioning: Benchmark Dataset Construction and Extensive Evaluation
Rikito Takahashi | Hirokazu Kiyomaru | Chenhui Chu | Sadao Kurohashi
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

This paper introduces a new task, abstractive multi-video captioning, which focuses on abstracting multiple videos with natural language. Unlike conventional video captioning tasks generating a specific caption for a video, our task generates an abstract caption of the shared content in a video group containing multiple videos. To address our task, models must learn to understand each video in detail and have strong abstraction abilities to find commonalities among videos. We construct a benchmark dataset for abstractive multi-video captioning named AbstrActs. AbstrActs contains 13.5k video groups and corresponding abstract captions. As abstractive multi-video captioning models, we explore two approaches: end-to-end and cascade. For evaluation, we proposed a new metric, CocoA, which can evaluate the model performance based on the abstractness of the generated captions. In experiments, we report the impact of the way of combining multiple video features, the overall model architecture, and the number of input videos.

pdf bib abs
Identifying Source Language Expressions for Pre-editing in Machine Translation
Norizo Sakaguchi | Yugo Murawaki | Chenhui Chu | Sadao Kurohashi
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Machine translation-mediated communication can benefit from pre-editing source language texts to ensure accurate transmission of intended meaning in the target language. The primary challenge lies in identifying source language expressions that pose difficulties in translation. In this paper, we hypothesize that such expressions tend to be distinctive features of texts originally written in the source language (native language) rather than translations generated from the target language into the source language (machine translation). To identify such expressions, we train a neural classifier to distinguish native language from machine translation, and subsequently isolate the expressions that contribute to the model’s prediction of native language. Our manual evaluation revealed that our method successfully identified characteristic expressions of the native language, despite the noise and the inherent nuances of the task. We also present case studies where we edit the identified expressions to improve translation quality.

pdf bib abs
Rapidly Developing High-quality Instruction Data and Evaluation Benchmark for Large Language Models with Minimal Human Effort: A Case Study on Japanese
Yikun Sun | Zhen Wan | Nobuhiro Ueda | Sakiko Yahata | Fei Cheng | Chenhui Chu | Sadao Kurohashi
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

The creation of instruction data and evaluation benchmarks for serving Large language models often involves enormous human annotation. This issue becomes particularly pronounced when rapidly developing such resources for a non-English language like Japanese. Instead of following the popular practice of directly translating existing English resources into Japanese (e.g., Japanese-Alpaca), we propose an efficient self-instruct method based on GPT-4. We first translate a small amount of English instructions into Japanese and post-edit them to obtain native-level quality. GPT-4 then utilizes them as demonstrations to automatically generate Japanese instruction data. We also construct an evaluation benchmark containing 80 questions across 8 categories, using GPT-4 to automatically assess the response quality of LLMs without human references. The empirical results suggest that the models fine-tuned on our GPT-4 self-instruct data significantly outperformed the Japanese-Alpaca across all three base pre-trained models. Our GPT-4 self-instruct data allowed the LLaMA 13B model to defeat GPT-3.5 (Davinci-003) with a 54.37% win-rate. The human evaluation exhibits the consistency between GPT-4’s assessments and human preference. Our high-quality instruction data and evaluation benchmark are released here.

pdf bib abs
Towards Human-Like Machine Comprehension: Few-Shot Relational Learning in Visually-Rich Documents
Hao Wang | Tang Li | Chenhui Chu | Rui Wang | Pinpin Zhu
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Key-value relations are prevalent in Visually-Rich Documents (VRDs), often depicted in distinct spatial regions accompanied by specific color and font styles. These non-textual cues serve as important indicators that greatly enhance human comprehension and acquisition of such relation triplets. However, current document AI approaches often fail to consider this valuable prior information related to visual and spatial features, resulting in suboptimal performance, particularly when dealing with limited examples. To address this limitation, our research focuses on few-shot relational learning, specifically targeting the extraction of key-value relation triplets in VRDs. Given the absence of a suitable dataset for this task, we introduce two new few-shot benchmarks built upon existing supervised benchmark datasets. Furthermore, we propose a variational approach that incorporates relational 2D-spatial priors and prototypical rectification techniques. This approach aims to generate relation representations that are more aware of the spatial context and unseen relation in a manner similar to human perception. Experimental results demonstrate the effectiveness of our proposed method by showcasing its ability to outperform existing methods. This study also opens up new possibilities for practical applications.

2023

pdf bib abs
Exploring the Impact of Layer Normalization for Zero-shot Neural Machine Translation
Zhuoyuan Mao | Raj Dabre | Qianying Liu | Haiyue Song | Chenhui Chu | Sadao Kurohashi
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

This paper studies the impact of layer normalization (LayerNorm) on zero-shot translation (ZST). Recent efforts for ZST often utilize the Transformer architecture as the backbone, with LayerNorm at the input of layers (PreNorm) set as the default. However, Xu et al. (2019) has revealed that PreNorm carries the risk of overfitting the training data. Based on this, we hypothesize that PreNorm may overfit supervised directions and thus have low generalizability for ZST. Through experiments on OPUS, IWSLT, and Europarl datasets for 54 ZST directions, we demonstrate that the original Transformer setting of LayerNorm after residual connections (PostNorm) consistently outperforms PreNorm by up to 12.3 BLEU points. We then study the performance disparities by analyzing the differences in off-target rates and structural variations between PreNorm and PostNorm. This study highlights the need for careful consideration of the LayerNorm setting for ZST.

pdf bib abs
The Kyoto Speech-to-Speech Translation System for IWSLT 2023
Zhengdong Yang | Shuichiro Shimizu | Wangjin Zhou | Sheng Li | Chenhui Chu
Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023)

This paper describes the Kyoto speech-to-speech translation system for IWSLT 2023. Our system is a combination of speech-to-text translation and text-to-speech synthesis. For the speech-to-text translation model, we used the dual-decoderTransformer model. For text-to-speech synthesis model, we took a cascade approach of an acoustic model and a vocoder.

pdf bib abs
Towards Speech Dialogue Translation Mediating Speakers of Different Languages
Shuichiro Shimizu | Chenhui Chu | Sheng Li | Sadao Kurohashi
Findings of the Association for Computational Linguistics: ACL 2023

We present a new task, speech dialogue translation mediating speakers of different languages. We construct the SpeechBSD dataset for the task and conduct baseline experiments. Furthermore, we consider context to be an important aspect that needs to be addressed in this task and propose two ways of utilizing context, namely monolingual context and bilingual context. We conduct cascaded speech translation experiments using Whisper and mBART, and show that bilingual context performs better in our settings.

pdf bib abs
ARKitSceneRefer: Text-based Localization of Small Objects in Diverse Real-World 3D Indoor Scenes
Shunya Kato | Shuhei Kurita | Chenhui Chu | Sadao Kurohashi
Findings of the Association for Computational Linguistics: EMNLP 2023

3D referring expression comprehension is a task to ground text representations onto objects in 3D scenes. It is a crucial task for indoor household robots or augmented reality devices to localize objects referred to in user instructions. However, existing indoor 3D referring expression comprehension datasets typically cover larger object classes that are easy to localize, such as chairs, tables, or doors, and often overlook small objects, such as cooking tools or office supplies. Based on the recently proposed diverse and high-resolution 3D scene dataset of ARKitScenes, we construct the ARKitSceneRefer dataset focusing on small daily-use objects that frequently appear in real-world indoor scenes. ARKitSceneRefer contains 15k objects of 1,605 indoor scenes, which are significantly larger than those of the existing 3D referring datasets, and covers diverse object classes of 583 from the LVIS dataset. In empirical experiments with both 2D and 3D state-of-the-art referring expression comprehension models, we observed the task difficulty of the localization in the diverse small object classes.

pdf bib abs
DocTrack: A Visually-Rich Document Dataset Really Aligned with Human Eye Movement for Machine Reading
Hao Wang | Qingxuan Wang | Yue Li | Changqing Wang | Chenhui Chu | Rui Wang
Findings of the Association for Computational Linguistics: EMNLP 2023

The use of visually-rich documents in various fields has created a demand for Document AI models that can read and comprehend documents like humans, which requires the overcoming of technical, linguistic, and cognitive barriers. Unfortunately, the lack of appropriate datasets has significantly hindered advancements in the field. To address this issue, we introduce DocTrack, a visually-rich document dataset really aligned with human eye-movement information using eye-tracking technology. This dataset can be used to investigate the challenges mentioned above. Additionally, we explore the impact of human reading order on document understanding tasks and examine what would happen if a machine reads in the same order as a human. Our results suggest that although Document AI models have made significant progresses, they still have a long way to go before they can read visually richer documents as accurately, continuously, and flexibly as humans do. These findings have potential implications for future research and development of document intelligence.

pdf bib abs
Reasoning before Responding: Integrating Commonsense-based Causality Explanation for Empathetic Response Generation
Yahui Fu | Koji Inoue | Chenhui Chu | Tatsuya Kawahara
Proceedings of the 24th Annual Meeting of the Special Interest Group on Discourse and Dialogue

Recent approaches to empathetic response generation try to incorporate commonsense knowledge or reasoning about the causes of emotions to better understand the user’s experiences and feelings. However, these approaches mainly focus on understanding the causalities of context from the user’s perspective, ignoring the system’s perspective. In this paper, we propose a commonsense-based causality explanation approach for diverse empathetic response generation that considers both the user’s perspective (user’s desires and reactions) and the system’s perspective (system’s intentions and reactions). We enhance ChatGPT’s ability to reason for the system’s perspective by integrating in-context learning with commonsense knowledge. Then, we integrate the commonsense-based causality explanation with both ChatGPT and a T5-based model. Experimental evaluations demonstrate that our method outperforms other comparable methods on both automatic and human evaluations.

pdf bib abs
Video-Helpful Multimodal Machine Translation
Yihang Li | Shuichiro Shimizu | Chenhui Chu | Sadao Kurohashi | Wei Li
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Existing multimodal machine translation (MMT) datasets consist of images and video captions or instructional video subtitles, which rarely contain linguistic ambiguity, making visual information ineffective in generating appropriate translations. Recent work has constructed an ambiguous subtitles dataset to alleviate this problem but is still limited to the problem that videos do not necessarily contribute to disambiguation. We introduce EVA (Extensive training set and Video-helpful evaluation set for Ambiguous subtitles translation), an MMT dataset containing 852k Japanese-English parallel subtitle pairs, 520k Chinese-English parallel subtitle pairs, and corresponding video clips collected from movies and TV episodes. In addition to the extensive training set, EVA contains a video-helpful evaluation set in which subtitles are ambiguous, and videos are guaranteed helpful for disambiguation. Furthermore, we propose SAFA, an MMT model based on the Selective Attention model with two novel methods: Frame attention loss and Ambiguity augmentation, aiming to use videos in EVA for disambiguation fully. Experiments on EVA show that visual information and the proposed methods can boost translation performance, and our model performs significantly better than existing MMT models.

pdf bib abs
Vision-Enhanced Semantic Entity Recognition in Document Images via Visually-Asymmetric Consistency Learning
Hao Wang | Xiahua Chen | Rui Wang | Chenhui Chu
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Extracting meaningful entities belonging to predefined categories from Visually-rich Form-like Documents (VFDs) is a challenging task. Visual and layout features such as font, background, color, and bounding box location and size provide important cues for identifying entities of the same type. However, existing models commonly train a visual encoder with weak cross-modal supervision signals, resulting in a limited capacity to capture these non-textual features and suboptimal performance. In this paper, we propose a novel Visually-Asymmetric coNsistenCy Learning (VANCL) approach that addresses the above limitation by enhancing the model’s ability to capture fine-grained visual and layout features through the incorporation of color priors. Experimental results on benchmark datasets show that our approach substantially outperforms the strong LayoutLM series baseline, demonstrating the effectiveness of our approach. Additionally, we investigate the effects of different color schemes on our approach, providing insights for optimizing model performance. We believe our work will inspire future research on multimodal information extraction.

This paper presents the results of the shared tasks from the 10th workshop on Asian translation (WAT2023). For the WAT2023, 2 teams submitted their translation results for the human evaluation. We also accepted 1 research paper. About 40 translation results were submitted to the automatic evaluation server, and selected submissions were manually evaluated.

pdf bib
Variable-length Neural Interlingua Representations for Zero-shot Neural Machine Translation
Zhuoyuan Mao | Haiyue Song | Raj Dabre | Chenhui Chu | Sadao Kurohashi
Proceedings of the 1st International Workshop on Multilingual, Multimodal and Multitask Language Generation

2022

pdf bib abs
Flexible Visual Grounding
Yongmin Kim | Chenhui Chu | Sadao Kurohashi
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop

Existing visual grounding datasets are artificially made, where every query regarding an entity must be able to be grounded to a corresponding image region, i.e., answerable. However, in real-world multimedia data such as news articles and social media, many entities in the text cannot be grounded to the image, i.e., unanswerable, due to the fact that the text is unnecessarily directly describing the accompanying image. A robust visual grounding model should be able to flexibly deal with both answerable and unanswerable visual grounding. To study this flexible visual grounding problem, we construct a pseudo dataset and a social media dataset including both answerable and unanswerable queries. In order to handle unanswerable visual grounding, we propose a novel method by adding a pseudo image region corresponding to a query that cannot be grounded. The model is then trained to ground to ground-truth regions for answerable queries and pseudo regions for unanswerable queries. In our experiments, we show that our model can flexibly process both answerable and unanswerable queries with high accuracy on our datasets.

This paper presents the results of the shared tasks from the 9th workshop on Asian translation (WAT2022). For the WAT2022, 8 teams submitted their translation results for the human evaluation. We also accepted 4 research papers. About 300 translation results were submitted to the automatic evaluation server, and selected submissions were manually evaluated.

Word alignment has proven to benefit many-to-many neural machine translation (NMT). However, high-quality ground-truth bilingual dictionaries were used for pre-editing in previous methods, which are unavailable for most language pairs. Meanwhile, the contrastive objective can implicitly utilize automatically learned word alignment, which has not been explored in many-to-many NMT. This work proposes a word-level contrastive objective to leverage word alignments for many-to-many NMT. Empirical results show that this leads to 0.8 BLEU gains for several language pairs. Analyses reveal that in many-to-many NMT, the encoder’s sentence retrieval performance highly correlates with the translation quality, which explains when the proposed method impacts translation. This motivates future exploration for many-to-many NMT to improve the encoder’s sentence retrieval performance.

pdf bib abs
VISA: An Ambiguous Subtitles Dataset for Visual Scene-aware Machine Translation
Yihang Li | Shuichiro Shimizu | Weiqi Gu | Chenhui Chu | Sadao Kurohashi
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Existing multimodal machine translation (MMT) datasets consist of images and video captions or general subtitles which rarely contain linguistic ambiguity, making visual information not so effective to generate appropriate translations. We introduce VISA, a new dataset that consists of 40k Japanese-English parallel sentence pairs and corresponding video clips with the following key features: (1) the parallel sentences are subtitles from movies and TV episodes; (2) the source subtitles are ambiguous, which means they have multiple possible translations with different meanings; (3) we divide the dataset into Polysemy and Omission according to the cause of ambiguity. We show that VISA is challenging for the latest MMT system, and we hope that the dataset can facilitate MMT research.

pdf bib abs
BERTSeg: BERT Based Unsupervised Subword Segmentation for Neural Machine Translation
Haiyue Song | Raj Dabre | Zhuoyuan Mao | Chenhui Chu | Sadao Kurohashi
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Existing subword segmenters are either 1) frequency-based without semantics information or 2) neural-based but trained on parallel corpora. To address this, we present BERTSeg, an unsupervised neural subword segmenter for neural machine translation, which utilizes the contextualized semantic embeddings of words from characterBERT and maximizes the generation probability of subword segmentations. Furthermore, we propose a generation probability-based regularization method that enables BERTSeg to produce multiple segmentations for one word to improve the robustness of neural machine translation. Experimental results show that BERTSeg with regularization achieves up to 8 BLEU points improvement in 9 translation directions on ALT, IWSLT15 Vi->En, WMT16 Ro->En, and WMT15 Fi->En datasets compared with BPE. In addition, BERTSeg is efficient, needing up to 5 minutes for training.

pdf bib abs
Filtering of Noisy Web-Crawled Parallel Corpus: the Japanese-Bulgarian Language Pair
Iglika Nikolova-Stoupak | Shuichiro Shimizu | Chenhui Chu | Sadao Kurohashi
Proceedings of the Fifth International Conference on Computational Linguistics in Bulgaria (CLIB 2022)

One of the main challenges within the rapidly developing field of neural machine translation is its application to low-resource languages. Recent attempts to provide large parallel corpora in rare language pairs include the generation of web-crawled corpora, which may be vast but are, unfortunately, excessively noisy. The corpus utilised to train machine translation models in the study is CCMatrix, provided by OPUS. Firstly, the corpus is cleaned based on a number of heuristic rules. Then, parts of it are selected in three discrete ways: at random, based on the “margin distance” metric that is native to the CCMatrix dataset, and based on scores derived through the application of a state-of-the-art classifier model (Acarcicek et al., 2020) utilised in a thematic WMT shared task. The performance of the issuing models is evaluated and compared. The classifier-based model does not reach high performance as compared with its margin-based counterpart, opening a discussion of ways for further improvement. Still, BLEU scores surpass those of Acarcicek et al.’s (2020) paper by over 15 points.

2021

pdf bib abs
Lightweight Cross-Lingual Sentence Representation Learning
Zhuoyuan Mao | Prakhar Gupta | Chenhui Chu | Martin Jaggi | Sadao Kurohashi
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Large-scale models for learning fixed-dimensional cross-lingual sentence representations like LASER (Artetxe and Schwenk, 2019b) lead to significant improvement in performance on downstream tasks. However, further increases and modifications based on such large-scale models are usually impractical due to memory limitations. In this work, we introduce a lightweight dual-transformer architecture with just 2 layers for generating memory-efficient cross-lingual sentence representations. We explore different training tasks and observe that current cross-lingual training tasks leave a lot to be desired for this shallow architecture. To ameliorate this, we propose a novel cross-lingual language model, which combines the existing single-word masked language model with the newly proposed cross-lingual token-level reconstruction task. We further augment the training task by the introduction of two computationally-lite sentence-level contrastive learning tasks to enhance the alignment of cross-lingual sentence representation space, which compensates for the learning bottleneck of the lightweight transformer for generative tasks. Our comparisons with competing models on cross-lingual sentence retrieval and multilingual document classification confirm the effectiveness of the newly proposed training tasks for a shallow model.

pdf bib abs
Attending Self-Attention: A Case Study of Visually Grounded Supervision in Vision-and-Language Transformers
Jules Samaran | Noa Garcia | Mayu Otani | Chenhui Chu | Yuta Nakashima
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: Student Research Workshop

The impressive performances of pre-trained visually grounded language models have motivated a growing body of research investigating what has been learned during the pre-training. As a lot of these models are based on Transformers, several studies on the attention mechanisms used by the models to learn to associate phrases with their visual grounding in the image have been conducted. In this work, we investigate how supervising attention directly to learn visual grounding can affect the behavior of such models. We compare three different methods on attention supervision and their impact on the performances of a state-of-the-art visually grounded language model on two popular vision-and-language tasks.

pdf bib abs
Video-guided Machine Translation with Spatial Hierarchical Attention Network
Weiqi Gu | Haiyue Song | Chenhui Chu | Sadao Kurohashi
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing: Student Research Workshop

Video-guided machine translation, as one type of multimodal machine translations, aims to engage video contents as auxiliary information to address the word sense ambiguity problem in machine translation. Previous studies only use features from pretrained action detection models as motion representations of the video to solve the verb sense ambiguity, leaving the noun sense ambiguity a problem. To address this problem, we propose a video-guided machine translation system by using both spatial and motion representations in videos. For spatial features, we propose a hierarchical attention network to model the spatial information from object-level to video-level. Experiments on the VATEX dataset show that our system achieves 35.86 BLEU-4 score, which is 0.51 score higher than the single model of the SOTA method.

pdf bib abs
WRIME: A New Dataset for Emotional Intensity Estimation with Subjective and Objective Annotations
Tomoyuki Kajiwara | Chenhui Chu | Noriko Takemura | Yuta Nakashima | Hajime Nagahara
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

We annotate 17,000 SNS posts with both the writer’s subjective emotional intensity and the reader’s objective one to construct a Japanese emotion analysis dataset. In this study, we explore the difference between the emotional intensity of the writer and that of the readers with this dataset. We found that the reader cannot fully detect the emotions of the writer, especially anger and trust. In addition, experimental results in estimating the emotional intensity show that it is more difficult to estimate the writer’s subjective labels than the readers’. The large gap between the subjective and objective emotions imply the complexity of the mapping from a post to the subjective emotion intensities, which also leads to a lower performance with machine learning models.

This paper presents the results of the shared tasks from the 8th workshop on Asian translation (WAT2021). For the WAT2021, 28 teams participated in the shared tasks and 24 teams submitted their translation results for the human evaluation. We also accepted 5 research papers. About 2,100 translation results were submitted to the automatic evaluation server, and selected submissions were manually evaluated.

pdf bib abs
TMEKU System for the WAT2021 Multimodal Translation Task
Yuting Zhao | Mamoru Komachi | Tomoyuki Kajiwara | Chenhui Chu
Proceedings of the 8th Workshop on Asian Translation (WAT2021)

We introduce our TMEKU system submitted to the English-Japanese Multimodal Translation Task for WAT 2021. We participated in the Flickr30kEnt-JP task and Ambiguous MSCOCO Multimodal task under the constrained condition using only the officially provided datasets. Our proposed system employs soft alignment of word-region for multimodal neural machine translation (MNMT). The experimental results evaluated on the BLEU metric provided by the WAT 2021 evaluation site show that the TMEKU system has achieved the best performance among all the participated systems. Further analysis of the case study demonstrates that leveraging word-region alignment between the textual and visual modalities is the key to performance enhancement in our TMEKU system, which leads to better visual information use.

2020

pdf bib abs
Multilingual Neural Machine Translation
Raj Dabre | Chenhui Chu | Anoop Kunchukuttan
Proceedings of the 28th International Conference on Computational Linguistics: Tutorial Abstracts

The advent of neural machine translation (NMT) has opened up exciting research in building multilingual translation systems i.e. translation models that can handle more than one language pair. Many advances have been made which have enabled (1) improving translation for low-resource languages via transfer learning from high resource languages; and (2) building compact translation models spanning multiple languages. In this tutorial, we will cover the latest advances in NMT approaches that leverage multilingualism, especially to enhance low-resource translation. In particular, we will focus on the following topics: modeling parameter sharing for multi-way models, massively multilingual models, training protocols, language divergence, transfer learning, zero-shot/zero-resource learning, pivoting, multilingual pre-training and multi-source translation.

pdf bib abs
IDSOU at WNUT-2020 Task 2: Identification of Informative COVID-19 English Tweets
Sora Ohashi | Tomoyuki Kajiwara | Chenhui Chu | Noriko Takemura | Yuta Nakashima | Hajime Nagahara
Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)

We introduce the IDSOU submission for the WNUT-2020 task 2: identification of informative COVID-19 English Tweets. Our system is an ensemble of pre-trained language models such as BERT. We ranked 16th in the F1 score.

In this paper, we propose a full pipeline of analysis of a large corpus about a century of public meeting in historical Australian news papers, from construction to visual exploration. The corpus construction method is based on image processing and OCR. We digitize and transcribe texts of the specific topic of public meeting. Experiments show that our proposed method achieves a F-score of 87.8% for corpus construction. As a result, we built a content search tool for temporal and semantic content analysis.

pdf bib abs
Annotation of Adverse Drug Reactions in Patients’ Weblogs
Yuki Arase | Tomoyuki Kajiwara | Chenhui Chu
Proceedings of the Twelfth Language Resources and Evaluation Conference

Adverse drug reactions are a severe problem that significantly degrade quality of life, or even threaten the life of patients. Patient-generated texts available on the web have been gaining attention as a promising source of information in this regard. While previous studies annotated such patient-generated content, they only reported on limited information, such as whether a text described an adverse drug reaction or not. Further, they only annotated short texts of a few sentences crawled from online forums and social networking services. The dataset we present in this paper is unique for the richness of annotated information, including detailed descriptions of drug reactions with full context. We crawled patient’s weblog articles shared on an online patient-networking platform and annotated the effects of drugs therein reported. We identified spans describing drug reactions and assigned labels for related drug names, standard codes for the symptoms of the reactions, and types of effects. As a first dataset, we annotated 677 drug reactions with these detailed labels based on 169 weblog articles by Japanese lung cancer patients. Our annotation dataset is made publicly available at our web site (https://yukiar.github.io/adr-jp/) for further research on the detection of adverse drug reactions and more broadly, on patient-generated text processing.

pdf bib abs
Text Classification with Negative Supervision
Sora Ohashi | Junya Takayama | Tomoyuki Kajiwara | Chenhui Chu | Yuki Arase
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Advanced pre-trained models for text representation have achieved state-of-the-art performance on various text classification tasks. However, the discrepancy between the semantic similarity of texts and labelling standards affects classifiers, i.e. leading to lower performance in cases where classifiers should assign different labels to semantically similar texts. To address this problem, we propose a simple multitask learning model that uses negative supervision. Specifically, our model encourages texts with different labels to have distinct representations. Comprehensive experiments show that our model outperforms the state-of-the-art pre-trained model on both single- and multi-label classifications, sentence and document classifications, and classifications in three different languages.

pdf bib abs
Double Attention-based Multimodal Neural Machine Translation with Semantic Image Regions
Yuting Zhao | Mamoru Komachi | Tomoyuki Kajiwara | Chenhui Chu
Proceedings of the 22nd Annual Conference of the European Association for Machine Translation

Existing studies on multimodal neural machine translation (MNMT) have mainly focused on the effect of combining visual and textual modalities to improve translations. However, it has been suggested that the visual modality is only marginally beneficial. Conventional visual attention mechanisms have been used to select the visual features from equally-sized grids generated by convolutional neural networks (CNNs), and may have had modest effects on aligning the visual concepts associated with textual objects, because the grid visual features do not capture semantic information. In contrast, we propose the application of semantic image regions for MNMT by integrating visual and textual features using two individual attention mechanisms (double attention). We conducted experiments on the Multi30k dataset and achieved an improvement of 0.5 and 0.9 BLEU points for English-German and English-French translation tasks, compared with the MNMT with grid visual features. We also demonstrated concrete improvements on translation performance benefited from semantic image regions.

pdf bib abs
Meta Ensemble for Japanese-Chinese Neural Machine Translation: Kyoto-U+ECNU Participation to WAT 2020
Zhuoyuan Mao | Yibin Shen | Chenhui Chu | Sadao Kurohashi | Cheqing Jin
Proceedings of the 7th Workshop on Asian Translation

This paper describes the Japanese-Chinese Neural Machine Translation (NMT) system submitted by the joint team of Kyoto University and East China Normal University (Kyoto-U+ECNU) to WAT 2020 (Nakazawa et al.,2020). We participate in APSEC Japanese-Chinese translation task. We revisit several techniques for NMT including various architectures, different data selection and augmentation methods, denoising pre-training, and also some specific tricks for Japanese-Chinese translation. We eventually perform a meta ensemble to combine all of the models into a single model. BLEU results of this meta ensembled model rank the first both on 2 directions of ASPEC Japanese-Chinese translation.

2019

pdf bib abs
Exploiting Multilingualism through Multistage Fine-Tuning for Low-Resource Neural Machine Translation
Raj Dabre | Atsushi Fujita | Chenhui Chu
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

This paper highlights the impressive utility of multi-parallel corpora for transfer learning in a one-to-many low-resource neural machine translation (NMT) setting. We report on a systematic comparison of multistage fine-tuning configurations, consisting of (1) pre-training on an external large (209k–440k) parallel corpus for English and a helping target language, (2) mixed pre-training or fine-tuning on a mixture of the external and low-resource (18k) target parallel corpora, and (3) pure fine-tuning on the target parallel corpora. Our experiments confirm that multi-parallel corpora are extremely useful despite their scarcity and content-wise redundancy thus exhibiting the true power of multilingualism. Even when the helping target language is not one of the target languages of our concern, our multistage fine-tuning can give 3–9 BLEU score gains over a simple one-to-one model.

2018

pdf bib
Osaka University MT Systems for WAT 2018: Rewarding, Preordering, and Domain Adaptation
Yuki Kawara | Yuto Takebayashi | Chenhui Chu | Yuki Arase
Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation: 5th Workshop on Asian Translation: 5th Workshop on Asian Translation

pdf bib abs
Recursive Neural Network Based Preordering for English-to-Japanese Machine Translation
Yuki Kawara | Chenhui Chu | Yuki Arase
Proceedings of ACL 2018, Student Research Workshop

The word order between source and target languages significantly influences the translation quality. Preordering can effectively address this problem. Previous preordering methods require a manual feature design, making language dependent design difficult. In this paper, we propose a preordering method with recursive neural networks that learn features from raw inputs. Experiments show the proposed method is comparable to the state-of-the-art method but without a manual feature design.

pdf bib abs
A Survey of Domain Adaptation for Neural Machine Translation
Chenhui Chu | Rui Wang
Proceedings of the 27th International Conference on Computational Linguistics

Neural machine translation (NMT) is a deep learning based approach for machine translation, which yields the state-of-the-art translation performance in scenarios where large-scale parallel corpora are available. Although the high-quality and domain-specific translation is crucial in the real world, domain-specific corpora are usually scarce or nonexistent, and thus vanilla NMT performs poorly in such scenarios. Domain adaptation that leverages both out-of-domain parallel corpora as well as monolingual corpora for in-domain translation, is very important for domain-specific translation. In this paper, we give a comprehensive survey of the state-of-the-art domain adaptation techniques for NMT.

pdf bib abs
iParaphrasing: Extracting Visually Grounded Paraphrases via an Image
Chenhui Chu | Mayu Otani | Yuta Nakashima
Proceedings of the 27th International Conference on Computational Linguistics

A paraphrase is a restatement of the meaning of a text in other words. Paraphrases have been studied to enhance the performance of many natural language processing tasks. In this paper, we propose a novel task iParaphrasing to extract visually grounded paraphrases (VGPs), which are different phrasal expressions describing the same visual concept in an image. These extracted VGPs have the potential to improve language and image multimodal tasks such as visual question answering and image captioning. How to model the similarity between VGPs is the key of iParaphrasing. We apply various existing methods as well as propose a novel neural network-based method with image attention, and report the results of the first attempt toward iParaphrasing.

2017

pdf bib abs
An Empirical Comparison of Domain Adaptation Methods for Neural Machine Translation
Chenhui Chu | Raj Dabre | Sadao Kurohashi
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

In this paper, we propose a novel domain adaptation method named “mixed fine tuning” for neural machine translation (NMT). We combine two existing approaches namely fine tuning and multi domain NMT. We first train an NMT model on an out-of-domain parallel corpus, and then fine tune it on a parallel corpus which is a mix of the in-domain and out-of-domain corpora. All corpora are augmented with artificial tags to indicate specific domains. We empirically compare our proposed method against fine tuning and multi domain methods and discuss its benefits and shortcomings.

2016

pdf bib
Cross-language Projection of Dependency Trees with Constrained Partial Parsing for Tree-to-Tree Machine Translation
Yu Shen | Chenhui Chu | Fabien Cromieres | Sadao Kurohashi
Proceedings of the First Conference on Machine Translation: Volume 1, Research Papers

pdf bib abs
Kyoto University Participation to WAT 2016
Fabien Cromieres | Chenhui Chu | Toshiaki Nakazawa | Sadao Kurohashi
Proceedings of the 3rd Workshop on Asian Translation (WAT2016)

We describe here our approaches and results on the WAT 2016 shared translation tasks. We tried to use both an example-based machine translation (MT) system and a neural MT system. We report very good translation results, especially when using neural MT for Chinese-to-Japanese translation.

pdf bib abs
SCTB: A Chinese Treebank in Scientific Domain
Chenhui Chu | Toshiaki Nakazawa | Daisuke Kawahara | Sadao Kurohashi
Proceedings of the 12th Workshop on Asian Language Resources (ALR12)

Treebanks are curial for natural language processing (NLP). In this paper, we present our work for annotating a Chinese treebank in scientific domain (SCTB), to address the problem of the lack of Chinese treebanks in this domain. Chinese analysis and machine translation experiments conducted using this treebank indicate that the annotated treebank can significantly improve the performance on both tasks. This treebank is released to promote Chinese NLP research in scientific domain.

pdf bib abs
Paraphrasing Out-of-Vocabulary Words with Word Embeddings and Semantic Lexicons for Low Resource Statistical Machine Translation
Chenhui Chu | Sadao Kurohashi
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Out-of-vocabulary (OOV) word is a crucial problem in statistical machine translation (SMT) with low resources. OOV paraphrasing that augments the translation model for the OOV words by using the translation knowledge of their paraphrases has been proposed to address the OOV problem. In this paper, we propose using word embeddings and semantic lexicons for OOV paraphrasing. Experiments conducted on a low resource setting of the OLYMPICS task of IWSLT 2012 verify the effectiveness of our proposed method.

pdf bib abs
Simultaneous Sentence Boundary Detection and Alignment with Pivot-based Machine Translation Generated Lexicons
Antoine Bourlon | Chenhui Chu | Toshiaki Nakazawa | Sadao Kurohashi
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Sentence alignment is a task that consists in aligning the parallel sentences in a translated article pair. This paper describes a method to perform sentence boundary detection and alignment simultaneously, which significantly improves the alignment accuracy on languages like Chinese with uncertain sentence boundaries. It relies on the definition of hard (certain) and soft (uncertain) punctuation delimiters, the latter being possibly ignored to optimize the alignment result. The alignment method is used in combination with lexicons automatically generated from the input article pairs using pivot-based MT, achieving better coverage of the input words with fewer entries than pre-existing dictionaries. Pivot-based MT makes it possible to build dictionaries for language pairs that have scarce parallel data. The alignment method is implemented in a tool that will be freely available in the near future.

pdf bib abs
Parallel Sentence Extraction from Comparable Corpora with Neural Network Features
Chenhui Chu | Raj Dabre | Sadao Kurohashi
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Parallel corpora are crucial for machine translation (MT), however they are quite scarce for most language pairs and domains. As comparable corpora are far more available, many studies have been conducted to extract parallel sentences from them for MT. In this paper, we exploit the neural network features acquired from neural MT for parallel sentence extraction. We observe significant improvements for both accuracy in sentence extraction and MT performance.

pdf bib abs
Consistent Word Segmentation, Part-of-Speech Tagging and Dependency Labelling Annotation for Chinese Language
Mo Shen | Wingmui Li | HyunJeong Choe | Chenhui Chu | Daisuke Kawahara | Sadao Kurohashi
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

In this paper, we propose a new annotation approach to Chinese word segmentation, part-of-speech (POS) tagging and dependency labelling that aims to overcome the two major issues in traditional morphology-based annotation: Inconsistency and data sparsity. We re-annotate the Penn Chinese Treebank 5.0 (CTB5) and demonstrate the advantages of this approach compared to the original CTB5 annotation through word segmentation, POS tagging and machine translation experiments.

pdf bib
Dependency Forest based Word Alignment
Hitoshi Otsuki | Chenhui Chu | Toshiaki Nakazawa | Sadao Kurohashi
Proceedings of the ACL 2016 Student Research Workshop

2015

pdf bib
Large-scale Dictionary Construction via Pivot-based Statistical Machine Translation with Significance Pruning and Neural Network Features
Raj Dabre | Chenhui Chu | Fabien Cromieres | Toshiaki Nakazawa | Sadao Kurohashi
Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation

pdf bib
Cross-language Projection of Dependency Trees for Tree-to-tree Machine Translation
Yu Shen | Chenhui Chu | Fabien Cromieres | Sadao Kurohashi
Proceedings of the 29th Pacific Asia Conference on Language, Information and Computation: Posters

2014

pdf bib abs
Constructing a Chinese—Japanese Parallel Corpus from Wikipedia
Chenhui Chu | Toshiaki Nakazawa | Sadao Kurohashi
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Parallel corpora are crucial for statistical machine translation (SMT). However, they are quite scarce for most language pairs, such as Chinese―Japanese. As comparable corpora are far more available, many studies have been conducted to automatically construct parallel corpora from comparable corpora. This paper presents a robust parallel sentence extraction system for constructing a Chinese―Japanese parallel corpus from Wikipedia. The system is inspired by previous studies that mainly consist of a parallel sentence candidate filter and a binary classifier for parallel sentence identification. We improve the system by using the common Chinese characters for filtering and two novel feature sets for classification. Experiments show that our system performs significantly better than the previous studies for both accuracy in parallel sentence extraction and SMT performance. Using the system, we construct a Chinese―Japanese parallel corpus with more than 126k highly accurate parallel sentences from Wikipedia. The constructed parallel corpus is freely available at http://orchid.kuee.kyoto-u.ac.jp/chu/resource/wiki_zh_ja.tgz.

pdf bib
Improving Statistical Machine Translation Accuracy Using Bilingual Lexicon Extractionwith Paraphrases
Chenhui Chu | Toshiaki Nakazawa | Sadao Kurohashi
Proceedings of the 28th Pacific Asia Conference on Language, Information and Computing

2013

pdf bib
Chinese–Japanese Parallel Sentence Extraction from Quasi–Comparable Corpora
Chenhui Chu | Toshiaki Nakazawa | Sadao Kurohashi
Proceedings of the Sixth Workshop on Building and Using Comparable Corpora

pdf bib
Accurate Parallel Fragment Extraction from Quasi–Comparable Corpora using Alignment Model and Translation Lexicon
Chenhui Chu | Toshiaki Nakazawa | Sadao Kurohashi
Proceedings of the Sixth International Joint Conference on Natural Language Processing

2012

pdf bib abs
EBMT system of Kyoto University in OLYMPICS task at IWSLT 2012
Chenhui Chu | Toshiaki Nakazawa | Sadao Kurohashi
Proceedings of the 9th International Workshop on Spoken Language Translation: Evaluation Campaign

This paper describes the EBMT system of Kyoto University that participated in the OLYMPICS task at IWSLT 2012. When translating very different language pairs such as Chinese-English, it is very important to handle sentences in tree structures to overcome the difference. Many recent studies incorporate tree structures in some parts of translation process, but not all the way from model training (alignment) to decoding. Our system is a fully tree-based translation system where we use the Bayesian phrase alignment model on dependency trees and example-based translation. To improve the translation quality, we conduct some special processing for the IWSLT 2012 OLYMPICS task, including sub-sentence splitting, non-parallel sentence filtering, adoption of an optimized Chinese segmenter and rule-based decoding constraints.

pdf bib abs
Chinese Characters Mapping Table of Japanese, Traditional Chinese and Simplified Chinese
Chenhui Chu | Toshiaki Nakazawa | Sadao Kurohashi
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Chinese characters are used both in Japanese and Chinese, which are called Kanji and Hanzi respectively. Chinese characters contain significant semantic information, a mapping table between Kanji and Hanzi can be very useful for many Japanese-Chinese bilingual applications, such as machine translation and cross-lingual information retrieval. Because Kanji characters are originated from ancient China, most Kanji have corresponding Chinese characters in Hanzi. However, the relation between Kanji and Hanzi is quite complicated. In this paper, we propose a method of making a Chinese characters mapping table of Japanese, Traditional Chinese and Simplified Chinese automatically by means of freely available resources. We define seven categories for Kanji based on the relation between Kanji and Hanzi, and classify mappings of Chinese characters into these categories. We use a resource from Wiktionary to show the completeness of the mapping table we made. Statistical comparison shows that our proposed method makes a more complete mapping table than the current version of Wiktionary.

pdf bib
Exploiting Shared Chinese Characters in Chinese Word Segmentation Optimization for Chinese-Japanese Machine Translation
Chenhui Chu | Toshiaki Nakazawa | Daisuke Kawahara | Sadao Kurohashi
Proceedings of the 16th Annual Conference of the European Association for Machine Translation