Gulinigeer Abudouwaili
2023
Strategies to Improve Low-Resource Agglutinative Languages Morphological Inflection
Gulinigeer Abudouwaili | Wayit Ablez | Kahaerjiang Abiderexiti | Aishan Wumaier | Nian Yi
Proceedings of the 27th Conference on Computational Natural Language Learning (CoNLL)
Morphological inflection is a crucial task in morphology and is typically treated as a sequence transduction problem. In recent years it has received substantial attention and seen significant progress, with models achieving strong performance on both high- and low-resource languages. However, accuracy declines when the distribution of instances in the training data changes, or when the model must predict forms for novel lemmas or feature labels. In agglutinative languages, inflection also involves phonological phenomena that can alter the syllable patterns at the boundary between the lemma and its suffixes. This paper proposes four strategies to improve the generalization ability of models for low-resource agglutinative languages. First, a convolution module extracts syllable-like units from lemmas, allowing the model to learn syllable features. Second, the lemma and feature labels are represented separately in the input, and the position encoding of the feature labels is marked so that the model learns the order between suffixes and labels. Third, the model recognizes the common substrings in lemmas through two special characters and copies them into the inflected words. Finally, the data augmentation method is improved by incorporating syllable features. Experiments show that the proposed model outperforms the baseline models.
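The third strategy in the abstract, marking the substring shared by a lemma and its inflected form so it can be copied, can be illustrated with a minimal sketch. This is not the authors' implementation: the marker characters `[` and `]`, the function name `mark_common_substring`, and the use of a single longest common substring are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical marker characters; the symbols used in the paper are not stated here.
OPEN, CLOSE = "[", "]"


def mark_common_substring(lemma: str, form: str) -> str:
    """Wrap the longest substring shared by lemma and form in marker characters."""
    matcher = SequenceMatcher(None, lemma, form, autojunk=False)
    match = matcher.find_longest_match(0, len(lemma), 0, len(form))
    if match.size == 0:
        return lemma  # nothing shared, so the lemma stays unmarked
    start, end = match.a, match.a + match.size
    return lemma[:start] + OPEN + lemma[start:end] + CLOSE + lemma[end:]


# Turkish "kitap" (book) -> "kitabı": the final consonant alternates,
# so only "kita" is shared and can be copied verbatim.
print(mark_common_substring("kitap", "kitabı"))  # prints "[kita]p"
```

The example shows why the marked span matters for agglutinative languages: the stem-final consonant falls outside the copied region, leaving the model to generate the phonologically altered boundary itself.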
Joint Learning Model for Low-Resource Agglutinative Language Morphological Tagging
Gulinigeer Abudouwaili | Kahaerjiang Abiderexiti | Nian Yi | Aishan Wumaier
Proceedings of the 20th SIGMORPHON workshop on Computational Research in Phonetics, Phonology, and Morphology
Because data resources are scarce, morphological tagging of low-resource languages mainly relies on rule-based methods or transfer learning. However, these methods require expert knowledge, ignore contextual features, and suffer from error propagation. We therefore propose a joint morphological tagger for low-resource agglutinative languages to alleviate these problems. First, we represent the contextual input with multi-dimensional features of agglutinative words. Second, joint training reduces the direct impact of part-of-speech errors on morphological features and increases the indirect influence between the two types of labels through a fusion mechanism. Finally, our model predicts part-of-speech and morphological features separately. Part-of-speech tagging is treated as sequence tagging. To predict morphological features, two-label adjacency graphs are dynamically reconstructed by integrating multilingual global features and monolingual local features, and a graph convolution network then learns the higher-order intersection of labels. Experiments show that the proposed model outperforms the comparison models.
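The graph step in the abstract can be sketched in a few lines of plain Python. The sketch below is only an illustration under stated assumptions: it treats the "global" and "local" information as two label adjacency matrices blended by a hypothetical weight `alpha`, and it uses simple row normalisation with self-loops rather than whatever normalisation the paper applies. The names `mix_graphs`, `gcn_layer`, and `matmul` are invented for this example.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def mix_graphs(global_adj, local_adj, alpha=0.5):
    """Blend two label adjacency matrices (assumed combination rule)."""
    return [
        [alpha * g + (1 - alpha) * l for g, l in zip(g_row, l_row)]
        for g_row, l_row in zip(global_adj, local_adj)
    ]


def gcn_layer(adj, feats, weight):
    """One graph-convolution step: ReLU(D^-1 (A + I) H W) with row normalisation."""
    n = len(adj)
    # Add self-loops so each label keeps part of its own representation.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    # Normalise each row by its sum (the node degree including the self-loop).
    a_norm = [[v / sum(row) for v in row] for row in a_hat]
    # Aggregate neighbour features, apply the linear map, then ReLU.
    out = matmul(matmul(a_norm, feats), weight)
    return [[max(0.0, v) for v in row] for row in out]


# Two labels that co-occur globally but not in the local (monolingual) data.
mixed = mix_graphs([[0, 1], [1, 0]], [[0, 0], [0, 0]], alpha=0.5)
identity = [[1.0, 0.0], [0.0, 1.0]]
hidden = gcn_layer(mixed, identity, identity)
```

Stacking several such layers lets information flow between labels that are more than one edge apart, which is one way to read "higher-order" relations among labels.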