Arabic text classification based on word and document embeddings

A El Mahdaouy, E Gaussier, SO El Alaoui - Proceedings of the …, 2017 - Springer
Proceedings of the International Conference on Advanced Intelligent Systems …, 2017 - Springer
Abstract
Recently, word embeddings have been introduced as a major breakthrough in Natural Language Processing (NLP) for learning viable representations of linguistic items based on contextual information and/or word co-occurrence. In this paper, we investigate Arabic document classification using word and document embeddings as the representational basis, rather than relying on text preprocessing and a bag-of-words representation. We demonstrate that document embeddings outperform text preprocessing techniques, whether they are learned with Doc2Vec or built by averaging word vectors with a simple document embedding construction method. Moreover, the results show that the classification accuracy is less sensitive to the parameters used to learn the word and document vectors.
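The averaging approach mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `average_word_vectors`, the 2-dimensional toy vectors, and the choice to skip out-of-vocabulary tokens are all assumptions made for illustration. In practice the word vectors would come from a trained embedding model rather than a hand-written dictionary.

```python
from typing import Dict, List, Sequence


def average_word_vectors(
    tokens: Sequence[str],
    vectors: Dict[str, List[float]],
    dim: int,
) -> List[float]:
    """Return the element-wise mean of the vectors of the known tokens.

    Tokens missing from `vectors` are skipped (an assumption of this sketch).
    If no token is known, a zero vector of length `dim` is returned.
    """
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    # zip(*known) groups the values of each dimension across all word vectors.
    return [sum(column) / len(known) for column in zip(*known)]


# Hypothetical toy embeddings; real vectors would have hundreds of dimensions.
toy_vectors = {
    "كرة": [1.0, 0.0],
    "ملعب": [0.8, 0.2],
    "اقتصاد": [0.0, 1.0],
}

# The third token is out of vocabulary and is ignored by the average.
doc_vector = average_word_vectors(["كرة", "ملعب", "غير_معروف"], toy_vectors, dim=2)
```

The resulting `doc_vector` can then be passed to any standard classifier as a fixed-length feature vector for the document.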