UnibucKernel: A kernel-based learning method for complex word identification

Butnaru, Andrei M.; Ionescu, Radu Tudor

arXiv:1803.07602 (cs)

[Submitted on 20 Mar 2018 (v1), last revised 22 May 2018 (this version, v4)]

Abstract: In this paper, we present a kernel-based learning approach for the 2018 Complex Word Identification (CWI) Shared Task. Our approach combines multiple low-level features, such as character n-grams, with high-level semantic features that are either automatically learned using word embeddings or extracted from a lexical knowledge base, namely WordNet. After feature extraction, we employ a kernel method for the learning phase. The feature matrix is first transformed into a normalized kernel matrix. For the binary classification task (simple versus complex), we employ Support Vector Machines. For the regression task, in which we have to predict the complexity level of a word (a word is more complex if it is labeled as complex by more annotators), we employ ν-Support Vector Regression. We applied our approach only to the three English data sets, which contain documents from the Wikipedia, WikiNews and News domains. Our best result during the competition was third place on the English Wikipedia data set. However, in this paper, we also report better post-competition results.
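The step from a feature matrix to a normalized kernel matrix can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a linear kernel over character n-gram counts and the common normalization K̂_ij = K_ij / sqrt(K_ii · K_jj), neither of which is specified in the abstract. The function names, the n-gram range (1 to 3) and the example words are hypothetical.

```python
from collections import Counter
from math import sqrt


def char_ngrams(word, n_min=1, n_max=3):
    """Count the character n-grams of lengths n_min..n_max in a word."""
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(word) - n + 1):
            counts[word[i:i + n]] += 1
    return counts


def linear_kernel(features):
    """Return K with K[i][j] = dot product of sparse count vectors i and j."""
    def dot(a, b):
        return sum(v * b[k] for k, v in a.items() if k in b)
    return [[dot(a, b) for b in features] for a in features]


def normalize_kernel(K):
    """Assumed normalization: K_hat[i][j] = K[i][j] / sqrt(K[i][i] * K[j][j])."""
    n = len(K)
    K_hat = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            denom = sqrt(K[i][i] * K[j][j])
            # Guard against empty feature vectors (zero self-similarity).
            K_hat[i][j] = K[i][j] / denom if denom > 0 else 0.0
    return K_hat


# Hypothetical target words; the shared task data is not reproduced here.
words = ["cat", "cart", "idiosyncratic"]
features = [char_ngrams(w) for w in words]
K_hat = normalize_kernel(linear_kernel(features))
```

After normalization every word has self-similarity 1, so long words do not dominate the kernel simply because they contain more n-grams. A matrix of this form could then be passed to an SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")` for the binary task and `NuSVR(kernel="precomputed")` for the regression task; the choice of library is an assumption here.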

Comments:	This paper presents the system developed by the UnibucKernel team for the 2018 CWI Shared Task. Accepted at the BEA13 Workshop of NAACL 2018
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:1803.07602 [cs.CL]
	(or arXiv:1803.07602v4 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1803.07602

Submission history

From: Radu Tudor Ionescu
[v1] Tue, 20 Mar 2018 18:47:54 UTC (338 KB)
[v2] Fri, 6 Apr 2018 09:24:55 UTC (338 KB)
[v3] Tue, 10 Apr 2018 15:27:52 UTC (338 KB)
[v4] Tue, 22 May 2018 17:03:19 UTC (338 KB)
