Vicomtech at ALexS 2020: Unsupervised Complex Word Identification Based on Domain Frequency

Abstract

This paper introduces Vicomtech’s systems for unsupervised complex word identification submitted to the ALexS “Análisis Léxico en la SEPLN 2020” task. The systems are based on clustering algorithms with domain-specific features, such as word frequency and probability in several Wikipedia corpora, word length, and number of synsets in WordNet. Our systems are designed to identify complex words by taking into account how often each word occurs in domain-specific texts, so that they can adapt to the domain. Our systems achieved good results, ranking second in the task.
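
The idea of clustering words on domain-frequency features can be illustrated with a small sketch. This is not the authors' implementation: the abstract only says "clustering algorithms", so plain 2-means is an assumption here, the WordNet synset count is left out to keep the example standard-library only, and every function name (`word_features`, `standardize`, `two_means`, `identify_complex_words`) is hypothetical.

```python
import math
from collections import Counter


def word_features(word, domain_counts, total_tokens):
    """Return [rarity, length]; rarity is -log of the smoothed domain frequency."""
    # Add-one smoothing so words unseen in the domain corpus get a finite, high rarity.
    rel_freq = (domain_counts.get(word, 0) + 1) / (total_tokens + 1)
    return [-math.log(rel_freq), float(len(word))]


def standardize(rows):
    """Z-score each column so rarity and length are on comparable scales."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A zero standard deviation is replaced by 1.0 to avoid division by zero.
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def two_means(points, iterations=20):
    """Plain 2-means (assumed algorithm) with deterministic seeds."""
    # Seeds: the points with the lowest and highest feature sum.
    centroids = [list(min(points, key=sum)), list(max(points, key=sum))]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to the nearest centroid (squared Euclidean distance).
        labels = [
            min((0, 1), key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centroids[k])))
            for p in points
        ]
        # Move each centroid to the mean of its members.
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids


def identify_complex_words(candidates, domain_tokens):
    """Return the candidates that fall in the rarer, longer cluster."""
    counts = Counter(domain_tokens)
    feats = standardize(
        [word_features(w, counts, len(domain_tokens)) for w in candidates]
    )
    labels, centroids = two_means(feats)
    # The cluster whose centroid has the larger feature sum is treated as "complex".
    complex_label = max((0, 1), key=lambda k: sum(centroids[k]))
    return [w for w, lab in zip(candidates, labels) if lab == complex_label]
```

Because no labels are used, swapping the token list for a corpus from another domain changes the frequency feature and therefore which words end up in the "complex" cluster, which is the domain-adaptation behaviour the abstract describes.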

BIB_text

@Article {
title = {Vicomtech at ALexS 2020: Unsupervised Complex Word Identification Based on Domain Frequency},
pages = {7-14},
keywds = {Complex Word Identification, Lexical Simplification, Unsupervised Learning},
abstract = {This paper introduces Vicomtech’s systems for unsupervised complex word identification submitted to the ALexS “Análisis Léxico en la SEPLN 2020” task. The systems are based on clustering algorithms with domain-specific features, such as word frequency and probability in several Wikipedia corpora, word length, and number of synsets in WordNet. Our systems are designed to identify complex words by taking into account how often each word occurs in domain-specific texts, so that they can adapt to the domain. Our systems achieved good results, ranking second in the task.},
date = {2020-09-23},
}