Vicomtech at ALexS 2020: Unsupervised complex word identification based on domain frequency

Abstract

This paper introduces Vicomtech's systems for unsupervised complex word identification submitted to the ALexS "Análisis Léxico en la SEPLN 2020" task. The systems are based on clustering algorithms with domain-specific features, such as word frequency and probability in several Wikipedia corpora, word length, and the number of synsets in WordNet. They are designed to identify complex words while taking into account how often a word occurs in domain-specific texts, so that they can adapt to the domain. Our systems achieved good results, ranking second.
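
The general idea in the abstract can be illustrated with a minimal sketch. It is not the submitted system: the abstract does not name the clustering algorithm, so a plain 2-means is used here as a stand-in, and only two of the listed features (domain frequency and word length) are computed. The Wikipedia-based probabilities and WordNet synset counts are omitted. All function names (`word_features`, `standardize`, `two_means`, `complex_words`) and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def word_features(word, domain_counts, total_tokens):
    """Return [log relative frequency in the domain corpus, word length]."""
    # Add-one smoothing so unseen words get a finite (very low) log frequency.
    rel_freq = (domain_counts.get(word, 0) + 1) / (total_tokens + 1)
    return [math.log(rel_freq), float(len(word))]


def standardize(rows):
    """Scale each feature column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_means(points, iterations=20):
    """Plain 2-means clustering; returns a 0/1 label per point."""
    # Deterministic start: the points with the lowest and highest first feature.
    centroids = [min(points, key=lambda p: p[0]), max(points, key=lambda p: p[0])]
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min((0, 1), key=lambda k: sq_dist(p, centroids[k])) for p in points]
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = [sum(c) / len(members) for c in zip(*members)]
    return labels


def complex_words(words, domain_tokens):
    """Return the words in the cluster with the lower mean domain frequency."""
    counts = Counter(domain_tokens)
    feats = standardize([word_features(w, counts, len(domain_tokens)) for w in words])
    labels = two_means(feats)

    def mean_log_freq(k):
        vals = [f[0] for f, lab in zip(feats, labels) if lab == k]
        return sum(vals) / len(vals) if vals else float("inf")

    # Assumption: the rarer cluster is the one holding the complex words.
    rare = min((0, 1), key=mean_log_freq)
    return [w for w, lab in zip(words, labels) if lab == rare]


# Toy domain corpus (hypothetical data, for illustration only).
tokens = (
    "the cell of the body is the unit of the cell and the cell of the tissue "
    "the mitochondrion"
).split()
candidates = ["the", "cell", "of", "mitochondrion", "phosphorylation"]
print(complex_words(candidates, tokens))
# → ['mitochondrion', 'phosphorylation']
```

Because the frequencies come from whichever corpus is passed in, swapping the toy corpus for text from another domain changes which words fall into the rare cluster, which is the kind of domain adaptation the abstract describes.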

BIB_text

@Article {
title = {Vicomtech at {ALexS} 2020: Unsupervised complex word identification based on domain frequency},
pages = {7--14},
keywds = {Complex word identification, Lexical simplification, Unsupervised learning},
abstract = {
This paper introduces Vicomtech's systems for unsupervised complex word identification submitted to the ALexS "Análisis Léxico en la SEPLN 2020" task. The systems are based on clustering algorithms with domain-specific features, such as word frequency and probability in several Wikipedia corpora, word length, and the number of synsets in WordNet. They are designed to identify complex words while taking into account how often a word occurs in domain-specific texts, so that they can adapt to the domain. Our systems achieved good results, ranking second.
},
date = {2020-09-23},
}