Lexical normalization of Spanish tweets with preprocessing rules, domain-specific edit-distances, and language models

Authors: Pablo Ruiz, Montse Cuadros, Thierry Etchegoyhen

Date: 2013-09-12



Abstract

We present a system for normalizing Spanish tweets. It uses preprocessing rules, a domain-appropriate edit-distance model, and language models to select correction candidates based on context. The system’s results at the SEPLN 2013 Tweet-Norm task were above average.

BIB_text

@Article{ruiz2013lexical,
author = {Pablo Ruiz and Montse Cuadros and Thierry Etchegoyhen},
title = {Lexical normalization of Spanish tweets with preprocessing rules, domain-specific edit-distances, and language models},
number = {9},
keywds = {microtexto, español, castellano, normalización léxica, Twitter, distancia de edición, modelo de lengua, Spanish microtext, lexical normalization, Twitter, edit distance, language model},
abstract = {We present a system for normalizing Spanish tweets. It uses preprocessing rules, a domain-appropriate edit-distance model, and language models to select correction candidates based on context. The system’s results at the SEPLN 2013 Tweet-Norm task were above average.},
isbn = {978-84-695-8349-4},
date = {2013-09-12},
year = {2013},
}
