Lexical normalization of Spanish tweets with preprocessing rules, domain-specific edit-distances, and language models

Authors: Pablo Ruiz, Montse Cuadros, Thierry Etchegoyhen

Date: 12.09.2013



Abstract

We present a system to normalize Spanish tweets. It uses preprocessing rules, a domain-appropriate edit-distance model, and language models to select correction candidates based on context. The system’s results at the SEPLN 2013 Tweet-Norm task were above average.
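The three components named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the substitution costs in `CHEAP_SUBS`, the repeated-character rule in `preprocess`, the `max_distance` threshold, and the toy `bigram_scores` table (standing in for a real language model) are all assumptions made for illustration.

```python
import re

# Hypothetical substitution costs (not from the paper): character pairs often
# confused in informal Spanish text cost less than an arbitrary substitution.
CHEAP_SUBS = {
    ("k", "c"): 0.5, ("k", "q"): 0.5, ("b", "v"): 0.5,
    ("a", "á"): 0.2, ("e", "é"): 0.2, ("i", "í"): 0.2,
    ("o", "ó"): 0.2, ("u", "ú"): 0.2,
}


def sub_cost(a, b):
    """Cost of substituting character a with b (symmetric lookup)."""
    if a == b:
        return 0.0
    return CHEAP_SUBS.get((a, b), CHEAP_SUBS.get((b, a), 1.0))


def preprocess(token):
    """Rule-based step: lowercase, then collapse any character repeated 3+ times."""
    return re.sub(r"(.)\1{2,}", r"\1", token.lower())


def weighted_edit_distance(src, tgt):
    """Levenshtein distance with unit insert/delete costs and weighted substitutions."""
    rows, cols = len(src) + 1, len(tgt) + 1
    dist = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = float(i)
    for j in range(1, cols):
        dist[0][j] = float(j)
    for i in range(1, rows):
        for j in range(1, cols):
            dist[i][j] = min(
                dist[i - 1][j] + 1.0,                                # deletion
                dist[i][j - 1] + 1.0,                                # insertion
                dist[i - 1][j - 1] + sub_cost(src[i - 1], tgt[j - 1]),  # substitution
            )
    return dist[rows - 1][cols - 1]


def normalize(token, prev_word, lexicon, bigram_scores, max_distance=1.5):
    """Return the in-lexicon candidate with the best combined distance/context score."""
    cleaned = preprocess(token)
    if cleaned in lexicon:
        return cleaned
    candidates = [(w, weighted_edit_distance(cleaned, w)) for w in lexicon]
    candidates = [(w, d) for w, d in candidates if d <= max_distance]
    if not candidates:
        return cleaned  # nothing close enough: leave the token unchanged
    # Lower is better: edit distance minus a context bonus from the toy bigram table.
    best = min(candidates,
               key=lambda wd: wd[1] - bigram_scores.get((prev_word, wd[0]), 0.0))
    return best[0]
```

With a toy lexicon `["casa", "cosa", "masa"]`, calling `normalize("kasa", "la", lexicon, {("la", "casa"): 1.0})` picks `casa`, because the k/c substitution is cheap and the bigram table favors that word after `la`. A real system would replace the bigram table with a trained language model and tune the costs on annotated tweets.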

BibTeX

@Article {
author = {Pablo Ruiz and Montse Cuadros and Thierry Etchegoyhen},
title = {Lexical normalization of Spanish tweets with preprocessing rules, domain-specific edit-distances, and language models},
number = {9},
keywds = {microtexto, español, castellano, normalización léxica, Twitter, distancia de edición, modelo de lengua, Spanish microtext, lexical normalization, Twitter, edit distance, language model},
abstract = {We present a system to normalize Spanish tweets. It uses preprocessing rules, a domain-appropriate edit-distance model, and language models to select correction candidates based on context. The system’s results at the SEPLN 2013 Tweet-Norm task were above average.},
isbn = {978-84-695-8349-4},
date = {2013-09-12},
year = {2013},
}
