A Portable Method for Parallel and Comparable Document Alignment

Authors: Thierry Etchegoyhen, Andoni Azpeitia Zaldua

Date: 02.06.2016


Abstract

We present a document alignment method based on expanded lexical translation sets and document-level Jaccard similarity. We compare our approach to state-of-the-art methods on a variety of alignment tasks, showing that it outperforms alternative methods in most scenarios for both parallel and comparable corpora. The proposed method is highly portable, requiring only minimal seed information and no task-specific training, thus providing the means for an efficient exploitation of multilingual documents.
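To make the scoring idea in the abstract concrete, the sketch below compares a source document with candidate target documents. It expands the source word set with the listed translations of each word, computes a plain Jaccard coefficient |A ∩ B| / |A ∪ B| against each target word set, and then picks the best-scoring target. This is a minimal illustration under stated assumptions, not the authors' implementation. The toy lexicon `LEXICON`, the regex tokenizer, keeping original source forms in the expanded set, and the greedy per-source argmax pairing are all hypothetical choices made for this example. The paper's actual expansion, weighting and alignment steps may differ.

```python
import re

# Hypothetical toy Spanish->English lexicon (assumption). A real system would
# derive translation sets from bilingual lexical resources.
LEXICON: dict[str, set[str]] = {
    "gato": {"cat"},
    "perro": {"dog", "hound"},
    "come": {"eats", "eat"},
    "pescado": {"fish"},
    "casa": {"house", "home"},
}


def tokens(text: str) -> set[str]:
    """Lowercased set of word tokens (naive regex tokenizer, an assumption)."""
    return set(re.findall(r"\w+", text.lower()))


def expand(words: set[str], lexicon: dict[str, set[str]]) -> set[str]:
    """Return the source words plus every listed translation of each word.

    Original forms are kept so identical strings (names, numbers) can still
    match across languages; this is a hypothetical design choice.
    """
    expanded = set(words)
    for w in words:
        expanded |= lexicon.get(w, set())
    return expanded


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard coefficient |a & b| / |a | b|, defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def align(sources: list[str], targets: list[str],
          lexicon: dict[str, set[str]]) -> list[tuple[int, float]]:
    """For each source document, return (index of best target, its score)."""
    target_sets = [tokens(t) for t in targets]
    result = []
    for src in sources:
        expanded = expand(tokens(src), lexicon)
        scores = [jaccard(expanded, t) for t in target_sets]
        best = max(range(len(scores)), key=scores.__getitem__)
        result.append((best, scores[best]))
    return result


if __name__ == "__main__":
    sources = ["El gato come pescado", "El perro en casa"]
    targets = ["The dog at home", "The cat eats fish"]
    print(align(sources, targets, LEXICON))
    # prints [(1, 0.3333333333333333), (0, 0.2)]
```

In the usage example, the first Spanish document shares three expanded words (cat, eats, fish) with the second English document out of nine distinct words in the union, which gives 3/9. The second shares two (dog, home) out of ten, which gives 0.2. Greedy argmax can assign several sources to the same target. A one-to-one constraint would need an extra step that is not shown here.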

BibTeX

@Article {
title = {A Portable Method for Parallel and Comparable Document Alignment},
pages = {243--255},
number = {2},
volume = {4},
keywords = {Document alignment, Comparable corpora, Parallel corpora},
abstract = {We present a document alignment method based on expanded lexical translation sets and document-level Jaccard similarity. We compare our approach to state-of-the-art methods on a variety of alignment tasks, showing that it outperforms alternative methods in most scenarios for both parallel and comparable corpora. The proposed method is highly portable, requiring only minimal seed information and no task-specific training, thus providing the means for an efficient exploitation of multilingual documents.},
date = {2016-06-02},
year = {2016},
}
Vicomtech

Gipuzkoa Science and Technology Park,
Mikeletegi Pasealekua 57,
20009 Donostia / San Sebastián (Spain)

+(34) 943 309 230

Ensanche Building,
Zabalgune Plaza 11,
48009 Bilbao (Spain)
