A Portable Method for Parallel and Comparable Document Alignment

Date: 02.06.2016


Abstract

We present a document alignment method based on expanded lexical translation sets and document-level Jaccard similarity. We compare our approach to state-of-the-art methods on a variety of alignment tasks, showing that it outperforms alternative methods in most scenarios for both parallel and comparable corpora. The proposed method is highly portable, requiring only minimal seed information and no task-specific training, thus providing the means for an efficient exploitation of multilingual documents.
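
The abstract names two ingredients: lexical translation sets and document-level Jaccard similarity. The following is a minimal sketch of how the two could fit together, not the authors' implementation. It assumes documents are already tokenized, that a seed lexicon maps each source word to a set of candidate translations, and that each source document is greedily paired with its highest-scoring target document. The names `expand`, `jaccard`, and `align` are hypothetical, and the lexicon expansion step described in the paper is not reproduced here.

```python
from typing import Dict, Iterable, List, Optional, Set


def expand(tokens: Iterable[str], lexicon: Dict[str, Set[str]]) -> Set[str]:
    """Return the union of candidate translations for the given tokens.

    Tokens missing from the (assumed) seed lexicon contribute nothing.
    """
    translated: Set[str] = set()
    for token in tokens:
        translated |= lexicon.get(token, set())
    return translated


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B|, defined as 0.0 for two empty sets."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def align(
    src_docs: Dict[str, List[str]],
    tgt_docs: Dict[str, List[str]],
    lexicon: Dict[str, Set[str]],
) -> Dict[str, Optional[str]]:
    """Pair each source document with its most similar target document.

    Greedy best-match is an assumption of this sketch; ties are broken by
    the iteration order of tgt_docs.
    """
    tgt_sets = {doc_id: set(tokens) for doc_id, tokens in tgt_docs.items()}
    alignment: Dict[str, Optional[str]] = {}
    for src_id, src_tokens in src_docs.items():
        src_set = expand(src_tokens, lexicon)
        best_id: Optional[str] = None
        best_score = -1.0
        for tgt_id, tgt_set in tgt_sets.items():
            score = jaccard(src_set, tgt_set)
            if score > best_score:
                best_id, best_score = tgt_id, score
        alignment[src_id] = best_id
    return alignment


# Toy Spanish-English data, invented purely for illustration.
toy_lexicon = {
    "perro": {"dog"},
    "casa": {"house", "home"},
    "gato": {"cat"},
    "rojo": {"red"},
}
toy_src = {"es_a": ["perro", "casa"], "es_b": ["gato", "rojo"]}
toy_tgt = {"en_x": ["red", "cat"], "en_y": ["dog", "house"]}
toy_alignment = align(toy_src, toy_tgt, toy_lexicon)
```

In this sketch only a word-level lexicon is needed as input and nothing is trained, which is in the spirit of the portability the abstract describes.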

BIB_text

@Article {
title = {A Portable Method for Parallel and Comparable Document Alignment},
pages = {243--255},
number = {2},
volume = {4},
keywds = {Document alignment, Comparable corpora, Parallel corpora},
abstract = {We present a document alignment method based on expanded lexical translation sets and document-level Jaccard similarity. We compare our approach to state-of-the-art methods on a variety of alignment tasks, showing that it outperforms alternative methods in most scenarios for both parallel and comparable corpora. The proposed method is highly portable, requiring only minimal seed information and no task-specific training, thus providing the means for an efficient exploitation of multilingual documents.},
date = {2016-06-02},
year = {2016},
}