Efficient document alignment across scenarios

Date: 01.09.2019

Machine Translation


Abstract

We present and evaluate an approach to document alignment meant for efficiency and portability, as it relies on automatically extracted lexical translations and simple set-theoretic operations for the computation of document-level similarity. We compare our approach to the state of the art on a variety of alignment scenarios, showing that it outperforms alternative document-alignment methods in the vast majority of cases, on both parallel and comparable corpora. We also explore several forms of simple component optimisation to evaluate the potential for improvement of the core method, and describe several successful optimisation paths that lead to significant improvements over strong baselines. The proposed approach constitutes an effective and easy to deploy method to perform accurate document alignment across scenarios, with the potential to improve the creation of parallel corpora.

BIB_text

@Article {
title = {Efficient document alignment across scenarios},
journal = {Machine Translation},
pages = {205--237},
volume = {33},
keywds = {Document alignment, Comparable corpora, Parallel corpora},
abstract = {We present and evaluate an approach to document alignment meant for efficiency and portability, as it relies on automatically extracted lexical translations and simple set-theoretic operations for the computation of document-level similarity. We compare our approach to the state of the art on a variety of alignment scenarios, showing that it outperforms alternative document-alignment methods in the vast majority of cases, on both parallel and comparable corpora. We also explore several forms of simple component optimisation to evaluate the potential for improvement of the core method, and describe several successful optimisation paths that lead to significant improvements over strong baselines. The proposed approach constitutes an effective and easy to deploy method to perform accurate document alignment across scenarios, with the potential to improve the creation of parallel corpora.},
doi = {10.1007/s10590-019-09234-9},
date = {2019-09-01},
}
