Efficient document alignment across scenarios

Date: 01.09.2019

Machine Translation


Abstract

We present and evaluate an approach to document alignment meant for efficiency and portability, as it relies on automatically extracted lexical translations and simple set-theoretic operations for the computation of document-level similarity. We compare our approach to the state of the art on a variety of alignment scenarios, showing that it outperforms alternative document-alignment methods in the vast majority of cases, on both parallel and comparable corpora. We also explore several forms of simple component optimisation to evaluate the potential for improvement of the core method, and describe several successful optimisation paths that lead to significant improvements over strong baselines. The proposed approach constitutes an effective and easy to deploy method to perform accurate document alignment across scenarios, with the potential to improve the creation of parallel corpora.

BibTeX

@Article {
title = {Efficient document alignment across scenarios},
journal = {Machine Translation},
pages = {205--237},
volume = {33},
keywds = {
Document alignment, Comparable corpora, Parallel corpora
},
abstract = {

We present and evaluate an approach to document alignment meant for efficiency and portability, as it relies on automatically extracted lexical translations and simple set-theoretic operations for the computation of document-level similarity. We compare our approach to the state of the art on a variety of alignment scenarios, showing that it outperforms alternative document-alignment methods in the vast majority of cases, on both parallel and comparable corpora. We also explore several forms of simple component optimisation to evaluate the potential for improvement of the core method, and describe several successful optimisation paths that lead to significant improvements over strong baselines. The proposed approach constitutes an effective and easy to deploy method to perform accurate document alignment across scenarios, with the potential to improve the creation of parallel corpora.


},
doi = {10.1007/s10590-019-09234-9},
date = {2019-09-01},
}
