Multilingual Opinion Mining

A large amount of text is generated every day across online media. Much of it contains opinions about a multitude of entities, products, services, etc. Given the growing need for automated means to analyse, process and exploit this information, sentiment analysis techniques have received a great deal of attention from industry and the scientific community over the past decade and a half. However, many of these techniques require supervised training on manually annotated examples, or other language resources tied to a specific language or application domain. This limits their applicability, since such resources and training examples are not easy to obtain. This thesis explores a series of methods for automatic text analysis in the context of sentiment analysis, including the automatic extraction of domain terms, of words expressing opinions, and of the sentiment polarity of those words (positive or negative). Finally, a method combining continuous word embeddings and topic modelling, inspired by the Latent Dirichlet Allocation (LDA) technique, is proposed and evaluated to obtain an aspect-based sentiment analysis (ABSA) system which only needs a few seed words to process texts from a given language or domain. In this way, adaptation to another language or domain is reduced to translating the corresponding seed words.

Date

2017-07-11

Place

Donostia-San Sebastián

Author

Aitor García Pablos

University

UPV/EHU