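Multimedia edukien ulerpen semantikorako Ekarpen metodologikoak: irudien behemailako analisitik bideoen ekintzen sailkapenera (Methodological contributions to the semantic understanding of multimedia content: from low-level image analysis to action classification in videos)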

Author: Naiara Aginako

Directors: Julián Flórez Esnal (Vicomtech), Basilio Sierra (University)

University: UPV-EHU

Date: 28.04.2017

In recent years, the amount of digital content produced worldwide has grown exponentially. Digital images and videos in particular account for most of this growth. They have become the core unit of all digital communications, and experts in the field predict that this trend will continue over the next few years. There is therefore a real need for effective and stable methods for the storage, management and analysis of this huge volume of content. This dissertation focuses on the research and development of analysis methods for images and videos. The main objective is to lay the foundations for the semantic understanding of images and videos. Understanding the analysed content makes it possible to label it automatically, which brings benefits from different perspectives. On the one hand, content producers can automatically label all their information as they generate it, which allows more effective storage strategies and content retrieval systems to be adopted. On the other hand, content consumers gain the possibility of searching for content using semantic concepts, that is, natural language terms. As a consequence, Human-Computer Interaction (HCI) becomes more natural.
The Internet is nowadays the biggest warehouse of digital content. Most of the content produced every day in the world is housed there, and its main characteristic is its heterogeneity. Content can come from very different fields; in other words, it does not belong to a single domain. Hence, domain-specific strategies cannot be applied, which hinders the analysis of the content. To tackle this, a lot of research has been done on effective methods for domain recognition. However, these methods are commonly based on the analysis of the content itself, which leads to a vicious circle.
Computer Vision drives all the research results presented in this dissertation. The main contributions rely on the fact that the designed and developed methods are domain specific, although some domain-agnostic methods have also been studied. The main contributions can be divided into three principal research lines. The first research line covers the analysis and development of methods based on low-level descriptors for the inference of simple semantic concepts from images. Methods for the automatic labelling of images are also targeted. The second line focuses on the addition of machine learning strategies for image classification, recognition and understanding.
The inclusion of classification algorithms broadens the spectrum of problems that can be addressed. The semantic concepts that can be extracted from the images are more complex, or high-level, and the developed solutions can deal with a greater variety of images.
The third and last research line focuses on video analytics. This line benefits from all the work accomplished in the previous lines. Although the previous developments cannot be applied directly, because time is a highly relevant variable in videos, the lessons learned are very relevant when designing and developing the adapted solutions. Nowadays, video analytics is one of the most challenging tasks within Computer Vision. This dissertation presents a new methodology based on low-level descriptors and classifiers for action recognition in videos. Automatic image and video understanding is still an open issue within the research community. There is still a lot of work to do, but some steps have been taken towards overcoming the current computer vision challenges. Future work will continue along the path already started.