Exploring the Limits of Diffusion Models to Generate Person Detection Training Datasets

Authors: Hugo Rodríguez Arce, Francisco Javier Iriarte Satrustegui, Miguel Ortiz Huamani, Luis Unzueta Irurtia, Seán Gaines Cooke

Date: 17.09.2024


Abstract

Current diffusion models could assist in creating training datasets for Deep Neural Network (DNN)-based person detectors by producing high-quality, realistic, and custom images of non-existent people and objects, avoiding privacy issues. However, these models have difficulties in generating images of people in a fully controlled way. Problems such as abnormal proportions, distortions in body or face, extra limbs, or elements that do not match the input text prompt may occur. Moreover, biases related to factors like gender, clothing type and colors, ethnicity or age can also limit the control over the generated images. Both generative AI models and DNN-based person detectors need large sets of annotated images that reflect the diverse visual appearances expected in the application context. In this paper we explore the capabilities of state-of-the-art text-to-image diffusion models for person image generation and propose a methodology to exploit their usage for training DNN-based person detectors. For the generation of virtual persons, this includes variations in the environment, such as illumination or background, and people characteristics, such as body pose, skin tones, gender, age, clothing types and colors, as well as multiple types of partial occlusions with other objects (or people). Our method leverages explainability techniques to gain more understanding of the behaviour of the diffusion models and the relation between inputs and outputs to improve the diversity of the person detection training dataset. Experimental results on the WiderPerson benchmark with a YOLOX detection model trained using the proposed methodology show the potential of this approach.

BIB_text

@Article {
title = {Exploring the Limits of Diffusion Models to Generate Person Detection Training Datasets},
pages = {132060T},
keywds = {
Person Detection; Stable Diffusion; Synthetic Dataset
},
abstract = {
Current diffusion models could assist in creating training datasets for Deep Neural Network (DNN)-based person detectors by producing high-quality, realistic, and custom images of non-existent people and objects, avoiding privacy issues. However, these models have difficulties in generating images of people in a fully controlled way. Problems such as abnormal proportions, distortions in body or face, extra limbs, or elements that do not match the input text prompt may occur. Moreover, biases related to factors like gender, clothing type and colors, ethnicity or age can also limit the control over the generated images. Both generative AI models and DNN-based person detectors need large sets of annotated images that reflect the diverse visual appearances expected in the application context. In this paper we explore the capabilities of state-of-the-art text-to-image diffusion models for person image generation and propose a methodology to exploit their usage for training DNN-based person detectors. For the generation of virtual persons, this includes variations in the environment, such as illumination or background, and people characteristics, such as body pose, skin tones, gender, age, clothing types and colors, as well as multiple types of partial occlusions with other objects (or people). Our method leverages explainability techniques to gain more understanding of the behaviour of the diffusion models and the relation between inputs and outputs to improve the diversity of the person detection training dataset. Experimental results on the WiderPerson benchmark with a YOLOX detection model trained using the proposed methodology show the potential of this approach.
},
date = {2024-09-17},
}