Contextualised learning-free three-dimensional body pose estimation from two-dimensional body features in monocular images

Date: 01.06.2016

IET Computer Vision


Abstract

In this study, the authors present a learning-free method for inferring kinematically plausible three-dimensional (3D) human body poses contextualised in a predefined 3D world, given a set of 2D body features extracted from monocular images. This contextualisation provides further semantic information about the observed scene. The method consists of two main steps. First, the camera parameters are obtained by fitting the reference floor of the predefined 3D world to four key-points in the image. Then, the person's body part lengths and pose are estimated by fitting a parametrised multi-body 3D kinematic model to 2D image body features, which can be located by state-of-the-art body part detectors. The fitting is carried out by a hierarchical optimisation procedure, in which the model's scale variations are considered first and the body part lengths are then refined. At each iteration, tentative poses are inferred by combining efficient perspective-n-point camera pose estimation with constrained viewpoint-dependent inverse kinematics. Experimental results show that the method achieves good accuracy relative to state-of-the-art alternatives, without the need to learn 2D/3D mapping models from training data. The method is also efficient, which allows its integration into video soft sensing systems.
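
The first step of the abstract, relating the reference floor to four image key-points, can be illustrated with a small sketch. This is not the authors' implementation: it assumes a simpler, swapped-in technique, a planar homography solved by the direct linear transform (DLT) with the last entry fixed to 1, and it stops short of recovering full camera parameters, which would also need the camera intrinsics. The function names and all point coordinates are hypothetical.

```python
# Illustrative sketch only: a plain DLT floor-to-image homography,
# not the method described in the paper.

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build an augmented matrix so the inputs are left untouched.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def floor_homography(floor_pts, image_pts):
    """Homography mapping floor (X, Y) to image (u, v), with h33 fixed to 1."""
    a, b = [], []
    for (X, Y), (u, v) in zip(floor_pts, image_pts):
        # Each correspondence contributes two linear equations in h11..h32.
        a.append([X, Y, 1, 0, 0, 0, -u * X, -u * Y])
        b.append(u)
        a.append([0, 0, 0, X, Y, 1, -v * X, -v * Y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def project(H, X, Y):
    """Project a floor point into the image using homography H."""
    w = H[2][0] * X + H[2][1] * Y + H[2][2]
    u = (H[0][0] * X + H[0][1] * Y + H[0][2]) / w
    v = (H[1][0] * X + H[1][1] * Y + H[1][2]) / w
    return u, v


if __name__ == "__main__":
    # Hypothetical data: a 1 m x 1 m floor square and its four image key-points.
    floor = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
    image = [(100.0, 400.0), (500.0, 380.0), (420.0, 200.0), (180.0, 210.0)]
    H = floor_homography(floor, image)
    # Where the centre of the floor square would appear in the image.
    print(project(H, 0.5, 0.5))
```

Four correspondences give exactly eight equations for the eight unknown entries, which is why four floor key-points are the minimum for this kind of planar fit. The sketch fails with a `ValueError` when the points are degenerate, for example when three of them are collinear.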

BIB_text

@Article {
title = {Contextualised learning-free three-dimensional body pose estimation from two-dimensional body features in monocular images},
journal = {IET Computer Vision},
pages = {299--306},
number = {4},
volume = {10},
keywds = {inference mechanisms; pose estimation; optimisation; video signal processing; feature extraction; cameras},
abstract = {In this study, the authors present a learning-free method for inferring kinematically plausible three-dimensional (3D) human body poses contextualised in a predefined 3D world, given a set of 2D body features extracted from monocular images. This contextualisation provides further semantic information about the observed scene. The method consists of two main steps. First, the camera parameters are obtained by fitting the reference floor of the predefined 3D world to four key-points in the image. Then, the person's body part lengths and pose are estimated by fitting a parametrised multi-body 3D kinematic model to 2D image body features, which can be located by state-of-the-art body part detectors. The fitting is carried out by a hierarchical optimisation procedure, in which the model's scale variations are considered first and the body part lengths are then refined. At each iteration, tentative poses are inferred by combining efficient perspective-n-point camera pose estimation with constrained viewpoint-dependent inverse kinematics. Experimental results show that the method achieves good accuracy relative to state-of-the-art alternatives, without the need to learn 2D/3D mapping models from training data. The method is also efficient, which allows its integration into video soft sensing systems.},
isi = {1},
doi = {10.1049/iet-cvi.2015.0283},
date = {2016-06-01},
year = {2016},
}
Vicomtech
