Markerless Full-Body Human Motion Capture and Combined Motor Action Recognition for Human-Computer Interaction
Classically, people interact with computers using devices such as the keyboard and the mouse. Computers detect the events these devices generate, such as isolated or combined key and button presses and releases, or mouse motions, and then react according to the interpretation assigned to them. This communication procedure has been used satisfactorily for a wide range of applications. However, it lacks the naturalness of face-to-face human communication. This thesis project presents a method for the markerless real-time capture and automatic interpretation of full-body human movements for human-computer interaction (HCI). Three stages can be distinguished in order to reach this objective: (1) the markerless tracking of as many of the user’s body parts as possible, (2) the reconstruction of the kinematic 3D skeleton that represents the user’s pose from the tracked body parts, and (3) the recognition of movement patterns so that the computer can “understand” the user’s intent and react accordingly. All three processes must run in real time to attain satisfactory HCI. The first stage can be solved with cameras focused on the user and computer vision algorithms that extract and track the user’s relevant features from the images. This project proposes a method that combines color probabilities and optical flow. The second stage requires placing the kinematic 3D skeleton in biomechanically plausible poses fitted to the detected body parts, taking previous poses into account to obtain smooth motion. This project proposes an analytic-iterative inverse kinematics method that positions the body parts sequentially, from the torso to the upper and lower limbs, respecting biomechanical limits and the most relevant collisions.
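The flavor of analytic inverse kinematics under biomechanical limits can be illustrated with a minimal sketch: a hypothetical two-link planar limb (e.g. upper arm and forearm) solved in closed form, with the elbow angle clamped to a plausible range. The function name, link lengths, and joint limits below are illustrative assumptions, not the implementation developed in this thesis.

```python
import math

def two_link_ik(x, y, l1, l2, elbow_min=0.0, elbow_max=math.radians(150)):
    """Analytic IK for a planar two-link limb reaching target (x, y).

    Returns (shoulder_angle, elbow_angle) in radians; the elbow angle is
    clamped to an assumed biomechanical range [elbow_min, elbow_max].
    """
    d2 = x * x + y * y
    # Law of cosines gives the elbow angle from the target distance.
    c2 = (d2 - l1 * l1 - l2 * l2) / (2.0 * l1 * l2)
    c2 = max(-1.0, min(1.0, c2))  # clamp: target may be out of reach
    theta2 = math.acos(c2)
    theta2 = max(elbow_min, min(elbow_max, theta2))  # biomechanical limit
    # Shoulder angle: aim at the target, corrected for the bent elbow.
    k1 = l1 + l2 * math.cos(theta2)
    k2 = l2 * math.sin(theta2)
    theta1 = math.atan2(y, x) - math.atan2(k2, k1)
    return theta1, theta2
```

A full-body solver applies this kind of closed-form step sequentially along the kinematic chain (torso first, then each limb), which keeps the per-frame cost low enough for real-time use.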
Finally, the last stage requires analyzing which motion features are significant, in order to interpret patterns with artificial intelligence techniques. This project proposes a method that automatically extracts potential gestures from the data flow and then labels them, allowing combined actions to be performed.
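The idea of automatically cutting potential gestures out of a continuous data flow can be sketched as a simple segmentation rule: a candidate gesture is a sufficiently long run of frames whose motion energy (here, a per-frame speed value) exceeds a threshold. The function, threshold, and minimum length below are illustrative assumptions, not the thesis method; the extracted segments would then be passed to a classifier for labeling.

```python
def extract_segments(speeds, threshold=0.2, min_len=3):
    """Return (start, end) frame ranges of candidate gestures.

    A candidate is a run of frames with motion energy above `threshold`
    lasting at least `min_len` frames; shorter bursts are treated as noise.
    """
    segments, start = [], None
    for i, s in enumerate(speeds):
        if s > threshold and start is None:
            start = i                      # a high-motion run begins
        elif s <= threshold and start is not None:
            if i - start >= min_len:       # keep only long enough runs
                segments.append((start, i))
            start = None
    if start is not None and len(speeds) - start >= min_len:
        segments.append((start, len(speeds)))  # run reaches end of stream
    return segments
```

For example, a stream with one long burst and one two-frame blip yields a single candidate segment; labeling each segment independently is what allows combined (concatenated or overlapping) actions to be recognized.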