Reinforcement Learning and Control
Model-based Reinforcement Learning and Planning
Object-centric Self-supervised Reinforcement Learning
Self-exploration of Behavior
Causal Reasoning in RL
Equation Learner for Extrapolation and Control
Intrinsically Motivated Hierarchical Learner
Regularity as Intrinsic Reward for Free Play
Curious Exploration via Structured World Models Yields Zero-Shot Object Manipulation
Natural and Robust Walking from Generic Rewards
Goal-conditioned Offline Planning
Offline Diversity Under Imitation Constraints
Learning Diverse Skills for Local Navigation
Learning Agile Skills via Adversarial Imitation of Rough Partial Demonstrations
Combinatorial Optimization as a Layer / Blackbox Differentiation
Symbolic Regression and Equation Learning
Representation Learning
Stepsize adaptation for stochastic optimization
Probabilistic Neural Networks
Learning with 3D rotations: A hitchhiker’s guide to SO(3)
Action Recognition with Tracking

Vision-based human motion analysis aims to understand the movements of the human body using computer vision and machine learning techniques. These movements can be interpreted on a physical level through pose estimation, i.e., reconstruction of the 3D articulated motion, or on a higher, semantic level through action recognition, i.e., understanding the body's movements over time. While the objectives of the two tasks differ, they share a large amount of information. For instance, the poses occurring in a given action tend to be a constrained subset of all physiologically possible configurations; many state-of-the-art pose estimation systems therefore use action-specific priors to simplify the pose estimation problem. Conversely, pose information can be a very strong indicator of the action being performed.
Since human pose estimation and action recognition are so closely intertwined, information from one task can be leveraged to assist the other. In this project we therefore advocate using action recognition to support pose estimation and vice versa, for the following reasons. First, using the output of an action classifier is a simple way to combine many single-activity priors for pose estimation in multi-activity sequences. Second, pose-based action recognition has several advantages. Pose representations suffer far less from the intra-class variance common in appearance-based systems; in particular, 3D skeleton poses are viewpoint and appearance invariant, so actions vary less from actor to actor. Furthermore, pose-based representations greatly simplify learning for action recognition itself, since the relevant high-level information has already been extracted.
To demonstrate the benefit of coupling these two tasks, we have developed a framework that jointly optimizes over several low-dimensional spaces, each representing the poses of a particular activity. Pose variations or transitions between actions that are not captured by these spaces are resolved by continuing the optimization in the high-dimensional space of all human poses; a minimal sketch of this two-stage scheme is given below. Our experiments show that this combination outperforms optimization in either type of space alone.
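As a rough illustration of such a two-stage scheme (not the actual system), the following Python sketch fits a toy pose-matching cost first within per-activity linear subspaces and then refines the best candidate in the full pose space. The PCA-style subspaces, the dimensionalities, and the quadratic image_likelihood_cost are placeholder assumptions standing in for the learned low-dimensional pose models and image-based likelihoods used in the real framework.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 45            # full pose dimensionality (e.g., 15 joints x 3 angles); illustrative
d = 5             # dimensionality of each per-activity subspace
ACTIVITIES = ["walk", "jog", "box"]

# Per-activity low-dimensional pose models, here built from synthetic "training
# poses"; in practice they would be learned from motion-capture data.
models = {}
for a in ACTIVITIES:
    poses = rng.normal(size=(200, D)) + rng.normal(size=D)   # fake activity-specific data
    mean = poses.mean(axis=0)
    _, _, vt = np.linalg.svd(poses - mean, full_matrices=False)  # PCA-style directions
    models[a] = (mean, vt[:d])                                   # (mean pose, d basis vectors)

def image_likelihood_cost(pose, observation):
    """Placeholder for the image-based cost (e.g., silhouette or edge matching)."""
    return np.sum((pose - observation) ** 2)

def optimize_in_subspace(model, observation, steps=200, lr=1e-2):
    """Optimize the pose restricted to one activity's low-dimensional space."""
    mean, basis = model
    z = np.zeros(d)
    for _ in range(steps):
        pose = mean + z @ basis
        grad_pose = 2.0 * (pose - observation)   # gradient of the toy cost w.r.t. the pose
        z -= lr * (basis @ grad_pose)            # chain rule through the linear map
    pose = mean + z @ basis
    return pose, image_likelihood_cost(pose, observation)

def refine_in_full_space(pose, observation, steps=200, lr=1e-2):
    """Continue optimization over all pose dimensions to capture unmodelled variation."""
    pose = pose.copy()
    for _ in range(steps):
        pose -= lr * 2.0 * (pose - observation)
    return pose, image_likelihood_cost(pose, observation)

# Joint optimization: try every activity subspace, keep the best, then refine.
observation = rng.normal(size=D)                 # stand-in for the observed frame
candidates = {a: optimize_in_subspace(m, observation) for a, m in models.items()}
best_activity, (best_pose, best_cost) = min(candidates.items(), key=lambda kv: kv[1][1])
refined_pose, refined_cost = refine_in_full_space(best_pose, observation)
print(best_activity, round(best_cost, 3), round(refined_cost, 3))
```

The point of the final refinement step is the fallback behaviour: if none of the activity-specific spaces explains the observation well, the unrestricted optimization can still recover pose variation or action transitions that the low-dimensional models do not cover.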
For action recognition, 3D pose-based features have been shown to classify actions more reliably than 2D appearance-based features. This remains true even when the pose-based features are extracted from the poses estimated by our own system, indicating that an average pose error of 42-70 mm is sufficient for reliable action recognition; a sketch of such a pose-based classification pipeline follows below.
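To illustrate why pose-based features are convenient for action recognition, the sketch below computes viewpoint-invariant features (pairwise 3D joint distances) from noisy skeleton sequences and classifies them with a nearest-centroid rule. The joint count, the synthetic skeleton templates, and the nearest-centroid classifier are placeholder assumptions, standing in for the actual features and classifiers used in the project.

```python
import numpy as np

rng = np.random.default_rng(1)
J = 15                                   # number of 3D joints; illustrative

def pose_features(joints):
    """Viewpoint-invariant features of a 3D skeleton: pairwise joint distances."""
    diffs = joints[:, None, :] - joints[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    return dists[np.triu_indices(J, k=1)]          # upper triangle only

def sequence_features(sequence):
    """Summarize a pose sequence by the mean and spread of per-frame features."""
    per_frame = np.stack([pose_features(p) for p in sequence])
    return np.concatenate([per_frame.mean(axis=0), per_frame.std(axis=0)])

# Toy data: each "action" is built from its own skeleton template plus noise,
# mimicking noisy 3D pose estimates.
templates = {"walk": rng.normal(size=(J, 3)), "wave": rng.normal(size=(J, 3))}

def make_sequence(label, frames=30):
    return templates[label] + 0.05 * rng.normal(size=(frames, J, 3))

train = [(make_sequence(a), a) for a in ("walk", "wave") for _ in range(10)]

# Nearest-centroid classification over sequence features: a minimal stand-in
# for the action classifiers trained on pose-based representations.
centroids = {a: np.mean([sequence_features(s) for s, l in train if l == a], axis=0)
             for a in ("walk", "wave")}

def classify(sequence):
    f = sequence_features(sequence)
    return min(centroids, key=lambda a: np.linalg.norm(f - centroids[a]))

print(classify(make_sequence("wave")))   # expected: 'wave'
```

Because the features are distances between joints, they are unaffected by camera viewpoint or global body position, which is one reason pose-based representations show less intra-class variance than appearance-based ones.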
To advance vision-based human motion analysis beyond isolated actions and poses, our current research focuses on integrating contextual information from the environment or from objects. Environmental context, e.g., the type of scene or even a specific location within a scene, is a strong indicator of the actions, and therefore the poses, that can be expected. Furthermore, interactions with objects are often the defining characteristic of an action, so a better understanding of human-object interactions would improve the recognition of high-level actions.
Members
Publications