

Optical Flow and Human Action

(Top) We learn human flow from synthetically generated flow fields and find that this generalizes to real videos of human movement. (Bottom) We fine-tune an optical flow algorithm to produce flow that improves action recognition. (Left columns) SPyNet. (Right columns) FlowNet. In each set, left to right: first image in the sequence, original flow, flow when trained on action recognition, and the differences in the flow, which are focused on the human action.


Publications

Perceiving Systems Article Learning Multi-Human Optical Flow Ranjan, A., Hoffmann, D. T., Tzionas, D., Tang, S., Romero, J., Black, M. J. International Journal of Computer Vision (IJCV), 128(4):873-890, April 2020 (Published)
The optical flow of humans is well known to be useful for the analysis of human action. Recent optical flow methods focus on training deep networks to approach the problem; however, their training data does not cover the domain of human motion. Therefore, we develop a dataset of multi-human optical flow and train optical flow networks on it. We use a 3D model of the human body and motion capture data to synthesize realistic flow fields in both single- and multi-person images. We then train optical flow networks to estimate human flow fields from pairs of images. We demonstrate that our trained networks are more accurate than a wide range of top methods on held-out test data and that they generalize well to real image sequences. The code, trained models and the dataset are available for research.
pdf DOI poster URL BibTeX
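The core of the dataset construction above is that, with a 3D body model and known camera, ground-truth flow comes for free: the 2D flow at a body point is the difference between its pinhole projections in consecutive frames. Here is a minimal sketch of that idea (not the authors' code; the focal length and principal point are arbitrary illustrative values, and the real pipeline rasterizes flow densely over the mesh rather than at sparse vertices):

```python
import numpy as np

def project(points, f=500.0, cx=160.0, cy=120.0):
    """Pinhole projection of Nx3 camera-space points to Nx2 pixel coordinates."""
    x = f * points[:, 0] / points[:, 2] + cx
    y = f * points[:, 1] / points[:, 2] + cy
    return np.stack([x, y], axis=1)

def vertex_flow(verts_t0, verts_t1):
    """Ground-truth 2D flow at each mesh vertex between two frames:
    the difference of the projected positions."""
    return project(verts_t1) - project(verts_t0)

# Toy example: one vertex at depth 2m moving 0.1m parallel to the image plane.
v0 = np.array([[0.0, 0.0, 2.0]])
v1 = np.array([[0.1, 0.0, 2.0]])
print(vertex_flow(v0, v1))  # [[25. 0.]] -- f * 0.1 / 2 = 25 px horizontally
```

Because the body model supplies exact correspondences between frames, this yields pixel-accurate flow labels without any manual annotation.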

Perceiving Systems Ph.D. Thesis Towards Geometric Understanding of Motion Ranjan, A. University of Tübingen, December 2019
The motion of the world is inherently dependent on the spatial structure of the world and its geometry. Classical optical flow methods therefore try to model this geometry to solve for the motion. Recent deep learning methods take a completely different approach: they predict optical flow by learning from labelled data. Although deep networks have shown state-of-the-art performance on classification problems in computer vision, they have not been as effective in solving optical flow. The key reason is that deep learning methods do not explicitly model the structure of the world in a neural network, and instead expect the network to learn about it from data. We hypothesize that it is difficult for a network to learn about motion without any constraint on the structure of the world. Therefore, we explore several approaches to explicitly model the geometry of the world and its spatial structure in deep neural networks.

The spatial structure in images can be captured by representing them at multiple scales. To represent multiple scales of images in deep neural networks, we introduce the Spatial Pyramid Network (SPyNet). Such a network can leverage global information for estimating large motions and local information for estimating small motions. We show that SPyNet significantly improves over previous optical flow networks while also being the smallest and fastest neural network for motion estimation, achieving a 97% reduction in model parameters over previous methods while being more accurate.

The spatial structure of the world extends to people and their motion. Humans have a well-defined structure, and this information is useful for estimating the optical flow of humans. To leverage it, we create a synthetic dataset for human optical flow using a statistical human body model and motion capture sequences. We use this dataset to train deep networks and see a significant improvement in their ability to estimate human optical flow.

The structure and geometry of the world affect the motion. Therefore, learning about the structure of the scene together with the motion can benefit both problems. To facilitate this, we introduce Competitive Collaboration, in which several neural networks are constrained by geometry and can jointly learn about structure and motion in the scene without any labels. We show that jointly learning single-view depth prediction, camera motion, optical flow and motion segmentation using Competitive Collaboration achieves state-of-the-art results among unsupervised approaches.

Our findings support our hypothesis that explicit constraints on the structure and geometry of the world lead to better methods for motion estimation.
PhD Thesis BibTeX
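The coarse-to-fine pyramid idea behind SPyNet can be sketched in a few lines: each level only predicts a small residual on top of the upsampled estimate from the coarser level, which is what keeps the per-level networks tiny. This is a schematic sketch, not the actual architecture; `residual_fn` is a hypothetical stand-in for the per-level CNN, which in the real method also takes the image pair warped by the current flow:

```python
import numpy as np

def upsample2x(flow):
    """Nearest-neighbour upsampling; flow vectors are doubled along with resolution."""
    return 2.0 * np.repeat(np.repeat(flow, 2, axis=0), 2, axis=1)

def pyramid_flow(residual_fn, num_levels, base_shape):
    """Coarse-to-fine estimation in the spirit of SPyNet: start from zero flow
    at the coarsest level and add a predicted residual at each finer level."""
    h, w = base_shape
    flow = np.zeros((h, w, 2))                  # coarsest-level initialization
    for level in range(num_levels):
        if level > 0:
            flow = upsample2x(flow)             # initialize from the coarser level
        flow = flow + residual_fn(level, flow)  # small per-level correction
    return flow

# Hypothetical residual predictor standing in for the per-level CNN:
res = lambda level, flow: np.full(flow.shape, 0.5)
out = pyramid_flow(res, num_levels=3, base_shape=(2, 2))
print(out.shape)  # (8, 8, 2)
```

Because each level handles only the residual motion left over from the coarser scale, large displacements are resolved cheaply at low resolution while fine levels need only small corrections.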

Perceiving Systems Conference Paper Learning Human Optical Flow Ranjan, A., Romero, J., Black, M. J. In 29th British Machine Vision Conference, September 2018
The optical flow of humans is well known to be useful for the analysis of human action. Given this, we devise an optical flow algorithm specifically for human motion and show that it is superior to generic flow methods. Designing a method by hand is impractical, so we develop a new training database of image sequences with ground truth optical flow. For this we use a 3D model of the human body and motion capture data to synthesize realistic flow fields. We then train a convolutional neural network to estimate human flow fields from pairs of images. Since many applications in human motion analysis depend on speed, and we anticipate mobile applications, we base our method on SPyNet with several modifications. We demonstrate that our trained network is more accurate than a wide range of top methods on held-out test data and that it generalizes well to real image sequences. When combined with a person detector/tracker, the approach provides a full solution to the problem of 2D human flow estimation. Both the code and the dataset are available for research.
video code pdf URL BibTeX
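Accuracy comparisons like the one above are conventionally reported as average endpoint error (EPE): the mean Euclidean distance between predicted and ground-truth flow vectors over all pixels. A minimal sketch of the metric (the function name is ours, not from the paper's code):

```python
import numpy as np

def average_epe(flow_pred, flow_gt):
    """Average endpoint error between two H x W x 2 flow fields:
    mean per-pixel Euclidean distance of the flow vectors."""
    return np.linalg.norm(flow_pred - flow_gt, axis=-1).mean()

pred = np.zeros((4, 4, 2))
gt = np.full((4, 4, 2), 1.0)   # every pixel off by the vector (1, 1)
print(average_epe(pred, gt))   # sqrt(2) ~ 1.4142
```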

Perceiving Systems Conference Paper Learning from Synthetic Humans Varol, G., Romero, J., Martin, X., Mahmood, N., Black, M. J., Laptev, I., Schmid, C. In Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017, 4627-4635, IEEE, Piscataway, NJ, USA, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017
Estimating human pose, shape, and motion from images and videos is a fundamental challenge with many applications. Recent advances in 2D human pose estimation use large amounts of manually-labeled training data for learning convolutional neural networks (CNNs). Such data is time-consuming to acquire and difficult to extend. Moreover, manual labeling of 3D pose, depth and motion is impractical. In this work we present SURREAL (Synthetic hUmans foR REAL tasks): a new large-scale dataset with synthetically-generated but realistic images of people rendered from 3D sequences of human motion capture data. We generate more than 6 million frames together with ground truth pose, depth maps, and segmentation masks. We show that CNNs trained on our synthetic dataset allow for accurate human depth estimation and human part segmentation in real RGB images. Our results and the new dataset open up new possibilities for advancing person analysis using cheap and large-scale synthetic data.
arXiv project data BibTeX