

Learning Optical Flow

Top: Deep learning methods like SpyNet now dominate optical flow estimation. Bottom: Unfortunately, many of these methods are easily attacked. A small adversarial image patch (in the red square) can disrupt large parts of the flow field.


Publications

Perceiving Systems Article Learning Multi-Human Optical Flow Ranjan, A., Hoffmann, D. T., Tzionas, D., Tang, S., Romero, J., Black, M. J. International Journal of Computer Vision (IJCV), 128(4):873-890, April 2020 (Published)
The optical flow of humans is well known to be useful for the analysis of human action. Recent optical flow methods focus on training deep networks to approach the problem. However, the training data used by these methods does not cover the domain of human motion. Therefore, we develop a dataset of multi-human optical flow and train optical flow networks on this dataset. We use a 3D model of the human body and motion capture data to synthesize realistic flow fields in both single- and multi-person images. We then train optical flow networks to estimate human flow fields from pairs of images. We demonstrate that our trained networks are more accurate than a wide range of top methods on held-out test data and that they generalize well to real image sequences. The code, trained models, and dataset are available for research.
pdf DOI poster URL BibTeX
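The ground-truth flow in such a synthetic dataset follows directly from the body model: each mesh vertex is projected into the image at two consecutive time steps, and the difference of the projections is the flow at that point. A minimal numpy sketch of that idea, assuming a simple pinhole camera; the `project` and `vertex_flow` helpers are illustrative, not the paper's code:

```python
import numpy as np

def project(points, f=500.0, cx=0.0, cy=0.0):
    # pinhole projection of Nx3 camera-space points to Nx2 pixel coordinates
    return np.stack([f * points[:, 0] / points[:, 2] + cx,
                     f * points[:, 1] / points[:, 2] + cy], axis=1)

def vertex_flow(verts_t0, verts_t1):
    # ground-truth 2D flow at each mesh vertex: difference of projections
    # of the same vertex at consecutive time steps
    return project(verts_t1) - project(verts_t0)
```

Rasterizing these per-vertex vectors over the rendered body then yields a dense, exact flow field, which is what makes synthetic human data attractive for training.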

Perceiving Systems Autonomous Vision Conference Paper Attacking Optical Flow Ranjan, A., Janai, J., Geiger, A., Black, M. J. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2404-2413, IEEE, November 2019, ISSN: 2380-7504 (Published)
Deep neural nets achieve state-of-the-art performance on the problem of optical flow estimation. Since optical flow is used in several safety-critical applications like self-driving cars, it is important to gain insights into the robustness of those techniques. Recently, it has been shown that adversarial attacks easily fool deep neural networks to misclassify objects. The robustness of optical flow networks to adversarial attacks, however, has not been studied so far. In this paper, we extend adversarial patch attacks to optical flow networks and show that such attacks can compromise their performance. We show that corrupting a small patch of less than 1% of the image size can significantly affect optical flow estimates. Our attacks lead to noisy flow estimates that extend significantly beyond the region of the attack, in many cases even completely erasing the motion of objects in the scene. While networks using an encoder-decoder architecture are very sensitive to these attacks, we found that networks using a spatial pyramid architecture are less affected. We analyse the success and failure of attacking both architectures by visualizing their feature maps and comparing them to classical optical flow techniques which are robust to these attacks. We also demonstrate that such attacks are practical by placing a printed pattern into real scenes.
Video Project Page Paper Supplementary Material DOI URL BibTeX
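The attack itself is conceptually simple: paste a small patch into both input frames and measure how far the estimated flow moves away from the clean estimate. A hedged numpy sketch of that loop; `flow_net` is a zero-flow stub standing in for a trained network, and the random hill-climbing step replaces the gradient-based patch optimization used in the paper:

```python
import numpy as np

def apply_patch(img, patch, y, x):
    # paste the adversarial patch into a copy of the image at (y, x)
    out = img.copy()
    ph, pw = patch.shape
    out[y:y + ph, x:x + pw] = patch
    return out

def flow_net(img1, img2):
    # placeholder for a trained optical flow network (zero-flow stub)
    return np.zeros(img1.shape + (2,))

def attack_epe(img1, img2, patch, y, x):
    # mean end-point error between clean and attacked flow estimates;
    # a successful patch corrupts flow far beyond its own footprint
    clean = flow_net(img1, img2)
    attacked = flow_net(apply_patch(img1, patch, y, x),
                        apply_patch(img2, patch, y, x))
    return np.sqrt(((attacked - clean) ** 2).sum(-1)).mean()

def optimize_patch(img1, img2, size=4, iters=20, seed=0):
    # random hill-climbing stand-in for the paper's gradient-based search
    rng = np.random.default_rng(seed)
    patch = rng.random((size, size))
    best = attack_epe(img1, img2, patch, 2, 2)
    for _ in range(iters):
        cand = np.clip(patch + 0.1 * rng.standard_normal(patch.shape), 0.0, 1.0)
        err = attack_epe(img1, img2, cand, 2, 2)
        if err > best:
            patch, best = cand, err
    return patch, best
```

With a real network in place of the stub, the same loop measures exactly the effect the paper reports: a patch covering under 1% of the image can drive the end-point error up across the whole field.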

Perceiving Systems Conference Paper Competitive Collaboration: Joint Unsupervised Learning of Depth, Camera Motion, Optical Flow and Motion Segmentation Ranjan, A., Jampani, V., Balles, L., Kim, K., Sun, D., Wulff, J., Black, M. J. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 12240-12249, IEEE, June 2019
We address the unsupervised learning of several interconnected problems in low-level vision: single view depth prediction, camera motion estimation, optical flow, and segmentation of a video into the static scene and moving regions. Our key insight is that these four fundamental vision problems are coupled through geometric constraints. Consequently, learning to solve them together simplifies the problem because the solutions can reinforce each other. We go beyond previous work by exploiting geometry more explicitly and segmenting the scene into static and moving regions. To that end, we introduce Competitive Collaboration, a framework that facilitates the coordinated training of multiple specialized neural networks to solve complex problems. Competitive Collaboration works much like expectation-maximization, but with neural networks that act as both competitors to explain pixels that correspond to static or moving regions, and as collaborators through a moderator that assigns pixels to be either static or independently moving. Our novel method integrates all these problems in a common framework and simultaneously reasons about the segmentation of the scene into moving objects and the static background, the camera motion, depth of the static scene structure, and the optical flow of moving objects. Our model is trained without any supervision and achieves state-of-the-art performance among joint unsupervised methods on all sub-problems.
Paper URL BibTeX
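The moderator in Competitive Collaboration can be thought of as a per-pixel soft assignment between the two competitors: the static-scene model (depth plus camera motion) and the moving-object flow model. A toy numpy sketch of the two training signals, with per-pixel photometric residuals assumed given; this illustrates the mechanism, not the paper's actual losses:

```python
import numpy as np

def combined_loss(err_static, err_moving, mask):
    # mask is the moderator's per-pixel assignment: 1 means the pixel is
    # explained by the static model (depth + camera motion), 0 means it
    # is explained by the moving-object flow model
    return (mask * err_static + (1.0 - mask) * err_moving).mean()

def moderator_target(err_static, err_moving):
    # collaboration phase: the moderator learns to hand each pixel to
    # whichever competitor currently explains it with the lower residual
    return (err_static < err_moving).astype(float)
```

Alternating between minimizing `combined_loss` for the competitors and fitting the moderator to `moderator_target` is what gives the scheme its expectation-maximization flavor.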

Perceiving Systems Conference Paper Temporal Interpolation as an Unsupervised Pretraining Task for Optical Flow Estimation Wulff, J., Black, M. J. In German Conference on Pattern Recognition (GCPR), LNCS 11269:567-582, Springer, Cham, October 2018
The difficulty of annotating training data is a major obstacle to using CNNs for low-level tasks in video. Synthetic data often does not generalize to real videos, while unsupervised methods require heuristic losses. Proxy tasks can overcome these issues: a network is first trained for a task for which annotation is easier or which can be trained unsupervised, and is then fine-tuned for the original task using small amounts of ground truth data. Here, we investigate frame interpolation as a proxy task for optical flow. Using real movies, we train a CNN unsupervised for temporal interpolation. Such a network implicitly estimates motion, but cannot handle untextured regions. By fine-tuning on small amounts of ground truth flow, the network can learn to fill in homogeneous regions and compute full optical flow fields. Using this unsupervised pre-training, our network outperforms similar architectures that were trained supervised using synthetic optical flow.
pdf arXiv DOI BibTeX
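The proxy task needs no annotation because the middle frame of any video triplet is free supervision: predict frame t from frames t-1 and t+1. A minimal numpy illustration, using a trivial frame-averaging baseline in place of the CNN to show why the loss rewards motion-aware models:

```python
import numpy as np

def average_net(f0, f2):
    # trivial stand-in "network": blind average of the outer frames;
    # a real CNN would implicitly align them first, i.e. estimate motion
    return 0.5 * (f0 + f2)

def interp_loss(net, f0, f1, f2):
    # the middle frame f1 is free supervision extracted from the video
    return np.abs(net(f0, f2) - f1).mean()
```

For a static scene the blind average is exact; once the scene moves, the average blurs and the loss becomes nonzero, so minimizing it forces the network to align the frames, i.e. to estimate motion.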

Perceiving Systems Conference Paper Learning Human Optical Flow Ranjan, A., Romero, J., Black, M. J. In 29th British Machine Vision Conference, September 2018
The optical flow of humans is well known to be useful for the analysis of human action. Given this, we devise an optical flow algorithm specifically for human motion and show that it is superior to generic flow methods. Designing a method by hand is impractical, so we develop a new training database of image sequences with ground truth optical flow. For this we use a 3D model of the human body and motion capture data to synthesize realistic flow fields. We then train a convolutional neural network to estimate human flow fields from pairs of images. Since many applications in human motion analysis depend on speed, and we anticipate mobile applications, we base our method on SpyNet with several modifications. We demonstrate that our trained network is more accurate than a wide range of top methods on held-out test data and that it generalizes well to real image sequences. When combined with a person detector/tracker, the approach provides a full solution to the problem of 2D human flow estimation. Both the code and the dataset are available for research.
video code pdf URL BibTeX

Autonomous Vision Perceiving Systems Conference Paper Unsupervised Learning of Multi-Frame Optical Flow with Occlusions Janai, J., Güney, F., Ranjan, A., Black, M. J., Geiger, A. In European Conference on Computer Vision (ECCV), Lecture Notes in Computer Science, vol 11220:713-731, Springer, Cham, September 2018
pdf suppmat Video Project Page DOI BibTeX

Perceiving Systems Conference Paper Optical Flow Estimation using a Spatial Pyramid Network Ranjan, A., Black, M. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2720-2729, IEEE, Piscataway, NJ, USA, July 2017
We learn to compute optical flow by combining a classical spatial-pyramid formulation with deep learning. This estimates large motions in a coarse-to-fine approach by warping one image of a pair at each pyramid level by the current flow estimate and computing an update to the flow. Instead of the standard minimization of an objective function at each pyramid level, we train one deep network per level to compute the flow update. Unlike the recent FlowNet approach, the networks do not need to deal with large motions; these are dealt with by the pyramid. This has several advantages. First, our Spatial Pyramid Network (SPyNet) is much simpler and 96% smaller than FlowNet in terms of model parameters. This makes it more efficient and appropriate for embedded applications. Second, since the flow at each pyramid level is small (< 1 pixel), a convolutional approach applied to pairs of warped images is appropriate. Third, unlike FlowNet, the learned convolution filters appear similar to classical spatio-temporal filters, giving insight into the method and how to improve it. Our results are more accurate than FlowNet on most standard benchmarks, suggesting a new direction of combining classical flow methods with deep learning.
pdf SupMat project/code BibTeX
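The coarse-to-fine scheme described above can be sketched without any learned weights: build image pyramids, then at each level upsample the current flow, warp the second image by it, and add a residual update. In SPyNet that update comes from a small trained network per level; in this hypothetical numpy sketch `flow_update` is a zero stub that marks where the network would sit:

```python
import numpy as np

def downsample(img):
    # 2x2 average pooling (assumes even image dimensions)
    return 0.25 * (img[::2, ::2] + img[1::2, ::2]
                   + img[::2, 1::2] + img[1::2, 1::2])

def upsample_flow(flow):
    # nearest-neighbour upsampling; flow magnitudes double with resolution
    return 2.0 * flow.repeat(2, axis=0).repeat(2, axis=1)

def warp(img, flow):
    # backward-warp img by flow using nearest-neighbour lookup
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    xs2 = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, w - 1)
    ys2 = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, h - 1)
    return img[ys2, xs2]

def flow_update(img1, warped2, flow):
    # placeholder for SPyNet's trained per-level CNN: zero residual here
    return np.zeros_like(flow)

def spynet_like(img1, img2, levels=3):
    # build pyramids from fine to coarse
    pyr1, pyr2 = [img1], [img2]
    for _ in range(levels - 1):
        pyr1.append(downsample(pyr1[-1]))
        pyr2.append(downsample(pyr2[-1]))
    # coarse-to-fine: upsample flow, warp, predict a small residual
    flow = np.zeros(pyr1[-1].shape + (2,))
    for level in reversed(range(levels)):
        if flow.shape[:2] != pyr1[level].shape:
            flow = upsample_flow(flow)
        warped = warp(pyr2[level], flow)
        flow = flow + flow_update(pyr1[level], warped, flow)
    return flow
```

Because each level only has to predict a residual of less than about a pixel, the per-level networks can stay small, which is where the large parameter saving over FlowNet comes from.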