

Learning Optical Flow

Top: Deep learning methods like SpyNet now dominate optical flow estimation. Bottom: Unfortunately, many of these methods are easily attacked. A small adversarial image patch (in the red square) can disrupt large parts of the flow field.


Publications

Perceiving Systems Article Learning Multi-Human Optical Flow Ranjan, A., Hoffmann, D. T., Tzionas, D., Tang, S., Romero, J., Black, M. J. International Journal of Computer Vision (IJCV), 128(4):873-890, April 2020 (Published)
The optical flow of humans is well known to be useful for the analysis of human action. Recent optical flow methods focus on training deep networks to approach the problem. However, the training data used by these methods does not cover the domain of human motion. Therefore, we develop a dataset of multi-human optical flow and train optical flow networks on this dataset. We use a 3D model of the human body and motion capture data to synthesize realistic flow fields in both single- and multi-person images. We then train optical flow networks to estimate human flow fields from pairs of images. We demonstrate that our trained networks are more accurate than a wide range of top methods on held-out test data and that they generalize well to real image sequences. The code, trained models, and dataset are available for research.
pdf DOI poster URL BibTeX
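The ground-truth flow in such a synthetic dataset follows directly from the body model: each mesh vertex is projected into the image at two consecutive time steps, and the difference of the projections is the flow at that point. A minimal numpy sketch of that idea, assuming a simple pinhole camera; the `project` and `vertex_flow` helpers are illustrative, not the paper's code:

```python
import numpy as np

def project(points, f=500.0, cx=0.0, cy=0.0):
    # pinhole projection of Nx3 camera-space points to Nx2 pixel coordinates
    return np.stack([f * points[:, 0] / points[:, 2] + cx,
                     f * points[:, 1] / points[:, 2] + cy], axis=1)

def vertex_flow(verts_t0, verts_t1):
    # ground-truth 2D flow at each mesh vertex: difference of projections
    # of the same vertex at consecutive time steps
    return project(verts_t1) - project(verts_t0)
```

Rasterizing these per-vertex vectors over the rendered body then yields a dense, exact flow field, which is what makes synthetic human data attractive for training.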

Perceiving Systems Autonomous Vision Conference Paper Attacking Optical Flow Ranjan, A., Janai, J., Geiger, A., Black, M. J. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2404-2413, IEEE, November 2019, ISSN: 2380-7504 (Published)
Deep neural nets achieve state-of-the-art performance on the problem of optical flow estimation. Since optical flow is used in several safety-critical applications like self-driving cars, it is important to gain insights into the robustness of those techniques. Recently, it has been shown that adversarial attacks easily fool deep neural networks to misclassify objects. The robustness of optical flow networks to adversarial attacks, however, has not been studied so far. In this paper, we extend adversarial patch attacks to optical flow networks and show that such attacks can compromise their performance. We show that corrupting a small patch of less than 1% of the image size can significantly affect optical flow estimates. Our attacks lead to noisy flow estimates that extend significantly beyond the region of the attack, in many cases even completely erasing the motion of objects in the scene. While networks using an encoder-decoder architecture are very sensitive to these attacks, we found that networks using a spatial pyramid architecture are less affected. We analyse the success and failure of attacking both architectures by visualizing their feature maps and comparing them to classical optical flow techniques which are robust to these attacks. We also demonstrate that such attacks are practical by placing a printed pattern into real scenes.
Video Project Page Paper Supplementary Material DOI URL BibTeX
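The attack itself is conceptually simple: paste a small patch into both input frames and measure how far the estimated flow moves away from the clean estimate. A hedged numpy sketch of that loop; `flow_net` is a zero-flow stub standing in for a trained network, and the random hill-climbing step replaces the gradient-based patch optimization used in the paper:

```python
import numpy as np

def apply_patch(img, patch, y, x):
    # paste the adversarial patch into a copy of the image at (y, x)
    out = img.copy()
    ph, pw = patch.shape
    out[y:y + ph, x:x + pw] = patch
    return out

def flow_net(img1, img2):
    # placeholder for a trained optical flow network (zero-flow stub)
    return np.zeros(img1.shape + (2,))

def attack_epe(img1, img2, patch, y, x):
    # mean end-point error between clean and attacked flow estimates;
    # a successful patch corrupts flow far beyond its own footprint
    clean = flow_net(img1, img2)
    attacked = flow_net(apply_patch(img1, patch, y, x),
                        apply_patch(img2, patch, y, x))
    return np.sqrt(((attacked - clean) ** 2).sum(-1)).mean()

def optimize_patch(img1, img2, size=4, iters=20, seed=0):
    # random hill-climbing stand-in for the paper's gradient-based search
    rng = np.random.default_rng(seed)
    patch = rng.random((size, size))
    best = attack_epe(img1, img2, patch, 2, 2)
    for _ in range(iters):
        cand = np.clip(patch + 0.1 * rng.standard_normal(patch.shape), 0.0, 1.0)
        err = attack_epe(img1, img2, cand, 2, 2)
        if err > best:
            patch, best = cand, err
    return patch, best
```

With a real network in place of the stub, the same loop measures exactly the effect the paper reports: a patch covering under 1% of the image can drive the end-point error up across the whole field.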

Perceiving Systems Conference Paper Competitive Collaboration: Joint Unsupervised Learning of Depth, Camera Motion, Optical Flow and Motion Segmentation Ranjan, A., Jampani, V., Balles, L., Kim, K., Sun, D., Wulff, J., Black, M. J. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 12240-12249, IEEE, June 2019
We address the unsupervised learning of several interconnected problems in low-level vision: single view depth prediction, camera motion estimation, optical flow, and segmentation of a video into the static scene and moving regions. Our key insight is that these four fundamental vision problems are coupled through geometric constraints. Consequently, learning to solve them together simplifies the problem because the solutions can reinforce each other. We go beyond previous work by exploiting geometry more explicitly and segmenting the scene into static and moving regions. To that end, we introduce Competitive Collaboration, a framework that facilitates the coordinated training of multiple specialized neural networks to solve complex problems. Competitive Collaboration works much like expectation-maximization, but with neural networks that act as both competitors to explain pixels that correspond to static or moving regions, and as collaborators through a moderator that assigns pixels to be either static or independently moving. Our novel method integrates all these problems in a common framework and simultaneously reasons about the segmentation of the scene into moving objects and the static background, the camera motion, depth of the static scene structure, and the optical flow of moving objects. Our model is trained without any supervision and achieves state-of-the-art performance among joint unsupervised methods on all sub-problems.
Paper URL BibTeX
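The moderator in Competitive Collaboration can be thought of as a per-pixel soft assignment between the two competitors: the static-scene model (depth plus camera motion) and the moving-object flow model. A toy numpy sketch of the two training signals, with per-pixel photometric residuals assumed given; this illustrates the mechanism, not the paper's actual losses:

```python
import numpy as np

def combined_loss(err_static, err_moving, mask):
    # mask is the moderator's per-pixel assignment: 1 means the pixel is
    # explained by the static model (depth + camera motion), 0 means it
    # is explained by the moving-object flow model
    return (mask * err_static + (1.0 - mask) * err_moving).mean()

def moderator_target(err_static, err_moving):
    # collaboration phase: the moderator learns to hand each pixel to
    # whichever competitor currently explains it with the lower residual
    return (err_static < err_moving).astype(float)
```

Alternating between minimizing `combined_loss` for the competitors and fitting the moderator to `moderator_target` is what gives the scheme its expectation-maximization flavor.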

Perceiving Systems Conference Paper Temporal Interpolation as an Unsupervised Pretraining Task for Optical Flow Estimation Wulff, J., Black, M. J. In German Conference on Pattern Recognition (GCPR), LNCS 11269:567-582, Springer, Cham, October 2018
The difficulty of annotating training data is a major obstacle to using CNNs for low-level tasks in video. Synthetic data often does not generalize to real videos, while unsupervised methods require heuristic losses. Proxy tasks can overcome these issues: a network is first trained for a task for which annotation is easier or which can be trained unsupervised, and is then fine-tuned for the original task using small amounts of ground truth data. Here, we investigate frame interpolation as a proxy task for optical flow. Using real movies, we train a CNN unsupervised for temporal interpolation. Such a network implicitly estimates motion, but cannot handle untextured regions. By fine-tuning on small amounts of ground truth flow, the network can learn to fill in homogeneous regions and compute full optical flow fields. Using this unsupervised pre-training, our network outperforms similar architectures that were trained supervised using synthetic optical flow.
pdf arXiv DOI BibTeX
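The proxy task needs no annotation because the middle frame of any video triplet is free supervision: predict frame t from frames t-1 and t+1. A minimal numpy illustration, using a trivial frame-averaging baseline in place of the CNN to show why the loss rewards motion-aware models:

```python
import numpy as np

def average_net(f0, f2):
    # trivial stand-in "network": blind average of the outer frames;
    # a real CNN would implicitly align them first, i.e. estimate motion
    return 0.5 * (f0 + f2)

def interp_loss(net, f0, f1, f2):
    # the middle frame f1 is free supervision extracted from the video
    return np.abs(net(f0, f2) - f1).mean()
```

For a static scene the blind average is exact; once the scene moves, the average blurs and the loss becomes nonzero, so minimizing it forces the network to align the frames, i.e. to estimate motion.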

Perceiving Systems Conference Paper Learning Human Optical Flow Ranjan, A., Romero, J., Black, M. J. In 29th British Machine Vision Conference, September 2018
The optical flow of humans is well known to be useful for the analysis of human action. Given this, we devise an optical flow algorithm specifically for human motion and show that it is superior to generic flow methods. Designing a method by hand is impractical, so we develop a new training database of image sequences with ground truth optical flow. For this we use a 3D model of the human body and motion capture data to synthesize realistic flow fields. We then train a convolutional neural network to estimate human flow fields from pairs of images. Since many applications in human motion analysis depend on speed, and we anticipate mobile applications, we base our method on SpyNet with several modifications. We demonstrate that our trained network is more accurate than a wide range of top methods on held-out test data and that it generalizes well to real image sequences. When combined with a person detector/tracker, the approach provides a full solution to the problem of 2D human flow estimation. Both the code and the dataset are available for research.
video code pdf URL BibTeX

Autonomous Vision Perceiving Systems Conference Paper Unsupervised Learning of Multi-Frame Optical Flow with Occlusions Janai, J., Güney, F., Ranjan, A., Black, M. J., Geiger, A. In European Conference on Computer Vision (ECCV), Lecture Notes in Computer Science, vol 11220:713-731, Springer, Cham, September 2018
pdf suppmat Video Project Page DOI BibTeX

Perceiving Systems Conference Paper Optical Flow Estimation using a Spatial Pyramid Network Ranjan, A., Black, M. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2720-2729, IEEE, Piscataway, NJ, USA, July 2017
We learn to compute optical flow by combining a classical spatial-pyramid formulation with deep learning. This estimates large motions in a coarse-to-fine approach by warping one image of a pair at each pyramid level by the current flow estimate and computing an update to the flow. Instead of the standard minimization of an objective function at each pyramid level, we train one deep network per level to compute the flow update. Unlike the recent FlowNet approach, the networks do not need to deal with large motions; these are dealt with by the pyramid. This has several advantages. First, our Spatial Pyramid Network (SPyNet) is much simpler and 96% smaller than FlowNet in terms of model parameters. This makes it more efficient and appropriate for embedded applications. Second, since the flow at each pyramid level is small (< 1 pixel), a convolutional approach applied to pairs of warped images is appropriate. Third, unlike FlowNet, the learned convolution filters appear similar to classical spatio-temporal filters, giving insight into the method and how to improve it. Our results are more accurate than FlowNet on most standard benchmarks, suggesting a new direction of combining classical flow methods with deep learning.
pdf SupMat project/code BibTeX
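The coarse-to-fine scheme described above can be sketched without any learned weights: build image pyramids, then at each level upsample the current flow, warp the second image by it, and add a residual update. In SPyNet that update comes from a small trained network per level; in this hypothetical numpy sketch `flow_update` is a zero stub that marks where the network would sit:

```python
import numpy as np

def downsample(img):
    # 2x2 average pooling (assumes even image dimensions)
    return 0.25 * (img[::2, ::2] + img[1::2, ::2]
                   + img[::2, 1::2] + img[1::2, 1::2])

def upsample_flow(flow):
    # nearest-neighbour upsampling; flow magnitudes double with resolution
    return 2.0 * flow.repeat(2, axis=0).repeat(2, axis=1)

def warp(img, flow):
    # backward-warp img by flow using nearest-neighbour lookup
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    xs2 = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, w - 1)
    ys2 = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, h - 1)
    return img[ys2, xs2]

def flow_update(img1, warped2, flow):
    # placeholder for SPyNet's trained per-level CNN: zero residual here
    return np.zeros_like(flow)

def spynet_like(img1, img2, levels=3):
    # build pyramids from fine to coarse
    pyr1, pyr2 = [img1], [img2]
    for _ in range(levels - 1):
        pyr1.append(downsample(pyr1[-1]))
        pyr2.append(downsample(pyr2[-1]))
    # coarse-to-fine: upsample flow, warp, predict a small residual
    flow = np.zeros(pyr1[-1].shape + (2,))
    for level in reversed(range(levels)):
        if flow.shape[:2] != pyr1[level].shape:
            flow = upsample_flow(flow)
        warped = warp(pyr2[level], flow)
        flow = flow + flow_update(pyr1[level], warped, flow)
    return flow
```

Because each level only has to predict a residual of less than about a pixel, the per-level networks can stay small, which is where the large parameter saving over FlowNet comes from.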