Institute Homepage


Perceiving Systems Conference Paper 2018

Temporal Interpolation as an Unsupervised Pretraining Task for Optical Flow Estimation

Perceiving Systems

Jonas Wulff

Doctoral Researcher

Perceiving Systems

Michael Black

Emeritus / Acting Director

The difficulty of annotating training data is a major obstacle to using CNNs for low-level tasks in video. Synthetic data often does not generalize to real videos, while unsupervised methods require heuristic losses. Proxy tasks can overcome these issues: they start by training a network on a task for which annotation is easier or which can be learned without supervision. The trained network is then fine-tuned for the original task using small amounts of ground-truth data. Here, we investigate frame interpolation as a proxy task for optical flow. Using real movies, we train a CNN for temporal interpolation without supervision. Such a network implicitly estimates motion but cannot handle untextured regions. By fine-tuning on small amounts of ground-truth flow, the network can learn to fill in homogeneous regions and compute full optical flow fields. With this unsupervised pretraining, our network outperforms similar architectures that were trained with supervision on synthetic optical flow.
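The two training stages described above can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The CNN is omitted, and every helper name (`make_interpolation_triplets`, `interpolation_loss`, `endpoint_error`, `shift_rows`) is hypothetical. The choice of an L1 interpolation loss and of average endpoint error (EPE) for the flow stage is an assumption made for illustration. The sketch shows three things:

- unlabeled video supplies interpolation targets for free;
- how the two losses are computed;
- why a uniform region carries no motion signal: every hypothesized shift reproduces it exactly.

```python
import math
from typing import List, Sequence, Tuple

Image = List[List[float]]                # grayscale frame, indexed [row][col]
Flow = List[List[Tuple[float, float]]]   # per-pixel (u, v) displacement


def make_interpolation_triplets(frames: Sequence[Image]) -> List[Tuple[Image, Image, Image]]:
    """Hypothetical stage-1 data: (previous frame, next frame, middle frame as target).

    No annotation is needed; the target is simply the frame in between.
    """
    return [(frames[t - 1], frames[t + 1], frames[t]) for t in range(1, len(frames) - 1)]


def interpolation_loss(pred: Image, target: Image) -> float:
    """Mean absolute (L1) error between a predicted and the true middle frame (assumed loss)."""
    total = sum(abs(p - q) for pr, tr in zip(pred, target) for p, q in zip(pr, tr))
    return total / (len(target) * len(target[0]))


def endpoint_error(pred: Flow, gt: Flow) -> float:
    """Stage-2 loss/metric: mean Euclidean distance between predicted and ground-truth flow vectors."""
    dists = [math.hypot(pu - gu, pv - gv)
             for pr, gr in zip(pred, gt)
             for (pu, pv), (gu, gv) in zip(pr, gr)]
    return sum(dists) / len(dists)


def shift_rows(image: Image, dx: int) -> Image:
    """Move image content right by dx pixels, clamping sample positions at the borders."""
    w = len(image[0])
    return [[row[min(max(c - dx, 0), w - 1)] for c in range(w)] for row in image]


if __name__ == "__main__":
    ramp = [[float(c) for c in range(6)] for _ in range(2)]   # textured content
    flat = [[0.5] * 6 for _ in range(2)]                      # untextured content
    # Synthetic "video": the ramp moves right by 1 pixel per frame.
    video = [shift_rows(ramp, t) for t in range(3)]
    prev, _next, middle = make_interpolation_triplets(video)[0]
    for d in range(3):
        # Hypothesize that the middle frame is the previous frame shifted by d pixels.
        print("textured", d, interpolation_loss(shift_rows(prev, d), middle))
        print("uniform ", d, interpolation_loss(shift_rows(flat, d), flat))
```

For the textured ramp, only the correct 1-pixel shift gives zero interpolation error. For the uniform patch, every candidate shift gives zero error, so the interpolation objective alone cannot determine motion there. According to the abstract, this is the gap that fine-tuning on a small amount of ground-truth flow fills.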

Author(s):	Jonas Wulff and Michael J. Black
Book Title:	German Conference on Pattern Recognition (GCPR)
Volume:	LNCS 11269
Pages:	567-582
Year:	2018
Month:	October
Publisher:	Springer, Cham

Project(s):	Learning Optical Flow
BibTeX Type:	Conference Paper (inproceedings)

DOI:	https://doi.org/10.1007/978-3-030-12939-2_39

Electronic Archiving:	grant_archive

BibTeX

@inproceedings{Wulff:GCPR:2018,
  title = {Temporal Interpolation as an Unsupervised Pretraining Task for Optical Flow Estimation},
  booktitle = {German Conference on Pattern Recognition (GCPR)},
  abstract = {The difficulty of annotating training data is a major obstacle to using CNNs for low-level tasks in video. Synthetic data often does not generalize to real videos, while unsupervised methods require heuristic losses. Proxy tasks can overcome these issues: they start by training a network on a task for which annotation is easier or which can be learned without supervision. The trained network is then fine-tuned for the original task using small amounts of ground-truth data. Here, we investigate frame interpolation as a proxy task for optical flow. Using real movies, we train a CNN for temporal interpolation without supervision. Such a network implicitly estimates motion but cannot handle untextured regions. By fine-tuning on small amounts of ground-truth flow, the network can learn to fill in homogeneous regions and compute full optical flow fields. With this unsupervised pretraining, our network outperforms similar architectures that were trained with supervision on synthetic optical flow.},
  series = {LNCS},
  volume = {11269},
  pages = {567--582},
  publisher = {Springer, Cham},
  month = oct,
  year = {2018},
  author = {Wulff, Jonas and Black, Michael J.},
  doi = {10.1007/978-3-030-12939-2_39},
  month_numeric = {10}
}