Perceiving Systems Conference Paper 2017

Semantic Video CNNs through Representation Warping


In this work, we propose a technique to convert CNN models for semantic segmentation of static images into CNNs for video data. We describe a warping method that can be used to augment existing architectures with very little extra computational cost. This module is called NetWarp, and we demonstrate its use for a range of network architectures. The main design principle is to use the optical flow of adjacent frames to warp internal network representations across time. A key insight of this work is that fast optical flow methods can be combined with many different CNN architectures for improved performance and end-to-end training. Experiments validate that the proposed approach incurs only a little extra computational cost while improving performance when video streams are available. We achieve new state-of-the-art results on the standard CamVid and Cityscapes benchmark datasets and show reliable improvements over different baseline networks. Our code and models are available at http://segmentation.is.tue.mpg.de
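The abstract describes the core mechanism only at a high level: a representation computed on frame t-1 is warped into frame t using optical flow and then used together with the representation of frame t. The Python/NumPy sketch below illustrates that idea under stated assumptions; it is not the paper's implementation. It backward-warps a (C, H, W) feature map with bilinear interpolation given a dense flow field (assumed to be precomputed by some fast optical flow method, not shown), then fuses warped and current features with a per-channel weighted sum. The flow convention, the fusion rule, and the names `warp_features`, `netwarp_combine`, `w_cur`, and `w_prev` are hypothetical, and any learned refinement of the flow is omitted.

```python
import numpy as np


def warp_features(feat_prev, flow):
    """Backward-warp a (C, H, W) feature map from frame t-1 into frame t.

    Assumed flow convention (illustrative): flow[0, y, x] = dx and
    flow[1, y, x] = dy, meaning pixel (x, y) of frame t corresponds to
    location (x + dx, y + dy) of frame t-1.
    """
    _, H, W = feat_prev.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    # Sampling positions in frame t-1, clamped to the image border.
    sx = np.clip(xs + flow[0], 0, W - 1)
    sy = np.clip(ys + flow[1], 0, H - 1)
    x0 = np.floor(sx).astype(int)
    y0 = np.floor(sy).astype(int)
    x1 = np.minimum(x0 + 1, W - 1)
    y1 = np.minimum(y0 + 1, H - 1)
    wx = sx - x0  # fractional offsets used as bilinear weights
    wy = sy - y0
    # Bilinear interpolation of the four neighbours, broadcast over channels.
    top = feat_prev[:, y0, x0] * (1 - wx) + feat_prev[:, y0, x1] * wx
    bottom = feat_prev[:, y1, x0] * (1 - wx) + feat_prev[:, y1, x1] * wx
    return top * (1 - wy) + bottom * wy


def netwarp_combine(feat_cur, feat_prev, flow, w_cur, w_prev):
    """Fuse current features with warped previous features.

    Hypothetical fusion rule: a per-channel weighted sum, with w_cur and
    w_prev of shape (C,) standing in for learned parameters.
    """
    warped = warp_features(feat_prev, flow)
    return w_cur[:, None, None] * feat_cur + w_prev[:, None, None] * warped


if __name__ == "__main__":
    # Toy example: a single active feature moves one pixel to the right.
    prev = np.zeros((1, 3, 4))
    prev[0, 1, 1] = 1.0
    flow = np.zeros((2, 3, 4))
    flow[0] = -1.0  # each pixel of frame t came from one pixel to its left
    print(warp_features(prev, flow)[0, 1])  # [0. 0. 1. 0.]
```

NumPy keeps the sketch dependency-light. In a deep learning framework the same bilinear sampling is differentiable with respect to the features, which is what allows a warping module of this kind to sit between existing layers and be trained end to end.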

Author(s): Gadde, Raghudeep and Jampani, Varun and Gehler, Peter V.
Book Title: Proceedings IEEE International Conference on Computer Vision (ICCV)
Pages: 4463-4472
Year: 2017
Month: October
Day: 22-29
Publisher: IEEE
BibTeX Type: Conference Paper (inproceedings)
Address: Piscataway, NJ, USA
Event Name: IEEE International Conference on Computer Vision (ICCV)
Event Place: Venice, Italy
State: Accepted
Electronic Archiving: grant_archive
ISBN: 978-1-5386-1032-9
ISSN: 2380-7504

BibTeX

@inproceedings{gadde2017semantic,
  title = {Semantic Video {CNNs} through Representation Warping},
  booktitle = {Proceedings IEEE International Conference on Computer Vision (ICCV)},
  abstract = {In this work, we propose a technique to convert CNN
  models for semantic segmentation of static images into
  CNNs for video data. We describe a warping method that
  can be used to augment existing architectures with very
  little extra computational cost. This module is called
  NetWarp, and we demonstrate its use for a range of network
  architectures. The main design principle is to use the optical
  flow of adjacent frames to warp internal network
  representations across time. A key insight of this work is that
  fast optical flow methods can be combined with many different
  CNN architectures for improved performance and end-to-end
  training. Experiments validate that the proposed approach
  incurs only a little extra computational cost while improving
  performance when video streams are available. We
  achieve new state-of-the-art results on the standard CamVid
  and Cityscapes benchmark datasets and show reliable
  improvements over different baseline networks. Our code and
  models are available at http://segmentation.is.tue.mpg.de},
  pages = {4463--4472},
  publisher = {IEEE},
  address = {Piscataway, NJ, USA},
  month = oct,
  year = {2017},
  slug = {gadde2017semantic},
  author = {Gadde, Raghudeep and Jampani, Varun and Gehler, Peter V.},
  month_numeric = {10}
}