Human Pose, Shape and Action
3D Pose from Images
2D Pose from Images
Beyond Motion Capture
Action and Behavior
Body Perception
Body Applications
Pose and Motion Priors
Clothing Models (2011-2015)
Reflectance Filtering
Learning on Manifolds
Markerless Animal Motion Capture
Multi-Camera Capture
2D Pose from Optical Flow
Neural Prosthetics and Decoding
Part-based Body Models
Intrinsic Depth
Lie Bodies
Layers, Time and Segmentation
Understanding Action Recognition (JHMDB)
Intrinsic Video
Intrinsic Images
Action Recognition with Tracking
Neural Control of Grasping
Flowing Puppets
Faces
Deformable Structures
Model-based Anthropometry
Modeling 3D Human Breathing
Optical flow in the LGN
FlowCap
Smooth Loops from Unconstrained Video
PCA Flow
Efficient and Scalable Inference
Motion Blur in Layers
Facade Segmentation
Smooth Metric Learning
Robust PCA
3D Recognition
Object Detection
Computer Vision Performance Evaluation
While ground truth datasets spur innovation, many current datasets for evaluating stereo, optical flow, scene flow and other tasks are restricted in terms of size, complexity, and diversity, making it difficult to train and test on realistic data. For example, we co-authored the Middlebury flow dataset [], which arguably set a standard for the field but was limited in terms of complexity.
In [], we took advantage of an autonomous driving platform to develop challenging real-world benchmarks for stereo, optical flow, scene flow, visual odometry/SLAM, 3D object detection, 3D tracking, and road/lane detection. Accurate ground truth is provided by a Velodyne laser scanner and a GPS localization system. Our datasets were captured by driving in and around the mid-size city of Karlsruhe, in rural areas, and on highways, with up to 15 cars and 30 pedestrians visible per image. For each of our benchmarks, we also provide a set of evaluation metrics and a server for evaluating results on the test set. Our experiments showed that moving outside the laboratory to the real world was critical. We continue to develop new ground truth to push the field further.
In [], we proposed a novel optical flow, stereo, and scene flow data set derived from the open source 3D animated short film Sintel. We extracted 35 sequences displaying different environments, characters/objects, and actions, and showed that the image and motion statistics of Sintel are similar to those of natural movies. Using the 3D source data, we created an optical flow data set that exhibits important features not present in previous datasets: long sequences, large motions, non-rigidly moving objects, specular reflections, motion blur, defocus blur, and atmospheric effects. We released the ground truth optical flow for 23 training sequences and withheld the remaining 12 sequences for evaluation purposes. When released in 2012, the best methods had an average endpoint error of around 10 pixels. The dataset has focused the community on core problems, and only 3.5 years later there are over 70 methods evaluated on the benchmark, with the best approaching 5 pixels of error.
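The average endpoint error (EPE) quoted above is the mean Euclidean distance, in pixels, between estimated and ground-truth flow vectors. A minimal sketch of how it can be computed (function name and toy data are illustrative, not the benchmark's actual evaluation code):

```python
import numpy as np

def average_endpoint_error(flow_est, flow_gt):
    """Mean Euclidean distance (in pixels) between estimated and
    ground-truth flow fields, each an H x W x 2 array of (u, v)."""
    diff = flow_est - flow_gt
    epe = np.sqrt(np.sum(diff ** 2, axis=-1))  # per-pixel endpoint error
    return epe.mean()

# Toy example: uniform 3 px rightward motion vs. an estimate off by 1 px.
gt = np.zeros((4, 4, 2))
gt[..., 0] = 3.0
est = gt.copy()
est[..., 0] += 1.0
print(average_endpoint_error(est, gt))  # 1.0
```

Benchmark servers typically also report variants of this metric restricted to particular regions (e.g., near motion boundaries or in unmatched pixels), but the per-pixel Euclidean distance is the core quantity.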
Members
Publications