Header logo is


2018


Thumb xl sevillagcpr
On the Integration of Optical Flow and Action Recognition

Sevilla-Lara, L., Liao, Y., Güney, F., Jampani, V., Geiger, A., Black, M. J.

In German Conference on Pattern Recognition (GCPR), LNCS 11269, pages: 281-297, Springer, Cham, October 2018 (inproceedings)

Abstract
Most of the top performing action recognition methods use optical flow as a "black box" input. Here we take a deeper look at the combination of flow and action recognition, and investigate why optical flow is helpful, what makes a flow method good for action recognition, and how we can make it better. In particular, we investigate the impact of different flow algorithms and input transformations to better understand how these affect a state-of-the-art action recognition method. Furthermore, we fine tune two neural-network flow methods end-to-end on the most widely used action recognition dataset (UCF101). Based on these experiments, we make the following five observations: 1) optical flow is useful for action recognition because it is invariant to appearance, 2) optical flow methods are optimized to minimize end-point-error (EPE), but the EPE of current methods is not well correlated with action recognition performance, 3) for the flow methods tested, accuracy at boundaries and at small displacements is most correlated with action recognition performance, 4) training optical flow to minimize classification error instead of minimizing EPE improves recognition performance, and 5) optical flow learned for the task of action recognition differs from traditional optical flow especially inside the human body and at the boundary of the body. These observations may encourage optical flow researchers to look beyond EPE as a goal and guide action recognition researchers to seek better motion cues, leading to a tighter integration of the optical flow and action recognition communities.

avg ps

arXiv DOI [BibTex]

2018


arXiv DOI [BibTex]


Thumb xl iros18
Towards Robust Visual Odometry with a Multi-Camera System

Liu, P., Geppert, M., Heng, L., Sattler, T., Geiger, A., Pollefeys, M.

In International Conference on Intelligent Robots and Systems (IROS) 2018, International Conference on Intelligent Robots and Systems, October 2018 (inproceedings)

Abstract
We present a visual odometry (VO) algorithm for a multi-camera system and robust operation in challenging environments. Our algorithm consists of a pose tracker and a local mapper. The tracker estimates the current pose by minimizing photometric errors between the most recent keyframe and the current frame. The mapper initializes the depths of all sampled feature points using plane-sweeping stereo. To reduce pose drift, a sliding window optimizer is used to refine poses and structure jointly. Our formulation is flexible enough to support an arbitrary number of stereo cameras. We evaluate our algorithm thoroughly on five datasets. The datasets were captured in different conditions: daytime, night-time with near-infrared (NIR) illumination and night-time without NIR illumination. Experimental results show that a multi-camera setup makes the VO more robust to challenging environments, especially night-time conditions, in which a single stereo configuration fails easily due to the lack of features.

avg

pdf Project Page [BibTex]

pdf Project Page [BibTex]


Thumb xl ianeccv18
Learning Priors for Semantic 3D Reconstruction

Cherabier, I., Schönberger, J., Oswald, M., Pollefeys, M., Geiger, A.

In Computer Vision – ECCV 2018, Springer International Publishing, Cham, September 2018 (inproceedings)

Abstract
We present a novel semantic 3D reconstruction framework which embeds variational regularization into a neural network. Our network performs a fixed number of unrolled multi-scale optimization iterations with shared interaction weights. In contrast to existing variational methods for semantic 3D reconstruction, our model is end-to-end trainable and captures more complex dependencies between the semantic labels and the 3D geometry. Compared to previous learning-based approaches to 3D reconstruction, we integrate powerful long-range dependencies using variational coarse-to-fine optimization. As a result, our network architecture requires only a moderate number of parameters while keeping a high level of expressiveness which enables learning from very little data. Experiments on real and synthetic datasets demonstrate that our network achieves higher accuracy compared to a purely variational approach while at the same time requiring two orders of magnitude less iterations to converge. Moreover, our approach handles ten times more semantic class labels using the same computational resources.

avg

pdf suppmat Project Page Video DOI Project Page [BibTex]

pdf suppmat Project Page Video DOI Project Page [BibTex]


Thumb xl joeleccv18
Unsupervised Learning of Multi-Frame Optical Flow with Occlusions

Janai, J., Güney, F., Ranjan, A., Black, M. J., Geiger, A.

In European Conference on Computer Vision (ECCV), Lecture Notes in Computer Science, vol 11220, pages: 713-731, Springer, Cham, September 2018 (inproceedings)

avg ps

pdf suppmat Video Project Page DOI Project Page [BibTex]

pdf suppmat Video Project Page DOI Project Page [BibTex]


Thumb xl beneccv18
SphereNet: Learning Spherical Representations for Detection and Classification in Omnidirectional Images

Coors, B., Condurache, A. P., Geiger, A.

European Conference on Computer Vision (ECCV), September 2018 (conference)

Abstract
Omnidirectional cameras offer great benefits over classical cameras wherever a wide field of view is essential, such as in virtual reality applications or in autonomous robots. Unfortunately, standard convolutional neural networks are not well suited for this scenario as the natural projection surface is a sphere which cannot be unwrapped to a plane without introducing significant distortions, particularly in the polar regions. In this work, we present SphereNet, a novel deep learning framework which encodes invariance against such distortions explicitly into convolutional neural networks. Towards this goal, SphereNet adapts the sampling locations of the convolutional filters, effectively reversing distortions, and wraps the filters around the sphere. By building on regular convolutions, SphereNet enables the transfer of existing perspective convolutional neural network models to the omnidirectional case. We demonstrate the effectiveness of our method on the tasks of image classification and object detection, exploiting two newly created semi-synthetic and real-world omnidirectional datasets.

avg

pdf suppmat Project Page [BibTex]


Thumb xl andrease teaser 2
Robust Dense Mapping for Large-Scale Dynamic Environments

Barsan, I. A., Liu, P., Pollefeys, M., Geiger, A.

In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) 2018, IEEE, International Conference on Robotics and Automation, May 2018 (inproceedings)

Abstract
We present a stereo-based dense mapping algorithm for large-scale dynamic urban environments. In contrast to other existing methods, we simultaneously reconstruct the static background, the moving objects, and the potentially moving but currently stationary objects separately, which is desirable for high-level mobile robotic tasks such as path planning in crowded environments. We use both instance-aware semantic segmentation and sparse scene flow to classify objects as either background, moving, or potentially moving, thereby ensuring that the system is able to model objects with the potential to transition from static to dynamic, such as parked cars. Given camera poses estimated from visual odometry, both the background and the (potentially) moving objects are reconstructed separately by fusing the depth maps computed from the stereo input. In addition to visual odometry, sparse scene flow is also used to estimate the 3D motions of the detected moving objects, in order to reconstruct them accurately. A map pruning technique is further developed to improve reconstruction accuracy and reduce memory consumption, leading to increased scalability. We evaluate our system thoroughly on the well-known KITTI dataset. Our system is capable of running on a PC at approximately 2.5Hz, with the primary bottleneck being the instance-aware semantic segmentation, which is a limitation we hope to address in future work.

avg

pdf Video Project Page Project Page [BibTex]

pdf Video Project Page Project Page [BibTex]


Thumb xl us20180021892a1 20180125 d00000
Method and device for reversibly attaching a phase changing metal to an object

Zhou Ye, G. Z. L. M. S.

US Patent Application US 2018/0021892 A1, January 2018 (patent)

Abstract
A method for reversibly attaching a phase changing metal to an object, the method comprising the steps of: providing a substrate having at least one surface at which the phase changing metal is attached, heating the phase changing metal above a phase changing temperature at which the phase changing metal changes its phase from solid to liquid, bringing the phase changing metal, when the phase changing metal is in the liquid phase or before the phase changing metal is brought into the liquid phase, into contact with the object, permitting the phase changing metal to cool below the phase changing temperature, whereby the phase changing metal becomes solid and the object and the phase changing metal become attached to each other, reheating the phase changing metal above the phase changing temperature to liquefy the phase changing metal, and removing the substrate from the object, with the phase changing metal separating from the object and remaining with the substrate.

pi

US Patent Application Database US Patent Application (PDF) [BibTex]


Thumb xl us20180012693a1 20180111 d00000
Method of fabricating a shape-changeable magentic member, method of producing a shape changeable magnetic member and shape changeable magnetic member

Guo Zhan Lum, Z. Y. M. S.

US Patent Application US 2018/0012693 A1, January 2018 (patent)

Abstract
The present invention relates to a method of fabricating a shape-changeable magnetic member comprising a plurality of segments with each segment being able to be magnetized with a desired magnitude and orientation of magnetization, to a method of producing a shape changeable magnetic member composed of a plurality of segments and to a shape changeable magnetic member.

pi

US Patent Application Database US Patent Application (PDF) [BibTex]


Thumb xl despoina paper teaser
RayNet: Learning Volumetric 3D Reconstruction with Ray Potentials

Paschalidou, D., Ulusoy, A. O., Schmitt, C., Gool, L., Geiger, A.

In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE Computer Society, IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2018, 2018 (inproceedings)

Abstract
In this paper, we consider the problem of reconstructing a dense 3D model using images captured from different views. Recent methods based on convolutional neural networks (CNN) allow learning the entire task from data. However, they do not incorporate the physics of image formation such as perspective geometry and occlusion. Instead, classical approaches based on Markov Random Fields (MRF) with ray-potentials explicitly model these physical processes, but they cannot cope with large surface appearance variations across different viewpoints. In this paper, we propose RayNet, which combines the strengths of both frameworks. RayNet integrates a CNN that learns view-invariant feature representations with an MRF that explicitly encodes the physics of perspective projection and occlusion. We train RayNet end-to-end using empirical risk minimization. We thoroughly evaluate our approach on challenging real-world datasets and demonstrate its benefits over a piece-wise trained baseline, hand-crafted models as well as other learning-based approaches.

avg

pdf suppmat Video Project Page code Poster Project Page [BibTex]

pdf suppmat Video Project Page code Poster Project Page [BibTex]


no image
Enhanced Non-Steady Gliding Performance of the MultiMo-Bat through Optimal Airfoil Configuration and Control Strategy

Kim, H., Woodward, M. A., Sitti, M.

In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages: 1382-1388, 2018 (inproceedings)

pi

[BibTex]

[BibTex]


Thumb xl yiyi paper teaser
Deep Marching Cubes: Learning Explicit Surface Representations

Liao, Y., Donne, S., Geiger, A.

In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE Computer Society, IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2018, 2018 (inproceedings)

Abstract
Existing learning based solutions to 3D surface prediction cannot be trained end-to-end as they operate on intermediate representations (eg, TSDF) from which 3D surface meshes must be extracted in a post-processing step (eg, via the marching cubes algorithm). In this paper, we investigate the problem of end-to-end 3D surface prediction. We first demonstrate that the marching cubes algorithm is not differentiable and propose an alternative differentiable formulation which we insert as a final layer into a 3D convolutional neural network. We further propose a set of loss functions which allow for training our model with sparse point supervision. Our experiments demonstrate that the model allows for predicting sub-voxel accurate 3D shapes of arbitrary topology. Additionally, it learns to complete shapes and to separate an object's inside from its outside even in the presence of sparse and incomplete ground truth. We investigate the benefits of our approach on the task of inferring shapes from 3D point clouds. Our model is flexible and can be combined with a variety of shape encoder and shape inference techniques.

avg

pdf suppmat Video Project Page Poster Project Page [BibTex]

pdf suppmat Video Project Page Poster Project Page [BibTex]


Thumb xl teaser andreas
Semantic Visual Localization

Schönberger, J., Pollefeys, M., Geiger, A., Sattler, T.

In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE Computer Society, IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2018, 2018 (inproceedings)

Abstract
Robust visual localization under a wide range of viewing conditions is a fundamental problem in computer vision. Handling the difficult cases of this problem is not only very challenging but also of high practical relevance, eg, in the context of life-long localization for augmented reality or autonomous robots. In this paper, we propose a novel approach based on a joint 3D geometric and semantic understanding of the world, enabling it to succeed under conditions where previous approaches failed. Our method leverages a novel generative model for descriptor learning, trained on semantic scene completion as an auxiliary task. The resulting 3D descriptors are robust to missing observations by encoding high-level 3D geometric and semantic information. Experiments on several challenging large-scale localization datasets demonstrate reliable localization under extreme viewpoint, illumination, and geometry changes.

avg

pdf suppmat Poster Project Page [BibTex]

pdf suppmat Poster Project Page [BibTex]


Thumb xl eigval gradpen
Which Training Methods for GANs do actually Converge?

Mescheder, L., Geiger, A., Nowozin, S.

International Conference on Machine learning (ICML), 2018 (conference)

Abstract
Recent work has shown local convergence of GAN training for absolutely continuous data and generator distributions. In this paper, we show that the requirement of absolute continuity is necessary: we describe a simple yet prototypical counterexample showing that in the more realistic case of distributions that are not absolutely continuous, unregularized GAN training is not always convergent. Furthermore, we discuss regularization strategies that were recently proposed to stabilize GAN training. Our analysis shows that GAN training with instance noise or zero-centered gradient penalties converges. On the other hand, we show that Wasserstein-GANs and WGAN-GP with a finite number of discriminator updates per generator update do not always converge to the equilibrium point. We discuss these results, leading us to a new explanation for the stability problems of GAN training. Based on our analysis, we extend our convergence results to more general GANs and prove local convergence for simplified gradient penalties even if the generator and data distributions lie on lower dimensional manifolds. We find these penalties to work well in practice and use them to learn high-resolution generative image models for a variety of datasets with little hyperparameter tuning.

avg

code video paper supplement slides poster Project Page [BibTex]


no image
Collectives of Spinning Mobile Microrobots for Navigation and Object Manipulation at the Air-Water Interface

Wang, W., Kishore, V., Koens, L., Lauga, E., Sitti, M.

In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages: 1-9, 2018 (inproceedings)

pi

[BibTex]

[BibTex]


Thumb xl david paper teaser
Learning 3D Shape Completion from Laser Scan Data with Weak Supervision

Stutz, D., Geiger, A.

In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE Computer Society, IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2018, 2018 (inproceedings)

Abstract
3D shape completion from partial point clouds is a fundamental problem in computer vision and computer graphics. Recent approaches can be characterized as either data-driven or learning-based. Data-driven approaches rely on a shape model whose parameters are optimized to fit the observations. Learning-based approaches, in contrast, avoid the expensive optimization step and instead directly predict the complete shape from the incomplete observations using deep neural networks. However, full supervision is required which is often not available in practice. In this work, we propose a weakly-supervised learning-based approach to 3D shape completion which neither requires slow optimization nor direct supervision. While we also learn a shape prior on synthetic data, we amortize, ie, learn, maximum likelihood fitting using deep neural networks resulting in efficient shape completion without sacrificing accuracy. Tackling 3D shape completion of cars on ShapeNet and KITTI, we demonstrate that the proposed amortized maximum likelihood approach is able to compete with a fully supervised baseline and a state-of-the-art data-driven approach while being significantly faster. On ModelNet, we additionally show that the approach is able to generalize to other object categories as well.

avg

pdf suppmat Project Page Poster Project Page [BibTex]

pdf suppmat Project Page Poster Project Page [BibTex]


no image
Endo-VMFuseNet: A Deep Visual-Magnetic Sensor Fusion Approach for Endoscopic Capsule Robots

Turan, M., Almalioglu, Y., Gilbert, H. B., Sari, A. E., Soylu, U., Sitti, M.

In 2018 IEEE International Conference on Robotics and Automation (ICRA), pages: 1-7, 2018 (inproceedings)

pi

[BibTex]

[BibTex]


no image
Endosensorfusion: Particle filtering-based multi-sensory data fusion with switching state-space model for endoscopic capsule robots

Turan, M., Almalioglu, Y., Gilbert, H., Araujo, H., Cemgil, T., Sitti, M.

In 2018 IEEE International Conference on Robotics and Automation (ICRA), pages: 1-8, 2018 (inproceedings)

pi

[BibTex]

[BibTex]


Thumb xl benvisapp
Learning Transformation Invariant Representations with Weak Supervision

Coors, B., Condurache, A., Mertins, A., Geiger, A.

In International Conference on Computer Vision Theory and Applications, International Conference on Computer Vision Theory and Applications, 2018 (inproceedings)

Abstract
Deep convolutional neural networks are the current state-of-the-art solution to many computer vision tasks. However, their ability to handle large global and local image transformations is limited. Consequently, extensive data augmentation is often utilized to incorporate prior knowledge about desired invariances to geometric transformations such as rotations or scale changes. In this work, we combine data augmentation with an unsupervised loss which enforces similarity between the predictions of augmented copies of an input sample. Our loss acts as an effective regularizer which facilitates the learning of transformation invariant representations. We investigate the effectiveness of the proposed similarity loss on rotated MNIST and the German Traffic Sign Recognition Benchmark (GTSRB) in the context of different classification models including ladder networks. Our experiments demonstrate improvements with respect to the standard data augmentation approach for supervised and semi-supervised learning tasks, in particular in the presence of little annotated data. In addition, we analyze the performance of the proposed approach with respect to its hyperparameters, including the strength of the regularization as well as the layer where representation similarity is enforced.

avg

pdf [BibTex]

pdf [BibTex]

2017


Thumb xl larsnips
The Numerics of GANs

Mescheder, L., Nowozin, S., Geiger, A.

In Proceedings from the conference "Neural Information Processing Systems 2017., (Editors: Guyon I. and Luxburg U.v. and Bengio S. and Wallach H. and Fergus R. and Vishwanathan S. and Garnett R.), Curran Associates, Inc., Advances in Neural Information Processing Systems 30 (NIPS), December 2017 (inproceedings)

Abstract
In this paper, we analyze the numerics of common algorithms for training Generative Adversarial Networks (GANs). Using the formalism of smooth two-player games we analyze the associated gradient vector field of GAN training objectives. Our findings suggest that the convergence of current algorithms suffers due to two factors: i) presence of eigenvalues of the Jacobian of the gradient vector field with zero real-part, and ii) eigenvalues with big imaginary part. Using these findings, we design a new algorithm that overcomes some of these limitations and has better convergence properties. Experimentally, we demonstrate its superiority on training common GAN architectures and show convergence on GAN architectures that are known to be notoriously hard to train.

avg

pdf Project Page [BibTex]

2017


pdf Project Page [BibTex]


Thumb xl teaser iccv2017
Bounding Boxes, Segmentations and Object Coordinates: How Important is Recognition for 3D Scene Flow Estimation in Autonomous Driving Scenarios?

Behl, A., Jafari, O. H., Mustikovela, S. K., Alhaija, H. A., Rother, C., Geiger, A.

In Proceedings IEEE International Conference on Computer Vision (ICCV), IEEE, Piscataway, NJ, USA, IEEE International Conference on Computer Vision (ICCV), October 2017 (inproceedings)

Abstract
Existing methods for 3D scene flow estimation often fail in the presence of large displacement or local ambiguities, e.g., at texture-less or reflective surfaces. However, these challenges are omnipresent in dynamic road scenes, which is the focus of this work. Our main contribution is to overcome these 3D motion estimation problems by exploiting recognition. In particular, we investigate the importance of recognition granularity, from coarse 2D bounding box estimates over 2D instance segmentations to fine-grained 3D object part predictions. We compute these cues using CNNs trained on a newly annotated dataset of stereo images and integrate them into a CRF-based model for robust 3D scene flow estimation - an approach we term Instance Scene Flow. We analyze the importance of each recognition cue in an ablation study and observe that the instance segmentation cue is by far strongest, in our setting. We demonstrate the effectiveness of our method on the challenging KITTI 2015 scene flow benchmark where we achieve state-of-the-art performance at the time of submission.

avg

pdf suppmat Poster Project Page [BibTex]

pdf suppmat Poster Project Page [BibTex]


Thumb xl jonas teaser
Sparsity Invariant CNNs

Uhrig, J., Schneider, N., Schneider, L., Franke, U., Brox, T., Geiger, A.

International Conference on 3D Vision (3DV) 2017, International Conference on 3D Vision (3DV), October 2017 (conference)

Abstract
In this paper, we consider convolutional neural networks operating on sparse inputs with an application to depth upsampling from sparse laser scan data. First, we show that traditional convolutional networks perform poorly when applied to sparse data even when the location of missing data is provided to the network. To overcome this problem, we propose a simple yet effective sparse convolution layer which explicitly considers the location of missing data during the convolution operation. We demonstrate the benefits of the proposed network architecture in synthetic and real experiments \wrt various baseline approaches. Compared to dense baselines, the proposed sparse convolution network generalizes well to novel datasets and is invariant to the level of sparsity in the data. For our evaluation, we derive a novel dataset from the KITTI benchmark, comprising 93k depth annotated RGB images. Our dataset allows for training and evaluating depth upsampling and depth prediction techniques in challenging real-world settings.

avg

pdf suppmat Project Page Project Page [BibTex]

pdf suppmat Project Page Project Page [BibTex]


Thumb xl gernot teaser
OctNetFusion: Learning Depth Fusion from Data

Riegler, G., Ulusoy, A. O., Bischof, H., Geiger, A.

International Conference on 3D Vision (3DV) 2017, International Conference on 3D Vision (3DV), October 2017 (conference)

Abstract
In this paper, we present a learning based approach to depth fusion, i.e., dense 3D reconstruction from multiple depth images. The most common approach to depth fusion is based on averaging truncated signed distance functions, which was originally proposed by Curless and Levoy in 1996. While this method is simple and provides great results, it is not able to reconstruct (partially) occluded surfaces and requires a large number frames to filter out sensor noise and outliers. Motivated by the availability of large 3D model repositories and recent advances in deep learning, we present a novel 3D CNN architecture that learns to predict an implicit surface representation from the input depth maps. Our learning based method significantly outperforms the traditional volumetric fusion approach in terms of noise reduction and outlier suppression. By learning the structure of real world 3D objects and scenes, our approach is further able to reconstruct occluded regions and to fill in gaps in the reconstruction. We demonstrate that our learning based approach outperforms both vanilla TSDF fusion as well as TV-L1 fusion on the task of volumetric fusion. Further, we demonstrate state-of-the-art 3D shape completion results.

avg

pdf Video 1 Video 2 Project Page Project Page [BibTex]

pdf Video 1 Video 2 Project Page Project Page [BibTex]


Thumb xl andreas teaser
Direct Visual Odometry for a Fisheye-Stereo Camera

Liu, P., Heng, L., Sattler, T., Geiger, A., Pollefeys, M.

In Proceedings IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), IEEE, Piscataway, NJ, USA, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), September 2017 (inproceedings)

Abstract
We present a direct visual odometry algorithm for a fisheye-stereo camera. Our algorithm performs simultaneous camera motion estimation and semi-dense reconstruction. The pipeline consists of two threads: a tracking thread and a mapping thread. In the tracking thread, we estimate the camera pose via semi-dense direct image alignment. To have a wider field of view (FoV) which is important for robotic perception, we use fisheye images directly without converting them to conventional pinhole images which come with a limited FoV. To address the epipolar curve problem, plane-sweeping stereo is used for stereo matching and depth initialization. Multiple depth hypotheses are tracked for selected pixels to better capture the uncertainty characteristics of stereo matching. Temporal motion stereo is then used to refine the depth and remove false positive depth hypotheses. Our implementation runs at an average of 20 Hz on a low-end PC. We run experiments in outdoor environments to validate our algorithm, and discuss the experimental results. We experimentally show that we are able to estimate 6D poses with low drift, and at the same time, do semi-dense 3D reconstruction with high accuracy.

avg

pdf Project Page [BibTex]

pdf Project Page [BibTex]


Thumb xl hassan paper teasere
Augmented Reality Meets Deep Learning for Car Instance Segmentation in Urban Scenes

Alhaija, H. A., Mustikovela, S. K., Mescheder, L., Geiger, A., Rother, C.

In Proceedings of the British Machine Vision Conference 2017, Proceedings of the British Machine Vision Conference, September 2017 (inproceedings)

Abstract
The success of deep learning in computer vision is based on the availability of large annotated datasets. To lower the need for hand labeled images, virtually rendered 3D worlds have recently gained popularity. Unfortunately, creating realistic 3D content is challenging on its own and requires significant human effort. In this work, we propose an alternative paradigm which combines real and synthetic data for learning semantic instance segmentation models. Exploiting the fact that not all aspects of the scene are equally important for this task, we propose to augment real-world imagery with virtual objects of the target category. Capturing real-world images at large scale is easy and cheap, and directly provides real background appearances without the need for creating complex 3D models of the environment. We present an efficient procedure to augment these images with virtual objects. This allows us to create realistic composite images which exhibit both realistic background appearance as well as a large number of complex object arrangements. In contrast to modeling complete 3D environments, our data augmentation approach requires only a few user interactions in combination with 3D shapes of the target object category. We demonstrate the utility of the proposed approach for training a state-of-the-art high-capacity deep model for semantic instance segmentation. In particular, we consider the task of segmenting car instances on the KITTI dataset which we have annotated with pixel-accurate ground truth. Our experiments demonstrate that models trained on augmented imagery generalize better than those trained on synthetic data or models trained on limited amounts of annotated real data.

avg

pdf Project Page [BibTex]

pdf Project Page [BibTex]


no image
Swimming in low reynolds numbers using planar and helical flagellar waves

Khalil, I. S. M., Tabak, A. F., Seif, M. A., Klingner, A., Adel, B., Sitti, M.

In International Conference on Intelligent Robots and Systems (IROS) 2017, pages: 1907-1912, International Conference on Intelligent Robots and Systems, September 2017 (inproceedings)

Abstract
In travelling towards the oviducts, sperm cells undergo transitions between planar to helical flagellar propulsion by a beating tail based on the viscosity of the environment. In this work, we aim to model and mimic this behaviour in low Reynolds number fluids using externally actuated soft robotic sperms. We numerically investigate the effects of transition between planar to helical flagellar propulsion on the swimming characteristics of the robotic sperm using a model based on resistive-force theory to study the role of viscous forces on its flexible tail. Experimental results are obtained using robots that contain magnetic particles within the polymer matrix of its head and an ultra-thin flexible tail. The planar and helical flagellar propulsion are achieved using in-plane and out-of-plane uniform fields with sinusoidally varying components, respectively. We experimentally show that the swimming speed of the robotic sperm increases by a factor of 1.4 (fluid viscosity 5 Pa.s) when it undergoes a controlled transition between planar to helical flagellar propulsion, at relatively low actuation frequencies.

pi

DOI [BibTex]

DOI [BibTex]


Thumb xl img01
Adversarial Variational Bayes: Unifying Variational Autoencoders and Generative Adversarial Networks

Mescheder, L., Nowozin, S., Geiger, A.

In Proceedings of the 34th International Conference on Machine Learning, 70, Proceedings of Machine Learning Research, (Editors: Doina Precup, Yee Whye Teh), PMLR, International Conference on Machine Learning (ICML), August 2017 (inproceedings)

Abstract
Variational Autoencoders (VAEs) are expressive latent variable models that can be used to learn complex probability distributions from training data. However, the quality of the resulting model crucially relies on the expressiveness of the inference model. We introduce Adversarial Variational Bayes (AVB), a technique for training Variational Autoencoders with arbitrarily expressive inference models. We achieve this by introducing an auxiliary discriminative network that allows to rephrase the maximum-likelihood-problem as a two-player game, hence establishing a principled connection between VAEs and Generative Adversarial Networks (GANs). We show that in the nonparametric limit our method yields an exact maximum-likelihood assignment for the parameters of the generative model, as well as the exact posterior distribution over the latent variables given an observation. Contrary to competing approaches which combine VAEs with GANs, our approach has a clear theoretical justification, retains most advantages of standard Variational Autoencoders and is easy to implement.

avg

pdf suppmat Project Page arxiv-version Project Page [BibTex]

pdf suppmat Project Page arxiv-version Project Page [BibTex]


Thumb xl publications toc
An XY ϴz flexure mechanism with optimal stiffness properties

Lum, G. Z., Pham, M. T., Teo, T. J., Yang, G., Yeo, S. H., Sitti, M.

In 2017 IEEE International Conference on Advanced Intelligent Mechatronics (AIM), pages: 1103-1110, July 2017 (inproceedings)

Abstract
The development of optimal XY θz flexure mechanisms, which can deliver high precision motion about the z-axis, and along the x- and y-axes is highly desirable for a wide range of micro/nano-positioning tasks pertaining to biomedical research, microscopy technologies and various industrial applications. Although maximizing the stiffness ratios is a very critical design requirement, the achievable translational and rotational stiffness ratios of existing XY θz flexure mechanisms are still restricted between 0.5 and 130. As a result, these XY θz flexure mechanisms are unable to fully optimize their workspace and capabilities to reject disturbances. Here, we present an optimal XY θz flexure mechanism, which is designed to have maximum stiffness ratios. Based on finite element analysis (FEA), it has translational stiffness ratio of 248, rotational stiffness ratio of 238 and a large workspace of 2.50 mm × 2.50 mm × 10°. Despite having such a large workspace, FEA also predicts that the proposed mechanism can still achieve a high bandwidth of 70 Hz. In comparison, the bandwidth of similar existing flexure mechanisms that can deflect more than 0.5 mm or 0.5° is typically less than 45 Hz. Hence, the high stiffness ratios of the proposed mechanism are achieved without compromising its dynamic performance. Preliminary experimental results pertaining to the mechanism's translational actuating stiffness and bandwidth were in agreement with the FEA predictions as the deviation was within 10%. In conclusion, the proposed flexure mechanism exhibits superior performance and can be used across a wide range of applications.

pi

DOI [BibTex]

DOI [BibTex]


Thumb xl publications toc
Positioning of drug carriers using permanent magnet-based robotic system in three-dimensional space

Khalil, I. S. M., Alfar, A., Tabak, A. F., Klingner, A., Stramigioli, S., Sitti, M.

In 2017 IEEE International Conference on Advanced Intelligent Mechatronics (AIM), pages: 1117-1122, July 2017 (inproceedings)

Abstract
Magnetic control of drug carriers using systems with open-configurations is essential to enable scaling to the size of in vivo applications. In this study, we demonstrate motion control of paramagnetic microparticles in a low Reynolds number fluid, using a permanent magnet-based robotic system with an open-configuration. The microparticles are controlled in three-dimensional (3D) space using a cylindrical NdFeB magnet that is fixed to the end-effector of a robotic arm. We develop a kinematic map between the position of the microparticles and the configuration of the robotic arm, and use this map as a basis of a closed-loop control system based on the position of the microparticles. Our experimental results show the ability of the robot configuration to control the exerted field gradient on the dipole of the microparticles, and achieve positioning in 3D space with maximum error of 300 µm and 600 µm in the steady-state during setpoint and trajectory tracking, respectively.

pi

DOI [BibTex]

DOI [BibTex]


no image
Self-assembly of micro/nanosystems across scales and interfaces

Mastrangeli, M.

In 2017 19th International Conference on Solid-State Sensors, Actuators and Microsystems (TRANSDUCERS), pages: 676 - 681, IEEE, July 2017 (inproceedings)

Abstract
Steady progress in understanding and implementation are establishing self-assembly as a versatile, parallel and scalable approach to the fabrication of transducers. In this contribution, I illustrate the principles and reach of self-assembly with three applications at different scales - namely, the capillary self-alignment of millimetric components, the sealing of liquid-filled polymeric microcapsules, and the accurate capillary assembly of single nanoparticles - and propose foreseeable directions for further developments.

pi

link (url) DOI [BibTex]

link (url) DOI [BibTex]


Thumb xl joel slow flow crop
Slow Flow: Exploiting High-Speed Cameras for Accurate and Diverse Optical Flow Reference Data

Janai, J., Güney, F., Wulff, J., Black, M., Geiger, A.

In Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017, pages: 1406-1416, IEEE, Piscataway, NJ, USA, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017 (inproceedings)

Abstract
Existing optical flow datasets are limited in size and variability due to the difficulty of capturing dense ground truth. In this paper, we tackle this problem by tracking pixels through densely sampled space-time volumes recorded with a high-speed video camera. Our model exploits the linearity of small motions and reasons about occlusions from multiple frames. Using our technique, we are able to establish accurate reference flow fields outside the laboratory in natural environments. Besides, we show how our predictions can be used to augment the input images with realistic motion blur. We demonstrate the quality of the produced flow fields on synthetic and real-world datasets. Finally, we collect a novel challenging optical flow dataset by applying our technique on data from a high-speed camera and analyze the performance of the state-of-the-art in optical flow under various levels of motion blur.

avg ps

pdf suppmat Project page Video DOI Project Page [BibTex]

pdf suppmat Project page Video DOI Project Page [BibTex]


Thumb xl img03
OctNet: Learning Deep 3D Representations at High Resolutions

Riegler, G., Ulusoy, O., Geiger, A.

In Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017, IEEE, Piscataway, NJ, USA, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017 (inproceedings)

Abstract
We present OctNet, a representation for deep learning with sparse 3D data. In contrast to existing models, our representation enables 3D convolutional networks which are both deep and high resolution. Towards this goal, we exploit the sparsity in the input data to hierarchically partition the space using a set of unbalanced octrees where each leaf node stores a pooled feature representation. This allows to focus memory allocation and computation to the relevant dense regions and enables deeper networks without compromising resolution. We demonstrate the utility of our OctNet representation by analyzing the impact of resolution on several 3D tasks including 3D object classification, orientation estimation and point cloud labeling.

avg ps

pdf suppmat Project Page Video Project Page [BibTex]

pdf suppmat Project Page Video Project Page [BibTex]


Thumb xl schoeps2017cvpr
A Multi-View Stereo Benchmark with High-Resolution Images and Multi-Camera Videos

Schöps, T., Schönberger, J. L., Galliani, S., Sattler, T., Schindler, K., Pollefeys, M., Geiger, A.

In Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017, IEEE, Piscataway, NJ, USA, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017 (inproceedings)

Abstract
Motivated by the limitations of existing multi-view stereo benchmarks, we present a novel dataset for this task. Towards this goal, we recorded a variety of indoor and outdoor scenes using a high-precision laser scanner and captured both high-resolution DSLR imagery as well as synchronized low-resolution stereo videos with varying fields-of-view. To align the images with the laser scans, we propose a robust technique which minimizes photometric errors conditioned on the geometry. In contrast to previous datasets, our benchmark provides novel challenges and covers a diverse set of viewpoints and scene types, ranging from natural scenes to man-made indoor and outdoor environments. Furthermore, we provide data at significantly higher temporal and spatial resolution. Our benchmark is the first to cover the important use case of hand-held mobile devices while also providing high-resolution DSLR camera images. We make our datasets and an online evaluation server available at http://www.eth3d.net.

avg

pdf suppmat Project Page Project Page [BibTex]

pdf suppmat Project Page Project Page [BibTex]


Thumb xl camposeco2017cvpr
Toroidal Constraints for Two Point Localization Under High Outlier Ratios

Camposeco, F., Sattler, T., Cohen, A., Geiger, A., Pollefeys, M.

In Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017, IEEE, Piscataway, NJ, USA, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017 (inproceedings)

Abstract
Localizing a query image against a 3D model at large scale is a hard problem, since 2D-3D matches become more and more ambiguous as the model size increases. This creates a need for pose estimation strategies that can handle very low inlier ratios. In this paper, we draw new insights on the geometric information available from the 2D-3D matching process. As modern descriptors are not invariant against large variations in viewpoint, we are able to find the rays in space used to triangulate a given point that are closest to a query descriptor. It is well known that two correspondences constrain the camera to lie on the surface of a torus. Adding the knowledge of direction of triangulation, we are able to approximate the position of the camera from \emphtwo matches alone. We derive a geometric solver that can compute this position in under 1 microsecond. Using this solver, we propose a simple yet powerful outlier filter which scales quadratically in the number of matches. We validate the accuracy of our solver and demonstrate the usefulness of our method in real world settings.

avg

pdf suppmat Project Page Project Page [BibTex]

pdf suppmat Project Page pdf Project Page [BibTex]


Thumb xl cvpr2017 landpsace
Semantic Multi-view Stereo: Jointly Estimating Objects and Voxels

Ulusoy, A. O., Black, M. J., Geiger, A.

In Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017, IEEE, Piscataway, NJ, USA, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017 (inproceedings)

Abstract
Dense 3D reconstruction from RGB images is a highly ill-posed problem due to occlusions, textureless or reflective surfaces, as well as other challenges. We propose object-level shape priors to address these ambiguities. Towards this goal, we formulate a probabilistic model that integrates multi-view image evidence with 3D shape information from multiple objects. Inference in this model yields a dense 3D reconstruction of the scene as well as the existence and precise 3D pose of the objects in it. Our approach is able to recover fine details not captured in the input shapes while defaulting to the input models in occluded regions where image evidence is weak. Due to its probabilistic nature, the approach is able to cope with the approximate geometry of the 3D models as well as input shapes that are not present in the scene. We evaluate the approach quantitatively on several challenging indoor and outdoor datasets.

avg ps

YouTube pdf suppmat Project Page [BibTex]

YouTube pdf suppmat Project Page [BibTex]


Thumb xl publications toc
Dynamic analysis on hexapedal water-running robot with compliant joints

Kim, H., Liu, Y., Jeong, K., Sitti, M., Seo, T.

In 2017 14th International Conference on Ubiquitous Robots and Ambient Intelligence (URAI), pages: 250-251, June 2017 (inproceedings)

Abstract
The dynamic analysis has been considered as one of the important design methods to design robots. In this research, we derive dynamic equation of hexapedal water-running robot to design compliant joints. The compliant joints that connect three bodies will be used to improve mobility and stability of water-running motion's pitch behavior. We considered all of parts as rigid body including links of six Klann mechanisms and three main frames. And then, we derived dynamic equation by using the Lagrangian method with external force of the water. We are expecting that the dynamic analysis is going to be used to design parts of the water running robot.

pi

DOI [BibTex]

DOI [BibTex]


Thumb xl publications toc
Design and actuation of a magnetic millirobot under a constant unidirectional magnetic field

Erin, O., Giltinan, J., Tsai, L., Sitti, M.

In Proceedings 2017 IEEE International Conference on Robotics and Automation (ICRA), pages: 3404-3410, IEEE, Piscataway, NJ, USA, IEEE International Conference on Robotics and Automation (ICRA), May 2017 (inproceedings)

Abstract
Magnetic untethered millirobots, which are actuated and controlled by remote magnetic fields, have been proposed for medical applications due to their ability to safely pass through tissues at long ranges. For example, magnetic resonance imaging (MRI) systems with a 3-7 T constant unidirectional magnetic field and 3D gradient coils have been used to actuate magnetic robots. Such magnetically constrained systems place limits on the degrees of freedom that can be actuated for untethered devices. This paper presents a design and actuation methodology for a magnetic millirobot that exhibits both position and orientation control in 2D under a magnetic field, dominated by a constant unidirectional magnetic field as found in MRI systems. Placing a spherical permanent magnet, which is free to rotate inside the millirobot and located away from the center of mass, allows the generation of net forces and torques with applied 3D magnetic field gradients. We model this system in a 3D planar case and experimentally demonstrate open-loop control of both position and orientation by the applied 2D field gradients. The actuation performance is characterized across the most important design variables, and we experimentally demonstrate that the proposed approach is feasible.

pi

DOI [BibTex]

DOI [BibTex]


Thumb xl publications toc
Magnetically actuated soft capsule endoscope for fine-needle aspiration biopsy

Son, D., Dogan, M. D., Sitti, M.

In Proceedings 2017 IEEE International Conference on Robotics and Automation (ICRA), pages: 1132-1139, IEEE, Piscataway, NJ, USA, IEEE International Conference on Robotics and Automation (ICRA), May 2017 (inproceedings)

Abstract
This paper presents a magnetically actuated soft capsule endoscope for fine-needle aspiration biopsy (B-MASCE) in the upper gastrointestinal tract. A thin and hollow needle is attached to the capsule, which can penetrate deeply into tissues to obtain subsurface biopsy sample. The design utilizes a soft elastomer body as a compliant mechanism to guide the needle. An internal permanent magnet provides a means for both actuation and tracking. The capsule is designed to roll towards its target and then deploy the biopsy needle in a precise location selected as the target area. B-MASCE is controlled by multiple custom-designed electromagnets while its position and orientation are tracked by a magnetic sensor array. In in vitro trials, B-MASCE demonstrated rolling locomotion and biopsy of a swine tissue model positioned inside an anatomical human stomach model. It was confirmed after the experiment that a tissue sample was retained inside the needle.

pi

DOI Project Page [BibTex]

DOI Project Page [BibTex]


Thumb xl image toc
The use of clamping grips and friction pads by tree frogs for climbing curved surfaces

Endlein, T., Ji, A., Yuan, S., Hill, I., Wang, H., Barnes, W. J. P., Dai, Z., Sitti, M.

In Proc. R. Soc. B, 284(1849):20162867, Febuary 2017 (inproceedings)

Abstract
Most studies on the adhesive mechanisms of climbing animals have addressed attachment against flat surfaces, yet many animals can climb highly curved surfaces, like twigs and small branches. Here we investigated whether tree frogs use a clamping grip by recording the ground reaction forces on a cylindrical object with either a smooth or anti-adhesive, rough surface. Furthermore, we measured the contact area of fore and hindlimbs against differently sized transparent cylinders and the forces of individual pads and subarticular tubercles in restrained animals. Our study revealed that frogs use friction and normal forces of roughly a similar magnitude for holding on to cylindrical objects. When challenged with climbing a non-adhesive surface, the compressive forces between opposite legs nearly doubled, indicating a stronger clamping grip. In contrast to climbing flat surfaces, frogs increased the contact area on all limbs by engaging not just adhesive pads but also subarticular tubercles on curved surfaces. Our force measurements showed that tubercles can withstand larger shear stresses than pads. SEM images of tubercles revealed a similar structure to that of toe pads including the presence of nanopillars, though channels surrounding epithelial cells were less pronounced. The tubercles' smaller size, proximal location on the toes and shallow cells make them probably less prone to buckling and thus ideal for gripping curved surfaces.

pi

DOI [BibTex]

DOI [BibTex]


Thumb xl publications toc
Planning spin-walking locomotion for automatic grasping of microobjects by an untethered magnetic microgripper

Dong, X., Sitti, M.

In 2017 IEEE International Conference on Robotics and Automation (ICRA), pages: 6612-6618, 2017 (inproceedings)

Abstract
Most demonstrated mobile microrobot tasks so far have been achieved via pick-and-placing and dynamic trapping with teleoperation or simple path following algorithms. In our previous work, an untethered magnetic microgripper has been developed which has advanced functions, such as gripping objects. Both teleoperated manipulation in 2D and 3D have been demonstrated. However, it is challenging to control the magnetic microgripper to carry out manipulation tasks, because the grasping of objects so far in the literature relies heavily on teleoperation, which takes several minutes with even a skilled human expert. Here, we propose a new spin-walking locomotion and an automated 2D grasping motion planner for the microgripper, which enables time-efficient automatic grasping of microobjects that has not been achieved yet for untethered microrobots. In its locomotion, the microgripper repeatedly rotates about two principal axes to regulate its pose and move precisely on a surface. The motion planner could plan different motion primitives for grasping and compensate the uncertainties in the motion by learning the uncertainties and planning accordingly. We experimentally demonstrated that, using the proposed method, the microgripper could align to the target pose with error less than 0.1 body length and grip the objects within 40 seconds. Our method could significantly improve the time efficiency of micro-scale manipulation and have potential applications in microassembly and biomedical engineering.

pi

DOI Project Page [BibTex]

DOI Project Page [BibTex]


Thumb xl toc image patent
Methods, apparatuses, and systems for micromanipulation with adhesive fibrillar structures

Sitti, M., Mengüç, Y.

US Patent 9,731,422, 2017 (patent)

Abstract
The present invention are methods for fabrication of micro- and/or nano-scale adhesive fibers and their use for movement and manipulation of objects. Further disclosed is a method of manipulating a part by providing a manipulation device with a plurality of fibers, where each fiber has a tip with a flat surface that is parallel to a backing layer, contacting the flat surfaces on an object, moving the object to a new location, then disengaging the tips from the object.

pi

link (url) [BibTex]

2003


no image
High aspect ratio polymer micro/nano-structure manufacturing using nanoembossing, nanomolding and directed self-assembly

Sitti, M.

In ASME 2003 International Mechanical Engineering Congress and Exposition, pages: 293-297, 2003 (inproceedings)

pi

[BibTex]

2003


[BibTex]


no image
Nsf workshop on future directions in nano-scale systems, dynamics and control

Sitti, M.

In Automatic Control Conference (ACC), 2003 (inproceedings)

pi

[BibTex]

[BibTex]


no image
3-D nano-fiber manufacturing by controlled pulling of liquid polymers using nano-probes

Nain, A. S., Sitti, M.

In Nanotechnology, 2003. IEEE-NANO 2003. 2003 Third IEEE Conference on, 1, pages: 60-63, 2003 (inproceedings)

pi

[BibTex]

[BibTex]


no image
Manufacturing of two and three-dimensional micro/nanostructures by integrating optical tweezers with chemical assembly

Castelino, K., Satyanarayana, S., Sitti, M.

In Nanotechnology, 2003. IEEE-NANO 2003. 2003 Third IEEE Conference on, 1, pages: 56-59, 2003 (inproceedings)

pi

[BibTex]

[BibTex]


no image
Synthetic gecko foot-hair micro/nano-structures for future wall-climbing robots

Sitti, M., Fearing, R. S.

In Robotics and Automation, 2003. Proceedings. ICRA’03. IEEE International Conference on, 1, pages: 1164-1170, 2003 (inproceedings)

pi

[BibTex]

[BibTex]


no image
Biomimetic propulsion for a swimming surgical micro-robot

Edd, J., Payen, S., Rubinsky, B., Stoller, M. L., Sitti, M.

In Intelligent Robots and Systems, 2003.(IROS 2003). Proceedings. 2003 IEEE/RSJ International Conference on, 3, pages: 2583-2588, 2003 (inproceedings)

pi

Project Page [BibTex]

Project Page [BibTex]

1998


no image
Nano tele-manipulation using virtual reality interface

Sitti, M., Horiguchi, S., Hashimoto, H.

In Industrial Electronics, 1998. Proceedings. ISIE’98. IEEE International Symposium on, 1, pages: 171-176, 1998 (inproceedings)

pi

[BibTex]

1998


[BibTex]


no image
Tele-nanorobotics using atomic force microscope

Sitti, M., Hashimoto, H.

In Intelligent Robots and Systems, 1998. Proceedings., 1998 IEEE/RSJ International Conference on, 3, pages: 1739-1746, 1998 (inproceedings)

pi

[BibTex]

[BibTex]