Header logo is


2020


Acoustofluidic Tweezers for the 3D Manipulation of Microparticles
Acoustofluidic Tweezers for the 3D Manipulation of Microparticles

Guo, X., Ma, Z., Goyal, R., Jeong, M., Pang, W., Fischer, P., Dian, X., Qiu, T.

In 2020 IEEE International Conference on Robotics and Automation (ICRA),, Febuary 2020 (conference)

Abstract
Non-contact manipulation is of great importance in the actuation of micro-robotics. It is challenging to contactless manipulate micro-scale objects over large spatial distance in fluid. Here, we describe a novel approach for the dynamic position control of microparticles in three-dimensional (3D) space, based on high-speed acoustic streaming generated by a micro-fabricated gigahertz transducer. Due to the vertical lifting force and the horizontal centripetal force generated by the streaming, microparticles are able to be stably trapped at a position far away from the transducer surface, and to be manipulated over centimeter distance in all three directions. Only the hydrodynamic force is utilized in the system for particle manipulation, making it a versatile tool regardless the material properties of the trapped particle. The system shows high reliability and manipulation velocity, revealing its potentials for the applications in robotics and automation at small scales.

pf

[BibTex]

2020


[BibTex]


Learning Unsupervised Hierarchical Part Decomposition of 3D Objects from a Single RGB Image
Learning Unsupervised Hierarchical Part Decomposition of 3D Objects from a Single RGB Image

Paschalidou, D., Gool, L., Geiger, A.

In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2020, 2020 (inproceedings)

Abstract
Humans perceive the 3D world as a set of distinct objects that are characterized by various low-level (geometry, reflectance) and high-level (connectivity, adjacency, symmetry) properties. Recent methods based on convolutional neural networks (CNNs) demonstrated impressive progress in 3D reconstruction, even when using a single 2D image as input. However, the majority of these methods focuses on recovering the local 3D geometry of an object without considering its part-based decomposition or relations between parts. We address this challenging problem by proposing a novel formulation that allows to jointly recover the geometry of a 3D object as a set of primitives as well as their latent hierarchical structure without part-level supervision. Our model recovers the higher level structural decomposition of various objects in the form of a binary tree of primitives, where simple parts are represented with fewer primitives and more complex parts are modeled with more components. Our experiments on the ShapeNet and D-FAUST datasets demonstrate that considering the organization of parts indeed facilitates reasoning about 3D geometry.

avg

pdf suppmat Video 2 Project Page Slides Poster Video 1 [BibTex]

pdf suppmat Video 2 Project Page Slides Poster Video 1 [BibTex]


Towards Unsupervised Learning of Generative Models for 3D Controllable Image Synthesis
Towards Unsupervised Learning of Generative Models for 3D Controllable Image Synthesis

Liao, Y., Schwarz, K., Mescheder, L., Geiger, A.

In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2020, 2020 (inproceedings)

Abstract
In recent years, Generative Adversarial Networks have achieved impressive results in photorealistic image synthesis. This progress nurtures hopes that one day the classical rendering pipeline can be replaced by efficient models that are learned directly from images. However, current image synthesis models operate in the 2D domain where disentangling 3D properties such as camera viewpoint or object pose is challenging. Furthermore, they lack an interpretable and controllable representation. Our key hypothesis is that the image generation process should be modeled in 3D space as the physical world surrounding us is intrinsically three-dimensional. We define the new task of 3D controllable image synthesis and propose an approach for solving it by reasoning both in 3D space and in the 2D image domain. We demonstrate that our model is able to disentangle latent 3D factors of simple multi-object scenes in an unsupervised fashion from raw images. Compared to pure 2D baselines, it allows for synthesizing scenes that are consistent wrt. changes in viewpoint or object pose. We further evaluate various 3D representations in terms of their usefulness for this challenging task.

avg

pdf suppmat Video 2 Project Page Video 1 Slides Poster [BibTex]

pdf suppmat Video 2 Project Page Video 1 Slides Poster [BibTex]


Exploring Data Aggregation in Policy Learning for Vision-based Urban Autonomous Driving
Exploring Data Aggregation in Policy Learning for Vision-based Urban Autonomous Driving

Prakash, A., Behl, A., Ohn-Bar, E., Chitta, K., Geiger, A.

In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2020, 2020 (inproceedings)

Abstract
Data aggregation techniques can significantly improve vision-based policy learning within a training environment, e.g., learning to drive in a specific simulation condition. However, as on-policy data is sequentially sampled and added in an iterative manner, the policy can specialize and overfit to the training conditions. For real-world applications, it is useful for the learned policy to generalize to novel scenarios that differ from the training conditions. To improve policy learning while maintaining robustness when training end-to-end driving policies, we perform an extensive analysis of data aggregation techniques in the CARLA environment. We demonstrate how the majority of them have poor generalization performance, and develop a novel approach with empirically better generalization performance compared to existing techniques. Our two key ideas are (1) to sample critical states from the collected on-policy data based on the utility they provide to the learned policy in terms of driving behavior, and (2) to incorporate a replay buffer which progressively focuses on the high uncertainty regions of the policy's state distribution. We evaluate the proposed approach on the CARLA NoCrash benchmark, focusing on the most challenging driving scenarios with dense pedestrian and vehicle traffic. Our approach improves driving success rate by 16% over state-of-the-art, achieving 87% of the expert performance while also reducing the collision rate by an order of magnitude without the use of any additional modality, auxiliary tasks, architectural modifications or reward from the environment.

avg

pdf suppmat Video 2 Project Page Slides Video 1 [BibTex]

pdf suppmat Video 2 Project Page Slides Video 1 [BibTex]


Learning Situational Driving
Learning Situational Driving

Ohn-Bar, E., Prakash, A., Behl, A., Chitta, K., Geiger, A.

In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2020, 2020 (inproceedings)

Abstract
Human drivers have a remarkable ability to drive in diverse visual conditions and situations, e.g., from maneuvering in rainy, limited visibility conditions with no lane markings to turning in a busy intersection while yielding to pedestrians. In contrast, we find that state-of-the-art sensorimotor driving models struggle when encountering diverse settings with varying relationships between observation and action. To generalize when making decisions across diverse conditions, humans leverage multiple types of situation-specific reasoning and learning strategies. Motivated by this observation, we develop a framework for learning a situational driving policy that effectively captures reasoning under varying types of scenarios. Our key idea is to learn a mixture model with a set of policies that can capture multiple driving modes. We first optimize the mixture model through behavior cloning, and show it to result in significant gains in terms of driving performance in diverse conditions. We then refine the model by directly optimizing for the driving task itself, i.e., supervised with the navigation task reward. Our method is more scalable than methods assuming access to privileged information, e.g., perception labels, as it only assumes demonstration and reward-based supervision. We achieve over 98% success rate on the CARLA driving benchmark as well as state-of-the-art performance on a newly introduced generalization benchmark.

avg

pdf suppmat Video 2 Project Page Video 1 Slides [BibTex]

pdf suppmat Video 2 Project Page Video 1 Slides [BibTex]


On Joint Estimation of Pose, Geometry and svBRDF from a Handheld Scanner
On Joint Estimation of Pose, Geometry and svBRDF from a Handheld Scanner

Schmitt, C., Donne, S., Riegler, G., Koltun, V., Geiger, A.

In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2020, 2020 (inproceedings)

Abstract
We propose a novel formulation for joint recovery of camera pose, object geometry and spatially-varying BRDF. The input to our approach is a sequence of RGB-D images captured by a mobile, hand-held scanner that actively illuminates the scene with point light sources. Compared to previous works that jointly estimate geometry and materials from a hand-held scanner, we formulate this problem using a single objective function that can be minimized using off-the-shelf gradient-based solvers. By integrating material clustering as a differentiable operation into the optimization process, we avoid pre-processing heuristics and demonstrate that our model is able to determine the correct number of specular materials independently. We provide a study on the importance of each component in our formulation and on the requirements of the initial geometry. We show that optimizing over the poses is crucial for accurately recovering fine details and that our approach naturally results in a semantically meaningful material segmentation.

avg

pdf Project Page Slides Video Poster [BibTex]

pdf Project Page Slides Video Poster [BibTex]


Differentiable Volumetric Rendering: Learning Implicit 3D Representations without 3D Supervision
Differentiable Volumetric Rendering: Learning Implicit 3D Representations without 3D Supervision

Niemeyer, M., Mescheder, L., Oechsle, M., Geiger, A.

In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2020, 2020 (inproceedings)

Abstract
Learning-based 3D reconstruction methods have shown impressive results. However, most methods require 3D supervision which is often hard to obtain for real-world datasets. Recently, several works have proposed differentiable rendering techniques to train reconstruction models from RGB images. Unfortunately, these approaches are currently restricted to voxel- and mesh-based representations, suffering from discretization or low resolution. In this work, we propose a differentiable rendering formulation for implicit shape and texture representations. Implicit representations have recently gained popularity as they represent shape and texture continuously. Our key insight is that depth gradients can be derived analytically using the concept of implicit differentiation. This allows us to learn implicit shape and texture representations directly from RGB images. We experimentally show that our single-view reconstructions rival those learned with full 3D supervision. Moreover, we find that our method can be used for multi-view 3D reconstruction, directly resulting in watertight meshes.

avg

pdf suppmat Video 2 Project Page Video 1 Video 3 Slides Poster [BibTex]

pdf suppmat Video 2 Project Page Video 1 Video 3 Slides Poster [BibTex]

2019


Attacking Optical Flow
Attacking Optical Flow

Ranjan, A., Janai, J., Geiger, A., Black, M. J.

In International Conference on Computer Vision, November 2019 (inproceedings)

Abstract
Deep neural nets achieve state-of-the-art performance on the problem of optical flow estimation. Since optical flow is used in several safety-critical applications like self-driving cars, it is important to gain insights into the robustness of those techniques. Recently, it has been shown that adversarial attacks easily fool deep neural networks to misclassify objects. The robustness of optical flow networks to adversarial attacks, however, has not been studied so far. In this paper, we extend adversarial patch attacks to optical flow networks and show that such attacks can compromise their performance. We show that corrupting a small patch of less than 1% of the image size can significantly affect optical flow estimates. Our attacks lead to noisy flow estimates that extend significantly beyond the region of the attack, in many cases even completely erasing the motion of objects in the scene. While networks using an encoder-decoder architecture are very sensitive to these attacks, we found that networks using a spatial pyramid architecture are less affected. We analyse the success and failure of attacking both architectures by visualizing their feature maps and comparing them to classical optical flow techniques which are robust to these attacks. We also demonstrate that such attacks are practical by placing a printed pattern into real scenes.

avg ps

Video Project Page Paper Supplementary Material link (url) [BibTex]

2019


Video Project Page Paper Supplementary Material link (url) [BibTex]


Occupancy Flow: 4D Reconstruction by Learning Particle Dynamics
Occupancy Flow: 4D Reconstruction by Learning Particle Dynamics

Niemeyer, M., Mescheder, L., Oechsle, M., Geiger, A.

International Conference on Computer Vision, October 2019 (conference)

Abstract
Deep learning based 3D reconstruction techniques have recently achieved impressive results. However, while state-of-the-art methods are able to output complex 3D geometry, it is not clear how to extend these results to time-varying topologies. Approaches treating each time step individually lack continuity and exhibit slow inference, while traditional 4D reconstruction methods often utilize a template model or discretize the 4D space at fixed resolution. In this work, we present Occupancy Flow, a novel spatio-temporal representation of time-varying 3D geometry with implicit correspondences. Towards this goal, we learn a temporally and spatially continuous vector field which assigns a motion vector to every point in space and time. In order to perform dense 4D reconstruction from images or sparse point clouds, we combine our method with a continuous 3D representation. Implicitly, our model yields correspondences over time, thus enabling fast inference while providing a sound physical description of the temporal dynamics. We show that our method can be used for interpolation and reconstruction tasks, and demonstrate the accuracy of the learned correspondences. We believe that Occupancy Flow is a promising new 4D representation which will be useful for a variety of spatio-temporal reconstruction tasks.

avg

pdf poster suppmat code Project page video blog [BibTex]


Texture Fields: Learning Texture Representations in Function Space
Texture Fields: Learning Texture Representations in Function Space

Oechsle, M., Mescheder, L., Niemeyer, M., Strauss, T., Geiger, A.

International Conference on Computer Vision, October 2019 (conference)

Abstract
In recent years, substantial progress has been achieved in learning-based reconstruction of 3D objects. At the same time, generative models were proposed that can generate highly realistic images. However, despite this success in these closely related tasks, texture reconstruction of 3D objects has received little attention from the research community and state-of-the-art methods are either limited to comparably low resolution or constrained experimental setups. A major reason for these limitations is that common representations of texture are inefficient or hard to interface for modern deep learning techniques. In this paper, we propose Texture Fields, a novel texture representation which is based on regressing a continuous 3D function parameterized with a neural network. Our approach circumvents limiting factors like shape discretization and parameterization, as the proposed texture representation is independent of the shape representation of the 3D object. We show that Texture Fields are able to represent high frequency texture and naturally blend with modern deep learning techniques. Experimentally, we find that Texture Fields compare favorably to state-of-the-art methods for conditional texture reconstruction of 3D objects and enable learning of probabilistic generative models for texturing unseen 3D models. We believe that Texture Fields will become an important building block for the next generation of generative 3D models.

avg

pdf suppmat video poster blog Project Page [BibTex]


Soft Continuous Surface for Micromanipulation driven by Light-controlled Hydrogels
Soft Continuous Surface for Micromanipulation driven by Light-controlled Hydrogels

Choi, E., Jeong, H., Qiu, T., Fischer, P., Palagi, S.

4th IEEE International Conference on Manipulation, Automation and Robotics at Small Scales (MARSS), July 2019 (conference)

Abstract
Remotely controlled, automated actuation and manipulation at the microscale is essential for a number of micro-manufacturing, biology, and lab-on-a-chip applications. To transport and manipulate micro-objects, arrays of remotely controlled micro-actuators are required, which, in turn, typically require complex and expensive solid-state chips. Here, we show that a continuous surface can function as a highly parallel, many-degree of freedom, wirelessly-controlled microactuator with seamless deformation. The soft continuous surface is based on a hydrogel that undergoes a volume change in response to applied light. The fabrication of the hydrogels and the characterization of their optical and thermomechanical behaviors are reported. The temperature-dependent localized deformation of the hydrogel is also investigated by numerical simulations. Static and dynamic deformations are obtained in the soft material by projecting light fields at high spatial resolution onto the surface. By controlling such deformations in open loop and especially closed loop, automated photoactuation is achieved. The surface deformations are then exploited to examine how inert microbeads can be manipulated autonomously on the surface. We believe that the proposed approach suggests ways to implement universal 2D micromanipulation schemes that can be useful for automation in microfabrication and lab-on-a-chip applications.

pf

[BibTex]

[BibTex]


Soft Phantom for the Training of Renal Calculi Diagnostics and  Lithotripsy
Soft Phantom for the Training of Renal Calculi Diagnostics and Lithotripsy

Li., D., Suarez-Ibarrola, R., Choi, E., Jeong, M., Gratzke, C., Miernik, A., Fischer, P., Qiu, T.

41st Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), July 2019 (conference)

Abstract
Organ models are important for medical training and surgical planning. With the fast development of additive fabrication technologies, including 3D printing, the fabrication of 3D organ phantoms with precise anatomical features becomes possible. Here, we develop the first high-resolution kidney phantom based on soft material assembly, by combining 3D printing and polymer molding techniques. The phantom exhibits both the detailed anatomy of a human kidney and the elasticity of soft tissues. The phantom assembly can be separated into two parts on the coronal plane, thus large renal calculi are readily placed at any desired location of the calyx. With our sealing method, the assembled phantom withstands a hydraulic pressure that is four times the normal intrarenal pressure, thus it allows the simulation of medical procedures under realistic pressure conditions. The medical diagnostics of the renal calculi is performed by multiple imaging modalities, including X-ray, ultrasound imaging and endoscopy. The endoscopic lithotripsy is also successfully performed on the phantom. The use of a multifunctional soft phantom assembly thus shows great promise for the simulation of minimally invasive medical procedures under realistic conditions.

pf

[BibTex]

[BibTex]


A Magnetic Actuation System for the  Active Microrheology in Soft Biomaterials
A Magnetic Actuation System for the Active Microrheology in Soft Biomaterials

Jeong, M., Choi, E., Li., D., Palagi, S., Fischer, P., Qiu, T.

4th IEEE International Conference on Manipulation, Automation and Robotics at Small Scales (MARSS), July 2019 (conference)

Abstract
Microrheology is a key technique to characterize soft materials at small scales. The microprobe is wirelessly actuated and therefore typically only low forces or torques can be applied, which limits the range of the applied strain. Here, we report a new magnetic actuation system for microrheology consisting of an array of rotating permanent magnets, which achieves a rotating magnetic field with a spatially homogeneous high field strength of ~100 mT in a working volume of ~20×20×20 mm3. Compared to a traditional electromagnetic coil system, the permanent magnet assembly is portable and does not require cooling, and it exerts a large magnetic torque on the microprobe that is an order of magnitude higher than previous setups. Experimental results demonstrate that the measurement range of the soft gels’ elasticity covers at least five orders of magnitude. With the large actuation torque, it is also possible to study the fracture mechanics of soft biomaterials at small scales.

pf

[BibTex]

[BibTex]


Taking a Deeper Look at the Inverse Compositional Algorithm
Taking a Deeper Look at the Inverse Compositional Algorithm

Lv, Z., Dellaert, F., Rehg, J. M., Geiger, A.

In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2019, June 2019 (inproceedings)

Abstract
In this paper, we provide a modern synthesis of the classic inverse compositional algorithm for dense image alignment. We first discuss the assumptions made by this well-established technique, and subsequently propose to relax these assumptions by incorporating data-driven priors into this model. More specifically, we unroll a robust version of the inverse compositional algorithm and replace multiple components of this algorithm using more expressive models whose parameters we train in an end-to-end fashion from data. Our experiments on several challenging 3D rigid motion estimation tasks demonstrate the advantages of combining optimization with learning-based techniques, outperforming the classic inverse compositional algorithm as well as data-driven image-to-pose regression approaches.

avg

pdf suppmat Video Project Page Poster [BibTex]

pdf suppmat Video Project Page Poster [BibTex]


MOTS: Multi-Object Tracking and Segmentation
MOTS: Multi-Object Tracking and Segmentation

Voigtlaender, P., Krause, M., Osep, A., Luiten, J., Sekar, B. B. G., Geiger, A., Leibe, B.

In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2019, June 2019 (inproceedings)

Abstract
This paper extends the popular task of multi-object tracking to multi-object tracking and segmentation (MOTS). Towards this goal, we create dense pixel-level annotations for two existing tracking datasets using a semi-automatic annotation procedure. Our new annotations comprise 65,213 pixel masks for 977 distinct objects (cars and pedestrians) in 10,870 video frames. For evaluation, we extend existing multi-object tracking metrics to this new task. Moreover, we propose a new baseline method which jointly addresses detection, tracking, and segmentation with a single convolutional network. We demonstrate the value of our datasets by achieving improvements in performance when training on MOTS annotations. We believe that our datasets, metrics and baseline will become a valuable resource towards developing multi-object tracking approaches that go beyond 2D bounding boxes.

avg

pdf suppmat Project Page Poster Video Project Page [BibTex]

pdf suppmat Project Page Poster Video Project Page [BibTex]


PointFlowNet: Learning Representations for Rigid Motion Estimation from Point Clouds
PointFlowNet: Learning Representations for Rigid Motion Estimation from Point Clouds

Behl, A., Paschalidou, D., Donne, S., Geiger, A.

In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2019, June 2019 (inproceedings)

Abstract
Despite significant progress in image-based 3D scene flow estimation, the performance of such approaches has not yet reached the fidelity required by many applications. Simultaneously, these applications are often not restricted to image-based estimation: laser scanners provide a popular alternative to traditional cameras, for example in the context of self-driving cars, as they directly yield a 3D point cloud. In this paper, we propose to estimate 3D motion from such unstructured point clouds using a deep neural network. In a single forward pass, our model jointly predicts 3D scene flow as well as the 3D bounding box and rigid body motion of objects in the scene. While the prospect of estimating 3D scene flow from unstructured point clouds is promising, it is also a challenging task. We show that the traditional global representation of rigid body motion prohibits inference by CNNs, and propose a translation equivariant representation to circumvent this problem. For training our deep network, a large dataset is required. Because of this, we augment real scans from KITTI with virtual objects, realistically modeling occlusions and simulating sensor noise. A thorough comparison with classic and learning-based techniques highlights the robustness of the proposed approach.

avg

pdf suppmat Project Page Poster Video [BibTex]

pdf suppmat Project Page Poster Video [BibTex]


Learning Non-volumetric Depth Fusion using Successive Reprojections
Learning Non-volumetric Depth Fusion using Successive Reprojections

Donne, S., Geiger, A.

In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2019, June 2019 (inproceedings)

Abstract
Given a set of input views, multi-view stereopsis techniques estimate depth maps to represent the 3D reconstruction of the scene; these are fused into a single, consistent, reconstruction -- most often a point cloud. In this work we propose to learn an auto-regressive depth refinement directly from data. While deep learning has improved the accuracy and speed of depth estimation significantly, learned MVS techniques remain limited to the planesweeping paradigm. We refine a set of input depth maps by successively reprojecting information from neighbouring views to leverage multi-view constraints. Compared to learning-based volumetric fusion techniques, an image-based representation allows significantly more detailed reconstructions; compared to traditional point-based techniques, our method learns noise suppression and surface completion in a data-driven fashion. Due to the limited availability of high-quality reconstruction datasets with ground truth, we introduce two novel synthetic datasets to (pre-)train our network. Our approach is able to improve both the output depth maps and the reconstructed point cloud, for both learned and traditional depth estimation front-ends, on both synthetic and real data.

avg

pdf suppmat Project Page Video Poster blog [BibTex]

pdf suppmat Project Page Video Poster blog [BibTex]


Connecting the Dots: Learning Representations for Active Monocular Depth Estimation
Connecting the Dots: Learning Representations for Active Monocular Depth Estimation

Riegler, G., Liao, Y., Donne, S., Koltun, V., Geiger, A.

In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2019, June 2019 (inproceedings)

Abstract
We propose a technique for depth estimation with a monocular structured-light camera, \ie, a calibrated stereo set-up with one camera and one laser projector. Instead of formulating the depth estimation via a correspondence search problem, we show that a simple convolutional architecture is sufficient for high-quality disparity estimates in this setting. As accurate ground-truth is hard to obtain, we train our model in a self-supervised fashion with a combination of photometric and geometric losses. Further, we demonstrate that the projected pattern of the structured light sensor can be reliably separated from the ambient information. This can then be used to improve depth boundaries in a weakly supervised fashion by modeling the joint statistics of image and depth edges. The model trained in this fashion compares favorably to the state-of-the-art on challenging synthetic and real-world datasets. In addition, we contribute a novel simulator, which allows to benchmark active depth prediction algorithms in controlled conditions.

avg

pdf suppmat Poster Project Page [BibTex]

pdf suppmat Poster Project Page [BibTex]


Superquadrics Revisited: Learning 3D Shape Parsing beyond Cuboids
Superquadrics Revisited: Learning 3D Shape Parsing beyond Cuboids

Paschalidou, D., Ulusoy, A. O., Geiger, A.

In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2019, June 2019 (inproceedings)

Abstract
Abstracting complex 3D shapes with parsimonious part-based representations has been a long standing goal in computer vision. This paper presents a learning-based solution to this problem which goes beyond the traditional 3D cuboid representation by exploiting superquadrics as atomic elements. We demonstrate that superquadrics lead to more expressive 3D scene parses while being easier to learn than 3D cuboid representations. Moreover, we provide an analytical solution to the Chamfer loss which avoids the need for computational expensive reinforcement learning or iterative prediction. Our model learns to parse 3D objects into consistent superquadric representations without supervision. Results on various ShapeNet categories as well as the SURREAL human body dataset demonstrate the flexibility of our model in capturing fine details and complex poses that could not have been modelled using cuboids.

avg

Project Page Poster suppmat pdf Video blog handout [BibTex]

Project Page Poster suppmat pdf Video blog handout [BibTex]


Real-Time Dense Mapping for Self-Driving Vehicles using Fisheye Cameras
Real-Time Dense Mapping for Self-Driving Vehicles using Fisheye Cameras

Cui, Z., Heng, L., Yeo, Y. C., Geiger, A., Pollefeys, M., Sattler, T.

In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) 2019, IEEE, International Conference on Robotics and Automation, May 2019 (inproceedings)

Abstract
We present a real-time dense geometric mapping algorithm for large-scale environments. Unlike existing methods which use pinhole cameras, our implementation is based on fisheye cameras which have larger field of view and benefit some other tasks including Visual-Inertial Odometry, localization and object detection around vehicles. Our algorithm runs on in-vehicle PCs at 15 Hz approximately, enabling vision-only 3D scene perception for self-driving vehicles. For each synchronized set of images captured by multiple cameras, we first compute a depth map for a reference camera using plane-sweeping stereo. To maintain both accuracy and efficiency, while accounting for the fact that fisheye images have a rather low resolution, we recover the depths using multiple image resolutions. We adopt the fast object detection framework YOLOv3 to remove potentially dynamic objects. At the end of the pipeline, we fuse the fisheye depth images into the truncated signed distance function (TSDF) volume to obtain a 3D map. We evaluate our method on large-scale urban datasets, and results show that our method works well even in complex environments.

avg

pdf video poster Project Page [BibTex]

pdf video poster Project Page [BibTex]


Project AutoVision: Localization and 3D Scene Perception for an Autonomous Vehicle with a Multi-Camera System
Project AutoVision: Localization and 3D Scene Perception for an Autonomous Vehicle with a Multi-Camera System

Heng, L., Choi, B., Cui, Z., Geppert, M., Hu, S., Kuan, B., Liu, P., Nguyen, R. M. H., Yeo, Y. C., Geiger, A., Lee, G. H., Pollefeys, M., Sattler, T.

In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) 2019, IEEE, International Conference on Robotics and Automation, May 2019 (inproceedings)

Abstract
Project AutoVision aims to develop localization and 3D scene perception capabilities for a self-driving vehicle. Such capabilities will enable autonomous navigation in urban and rural environments, in day and night, and with cameras as the only exteroceptive sensors. The sensor suite employs many cameras for both 360-degree coverage and accurate multi-view stereo; the use of low-cost cameras keeps the cost of this sensor suite to a minimum. In addition, the project seeks to extend the operating envelope to include GNSS-less conditions which are typical for environments with tall buildings, foliage, and tunnels. Emphasis is placed on leveraging multi-view geometry and deep learning to enable the vehicle to localize and perceive in 3D space. This paper presents an overview of the project, and describes the sensor suite and current progress in the areas of calibration, localization, and perception.

avg

pdf [BibTex]

pdf [BibTex]


no image
Geometric Image Synthesis

Abu Alhaija, H., Mustikovela, S. K., Geiger, A., Rother, C.

Computer Vision – ACCV 2018, 11366, pages: 85-100, Lecture Notes in Computer Science, (Editors: Jawahar, C. and Li, H. and Mori, G. and Schindler, K. ), Asian Conference on Computer Vision, 2019 (conference)

avg

DOI Project Page [BibTex]

DOI Project Page [BibTex]


NoVA: Learning to See in Novel Viewpoints and Domains
NoVA: Learning to See in Novel Viewpoints and Domains

Coors, B., Condurache, A. P., Geiger, A.

In 2019 International Conference on 3D Vision (3DV), 2019 International Conference on 3D Vision (3DV), 2019 (inproceedings)

Abstract
Domain adaptation techniques enable the re-use and transfer of existing labeled datasets from a source to a target domain in which little or no labeled data exists. Recently, image-level domain adaptation approaches have demonstrated impressive results in adapting from synthetic to real-world environments by translating source images to the style of a target domain. However, the domain gap between source and target may not only be caused by a different style but also by a change in viewpoint. This case necessitates a semantically consistent translation of source images and labels to the style and viewpoint of the target domain. In this work, we propose the Novel Viewpoint Adaptation (NoVA) model, which enables unsupervised adaptation to a novel viewpoint in a target domain for which no labeled data is available. NoVA utilizes an explicit representation of the 3D scene geometry to translate source view images and labels to the target view. Experiments on adaptation to synthetic and real-world datasets show the benefit of NoVA compared to state-of-the-art domain adaptation approaches on the task of semantic segmentation.

avg

pdf suppmat poster video [BibTex]

pdf suppmat poster video [BibTex]


Occupancy Networks: Learning 3D Reconstruction in Function Space
Occupancy Networks: Learning 3D Reconstruction in Function Space

Mescheder, L., Oechsle, M., Niemeyer, M., Nowozin, S., Geiger, A.

In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2019, 2019 (inproceedings)

Abstract
With the advent of deep neural networks, learning-based approaches for 3D reconstruction have gained popularity. However, unlike for images, in 3D there is no canonical representation which is both computationally and memory efficient yet allows for representing high-resolution geometry of arbitrary topology. Many of the state-of-the-art learning-based 3D reconstruction approaches can hence only represent very coarse 3D geometry or are limited to a restricted domain. In this paper, we propose Occupancy Networks, a new representation for learning-based 3D reconstruction methods. Occupancy networks implicitly represent the 3D surface as the continuous decision boundary of a deep neural network classifier. In contrast to existing approaches, our representation encodes a description of the 3D output at infinite resolution without excessive memory footprint. We validate that our representation can efficiently encode 3D structure and can be inferred from various kinds of input. Our experiments demonstrate competitive results, both qualitatively and quantitatively, for the challenging tasks of 3D reconstruction from single images, noisy point clouds and coarse discrete voxel grids. We believe that occupancy networks will become a useful tool in a wide variety of learning-based 3D tasks.

avg

Code Video pdf suppmat Project Page blog [BibTex]

Code Video pdf suppmat Project Page blog [BibTex]

2018


Gait learning for soft microrobots controlled by light fields
Gait learning for soft microrobots controlled by light fields

Rohr, A. V., Trimpe, S., Marco, A., Fischer, P., Palagi, S.

In International Conference on Intelligent Robots and Systems (IROS) 2018, pages: 6199-6206, International Conference on Intelligent Robots and Systems 2018, October 2018 (inproceedings)

Abstract
Soft microrobots based on photoresponsive materials and controlled by light fields can generate a variety of different gaits. This inherent flexibility can be exploited to maximize their locomotion performance in a given environment and used to adapt them to changing environments. However, because of the lack of accurate locomotion models, and given the intrinsic variability among microrobots, analytical control design is not possible. Common data-driven approaches, on the other hand, require running prohibitive numbers of experiments and lead to very sample-specific results. Here we propose a probabilistic learning approach for light-controlled soft microrobots based on Bayesian Optimization (BO) and Gaussian Processes (GPs). The proposed approach results in a learning scheme that is highly data-efficient, enabling gait optimization with a limited experimental budget, and robust against differences among microrobot samples. These features are obtained by designing the learning scheme through the comparison of different GP priors and BO settings on a semisynthetic data set. The developed learning scheme is validated in microrobot experiments, resulting in a 115% improvement in a microrobot’s locomotion performance with an experimental budget of only 20 tests. These encouraging results lead the way toward self-adaptive microrobotic systems based on lightcontrolled soft microrobots and probabilistic learning control.

ics pf

arXiv IEEE Xplore DOI Project Page [BibTex]

2018


arXiv IEEE Xplore DOI Project Page [BibTex]


On the Integration of Optical Flow and Action Recognition
On the Integration of Optical Flow and Action Recognition

Sevilla-Lara, L., Liao, Y., Güney, F., Jampani, V., Geiger, A., Black, M. J.

In German Conference on Pattern Recognition (GCPR), LNCS 11269, pages: 281-297, Springer, Cham, October 2018 (inproceedings)

Abstract
Most of the top performing action recognition methods use optical flow as a "black box" input. Here we take a deeper look at the combination of flow and action recognition, and investigate why optical flow is helpful, what makes a flow method good for action recognition, and how we can make it better. In particular, we investigate the impact of different flow algorithms and input transformations to better understand how these affect a state-of-the-art action recognition method. Furthermore, we fine tune two neural-network flow methods end-to-end on the most widely used action recognition dataset (UCF101). Based on these experiments, we make the following five observations: 1) optical flow is useful for action recognition because it is invariant to appearance, 2) optical flow methods are optimized to minimize end-point-error (EPE), but the EPE of current methods is not well correlated with action recognition performance, 3) for the flow methods tested, accuracy at boundaries and at small displacements is most correlated with action recognition performance, 4) training optical flow to minimize classification error instead of minimizing EPE improves recognition performance, and 5) optical flow learned for the task of action recognition differs from traditional optical flow especially inside the human body and at the boundary of the body. These observations may encourage optical flow researchers to look beyond EPE as a goal and guide action recognition researchers to seek better motion cues, leading to a tighter integration of the optical flow and action recognition communities.

avg ps

arXiv DOI [BibTex]

arXiv DOI [BibTex]


Towards Robust Visual Odometry with a Multi-Camera System
Towards Robust Visual Odometry with a Multi-Camera System

Liu, P., Geppert, M., Heng, L., Sattler, T., Geiger, A., Pollefeys, M.

In International Conference on Intelligent Robots and Systems (IROS) 2018, International Conference on Intelligent Robots and Systems, October 2018 (inproceedings)

Abstract
We present a visual odometry (VO) algorithm for a multi-camera system and robust operation in challenging environments. Our algorithm consists of a pose tracker and a local mapper. The tracker estimates the current pose by minimizing photometric errors between the most recent keyframe and the current frame. The mapper initializes the depths of all sampled feature points using plane-sweeping stereo. To reduce pose drift, a sliding window optimizer is used to refine poses and structure jointly. Our formulation is flexible enough to support an arbitrary number of stereo cameras. We evaluate our algorithm thoroughly on five datasets. The datasets were captured in different conditions: daytime, night-time with near-infrared (NIR) illumination and night-time without NIR illumination. Experimental results show that a multi-camera setup makes the VO more robust to challenging environments, especially night-time conditions, in which a single stereo configuration fails easily due to the lack of features.

avg

pdf Project Page [BibTex]

pdf Project Page [BibTex]


Learning Priors for Semantic 3D Reconstruction
Learning Priors for Semantic 3D Reconstruction

Cherabier, I., Schönberger, J., Oswald, M., Pollefeys, M., Geiger, A.

In Computer Vision – ECCV 2018, Springer International Publishing, Cham, September 2018 (inproceedings)

Abstract
We present a novel semantic 3D reconstruction framework which embeds variational regularization into a neural network. Our network performs a fixed number of unrolled multi-scale optimization iterations with shared interaction weights. In contrast to existing variational methods for semantic 3D reconstruction, our model is end-to-end trainable and captures more complex dependencies between the semantic labels and the 3D geometry. Compared to previous learning-based approaches to 3D reconstruction, we integrate powerful long-range dependencies using variational coarse-to-fine optimization. As a result, our network architecture requires only a moderate number of parameters while keeping a high level of expressiveness which enables learning from very little data. Experiments on real and synthetic datasets demonstrate that our network achieves higher accuracy compared to a purely variational approach while at the same time requiring two orders of magnitude less iterations to converge. Moreover, our approach handles ten times more semantic class labels using the same computational resources.

avg

pdf suppmat Project Page Video DOI Project Page [BibTex]

pdf suppmat Project Page Video DOI Project Page [BibTex]


Unsupervised Learning of Multi-Frame Optical Flow with Occlusions
Unsupervised Learning of Multi-Frame Optical Flow with Occlusions

Janai, J., Güney, F., Ranjan, A., Black, M. J., Geiger, A.

In European Conference on Computer Vision (ECCV), Lecture Notes in Computer Science, vol 11220, pages: 713-731, Springer, Cham, September 2018 (inproceedings)

avg ps

pdf suppmat Video Project Page DOI Project Page [BibTex]

pdf suppmat Video Project Page DOI Project Page [BibTex]


SphereNet: Learning Spherical Representations for Detection and Classification in Omnidirectional Images
SphereNet: Learning Spherical Representations for Detection and Classification in Omnidirectional Images

Coors, B., Condurache, A. P., Geiger, A.

European Conference on Computer Vision (ECCV), September 2018 (conference)

Abstract
Omnidirectional cameras offer great benefits over classical cameras wherever a wide field of view is essential, such as in virtual reality applications or in autonomous robots. Unfortunately, standard convolutional neural networks are not well suited for this scenario as the natural projection surface is a sphere which cannot be unwrapped to a plane without introducing significant distortions, particularly in the polar regions. In this work, we present SphereNet, a novel deep learning framework which encodes invariance against such distortions explicitly into convolutional neural networks. Towards this goal, SphereNet adapts the sampling locations of the convolutional filters, effectively reversing distortions, and wraps the filters around the sphere. By building on regular convolutions, SphereNet enables the transfer of existing perspective convolutional neural network models to the omnidirectional case. We demonstrate the effectiveness of our method on the tasks of image classification and object detection, exploiting two newly created semi-synthetic and real-world omnidirectional datasets.

avg

pdf suppmat Project Page [BibTex]


A machine from machines
A machine from machines

Fischer, P.

Nature Physics, 14, pages: 1072–1073, July 2018 (misc)

Abstract
Building spinning microrotors that self-assemble and synchronize to form a gear sounds like an impossible feat. However, it has now been achieved using only a single type of building block -- a colloid that self-propels.

pf

link (url) DOI [BibTex]

link (url) DOI [BibTex]


Soft Miniaturized Linear Actuators Wirelessly Powered by Rotating Permanent Magnets
Soft Miniaturized Linear Actuators Wirelessly Powered by Rotating Permanent Magnets

Qiu, T., Palagi, S., Sachs, J., Fischer, P.

In 2018 IEEE International Conference on Robotics and Automation (ICRA), pages: 3595-3600, May 2018 (inproceedings)

Abstract
Wireless actuation by magnetic fields allows for the operation of untethered miniaturized devices, e.g. in biomedical applications. Nevertheless, generating large controlled forces over relatively large distances is challenging. Magnetic torques are easier to generate and control, but they are not always suitable for the tasks at hand. Moreover, strong magnetic fields are required to generate a sufficient torque, which are difficult to achieve with electromagnets. Here, we demonstrate a soft miniaturized actuator that transforms an externally applied magnetic torque into a controlled linear force. We report the design, fabrication and characterization of both the actuator and the magnetic field generator. We show that the magnet assembly, which is based on a set of rotating permanent magnets, can generate strong controlled oscillating fields over a relatively large workspace. The actuator, which is 3D-printed, can lift a load of more than 40 times its weight. Finally, we show that the actuator can be further miniaturized, paving the way towards strong, wirelessly powered microactuators.

pf

link (url) DOI [BibTex]

link (url) DOI [BibTex]


Robust Dense Mapping for Large-Scale Dynamic Environments
Robust Dense Mapping for Large-Scale Dynamic Environments

Barsan, I. A., Liu, P., Pollefeys, M., Geiger, A.

In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) 2018, IEEE, International Conference on Robotics and Automation, May 2018 (inproceedings)

Abstract
We present a stereo-based dense mapping algorithm for large-scale dynamic urban environments. In contrast to other existing methods, we simultaneously reconstruct the static background, the moving objects, and the potentially moving but currently stationary objects separately, which is desirable for high-level mobile robotic tasks such as path planning in crowded environments. We use both instance-aware semantic segmentation and sparse scene flow to classify objects as either background, moving, or potentially moving, thereby ensuring that the system is able to model objects with the potential to transition from static to dynamic, such as parked cars. Given camera poses estimated from visual odometry, both the background and the (potentially) moving objects are reconstructed separately by fusing the depth maps computed from the stereo input. In addition to visual odometry, sparse scene flow is also used to estimate the 3D motions of the detected moving objects, in order to reconstruct them accurately. A map pruning technique is further developed to improve reconstruction accuracy and reduce memory consumption, leading to increased scalability. We evaluate our system thoroughly on the well-known KITTI dataset. Our system is capable of running on a PC at approximately 2.5Hz, with the primary bottleneck being the instance-aware semantic segmentation, which is a limitation we hope to address in future work.

avg

pdf Video Project Page Project Page [BibTex]

pdf Video Project Page Project Page [BibTex]


RayNet: Learning Volumetric 3D Reconstruction with Ray Potentials
RayNet: Learning Volumetric 3D Reconstruction with Ray Potentials

Paschalidou, D., Ulusoy, A. O., Schmitt, C., Gool, L., Geiger, A.

In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE Computer Society, IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2018, 2018 (inproceedings)

Abstract
In this paper, we consider the problem of reconstructing a dense 3D model using images captured from different views. Recent methods based on convolutional neural networks (CNN) allow learning the entire task from data. However, they do not incorporate the physics of image formation such as perspective geometry and occlusion. Instead, classical approaches based on Markov Random Fields (MRF) with ray-potentials explicitly model these physical processes, but they cannot cope with large surface appearance variations across different viewpoints. In this paper, we propose RayNet, which combines the strengths of both frameworks. RayNet integrates a CNN that learns view-invariant feature representations with an MRF that explicitly encodes the physics of perspective projection and occlusion. We train RayNet end-to-end using empirical risk minimization. We thoroughly evaluate our approach on challenging real-world datasets and demonstrate its benefits over a piece-wise trained baseline, hand-crafted models as well as other learning-based approaches.

avg

pdf suppmat Video Project Page code Poster Project Page [BibTex]

pdf suppmat Video Project Page code Poster Project Page [BibTex]


Deep Marching Cubes: Learning Explicit Surface Representations
Deep Marching Cubes: Learning Explicit Surface Representations

Liao, Y., Donne, S., Geiger, A.

In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE Computer Society, IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2018, 2018 (inproceedings)

Abstract
Existing learning based solutions to 3D surface prediction cannot be trained end-to-end as they operate on intermediate representations (eg, TSDF) from which 3D surface meshes must be extracted in a post-processing step (eg, via the marching cubes algorithm). In this paper, we investigate the problem of end-to-end 3D surface prediction. We first demonstrate that the marching cubes algorithm is not differentiable and propose an alternative differentiable formulation which we insert as a final layer into a 3D convolutional neural network. We further propose a set of loss functions which allow for training our model with sparse point supervision. Our experiments demonstrate that the model allows for predicting sub-voxel accurate 3D shapes of arbitrary topology. Additionally, it learns to complete shapes and to separate an object's inside from its outside even in the presence of sparse and incomplete ground truth. We investigate the benefits of our approach on the task of inferring shapes from 3D point clouds. Our model is flexible and can be combined with a variety of shape encoder and shape inference techniques.

avg

pdf suppmat Video Project Page Poster Project Page [BibTex]

pdf suppmat Video Project Page Poster Project Page [BibTex]


Semantic Visual Localization
Semantic Visual Localization

Schönberger, J., Pollefeys, M., Geiger, A., Sattler, T.

In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE Computer Society, IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2018, 2018 (inproceedings)

Abstract
Robust visual localization under a wide range of viewing conditions is a fundamental problem in computer vision. Handling the difficult cases of this problem is not only very challenging but also of high practical relevance, eg, in the context of life-long localization for augmented reality or autonomous robots. In this paper, we propose a novel approach based on a joint 3D geometric and semantic understanding of the world, enabling it to succeed under conditions where previous approaches failed. Our method leverages a novel generative model for descriptor learning, trained on semantic scene completion as an auxiliary task. The resulting 3D descriptors are robust to missing observations by encoding high-level 3D geometric and semantic information. Experiments on several challenging large-scale localization datasets demonstrate reliable localization under extreme viewpoint, illumination, and geometry changes.

avg

pdf suppmat Poster Project Page [BibTex]

pdf suppmat Poster Project Page [BibTex]


Which Training Methods for GANs do actually Converge?
Which Training Methods for GANs do actually Converge?

Mescheder, L., Geiger, A., Nowozin, S.

International Conference on Machine learning (ICML), 2018 (conference)

Abstract
Recent work has shown local convergence of GAN training for absolutely continuous data and generator distributions. In this paper, we show that the requirement of absolute continuity is necessary: we describe a simple yet prototypical counterexample showing that in the more realistic case of distributions that are not absolutely continuous, unregularized GAN training is not always convergent. Furthermore, we discuss regularization strategies that were recently proposed to stabilize GAN training. Our analysis shows that GAN training with instance noise or zero-centered gradient penalties converges. On the other hand, we show that Wasserstein-GANs and WGAN-GP with a finite number of discriminator updates per generator update do not always converge to the equilibrium point. We discuss these results, leading us to a new explanation for the stability problems of GAN training. Based on our analysis, we extend our convergence results to more general GANs and prove local convergence for simplified gradient penalties even if the generator and data distributions lie on lower dimensional manifolds. We find these penalties to work well in practice and use them to learn high-resolution generative image models for a variety of datasets with little hyperparameter tuning.

avg

code video paper supplement slides poster Project Page [BibTex]


Learning 3D Shape Completion from Laser Scan Data with Weak Supervision
Learning 3D Shape Completion from Laser Scan Data with Weak Supervision

Stutz, D., Geiger, A.

In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE Computer Society, IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) 2018, 2018 (inproceedings)

Abstract
3D shape completion from partial point clouds is a fundamental problem in computer vision and computer graphics. Recent approaches can be characterized as either data-driven or learning-based. Data-driven approaches rely on a shape model whose parameters are optimized to fit the observations. Learning-based approaches, in contrast, avoid the expensive optimization step and instead directly predict the complete shape from the incomplete observations using deep neural networks. However, full supervision is required which is often not available in practice. In this work, we propose a weakly-supervised learning-based approach to 3D shape completion which neither requires slow optimization nor direct supervision. While we also learn a shape prior on synthetic data, we amortize, ie, learn, maximum likelihood fitting using deep neural networks resulting in efficient shape completion without sacrificing accuracy. Tackling 3D shape completion of cars on ShapeNet and KITTI, we demonstrate that the proposed amortized maximum likelihood approach is able to compete with a fully supervised baseline and a state-of-the-art data-driven approach while being significantly faster. On ModelNet, we additionally show that the approach is able to generalize to other object categories as well.

avg

pdf suppmat Project Page Poster Project Page [BibTex]

pdf suppmat Project Page Poster Project Page [BibTex]


Learning Transformation Invariant Representations with Weak Supervision
Learning Transformation Invariant Representations with Weak Supervision

Coors, B., Condurache, A., Mertins, A., Geiger, A.

In International Conference on Computer Vision Theory and Applications, International Conference on Computer Vision Theory and Applications, 2018 (inproceedings)

Abstract
Deep convolutional neural networks are the current state-of-the-art solution to many computer vision tasks. However, their ability to handle large global and local image transformations is limited. Consequently, extensive data augmentation is often utilized to incorporate prior knowledge about desired invariances to geometric transformations such as rotations or scale changes. In this work, we combine data augmentation with an unsupervised loss which enforces similarity between the predictions of augmented copies of an input sample. Our loss acts as an effective regularizer which facilitates the learning of transformation invariant representations. We investigate the effectiveness of the proposed similarity loss on rotated MNIST and the German Traffic Sign Recognition Benchmark (GTSRB) in the context of different classification models including ladder networks. Our experiments demonstrate improvements with respect to the standard data augmentation approach for supervised and semi-supervised learning tasks, in particular in the presence of little annotated data. In addition, we analyze the performance of the proposed approach with respect to its hyperparameters, including the strength of the regularization as well as the layer where representation similarity is enforced.

avg

pdf [BibTex]

pdf [BibTex]

2016


Soft continuous microrobots with multiple intrinsic degrees of freedom
Soft continuous microrobots with multiple intrinsic degrees of freedom

Palagi, S., Mark, A. G., Melde, K., Zeng, H., Parmeggiani, C., Martella, D., Wiersma, D. S., Fischer, P.

In 2016 International Conference on Manipulation, Automation and Robotics at Small Scales (MARSS), pages: 1-5, July 2016 (inproceedings)

Abstract
One of the main challenges in the development of microrobots, i.e. robots at the sub-millimeter scale, is the difficulty of adopting traditional solutions for power, control and, especially, actuation. As a result, most current microrobots are directly manipulated by external fields, and possess only a few passive degrees of freedom (DOFs). We have reported a strategy that enables embodiment, remote powering and control of a large number of DOFs in mobile soft microrobots. These consist of photo-responsive materials, such that the actuation of their soft continuous body can be selectively and dynamically controlled by structured light fields. Here we use finite-element modelling to evaluate the effective number of DOFs that are addressable in our microrobots. We also demonstrate that by this flexible approach different actuation patterns can be obtained, and thus different locomotion performances can be achieved within the very same microrobot. The reported results confirm the versatility of the proposed approach, which allows for easy application-specific optimization and online reconfiguration of the microrobot's behavior. Such versatility will enable advanced applications of robotics and automation at the micro scale.

pf

DOI [BibTex]

2016


DOI [BibTex]


Wireless actuator based on ultrasonic bubble streaming
Wireless actuator based on ultrasonic bubble streaming

Qiu, T., Palagi, S., Mark, A. G., Melde, K., Fischer, P.

In 2016 International Conference on Manipulation, Automation and Robotics at Small Scales (MARSS), pages: 1-5, July 2016 (inproceedings)

Abstract
Miniaturized actuators are a key element for the manipulation and automation at small scales. Here, we propose a new miniaturized actuator, which consists of an array of micro gas bubbles immersed in a fluid. Under ultrasonic excitation, the oscillation of micro gas bubbles results in acoustic streaming and provides a propulsive force that drives the actuator. The actuator was fabricated by lithography and fluidic streaming was observed under ultrasound excitation. Theoretical modelling and numerical simulations were carried out to show that lowing the surface tension results in a larger amplitude of the bubble oscillation, and thus leads to a higher propulsive force. Experimental results also demonstrate that the propulsive force increases 3.5 times when the surface tension is lowered by adding a surfactant. An actuator with a 4×4 mm 2 surface area provides a driving force of about 0.46 mN, suggesting that it is possible to be used as a wireless actuator for small-scale robots and medical instruments.

pf

link (url) DOI [BibTex]

link (url) DOI [BibTex]


Patches, Planes and Probabilities: A Non-local Prior for Volumetric {3D} Reconstruction
Patches, Planes and Probabilities: A Non-local Prior for Volumetric 3D Reconstruction

Ulusoy, A. O., Black, M. J., Geiger, A.

In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), June 2016 (inproceedings)

Abstract
In this paper, we propose a non-local structured prior for volumetric multi-view 3D reconstruction. Towards this goal, we present a novel Markov random field model based on ray potentials in which assumptions about large 3D surface patches such as planarity or Manhattan world constraints can be efficiently encoded as probabilistic priors. We further derive an inference algorithm that reasons jointly about voxels, pixels and image segments, and estimates marginal distributions of appearance, occupancy, depth, normals and planarity. Key to tractable inference is a novel hybrid representation that spans both voxel and pixel space and that integrates non-local information from 2D image segmentations in a principled way. We compare our non-local prior to commonly employed local smoothness assumptions and a variety of state-of-the-art volumetric reconstruction baselines on challenging outdoor scenes with textureless and reflective surfaces. Our experiments indicate that regularizing over larger distances has the potential to resolve ambiguities where local regularizers fail.

avg ps

YouTube pdf poster suppmat Project Page [BibTex]

YouTube pdf poster suppmat Project Page [BibTex]


Semantic Instance Annotation of Street Scenes by 3D to 2D Label Transfer
Semantic Instance Annotation of Street Scenes by 3D to 2D Label Transfer

Xie, J., Kiefel, M., Sun, M., Geiger, A.

In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), June 2016 (inproceedings)

Abstract
Semantic annotations are vital for training models for object recognition, semantic segmentation or scene understanding. Unfortunately, pixelwise annotation of images at very large scale is labor-intensive and only little labeled data is available, particularly at instance level and for street scenes. In this paper, we propose to tackle this problem by lifting the semantic instance labeling task from 2D into 3D. Given reconstructions from stereo or laser data, we annotate static 3D scene elements with rough bounding primitives and develop a probabilistic model which transfers this information into the image domain. We leverage our method to obtain 2D labels for a novel suburban video dataset which we have collected, resulting in 400k semantic and instance image annotations. A comparison of our method to state-of-the-art label transfer baselines reveals that 3D information enables more efficient annotation while at the same time resulting in improved accuracy and time-coherent labels.

avg ps

pdf suppmat Project Page Project Page [BibTex]

pdf suppmat Project Page Project Page [BibTex]


Auxetic Metamaterial Simplifies Soft Robot Design
Auxetic Metamaterial Simplifies Soft Robot Design

Mark, A. G., Palagi, S., Qiu, T., Fischer, P.

In 2016 IEEE Int. Conf. on Robotics and Automation (ICRA), pages: 4951-4956, May 2016 (inproceedings)

Abstract
Soft materials are being adopted in robotics in order to facilitate biomedical applications and in order to achieve simpler and more capable robots. One route to simplification is to design the robot's body using `smart materials' that carry the burden of control and actuation. Metamaterials enable just such rational design of the material properties. Here we present a soft robot that exploits mechanical metamaterials for the intrinsic synchronization of two passive clutches which contact its travel surface. Doing so allows it to move through an enclosed passage with an inchworm motion propelled by a single actuator. Our soft robot consists of two 3D-printed metamaterials that implement auxetic and normal elastic properties. The design, fabrication and characterization of the metamaterials are described. In addition, a working soft robot is presented. Since the synchronization mechanism is a feature of the robot's material body, we believe that the proposed design will enable compliant and robust implementations that scale well with miniaturization.

pf

link (url) DOI [BibTex]

link (url) DOI [BibTex]


Towards Photo-Induced Swimming: Actuation of Liquid Crystalline  Elastomer in Water
Towards Photo-Induced Swimming: Actuation of Liquid Crystalline Elastomer in Water

cerretti, G., Martella, D., Zeng, H., Parmeggiani, C., Palagi, S., Mark, A. G., Melde, K., Qiu, T., Fischer, P., Wiersma, D.

In Proc. of SPIE 9738, pages: Laser 3D Manufacturing III, 97380T, April 2016 (inproceedings)

Abstract
Liquid Crystalline Elastomers (LCEs) are very promising smart materials that can be made sensitive to different external stimuli, such as heat, pH, humidity and light, by changing their chemical composition. In this paper we report the implementation of a nematically aligned LCE actuator able to undergo large light-induced deformations. We prove that this property is still present even when the actuator is submerged in fresh water. Thanks to the presence of azo-dye moieties, capable of going through a reversible trans-cis photo-isomerization, and by applying light with two different wavelengths we managed to control the bending of such actuator in the liquid environment. The reported results represent the first step towards swimming microdevices powered by light.

pf

link (url) DOI [BibTex]

link (url) DOI [BibTex]


Deep Discrete Flow
Deep Discrete Flow

Güney, F., Geiger, A.

Asian Conference on Computer Vision (ACCV), 2016 (conference) Accepted

avg ps

pdf suppmat Project Page [BibTex]

pdf suppmat Project Page [BibTex]