

2019


Attacking Optical Flow

Ranjan, A., Janai, J., Geiger, A., Black, M. J.

In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages: 2404-2413, IEEE, November 2019, ISSN: 2380-7504 (inproceedings)

Abstract
Deep neural nets achieve state-of-the-art performance on the problem of optical flow estimation. Since optical flow is used in several safety-critical applications like self-driving cars, it is important to gain insights into the robustness of those techniques. Recently, it has been shown that adversarial attacks easily fool deep neural networks to misclassify objects. The robustness of optical flow networks to adversarial attacks, however, has not been studied so far. In this paper, we extend adversarial patch attacks to optical flow networks and show that such attacks can compromise their performance. We show that corrupting a small patch of less than 1% of the image size can significantly affect optical flow estimates. Our attacks lead to noisy flow estimates that extend significantly beyond the region of the attack, in many cases even completely erasing the motion of objects in the scene. While networks using an encoder-decoder architecture are very sensitive to these attacks, we found that networks using a spatial pyramid architecture are less affected. We analyse the success and failure of attacking both architectures by visualizing their feature maps and comparing them to classical optical flow techniques which are robust to these attacks. We also demonstrate that such attacks are practical by placing a printed pattern into real scenes.

Video Project Page Paper Supplementary Material link (url) DOI [BibTex]


Occupancy Flow: 4D Reconstruction by Learning Particle Dynamics

Niemeyer, M., Mescheder, L., Oechsle, M., Geiger, A.

International Conference on Computer Vision, October 2019 (conference)

Abstract
Deep learning based 3D reconstruction techniques have recently achieved impressive results. However, while state-of-the-art methods are able to output complex 3D geometry, it is not clear how to extend these results to time-varying topologies. Approaches treating each time step individually lack continuity and exhibit slow inference, while traditional 4D reconstruction methods often utilize a template model or discretize the 4D space at fixed resolution. In this work, we present Occupancy Flow, a novel spatio-temporal representation of time-varying 3D geometry with implicit correspondences. Towards this goal, we learn a temporally and spatially continuous vector field which assigns a motion vector to every point in space and time. In order to perform dense 4D reconstruction from images or sparse point clouds, we combine our method with a continuous 3D representation. Implicitly, our model yields correspondences over time, thus enabling fast inference while providing a sound physical description of the temporal dynamics. We show that our method can be used for interpolation and reconstruction tasks, and demonstrate the accuracy of the learned correspondences. We believe that Occupancy Flow is a promising new 4D representation which will be useful for a variety of spatio-temporal reconstruction tasks.


pdf poster suppmat code Project page video blog [BibTex]


Texture Fields: Learning Texture Representations in Function Space

Oechsle, M., Mescheder, L., Niemeyer, M., Strauss, T., Geiger, A.

International Conference on Computer Vision, October 2019 (conference)

Abstract
In recent years, substantial progress has been achieved in learning-based reconstruction of 3D objects. At the same time, generative models were proposed that can generate highly realistic images. However, despite this success in these closely related tasks, texture reconstruction of 3D objects has received little attention from the research community and state-of-the-art methods are either limited to comparatively low resolution or constrained experimental setups. A major reason for these limitations is that common representations of texture are inefficient or hard to interface for modern deep learning techniques. In this paper, we propose Texture Fields, a novel texture representation which is based on regressing a continuous 3D function parameterized with a neural network. Our approach circumvents limiting factors like shape discretization and parameterization, as the proposed texture representation is independent of the shape representation of the 3D object. We show that Texture Fields are able to represent high-frequency texture and naturally blend with modern deep learning techniques. Experimentally, we find that Texture Fields compare favorably to state-of-the-art methods for conditional texture reconstruction of 3D objects and enable learning of probabilistic generative models for texturing unseen 3D models. We believe that Texture Fields will become an important building block for the next generation of generative 3D models.


pdf suppmat video poster blog Project Page [BibTex]


NoVA: Learning to See in Novel Viewpoints and Domains

Coors, B., Condurache, A. P., Geiger, A.

In 2019 International Conference on 3D Vision (3DV), pages: 116-125, IEEE, September 2019 (inproceedings)

Abstract
Domain adaptation techniques enable the re-use and transfer of existing labeled datasets from a source to a target domain in which little or no labeled data exists. Recently, image-level domain adaptation approaches have demonstrated impressive results in adapting from synthetic to real-world environments by translating source images to the style of a target domain. However, the domain gap between source and target may not only be caused by a different style but also by a change in viewpoint. This case necessitates a semantically consistent translation of source images and labels to the style and viewpoint of the target domain. In this work, we propose the Novel Viewpoint Adaptation (NoVA) model, which enables unsupervised adaptation to a novel viewpoint in a target domain for which no labeled data is available. NoVA utilizes an explicit representation of the 3D scene geometry to translate source view images and labels to the target view. Experiments on adaptation to synthetic and real-world datasets show the benefit of NoVA compared to state-of-the-art domain adaptation approaches on the task of semantic segmentation.

pdf suppmat poster video DOI [BibTex]


Taking a Deeper Look at the Inverse Compositional Algorithm

Lv, Z., Dellaert, F., Rehg, J. M., Geiger, A.

In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2019 (inproceedings)

Abstract
In this paper, we provide a modern synthesis of the classic inverse compositional algorithm for dense image alignment. We first discuss the assumptions made by this well-established technique, and subsequently propose to relax these assumptions by incorporating data-driven priors into this model. More specifically, we unroll a robust version of the inverse compositional algorithm and replace multiple components of this algorithm using more expressive models whose parameters we train in an end-to-end fashion from data. Our experiments on several challenging 3D rigid motion estimation tasks demonstrate the advantages of combining optimization with learning-based techniques, outperforming the classic inverse compositional algorithm as well as data-driven image-to-pose regression approaches.

pdf suppmat Video Project Page Poster [BibTex]


Collective Formation and Cooperative Function of a Magnetic Microrobotic Swarm

Dong, X., Sitti, M.

Robotics: Science and Systems (RSS), June 2019 (conference)

Abstract
Untethered magnetically actuated microrobots can access distant, enclosed and small spaces, such as inside microfluidic channels and the human body, making them appealing for minimally invasive tasks. Despite the simplicity of individual magnetic microrobots, a collective of these microrobots that can work closely and cooperatively would significantly enhance their capabilities. However, a challenge in realizing such collectives is coordinating their formations and motions with underactuated control signals. Here, we report a method that allows collective magnetic microrobots to work closely and cooperatively by controlling their two-dimensional (2D) formations and collective motions in a programmable manner. The actively designed formation and intrinsic adjustable compliance within the group allow bio-inspired collective behaviors, such as navigating through cluttered environments, and reconfigurable cooperative manipulation. These collective magnetic microrobots could thus enable potential applications in programmable self-assembly, modular robotics, swarm robotics, and biomedicine.

DOI [BibTex]


MOTS: Multi-Object Tracking and Segmentation

Voigtlaender, P., Krause, M., Osep, A., Luiten, J., Sekar, B. B. G., Geiger, A., Leibe, B.

In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2019 (inproceedings)

Abstract
This paper extends the popular task of multi-object tracking to multi-object tracking and segmentation (MOTS). Towards this goal, we create dense pixel-level annotations for two existing tracking datasets using a semi-automatic annotation procedure. Our new annotations comprise 65,213 pixel masks for 977 distinct objects (cars and pedestrians) in 10,870 video frames. For evaluation, we extend existing multi-object tracking metrics to this new task. Moreover, we propose a new baseline method which jointly addresses detection, tracking, and segmentation with a single convolutional network. We demonstrate the value of our datasets by achieving improvements in performance when training on MOTS annotations. We believe that our datasets, metrics and baseline will become a valuable resource towards developing multi-object tracking approaches that go beyond 2D bounding boxes.

pdf suppmat Project Page Poster Video [BibTex]


PointFlowNet: Learning Representations for Rigid Motion Estimation from Point Clouds

Behl, A., Paschalidou, D., Donne, S., Geiger, A.

In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2019 (inproceedings)

Abstract
Despite significant progress in image-based 3D scene flow estimation, the performance of such approaches has not yet reached the fidelity required by many applications. Simultaneously, these applications are often not restricted to image-based estimation: laser scanners provide a popular alternative to traditional cameras, for example in the context of self-driving cars, as they directly yield a 3D point cloud. In this paper, we propose to estimate 3D motion from such unstructured point clouds using a deep neural network. In a single forward pass, our model jointly predicts 3D scene flow as well as the 3D bounding box and rigid body motion of objects in the scene. While the prospect of estimating 3D scene flow from unstructured point clouds is promising, it is also a challenging task. We show that the traditional global representation of rigid body motion prohibits inference by CNNs, and propose a translation equivariant representation to circumvent this problem. For training our deep network, a large dataset is required. Because of this, we augment real scans from KITTI with virtual objects, realistically modeling occlusions and simulating sensor noise. A thorough comparison with classic and learning-based techniques highlights the robustness of the proposed approach.

pdf suppmat Project Page Poster Video [BibTex]


Learning Non-volumetric Depth Fusion using Successive Reprojections

Donne, S., Geiger, A.

In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2019 (inproceedings)

Abstract
Given a set of input views, multi-view stereopsis techniques estimate depth maps to represent the 3D reconstruction of the scene; these are fused into a single, consistent reconstruction -- most often a point cloud. In this work we propose to learn an auto-regressive depth refinement directly from data. While deep learning has improved the accuracy and speed of depth estimation significantly, learned MVS techniques remain limited to the plane-sweeping paradigm. We refine a set of input depth maps by successively reprojecting information from neighbouring views to leverage multi-view constraints. Compared to learning-based volumetric fusion techniques, an image-based representation allows significantly more detailed reconstructions; compared to traditional point-based techniques, our method learns noise suppression and surface completion in a data-driven fashion. Due to the limited availability of high-quality reconstruction datasets with ground truth, we introduce two novel synthetic datasets to (pre-)train our network. Our approach is able to improve both the output depth maps and the reconstructed point cloud, for both learned and traditional depth estimation front-ends, on both synthetic and real data.

pdf suppmat Project Page Video Poster blog [BibTex]


Connecting the Dots: Learning Representations for Active Monocular Depth Estimation

Riegler, G., Liao, Y., Donne, S., Koltun, V., Geiger, A.

In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2019 (inproceedings)

Abstract
We propose a technique for depth estimation with a monocular structured-light camera, i.e., a calibrated stereo set-up with one camera and one laser projector. Instead of formulating depth estimation as a correspondence search problem, we show that a simple convolutional architecture is sufficient for high-quality disparity estimates in this setting. As accurate ground truth is hard to obtain, we train our model in a self-supervised fashion with a combination of photometric and geometric losses. Further, we demonstrate that the projected pattern of the structured-light sensor can be reliably separated from the ambient information. This can then be used to improve depth boundaries in a weakly supervised fashion by modeling the joint statistics of image and depth edges. The model trained in this fashion compares favorably to the state-of-the-art on challenging synthetic and real-world datasets. In addition, we contribute a novel simulator, which allows benchmarking active depth prediction algorithms in controlled conditions.

pdf suppmat Poster Project Page [BibTex]


Superquadrics Revisited: Learning 3D Shape Parsing beyond Cuboids

Paschalidou, D., Ulusoy, A. O., Geiger, A.

In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2019 (inproceedings)

Abstract
Abstracting complex 3D shapes with parsimonious part-based representations has been a long-standing goal in computer vision. This paper presents a learning-based solution to this problem which goes beyond the traditional 3D cuboid representation by exploiting superquadrics as atomic elements. We demonstrate that superquadrics lead to more expressive 3D scene parses while being easier to learn than 3D cuboid representations. Moreover, we provide an analytical solution to the Chamfer loss which avoids the need for computationally expensive reinforcement learning or iterative prediction. Our model learns to parse 3D objects into consistent superquadric representations without supervision. Results on various ShapeNet categories as well as the SURREAL human body dataset demonstrate the flexibility of our model in capturing fine details and complex poses that could not have been modelled using cuboids.

Project Page Poster suppmat pdf Video blog handout [BibTex]


A Magnetically-Actuated Untethered Jellyfish-Inspired Soft Milliswimmer

(Best Paper Award)

Ren, Z., Wang, T., Hu, W.

Robotics: Science and Systems (RSS), June 2019 (conference)

[BibTex]


Impact of Expertise on Interaction Preferences for Navigation Assistance of Visually Impaired Individuals

Ahmetovic, D., Guerreiro, J., Ohn-Bar, E., Kitani, K. M., Asakawa, C.

In Proceedings of the 16th International Web for All Conference (W4A), Association for Computing Machinery, May 2019 (conference)

DOI [BibTex]


Real-Time Dense Mapping for Self-Driving Vehicles using Fisheye Cameras

Cui, Z., Heng, L., Yeo, Y. C., Geiger, A., Pollefeys, M., Sattler, T.

In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), May 2019 (inproceedings)

Abstract
We present a real-time dense geometric mapping algorithm for large-scale environments. Unlike existing methods which use pinhole cameras, our implementation is based on fisheye cameras, which have a larger field of view and benefit other tasks including visual-inertial odometry, localization and object detection around vehicles. Our algorithm runs on in-vehicle PCs at approximately 15 Hz, enabling vision-only 3D scene perception for self-driving vehicles. For each synchronized set of images captured by multiple cameras, we first compute a depth map for a reference camera using plane-sweeping stereo. To maintain both accuracy and efficiency, while accounting for the fact that fisheye images have a rather low resolution, we recover the depths using multiple image resolutions. We adopt the fast object detection framework YOLOv3 to remove potentially dynamic objects. At the end of the pipeline, we fuse the fisheye depth images into a truncated signed distance function (TSDF) volume to obtain a 3D map. We evaluate our method on large-scale urban datasets, and the results show that our method works well even in complex environments.

pdf video poster Project Page [BibTex]


Project AutoVision: Localization and 3D Scene Perception for an Autonomous Vehicle with a Multi-Camera System

Heng, L., Choi, B., Cui, Z., Geppert, M., Hu, S., Kuan, B., Liu, P., Nguyen, R. M. H., Yeo, Y. C., Geiger, A., Lee, G. H., Pollefeys, M., Sattler, T.

In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), May 2019 (inproceedings)

Abstract
Project AutoVision aims to develop localization and 3D scene perception capabilities for a self-driving vehicle. Such capabilities will enable autonomous navigation in urban and rural environments, in day and night, and with cameras as the only exteroceptive sensors. The sensor suite employs many cameras for both 360-degree coverage and accurate multi-view stereo; the use of low-cost cameras keeps the cost of this sensor suite to a minimum. In addition, the project seeks to extend the operating envelope to include GNSS-less conditions which are typical for environments with tall buildings, foliage, and tunnels. Emphasis is placed on leveraging multi-view geometry and deep learning to enable the vehicle to localize and perceive in 3D space. This paper presents an overview of the project, and describes the sensor suite and current progress in the areas of calibration, localization, and perception.

pdf [BibTex]


Elastic modulus affects adhesive strength of gecko-inspired synthetics in variable temperature and humidity

Mitchell, C. T., Drotlef, D., Dayan, C. B., Sitti, M., Stark, A. Y.

In Integrative and Comparative Biology, pages: E372-E372, Oxford University Press, March 2019 (inproceedings)

[BibTex]


Wide Range-Sensitive, Bending-Insensitive Pressure Detection and Application to Wearable Healthcare Device

Kim, S., Amjadi, M., Lee, T., Jeong, Y., Kwon, D., Kim, M. S., Kim, K., Kim, T., Oh, Y. S., Park, I.

In 2019 20th International Conference on Solid-State Sensors, Actuators and Microsystems & Eurosensors XXXIII (TRANSDUCERS & EUROSENSORS XXXIII), 2019 (inproceedings)

[BibTex]


Gecko-inspired composite microfibers for reversible adhesion on smooth and rough surfaces

Drotlef, D., Dayan, C., Sitti, M.

In Integrative and Comparative Biology, pages: E58-E58, Oxford University Press, 2019 (inproceedings)

[BibTex]


Geometric Image Synthesis

Abu Alhaija, H., Mustikovela, S. K., Geiger, A., Rother, C.

In Computer Vision – ACCV 2018, Lecture Notes in Computer Science, vol. 11366, pages: 85-100, (Editors: Jawahar, C., Li, H., Mori, G., Schindler, K.), Asian Conference on Computer Vision, 2019 (conference)

DOI Project Page [BibTex]


Occupancy Networks: Learning 3D Reconstruction in Function Space

Mescheder, L., Oechsle, M., Niemeyer, M., Nowozin, S., Geiger, A.

In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019 (inproceedings)

Abstract
With the advent of deep neural networks, learning-based approaches for 3D reconstruction have gained popularity. However, unlike for images, in 3D there is no canonical representation which is both computationally and memory efficient yet allows for representing high-resolution geometry of arbitrary topology. Many of the state-of-the-art learning-based 3D reconstruction approaches can hence only represent very coarse 3D geometry or are limited to a restricted domain. In this paper, we propose Occupancy Networks, a new representation for learning-based 3D reconstruction methods. Occupancy networks implicitly represent the 3D surface as the continuous decision boundary of a deep neural network classifier. In contrast to existing approaches, our representation encodes a description of the 3D output at infinite resolution without excessive memory footprint. We validate that our representation can efficiently encode 3D structure and can be inferred from various kinds of input. Our experiments demonstrate competitive results, both qualitatively and quantitatively, for the challenging tasks of 3D reconstruction from single images, noisy point clouds and coarse discrete voxel grids. We believe that occupancy networks will become a useful tool in a wide variety of learning-based 3D tasks.

Code Video pdf suppmat Project Page blog [BibTex]

2016


Steering control of a water-running robot using an active tail

Kim, H., Jeong, K., Sitti, M., Seo, T.

In 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages: 4945-4950, October 2016 (inproceedings)

Abstract
Many highly dynamic mobile robots have been developed that take inspiration from animals. In this study, we draw on the basilisk lizard's ability to run and steer on the water surface to build a steerable hexapedal robot. The robot has an active tail with a circular plate, which it rotates to steer on water. We dynamically modeled the platform and conducted simulations and experiments on steering locomotion with a bang-bang controller. The robot can steer on water by rotating the tail, and the controlled steering locomotion is stable. The dynamic model approximates the robot's steering locomotion, and the trends of the simulations and experiments are similar, although there are errors between the desired and actual angles. The robot's maneuverability on water can be improved through further research.

DOI [BibTex]


Targeting of cell mockups using sperm-shaped microrobots in vitro

Khalil, I. S., Tabak, A. F., Hosney, A., Klingner, A., Shalaby, M., Abdel-Kader, R. M., Serry, M., Sitti, M.

In 2016 6th IEEE International Conference on Biomedical Robotics and Biomechatronics (BioRob), pages: 495-501, July 2016 (inproceedings)

Abstract
Sperm-shaped microrobots are controlled under the influence of weak oscillating magnetic fields (milliTesla range) to selectively target cell mockups (i.e., gas bubbles with an average diameter of 200 μm). The sperm-shaped microrobots are fabricated by electrospinning using a solution of polystyrene, dimethylformamide, and iron oxide nanoparticles. These nanoparticles are concentrated within the head of the microrobot, and hence enable directional control along external magnetic fields. The magnetic dipole moment of the microrobot is characterized (using the flip-time technique) to be 1.4×10⁻¹¹ A·m² at a magnetic field of 28 mT. In addition, the morphology of the microrobot is characterized using Scanning Electron Microscopy images. The characterized parameters and morphology are used in a simulation of the locomotion mechanism of the microrobot to show that its motion depends on breaking time-reversal symmetry, rather than on pulling with the magnetic field gradient. We experimentally demonstrate that the microrobot can controllably follow S-shaped, U-shaped, and square paths, and selectively target the cell mockups using image guidance under the influence of the oscillating magnetic fields.

DOI [BibTex]


Analysis of the magnetic torque on a tilted permanent magnet for drug delivery in capsule robots

Munoz, F., Alici, G., Zhou, H., Li, W., Sitti, M.

In 2016 IEEE International Conference on Advanced Intelligent Mechatronics (AIM), pages: 1386-1391, July 2016 (inproceedings)

Abstract
In this paper, we present the analysis of the torque transmitted to a tilted permanent magnet that is to be embedded in a capsule robot to achieve targeted drug delivery. This analysis is carried out by using an analytical model and experimental results for a small cubic permanent magnet that is driven by an external magnetic system made of an array of arc-shaped permanent magnets (ASMs). Our experimental results, which are in agreement with the analytical results, show that the cubic permanent magnet can safely be actuated for inclinations lower than 75° without having to make positional adjustments in the external magnetic system. We have found that with further inclinations, the cubic permanent magnet to be embedded in a drug delivery mechanism may stall. When it stalls, the external magnetic system's position and orientation would have to be adjusted to actuate the cubic permanent magnet and the drug release mechanism. This analysis of the transmitted torque is helpful for the development of real-time control strategies for magnetically articulated devices.

DOI [BibTex]


Patches, Planes and Probabilities: A Non-local Prior for Volumetric 3D Reconstruction

Ulusoy, A. O., Black, M. J., Geiger, A.

In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016 (inproceedings)

Abstract
In this paper, we propose a non-local structured prior for volumetric multi-view 3D reconstruction. Towards this goal, we present a novel Markov random field model based on ray potentials in which assumptions about large 3D surface patches such as planarity or Manhattan world constraints can be efficiently encoded as probabilistic priors. We further derive an inference algorithm that reasons jointly about voxels, pixels and image segments, and estimates marginal distributions of appearance, occupancy, depth, normals and planarity. Key to tractable inference is a novel hybrid representation that spans both voxel and pixel space and that integrates non-local information from 2D image segmentations in a principled way. We compare our non-local prior to commonly employed local smoothness assumptions and a variety of state-of-the-art volumetric reconstruction baselines on challenging outdoor scenes with textureless and reflective surfaces. Our experiments indicate that regularizing over larger distances has the potential to resolve ambiguities where local regularizers fail.

YouTube pdf poster suppmat Project Page [BibTex]


Semantic Instance Annotation of Street Scenes by 3D to 2D Label Transfer

Xie, J., Kiefel, M., Sun, M., Geiger, A.

In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016 (inproceedings)

Abstract
Semantic annotations are vital for training models for object recognition, semantic segmentation or scene understanding. Unfortunately, pixelwise annotation of images at very large scale is labor-intensive and little labeled data is available, particularly at the instance level and for street scenes. In this paper, we propose to tackle this problem by lifting the semantic instance labeling task from 2D into 3D. Given reconstructions from stereo or laser data, we annotate static 3D scene elements with rough bounding primitives and develop a probabilistic model which transfers this information into the image domain. We leverage our method to obtain 2D labels for a novel suburban video dataset which we have collected, resulting in 400k semantic and instance image annotations. A comparison of our method to state-of-the-art label transfer baselines reveals that 3D information enables more efficient annotation while at the same time resulting in improved accuracy and time-coherent labels.

pdf suppmat Project Page [BibTex]


Sperm-shaped magnetic microrobots: Fabrication using electrospinning, modeling, and characterization

Khalil, I. S., Tabak, A. F., Hosney, A., Mohamed, A., Klingner, A., Ghoneima, M., Sitti, M.

In Robotics and Automation (ICRA), 2016 IEEE International Conference on, pages: 1939-1944, May 2016 (inproceedings)

Abstract
We use electrospinning to fabricate sperm-shaped magnetic microrobots with diameters ranging from 50 μm to 500 μm. The variables of the electrospinning operation (voltage, concentration of the solution, dynamic viscosity, and distance between the syringe needle and collector) required to achieve the beading effect are determined. This beading effect allows us to fabricate microrobots with a morphology similar to that of sperm cells. The bead and the ultra-fine fiber resemble the morphology of the head and tail of the sperm cell, respectively. We incorporate iron oxide nanoparticles into the head of the sperm-shaped microrobot to provide a magnetic dipole moment. This dipole enables directional control under the influence of external magnetic fields. We also apply weak (less than 2 mT) oscillating magnetic fields to exert a magnetic torque on the magnetic head, generating planar flagellar waves and flagellated swimming. The average speed of the sperm-shaped microrobot is calculated to be 0.5 body lengths per second and 1 body length per second at frequencies of 5 Hz and 10 Hz, respectively. We also develop a model of the microrobot using an elastohydrodynamics approach and Timoshenko-Rayleigh beam theory, and find good agreement with the experimental results.

pi

DOI [BibTex]



Deep Discrete Flow

Güney, F., Geiger, A.

Asian Conference on Computer Vision (ACCV), 2016 (conference) Accepted

avg ps

pdf suppmat Project Page [BibTex]


2015


Exploiting Object Similarity in 3D Reconstruction

Zhou, C., Güney, F., Wang, Y., Geiger, A.

In International Conference on Computer Vision (ICCV), December 2015 (inproceedings)

Abstract
Despite recent progress, reconstructing outdoor scenes in 3D from movable platforms remains a highly difficult endeavor. Challenges include low frame rates, occlusions, large distortions and difficult lighting conditions. In this paper, we leverage the fact that the larger the reconstructed area, the more likely objects of similar type and shape will occur in the scene. This is particularly true for outdoor scenes where buildings and vehicles often suffer from missing texture or reflections, but share similarity in 3D shape. We take advantage of this shape similarity by locating objects using detectors and jointly reconstructing them while learning a volumetric model of their shape. This allows us to reduce noise while completing missing surfaces as objects of similar shape benefit from all observations for the respective category. We evaluate our approach with respect to LIDAR ground truth on a novel challenging suburban dataset and show its advantages over the state-of-the-art.

avg ps

pdf suppmat [BibTex]



FollowMe: Efficient Online Min-Cost Flow Tracking with Bounded Memory and Computation

Lenz, P., Geiger, A., Urtasun, R.

In International Conference on Computer Vision (ICCV), International Conference on Computer Vision (ICCV), December 2015 (inproceedings)

Abstract
One of the most popular approaches to multi-target tracking is tracking-by-detection. Current min-cost flow algorithms which solve the data association problem optimally have three main drawbacks: they are computationally expensive, they assume that the whole video is given as a batch, and they scale badly in memory and computation with the length of the video sequence. In this paper, we address each of these issues, resulting in a computationally and memory-bounded solution. First, we introduce a dynamic version of the successive shortest-path algorithm which solves the data association problem optimally while reusing computation, resulting in faster inference than standard solvers. Second, we address the optimal solution to the data association problem when dealing with an incoming stream of data (i.e., online setting). Finally, we present our main contribution which is an approximate online solution with bounded memory and computation which is capable of handling videos of arbitrary length while performing tracking in real time. We demonstrate the effectiveness of our algorithms on the KITTI and PETS2009 benchmarks and show state-of-the-art performance, while being significantly faster than existing solvers.
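As a rough two-frame illustration of the data-association problem behind tracking-by-detection (the paper solves it jointly over whole sequences with min-cost flow; this brute-force sketch and its detections are hypothetical):

```python
from itertools import permutations

# Hypothetical detections in two consecutive frames (x, y centers).
frame_a = [(10.0, 20.0), (40.0, 50.0)]
frame_b = [(12.0, 21.0), (41.0, 49.0)]

def match_cost(p, q):
    # Euclidean distance as a toy association cost.
    return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

def min_cost_association(dets_a, dets_b):
    """Brute-force optimal one-to-one association between two frames.
    The min-cost-flow formulation generalizes this to entire videos,
    including track births and deaths."""
    best, best_cost = None, float("inf")
    for perm in permutations(range(len(dets_b))):
        cost = sum(match_cost(dets_a[i], dets_b[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best, best_cost = perm, cost
    return best, best_cost

assignment, cost = min_cost_association(frame_a, frame_b)
print(assignment)  # (0, 1): each detection links to its nearest successor
```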

avg ps

pdf suppmat video project [BibTex]



Towards Probabilistic Volumetric Reconstruction using Ray Potentials

(Best Paper Award)

Ulusoy, A. O., Geiger, A., Black, M. J.

In 3D Vision (3DV), 2015 3rd International Conference on, pages: 10-18, Lyon, October 2015 (inproceedings)

Abstract
This paper presents a novel probabilistic foundation for volumetric 3D reconstruction. We formulate the problem as inference in a Markov random field, which accurately captures the dependencies between the occupancy and appearance of each voxel, given all input images. Our main contribution is an approximate highly parallelized discrete-continuous inference algorithm to compute the marginal distributions of each voxel's occupancy and appearance. In contrast to the MAP solution, marginals encode the underlying uncertainty and ambiguity in the reconstruction. Moreover, the proposed algorithm allows for a Bayes optimal prediction with respect to a natural reconstruction loss. We compare our method to two state-of-the-art volumetric reconstruction algorithms on three challenging aerial datasets with LIDAR ground truth. Our experiments demonstrate that the proposed algorithm compares favorably in terms of reconstruction accuracy and the ability to expose reconstruction uncertainty.
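A minimal sketch of the visibility reasoning that underlies ray potentials (simplified to independent per-voxel occupancies; not the paper's joint inference): the probability that a ray terminates at voxel i is the occupancy of i times the probability that every voxel in front of it is free.

```python
def ray_depth_distribution(occupancy):
    """Probability that each voxel is the first occupied one along a ray,
    given per-voxel occupancy probabilities (assumed independent here)."""
    dist, transmittance = [], 1.0
    for o in occupancy:
        dist.append(transmittance * o)
        transmittance *= (1.0 - o)
    return dist  # any remaining transmittance = ray exits the volume

probs = ray_depth_distribution([0.1, 0.5, 0.9])
print([round(p, 3) for p in probs])  # [0.1, 0.45, 0.405]
```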

avg ps

code YouTube pdf suppmat DOI Project Page [BibTex]



Compliant wing design for a flapping wing micro air vehicle

Colmenares, D., Kania, R., Zhang, W., Sitti, M.

In Intelligent Robots and Systems (IROS), 2015 IEEE/RSJ International Conference on, pages: 32-39, September 2015 (inproceedings)

Abstract
In this work, we examine several wing designs for a motor-driven, flapping-wing micro air vehicle capable of liftoff. The full system consists of two wings independently driven by geared pager motors that include a spring in parallel with the output shaft. The linear transmission allows for resonant operation, while control is achieved by direct drive of the wing angle. Wings used in previous work were chosen to be fully rigid for simplicity of modeling and fabrication. However, biological wings are highly flexible and other micro air vehicles have successfully utilized flexible wing structures for specialized tasks. The goal of our study is to determine if wing flexibility can be generally used to increase wing performance. Two approaches to lift improvement using flexible wings are explored, resonance of the wing cantilever structure and dynamic wing twisting. We design and test several wings that are compared using different figures of merit. A twisted design improved lift per power by 73.6% and maximum lift production by 53.2% compared to the original rigid design. Wing twist is then modeled in order to propose optimal wing twist profiles that can maximize either wing efficiency or lift production.

pi

DOI [BibTex]



Millimeter-scale magnetic swimmers using elastomeric undulations

Zhang, J., Diller, E.

In 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages: 1706-1711, September 2015 (inproceedings)

Abstract
This paper presents a new soft-bodied millimeter-scale swimmer actuated by rotating uniform magnetic fields. The proposed swimmer moves through internal undulatory deformations, resulting from a magnetization profile programmed into its body. To understand the motion of the swimmer, a mathematical model is developed to describe the general relationship between the deflection of a flexible strip and its magnetization profile. As a special case, the situation of the swimmer on the water surface is analyzed and predictions made by the model are experimentally verified. Experimental results show the controllability of the proposed swimmer under a computer vision-based closed-loop controller. The swimmers have nominal dimensions of 1.5×4.9×0.06 mm and a top speed of 50 mm/s (10 body lengths per second). Waypoint following and multiagent control are demonstrated for swimmers constrained at the air-water interface and underwater swimming is also shown, suggesting the promising potential of this type of swimmer in biomedical and microfluidic applications.
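A quick check of the speed normalization used in the abstract (the reported 10 body lengths per second follows from the 50 mm/s top speed and the 4.9 mm body length):

```python
def body_lengths_per_second(speed_mm_s, body_length_mm):
    # Normalize absolute speed by body length, the standard figure of
    # merit for small-scale swimmers.
    return speed_mm_s / body_length_mm

# Figures from the abstract: 50 mm/s top speed, 4.9 mm body length.
print(round(body_lengths_per_second(50.0, 4.9), 1))  # 10.2
```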

pi

link (url) DOI [BibTex]



Displets: Resolving Stereo Ambiguities using Object Knowledge

Güney, F., Geiger, A.

In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR) 2015, pages: 4165-4175, IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), June 2015 (inproceedings)

Abstract
Stereo techniques have witnessed tremendous progress over the last decades, yet some aspects of the problem still remain challenging today. Striking examples are reflecting and textureless surfaces which cannot easily be recovered using traditional local regularizers. In this paper, we therefore propose to regularize over larger distances using object-category specific disparity proposals (displets) which we sample using inverse graphics techniques based on a sparse disparity estimate and a semantic segmentation of the image. The proposed displets encode the fact that objects of certain categories are not arbitrarily shaped but typically exhibit regular structures. We integrate them as non-local regularizer for the challenging object class 'car' into a superpixel based CRF framework and demonstrate its benefits on the KITTI stereo evaluation.

avg ps

pdf abstract suppmat [BibTex]



Object Scene Flow for Autonomous Vehicles

Menze, M., Geiger, A.

In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR) 2015, pages: 3061-3070, IEEE, IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), June 2015 (inproceedings)

Abstract
This paper proposes a novel model and dataset for 3D scene flow estimation with an application to autonomous driving. Taking advantage of the fact that outdoor scenes often decompose into a small number of independently moving objects, we represent each element in the scene by its rigid motion parameters and each superpixel by a 3D plane as well as an index to the corresponding object. This minimal representation increases robustness and leads to a discrete-continuous CRF where the data term decomposes into pairwise potentials between superpixels and objects. Moreover, our model intrinsically segments the scene into its constituting dynamic components. We demonstrate the performance of our model on existing benchmarks as well as a novel realistic dataset with scene flow ground truth. We obtain this dataset by annotating 400 dynamic scenes from the KITTI raw data collection using detailed 3D CAD models for all vehicles in motion. Our experiments also reveal novel challenges which can't be handled by existing methods.
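The minimal scene representation described above can be sketched as a data layout (hypothetical names and values, not the paper's implementation): each moving object carries rigid motion parameters, and each superpixel stores a 3D plane plus an index into the object list.

```python
from dataclasses import dataclass

@dataclass
class RigidObject:
    rotation: tuple     # 3D rigid motion, e.g. axis-angle rotation
    translation: tuple  # and a 3D translation

@dataclass
class Superpixel:
    plane: tuple        # 3D plane parameters (n_x, n_y, n_z, d)
    object_index: int   # index into the scene's list of moving objects

# A toy scene: static background (object 0) and one moving car (object 1).
objects = [RigidObject((0, 0, 0), (0, 0, 0)),
           RigidObject((0, 0.01, 0), (1.2, 0, 0.3))]
sp = Superpixel(plane=(0, 0, 1, 5.0), object_index=1)
print(objects[sp.object_index].translation)  # (1.2, 0, 0.3)
```

Sharing one motion across all superpixels of an object is what makes the representation minimal and robust.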

avg ps

pdf abstract suppmat DOI [BibTex]



Fiberbot: A miniature crawling robot using a directional fibrillar pad

Han, Y., Marvi, H., Sitti, M.

In Robotics and Automation (ICRA), 2015 IEEE International Conference on, pages: 3122-3127, May 2015 (inproceedings)

Abstract
Vibration-driven locomotion has been widely used for crawling robot studies. Such robots usually have a vibration motor as the actuator and a fibrillar structure for providing directional friction on the substrate. However, there have not been any studies about the effect of fiber structure on robot crawling performance. In this paper, we develop Fiberbot, a custom made mini vibration robot, for studying the effect of fiber angle on robot velocity, steering, and climbing performance. It is known that the friction force with and against fibers depends on the fiber angle. Thus, we first present a new fabrication method for making millimeter scale fibers at a wide range of angles. We then show that using 30° angle fibers that have the highest friction anisotropy (ratio of backward to forward friction force) among the other fibers we fabricated in this study, Fiberbot speed on glass increases to 13.8±0.4 cm/s (compared to ν = 0.6±0.1 cm/s using vertical fibers). We also demonstrate that the locomotion direction of Fiberbot depends on the tilting direction of fibers and that we can steer the robot by rotating the fiber pad. Fiberbot could also climb on glass at inclinations of up to 10° when equipped with fibers of high friction anisotropy. We show that by adding a rigid tail to the robot, it can climb on glass at 25° inclines. Moreover, the robot is able to crawl on rough surfaces such as wood (ν = 10.0±0.2 cm/s using 30° fiber pad). Fiberbot, a low-cost vibration robot equipped with a custom-designed fiber pad with steering and climbing capabilities, could be used for studies on collective behavior on a wide range of topographies as well as search and exploratory missions.
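The friction anisotropy figure of merit mentioned above is simply the ratio of backward to forward friction force; higher anisotropy favors net forward motion under vibration. A sketch with illustrative force values (not measured data from the paper):

```python
def friction_anisotropy(backward_n, forward_n):
    """Ratio of backward to forward friction force (both in newtons).
    Values > 1 mean the pad resists backward slip more than forward slip."""
    return backward_n / forward_n

# Hypothetical forces: an angled fiber pad vs. near-symmetric vertical fibers.
angled = friction_anisotropy(0.9, 0.3)    # strongly anisotropic
vertical = friction_anisotropy(0.5, 0.45) # nearly symmetric
print(angled > vertical)  # True
```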

pi

DOI [BibTex]



Platform design and tethered flight of a motor-driven flapping-wing system

Hines, L., Colmenares, D., Sitti, M.

In Robotics and Automation (ICRA), 2015 IEEE International Conference on, pages: 5838-5845, May 2015 (inproceedings)

Abstract
In this work, we examine two design modifications to a tethered motor-driven flapping-wing system. Previously, we had demonstrated a simple mechanism utilizing a linear transmission for resonant operation and direct drive of the wing flapping angle for control. The initial two-wing system had a weight of 2.7 grams and a maximum lift-to-weight ratio of 1.4. While capable of vertical takeoff, in open-loop flight it demonstrated instability and pitch oscillations at the wing flapping frequency, leading to flight times of only a few wing strokes. Here the effect of vertical wing offset as well as an alternative multi-wing layout is investigated and experimentally tested with newly constructed prototypes. With only a change in vertical wing offset, stable open-loop flight of the two-wing flapping system is shown to be theoretically possible, but difficult to achieve with our current design and operating parameters. Both of the new two and four-wing systems, however, prove capable of flying to the end of the tether, with the four-wing system prototype eliminating disruptive wing beat oscillations.

pi

DOI [BibTex]



Joint 3D Object and Layout Inference from a single RGB-D Image

(Best Paper Award)

Geiger, A., Wang, C.

In German Conference on Pattern Recognition (GCPR), 9358, pages: 183-195, Lecture Notes in Computer Science, Springer International Publishing, 2015 (inproceedings)

Abstract
Inferring 3D objects and the layout of indoor scenes from a single RGB-D image captured with a Kinect camera is a challenging task. Towards this goal, we propose a high-order graphical model and jointly reason about the layout, objects and superpixels in the image. In contrast to existing holistic approaches, our model leverages detailed 3D geometry using inverse graphics and explicitly enforces occlusion and visibility constraints for respecting scene properties and projective geometry. We cast the task as MAP inference in a factor graph and solve it efficiently using message passing. We evaluate our method with respect to several baselines on the challenging NYUv2 indoor dataset using 21 object categories. Our experiments demonstrate that the proposed method is able to infer scenes with a large degree of clutter and occlusions.

avg ps

pdf suppmat video project DOI [BibTex]



Discrete Optimization for Optical Flow

Menze, M., Heipke, C., Geiger, A.

In German Conference on Pattern Recognition (GCPR), 9358, pages: 16-28, Springer International Publishing, 2015 (inproceedings)

Abstract
We propose to look at large-displacement optical flow from a discrete point of view. Motivated by the observation that sub-pixel accuracy is easily obtained given pixel-accurate optical flow, we conjecture that computing the integral part is the hardest piece of the problem. Consequently, we formulate optical flow estimation as a discrete inference problem in a conditional random field, followed by sub-pixel refinement. Naive discretization of the 2D flow space, however, is intractable due to the resulting size of the label set. In this paper, we therefore investigate three different strategies, each able to reduce computation and memory demands by several orders of magnitude. Their combination allows us to estimate large-displacement optical flow both accurately and efficiently and demonstrates the potential of discrete optimization for optical flow. We obtain state-of-the-art performance on MPI Sintel and KITTI.
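The intractability of naive discretization is easy to quantify: allowing integer 2D displacements up to a maximum magnitude D gives (2D+1)² labels per pixel. A quick calculation (the 250 px bound is an assumed large-displacement scale, not a number from the paper):

```python
def naive_flow_label_count(max_disp):
    # Integer 2D displacements in [-max_disp, max_disp] x [-max_disp, max_disp].
    return (2 * max_disp + 1) ** 2

# For displacements up to 250 px, naive discretization yields an
# intractable per-pixel label set, motivating the paper's strategies
# for reducing computation and memory by orders of magnitude.
print(naive_flow_label_count(250))  # 251001
```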

avg ps

pdf suppmat project DOI [BibTex]



Joint 3D Estimation of Vehicles and Scene Flow

Menze, M., Heipke, C., Geiger, A.

In Proc. of the ISPRS Workshop on Image Sequence Analysis (ISA), 2015 (inproceedings)

Abstract
Three-dimensional reconstruction of dynamic scenes is an important prerequisite for applications like mobile robotics or autonomous driving. While much progress has been made in recent years, imaging conditions in natural outdoor environments are still very challenging for current reconstruction and recognition methods. In this paper, we propose a novel unified approach which reasons jointly about 3D scene flow as well as the pose, shape and motion of vehicles in the scene. Towards this goal, we incorporate a deformable CAD model into a slanted-plane conditional random field for scene flow estimation and enforce shape consistency between the rendered 3D models and the parameters of all superpixels in the image. The association of superpixels to objects is established by an index variable which implicitly enables model selection. We evaluate our approach on the challenging KITTI scene flow dataset in terms of object and scene flow estimation. Our results provide a proof of concept and demonstrate the usefulness of our method.

avg ps

PDF [BibTex]


2007


A strategy for vision-based controlled pushing of microparticles

Lynch, N. A., Onal, C., Schuster, E., Sitti, M.

In Robotics and Automation, 2007 IEEE International Conference on, pages: 1413-1418, 2007 (inproceedings)

pi

[BibTex]



Autonomous 2D microparticle manipulation based on visual feedback

Onal, C. D., Sitti, M.

In Advanced intelligent mechatronics, 2007 IEEE/ASME international conference on, pages: 1-6, 2007 (inproceedings)

pi

[BibTex]



STRIDE: A highly maneuverable and non-tethered water strider robot

Song, Y. S., Sitti, M.

In Robotics and Automation, 2007 IEEE International Conference on, pages: 980-984, 2007 (inproceedings)

pi

[BibTex]



Dry spinning polymeric nano/microfiber arrays using glass micropipettes with controlled porosities and fiber diameters

Nain, A. S., Gupta, A., Amon, C., Sitti, M.

In Nanotechnology, 2007. IEEE-NANO 2007. 7th IEEE Conference on, pages: 728-732, 2007 (inproceedings)

pi

[BibTex]



Microrobotically fabricated biological scaffolds for tissue engineering

Nain, A. S., Chung, F., Rule, M., Jadlowiec, J. A., Campbell, P. G., Amon, C., Sitti, M.

In Robotics and Automation, 2007 IEEE International Conference on, pages: 1918-1923, 2007 (inproceedings)

pi

[BibTex]



Bacterial flagella assisted propulsion of patterned latex particles: Effect of particle size

Behkam, B., Sitti, M.

In Nanotechnology, 2007. IEEE-NANO 2007. 7th IEEE Conference on, pages: 723-727, 2007 (inproceedings)

pi

Project Page [BibTex]



A scaled bilateral control system for experimental 1-D teleoperated nanomanipulation applications

Onal, C. D., Pawashe, C., Sitti, M.

In Intelligent Robots and Systems, 2007. IROS 2007. IEEE/RSJ International Conference on, pages: 483-488, 2007 (inproceedings)

pi

[BibTex]


2002


Nanomolding based fabrication of synthetic gecko foot-hairs

Sitti, M., Fearing, R. S.

In Nanotechnology, 2002. IEEE-NANO 2002. Proceedings of the 2002 2nd IEEE Conference on, pages: 137-140, 2002 (inproceedings)

pi

Project Page [BibTex]


1995


Visual tracking for moving multiple objects: an integration of vision and control

Sitti, M., Bozma, I., Denker, A.

In Industrial Electronics, 1995. ISIE’95., Proceedings of the IEEE International Symposium on, 2, pages: 535-540, 1995 (inproceedings)

pi

[BibTex]
