Robot Arm Pose Estimation as a Learning Problem

Purposeful and robust manipulation requires good hand-eye coordination. To a certain extent this can be achieved using information from joint encoders and known kinematics. However, for many robots a significant error of several centimeters in the pose of the end-effector and fingers remains. Especially for fine manipulation tasks, this poses a challenge.
To achieve the desired accuracy, we aim to visually track the arm in the camera frame, the same frame in which we usually detect the target object. Given these estimates, we can then control manipulation tasks with techniques such as visual servoing.
In this project, we propose to frame the problem of marker-less robot arm pose estimation as a learning problem. The only input to the method is the depth image from an RGB-D sensor. The output is the joint configuration of the robot arm. We learn the mapping from a large number of synthetically generated and labeled depth images.
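As a rough illustration of the data-generation step, the sketch below samples random joint configurations within the joint limits and renders a labeled depth image for each. The render_depth function is a placeholder for whatever depth renderer of the robot model is available; it and the function names are assumptions for illustration, not part of the original method.

import numpy as np

def sample_joint_configuration(lower, upper, rng):
    """Draw a random joint configuration within the joint limits."""
    return rng.uniform(lower, upper)

def generate_dataset(render_depth, lower, upper, n_samples, seed=0):
    """Create (depth image, joint configuration) pairs for supervised learning.

    render_depth(q) is assumed to return an H x W depth image of the arm
    at joint configuration q (e.g. from a simulator or OpenGL renderer).
    """
    rng = np.random.default_rng(seed)
    images, labels = [], []
    for _ in range(n_samples):
        q = sample_joint_configuration(lower, upper, rng)
        depth = render_depth(q)   # synthetic, automatically labeled sample
        images.append(depth)
        labels.append(q)
    return np.stack(images), np.stack(labels)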
In [], we treat this as a pixel-wise classification problem using a random decision forest. From all the training samples ending up at a leaf node, a set of offsets is learned that votes for relative joint positions. Pooling these votes over all foreground pixels and subsequently clustering them gives us an estimate of the true joint positions. Due to the intrinsic parallelism of pixel-wise classification, this approach can run at more than 30 Hz. It is a frame-by-frame method and does not require any initialization, as ICP-style or tracking methods do.
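The vote-pooling and clustering step could look roughly like the following sketch. It assumes a trained forest that already returns, for every foreground pixel, one 3D offset vote per joint; the function and parameter names (estimate_joint_positions, bandwidth) are hypothetical, and mean shift is used here as one possible clustering choice, not necessarily the one used in the publication.

import numpy as np
from sklearn.cluster import MeanShift

def estimate_joint_positions(pixel_xyz, offset_votes, bandwidth=0.05):
    """Pool per-pixel votes into joint position estimates.

    pixel_xyz:    (P, 3) back-projected 3D positions of the P foreground pixels
    offset_votes: (P, J, 3) learned offsets voting for the J joint positions
    Returns a (J, 3) array of estimated joint positions.
    """
    P, J, _ = offset_votes.shape
    estimates = np.zeros((J, 3))
    for j in range(J):
        # Each foreground pixel casts an absolute vote for joint j.
        votes = pixel_xyz + offset_votes[:, j, :]
        # Cluster the votes and take the strongest mode as the estimate.
        ms = MeanShift(bandwidth=bandwidth).fit(votes)
        counts = np.bincount(ms.labels_)
        estimates[j] = ms.cluster_centers_[np.argmax(counts)]
    return estimates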