2019


Controlling Heterogeneous Stochastic Growth Processes on Lattices with Limited Resources

Haksar, R., Solowjow, F., Trimpe, S., Schwager, M.

In Proceedings of the 58th IEEE International Conference on Decision and Control (CDC), pages: 1315-1322, 58th IEEE International Conference on Decision and Control (CDC), December 2019 (conference)

ics

PDF [BibTex]


On the Transfer of Inductive Bias from Simulation to the Real World: a New Disentanglement Dataset

Gondal, M. W., Wuthrich, M., Miladinovic, D., Locatello, F., Breidt, M., Volchkov, V., Akpo, J., Bachem, O., Schölkopf, B., Bauer, S.

Advances in Neural Information Processing Systems 32, pages: 15714-15725, (Editors: H. Wallach and H. Larochelle and A. Beygelzimer and F. d’Alché-Buc and E. Fox and R. Garnett), Curran Associates, Inc., 33rd Annual Conference on Neural Information Processing Systems, December 2019 (conference)

am ei sf

link (url) [BibTex]


Learning to Explore in Motion and Interaction Tasks

Bogdanovic, M., Righetti, L.

Proceedings 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages: 2686-2692, IEEE, 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), November 2019, ISSN: 2153-0866 (conference)

Abstract
Model-free reinforcement learning suffers from the high sampling complexity inherent to robotic manipulation or locomotion tasks. Most successful approaches typically use random sampling strategies, which leads to slow policy convergence. In this paper, we present a novel approach for efficient exploration that leverages previously learned tasks. We exploit the fact that the same system is used across many tasks and build a generative model for exploration based on data from previously solved tasks to improve learning new tasks. The approach also enables continuous learning of improved exploration strategies as novel tasks are learned. Extensive simulations on a robot manipulator performing a variety of motion and contact interaction tasks demonstrate the capabilities of the approach. In particular, our experiments suggest that the exploration strategy can more than double learning speed, especially when rewards are sparse. Moreover, the algorithm is robust to task variations and parameter tuning, making it beneficial for complex robotic problems.

mg

DOI [BibTex]


A Learnable Safety Measure

Heim, S., Rohr, A. V., Trimpe, S., Badri-Spröwitz, A.

Conference on Robot Learning, November 2019 (conference) Accepted

dlg ics

Arxiv [BibTex]


Robust Humanoid Locomotion Using Trajectory Optimization and Sample-Efficient Learning

Yeganegi, M. H., Khadiv, M., Moosavian, S. A. A., Zhu, J., Prete, A. D., Righetti, L.

Proceedings International Conference on Humanoid Robots, IEEE, 2019 IEEE-RAS International Conference on Humanoid Robots, October 2019 (conference)

Abstract
Trajectory optimization (TO) is one of the most powerful tools for generating feasible motions for humanoid robots. However, including uncertainties and stochasticity in the TO problem to generate robust motions can easily lead to intractable problems. Furthermore, since the models used in TO always have some level of abstraction, it can be hard to find a realistic set of uncertainties in the model space. In this paper, we leverage a sample-efficient learning technique (Bayesian optimization) to robustify TO for humanoid locomotion. The main idea is to use data from full-body simulations to make the TO stage robust by tuning the cost weights. To this end, we split the TO problem into two phases. The first phase solves a convex optimization problem for generating center of mass (CoM) trajectories based on simplified linear dynamics. The second phase employs iterative Linear-Quadratic Gaussian (iLQG) as a whole-body controller to generate full-body control inputs. Then we use Bayesian optimization to find the cost weights for the first phase that yield robust performance in simulation or experiment in the presence of different disturbances and uncertainties. The results show that the proposed approach is able to generate robust motions for different sets of disturbances and uncertainties.

mg

https://arxiv.org/abs/1907.04616 link (url) [BibTex]


How do people learn how to plan?

Jain, Y. R., Gupta, S., Rakesh, V., Dayan, P., Callaway, F., Lieder, F.

Conference on Cognitive Computational Neuroscience, September 2019 (conference)

Abstract
How does the brain learn how to plan? We reverse-engineer people's underlying learning mechanisms by combining rational process models of cognitive plasticity with recently developed empirical methods that allow us to trace the temporal evolution of people's planning strategies. We find that our Learned Value of Computation model (LVOC) accurately captures people's average learning curve. However, there were also substantial individual differences in metacognitive learning that are best understood in terms of multiple different learning mechanisms, including strategy selection learning. Furthermore, we observed that LVOC could not fully capture people's ability to adaptively decide when to stop planning. We successfully extended the LVOC model to address these discrepancies. Our models broadly capture people's ability to improve their decision mechanisms and represent a significant step towards reverse-engineering how the brain learns increasingly effective cognitive strategies through its interaction with the environment.

re

How do people learn to plan? [BibTex]


Testing Computational Models of Goal Pursuit

Mohnert, F., Tosic, M., Lieder, F.

CCN2019, September 2019 (conference)

Abstract
Goals are essential to human cognition and behavior. But how do we pursue them? To address this question, we model how capacity limits on planning and attention shape the computational mechanisms of human goal pursuit. We test the predictions of a simple model based on previous theories in a behavioral experiment. The results show that to fully capture how people pursue their goals, it is critical to account for people’s limited attention in addition to their limited planning. Our findings elucidate the cognitive constraints that shape human goal pursuit and point to an improved model of human goal pursuit that can reliably predict which goals a person will achieve and which goals they will struggle to pursue effectively.

re

link (url) DOI Project Page [BibTex]


Predictive Triggering for Distributed Control of Resource Constrained Multi-agent Systems

Mastrangelo, J. M., Baumann, D., Trimpe, S.

In Proceedings of the 8th IFAC Workshop on Distributed Estimation and Control in Networked Systems, pages: 79-84, 8th IFAC Workshop on Distributed Estimation and Control in Networked Systems (NecSys), September 2019 (inproceedings)

ics

arXiv PDF DOI [BibTex]


Measuring How People Learn How to Plan

Jain, Y. R., Callaway, F., Lieder, F.

Proceedings 41st Annual Meeting of the Cognitive Science Society, pages: 1956-1962, CogSci2019, 41st Annual Meeting of the Cognitive Science Society, July 2019 (conference)

Abstract
The human mind has an unparalleled ability to acquire complex cognitive skills, discover new strategies, and refine its ways of thinking and decision-making; these phenomena are collectively known as cognitive plasticity. One important manifestation of cognitive plasticity is learning to make better – more far-sighted – decisions via planning. A serious obstacle to studying how people learn how to plan is that cognitive plasticity is even more difficult to observe than cognitive strategies are. To address this problem, we develop a computational microscope for measuring cognitive plasticity and validate it on simulated and empirical data. Our approach employs a process tracing paradigm recording signatures of human planning and how they change over time. We then invert a generative model of the recorded changes to infer the underlying cognitive plasticity. Our computational microscope measures cognitive plasticity significantly more accurately than simpler approaches, and it correctly detected the effect of an external manipulation known to promote cognitive plasticity. We illustrate how computational microscopes can be used to gain new insights into the time course of metacognitive learning and to test theories of cognitive development and hypotheses about the nature of cognitive plasticity. Future work will leverage our computational microscope to reverse-engineer the learning mechanisms enabling people to acquire complex cognitive skills such as planning and problem solving.

re

link (url) Project Page [BibTex]


Extending Rationality

Pothos, E. M., Busemeyer, J. R., Pleskac, T., Yearsley, J. M., Tenenbaum, J. B., Goodman, N. D., Tessler, M. H., Griffiths, T. L., Lieder, F., Hertwig, R., Pachur, T., Leuker, C., Shiffrin, R. M.

Proceedings of the 41st Annual Conference of the Cognitive Science Society, pages: 39-40, CogSci 2019, July 2019 (conference)

re

Proceedings of the 41st Annual Conference of the Cognitive Science Society [BibTex]


How should we incentivize learning? An optimal feedback mechanism for educational games and online courses

Xu, L., Wirzberger, M., Lieder, F.

41st Annual Meeting of the Cognitive Science Society, July 2019 (conference)

Abstract
Online courses offer much-needed opportunities for lifelong self-directed learning, but people rarely follow through on their noble intentions to complete them. To increase student retention, educational software often uses game elements to motivate students to engage in and persist in learning activities. However, gamification only works when it is done properly, and there is currently no principled method that educational software could use to achieve this. We develop a principled feedback mechanism for encouraging good study choices and persistence in self-directed learning environments. Rather than giving performance feedback, our method rewards the learner's efforts with optimal brain points that convey the value of practice. To derive these optimal brain points, we applied the theory of optimal gamification to a mathematical model of skill acquisition. In contrast to hand-designed incentive structures, optimal brain points are constructed in such a way that the incentive system cannot be gamed. Evaluating our method in a behavioral experiment, we find that optimal brain points significantly increased the proportion of participants who, instead of exploiting an inefficient skill they already knew, attempted to learn a difficult but more efficient skill, persisted through failure, and succeeded in mastering the new skill. Our method provides a principled approach to designing incentive structures and feedback mechanisms for educational games and online courses. We are optimistic that optimal brain points will prove useful for increasing student retention and helping people overcome the motivational obstacles that stand in the way of self-directed lifelong learning.

re

link (url) Project Page [BibTex]


What’s in the Adaptive Toolbox and How Do People Choose From It? Rational Models of Strategy Selection in Risky Choice

Mohnert, F., Pachur, T., Lieder, F.

41st Annual Meeting of the Cognitive Science Society, July 2019 (conference)

Abstract
Although process data indicates that people often rely on various (often heuristic) strategies to choose between risky options, our models of heuristics cannot predict people's choices very accurately. To address this challenge, it has been proposed that people adaptively choose from a toolbox of simple strategies. But which strategies are contained in this toolbox? And how do people decide when to use which decision strategy? Here, we develop a model according to which each person selects decision strategies rationally from their personal toolbox; our model allows one to infer which strategies are contained in the cognitive toolbox of an individual decision-maker and specifies when she will use which strategy. Using cross-validation on an empirical data set, we find that this rational model of strategy selection from a personal adaptive toolbox predicts people's choices better than any single strategy (even when it is allowed to vary across participants) and better than previously proposed toolbox models. Our model comparisons show that both inferring the toolbox and rational strategy selection are critical for accurately predicting people's risky choices. Furthermore, our model-based data analysis reveals considerable individual differences in the set of strategies people are equipped with and how they choose among them; these individual differences could partly explain why some people make better choices than others. These findings represent an important step towards a complete formalization of the notion that people select their cognitive strategies from a personal adaptive toolbox.

re

link (url) [BibTex]


Measuring How People Learn How to Plan

Jain, Y. R., Callaway, F., Lieder, F.

pages: 357-361, RLDM 2019, July 2019 (conference)

Abstract
The human mind has an unparalleled ability to acquire complex cognitive skills, discover new strategies, and refine its ways of thinking and decision-making; these phenomena are collectively known as cognitive plasticity. One important manifestation of cognitive plasticity is learning to make better – more far-sighted – decisions via planning. A serious obstacle to studying how people learn how to plan is that cognitive plasticity is even more difficult to observe than cognitive strategies are. To address this problem, we develop a computational microscope for measuring cognitive plasticity and validate it on simulated and empirical data. Our approach employs a process tracing paradigm recording signatures of human planning and how they change over time. We then invert a generative model of the recorded changes to infer the underlying cognitive plasticity. Our computational microscope measures cognitive plasticity significantly more accurately than simpler approaches, and it correctly detected the effect of an external manipulation known to promote cognitive plasticity. We illustrate how computational microscopes can be used to gain new insights into the time course of metacognitive learning and to test theories of cognitive development and hypotheses about the nature of cognitive plasticity. Future work will leverage our computational microscope to reverse-engineer the learning mechanisms enabling people to acquire complex cognitive skills such as planning and problem solving.

re

link (url) [BibTex]


Event-triggered Pulse Control with Model Learning (if Necessary)

Baumann, D., Solowjow, F., Johansson, K. H., Trimpe, S.

In Proceedings of the American Control Conference, pages: 792-797, American Control Conference (ACC), July 2019 (inproceedings)

ics

arXiv PDF Project Page [BibTex]


A Cognitive Tutor for Helping People Overcome Present Bias

Lieder, F., Callaway, F., Jain, Y. R., Krueger, P. M., Das, P., Gul, S., Griffiths, T. L.

RLDM 2019, July 2019, Falk Lieder and Frederick Callaway contributed equally to this publication. (conference)

Abstract
People's reliance on suboptimal heuristics gives rise to a plethora of cognitive biases in decision-making including the present bias, which denotes people's tendency to be overly swayed by an action's immediate costs/benefits rather than its more important long-term consequences. One approach to helping people overcome such biases is to teach them better decision strategies. But which strategies should we teach them? And how can we teach them effectively? Here, we leverage an automatic method for discovering rational heuristics and insights into how people acquire cognitive skills to develop an intelligent tutor that teaches people how to make better decisions. As a proof of concept, we derive the optimal planning strategy for a simple model of situations where people fall prey to the present bias. Our cognitive tutor teaches people this optimal planning strategy by giving them metacognitive feedback on how they plan in a 3-step sequential decision-making task. Our tutor's feedback is designed to maximally accelerate people's metacognitive reinforcement learning towards the optimal planning strategy. A series of four experiments confirmed that training with the cognitive tutor significantly reduced present bias and improved people's decision-making competency: Experiment 1 demonstrated that the cognitive tutor's feedback can help participants discover far-sighted planning strategies. Experiment 2 found that this training effect transfers to more complex environments. Experiment 3 found that these transfer effects are retained for at least 24 hours after the training. Finally, Experiment 4 found that practicing with the cognitive tutor can have additional benefits over being told the strategy in words. The results suggest that promoting metacognitive reinforcement learning with optimal feedback is a promising approach to improving the human mind.

re

DOI [BibTex]


Data-driven inference of passivity properties via Gaussian process optimization

Romer, A., Trimpe, S., Allgöwer, F.

In Proceedings of the European Control Conference, European Control Conference (ECC), June 2019 (inproceedings)

ics

PDF [BibTex]


Introducing the Decision Advisor: A simple online tool that helps people overcome cognitive biases and experience less regret in real-life decisions

Iwama, G., Greenberg, S., Moore, D., Lieder, F.

40th Annual Meeting of the Society for Judgement and Decision Making, June 2019 (conference)

Abstract
Cognitive biases shape many decisions people come to regret. To help people overcome these biases, ClearerThinking.org developed a free online tool, called the Decision Advisor (https://programs.clearerthinking.org/decisionmaker.html). The Decision Advisor assists people in big real-life decisions by prompting them to generate more alternatives, guiding them to evaluate their alternatives according to principles of decision analysis, and educating them about pertinent biases while they are making their decision. In a within-subjects experiment, 99 participants reported significantly fewer biases and less regret for a decision supported by the Decision Advisor than for a previous unassisted decision.

re

DOI [BibTex]


Trajectory-Based Off-Policy Deep Reinforcement Learning

Doerr, A., Volpp, M., Toussaint, M., Trimpe, S., Daniel, C.

In Proceedings of the International Conference on Machine Learning (ICML), International Conference on Machine Learning (ICML), June 2019 (inproceedings)

Abstract
Policy gradient methods are powerful reinforcement learning algorithms and have been demonstrated to solve many complex tasks. However, these methods are also data-inefficient, afflicted with high variance gradient estimates, and frequently get stuck in local optima. This work addresses these weaknesses by combining recent improvements in the reuse of off-policy data and exploration in parameter space with deterministic behavioral policies. The resulting objective is amenable to standard neural network optimization strategies like stochastic gradient descent or stochastic gradient Hamiltonian Monte Carlo. Incorporation of previous rollouts via importance sampling greatly improves data-efficiency, whilst stochastic optimization schemes facilitate the escape from local optima. We evaluate the proposed approach on a series of continuous control benchmark tasks. The results show that the proposed algorithm is able to successfully and reliably learn solutions using fewer system interactions than standard policy gradient methods.

ics

arXiv PDF [BibTex]


The Goal Characteristics (GC) questionnaire: A comprehensive measure for goals’ content, attainability, interestingness, and usefulness

Iwama, G., Wirzberger, M., Lieder, F.

40th Annual Meeting of the Society for Judgement and Decision Making, June 2019 (conference)

Abstract
Many studies have investigated how goal characteristics affect goal achievement. However, most of them considered only a small number of characteristics, and the psychometric properties of their measures remain unclear. To overcome these limitations, we developed and validated a comprehensive questionnaire of goal characteristics with four subscales measuring the goal’s content, attainability, interestingness, and usefulness, respectively. 590 participants completed the questionnaire online. A confirmatory factor analysis supported the four subscales and their structure. The GC questionnaire (https://osf.io/qfhup) can be easily applied to investigate goal setting, pursuit, and adjustment in a wide range of contexts.

re

DOI [BibTex]


Accurate Vision-based Manipulation through Contact Reasoning

Kloss, A., Bauza, M., Wu, J., Tenenbaum, J. B., Rodriguez, A., Bohg, J.

In International Conference on Robotics and Automation, May 2019 (inproceedings) Accepted

Abstract
Planning contact interactions is one of the core challenges of many robotic tasks. Optimizing contact locations while taking dynamics into account is computationally costly, and in only partially observed environments, executing contact-based tasks often suffers from low accuracy. We present an approach that addresses these two challenges for the problem of vision-based manipulation. First, we propose to disentangle contact from motion optimization. Thereby, we improve planning efficiency by focusing computation on promising contact locations. Second, we use a hybrid approach for perception and state estimation that combines neural networks with a physically meaningful state representation. In simulation and real-world experiments on the task of planar pushing, we show that our method is more efficient and achieves a higher manipulation accuracy than previous vision-based approaches.

am

Video link (url) [BibTex]


Efficient Humanoid Contact Planning using Learned Centroidal Dynamics Prediction

Lin, Y., Ponton, B., Righetti, L., Berenson, D.

International Conference on Robotics and Automation (ICRA), pages: 5280-5286, IEEE, May 2019 (conference)

mg

DOI [BibTex]


Learning Latent Space Dynamics for Tactile Servoing

Sutanto, G., Ratliff, N., Sundaralingam, B., Chebotar, Y., Su, Z., Handa, A., Fox, D.

In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) 2019, IEEE, International Conference on Robotics and Automation, May 2019 (inproceedings) Accepted

am

pdf video [BibTex]


Leveraging Contact Forces for Learning to Grasp

Merzic, H., Bogdanovic, M., Kappler, D., Righetti, L., Bohg, J.

In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) 2019, IEEE, International Conference on Robotics and Automation, May 2019 (inproceedings)

Abstract
Grasping objects under uncertainty remains an open problem in robotics research. This uncertainty is often due to noisy or partial observations of the object pose or shape. To enable a robot to react appropriately to unforeseen effects, it is crucial that it continuously takes sensor feedback into account. While visual feedback is important for inferring a grasp pose and reaching for an object, contact feedback offers valuable information during manipulation and grasp acquisition. In this paper, we use model-free deep reinforcement learning to synthesize control policies that exploit contact sensing to generate robust grasping under uncertainty. We demonstrate our approach on a multi-fingered hand that exhibits more complex finger coordination than the commonly used two-fingered grippers. We conduct extensive experiments in order to assess the performance of the learned policies, with and without contact sensing. While it is possible to learn grasping policies without contact sensing, our results suggest that contact feedback allows for a significant improvement of grasping robustness under object pose uncertainty and for objects with a complex shape.

am mg

video arXiv [BibTex]


Feedback Control Goes Wireless: Guaranteed Stability over Low-power Multi-hop Networks

(Best Paper Award)

Mager, F., Baumann, D., Jacob, R., Thiele, L., Trimpe, S., Zimmerling, M.

In Proceedings of the 10th ACM/IEEE International Conference on Cyber-Physical Systems, pages: 97-108, 10th ACM/IEEE International Conference on Cyber-Physical Systems, April 2019 (inproceedings)

Abstract
Closing feedback loops fast and over long distances is key to emerging applications; for example, robot motion control and swarm coordination require update intervals below 100 ms. Low-power wireless is preferred for its flexibility, low cost, and small form factor, especially if the devices support multi-hop communication. Thus far, however, closed-loop control over multi-hop low-power wireless has only been demonstrated for update intervals on the order of multiple seconds. This paper presents a wireless embedded system that tames imperfections impairing control performance such as jitter or packet loss, and a control design that exploits the essential properties of this system to provably guarantee closed-loop stability for linear dynamic systems. Using experiments on a testbed with multiple cart-pole systems, we are the first to demonstrate the feasibility and to assess the performance of closed-loop control and coordination over multi-hop low-power wireless for update intervals from 20 ms to 50 ms.

ics

arXiv PDF DOI Project Page [BibTex]


Remediating Cognitive Decline with Cognitive Tutors

Das, P., Callaway, F., Griffiths, T. L., Lieder, F.

RLDM 2019, 2019 (conference)

Abstract
As people age, their cognitive abilities tend to deteriorate, including their ability to make complex plans. To remediate this cognitive decline, many commercial brain training programs target basic cognitive capacities, such as working memory. We have recently developed an alternative approach: intelligent tutors that teach people cognitive strategies for making the best possible use of their limited cognitive resources. Here, we apply this approach to improve older adults' planning skills. In a process-tracing experiment we found that the decline in planning performance may be partly because older adults use less effective planning strategies. We also found that, with practice, both older and younger adults learned more effective planning strategies from experience. But despite these gains there was still room for improvement, especially for older people. In a second experiment, we let older and younger adults train their planning skills with an intelligent cognitive tutor that teaches optimal planning strategies via metacognitive feedback. We found that practicing planning with this intelligent tutor allowed older adults to catch up to their younger counterparts. These findings suggest that intelligent tutors that teach clever cognitive strategies can help aging decision-makers stay sharp.

re

DOI [BibTex]

2013


Probabilistic Object Tracking Using a Range Camera

Wüthrich, M., Pastor, P., Kalakrishnan, M., Bohg, J., Schaal, S.

In IEEE/RSJ International Conference on Intelligent Robots and Systems, pages: 3195-3202, IEEE, November 2013 (inproceedings)

Abstract
We address the problem of tracking the 6-DoF pose of an object while it is being manipulated by a human or a robot. We use a dynamic Bayesian network to perform inference and compute a posterior distribution over the current object pose. Depending on whether a robot or a human manipulates the object, we employ a process model with or without knowledge of control inputs. Observations are obtained from a range camera. As opposed to previous object tracking methods, we explicitly model self-occlusions and occlusions from the environment, e.g., the human or robotic hand. This leads to a strongly non-linear observation model and additional dependencies in the Bayesian network. We employ a Rao-Blackwellised particle filter to compute an estimate of the object pose at every time step. In a set of experiments, we demonstrate the ability of our method to accurately and robustly track the object pose in real-time while it is being manipulated by a human or a robot.

am

arXiv Video Code Video DOI Project Page [BibTex]


Hypothesis Testing Framework for Active Object Detection

Sankaran, B., Atanasov, N., Le Ny, J., Koletschka, T., Pappas, G., Daniilidis, K.

In IEEE International Conference on Robotics and Automation (ICRA), May 2013, clmc (inproceedings)

Abstract
One of the central problems in computer vision is the detection of semantically important objects and the estimation of their pose. Most of the work in object detection has been based on single image processing and its performance is limited by occlusions and ambiguity in appearance and geometry. This paper proposes an active approach to object detection by controlling the point of view of a mobile depth camera. When an initial static detection phase identifies an object of interest, several hypotheses are made about its class and orientation. The sensor then plans a sequence of view-points, which balances the amount of energy used to move with the chance of identifying the correct hypothesis. We formulate an active M-ary hypothesis testing problem, which includes sensor mobility, and solve it using a point-based approximate POMDP algorithm. The validity of our approach is verified through simulation and experiments with real scenes captured by a Kinect sensor. The results suggest a significant improvement over static object detection.

am

pdf [BibTex]


Action and Goal Related Decision Variables Modulate the Competition Between Multiple Potential Targets

Enachescu, V., Christopoulos, V. N., Schrater, P. R., Schaal, S.

In Abstracts of Neural Control of Movement Conference (NCM 2013), February 2013 (inproceedings)

am

[BibTex]


The functional role of automatic body response in shaping voluntary actions based on muscle synergy theory

Alnajjar, F. S., Berenz, V., Shimoda, S.

In Neural Engineering (NER), 2013 6th International IEEE/EMBS Conference on, pages: 1230-1233, 2013 (inproceedings)

am

DOI [BibTex]


Coaching robots with biosignals based on human affective social behaviors

Suzuki, K., Gruebler, A., Berenz, V.

In ACM/IEEE International Conference on Human-Robot Interaction, HRI 2013, Tokyo, Japan, March 3-6, 2013, pages: 419-420, 2013 (inproceedings)

am

link (url) [BibTex]


Fusing visual and tactile sensing for 3-D object reconstruction while grasping

Ilonen, J., Bohg, J., Kyrki, V.

In IEEE International Conference on Robotics and Automation (ICRA), pages: 3547-3554, 2013 (inproceedings)

Abstract
In this work, we propose to reconstruct a complete 3-D model of an unknown object by fusion of visual and tactile information while the object is grasped. Assuming the object is symmetric, a first hypothesis of its complete 3-D shape is generated from a single view. This initial model is used to plan a grasp on the object which is then executed with a robotic manipulator equipped with tactile sensors. Given the detected contacts between the fingers and the object, the full object model including the symmetry parameters can be refined. This refined model will then allow the planning of more complex manipulation tasks. The main contribution of this work is an optimal estimation approach for the fusion of visual and tactile data applying the constraint of object symmetry. The fusion is formulated as a state estimation problem and solved with an iterative extended Kalman filter. The approach is validated experimentally using both artificial and real data from two different robotic platforms.

am

DOI Project Page [BibTex]


AGILITY – Dynamic Full Body Locomotion and Manipulation with Autonomous Legged Robots

Hutter, M., Bloesch, M., Buchli, J., Semini, C., Bazeille, S., Righetti, L., Bohg, J.

In 2013 IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR), pages: 1-4, IEEE, Linköping, Sweden, 2013 (inproceedings)

mg

link (url) DOI [BibTex]


Controllability and Resource-Rational Planning

Lieder, F., Goodman, N. D., Huys, Q. J.

In Computational and Systems Neuroscience (Cosyne), pages: 112, 2013 (inproceedings)

Abstract
Learned helplessness experiments involving controllable vs. uncontrollable stressors have shown that the perceived ability to control events has profound consequences for decision making. Normative models of decision making, however, do not naturally incorporate knowledge about controllability, and previous approaches to incorporating it have led to solutions with biologically implausible computational demands [1,2]. Intuitively, controllability bounds the differential rewards for choosing one strategy over another, and therefore believing that the environment is uncontrollable should reduce one’s willingness to invest time and effort into choosing between options. Here, we offer a normative, resource-rational account of the role of controllability in trading mental effort for expected gain. In this view, the brain not only faces the task of solving Markov decision problems (MDPs), but it also has to optimally allocate its finite computational resources to solve them efficiently. This joint problem can itself be cast as an MDP [3], and its optimal solution respects computational constraints by design. We start with an analytic characterisation of the influence of controllability on the use of computational resources. We then replicate previous results on the effects of controllability on the differential value of exploration vs. exploitation, showing that these are also seen in a cognitively plausible regime of computational complexity. Third, we find that controllability makes computation valuable, so that it is worth investing more mental effort the higher the subjective controllability. Fourth, we show that in this model the perceived lack of control (helplessness) replicates empirical findings [4] whereby patients with major depressive disorder are less likely to repeat a choice that led to a reward, or to avoid a choice that led to a loss. Finally, the model makes empirically testable predictions about the relationship between reaction time and helplessness.

re

[BibTex]


Learned helplessness and generalization

Lieder, F., Goodman, N. D., Huys, Q. J. M.

In 35th Annual Conference of the Cognitive Science Society, 2013 (inproceedings)

re

[BibTex]


Learning Objective Functions for Manipulation

Kalakrishnan, M., Pastor, P., Righetti, L., Schaal, S.

In 2013 IEEE International Conference on Robotics and Automation, IEEE, Karlsruhe, Germany, 2013 (inproceedings)

Abstract
We present an approach to learning objective functions for robotic manipulation based on inverse reinforcement learning. Our path integral inverse reinforcement learning algorithm can deal with high-dimensional continuous state-action spaces, and only requires local optimality of demonstrated trajectories. We use L1 regularization in order to achieve feature selection, and propose an efficient algorithm to minimize the resulting convex objective function. We demonstrate our approach by applying it to two core problems in robotic manipulation. First, we learn a cost function for redundancy resolution in inverse kinematics. Second, we use our method to learn a cost function over trajectories, which is then used in optimization-based motion planning for grasping and manipulation tasks. Experimental results show that our method outperforms previous algorithms in high-dimensional settings.

am mg

link (url) DOI [BibTex]


Reverse-Engineering Resource-Efficient Algorithms

Lieder, F., Goodman, N. D., Griffiths, T. L.

In NIPS Workshop Resource-Efficient Machine Learning, 2013 (inproceedings)

re

[BibTex]


Learning Task Error Models for Manipulation

Pastor, P., Kalakrishnan, M., Binney, J., Kelly, J., Righetti, L., Sukhatme, G. S., Schaal, S.

In 2013 IEEE Conference on Robotics and Automation, IEEE, Karlsruhe, Germany, 2013 (inproceedings)

Abstract
Precise kinematic forward models are important for robots to successfully perform dexterous grasping and manipulation tasks, especially when visual servoing is rendered infeasible due to occlusions. A lot of research has been conducted to estimate geometric and non-geometric parameters of kinematic chains to minimize reconstruction errors. However, kinematic chains can include non-linearities, e.g. due to cable stretch and motor-side encoders, that result in significantly different errors for different parts of the state space. Previous work either does not consider such non-linearities or proposes to estimate non-geometric parameters of carefully engineered models that are robot specific. We propose a data-driven approach that learns task error models that account for such unmodeled non-linearities. We argue that in the context of grasping and manipulation, it is sufficient to achieve high accuracy in the task relevant state space. We identify this relevant state space using previously executed joint configurations and learn error corrections for those. Therefore, our system is developed to generate subsequent executions that are similar to previous ones. The experiments show that our method successfully captures the non-linearities in the head kinematic chain (due to a counterbalancing spring) and the arm kinematic chains (due to cable stretch) of the considered experimental platform, see Fig. 1. The feasibility of the presented error learning approach has also been evaluated in independent DARPA ARM-S testing, contributing to the successful completion of 67 out of 72 grasping and manipulation tasks.

am mg

link (url) DOI [BibTex]

2011


Mind the gap - robotic grasping under incomplete observation

Bohg, J., Johnson-Roberson, M., Leon, B., Felip, J., Gratal, X., Bergstrom, N., Kragic, D., Morales, A.

In Robotics and Automation (ICRA), 2011 IEEE International Conference on, pages: 686-693, May 2011 (inproceedings)

Abstract
We consider the problem of grasp and manipulation planning when the state of the world is only partially observable. Specifically, we address the task of picking up unknown objects from a table top. The proposed approach to object shape prediction aims at closing the knowledge gaps in the robot's understanding of the world. A completed state estimate of the environment can then be provided to a simulator in which stable grasps and collision-free movements are planned. The proposed approach is based on the observation that many objects commonly in use in a service robotic scenario possess symmetries. We search for the optimal parameters of these symmetries given visibility constraints. Once found, the point cloud is completed and a surface mesh reconstructed. Quantitative experiments show that the predictions are valid approximations of the real object shape. By demonstrating the approach on two very different robotic platforms its generality is emphasized.

am

pdf video code data DOI Project Page [BibTex]


STOMP: Stochastic trajectory optimization for motion planning

Kalakrishnan, M., Chitta, S., Theodorou, E., Pastor, P., Schaal, S.

In IEEE International Conference on Robotics and Automation (ICRA), Shanghai, China, May 9-13, 2011, clmc (inproceedings)

Abstract
We present a new approach to motion planning using a stochastic trajectory optimization framework. The approach relies on generating noisy trajectories to explore the space around an initial (possibly infeasible) trajectory, which are then combined to produce an updated trajectory with lower cost. A cost function based on a combination of obstacle and smoothness cost is optimized in each iteration. No gradient information is required for the particular optimization algorithm that we use and so general costs for which derivatives may not be available (e.g. costs corresponding to constraints and motor torques) can be included in the cost function. We demonstrate the approach both in simulation and on a dual-arm mobile manipulation system for unconstrained and constrained tasks. We experimentally show that the stochastic nature of STOMP allows it to overcome local minima that gradient-based optimizers like CHOMP can get stuck in.

am

link (url) Project Page [BibTex]


Development of a Low-Pressure Fluidic Servo-Valve for Wearable Haptic Interfaces and Lightweight Robotic Systems

Folgheraiter, M., Jordan, M., Benitez, L. M. V., Grimminger, F., Schmidt, S., Albiez, J., Kirchner, F.

In Informatics in Control, Automation and Robotics, pages: 239-252, Springer Berlin Heidelberg, Berlin, Heidelberg, 2011 (inproceedings)

Abstract
This document presents a low-pressure servo-valve specifically designed for haptic interfaces and lightweight robotic applications. The device is able to work with hydraulic and pneumatic fluidic sources, operating within a pressure range of (0 − 50 · 10^5 Pa). All sensors and electronics were integrated inside the body of the valve, reducing the need for external circuits. Positioning repeatability as well as the capability to finely modulate the hydraulic flow were measured and verified. Furthermore, the static and dynamic behavior of the valve were evaluated for different working conditions, and a non-linear model was identified using a recursive Hammerstein-Wiener parameter adaptation algorithm.

am

DOI [BibTex]


An Experimental Demonstration of a Distributed and Event-based State Estimation Algorithm

(Best Interactive Paper Award (top out of 450))

Trimpe, S., D’Andrea, R.

In Proceedings of the 18th IFAC World Congress, 2011 (inproceedings)

am ics

PDF DOI [BibTex]


Path Integral Control and Bounded Rationality

Braun, D. A., Ortega, P. A., Theodorou, E., Schaal, S.

In IEEE Symposium on Adaptive Dynamic Programming And Reinforcement Learning (ADPRL), 2011, clmc (inproceedings)

Abstract
Path integral methods [7], [15], [1] have recently been shown to be applicable to a very general class of optimal control problems. Here we examine the path integral formalism from a decision-theoretic point of view, since an optimal controller can always be regarded as an instance of a perfectly rational decision-maker that chooses its actions so as to maximize its expected utility [8]. The problem with perfect rationality is, however, that finding optimal actions is often very difficult due to prohibitive computational resource costs that are not taken into account. In contrast, a bounded rational decision-maker has only limited resources and therefore needs to strike some compromise between the desired utility and the required resource costs [14]. In particular, we suggest an information-theoretic measure of resource costs that can be derived axiomatically [11]. As a consequence we obtain a variational principle for choice probabilities that trades off maximizing a given utility criterion and avoiding resource costs that arise due to deviating from initially given default choice probabilities. The resulting bounded rational policies are in general probabilistic. We show that the solutions found by the path integral formalism are such bounded rational policies. Furthermore, we show that the same formalism generalizes to discrete control problems, leading to linearly solvable bounded rational control policies in the case of Markov systems. Importantly, Bellman's optimality principle is not presupposed by this variational principle, but it can be derived as a limit case. This suggests that the information-theoretic formalization of bounded rationality might serve as a general principle in control design that unifies a number of recently reported approximate optimal control methods both in the continuous and discrete domain.

am

PDF [BibTex]


Skill learning and task outcome prediction for manipulation

Pastor, P., Kalakrishnan, M., Chitta, S., Theodorou, E., Schaal, S.

In IEEE International Conference on Robotics and Automation (ICRA), Shanghai, China, May 9-13, 2011, clmc (inproceedings)

Abstract
Learning complex motor skills for real world tasks is a hard problem in robotic manipulation that often requires painstaking manual tuning and design by a human expert. In this work, we present a Reinforcement Learning based approach to acquiring new motor skills from demonstration. Our approach allows the robot to learn fine manipulation skills and significantly improve its success rate and skill level starting from a possibly coarse demonstration. Our approach aims to incorporate task domain knowledge, where appropriate, by working in a space consistent with the constraints of a specific task. In addition, we also present an approach to using sensor feedback to learn a predictive model of the task outcome. This allows our system to learn the proprioceptive sensor feedback needed to monitor subsequent executions of the task online and abort execution in the event of predicted failure. We illustrate our approach using two example tasks executed with the PR2 dual-arm robot: a straight and accurate pool stroke and a box flipping task using two chopsticks as tools.

am

link (url) Project Page Project Page [BibTex]


An Iterative Path Integral Stochastic Optimal Control Approach for Learning Robotic Tasks

Theodorou, E., Stulp, F., Buchli, J., Schaal, S.

In Proceedings of the 18th World Congress of the International Federation of Automatic Control, 2011, clmc (inproceedings)

Abstract
Recent work on path integral stochastic optimal control theory (Theodorou et al., 2010a; Theodorou, 2011) has shown promising results in planning and control of nonlinear systems in high dimensional state spaces. The path integral control framework relies on the transformation of the nonlinear Hamilton Jacobi Bellman (HJB) partial differential equation (PDE) into a linear PDE and the approximation of its solution via the use of the Feynman Kac lemma. In this work, we review the generalized version of the path integral stochastic optimal control formalism (Theodorou et al., 2010a), used for optimal control and planning of stochastic dynamical systems with state dependent control and diffusion matrices. Moreover, we present the iterative path integral control approach, the so-called Policy Improvement with Path Integrals (PI2), which is capable of scaling to high dimensional robotic control problems. Furthermore, we present a convergence analysis of the proposed algorithm and we apply the proposed framework to a variety of robotic tasks. Finally, with the goal of performing locomotion, the iterative path integral control is applied to learning nonlinear limit cycle attractors with adjustable landscape.

am

PDF [BibTex]


Enhanced visual scene understanding through human-robot dialog

Johnson-Roberson, M., Bohg, J., Skantze, G., Gustafson, J., Carlson, R., Rasolzadeh, B., Kragic, D.

In Intelligent Robots and Systems (IROS), 2011 IEEE/RSJ International Conference on, pages: 3342-3348, 2011 (inproceedings)

Abstract
We propose a novel human-robot-interaction framework for robust visual scene understanding. Without any a priori knowledge about the objects, the task of the robot is to correctly enumerate how many of them are in the scene and segment them from the background. Our approach builds on top of state-of-the-art computer vision methods, generating object hypotheses through segmentation. This process is combined with a natural dialog system, thus including a 'human in the loop' where, by exploiting the natural conversation of an advanced dialog system, the robot gains knowledge about ambiguous situations. We present an entropy-based system allowing the robot to detect the poorest object hypotheses and query the user for arbitration. Based on the information obtained from the human-robot dialog, the scene segmentation can be re-seeded and thereby improved. We present experimental results on real data that show an improved segmentation performance compared to segmentation without interaction.

am

pdf video DOI Project Page [BibTex]


Risk and gain battery management for self-docking mobile robots

Berenz, V., Suzuki, K.

In Robotics and Biomimetics (ROBIO), 2011 IEEE International Conference on, pages: 1766-1771, 2011 (inproceedings)

am

DOI [BibTex]


Reduced Communication State Estimation for Control of an Unstable Networked Control System

Trimpe, S., D’Andrea, R.

In Proceedings of the 50th IEEE Conference on Decision and Control and European Control Conference, 2011 (inproceedings)

am ics

PDF Supplementary material DOI [BibTex]


TDM: A software framework for elegant and rapid development of autonomous behaviors for humanoid robots.

Berenz, V., Tanaka, F., Suzuki, K., Herink, M.

In Humanoids, pages: 179-186, IEEE, 2011 (inproceedings)

am

link (url) [BibTex]


Coaching robot behavior using continuous physiological affective feedback

Gruebler, A., Berenz, V., Suzuki, K.

In 11th IEEE-RAS International Conference on Humanoid Robots (Humanoids 2011), Bled, Slovenia, October 26-28, 2011, pages: 466-471, 2011 (inproceedings)

am

link (url) DOI [BibTex]


Neuromuscular Stochastic Optimal Control of a Tendon Driven Index Finger

Theodorou, E. A., Todorov, E., Valero-Cuevas, F.

In Proceedings of American Control Conference (ACC), 2011, clmc (inproceedings)

Abstract
With the goal of building robotic hands which can reach the levels of dexterity and robustness of the hand, the question of which candidate control principles can handle the nonlinearities, the high dimensionality and the internal noise of biomechanical structures of the complexity of the hand is still open. In this work we present the first stochastic optimal feedback controller applied to a full tendon driven simulated robotic index finger. In our model we take into account the full tendon structure of the index finger, which consists of 11 tendons based on the underlying physiology, and we consider muscle with the typical force-length and force-velocity properties. Our feedback controller shows robustness against noise and perturbation of the dynamics while it can also successfully handle the nonlinearities and high dimensionality of the robotic index finger. Furthermore, as shown in the evaluations, it provides the complete time history of the tendon excursions and the tendon velocities of the index finger for the tasks of tapping with zero and nonzero terminal velocities.

am

PDF [BibTex]