
Research on robot learning by imitation


Here is a description of my research concerning robot programming by demonstration and the autonomous extraction of the task constraints.

Please also visit the LASA website for a more complete description of the research conducted at the Learning Algorithms And Systems Laboratory.


You can also download printable papers in PDF from the publications section.



Robot programming by demonstration



Robot Programming by Demonstration (PbD) covers methods by which a robot learns new skills through human guidance. Also referred to as learning by imitation or apprenticeship learning, PbD takes inspiration from the way humans learn new skills by imitation to develop methods by which new skills can be transmitted to a robot.


PbD covers a broad range of applications. In industrial robotics, the goal is to reduce the time and cost required to program the robot. The rationale is that PbD would make it possible to modify an existing product, create several versions of a similar product, or assemble new products very rapidly, without using a teach pendant or a programming language. This could then be done by lay users without help from a robotics expert.


PbD is perceived as particularly useful for service robots, i.e., robots designed to work in direct collaboration with humans. In this case, methods for PbD go beyond transferring skills and offer new ways for the robot to interact with the human, from recognizing people's motion to predicting their intentions and assisting them in accomplishing complex tasks. As technology improved and provided these robots with increasingly complex hardware, including multiple sensor modalities and numerous degrees of freedom, robot control and especially robot learning became increasingly complex too.


Learning control strategies for platforms with many degrees of freedom that must interact in complex and variable environments, such as households, faces two key challenges. First, the complexity of the tasks to be learned is such that pure trial-and-error learning would be too slow. PbD thus appears to be a good approach to speed up learning by reducing the search space, while still allowing the robot to refine its model of the demonstration through trial and error. Second, there should be a continuum between learning and control, so that control strategies can adapt on the fly to drastic changes in the environment. The present work addresses both challenges by investigating methods in which PbD is used to learn the dynamics of the robot's motion and, by so doing, to provide the robot with a generic and adaptive model of control.



Generalization of a skill through observation



An efficient way to transfer new skills to robots is to provide the robot with the ability to learn through imitation and to generalize the learned skills to different contexts. When observing a human demonstrator performing a gesture, the robot needs to identify which parts of the complete motion are essential for the reproduction of the skill, and which ones may be reproduced differently, e.g., by deviating from the original observed gesture or by using different means to fulfill the task requirements.



Continuous encoding of the task constraints in a probabilistic framework



While several approaches in Robot Programming by Demonstration (PbD) represent a skill as a sequence of discrete events defined a priori by the user, our work adopts a more general perspective in which the skill is encoded continuously (at the trajectory level) within a probabilistic framework. Representing the skill at such a low level can be advantageous when learning skills that are not specified in advance. It also avoids the need to segment the whole motion into only two types of behaviours (the actions that are relevant for the task and the ones that are not).


In our research, we thus consider the most general stance, where different levels of constraints are allowed and can change freely during the skill. Indeed, representing the task constraints in a binary manner (relevant versus irrelevant features) is not appropriate for continuous movements. Different goals require different levels of precision, that is, they can be described with different degrees of invariance. For example, the movement used to drop a sugar cube into a tiny cup of coffee is more constrained than the movement used to drop a bouillon cube into a large pan.


We propose a probabilistic approach based on Gaussian Mixture Regression (GMR) to encode a skill at the trajectory level. We also propose generic inverse kinematics solutions that take into account constraints in both task space and joint space. This approach makes it possible to handle tasks that combine several constraints simultaneously, e.g., manipulating objects that must be handled with specific gestures (learning the objects' affordances and the associated effectivities).
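As a minimal sketch of how a GMR-based encoding can express constraints of varying strength along a trajectory, the example below conditions a two-component Gaussian mixture over (time, position) on time. The function names and the mixture parameters are hypothetical and hand-picked for illustration; in practice they would be fitted to demonstration data, and the variance used here is the moment-matched variance of the conditional mixture, which is one common choice among several.

```python
import numpy as np


def gaussian_pdf(t, mean, var):
    """Density of a 1-D Gaussian evaluated at t."""
    return np.exp(-0.5 * (t - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)


def gmr(t, priors, means, covs):
    """Condition a GMM over (t, x) on t and return E[x | t] and Var[x | t].

    priors: (K,) mixing weights
    means:  (K, 2) component means, ordered as (t, x)
    covs:   (K, 2, 2) component covariances
    """
    # Responsibility of each component for the query time t.
    h = priors * gaussian_pdf(t, means[:, 0], covs[:, 0, 0])
    h = h / h.sum()
    # Conditional mean and variance of x given t for each component.
    cond_mean = means[:, 1] + covs[:, 1, 0] / covs[:, 0, 0] * (t - means[:, 0])
    cond_var = covs[:, 1, 1] - covs[:, 1, 0] * covs[:, 0, 1] / covs[:, 0, 0]
    # Moments of the resulting mixture of conditionals.
    mean = np.sum(h * cond_mean)
    var = np.sum(h * (cond_var + cond_mean ** 2)) - mean ** 2
    return mean, var


# Hypothetical parameters: a loosely constrained early phase (large variance
# in x) followed by a tightly constrained final phase (small variance in x).
priors = np.array([0.5, 0.5])
means = np.array([[0.2, 0.0], [0.8, 1.0]])
covs = np.array([
    [[0.02, 0.0], [0.0, 0.09]],
    [[0.02, 0.0], [0.0, 0.0004]],
])

mean_early, var_early = gmr(0.2, priors, means, covs)
mean_late, var_late = gmr(0.8, priors, means, covs)
print("early phase:", mean_early, var_early)
print("late phase: ", mean_late, var_late)
```

The retrieved variance is much larger in the early phase than in the late phase, which is one way a continuous, non-binary notion of constraint can be read off the model.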



Learning through multiple observations



When demonstrating a skill several times, some aspects of the motion will differ and some aspects will remain similar. For example, when stacking an object on top of another object, the final position of the object is constrained by the size of these objects (if the first object is smaller than the second one, different positions are allowed that still keep the balance). Similarly, the trajectory to reach the second object may allow more variability, but is still constrained by obstacles or by the size of the workspace.


This variability can be discovered by the robot through multiple observations of the skill, exploiting the natural variability of human gestures to extract the task constraints.
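The idea can be illustrated with a small sketch, which is not the method used in our work but a simplified stand-in: given several demonstrations that are assumed to be already time-aligned and resampled to the same length, the spread across demonstrations at each step indicates how strongly that part of the motion is constrained. The synthetic data, the variable names, and the threshold are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_steps = 50
s = np.linspace(0.0, 1.0, n_steps)  # normalized progress along the motion

# Synthetic 1-D demonstrations (assumption): the start position changes
# from one demonstration to the next, the goal position (x = 1) does not.
starts = np.array([-0.4, -0.2, 0.0, 0.2, 0.4])
demos = starts[:, None] * (1.0 - s) + 1.0 * s
demos += 0.005 * rng.standard_normal(demos.shape)  # small sensor-like noise

# Mean trajectory and spread across demonstrations at each step.
mean_traj = demos.mean(axis=0)
std_traj = demos.std(axis=0, ddof=1)

# Steps with a small spread are treated as strongly constrained.
# The threshold is an arbitrary illustrative value.
constrained = std_traj < 0.05

print("spread at start:", std_traj[0])
print("spread at goal: ", std_traj[-1])
print("first constrained step:", int(np.argmax(constrained)))
```

Here the start of the motion shows a large spread (weak constraint) while the end shows a small spread (strong constraint), mirroring the stacking example where the final position matters more than the path taken.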



Learning robot controllers robust to perturbations



Most approaches to trajectory modeling estimate a time-dependent model of the trajectories, either by exploiting variants of spline decomposition or through statistical encoding of the time-space dependencies. Such modeling methods are very effective and precise in describing the actual trajectory, and benefit from an explicit time precedence across the motion segments to ensure precise reproduction of the task. However, the explicit time dependency of these models requires additional methods for realigning and scaling the trajectories in order to handle perturbations.


As an alternative, more recent approaches have considered modeling the intrinsic dynamics of motion. Such approaches are advantageous in that the system is time-independent and can be modulated to produce trajectories with similar dynamics in areas of the workspace not covered during training. These approaches however either assumed a basic form for the dynamical system to be learned or explored the adaptivity of the system only locally around the stable points of the system.


We propose an approach that exploits the strength of parametric statistical techniques to learn a model of the dynamics of the motions. Statistical modeling is based on Gaussian Mixture Regression (GMR). In contrast to other regression methods, Gaussian Mixture Regression does not model the regression function directly, but models a joint probability density function of the data and then derives the regression function from the density model.
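To make the last point concrete, the standard GMR derivation can be written out as follows. The choice of the position \xi as input and the velocity \dot{\xi} as output is an assumption made here for illustration of a dynamics model; the symbols K, \pi_k, \mu_k, and \Sigma_k denote the number of Gaussians, their priors, means, and covariances.

```latex
% Joint density of position and velocity, modeled by K Gaussians
p(\xi, \dot{\xi}) = \sum_{k=1}^{K} \pi_k \,
  \mathcal{N}\!\left( \begin{bmatrix} \xi \\ \dot{\xi} \end{bmatrix} ;\, \mu_k, \Sigma_k \right),
\qquad
\mu_k = \begin{bmatrix} \mu_k^{\xi} \\ \mu_k^{\dot{\xi}} \end{bmatrix},
\quad
\Sigma_k = \begin{bmatrix}
  \Sigma_k^{\xi} & \Sigma_k^{\xi \dot{\xi}} \\
  \Sigma_k^{\dot{\xi} \xi} & \Sigma_k^{\dot{\xi}}
\end{bmatrix}

% Regression function obtained by conditioning the joint density on \xi
\hat{\dot{\xi}} = \mathbb{E}\big[ \dot{\xi} \mid \xi \big]
  = \sum_{k=1}^{K} h_k(\xi) \left( \mu_k^{\dot{\xi}}
    + \Sigma_k^{\dot{\xi} \xi} \big( \Sigma_k^{\xi} \big)^{-1} \big( \xi - \mu_k^{\xi} \big) \right)

% Responsibility of each Gaussian for the current input
h_k(\xi) = \frac{ \pi_k \, \mathcal{N}\big( \xi ;\, \mu_k^{\xi}, \Sigma_k^{\xi} \big) }
                { \sum_{j=1}^{K} \pi_j \, \mathcal{N}\big( \xi ;\, \mu_j^{\xi}, \Sigma_j^{\xi} \big) }
```

The regression function is thus a weighted sum of local linear models, with weights that depend on the current state rather than on time.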


Relying on the user's pedagogical skills


It is nearly always possible to extract the task constraints with only a few demonstrations (around five for most of the tasks considered in our work) by providing demonstrations where the skill is executed in slightly different situations. For example, in the stacking task example, this can be done by changing the initial positions of the objects prior to each demonstration.


This strategy shares similarities with the human way of teaching where a good teacher will provide several examples in different contexts to transfer the skill more easily. Similarly, a good teacher will also extend the demonstrations progressively so that the learner can more easily infer the connections between the different examples, i.e., the range of the possible situations where the skill may apply is progressively increased.


Throughout our work, we suggest that one way of increasing the speed of the teaching process is to rely on the user's natural propensity for teaching.



Designing human-robot social interaction systems




The robot's capacity to generalize over different situations depends on the number of demonstrations provided to the robot, but more importantly on the pedagogical quality of these demonstrations (gradual variability of the situations and exaggeration of the key features to reproduce). To succeed, it is therefore crucial to design human-robot interaction systems in which the teacher feels involved in the teamwork and understands his or her role in the interaction.


Compared to traditional approaches in Robot Programming by Demonstration, where the demonstration phase is separated from the reproduction phase, our research tends to break down the separation between these two processes by considering a more continuous and bi-directional teaching interaction in which they are intertwined.



Learning through the user's support (scaffolding process)


When designing a teaching system, interactive scenarios need to be considered in which the user can guide the teaching process and the robot may request clarification.


It is then important to let the user evaluate the robot's current understanding of the skill. One solution is to let the robot try to reproduce the skill after each demonstration. By observing the reproduction attempts, the user can then evaluate which important aspects of the skill the robot currently misses and can adapt his or her further demonstration to highlight this particular aspect of the knowledge.



Kinesthetic teaching



Often, some parts of the motion will be correctly reproduced by the robot while some other parts will require refinement. It would be cumbersome to demonstrate the whole motion again each time the robot needs a particular refinement for a subset of joint angles (e.g., the right arm performs the motion correctly but the left arm motion needs refinement).


To deal with this issue, we propose to mix observational learning and kinesthetic teaching as a way to support the robot while it reproduces the skill. While the user moves the robot's arms manually, the robot records proprioceptive information about its own gesture. By moving a subset of the robot's motors during the reproduction attempt, it is thus possible to provide a partial demonstration of the skill for a particular situation.


The advantage of this approach is that it allows the user to provide demonstrations using the robot's own kinematics and to demonstrate the task in the robot's own environment. This kinesthetic teaching process also allows the user to feel the limitations of the robot's body and to provide examples that take these limitations into consideration.