The Sim-to-Real Transfer Problem: Overcoming the Gap Between Simulation and Reality
A controller that has been learned in simulation runs into a complication when it is moved to the real robot, because the simulator never matches the physical system exactly. This is known as the sim-to-real transfer problem. There are a couple of ways to work through it. One is to double down on the machine learning approach and learn, from data, a model of how the system responds to its control inputs.
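Learning a model of how a system responds to its inputs can be illustrated on a toy problem. The sketch below is not the researchers' method; it assumes a hypothetical one-dimensional system whose next state is linear in the current state and input (x_next = a*x + b*u), logs some interaction data, and recovers a and b by ordinary least squares. The function names and the coefficients 0.9 and 0.5 are made up for illustration.

```python
import random


def real_system(x, u):
    # Hypothetical "real" dynamics, unknown to the learner.
    return 0.9 * x + 0.5 * u


def collect_data(step, n=200, seed=0):
    """Drive the system with random inputs and log (x, u, x_next) tuples."""
    rng = random.Random(seed)
    data, x = [], 0.0
    for _ in range(n):
        u = rng.uniform(-1.0, 1.0)
        x_next = step(x, u)
        data.append((x, u, x_next))
        x = x_next
    return data


def fit_linear_model(data):
    """Least-squares fit of x_next ~ a*x + b*u via the 2x2 normal equations."""
    sxx = sum(x * x for x, _, _ in data)
    sxu = sum(x * u for x, u, _ in data)
    suu = sum(u * u for _, u, _ in data)
    sxy = sum(x * y for x, _, y in data)
    suy = sum(u * y for _, u, y in data)
    det = sxx * suu - sxu * sxu
    a = (sxy * suu - sxu * suy) / det
    b = (sxx * suy - sxu * sxy) / det
    return a, b


a, b = fit_linear_model(collect_data(real_system))
print(f"learned model: x_next = {a:.3f}*x + {b:.3f}*u")
```

A real legged robot is nonlinear and high-dimensional, so in practice the linear fit would be replaced by a richer function approximator, but the loop of logging responses and fitting a model to them is the same idea.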
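The other approach bakes the uncertainty into controller learning itself. Instead of training against one particular model in simulation, the controller is trained against a range of models, as if the simulator had a knob for each variable of the system: twist one and the quadruped gets longer legs, shorter legs, or a larger body. The morphology stays fixed, so the degrees of freedom do not change; only the variables of the simulator are varied. These are the parameters that are hard to estimate from the real system and hard to approximate accurately in the simulator. A controller that is robust to variations in them can transfer to a somewhat different model, and the real robot is, in effect, one more such variation.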
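The "knobs" idea can be sketched as sampling a fresh set of simulator parameters for every training episode. This is a minimal illustration only: the parameter names, the ranges, and the inclusion of ground friction are assumptions made here, not values from the source, and no real simulator API is used.

```python
import random
from dataclasses import dataclass


@dataclass
class SimParams:
    # Hypothetical simulator variables; the joint layout (degrees of freedom) is fixed.
    leg_length_m: float
    body_mass_kg: float
    ground_friction: float


# Assumed ranges, one "knob" per variable. Real ranges would come from
# how uncertain each quantity is on the physical robot.
RANGES = {
    "leg_length_m": (0.30, 0.40),
    "body_mass_kg": (40.0, 60.0),
    "ground_friction": (0.4, 1.0),
}


def sample_params(rng):
    """Twist every knob to a random position inside its range."""
    return SimParams(**{name: rng.uniform(lo, hi) for name, (lo, hi) in RANGES.items()})


def training_variants(n_episodes, seed=0):
    """One randomized model per training episode."""
    rng = random.Random(seed)
    return [sample_params(rng) for _ in range(n_episodes)]


for params in training_variants(3):
    print(params)
```

Each training episode would then build the simulated robot from one of these parameter sets, so the controller never gets to rely on a single exact model.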
The approach is demonstrated on a quadruped robot named Coyote, an ANYmal version C built by a company in Zurich that is a spin-off of ETH Zurich. Unlike Boston Dynamics' Spot, whose locomotion controller is not accessible, this robot gives the researchers access to its low-level controllers. The controller is evaluated by having the robot walk over a set of benchmark obstacles while the researchers inspect what the robot perceives, how it models its surroundings, and where it decides to step.
An important distinction is between learning in simulation and learning on the real robot. In simulation, the controller learns from episodes of the robot moving in an environment the researchers control; these can be left to run overnight or asynchronously on a cluster, and the resulting corpus of data is used to learn robust controllers. The controller shown has finished learning before it is deployed and does not learn as it goes. The researchers are looking into ways to distill the robot's online experience and incorporate it into an ongoing learning system.
The controller demonstrated on the quadruped is trained with reinforcement learning (RL). Training happens in simulation, where a neural network is shown a set of examples and learns to control the system. Once trained, the controller is run on the real robot and its performance is evaluated there.
At run time, the neural network receives perception information online from the joints, the torque sensors, and the vision and depth sensors, and decides where to place the feet and how to orchestrate the motion of all the actuators so that the robot can walk over obstacles and go up and down flights of steps. A lidar on the back of the robot and four depth sensors distributed around the body provide an estimate of how the ground looks around the feet. Because this is only an estimate, the robot can still stumble or misplace a foot, so the goal is a controller that is also robust to slipping and stumbling.
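The mapping from sensed state to control outputs can be pictured as a single forward pass through a small network. The sketch below is illustrative only: the observation size, hidden size, and random (untrained) weights are assumptions, and the only number taken from the talk is the 12 actuated joints (four legs with three joints each).

```python
import math
import random

N_JOINTS = 12  # four legs x three actuated joints
OBS_DIM = 48   # hypothetical observation size (joint states, torques, terrain estimate, ...)
HIDDEN = 32    # hypothetical hidden-layer width


def make_layer(n_in, n_out, rng):
    """Random weights and zero biases; a trained policy would load learned values."""
    weights = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def forward(layer, x, activation):
    weights, biases = layer
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def policy(layers, observation):
    """State in, one control target per joint out."""
    hidden = forward(layers[0], observation, math.tanh)
    return forward(layers[1], hidden, lambda v: v)


rng = random.Random(0)
layers = [make_layer(OBS_DIM, HIDDEN, rng), make_layer(HIDDEN, N_JOINTS, rng)]
observation = [0.1] * OBS_DIM  # placeholder sensor reading
joint_targets = policy(layers, observation)
print(len(joint_targets), "joint targets")
```

On a robot, this function would be called at every control step with the latest sensor readings; the learning itself has already happened by then.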
Learned controllers differ from traditional approaches such as model-based control or model predictive control (MPC), which rely on an explicit model of the system that is simulated forward to judge whether a control input will succeed. MPC helps with sensor noise, actuation noise, and estimation uncertainty, but it needs good models, and these are hard to obtain for a robot that suffers wear and tear and constantly makes and breaks contact with the ground. The two lines of work are currently converging: many approaches now combine model predictive controllers with learned components.
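The model predictive control side can be sketched as a receding-horizon loop: simulate the model forward over a horizon for each candidate input, score the outcome, apply only the first step of the best candidate, and repeat. The example below is a deliberately simplified stand-in, assuming a single joint position q that moves as q + dt*u and a handful of constant candidate inputs; the talk describes polynomial joint trajectories instead, and all names and numbers here are hypothetical.

```python
def simulate(q, u, horizon, dt):
    """Forward-simulate the model: hold input u for `horizon` steps."""
    trajectory = []
    for _ in range(horizon):
        q = q + dt * u
        trajectory.append(q)
    return trajectory


def cost(trajectory, target):
    """Metric for success: squared distance to the target along the horizon."""
    return sum((q - target) ** 2 for q in trajectory)


def mpc_step(q, target, candidates, horizon, dt):
    """Pick the candidate input whose predicted trajectory scores best."""
    return min(candidates, key=lambda u: cost(simulate(q, u, horizon, dt), target))


def run(q0, target, steps=50, horizon=10, dt=0.05):
    candidates = [-1.0, -0.5, -0.1, 0.0, 0.1, 0.5, 1.0]
    q = q0
    for _ in range(steps):
        u = mpc_step(q, target, candidates, horizon, dt)
        # Apply only the first step, then re-plan from the new state.
        # Here the "real" system equals the model; on hardware it would not.
        q = q + dt * u
    return q


final_q = run(0.0, 1.0)
print(f"final joint position: {final_q:.3f}")
```

Re-planning at every step is what lets this kind of controller absorb noise and small modelling errors, and it is also why the quality of the model matters so much.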
In summary, the talk covers why controllers learned in simulation are hard to transfer to real robots and the approaches being explored to close the gap: learning a model of the system from data, training controllers that are robust to parameter variation, and combining learned components with model predictive control.
This is ANYmal in its version C. It is built by a company in Zurich that we've been working with for quite some time now, a spin-off of the ETH university in Zurich. One of the main differences between working with ANYmal and the Spot robot is that we have access to the low-level controllers in this robot, while for Spot we don't have access to the locomotion controller. So, shout-out to Boston Dynamics: please make this accessible and we can do a lot more with your hardware.
So today I'll tell you a bit more about how we do control for legged locomotion. This is how we control the limbs of robots with legs so that they can navigate environments that are flat and beyond flat. I mean, the interesting parts are where you need to do dynamic maneuvers and scale obstacles, jump off boxes and whatnot. Currently there are two prevalent approaches to it. One of them is model predictive control, and the other one, which I'll focus on a bit, is reinforcement learning.
So that way you get the robot to work it out for itself?
Exactly, yeah, that's the whole point. So this is MPC. MPC is kind of on the more traditional control side of the spectrum, right? You have models of your system, and then you can in essence simulate your model forward and decide what your control input would result in, and then whether what you're asking the robot to do is successful or not by some metric.
To see how this is done, let's think of a robot system, a robot quadruped. This is the robot body, and it's got four legs, right? This is one of the four legs, without going into too much detail. The robot moves forward. Let's say a leg has three degrees of freedom. Three degrees of freedom means that it has three ways in which it can move: it has three joints that in most cases are actuated by three motors. Now, each degree of freedom is one variable that we need to control, and for the entire system, if we consider a robot with four legs and three degrees of freedom per leg, then we have 12 degrees of freedom for the joints, and then we have six more degrees of freedom for where the base is with respect to the world.
Now we model what each joint needs to do, and one way to go about this is to model that as a polynomial. A polynomial would be a way of representing a curve, let's say against time, and let q be the position of one joint. In model predictive control we would have a set of polynomials that would orchestrate how the robot moves, and then we would look at a time horizon, let's say capital T, and then forward simulate the system, see what it does in the environment that it navigates, and then, for example, play out t h, which is the horizon. We can loop through that and repeat that multiple times, and as time flows we sort of command the robot to do what it might. This helps us deal with things like sensory noise, actuation noise, estimation uncertainty and so on.
Drawbacks: we need to have good models of the system, and this is particularly difficult, especially if we're dealing with a system that is operating in a dynamic environment, out in the world. You have things like wear and tear of motors, or of feet, or of the actuators themselves. You have vibrations from making and breaking contact; you make and break contact all the time, which is an added complication. So this is the kind of analytical approach, the more control-heavy approach.
Now on the other side we have, I think it's fair to say, a more recent development on the machine learning side of control, where we train a neural network, as everyone does these days. The idea here is that, this being a neural network, these are nodes, this is input and this is output. So we want to input the state of the system and get as output a vector of control inputs, be that torques, positions, desired positions, desired velocities and so on, that accomplish a target that we set for the robot, for the system. In many cases this would be being robust to external perturbations, pushes; long story short, the robot not tumbling over, not falling, and stepping where it is supposed to step.
The great benefit is that, to start with, this operates with data, right? We would learn these controllers from data, and the great benefit with that is that we can simulate examples, episodes that we learn from. So we can simulate the robot moving in an environment that we control, and we can collect data on the robot performance and the controller performance in the simulation. We can leave that to run overnight, let's say, or have a cluster that can do the simulations asynchronously, and then we can use this corpus of data to learn very robust controllers.
There is a complication when taking this from the simulation to the real robot. This is known as the sim-to-real transfer problem. There are a couple of ways that we can work through that. One of them is to double down on the machine learning approach and learn from data a model of how the system responds to our input, and the other approach is to actually bake that into the controller learning. So instead of learning with one particular model in simulation, we have a range of models. Think of it as a range where we can have knobs for each variable of the system. Imagine having a model of a quadruped where we can twist a knob and get longer legs or shorter legs or larger bodies. We don't change the morphology, so we would not change the degrees of freedom; we would change the variables of the simulator itself. So by adding these variants we can make the controller robust against changes in those parameters, and these are the set of parameters that are hard to estimate from the real system and accurately approximate in the simulator. So we can make our controllers robust to variations of these parameters, which means that it can transfer to a somewhat different model, which is not a model; it's the real system.
Right, okay. So can we see some of this? Is that possible?
Absolutely, yeah. We have one of our quadrupeds downstairs in the lab, and we can show you what the robot does in walking over some of our benchmark obstacles, and we can also have a look at what the robot sees, how the robot perceives, how the robot models its surroundings, where it decides to step and so on.
So this is a quadruped robot, right? This is ANYmal version C. Its name is Coyote. It's one of the robots in our lab. In the research that we do here, we're looking into machine learning approaches to legged locomotion. This is quite different from traditional approaches that use, for example, model-based control or model predictive control. There is a kind of convergence in the field at the moment: we are working on, and we're seeing, a lot of approaches that combine model predictive controllers and learned aspects of these controllers. Now [name unclear] here is going to demonstrate our RL-based learning controller. This is a controller that has been trained in simulation, having a set of examples shown to a neural network that learns to control the system, and after having it trained we can run it on the robot and evaluate its performance.
So you've got a projection now; it's kind of similar to where we're standing, isn't it? So what's going on there?
Exactly. So these are the lidar returns from the lidar sensor that sits on the back of the robot, and then on the robot there are four depth sensors that are distributed around the body, and this is the estimate of the system of how the ground looks around its feet, and with that the controller can decide where to place its legs. It's at least an estimate, so the robot can still stumble and place its feet at a position that is somewhat off. The goal here is to have a controller that is also robust to slipping.
That plank could be wet, it could be slippery?
Yeah, exactly, or to stumbling and so on. This is ANYmal in its version C. This is built by a company in Zurich that builds robots and that we've been working with for quite some time now, a spin-off of the ETH university in Zurich.
We talked about different methods of learning upstairs, the reinforcement and the other model, but where does that fit in with doing that, then? You've not got a static kind of equation that you plug all those numbers into; it works differently, doesn't it?
Yeah, exactly. So this is the neural network that then online receives perception information from the joints and the torque sensors and the vision and depth sensors, and then decides where to place the feet and how to orchestrate the motion of all the actuators in order to coordinate the system into walking over things, and going up flights of steps and going down flights of steps and so on.
And so is it learning as it goes, or has it already done the learning and you're applying that to this?
Yes, in this particular controller the learning has been done and it's not learning as it goes, but we're currently looking into approaches where we can distill some of the online experience of the robot and see how we can incorporate that into a sort of ongoing learning system.