The Challenge of Artificial Intelligence: A Deep Dive into the World of AI Football
In order to play football, you have to be able to control your body. You need to be able to run, to walk, but then you also need to have these skills of dribbling and shooting. And then at even a level above that, you need to have the coordination and the strategy over the whole game, so it's really a challenge that has a lot of layers to it. So far, DeepMind has been teaching football not to real robots but to simulated ones: computerized avatars in human form, a bit like a simplified version of the players in your favorite video game. The difference with these players is that their repertoire of movements is not pre-programmed; like the real robots, they are effectively learning to move from scratch.
The point is not to have robots playing at Wembley Stadium in the near future, however fun that might be. We're really trying to study whether it's valuable to train these methods using reward and the competition of something like football, or whether there are other ways to train for this type of behavior. Underneath that big umbrella of reinforcement learning, it's helpful to use a series of other techniques to get the agents up and running.
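The idea that movement can be learned from reward alone, rather than pre-programmed, can be illustrated at toy scale. The sketch below is a minimal example under stated assumptions, not DeepMind's method: it uses tabular Q-learning, one standard reinforcement learning algorithm, on a hypothetical one-dimensional "pitch" where the agent is rewarded only when it reaches the ball. The names `N_CELLS`, `step`, `train_q_learning` and `greedy_policy`, and every hyperparameter, are illustrative assumptions.

```python
import random

# Hypothetical toy environment (an assumption for illustration): a 1-D "pitch"
# of N_CELLS cells. The agent starts in cell 0 and the ball sits in the last cell.
N_CELLS = 6
ACTIONS = (-1, +1)  # step left, step right


def step(state, action):
    """Move one cell (clipped to the pitch); reward 1.0 only on reaching the ball."""
    next_state = min(max(state + action, 0), N_CELLS - 1)
    done = next_state == N_CELLS - 1
    return next_state, (1.0 if done else 0.0), done


def train_q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning: no movement is pre-programmed, only the reward is given."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_CELLS) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap episode length so every episode terminates
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                best = max(q[(state, a)] for a in ACTIONS)
                # Exploit, breaking ties at random so early behavior is a random walk.
                action = rng.choice([a for a in ACTIONS if q[(state, a)] == best])
            next_state, reward, done = step(state, action)
            # Move the estimate towards reward + discounted value of the best next action.
            future = 0.0 if done else gamma * max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + future - q[(state, action)])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """The learned behavior: the highest-valued action in each non-terminal cell."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_CELLS - 1)}


if __name__ == "__main__":
    print(greedy_policy(train_q_learning()))
```

Run as a script, it should print a policy that steps right (`1`) in every cell: the agent was never told how to move, only rewarded for arriving, and the reward propagated back through the value estimates.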
Here they're also using something called imitation learning, which involves gathering video footage from real human football matches, using motion capture to translate the movements of each player's joints into a dataset, and then training a neural network so that these simulated humanoids begin to mimic the movements of real players. So this is really layering these different types of learning algorithms together, and the exciting thing is that the result in the end is four agents that can race around this field, and they've really achieved this level of whole-body control and also team coordination.
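In its simplest form, imitation learning is behavior cloning: supervised learning that maps an observed body state to the action a human demonstrator took in that state. The sketch below assumes a hypothetical two-joint state (hip and knee angles), a linear policy instead of a neural network, and synthetic "motion capture" pairs generated from an invented expert rule; it illustrates the technique only and is not the pipeline described in the episode.

```python
import random


def make_demonstrations(n=200, seed=0):
    """Hypothetical stand-in for motion-capture data: (joint state, expert action) pairs.

    The 'expert' rule below is invented purely so the example has data to fit.
    """
    rng = random.Random(seed)
    demos = []
    for _ in range(n):
        hip, knee = rng.uniform(-1, 1), rng.uniform(-1, 1)  # observed joint angles
        action = 0.8 * hip - 0.3 * knee  # next joint command the "player" produced
        demos.append(((hip, knee), action))
    return demos


def behavior_cloning(demos, lr=0.1, epochs=200):
    """Fit action = w_hip*hip + w_knee*knee + b by gradient descent on the
    mean squared error between predicted and demonstrated actions."""
    w_hip = w_knee = b = 0.0
    n = len(demos)
    for _ in range(epochs):
        g_hip = g_knee = g_b = 0.0
        for (hip, knee), action in demos:
            err = (w_hip * hip + w_knee * knee + b) - action
            # Accumulate the gradient of the mean squared error.
            g_hip += 2 * err * hip / n
            g_knee += 2 * err * knee / n
            g_b += 2 * err / n
        w_hip -= lr * g_hip
        w_knee -= lr * g_knee
        b -= lr * g_b
    return w_hip, w_knee, b


if __name__ == "__main__":
    print(behavior_cloning(make_demonstrations()))
```

Because the synthetic demonstrations follow an exact linear rule, the fitted weights should land very close to the invented expert's (0.8, -0.3, 0.0). With real motion-capture data the mapping is far richer, which is why a neural network is used instead of a linear model.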
I recently had the chance to see my first ever video of a simulated humanoid football match, and I'll do my best to commentate on the highlights. So here we are for this season's title decider between these two titans of AI football: The Blues versus Humanoids United, playing in red. Well, the game's begun, and Drogbot has it for the Blues, cuts inside and then chops back onto his right, but look, the ball has broken free and Robo-Naldo is clear. Go! Well, they say a week is a long time in politics, but five seconds really is an epoch in AI football. As impressive as this video is, I think it's fair to say that at certain points they are quite hilariously bad at controlling their bodies.
Yeah, I mean, they are not trying to win any points on style or grace, so that really lets the agent optimize purely for trying to achieve a goal. It doesn't matter if the arms are flailing around, so you can see the problem with putting this onto a real robot. Yeah, I can see how that would be a problem with the robots. It's in the game of football that we start to see the different flavors of intelligence converge.
Training agents to play football gives them physical skills like dribbling and passing, but when combined with a reinforcement learning algorithm which rewards them for team play, you start to see emerging the sort of cooperative AI we heard about in the previous episode. The question is, if physical and social intelligence can be developed in tandem in this way, could physical intelligence provide a path all the way to AGI (Artificial General Intelligence)? It all depends on how you define AGI, doesn't it? I've noticed this a lot. Yes, maybe not immediately. You know, if you look at evolution, it's a very long path to go from initial creatures to human beings.
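One way to see how a reward can favor team play: if an agent is paid only for its own goals, passing to a teammate earns it nothing, whereas a shared team term pays every teammate whenever any of them scores. The sketch below is hypothetical. The 2-vs-2 line-up, the agent names and the `team_spirit` blending weight are assumptions for illustration, not the reward DeepMind used.

```python
def goal_rewards(scorer, team_of, team_spirit=0.5):
    """Reward every agent when a goal is scored.

    individual: 1.0 for the scorer only.
    shared: +1.0 for everyone on the scoring team, -1.0 for the conceding team.
    team_spirit blends the two: 0.0 is purely selfish, 1.0 is purely team-based.
    """
    if not 0.0 <= team_spirit <= 1.0:
        raise ValueError("team_spirit must be between 0 and 1")
    scoring_team = team_of[scorer]
    rewards = {}
    for agent, team in team_of.items():
        individual = 1.0 if agent == scorer else 0.0
        shared = 1.0 if team == scoring_team else -1.0
        rewards[agent] = (1.0 - team_spirit) * individual + team_spirit * shared
    return rewards


# Hypothetical 2-vs-2 line-up for the four agents on the simulated pitch.
TEAM_OF = {"blue_1": "blues", "blue_2": "blues", "red_1": "reds", "red_2": "reds"}

if __name__ == "__main__":
    print(goal_rewards("blue_1", TEAM_OF))
```

With `team_spirit=0.5`, a goal by `blue_1` gives the scorer 1.0, its teammate 0.5 and each opponent -0.5. Raising `team_spirit` towards 1.0 makes the assist as valuable as the finish, which is the kind of incentive under which cooperative behavior can emerge.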
Then I think also it could be a very long path if we want to build an AGI starting from first principles of learning to move a body, but that is what we are looking at. Jan Humplik also believes that it will be a long time before robotics takes us to a general form of intelligence. If I ask somebody on the street what they would be impressed by the robot doing, they would say something like, well, you know, maybe cleaning my apartment.
And if you start thinking about this problem, you're like, okay, so it certainly needs to use vision, it certainly needs to understand human language, because you need to give it commands. It needs to understand what it means to clean the apartment, and that's not trivial, because cleaning doesn't mean destroying your furniture. Solving anything impressive like this is essentially getting very close to AGI. But if embodied intelligence, social intelligence and linguistic intelligence don't necessarily lead to AGI on their own, is there a single path that does?
Some DeepMind researchers are convinced that there is, and it's been staring us in the face this whole time. When we say that reward is enough, we're really arguing that all of the abilities of intelligence, everything from perception to knowledge to social intelligence to language, can be understood as a single process of trying to increase the rewards that the agent gets.
If this hypothesis were true, it would mean that we only need to solve one problem in intelligence rather than a thousand different problems for each of the separate abilities. That's next time on DeepMind: The Podcast, presented by me, Hannah Fry, and produced by Dan Hardoon at Whistledown Productions.

Hello and welcome back to DeepMind: The Podcast. Over the last two episodes, we've been exploring DeepMind's goal of solving intelligence, asking what that actually means and traveling along some of the roads that could take us there. This time, it's all about the robots. We'll be exploring the idea of physical intelligence, and to do that, I'll be taking you behind the scenes of the robotics lab in King's Cross, London.

You've got three humanoid robots. They're black, they've got a very sort of cuboid body, but they have arms and legs and even little tiny heads. They're quite small, though; they're probably the size of a large chicken. Yeah, smaller than a goose.

I'm Hannah Fry, and this is episode four: Let's Get Physical.

Now, before our robotics lab passes are activated, let me fill you in on a bit of background. Why would a company known for getting machines to play board games and fold proteins find robotics so alluring? In June last year, I emerged bright-eyed from my lockdown-induced hibernation to visit Cheltenham, a pretty spa town in south-west England. The Cheltenham Science Festival, an annual event attracting the world's leading scientists and thinkers, was the setting for my first in-person interview since the COVID-19 lockdown, and as luck would have it, my interviewee was Raia Hadsell, DeepMind's director of robotics.

Ah, it's very nice to be in a room full of people again.

Raia knows pretty much all there is to know about robotics and artificial intelligence, starting with the difference between them.

When we think about artificial intelligence, a lot of the time people immediately go to a robot as being the instantiation of that AI. Just think about the robots you see in films: C-3PO, WALL-E, Marvin the Paranoid Android. They're all intelligent beings with robot bodies; they're all able to reason about their environment and make decisions. Likewise, our visions of superintelligent AI long into the future are rarely just a disembodied voice, apart from a
couple of exceptions, like in the film Her, for example. Robots and AI are synonymous with one another in many people's minds, but really the two should be distinguished. AI is a computer program that's usually trained on a lot of data to be able to give answers to questions in a similar way that a human might. So think about being able to translate from French to English to Mandarin: these are the types of problems that an AI might be able to do. A robot, on the other hand, takes actions and changes the world, either through manipulation, touching the world and moving things around, maybe doing assembly, or as a robot that can move itself around. And then we can think about the two together, with AI being a really natural way to bring us to the next set of breakthroughs for what robots can do.

If you heard the first series of this podcast, you'll already be familiar with the idea that robots don't necessarily come with AI built into them. Your dishwasher, your lawn mower, your pressure cooker are all, in the technical sense, robots: they are machines that are capable of carrying out a series of actions automatically. But these aren't the sorts of robots that DeepMind is interested in as a route to artificial general intelligence. Instead, their robots use machine learning techniques to learn for themselves how to perform different tasks. So what does all of this look like? What kind of robots are being trained to saunter around the research facilities, to ground algorithmic experience in the real world and explore the absolute cutting edge of physical intelligence?

Well, why don't you come on in. Welcome back to the robotics lab.

Meet Akhil Raju, a software engineer on the robotics team. You can see the excitement in his eyes as he shows me around the lab, even while the rest of his face is covered by a mask.

So this is going to be a little bigger than the last time. Oh gosh, massive! Whoa. Yeah, you know, if you ever go to a trade show and they have, like, little stalls up in a giant space, it sort of looks a
little bit like that. So we're in this big concrete building with lots of glass along one side, and then you've got these little booths all the way along with, I mean, they sort of look like privacy screens. But privacy for the humans? Exactly. The robots, no one cares about. Yeah.

Inside these mini booths are robotic arms of every size and shape imaginable: tall, crane-like arms, short and stubby ones, and arms with grippers on the end like the kind you'll see in a games arcade. All of these arms are part of DeepMind's research into getting robots to dexterously manipulate everyday objects. Akhil ushered me into one of the booths to take a closer look.

So, this big arm that is extending out of a table: you know those stand mixers that you get in posh kitchens? Imagine one of those, but like a giant version. So it's kind of quite bulbous and curvaceous, with all of these joints and cameras attached to it, and then right on the end there's a teeny tiny key, and it's, I guess, trying to put a key in a lock. Yep, exactly. This robot has kind of this attachment where it can insert, like, a USB in a USB hole, or maybe a key, and so on, and so we're trying to learn how to actually do very, like, fine manipulation. We're taking tasks that you might do in everyday life and we're using that as a challenge.

If you wanted to have one of these robots in a factory, say, doing this really fine insertion task, why can't you just pre-program one? Why does it need to be something that has trained itself? If it was a case where it's very fixed settings, we know exactly where the key is, we know exactly where the hole is, then probably, yeah, you can just program it. The thing is, that's not how all factories really are.

A lot of factories that might require some kind of an insertion task, like putting a key in a lock, will also have a lot of variables at play, so that the lock and key aren't at precisely the same start points each time, and that changes the challenge from being something pre-programmable to something much
harder. And what you'll notice, actually, is when these types of insertions need to happen in a factory, it's not robots that do it in the real world now, it's humans. And that's another reason why we chose insertions as a task, because it's somewhat unsolved by the greater robotics community.

You might be wondering how on earth any of this is possible. How do you possibly set up an inanimate robot arm to teach itself to open a lock? Well, by now it probably won't surprise you that one of the fundamental methods for training physical intelligence is that DeepMind favorite approach: reinforcement learning. In the simplest terms, this involves rewarding an algorithm with points for accomplishing a task, like correctly inserting a key into a lock. And there is a reason why robotics is geared up for algorithms based on reinforcement learning. Here's Doina Precup, head of DeepMind's Montreal office. She is a world expert in reinforcement learning.

It's very easy to imagine expressing robotics tasks in a reward language, because you can observe when the robot is doing the correct thing, let's say putting an object in a particular place, and so it's very easy to phrase the problem as a reinforcement learning problem. And of course we know from the natural world that animals are trained by reward to do complicated physical tasks; we would like to take that idea to robotics as well.

If you want to get a dog to go fetch, you don't carefully explain how it should move each one of its muscles in order to run towards an object, retrieve it and give it back to you. Instead, you reward it with a treat when it does what you want, and it learns by itself how best to calibrate its body in the performance of that task. In this way, some of the algorithms inside AI robots are much like dogs, except they're rewarded with numbers, not tasty biscuits.

This might make it seem like reinforcement learning is a magic bullet, but in practice things are a bit more complicated. Physical tasks like inserting a key into a lock are subject to a
problem known as sparse reward. If you waited to reward a robot until it had successfully put a key into a lock just by chance, you would be waiting around for a long time. So the robotics team has been looking for other ways of putting their robots on the right track.

While the robot is learning to do it, a human comes in, and when it gets close but no cigar, a human can take over and just be like, adjust like this, maybe move to the left a little bit. And so while we might have a sparse reward, so it's kind of like it's all or nothing, you know, you're in the lock or you're not in it, what the robot will use is both that information of sparsity but also maybe information from a human, and kind of the combination of those things is how it might learn.

And while there are certainly areas where learning algorithms like this one have been able to successfully accomplish tasks, you shouldn't be fooled into thinking this stuff is easy, because not all the robots in this lab are quite as accomplished.

When I was here last, I saw a robot that was stacking Lego bricks. Not to be rude, I wouldn't say it was the most impressive thing I've ever seen in my life. How's it doing now? We can actually move to the other side of the lab and we can start to see that stuff.

Akhil took me to another robot cell with a red and black robot arm inside. It had a gripper on the end with two appendages, a bit like the grabby bit of a litter picker, and it was hovering over a tray containing a trio of 3D shapes. Its goal was to learn how to stack the red pyramid shape on top of the blue octagonal prism.

So there's only one way around that it can hold this red object and successfully pick it up, and it has to work out which way, and unfortunately every time it tries to rotate and pick it up... Oh, hang on, I think it's got it. It's got it! It's a good job these things don't get disheartened, because, my goodness, it's been how many years since I've been here? No, this time it's been here trying and trying and trying. So we're seeing
something that's kind of training right now, so we're not seeing our best. Don't make excuses!

Why are these dexterous manipulation tasks so important to learn? So one of the reasons that we have a robotics lab at DeepMind is really to ground our search for AGI in the real world, to make sure that our progress towards AGI is true AGI. Like, if we find AGI, it probably should be able to stack an object on another object.

And speaking of objects, next to this row of robot arms I noticed a basket full of children's toys: rubber ducks, foam bananas and a much-loved cartoon character. I noticed SpongeBob is still here, sat in the corner. This time there's also, hang on, little green rubber ducks. What is the idea behind this stuff? So these kinds of playthings are really nice, because manipulating objects that can bend and move and stuff like that, that's a new type of physics that our agents need to learn. Somewhere in a landfill, is there a pile of sort of crushed foam bananas that robots have... We haven't destroyed any bananas yet. You haven't destroyed any bananas? I don't believe that for a second.

As fun as it is to watch these robot arms try and fail to insert USB sticks into computers and sling foam bananas around, it's worth remembering that the projects on display in the robotics lab serve an important purpose. Building AI that can interact with the physical world is considered central to the overarching goal of solving intelligence itself. Here's Raia Hadsell again, speaking at the Cheltenham Science Festival.

When we think about human intelligence, a lot of the time we focus on things like language or our cognitive skills, how good we are at math, but really a lot of our brain has been developed in order to just move our bodies. And so I think that that level of intelligence, motor intelligence, movement intelligence, this is a core part of our intelligence, and that's what our cognitive skills are built on top of.

This focus on creating intelligent robots which can learn for themselves is part
of the reason why DeepMind's robots might seem a little bit, well, rudimentary compared to what else is out there. Because I'm sure that all of you are thinking about those videos on the internet of robots doing backflips, being pushed over, getting back up, performing all kinds of incredibly sophisticated movements. So I thought I'd ask Raia Hadsell about this.

You can't believe everything you see on the internet, Hannah. Well, you're absolutely right, there are robots that can do some pretty impressive stuff, that can flip, that can jump. At DeepMind we've been focusing more on the generality aspect of it, the G in AGI. We want robots that can learn new things that they've never done before, without needing somebody to program them, just through experience or through watching a human.

So those very impressive videos, the ones that aren't fake, of robots doing backflips: they are essentially following a very precise set of instructions? Is that essentially what we're saying? Absolutely, and they tend to be a demonstration of what that actual robot can do. A robot that can do a backflip, that's very impressive, because of the power-to-mass ratio that's required to do that, but it's very different from wanting that robot to do a new skill that it has just observed for the first time. It couldn't walk over to a table and pick up a coffee cup, for example? It could not. Well, you've disappointed me. Right, but yours could, in theory, in future? Ours could do that, and weed potatoes and pick tomatoes as well.

This is the key point here. If robots can teach themselves to manipulate objects and move around, they can be adaptable and offer assistance to humans in a whole host of critical tasks, including situations where they can't currently support us.

So this came up when there was the Fukushima disaster in Japan.

There's been an explosion at a Japanese nuclear power station damaged in yesterday's massive earthquake. Clouds of smoke could be seen rising above the Fukushima nuclear site.

People realized that we didn't
have a good way to send robots into this extremely dangerous radioactive area and make repairs, because all of our robots either required an area that was easily accessible or didn't have the necessary dexterity to, for instance, shut a valve or open a door. And so there was a whole robotics program aimed at how do we improve legged locomotion into areas where a wheeled robot can't go, and how do we improve the dexterity of robots as well.

Of course, there is a flip side here. If in the future these artificially intelligent robots are good enough to be deployed in the real world for saving human lives, they could also be built to do the opposite.

Robots have been used to carry weapons, and so if you make a more capable robot, then potentially what you're making is a more capable vehicle for holding weapons. Of course, DeepMind is very much against autonomous weaponry, including on robots, and I think that the benefits of robots and what they can do in our world outweigh these risks, especially if the world stands strongly against the use of weaponry in robotics.

And this is not the only ethical concern about robotics research. Lots of people are worried about the possible detrimental effects of automation on the workforce.

What we're looking at now with the use of robots would be to augment humans: somebody working on a construction site that has a robot next to them that's able to do some of the heavy lifting, for instance. So it's not about displacing humans or replacing them; it's about enhancing what a human can do.

Any robot that's going to help with weeding potatoes and picking tomatoes will, of course, need to have mastered locomotion. Back at the DeepMind robotics lab, a recent focus has been to develop a robot which can move around on two legs, a problem which comes with its own unique set of research challenges.

On the floor we've got what looks sort of like the play mat that you put down for kids.

Akhil showed me a sort of robot playpen, about nine square meters, with a barrier around
it, presumably to stop the robots inside from escaping.

So inside this square, then, you've got three humanoid robots. They're black, they've got a very cuboid body, but they have arms and legs and even little tiny heads. They're quite small, though, I should tell you; they're probably the size of a large chicken. Yeah, smaller than a goose, bigger than a chicken, I don't know.

And basically what we've been doing is learning to walk around, and so, like, the robot actually learns to kind of use its legs, even its arms. The head has a camera, so it can kind of look around and see what's going on. So it is very much kind of almost like a whole-body control problem, in some sense.

Can I touch it? Oh my gosh, okay. Oh, it's quite heavy. It's got these little handles on the back, almost like a rucksack, and lots of ports, like little USB ports and an Ethernet cable port and stuff. And then for feet it's got these little skid pads, almost like it's going skiing but just with really short skis. It's very pretty. So I'm lifting its arm up now, and it kind of returns to center, but it's got this really, like, smooth action. Have a listen to that. I feel really sad, sort of like, oh, please leave me alone.

Okay, it's walking around. Imagine if you were doing a really rubbish robot dance in a nightclub: that is exactly what it looks like. It looks like it should fall over. So you haven't programmed this to walk around in a circle? No, this was learned on the robot, just by learning from the data over a couple of days.

That's Jan Humplik, a research scientist at DeepMind who's been following the progress of these humanoid robots for more than a year.

Did you teach it to fall flat on its back like it just did? No, it just falls. It's quite good at pushing itself up, though. Yeah, so those things are programmed; the pushing behavior to stand up, that's programmed, because otherwise you'd just spend your entire life picking up the robot. Well, either we need to pick them up or they would need to learn to stand up.

We are kind of humanizing them by
giving them names.

Viorica Pătrăucean is another research scientist on the locomotion project.

What are their names, these three? I think one of them is England, and one is Messi, from Messi the footballer, and mine's called Hagi, from Gheorghe Hagi, the great Romanian footballer, just because I'm from Romania originally.

So if you look at that one, this is a completely different training process, and you can see that the gait is very different, and it can try to walk backwards, and it actually looks like it's a drunk robot. So it's trying to walk backwards, but it's sort of, um...

I must say, I could have stood watching these cute and mostly completely hopeless little humanoid robots all day, but I wanted to find out more about the process of training them to walk. So after waving goodbye to England, Messi and co, I asked Jan and Viorica about their experience of training these robots in their living rooms at home when the pandemic hit.

How did it travel? Does it just pop in a little suitcase? Actually, if you buy it, you get it with a suitcase; it comes with it, so it can travel. Do you have quite a big living room? Not that big, but yeah, I'm adapting it. I have a pen there with floor mats and foam walls. So when you watch TV at nighttime, you sort of put your feet up and around you is a little robot pen? Exactly, yeah. We even had experiments where the robot was watching TV. Did you really? Well, we wanted to run some experiments to test visual networks. TV is a good source of diverse visual data, and it's already in the living room, right? So why not. So hang on, your job for the last year has been to sit on the sofa and watch TV with your robots? Not quite that, but probably a few seconds of it does look like that, yeah.

So how do you train a humanoid robot to walk? Again, the underlying mechanism is reinforcement learning. The robots are rewarded with points for forward velocity and not falling over.

When you haven't given them any training, what do they do? Oh, they don't do much. They just start
shaking for one second or two at most, and then they fall. After training for a few hours, they start actually walking, like taking a few steps, and then later on they bump into walls, and then, using vision, they learn how to avoid the walls.

So I have a two-year-old at home, right, and, like, the way you're describing here, it's not dissimilar from the way that the two-year-old has learned to walk. There's a lot of falling, not that shaking and flailing, but there was also, sort of, a lot of walking into walls. Do you see those similarities between the way that these robots learn to walk and the way that toddlers learn? There are some similarities, where probably for toddlers, even before they crawl, they still discover their body, they still learn to move their limbs, whereas our robots, we just put them in a standing position and now walk.

And how quickly did it manage to learn to walk? I think in about 24 hours it was already walking. For me, that's impressive. Not 24 hours in real time, but 24 hours in sort of training time? Yeah, yeah. That spans about a week of training, but, uh, training in, like, small, uh, sessions before something breaks, or taking it to the lab for a quick repair, or something like that.

Viorica raises an important point about the fragility of these robots. The actual hardware is not designed for a machine learning technique which involves a robot falling down loads of times before any progress is made. Here's Raia Hadsell to explain.

The robots that are built today are not built for the type of learning paradigms that we think are key to developing AGI. Think about when a child learns to walk: every time they fall down, they then heal from that and they keep on going. There's only so many times that a robot can fall down before it simply breaks.

This approach comes with all kinds of difficulties and hurdles that the pre-programmed robots just don't have to worry about. Here's Jan Humplik again.

The main limitation is that you really start from scratch. With more classical approaches, you
perhaps don't need any data; it's just going to work out of the box. So these are certainly disadvantages of reinforcement learning.

Can't you cheat, though? Can't what one robot has learned about the world be imparted onto another? Absolutely, and there are many different ways to share knowledge. In particular, you can just have multiple robots collecting data, and this is really the way to scale up this data collection process.

What Jan is talking about here is a technique called pooling. Instead of the robots learning to walk independently of each other, their data (how many times they fell over, what their sensor readings were when they fell, and so on) is regularly uploaded to a central controller, which combines this information and feeds it back to each robot, so that they can better navigate the world based on their combined learning experience.

We can track each robot, how well they're doing, and yeah, we definitely discuss, like, oh, okay, my robot starts falling more often now, is yours the same? Did it get quite competitive? I kept telling everybody that it's not a competition, but yes, every time somebody would check the learning curve and there would be the two robots, they would be like, oh, Viorica is winning, oh, Jan is winning. I'm like, no, no, we're only winning if the performances are the same on both robots.

Speaking of teamwork, there are other environments, beyond just walking around or inserting keys into locks or stacking bricks, that serve as an important test project for the robots: a chance to home in on a set of robot skills that would be useful to have in the long term. For that, in true DeepMind fashion, their focus has turned to games, and one in particular: the beautiful game.

In order to play football, you have to be able to control your body. You need to be able to run, to walk, but then you also need to have these skills of dribbling and shooting. And then at even a level above that, you need to have the coordination and the strategy over the whole game, so it's really a challenge that has a lot
of layers to it. So far, DeepMind has been teaching football not to real robots but to simulated ones: computerized avatars in human form, a bit like a simplified version of the players in your favorite video game. The difference with these players is that their repertoire of movements is not pre-programmed; like the real robots, they are effectively learning to move from scratch.

The point is not to have robots playing at Wembley Stadium in the near future, however fun that might be. We're really trying to study whether it's valuable to train these methods using reward and the competition of something like football, or whether there are other ways to train for this type of behavior.

Underneath that big umbrella of reinforcement learning, it's helpful to use a series of other techniques to get the agents up and running. Here they're also using something called imitation learning, which involves gathering video footage from real human football matches, using motion capture to translate the movements of each player's joints into a dataset, and then training a neural network so that these simulated humanoids begin to mimic the movements of real players. So this is really layering these different types of learning algorithms together, and the exciting thing is that the result in the end is four agents that can race around this field, and they've really achieved this level of whole-body control and also team coordination.

And then Raia showed me my first ever video of a simulated humanoid football match, and I'll do my best to commentate on the highlights.

So here we are for this season's title decider between these two titans of AI football: the Blues versus Humanoids United, playing in red. Well, the game's begun, and Drogbot has it for the Blues, cuts inside and then chops back onto his right, but look, the ball has broken free and Robo-Naldo is clear. Go!

Well, they say a week is a long time in politics, but five seconds really is an epoch in AI football. As impressive as this video is, right, I
think it's fair to say that at certain points they are quite hilariously bad at controlling their bodies. Yeah, I mean, they are not trying to win any points on style or grace, so that really lets the agent optimize purely for trying to achieve a goal. It doesn't matter if the arms are flailing around, so you can see the problem with putting this onto a real robot. Yeah, I can see how that would be a problem with the robots.

It's in the game of football that we start to see the different flavors of intelligence converge. Training agents to play football gives them physical skills like dribbling and passing, but when combined with a reinforcement learning algorithm which rewards them for team play, you start to see emerging the sort of cooperative AI we heard about in the previous episode. The question is, if physical and social intelligence can be developed in tandem in this way, could physical intelligence provide a path all the way to AGI?

It all depends on how you define AGI, doesn't it? I've noticed this a lot. Yes, maybe not immediately. You know, if you look at evolution, it's a very long path to go from initial creatures to human beings. Then I think also it could be a very long path if we want to build an AGI starting from first principles of learning to move a body, but that is what we are looking at.

Jan Humplik also believes that it will be a long time before robotics takes us to a general form of intelligence.

If I ask somebody on the street what they would be impressed by the robot doing, they would say something like, well, you know, maybe cleaning my apartment. And if you start thinking about this problem, you're like, okay, so it certainly needs to use vision, it certainly needs to understand human language, because you need to give it commands. It needs to understand what it means to clean the apartment, and that's not trivial, because cleaning doesn't mean destroying your furniture. Solving anything impressive like this is essentially getting very close to AGI.

But if embodied
intelligence, social intelligence and linguistic intelligence don't necessarily lead to AGI on their own, is there a single path that does? Well, some DeepMind researchers are convinced that there is, and it's been staring us in the face this whole time. When we say that reward is enough, we're really arguing that all of the abilities of intelligence, everything from perception to knowledge to social intelligence to language, can be understood as a single process of trying to increase the rewards that the agent gets. If this hypothesis were true, it would mean that we only need to solve one problem in intelligence rather than a thousand different problems for each of the separate abilities. That's next time on DeepMind: The Podcast, presented by me, Hannah Fry, and produced by Dan Hardoon at Whistledown Productions. If you like what you've heard, please do rate and review the podcast. It helps others who are also AI-curious to find it. Same time next week.
Hello, and welcome back to DeepMind: The Podcast. Over the last two episodes we've been exploring DeepMind's goal of solving intelligence, asking what that actually means and traveling along some of the roads that could take us there. This time it's all about the robots. We'll be exploring the idea of physical intelligence, and to do that I'll be taking you behind the scenes of the robotics lab in King's Cross, London. You've got three humanoid robots. They're black, they've got a very sort of cuboid body, but they have arms and legs and even little tiny heads. They're quite small, though; they're probably the size of a large chicken. Yeah, smaller than a goose. I'm Hannah Fry, and this is episode four: Let's Get Physical.
Now, before our robotics lab passes are activated, let me fill you in on a bit of background. Why would a company known for getting machines to play board games and fold proteins find robotics so alluring? In June last year, I emerged bright-eyed from my lockdown-induced hibernation to visit Cheltenham, a pretty spa town in South West England.
The Cheltenham Science Festival, an annual event attracting the world's leading scientists and thinkers, was the setting for my first in-person interview since the COVID-19 lockdown, and as luck would have it, my interviewee was Raia Hadsell, DeepMind's director of robotics. Ah, it's very nice to be in a room full of people again. Raia knows pretty much all there is to know about robotics and artificial intelligence, starting with the difference between them. When we think about artificial intelligence, a lot of the time people immediately go to a robot as being the instantiation of that AI. Just think about the robots you see in films: C-3PO, WALL-E, Marvin the Paranoid Android. They're all intelligent beings with robot bodies, and they're all able to reason about their environment and make decisions. Likewise, our visions of superintelligent AI long into the future are rarely just a disembodied voice, apart from a couple of exceptions, like in the film Her. Robots and AI are synonymous with one another in many people's minds, but really the two should be distinguished. AI is a computer program that's usually trained on a lot of data to be able to give answers to questions in a similar way that a human might. So think about being able to translate from French to English to Mandarin; these are the types of problems that an AI might be able to do. A robot, on the other hand, takes actions and changes the world, either through manipulation, touching the world and moving things around, maybe doing assembly, or as a robot that can move itself around. And then we can think about the two together, with AI being a really natural way to bring us to the next set of breakthroughs for what robots can do.
If you heard the first series of this podcast, you'll already be familiar with the idea that robots don't necessarily come with AI built into them. Your dishwasher, your lawn mower, your pressure cooker are all, in the technical sense, robots: they are machines that are capable of carrying out a series of
actions automatically. But these aren't the sorts of robots that DeepMind is interested in as a route to artificial general intelligence. Instead, their robots use machine learning techniques to learn for themselves how to perform different tasks. So what does all of this look like? What kind of robots are being trained to saunter around the research facilities, to ground algorithmic experience in the real world and explore the absolute cutting edge of physical intelligence? Well, why don't you come on in?
Welcome back to the robotics lab. Meet Akhil Raju, a software engineer on the robotics team. You can see the excitement in his eyes as he shows me around the lab, even while the rest of his face is covered by a mask. So this is going to be a little bigger than the last time. Oh gosh, massive. Whoa. Yeah, you know, if you ever go to a trade show and they have little stalls set up in a giant space, it sort of looks a little bit like that. So we're in this big concrete building with lots of glass along one side, and then you've got these little booths all the way along with, I mean, they sort of look like privacy screens. But privacy for the humans? Exactly. The robots, no one cares about their privacy. Yeah.
Inside these mini booths are robotic arms of every size and shape imaginable: tall crane-like arms, short and stubby ones, and arms with grippers on the end like the kind you'll see in a games arcade. All of these arms are part of DeepMind's research into getting robots to dexterously manipulate everyday objects. Akhil ushered me into one of the booths to take a closer look. So this big arm that is extending out of a table, you know those stand mixers that you get in posh kitchens? Imagine one of those, but a giant version. It's kind of quite bulbous and curvaceous, with all of these joints and cameras attached to it, and then right on the end there's a teeny tiny key, and it's, I guess, trying to put the key in a lock? Yep, exactly. This robot has an attachment where it can insert, like, a USB in
a USB hole, or maybe a key or so on, and so we're trying to learn how to actually do very fine manipulation. We're taking tasks that you might do in everyday life and using them as a challenge. If you wanted to have one of these robots in a factory, say, doing this really fine insertion task, why can't you just pre-program one? Why does it need to be something that has trained itself? If it was a case of very fixed settings, where we know exactly where the key is and exactly where the hole is, then probably, yeah, you can just program it. The thing is, that's not how all factories really are. A lot of factories that might require some kind of insertion task, like putting a key in a lock, will also have a lot of variables at play, so the lock and key aren't at precisely the same start points each time, and that changes the challenge from being something pre-programmable to something much harder. And what you'll notice, actually, is that when these types of insertions need to happen in a factory, it's not robots that do it in the real world now, it's humans. That's another reason why we chose insertions as a task: it's somewhat unsolved by the greater robotics community.
You might be wondering how on earth any of this is possible. How do you possibly set up an inanimate robot arm to teach itself to open a lock? Well, by now it probably won't surprise you that one of the fundamental methods for training physical intelligence is that DeepMind favorite approach: reinforcement learning. In the simplest terms, this involves rewarding an algorithm with points for accomplishing a task, like correctly inserting a key into a lock. And there is a reason why robotics is geared up for algorithms based on reinforcement learning. Here's Doina Precup, head of DeepMind's Montreal office and a world expert in reinforcement learning. It's very easy to imagine expressing robotics tasks in a reward language, because you can observe when the robot is doing the correct thing, let's say
putting an object in a particular place, and so it's very easy to phrase the problem as a reinforcement learning problem. And of course we know from the natural world that animals train by reward to do complicated physical tasks, so we would like to take that idea to robotics as well. If you want to get a dog to go fetch, you don't carefully explain how it should move each one of its muscles in order to run towards an object, retrieve it and give it back to you. Instead, you reward it with a treat when it does what you want, and it learns by itself how best to calibrate its body in the performance of that task. In this way, some of the algorithms inside AI robots are much like dogs, except they're rewarded with numbers, not tasty biscuits.
This might make it seem like reinforcement learning is a magic bullet, but in practice things are a bit more complicated. Physical tasks like inserting a key into a lock are subject to a problem known as sparse reward. If you waited to reward a robot until it had successfully put a key into a lock just by chance, you would be waiting around for a long time. So the robotics team has been looking for other ways of putting their robots on the right track. While the robot is learning to do it, a human comes in, and when it gets close but no cigar, the human can take over and just be like, adjust like this, maybe move to the left a little bit. And so while we might have a sparse reward, so it's kind of all or nothing, you know, you're in the lock or you're not in it, the robot will use both that sparse information and maybe also information from a human, and the combination of those things is how it might learn.
And while there are certainly areas where learning algorithms like this one have been able to successfully accomplish tasks, you shouldn't be fooled into thinking this stuff is easy, because not all the robots in this lab are quite as accomplished. When I was here last, I saw a robot that was stacking Lego bricks. Not to be rude, but I wouldn't say it
was the most impressive thing I've ever seen in my life. How's it doing now? We can actually move to the other side of the lab and we can start to see that stuff. Akhil took me to another robot cell with a red and black robot arm inside. It had a gripper on the end with two appendages, a bit like the grabby bit of a litter picker, and it was hovering over a tray containing a trio of 3D shapes. Its goal was to learn how to stack the red pyramid shape on top of the blue octagonal prism. So there's only one way round that it can hold this red object and successfully pick it up, and it has to work out which way, and unfortunately every time it tries to rotate and pick it up... Oh, hang on, I think it's got it. It's got it! It's a good job these things don't get disheartened, because my goodness, how many years has it been since I've been here? No, this time it's been here trying and trying and trying. We're seeing something that's kind of training right now, so we're not seeing our best. Don't make excuses!
Why are these dexterous manipulation tasks so important to learn? One of the reasons that we have a robotics lab at DeepMind is really to ground our search for AGI in the real world, to make sure that our progress towards AGI is true AGI. Like, if we find AGI, it probably should be able to stack an object on another object.
And speaking of objects, next to this row of robot arms I noticed a basket full of children's toys: rubber ducks, foam bananas and a much-loved cartoon character. I noticed SpongeBob is still here, sat in the corner. This time there's also, hang on, little green rubber ducks. What is the idea behind this stuff? These kinds of playthings are really nice, because manipulating objects that can bend and move and stuff like that is a new type of physics that our agents need to learn. Somewhere in a landfill, is there a pile of sort of crushed foam bananas that robots have destroyed? We haven't destroyed any bananas yet. You haven't destroyed any bananas? I don't believe that for a second. As
fun as it is to watch these robot arms try and fail to insert USB sticks into computers and sling foam bananas around, it's worth remembering that the projects on display in the robotics lab serve an important purpose. Building AI that can interact with the physical world is considered central to the overarching goal of solving intelligence itself. Here's Raia Hadsell again, speaking at the Cheltenham Science Festival. When we think about human intelligence, a lot of the time we focus on things like language or our cognitive skills, how good we are at math. But really, a lot of our brain has been developed in order to just move our bodies, and so I think that level of intelligence, motor intelligence, movement intelligence, is a core part of our intelligence, and it's what our cognitive skills are built on top of.
This focus on creating intelligent robots which can learn for themselves is part of the reason why DeepMind's robots might seem a little bit, well, rudimentary compared to what else is out there. I'm sure that all of you are thinking about those videos on the internet of robots doing backflips, being pushed over, getting back up and performing all kinds of incredibly sophisticated movements. So I thought I'd ask Raia Hadsell about this. You can't believe everything you see on the internet, Hannah. Well, there you're absolutely right. There are robots that can do some pretty impressive stuff, that can flip, that can jump. At DeepMind we've been focusing more on the generality aspect of it, the G in AGI. We want robots that can learn new things that they've never done before without needing somebody to program them, just through experience or through watching a human. So those very impressive videos, the ones that aren't fake, of robots doing backflips: they are essentially following a very precise set of instructions, is that what we're saying? Absolutely, and they tend to be a demonstration of what that particular robot can do. A robot that can do a backflip, that's very
impressive because of the power-to-mass ratio that's required to do that, but it's very different from wanting that robot to do a new skill that it has just observed for the first time. It couldn't walk over to a table and pick up a coffee cup, for example? It could not. Well, you've disappointed me. Right, but yours could? In theory, in future, ours could do that, and weed potatoes and pick tomatoes as well.
This is the key point here. If robots can teach themselves to manipulate objects and move around, they can be adaptable and offer assistance to humans in a whole host of critical tasks, including situations where they can't currently support us. This came up when there was the Fukushima disaster in Japan. There's been an explosion at a Japanese nuclear power station damaged in yesterday's massive earthquake. Clouds of smoke could be seen rising above the Fukushima nuclear site. People realized that we didn't have a good way to send robots into this extremely dangerous radioactive area and make repairs, because all of our robots either required an area that was easily accessible or didn't have the necessary dexterity to, for instance, shut a valve or open a door. And so there was a whole robotics program aimed at how we improve legged locomotion into areas where a wheeled robot can't go, and how we improve the dexterity of robots as well.
Of course, there is a flip side here. If in the future these artificially intelligent robots are good enough to be deployed in the real world for saving human lives, they could also be built to do the opposite. Robots have been used to carry weapons, and so if you make a more capable robot, then potentially what you're making is a more capable vehicle for holding weapons. Of course, DeepMind is very much against autonomous weaponry, including on robots, and I think that the benefits of robots and what they can do in our world outweigh these risks, especially if the world stands strongly against the use of weaponry in robotics. And this is not the only
ethical concern about robotics research. Lots of people are worried about the possible detrimental effects of automation on the workforce. What we're looking at now with the use of robots would be to augment humans: somebody working on a construction site who has a robot next to them that's able to do some of the heavy lifting, for instance. So it's not about displacing humans or replacing them; it's about enhancing what a human can do.
Any robot that's going to help with weeding potatoes and picking tomatoes will, of course, need to have mastered locomotion. Back at the DeepMind robotics lab, a recent focus has been to develop a robot which can move around on two legs, a problem which comes with its own unique set of research challenges. On the floor we've got what looks sort of like the play mat that you put down for kids. Akhil showed me a sort of robot playpen, about nine meters squared, with a barrier around it, presumably to stop the robots inside from escaping. So inside this square you've got three humanoid robots. They're black, they've got a very cuboid body, but they have arms and legs and even little tiny heads. They're quite small, though, I should tell you; they're probably the size of a large chicken. Yeah, smaller than a goose, bigger than a chicken, I don't know. And basically what we've been doing is learning to walk around, so the robot actually learns to use its legs, even its arms. The head has a camera, so it can look around and see what's going on. So it is very much almost like a whole-body control problem, in some sense. Can I touch it? Oh my gosh, okay. Oh, it's quite heavy. It's got these little handles on the back, almost like a rucksack, and lots of ports, like little USB ports and an Ethernet cable port and stuff. And then for feet it's got these little skid pads, almost like it's going skiing but with really short skis. It's very pretty. So I'm lifting its arm up now, and it kind of returns to center, but it's got this really smooth
action. Have a listen to that. I feel really sad, sort of like, oh, please leave me alone. Okay, it's walking around. Imagine if you were doing a really rubbish robot dance in a nightclub; that is exactly what it looks like. It looks like it should fall over. So you haven't programmed this to walk around in a circle? No, this was learned on the robot, just by learning from the data over a couple of days. That's Jan Humplik, a research scientist at DeepMind who's been following the progress of these humanoid robots for more than a year. Did you teach it to fall flat on its back like it just did? No, it just falls. It's quite good at pushing itself up, though. Yeah, so those things are programmed. The pushing behavior to stand up, that's programmed, because otherwise you would just spend your entire life picking up the robot. Well, either we need to pick them up or they would need to learn to stand up.
We are kind of humanizing them by giving them names. Viorica Pătrăucean is another research scientist on the locomotion project. What are their names, these three? I think one of them is England, and one is Messi, from Messi the footballer, and mine's called Hagi, from Gheorghe Hagi, the great Romanian footballer, just because I'm from Romania originally. So if you look at that one, this is a completely different training process, and you can see that the gait is very different. It can try to walk backwards, and it actually looks like a drunk robot. So it's trying to walk backwards, but it's sort of, um...
I must say, I could have stood watching these cute and mostly completely hopeless little humanoid robots all day, but I wanted to find out more about the process of training them to walk. So after waving goodbye to England, Messi and co., I asked Jan and Viorica about their experience of training these robots in their living rooms at home when the pandemic hit. How did it travel? Does it just pop in a little suitcase? Actually, if you buy it, you get it with a suitcase. It comes with it, so it can
travel. Do you have quite a big living room? Not that big, but yeah, I'm adapting it. I have a pen there with floor mats and foam walls. So when you watch TV at night, you sort of put your feet up, and around you is a little robot pen? Exactly, yeah. We even had experiments where the robot was watching TV. Did you really? Well, we wanted to run some experiments to test visual networks. TV is a good source of diverse visual data, and it's already in the living room, right? So why not? So hang on, your job for the last year has been to sit on the sofa and watch TV with your robots? Not quite that, but probably a few seconds of it does look like that, yeah.
So how do you train a humanoid robot to walk? Again, the underlying mechanism is reinforcement learning. The robots are rewarded with points for forward velocity and for not falling over. When you haven't given them any training, what do they do? Oh, they don't do much. They just start shaking for one second, or two at most, and then they fall. After training for a few hours, they start actually walking, taking a few steps, and then later on they bump into walls, and then, using vision, they learn how to avoid the walls. So I have a two-year-old at home, right, and the way you're describing it, it's not dissimilar from the way that my two-year-old learned to walk. There's a lot of falling, not the shaking and flailing, but there was also a lot of walking into walls. Do you see those similarities between the way that these robots learn to walk and the way that toddlers learn? There are some similarities. Probably for toddlers, even before they crawl, they still discover their body, they still learn to move their limbs, whereas with our robots we just put them in a standing position and now they walk. And how quickly did it manage to learn to walk? I think in about 24 hours it was already walking. For me, that's impressive. Not 24 hours in real time, but 24 hours in sort of training time? Yeah, yeah. That spans about a week of training, but training
in small sessions before something breaks, or taking it to the lab for a quick repair or something like that. Viorica raises an important point about the fragility of these robots. The actual hardware is not designed for a machine learning technique which involves a robot falling down loads of times before any progress is made. Here's Raia Hadsell to explain. The robots that are built today are not built for the type of learning paradigms that we think are key to developing AGI. Think about when a child learns to walk: every time they fall down, they heal from that and they keep on going. There are only so many times that a robot can fall down before it simply breaks.
This approach comes with all kinds of difficulties and hurdles that the pre-programmed robots just don't have to worry about. Here's Jan Humplik again. The main limitation is that you really start from scratch. With more classical approaches, you perhaps don't need any data; it's just going to work out of the box. So these are certainly disadvantages of reinforcement learning. Can't you cheat, though? Can't what one robot has learned about the world be imparted onto another? Absolutely, and there are many different ways to share knowledge. In particular, you can just have multiple robots collecting data, and this is really the way to scale up the data collection process. What Jan is talking about here is a technique called pooling. Instead of Hagi and England learning to walk independently of each other, their data (how many times they fell over, what their sensor readings were when they fell, and so on) is regularly uploaded to a central controller, which combines this information and feeds it back to each robot so that they can better navigate the world based on their combined learning experience. We can track each robot and how well they're doing, and yeah, we definitely discuss, like, oh okay, my robot starts falling more often now, is yours the same? Did it get quite competitive? I kept telling everybody that it's not a competition, but yes, every
time somebody would check the learning curve and there were the two robots, they would be like, oh, Viorica is winning! Oh, Jan is winning! And I'm like, no, no, we're only winning if the performance is the same on both robots.
Speaking of teamwork, there are other environments beyond just walking around, inserting keys into locks or stacking bricks that serve as an important test for the robots: a chance to hone a set of robot skills that would be useful to have in the long term. For that, in true DeepMind fashion, their focus has turned to games, and one in particular: the beautiful game. In order to play football, you have to be able to control your body. You need to be able to run, to walk, but then you also need to have these skills of dribbling and shooting, and then at even a level above that you need to have the coordination and the strategy over the whole game. So it's really a challenge that has a lot of layers to it.
a long time so the robotics team has been looking for other ways of putting their robots on the right track while the robot is learning to do it a human comes in and when it gets close but no cigar a human can take over and just be like adjust like this maybe move to the left a little bit and so while we might have a sparse reward so it's kind of like it's all or nothing you know you're in the locker you're not in it what the robot will use is both that information of sparsity but also maybe information from a human and kind of the combination of those things is how it might learn and while there are certainly areas where learning algorithms like this one have been able to successfully accomplish tasks you shouldn't be fooled into thinking this stuff is easy because not all the robots in this lab are quite as accomplished when i was here last i saw a robot that was stacking lego bricks not to be rude i wouldn't say it was the most impressive thing i've ever seen in my life how's it doing now we can actually move to the other side of the lab and we can start to see that stuff akiel took me to another robot cell with a red and black robot arm inside it had a gripper on the end with two appendages a bit like the grabby bit of a litter picker and it was hovering over a tray containing a trio of 3d shapes its goal was to learn how to stack the red pyramid shape on top of the blue octagonal prism so there's only one way around that it can hold this red object and successfully pick it up and it has a workshop which way and unfortunately every time it tries to rotate and pick it up oh hang on i think it's got it it's got it it's good job these things don't get disheartened because my goodness it's been how many years since i've been here no this time it's been here trying and trying and trying so we're seeing something that's kind of training right now so we're not seeing our best don't make excuses why are these dexterous manipulation tasks so important to learn so one of 
the reasons that we have a robotics lab at deepmind is really to ground our search for agi in the real world to make sure that our progress towards agi is true agi like if we find agi it probably should be able to stack an object on another object and speaking of objects next to this row of robot arms i noticed a basket full of children's toys rubber ducks foam bananas and a much-loved cartoon character i noticed spongebob is still here sat in the corner this time there's also hang on little green rubber ducks what is the idea behind this stuff so these kind of play things are really nice because manipulating objects that can bend and move and stuff like that that's a new type of physics that our agents need to learn somewhere in a landfill is there a pile of sort of crushed foam bananas that robots have we haven't destroyed any bananas yet i can you haven't destroyed any banana i don't believe that for a second as fun as it is to watch these robot arms try and fail to insert usb sticks into computers and sling foam bananas around it's worth remembering that the projects on display in the robotics lab serve an important purpose building ai that can interact with the physical world is considered central to the overarching goal of solving intelligence itself here's ryan hatzel again speaking at the cheltenham science festival when we think about human intelligence a lot of the time we focus on things like language or our cognitive skills how good we are at math but really a lot of our brain has been developed in order to just move our bodies and so i think that that level of intelligence motor intelligence movement intelligence this is a core part of our intelligence and that's what our cognitive skills are built on top of this focus on creating intelligent robots which can learn for themselves is part of the reason why deep minds robots might seem a little bit well rudimentary compared to what else is out there because i'm sure that all of you are thinking about 
those videos on the internet of robots doing backflips being pushed over getting back up performing all kinds of incredibly sophisticated movements so i thought i'd ask ryan hadsell about this you can't believe everything you see on the internet hannah welfare you're absolutely right there are robots that can do some pretty impressive stuff that can flip that can jump at deepmind we've been focusing more on the generality aspect of it the g in agi we want robots that can learn new things that they've never done before without needing somebody to program them just through experience or through watching a human so those very impressive videos the ones that aren't fake of robot zoo backflips they are essentially following a very precise set of instructions is that essentially what we're saying absolutely and they tend to be a demonstration of what that actual robot can do a robot that can do a backflip that's very impressive because of the power and mass ratio that's required to do that but it's very different from wanting that robot to do a new skill that it has just observed for the first time it couldn't walk over to a table and pick up a coffee cup for example it could not well you've disappointed me right but yours could in theory in future ours could do that and weed potatoes and pick tomatoes as well this is the key point here if robots can teach themselves to manipulate objects and move around they can be adaptable and offer assistance to humans in a whole host of critical tasks including situations where they can't currently support us so this came up when there was the fukushima disaster in japan there's been an explosion at a japanese nuclear power station damaged in yesterday's massive earthquake clouds of smoke could be seen rising above the fukushima nuclear site people realized that we didn't have a good way to send robots into this extremely dangerous radioactive area and make repairs because all of our robots either required an area that was easily 
accessible or didn't have the necessary dexterity to for instance shut a valve or open a door and so there was a whole robotics program aimed at how do we improve legged locomotion into areas where a wheeled robot can't go and how do we improve the dexterity of robots as well of course there is a flip side here if in the future these artificially intelligent robots are good enough to be deployed in the real world for saving human lives they could also be built to do the opposite robots have been used to carry weapons and so if you make a more capable robot then potentially what you're making is a more capable vehicle for holding weapons of course deep mind is very much against autonomous weaponry including on robots and i think that the benefits of robots and what they can do in our world outweigh these risks especially if the world stands strongly against the use of weaponry and robotics and this is not the only ethical concern about robotics research lots of people are worried about the possible detrimental effects of automation on the workforce what we're looking at now with the use of robots would be to augment humans somebody working on a construction site that has a robot next to them that's able to do some of the heavy lifting for instance so it's not about displacing humans or replacing them it's about enhancing what a human can do any robot that's going to help with weeding potatoes and picking tomatoes will of course need to have mastered locomotion back at the deepmind robotics lab a recent focus has been to develop a robot which can move around on two legs a problem which comes with its own unique set of research challenges on the floor we've got what looks sort of like the play mat that you put down for kids akil showed me a sort of robot play pen about nine meters squared with a barrier around it presumably to stop the robots inside from escaping so inside this square then you've got three humanoid robots they're black they've got a very cuboid body 
but they have arms and legs and even little tiny heads they're quite small though i should tell you that they're probably the size of a large chicken yeah smaller than a goose bigger than a chicken i don't know and basically what we've been doing is learning to walk around and so like robot actually learns to kind of use its legs even its arms the head has a camera so let's kind of look around and see what's going on so it is very much kind of almost like a whole body control problem in some sense can i touch it oh my gosh okay oh it's quite heavy it's got these little handles on the back almost like a rucksack and lots of ports like little usb ports and an ethernet cable port and stuff and then for feet it's got these little skid pads almost like it's going skiing but just with really short skis it's very pretty so i'm lifting its arm up now and it kind of returns to center but it's got this really like smooth action have a listen to that i feel really sad sort of like oh please leave me alone okay it's walking around imagine if you were doing a really rubbish robot dance in a nightclub that is exactly what it looks like it looks like it should fall over so you haven't programmed this to walk around in a circle no this was learned on the robot just by learning from the data over a couple of days that's jan huntlik a research scientist at deepmind who's been following the progress of these humanoid robots for more than a year did you teach it to fall flat on its back like it just did no it just falls it's quite good at pushing yourself up though yeah so those things are programmed the pushing behavior to stand up that's programmed because otherwise you just spent your entire life picking up the robot well either we need to pick them up or they would need to learn to stand up we are kind of humanizing them by giving them names the eureka protrochen is another research scientist on the locomotion project what their names are these three i think one of them is england 
and one is messi from the messi the footballer and mine's called that's from humane deji or the hajj the great romanian footballer just because i'm from romania originally so if you look at that one this is a completely different training process and you can see that the gate is very different and it can try to walk backwards and it's actually looks like it's a drunk robot so it's trying to walk backwards but it's sort of um i must say i could have stood watching these cute and mostly completely hopeless little humanoid robots all day but i wanted to find out more about the process of training them to walk so after waving goodbye to england messi and co i asked jan and viarika about their experience of training these robots in their living rooms at home when the pandemic hit how did it travel does it just pop in a little suitcase actually if you buy it you get it with a suitcase it comes with it so it can travel do you have quite a big living room not that big but yeah i'm adapting it i have a pen there with floor mats and foam walls so when you watch tv at night time you sort of put your feet up and around you is a little robot pen exactly yeah we even had experiments where the robot was watching tv did you really well we wanted to run some experiments to test visual networks tv is a good source of diverse visual data and it's already in the living room right so why not so hang on your job for the last year has been still on the sofa and watch tv with your robots not quite that but probably a few seconds of it it does look like that yeah so how do you train a humanoid robot to walk again the underlying mechanism is reinforcement learning the robots are rewarded with points for forward velocity and not falling over when you haven't given them any training what do they do oh they don't do much they just start shaking for one second or two at most and then they fall after training for a few hours then they start actually walking like taking a few steps and then later 
on they bump into walls and then using vision they learn how to avoid the walls so i i have a two-year-old at home right and like the way you're describing here it's not dissimilar from the way that the two-year-old has learned to walk there's a lot of falling not that shaking and flailing but there was also sort much a lot of walking into walls do you see those similarities with the way that these robots learn to walk in the way that toddlers learn there are some similarities where probably for toddlers even before they crawl they still discover their body they still learn to move their limbs whereas our robots we just put them in standing position and now walk and how quickly did it manage to learn to walk i think in about 24 hours it was already walking for me that's impressive not 24 hours in real time but 24 hours in sort of training time yeah yeah that spans about a week of training but uh training like a small uh sessions before something breaks or taking it to the lab for a quick repair or something like that the eureka raises an important point about the fragility of these robots the actual hardware is not designed for a machine learning technique which involves a robot falling down loads of times before any progress is made here's ryan hansel to explain the robots that are built today are not built for the type of learning paradigms that we think is key to developing agi think about when a child learns to walk every time they fall down they then heal from that and they keep on going there's only so many times that a robot can fall down before it simply breaks this approach comes with all kinds of difficulties and hurdles that the pre-programmed robots just don't have to worry about here's jan humplik again the main limitation is that you really start from scratch with more classical approaches you perhaps don't need any data it's just going to work out of the box so these are certainly disadvantages of reinforcement can't you cheat though can't what one 
robot has learned about the world be imparted onto another absolutely and there are many different ways to share knowledge in particular you can just have multiple robots collecting data and this is really the way to scale up this data collection process what yarn is talking about here is a technique called pooling instead of and england learning to walk independently of each other their data how many times they fell over what their sensor readings were when they fell etc is regularly uploaded to a central controller which combines this information and feeds it back to each robot so that they can better navigate the world based on their combined learning experience we can track each robot how well they're doing and yeah we definitely discuss like oh okay my robot starts falling more often now is yours the same did it get quite competitive i kept telling everybody that it's not a competition but yes every time somebody would cheat the learning curve and there would be the two robots they would be like oh viorika is winning oh yanni is winning i'm like no no we're only winning if the performances are the same on both robots speaking of teamwork there are other environments beyond just walking around or inserting keys into locks or stacking bricks that serve as an important test project for the robots a chance to hone in on a set of robot skills that would be useful to have in the long term for that in true deep mind fashion their focus has turned to games and one in particular the beautiful game in order to play football you have to be able to control your body you need to be able to run to walk but then you also need to have these skills of dribbling and shooting and then at even a level above that you need to have the coordination and the strategy over the whole game so it's really a challenge that has a lot of layers to it so far deepmind has been teaching football not to real robots but to simulated ones computerized avatars in human form a bit like a simplified 
version of the players in your favorite video game the difference with these players is that their repertoire of movements is not pre-programmed but like the real robots they are effectively learning to move from scratch the point is not to have robots playing at wembley stadium in the near future however fun that might be we're really trying to study whether it's valuable to train these methods using reward and the competition of something like football or whether there are other ways to train for this type of behavior underneath that big umbrella of reinforcement learning it's helpful to use a series of other techniques to get the agents up and running here they're also using something called imitation learning which involves gathering video footage from real human football matches using motion capture to translate the movements of each player's joints into a dataset and then training a neural network so that these simulated humanoids begin to mimic the movements of real players so this is really layering these different types of learning algorithms together and the exciting thing is that the result in the end is four agents that can race around this field and they've really achieved this level of whole body control and also team coordination and then raya showed me my first ever video of a simulated humanoid football match and i'll do my best to commentate on the highlights so here we are for this season's title decider between these two titans of ai football the blues versus humanoids united playing in red well the game's begun and drogbot has it for the blues cuts inside and then chops back onto his right but look the ball is broken free and robo naldo is clear go well they say a week is a long time in politics but five seconds really is an epoch in ai football as impressive as this video is right i think it's fair to say that at certain points they are quite hilariously bad at controlling their bodies yeah i mean they are not trying to win any points on style 
or grace so that really lets the agent optimize for just purely trying to achieve at school it doesn't matter if the arms are flailing around so you can see the problem with putting this onto a real robot yeah i can see how that would be a problem with the robots it's in the game of football that we start to see the different flavors of intelligence converge training agents to play football gives them physical skills like dribbling and passing but when combined with a reinforcement learning algorithm which rewards them for team play you start to see emerging the sort of cooperative ai we heard about in the previous episode the question is if physical and social intelligence can be developed in tandem in this way could physical intelligence provide a path all the way to agi it all depends on how you define agi doesn't it i've noticed this a lot yes maybe not immediately you know if you look at evolution it's a very long path to go from initial creatures to human beings then i think also it could be a very long path if we want to build an agi starting from first principles of learning to move a body but that is what we are looking at jan humplik also believes that it will be a long time before robotics takes us to a general form of intelligence if i ask somebody on the street what would they be impressed by the robot doing they would say something like well you know maybe cleaning my apartment and and if you start thinking about this problem you're like okay so it certainly needs to use vision it certainly needs to understand human language because you need to give it command it needs to understand what does it mean to clean the apartment and that's not trivial because cleaning doesn't mean destroying your furniture solving anything impressive like this essentially getting very close to agi but if embodied intelligence social intelligence and linguistic intelligence don't necessarily lead to agi on their own is there a single path that does well some deep mind 
researchers are convinced that there is and it's been staring us in the face this whole time when we say that reward is enough we're really arguing that all of the abilities of intelligence everything from perception to knowledge to social intelligence to language can be understood as a single process of trying to increase the rewards that that agent gets if this hypothesis was true it would mean that we only need to solve one problem in intelligence rather than a thousand different problems for each of the separate abilities that's next time on the deepmind podcast presented by me hannah fry and produced by dan hardoon at whistle down productions if you like what you've heard please do rate and review the podcast helps others who are also ai curious to find it same time next week\n"