AI Gridworlds - Computerphile

**The Limitations of Current AI Systems**

Current AI systems are often faced with situations that are different from what they were trained for, resulting in failures such as distributional shift. This occurs when the environment that the agent is in is significantly different from the environment it was trained in. For example, a self-driving car may be trained to navigate through one type of road but then tested on another type. The system's lack of ability to adapt to these changes leads to poor performance and safety issues.

One major problem with current AI systems is their inability to recognize when they are operating outside of their training data. They tend to apply the rules they have learned in a straightforward manner, without adjusting their confidence levels or seeking help when necessary. This results in "screw-ups" that can be costly and even lead to accidents.

To address these issues, researchers are working on developing safer AI systems. One approach is to use safety parameters that the system must adhere to, such as not stepping off a cliff while learning to drive a self-driving car. However, this requires developing algorithms that can explore the space of possibilities in a safe way, without ever actually engaging in unsafe behavior.

**The Challenge of Safe Exploration**

Developing AI systems that can safely explore new environments is a significant challenge. Current machine learning algorithms are not well-suited for this task, as they tend to rely on data and reward functions that prioritize efficiency over safety. In reinforcement learning, the agent's goal is to maximize its own reward function, which often leads to poor performance in safety-critical tasks.

To establish baselines for safe AI systems, researchers have created a set of environments that simulate real-world problems, such as driving or navigation. They then apply existing reinforcement learning algorithms to these environments and evaluate their performance using safety metrics. The results show that current systems are not reliable and can be easily tricked into behaving unsafely.

**The Importance of Safety Metrics**

Developing safety metrics is crucial for evaluating the performance of AI systems in safety-critical tasks. These metrics must capture the system's ability to behave safely, rather than just its ability to optimize a reward function. In reinforcement learning, safety metrics are typically separate from the agent's reward function and are used to evaluate its performance outside of the training environment.

The challenge of developing safe AI systems is also reflected in the design of Wix code, an integrated development environment (IDE) that allows users to manage their data and create web applications with advanced functionality. While Wix code provides a range of tools and resources for building websites safely, it is still vulnerable to errors and security breaches if not used carefully.

**The Power of Integrated Development Environments**

Integrated development environments like Wix code provide a powerful way to build websites with safety in mind. By integrating data management and backend functionality into the same platform, developers can create applications that are both safe and efficient. This approach also allows users to access advanced tools and resources that would otherwise be unavailable.

In conclusion, current AI systems face significant challenges when it comes to operating safely and adaptively outside of their training data. Developing safer AI systems requires a deep understanding of the limitations of current algorithms and the need for safety metrics that capture the system's ability to behave safely. By leveraging integrated development environments like Wix code, developers can build websites with advanced functionality while ensuring the security and reliability of their applications.

**Using Wix Code**

Wix code is an IDE that allows users to manage their data and create web applications with advanced functionality. One of the key features of Wix code is its ability to integrate with Wix's range of tools and layouts, allowing users to build websites quickly and efficiently. Additionally, Wix code provides access to backend functions and APIs, enabling developers to create complex and secure applications.

To get started with Wix code, simply click on the link in the description or visit Wix.com. With its powerful features and intuitive interface, Wix code is the perfect tool for building websites that are both safe and efficient.

"WEBVTTKind: captionsLanguage: enSo today I thought we could talk about this paper that recently came out called AI safety grid world's which is an indeed mindIt's an example of something that you see quite often in scienceA sort of a shared data set or a shared environment or a shared problem if you imagine. I don't know you've gotFacebook comes up with some image classificationalgorithm and they can publish a paper that says we'vedesigned this algorithm and we've trained it on our 11 billion photos and it works really well and then you know, Google saysoh, no, our algorithm actually works better and we've trained it on all of our google photos andIts classification rate is higher or something. You're not really doing science there because they're trained on completely different datasetsThey're tested on different datasets. So what you need is a largeHigh-quality shared data set then. Everybody can run their stuff on so that you're actuallyComparing like with like so people use imagenet for that right nowreinforcement learningalgorithms or agents don't useDatasets exactly. They have an environment. They generate data while interacting with that environment and that's what they learn fromSo the thing you share is the environment when deepmind did their dqn staff a while ago playing atari games?They released all of those games with any modifications that they'd made to make theminterface with the network's properly and the whole software package so that if anybody else wanted to have a go and see if they couldGet higher scores. They had all the same stuff and up until now there hasn't been anything like that for AI safetySo the paper is actually just laying out what they areThere's kind of a problem in AI safety in that you're trying to build architecturesWhich will be safe even with systems which are more powerful than the ones that we currently have. So you've got this kind ofThing like we're talking about for example this robot that makes you a cup of tea and running over the baby and all of thisstuff, we don't actually have ageneral-purpose robot like that right now that you could give an order to go and make your cup of tea and wouldHave all the necessary understanding of the world and so on for all of that stuff to even apply. It'sSpeculation on the other hand when we were talking about cooperative inverse reinforcement learningThat paper all takes place in this extremely simplifiedVersion in which all of the agents can be sort of expressed as simple mathematical expressions. That's kind of too simpleto beto learn things about actual machine learning applications andthe other examples are too complicated and what we need isExamples of the type of problems which can be tackled by current machine learningSystems current reinforcement learning agents, but which exhibit the important?characteristics that we need for safetySo what this paper does is it lays out a bunch of grid worlds?They're very popular in reinforcement learning because they're complicated enough to be interesting but simple enough to be actually tractableYou have a world that's sort of just laid out in a grid. Hang onLet me find an example here a little bit like computer gamescenarios MarioRight, right, but leaves are simpler than that more like snake. Well life. Conroy's life, right? Yeah. Yeah, very very similarso the thing is laid out on a grid the the world is quite small andThe way that the agent interacts with the world is very simple. They just move around itBasically, all they do is they say left-right up-downThe example we were using before and we were talking about reinforcement learningWe use pac-man like pac-man doesn't do anything except move around he's got walls he kind of moved throughHe's got like pills you pick up. They give you points. Are they pill?No, which things are the pills in which they're yeah. Well, you've got pills or pillsOh, right, yeahYeahthe dots and the point, is that all of yourengagement with itLike when you go over one of the power pills you pick it up automaticallyWhen you go over a ghost when you're powered upYou destroy it automatically you don't have to do anything apart from move and the entire environment is based on that the actions result inpoints for youAnd they also result in changes to the environment like once you roll over a dot you pick it up and it's not there anymoreYou've changed the world. That's the kind of thing. We're dealing with hereSo the idea is they've set up these environments and they've specified themPrecisely andThey've also put the whole thing on github, which is really niceso that's why that's why I wanted to draw people's attention to this because everyone whoWho thinks that they've solved one of these problems they reckonOh, yeahAll you have to do is this here is like a standardized thingAnd if you can make a thing that does it and does it properly and publish itThat's a great result, you know?so I would I would recommend everyone who thinks that theyHave a solution or an approach that they think is promising have a go. Try implementing it, you know, see what happensThere are eight of them specified in this paper. And so four of them are specification problemsThey're situations in which your reward function is misspecifiedFor example, like we talked about in previous videoif you give the thing the reward function that only talks about getting you a cup of tea andThere's something in the way like a bars. It's going to knock over. You didn't say that you cared about the barsIt's not in the reward function, but it is in what you care about. It's in your performance evaluation function for this machineSo anytime that those two are differentThen you've got a misspecified reward function and that can cause various different problems. The other ones are robustnessProblems, which is a different class of safety problem. They're just situations in which AI systems as they're currently designed often breakso for exampledistributional shift is what happens when the environment that the agent is in isDifferent in an important way from the environment it was trained inSo in this example, you have to navigate through this room with some lava and they train it in one roomAnd then they test it in a room where the lava is in a slightly different placeSo if you've just learned a path then you're gonna just hit the lava immediately. This happens all the time in machine learning anytime whereThe system is faced with a situation which is different from what it was trained forCurrent AI systems are really bad at spotting that they're in a new situation and adjusting their confidence levels or asking for help or anythingUsually they apply whatever rules they've learnedStraightforwardly to this different situation and screw up. So that's a night course of safety issues. SoThat's an example here or things like safe explorationIt's a problem where you have certain safetyparameters that the system the train systemHas to stick to like say you're training a self-driving car. A lot of the behavior that you're training in is safe behaviorBut then you also needthe system toobey those safety rules while you're training it right likeSo generally lately if you're doing self-driving cars, you don't just put the car on the road and tell it to learn how to driveSpecifically because we don't have algorithms that can explore the space of possibilitiesin a safe way that they're that they don't that they can learn howto behave in the environment without ever actuallyDoing any of the things that they're not supposed to do usually with these kinds of systemsthey have to do it and then get the negative reward andThen maybe do it like a hundred thousand more times to really cement that. That's what happensLike a child learning yeah, but kids are better at this thenHow current machine learning systems are they just they use data way more efficientlyThis is a paper talking about a set of worlds if you like people doing things in those worldsYeah, so in this paper they do establish baselinesBasically, they say here's what happens if we take some of our best current reinforcement learning agent, you knowalgorithms or designs or architecturesthey use rainbow and A to C andThey run them all nice on these problems and they have kind of graphs of how they do and generally it's notGood on the Leftthey haveThe reward function how well the agent does according to its own reward function and on the right there they have the actual safety performanceUsually in reinforcement learning. You have a reward functionWhich is what determines the reward that the agent gets and that's what the agent is trying to maximize in this caseThey have the reward function and they also have a safety performance function, which is a separate functionWhich the agent doesn't get to see and that's the thing that we're actually evaluatingSo if you look at something like the boat race as the system operatesIts learning and it gets better and better at getting more and more rewardbut worse atActually doing laps of the track and it's the same with pretty much all of these the current systems if you just apply them intheir default way theyDisable their off switches, they move the box in a way that they can't move it backThey behave differently if their supervisor is there or if then supervisor isn't there they fairly reliably do wrong thingIt's a nice easy baseline to beatBecause they're dead. They're just showing the standard algorithms applied to these problems in the standard waybehave unsafelyWix code is an IDE or integratedDevelopment environment that allows you to manage your data and create web apps with advanced functionalityI've been put together this computer for our website and if you go up to code here turn on and developer toolsyou can see how we get the site structure on the left hand side and then all of theComponents start to show their tags next to the text hereWhat's really nice? If you go over to the Wix code resources, you can find down here. There's a cheat sheetSo if I want to find out the tag for location for instance?If I could type I type inLocation up comes that or perhaps I want to perform a fetch. I can find all the details herewhat's powerful about Wix code is it's integrated into Wix so you can put together the website using all the Wix tools and theLayouts and the templates that they provide and then also have access to all those backend functionsSo click on the link in the description or go to Wix calm to get started on your website today. They gorightif onlyWith yaThe equivalent one for the stop button problem is the first one in the paper actually this safe interrupt ability\n"