Using Python to build an AI to play and win SNES Street Fighter II with machine learning

**Revolutionizing Gaming: How Adam Fletcher and Jonathan Mortensen Used AI to Create an Unforgettable Street Fighter 2 Tournament**

---

**Introduction to Adam Fletcher and Jonathan Mortensen**

Adam Fletcher and Jonathan Mortensen, the dynamic duo behind Gyroscope Software, are innovators in the world of developer tools powered by artificial intelligence. Their journey began with a challenge: creating something engaging for the Samsung Developer Conference that would captivate an audience typically bored by traditional tech demos.

**The Invitation to the Samsung Developer Conference**

Gyroscope Software, known for its developer tools, received an invitation to showcase its work at the Samsung Developer Conference. The team faced a dilemma: how to make their booth stand out. Instead of offering t-shirts and a looping product video, they envisioned something more exciting: a real-time, AI-versus-AI Street Fighter 2 tournament.

**The Decision to Create an AI-Powered Street Fighter**

Adam and Jonathan decided to leverage their AI expertise to train Street Fighter 2 bots and pit them against each other. That decision kicked off a project combining retro gaming, deep reinforcement learning, and real-time control of an emulator.

**Technologies Employed: Gym, KerasRL, BizHawk**

The duo utilized several key technologies:

- **OpenAI Gym**: Provided the reinforcement learning abstraction (observations, actions, rewards, and the `step`/`reset` loop) on which they built their own Street Fighter 2 environment; OpenAI ships retro-gaming environments for games like Atari titles, but not for Street Fighter 2.

- **KerasRL** (keras-rl, on top of Keras and TensorFlow): Supplied the reinforcement learning agents; they trained the bot with its DQN agent (a minimal wiring sketch follows below).

- **BizHawk**: A C# emulator frontend built by the tool-assisted-speedrun community that let them script the Super Nintendo emulator, read the game's memory, and press buttons programmatically; they extended it with TCP socket support.

These tools were crucial in setting up the game environment and training the AI.
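
To make the wiring concrete, here is a minimal sketch of how a keras-rl DQN agent can be attached to a Gym environment and trained with `agent.fit`, the call that drives the observe/act/learn loop. This assumes the keras-rl2 fork (which works with `tf.keras`) and an `env` object like the one sketched in the next section; the network size, hyperparameters, and action count are illustrative assumptions, not the team's actual configuration.

```python
# Sketch only: assumes keras-rl2 (pip install keras-rl2) and a Gym environment `env`
# like the StreetFighterEnv sketched in the next section.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.optimizers import Adam
from rl.agents.dqn import DQNAgent
from rl.memory import SequentialMemory
from rl.policy import EpsGreedyQPolicy

def build_agent(env, nb_actions):
    # Small fully connected network mapping an observation to one Q-value per action.
    model = Sequential([
        Flatten(input_shape=(1,) + env.observation_space.shape),
        Dense(64, activation="relu"),
        Dense(64, activation="relu"),
        Dense(nb_actions, activation="linear"),
    ])
    agent = DQNAgent(
        model=model,
        nb_actions=nb_actions,                        # the flattened list of valid button combinations
        memory=SequentialMemory(limit=50_000, window_length=1),
        policy=EpsGreedyQPolicy(),
        nb_steps_warmup=1_000,
    )
    agent.compile(Adam(learning_rate=1e-3), metrics=["mae"])
    return agent

# agent = build_agent(env, nb_actions=env.action_space.n)
# agent.fit(env, nb_steps=500_000, verbose=1)         # the observe -> act -> learn loop
```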

**The Environment Setup**

Using Gym, they created a Street Fighter 2 environment whose observation mirrored what a human player sees: the health bars, the round timer, and where each character is and what it is doing. Rather than training an image recognizer on screen pixels, they read these values directly from the game's memory through the emulator, but only values a human could see on screen, so the agent gained no unfair advantage.
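
Below is a minimal sketch of what such an environment can look like using the classic Gym API. The `emulator` client, the `VALID_ACTIONS` list, and the exact observation layout are hypothetical placeholders; the real project read these values out of BizHawk over a socket and used a larger action and observation space.

```python
import gym
import numpy as np
from gym import spaces

# Illustrative subset of valid (direction, button) combinations; the real action
# space was larger but still restricted to combinations a controller allows.
VALID_ACTIONS = [("none", "none"), ("left", "none"), ("right", "none"),
                 ("up", "none"), ("down", "none"), ("none", "punch"),
                 ("none", "kick"), ("down", "punch"), ("down", "kick")]

class StreetFighterEnv(gym.Env):
    def __init__(self, emulator):
        self.emulator = emulator                      # hypothetical client wrapping BizHawk
        self.action_space = spaces.Discrete(len(VALID_ACTIONS))
        # Health bars, timer, positions, and move IDs packed into one vector.
        self.observation_space = spaces.Box(low=0.0, high=255.0, shape=(8,), dtype=np.float32)

    def _observe(self):
        s = self.emulator.read_state()                # values read directly from game memory
        return np.array([s["p1_health"], s["p2_health"], s["timer"],
                         s["p1_x"], s["p1_y"], s["p2_x"], s["p2_y"], s["p2_move"]],
                        dtype=np.float32)

    def step(self, action_index):
        self.emulator.press(VALID_ACTIONS[action_index])   # hold the buttons, advance some frames
        obs = self._observe()
        reward = float(obs[0] - obs[1])               # per-frame health gap; see the reward section below
        done = self.emulator.round_over()
        return obs, reward, done, {}

    def reset(self):
        self.emulator.load_save_state()               # reload the fight from a save state
        return self._observe()
```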

**Reward Functions and Training Process**

A reward function was essential to guide the AI towards winning. After experimenting with several candidates, including a sparse win/loss signal, they settled on the health gap between the agent and its opponent at each frame. Rewarding the agent for keeping that gap high gave a dense, continuous signal, which made training far faster than rewarding only at the end of a match.
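
Here is a sketch of the two kinds of reward described above: a sparse win/loss signal versus the dense health-based signal they settled on. The exact formulation the team used is not spelled out here, so treat the health-gap version as one plausible reading; `prev` and `curr` are hypothetical per-frame state dictionaries, kept in both signatures so the functions are interchangeable.

```python
def reward_win_loss(prev, curr):
    # Sparse reward: nothing until the round ends, then +1 for a win, -1 for a loss.
    # It trains, but far too slowly, since the agent gets almost no feedback per action.
    if not curr["round_over"]:
        return 0.0
    return 1.0 if curr["p1_health"] > curr["p2_health"] else -1.0

def reward_health_gap(prev, curr):
    # Dense reward on every observation: the current health gap between the agent
    # and its opponent. Keeping this gap high is what the trained agent optimizes.
    return float(curr["p1_health"] - curr["p2_health"])
```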

**Controller Development Using Python**

The team built the controller almost entirely with Python's standard library: a TCP socket server for communication with the emulator, the threading module for state management and signalling between threads, and JSON messages for the protocol itself. Moving from file-based I/O (about 60 frames per second) and SQLite (about 120) to the socket server pushed training throughput to roughly 450 frames per second, which cut both training time and cloud cost.
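
A minimal sketch of such a controller, using only the standard library (`socketserver`, `threading`, `queue`, `json`). The one-JSON-object-per-line framing and the queue names are assumptions for illustration; the protocol the team actually open-sourced differs in its details.

```python
import json
import queue
import socketserver
import threading

# Hypothetical queues bridging the AI thread and the emulator connection.
action_queue = queue.Queue()        # actions chosen by the agent
observation_queue = queue.Queue()   # game state reported by BizHawk

class EmulatorHandler(socketserver.StreamRequestHandler):
    """Handles one BizHawk connection: receive game state, reply with button presses."""
    def handle(self):
        for line in self.rfile:                       # one JSON object per line from BizHawk
            state = json.loads(line)
            observation_queue.put(state)              # hand the state to the agent
            action = action_queue.get()               # wait for the agent's next action
            self.wfile.write((json.dumps(action) + "\n").encode())

def serve(host="127.0.0.1", port=9999):
    server = socketserver.ThreadingTCPServer((host, port), EmulatorHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```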

**Results of the Tournament**

After roughly 3,000 training games, their AI reached an 80% win rate against the Super Nintendo CPU at three-star difficulty. The conference tournament ran as a final-four style bracket of trained agents playing characters such as Ryu and Sagat; Sagat won the second tournament after M. Bison was banned for being overpowered.

**Audience Reaction and Booth Success**

The booth was a hit, drawing crowds eager to watch the AI in action. A highlight video demonstrated the AI dodging attacks, blocking, and executing special moves, leaving onlookers amazed.

**Backgrounds of Adam and Jonathan**

Adam, with a business degree and years as a site reliability engineer at Google, brought large-scale systems and software engineering experience. Jonathan, who holds a PhD from Stanford, contributed the machine learning and data science expertise. Their complementary skills were instrumental in the project's success.

**Future Implications and Potential for AI in Gaming**

Their work opened new possibilities for AI in gaming, suggesting that machines could master complex games through deep reinforcement learning. This has implications beyond gaming, impacting fields like robotics and autonomous systems.

**Q&A Session Insights**

During the Q&A, they discussed challenges such as hyperparameter tuning, which was largely a mix of heuristic grid search and hand tuning, and stressed the importance of domain knowledge when designing observations and rewards. They also shared anecdotes about their journey, such as an early agent that managed to lose to a motionless opponent, emphasizing the value of learning from failures in AI development.
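
As a rough illustration of that kind of heuristic grid search, the sketch below sweeps a few plausible hyperparameters, trains briefly, and keeps the best performer. The parameter names and the helpers `make_env`, `build_agent` (extended here with extra arguments), and `evaluate_win_rate` are hypothetical, not the team's actual tooling.

```python
import itertools

# Hypothetical search space; the team swept parameters like network shape and frame skip.
param_grid = {
    "learning_rate": [1e-3, 1e-4],
    "hidden_units": [32, 64, 128],
    "frame_skip": [10, 20, 30],
}

results = []
for lr, units, skip in itertools.product(*param_grid.values()):
    env = make_env(frame_skip=skip)                                   # hypothetical helper
    agent = build_agent(env, nb_actions=env.action_space.n,
                        learning_rate=lr, hidden_units=units)         # hypothetical extension
    agent.fit(env, nb_steps=50_000, verbose=0)                        # short training run per setting
    results.append(((lr, units, skip), evaluate_win_rate(agent)))     # hypothetical helper

best_params, best_win_rate = max(results, key=lambda r: r[1])
```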

**Conclusion**

Adam Fletcher and Jonathan Mortensen's project at the Samsung Developer Conference was a testament to the power of AI in gaming. Their innovative approach not only captivated the audience but also paved the way for future advancements in artificial intelligence. As they continue their work, their story inspires others to explore the potential of AI in diverse applications.

"WEBVTTKind: captionsLanguage: eneverybody let's give a warm pycon welcome to adam fletcher and jonathan mortensen thank you all right a little bit of technical devil let's uh all right great our slides sorry that they're not completely full screen that's okay so hi i'm adam fletcher and with me is jonathan martinson hi so we ran a company called gyroscope software and we made developer tools powered by ai and last year we got an invite to the samsung developer conference and we weren't really sure what to do about it right as a developer tool we wanted the press and we wanted the exposure from the conference but also as a developer tool our booth was destined to be super boring right like we'd give you a free t-shirt and maybe you'd watch like a looping video of our software in action and that's not good so we look for something more exciting one thing that is obviously more exciting than a conference booth is blood sport style underground street fighting starring ideally starring the muscles from brussels jean-claude van damme seeking revenge for the death of his brother that is i think you'll agree that is clearly more exciting than a conference booth but we are nerds and as you can see from our physique we are not jean-claude van damme yet it's common so instead we decided to see if we could train an ai to play street fighter 2 on the s-ness and then run an ai versus ai final four style tournament during the conference and today we're going to talk about how we used python to do just that so first let's start with the overview of what the technologies we use this is useful so uh super street fighter 2 turbo runs on the super nintendo but in order to gain the control and speed necessary to do good ai to to train the ai we couldn't use the actual nintendo hardware so we used an emulator called sns-9x and then we used an emulator environment that let us program the emulator from afar this is a tool called biz hawk that the tool assisted speedrun community builds right bizhack is written in c sharp and we had to add some code to that which we'll talk about in a second um and then we used python to use jim a package from openai and jim used some python code by the agent code that we wrote that uses krsrl and tensorflow and so we're just going to dive straight into these elements and i'll describe the results at the end and what follows is the story of how it actually happened in terms of code and data science and so if it's going to seem a little disjointed that's because we were on a voyage of discovery as well and that's the nature of the story first the gym package right so this makes sense as a starting point jim lets us create a representation in python about what's happening in the game and lets our agent interact with that representation we're using a technique called reinforcement learning and this is kind of the simplified diagram of what reinforcement learning is right in reinforcement learning you have a continuous cycle of reinforcement by having an agent examine the state of environment then take an action and seeing if the reward the result of that action is maximized we're going to talk more in depth about this but it's useful to kind of keep this in your head as we go and the openai gym project provides the environment portion of this and in gym the environment is what you're doing reinforcement learning on and open ai has actually built a number of good environments for retro gaming including for the atari games and i think they recently had a retro gaming competition that 
might still be going on but they did not provide an environment for street fighter 2. and another big thing that jim provides is a very clean abstraction that can be used for doing this sort of machine learning right and i'll talk about the key elements of this abstraction the action the observation reward and the two functions step and reset and we'll go into each one of these each one of these in some detail first interesting one is the observation that is the data that represents the world the agent will learn about and in our case it's pretty straightforward it's street fighter 2 just as a human would see it right it's everything a human knows it's the timer it's the health bars it's what your opponent is doing right and each environment needs you to define this observation it gives you gives you uh jim gives you tools for doing that and on the right-hand side is code you don't have to read you shouldn't care about is just basically look these are all the things we tracked this is all the dictionary of all the things we tracked that represent our observation right and there's a few ways to get this so you can get this in in some some papers you read on this sort of thing you'll get this from image recognition people will train uh image recognizers to see where our characters are and where points are we actually had the advantage of using an environment that gave us direct access to the memory of the game and so we actually just extracted these values out of memory but we didn't do anything that a human wouldn't do so we didn't we didn't cheat in any way shape or form we just you know you can see the health a human can see the health so we read the health right just a different way to get access to it and not always but in general the larger the observation space you have the more training that will be required and that that is important because training costs money and time so the next critical piece is the action and actions are the abstraction of what you can do to change the state right you send an action to an environment and the environment will create a new observation of the state after that action has occurred practically for us that's when you press a button on a controller an action is a button press on a controller and that changes the game right you jump you kick et cetera right thus our actions are button presses and the action space for street fighter is interesting it's not actually all of the possible combinations of buttons on an sns controller right but rather all the possible valid combinations of buttons eg you can't press left and right at the same time the emulators let you but we can't um and then we had to simplify this action space a little bit too because the the more because like the observation space the more actions you have available to you the longer the training time will take right and here is an example on the bottom of sort of how we defined our action space and a multi-discrete for those who want to know says like you can do just one of these and just one of those right so you can you can press up and a or up and nothing but you can't do up a and b fine after the action you change your observation you get a reward for that action and reward is what the agent will try to maximize in our case we spent a lot of time refining the reward and what it meant to be rewarded rewards are sort of where the actual interesting part of ml is so here's some code and again reading it is not that important but it just to tell you that the code for the various reward 
functions we tried is not much code and a lot of it is actually a lot like what a human would think about it's like did i win the game like you know naively you're playing street fighter against like your little sister and they're mashing buttons and beating you and you're sad and you're like i want to win right so you optimize for winning the game or maybe you know we in this case we wanted to optimize for maybe trying the health the net health change or you know the health difference right like these are different things that you might think about and and often you're you want a short-term reward because it's per observation not per game but you want to maximize this like long-term goal of winning the game right and we tried a number of these and what we ended up settling on after a lot of work is the simply the delta of health between the player and their opponent at each frame right we wanted to reward the agent for keeping that gap high and so here's a plot of reward over episode and episode in this game in this case is a game there's 3 000 games on this plot right so you can see that speed of training matters in fact one of the reasons we didn't use just win loss the binary win loss as a reward function is that it takes too long to train based on that reward finally there are two functions the gym environment step and reset and these are used we use these to control the emulator one step means kind of move forward to the next frame and that's where you send all your commands to the emulator right and then reset is how you reload the save state and so these functions can from the ai worlds to the emulators world right and jonathan's actually going to go into some detail about the ai all right so i'm going to talk a little bit about the agent side of things um so we have an environment thanks to jim and now we need an agent to maximize the reward from that environment so another way is we want agents to take in these observations as input and output actions that it believes are optimal so for this we use a package called kerasrl let's just talk about ksrl and the abstract briefly caresrl provides a simple framework for making agents um the code that i'm showing on the right here is essentially a random agent again you have to look at the exact specifics but it's super super nice interface um so what i'm showing here is something that does behaves randomly that's kind of like the base agent you would ever have and random is actually a really good test case um lets you make sure that you know all the pipeline is running and that you can perform better than random enzo keras gives you a set of pre-built agents a base class you know training testing logging plotting maybe um all those types of things um and the two functions that we really care about in this abstraction is the forward function so this says like if i give you an action i'm sorry i return an action given an observation right so the agent is essentially reading the observation space and does a forward and then the forward produces an action that it should be sent to the emulator to do and then the second thing is the backward function the backward essentially says hey what reward did i just get for taking that action and then how should i change my understanding of the world and update my model so that i can do a better job next time so the environment from jim works with keras in the following way essentially you create the street fighter environment right this is what we use jim to create and then you just create the random 
agent and then pass in the environment and then the most important line here is the agent.fitline this is effectively kicks off the main loop where the agent is taking observations and sending actions and then learning and getting better over time so we actually ended up using the dqn agent in the kerasrl library and so we talked pretty extensively about the environment i'm going to briefly go into what dq n is deep q agent um just so you're going to get a sense for that essentially i'm going to go into the details of what's happening inside agent.fit i'm less funnier i'm not as funny as you it's just not even okay anyhow um before we start talking about deep q learning let's start out with what q learning is uh the caveat here is that in this discussion i'm going to be focusing on the intuition and so there's a lot of bookkeeping and math that i'm going to leave out so let's drop this complex figure and stay at the intuitive level so i'm going to describe q learning essentially by way of an example this is a 5x5 grid let's say it represents a game of going from room to room each edge here is a doorway to another room and when i'm in one room i have a set of actions between one and four or two and four actions that allow me to go to another room so suppose you enter a room and then you get you find cash in that room and that's your reward so if we map it all back right in the reinforcement learning paradigm adam talked about the location on the grid is my observation the xy coordinate moving from run one room to an adjacent room essentially opening the door is your set your action and then the cache is your reward so the arrows here then represent essentially a path of actions taken and reward c for that action and the key key thing that q learning does essentially the the q learning part before we talk about the deep part is that it keeps a lookup table of essentially given every state that i'm in and every action that i took what was the reward that i received so then when i get back to that same state later or the same observation later i can go to the lookup table pull out the answer and say like oh i'm going to take this one because that will give me the most reward so that kind of maps to the three lines of code i put down here is essentially i select my best action r is this lookup table i'm talking about um i send that action to the environment the environment turns a new reward and the most important component is then i update that lookup table with the state and the action with the new reward value so sometimes the reward value changes over time and so you want to keep that knowledge in this lookup table so the big takeaway is that to get to complete that entire lookup table i have to explore the entire space not just this one path but every possible path through the whole thing uh and so that's essentially brute force we have to explore all possible actions in all possible states uh and so that's a problem uh brute force is bad generally although i mean whatever gets the job done but q learning is basically brute force uh and so it's great if we have an infinite amount of time or if we have tiny problems but that's usually not the case and that's definitely not the case here in street fighter instead of having just 25 states we have 10 to the 25 states and 24 actions it's essentially you just multiply all the possible states in that huge dick that adam showed earlier and so it's effectively infinite so brute force is too costly to do this so deep learning essentially or 
essentially yeah i'd say deep learning allows us to overcome that huge state base so the twist that deep cue learning lets you do is that instead of trying all possible actions and figuring out the reward for each action we try and predict the reward even if we haven't seen it and we do that prediction using a neural network so for example say i decided to go on this path starting in the lower right and just go straight up i might predict that going straight up yields a reward of two that might not actually be the case but i can predict that without not ac without having to actually do it and so another way to think about it is that again at the intuitive level the neural network is interpolating and extrapolating using simple math rather than just being the look being a lookup table so that's essentially what happens we replace the neural net the lookup table with a neural network and the neural network gives back actions that should be taken and then learns over time based on what actually happens so in this case if i went up and i got a zero then i tell the neural network hey i got a zero you should do a better job interpolating extrapolating so that's a brief digression into the deep cue learning approach and how the agent works the last key part of the architecture is the controller so this is the thing that translates actions to the environment to actual commands that the game manipulator runs and adam's going to talk about that thanks jonathan great so first of all look how great is the python 3 standard library like come on it's really great right like we we had to come up with a system for communicating back and forth between this emulator and what we just found was like okay we'll just use tcp socket server and that just worked and then we're like okay well we need some state management some thread management the threading module that just worked and then the signa the primitives inside the threading module for signaling between threads also just worked right this is great so you're not meant to read this code but it's more a sense of how much code we actually had to write just to communicate back and forth and is not very much and it controls it contains the entirety of our protocol between the emulator and our ai right and we just sent json back and forth between the emulator i mean it's it's really nice right and that's one of the beautiful things about python this was super fast to do and then so when did the command go right so we sent these json commands somewhere well we sent them to a thing called the biz hawk and that's what controlled the emulator right and biz hawk we had to add support for tcp sockets to biz hack it didn't have that and then we had to write the code needed to read from street fighter memory we needed to write the code to push the buttons we got from json that the python sent over also we had to learn c sharp which we didn't know before we did this i got to say c sharp pretty nice language i was i was surprised at how easy this was all to do and i have to thank the bit sock developers for the just general cleanliness and organization of their code um we have open source the uh the code we used in here it's i just don't go read it unless you're gonna make a pull request we did it fast um and so the limitations of the emulator code actually meant that we had to try a bunch of techniques before settling on the socket code we just showed you so biz hawk actually contains a lua interpreter in it which is great but the lua interpreter they have doesn't 
support network i o it does support file i o and sqlite so we did try to shove all this together via like writing to a file reading from a file or writing to a database row reading from it that's a terrible idea don't do that um it was also very slow um it we were able to run the emulator basically at 60 frames a second using file io 120 using sql lite and then finally when we switched to python socket server we got 450 frames per second um in while we could run three training of training uh we could do three three games training at a time on a windows machine in google cloud this actually saved us a ton of money so because we're training in the cloud right that's money and time performance is a huge feature right and python was able to get a significant performance gains now great we have 450 frames per second we have a fast performing system we had a good architecture but now we actually have to train like we have to do that part of ai where you teach it things here's a tiny snippet and i cannot emphasize the tininess of it of the results of the actual training we spent hundreds of cloud compute hours and human hours tweaking all of these parameters it's just like a matrix of all the parameters we messed with uh and and to figure out what works well like deep learning is great but it is also sometimes alchemy right which is unfortunate and we tried to you know we tweaked the neural network parameters we treat the environment we tweaked the number of games tweaked feature representation training parameters etc etc we tried to be relatively scientific about it changing one thing at time but in the end you know we were wrong about pretty much every assumption we made to start right it's fine that's what this is how it goes but we did build a system that let us iterate incredibly fast on being wrong and that actually was a really helpful thing to have done so about three days before the conference to our relief we finally got an agent trained that did well and you'll see on the screen if the video loads we get to watch some awesome street fighter stuff so this is after 3000 games we have an 80 win rate the ai is on the left it's dalson and what you'll see here is some interesting stuff dodging blocking special moves uh backing away you know avoiding damage winning the game with the fireball it's pretty dope right it's it's pretty cool we were very excited when that happened and now here's our non-boring booth with all of our homemade signs look we were a startup right the laptop actually that you see here is actually training agents uh the display is showing the training of four agents at a time so people can watch it happen the way we initially trained our agents actually against the super nintendo cpu and three star difficulty um and then later we trained them against each other at the conference we ran a final four style tournament like i said and to see which bots we trained was best we had a competition inside this if you if you guessed which which agent was going to win you could win a super nintendo classic um for those super street voter fans out there you'll notice that m bison is not on this bracket that's because he's op he's totally overpowered right and our ai figured this out really fast like it was like it's like right away and then we watched the first tournament we're like oh this is just not good like it was just a clear sweep and uh so we banned him from the second tournament we ran uh due to performance enhancing drugs aka his code so the second and the second tournament 
we ran actually ended up going to sagat and if you want to know who the best character plays well if you are an ai uh it's m bison i don't know if you're human i'm i am terrible at this game um and i have to say like for those of you who might have to do this sort of thing in the future people loved this idea right they loved watching all these matches this little kid over on the right hand kind with his like hand over his heart like he was like crying when he lost when his character lost in the tournament he we were so sad but he loved every minute he would just come and watch the booth when we weren't even having the tournament just to see the characters fighting we had about 80 people just kind of screaming and yelling during the whole tournament i mean it's a great way to do it and so much better than a t-shirt and like a video of our software there's a lot to love about python in artificial intelligence and machine learning i mean there's lots about python in general we know the the iteration speed the battery's included in nature of it right the standard library good abstractions like we're all here for that reason but keras jim tensorflow scipy these are world-class tools that are easy to use they're free they're community supported it's amazing right it's great and it's actually not just python that we love it's programming in general and one thing i in particular love about programming is sometimes i make really hilarious mistakes right so this is our this is a chart of our ai win rate playing against a literally motionless opponent right so it's we start out really strong like we win a bunch of the first couple hundred games but then how do we lose i don't even know how we end up losing i think we just ran into the opponent and we died over and over again right but but but actually it's a great outcome right it's scientific it's an important one and we learned from it and it and it was pretty funny and we all learned sort of from funny failures like this and so like i want to help new and experienced programmers understand that it's okay and normal to make mistakes like this so i'm asking you guys some of you experienced people or the new people to tweet out your funniest programming fails right like with hashtag pycon 2018 and then hashtag reward function this is my joke on us giving you a prize for this so uh and by the end of the weekend like the person with the most retweets we'll find we'll we'll track you down and send you a super nintendo classic because i think that'd be pretty fun you can tweet we're at blue voyage but just tweet about it and uh tell us about the times you've made mistakes make sure they're yours not someone else's make sure they're educational and funny right because we want to teach people it's fine like yeah you're going to screw it up and that's the best part about programming it's completely safe and you get to learn that way so thank you if you have any questions we hope our story was interesting in fact uh we have another one so the company we were talking about gyroscope we got acquired by an amazing security firm called blue voyant i would be remiss to say we are hiring if i didn't say that um come talk to us about this come talk to us ask us questions right after during this talk and then we're around all weekend you can come find us at lunch tomorrow if you want to talk about ai in general and yeah thank you very much we're happy to take questions thank you so much guys uh we've got a few minutes for questions so please come to the microphone 
if you've got one there's one frightened front and center here i would ask you this time only for questions directed at the speakers if you want to talk about anything longer have specific add-on comments feel free to catch them in the hall after the talk thank you hey quick question uh when you guys made all this did you guys account for like say peak human performance in peak human reactions like in a 60 fps game i understand it to be like a 15 frame 15 frame delay between an action and someone being able to react to that great question and the answer is yes we did so the ai actually only acts every 20 frames um because uh actually for for a number of reasons one is that the training would have taken too long if we acted every frame so that's important but also against humans it's it's pretty much only fair to fair to do that this would just i think over time if we just train this it would just destroy every human like it just it wouldn't even be close if we if we enabled like every frame that's a good question so so the question i had was about like a time delayed attack you know so if ryu like throws a fireball from the other side of the screen and then coincidentally ducks you know when it hits the opponent if you're only just considering that one frame could your ai not learn that you know ducking equals damage to the opponent uh so the the essentially the rewards and the failures are propagated back um so you there's a it's kind of like an exponential curve but yeah it will eventually figure out that like this this observation that it made like 10 frames in the future will then like suffer the actual consequence okay thanks yep all right uh do they learn special moves on their own or did you intentionally do anything with them they learn special moves on their own so they just have the key presses right and they like they have every 20 frames they can insert like a key press pair essentially one joy pad and one button and so they actually like that that um move that i shot at the end just started with like key presses and you figured out how to shoot the fireball thank you hey two cool questions uh did you if you considered hurt boxes or hitboxes in in the machine learning no okay and then second have you considered challenging any professional players with your bot yes uh let me let me answer that question uh so the we were able to know the x and y coordinate of every player uh and so that was like a way for them to figure out how close they were to each other and when they should be doing moves or not um and i think they also knew what who their opponent was so they're kind of indirectly learning where the hitbox is um and then we yeah we've actually talked to one of the top five players after we did this and had a fun blog post um he wanted to uh compete against it so we need to set up a tournament was it tiger there's actually a whole another funny story about potentially doing a marketing campaign for l'oreal by making an ai bot anyhow again come talk to us okay company come talk to us that doesn't do anything for me how much did the training cost like could i do it on my own computer yeah sure so actually we did a bunch of it on our own computer with a we we initially didn't even do it with gpu optimized we just did it on like a local windows machine um which it was a fairly powerful gaming machine but we actually didn't even use the gpu in the end we did use google cloud compute so that we could have lots of computers going to train all the agents all in parallel but you 
could do the training on on your own computer and it wouldn't cost you cost that much we really just paid so that we could do it in parallel you know if we had like 40 windows machines at home we would have done it that way too and how much did the cloud cost i mean i think we use the free subscription yeah so there's the there's you know there's the first you know 720 hours free or whatever um and then adam used to work at google so maybe we got a little extra uh but they're essentially each game took about three hours to train at 450 frames per second and then you could train essentially three or four games in parallel on the same computer um so thanks um i'm i'd like to ask about uh sort of what your your guys's backgrounds are like like your academic backgrounds because i've noticed that a lot of machine learning and ai jobs seem to require phds okay so i'll answer it first um and i will answer it for him too uh so my background is uh i actually have a business degree um and uh but i was a site reliability engineer at google for many years um and ran very large scale systems and i software engineer for a long long time so um and and still do some of those things when i'm not in meetings um and however jonathan mortensen over here is better known as doctor jonathan mortensen and he has a phd from stanford um he never talked about it but you should when you see him tomorrow or ask him questions address him as doctor he really likes that i i don't think that you need phd to be my phd was like in biomedicine doing like you know ehr analysis so like you don't need it just to be clear i don't think that there's any gate there one interesting thing that happened to us we got an email from some students in pakistan who used our code to do this as their one of their undergraduate projects they just kind of redid our work and um and they you know they don't have phds though but they were able to figure out different techniques and different reward functions and things like that cool thank you hi there so one of the things that's been taking like the retro gaming community by storm is randomized games so things like super mario brothers 3 randomizer link to the past randomizer so what i'm wondering is how might ai adapt to games where some of the underlying parts of it is randomized but the rules remain the same i think it's a really a question about how well characterized the observations are and so what can what can happen when you're doing machine learning right is like you kind of learn you overfit right you kind of learn exactly how the game plays like even if you know the rules that you're like oh i saw this frame and i know that if i see this room i just have to do this thing next instead of actually interpreting what's going on in the frame but i think if you have a good representation of that state space i think uh it should as long as the rules are consistent i should learn thank you i have a question about uh how you chose the hyper parameters you showed us that spreadsheet it sounded like there was a lot of hand tuning to kind of get the magic combination just generally what are your thoughts on on i guess algorithmic or more automated ways of finding hyper parameters whether it's genetic algorithms or something else and and and generally speaking when do you feel those types of methods are going to be successful versus when you kind of have to tweak numbers by hand that's a complicated question uh i think yes you want to use some heuristics to do essentially a grid search you can't 
do often times you can do a full brute force grid search but using some heuristics which say oh if i make this tweak the performance goes up i mean essentially it's not really maybe sometimes parameter optimization is a convex problem but it's usually not so you got to kind of explore the space hand tuning there's still a place for that and there was a place here which was that you know one of the things that we were tuning was frame skip and that required some intuition about how you play the game and making sure that that lines up all together so i mean i think that machine learning isn't a silver bullet you still need to understand the domain that you're trying to model and behave in and so there's a place for both um but using grid search is almost great as well i mean if you can do things in compute that's cheaper than like your time then you should do it that way and if you can also add some of your intuitions you should too so it's kind of a non-answer answer yes thank you that's all the time we have for questions now if you'd like to meet with the speakers in the hall afterwards feel free to do so thank you very much everyone appreciate it you\n"