AlphaGo & Deep Learning - Computerphile

# Machine Learning and AlphaGo: A Deep Dive into How Computers Learn to Play Go

## Introduction to Machine Learning and DeepMind

The world of technology has witnessed a remarkable breakthrough with the advent of machine learning, particularly in the realm of games. The company responsible for this achievement is DeepMind, a London-based startup that Google acquired before anyone widely knew what it was doing. DeepMind's approach to artificial intelligence (AI) has reshaped how we think about machine learning, most visibly through its famous creation, AlphaGo.

Deep learning, a subset of machine learning, plays a pivotal role in AlphaGo. DeepMind's signature combination pairs deep learning with reinforcement learning: deep learning supplies the type of model (neural networks that learn from large amounts of data without being explicitly programmed), while reinforcement learning supplies the way that model is taught, with the algorithm performing a task and receiving feedback on how well it did.

Reinforcement learning operates under the principle of trial and error. Instead of being given specific instructions on how to perform tasks, the algorithm experiments with different strategies and improves based on the outcomes. This approach mirrors human learning processes, where individuals learn from their experiences and adapt their behaviors accordingly.
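
To make this concrete, here is a minimal sketch of trial-and-error learning in Python. It is a toy "multi-armed bandit", not anything from AlphaGo itself, and the payout rates and epsilon value are invented for illustration. The learner is never told which action is best; it only observes rewards and gradually favors whatever has worked.

```python
import random

# Hypothetical scenario: three slot machines with unknown payout rates.
# The learner never sees `true_payout`; it only observes rewards.
true_payout = [0.3, 0.5, 0.8]   # hidden from the learner
estimates = [0.0, 0.0, 0.0]     # learned value of each action
counts = [0, 0, 0]
epsilon = 0.1                   # chance of exploring a random action

for step in range(10_000):
    if random.random() < epsilon:
        action = random.randrange(3)              # explore: try something random
    else:
        action = estimates.index(max(estimates))  # exploit: best action so far
    reward = 1.0 if random.random() < true_payout[action] else 0.0
    counts[action] += 1
    # Incremental average: nudge the estimate toward the observed reward.
    estimates[action] += (reward - estimates[action]) / counts[action]

print(estimates)  # converges toward [0.3, 0.5, 0.8]
```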

## Understanding Machine Learning: From Heuristics to Automation

Machine learning aims to automate tasks traditionally performed by humans. One such task is stock market investment, where traditional methods often rely on heuristics: rules of thumb derived from experience or observation. For instance, investors might compare the current price against the average of the past ten values; if the price sits well above that average, the heuristic says it is likely to fall back. Rules like this get applied across numerous stocks and economic indicators.

However, manually applying these rules becomes impractical as the number of variables increases. The stock market, with its countless factors influencing prices, offers a nearly infinite supply of candidate heuristics, far more than any human could evaluate by hand. The computer will not necessarily discover rules a human could never find given unlimited time; it simply checks them so much faster that a search which would take several lifetimes with pen and paper finishes before the market has moved on.
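
As a toy illustration of that search, the sketch below scores every moving-average window size against past data and keeps the best one. The synthetic price series and all function names are invented for this example; the point is only that a brute-force scan over heuristics is exactly the drudgery a computer does in seconds.

```python
import random

# Hypothetical price series; in practice this would be historical market data.
random.seed(0)
prices = [100.0]
for _ in range(999):
    prices.append(prices[-1] + random.gauss(0, 1))

def moving_average_signal(prices, window, t):
    """Heuristic: if today's price is above the mean of the last
    `window` values, predict a fall (reversion to the mean)."""
    avg = sum(prices[t - window:t]) / window
    return -1 if prices[t] > avg else +1   # -1 = predict fall, +1 = predict rise

def score(window):
    """Fraction of days on which the heuristic called the direction right."""
    hits = total = 0
    for t in range(window, len(prices) - 1):
        predicted = moving_average_signal(prices, window, t)
        actual = 1 if prices[t + 1] > prices[t] else -1
        hits += predicted == actual
        total += 1
    return hits / total

# The machine-learning step in miniature: try every window size, keep the best.
best = max(range(2, 200), key=score)
print(best, score(best))
```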

## Reinforcement Learning: Teaching Machines to Learn

Reinforcement learning differs from supervised learning in what the algorithm is told. Instead of being shown labeled examples of correct outputs to imitate, the algorithm is given a task and a measure of how well it performed. In game playing, for example, it learns only whether it won or lost, never which specific moves were correct.

DeepMind employed this technique to develop AlphaGo. The algorithm was not programmed with predefined strategies or tactics; instead, it learned by playing enormous numbers of matches against versions of itself. This trial and error let it refine its parameters (the variables within the algorithm that determine how it processes a board position and chooses a move), ultimately producing a highly effective playing strategy.
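
The key idea is that the playing strategy lives entirely in a vector of numbers. The toy policy below is a hypothetical, vastly simplified stand-in for AlphaGo's networks: it scores moves as a weighted sum of a few hand-picked features, so changing the parameters changes how it plays without any Go strategy being hard-coded. The features and board representation are invented for this sketch.

```python
import random

# A toy policy for a 9x9 board: score every empty point as a weighted sum
# of simple features, then play the highest-scoring point. Nothing about
# Go strategy is hard-coded; behaviour lives entirely in `params`.
SIZE = 9

def features(board, point):
    x, y = point
    centrality = -(abs(x - SIZE // 2) + abs(y - SIZE // 2))  # distance from centre, negated
    neighbours = sum(board.get((x + dx, y + dy), 0) != 0     # occupied orthogonal neighbours
                     for dx, dy in [(1, 0), (-1, 0), (0, 1), (0, -1)])
    return [centrality, neighbours, 1.0]                     # 1.0 = bias term

def choose_move(board, params):
    empty = [(x, y) for x in range(SIZE) for y in range(SIZE)
             if board.get((x, y), 0) == 0]
    def value(pt):
        return sum(w * f for w, f in zip(params, features(board, pt)))
    return max(empty, key=value)

# Different parameter vectors give different "players" of the same algorithm.
params_a = [random.gauss(0, 1) for _ in range(3)]
params_b = [random.gauss(0, 1) for _ in range(3)]
print(choose_move({}, params_a), choose_move({}, params_b))
```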

## The Chess vs Go Dilemma: Why Go Is More Challenging

Chess and Go, while both strategic games, pose very different challenges for AI. In chess, the number of legal moves at any point is small enough that a program can organize the possibilities into a tree and check them more or less exhaustively, pairing brute-force computation with a little hand-coded domain knowledge about which positions are good. Success here says more about raw computing power than about the cleverness of the algorithm itself.
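
The sketch below shows that exhaustive style of search in miniature, using minimax on a one-heap game of Nim rather than chess so it stays runnable in a few lines. A real chess engine follows the same pattern but adds alpha-beta pruning, move ordering, and a hand-tuned evaluation function at a depth cutoff instead of searching to the end of the game.

```python
# Minimax on a toy game (one-heap Nim): each player removes 1-3 stones;
# whoever takes the last stone wins.

def minimax(stones, maximizing):
    if stones == 0:
        # The previous player took the last stone, so the side to move lost.
        return -1 if maximizing else +1
    scores = [minimax(stones - take, not maximizing)
              for take in (1, 2, 3) if take <= stones]
    return max(scores) if maximizing else min(scores)

def best_move(stones):
    return max((t for t in (1, 2, 3) if t <= stones),
               key=lambda t: minimax(stones - t, maximizing=False))

print(best_move(10))  # 2: leaving a multiple of 4 is a lost position for the opponent
```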

Go, on the other hand, presents an exponentially larger number of possible board configurations; there are famously more Go positions than particles in the universe. The sheer volume of possibilities makes brute-force search hopeless. DeepMind's solution was a neural network that evaluates board positions and selects moves, trained through reinforcement learning rather than hand-coded tactics.
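
Some rough, commonly cited figures make the gap vivid: chess offers about 35 legal moves per turn over games of roughly 80 moves, while Go offers about 250 per turn over roughly 150. The back-of-envelope calculation below uses those approximations; the exact numbers matter far less than the hundreds of orders of magnitude between them.

```python
import math

# Naive game-tree sizes from commonly cited rough figures:
# chess ~35 moves/turn over ~80 turns; Go ~250 moves/turn over ~150 turns.
chess_exponent = 80 * math.log10(35)    # chess game-tree size, as a power of 10
go_exponent = 150 * math.log10(250)     # Go game-tree size, as a power of 10

print(f"chess game tree ~ 10^{chess_exponent:.0f}")   # ~ 10^124
print(f"go game tree    ~ 10^{go_exponent:.0f}")      # ~ 10^360
print("atoms in the observable universe ~ 10^80")
```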

## Evolutionary Computing and Parameter Optimization

Optimizing AlphaGo's performance borrowed ideas from evolutionary computing, a related but distinct branch of computer science in which algorithms improve by iteratively varying their parameters based on feedback. By pitting differently parameterized versions of the algorithm against each other over many matches and keeping whichever set of parameters won, DeepMind identified configurations that led to better gameplay.

This iterative refinement is akin to natural evolution, where favorable traits are passed on to subsequent generations. In AlphaGo's case, the "traits" are the specific weights and biases within its neural network that influence decision-making.
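
The sketch below captures the shape of that loop under invented assumptions: a hidden `strength` function stands in for actual games of Go, since all the real process observes is who won each match. Mutate the champion's parameters, pit the challenger against the champion over many matches, keep the winner, and repeat.

```python
import random

def strength(params):
    # Stand-in for a real game engine; hidden from the "learner",
    # which only ever observes match outcomes.
    return -sum((p - 3.0) ** 2 for p in params)

def plays_better(a, b, matches=100):
    """Pit two parameter sets; return True if `a` wins the majority."""
    wins = 0
    for _ in range(matches):
        # Noisy outcomes: stronger parameters win more often, not always.
        wins += strength(a) + random.gauss(0, 1) > strength(b) + random.gauss(0, 1)
    return wins > matches // 2

champion = [random.gauss(0, 1) for _ in range(4)]   # random initial player
for generation in range(500):
    challenger = [p + random.gauss(0, 0.1) for p in champion]  # mutate
    if plays_better(challenger, champion):
        champion = challenger                                  # keep the winner
print(champion)  # drifts toward the hidden optimum of 3.0 per parameter
```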

## The Power of Deep Neural Networks

Deep learning models, like those used in AlphaGo, consist of multiple layers of interconnected nodes (neurons). Each layer processes information and passes it on to the next, enabling the model to recognize complex patterns. This hierarchical structure allows deep neural networks to handle intricate tasks that simpler models cannot.

The depth of these networks contributes significantly to their learning capacity. While a linear model might struggle with non-linear relationships in data, deep networks can capture such complexities through their layered architecture. This flexibility makes them highly effective for tasks like game playing, where patterns and strategies are often non-obvious.
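
As a minimal sketch of that layered structure (written from scratch, with invented layer sizes and no training step), the network below chains fully connected layers so that each layer's output becomes the next layer's input. The non-linearity between layers is what lets depth capture relationships a single linear layer cannot.

```python
import random

def make_layer(n_in, n_out):
    # Each neuron holds n_in weights plus one trailing bias weight.
    return [[random.gauss(0, 0.5) for _ in range(n_in + 1)]
            for _ in range(n_out)]

def relu(x):
    return x if x > 0 else 0.0

def forward(layers, inputs):
    activations = inputs
    for i, layer in enumerate(layers):
        # zip pairs the first n_in weights with the inputs; neuron[-1] is the bias.
        pre = [sum(w * a for w, a in zip(neuron, activations)) + neuron[-1]
               for neuron in layer]
        # Non-linearity on hidden layers is what lets depth model curves;
        # the final layer stays linear so the output is an unbounded score.
        activations = pre if i == len(layers) - 1 else [relu(x) for x in pre]
    return activations

# E.g. a 9x9 board flattened to 81 inputs, two hidden layers, and one
# output that could be read as "how good is this position".
network = [make_layer(81, 32), make_layer(32, 32), make_layer(32, 1)]
board = [random.choice([-1, 0, 1]) for _ in range(81)]
print(forward(network, board))
```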

## Historical Context: The Evolution of Machine Learning

History offers endless examples of people claiming that some fantastic advance in machine learning is right around the corner, and such claims have rightly been met with skepticism. The success of AlphaGo, however, demonstrates that these technologies can achieve extraordinary results when properly developed. DeepMind's work stands as a testament to the potential of reinforcement learning and deep neural networks.

## Conclusion: The Future of Machine Learning

As we look ahead, it's clear that machine learning will continue to transform various fields, from gaming to finance. AlphaGo exemplifies how powerful algorithms can achieve tasks once deemed impossible for machines. While there are still challenges to overcome, the progress made by DeepMind offers a glimpse into a future where AI can tackle even more complex problems with remarkable effectiveness.

"WEBVTTKind: captionsLanguage: enthe alphago has been using deep learning to make this happen than is up yes yes that's I mean the company that did that well now is Google right but is the deep mind company that was a start-up in London and Google acquired them at some point before actually anyone knew what they were doing right so they were smelling that with something nice right and they're the best things that they're doing deep learning with reinforcement learning right so reinforcement learning is the type of way you try to teach and deep learning is attracted type of algorithm that you that you use to try to learn that right so the reinforcement learning is exactly this thing of I'm not going to tell you exactly what they are what the output is I'm just going to tell you what is the task and I'm going to tell you whether you're doing the task well or bad and it's going to be your your job as an algorithm to find the best way of doing these tasks first idea that we have to understand is with what we're trying to do with it right which is basically machine learning is a is trying to automate automatically do some tasks that what we can do it by counting usually right we can do basically statistics but if we have a really lot of numbers we can end up doing some heuristics what we can come up with right and we might want to use computers to actually learn from these statistics in an optimal manner say the best way of finding these information out of this data that you have is following these rules right so the computer does this for you this is this is the basic concept of machine learning right one typical example is if you want to invest in a either an exchange market on the stock market something like that and then I guess the people that invest they have 13 rules right if you have last five weeks going on in the value then overall respect that this week will go down because it's been like reaching a maximum or this kind of idea right that you complete the medium of the last ten values and this gives you an idea of the real value and if it's above that it will go down and and so forth right but this is just basically heuristics that we think might work right and we can even through experience think that some are better than others and so forth right but we'll have like 50 values on the different stocks in the market and we have you know like the currency exchange and we have many other factors right how - how many rules can we find there how many heuristics right this is just almost infinite number so what you would like is ideally a way of choosing the best and the computer can has a capacity of checking them all right you can do that pen and paper you spend like five lifetimes doing it and maybe you know by the time you finish this no stock market but the computer does it everything for you very quickly right so the only thing is it will not do something better than you can do giving enough time but just do it so much quicker that you will never get the right so now we're beginning to see cases in which machine learning is actually doing something better than humans especially with very specific tasks and maybe better than humans that are not experts on that task right but historically machine learning was always worse than an expert human doing something right so one of the cases is this like now famously alphago right so they say this is one of the few cases in which machine learning has managed to beat to beat the an expert I mean like the world expert on the on one specific 
problem right and this mean this is all this talking about why this is different from chess right the main difference is that for chess you have a limited number of options right and once you move one piece right then you have I don't know how many options right but again a limited number so you can organize that in a tree and more or less check exhaustively you tell the program and this is what you want to do check all the options and just you know do what is best according to what I told you that is good right so this is just sheer computational power in order to check the options right this this tells you our computers are really powerful right but it doesn't say anything about the algorithm itself right with go it's totally different because there are so many options that you cannot really check them right a lot of people are repeating this fact that there's more go positions and particles in the universe the people in deep mind which is the people that did these have said that they really don't know what kind of tactic the computer is following right they didn't really hard code any tactic so what they did is they created this machine learning algorithm right that that would play go right and would play it terribly right it's just you know some random rules they didn't really have a clue how to implement that so they started at some random algorithm that would play go terribly right and the point is that they pitted this this this machine this algorithm against some other algorithm that would have different parameters right would play go differently right but you still wouldn't hard-code anything is would you some parameters that would define how you would play but you you really don't know what the meaning of this parameters are right and they would play against each other and at some point after 100 matches one would win right and you say okay this one is better and then you keep these parameters you change them somehow right and you keep on doing that for very very long time it's a bit like evolution there is a part of computer science that is called the evolutionary computing right so I don't want to this is this is a slightly different thing right but the concept is you have a parameter space right and this will tell you all the possible ways to play go according to you know your 13 machine learning algorithm right it will it will have certain possibilities that are specific to that algorithm but you have a lot of parameters in its parameter can take a real value so you have infinite possibilities and the question is how to find the best parameters to play go most machine learning algorithms they have you search for the optimal parameters in some other way this this was a specific of a type of problems that are called reinforcement learning where you you basically tell them what you want to do but you you are not sure which is the correct way of doing it right and that's why deep mind says we don't really know because they didn't gives the examples for the machine to imitate they said okay tried to learn by yourself right this is what you want to do what we want you to do just learn how to do it every time that the machine learning decides a move right so this is this is what the algorithm is learning we have this configuration what is my next move right you are not specifying the best move is this because no one knows right I mean the idea is that they beat the world champion so you know maybe now the algorithm was able to figure out which next move was better and better than 
than the world champion right so there's no golden standard to imitate there right the difference in essence I suppose I don't really understand the game go so that doesn't help much but suppose the biggest thing here is and feel free to kind of correct me but in chess you can brute-force it yeah yeah and in go you can't you thought beefer that is this very fair yeah I mean you still have to do it smartly to brute-force it for chess because you have to find a way to tell the computer whether a movie is a smart move or not down the line right but you still will check all the consequences of your move right but this is kind of a standard you put a bit of domain knowledge and you put brute force right the goal is this is the totally different game this is really fair assessment of it so basically there were two sets of algorithms working against each other and every time one did a bit better than the other they thought yeah something about that is better yes so that this there were two different set of parameters right of the same algorithm right so the decisions that would take would be different the algorithm was the same with a needle network and basically what you tell is the input is what is the configuration right and then you have a certain structure that will you know like decide on the move right but this is this structure is you need some parameters that this is what in the end the size right what so to see the difference for example you can you can see that you have a function that will take the input and decide on the on the output right this function you can decide that this is a linear function right oh so is just a line or or a plane right is it display the traditional like a regressing a line right and this is when you modify the parameters of the line this is this model is more or less the same you change the slope of the line you change the height of the line right and this gives you different lines right so this is exactly the same but instead of having a line it was like much more complex a function so that will be until you dimensions but there are just so many more dimensions yeah it was not only about the dimensions it's also about the way that the parameters are structured right and this is why we use the word deep on it right so a line is what what we call a shallow a structure of the other variables because you just have some inputs and you multiply by some values and you get the app right so the inner indifference with deep learning or well a deep deep algorithms in general is that you have certain input then you get some intermediate variables by doing the same is like if you fit a line and then you fit another line and another line and another line and then when you input some values this will give you 13 values right so you have like predictions right these predictions are not the ultimate thing that you need their intermediate values that I do at the same time are input to another layer of prediction right and you go doing like that right so the input to one layer is the output of the previous layer right and this hierarchical structure is a much more powerful in the sense of what kind of functions what kind of the flexibility of the function right you can model much more complex things with many less variables right and this is a decent magnificent trade-off that gives it a lot of power can go back and find historical examples endless historical examples of people claiming that something fantastic is right around the corner we should get a show okay and we did 
okay so that's a good start right we know our program worksthe alphago has been using deep learning to make this happen than is up yes yes that's I mean the company that did that well now is Google right but is the deep mind company that was a start-up in London and Google acquired them at some point before actually anyone knew what they were doing right so they were smelling that with something nice right and they're the best things that they're doing deep learning with reinforcement learning right so reinforcement learning is the type of way you try to teach and deep learning is attracted type of algorithm that you that you use to try to learn that right so the reinforcement learning is exactly this thing of I'm not going to tell you exactly what they are what the output is I'm just going to tell you what is the task and I'm going to tell you whether you're doing the task well or bad and it's going to be your your job as an algorithm to find the best way of doing these tasks first idea that we have to understand is with what we're trying to do with it right which is basically machine learning is a is trying to automate automatically do some tasks that what we can do it by counting usually right we can do basically statistics but if we have a really lot of numbers we can end up doing some heuristics what we can come up with right and we might want to use computers to actually learn from these statistics in an optimal manner say the best way of finding these information out of this data that you have is following these rules right so the computer does this for you this is this is the basic concept of machine learning right one typical example is if you want to invest in a either an exchange market on the stock market something like that and then I guess the people that invest they have 13 rules right if you have last five weeks going on in the value then overall respect that this week will go down because it's been like reaching a maximum or this kind of idea right that you complete the medium of the last ten values and this gives you an idea of the real value and if it's above that it will go down and and so forth right but this is just basically heuristics that we think might work right and we can even through experience think that some are better than others and so forth right but we'll have like 50 values on the different stocks in the market and we have you know like the currency exchange and we have many other factors right how - how many rules can we find there how many heuristics right this is just almost infinite number so what you would like is ideally a way of choosing the best and the computer can has a capacity of checking them all right you can do that pen and paper you spend like five lifetimes doing it and maybe you know by the time you finish this no stock market but the computer does it everything for you very quickly right so the only thing is it will not do something better than you can do giving enough time but just do it so much quicker that you will never get the right so now we're beginning to see cases in which machine learning is actually doing something better than humans especially with very specific tasks and maybe better than humans that are not experts on that task right but historically machine learning was always worse than an expert human doing something right so one of the cases is this like now famously alphago right so they say this is one of the few cases in which machine learning has managed to beat to beat the an expert I mean like the world expert 
on the on one specific problem right and this mean this is all this talking about why this is different from chess right the main difference is that for chess you have a limited number of options right and once you move one piece right then you have I don't know how many options right but again a limited number so you can organize that in a tree and more or less check exhaustively you tell the program and this is what you want to do check all the options and just you know do what is best according to what I told you that is good right so this is just sheer computational power in order to check the options right this this tells you our computers are really powerful right but it doesn't say anything about the algorithm itself right with go it's totally different because there are so many options that you cannot really check them right a lot of people are repeating this fact that there's more go positions and particles in the universe the people in deep mind which is the people that did these have said that they really don't know what kind of tactic the computer is following right they didn't really hard code any tactic so what they did is they created this machine learning algorithm right that that would play go right and would play it terribly right it's just you know some random rules they didn't really have a clue how to implement that so they started at some random algorithm that would play go terribly right and the point is that they pitted this this this machine this algorithm against some other algorithm that would have different parameters right would play go differently right but you still wouldn't hard-code anything is would you some parameters that would define how you would play but you you really don't know what the meaning of this parameters are right and they would play against each other and at some point after 100 matches one would win right and you say okay this one is better and then you keep these parameters you change them somehow right and you keep on doing that for very very long time it's a bit like evolution there is a part of computer science that is called the evolutionary computing right so I don't want to this is this is a slightly different thing right but the concept is you have a parameter space right and this will tell you all the possible ways to play go according to you know your 13 machine learning algorithm right it will it will have certain possibilities that are specific to that algorithm but you have a lot of parameters in its parameter can take a real value so you have infinite possibilities and the question is how to find the best parameters to play go most machine learning algorithms they have you search for the optimal parameters in some other way this this was a specific of a type of problems that are called reinforcement learning where you you basically tell them what you want to do but you you are not sure which is the correct way of doing it right and that's why deep mind says we don't really know because they didn't gives the examples for the machine to imitate they said okay tried to learn by yourself right this is what you want to do what we want you to do just learn how to do it every time that the machine learning decides a move right so this is this is what the algorithm is learning we have this configuration what is my next move right you are not specifying the best move is this because no one knows right I mean the idea is that they beat the world champion so you know maybe now the algorithm was able to figure out which next move was 
better and better than than the world champion right so there's no golden standard to imitate there right the difference in essence I suppose I don't really understand the game go so that doesn't help much but suppose the biggest thing here is and feel free to kind of correct me but in chess you can brute-force it yeah yeah and in go you can't you thought beefer that is this very fair yeah I mean you still have to do it smartly to brute-force it for chess because you have to find a way to tell the computer whether a movie is a smart move or not down the line right but you still will check all the consequences of your move right but this is kind of a standard you put a bit of domain knowledge and you put brute force right the goal is this is the totally different game this is really fair assessment of it so basically there were two sets of algorithms working against each other and every time one did a bit better than the other they thought yeah something about that is better yes so that this there were two different set of parameters right of the same algorithm right so the decisions that would take would be different the algorithm was the same with a needle network and basically what you tell is the input is what is the configuration right and then you have a certain structure that will you know like decide on the move right but this is this structure is you need some parameters that this is what in the end the size right what so to see the difference for example you can you can see that you have a function that will take the input and decide on the on the output right this function you can decide that this is a linear function right oh so is just a line or or a plane right is it display the traditional like a regressing a line right and this is when you modify the parameters of the line this is this model is more or less the same you change the slope of the line you change the height of the line right and this gives you different lines right so this is exactly the same but instead of having a line it was like much more complex a function so that will be until you dimensions but there are just so many more dimensions yeah it was not only about the dimensions it's also about the way that the parameters are structured right and this is why we use the word deep on it right so a line is what what we call a shallow a structure of the other variables because you just have some inputs and you multiply by some values and you get the app right so the inner indifference with deep learning or well a deep deep algorithms in general is that you have certain input then you get some intermediate variables by doing the same is like if you fit a line and then you fit another line and another line and another line and then when you input some values this will give you 13 values right so you have like predictions right these predictions are not the ultimate thing that you need their intermediate values that I do at the same time are input to another layer of prediction right and you go doing like that right so the input to one layer is the output of the previous layer right and this hierarchical structure is a much more powerful in the sense of what kind of functions what kind of the flexibility of the function right you can model much more complex things with many less variables right and this is a decent magnificent trade-off that gives it a lot of power can go back and find historical examples endless historical examples of people claiming that something fantastic is right around the corner we should get 
a show okay and we did okay so that's a good start right we know our program works\n"