The Benefit of Bottlenecks in Evolving Artificial Intelligence with David Ha - #535

**Exploring Collective Intelligence in Machine Learning**

In recent years, there has been significant progress in machine learning, with researchers and developers making strides in areas such as neural networks, deep learning, and reinforcement learning. However, despite these advances, machine learning still faces several challenges, including robustness, generalization, sample efficiency, and the ability to tackle complex problems. One area that holds promise for addressing these challenges is collective intelligence, a concept that refers to the emergent behavior of individual agents or units working together to achieve a common goal.

**The Sensory Neuron Paper**

A recent paper, "The Sensory Neuron as a Transformer," makes significant contributions to this field. The authors propose a novel approach in which every input is processed by an identical neural network with its own hidden recurrent state. These individual sensory neurons learn to communicate via an attention mechanism, and the resulting policy emerges through self-organization.

Because the shared attention mechanism produces a permutation-invariant representation, a policy trained this way continues to work even when its observations are shuffled, padded with extra noisy inputs, or re-permuted partway through an episode, all without retraining.
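To make the mechanism concrete, here is a minimal sketch of the idea, assuming a plain feed-forward embedding (the actual paper additionally gives each sensory neuron its own recurrent LSTM state). All names and weight shapes are illustrative:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class SensoryPolicy:
    """Toy permutation-invariant policy: every scalar input is embedded by
    the SAME small network, then pooled by attention against a fixed,
    learned query, so the result does not depend on input order."""
    def __init__(self, d=16, n_actions=3, seed=0):
        rng = np.random.default_rng(seed)
        self.w_embed = rng.normal(size=(2, d))  # shared embedding: (input value, prev action)
        self.query = rng.normal(size=(d,))      # constant query -> permutation invariance
        self.w_out = rng.normal(size=(d, n_actions))

    def act(self, obs, prev_action=0.0):
        # every sensory input passes through the same embedding network
        feats = np.tanh(np.stack([obs, np.full_like(obs, prev_action)], axis=1) @ self.w_embed)
        attn = softmax(feats @ self.query)      # one attention weight per sensory neuron
        pooled = attn @ feats                   # order-independent summary of all inputs
        return int(np.argmax(pooled @ self.w_out))

policy = SensoryPolicy()
obs = np.random.default_rng(1).normal(size=8)
# shuffling the observation channels leaves the action unchanged
assert policy.act(obs) == policy.act(np.random.permutation(obs))
```

Because the query is a fixed learned vector rather than a function of the inputs, permuting the inputs permutes the attention weights in lockstep, and the weighted sum comes out the same (this is the Set Transformer trick discussed later in the interview).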

**Advantages and Applications**

The Sensory Neuron approach has several advantages over traditional machine learning methods. It offers robustness and generalization to unseen environments, and a form of sample efficiency: because adaptation to unseen variations is zero-shot, no additional environment interactions are needed at test time. The approach can also tackle complex problems by breaking them down into smaller, more manageable parts handled by individual sensory neurons.

One of the most exciting applications of this approach is in visual tasks, such as the CarRacing game or Atari Pong. The authors demonstrated that their approach can train policies that generalize to environments with unseen backgrounds, because the attention mechanism learns to focus on task-relevant inputs (such as the edges of the road in CarRacing) and ignore the rest.

**Self-Organization and Collective Intelligence**

The Sensory Neuron approach is based on the idea of self-organization, where individual agents or units learn to communicate with each other to achieve a common goal. This is achieved through a learned attention mechanism: each sensory neuron broadcasts a message derived from its own recurrent state, and attention aggregates these messages into a shared, permutation-invariant representation that drives the policy.

Self-organization is a key concept in collective intelligence, and it refers to the emergent behavior that arises from the interaction of individual agents or units. In machine learning, it offers a way to attack complex problems through many units applying simple local rules.

**Inspiration from Other Fields**

Collective intelligence is a field that draws inspiration from other areas of research, such as swarm computing, swarm optimization, and multi-agent systems. By exploring these fields, researchers can gain insights into how individual agents or units can be designed to work together to achieve a common goal.

For example, neural cellular automata have been proposed as a way to create self-organized systems that learn from experience. In this approach, a grid of cells repeatedly applies a shared, learned local update rule, and coherent global behavior emerges from the interactions.
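As a sketch of what such a system looks like (a deliberately simplified toy, not the Distill implementation, which uses Sobel-filter perception, stochastic updates, and alive masking):

```python
import numpy as np

def nca_step(grid, w1, w2):
    """One neural-cellular-automaton step: every cell applies the SAME
    learned rule to its 3x3 neighborhood, and global patterns emerge
    purely from repeated local updates."""
    H, W, C = grid.shape
    padded = np.pad(grid, ((1, 1), (1, 1), (0, 0)), mode="wrap")
    # gather each cell's 3x3 neighborhood into one flat perception vector
    neigh = np.stack([padded[i:i + H, j:j + W] for i in range(3) for j in range(3)], axis=-1)
    percep = neigh.reshape(H, W, C * 9)
    hidden = np.maximum(percep @ w1, 0.0)  # shared two-layer rule, applied everywhere
    return grid + hidden @ w2              # residual update of each cell's state

rng = np.random.default_rng(0)
C = 4                                      # channels per cell (visible + hidden state)
grid = rng.normal(size=(16, 16, C))
w1 = 0.1 * rng.normal(size=(C * 9, 32))
w2 = 0.1 * rng.normal(size=(32, C))
for _ in range(10):                        # training would optimize w1, w2 toward a target
    grid = nca_step(grid, w1, w2)
```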

**Future Research Directions**

The Sensory Neuron paper has opened up new avenues for research in collective intelligence, and there are several directions that this field is likely to explore in the future. One area that holds promise is the development of more sophisticated self-organization mechanisms, such as those inspired by neural cellular automata.

Another area that holds promise is the exploration of how collective intelligence can be used to tackle complex problems in machine learning. This includes areas such as robustness, generalization, sample efficiency, and the ability to tackle large-scale problems.

**Conclusion**

The Sensory Neuron paper has made significant contributions to the field of collective intelligence, and it has opened up new avenues for research in this area. By exploring self-organization and collective intelligence, researchers can gain insights into how individual agents or units can be designed to work together to achieve a common goal.

As we move forward, it is likely that we will see significant advances in machine learning, driven by the development of more sophisticated self-organization mechanisms and the exploration of how collective intelligence can be used to tackle complex problems. The potential applications of this field are vast, and it holds great promise for advancing our understanding of machine learning and its ability to solve some of the world's most complex challenges.

"WEBVTTKind: captionsLanguage: enall right everyone i am here with david ha david is a research scientist at google brain david welcome to the swimwell ai podcast thanks for having me sam hey i'm really looking forward to diving into our conversation i've been a long time follower viewers on twitter uh and i definitely recommend folks check you out there at hard meru why don't we get started by having you share a little bit about your background and how you came to work in ai yeah sure uh it's kind of a weird background uh you know i i was originally you know studying control systems back in back in the day in university eventually for some one reason other than i entered i entered the finance industry and uh became a i became i started off as a like a quant on uh in on wall street actually and i tried working at banks and then eventually became uh worked on a trading desk as a trader and i spent around 10 years or so of my life in the derivatives trading at various different investment banks but you know it kind of things kind of got a bit old and try to learn different things and i was always interested in neural networks because they're they're always fascinating especially the biological inspired component and i started to to do some reading and learning by myself and you know one thing led to the other and you know around five years ago i was able to join google in one of their research residency programs and as a researcher so that i was able to to change careers and became a full-time ai researcher and so this is where i am now that's awesome that's awesome uh has the idea or the attraction the initial attraction to the biological inspiration has that held up for you do you uh do you feel like the the biological inspiration inspiration continues to inspire you or was it a letdown to find out that you know the neural networks and computers uh are not all that similar to the biological ones well i think to this day it still continues to inspire me you know and it drives some of my work but like we do have to recognize that modern deep learning or machine learning systems are very different than biological processes for one thing we can we can scale them up we have lots of electricity and compute power and you know the trend is actually you know having more and more compute resources and for machine learning and training to increasingly scale to larger models and larger data sets in larger environments and it's a bit different than biology because uh in in biological systems it's or like more like a biological intelligent life is more like you know coming from and evolving because we we have uh we have not because we have an abundance of resources but more like we have a lack of it and some i was i'm fascinated at how like evolution seems to select systems that are able to to always do more with less but in a way you know that's it's not just biological systems but also like the the creativity process as well like sometimes we from creative works is always like like uh you're able to to express more with less and uh you know i think the the good the interesting thing about you know being a researcher especially at google is you know yeah you do have a lot of resources so you get to see both ends of the spectrum right so on one hand you know you do get to see people who are really excited at scaling up the research research and making like very large large systems work on large data sets and on the other end of the spectrum you have people working on on theory or on like you know coming 
On the other end of the spectrum, you have people working on theory, some coming from theoretical-physics backgrounds, who may not do a lot of extensive computational modeling at all. It's good to see that balance, because ultimately you need both: you can have large models, and you need small chips that run them with less power.

**Sam Charrington:** You alluded to this idea of constraints playing a role in the way you think about machine learning systems. Can you elaborate on that?

**David Ha:** Getting back to the constraints we see in nature, they certainly play a role in shaping some of the research we've been doing. In nature there are lots of examples of so-called bottlenecks that shaped our development as a species. Just look at the fascinating way our brain is wired and how our consciousness processes abstract thought: I'm talking to you in language even though we have a video feed, and we convey concepts to each other not just with language but with drawings and gestures, which developed into languages, stories, and cultures. To me it's debatable whether these bottlenecks or constraints are a requirement for intelligence to emerge, but it's undeniable that our own intelligence is a result of them. One could argue that just because constraints led to us doesn't mean the development of a general, strong AI needs such constraints, so there are opposing views. But my research work is led by this idea of constraints, and you can see it in some of the work I've done, even from a few years ago when I started getting into generative models.

Back then, around 2016 and 2017, GANs were really taking off. At the beginning people were excited about generating CIFAR-10 images, 32 by 32 pixels, and then they got bigger: 64 by 64, 128 by 128, celebrity-face datasets like CelebA, which was really cool. I had my share of playing around with these generative models too. One of my early works was to build a generative model for MNIST, the simplest dataset ever, but rather than generating pixels directly, I tried to generate a parametrization of MNIST, which can be very abstract in nature. That led to an early work where I combined ideas from another researcher, Ken Stanley, who designed networks called CPPNs. If you're not familiar, a CPPN takes in a coordinate and outputs a pixel. If you have a simple rule that maps a coordinate to a pixel value, you don't need the network to output the entire image at once; you just say, "give me the coordinate, I'll give you the pixel value." So I trained such a network, which I called a CPPN-VAE, or CPPN-GAN, to generate MNIST digits. It's very elegant: you train it on the 28-by-28 MNIST dataset as an abstract generative model, and then you can blow it up and generate MNIST digits at 1000-by-1000 resolution, back in 2016, just a few months after Ian Goodfellow's GAN paper came out. Now, with exponentially increasing hardware, people can train GANs directly at that resolution, but at the time I thought it was cool to produce 1000-by-1000 images by skipping entirely the need to generate such big images. The key is to abstract the principles of the image into an abstract representation using the CPPN.
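As a rough illustration of the CPPN idea (a hypothetical toy, not the CPPN-VAE/GAN described here): the generator is a function of pixel coordinates, so once trained it can be sampled at any resolution.

```python
import numpy as np

def cppn(x, y, weights):
    """Toy CPPN: map a pixel coordinate (plus its radius) to an intensity.
    The image is a function, so resolution is chosen at render time."""
    w1, w2 = weights
    h = np.tanh(np.stack([x, y, np.sqrt(x**2 + y**2)], axis=-1) @ w1)
    out = 1.0 / (1.0 + np.exp(-(h @ w2)))  # sigmoid -> intensity in [0, 1]
    return out[..., 0]

def render(weights, res):
    xs, ys = np.meshgrid(np.linspace(-1, 1, res), np.linspace(-1, 1, res))
    return cppn(xs, ys, weights)

rng = np.random.default_rng(0)
weights = (rng.normal(size=(3, 32)), rng.normal(size=(32, 1)))
small = render(weights, 28)     # train against 28x28 MNIST at this size...
large = render(weights, 1000)   # ...then sample the very same function at 1000x1000
```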
In later work I followed the same theme and looked at creating a generative model of doodles. You've probably played around with it; it's a model called SketchRNN. You can interactively draw something in the web browser, and the model, like a language model, continues to predict what you're going to draw, stroke by stroke, in a vector format. It's also an autoencoder, so you can draw a pig, compress it into a latent space, and then redraw the pig. It seems trivial in retrospect, but a few years ago it was a different kind of model, because most people were working on GANs and generative models of pixels, and we were trying to do it on doodles. At the time, the challenge was finding the data, and luckily my colleague Jonas Jongejan at Google Creative Lab had created a viral game called Quick, Draw! that collected the doodle data we could use. That project was inspired by the thought that, rather than trying to create a minimalist representation ourselves, maybe we can use machine learning to study how we humans do representation learning, given our own inductive biases. We've been drawing doodles maybe since we were cave people, because we have hands and sticks, so it seemed like a cool idea to get machine learning models to analyze how we developed this representation.

One of the ideas after the SketchRNN paper came when I started doing work on reinforcement learning. We have all these cool algorithms that can train agents to perform tasks when the agents are fed pixels, the entire screen. At the time, the DQN model from DeepMind had come out, and agents were playing Pong and other Atari games entirely from pixels. I thought that was cool, but in a way it could also be information overload, since most of the pixels are not useful when you're playing those games. So what I tried to do was enforce a constraint on the policy, the controller, so that it's not allowed to see the full stream of pixels; it's only allowed to see a representation of its environment.
That work led to a paper called World Models, an exploratory paper I published a few years ago with Jürgen Schmidhuber, and it was a really fun project. The idea is that we have a really simple generative model: we use a variational autoencoder to compress all of the screens into a low-dimensional latent space, and we have a recurrent neural network that simply predicts the future latent space of the environment. So if you have a VAE trained on your game that produces a low-dimensional latent vector, your RNN will predict your future latent vectors.

**Sam Charrington:** Depending on the future state of the game.

**David Ha:** Yes, exactly. It was a fun project because we were able to use these two simple components to build a neural simulator of games, if we collect enough data. And what was fun is that we showed we can feed the representations learned by such a model, the VAE latent vector and the RNN's hidden state, to an agent, and this small bottleneck allowed agents to discover policies much more easily than having to see the entire pixel stream. From an optimization standpoint, it's easier to figure out what to do if you're given maybe 200 numbers and asked for an action, compared to being given a million numbers at every step. Because of that, it was able to solve tasks like the CarRacing game in OpenAI Gym. Back then it was considered a hard task; now it's trivial, but no one had been able to get the required score. I'm sure if people had tried hard enough they could have, but at the time this was the first approach that achieved the required score for that game. From a machine learning research point of view it was considered state of the art, although I'm not sure; if you tweak any model enough you can also beat it.
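As a structural sketch of that decomposition (hedged: `encode` and `rnn_step` are hypothetical callables standing in for the pretrained VAE encoder and the recurrent predictor, and all sizes are illustrative):

```python
import numpy as np

class WorldModelAgent:
    """Sketch of the world-models split: a VAE (V) compresses each frame to
    a small latent z, an RNN (M) carries a hidden state predicting future
    latents, and a tiny controller (C) acts on [z, h] instead of pixels."""
    def __init__(self, encode, rnn_step, z_dim=32, h_dim=256, n_actions=3, seed=0):
        rng = np.random.default_rng(seed)
        self.encode, self.rnn_step = encode, rnn_step  # pretrained V and M (stubs here)
        # the controller sees ~288 numbers instead of ~a million pixels,
        # which is why a simple optimizer can find a policy
        self.w_c = 0.1 * rng.normal(size=(z_dim + h_dim, n_actions))
        self.h = np.zeros(h_dim)

    def act(self, frame, prev_action):
        z = self.encode(frame)                          # pixels -> latent vector
        action = int(np.argmax(np.concatenate([z, self.h]) @ self.w_c))
        self.h = self.rnn_step(z, prev_action, self.h)  # advance the predictive state
        return action
```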
**Sam Charrington:** Beyond those kinds of results, have we seen that idea of constraining your latent space become generally used as part of state-of-the-art approaches in RL and similar areas?

**David Ha:** Yeah, definitely. Whether that paper got state of the art or not doesn't really matter; in machine learning, results always get beaten later. But the idea in that paper, that you can learn a generative model and train an agent entirely inside of that model to produce a policy, is the main idea that seems to have taken off in subsequent work. For example, after the World Models paper there was a paper on model-based learning for Atari where they literally called their algorithm SimPLe. The idea is basically: your agents collect data, you train a generative model of the environment to predict the future, and then you train your policy inside of that model only. Of course, at the beginning you're not going to get a good policy, but that doesn't matter: you deploy that policy, collect more data, refine your model, and redeploy. When that work came out in 2019, it was the state of the art in sample efficiency for various Atari games, simply because the learning took place in the model. A lot of the sample inefficiency we notice when we run an RL algorithm comes from the fact that you have the data collection process but you're also learning in the environment, and there can be some slippage in efficiency there. If you're able to isolate data collection from policy learning, so that your interactions with the environment are strictly for data collection and policy evaluation, and all of your learning is done in the model, then intuitively that helps data efficiency.
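That recipe reduces to a short loop. This is a hedged paraphrase rather than SimPLe's actual code; `collect_rollouts`, `fit`, and `train_inside_model` are hypothetical stand-ins for the real components:

```python
def model_based_rl(env, world_model, policy, n_rounds=10):
    """Iterate-and-refine: the real environment is touched only to gather
    data (and evaluate); all policy learning happens inside the model.
    (collect_rollouts and train_inside_model are hypothetical helpers.)"""
    dataset = []
    for _ in range(n_rounds):
        dataset += collect_rollouts(env, policy)           # real env: data collection only
        world_model.fit(dataset)                           # refine the neural simulator
        policy = train_inside_model(world_model, policy)   # learn entirely "in the dream"
    return policy
```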
Another line of work, by my colleague Danijar Hafner, who is also at Google but based in Toronto, started using these latent world models and combining them with planning algorithms. Traditionally, planning algorithms are really useful for robotics, but they're also kind of flaky, especially when it comes to video feeds of sensory data. A lot of planning algorithms were used on state observations, where you feed well-engineered measurements into your robot controller. The key question is how you get your robot controller, your control system, to work on video feeds. Something like a latent world model with this latent bottleneck can be useful for planning algorithms, because planners are really good at working with low-dimensional data, so you give them low-dimensional data. The key idea behind that line of work, which started with a model called PlaNet, is that you have a latent spatiotemporal world model that is constantly updated as you get more data, and a planner that figures out the optimal action within the model, so you don't actually need to do any policy learning.

**Sam Charrington:** Does this also help with generalization? You'd think a lower-dimensional latent model has gotten rid of some of the noise that the full environment contains, so an agent's performance might transfer from one specific environment to another better.

**David Ha:** That is still an open question. If I naively train a world model on the data that we collect, then no, it's not going to generalize to variations. For example, if we change the background color of your environment, your VAE or latent model has never seen that before, and that generalization can only happen via learning: the algorithm would need to collect new data and relearn its world again. Whether it can generalize becomes a question of how many shots, how many timesteps, it needs, rather than being zero-shot. That being said, there is a line of work on generalization problems within latent space models. There are a few challenges, like a variation of the DeepMind robotic control suite where they explicitly introduce lots of distractions and change the backgrounds, and you can employ all sorts of approaches; rather than training an image-based latent space, you can do contrastive learning, and there's a lot of work doing that.

But around that time I also stepped back a bit; I had the same question as you about generalization. There are lots of ways of doing these latent space models, but along with my colleagues on my team, we started to explore whether the latent space bottleneck, while one solution, might not be the best solution for this generalization task. So we looked at another bottleneck: attention, or in our case, hard attention. In a paper we published two years ago called "Neuroevolution of Self-Interpretable Agents," led by my colleague Yujin Tang, rather than using a latent space as the bottleneck, the idea is that we only allow an agent to see, for instance, 10 patches of the screen, and its decision is based solely on those patches. It's kind of like biological vision with our fovea-type system. When we study how humans see things, we're always attending to a bunch of points in front of us, yet somehow we have a mental understanding of what we're seeing. It's not like I'm getting full HD resolution directly in my eyes; I'm actually seeing a handful of spots.

**Sam Charrington:** In this paper, were the patches emulating a visual field, contiguous in a particular arrangement, or were they randomly distributed across the image?

**David Ha:** For this work, it's part of the policy. Rather than having a randomized frame, the agent actually has to learn to decide which 10 patches to choose, just like, when you're looking at me, you're somehow deciding where to position your eyes on the screen. It doesn't have to be 10; one or two would still work, but when we do a sweep we can be a bit more general and choose five or ten.

**Sam Charrington:** I was originally thinking it was along the lines of masking for generalization or regularization, but this is more learning where to attend within the image, as a constraint.

**David Ha:** Yes, exactly.
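A toy version of that hard-attention front end might look like the following; the patch size, the linear scoring rule, and the names are illustrative, not the paper's architecture:

```python
import numpy as np

def select_patches(frame, w_key, k=10, patch=7):
    """Toy hard-attention front end: score every patch with a shared linear
    key, keep only the top-k, and let the controller see nothing else.
    The top-k selection is non-differentiable, which is one reason such
    policies are trained with evolution rather than gradients."""
    H, W = frame.shape
    patches = np.stack([frame[i:i + patch, j:j + patch].ravel()
                        for i in range(0, H - patch + 1, patch)
                        for j in range(0, W - patch + 1, patch)])
    scores = patches @ w_key                  # one relevance score per patch
    top = np.argsort(scores)[-k:]             # hard selection: indices of top-k patches
    return top, patches[top]

rng = np.random.default_rng(0)
frame = rng.random((84, 84))                  # stand-in for a grayscale game frame
w_key = rng.normal(size=(7 * 7,))
idx, visible = select_patches(frame, w_key)   # the policy sees 10 of 144 patches
```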
It was also inspired by a line of work in psychology from a few decades ago; I'm sure you've heard of it: selective attention, or inattentional blindness. Sometimes our brains just don't see part of what's in front of us. There was a psychology experiment where subjects were asked to look at a scene with two teams of basketball players, one wearing white shirts and the other wearing black shirts, and the subjects had to count the number of times the ball was passed between players, something like that. Then a person in a gorilla suit walks through the background, and most of the time the subjects were not able to see the gorilla, because they were so focused on the ball and the colors people were wearing. That helped create an analogy: we have this thing called inattentional blindness, whether we like it or not; what if we try to do something like that with an RL agent? What are the pros and cons? Does it add abilities or take some away? That's what we were trying to explore.

It turns out that using this simple scheme, we were able to train simple agents to do the same tasks as the World Models paper, getting a pretty good score on the CarRacing game from pixels and playing the Doom game. But unlike the previous latent space models, we found this model can easily adapt to augmentations of the environment. For instance, in the Doom environments, if we change the color of the ground, it still works. If we add a little blob beside the track in the car racing game, it still works. The reason is that those patches are simply not likely to be selected by the pre-trained agent; it works because it's just not attending to things deemed not relevant. To some extent this is a very naive way of approaching the problem, because in reality attention is very nuanced, but it's a simple model that clearly demonstrates that if we strictly enforce inattentional blindness in an RL agent, it has these properties: because it's simply not allowed to see certain parts of the screen, you throw away information, but you gain the ability to generalize to changes in the environment.

**Sam Charrington:** How does it compare in sample efficiency to the constrained latent space? Does it retain that advantage in some way?

**David Ha:** For this one, no, because it actually takes more time to train, or to evolve, the policy to perform the task. But the way I think about these issues is that there are a few dimensions. You can work on optimizing sample efficiency in the sense of reducing an RL algorithm from 200 million timesteps to 100 million timesteps to achieve some score. Or you can think about sample efficiency in terms of zero-shot transfer. One could argue: I spent all this time finding the policy using hard attention, but if you give it a new environment, one with augmentations the original didn't have, how many timesteps does it take the agent to adapt to that new environment? Here the answer is zero, because it transfers zero-shot. That's another way of looking at sample efficiency: not on the training task, but on a test it has never seen.

**Sam Charrington:** Kind of arguing for a global sample efficiency, in a sense, across multiple problems or versions of a problem.

**David Ha:** Exactly. In our case, I'm really interested in sample efficiency across unseen versions of a problem.
That's basically one of the goals of AI. Of course, given enough compute, we're going to solve every known problem that is well defined, but one of the things that distinguishes us from machines so far is our ability to quickly solve variations of problems we have not seen before.

**Sam Charrington:** You mentioned your work with Ken Stanley earlier, and you just mentioned evolution. I spoke to him several years ago about his work in neuroevolution. Were you using "evolution" loosely, or have you also studied these ideas of neuroevolution and evolutionary approaches to neural networks and machine learning?

**David Ha:** Yeah, for sure. In the work I just talked about, the neuroevolution of self-interpretable agents, we actually used computational evolution algorithms to train the agents, rather than reinforcement learning. In general I like evolution algorithms because we can use them as black-box optimizers; we don't need everything to be nice and differentiable, which is one of the key properties demanded in many domains right now. Once things are differentiable, you can put them into the machine and your gradient-based optimizer gets you the solution. But hard attention is difficult to make differentiable; there are methods, but it's challenging, so it's sometimes just easier to use evolution to solve these problems. So on one hand, we like to use evolution strategies and genetic algorithms as tools to help us find solutions, but I've also worked on research projects where we developed the evolution algorithms themselves.

There was another paper with this theme of constraints, done with my former colleague and intern Adam Gaier, called "Weight Agnostic Neural Networks." The key concept in that paper is that we want to find neural network architectures that have a really strong inductive bias for certain reinforcement learning or machine learning tasks, going to the extreme of finding architectures that can work even without training the weights. Usually you think of taking a network architecture and then running an optimization algorithm like SGD to find the weights. Here we asked: can we find an architecture that still works when we don't train the weights, when the weights are drawn from a random distribution? We essentially did architecture search, where we optimize the performance of the architecture under a given weight distribution. Of course, the architecture is not going to perform as well as when all the weights are fine-tuned, but this is still very useful, because neural architecture search is extremely computationally intensive: you have a batch of architectures, you have to find the weights for all of them, and then you use the results to propose the next set of architectures. Here we can simply evaluate the architectures' performance with random weights and find architectures that have a very strong inductive, or even innate, bias for certain tasks.
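The evaluation at the heart of that search fits in a few lines. This is a hedged sketch of the idea rather than the paper's code; `architecture_forward` and `env_rollout` are hypothetical stand-ins for the candidate topology and the task:

```python
import numpy as np

def weight_agnostic_fitness(architecture_forward, env_rollout,
                            shared_weights=(-2.0, -1.0, -0.5, 0.5, 1.0, 2.0)):
    """Score an architecture by substituting a SINGLE shared value for every
    connection weight and averaging episode returns over several such values.
    Architectures that work across all of them encode the task in their
    wiring, not in tuned weights. (Both callables are hypothetical.)"""
    returns = [env_rollout(lambda obs, w=w: architecture_forward(obs, w))
               for w in shared_weights]
    return float(np.mean(returns))
```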
The intuition is kind of inspired by biology: some organisms have abilities the moment they're born, like escaping predators. You can imagine a bipedal walker controller that can already walk forward when the weights are not trained; if that's the starting point, then training the weights to fine-tune the network later on will be a lot more efficient. So to answer your earlier question, this is one example of work where we actually tried to extend and improve neuroevolution methods, where we're not just the end user of an evolution algorithm as a black-box optimizer. That paper was apparently talked about in the neuroscience community a bit more than the machine learning community. It was not so useful to the ML community; our best score on MNIST was around 92 percent, which is horrible, so it's not going to be useful there. But the paper still got accepted at the NeurIPS conference with a spotlight, so it's probably one of those papers where we massively underperformed the state of the art and somehow still got recognized.

**Sam Charrington:** Nice. I tend to hear neuroevolution come up most in the context of architecture search. Are there other areas where you see it being used?

**David Ha:** As I mentioned earlier, we use it a lot just for policy search. We also see it used quite often in robotics. For example, some of my colleagues on the robotics team like to use simple evolution algorithms to quickly find policies. Two get used really often. One is CMA-ES, which is kind of the default evolution strategies algorithm people use as a black-box optimizer. The other is called augmented random search; evolution is a form of random search, and this is a very simple random search algorithm that's directed in a very simple way. The robotics folks like these simple approaches because they're explainable and intuitive, and I've seen people use them to find policies for legged robots, controlling the Minitaur robots they have in the lab. I use them a lot in general, especially when I have a neural network without many parameters, which is very common in RL. Unlike deep learning, where you have 20-million-parameter solutions, in RL a lot of controllers work with only ten thousand or even a thousand parameters. As Yann LeCun likes to say, RL is like the cherry on the cake: the trend is that you have all these self-supervised models trained with gradient descent with hundreds of millions of parameters, but your actual policy network that uses all of them, perhaps via a world model, might be just a few thousand parameters. Why bother using gradient descent to train it when, if we use evolution, we can get away with things like non-differentiable environments and whatnot? So we tend to like to use evolution in those cases.
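For instance, wiring a small policy's flattened parameter vector into CMA-ES as a black-box optimizer could look like this, using the open-source `cma` package; the fitness function here is a toy stand-in for an actual episode rollout:

```python
import cma           # pip install cma
import numpy as np

def episode_return(params):
    """Hypothetical stand-in: in practice, unpack `params` into a small
    policy network, run an episode, and return the total reward."""
    return -float(np.sum((params - 0.5) ** 2))

# CMA-ES only ever sees fitness values -- no gradients -- so nothing in
# the policy or the environment has to be differentiable.
es = cma.CMAEvolutionStrategy(np.zeros(100), 0.5)   # ~100 parameters, initial step size
while not es.stop():
    candidates = es.ask()                           # sample a population of parameter vectors
    es.tell(candidates, [-episode_return(c) for c in candidates])  # cma minimizes
```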
**Sam Charrington:** Do you use them in conjunction with other RL methods and approaches to policy, or in isolation?

**David Ha:** Usually, if we're able to get the solution we want, we use them in isolation. Some of my colleagues have been working on ways to combine reinforcement learning with evolution, where evolution is the outer loop and RL is the inner loop. But in a lot of work I'm simply the end user of an optimizer: I use it to get me a set of weights, a set of parameters, and call it a day.

**Sam Charrington:** Shall we talk about the sensory neuron paper? It sounds like it fits right into this idea of applying constraints to make problem solving easier. Can you talk a little bit about that paper?

**David Ha:** Sure. In some of the previous work I discussed, the constraint is more like an information bottleneck, where maybe you're doing more with less, but it doesn't always have to be like that. This paper was a really fun project, also with my teammate Yujin Tang. We looked at the problem of: what if we gave an agent an observation space that is shuffled around? Usually, in these reinforcement learning environments, and in machine learning in general, you give a model a very well-specified input. If you give it the observation space of a humanoid or ant robot, every single input means something, maybe a torque, a velocity, or a position; or with pixels on the screen, this pixel has to correspond to this position. We toyed around with the idea: what if we randomly shuffle the observations, and the agent actually has to figure out what each sensory input means before deciding on an action? And if an agent is able to solve a particular task from shuffled observations, we can also examine whether it has extra benefits compared to agents trained the normal way. This is another type of constraint, one I don't consider an information bottleneck, because you're actually giving the agent the same information; the dimensionality is the same, but we shuffle the order. And surprisingly, we were able to get it to work.

The inspiration for this work originally came from ideas in the meta-learning space, because we were essentially trying to get an agent to adapt to changing environments, where the agent gets a shuffled and reshuffled screen and has to re-adapt. But there's also an area in neuroscience called sensory substitution, where psychologists measure humans' ability to adapt when what our senses give us suddenly changes.
There's a popular experiment, done even a hundred years ago, where you wear upside-down goggles: a simple mirror glass in front of your eyes flips everything you see. People noticed that it requires maybe ten minutes or half an hour of readjusting, and then you're able to walk perfectly fine with this flipped sensory input. But once you take off the glasses, you're messed up again for another half hour or so. That's one of the easier tasks. There was a TED talk a few years ago where someone had a video of an inverted bicycle; this one is harder, because when you steer left you actually go right and vice versa, and they found this really messed people up. I guess riding a bicycle is a human invention, and it takes a long time to re-adapt because you also have to balance; you're a complicated control system constantly balancing yourself.

**Sam Charrington:** I don't know if you've ever had the experience where your screen controls get flipped, so trackpad right becomes left and up becomes down. That can be infuriating; it's very difficult to adjust to.

**David Ha:** Yeah, exactly, especially with Apple products, where sometimes the trackpad scrolls the other way on another person's machine, or a new MacBook Pro suddenly has a touch bar, and then the next model doesn't.

There's another neuroscientist, Paul Bach-y-Rita, who was a pioneer of sensory substitution. His claim to fame was working with people who were unfortunately blind. At the end of the 1960s, he ran an experiment where he took a low-dimensional analog video camera and fed its signals into a two-dimensional grid of pokes on the person's back.

**Sam Charrington:** I've heard about this, yeah.

**David Ha:** One interesting aside is that our skin, our touch sense, is underutilized everywhere outside of our hands. Maybe when we were hunter-gatherers our skin was really important, but in modern times we wear clothing and don't really use our touch senses; that's another interesting topic, but it gets sidelined. For this particular idea, he poked a low-dimensional representation of an image into the subject's back, and within a few weeks or months people gained a form of vision: they were able to see and understand things by sitting on this chair. He showed that through pokes on a person's back, that person can learn to interpret those signals as if they were seeing what's in front of the camera. In the late 90s, around the turn of the century, there was a variation from this team where they fed a higher-resolution video feed into a 2D grid of electrodes placed on a person's tongue. From the stimulation on the tongue, the subject is able to interpret what a video camera mounted on the subject's head is seeing. This actually gained popularity.
People were able to live their lives with a low-dimensional vision system, simply by learning to interpret the sensory signals coming from their tongue. These results are incredible, and they show how adaptable we are, but they require months if not years of training to gain mastery. So sure, you can switch around your inputs and retrain your machine learning model from scratch on the new inputs, and then you can deal with these sensory substitutions. What we were asking ourselves is: can we get an algorithm to do this without training in the traditional sense, without retraining the model, so that the agent is able to adapt to these inputs on the fly?

In the end, even though this work is biologically inspired on the problem side, the solution we used has nothing to do with biology. We were lucky enough to build on previous work that gave us the tools to work with permutation-invariant networks, some of it pioneered by people working on the Transformer paper. The original Transformer uses linear attention, which predates Transformers by a lot, and that was shown to be permutation equivariant: if you change the input order, the output order changes the same way. But there's another paper that came out later, called the Set Transformer, which had a really cool trick of making the query matrix constant, and that converts the attention mechanism to be permutation invariant. Suddenly you're able to feed in signals in any order and the output will be the same. It's a method for taking an unordered, variable-length set and getting a permutation-invariant representation of it.

We played around with this idea and applied it to reinforcement learning problems: you can feed in all of your signals, whether those are the states of bipedal-walker-style locomotion environments or all the tiles of an Atari game, in any order you want, and the same representation comes out. We fed these representations into a policy network and trained the entire system to perform the task. What we noticed during development, because it doesn't work at the beginning and we had to iterate on the method, is that, as my colleague Yujin discovered, we actually have to feed in things like the previous action, and for certain tasks each sensory neuron needs its own internal state. For a locomotion robot, for example, every sensory input goes into its own LSTM, and that LSTM outputs a broadcast signal to the attention mechanism, which generates the permutation-invariant representation that produces the action.

**Sam Charrington:** So it's fairly expensive.

**David Ha:** Yeah, but in a way it's an abstraction. Traditionally, an input goes right away into a particular input node of a neural network, but here we treat every input node as a neural network itself. That's why the paper's title is "The Sensory Neuron as a Transformer": these little networks were inspired by the transformer architecture, so we wanted to give some credit to that.
What we noticed is that, of course, it's going to work for permutation-invariant observations, but miraculously, without additional training, these agents tend to work even when we shuffle the observations during an episode. For example, take a locomotion robot walking forward. Of course it's going to work if you shuffle the input once at the beginning and keep that order for the rest of the 1,000 timesteps, because by definition the representations don't change. But we noticed that we can shuffle them many times during the episode: if your episode is 1,000 timesteps, you can reshuffle every 100 timesteps, and the performance, without additional training, remains roughly the same. So there's something to be said about the agent's ability to quickly re-adapt without explicitly learning to re-adapt.

**Sam Charrington:** Did you look at, or would you expect, that if you train an agent with this capability, or this constraint as you might say, and then give it unshuffled data, it performs better, because it has learned to attend to important relationships in the scene, compared to an agent that hasn't been trained this way?

**David Ha:** For this one, whether we give it shuffled or unshuffled data, it performs exactly the same way, because the representations are consistent. That being said, we can take away information from the input, or give it additional redundant information, and it still mostly works. For instance, if the agent expects five input signals for a task, you can give it 20 signals, where five are the actual important signals and the other 15 are pure noise, and the whole thing can be shuffled; without further training, even though it was only trained on the five inputs, it's still able to identify which signals matter and keep working. It somehow learns that it should identify the important signals without explicitly being trained to identify them. I feel it's somewhere between learning and meta-learning: in meta-learning you explicitly train the algorithm to learn an algorithm that learns, and in plain learning you're just getting a policy for the task. Here, the method indirectly learns a self-identification mechanism for which patches or inputs are important.

The other interesting result is on the robustness side. If we apply this methodology to visual tasks like the car racing game or Pong, we notice that we can change the backgrounds of the game and the policy still continues to work to some extent. This was not possible in the earlier work on hard attention, where changing the background makes it fail. But here, when we change the background for the car racing task, without explicitly training on the new backgrounds, the policy still works to some extent, and the performance is almost as good on these generalization tasks as existing works in the literature that are explicitly designed for such generalization.
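The mid-episode shuffle probe described above is easy to state in code. This is an illustrative harness, not the paper's evaluation script; `policy_act` is a hypothetical stand-in for the trained agent, and the environment response is faked with random numbers:

```python
import numpy as np

def shuffle_probe(policy_act, obs_dim=24, episode_len=1000, reshuffle_every=100, seed=0):
    """Re-permute the observation channels every `reshuffle_every` steps
    mid-episode; a permutation-invariant policy should behave the same.
    (policy_act is hypothetical; observations here are random stand-ins.)"""
    rng = np.random.default_rng(seed)
    perm = np.arange(obs_dim)
    actions = []
    for t in range(episode_len):
        if t % reshuffle_every == 0:
            perm = rng.permutation(obs_dim)   # scramble which wire carries which sensor
        obs = rng.normal(size=obs_dim)        # stand-in for a real env observation
        actions.append(policy_act(obs[perm])) # the agent only ever sees shuffled channels
    return actions
```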
Here it's just a byproduct of training the agent to work with shuffled inputs: oh, by the way, the generalization abilities come as a byproduct of the constraint. When we dug in further, the hypothesis was that if we shuffle all of the patches, the tiles of the screen, we force the agent to learn the essential, important things for the task, and because it's forced to learn the essential properties, that may help it generalize to variations of the environment with different backgrounds. In further analysis, we looked at the patches it learned to attend to in the car racing game, and it turns out that even though the screen is all shuffled around, it still learns to attend mostly to the patches that correspond to the edge of the road. That helps explain why the generalization to environments with changed backgrounds still works: it's not really attending to the positions with different backgrounds; it's still looking at the road. Some of that analysis is done in the paper to explain why the transfer works.

**Sam Charrington:** Got it. So given this body of research you've pursued, focused on the ideas of constraints and incorporating ideas like neuroevolution, what are you excited about looking forward? Where do you see your research headed?

**David Ha:** I'm really fascinated with the whole concept of self-organization. I'm especially inspired by the body of work my colleague Alexander Mordvintsev did on neural cellular automata and on self-organizing MNIST classifiers, which were recent articles on the Distill platform. One of the things that excited me about the sensory neuron paper is that it is sort of a self-organized system: every input goes into an identical neural network with its own hidden recurrent state, and somehow these neural networks learn to communicate via an attention mechanism to produce an emergent property, which is the policy. Going forward, I'm really excited about exploring more of these collective intelligence themes, where you have an emergent property arising from thousands or even hundreds of thousands of unique agents or units that have their own local processing rules, but as a whole exhibit some global emergent behavior, perhaps as a result of evolutionary optimization. I want to explore the properties of this emergent behavior, because maybe it will help us address some of the shortcomings we see in reinforcement learning, things like robustness, generalization, and sample efficiency. We can get inspiration from other areas, for example swarm computing, swarm optimization, and multi-agent systems. If we try to break a problem down into a complex-systems problem, where you have lots of local computation, perhaps that gives us insight, or different types of solutions, compared to how we've been able to approach it so far. So I'm excited about the general idea of collective intelligence and complex systems.
Going forward, we want to see how we can bridge complex-systems research and machine learning, incorporating some of the good ideas into machine learning, and maybe also looking the other way around: using machine learning to help advance the state of complex-systems research.

**Sam Charrington:** Awesome. I'm looking forward to following along as you push forward in that direction. David, it's been wonderful chatting with you. Thanks so much for joining us!

**David Ha:** Thanks for having me, Sam, as always.