Yann LeCun - Deep Learning, ConvNets, and Self-Supervised Learning | Lex Fridman Podcast #36

**The Importance of Embodiment in Artificial Intelligence**

Whether artificial intelligence (AI) requires embodiment has been debated extensively among researchers and experts. Some argue that embodiment, the integration of sensors and actuators that lets a system interact with the physical world, is essential for creating truly intelligent machines. Others hold that grounding, establishing a connection between language and the reality it describes, is sufficient to achieve human-like intelligence.

One concern with relying solely on language-based interaction is the lack of common-sense reasoning. The classic Winograd schema illustrates this: in "The trophy doesn't fit in the suitcase because it is too big," deciding whether "it" refers to the trophy or the suitcase requires knowing how objects behave in the physical world, not just the surrounding words. While language provides a wealth of information about the world, there are limits to what can be learned from text alone; perception of, and experience with, the physical world are essential for developing a deeper understanding of reality.
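To make the ambiguity concrete, here is a minimal sketch in Python of the classic trophy/suitcase pair. The `schema_pair` data and the `naive_resolver` heuristic are purely illustrative stand-ins (not any published baseline); the point is that the two sentences differ by a single word, so any resolver that ignores physical common sense gets at most one of them right.

```python
# A minimal illustration of a Winograd schema pair (hypothetical helper names).
# Each item: (sentence, correct referent of "it").
schema_pair = [
    ("The trophy doesn't fit in the suitcase because it is too big.", "trophy"),
    ("The trophy doesn't fit in the suitcase because it is too small.", "suitcase"),
]

def naive_resolver(sentence: str) -> str:
    """Toy baseline: always pick the first noun mentioned.
    Stands in for any resolver that lacks physical common sense."""
    return "trophy"

for sentence, answer in schema_pair:
    guess = naive_resolver(sentence)
    verdict = "OK" if guess == answer else "WRONG"
    print(f"{sentence}\n  guess={guess}  correct={answer}  {verdict}")
```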

This is why embodiment has gained significant attention in recent years. By integrating sensors and actuators into AI systems, researchers hope to create machines that interact with the world in a more meaningful way, whether as physical robots manipulating real objects or as agents acting in simulated environments. Even so, the necessity of embodiment remains debated, and some argue that grounding through language alone may be sufficient.
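As a concrete picture of what "integrating sensors and actuators" means in practice, below is a minimal, self-contained sketch of a perception-action loop in a toy simulated world. The `Toy1DWorld` environment, its `observe`/`act` methods, and the random policy are hypothetical illustrations invented for this example, not any particular robotics framework; real work uses far richer simulators and learned policies.

```python
import random

class Toy1DWorld:
    """Hypothetical 1-D environment: the agent senses its position and
    acts by stepping left or right, trying to reach a goal cell."""
    def __init__(self, size: int = 10, goal: int = 7):
        self.size, self.goal, self.pos = size, goal, 0

    def observe(self) -> int:
        # "Sensor": read the current position.
        return self.pos

    def act(self, action: int) -> float:
        # "Actuator": move by -1 or +1, clipped to the world's bounds.
        self.pos = max(0, min(self.size - 1, self.pos + action))
        return 1.0 if self.pos == self.goal else 0.0  # reward signal

env = Toy1DWorld()
for step in range(50):                   # perception-action loop
    obs = env.observe()
    action = random.choice([-1, +1])     # placeholder policy; a learned policy would go here
    reward = env.act(action)
    if reward > 0:
        print(f"reached goal at step {step}")
        break
```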

**Emotions and Intelligence**

Emotions play a crucial role in human intelligence, and it is likely that something like emotions will be essential for truly intelligent machines as well. The basal ganglia, a brain structure involved in emotional regulation, computes something like an individual's level of contentment or discontentment. Other emotions arise from prediction rather than from the present state: anticipating that something bad may happen is what creates fear.
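One way to read the "anticipation of bad things" framing is as an expected cost computed over predicted outcomes. The sketch below only illustrates that reading; the outcome probabilities, cost values, and threshold are invented for the example, and it is not a model of the basal ganglia.

```python
# Illustrative only: a "fear-like" signal as the expected cost of predicted outcomes.
predicted_outcomes = {        # hypothetical predictions from a world model
    "nothing happens": 0.90,
    "minor collision": 0.08,
    "severe collision": 0.02,
}
cost = {"nothing happens": 0.0, "minor collision": 5.0, "severe collision": 100.0}

expected_cost = sum(p * cost[outcome] for outcome, p in predicted_outcomes.items())
fear_signal = expected_cost > 1.0     # arbitrary threshold, chosen for the example
print(f"expected cost = {expected_cost:.2f}, fear-like signal = {fear_signal}")
```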

The connection between emotions and intelligence is complex, but emotions are likely to play a significant role in any system that aims for human-like intelligence. One question that comes up when considering the development of an AGI system is this: what would be the first question such a system would ask? The answer may not be as simple as we expect.

The common sense of a four-year-old child might be a good reference point for testing the limits of an AI system's understanding. By asking questions that probe common-sense reasoning about the physical world, researchers can gauge how well the system understands reality. Questions like "What makes leaves move?" or "Why are some things bigger than others?" require a basic grasp of causality and intuitive physics.
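A simple way to organize this kind of probing is a small question set with an expected common-sense theme for each item. In the sketch below, `probe_questions`, the keywords, and `ask_model` are hypothetical placeholders (there is no standard benchmark being referenced); `ask_model` stands in for whatever interface the system under test actually exposes.

```python
# Hypothetical probe harness for common-sense questions a young child can answer.
probe_questions = [
    ("What makes the leaves on a tree move?", "wind"),
    ("If you knock a glass of water off the table, what happens?", "spill"),
    ("Can an elephant fit inside a shoebox?", "no"),
]

def ask_model(question: str) -> str:
    """Stand-in for the system under test; replace with a real model call."""
    return "I don't know"

for question, expected_keyword in probe_questions:
    answer = ask_model(question)
    hit = expected_keyword.lower() in answer.lower()
    print(f"Q: {question}\nA: {answer}\n  expected keyword: {expected_keyword!r}  matched: {hit}\n")
```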

**The Challenges of Creating an AGI System**

Creating an AGI system that surpasses human intelligence is a daunting task, even for experts in the field. The development of such a system requires a deep understanding of human cognition, emotional regulation, and the complexities of language interaction.

One of the challenges facing researchers is how to create a system that can learn from experience without becoming overwhelmed by the sheer volume of information available. Another challenge is how to integrate emotions into an AGI system in a way that feels natural and intuitive.

Ultimately, creating an AGI system that truly surpasses human intelligence will require a multidisciplinary approach that incorporates insights from psychology, neuroscience, computer science, and engineering. By understanding the complexities of human cognition and emotional regulation, researchers can develop systems that are more empathetic, more creative, and ultimately, more intelligent.

**A New Perspective on Intelligence**

The concept of intelligence is complex and multifaceted, and it's likely that true AI will require a new perspective on what it means to be intelligent. Rather than simply relying on language-based interactions or embodiment, researchers may need to consider alternative approaches that integrate multiple forms of cognition and perception.

By exploring the boundaries of human intelligence and emotional regulation, researchers can gain a deeper understanding of what makes us tick. This knowledge can then be applied to the development of truly intelligent machines that are capable of understanding the world in a way that feels natural and intuitive.

"WEBVTTKind: captionsLanguage: enthe following is a conversation with Jana kun he's considered to be one of the fathers of deep learning which if you've been hiding under a rock is the recent revolution in AI that's captivated the world with the possibility of what machines can learn from data he's a professor in New York University a vice president and chief AI scientist a Facebook & Co recipient of the Turing Award for his work on deep learning he's probably best known as the founding father of convolutional neural networks in particular their application to optical character recognition and the famed M NIST data set he is also an outspoken personality unafraid to speak his mind in a distinctive French accent and explore provocative ideas both in the rigorous medium of academic research and the somewhat less rigorous medium of Twitter and Facebook this is the artificial intelligence podcast if you enjoy it subscribe on YouTube give it five stars on iTunes support and on patreon we're simply gonna equip me on Twitter Alex Friedman spelled the Fri D ma N and now here's my conversation with Yann Laocoon you said that 2001 Space Odyssey is one of your favorite movies Hal 9000 decides to get rid of the astronauts for people haven't seen the movie spoiler alert because he it she believes that the astronauts they will interfere with the mission do you see how is flawed in some fundamental way or even evil or did he do the right thing neither there's no notion of evil in that in that context other than the fact that people die but it was an example of what people call value misalignment right you give an objective to a machine and the Machine strives to achieve this objective and if you don't put any constraints on this objective like don't kill people and don't do things like this the Machine given the power will do stupid things just to achieve this dis objective or damaging things to achieve its objective it's a little bit like we are used to this in the context of human society we we put in place laws to prevent people from doing bad things because fantasy did we do those bad things right so we have to shave their cost function the objective function if you want through laws to kind of correct an education obviously to sort of correct for for those so maybe just pushing a little further on on that point how you know there's a mission there's a this fuzziness around the ambiguity around what the actual mission is but you know do you think that there will be a time from a utilitarian perspective or an AI system where it is not misalignment where it is alignment for the greater good of society that kneei system will make decisions that are difficult well that's the trick I mean eventually we'll have to figure out how to do this and again we're not starting from scratch because we've been doing this with humans for four millennia so designing objective functions for people is something that we know how to do and we don't do it by you know programming things although the legal code is called code so that tells you something and it's actually the design of an object you function that's really what legal code is right it tells you you can do it what you can't do if you do it you pay that much that's that's an objective function so there is this idea somehow that it's a new thing for people to try to design objective functions are aligned with the common good but no we've been writing laws for millennia and that's exactly what it is so this that's where you know the science of lawmaking and and 
computer science will come together will come together so it's nothing there's nothing special about how or a I systems is just the continuation of tools used to make some of these difficult ethical judgments that laws make yeah and we and we have systems like this already that you know make many decisions for ourselves in society that you know need to be designed in a way that they like you know rules about things that sometimes sometimes have bad side effects and we have to be flexible enough about those rules so that they can be broken when it's obvious that they shouldn't be applied so you don't see this on the camera here but all the decorations in this room is all pictures from 2001 a Space Odyssey Wow and by accident or is there a lot about accident it's by design Wow so if you were if you were to build hell 10,000 so an improvement of Hal 9000 what would you improve well first of all I wouldn't ask you to hold secrets and tell lies because that's really what breaks it in the end that's the the fact that it's asking itself questions about the purpose of the mission and it's you know pieces things together that it's heard you know all the secrecy of the preparation of the mission and the fact that it was discovery and on the lunar surface that really was kept secret and and one part of Hal's memory knows this and the other part is does not know it and it's supposed to not tell anyone and that creates a internal conflict do you think there's never should be a set of things that night AI system should not be allowed like a set of facts that should not be shared with the human operators well I think no I think the I think it should be a bit like in the design of autonomous AI systems there should be the equivalent of you know the the the oath that hypocrite Oh calm yourself yeah that doctors sign up to right so the certain thing certain rule said that that you have to abide by and we can sort of hardwire this into into our into our machines to kind of make sure they don't go so I'm not you know advocate of the the 303 dollars of Robotics you know the as you move kind of thing because I don't think it's practical but but you know some some level of of limits but but to be clear this is not these are not questions that are kind of really worth asking today because we just don't have the technology to do this we don't we don't have a ton of missing teller machines we have intelligent machines so my intelligent machines that are very specialized but they don't they don't really sort of satisfy an objective they're just you know kind of trained to do one thing so until we have some idea for design of a full-fledged autonomous intelligent system asking the question of how we design use objective I think is a little a little too abstract it's a little tough rat there's useful elements to it in that it helps us understand our own ethical codes humans so even just as a thought experiment if you imagine that in a GI system is here today how would we program it is a kind of nice thought experiment of constructing how should we have a law have a system of laws far as humans it's just a nice practical tool and I think there's echoes of that idea too in the AI systems left today it don't have to be that intelligent yeah like autonomous vehicles there's these things start creeping in that were thinking about but certainly they shouldn't be framed as as hell yeah looking back what is the most I'm sorry if it's a silly question but what is the most beautiful or surprising idea and deep learning or AI in 
general that you've ever come across sort of personally well you said back and and just had this kind of wow that's pretty cool moment that's nice well surprising I don't know if it's an idea rather than a sort of empirical fact the fact that you gigantic neural nets trying to train them on you know relatively small amounts of data relatively with the caste grid in the center that it actually works breaks everything you read in every textbook right every pre deep learning textbook that told you you need to have fewer parameters and you have data samples you know if you have non-convex objective function you have no guarantee of convergence you know all the things that you read in textbook and they tell you stay away from this and they were all wrong huge number of parameters non-convex and somehow which is very relative to the number of parameters data it's able to learn anything right does that surprise you today well it it was kind of obvious to me before I knew anything that that's that this is a good idea and then it became surprising that it worked because I started reading those text books okay so okay you talk to the intuition of why was obviously if you remember well okay so the intuition was it's it's sort of like you know those people in the late 19th century who proved that heavier than than air flight was impossible right and of course you have birds right they do fly and so on the face of it it it's obviously wrong as an empirical question right and so we have the same kind of thing that you know the we know that the brain works we don't know how but we know it works and we know it's a large network of neurons and interaction and the learning takes place by changing the connection so kind of getting this level of inspiration without copying the details but sort of trying to derive basic principles you know that kind of gives you a clue as to which direction to go there's also the idea somehow that I've been convinced of since I was an undergrad that even before that intelligence is inseparable from running so you the idea somehow that you can create an intelligent machine by basically programming for me was a non-starter you know from the start every intelligent entity that we know about arrives at this intelligence to learning so learning you know machine learning was completely obvious path also because I'm lazy so you know it's automate basically everything and learning is the automation of intelligence right so do you think so what is learning then what what falls under learning because do you think of reasoning is learning where reasoning is certainly a consequence of learning as well just like other functions of of the brain the big question about reasoning is how do you make reasoning compatible with gradient based learning do you think neural networks can be made to reason yes that there's no question about that again we have a good example right the question is is how so the question is how much prior structure you have to put in the neural net so that something like human reasoning will emerge from it you know from running another question is all of our kind of model of what reasoning is that are based on logic are discrete and and and are therefore incompatible with gradient based learning and I was very strong believer in this idea Grandin baserunning I don't believe that other types of learning that don't use kind of gradient information if you want so you don't like discrete mathematics you don't like anything discrete well that's it's not that I don't like it 
it's just that it's it's incompatible with learning and I'm a big fan of running right so in fact that's perhaps one reason why deep learning has been kind of looked at with suspicion by a lot of computer scientists because the math is very different the method you use for deep running you know we kind of as more to do with you know cybernetics the kind of math you do in electrical engineering then the kind of math you doing computer science and and you know nothing in in machine learning is exact right computer science is all about sort of you know obviously compulsive attention to details of like you know every index has to be right and you can prove that an algorithm is correct right machine learning is the science of sloppiness really that's beautiful so okay maybe let's feel around in the dark of what is a neural network that reasons or a system that is works with continuous functions that's able to do build knowledge however we think about reasoning builds on previous knowledge build on extra knowledge create new knowledge generalized outside of any training set ever built what does that look like if yeah maybe do you have Inklings of thoughts of what that might look like well yeah I mean yes or no if I had precise ideas about this I think you know we'd be building it right now but and there are people working on this or whose main research interest is actually exactly that right so what you need to have is a working memory so you need to have some device if you want some subsystem they can store a relatively large number of factual episodic information for you know a reasonable amount of time so you you know in the in the brain for example it kind of three main types of memory one is the sort of memory of the the state of your cortex and that sort of disappears within 20 seconds you can't remember things for more than about 20 seconds or a minute if if you don't have any other form of memory the second type of memory which is longer term is short term is the hippocampus so you can you know you came into this building you remember whether where the the exit is where the elevators are you have some map of that building that's stored in your hippocampus you might remember something about what I said you know if you minutes ago and forgot all our stars being raised but you know but that does not work in your hippocampus and then the the longer term memory is in the synapse the synapses right so what you need if you want for a system that's capable reasoning is that you want the hippocampus like thing right and that's what people have tried to do with memory networks and you know no Turing machines and stuff like that right and and now with transformers which have sort of a memory in their kind of self attention system you can you can think of it this way so so that's one element you need another thing you need is some sort of network that can access this memory get an information back and then kind of crunch on it and then do this iteratively multiple times because a chain of reasoning is a process by which you you you can you update your knowledge about the state of the world about you know what's gonna happen etc and that there has to be this sort of recurrent operation basically and you think that kind of if we think about a transformer so that seems to be too small to contain the knowledge that's that's to represent the knowledge as containing Wikipedia for example but transformer doesn't have this idea of recurrence it's got a fixed number of layers and that's number of steps that 
you know limits basically it's a representation but recurrence would build on the knowledge somehow I mean yeah it would evolve the knowledge and expand the amount of information perhaps or useful information within that knowledge yeah but is this something that just can emerge with size because it seems like everything we have now is just no it's not it's not it's not clear how you access and right into an associative memory in efficient way I mean sort of the original memory network maybe had something like the right architecture but if you try to scale up a memory network so that the memory contains all we keep here it doesn't quite work right so so this is a need for new ideas there okay but it's not the only form of reasoning so there's another form of reasoning which is true which is very classical so in some types of AI and it's based on let's call it energy minimization okay so you have some sort of objective some energy function that represents the the the quality or the negative quality okay energy goes up when things get bad and they get low when things get good so let's say you you want to figure out you know what gestures do I need to to do to grab an object or walk out the door if you have a good model of your own body a good model of the environment using this kind of energy minimization you can make a you can make you can do planning and it's in optimal control it's called it's called Marie put model predictive control you have a model of what's gonna happen in the world as consequence for your actions and that allows you to buy energy minimization figure out the sequence of action that optimizes a particular objective function which measures you know minimize the number of times you're gonna hit something and the energy gonna spend doing the gesture and etc so so that's performer reasoning planning is a form of reasoning and perhaps what led to the ability of humans to reason is the fact that or you know species you know that appear before us had to do some sort of planning to be able to hunt and survive and survive the winter in particular and so you know it's the same capacity that you need to have so in your intuition is if you look at expert systems in encoding knowledge as logic systems as graphs in this kind of way is not a useful way to think about knowledge graphs are your brittle or logic representation so basically you know variables that that have values and constraint between them that are represented by rules as well too rigid and too brittle right so one of the you know some of the early efforts in that respect were were to put probabilities on them so a rule you know you know if you have this in that symptom you know you have this disease with that probability and you should describe that antibiotic with that probability right this my sin system from the for the 70s and that that's what that branch of AI led to you know busy networks in graphical models and causal inference and vibrational you know method so so there there is I mean certainly a lot of interesting work going on in this area the main issue with this is is knowledge acquisition how do you reduce a bunch of data to graph of this type near relies on the expert and a human being to encode at add knowledge and that's essentially impractical yeah the question the second question is do you want to represent knowledge symbols and you want to manipulate them with logic and again that's incomparable we're learning so one suggestion with geoff hinton has been advocating for many decades is replace 
symbols by vectors think of it as pattern of activities in a bunch of neurons or units or whatever you wanna call them and replace logic by continuous functions okay and that becomes now compatible there's a very good set of ideas by region in a paper about 10 years ago by leon go to on who is here at face book the title of the paper is for machine learning to machine reasoning and his idea is that learning learning system should be able to manipulate objects that are in the same space in a space and then put the result back in the same space so is this idea of working memory basically and it's a very enlightening and in the sense that might learn something like the simple expert systems I mean it's with you can learn basic logic operations there yeah quite possibly yeah this is a big debate on sort of how much prior structure you have to put in for this kind of stuff to emerge that's the debate I have with Gary Marcus and people like that yeah yeah so and the other person so I just talked to judea pearl mm-hmm well you mentioned causal inference world his worry is that the current knew all networks are not able to learn what causes what causal inference between things so I think I think he's right and wrong about this if he's talking about the sort of classic type of neural nets people also didn't worry too much about this but there's a lot of people now working on causal inference and there's a paper that just came out last week by Leon Mbutu among others develop his path and push for other people exactly on that problem of how do you kind of you know get a neural net to sort of pay attention to real causal relationships which may also solve issues of bias in data and things like this so I'd like to read that paper because that ultimately the challenges also seems to fall back on the human expert to ultimately decide causality between things people are not very good at its direction causality first of all so first of all you talk to a physicist and physicists actually don't believe in causality because look at the all the busy clause or microphysics are time reversible so there is no causality the arrow of time is not right yeah it's it's as soon as you start looking at macroscopic systems where there is unpredictable randomness where there is clearly an arrow of time but it's a big mystery in physics actually well how that emerges is that emergent or is it part of the fundamental fabric of reality yeah or is it bias of intelligent systems that you know because of the second law of thermodynamics we perceive a particular arrow of time but in fact it's kind of arbitrary right so yeah physicists mathematicians they don't care about I mean the math doesn't care about the flow of time well certainly certainly macro physics doesn't people themselves are not very good at establishing causal causal relationships if you ask is I think it was in one of Seymour Papert spoken on like children learning you know he studied with Jean Piaget you know he's the guy who co-authored the book perceptron with Marvin Minsky that kind of killed the first wave but but he was actually a learning person he in the sense of studying learning in humans and machines that's what he got interested in for scepter on and he wrote that if you ask a little kid about what is the cause of the wind a lot of kids will say they will think for a while and they'll say oh it's the the branches in the trees they move and that creates wind right so they get the causal relationship backwards and it's because their understanding of 
the world and intuitive physics is not that great right I mean these are like you know four or five year old kids you know it gets better and then you understand that this it can't be right but there are many things which we can because of our common sense understanding of things what people call common sense yeah and we understanding of physics we can there's a lot of stuff that we can figure out causality even with diseases we can figure out what's not causing what often there's a lot of mystery of course but the idea is that you should be able to encode that into systems it seems unlikely to be able to figure that out themselves well whenever we can do intervention but you know all of humanity has been completely deluded for millennia probably since existence about a very very wrong causal relationship where whatever you can explain you attributed to you know some deity some divinity right and that's a cop-out that's the way of saying like I don't know the cause so you know God did it right so you mentioned Marvin Minsky and the irony of you know maybe causing the first day I winter you were there in the 90s you're there in the 80s of course in the 90s what do you think people lost faith and deep learning in the 90s and found it again a decade later over a decade later yeah it wasn't called dethroning yeah it was just called neural nets you know yeah they lost interests I mean I think I would put that around 1995 at least the machine learning community there was always a neural net community but it became disconnected from sort of ministry machine owning if you want there were it was basically electrical engineering that kept at it and computer science just gave up give up on neural nets I don't I don't know you know I was too close to it to really sort of analyze it with sort of a unbiased eye if you want but I would I would I would would make a few guesses so the first one is at the time neural nets were it was very hard to make them work in the sense that you would you know implement back prop in your favorite language and that favorite language was not Python it was not MATLAB it was not any of those things cuz they didn't exist right you had to write it in Fortran or C or something like this right so you would experiment with it you would probably make some very basic mistakes like you know badly initialize your weights make the network too small because you read in the textbook you know you don't want too many parameters right and of course you know and you would train on x4 because you didn't have any other data set to try it on and of course you know it works half the time so we'd say you give up also 22 the batch gradient which you know isn't it sufficient so there's a lot of bag of tricks that you had to know to make those things work or you had to reinvent and a lot of people just didn't and they just couldn't make it work so that's one thing the investment in software platform to be able to kind of you know display things figure out why things don't work and I get a good intuition for how to get them to work have enough flexibility so you can create you know network architectures well completion ads and stuff like that it was hard yeah when you had to write everything from scratch and again you didn't have any Python or MATLAB or anything right so what I read that sorry to interrupt but I read he wrote in in Lisp the first versions of Lynette accomplished in your networks which by the way one of my favorite languages that's how I knew you were legit the Turing Award whatever 
this would be programmed and list that's still my favorite language but it's not that we programmed in Lisp it's that we had to write or this printer printer okay cuz it's not that's right that's one that existed so we wrote a lisp interpreter that we hooked up to you know back in library that we wrote also for neural net competition and then after a few years around 1991 we invented this idea of basically having modules that know how to forward propagate and back propagate gradients and then interconnecting those modules in a graph loom but who had made proposals on this about this in the late 80s and were able to implement this using all this system eventually we wanted to use that system to make build production code for character recognition at Bell Labs so we actually wrote a compiler for that disp interpreter so that Christy Martin who is now Microsoft kind of did the bulk of it with Leone and me and and so we could write our system in lisp and then compiled to seee and then we'll have a self-contained complete system that could kind of do the entire thing neither Python or turn pro can do this today yeah okay it's coming yeah I mean there's something like that in Whitehorse called you know tor script and so you know we had to write or Lisp interpreter which retinol is compiler way to invest a huge amount of effort to do this and not everybody if you don't completely believe in the concept you're not going to invest the time to do this right now at the time also you know it were today this would turn into torture by torture and so for whatever we put it in open-source everybody would use it and you know realize it's good back before 1995 working at AT&T there's no way the lawyers would let you release anything in open source of this nature and so we could not distribute our code really and at that point and sorry to go on a million tangents but on that point I also read that there was some almost pad like a patent on convolution your network yes it was labs so that first of all I mean just to actually that ran out the thankfully 8007 in 2007 that what look can we can we just talk about that first I know you're a facebook but you're also done why you and and what does it mean patent ideas like these software ideas essentially or what are mathematical ideas or what are they okay so they're not mathematical idea so there are you know algorithms and there was a period where the US Patent Office would allow the patent of software as long as it was embodied the Europeans are very different they don't they don't quite accept that they have a different concept but you know I don't I know no I mean I never actually strongly believed in this but I don't believe in this kind of patent Facebook basically doesn't believe in this kind of pattern Google Files patterns because they've been burned with Apple and so now they do this for defensive purpose but usually they say we're not going to see you if you infringe Facebook has a similar policy they say you know we file pattern on certain things for defensive purpose we're not going to see you if you infringe unless you sue us so the the industry does not believe in in patterns they are there because of you know the legal landscape and and and various things but but I don't really believe in patterns for this kind of stuff yes so that's that's a great thing so I tell you a war story yeah you so what happens was the the first the first pattern of a condition that was about kind of the early version Congress on that that didn't have separate pudding 
layers it had the conditional layers which tried more than one if you want right and then there was a second one on commercial nets with separate pudding layers train with back probably in 89 and 1992 something like this at the time the life life of a pattern was 17 years so here's what happened over the next few years is that we started developing character recognition technology around commercial Nets and in 1994 a check reading system was deployed in ATM machines in 1995 it was for a large check reading machines in back offices etc and those systems were developed by an engineering group that we were collaborating with AT&T and they were commercialized by NCR which at the time was a subsidiary of AT&T now it ain't he split up in 1996 99 in 1996 and the lawyers just looked at all the patterns and they distributed the patterns among the various companies they gave the the commercial net pattern to NCR because they were actually selling products that used it but nobody I didn't see are at any idea where they come from that was yeah okay so between 1996 and 2007 there's a whole period until 2002 I didn't actually work on machine on your couch on that I resumed working on this around 2002 and between 2002 and 2007 I was working on them crossing my finger that nobody and NCR would notice nobody noticed yeah and I and I hope that this kind of somewhat as you said lawyers decide relative openness of the community now will continue it accelerates the entire progress of the industry and you know the problems that Facebook and Google and others are facing today is not whether Facebook or Google or Microsoft or IBM or whoever is ahead of the other it's that we don't have the technology to build the things we want to build we only build intelligent virtual systems that have common sense we don't have a monopoly on good ideas for this we don't believe with you maybe others do believe they do but we don't okay if a start-up tells you they have the secret to you know human level intelligence and common sense don't believe them they don't and it's going to take the entire work of the world research community for a while to get to the point where you can go off and in each of the company is going to start to build things on this we're not there yet it's absolutely in this this calls to the the gap between the space of ideas and the rigorous testing of those ideas of practical application that you often speak to you've written advice saying don't get fooled by people who claim to have a solution to artificial general intelligence who claim to have an AI system that work just like the human brain or who claim to have figured out how the brain works ask them what the error rate they get on em 'no store imagenet this is a little dated by the way that mean five years who's counting okay but i think your opinion it's the Amna stand imagenet yes may be data there may be new benchmarks right but i think that philosophy is one you still and and somewhat hold that benchmarks and the practical testing the practical application is where you really get to test the ideas well it may not be completely practical like for example you know it could be a toy data set but it has to be some sort of task that the community as a whole has accepted as some sort of standard you know kind of benchmark if you want it doesn't need to be real so for example many years ago here at fair people you know chosen Western art one born and a few others proposed the the babbitt asks which were kind of a toy problem to test the ability of 
machines to reason actually to access working memory and things like this and it was very useful even though it wasn't a real task amnesties kind of halfway a real task so you know toy problems can be very useful it's just that i was really struck by the fact that a lot of people particularly our people with money to invest would be fooled by people telling them oh we have you know the algorithm of the cortex and you should give us 50 million yes absolutely so there's a lot of people who who tried to take advantage of the hype for business reasons and so on but let me sort of talk to this idea that new ideas the ideas that push the field forward may not yet have a benchmark or it may be very difficult to establish a benchmark I agree that's part of the process establishing benchmarks is part of the process so what are your thoughts about so we have these benchmarks on around stuff we can do with images from classification to captioning to just every kind of information can pull off from images and the surface level there's audio datasets there's some video what can we start natural language what kind of stuff what kind of benchmarks do you see they start creeping on to more something like intelligence like reasoning like maybe you don't like the term but AGI echoes of that kind of yeah sort of elation a lot of people are working on interactive environments in which you can you can train and test intelligent systems so so there for example you know it's the classical paradigm of supervised running is that you you have a data set you partition it into a training site validation set test set and there's a clear protocol right but what if the that assumes that this apples are statistically independent you can exchange them the order in which you see them doesn't shouldn't matter you know things like that but what if the answer you give determines the next sample you see which is the case for example in robotics right you robot does something and then it gets exposed to a new room and depending on where it goes the room would be different so that's the decrease the exploration problem the what if the samples so that creates also a dependency between samples right you you if you move if you can only move it in in space the next sample you're gonna see is going to be probably in the same building most likely so so so the all the assumptions about the validity of this training set test set a potus's break whatever a machine can take an action that has an influence in the in the world and it's what is going to see so people are setting up artificial environments where what that takes place right the robot runs around a 3d model of a house and can interact with objects and things like this how you do robotics by simulation you have those you know opening a gym type thing or mu Joko kind of simulated robots and you have games you know things like that so that that's where the field is going really this kind of environment now back to the question of a GI like I don't like the term a GI because it implies that human intelligence is general and human intelligence is nothing like general it's very very specialized we think it's general we'd like to think of ourselves as having your own science we don't we're very specialized we're only slightly more general than why does it feel general so you kind of the term general I think what's impressive about humans is ability to learn as we were talking about learning to learn in just so many different domains is perhaps not arbitrarily general but just you can 
learn in many domains and integrate that knowledge somehow okay that knowledge persists so let me take a very specific example yes it's not an example it's more like a a quasi mathematical demonstration so you have about 1 million fibers coming out of one of your eyes okay two million total but let's let's talk about just one of them it's 1 million nerve fibers your optical nerve let's imagine that they are binary so they can be active or inactive right so the input to your visual cortex is 1 million bits now they connected to your brain in a particular way on your brain has connections that are kind of a little bit like accomplish on that they're kind of local you know in space and things like this I imagine I play a trick on you it's a pretty nasty trick I admit I I cut your optical nerve and I put a device that makes a random perturbation of a permutation of all the nerve fibers so now what comes to your to your brain is a fixed but random permutation of all the pixels there's no way in hell that your visual cortex even if I do this to you in infancy will actually learn vision to the same level of quality that you can got it and you're saying there's no way you ever learn that no because now two pixels that on your body in the world will end up in very different places in your visual cortex and your neurons there have no connections with each other because they only connect it locally so this whole our entire the hardware is built in many ways to support the locality of the real world yeah yes that's specialization yep okay it's still now really damn impressive so it's not perfect generalization I even closed no no it's it's it's it's not that it's not even close it's not at all yes it's socialize so how many boolean functions so let's imagine you want to train your visual system to you know recognize particular patterns of those 1 million bits ok so that's a boolean function right either the pattern is here or not here this is a to to a classification with 1 million binary inputs how many such boolean functions are there okay if you have 2 to the 1 million combinations of inputs for each of those you have an output bit and so you have 2 to the 2 to the 1 million boolean functions of this type okay which is an unimaginably large number how many of those functions can actually be computed by your visual cortex and the answer is a tiny tiny tiny tiny tiny tiny sliver like an enormous little tiny sliver yeah yeah so we are ridiculously specialized you know okay but okay that's an argument against the word general I think there's there's a I there's I agree with your intuition but I'm not sure it's it seems the breath the the brain is impressively capable of adjusting to things so it's because we can't imagine tasks that are outside of our comprehension right we think we think we are general because we're general of all the things that we can apprehend so yeah but there is a huge world out there of things that we have no idea we call that heat by the way heat heat so at least physicists call that heat or they call it entropy which is kokkonen you have a thing full of gas right call system for gas right goes on a coast it has you know pressure it has temperature has you know and you can write the equations PV equal NRT you know things like that right when you reduce a volume the temperature goes up the pressure goes up you know things like that right for perfect gas at least those are the things you can know about that system and it's a tiny tiny number of bits compared to the complete 
information of the state of the entire system because the state when HR system will give you the position and momentum of every every molecule of the gas and what you don't know about it is the entropy and you interpret it as heat the energy containing that thing is is what we call heat now it's very possible that in fact there is some very strong structure in how those molecules are moving is just that they are in a way that we are just not wired to perceive they are ignorant to it and there's in your infinite amount of things we're not wired to perceive any right that's a nice way to put it well general to all the things we can imagine which is a very tiny a subset of all things that are possible it was like coma growth complexity or the coma was charged in some one of complexity you know every bit string or every integer is random except for all the ones that you can actually write down yeah okay so beautifully put but you know so we can just call it artificial intelligence we don't need to have a general whatever novel human of all Nutella transmissible oh you know you'll start anytime you touch human it gets it gets interesting because you know it's just because we attach ourselves to human and it's difficult to define with human intelligences yeah nevertheless my definition is maybe damn impressive intelligence ok damn impressive demonstration of intelligence whatever and so on that topic most successes in deep learning have been in supervised learning what is your view on unsupervised learning is there a hope to reduce involvement of human input and still have successful systems that are have practically used yeah I mean there's definitely a hope is it's more than a hope actually it's it's you know mounting evidence for it and that's basically or I do like the only thing I'm interested in at the moment is I call it self supervised running not unsupervised cuz unsupervised running is a loaded term people who know something about machine learning you know tell us how you doing clustering or PCA yeah she's nice and the way public we know when you say enterprise only oh my god you know machines are gonna learn by themselves and without supervision you know there's the parents yeah so so I could sell supervised learning because in fact the underlying algorithms that I use are the same algorithms as the supervised learning algorithms except that what we trained them to do is not predict a particular set of variables like the category of an image and and not to predict a set of variables that have been provided by human labelers but what you're trying to machine to do is basically reconstruct a piece of its input that it's being this being masked masked out essentially you can think of it this way right so show a piece of a video to a machine and ask it to predict what's gonna happen next and of course after a while you can show what what happens and the machine will kind of train itself to do better at that task you can do like all the latest most successful models the natural language processing use cell supervised running you know sort of bird style systems for example right you show it a window of a thousand words on a test corpus you take out 15% of the words and then you train a machine to predict the words that are missing that's out supervised running it's not predicting the future it's just you know predicting things in middle but you could have you predict the future that's what language models do so you construct it so in an unsupervised way you construct a model of language do 
you think or video or the physical world or whatever right how far do you think that can take us do you think very far it understands anything to some level it has you know a shallow understanding of of text but it needs to I mean to have kind of true human level intelligence I think you need to ground language in reality so some people are attempting to do this right having systems that can I have some visual representation of what what is being talked about which is one reason you need interactive environments actually this is like a huge technical problem that is not solved and that explains why such super versioning works in the context of natural language that does not work in the context on at least not well in the context of image recognition and video although it's making progress quickly and the reason that reason is the fact that it's much easier to represent uncertainty in the prediction you know context of natural language than it is in the context of things like video and images so for example if I ask you to predict what words are missing you know 15 percent of the words that I've taken out the possibility is small that means small right there is 100,000 words in the in the lexicon and what the Machine spits out is a big probability vector right it's a bunch of numbers between 0 & 1 that's 1 to 1 and we know how to do how to do this with computers so they are representing uncertainty in the prediction is relatively easy and that's in my opinion why those techniques work for NLP for images if you ask if you block a piece of an image and you as a system reconstruct that piece of the image there are many possible answers there are all perfectly legit right and how do you represent that the set of possible answers you can't train a system to make one prediction you can train a neural net to say here it is that's the image because it's there's a whole set of things that are compatible with it so how do you get the machine to represent not a single output but all set of outputs and you know similarly with video prediction there's a lot of things that can happen in the future video you're looking at me right now I'm not moving my head very much but you know I might you know what turn my my head to the left or to the right right if you don't have a system that can predict this and you train it with least Square to kind of minimize the error with the prediction and what I'm doing what you get is a blurry image of myself in all possible future positions that I might be in which is not a good prediction but so there might be other ways to do the self supervision right for visual scenes like what if i I mean if I knew I wouldn't tell you publish it first I don't know I know there might be so I mean these are kind of there might be artificial ways of like self play in games the way you can simulate part of the environment you can oh that doesn't solve the problem it's just a way of generating data but because you have more of a country might mean you can control yeah it's a way to generate data and that's right and because you can do huge amounts of data generation that doesn't you write this well it's it's a creeps up on the problem from the side of data and you don't think that's the right way to it doesn't solve this problem of handling uncertainty in the world right so if you if you have a machine learn a predictive model of the world in a game that is deterministic or quasi deterministic it's easy right just you know give a few frames of the game to a combat put a bunch of layers and 
then half the game generates the next few frames and and if the game is deterministic it works fine and that includes you know feeding the system with the action that your little character is going to take the problem comes from the fact that the real world and certain most games are not entirely predictable that's what they're you get those blurry predictions and you can't do planning with very predictions all right so if you have a perfect model of the world you can in your head run this model with a hypothesis for a sequence of actions and you're going to predict the outcome of that sequence of actions but if your model is imperfect how can you plan yeah it quickly explodes what are your thoughts on the extension of this which topic I'm super excited about it's connected to something you're talking about in terms of robotics is active learning so as opposed to sort of unemployed and supervisors self supervised learning you ask the system for human help right for selecting parts you want annotated next so if you talk about a robot exploring a space or a baby exploring a space or a system exploring a data set every once in a while asking for human input you see value in that kind of work I don't see transformative value it's going to make things that we can already do more efficient or they will learn slightly more efficiently but it's not going to make machines sort of significantly more intelligent I think and I and by the way there is no opposition there is no conflict between self supervisor on reinforcement learning and supervisor on your imitation learning or active learning I see sub super wrestling as a as a preliminary to all of the above yes so the example I use very often is how is it that so if you use enforcement running deep enforcement running if you want the best methods today was so-called model free enforcement training to learn to play Atari games take about 80 hours of training to reach the level that any human can reach in about 15 minutes they get better than humans but it takes a long time alpha star okay the you know are your videos and his team's the system to play to to play Starcraft plays you know a single map a single type of player and which better than human level is about the equivalent of 200 years of training playing against itself it's 200 years right it's not something that no no human can could every I'm not sure what it doesn't take away from that okay now take those algorithms the best our algorithms we have today to train a car to drive itself it would probably have to drive millions of hours you will have to kill thousands of pedestrians it will have to run into thousands of trees it will have to run off cliffs and you had to run the cliff multiple times before it figures out it's a bad idea first of all yeah and second of all the figures that had not to do it and so I mean this type of running obviously does not reflect the kind of running that animals and humans do there is something missing that's really really important there and my apart is is which have been advocating for like five years now is that we have predictive models of the world that include the ability to predict under uncertainty and what allows us to not run off a cliff when we learn to drive most of us can learn to drive in about 20 or 30 hours of training without ever crashing causing any accident if we drive next to a cliff we know that if we turn the wheel to the right the car is going to run off the cliff and nothing good is gonna come out of this because we have a pretty 
good model of intuitive physics that tells us you know the car is gonna fall we know we know about gravity babies run this around the age of eight or nine months that objects don't float they fall and you know we have a pretty good idea of the effect of turning the wheel of the car and you know we know we need to stay on the road so there is a lot of things that we bring to the table which is basically or predictive model of the world and that model allows us to not do stupid things and to basically stay within the context of things we need to do we still face you know unpredictable situations and that's how we learn but that allows us to learn really really really quickly so that's called model-based reinforcement running there's some imitation and supervised running because we have a driving instructor that tells us occasionally what to do but most of the learning is Mauro bass is learning the model yeah running physics that we've done since we were babies that's where all almost all are learning and the physics is somewhat transferable from is transferable from scene to scene stupid things are the same everywhere yeah I mean if you you know you have experience of the world you don't need to be particularly from a particularly intelligent species to know that if you spill water from a container you know the rest is gonna get wet and you might get wet so you know cats know this right yeah so the main problem we need to solve is how do we learn models of the world that's and that's what I'm interesting that's what's a supervised learning is all about if you were to try to construct a benchmark for let's let's look at happiness I'd love that dataset but if you do you think it's useful interesting / possible to perform well on eminence with just one example of each digit and how would we solve that problem yeah so it's probably yes the question is what other type of running are you allowed to do so if what you like to do is train on some gigantic data set of labelled digit that's called transfer running and we know that works okay we do this at Facebook like in production right we we train large commercial nets to predict hashtags that people type on Instagram and we train on billions of images literally billions and and then we chop off the last layer and fine-tune on whatever task we want that works really well you can be you know the image net record with we actually open source the whole thing like a few weeks ago yeah that's still pretty cool but yeah so what in yet won't be impressive and what's useful an impressive what kind of transfer learning would be useful impressive is it Wikipedia that kind of thing no no I don't think transfer learning is really where we should focus we should try to do you know have a kind of scenario for benchmark where you have only ball data and you can and it's very large number of enabled data it could be video clips it could be what you do you know frame prediction it could be images you could choose to you know mask a piece of it it could be whatever but they're only bold and you're not allowed to label them so you do some training on this and then you train on a particular supervised task imagenet or nist and you measure how your test our decrease or variation error decreases as you increase the number of label training samples okay and and what what you would like to see is is that you know your your error decreases much faster than if you trained from scratch from random weights so that to reach the same level of performance and a completely 
So that's the crucial question, because it will answer the question—you know, people are interested in medical image analysis, for example—okay, if I want to get to a particular level of error rate for this task, I know I need a million samples; can I do self-supervised pre-training to reduce this to about 100 or something? And the answer there is self-supervised pre-training? Yep, some form of it. You're telling me active learning—but you disagree? No, it's not useless, it's just not going to lead to a quantum leap; it's just going to make things that we already do more efficient. So you're way smarter than me; I just disagree with you, but I don't have anything to back that up—it's just intuition. I've worked with a lot of large-scale data sets, and there's something—there might be magic in active learning. But okay, at least I said it publicly; at least I'm stating an idea publicly. Okay. I mean, it's not that it's useless—it's, you know, working with the data you have. I mean, certainly people are doing things like: okay, I have 3,000 hours of, you know, imitation learning for a car, but most of those are incredibly boring; what I'd like is to select, you know, the 10% of them that are the most informative, and with just that I would probably reach the same performance. So it's a weak form of active learning, if you want. Yes, but there might be a much stronger version. Yeah, that's right, and that's an open question.
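The "pick the most informative 10%" idea has a standard, crude instantiation—uncertainty sampling—sketched below under the assumption that some existing model can already produce per-class probabilities for the unlabeled pool; the function and variable names are illustrative placeholders, not a description of any deployed system.

```python
import numpy as np

def select_most_informative(probs, fraction=0.10):
    """Return indices of the `fraction` of samples whose predicted class
    distribution has the highest entropy (i.e. the model is least sure),
    as a crude proxy for 'most informative' in the discussion above."""
    eps = 1e-12
    entropy = -np.sum(probs * np.log(probs + eps), axis=1)
    k = max(1, int(fraction * len(probs)))
    return np.argsort(entropy)[-k:]   # top-k most uncertain samples

# Hypothetical usage:
#   probs = model.predict_proba(unlabeled_pool)   # shape [N, num_classes]
#   to_annotate = select_most_informative(probs, fraction=0.10)
```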
The question is how much. Elon Musk is confident—I talked to him recently—he's confident that large-scale data and deep learning can solve the autonomous driving problem. What are your thoughts on the limits and possibilities of deep learning in this space? Well, it's obviously part of the solution. I mean, I don't think we'll ever have a self-driving system, at least not in the foreseeable future, that does not use deep learning—let me put it this way. Now, how much of it? So in the history of engineering, particularly of sort of AI-like systems, there's generally a first phase where everything is built by hand, then there's a second phase—and that was the case for autonomous driving, you know, 20, 30 years ago—where a little bit of learning is used, but there's a lot of engineering involved in taking care of corner cases and putting limits and so on, because the learning system is not perfect. And then, as the technology progresses, we end up relying more and more on learning. That's the history of character recognition, it's the history of speech recognition, now computer vision and natural language processing, and I think the same is going to happen with autonomous driving. Currently, the methods that are closest to providing some level of autonomy—some decent level of autonomy where you don't expect a driver to do anything—are where you constrain the world: you only drive within, you know, 100 square kilometers or square miles in Phoenix, where the weather is nice and the roads are wide, which is what Waymo is doing; you completely over-engineer the car with tons of lidars and sophisticated sensors that are too expensive for consumer cars but are fine if you just run a fleet; and you engineer the hell out of everything else: you map the entire world, so you have a complete 3D model of everything, so the only thing the perception system has to take care of is moving objects and construction and things that weren't in your map, and you can engineer a good SLAM system and stuff like that, right? So that's kind of the current approach that's closest to some level of autonomy, but I think eventually the long-term solution is going to rely more and more on learning, possibly using a combination of self-supervised learning and model-based reinforcement learning or something like that. But ultimately learning will be not just at the core but really the fundamental part of the system. Yeah, it already is, but it'll become more and more so. What do you think it takes to build a system with human-level intelligence? You talked about the AI system in the movie Her being way out of our current reach—this might be outdated as well, but is this still way out of reach? What would it take to build her, do you think? So, I can tell you the first two obstacles that we have to clear, but I don't know how many obstacles there are after this. The image I usually use is that there is a bunch of mountains that we have to climb, and we can see the first one, but we don't know if there are 50 mountains behind it or not. And this might be a good sort of metaphor for why AI researchers in the past have been overly optimistic about the results of AI. You know, for example, Newell and Simon wrote the General Problem Solver—and they called it the General Problem Solver, okay—and of course the first thing you realize is that all the problems you want to solve are exponential, and so you can't actually use it for anything useful. But, yeah, all you see is the first peak. So in general, what are the first couple of peaks for her? So the first peak, which is precisely what I'm working on, is self-supervised learning: how do we get machines to learn models of the world by observation, kind of like babies and like young animals? So we've been working with cognitive scientists—Emmanuel Dupoux, who is at FAIR in Paris half-time and is also a researcher at a French university—and he has this chart that shows at how many months of life baby humans learn different concepts, and you can measure this in various ways. So things like distinguishing animate objects from inanimate objects: you can tell the difference at age two or three months. Whether an object is going to stay stable or is going to fall: about four months, you can tell. You know, things like this, and then things like gravity—the fact that objects are not supposed to float in the air but are supposed to fall—you learn this around the age of eight or nine months. If you look at a lot of eight-month-old babies, you give them a bunch of toys on their high chair, the first thing they do is throw them on the ground and look at them, because, you know, they're learning—actively learning—about gravity. Gravity, yeah. Okay, so they're not trying to annoy you, but they, you know, they need to do the experiment, right? Yeah. So how do we get machines to learn like babies, mostly by observation with a little bit of interaction, and to learn those models of the world? Because I think that's really a crucial piece of an intelligent autonomous system. So if you think about the architecture of an intelligent autonomous system, it needs to have a predictive model of the world: something that says, here is the state of the world at time T, here is the state of the world at time T+1 if I take this action. And it's not a single answer; it can be—a distribution, yeah. Yeah, well, but we don't know how to represent distributions in high-dimensional continuous spaces, so it's got to be something weaker than that, but with some representation of uncertainty.
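As a toy illustration of "something weaker than a full distribution, but with some representation of uncertainty," the sketch below has a world model output a diagonal-Gaussian prediction (a mean plus a per-dimension variance) over the next state rather than a single point. Everything here—the class, the linear parameterization, the untrained random weights—is a hypothetical stand-in, not a description of any particular system.

```python
import numpy as np

class GaussianWorldModel:
    """Toy predictive model: given (state, action), output a mean and a
    per-dimension variance for the next state, instead of a single point.
    Weights are random placeholders; in practice they would be trained on
    observed (state, action, next_state) transitions."""
    def __init__(self, state_dim, action_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W_mu = rng.normal(scale=0.1, size=(state_dim, state_dim + action_dim))
        self.W_logvar = rng.normal(scale=0.1, size=(state_dim, state_dim + action_dim))

    def predict(self, state, action):
        x = np.concatenate([state, action])
        mu = self.W_mu @ x                 # predicted mean of the next state
        var = np.exp(self.W_logvar @ x)    # predicted uncertainty per dimension
        return mu, var

    def sample(self, state, action, rng):
        mu, var = self.predict(state, action)
        return mu + np.sqrt(var) * rng.standard_normal(len(mu))
```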
If you have that, then you can do what in optimal control theory is called model predictive control, which means that you can run your model with a hypothesis for a sequence of actions and then see the result. Now, the other thing you need is some sort of objective that you want to optimize: am I reaching the goal of grabbing this object, am I minimizing energy, am I whatever, right? So there is some sort of objective that you have to minimize, and so, in your head, if you have this model, you can figure out the sequence of actions that will optimize your objective. That objective is something that is ultimately rooted in your basal ganglia—at least in the human brain, that's what it is: the basal ganglia computes your level of contentment or miscontentment—I don't know if that's a word—discontentment. Discontentment, yeah. And so your entire behavior is driven towards minimizing that objective, which is maximizing your contentment, computed by your basal ganglia. And what you have is an objective function which is basically a predictor of what your basal ganglia is going to tell you. So you're not going to put your hand in the fire, because you know it's going to burn and you're going to get hurt, and you're predicting this because of your model of the world and your predictor of this objective, right? So you have those three components—or four components, really: you have the hard-wired contentment objective computer, if you want, calculator, and then you have the three components. One is the objective predictor, which basically predicts your level of contentment; one is the model of the world; and there's a third module I didn't mention, which is the module that will figure out the best course of action to optimize an objective, given your model. Okay, yeah, cool—a policy, a policy network or something like that, right? Now, you need those three components to act autonomously, intelligently, and you can be stupid in three different ways. You can be stupid because your model of the world is wrong. You can be stupid because your objective is not aligned with what you actually want to achieve—and in humans, that would be a psychopath, right? And then the third way you can be stupid is that you have the right model, you have the right objective, but you're unable to figure out a course of action to optimize your objective given your model. Some people who are in charge of big countries actually have all three that are wrong. All right—which countries? I don't know.
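A minimal sketch of the "run your model with a hypothesis for a sequence of actions and see the result" loop—random-shooting model predictive control—under the assumption that `world_model(state, action)` returns a predicted next state and `objective(state)` returns a predicted discontentment score; both callables and all the hyperparameters are placeholders for illustration.

```python
import numpy as np

def plan_with_model(world_model, objective, state,
                    horizon=10, n_candidates=256, action_dim=2, seed=0):
    """Random-shooting MPC: sample candidate action sequences, roll each one
    through the learned world model, score the imagined trajectory with the
    learned objective (predicted discontentment), and keep the best."""
    rng = np.random.default_rng(seed)
    best_cost, best_plan = np.inf, None
    for _ in range(n_candidates):
        actions = rng.uniform(-1.0, 1.0, size=(horizon, action_dim))
        s, cost = state, 0.0
        for a in actions:
            s = world_model(s, a)     # imagined next state
            cost += objective(s)      # imagined discontentment along the way
        if cost < best_cost:
            best_cost, best_plan = cost, actions
    return best_plan[0]               # act on the first action only, then replan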
Okay. So if we think about this agent, if you think about the movie Her—you've criticized the art project that is Sophia the robot, and what that project essentially does is use our natural inclination to anthropomorphize things that look human and give them more credit than they deserve. Do you think that could be used by AI systems, like in the movie Her? So do you think a body is needed to create a feeling of intelligence? Well, if Sophia was just an art piece, I would have no problem with it, but it's presented as something else. Let me add to that comment real quick: if the creators of Sophia could change something about their marketing or behavior in general, what would it be? Just about everything. I mean—here's a tough question—I mean, so I agree with you: the general public feels that Sophia can do way more than she actually can. That's right. And the people who create Sophia are not honestly, publicly communicating, trying to teach the public. Right. But here's a tough question: don't you think the same thing is true of scientists in industry and research—that they take advantage of the same misunderstanding in the public when they create AI companies or publish stuff? Some companies, yes. I mean, there is no sense of—there's no desire to delude, there's no desire to kind of overclaim what something does, right? You know, you publish a paper on AI that has this result on ImageNet, it's pretty clear—I mean, it's not even interesting anymore. So I don't think there is that; I mean, the reviewers are generally not very forgiving of unsupported claims of this type. But there are certainly quite a few startups that have had a huge amount of hype around them that I find extremely damaging, and I've been calling it out when I've seen it. So, yeah. But to go back to your original question, the necessity of embodiment: I don't think embodiment is necessary; I think grounding is necessary. So I don't think we're going to get machines that really understand language without some level of grounding in the real world, and it's not clear to me that language is a high-enough-bandwidth medium to communicate how the real world works. What does grounding mean, then? So, grounding means that—well, there is this classic problem of common-sense reasoning, you know, the Winograd schema, right? So, I tell you "the trophy doesn't fit in the suitcase because it's too big," or "the trophy doesn't fit in the suitcase because it's too small," and the "it" in the first case refers to the trophy, and in the second case to the suitcase. And the reason you can figure this out is because you know what the trophy and the suitcase are, you know one is supposed to fit in the other, and you know the notion of size, and that a big object doesn't fit in a small object unless it's a TARDIS—you know, things like that, right? So you have this knowledge of how the world works, of geometry, and things like that. I don't believe you can learn everything about the world by just being told in language how the world works. I think you need some low-level perception of the world—be it visual, touch, whatever—some higher-bandwidth perception of the world. So by reading all the world's text, you still may not have enough information. That's right: there are a lot of things that just will never appear in text and that you can't really infer. So I think common sense will emerge from, certainly, a lot of language interaction, but also from watching videos, or perhaps even interacting in virtual environments, and possibly, you know, robots interacting in the real world—though I don't actually believe that this last one is absolutely necessary. But I think there's a need for some grounding. But the final product doesn't necessarily need to be embodied, you're saying. No, it just needs to have an awareness, a grounding, right? But it needs to know how the world works to not be frustrating to talk to. And you talked about emotions being important—that's a whole other topic. Well, so, you know, I talked about the basal ganglia as the thing that calculates your level of contentment or discontentment, and then there is this other module that sort of tries to do a prediction of whether you're going to be content or not, and that's the source of some emotions.
So fear, for example, is an anticipation of bad things that can happen to you, right? You have this inkling that there is some chance that something really bad is going to happen to you, and that creates fear. When you know for sure that something bad is going to happen to you, you kind of give up, right? It's not fear anymore; it's the uncertainty that creates fear. So the punchline is: yes, we're not going to have autonomous intelligence without emotions—whatever the heck emotions are. So you mentioned very practical things like fear, but there's a lot of other stuff around it. But they're kind of the results of, you know, drives. Yeah, there's deeper biological stuff going on, and I've talked to a few folks about this—there's fascinating stuff there that ultimately connects to our brain. If we create an AGI system—sorry, a human-level intelligence system—and you get to ask her one question, what would that question be? You know, I think the first one we'll create would probably not be that smart; it'd be like a four-year-old. Okay, so you would have to ask her a question to know she's not that smart. Yeah. Well, what's a good question to ask? You know: what causes wind? And if she answers, "oh, it's because the leaves of the trees are moving and that creates wind," she's onto something. And if she says, "that's a stupid question," she's really obtuse. And then you tell her, "actually, you know, here is the real thing," and she says, "oh yeah, that makes sense." So questions that reveal the ability to do common-sense reasoning about the physical world. Yeah, and, you know, some level of causal inference. Well, it was a huge honor—congratulations on the Turing Award—and thank you so much for talking today. Thank you.

The following is a conversation with Yann LeCun. He's considered to be one of the fathers of deep learning, which, if you've been hiding under a rock, is the recent revolution in AI that's captivated the world with the possibility of what machines can learn from data. He's a professor at New York University, a Vice President and Chief AI Scientist at Facebook, and co-recipient of the Turing Award for his work on deep learning. He's probably best known as the founding father of convolutional neural networks, in particular their application to optical character recognition and the famed MNIST data set. He is also an outspoken personality, unafraid to speak his mind in a distinctive French accent and explore provocative ideas, both in the rigorous medium of academic research and the somewhat less rigorous medium of Twitter and Facebook. This is the Artificial Intelligence Podcast. If you enjoy it, subscribe on YouTube, give it five stars on iTunes, support it on Patreon, or simply connect with me on Twitter @lexfridman, spelled F-R-I-D-M-A-N. And now, here's my conversation with Yann LeCun.

You said that 2001: A Space Odyssey is one of your favorite movies. HAL 9000 decides to get rid of the astronauts—for people who haven't seen the movie, spoiler alert—because he, it, she believes that the astronauts will interfere with the mission. Do you see HAL as flawed in some fundamental way, or even evil, or did he do the right thing? Neither. There's no notion of evil in that context, other than the fact that people die, but it was an example of what people call value misalignment, right? You give an objective to a machine, and the machine strives to achieve this objective, and if you don't put any constraints on this objective—like "don't kill people" and "don't do things like this"—the machine, given the power, will do stupid things just
to achieve this dis objective or damaging things to achieve its objective it's a little bit like we are used to this in the context of human society we we put in place laws to prevent people from doing bad things because fantasy did we do those bad things right so we have to shave their cost function the objective function if you want through laws to kind of correct an education obviously to sort of correct for for those so maybe just pushing a little further on on that point how you know there's a mission there's a this fuzziness around the ambiguity around what the actual mission is but you know do you think that there will be a time from a utilitarian perspective or an AI system where it is not misalignment where it is alignment for the greater good of society that kneei system will make decisions that are difficult well that's the trick I mean eventually we'll have to figure out how to do this and again we're not starting from scratch because we've been doing this with humans for four millennia so designing objective functions for people is something that we know how to do and we don't do it by you know programming things although the legal code is called code so that tells you something and it's actually the design of an object you function that's really what legal code is right it tells you you can do it what you can't do if you do it you pay that much that's that's an objective function so there is this idea somehow that it's a new thing for people to try to design objective functions are aligned with the common good but no we've been writing laws for millennia and that's exactly what it is so this that's where you know the science of lawmaking and and computer science will come together will come together so it's nothing there's nothing special about how or a I systems is just the continuation of tools used to make some of these difficult ethical judgments that laws make yeah and we and we have systems like this already that you know make many decisions for ourselves in society that you know need to be designed in a way that they like you know rules about things that sometimes sometimes have bad side effects and we have to be flexible enough about those rules so that they can be broken when it's obvious that they shouldn't be applied so you don't see this on the camera here but all the decorations in this room is all pictures from 2001 a Space Odyssey Wow and by accident or is there a lot about accident it's by design Wow so if you were if you were to build hell 10,000 so an improvement of Hal 9000 what would you improve well first of all I wouldn't ask you to hold secrets and tell lies because that's really what breaks it in the end that's the the fact that it's asking itself questions about the purpose of the mission and it's you know pieces things together that it's heard you know all the secrecy of the preparation of the mission and the fact that it was discovery and on the lunar surface that really was kept secret and and one part of Hal's memory knows this and the other part is does not know it and it's supposed to not tell anyone and that creates a internal conflict do you think there's never should be a set of things that night AI system should not be allowed like a set of facts that should not be shared with the human operators well I think no I think the I think it should be a bit like in the design of autonomous AI systems there should be the equivalent of you know the the the oath that hypocrite Oh calm yourself yeah that doctors sign up to right so the certain thing 
certain rule said that that you have to abide by and we can sort of hardwire this into into our into our machines to kind of make sure they don't go so I'm not you know advocate of the the 303 dollars of Robotics you know the as you move kind of thing because I don't think it's practical but but you know some some level of of limits but but to be clear this is not these are not questions that are kind of really worth asking today because we just don't have the technology to do this we don't we don't have a ton of missing teller machines we have intelligent machines so my intelligent machines that are very specialized but they don't they don't really sort of satisfy an objective they're just you know kind of trained to do one thing so until we have some idea for design of a full-fledged autonomous intelligent system asking the question of how we design use objective I think is a little a little too abstract it's a little tough rat there's useful elements to it in that it helps us understand our own ethical codes humans so even just as a thought experiment if you imagine that in a GI system is here today how would we program it is a kind of nice thought experiment of constructing how should we have a law have a system of laws far as humans it's just a nice practical tool and I think there's echoes of that idea too in the AI systems left today it don't have to be that intelligent yeah like autonomous vehicles there's these things start creeping in that were thinking about but certainly they shouldn't be framed as as hell yeah looking back what is the most I'm sorry if it's a silly question but what is the most beautiful or surprising idea and deep learning or AI in general that you've ever come across sort of personally well you said back and and just had this kind of wow that's pretty cool moment that's nice well surprising I don't know if it's an idea rather than a sort of empirical fact the fact that you gigantic neural nets trying to train them on you know relatively small amounts of data relatively with the caste grid in the center that it actually works breaks everything you read in every textbook right every pre deep learning textbook that told you you need to have fewer parameters and you have data samples you know if you have non-convex objective function you have no guarantee of convergence you know all the things that you read in textbook and they tell you stay away from this and they were all wrong huge number of parameters non-convex and somehow which is very relative to the number of parameters data it's able to learn anything right does that surprise you today well it it was kind of obvious to me before I knew anything that that's that this is a good idea and then it became surprising that it worked because I started reading those text books okay so okay you talk to the intuition of why was obviously if you remember well okay so the intuition was it's it's sort of like you know those people in the late 19th century who proved that heavier than than air flight was impossible right and of course you have birds right they do fly and so on the face of it it it's obviously wrong as an empirical question right and so we have the same kind of thing that you know the we know that the brain works we don't know how but we know it works and we know it's a large network of neurons and interaction and the learning takes place by changing the connection so kind of getting this level of inspiration without copying the details but sort of trying to derive basic principles you know that kind of 
gives you a clue as to which direction to go there's also the idea somehow that I've been convinced of since I was an undergrad that even before that intelligence is inseparable from running so you the idea somehow that you can create an intelligent machine by basically programming for me was a non-starter you know from the start every intelligent entity that we know about arrives at this intelligence to learning so learning you know machine learning was completely obvious path also because I'm lazy so you know it's automate basically everything and learning is the automation of intelligence right so do you think so what is learning then what what falls under learning because do you think of reasoning is learning where reasoning is certainly a consequence of learning as well just like other functions of of the brain the big question about reasoning is how do you make reasoning compatible with gradient based learning do you think neural networks can be made to reason yes that there's no question about that again we have a good example right the question is is how so the question is how much prior structure you have to put in the neural net so that something like human reasoning will emerge from it you know from running another question is all of our kind of model of what reasoning is that are based on logic are discrete and and and are therefore incompatible with gradient based learning and I was very strong believer in this idea Grandin baserunning I don't believe that other types of learning that don't use kind of gradient information if you want so you don't like discrete mathematics you don't like anything discrete well that's it's not that I don't like it it's just that it's it's incompatible with learning and I'm a big fan of running right so in fact that's perhaps one reason why deep learning has been kind of looked at with suspicion by a lot of computer scientists because the math is very different the method you use for deep running you know we kind of as more to do with you know cybernetics the kind of math you do in electrical engineering then the kind of math you doing computer science and and you know nothing in in machine learning is exact right computer science is all about sort of you know obviously compulsive attention to details of like you know every index has to be right and you can prove that an algorithm is correct right machine learning is the science of sloppiness really that's beautiful so okay maybe let's feel around in the dark of what is a neural network that reasons or a system that is works with continuous functions that's able to do build knowledge however we think about reasoning builds on previous knowledge build on extra knowledge create new knowledge generalized outside of any training set ever built what does that look like if yeah maybe do you have Inklings of thoughts of what that might look like well yeah I mean yes or no if I had precise ideas about this I think you know we'd be building it right now but and there are people working on this or whose main research interest is actually exactly that right so what you need to have is a working memory so you need to have some device if you want some subsystem they can store a relatively large number of factual episodic information for you know a reasonable amount of time so you you know in the in the brain for example it kind of three main types of memory one is the sort of memory of the the state of your cortex and that sort of disappears within 20 seconds you can't remember things for more than about 20 
seconds or a minute if if you don't have any other form of memory the second type of memory which is longer term is short term is the hippocampus so you can you know you came into this building you remember whether where the the exit is where the elevators are you have some map of that building that's stored in your hippocampus you might remember something about what I said you know if you minutes ago and forgot all our stars being raised but you know but that does not work in your hippocampus and then the the longer term memory is in the synapse the synapses right so what you need if you want for a system that's capable reasoning is that you want the hippocampus like thing right and that's what people have tried to do with memory networks and you know no Turing machines and stuff like that right and and now with transformers which have sort of a memory in their kind of self attention system you can you can think of it this way so so that's one element you need another thing you need is some sort of network that can access this memory get an information back and then kind of crunch on it and then do this iteratively multiple times because a chain of reasoning is a process by which you you you can you update your knowledge about the state of the world about you know what's gonna happen etc and that there has to be this sort of recurrent operation basically and you think that kind of if we think about a transformer so that seems to be too small to contain the knowledge that's that's to represent the knowledge as containing Wikipedia for example but transformer doesn't have this idea of recurrence it's got a fixed number of layers and that's number of steps that you know limits basically it's a representation but recurrence would build on the knowledge somehow I mean yeah it would evolve the knowledge and expand the amount of information perhaps or useful information within that knowledge yeah but is this something that just can emerge with size because it seems like everything we have now is just no it's not it's not it's not clear how you access and right into an associative memory in efficient way I mean sort of the original memory network maybe had something like the right architecture but if you try to scale up a memory network so that the memory contains all we keep here it doesn't quite work right so so this is a need for new ideas there okay but it's not the only form of reasoning so there's another form of reasoning which is true which is very classical so in some types of AI and it's based on let's call it energy minimization okay so you have some sort of objective some energy function that represents the the the quality or the negative quality okay energy goes up when things get bad and they get low when things get good so let's say you you want to figure out you know what gestures do I need to to do to grab an object or walk out the door if you have a good model of your own body a good model of the environment using this kind of energy minimization you can make a you can make you can do planning and it's in optimal control it's called it's called Marie put model predictive control you have a model of what's gonna happen in the world as consequence for your actions and that allows you to buy energy minimization figure out the sequence of action that optimizes a particular objective function which measures you know minimize the number of times you're gonna hit something and the energy gonna spend doing the gesture and etc so so that's performer reasoning planning is a form of 
reasoning and perhaps what led to the ability of humans to reason is the fact that or you know species you know that appear before us had to do some sort of planning to be able to hunt and survive and survive the winter in particular and so you know it's the same capacity that you need to have so in your intuition is if you look at expert systems in encoding knowledge as logic systems as graphs in this kind of way is not a useful way to think about knowledge graphs are your brittle or logic representation so basically you know variables that that have values and constraint between them that are represented by rules as well too rigid and too brittle right so one of the you know some of the early efforts in that respect were were to put probabilities on them so a rule you know you know if you have this in that symptom you know you have this disease with that probability and you should describe that antibiotic with that probability right this my sin system from the for the 70s and that that's what that branch of AI led to you know busy networks in graphical models and causal inference and vibrational you know method so so there there is I mean certainly a lot of interesting work going on in this area the main issue with this is is knowledge acquisition how do you reduce a bunch of data to graph of this type near relies on the expert and a human being to encode at add knowledge and that's essentially impractical yeah the question the second question is do you want to represent knowledge symbols and you want to manipulate them with logic and again that's incomparable we're learning so one suggestion with geoff hinton has been advocating for many decades is replace symbols by vectors think of it as pattern of activities in a bunch of neurons or units or whatever you wanna call them and replace logic by continuous functions okay and that becomes now compatible there's a very good set of ideas by region in a paper about 10 years ago by leon go to on who is here at face book the title of the paper is for machine learning to machine reasoning and his idea is that learning learning system should be able to manipulate objects that are in the same space in a space and then put the result back in the same space so is this idea of working memory basically and it's a very enlightening and in the sense that might learn something like the simple expert systems I mean it's with you can learn basic logic operations there yeah quite possibly yeah this is a big debate on sort of how much prior structure you have to put in for this kind of stuff to emerge that's the debate I have with Gary Marcus and people like that yeah yeah so and the other person so I just talked to judea pearl mm-hmm well you mentioned causal inference world his worry is that the current knew all networks are not able to learn what causes what causal inference between things so I think I think he's right and wrong about this if he's talking about the sort of classic type of neural nets people also didn't worry too much about this but there's a lot of people now working on causal inference and there's a paper that just came out last week by Leon Mbutu among others develop his path and push for other people exactly on that problem of how do you kind of you know get a neural net to sort of pay attention to real causal relationships which may also solve issues of bias in data and things like this so I'd like to read that paper because that ultimately the challenges also seems to fall back on the human expert to ultimately decide causality 
between things people are not very good at its direction causality first of all so first of all you talk to a physicist and physicists actually don't believe in causality because look at the all the busy clause or microphysics are time reversible so there is no causality the arrow of time is not right yeah it's it's as soon as you start looking at macroscopic systems where there is unpredictable randomness where there is clearly an arrow of time but it's a big mystery in physics actually well how that emerges is that emergent or is it part of the fundamental fabric of reality yeah or is it bias of intelligent systems that you know because of the second law of thermodynamics we perceive a particular arrow of time but in fact it's kind of arbitrary right so yeah physicists mathematicians they don't care about I mean the math doesn't care about the flow of time well certainly certainly macro physics doesn't people themselves are not very good at establishing causal causal relationships if you ask is I think it was in one of Seymour Papert spoken on like children learning you know he studied with Jean Piaget you know he's the guy who co-authored the book perceptron with Marvin Minsky that kind of killed the first wave but but he was actually a learning person he in the sense of studying learning in humans and machines that's what he got interested in for scepter on and he wrote that if you ask a little kid about what is the cause of the wind a lot of kids will say they will think for a while and they'll say oh it's the the branches in the trees they move and that creates wind right so they get the causal relationship backwards and it's because their understanding of the world and intuitive physics is not that great right I mean these are like you know four or five year old kids you know it gets better and then you understand that this it can't be right but there are many things which we can because of our common sense understanding of things what people call common sense yeah and we understanding of physics we can there's a lot of stuff that we can figure out causality even with diseases we can figure out what's not causing what often there's a lot of mystery of course but the idea is that you should be able to encode that into systems it seems unlikely to be able to figure that out themselves well whenever we can do intervention but you know all of humanity has been completely deluded for millennia probably since existence about a very very wrong causal relationship where whatever you can explain you attributed to you know some deity some divinity right and that's a cop-out that's the way of saying like I don't know the cause so you know God did it right so you mentioned Marvin Minsky and the irony of you know maybe causing the first day I winter you were there in the 90s you're there in the 80s of course in the 90s what do you think people lost faith and deep learning in the 90s and found it again a decade later over a decade later yeah it wasn't called dethroning yeah it was just called neural nets you know yeah they lost interests I mean I think I would put that around 1995 at least the machine learning community there was always a neural net community but it became disconnected from sort of ministry machine owning if you want there were it was basically electrical engineering that kept at it and computer science just gave up give up on neural nets I don't I don't know you know I was too close to it to really sort of analyze it with sort of a unbiased eye if you want but I would I would I 
would would make a few guesses so the first one is at the time neural nets were it was very hard to make them work in the sense that you would you know implement back prop in your favorite language and that favorite language was not Python it was not MATLAB it was not any of those things cuz they didn't exist right you had to write it in Fortran or C or something like this right so you would experiment with it you would probably make some very basic mistakes like you know badly initialize your weights make the network too small because you read in the textbook you know you don't want too many parameters right and of course you know and you would train on x4 because you didn't have any other data set to try it on and of course you know it works half the time so we'd say you give up also 22 the batch gradient which you know isn't it sufficient so there's a lot of bag of tricks that you had to know to make those things work or you had to reinvent and a lot of people just didn't and they just couldn't make it work so that's one thing the investment in software platform to be able to kind of you know display things figure out why things don't work and I get a good intuition for how to get them to work have enough flexibility so you can create you know network architectures well completion ads and stuff like that it was hard yeah when you had to write everything from scratch and again you didn't have any Python or MATLAB or anything right so what I read that sorry to interrupt but I read he wrote in in Lisp the first versions of Lynette accomplished in your networks which by the way one of my favorite languages that's how I knew you were legit the Turing Award whatever this would be programmed and list that's still my favorite language but it's not that we programmed in Lisp it's that we had to write or this printer printer okay cuz it's not that's right that's one that existed so we wrote a lisp interpreter that we hooked up to you know back in library that we wrote also for neural net competition and then after a few years around 1991 we invented this idea of basically having modules that know how to forward propagate and back propagate gradients and then interconnecting those modules in a graph loom but who had made proposals on this about this in the late 80s and were able to implement this using all this system eventually we wanted to use that system to make build production code for character recognition at Bell Labs so we actually wrote a compiler for that disp interpreter so that Christy Martin who is now Microsoft kind of did the bulk of it with Leone and me and and so we could write our system in lisp and then compiled to seee and then we'll have a self-contained complete system that could kind of do the entire thing neither Python or turn pro can do this today yeah okay it's coming yeah I mean there's something like that in Whitehorse called you know tor script and so you know we had to write or Lisp interpreter which retinol is compiler way to invest a huge amount of effort to do this and not everybody if you don't completely believe in the concept you're not going to invest the time to do this right now at the time also you know it were today this would turn into torture by torture and so for whatever we put it in open-source everybody would use it and you know realize it's good back before 1995 working at AT&T there's no way the lawyers would let you release anything in open source of this nature and so we could not distribute our code really and at that point and sorry to go on a 
million tangents but on that point I also read that there was some almost pad like a patent on convolution your network yes it was labs so that first of all I mean just to actually that ran out the thankfully 8007 in 2007 that what look can we can we just talk about that first I know you're a facebook but you're also done why you and and what does it mean patent ideas like these software ideas essentially or what are mathematical ideas or what are they okay so they're not mathematical idea so there are you know algorithms and there was a period where the US Patent Office would allow the patent of software as long as it was embodied the Europeans are very different they don't they don't quite accept that they have a different concept but you know I don't I know no I mean I never actually strongly believed in this but I don't believe in this kind of patent Facebook basically doesn't believe in this kind of pattern Google Files patterns because they've been burned with Apple and so now they do this for defensive purpose but usually they say we're not going to see you if you infringe Facebook has a similar policy they say you know we file pattern on certain things for defensive purpose we're not going to see you if you infringe unless you sue us so the the industry does not believe in in patterns they are there because of you know the legal landscape and and and various things but but I don't really believe in patterns for this kind of stuff yes so that's that's a great thing so I tell you a war story yeah you so what happens was the the first the first pattern of a condition that was about kind of the early version Congress on that that didn't have separate pudding layers it had the conditional layers which tried more than one if you want right and then there was a second one on commercial nets with separate pudding layers train with back probably in 89 and 1992 something like this at the time the life life of a pattern was 17 years so here's what happened over the next few years is that we started developing character recognition technology around commercial Nets and in 1994 a check reading system was deployed in ATM machines in 1995 it was for a large check reading machines in back offices etc and those systems were developed by an engineering group that we were collaborating with AT&T and they were commercialized by NCR which at the time was a subsidiary of AT&T now it ain't he split up in 1996 99 in 1996 and the lawyers just looked at all the patterns and they distributed the patterns among the various companies they gave the the commercial net pattern to NCR because they were actually selling products that used it but nobody I didn't see are at any idea where they come from that was yeah okay so between 1996 and 2007 there's a whole period until 2002 I didn't actually work on machine on your couch on that I resumed working on this around 2002 and between 2002 and 2007 I was working on them crossing my finger that nobody and NCR would notice nobody noticed yeah and I and I hope that this kind of somewhat as you said lawyers decide relative openness of the community now will continue it accelerates the entire progress of the industry and you know the problems that Facebook and Google and others are facing today is not whether Facebook or Google or Microsoft or IBM or whoever is ahead of the other it's that we don't have the technology to build the things we want to build we only build intelligent virtual systems that have common sense we don't have a monopoly on good ideas for this we 
don't believe with you maybe others do believe they do but we don't okay if a start-up tells you they have the secret to you know human level intelligence and common sense don't believe them they don't and it's going to take the entire work of the world research community for a while to get to the point where you can go off and in each of the company is going to start to build things on this we're not there yet it's absolutely in this this calls to the the gap between the space of ideas and the rigorous testing of those ideas of practical application that you often speak to you've written advice saying don't get fooled by people who claim to have a solution to artificial general intelligence who claim to have an AI system that work just like the human brain or who claim to have figured out how the brain works ask them what the error rate they get on em 'no store imagenet this is a little dated by the way that mean five years who's counting okay but i think your opinion it's the Amna stand imagenet yes may be data there may be new benchmarks right but i think that philosophy is one you still and and somewhat hold that benchmarks and the practical testing the practical application is where you really get to test the ideas well it may not be completely practical like for example you know it could be a toy data set but it has to be some sort of task that the community as a whole has accepted as some sort of standard you know kind of benchmark if you want it doesn't need to be real so for example many years ago here at fair people you know chosen Western art one born and a few others proposed the the babbitt asks which were kind of a toy problem to test the ability of machines to reason actually to access working memory and things like this and it was very useful even though it wasn't a real task amnesties kind of halfway a real task so you know toy problems can be very useful it's just that i was really struck by the fact that a lot of people particularly our people with money to invest would be fooled by people telling them oh we have you know the algorithm of the cortex and you should give us 50 million yes absolutely so there's a lot of people who who tried to take advantage of the hype for business reasons and so on but let me sort of talk to this idea that new ideas the ideas that push the field forward may not yet have a benchmark or it may be very difficult to establish a benchmark I agree that's part of the process establishing benchmarks is part of the process so what are your thoughts about so we have these benchmarks on around stuff we can do with images from classification to captioning to just every kind of information can pull off from images and the surface level there's audio datasets there's some video what can we start natural language what kind of stuff what kind of benchmarks do you see they start creeping on to more something like intelligence like reasoning like maybe you don't like the term but AGI echoes of that kind of yeah sort of elation a lot of people are working on interactive environments in which you can you can train and test intelligent systems so so there for example you know it's the classical paradigm of supervised running is that you you have a data set you partition it into a training site validation set test set and there's a clear protocol right but what if the that assumes that this apples are statistically independent you can exchange them the order in which you see them doesn't shouldn't matter you know things like that but what if the answer you 
give determines the next sample you see which is the case for example in robotics right you robot does something and then it gets exposed to a new room and depending on where it goes the room would be different so that's the decrease the exploration problem the what if the samples so that creates also a dependency between samples right you you if you move if you can only move it in in space the next sample you're gonna see is going to be probably in the same building most likely so so so the all the assumptions about the validity of this training set test set a potus's break whatever a machine can take an action that has an influence in the in the world and it's what is going to see so people are setting up artificial environments where what that takes place right the robot runs around a 3d model of a house and can interact with objects and things like this how you do robotics by simulation you have those you know opening a gym type thing or mu Joko kind of simulated robots and you have games you know things like that so that that's where the field is going really this kind of environment now back to the question of a GI like I don't like the term a GI because it implies that human intelligence is general and human intelligence is nothing like general it's very very specialized we think it's general we'd like to think of ourselves as having your own science we don't we're very specialized we're only slightly more general than why does it feel general so you kind of the term general I think what's impressive about humans is ability to learn as we were talking about learning to learn in just so many different domains is perhaps not arbitrarily general but just you can learn in many domains and integrate that knowledge somehow okay that knowledge persists so let me take a very specific example yes it's not an example it's more like a a quasi mathematical demonstration so you have about 1 million fibers coming out of one of your eyes okay two million total but let's let's talk about just one of them it's 1 million nerve fibers your optical nerve let's imagine that they are binary so they can be active or inactive right so the input to your visual cortex is 1 million bits now they connected to your brain in a particular way on your brain has connections that are kind of a little bit like accomplish on that they're kind of local you know in space and things like this I imagine I play a trick on you it's a pretty nasty trick I admit I I cut your optical nerve and I put a device that makes a random perturbation of a permutation of all the nerve fibers so now what comes to your to your brain is a fixed but random permutation of all the pixels there's no way in hell that your visual cortex even if I do this to you in infancy will actually learn vision to the same level of quality that you can got it and you're saying there's no way you ever learn that no because now two pixels that on your body in the world will end up in very different places in your visual cortex and your neurons there have no connections with each other because they only connect it locally so this whole our entire the hardware is built in many ways to support the locality of the real world yeah yes that's specialization yep okay it's still now really damn impressive so it's not perfect generalization I even closed no no it's it's it's it's not that it's not even close it's not at all yes it's socialize so how many boolean functions so let's imagine you want to train your visual system to you know recognize particular patterns of 
those 1 million bits ok so that's a boolean function right either the pattern is here or not here this is a to to a classification with 1 million binary inputs how many such boolean functions are there okay if you have 2 to the 1 million combinations of inputs for each of those you have an output bit and so you have 2 to the 2 to the 1 million boolean functions of this type okay which is an unimaginably large number how many of those functions can actually be computed by your visual cortex and the answer is a tiny tiny tiny tiny tiny tiny sliver like an enormous little tiny sliver yeah yeah so we are ridiculously specialized you know okay but okay that's an argument against the word general I think there's there's a I there's I agree with your intuition but I'm not sure it's it seems the breath the the brain is impressively capable of adjusting to things so it's because we can't imagine tasks that are outside of our comprehension right we think we think we are general because we're general of all the things that we can apprehend so yeah but there is a huge world out there of things that we have no idea we call that heat by the way heat heat so at least physicists call that heat or they call it entropy which is kokkonen you have a thing full of gas right call system for gas right goes on a coast it has you know pressure it has temperature has you know and you can write the equations PV equal NRT you know things like that right when you reduce a volume the temperature goes up the pressure goes up you know things like that right for perfect gas at least those are the things you can know about that system and it's a tiny tiny number of bits compared to the complete information of the state of the entire system because the state when HR system will give you the position and momentum of every every molecule of the gas and what you don't know about it is the entropy and you interpret it as heat the energy containing that thing is is what we call heat now it's very possible that in fact there is some very strong structure in how those molecules are moving is just that they are in a way that we are just not wired to perceive they are ignorant to it and there's in your infinite amount of things we're not wired to perceive any right that's a nice way to put it well general to all the things we can imagine which is a very tiny a subset of all things that are possible it was like coma growth complexity or the coma was charged in some one of complexity you know every bit string or every integer is random except for all the ones that you can actually write down yeah okay so beautifully put but you know so we can just call it artificial intelligence we don't need to have a general whatever novel human of all Nutella transmissible oh you know you'll start anytime you touch human it gets it gets interesting because you know it's just because we attach ourselves to human and it's difficult to define with human intelligences yeah nevertheless my definition is maybe damn impressive intelligence ok damn impressive demonstration of intelligence whatever and so on that topic most successes in deep learning have been in supervised learning what is your view on unsupervised learning is there a hope to reduce involvement of human input and still have successful systems that are have practically used yeah I mean there's definitely a hope is it's more than a hope actually it's it's you know mounting evidence for it and that's basically or I do like the only thing I'm interested in at the moment is I call it self 
supervised learning, not unsupervised, because unsupervised learning is a loaded term. People who know something about machine learning think you're talking about clustering or PCA, and the lay public, when you say unsupervised learning, think "oh my god, machines are going to learn by themselves without supervision"—you know, without the parents, yeah. So I call it self-supervised learning because, in fact, the underlying algorithms that are used are the same algorithms as the supervised learning algorithms, except that what we train them to do is not to predict a particular set of variables, like the category of an image, and not to predict a set of variables that have been provided by human labelers. What you train the machine to do is basically reconstruct a piece of its input that is being masked out, essentially. You can think of it this way, right? So, show a piece of a video to a machine and ask it to predict what's going to happen next, and of course after a while you can show it what actually happened, and the machine will kind of train itself to do better at that task. All the latest, most successful models in natural language processing use self-supervised learning—you know, sort of BERT-style systems, for example, right? You show it a window of a thousand words from a text corpus, you take out 15% of the words, and then you train a machine to predict the words that are missing. That's self-supervised learning. It's not predicting the future, it's just predicting things in the middle, but you could have it predict the future—that's what language models do. So you construct, in an unsupervised way, a model of language—or of video, or of the physical world, or whatever, right. How far do you think that can take us? Do you think very far? Does it understand anything? To some level, it has a shallow understanding of text, but to have kind of true human-level intelligence, I think you need to ground language in reality. So some people are attempting to do this, right—having systems that have some visual representation of what is being talked about, which is one reason you need interactive environments, actually. But this is like a huge technical problem that is not solved, and that explains why self-supervised learning works in the context of natural language but does not work—at least not well—in the context of image recognition and video, although it's making progress quickly. And the reason is the fact that it's much easier to represent uncertainty in the prediction in the context of natural language than it is in the context of things like video and images. So, for example, if I ask you to predict what words are missing—the 15% of the words that I've taken out—the number of possibilities is small, right? There are 100,000 words in the lexicon, and what the machine spits out is a big probability vector, right, a bunch of numbers between 0 and 1 that sum to one, and we know how to do this with computers. So representing uncertainty in the prediction there is relatively easy, and that's, in my opinion, why those techniques work for NLP.
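A schematic of the masking step just described—hide a fraction of the tokens, keep the originals as targets, and score a model's probability vectors only at the hidden positions. The `encoder` call and the variable names are hypothetical placeholders, not any particular library's API.

```python
import numpy as np

rng = np.random.default_rng(0)

def mask_tokens(token_ids, mask_id, mask_prob=0.15):
    """BERT-style corruption: hide ~15% of the tokens and remember the
    original values at those positions as the prediction targets."""
    token_ids = np.asarray(token_ids)
    positions = rng.random(len(token_ids)) < mask_prob
    targets = token_ids[positions].copy()
    corrupted = token_ids.copy()
    corrupted[positions] = mask_id
    return corrupted, positions, targets

# Hypothetical training step: `encoder` maps corrupted ids to per-position
# probability vectors over the vocabulary (the "big probability vector"
# mentioned above); the loss is cross-entropy at the masked positions only.
#   corrupted, positions, targets = mask_tokens(batch_ids, mask_id=0)
#   probs = encoder(corrupted)                  # shape: [seq_len, vocab_size]
#   loss = -np.log(probs[positions, targets]).mean()
```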
For images, if you block out a piece of an image and you ask a system to reconstruct that piece, there are many possible answers that are all perfectly legitimate, and how do you represent that set of possible answers? You can't train a system to make one prediction — you can't train a neural net to say "here it is, that's the image" — because there's a whole set of things that are compatible with it. So how do you get the machine to represent not a single output but a whole set of outputs? And similarly with video prediction: there are a lot of things that can happen in the future of a video. You're looking at me right now; I'm not moving my head very much, but I might turn my head to the left or to the right. If you have a system that's trying to predict this, and you train it with least squares to minimize the error between the prediction and what I actually do, what you get is a blurry image of myself in all the possible future positions that I might be in, which is not a good prediction.

So there might be other ways to do the self-supervision for visual scenes? Like, what if I —

I mean, if I knew, I wouldn't tell you — I'd publish it first. I don't know; there might be.

There might be artificial ways, like self-play in games, where you can simulate part of the environment.

That doesn't solve the problem — it's just a way of generating data.

But because you have control over the environment, you can —

Yeah, it's a way to generate data, that's right, and you can do huge amounts of data generation that way.

It creeps up on the problem from the side of data. And you don't think that's the right way to creep up on it?

It doesn't solve this problem of handling uncertainty in the world, right? If you have a machine learn a predictive model of the world in a game that is deterministic or quasi-deterministic, it's easy: just give a few frames of the game to a ConvNet with a bunch of layers and then have it generate the next few frames, and if the game is deterministic it works fine — and that includes feeding the system with the action that your little character is going to take. The problem comes from the fact that the real world, and in fact most games, are not entirely predictable. That's where you get those blurry predictions, and you can't do planning with blurry predictions. So if you have a perfect model of the world, you can, in your head, run this model with a hypothesis for a sequence of actions, and you're going to predict the outcome of that sequence of actions. But if your model is imperfect, how can you plan?

Yeah, it quickly explodes.
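To see concretely why least-squares training washes out into a blurry average when the future is multimodal, here is a tiny, self-contained toy example with made-up data: a single deterministic prediction trained with squared error against two equally likely futures converges to their mean, which corresponds to neither.

```python
import numpy as np

# Two equally likely futures for the same observed past,
# encoded here as 1-D "images" for simplicity (hypothetical toy data).
future_left  = np.array([1.0, 0.0, 0.0, 0.0])   # head turned left
future_right = np.array([0.0, 0.0, 0.0, 1.0])   # head turned right
futures = np.stack([future_left, future_right])

# A single deterministic prediction y, trained to minimize squared error
# against whichever future actually happens, settles at the mean of the
# possible futures rather than at either real outcome.
y = np.zeros(4)
lr = 0.1
for step in range(2000):
    target = futures[np.random.randint(2)]   # sample one of the possible futures
    y -= lr * 2 * (y - target) / len(y)      # SGD step on ||y - target||^2
print(y)   # approaches [0.5, 0, 0, 0.5]: a "blurry" superposition of both futures
```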
What are your thoughts on the extension of this — a topic I'm super excited about, and it's connected to something you were talking about in terms of robotics — active learning? As opposed to unsupervised or self-supervised learning, the system asks for human help: it selects the parts it wants annotated next. So whether it's a robot exploring a space, or a baby exploring a space, or a system exploring a data set, every once in a while it asks for human input. Do you see value in that kind of work?

I don't see transformative value. It's going to make things that we can already do more efficient, or they will learn slightly more efficiently, but it's not going to make machines significantly more intelligent, I think. And by the way, there is no opposition, no conflict, between self-supervised learning and reinforcement learning, supervised learning, imitation learning, or active learning. I see self-supervised learning as a preliminary to all of the above.

The example I use very often is this: if you use reinforcement learning — deep reinforcement learning, the best methods today, so-called model-free reinforcement learning — to learn to play Atari games, it takes about 80 hours of training to reach the level that any human can reach in about 15 minutes. They get better than humans, but it takes a long time. AlphaStar — Oriol Vinyals and his team's system to play StarCraft — plays a single map, with a single type of player, at better-than-human level, using about the equivalent of 200 years of training playing against itself. That's 200 years, right? That's not something any human could ever do. I'm not sure what to take away from that. Now take those algorithms, the best RL algorithms we have today, and use them to train a car to drive itself: it would probably have to drive millions of hours, it would have to kill thousands of pedestrians, it would have to run into thousands of trees, it would have to run off cliffs — and it would have to run off the cliff multiple times before it figures out, first of all, that it's a bad idea, and second of all, how not to do it.

So this type of learning obviously does not reflect the kind of learning that animals and humans do. There is something missing that's really, really important, and my hypothesis — which I've been advocating for, like, five years now — is that we have predictive models of the world that include the ability to predict under uncertainty, and that is what allows us not to run off a cliff when we learn to drive. Most of us can learn to drive in about 20 or 30 hours of training without ever crashing or causing any accident. If we drive next to a cliff, we know that if we turn the wheel to the right, the car is going to run off the cliff and nothing good is going to come out of it, because we have a pretty good model of intuitive physics that tells us the car is going to fall. We know about gravity — babies learn this around the age of eight or nine months, that objects don't float, they fall — and we have a pretty good idea of the effect of turning the wheel of the car, and we know we need to stay on the road. So there are a lot of things that we bring to the table, which is basically our predictive model of the world, and that model allows us to not do stupid things and to basically stay within the context of things we need to do. We still face unpredictable situations, and that's how we learn, but that's what allows us to learn really, really quickly. That's called model-based reinforcement learning. There's also some imitation and supervised learning, because we have a driving instructor who occasionally tells us what to do, but most of the learning is model-based: it's learning the model, learning the physics, that we've done since we were babies. That's where almost all the learning is.

And the physics is somewhat transferable from scene to scene — stupid things are the same everywhere.

Yeah. I mean, if you have experience of the world, you don't need to be from a particularly intelligent species to know that if you spill water from a container, the rest is going to get wet and you might get wet — cats know this, right? So the main problem we need to solve is: how do we learn models of the world? That's what I'm interested in; that's what self-supervised learning is all about.
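As a rough sketch of what "running the model in your head" could look like, here is a toy, purely illustrative planner: a made-up one-dimensional "cliff" world, a stand-in learned dynamics function, and a random-shooting model-predictive-control loop that rejects action sequences the model predicts would go off the cliff — imagined crashes instead of real ones. None of this corresponds to an actual driving system.

```python
import numpy as np

CLIFF_POSITION = 5.0   # hypothetical: lateral positions beyond this are off the cliff

def world_model(position: float, steering: float) -> float:
    """Toy stand-in for a learned dynamics model: next lateral position given a steering action."""
    return position + steering

def cost(position: float) -> float:
    """Intuitive-physics-style objective: going off the cliff is catastrophically bad."""
    return 1e6 if position > CLIFF_POSITION else abs(position)

def plan(position: float, horizon: int = 5, n_candidates: int = 100) -> np.ndarray:
    """Random-shooting planner: imagine candidate action sequences with the model
    and keep the one with the lowest predicted cost (model-predictive control)."""
    best_seq, best_cost = None, float("inf")
    for _ in range(n_candidates):
        seq = np.random.uniform(-1.0, 1.0, size=horizon)   # candidate steering sequence
        pos, total = position, 0.0
        for a in seq:
            pos = world_model(pos, a)       # roll the model forward "in your head"
            total += cost(pos)              # accumulate predicted badness
        if total < best_cost:
            best_seq, best_cost = seq, total
    return best_seq

actions = plan(position=4.5)
print(actions)   # a sequence that steers away from the cliff without ever driving off it
```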
If you were to try to construct a benchmark for — let's look at MNIST, I love that dataset — do you think it's useful, interesting, or possible to perform well on MNIST with just one example of each digit? And how would we solve that problem?

Yeah, the answer is probably yes. The question is what other type of learning you are allowed to do. If what you're allowed to do is train on some gigantic dataset of labeled digits, that's called transfer learning, and we know that works. We do this at Facebook, in production: we train large convolutional nets to predict the hashtags that people type on Instagram, and we train on billions of images — literally billions — and then we chop off the last layer and fine-tune on whatever task we want. That works really well; you can beat the ImageNet record with it. We actually open-sourced the whole thing a few weeks ago.

Yeah, that's still pretty cool. But what would be useful and impressive — what kind of transfer learning would be useful and impressive? Is it Wikipedia, that kind of thing?

No, I don't think transfer learning is really where we should focus. We should try to have a kind of scenario for a benchmark where you have unlabeled data — a very large amount of unlabeled data. It could be video clips where you do frame prediction, it could be images where you choose to mask a piece of them, it could be whatever — but the data is unlabeled and you're not allowed to label it. You do some training on this, and then you train on a particular supervised task — ImageNet or MNIST — and you measure how your test error or validation error decreases as you increase the number of labeled training samples. What you would like to see is that your error decreases much faster than if you trained from scratch from random weights, so that to reach the same level of performance that a completely, purely supervised system would reach, you would need way fewer samples. That's the crucial question, because it answers the question people interested in, say, medical image analysis are asking: if I want to get to a particular level of error rate for this task, I know I need a million samples — can I do self-supervised pre-training to reduce this to about a hundred, or something like that?

And the answer there is self-supervised pre-training?

Yep, some form of it.
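A rough sketch of the "chop off the last layer and fine-tune" recipe mentioned above, using a torchvision ResNet as a stand-in backbone; the pretrained weights, class count, and data here are placeholders rather than the Instagram-hashtag models that were actually open-sourced.

```python
import torch
import torch.nn as nn
from torchvision import models

# Start from a backbone pretrained on some large dataset (placeholder weights;
# the weights enum requires torchvision >= 0.13).
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)

# "Chop off the last layer": replace the classifier head with one for the new task.
NUM_CLASSES = 10   # number of classes in the small labeled task (illustrative)
backbone.fc = nn.Linear(backbone.fc.in_features, NUM_CLASSES)

# Optionally freeze everything except the new head, so only it is fine-tuned on
# the small labeled set; unfreezing more layers trades labeled data for accuracy.
for name, param in backbone.named_parameters():
    param.requires_grad = name.startswith("fc")

optimizer = torch.optim.SGD(
    [p for p in backbone.parameters() if p.requires_grad], lr=1e-3, momentum=0.9
)
loss_fn = nn.CrossEntropyLoss()

def finetune_step(images, labels):
    """One supervised step on the small labeled dataset (images: NxCxHxW, labels: N)."""
    optimizer.zero_grad()
    loss = loss_fn(backbone(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```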
I was telling you active learning, but you disagree.

It's not useless; it's just not going to lead to a quantum leap. It's just going to make things that we already do more efficient.

So you're way smarter than me — I just disagree with you, but I don't have anything to back that up; it's just intuition. I've worked with a lot of large-scale datasets, and there's something — there might be magic in active learning. But okay, at least I said it publicly; at least I put the idea out there publicly.

Okay. And it's not useless when you're working with the data you have. I mean, certainly people are doing things like: okay, I have three thousand hours of imitation learning for a self-driving car, but most of those are incredibly boring; what I'd like is to select the 10 percent of them that are the most informative, and with just that I would probably reach the same performance. So that's a weak form of active learning, if you want.

Yes, but there might be a much stronger version.

Yeah, that's right, and that's an open question — the question is how much stronger it can get.

Elon Musk is confident — I talked to him recently — he's confident that large-scale data and deep learning can solve the autonomous driving problem. What are your thoughts on the limitless possibilities of deep learning in this space?

Well, it's obviously part of the solution. I mean, I don't think we'll ever have a self-driving system — at least not in the foreseeable future — that does not use deep learning; let me put it that way. Now, how much of it? In the history of engineering, particularly of AI-like systems, there's generally a first phase where everything is built by hand. Then there's a second phase — and that was the case for autonomous driving twenty or thirty years ago — where a little bit of learning is used, but there's a lot of engineering involved in taking care of corner cases and putting limits in place and so on, because the learning system is not perfect. And then, as technology progresses, we end up relying more and more on learning. That's the history of character recognition, it's the history of speech recognition, now computer vision and natural language processing, and I think the same is going to happen with autonomous driving. Currently, the methods that are closest to providing some level of autonomy — a decent level of autonomy where you don't expect a driver to do anything — are the ones where you constrain the world: you only run within, say, 100 square kilometers or square miles in Phoenix, where the weather is nice and the roads are wide, which is what Waymo is doing. You completely over-engineer the car with tons of LiDARs and sophisticated sensors that are too expensive for consumer cars but are fine if you just run a fleet, and you engineer the hell out of everything else: you map the entire world, so you have a complete 3D model of everything, and the only thing the perception system has to take care of is moving objects and construction and things that weren't in your map. And you can engineer a good SLAM system and all that stuff. So that's the current approach that's closest to some level of autonomy, but I think eventually the long-term solution is going to rely more and more on learning, possibly using a combination of supervised learning and model-based reinforcement learning, or something like that. Ultimately, learning will be not just at the core but really the fundamental part of the system. It already is, but it'll become more and more so.

What do you think it takes to build a system with human-level intelligence? You've talked about the AI system in the movie "Her" being way out of our current reach — this might be outdated as well, but — is it still way out of reach? What would it take to build her, do you think?

So, I can tell you the first two obstacles that we have to clear, but I don't know how many obstacles there are after that. The image I usually use is that there is a bunch of mountains we have to climb, and we can see the first one, but we don't know if there are 50 mountains behind it or not. And this might be a good metaphor for why AI researchers in the past have been overly optimistic about the results of AI. For example, Newell and Simon wrote the General Problem Solver, and they called it the General Problem Solver, okay? And of course the first thing you realize is that all the problems you want to solve are exponential, and so you can't actually use it for anything useful.

Yeah, so all you see is the first peak. So, in general, what are the first couple of peaks for "Her"?

The first peak, which is precisely what I'm working on, is self-supervised learning: how do we get machines to learn models of the world by observation, kind of like babies and like young animals? We've been working with cognitive scientists on this.
Emmanuel Dupoux, who is at FAIR in Paris half-time and is also a researcher at a French university, has this chart that shows at how many months of life baby humans learn different concepts — and you can measure this in various ways. Things like distinguishing animate objects from inanimate objects: you can tell the difference at the age of two or three months. Whether an object is going to stay stable or is going to fall: about four months. And then things like gravity — the fact that objects are not supposed to float in the air but are supposed to fall — you learn around the age of eight or nine months. If you look at a lot of eight-month-old babies, you give them a bunch of toys on their high chair, and the first thing they do is throw them on the ground and then watch them. It's because they're learning — actively learning — about gravity.

Gravity, yeah.

Okay, so they're not trying to annoy you, but, you know, they need to do the experiment, right? So how do we get machines to learn like babies — mostly by observation, with a little bit of interaction — and learn those models of the world? Because I think that's really a crucial piece of an intelligent autonomous system. If you think about the architecture of an intelligent autonomous system, it needs to have a predictive model of the world: something that says, here is the state of the world at time T, here is the state of the world at time T plus one if I take this action. And it's not a single answer; it can be —

A distribution, yeah.

Yeah — well, but we don't know how to represent distributions in high-dimensional continuous spaces, so it's got to be something weaker than that, but with some representation of uncertainty. If you have that, then you can do what optimal control theory calls model predictive control, which means that you can run your model with a hypothesis for a sequence of actions and then see the result. Now, the other thing you need is some sort of objective that you want to optimize: am I reaching the goal of grabbing this object, am I minimizing energy, am I whatever, right? So there is some sort of objective that you have to minimize, and in your head, if you have this model, you can figure out the sequence of actions that will optimize your objective. That objective is something that, ultimately, is rooted in your basal ganglia — at least in the human brain, that's what it is: the basal ganglia computes your level of contentment or discontentment.

Discontentment — is that a word?

Unhappiness, okay?

Yeah, discontentment.

And your entire behavior is driven towards minimizing that objective — which is maximizing your contentment as computed by your basal ganglia — and what you have is an objective function which is basically a predictor of what your basal ganglia is going to tell you. You're not going to put your hand in the fire, because you know it's going to burn and you're going to get hurt, and you're predicting this because of your model of the world and your predictor of this objective. So you have those three components — four components, really: you have the hard-wired contentment objective computer, if you want — calculator — and then you have the three components: one is the objective predictor, which basically predicts your level of contentment; one is the model of the world; and there's a third module I didn't mention, which is a module that will figure out the best course of action to optimize the objective given your model.
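Here is a minimal, purely illustrative skeleton of the architecture being described — a hard-wired "contentment" calculator, a trainable predictor of that signal, a predictive world model, and a module that searches for the best course of action. Every name and signature here is hypothetical, not an existing system.

```python
from typing import Callable, Sequence

State, Action = dict, float   # placeholder types for illustration

class Agent:
    def __init__(self,
                 world_model: Callable[[State, Action], State],
                 objective_predictor: Callable[[State], float],
                 candidate_plans: Sequence[Sequence[Action]]):
        # 1) hard-wired "basal ganglia": computes discontentment for the current state
        #    (in a full system, the predictor below would be trained to match this signal)
        self.discontentment = lambda state: state.get("pain", 0.0) + state.get("hunger", 0.0)
        # 2) trainable critic: predicts what the hard-wired module *will* say about a state
        self.objective_predictor = objective_predictor
        # 3) predictive model of the world: state_{t+1} = f(state_t, action_t)
        self.world_model = world_model
        # 4) policy / planner input: candidate action sequences to search over
        self.candidate_plans = candidate_plans

    def act(self, state: State) -> Sequence[Action]:
        """Pick the action sequence whose imagined outcome minimizes predicted discontentment."""
        def imagined_cost(plan):
            s, total = state, 0.0
            for a in plan:
                s = self.world_model(s, a)            # roll the world model forward
                total += self.objective_predictor(s)  # predicted objective for that imagined state
            return total
        return min(self.candidate_plans, key=imagined_cost)
```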
Okay, yeah, cool — it's a policy, a policy network or something like that, right?

Right. Now, you need those three components to act autonomously and intelligently, and you can be stupid in three different ways. You can be stupid because your model of the world is wrong. You can be stupid because your objective is not aligned with what you actually want to achieve — in humans, that would be a psychopath. And the third way you can be stupid is that you have the right model and the right objective, but you're unable to figure out a course of action to optimize your objective given your model. Some people who are in charge of big countries actually have all three that are wrong.

All right — which countries? I don't know. Okay, so if we think about this agent — if you think about the movie "Her" — you've criticized the art project that is Sophia the robot, and what that project essentially does is use our natural inclination to anthropomorphize things that look human. Do you think that could be used by AI systems, like in the movie "Her"? So do you think a body is needed to create a feeling of intelligence?

Well, if Sophia were just an art piece I would have no problem with it, but it's presented as something else.

Let me ask on that comment real quick: if the creators of Sophia could change something about their marketing or behavior in general, what would it be?

Just about everything.

I mean — here's a tough question. So I agree with you: the general public feels that Sophia can do way more than she actually can.

That's right.

And the people who created Sophia are not honestly, publicly communicating, trying to teach the public.

Right.

But here's the tough question: don't you think the same thing is true — that scientists in industry and research are taking advantage of that same misunderstanding in the public when they create AI companies or publish stuff?

Some companies, yes. I mean, for most there is no desire to delude, no desire to over-claim what something does, right? You publish a paper on AI that has this result on ImageNet — it's pretty clear; it's not even interesting anymore. I don't think there is that, and the reviewers are generally not very forgiving of unsupported claims of this type. But there are certainly quite a few startups that have had a huge amount of hype around them that I find extremely damaging, and I've been calling it out when I've seen it.

So, yeah — but to go back to your original question about the necessity of embodiment: I don't think embodiment is necessary; I think grounding is necessary. I don't think we're going to get machines that really understand language without some level of grounding in the real world, and it's not clear to me that language is a high-enough-bandwidth medium to communicate how the real world works.

What does grounding mean to you?

Grounding means that — well, there is this classic problem of common-sense reasoning, the Winograd schema, right? I tell you, "the trophy doesn't fit in the suitcase because it's too big," or "the trophy doesn't fit in the suitcase because it's too small," and the "it" in the first case refers to the trophy, in the second case to the suitcase. And the reason you can figure this out is because you know what a trophy and a suitcase are.
You know one is supposed to fit inside the other, you know the notion of size, and that a big object doesn't fit in a small object — unless it's a TARDIS — things like that, right? So you have this knowledge of how the world works, of geometry, and things like that. I don't believe you can learn everything about the world just by being told in language how the world works. I think you need some low-level perception of the world — be it visual, touch, whatever — some higher-bandwidth perception of the world.

But by reading all the world's text, you still may not have enough information.

That's right. There are a lot of things that just will never appear in text and that you can't really infer. So I think common sense will emerge from, certainly, a lot of language interaction, but also from watching videos, or perhaps even interacting in virtual environments, and possibly from robots interacting in the real world — though I don't actually believe that this last one is absolutely necessary. But I think there's a need for some grounding.

But the final product doesn't necessarily need to be embodied, you're saying.

No. It just needs to have an awareness — a grounding.

Right, but it needs to know how the world works in order to not be frustrating to talk to.

And you've talked about emotions being important. That's a whole other topic.

Well, so, you know, I talked about the basal ganglia as the thing that calculates your level of contentment or discontentment, and then there is this other module that tries to predict whether you're going to be content or not. That's the source of some emotions. Fear, for example, is an anticipation of bad things that can happen to you: you have this inkling that there is some chance something really bad is going to happen to you, and that creates fear. When you know for sure that something bad is going to happen to you, you kind of give up, right? It's not fear anymore — it's the uncertainty that creates fear. So the punchline is: yes, we're not going to have autonomous intelligence without emotions, whatever the heck emotions are.

So you mentioned very practical things like fear, but there's a lot of other mess around it.

But they are kind of the results of, you know, drives.

Yeah, there's deeper biological stuff going on, and I've talked to a few folks on this — there's fascinating stuff that ultimately connects to our brain. If we create an AGI system — sorry, a human-level intelligence system — and you get to ask her one question, what would that question be?

You know, I think the first one we'll create will probably not be that smart. It'll be like a four-year-old.

Okay, so you would have to ask her a question knowing she's not that smart.

Yeah. Well, what's a good question to ask? You know: "what causes wind?" And if she answers, "oh, it's because the leaves of the trees are moving and that creates wind," she's onto something. And if she says, "that's a stupid question," she's really obtuse. And then you tell her, "actually, you know, here is the real thing," and she says, "oh yeah, that makes sense."

So questions that reveal the ability to do common-sense reasoning about the physical world.

Yeah, and, you know, ultimately causal inference.

Well, it was a huge honor. Congratulations on the Turing Award, and thank you so much for talking today.

Thank you.