deeplearning.ai's Heroes of Deep Learning - Yoshua Bengio

**The Power of Practice: Advice from Yoshua Bengio on Deep Learning**

Yoshua Bengio is a renowned expert in deep learning and artificial intelligence. In this special recording with Andrew Ng for deeplearning.ai's Heroes of Deep Learning series, he shares his insights and advice on how to become proficient in deep learning, whether you want to pursue a career as a researcher or as an engineer who uses deep learning to build products.

**Understanding the Phenomena of Interest**

Bengio argues that understanding the phenomena of interest is crucial in deep learning. He asks, for example, what makes training deeper networks or recurrent nets harder. We have some ideas, but there is still much we don't understand, so designing experiments whose goal is not to develop better algorithms but to better understand the existing ones can be incredibly valuable. Such experiments can reveal under what circumstances particular algorithms work better, and why.
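
To make this concrete, here is a toy experiment in that spirit. It is our own illustration, not an experiment from the interview, and it asks one narrow question: how much gradient reaches the input of a deep chain of one-unit layers when the nonlinearity is a sigmoid versus a ReLU? The chain structure, the weight range [0.5, 1.5], the input value 0.5, and the function name `input_gradient` are all hypothetical choices made for illustration.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def input_gradient(depth: int, activation: str, seed: int = 0) -> float:
    """Return |d output / d input| for a chain of `depth` one-unit layers.

    Each layer computes h = act(w * h_prev), with w drawn uniformly from
    [0.5, 1.5] (an arbitrary, hypothetical choice). The same seed yields the
    same weights for both activations, so only the nonlinearity differs.
    """
    rng = random.Random(seed)
    h = 0.5      # positive input, so every ReLU unit stays on its active side
    grad = 1.0   # running product built up by the chain rule
    for _ in range(depth):
        w = rng.uniform(0.5, 1.5)
        z = w * h
        if activation == "sigmoid":
            h = sigmoid(z)
            local = h * (1.0 - h)          # sigmoid'(z), never larger than 1/4
        elif activation == "relu":
            h = max(0.0, z)
            local = 1.0 if z > 0 else 0.0  # ReLU'(z): 1 when active, 0 on the flat part
        else:
            raise ValueError(f"unknown activation: {activation}")
        grad *= w * local                  # chain rule: dh/dh_prev = w * act'(z)
    return abs(grad)


for depth in (1, 5, 10, 20):
    print(f"depth={depth:2d}  sigmoid={input_gradient(depth, 'sigmoid'):.2e}  "
          f"relu={input_gradient(depth, 'relu'):.2e}")
```

Each sigmoid layer multiplies the backward signal by its weight times a derivative that never exceeds 1/4, so the gradient shrinks geometrically with depth, while the ReLU chain (kept on its active side here) multiplies only by the weights. That is one small way to see the vanishing-gradient difficulty Bengio mentions, and unrolling a recurrent net over time produces a similar chain. The toy deliberately ignores width, initialization schemes, and ReLU's flat region, so that it isolates one question and the answer is easy to see.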

**The Importance of Science**

At its core, science is about understanding the "why" behind a phenomenon. Bengio stresses that it's essential to ask questions like "what makes this work?" or "why does this algorithm perform better in certain situations?" This curiosity-driven approach is what drives scientific progress. For Bengio, that understanding doesn't have to start as formal math: what matters most is an argument that convinces someone a method should work, and math then makes that argument stronger and tighter. By seeking answers and understanding the underlying principles, we can develop more effective algorithms and models.

**Motivations for Becoming a Deep Learning Researcher**

Bengio notes that becoming a deep learning researcher requires a different level of understanding than being an engineer who uses deep learning to build products. In both cases, however, practice is essential: you have to read a lot and practice by programming things yourself. He warns against relying solely on software frameworks that let you plug and play without understanding what is happening, because that makes it hard to figure out what's going wrong when something doesn't work. Instead, he recommends implementing algorithms yourself, even inefficiently, and, better still, deriving them from first principles.
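
As one small example of implementing something yourself and deriving it from first principles, the sketch below (our own illustration, not code from the interview) trains a single logistic unit with plain gradient descent. The gradient is derived by hand: for a sigmoid output with cross-entropy loss, the derivative of the loss with respect to the pre-activation z is simply sigmoid(z) - y, and the chain rule then gives the weight and bias gradients. That hand-derived gradient is checked against central finite differences. The toy AND dataset, the learning rate, the step count, and all function names are hypothetical choices.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def logit(w, b, x):
    """Pre-activation z = w . x + b of a single logistic unit."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def loss(w, b, xs, ys):
    """Mean binary cross-entropy, written as softplus(z) - y*z for numerical stability."""
    total = 0.0
    for x, y in zip(xs, ys):
        z = logit(w, b, x)
        softplus = max(z, 0.0) + math.log1p(math.exp(-abs(z)))
        total += softplus - y * z
    return total / len(xs)


def analytic_grad(w, b, xs, ys):
    """Hand-derived gradient: for sigmoid + cross-entropy, d(loss)/dz = sigmoid(z) - y."""
    gw = [0.0] * len(w)
    gb = 0.0
    for x, y in zip(xs, ys):
        err = sigmoid(logit(w, b, x)) - y
        for j, xj in enumerate(x):
            gw[j] += err * xj  # chain rule: dz/dw_j = x_j
        gb += err              # dz/db = 1
    n = len(xs)
    return [g / n for g in gw], gb / n


def numerical_grad(w, b, xs, ys, eps=1e-6):
    """Central finite differences, used only to check analytic_grad."""
    gw = []
    for j in range(len(w)):
        w_plus, w_minus = list(w), list(w)
        w_plus[j] += eps
        w_minus[j] -= eps
        gw.append((loss(w_plus, b, xs, ys) - loss(w_minus, b, xs, ys)) / (2 * eps))
    gb = (loss(w, b + eps, xs, ys) - loss(w, b - eps, xs, ys)) / (2 * eps)
    return gw, gb


def train(xs, ys, lr=1.0, steps=5000):
    """Plain full-batch gradient descent from a zero initialization."""
    w, b = [0.0] * len(xs[0]), 0.0
    for _ in range(steps):
        gw, gb = analytic_grad(w, b, xs, ys)
        w = [wi - lr * gi for wi, gi in zip(w, gw)]
        b -= lr * gb
    return w, b


# Hypothetical toy data: logical AND, which a single logistic unit can separate.
xs = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
ys = [0, 0, 0, 1]

print(analytic_grad([0.3, -0.7], 0.1, xs, ys))
print(numerical_grad([0.3, -0.7], 0.1, xs, ys))
w, b = train(xs, ys)
print([round(sigmoid(logit(w, b, x)), 3) for x in xs])
```

The finite-difference check is the habit worth keeping: if the hand-derived gradient and the numerical estimate disagree, either the derivation or the code is wrong, which is exactly the kind of problem that is hard to diagnose when a framework computes gradients for you.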

**Reading Materials**

Bengio recommends reading widely, both books and academic papers. Ng points to the very highly regarded Deep Learning book that Bengio co-wrote with Ian Goodfellow and Aaron Courville, which Bengio says is selling remarkably well. For papers, Bengio singles out the proceedings of ICLR (the International Conference on Learning Representations) as probably the best concentrated source of good papers: reading the last few years of ICLR proceedings gives a good view of the field. He adds that NIPS, ICML, and other conferences also publish very good papers.

**The Role of Mathematics**

Bengio's advice on math is not to be intimidated by it: develop the intuitions first, and the math becomes much easier to understand once you get the hang of what's going on at the intuitive level. He also stresses that you don't need five years of a PhD to become proficient. With a good background in computer science and math, you can learn enough to use deep learning, build things, and start research experiments in a few months, something like six months, even without prior machine learning experience.

**Continuous Learning**

Whether you're pursuing a career as a researcher or an engineer, Bengio's advice comes down to sustained practice: read, look at other people's code, write your own code, run a lot of experiments, and make sure you understand everything you do. Especially on the science side, keep asking why you, and other people, are doing things a certain way. Sometimes the answer is in a book, but it's even better if you can figure it out yourself. This is how you build intuitions about how algorithms work and learn to tell what's going wrong when they don't.

**Getting Started**

Finally, for those who want to get started, Bengio emphasizes having the right foundation in computer science and math. What you learn in computer science courses alone is sometimes not enough: you also need continuous math, especially probability, algebra, optimization, and calculus. By following these tips and staying committed to continuous learning, you can become proficient in deep learning and start building projects that make a meaningful impact.

**Conclusion**

Yoshua Bengio's advice on deep learning highlights the importance of practice, curiosity-driven scientific inquiry, and continuous learning. Whether you're pursuing a career as a researcher or an engineer, developing your skills in deep learning requires dedication, persistence, and a willingness to learn. By following Bengio's tips and staying committed to lifelong learning, you can make meaningful contributions to the field.

"Hi Yoshua, I'm really glad you could join us today. Today you're not just a researcher or engineer in deep learning; you've become one of the institutions and one of the icons of deep learning. But I'd really like to hear the story of how it started. How did you end up getting into deep learning and then pursuing this journey?

Well, actually it started when I was a kid, an adolescent, reading a lot of science fiction, like I guess many of us. When I started my graduate studies in 1985, I started reading neural net papers, and that's where I got all excited, and it became really a passion.

And what was that like, in the mid-eighties, 1985, reading these papers? Do you remember?

Yeah. Coming from the courses I had taken in classical AI, with expert systems, and suddenly discovering that there was this whole world of thinking about how humans might be learning, about human intelligence, and about how we might draw connections between that and artificial intelligence and computers, that was really exciting for me. When I discovered this literature, I started reading the connectionists, of course, so the papers from Geoff Hinton, Rumelhart, and so on. I worked on recurrent nets, I worked on speech recognition, I worked on HMMs, so graphical models, and then I quickly moved to AT&T Bell Labs and MIT, where I did postdocs. That's where I discovered some of the issues with long-term dependencies in training neural nets. Shortly after, I got recruited at UdeM, back in Montreal, where I had spent most of my childhood and adolescent years.

So as someone who's been there for the last several decades and has seen a lot of it, tell me about how your thinking about deep learning and neural networks has evolved over this time.

We start with experiments, with intuitions, and theory sort of comes later, and we now understand
a lot better, for example, why backprop is working so well and why depth is so important. These are notions we didn't have any solid justification for in those days. When we started working on deep nets in the early 2000s, we had the intuition that it made a lot of sense that a deeper network should be more powerful, but we didn't know how to take that and prove it, and of course our experiments initially didn't work.

What were the most important things that you think turned out to be right, and what were the biggest surprises, the things that turned out to be wrong, compared to what we knew thirty years ago?

Oh, sure. One of the biggest mistakes I made was to think, like everyone else in the 90s, that you needed smooth nonlinearities in order for backprop to work. I thought that if we had something like rectifying nonlinearities, where you have a flat part, it would be really hard to train, because the derivative would be zero in so many places. When we started experimenting with ReLU in deep nets around 2010, I was obsessed with the idea that we should be careful about whether neurons would saturate too much on the zero part. But in the end it turned out that the ReLU was working a lot better than the sigmoids and tanh, and that was a big surprise. We were exploring this because of the biological connection, actually, not because we thought it would be easier to optimize, but it turned out to work better, whereas I thought it would be harder to train.

So let me ask you, what is the relationship between deep learning and the brain? There's the obvious answer, but I'm curious what your answer to that is.

Well, the initial insight that really got me excited about neural nets was this idea from the connectionists that information is distributed across the activation of many neurons, rather than being represented by grandmother cells, as they were called, or by a symbolic representation, which was the traditional view in classical AI. I still believe this is a really important thing, and I see people rediscovering the importance of that even recently. So that was really a foundation. The depth thing is something that came later, in the 2000s; it wasn't something I was thinking about in the 90s.

For example, I remember you built a lot of relatively shallow but very distributed representations for word embeddings very, very early on.

That's right. That's one of the things I got really excited about in the late 90s. My brother Samy and I worked on the idea that we could use neural nets to tackle the curse of dimensionality, which was believed to be one of the central issues in statistical learning, and on the fact that we could use these representations to represent joint distributions over many random variables in a very efficient way. It turned out to work quite well. Then I extended this to joint distributions over sequences of words, and this is how word embeddings were born, because I thought this would allow generalization across words that have similar semantic meaning, and so on.

So over the last couple of decades, your research group has invented more ideas than anyone can summarize in a few minutes. I'm curious, what are the inventions or ideas you're most proud of from your group?

Right. I think I mentioned long-term dependencies, the study of that; I think people still don't understand it well enough. Then there's the story I mentioned about the curse of dimensionality and joint distributions with neural nets, which more recently became NADE, which Hugo Larochelle did, and, as I said, that also gave rise to the work on learning word embeddings for joint distributions over words. Then came, probably the best known, the work we did on deep learning with
stacks of autoencoders and stacks of RBMs. Another thing was the work on better understanding the difficulties of training deep nets, with Xavier Glorot, with the initialization ideas and also the vanishing gradient in deep nets, and that work actually gave rise to the experiments showing the importance of piecewise linear activation functions. Then I would say some of the most important work regards unsupervised learning: the denoising autoencoders, and the GANs, the generative adversarial networks, which are very popular these days. The work we did on neural machine translation using attention turned out to be really important for making translation work, and it is currently used in industrial systems like Google Translate. This attention thing actually really changed my views of neural nets. We used to think of them as machines that can map a vector to a vector, but really, with attention mechanisms, you can now handle any kind of data structure, and this is really opening up interesting avenues. In the direction of connecting to biology, one thing I've been working on in the last couple of years is how we could come up with something like backprop but that brains could implement. We have a few papers in that direction that seem to be interesting for the neuroscience people, and we're continuing in that direction.

Of course, one of the topics that I know you've been thinking about is the relationship between deep learning and the brain. Can you tell us a bit more about that?

The biological thing is something I've been thinking about for a while, actually, and daydreaming about a lot, I would say, because I think it's a bit like a puzzle. We have these pieces of evidence from what we know about the brain and about learning in the brain, like spike-timing-dependent plasticity, and on the other hand we have all these concepts from machine learning: the idea of
globally training the whole system with respect to an objective function, and the idea of backprop. And what does backprop mean? What does credit assignment really mean? When I started thinking about how brains could do something like backprop, it prompted me to think that maybe there are some more general concepts behind backprop which make it so efficient, and maybe there's a larger family of ways to do credit assignment. That connects to questions that people in reinforcement learning have been asking. So it's interesting how sometimes asking a simple question leads you to think about so many different things and forces you to bring together so many elements, like a big puzzle. This has gone on for a number of years. I need to say that this whole endeavor, like many of the ones that have followed, has been highly inspired by Geoff Hinton's thoughts. In particular, he gave a talk in 2007, I think at the first deep learning workshop, on what he thought was the way the brain works, on how a kind of temporal code could potentially be used to do some of the job of backprop, and that led to a lot of the ideas I've explored in recent years. So it's kind of an interesting story that has been running for a decade now, basically.

One of the topics I've heard you speak about multiple times as well is unsupervised learning. Can you share your perspective on that?

Yes. Unsupervised learning is really important. Right now our industrial systems are based on supervised learning, which essentially requires humans to define what the important concepts are for the problem and to label those concepts in the data. We build all these amazing toys and services and systems using this, but humans are able to do much more. They are able to explore and discover new concepts by observation and interaction with the
world. A two-year-old is able to understand intuitive physics. In other words, she understands gravity, she understands pressure, she understands inertia, she understands liquids and solids, and of course her parents never told her about any of this stuff. So how did she figure it out? That's the kind of question unsupervised learning is trying to answer. It's not just about whether we have labels or don't have labels; it's about actually building a mental construction that explains how the world works, by observation. More recently I've been combining the ideas of unsupervised learning with the ideas of reinforcement learning, because I believe there's a very strong signal about the important underlying concepts we're trying to disentangle, to separate from each other, that a human or a machine can get by interacting with the world: by exploring the world, trying things, and trying to control things. These are, I think, tightly coupled to the original ideas of unsupervised learning. My take on unsupervised learning 15 years ago, when we started doing the autoencoders and RBMs and so on, was very focused on the idea of learning good representations, and I still think this is a central question. But the thing we don't know is what a good representation is, and how we figure out an objective function for it, for example. So we've tried many things over the years, and that's actually one of the cool things about unsupervised learning research: there are so many different ideas, different ways this problem can be attacked, and maybe there's another one we'll discover next year that's completely different, and maybe the brain is using something else completely different. So it's not incremental research; it's something that in itself is very exploratory. We don't have a good definition of what the right objective function is to even measure that a system is doing a good job on
unsupervised learning. So of course it's challenging, but at the same time it leaves open a wide field full of possibilities, which is what researchers really love; at least, that's something that appeals to me.

Today there's so much going on in deep learning, and I think we've passed the point where it's possible for any one human to read every single deep learning paper being published. So I'm curious, what in deep learning today excites you the most?

So I'm very ambitious, and I feel like the current state of the science of deep learning is far from where we would like to see it. I have the impression that our systems right now make the kind of mistakes that suggest they have a very superficial understanding of the world. So what excites me the most now is a direction of research where we're not trying to build systems that do something useful; we're just going back to principles about how a computer can observe the world, interact with the world, and discover how that world works, even if that world is simple, something we can program as a kind of video game. We don't know how to do that well, and that's cool, because I don't have to compete with Google and Facebook and Baidu and so on: this is the kind of basic research that can be done by anyone in their garage and could change the world. There are of course many directions to attack this, but I see the fruitful interactions between ideas in deep learning and reinforcement learning being really important there. And I'm really excited that progress in this direction could have a huge impact on practical applications, actually, because if you look at some of the big challenges we have in applications, like how we deal with new domains or categories for which we have too few examples, in cases where humans are very good at solving those problems, these transfer learning and generalization issues
they would become much easier to tackle if we had systems with a better understanding of how the world works, a deeper understanding: what is actually going on, what are the causes of what I'm seeing, and how could I influence what I'm seeing by my actions. These are the kinds of questions I'm really excited about these days. I think they also connect the deep learning research that has evolved over the last couple of decades with even older questions in AI, because a lot of the success in deep learning has been with perception. So what's left? What's left is sort of high-level cognition, which is about understanding at an abstract level how things work. Our program of understanding high-level abstractions has, I think, not reached those high levels of abstraction yet, and so we have to get there. We have to think about reasoning, about sequential processing of information, about how causality works, and about how machines can discover all these things by themselves, guided by humans, but as much as possible in an autonomous way.

And it sounds like, from what you said, you're a fan of research approaches where you experiment on, I'm going to use the term toy problems, not in a disparaging way, small problems, and you're optimistic that this transfers to bigger problems later.

Yes, yes. And I mean, it transfers in a meta way, right? Of course we're going to have to do some work to scale up and address those problems, but my main motivation for going for those toy problems is that we can better understand our failures, and we can reduce the problem to something we can intuitively manipulate and understand more easily. So it's sort of the classical divide-and-conquer science approach. And also, something I think people don't think about enough is that the research cycle can be much faster. If I can do an experiment in a few hours, I can
progress much faster. If I have to try a huge model that's trying to capture the whole of common sense and everything in general knowledge, which eventually we'll do, each experiment just takes too much time with current hardware. So while our hardware friends are building machines that are going to be a thousand or a million times faster, I'm doing those toy experiments.

I've heard you also speak about the science of deep learning: not just as an engineering discipline, but doing more work to understand what's really going on. Can you share your thoughts on that?

Absolutely. I fear that a lot of the work being done is sort of like blind people trying to find their way. You can get a lot of luck and find interesting things that way, but really, we should stop a little bit and try to understand what we're doing in a way that's transferable, because we go down to principles, to theory. But when I say theory, I don't necessarily mean math. Of course I like math and so on, but I don't think everything needs to be formalized mathematically; it should be formalized logically, in the sense that I can convince somebody that this should work, that this makes sense. This is the most important aspect, and then math allows us to make that stronger and tighter. But really it's more about understanding, and it's also about doing our research not to beat the next baseline or benchmark, or to beat the other guys in the other lab or the other company. It's more about what kind of questions we should ask that will allow us to better understand the phenomena of interest, like, for example, what makes training deeper networks or recurrent nets harder. We have some ideas, but there are a lot of things we don't understand yet. So we can maybe design experiments whose goal is not to have a better algorithm, but just to better understand the
algorithms we currently have, or what circumstances make a particular algorithm work better, and then why. I mean, it's the why that really matters. That's what science is about: it's why.

Today there are a lot of people who want to enter the field, and I'm sure you've answered this a lot in one-on-one settings, but with all the people watching this on video, what advice would you have for people who want to get into AI, get into deep learning?

So first of all, there are different motivations and different things you could do. What you need to become a deep learning researcher may not be the same as if you want to be an engineer who's going to use deep learning to build products. There's a different level of understanding needed in the two cases. But in any case, in both cases, practice. To really master a subject like deep learning, of course you have to read a lot, and you have to practice by programming the things yourself. Very often I interview students who have used software, and these days there's so much good software around that you can just plug and play and understand nothing of what you're doing, or understand it at such a superficial level that it becomes hard to figure out what's going wrong when it doesn't work. So actually trying to implement things yourself, even if it's inefficient, just to make sure you really understand what is going on, is really useful.

So don't just use one of the programming frameworks where you can do everything in a few lines of code but don't really know what just happened.

Exactly, exactly. And I would say even more than that: try to derive the thing yourself from first principles if you can. That really helps. Then there are the usual things you have to do, like reading, looking at other people's code, writing your own code, doing a lot of experiments, and making sure you understand everything you do.
I mean, especially for the science part of it, try to ask: why am I doing this? Why are people doing this? Maybe the answer is somewhere in a book and you have to read more, but it's even better if you can actually figure it out by yourself.

And in fact, of the things to read, you, Ian Goodfellow, and Aaron Courville wrote a very highly regarded book.

Thank you, thank you. Yes, it's selling a lot. It's a bit crazy; I feel like there are more people reading this book than people who can read it right now. But also, the proceedings of the ICLR conference are probably the best concentrated place for good papers. Of course there are really good papers at NIPS and ICML and other conferences, but if you really want to go for a lot of good papers, just read the last few ICLR proceedings, and that will give you a really good view of the field.

Any other thoughts, when people ask you for advice on how someone becomes good at deep learning?

Well, it depends on where you come from. Don't be afraid of the math. Just develop the intuitions, and then the math becomes much easier to understand once you get the hang of what's going on at the intuitive level. And one piece of good news is that you don't need five years of a PhD to become proficient at deep learning. You can actually learn pretty quickly. If you have a good background in computer science and math, you can learn enough to use it, build things, and start research experiments in just a few months, something like six months, for people with the right training. Maybe they don't know anything about machine learning, but if they're good in math and computer science, it can be very fast. And of course that means you need to have the right training in math and computer science. Sometimes what you learn in just computer science courses is not enough; you need some continuous math, especially probability, algebra, and
optimization for example and calculus in calculus yeah thanks a lot you're sure for sharing all the comments and insights and advice even though I've learned even though I've known you for a long time there are many details of your early history that I didn't know until now so thank you well thank you Andrew for doing this this special recording and what you're doing and well I hope it's going to be used by a lot of peoplehow y'all sure I'm really guy you could join us yesterday I'm very glad to you know today you're not just saved researcher or engineer in deep learning you've become one of the institution's and one of the icons of deep learning but really like to hear the story of how it started so how did you end up you know getting into deep learning and then pursuing this journey right well actually it started when I was a kid adolescent reading a lot of science fiction like I guess many of us and when I started my graduate studies in 1985 I started reading you know led papers and and that's where I got all excited and it became really a passion and and actually what was that like in what mid eighties right 1985 reading these papers do you remember yeah it was well you know coming from the courses I had taken in classical ai with expert systems and suddenly discovering that there was all this world of thinking about how humans might be learning and human intelligence and how we might draw connections between that and and artificial intelligence and computers that was really exciting for me when I discovered this literature and I started reading the connectionists of course so the papers from geoff hinton real art and so on and I worked on recurrent Nets I worked on speech recognition I worked on hmmso graphical models and then quickly moved to AT&T Bell Labs and MIT where I did postdocs and that's where I discovered some of the issues with the long term dependencies with training neural nets and then shortly after I got recruited udm back in Montreal where I 
had spent most of my child I know lessons years um you know and so as someone who's been there for the last several decades and seen you know seeing it also he's seen a lot of it um tell me about how you're thinking about deep learning about neural networks has evolved over this over this time we start with experiments with intuitions and and theory sort of comes later and we now understand a lot better for example why backdrop is working so well why depth is so important and and you know these kinds of notions we we didn't have any solid justification for in those days when when we started working on deep nets in the early 2000s we had the intuition that we made a lot of sense that a deeper network should be more powerful and we didn't know how to you know take that and prove it and of course our experiments initially didn't work and actually well there were the most important things that you think turned out to be right and what were the biggest surprises of what turns out to be wrong you know compared to what we knew thirty years ago oh sure so one of the biggest mistake I made was to think that like everyone else in the 90s thought you needed smooth nonlinearities in order to for backprop to work because I thought that you know if if we had something like rectifying non-linearity's where you have a flat part that it would be really hard to train because the derivative would be zero in so many places and when we started experimented with experimenting with relu with deepness in around 2010 you know III was obsessed with the idea that oh we should be careful about whether neurons won't saturate too much on the zero part but in the end it turned out that actually the relu was working a lot better than the sigmoids and attached and that was a big surprise we did this exploring this because of the biological connection actually not because we thought that it would be easier to optimize but it turned out work better whereas I thought it would be harder to drink so 
let me ask you what is the relationship between deep learning and the brain there's the obvious answer but I'm curious was your answer to that ah well the initial insight that really got me excited with neural nets was this idea from the connectionists that information is distributed across the activation of many neurons rather than being represented by and sort of the grandmother cells that were calling it were a symbolic representation that was the traditional view in classical yet right and I still believe this is a really important thing and I see people we just discovering the importance of that even even though recently so so that's that was really a foundation the death thing is something that came later in the thousands it wasn't something I was thinking about it and I nice for sure for example I remember you build a lot of relatively shallow but very disputed representations for the word embeddings very very early on then that's right yeah that's that's one of the things that I got really excited about in the late 90s was actually my brother Sonny and I worked on the idea that we could use neon Nets to tackle the curse of dimensionality which was believed to be one of the central issues with statistical learning and the fact that we could have these representations could be used to represent joint distributions over many random variables in a very efficient way and it turned out to work quite well and then I extended this to joint solutions over sequences of words and this is how the word embeddings were born because I thought oh you know this will allow analyzation across words that have similar semantic meaning and Poussin so over the last you know a couple decades your research group has invented more ideas than anyone can summarize in a few minutes sound Chris what are the inventions or ideas yeongho's proud of from your group right so I think I mentioned long-term dependencies the study of that I think people still don't understand it well enough then 
there's the story I mentioned about the curse of dimensionality and joint distributions with neural nets, which more recently became NADE, which Hugo Larochelle did. As I said, that also gave rise to the work on learning word embeddings for joint distributions over words. Then came what is probably the best-known of our work: deep learning with stacks of autoencoders and stacks of RBMs. Then there was the work with Xavier Glorot on better understanding the difficulties of training deep nets, with the initialization ideas and also the vanishing gradient in deep nets, and that work actually gave rise to the experiments showing the importance of piecewise linear activation functions. Then I would say some of the most important work regards unsupervised learning: the denoising autoencoders, and the GANs, the generative adversarial networks, which are very popular these days. And the work we did on neural machine translation using attention, which turned out to be really important for making translation work and is currently used in industrial systems like Google Translate. This attention thing actually really changed my views on neural nets. We used to think of neural nets as machines that map a vector to a vector, but with attention mechanisms you can now handle any kind of data structure, and this is really opening up interesting avenues. In the direction of actually connecting to biology, one thing I've been working on in the last couple of years is how we could come up with something like backprop that brains could implement. We have a few papers in that direction that seem to be interesting to neuroscience people, and we're continuing in that direction.

Of course, one of the topics I know you've been thinking a lot about is the relationship between deep learning and the brain. Can you tell us a bit more about that?

The biological thing is something
I've been thinking about for a while, actually, and, I would say, daydreaming about a lot, because it's a bit like a puzzle. We have these pieces of evidence from what we know about the brain and about learning in the brain, like spike-timing-dependent plasticity, and on the other hand we have all these concepts from machine learning: the idea of globally training the whole system with respect to an objective function, and the idea of backprop. And what does backprop mean? What does credit assignment really mean? When I started thinking about how brains could do something like backprop, it prompted me to think that maybe there are some more general concepts behind backprop that make it so efficient. Maybe there's a larger family of ways to do credit assignment, and that connects to questions that people in reinforcement learning have been asking. It's interesting how sometimes asking a simple question leads you to think about so many different things, and forces you to consider so many elements that you'd like to bring together, like a big puzzle. This has gone on for a number of years. I need to say that this whole endeavor, like many of the ones that followed, has been highly inspired by Geoff Hinton's thoughts. In particular, he gave a talk in 2007, I think at the first deep learning workshop, on what he thought was the way the brain works, and how a kind of temporal code could potentially be used to do some of the job of backprop. That led to a lot of the ideas I've explored in recent years. So it's an interesting story that has been running for about a decade now.

One of the topics I've heard you speak about multiple times is unsupervised learning. Can you share your perspective on that?

Yes. Unsupervised learning is really important. Right now our
industrial systems are based on supervised learning, which essentially requires humans to define what the important concepts are for the problem and to label those concepts in the data. We build all these amazing toys, services, and systems using this, but humans are able to do much more. They are able to explore and discover new concepts through observation and interaction with the world. A two-year-old is able to understand intuitive physics: she understands gravity, pressure, inertia, liquids and solids, and of course her parents never told her about any of this. So how did she figure it out? That's the kind of question unsupervised learning is trying to answer. It's not just about whether we have labels or not; it's about actually building a mental construction that explains how the world works through observation. More recently I've been combining ideas from unsupervised learning with ideas from reinforcement learning, because I believe that interacting with the world gives a human or a machine very strong indications about the important underlying concepts we're trying to disentangle, to separate from each other: by exploring the world, trying things, and trying to control things. So I think these are tightly coupled to the original ideas of unsupervised learning. My take on unsupervised learning 15 years ago, when we started doing the autoencoders, the RBMs, and so on, was very focused on the idea of learning good representations. I still think this is a central question, but the thing we don't know is what a good representation is. How do we figure out an objective function for it, for example? We've tried many things over the years, and that's actually one of the cool things about unsupervised learning research: there are so many different ideas, so many different ways this problem can be attacked. And then
there's maybe another one we'll discover next year that's completely different, and maybe the brain is using something else entirely. So it's not incremental research; it's something that in itself is very exploratory. We don't even have a good definition of the right objective function to measure whether a system is doing a good job at unsupervised learning. So of course it's challenging, but at the same time it leaves open a wide field full of possibilities, which is what researchers really love. At least, that's something that appeals to me.

Today there's so much going on in deep learning, and I think we've passed the point where it's possible for any one human to read every single deep learning paper being published. I'm curious, what in deep learning today excites you the most?

I'm very ambitious, and I feel like the current state of the science of deep learning is far from where we want it to be. I have the impression that our systems right now make the kind of mistakes that suggest they have a very superficial understanding of the world. So what excites me the most now is the direction of research where we're not trying to build systems that do something useful; we're just going back to principles about how a computer can observe the world, interact with the world, and discover how that world works. Even if that world is simple, something we can program as a kind of video game, we don't know how to do that well. And that's cool, because I don't have to compete with Google and Facebook and Baidu and so on, right? This is the kind of basic research that can be done by anyone in their garage and could change the world. There are of course many directions to attack this, but I see a lot of fruitful interaction between ideas in deep learning and reinforcement learning being really important there. And I'm
really excited that progress in this direction could actually have a huge impact on practical applications. If you look at some of the big challenges we have in applications, like how we deal with new domains or categories for which we have too few examples, in cases where humans are very good at solving those problems, these transfer learning and generalization issues would become much easier to tackle if we had systems with a better, deeper understanding of how the world works: what is actually going on, what are the causes of what I'm seeing, and how could I influence what I'm seeing by my actions. These are the kinds of questions I'm really excited about these days, and I think they also connect the deep learning research of the last couple of decades with even older questions in AI, because a lot of the success in deep learning has been with perception. So what's left? What's left is high-level cognition, which is about understanding at an abstract level how things work. I think our program of understanding high-level abstractions has not yet reached those high levels of abstraction, and we have to get there. We have to think about reasoning, about sequential processing of information, about how causality works, and about how machines can discover all these things by themselves, perhaps with some guidance from humans but as much as possible autonomously.

From what you said, it sounds like you're a fan of research approaches where you experiment on, and I'm going to use the term toy problems, not in a disparaging way, small problems, and you're optimistic that this transfers to bigger problems later.

Yes, and it transfers in a meta way, right? Of course we're going to have to do some work to scale up and address those problems, but my main motivation for going for those toy problems is that we can
better understand our failures, and we can reduce the problem to something we can intuitively manipulate and understand more easily: the classical divide-and-conquer approach of science. Also, something I think people don't consider enough is that the research cycle can be much faster. If I can do an experiment in a few hours, I can progress much faster than if I have to train a huge model that tries to capture all of common sense and general knowledge. We'll eventually do that; it's just that each such experiment takes too much time with current hardware. So while our hardware friends are building machines that will be a thousand or a million times faster, I'm doing those toy experiments.

You've also spoken about the science of deep learning, not just as an engineering discipline, but doing more work to understand what's really going on. Can you share your thoughts on that?

Absolutely. I fear that a lot of the work we're doing is sort of like blind people trying to find their way. You can get lucky and find interesting things that way, but really, we should stop a little and try to understand what we're doing in a way that's transferable, because we go down to principles, to theory. But when I say theory, I don't necessarily mean math. Of course I like math, but I don't think we need everything to be formalized mathematically; it should be formalized logically, in the sense that I can convince somebody that this should work, that this makes sense. That is the most important aspect, and math then allows us to make it stronger and tighter. But really it's more about understanding, and about doing our research not to beat the next baseline or benchmark, or the other guys in the other lab or the other company. It's more about what
kind of questions we should ask that will allow us to better understand the phenomena of interest, like what makes, for example, training deeper networks or recurrent nets harder. We have some ideas, but there are a lot of things we don't understand yet. So we can design experiments whose goal is not to have a better algorithm but just to better understand the algorithms we currently have, or under what circumstances a particular algorithm works better, and then why. It's the why that really matters. That's what science is about: the why.

Today there are a lot of people who want to enter the field, and I'm sure you've answered this a lot in one-on-one settings, but for all the people watching this on video, what advice would you have for people who want to get into AI, into deep learning?

First of all, there are different motivations and different things you could do. What you need to become a deep learning researcher may not be the same as if you want to be an engineer who's going to use deep learning to build products. A different level of understanding is needed in the two cases. But in both cases, practice, practice. To really master a subject like deep learning, of course you have to read a lot, but you also have to practice by programming things yourself. Very often I interview students who have used software, and these days there's so much good software around that you can just plug and play and understand nothing of what you're doing, or understand it at such a superficial level that it becomes hard to figure out what's going wrong when it doesn't work. So actually trying to implement things yourself, even if it's inefficient, just to make sure you really understand what is going on, is really useful.

So don't just use one of the programming frameworks where you can do everything in a few lines of code but you don't really
know what just happened?

Right, exactly. And I would say even more than that: try to derive the thing yourself from first principles, if you can. That really helps. But then there are the usual things you have to do: reading, looking at other people's code, writing your own code, doing a lot of experiments, and making sure you understand everything you do. Especially for the science part of it, keep asking: why am I doing this? Why are people doing this? Maybe the answer is somewhere in a book and you have to read more, but it's even better if you can actually figure it out by yourself.

And in fact, among the things to read, you, Ian Goodfellow, and Aaron Courville wrote a very highly regarded book.

Thank you. Yes, it's selling a lot; it's a bit crazy. I feel like there are more people reading this book than people who can really read it right now. But also, the proceedings of the ICLR conference are probably the best concentrated source of good papers. Of course there are really good papers at NIPS and ICML and other conferences, but if you really want to go for a lot of good papers, just read the last few ICLR proceedings, and that will give you a really good view of the field.

Any other thoughts? When people ask you for advice, how does someone really become good at deep learning?

Well, it depends on where you come from. Don't be afraid of the math. Just develop the intuitions, and then the math becomes much easier to understand once you get the hang of what's going on at the intuitive level. One piece of good news is that you don't need five years of a PhD to become proficient at deep learning. You can actually learn pretty quickly: if you have a good background in computer science and math, you can learn enough to use it, build things, and start research experiments in just a few months, something like six months, for people with the right training. Maybe they don't know
anything about machine learning, but if they're good in math and computer science, it can be very fast. Of course, that means you need the right training in math and computer science. Sometimes what you learn in computer science courses alone is not enough; you need some continuous math especially: probability, algebra, and optimization, for example.

And calculus.

Calculus, yes.

Thanks a lot, Yoshua, for sharing all your comments, insights, and advice. Even though I've known you for a long time, there are many details of your early history that I didn't know until now, so thank you.

Well, thank you, Andrew, for doing this special recording and for what you're doing. I hope it's going to be used by a lot of people.