Moore's Law is Not Dead (Jim Keller) | AI Podcast Clips

**The Future of Innovation: Shrinkage and its Implications**

Keller argues that transistor shrinking still has clear room to run. Semiconductor fabricators have announced nanowires, made by taking a fin with a gate wrapped around it and turning it into a small wire, which gives better control in a smaller device. From there, he sees some obvious next steps: the metallurgy around the wire stacks, for example, has very obvious room to shrink. Shrinking is not new, but in his estimate it could still deliver roughly a factor of a hundred over the next ten or fifteen years.

Shrinking matters because of Moore's Law, which has driven faster, smaller computers for decades. Keller connects this to Bell's Law and the succession of computer classes it describes: mainframes, minicomputers, workstations, PCs, and mobile devices. He also cites Raja Koduri's observation that every 10x increase in computation generates a new kind of computation: scalar, then vector, then matrix, then topological. By that reasoning, what another 100x will enable is hard to predict.

As we look to the future, it's essential to acknowledge that the world is changing rapidly. The internet has transformed the way we communicate, and mobile devices have revolutionized the way we interact with information. Now, we're building 5G wireless networks with one-millisecond latency, and people are starting to think about the smart world where everything knows us and recognizes us. This raises questions about our role as architects of this future and whether we're ready for the challenges that come with it.

The attention-distracting nature of mobile phones is a significant concern. With billions of people on the planet, there's a lot of pressure to stay connected and be constantly available. It's not uncommon to see people checking their phones throughout the day, even when they're supposed to be doing other things. This phenomenon has become so normalized that it's almost expected.

However, as we move forward, it's crucial to consider the impact of these technologies on our society. We need to think about how our actions affect others and the world around us. It's not just about being aware of our own behavior; it's also about considering the broader implications of our creations. This is where philosophy comes in – exploring the deeper questions about existence, purpose, and meaning.

Philosophers have long grappled with the nature of reality, seeking to answer fundamental questions like "why are we here?" and "what is the purpose of life?" Meanwhile, physicists have been working on a more materialistic understanding of the world. The recent advancements in computation, particularly machine learning, have blurred the lines between these two disciplines. Computation has become sophisticated enough that it can arrive at results that are difficult to understand mathematically.

The emergence of machine learning and its ability to find patterns in large data sets without prior knowledge has opened up new avenues for research. It's like having a supercomputer that can solve complex problems without being explicitly programmed. This raises questions about the nature of intelligence, creativity, and even consciousness. Are we on the verge of creating machines that can think and learn in ways that surpass human capabilities?

As we navigate this uncharted territory, it's essential to consider our responsibilities as creators of these technologies. We're not just building computers; we're shaping the future of humanity. The impact of our work will be felt for generations to come, and it's crucial that we think about the consequences of our actions. By embracing the unknown and exploring new possibilities, we can unlock the true potential of computing and create a brighter future for all.

The possibility of computation leading to significant breakthroughs in mathematics is an exciting development. The idea that computers can generate complex mathematical formulas without being explicitly programmed raises fundamental questions about the nature of reality. It's like having a supercomputer that can solve problems that have puzzled human mathematicians for centuries.

As we move forward, it's essential to acknowledge that computation has become increasingly sophisticated. We're no longer dealing with simple binary algebra; we're working with complex mathematical formulas that are difficult to understand. This shift in thinking is all about embracing the unknown and exploring new possibilities.

The world of physics and mathematics is becoming increasingly intertwined as we explore the mysteries of the universe. The recent advancements in computation have brought us to a point where we can tackle problems that were previously impossible to solve. It's like having a supercomputer that can simulate complex phenomena without being explicitly programmed.

As we continue to push the boundaries of what's possible with computing, it's essential to consider our role as architects of this future. We're not just building computers; we're shaping the course of human history. The impact of our work will be felt for generations to come, and it's crucial that we think about the consequences of our actions.

The possibility of computation leading to significant breakthroughs in physics is an exciting development. The idea that computers can simulate complex phenomena without being explicitly programmed raises fundamental questions about the nature of reality. It's like having a supercomputer that can solve problems that have puzzled human physicists for centuries.

As we move forward, it's essential to acknowledge that computation has become increasingly sophisticated. We're no longer dealing with simple mathematical formulas; we're working with complex algorithms that are difficult to understand. This shift in thinking is all about embracing the unknown and exploring new possibilities.

"WEBVTTKind: captionsLanguage: enfor over 50 years now Moore's law has served for me and millions of others as an inspiring beacon of what kind of amazing future brilliant engineers can build no I'm just making your kids laugh all of today it was great so first in your eyes what is Moore's law if you could define for people who don't know well the simple statement was from Gordon Moore was double the number of transistors every two years something like that and then my operational model is we increase the performance of computers by 2x every 2 or 3 years and it's wiggled around substantially over time and also in how we deliver performance has changed the foundational idea was to X 2 transistors every 2 years the current cadence is something like they call it a shrink factor like point six every two years which is not 0.5 but that that's referring strictly again to the original definition of transistor count a shrink factor just getting them smaller small as well well as you use for a constant chip area if you make the transistor smaller by 0.6 then you get 1 over 0.6 more transistors so can you linger on it a little longer what's what's a broader what do you think should be the broader definition of Moore's law when you mentioned before how you think of performance just broadly what's a good way to think about Moore's law well first of all so I I've been aware of Moore's law for 30 years in which sense well I've been designing computers for 40 you just watching it before your eyes kind of well and somewhere where I became aware of it I was also informed that Moore's law was gonna die in 10 to 15 years and so I thought that was true at first but then after 10 years it was gonna die in 10 to 15 years and then at one point it was gonna die in five years and then it went back up to ten years and at some point I decided not to worry about that particular product on occasion for the rest of my life which is which is fun and then I joined Intel and everybody said Moore's 
law is dead, and I thought, that's sad, because it's the Moore's law company, and it's not dead, and it's always been gonna die. And, you know, humans like these apocryphal kind of statements, like we'll run out of food, or run out of air, or run out of room, or run out of, you know, something.

Right. But it's still incredible that it's lived for as long as it has. And yes, there's many people who believe now that Moore's law is dead.

You know, they can join the last 50 years of people who had the same idea.

Yeah, there's a long tradition. But why do you think, if you can try to understand it, why do you think it's not dead?

Well, first of all, let's just think: people think Moore's law is one thing, transistors get smaller. But actually, under the sheets, there's literally thousands of innovations, and almost all those innovations have their own diminishing return curves. So if you graph it, it looks like a cascade of diminishing return curves. I don't know what to call that. But the result is an exponential curve. At least it has been. And we keep inventing new things. So if you're an expert in one of the things on a diminishing return curve, right, and you can see its plateau, you will probably tell people, well, this is done. Meanwhile, some other pile of people are doing something different. So that's just normal. So then there's the observation of how small could a switching device be. So a modern transistor is something like a thousand by a thousand by a thousand atoms, right? And you get quantum effects down around two to ten atoms. So you can imagine a transistor as small as 10 by 10 by 10. So that's a million times smaller. And then the quantum computational people are working away at how to use quantum effects.

So a thousand by a thousand by a thousand atoms. That's a really clean way of putting it.

Well, a fin, like a modern transistor, if you look at the fin, it's like a hundred and twenty atoms wide, but we can make that thinner. And there's a gate wrapped around it, and there's spacing. I
mean, there's a whole bunch of geometry. And, you know, a competent transistor designer could count the atoms in every single direction. Like, there's techniques now to already put down atoms in a single atomic layer, and you can place atoms if you want to. It's just, you know, from a manufacturing process, if placing an atom takes ten minutes and you need to put, you know, 10 to the 23rd atoms together to make a computer, it would take a long time. So the methods are, you know, both shrinking things and then coming up with effective ways to control what's happening.

Manufacture stably and cheaply.

Yeah. So the innovation stack is pretty broad. You know, there's equipment, there's optics, there's chemistry, there's physics, there's material science, there's metallurgy. There's lots of ideas about when you put different materials together, how do they interact, are they stable, are they stable over temperature, you know, are they repeatable. There's literally thousands of technologies involved.

But just for the shrinking, you don't think we're quite yet close to the fundamental limits in physics?

I did a talk on Moore's law and I asked for a road map to a path of 100, and after two weeks they said, we only got to 50.

A hundred what, a 100x shrink?

A 100x shrink. We only got to 50. And I said, why don't you give it another two weeks. Well, here's the thing about Moore's law, right? So I believe that the next 10 or 20 years of shrinking is going to happen, right? Now, as a computer designer, you have two stances. You think it's going to shrink, in which case you're designing and thinking about architecture in a way that you'll use more transistors. Or conversely, not be swamped by the complexity of all the transistors you get, right? You have to have a strategy.

So you're open to the possibility and waiting for the possibility of a whole new army of transistors ready to work.

I'm expecting more transistors every two or three years, by a number large enough that how you
think about design, how you think about architecture, has to change. Like, imagine you build brick buildings out of bricks, and every year the bricks are half the size, or every two years. Well, if you kept building with bricks the same way, you know, so many bricks per person per day, the amount of time to build a building would go up exponentially, right?

Right.

But if you said, I know that's coming, so now I'm going to design equipment that moves bricks faster, uses them better, because maybe you're getting something out of the smaller bricks: more strength, thinner walls, you know, less material, efficiency out of that. So once you have a roadmap with what's going to happen, transistors, we're gonna get more of them, then you design all this collateral around it to take advantage of it, and also to cope with it. Like, that's the thing people don't understand. It's like, if I didn't believe in Moore's law and then Moore's law transistors showed up, my design teams would all drown.

So what's the hardest part of this inflood of new transistors? I mean, even if you just look historically throughout your career, what fundamentally changes when you add more transistors in the task of designing an architecture?

There's two constants, right? One is, people don't get smarter.

I think, by the way, there's some science showing that we do get smarter, because of nutrition or whatever. Sorry to bring that up.

The Flynn effect. Yes. Yeah, nobody understands it, nobody knows if it's still going on.

Or whether it's real or not. But yeah, amen to that: I believe for the most part people aren't getting much smarter.

The evidence doesn't support it. That's right. And then teams can't grow that much.

Right.

All right, so human beings, you know, we're really good in teams of ten, you know, up to teams of a hundred, they can know each other. Beyond that, you have to have organizational boundaries. So those are pretty
hard constraints, right? So then you have to divide and conquer. Like, as the designs get bigger, you have to divide them into pieces. You know, the power of abstraction layers is really high. We used to build computers out of transistors. Now we have a team that turns transistors into logic cells, and another team that turns them into functional units, and another one that turns those into computers, right? So we have abstraction layers in there, and you have to think about when do you shift gears on that. We also use faster computers to build faster computers. So some algorithms run twice as fast on new computers, but a lot of algorithms are N squared. So, you know, a computer with twice as many transistors, it might take four times as long to run. So you have to refactor the software. Like, simply using faster computers to build bigger computers doesn't work. So you have to think about all these things.

So in terms of computing performance and the exciting possibility that more powerful computers bring, is shrinking, the thing we've been talking about, for you one of the biggest exciting possibilities of advancement in performance? Or are there other directions that you're interested in, like in the direction of sort of enforcing given parallelism, or like doing massive parallelism in terms of many, many CPUs, you know, stacking CPUs on top of each other, that kind of parallelism?

Well, think about it a different way. So old computers, you know, slow computers, you said A equals B plus C times D. Pretty simple, right? And then we made faster computers with vector units, and you can do proper equations and matrices, right? And then modern, like, AI computations, or like convolutional neural networks, where you convolve one large data set against another. And so there's sort of this hierarchy of mathematics, you know, from simple equations to linear equations to matrix equations to, it's a deeper kind of computation. And the data sets are getting so big that people are
thinking of data as a topology problem. You know, data is organized in some immense shape, and then the computation sort of wants to get data from that immense shape and do some computation on it. So what computers have allowed people to do is have algorithms go much, much further. So that paper you referenced, the Sutton paper, they talked about, you know, like when AI started, it was: apply rule sets to something. That's a very simple computational situation. And then when they did the first chess thing, they solved deep searches. So have a huge database of moves and results, deep search, but it's still just a search, right? Now we take large numbers of images and we use them to train these weight sets that we convolve across. It's a completely different kind of phenomenon. We call that AI. Now they're doing the next generation, and if you look at it, they're going up this mathematical graph, right? And then both computation and data sets support going up that graph.

Yeah, the kind of computation, I mean, I would argue that all of it is still a search, right? Just like you said, a topology problem with data sets, you're searching the data sets for valuable data. And also the actual optimization of neural networks is a kind of search for the...

I don't know. If you looked at the inner layers of finding a cat, it's not a search. It's a set of endless projections. So, you know, a projection, here's a shadow of this phone, right? Then you can have a shadow of that onto something, and a shadow of that onto something. If you look in the layers, you'll see this layer actually describes pointy ears and round eyeness and fuzziness. But the computation to tease out the attributes is not search. Like, the inference part might be search, but the training's not search. Well, then in deep networks, they look at layers and they don't even know what's represented, and yet if you take the layers out, it doesn't work. Okay, so I don't think it's search.

All right.

Well, but you have to
talk to a mathematician about what that actually is.

Oh, you would disagree, but it's just semantics, I think. It's not, but it's certainly not...

I would say it's absolutely not semantics, but okay.

All right. Well, if you want to go there. So optimization to me is search, and we're trying to optimize the ability of a neural network to detect cat ears. And the difference between chess and the space, the incredibly multi-dimensional, hundred-thousand-dimensional space that, you know, networks are trying to optimize over, is nothing like the chessboard database. So it's a totally different kind of thing. And okay, in that sense you can say...

Yeah, yeah. You know, I could see how you might say, if you... The funny thing is, it's the difference between given search space and found search space.

Exactly.

Yeah, maybe that's a different way to describe it.

That's right. But okay, but you're saying... What's your sense in terms of the basic mathematical operations and the architectures, the computer hardware that enables those operations? Do you see the CPUs of today still being a really core part of executing those mathematical operations?

Yes. Well, the operations, you know, continue to be add, subtract, load, store, compare and branch. It's remarkable. So it's interesting that the building blocks of, you know, computers are transistors, and, you know, under that, atoms. So you've got atoms, transistors, logic gates, computers, right? You know, functional units and computers. The building blocks of mathematics at some level are things like adds and subtracts and multiplies, but the space mathematics can describe is, I think, essentially infinite. But the computers that run the algorithms are still doing the same things. Now, a given algorithm may say, I need sparse data, or I need 32-bit data, or I need, you know, like a convolution operation that naturally takes 8-bit data, multiplies it and sums it up a certain way. So, like, the data types in TensorFlow imply an optimization set. But when you go right down and look at the computers, it's AND and OR gates doing adds and multiplies. Like, that hasn't changed much. Now, the quantum researchers think they're going to change that radically. And then there's people who think about analog computing, because you look in the brain and it seems to be more analog-ish, you know, that maybe there's a way to do that more efficiently. But we have a million x on computation, and I don't know the relationship between computational, let's say, intensity and the ability to hit mathematical abstractions. I don't know anybody who's described that. But just like you saw in AI, you went from rule sets to simple search to complex search to, say, found search. Like, those are, you know, orders of magnitude more computation to do. And as we get the next two orders of magnitude, like a friend, Raja Koduri, said, every order of magnitude changes the computation.

Fundamentally changes what the computation is doing?

Yeah. Oh, you know the expression, the difference in quantity is the difference in kind. You know, the difference between ant and anthill, right? Or neuron and brain. You know, there's this indefinable place where the quantity changed the quality, right? And we've seen that happen in mathematics multiple times, and, you know, my guess is it's gonna keep happening.

So your sense is, yeah, if you focus head down and shrinking the transistor...

Well, it's not just head down. We're aware of the software stacks that are running and the computational loads, and we're kind of pondering what do you do with a petabyte of memory that wants to be accessed in a sparse way and have, you know, the kind of calculations AI programmers want. So there's a dialogue, an interaction. But when you go in the computer chip, you know, you find adders and subtractors and multipliers.

And so if you zoom out then, as you mentioned with Sutton, the idea that most of the development in the last many decades in AI research came from just leveraging computation and just simple algorithms waiting for the computation to
improve.

Well, software guys have a thing they call the problem of early optimization, right? So if you write a big software stack, and if you start optimizing, like, the first thing you write, the odds of that being the performance limiter is low. But when you get the whole thing working, can you make it 2x faster by optimizing the right things? Sure. While you're optimizing that, could you have written a new software stack, which would have been a better choice? Maybe. Now you have creative tension.

But the whole time as you're doing the writing, that's the software we're talking about, the hardware underneath gets faster.

Which goes back to Moore's law. If Moore's law is going to continue, then your AI research should expect that to show up, and then you make a slightly different set of choices than: we've hit the wall, nothing's gonna happen, and from here it's just us rewriting algorithms. Like, that seems like a failed strategy for the last 30 years of Moore's law's death.

So can you just linger on it? I think you've answered it, but I'll just ask the same dumb question over and over. So why do you think Moore's law is not going to die? Which is the most promising, exciting possibility of why it won't die in the next five, ten years? So is it the continued shrinking of the transistor, or is it another S-curve that steps in and totally sort of...

Well, shrinking the transistor is literally thousands of innovations.

Right.

So they're all answers, and there's a whole bunch of S-curves just kind of running their course and being reinvented, and new things. You know, the semiconductor fabricators and technologists have all announced what's called nanowires. So they took a fin, which had a gate around it, and turned that into a little wire, so you have better control of that, and they're smaller. And then from there, there are some obvious steps about how to shrink that. So the metallurgy around wire stacks and stuff has very obvious abilities to shrink. And, you know, there's a
whole combination of things there to do.

Your sense is that we're gonna get a lot of innovation from just that shrinking?

Yeah, like a factor of a hundred.

Yeah, I would say that's incredible. And it's totally...

It's only ten or fifteen years.

Now, you're smarter, you might know, but to me it's totally unpredictable what that 100x would bring in terms of the nature of the computation that people would be...

Yeah. Are you familiar with Bell's law? So for a long time it was mainframes, minis, workstation, PC, mobile. Moore's law drove faster, smaller computers, right? And then when we were thinking about Moore's law, Raja Koduri said every 10x generates a new computation. So scalar, vector, matrix, topological computation, right? And if you go look at the industry trends, there was, you know, mainframes and minicomputers and PCs, and then the internet took off, and then we got mobile devices, and now we're building 5G wireless with one-millisecond latency, and people are starting to think about the smart world, where everything knows you, recognizes you. Like, the transformations are gonna be, like, unpredictable.

How does it make you feel that you're one of the key architects of this kind of future? We're not talking about the architects of the high level, people who build the Angry Bird apps and [unclear].

Angry Bird apps, who knows, maybe that's the whole point of the universe.

I'll take a stand at that, and the attention-distracting nature of mobile phones. I'll take a stand.

But anyway, in terms of that, the side effects of smartphones, or the attention distraction, which part?

Well, who knows, you know, where this is all leading. It's changing so fast. Way back, my parents used to yell at my sisters for hiding in the closet with a wired phone with a dial on it: stop talking to your friends all day. Now my wife yells at my kids for talking to their friends all day on text. It looks the same to me. It's always echoes of the same thing.

Okay, but you are one of the key
people architecting the hardware of this future. How does that make you feel? Do you feel responsible? Do you feel excited?

So we're in a social context. So there's billions of people on this planet. There are literally millions of people working on technology. I feel lucky to be, you know, doing what I do and getting paid for it, and there's an interest in it. But there's so many things going on in parallel. It's like, the actions are so unpredictable. If I wasn't here, somebody else would be doing it. The vectors of all these different things are happening all the time. You know, I'm sure there's some philosopher or meta-philosopher, you know, wondering about how we transform our world.

So you can't deny the fact that these tools are changing our world.

That's right.

Do you think it's changing for the better?

So I read this thing recently. It said the two disciplines with the highest GRE scores in college are physics and philosophy, right? And they're both sort of trying to answer the question, why is there anything? Right. And the philosophers, you know, are on the kind of theological side, and the physicists are obviously on the, you know, the material side. And there's a hundred billion galaxies with a hundred billion stars. It seems, well, repetitive at best. So, you know, we're on our way to 10 billion people. I mean, it's hard to say what it's all for, if that's what you're asking.

Yeah, I guess I am.

Things do tend to significantly increase in complexity. And I'm curious about how computation, like, our world, our physical world, inherently generates mathematics. It's kind of obvious, right? So we have X, Y, Z coordinates. You take a sphere, you make it bigger, you get a surface that, you know, grows by r squared. Like, it generally generates mathematics. And the mathematicians and the physicists have been having a lot of fun talking to each other for years. And computation has been, let's say, relatively pedestrian. Like, computation in terms of
mathematics has been doing binary algebra while those guys have been gallivanting through the nether realms of possibility. Now, recently, computation lets you do mathematical computations that are sophisticated enough that nobody understands how the answers came out.

Machine learning?

Machine learning, yeah. It used to be you get a data set, you guess at a function, and the function is considered physics if it's predictive of new data sets. Now you can take a large data set with no intuition about what it is and use machine learning to find a pattern that has no function. And it can arrive at results that I don't know if they're completely mathematically describable. So computation has kind of done something interesting compared to a equals b plus c.

For over 50 years now, Moore's law has served, for me and millions of others, as an inspiring beacon of what kind of amazing future brilliant engineers can build.

No, I'm just making your kids laugh all of today. It was great.

So first, in your eyes, what is Moore's law, if you could define it for people who don't know?

Well, the simple statement from Gordon Moore was: double the number of transistors every two years, something like that. And then my operational model is that we increase the performance of computers by 2x every two or three years. It's wiggled around substantially over time, and how we deliver performance has also changed. The foundational idea was 2x the transistors every two years. The current cadence is something like, they call it a shrink factor, 0.6 every two years, which is not 0.5.

But that's referring strictly, again, to the original definition of transistor count?

A shrink factor is just getting them smaller and smaller. Well, for a constant chip area, if you make the transistors smaller by 0.6, then you get 1 over 0.6 more transistors.

So can you linger on it a little longer? What do you think should be the broader definition of Moore's law?
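The shrink-factor arithmetic Keller describes can be checked with a few lines of Python. This is a minimal sketch, not anything from the conversation itself: the function names `density_gain`, `cumulative_gain`, and `years_to_gain` are made up for illustration, and it assumes, as the "1 over 0.6" remark does, that the shrink factor applies to the area each transistor occupies on a chip of constant area.

```python
import math


def density_gain(shrink_factor: float) -> float:
    """Transistor-count multiplier per step for a constant chip area.

    Assumption: shrink_factor scales the area one transistor occupies,
    so a factor of 0.6 gives 1 / 0.6 times as many transistors.
    """
    return 1.0 / shrink_factor


def cumulative_gain(shrink_factor: float, years: float, step_years: float = 2.0) -> float:
    """Compound the per-step gain over a number of years."""
    steps = years / step_years
    return density_gain(shrink_factor) ** steps


def years_to_gain(target: float, shrink_factor: float, step_years: float = 2.0) -> float:
    """Years needed to reach a target multiplier at a fixed cadence."""
    steps = math.log(target) / math.log(density_gain(shrink_factor))
    return steps * step_years


if __name__ == "__main__":
    cadences = [("2x every two years", 0.5), ("0.6 shrink every two years", 0.6)]
    for label, factor in cadences:
        print(label)
        print(f"  gain per step:    {density_gain(factor):.2f}x")
        print(f"  gain after 10 yr: {cumulative_gain(factor, 10):.1f}x")
        print(f"  years to 100x:    {years_to_gain(100, factor):.1f}")
```

Under these assumptions the classic doubling cadence reaches 100x in a little over 13 years, while a 0.6 factor takes about 18, which is one way to see why the gap between 0.5 and 0.6 matters.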
When you mentioned before how you think of performance, just broadly, what's a good way to think about Moore's law?

Well, first of all, I've been aware of Moore's law for 30 years.

In which sense?

Well, I've been designing computers for 40.

You've just been watching it before your eyes, kind of.

Well, and somewhere around when I became aware of it, I was also informed that Moore's law was going to die in 10 to 15 years. I thought that was true at first, but then after 10 years it was going to die in 10 to 15 years. And then at one point it was going to die in five years, and then it went back up to ten years. At some point I decided not to worry about that particular prognostication for the rest of my life, which is fun. And then I joined Intel and everybody said Moore's law is dead, and I thought, that's sad, because it's the Moore's law company, and it's not dead. It's always been going to die. And humans like these apocalyptic kinds of statements, like we'll run out of food, or run out of air, or run out of room, or run out of something.

But it's still incredible that it's lived for as long as it has. And yes, there are many people who believe now that Moore's law is dead.

You know, they can join the last 50 years of people who had the same idea.

Yeah, there's a long tradition. But why do you think, if you can try to understand it, why do you think it's not dead?

Well, let's just think. People think Moore's law is one thing: transistors get smaller. But actually, under the sheets, there are literally thousands of innovations, and almost all those innovations have their own diminishing-return curves. So if you graph it, it looks like a cascade of diminishing-return curves. I don't know what to call that. But the result is an exponential curve, or at least it has been. And we keep inventing new things. So if you're an expert in one of the things on a diminishing-return curve, and you can see its plateau, you will probably tell people, well, this is done. Meanwhile,
some other pile of people are doing something different. So that's just normal. Then there's the observation of how small a switching device could be. A modern transistor is something like a thousand by a thousand by a thousand atoms. And you get quantum effects down around two to ten atoms. So you can imagine a transistor as small as 10 by 10 by 10. That's a million times smaller. And then the quantum computational people are working away at how to use quantum effects.

So, a thousand by a thousand by a thousand atoms. That's a really clean way of putting it.

Well, a fin, like a modern transistor, if you look at the fin, it's like a hundred and twenty atoms wide. But we can make that thinner. And there's a gate wrapped around it, and there's spacing. I mean, there's a whole bunch of geometry, and a competent transistor designer could count the atoms in every single direction. There are techniques now to already put down atoms in a single atomic layer. And you can place atoms if you want to. It's just that, from a manufacturing process, if placing an atom takes ten minutes and you need to put 10 to the 23rd atoms together to make a computer, it would take a long time. So the methods are both shrinking things and then coming up with effective ways to control what's happening.

Manufacture stably and cheaply?

Yeah. So the innovation stack is pretty broad. There's equipment, there's optics, there's chemistry, there's physics, there's material science, there's metallurgy. There are lots of ideas about what happens when you put different materials together: how do they interact, are they stable, are they stable over temperature, are they repeatable? There are literally thousands of technologies involved.

But just for the shrinking, you don't think we're quite yet close to the fundamental limits in physics?

I did a talk on Moore's law and I asked for a road map to a path of 100, and after two weeks they said, we only got
to fifty.

A hundred what? 100x?

A 100x shrink. "We only got to 50." And I said, why don't you give it another two weeks? Well, here's the thing about Moore's law. I believe that the next 10 or 20 years of shrinking is going to happen. Now, as a computer designer, you have two stances. You think it's going to shrink, in which case you're designing and thinking about architecture in a way that you'll use more transistors, or conversely, not be swamped by the complexity of all the transistors you get. You have to have a strategy.

So you're open to the possibility, and waiting for the possibility, of a whole new army of transistors ready to work?

I'm expecting more transistors every two or three years, by a number large enough that how you think about design, how you think about architecture, has to change. Imagine you build brick buildings out of bricks, and every year the bricks are half the size, or every two years. Well, if you kept building with bricks the same way, so many bricks per person per day, the amount of time to build a building would go up exponentially. But suppose you said, I know that's coming, so now I'm going to design equipment that moves bricks faster and uses them better, because maybe you're getting something out of the smaller bricks: more strength, thinner walls, less material, efficiency out of that. So once you have a roadmap of what's going to happen, that we're going to get more transistors, then you design all this collateral around it to take advantage of it and also to cope with it. That's the thing people don't understand. If I didn't believe in Moore's law, and then Moore's law transistors showed up, my design teams would all be drowned.

So what's the hardest part of this flood of new transistors? Even if you just look historically, throughout your career, what fundamentally changes when you add more transistors, in the task of
designing an architecture?

There are two constants. One is that people don't get smarter.

By the way, there's some science showing that we do get smarter, because of nutrition or whatever. Sorry to bring that up. The Flynn effect.

Yes. Yeah, nobody understands it, nobody knows if it's still going on, or whether it's real or not.

I mean, yeah, I believe for the most part people aren't getting much smarter.

The evidence doesn't support it. That's right. And then teams can't grow that much. Human beings are really good in teams of ten, up to teams of a hundred, where they can know each other. Beyond that, you have to have organizational boundaries. So those are pretty hard constraints. So then you have to divide and conquer. As the designs get bigger, you have to divide them into pieces. The power of abstraction layers is really high. We used to build computers out of transistors. Now we have a team that turns transistors into logic cells, and another team that turns them into functional units, and another one that turns those into computers. So we have abstraction layers in there, and you have to think about when you shift gears on that. We also use faster computers to build faster computers. Some algorithms run twice as fast on new computers, but a lot of algorithms are N squared. So for a computer with twice as many transistors, it might take four times as long to run. So you have to refactor the software. Simply using faster computers to build bigger computers doesn't work. So you have to think about all these things.

So in terms of computing performance and the exciting possibilities that more powerful computers bring, is shrinking, the thing we've been talking about, for you one of the biggest exciting possibilities of advancement in performance? Or are there other directions that you're interested in, like in the
direction of sort of enforcing given parallelism, or doing massive parallelism in terms of many, many CPUs, stacking CPUs on top of each other, that kind of parallelism?

Well, think about it a different way. Old computers, slow computers, you said a equals b plus c times d. Pretty simple. And then we made faster computers with vector units, and you can do proper equations and matrices. And then modern AI computations, like convolutional neural networks, where you convolve one large data set against another. So there's sort of this hierarchy of mathematics, from simple equations to linear equations to matrix equations to a deeper kind of computation. And the data sets are getting so big that people are thinking of data as a topology problem. Data is organized in some immense shape, and then the computation sort of wants to get data from that immense shape and do some computation on it. So what computers have allowed people to do is have algorithms go much, much further. That paper you referenced, the Sutton paper, they talked about how when AI started, it was applying rule sets to something. That's a very simple computational situation. And then when they did the first chess thing, they solved deep searches. So you have a huge database of moves and results, deep search, but it's still just a search. Now we take large numbers of images and we use them to train these weight sets that we convolve across. It's a completely different kind of phenomenon. We call that AI. Now they're doing the next generation. And if you look at it, they're going up this mathematical graph. And both computation and data sets support going up that graph.

Yeah, the kind of computation... I mean, I would argue that all of it is still a search. Just like you said, a topology problem with data sets: you're searching the data sets for valuable data, and also the actual optimization of neural
networks is a kind of search for the...

I don't know. If you've looked at the inner layers of finding a cat, it's not a search. It's a set of endless projections. So, a projection: here's a shadow of this phone. Then you can have a shadow of that onto something, and a shadow of that onto something else. If you look in the layers, you'll see this layer actually describes pointy ears and round-eyedness and fuzziness. But the computation to tease out the attributes is not search. The inference part might be search, but the training's not search. And then in deep networks, they look at layers and they don't even know what's represented, and yet if you take the layers out, it doesn't work. So I don't think it's search. But you'd have to talk to a mathematician about what that actually is.

Well, we could disagree, but it's just semantics, I think. It's not, but it's certainly not...

I would say it's absolutely not semantics, but okay.

All right, well, if you want to go there. So optimization, to me, is search, and we're trying to optimize the ability of a neural network to detect cat ears. And the difference between chess and the incredibly multi-dimensional, hundred-thousand-dimensional space that neural networks are trying to optimize over is nothing like the chessboard database. So it's a totally different kind of thing. In that sense, you can say...

Yeah, I could see how you might say that. The funny thing is, it's the difference between a given search space and a found search space.

Exactly. Yeah, maybe that's a different way to put it.

That's right.

But okay. What's your sense in terms of the basic mathematical operations and the architectures, the computer hardware that enables those operations? Do you see the CPUs of today still being a really core part of executing those mathematical operations?

Yes. Well, the operations continue to be add, subtract, load, store, compare, and branch. It's remarkable.
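One way to see the point about primitive operations: even a dot product, the core of the matrix and convolution math discussed here, can be written using nothing but loads, stores, adds, subtracts, and compare-and-branch. The sketch below is a toy illustration only. The helper names `multiply` and `dot` are hypothetical, multiplication is done by repeated addition purely to stay within that instruction list (real hardware has dedicated multipliers), and inputs are assumed to be non-negative integers.

```python
from typing import List


def multiply(a: int, b: int) -> int:
    """Multiply two non-negative integers using only add, subtract,
    and compare-and-branch (repeated addition, for illustration only)."""
    result = 0
    while b > 0:             # compare and branch
        result = result + a  # add
        b = b - 1            # subtract
    return result


def dot(xs: List[int], ys: List[int]) -> int:
    """Dot product of two equal-length lists built from the same primitives."""
    total = 0
    i = 0
    while i < len(xs):       # compare and branch
        x = xs[i]            # load
        y = ys[i]            # load
        total = total + multiply(x, y)  # add, result stored back in total
        i = i + 1            # add
    return total


if __name__ == "__main__":
    print(dot([1, 2, 3], [4, 5, 6]))
```

The higher-level operation changes what the programmer expresses, but the loop underneath is still made of the same handful of steps.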
It's interesting that the building blocks of computers are transistors, and under that, atoms. So you've got atoms, transistors, logic gates, functional units, and computers. The building blocks of mathematics, at some level, are things like adds and subtracts and multiplies, but the space mathematics can describe is, I think, essentially infinite. But the computers that run the algorithms are still doing the same things. Now, a given algorithm may say, I need sparse data, or I need 32-bit data, or I need a convolution operation that naturally takes 8-bit data, multiplies it, and sums it up a certain way. So the data types in TensorFlow imply an optimization set. But when you go right down and look at the computers, it's AND and OR gates doing adds and multiplies. That hasn't changed much. Now, the quantum researchers think they're going to change that radically. And then there are people who think about analog computing, because you look in the brain and it seems to be more analog-ish, and maybe there's a way to do that more efficiently. But we have a million x on computation, and I don't know the relationship between computational, let's say, intensity and the ability to hit mathematical abstractions. I don't know anybody who's described that. But just like you saw in AI, you went from rule sets to simple search to complex search to, say, found search. Those are orders of magnitude more computation to do. And as we get the next two orders of magnitude, like a friend, Raja Koduri, said, every order of magnitude changes the computation.

Fundamentally changes what the computation is doing?

Yeah. You know the expression, the difference in quantity is a difference in kind? The difference between ant and anthill, or neuron and brain. There's this indefinable place where the quantity changes the quality. We've seen that happen in mathematics multiple times, and my
guess is it's going to keep happening.

So your sense is, yeah, if you focus head down on shrinking the transistor...

Well, it's not just head down. We're aware of the software stacks that are running and the computational loads, and we're kind of pondering what you do with a petabyte of memory that wants to be accessed in a sparse way and have the kind of calculations AI programmers want. So there's a dialogue, an interaction. But when you go into the computer chip, you find adders and subtractors and multipliers.

So if you zoom out then, as you mentioned with Rich Sutton, the idea that most of the development in the last many decades in AI research came from just leveraging computation, and just simple algorithms waiting for the computation to improve.

Well, software guys have a thing they call the problem of early optimization. If you write a big software stack, and you start optimizing the first thing you write, the odds of that being the performance limiter are low. But when you get the whole thing working, can you make it 2x faster by optimizing the right things? Sure. While you're optimizing that, could you have written a new software stack, which would have been a better choice? Maybe. Now you have creative tension. But the whole time you're doing the writing, that's the software we're talking about, the hardware underneath gets faster, which goes back to Moore's law. If Moore's law is going to continue, then your AI research should expect that to show up, and then you make a slightly different set of choices than "we've hit the wall, nothing's going to happen, and from here it's just us rewriting algorithms." That seems like a failed strategy for the last 30 years of Moore's law's death.

So can you just linger on it? I think you've answered it, but I'll just ask the same dumb question over and over. Why do you think Moore's law is not going to die? Which is the most promising, exciting possibility of why it won't die in the next
five, ten years? Is it the continued shrinking of the transistor, or is it another S-curve that steps in and totally...

Well, shrinking the transistor is literally thousands of innovations. So they're all answers, and there's a whole bunch of S-curves just kind of running their course and being reinvented, and new things. The semiconductor fabricators and technologists have all announced what's called nanowires. They took a fin, which had a gate around it, and turned that into a little wire, so you have better control of that, and they're smaller. And then from there, there are some obvious steps about how to shrink that. The metallurgy around wire stacks and stuff has very obvious abilities to shrink. And there's a whole combination of things there to do.
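Keller's picture of Moore's law in this conversation, thousands of innovations each on its own diminishing-return curve, can be illustrated with a toy model. The sketch below is not from the conversation and makes strong simplifying assumptions, all labeled in the code: each innovation is a logistic S-curve, a new one arrives every two years, and each plateaus at twice the level of the previous one. The function names `s_curve` and `cascade` are made up for this example.

```python
import math


def s_curve(t: float, midpoint: float, ceiling: float, steepness: float = 2.0) -> float:
    """Logistic curve: slow start, rapid rise, then a plateau at `ceiling`."""
    return ceiling / (1.0 + math.exp(-steepness * (t - midpoint)))


def cascade(t: float, n_innovations: int = 20, spacing: float = 2.0) -> float:
    """Sum of staggered S-curves.

    Assumptions (illustrative only): a new innovation arrives every
    `spacing` years, and each one plateaus at twice the level of the last.
    """
    return sum(
        s_curve(t, midpoint=k * spacing, ceiling=2.0 ** k)
        for k in range(n_innovations)
    )


if __name__ == "__main__":
    for year in range(10, 31, 4):
        ratio = cascade(year + 2.0) / cascade(year)
        print(f"year {year:2d}: total = {cascade(year):10.1f}, "
              f"growth over next 2 years = {ratio:.3f}x")
```

Each individual curve flattens out, yet the total roughly doubles every two years for as long as new curves keep arriving, which mirrors the point that an expert watching one curve plateau sees an ending while the aggregate keeps growing. The doubling ceilings are built into the toy model, so it illustrates the shape of the argument rather than proving it.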