Mojo - A Supercharged Python for AI with Chris Lattner - 634

**The Modular Engine and Mojo: A New Era in AI Development**

Modular recently introduced two technologies that promise to change how we approach machine learning: the Modular Engine, a high-performance inference engine, and Mojo, a new Python-family programming language. In this article, based on our conversation with Modular CEO and co-founder Chris Lattner, we delve into the details of these technologies and explore their potential impact on AI development.

**The Modular Engine**

The Modular Engine is a product that was recently launched out of stealth mode. It is a unified inference engine that lets developers run high-performance machine learning models without having to worry about the underlying complexity. The engine works as a drop-in replacement for the runtimes of popular frameworks such as TensorFlow and PyTorch, so existing models run unchanged. What sets the Modular Engine apart from other systems, however, is its focus on performance and deployment: it is optimized for production applications where speed, latency, and cost are critical.
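
To make the deployment story concrete, here is a minimal sketch, illustrative only, of why a drop-in engine matters. The toy model is hypothetical and this is not Modular's actual API; the point is that application code stays written against the framework's public interface, and the swap happens in the deployment image rather than in the model code.

```python
# Application code: written against TensorFlow's public API as usual.
import numpy as np
import tensorflow as tf

# A toy stand-in for a real production model.
model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
print(model(np.zeros((1, 4), dtype=np.float32)))

# Deployment-time swap (conceptual): instead of rewriting this script,
# the production container installs an engine-backed, API-compatible
# runtime in place of stock TensorFlow, and the same code runs faster.
```

Because nothing in the script references the engine directly, switching back to the stock runtime is equally simple.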

The Modular Engine has a number of innovative features that make it stand out. One of the most important is automatic kernel fusion, which combines multiple kernels into a single, more efficient one, eliminating redundant passes over memory and intermediate buffers; this can lead to significant performance gains, as the sketch below illustrates. The engine is also highly customizable, allowing developers to define their own custom ops and modify existing ones.
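
To illustrate what kernel fusion buys you, here is a minimal sketch in plain Python. This is conceptual only: a real engine performs this transformation automatically on compiled kernels, not on Python lists.

```python
# Unfused: two separate kernels, each a full pass over the data, with an
# intermediate buffer materialized in memory between them.
def scale_kernel(xs, alpha):
    return [alpha * x for x in xs]

def add_kernel(xs, ys):
    return [x + y for x, y in zip(xs, ys)]

def scale_then_add_unfused(xs, ys, alpha):
    tmp = scale_kernel(xs, alpha)      # extra pass + extra allocation
    return add_kernel(tmp, ys)

# Fused: one kernel, one pass over the data, no intermediate buffer.
def scale_then_add_fused(xs, ys, alpha):
    return [alpha * x + y for x, y in zip(xs, ys)]

# Both compute the same result; the fused version simply does less work.
assert scale_then_add_unfused([1.0, 2.0], [3.0, 4.0], 2.0) == \
       scale_then_add_fused([1.0, 2.0], [3.0, 4.0], 2.0)
```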

**Mojo: A New Language for AI Development**

Mojo is a new programming language developed specifically for AI. It is designed as a superset of Python: it keeps Python's syntax, can import and use existing Python packages such as NumPy and pandas, and adds compilation, optional static types, and systems-programming features on top. Because it is optimized for performance and deployment, it is well suited to production applications where speed and efficiency are critical.

One of Mojo's key benefits is hackability. With standard Python, performance-critical code ultimately has to be rewritten in C/C++ or an accelerator language such as CUDA, which puts it out of reach for many developers. Mojo instead lets developers define custom ops and modify existing ones in a single language, from high-level model code down to low-level kernels, making it much easier to fine-tune machine learning systems for specific applications. The sketch below illustrates the general idea of adding performance incrementally.
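
As a sketch of what that progression looks like, consider the same function written twice in ordinary Python syntax. CPython ignores the type annotations at runtime, whereas a compiler like Mojo's can use types to fix memory layout and emit fast machine code. Note that fully typed Mojo code actually uses its own constructs, such as `fn` declarations and types like `Int` and `Float64`; the snippet below is only an illustration of the add-types-incrementally idea, in plain Python.

```python
# Fully dynamic: maximum flexibility, interpreter-speed execution.
def dot(xs, ys):
    total = 0.0
    for x, y in zip(xs, ys):
        total += x * y
    return total

# Same algorithm with type annotations. CPython treats these as hints
# for linters only; a compiler that honors them can skip dynamic
# dispatch in the hot loop and run dramatically faster.
def dot_typed(xs: list[float], ys: list[float]) -> float:
    total: float = 0.0
    for x, y in zip(xs, ys):
        total += x * y
    return total

print(dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))        # 32.0
print(dot_typed([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # 32.0
```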

**The Relationship Between Mojo and the Modular Engine**

So how does Mojo fit into the picture with the Modular Engine? In short, they are complementary. The Modular Engine is the product: a high-performance runtime for executing machine learning models, itself written in Mojo. Mojo is the language that lets developers extend that runtime, building custom ops and fine-tuning kernels for specific applications.

The relationship between Mojo and the Modular Engine is one of "better together": combining the two lets developers create highly optimized machine learning models tailored to their specific needs, with significant gains in performance and efficiency.

**Future Directions and Roadmap**

One of the most exciting things about the Modular Engine and Mojo is the potential for future development and expansion. The team behind these technologies has a clear roadmap for where they want to take them, and it's ambitious.

For example, the team has already rewritten the performance-critical kernels that vendor libraries such as CUDA and Intel MKL traditionally provide, and plans to extend support to a wider range of hardware, from CPUs and GPUs to more exotic AI accelerators. They also plan to provide more tools and libraries for developers, making it easier to build custom ops and fine-tune machine learning models.

The Modular Engine and Mojo are still early-stage technologies, but they have already generated a lot of interest and excitement in the AI development community. As they continue to evolve and expand, we can expect to see significant improvements in performance, efficiency, and customization.

**Conclusion**

In conclusion, the Modular Engine and Mojo represent a significant step forward for AI development. By pairing a high-performance engine with a flexible, Python-compatible language for building machine learning models, Modular is poised to change how models are built and deployed. Whether you're a researcher pushing the boundaries of what's possible or a developer shipping highly optimized models to production, the Modular Engine and Mojo are definitely worth keeping an eye on.

"WEBVTTKind: captionsLanguage: enall right everyone welcome to another episode of the twimo AI podcast I am your host Sam cherrington and today I'm joined by Chris lattner Chris is the CEO and co-founder of modular AI before we get into today's conversation be sure to take a moment to head over to Spotify Apple podcast or your listening platform of choice and if you enjoy the show please leave us a five-star rating in review Chris welcome to the podcast hey Sam it's great to be here it is great to have you on the show The last time we got a chance to speak was I think back in 2020 around this time for the the big great ml language on debate but you've uh I think you've switched teams from a language perspective since we asked the question right I think there's a new Contender there's a new enter it on the field how about that there's a new contender in town yes and uh we will get deep into into that conversation but before we dive into Mojo uh the new Contender that we're speaking of and we'll be speaking of and all the work that you're doing on it I'd love to have you share a little bit about your background uh refresh our audience with you and some of the things that you've been up to yeah sounds great so I've been uh kicking around the software industry for a number of years now and have built and worked on a lot of different kind of low-level languages and compilers and other Technologies in the developer tool space have a lot of fun with that and have been learning a lot and so I've I'm most well known for open source things like the lvm compiler the Swift programming language things like this um but I got interested in in AI in 2016. and 2016 it feels like forever ago now but at the time I felt like all the best work had been done and it was just such a outrageous new approach to solving buying old problems and so I just got into it deeper and deeper and deeper and good news not everything AI has done yet so I didn't quite miss the boat but from there I went through many different parts of the journey worked on Google tpus and tensorflow and a bunch of other things like that um built more production systems worked on hardware and have touched many different parts of this elephant and so at modular we're here taking um you know I bring a lot of experience with a lot of different parts of the stack and we're trying to help lift AI to the next level and at least a part of that is in developing and promoting a new language for AI and that is Mojo can you talk a little bit about Mojo and uh its significance yeah absolutely I mean I think that if you if you zoom out to understand what Mojo is you have to understand where it came from and so um when we started modular our Quest is to make it much easier to build deploy and evolve AI research and so taking research lifting it to new levels and then getting that research into production now this is this is a quest that many people have been on for a really long time but it's really about making this whole technology stack more accessible and make it so more people can plan it so the experts at many different levels of the stack don't get stuck in one level and one of the things if you zoom into something like tensorflow or zoom into something like Pi torch you'll find is that many people work at the python level which is fantastic and they know how to build models and things like this but researchers who want to push the boundaries end up having to work at the C plus level and like that's one of the dark truths of python is that deep down 
underneath it when you get down to things that care about performance or care about Hardware you quickly end up in CNC plus plus land but AI is even worse than more challenge than most python systems code because now you bring in gpus and tpus and accelerators and all this kind of stuff and so now you end up in this actually a three world problem where you have python at the high level you have C plus plus in the the guts and then you have things like Cuda and other accelerator languages underneath and so Mojo is a solution to this equation right we're at modular we're building and solving and tackling a lot of these old problems in terms of how do you get models to be expressed in a natural way how do you map it onto accelerators and different kinds of heterogeneous fancy Hardware from the people you're coming out with and how do you make it hackable for researchers and to do that you have to get rid of this three world problem and the stack we built is really novel in the way it works underneath it covers is quite unique and so we needed a way to program that whole stack top to bottom and so we needed one language that could scale and so Mojo is is kind of that right it starts from this this requirement of let's pull together this three world problem into something that is consistent but then we need a syntax and so when we decided okay well we have a really interesting and cool to compiler nerd set of compiler Technologies out of the code to under the covers to enable all these accelerators and all this uh this fancy low level heterogeneous blah blah blah all the all the technology stuff right we need to we need to use our interface and so I was part of doing this we said well you know python is the obvious thing right python Powers so much of AI so much of data science in general and so what we decided to do is build Mojo into a super set of python so that first of all it feels like Python and it's accessible and it python programmers already know Mojo but but then we can also give python superpowers where now python can scale down and can be high performance and can run on accelerators and can do these things that it hasn't been able to do before awesome awesome yeah to what degree it is the work you're doing with Mojo uh build on top of or depend on some of the things that you've done in your past lives around llvm is that is llvm an enabler for this new uh this new tech yeah absolutely so um there's a number of different things that modular and Mojo build on top of and so you can say modular is a fairly young company we're about 18 months old at this point but it's built on many years of experience building a lot of Technologies in a lot of different places and so a lot of the research has been done in other contexts one of the pieces of that is this compiler framework called mlir mlir is is you can kind of think of it as an evolution of lvm that has enabled a new generation of compiler Technologies mlr is now widely utilized across the entire industry for AI accelerators and it's been very rapidly adopted it's something that I and the team built at Google and then we open sourced and it's now part of this lvm umbrella of Technologies llvm as you say is also a really important part of the component stack so lvm is an umbrella project that includes things like mlir and it includes like the clang compiler for CNC plus plus that many people know about there also includes fundamental building blocks like code generation for an xa6 processor and things like this and so we we build 
directly on top of a lot of that technology as well and so that's that's all kind of integrated into the stack and that's one of the um you know you make the hardware go Burr kind of things and so that's that's all super important and so when you think about you you kind of painted the picture of this uh uh three world problem every time you say that I think of three body problem it's a science fiction book and Trilogy um but it's three three you think of this three world problem and how as uh an AI developer who is trying to um you know actually get work into production you have to kind of think really deeply in the stack is the is the idea with Mojo that you want to make it easier to go deep in the stack or do you want to make it uh more transparent to the user so that they don't have to go down into stack and everything is just kind of working underneath without their kind of needing to switch boundaries yeah so so modular we have a couple of different goals right so one goal is meet people where they are solve today problems build a faster horse right and so and that in that in that department nobody wants to rewrite their models they want the code to just work right and so they want new capabilities but they want to fit within their existing ecosystem now when you deploy a model this is something that I think many AI practitioners don't talk about quite as much or maybe they the practitioners and the researchers don't have coffee enough because we're pretty well understood how to train a model deploying a model is another completely different set of problems and so you know you can you can take this in many different ways one example of that is that python is great for research it's maybe not the best for production deployment at scale and so many teams will end up rewriting their entire model in C plus plus just to get it to go if it's if it's a dynamic model for example language model now that and there's a bunch of interesting work and there's a really smart people that do that kind of stuff but why is it that we have to write our production model or our research models to get them into production that's really unfortunate right and so we'd like things to just scale and so one of the things that Mojo does is it's way faster right and and also if you use it the right way you can also make it so it deploys without you know into a single 8.0 executable and things like this and so it has new capabilities that the python natively doesn't provide which enables it to go much further and so useful that way now another piece of it is we're building really high tech you know what we call the engine that powers Ai and we have the fastest inference engine that's unified across tensorflow and pytorch now right and that engine is built entirely on top of Mojo and so it's not just about building a faster horse and like enabling the existing use cases it's about like unlocking this potential of this next Generation hardware and to us like that's equally important even though many people see mojo as being um you know it helps helps out Python and that's you can look at it as moving python forward but really where Mojo came from is working backwards from the speed of light of hardware and so you know we talk about Mojo can be up to 35 000 times faster than python uh that's because it's at the limit of what the hardware can achieve and Mojo some people will see it as it looks like a faster python or a python that has no Gill or a python that types enable performance or you know things like this but 
but it's really about what can the hardware do how do we unlock the full potential and how do we do that in a way that python programmers have direct access to but you said uh python that has no Gill that's like uh The Interpreter lock or something like that and it is one of many limitations that inhibits the performance of native python yeah yeah so um I mean I think that if you zoom into python right and I don't I don't know how deep you are in the internals of python a lot of folks use Python but they don't dig into it like like I do um and so don't dig into it like you do no yes I think you're you're in the majority um and so uh folks that use Python know that it's maybe slow it doesn't scale super Well it can't use all the processors on your machine without a lot of work around and things like this there's many aspects of the technology within the python implementation that make that so and so it has an interpreter right interpreters are slower than compilers generally it has what's called the Gill the Gill prevents effective use of multiple cores the implementation within python puts all of the objects on the Heap in a very specific way and there's a bunch of implementation details that go into how it works um Mojo is I mean interesting in different ways first of all it's compiled second of all it gets rid of the global interpreter log third it changes this representation fourth it adds types like you can keep layering in all the different all the differences here um but the consequences that it really is a it's a different animal it has different characteristics than what python the python implementation uh provides and so because it's a first principles programming language right it really has addressed a lot of the problems that python users have felt as symptoms but have not dug into you know Wise python this way you mentioned that it adds types you know one of the biggest things that's happened on the JavaScript side of things is the emergence of typescript um as being kind of this uh JavaScript compatible language but that is strongly typed is Mojo does mojo have that same kind of relationship to python yeah there's a there's a bunch of very um good analogies there so typescript super popular a lot of people use it um and it fits right into the the JavaScript ecosystem and so Mojo uh has a similar relationship to python where it's a superset it works with the existing ecosystem all the packages in Python just work in Mojo which is really important to us and so we don't want to break the python Community uh many many folks went through the python 2 to Python 3 transition it was really uh quite difficult in various ways and so we don't want we don't want to relive that right um and so uh and so you can look at mojo as a python superset and so by doing so you can pull pull forward all of the existing code and all that that ecosystem into a module world there's a big difference though and so um actually if you zoom into Python 3 as it is today python allows you to add types and those types if you add them to your code are there for some linter tools or Checker tools that can identify bugs and can identify you know obvious mistakes in your code sometimes but those types in Python aren't used and can't be used by the implementation for runtime exactly and so because of that you can detect certain errors but you don't get good performance out of that and so what Mojo does is it kind of takes that next step and so you can use the existing you know you can use lowercase i to say it's an 
INT you know and declared as an integer that way or you can use capital I and if you say it's capital I that's a Mojo strongly typed integer and it's checked and required and then it also is used for performance and you know we see you get 10x 20x faster performance if you just add a few type annotations and we have a couple of demos of that carrying forward that typescript analogy what I've appreciated about it is like well a couple of things one um you can you can add types without like fully buying into all of typescript and needing to know all that but still get like a little bit of benefit without going all the way into kind of this new paradigm uh and also when you are looking at code that you're not familiar with that is kind of fully adopting the new paradigm it's still familiar like you can kind of make your way through it knowing that there's things that you don't know uh if you're if Mojo enables kind of that same level of flexibility um I would think that's a good thing yeah well so you come back to this two world problem or the three world problem right where you have Python and python lives on top of C plus plus so being a superset means everything you do in Python Works in Mojo right so obviously types cannot be required because python doesn't require types right and so so that's also that's all true but in the traditional world of python if you run into performance problems or you need access to system software or low level things you have to go build a hybrid package whereas half C or C plus plus half python and so the value prop that Mojo provides is you can continue writing Dynamic dynamically typed code that's all good but instead of switching to a different language to do high performance lower level things just as you say you add a few type annotations right or you use some lower level syntax within your existing code and then you can you know put more effort in to get more performance instead of um you know having to switch to a completely different language where the debugger no longer works on both sides and you know all these things got it got it you mentioned that Mojo like gives python superpowers like um that made me think of I I may I'm probably not alone in this that you know the first place I've learned about like this Dunder the dunder uh functions in Python was from Jeremy Howard and the fast AI course like there's probably a lot of folks listening who came across it in the same way uh are you accessing these superpowers through like python native structures like that or are they annotations or like how do you well first of all what are beyond the ability to kind of kind of tap into lower level structures like what are some of the kind of superpowers or enhancements that Mojo adds and then how are they accessed yeah so I mean you mentioned Jeremy Jeremy's been a huge influence on me personally I mean you could say you can go back to saying like why does mojo exist and a lot of that's Jeremy's fault just just between us right and he's been he's been pushing for years specifically for hackability research ability like Jeremy's Jeremy's got the unique kind of brain where like ever like the whole problem fits in his head and so he can understand all the different parts of the problem right and so so yeah so Mojo has all the dunder methods and so if you want to add you know you want to make the plus operator work you can implement the underground ad method and things like that but then it goes a little bit further and so if you look in the space of system 
programming languages you enter you enter the realm of things like rust and C plus plus and like these kinds of languages right and the systems programming world for a long time has been pushing towards bringing safety into this world so C C plus plus you have a pointer pointer dangles bad things happen your app crashes you have security problems all these kinds of things rust and Swift and other languages like that have gone further into making uh make it possible to get good performance without sacrificing uh safety and so we've brought a lot of those ideas directly into Mojo and so in Rusk there's a notion of lifetimes and ownership and these kinds of things that enable safe pointer usage and things like that so Mojo brings that in now these are features that you know obviously you don't have to use unless you're writing low-level code and you care about getting a high performance in certain use cases but having that available gives you a very accessible whole stack solution that allows you to go all the way down and get rust style performance out of a CPU and um and similarly like we talk about this Hardware stuff well at the bottom even on a CPU you have many cores you have these crazy Vector units and Matrix extensions and like it's really interesting to see the evolution of Hardware because if you go back 10 years ago it used to be that there was a CPU thing and a GPU thing and these were points in the space that were very different and they were completely unrelated from a hardware perspective but today that whole line has gotten blurrier because gpus have gotten more programmable CPUs are getting more AI stuff in them the CPUs these days have B flow 16 and like all these other AI things that are being built right in and so we're getting a spectrum of programmability and so a lot of what Mojo is about is unlocking that for people and making it accessible and making it so that again you don't have to switch languages just to just to get access to this stuff you know that you're rightly focusing on CPUs and gpus but there's uh you know as you know a wide variety of other options and perspectives tpus and other um you know more um you know other kind of newer newer and more specific more exotic that's a great word yeah exactly approaches to this do you are you building mojos such that you know it is anticipating all of these options or is you know when you you're focusing on making mojo better use acceleration are you really talking about you know gpus or maybe gpus and tpus uh well so um so I I spent a couple of years working on Google tpus and Google tpus are uh I mean they're they're an impressive set of Technology machines because they scale up to exit flops of compute they're highly specialized for AI workloads they're also internally really weird and so to plus one exactly what you're saying right AI isn't just about like a GPU right I mean so much so much thinking around AI technology is okay I just need to get the gpus lit up and then go but uh particularly if you start deploying well if you rang on a smart camera or something the AI Chip is going to be completely specific to that camera right if you're doing uh you know Google scale training on on crazy distributed machines like that that's that that Hardware is quite different and so um this is where one of the things that's I think very exciting to me as a technologist about Mojo is that it's built on this mlir compiler so mlir is again the thing that we built started started back at Google now it's being used by basically 
the who's who of all the hardware industry and mor talks to all of these things and so um if you uh if you're familiar with llvm lvm as is is now a 20 year old technology it's widely adopted and talks to all the CPUs and some of the gpus but uh llvm has never been successful at targeting AI accelerators and video optimization engines and like all the other weird Hardware that exists in the world and that's the role that mlr provides and so Mojo one of the ways that it's implemented is it fully exposes that power and brings mlar compiler uh you know all the nerdery that goes into the compilers and it exposes up to library Developers and so it's actually quite important that you can talk to for example tpus or other things like that in their native language which in the case of a TPU is this like 120 by 128 tile and being able to expose that out into the language is really quite important so anyways that's that's a long way of saying yes it is more than CPUs and gpus though CPUs and gpus are the starting point obviously for lots of really good reasons but we've built this thing to have really long legs that can bring us into the future and do you see it extending um to things that are even more exotic like your graph cores and samanovas and like the you know things that take a very different approach to um the underlying compute yeah so so Mojo's really so let me bring you back to where modular is coming at this because Mojo is one of the components of the air stack as a way to look at it so modular is building what we called a unified AI engine and so this unified AI engine what the heck is that well it's an engine it's it's an engine it's not a framework and so people are familiar with pytorch and tensorflow and these machine learning Frameworks that provide provide apis and so you get and then module and the apis that we're all familiar with underneath the covers there's a whole bunch of deep technology for getting things done to a GPU getting things onto a CPU and so pytorch 2 just came out with this torch Dynamo stuff and like all these all these exotic low level technologies that make the hardware work on gpus Cuda is a major component of the technology stack that everybody Builds on top of right and so our engine fits at that level of the stack and the the cool thing about it particularly when you're deploying is that it talks to lots of hardware it also talks to both Frameworks and so when you're taking a model from research for example you have a nice pie torch model you get off hugging phase we have lots of people do this of course um you want to deploy this thing well you don't actually want all of Pi torch in a production Docker container what you want is a low dependency efficient way to serve the model and so that process of getting from pytorch and into a deployment thing is what the modular technology stack can help with now as you say coming back to answer your question graph core some Nova all these all these Hardwares can't talk about any relationships that's not but the um but from a technology perspective they're they're all slightly different in high level ways so some anova's chip is from my understanding a what's called a cgra right which is a super parallel really crazy thing that has almost nothing to do with CPUs graph scores are apparently lots lots of things that look like CPUs but they their memories are all really weird and different the way they communicate is very structured right and we all know CPUs and gpus right um and so uh what our technology stack 
enables is if you're the samanova or cerebris is another example of a really crazy system uh like those people need to implement a compiler for their chip right and so they're the experts on their ship they understand how this works and what modular can do is provide a thing for them to plug into so that they get all of tensorflow and pytorch and one of the major problems we have today with with Hardware accelerators particularly ones that are not the dominant player in the space is that their tools don't actually just work right so often um I'll pick on Apple for example right so apple has a deployment technology called core ml is talks to the neural accelerators and they have all this amazing Hardware on a Mac or an iPhone but cormel is not actually compatible with all the models and so getting something onto an Apple device means fighting with this translator and trying to get it to not crash and you know doing all these things the the production World struggles with and you know if I I talk with many people many leaders at software companies that are building AI into their products and a lot of software leaders uh you know they they see the symptoms they see okay it takes three months to get a model into production right they they see symptoms like I need a team of 40 people to be able to deploy things and they're very expensive very specialized people why is it this hurt right right and the answer to those questions are the the tools the Technologies are not anywhere near the tools and Technologies used for training and so there's so much suffering so much from so many problems in these things and and the root cause is the technology I've been working on for years which is um for any one of these chips people have had to build an entire technology stack from the bottom up and there's very little code reuse across across Hardware and Hardware vendors again I'll pick on Apple but I love Apple also it's not it's not out of anger it's that you know it's very difficult to track the speed of AI itorch moves super fast right this is stuff that you need a very dedicated team you need to be super responsive you need to be on top of this stuff and also uh the compiler problems and the technology problems to make the hardware work are really difficult and so um there have been a lot of really smart people working on this but if you're always focused on getting the next ship out the door and you can't take a step back and look at this whole technology stack then you can't make the leap that modular has is driving forward interesting interesting so you said something earlier kind of describing the the engine and its place and it made me think of uh you know fur for ages now right um we've kind of you know decry the kind of Stranglehold if you will put a negative spinner the Cuda has on like the low-level programming interface which basically kind of keeps you know ensures that Nvidia has you know long lasting position and makes it very difficult for you know say an Intel to come out with uh um you know a CPU with some numeric capabilities and displace it because there's all this you know hey there's all this code that's been written in these three worlds that you've mentioned and like it's not as easy as just swapping out the hardware right are you envisioning that this this modular engine is this kind of replacement for Cuda that is multi you know Hardware capable is that the the core idea yes I mean that's that's one of the value props we provide so um if I zoom out and look at the steps the end 
history has been going through so um we are we as an AI industry owe a huge debt of gratitude to Cuda like go you go back to the Alex net moment for example right A lot of people talk about it was a Confluence of imagenet and the data sets and things like this it was a Confluence of hardware and the fact that gpus enabled an amount of compute that could cause Alex net to happen but a lot of folks forget that Kudo was what enabled some researchers to go write convolution kernels and actually get a machine learning model running on a GPU which the hardware is definitely not designed for back in the day right today's yeah it's taken over and it's a little bit different but back in the day that initial breakthrough was really in in you know a large part thanks to Cuda and so one of the things that's happened is that as um AI has taken over right a lot of technology has been built on top of Cuda and it's it's a very good thing and it's very powerful and flexible and hackable and it's great but as you say it's kind of put us into a mode where one vendor has this dominant position and it's very difficult to um you know if you're a hardware vendor at even an AMD or some other widely known company that has really impressive Hardware to be able to play in this ecosystem now what what's happened and one of the things that led into the thinking that went to modular existing is that there have been a lot of compiler technologies that have been built for example there's this xla compiler that I worked on at Google there are new compilers every day being announced by different companies where they're saying I will build a compiler that will make ml go fast for example on gpus um and so several years of work lots of cool technology lots of examples of these systems exist like and the names keep changing but the technology is very powerful the problem with that is that they have lost one of the things that made Kuda really powerful which is the programmability and so what what has happened is the compiler nerds which I'm a member so I can I love the compiler nerds but those compiler nerds have went and turned AI code generation and things like this into a compiler problem but that has excluded all the non-compiler people right and so if you look at tpus for example tpus have uh can express everything you can do in this xla compiler and so I can do Matrix multiplications convolutions element wise ads Etc et cetera et cetera but it can't do sparse operations can't do data operations can't do pre-processing and so um AI you're an expert you know this AI is not just about matrix multiplication it's about data loading pre-processing this full parallel compute problem that is part of AI and so what has been lost over the several years of trying to solve uh the Cuda lock-in problem is that people have tried to make this compiler problem and now you've turned into a different lock-in but instead of locking into Hardware you're locking uh most smart people out of the ecosystem and these compilers haven't been super successful at being compatible with code and things like this right and so what modular is doing is we're saying okay again I love all these people I've been working on this stuff for a long time myself but what we're doing is saying start from a different perspective what is our assumption our assumption is people don't want to rewrite their code what that means is you have to have all the operators all the systems that go into something like tensorflow or pytorch need to work okay well that's a 
thousands of operators each and this really messy job but we handle that job for for the world right the other thing we say is okay pytorch is really popular in research tensorflow is still quite popular in production what what we see out in the industry again every shop is a little bit different but a lot of people have both tensorflow and Pi George and so they don't want to have this bifurcated stack built on top of these things they want to actually have one system that they can scale out and so we make our problem even more complicated by building a unified solution and so now it's not about 2 000 on the tensorflow side 2000 on the pi torch side it's about four thousand right and it's actually even worse than that when you bring in some of the other Technologies but now you talk about Hardware right it's not just about Intel CPUs and Nvidia gpus it's this other axis that then does a multiplication to this whole problem and says okay well now I have many different there's probably a hundred or a thousand different kinds of hardware and so where traditional teams have built a point solution saying okay I'm going to build a fancy compilery thing for one hardware for one framework and you know in one One Direction along this and they built one of these uh I mean you often very good tools but they're very purpose built in one case you know we're having sympathy for all the software people that have to deploy because software people they don't have one piece of Hardware they don't have one model they don't have one framework they don't have one product right their products evolve over the course of decades sometimes and software lives a long time and so they need to be able to talk to lots of different generations of this stuff so a modular what we've done is we've said okay well this is suddenly a very different problem for a technology perspective than building a point solution and this this problem this I need to solve this massively complicated space where you have Hardware on one side you have the sheer scope of AI on the other space is what drove Mojo to exist because we need a way to make this entire stack accessible hackable uh understandable to people that are not themselves compiler Engineers we need people that know really fancy numerics and sparse algorithms and you know convolutions and or people that know their Hardware we need to know like all these people that are involved in all of this massive technology stack that we've been building to be able to collaborate and work together and build cool stuff at a high velocity right and that's where we think that Mojo is really interesting because as far as I know nobody's done that like I mean it's like a completely unique Creation in the space and um and we hope that will really simplify the world one of the things you know I uh we kind of joke about it but you know our biggest enemy you know the mortal enemy that we struggle with a modular is is actually just complexity right and the in the AI space there are so many systems so many Technologies so many uh you know layers of stuff that has been built up and you know if you zoom out coming back to you know 2016 I thought I was you know too late to do anything important in AI like what you realize is that AI is still not done right this the stack that we're building on is adolescent like it's it's in its teenage years and so what we need is we need to get to that next level where everything actually works is way more predictable it's actually hackable when you try and experiment as 
a researcher the tools don't break out from underneath you and when you achieve that we think that the the impact of AI can go much further and that many more people can participate when you when you talk about the the complexity and the diversity of underlying components and then you talk about kind of how the lifespan of software kind of extends over generations of underlying infrastructure it makes me think of uh like dependencies and dependency management and packaging and all these things as like huge problems that need to be solved is does that play into what you're doing at all uh also not directly but that your your pattern matching your neural net there is doing a very good job of pattern matching and seeing seeing what we're talking about here um the the packaging problem is often because you have all these incompatible systems that are lashed together and so if you zoom into python packaging I mean there's there's a lot of things going on there I'm not an expert in Python packaging people I talk to that are um a big part of that is because of the C parts of these python packages right so you pick our old friend numpy for example right numpy has a ton of C code inside of it as well as the python API well packaging that means you're not actually packaging python you're packaging C code C's never had a package manager that's any good right and so and so you know it's it's funny you look at these old problems we've been struggling with well you get rid of the C code and suddenly packaging is way simpler right and so this is one of the things that Mojo provides is providing unified language and more generally every time you see one of these fissures like you're talking about the hardware divide you know here we're talking about python C plus plus you talk about Cuda versus sickle versus hip versus like all these other crazy things that exist in the world like each one of these things is at the bottom of our stack driving complexity up and so at the end of the day you know you'll have a researcher who very reasonably says hey I just want to run this model on AMD GPU no big deal right should flip a switch right but the problem is is that at the very bottom all this stuff is very different and all the cracks go up and you know it's if you take reliability and it's 90 reliable and then the next step is 90 reliable next step is 90 reliable you start multiplying together all the point nines and you get something that's ten percent reliable right and this is this is this is the AI stack that we all depend on and you've got you've got you know this easy problem which is well okay let me be careful here you've got this one class of problems that is very challenging but it's easy to deal with and that is when you're trying to use all this stuff together and it just doesn't work like it doesn't compile it doesn't run or whatever but then you have this other problem where it works but you don't know that it's actually not working because of like semantic differences or you know what have you um it's either not performing well or you know your results are are you know you're not converging your results are out of whack and like you're digging deep into underlying libraries trying to figure out like why are your answers like crazy yeah I give you one example right I mean just go through the life cycle of deploying a model right so to just you know make up a scenario but um but to just double click on what you're saying okay I want to deploy a model well now I need to get it to go through coramel 
or one of the many things for deploying to some piece of Hardware results don't work well now to just like 100 like just it's just like a plus one you 100 times now you need to know not just pytorch not just your model not just core ml but also the translator also all these things and you dig in and dig into dig and you find out it's handling the edge padding on a convolution slightly differently right it's like and so now wait a second so like all of these tools were supposed to be making it easy but because they don't they're not all reliable like it's a sleeky abstraction now you have to understand all of this complexity right and so this is this is what causes it to take three months to deploy a model right fundamentally this is something where you know I think that many folks that are building AI products and they're managing you know they're the VP of software some technology company right they just see the symptom of why does it take so long to get this model in production but they don't realize that the tool set this this fundamental technology that all this stuff is built on top of it's not up to the standards of a software tool set it's not you know no C programmer would tolerate AI Tools in their quality you know it's just crazy but but again this is just the maturity of the AI technology space and by solving that problem you know what we want to see is like way more people way more Technology Way way more inclusion in the kinds of companies that are able to work with AI and do things and we think that'll be a really big impact on the world so we've talked about Mojo we've talked about the this inference engine um or the engine that we've referred to in the context of Mojo you've talked about like 35 000 you know X performance and improvements over a standard python I do need the engine to get that level of um of uh performance Improvement you know it is switching using Mojo like lock you into using this engine like what's the business model there are you do you have licensing issue like it's both you know I have a bunch of questions kind of coming out here and they span kind of Technical and like business licensing kinds of questions how does all that work great question so you've identified the right the right players there's Mojo which is a programming language it's a programming language that's a member of the Python family it's really useful on for example just CPUs which is the only place that python plays and so many people see mojo as just being a better python now we have the engine the engine itself can stand alone and you can use the engine as a drop in replacement works with tensorflow pytorch it'll make your burp models go 3x and you're using it as a drop in replacement for what exactly for a traditional tensorflow implementation so actually before I before I answer your bigger question let me dive into that so what the modular engine does is you replace the tensorflow with our tensorflow or your pie torch with our pytorch or if you use in torch script or things like this and so you just put in and put a new thing in your Docker container and and what you get from that is massively better performance and so you know tensorflow is quite good at production but we're showing three to five x better performance on for example an Intel CPU or an AMD CPU or an arm-based graviton server in AWS and so you think about that and you see three to five x better performance well that's a massive cost savings exactly that is a massive cost savings well and it's also a massive 
latency Improvement and so many of our customers love that because then they can turn around and make their models bigger right and so now you can have a better product for your customers and so you get you know direct impact on your costs direct impact on your product and this is a huge deal for people and again this is where you know I'm a technology nerd sometimes right and I love some of the how it's built but the impact on products is is phenomenal and that the engine is a really big deal for for just like getting production AI to scale okay so just kind of continuing down on that line before we click back out then I would imagine one of the commitments that you need to be making to folks that are thinking about using this thing is how close you're going to stay to the you know the development of that stack right yep yep absolutely well so I mean one of the things also that um you know our customers love is that Google and meta don't actually like support tensorflow or Pi torch right these these people forget but these are not products right these are open source projects they are Hobbies maybe for the the mega Corps and so you're essentially offering like a supported opt performance optimize version of tensorflow and pytorch right but then to if I'm going to think about using this I need to know that I'm not going to get left behind like you're gonna you know I'm gonna wake up one day and I'm three versions behind the latest thing in tensorflow and it has something that I need in order to make my you know you know 500 trillion parameter llm work yep yep so I mean we're committed to doing that so I don't know if this is like a binary question but yes we do that um but the the thing that if you you know the Enterprises we talk to that care about their costs right um often they want somebody that they can call right and if if you think about it right it's it's analogous to who wants to run a mail server themselves right you can run send mail or something right but nobody in the right mind does that right why do we do this with AI infrastructure it's because there's no choice there's been nobody to reach out to nobody that actually can't do this and the thing that I think many folks forget is that meta and Google they've their technology platform has diverse a lot from what the rest of the industry uses right so they both have their own chips they build right for example right and they have their own specific use cases and so they're not actually focused on making the traditional uh CPUs gpus and public Cloud use case actually really good that's one of the reasons why we have such high value we can deliver um and so so yes we are we are a this is a product for us that means we actually support it that means we invest a huge amount of energy into it this is one of the reasons why we have such phenomenal results as well so yeah to your other question like one of the great things about being a drop in replacement is that from a customer perspective at least is that it means you can undrop in like you can use our technology and if you want to switch back you can always switch back at any time at some point we'll make it back to that broader question but I'm thinking about like you know we've talked about uh moja's being this better python um but you know what makes python usable and AI is not just kind of the core python it's all these other things numpy and and pandas and many other packages you mentioned you know we know they have C at the heart of them so at some point there's a 
significant number of packages that you also have to kind of rewrite that need to be Mojo native I would think in order to get the full uh the full performance yeah so um let's dive into compatibility so uh Mojo's still a young language we haven't talked about that but it's still not it's not done and I think it will take another year or so of development before it gets to be um like solving all the world's problems that we want to solve things like this right but even today you can import and use arbitrary packages like numpy pandas tensorflow pie torch whatever directly into Mojo and so a really important part of how our stack works is you don't have to rewrite all of your report or touch all of your python packages I mean many people have their own python code it's not just big packages like numpy right and so and so Mojo talks directly to all those packages you don't have to write wrappers it all just works right and this is this is a really big piece of that now if you choose to move your code into the Mojo Universe then you can get the benefits the Mojo provides and so if you're just talking to an existing package well it'll still run python speed it will be fully compatible and but it will also run with the same implementation this default python implementation and so moving your code to Mojo can then unlock these new capabilities but then you can choose to do that a package at a time or however you'd like to do that and so that understood in order to I guess I'm curious like how much of the like surface area of AI related packaging have you built or am I thinking about this the right way like in order to fully uh provide the performance benefits that you're talking about did you need to you know Port numpy over to you know kind of a Mojo native or to run on mli or whatever at whatever level that makes sense did you you know um pandas all these other like how much did you need to do and how much of that is done like percentage-wise relative to what you expect will need to be done to be absolutely uh well so the answer is zero so our our solution is like our solution enables to talk to the entire python ecosystem out of the box so matplot website sci-fi numpy like all that stuff just works right and and that again come back to being pragmatic and productive like we can't uh I'll make fun of you and I'll make fun of me from our last call on the you know like great language debate right the the the the problem with any new programming language is a new programming language has no community has no package ecosystem right and so that that again like uh myself on that previous call and all the other lovely people there right you want to get ml out of python for whatever reasons is is very exciting but it's not very pragmatic because the entire data science ecosystems all wrapped around python this one's also pretty great right I mean I think that that's something that um you know people in other communities like to make fun of python because of indentation or whatever it is but python is beautiful right subjectively I will say it's my opinion but and so what Mojo does is enables you to use literally everything in the python ecosystem and then if you want to invest more effort to get more performance then you can do that but you don't have to right and this this is this is the major value prop now in the case of modular and why we built Mojo our like business objective is go make ml really awesome right and we want we we care about the Matrix multiplications and the convolutions and the 
like the core operations that people spend all their time on in Ai and so we wrote We rewrote all of that stuff in Mojo and so this isn't like rewriting that plot live this is like rewriting Intel mko equivalent right or rewriting the Cuda implementation of these Cuda kernels equivalent right and so that's where we've put our energy into because that's what enables unlocking of the hardware enables unlocking of performance enables unlocking of usability and so you know we have really exotic fancy compilerary features that enable kernel Fusion automatic kernel fusion and things like this that you know no no normal ml researchers should ever have to know about they just see okay it runs 10x faster in this use case well that's pretty cool right and another thing that I think that folks are struggling with is that you know um uh take Transformers for example I mean you know Transformers I know Transformers we all love Transformers they're eating the world um but one of the problems with this is that because they can't became so so important to so many different use cases we got all these very hyper specialized software stacks for Transformers and so these existed the low levels so Nvidia for example has a set of kernels called faster Transformer these are at the high levels and so there's always distribution Frameworks for Transformers and things like this and so you get this very Transformer specialized stack which again forces you into this very narrow view of what a Transformer is and it works for the Benchmark but if you're a researcher you want to go push the boundaries and try slightly different Transformers or you know maybe there's a thing Beyond Transformers like I hear that rnns are coming back in and you know maybe ffts will have have their day right I mean there's like all these different theories and if we can't enable people to do that research like we may be missing out on that next big step and so um this the specialization that's inherent in um you know things becoming important really Cuts against generality and so that's that's one of the things that we've seen and we that we really want to like again like if you dramatically reduce complexity of these Stacks you can make it way more hackable and that we believe will enable people to invent new things yeah but I want to push on this one more time just to make sure see if I can figure out what um expose any uh kind of fissures in my understanding here is is what you're saying that um or is it the case that you know when in thinking about the relationship between like uh numpy or pandas and and python that those libraries that you know we all use as part of um you know that are kind of ubiquitous and from a machine learning perspective is it I can imagine a couple of things you know one that um like they're sitting on top of the underlying they delegate enough of what they're doing to the underlying python that you kind of replacing fixing that underlying python gives you you know some percent of the performance benefit um such that you don't need to deal with the upper piece who who is you here right so are you are you asking how it works internally are you asking how a user uses it or are you asking when somebody should do something which piece which piece of this elephant are you touching I you know I'm both try I'm primarily trying to make sure that I understand how it is that you're able to offer the performance improvements that you're boasting without needing to touch any of the libraries that people depend on and so 
I'm kind of asking about internals but also like how they're used so I'm imagining like several scenarios a you know for whatever reason yeah well a is like your 35 000 number that is you know that's kind of a made-up number that uh doesn't actually rely on any external dependencies and it's kind of a useless performance boasting metric that's one possibility another possibility is um numpy delegate you know these libraries delegate enough of their operations to the underlying python that you can get the you know significant performance gains even without touching those things and hey if somebody did touch those things maybe it would be 70 000 or whatever yeah I can break it down for you if you want got it okay all right so so let me let me break it into a couple of categories so one is you have unmodified python that is just imported so you take matplotlib just click on something that's not performance sensitive uh there's no reason to rewrite matplotlib it's it's fine right and so you just import it if you import it the way that runs is that runs with the existing C python interpreter and Mojo talks to the C python interpreter and so that code runs 100 compatibility everything just works great and like this is why the entire ecosystem works but it's no faster okay and so you're really what you're getting as you're getting at what are my trade-offs what are the levers I'm pulling here and so full compatibility but no performance benefit those things go together right another another another thing you can do and so if you so um if you go to modular.com you can see our video and you can see Jeremy giving a demo Jeremy Howard giving a demo and um and there we can see is you say okay I just take some python code I put it into Mojo and now it runs you know it depends on the code but you know roughly 10x faster out of the box maybe 15x 16x I mean there's more we can do to push it further we just haven't focused on that and that's running same code but in Mojo and the reason you get performance is it's a bet it's compiled instead of interpreted it has a new fancy compiler stack all the stuff under the covers but it's still running fully Dynamic typed code it's just running dynamically type code in a better way and so you can get you know 10x out of the box that's pretty good I mean that's that's quite nice then you start layering in and saying Hey I want to add types okay well now you're talking about like changing the the in-memory representation that's going to be way more efficient well that's 10x now you say it give me threads okay well that's 10x okay now I want to use vectors and do Hardware that's another 10x and so if you stack all these things up this is where you get into 35 000 times and to I will I will I will agree with you by the way that the 35 000 number is a Cherry Picked number this is this is a an extreme result on mandelbrot right which is a simple algorithm we can explain and people can play with in a notebook and stuff like this but we have lots of people just you know random people on the internet using Mojo that are getting hundreds and thousands of times speed UPS and so and so the 35 000 may be Cherry Picked but reasonably expecting getting over 100x is 100x is pretty big like that that I consider that to be a pretty big deal right and and and you can look at that as 100x over python or you can look at that as saying Python's now 100x more relevant for keeping me out of C right and that that's both of those sides of that is really cool now anyway so coming back to your 
All right everyone, welcome to another episode of the TWIML AI Podcast. I am your host, Sam Charrington, and today I'm joined by Chris Lattner. Chris is the CEO and co-founder of Modular AI. Before we get into today's conversation, be sure to take a moment to head over to Spotify, Apple Podcasts, or your listening platform of choice, and if you enjoy the show, please leave us a five-star rating and review. Chris, welcome to the podcast.

Hey Sam, it's great to be here.

It is great to have you on the show.
The last time we got a chance to speak was, I think, back in 2020, around this time, for the big great ML language debate. But I think you've switched teams from a language perspective since we asked the question, right? I think there's a new contender, a new entrant on the field.

How about that, there's a new contender in town, yes.

And we will get deep into that conversation. But before we dive into Mojo, the new contender that we'll be speaking of, and all the work you're doing on it, I'd love to have you share a little bit about your background and refresh our audience on some of the things you've been up to.

Yeah, sounds great. So I've been kicking around the software industry for a number of years now, and have built and worked on a lot of different kinds of low-level languages and compilers and other technologies in the developer tool space. I've had a lot of fun with that and have been learning a lot. I'm most well known for open source things like the LLVM compiler and the Swift programming language. But I got interested in AI in 2016, and 2016 feels like forever ago now, but at the time I felt like all the best work had already been done, and it was just such an outrageous new approach to solving very old problems that I got into it deeper and deeper. The good news is that not everything in AI has been done yet, so I didn't quite miss the boat. From there I went through many different parts of the journey: I worked on Google TPUs and TensorFlow and a bunch of other things like that, built more production systems, worked on hardware, and have touched many different parts of this elephant. So at Modular, I bring a lot of experience with a lot of different parts of the stack, and we're trying to help lift AI to the next level.

And at least a part of that is in developing and promoting a new language for AI, and that is Mojo. Can you talk a little bit about Mojo and its significance?

Yeah, absolutely. If you zoom out, to understand what Mojo is you have to understand where it came from. When we started Modular, our quest was to make it much easier to build, deploy, and evolve AI research: taking research, lifting it to new levels, and then getting that research into production. This is a quest many people have been on for a really long time, but it's really about making this whole technology stack more accessible, so that experts at the different levels of the stack don't get stuck in one level. If you zoom into something like TensorFlow or PyTorch, you'll find that many people work at the Python level, which is fantastic; they know how to build models and things like this. But researchers who want to push the boundaries end up having to work at the C++ level, and that's one of the dark truths of Python: deep down underneath it, when you get to the things that care about performance or care about hardware, you quickly end up in C and C++ land. And AI is even more challenging than most Python systems code, because now you bring in GPUs and TPUs and accelerators and all this kind of stuff. So you end up in what is actually a three-world problem: you have Python at the high level, you have C++ in the guts, and then you have things like CUDA and other accelerator languages underneath. Mojo is a solution to this equation.

At Modular we're building and tackling a lot of these old problems: how do you get models to be expressed in a natural way, how do you map them onto accelerators and the different kinds of heterogeneous, fancy hardware that's coming out, and how do you make it hackable for researchers? To do that, you have to get rid of this three-world problem, and the stack we built is really novel in the way it works underneath the covers. We needed a way to program that whole stack, top to bottom, with one language that could scale, and Mojo is kind of that. It starts from this requirement of pulling the three-world problem together into something that is consistent. But then we needed a syntax. When we decided, okay, we have a really interesting (and, to a compiler nerd, very cool) set of compiler technologies under the covers to enable all these accelerators and all this fancy low-level heterogeneous stuff, we needed a user interface. Python is the obvious thing: Python powers so much of AI, and so much of data science in general. So what we decided to do was build Mojo as a superset of Python, so that first of all it feels like Python, it's accessible, and Python programmers already know Mojo. But then we can also give Python superpowers, where now Python can scale down, can be high performance, can run on accelerators, and can do these things that it hasn't been able to do before.
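To make the superset idea concrete, here is a minimal sketch, assuming current Mojo syntax (the function names are illustrative, not from the conversation): Python-flavored dynamic code and typed, compiled systems code sitting side by side in one Mojo file, which is the "one language that scales" idea in miniature.

```mojo
# Dynamically typed, Python-style code: this is valid Mojo as written.
def double(x):
    return x * 2

# Typed, compiled, systems-style code in the same file: no drop into
# C++ or CUDA required to get below the dynamic layer.
fn dot2(ax: Float64, ay: Float64, bx: Float64, by: Float64) -> Float64:
    return ax * bx + ay * by
```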
Awesome. To what degree does the work you're doing with Mojo build on top of, or depend on, some of the things that you've done in your past lives around LLVM? Is LLVM an enabler for this new tech?

Yeah, absolutely. There are a number of different things that Modular and Mojo build on top of. Modular is a fairly young company, about 18 months old at this point, but it's built on many years of experience building a lot of technologies in a lot of different places, and a lot of the research was done in other contexts. One of the pieces is this compiler framework called MLIR. You can think of MLIR as an evolution of LLVM that has enabled a new generation of compiler technologies. MLIR is now widely utilized across the entire industry for AI accelerators, and it's been very rapidly adopted. It's something that I and the team built at Google and then open sourced, and it's now part of the LLVM umbrella of technologies. LLVM, as you say, is also a really important part of the component stack. LLVM is an umbrella project that includes things like MLIR and the Clang compiler for C and C++ that many people know about, and it also includes fundamental building blocks like code generation for an x86 processor. We build directly on top of a lot of that technology as well, so that's all integrated into the stack. It's one of the "make the hardware go brrr" kinds of things, and it's all super important.

So when you think about this three-world problem (every time you say that I think of The Three-Body Problem, the science fiction book and trilogy), and how as an AI developer who is trying to actually get work into production you have to think really deeply in the stack: is the idea with Mojo that you want to make it easier to go deep in the stack, or do you want to make it more transparent to the user, so that they don't have to go down into the stack and everything just works underneath, without their needing to switch boundaries?
So at Modular we have a couple of different goals. One goal is to meet people where they are, solve today's problems, build a faster horse. In that department, nobody wants to rewrite their models; they want the code to just work. They want new capabilities, but they want to fit within their existing ecosystem. Now, when you deploy a model, and this is something that I think many AI practitioners don't talk about quite as much (or maybe the practitioners and the researchers don't have coffee often enough), it's a completely different set of problems from training, which we understand pretty well. You can take this in many different directions. One example is that Python is great for research, but it's maybe not the best for production deployment at scale, and so many teams end up rewriting their entire model in C++ just to get it to go, especially if it's a dynamic model, for example a language model. There's a bunch of interesting work and some really smart people who do that kind of stuff, but why is it that we have to rewrite our research models to get them into production? That's really unfortunate; we'd like things to just scale. So one of the things Mojo gives you is that it's way faster, and if you use it the right way, you can also make it deploy into a single standalone executable and things like this. It has new capabilities that Python natively doesn't provide, which enables it to go much further.

Another piece of it is that we're building really high tech, what we call the engine that powers AI, and we have the fastest inference engine that's unified across TensorFlow and PyTorch now. That engine is built entirely on top of Mojo. So it's not just about building a faster horse and enabling the existing use cases; it's about unlocking the potential of this next generation of hardware, and to us that's equally important. Many people see Mojo as helping out Python, and you can look at it as moving Python forward, but really, Mojo came from working backwards from the speed of light of the hardware. When we talk about Mojo being up to 35,000 times faster than Python, that's because it's at the limit of what the hardware can achieve. Some people will see it as a faster Python, or a Python that has no GIL, or a Python where types enable performance, but it's really about what the hardware can do, how we unlock its full potential, and how we do that in a way that Python programmers have direct access to.

You said a Python that has no GIL; that's the global interpreter lock, and it's one of many limitations that inhibit the performance of native Python?

Yeah. If you zoom into Python, and I don't know how deep you are in the internals of Python, a lot of folks use Python but they don't dig into it like I do.
No, I don't dig into it like you do.

Yes, I think you're in the majority. Folks that use Python know that it's maybe slow, that it doesn't scale super well, that it can't use all the processors on your machine without a lot of workarounds, and things like this. There are many aspects of the technology within the Python implementation that make that so. It has an interpreter, and interpreters are generally slower than compilers. It has what's called the GIL, and the GIL prevents effective use of multiple cores. The implementation puts all of the objects on the heap in a very specific way. There are a bunch of implementation details that go into how it works. Mojo is interesting in different ways: first of all, it's compiled; second, it gets rid of the global interpreter lock; third, it changes the in-memory representation; fourth, it adds types. You can keep layering in all the differences here. The consequence is that it really is a different animal, with different characteristics than what the default Python implementation provides. Because it's a first-principles programming language, it has addressed a lot of the problems that Python users have felt as symptoms but have not dug into.

You mentioned that it adds types. One of the biggest things that's happened on the JavaScript side of things is the emergence of TypeScript as this JavaScript-compatible language that is strongly typed. Does Mojo have that same kind of relationship to Python?

Yeah, there are a bunch of very good analogies there. TypeScript is super popular, a lot of people use it, and it fits right into the JavaScript ecosystem. Mojo has a similar relationship to Python: it's a superset, and it works with the existing ecosystem. All the packages in Python just work in Mojo, which is really important to us, because we don't want to break the Python community. Many folks went through the Python 2 to Python 3 transition, which was quite difficult in various ways, and we don't want to relive that. So you can look at Mojo as a Python superset, and by doing so you can pull forward all of the existing code and that whole ecosystem into the Mojo world. There's a big difference, though. If you zoom into Python 3 as it is today, Python allows you to add types, and those types, if you add them to your code, are there for linter tools or checker tools that can identify bugs and obvious mistakes in your code. But those types aren't, and can't be, used by the implementation for runtime performance. Because of that, you can detect certain errors, but you don't get good performance out of them. What Mojo does is take that next step. You can use the existing lowercase-i `int` to declare an integer that way, or you can use capital-I `Int`, and if you say it's a capital-I `Int`, that's a Mojo strongly typed integer: it's checked, it's required, and it's also used for performance. We see 10x or 20x faster performance if you just add a few type annotations, and we have a couple of demos of that.
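A minimal sketch of that progression, assuming current Mojo syntax (the function here is illustrative): the same loop written dynamically, then with the capital-I `Int` annotations that the compiler checks and uses for performance.

```mojo
# Dynamic, CPython-style: values are boxed objects, so this runs but gains
# little from the compiler.
def sum_to(n):
    total = 0
    for i in range(n):
        total += i
    return total

# Typed: capital-I Int is Mojo's strongly typed integer, so the loop runs
# on plain machine integers. This is the "add a few annotations, get
# 10-20x" lever described above.
fn sum_to_typed(n: Int) -> Int:
    var total: Int = 0
    for i in range(n):
        total += i
    return total
```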
Carrying forward that TypeScript analogy, what I've appreciated about it is a couple of things. One, you can add types without fully buying into all of TypeScript and needing to know all of it, and still get a little bit of benefit without going all the way into this new paradigm. And also, when you're looking at code you're not familiar with that fully adopts the new paradigm, it's still familiar: you can make your way through it, knowing that there are things you don't know. If Mojo enables that same level of flexibility, I would think that's a good thing.

Yeah. Well, this comes back to the two-world problem, or the three-world problem, where you have Python, and Python lives on top of C++. Being a superset means everything you do in Python works in Mojo, so obviously types cannot be required, because Python doesn't require types. That's all true. But in the traditional world of Python, if you run into performance problems, or you need access to system software or low-level things, you have to go build a hybrid package that's half C or C++ and half Python. The value proposition Mojo provides is that you can continue writing dynamically typed code, and that's all good, but instead of switching to a different language to do high-performance, lower-level things, you add a few type annotations, as you say, or you use some lower-level syntax within your existing code. You can put more effort in to get more performance, instead of having to switch to a completely different language where the debugger no longer works across both sides, and all of these things.

Got it. You mentioned that Mojo gives Python superpowers, and that made me think (I'm probably not alone in this) that the first place I learned about the dunder functions in Python was from Jeremy Howard and the fast.ai course; there are probably a lot of folks listening who came across them the same way. Are you accessing these superpowers through Python-native structures like that, or are they annotations? First of all, beyond the ability to tap into lower-level structures, what are some of the superpowers or enhancements that Mojo adds, and how are they accessed?

Yeah. You mentioned Jeremy; Jeremy's been a huge influence on me personally. You could go back to asking why Mojo exists, and a lot of that is Jeremy's fault, just between us. He's been pushing for years, specifically for hackability and researchability. Jeremy's got the unique kind of brain where the whole problem fits in his head, so he can understand all the different parts of it. So yes, Mojo has all the dunder methods. If you want to make the plus operator work, you can implement the `__add__` method, and things like that.
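For instance, here is a minimal sketch of a Mojo struct that implements `__add__` so the `+` operator works, just as it would in Python (the struct and the `@value` decorator usage are illustrative and assume current Mojo syntax):

```mojo
# @value asks Mojo to synthesize the boilerplate initializers for the struct.
@value
struct Vec2:
    var x: Float64
    var y: Float64

    fn __add__(self, other: Vec2) -> Vec2:
        # Implementing the __add__ dunder method is what makes `+` work.
        return Vec2(self.x + other.x, self.y + other.y)

fn main():
    var a = Vec2(1.0, 2.0)
    var b = Vec2(3.0, 4.0)
    var c = a + b  # calls Vec2.__add__
    print(c.x, c.y)
```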
But then it goes a little bit further. If you look at the space of systems programming languages, you enter the realm of things like Rust and C++, and the systems programming world has been pushing for a long time towards bringing safety into this world. In C and C++ you have a pointer; the pointer dangles, bad things happen, your app crashes, you have security problems, all these kinds of things. Rust and Swift and other languages like that have gone further in making it possible to get good performance without sacrificing safety, and we've brought a lot of those ideas directly into Mojo. In Rust there's a notion of lifetimes and ownership and these kinds of things that enable safe pointer usage, and Mojo brings that in. Now, these are features that you obviously don't have to use unless you're writing low-level code and you care about getting high performance in certain use cases, but having them available gives you a very accessible whole-stack solution that allows you to go all the way down and get Rust-style performance out of a CPU.

And similarly, we talk about this hardware stuff: at the bottom, even on a CPU, you have many cores, you have these crazy vector units and matrix extensions. It's really interesting to see the evolution of hardware, because if you go back ten years, there was a CPU thing and a GPU thing, and these were points in the space that were very different and completely unrelated from a hardware perspective. Today that whole line has gotten blurrier: GPUs have gotten more programmable, and CPUs are getting more AI features in them. CPUs these days have bfloat16 and all these other AI things built right in. So we're getting a spectrum of programmability, and a lot of what Mojo is about is unlocking that for people, making it accessible, and making it so that, again, you don't have to switch languages just to get access to this stuff.
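As one small illustration of programming those vector units from Mojo, here is a sketch using Mojo's `SIMD` type (the four-lane width is an assumption; pick whatever your hardware supports):

```mojo
fn main():
    # Each value is a vector of four float32 lanes that maps onto the
    # CPU's vector registers.
    var a = SIMD[DType.float32, 4](1.0, 2.0, 3.0, 4.0)
    var b = SIMD[DType.float32, 4](10.0, 20.0, 30.0, 40.0)
    # One expression operates across all four lanes at once.
    var c = a * b + a
    print(c)
```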
You're rightly focusing on CPUs and GPUs, but as you know, there's a wide variety of other options and perspectives: TPUs and other newer, more specific, more exotic approaches to this. Are you building Mojo such that it anticipates all of these options, or when you talk about making Mojo better at using acceleration, are you really talking about GPUs, or maybe GPUs and TPUs?

Well, I spent a couple of years working on Google TPUs, and Google TPUs are an impressive set of machines, because they scale up to exaflops of compute. They're highly specialized for AI workloads, and they're also internally really weird. So, to plus-one exactly what you're saying: AI isn't just about a GPU. So much thinking around AI technology is "I just need to get the GPUs lit up and then go," but particularly when you start deploying: if you're running on a smart camera or something, the AI chip is going to be completely specific to that camera, and if you're doing Google-scale training on crazy distributed machines, that hardware is quite different. This is where one of the things that's very exciting to me as a technologist about Mojo comes in: it's built on this MLIR compiler. MLIR is, again, the thing we started back at Google; now it's being used by basically the who's who of the hardware industry, and MLIR talks to all of these things. If you're familiar with LLVM: LLVM is now a 20-year-old technology, it's widely adopted, and it talks to all the CPUs and some of the GPUs, but LLVM has never been successful at targeting AI accelerators and video optimization engines and all the other weird hardware that exists in the world. That's the role MLIR provides. One of the ways Mojo is implemented is that it fully exposes that power: it brings the MLIR compiler, all the nerdery that goes into the compilers, and exposes it up to library developers. It's actually quite important that you can talk to, for example, TPUs in their native language, which in the case of a TPU is this 128-by-128 tile, and being able to expose that in the language is really important. So that's a long way of saying yes, it is more than CPUs and GPUs, though CPUs and GPUs are the starting point, obviously, for lots of really good reasons. We've built this thing to have really long legs that can bring us into the future.

And do you see it extending to things that are even more exotic, like your Graphcores and SambaNovas, the things that take a very different approach to the underlying compute?

So let me bring you back to where Modular is coming at this from, because Mojo is one component of the stack; that's a way to look at it. Modular is building what we call a unified AI engine. What the heck is that? Well, it's an engine; it's not a framework. People are familiar with PyTorch and TensorFlow, these machine learning frameworks that provide the APIs we're all familiar with. Underneath the covers, there's a whole bunch of deep technology for getting things done on a GPU, getting things onto a CPU. PyTorch 2 just came out with this TorchDynamo stuff, all these exotic low-level technologies that make the hardware work, and on GPUs, CUDA is a major component of the technology stack that everybody builds on top of. Our engine fits at that level of the stack, and the cool thing about it, particularly when you're deploying, is that it talks to lots of hardware, and it also talks to both frameworks. When you're taking a model from research, say a nice PyTorch model you got off Hugging Face (we have lots of people do this, of course), and you want to deploy it, you don't actually want all of PyTorch in a production Docker container. What you want is a low-dependency, efficient way to serve the model, and that process of getting from PyTorch into a deployment artifact is what the Modular technology stack can help with.

Now, coming back to your question about Graphcore and SambaNova (I can't talk about any specific relationships), from a technology perspective they're all different in interesting, high-level ways. SambaNova's chip is, to my understanding, what's called a CGRA: a super-parallel, really crazy thing that has almost nothing to do with CPUs. Graphcore's parts are apparently lots of things that look like CPUs, but their memories are really weird and different, and the way they communicate is very structured. And we all know CPUs and GPUs. What our technology stack enables is this: if you're SambaNova, or Cerebras, which is another example of a really wild system, those people need to implement a compiler for their chip. They're the experts on their chip; they understand how it works. What Modular can do is provide a thing for them to plug into, so that they get all of TensorFlow and PyTorch. One of the major problems we have today with hardware accelerators, particularly ones that are not from the dominant player in the space, is that their tools don't actually just work.
Often, and I'll pick on Apple for example, Apple has a deployment technology called Core ML. It talks to the neural accelerators, and they have all this amazing hardware on a Mac or an iPhone, but Core ML is not actually compatible with all the models. Getting something onto an Apple device means fighting with this translator and trying to get it to not crash, doing all these things the production world struggles with. I talk with many people, many leaders at software companies that are building AI into their products, and a lot of software leaders see the symptoms. They see that it takes three months to get a model into production. They see symptoms like, "I need a team of 40 people to be able to deploy things, and they're very expensive, very specialized people." Why does it hurt this much? The answer is that the deployment tools and technologies are not anywhere near the tools and technologies used for training, so there's so much suffering from so many problems in these things. The root cause is something I've been working on for years: for any one of these chips, people have had to build an entire technology stack from the bottom up, and there's very little code reuse across hardware and hardware vendors. Again, I'll pick on Apple, but I love Apple too; it's not out of anger. It's just very difficult to track the speed of AI. PyTorch moves super fast; this is stuff where you need a very dedicated team, you need to be super responsive, you need to be on top of it, and the compiler problems and the technology problems that make the hardware work are really difficult. There have been a lot of really smart people working on this, but if you're always focused on getting the next chip out the door, and you can't take a step back and look at this whole technology stack, then you can't make the leap that Modular is driving forward.

Interesting. So you said something earlier describing the engine and its place, and it made me think: for ages now, we've decried the stranglehold, to put a negative spin on it, that CUDA has on the low-level programming interface, which basically ensures that Nvidia has a long-lasting position and makes it very difficult for, say, an Intel to come out with a CPU with some numeric capabilities and displace it, because all this code has been written in these three worlds you've mentioned, and it's not as easy as just swapping out the hardware. Are you envisioning this Modular engine as a kind of replacement for CUDA that is multi-hardware-capable? Is that the core idea?

Yes, that's one of the value propositions we provide. If I zoom out and look at the steps the industry has been going through: we as an AI industry owe a huge debt of gratitude to CUDA. Go back to the AlexNet moment, for example. A lot of people talk about it as a confluence of ImageNet and the datasets, and a confluence of hardware, the fact that GPUs enabled an amount of compute that could make AlexNet happen. But a lot of folks forget that CUDA was what enabled some researchers to go write convolution kernels and actually get a machine learning model running on a GPU, which the hardware was definitely not designed for back in the day.
Today it's taken over and it's a little bit different, but back in the day, that initial breakthrough was in large part thanks to CUDA. One of the things that's happened as AI has taken over is that a lot of technology has been built on top of CUDA, and it's a very good thing: it's very powerful, flexible, and hackable. But as you say, it's put us into a mode where one vendor has this dominant position, and it's very difficult for a hardware vendor, even an AMD or some other widely known company with really impressive hardware, to play in this ecosystem. Now, what's happened, and this is part of the thinking that led to Modular existing, is that a lot of compiler technologies have been built. For example, there's this XLA compiler that I worked on at Google, and there are new compilers announced every day by different companies, saying "I will build a compiler that will make ML go fast, for example, on GPUs." So: several years of work, lots of cool technology, lots of examples of these systems. The names keep changing, but the technology is very powerful. The problem is that they have lost one of the things that made CUDA really powerful, which is the programmability. What has happened is that the compiler nerds (and I'm a member, so I love the compiler nerds) have turned AI code generation into a compiler problem, but that has excluded all the non-compiler people. If you look at TPUs, for example: TPUs can express everything you can do in this XLA compiler, so they can do matrix multiplications, convolutions, element-wise adds, et cetera, but they can't do sparse operations, can't do data operations, can't do pre-processing. And you're an expert, you know this: AI is not just about matrix multiplication. It's about data loading, pre-processing, this full parallel compute problem. So what has been lost over the several years of trying to solve the CUDA lock-in problem is that people have turned it into a compiler problem, and now you've traded one lock-in for another: instead of locking into hardware, you're locking most smart people out of the ecosystem, and these compilers haven't been very successful at being compatible with existing code.

So what Modular is doing is saying: okay, again, I love all these people, and I've been working on this stuff for a long time myself, but let's start from a different perspective. What is our assumption? Our assumption is that people don't want to rewrite their code. What that means is that all the operators, all the systems that go into something like TensorFlow or PyTorch, need to work. Well, that's thousands of operators each, and a really messy job, but we handle that job for the world. The other thing we say is: PyTorch is really popular in research, and TensorFlow is still quite popular in production. What we see out in the industry (every shop is a little bit different) is that a lot of people have both TensorFlow and PyTorch, and they don't want a bifurcated stack built on top of these things; they want one system that they can scale out. So we make our problem even more complicated by building a unified solution.
And so now it's not about 2,000 operators on the TensorFlow side and 2,000 on the PyTorch side; it's about 4,000, and it's actually even worse than that when you bring in some of the other technologies. But then you talk about hardware, and it's not just about Intel CPUs and Nvidia GPUs: that's another axis that multiplies the whole problem, because there are probably a hundred or a thousand different kinds of hardware. Traditional teams have built point solutions: a fancy compiler-y thing for one piece of hardware, for one framework, in one direction along this grid. They've often built very good tools, but they're very purpose-built for one case. We have sympathy for all the software people who have to deploy, because software people don't have one piece of hardware, they don't have one model, they don't have one framework, they don't have one product. Their products evolve over the course of decades sometimes; software lives a long time, and they need to be able to talk to lots of different generations of this stuff. At Modular, what we've said is: okay, this is suddenly a very different problem from a technology perspective than building a point solution. And this problem, this need to solve a massively complicated space with hardware on one side and the sheer scope of AI on the other, is what drove Mojo to exist, because we need a way to make this entire stack accessible, hackable, and understandable to people who are not themselves compiler engineers. We need the people who know really fancy numerics and sparse algorithms and convolutions, the people who know their hardware, all the people involved in this massive technology stack we've been building, to be able to collaborate, work together, and build cool stuff at high velocity. That's where we think Mojo is really interesting, because as far as I know, nobody's done that; it's a completely unique creation in the space, and we hope it will really simplify the world. We kind of joke about it, but our biggest enemy, the mortal enemy we struggle with at Modular, is actually just complexity. In the AI space there are so many systems, so many technologies, so many layers of stuff that have been built up. And if you zoom out, coming back to 2016, when I thought I was too late to do anything important in AI: what you realize is that AI is still not done. The stack we're building on is adolescent; it's in its teenage years. What we need is to get to that next level, where everything actually works, is way more predictable, and is actually hackable, where the tools don't break out from underneath you when you try to experiment as a researcher. When you achieve that, we think the impact of AI can go much further, and many more people can participate.

When you talk about the complexity and the diversity of underlying components, and then about how the lifespan of software extends over generations of underlying infrastructure, it makes me think of dependencies and dependency management and packaging, and all these things as huge problems that need to be solved. Does that play into what you're doing at all?
Not directly, but your pattern matching, your neural net there, is doing a very good job of seeing what we're talking about here. The packaging problem often exists because you have all these incompatible systems lashed together. If you zoom into Python packaging (there's a lot going on there, and I'm not an expert in Python packaging, but I talk to people who are), a big part of the difficulty is the C parts of these Python packages. Take our old friend NumPy, for example: NumPy has a ton of C code inside of it, as well as the Python API. Packaging that means you're not actually packaging Python; you're packaging C code, and C has never had a package manager that's any good. So it's funny: you look at these old problems we've been struggling with, and well, you get rid of the C code, and suddenly packaging is way simpler. That's one of the things Mojo provides by being a unified language. More generally, every time you see one of these fissures, like the hardware divide you're talking about, or here the Python versus C++ divide, or CUDA versus SYCL versus HIP versus all these other things that exist in the world: each one of these things sits at the bottom of our stack, driving complexity up. At the end of the day, you'll have a researcher who very reasonably says, "Hey, I just want to run this model on an AMD GPU, no big deal, I should just flip a switch." But the problem is that at the very bottom, all this stuff is very different, and all the cracks propagate up. If you take reliability, and one layer is 90% reliable, and the next layer is 90% reliable, and the next layer is 90% reliable, you start multiplying together all the point-nines and you end up with something that's ten percent reliable. And this is the AI stack that we all depend on.
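To make that compounding explicit with a tiny worked example (the choice of 22 layers is just an illustration):

```mojo
fn main():
    # If each layer of the stack works 90% of the time, stacking 22 such
    # layers leaves the whole thing working about 10% of the time:
    # 0.9^22 is roughly 0.098.
    var reliability: Float64 = 1.0
    for i in range(22):
        reliability *= 0.9
    print(reliability)  # prints roughly 0.098
```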
And, well, let me be careful here: you've got one class of problems that is very challenging but easy to deal with, which is when you're trying to use all this stuff together and it just doesn't work. It doesn't compile, it doesn't run, whatever. But then you have this other problem, where it works, but you don't know that it's actually not working, because of semantic differences or what have you. It's either not performing well, or your results are out of whack, you're not converging, and you're digging deep into underlying libraries trying to figure out why your answers are crazy.

Yeah, I'll give you one example: just go through the life cycle of deploying a model. To make up a scenario, and to double-click on what you're saying: okay, I want to deploy a model. Now I need to get it through Core ML, or one of the many things for deploying to some piece of hardware. The results don't work. Now (and this is just a plus-one to what you said, a hundred times over) you need to know not just PyTorch, not just your model, not just Core ML, but also the translator, also all these other things. You dig and dig and dig, and you find out it's handling the edge padding on a convolution slightly differently. So wait a second: all of these tools were supposed to be making it easy, but because they're not all reliable, it's a leaky abstraction, and now you have to understand all of this complexity. This is what causes it to take three months to deploy a model. Fundamentally, many folks that are building AI products, the VP of software at some technology company, just see the symptom: why does it take so long to get this model into production? They don't realize that the tool set, this fundamental technology that all this stuff is built on top of, is not up to the standards of a software tool set. No C programmer would tolerate AI tools at this level of quality; it's just crazy. But again, this is just the maturity of the AI technology space, and by solving that problem, what we want to see is way more people, way more technology, way more inclusion in the kinds of companies that are able to work with AI and do things, and we think that will have a really big impact on the world.

So we've talked about Mojo, and we've talked about this inference engine, or the engine, that we've referred to in the context of Mojo. You've talked about 35,000x performance improvements over standard Python. Do I need the engine to get that level of performance improvement? Does using Mojo lock you into using this engine? What's the business model? Do you have licensing? I have a bunch of questions coming out here, and they span technical and business-licensing kinds of questions. How does all that work?

Great question, and you've identified the right players. There's Mojo, which is a programming language, a member of the Python family. It's really useful on, for example, just CPUs, which is the only place that Python plays, and so many people see Mojo as just being a better Python. Then we have the engine. The engine itself can stand alone, and you can use the engine as a drop-in replacement that works with TensorFlow and PyTorch; it'll make your BERT models go 3x faster.

And you're using it as a drop-in replacement for what, exactly? For a traditional TensorFlow implementation?

Actually, before I answer your bigger question, let me dive into that. What the Modular engine does is replace your TensorFlow with our TensorFlow, or your PyTorch with our PyTorch, including if you're using TorchScript and things like this. You just put a new thing in your Docker container, and what you get from that is massively better performance. TensorFlow is quite good at production, but we're showing three to five times better performance on, for example, an Intel CPU, an AMD CPU, or an Arm-based Graviton server in AWS. You think about that, and you see three to five times better performance: that's a massive cost savings.

Exactly, that is a massive cost savings.

Well, and it's also a massive latency improvement, and many of our customers love that, because then they can turn around and make their models bigger. Now you can have a better product for your customers. You get direct impact on your costs, direct impact on your product, and this is a huge deal for people. This is where, you know, I'm a technology nerd sometimes, and I love how it's built, but the impact on products is phenomenal, and the engine is a really big deal for getting production AI to scale.
Okay, so just continuing down that line before we click back out: I would imagine one of the commitments you need to make to folks that are thinking about using this is how close you're going to stay to the development of that stack, right?

Yep, absolutely. Well, one of the things our customers love is that Google and Meta don't actually support TensorFlow or PyTorch. People forget this, but these are not products. These are open source projects; hobbies, maybe, for the megacorps.

So you're essentially offering a supported, performance-optimized version of TensorFlow and PyTorch. But then, if I'm going to think about using this, I need to know that I'm not going to get left behind, that I'm not going to wake up one day three versions behind the latest TensorFlow, and it has something I need to make my 500-trillion-parameter LLM work.

Yep. So, I don't know if that's a binary question, but yes, we do that; we're committed to it. The thing is, the enterprises we talk to that care about their costs often want somebody they can call. If you think about it, it's analogous to who wants to run a mail server themselves. You can run sendmail or something, but nobody in their right mind does that. Why do we do this with AI infrastructure? Because there's been no choice: nobody to reach out to, nobody who can actually do this. And the thing that I think many folks forget is that Meta's and Google's technology platforms have diverged a lot from what the rest of the industry uses. They both build their own chips, for example, and they have their own specific use cases, and so they're not actually focused on making the traditional CPU, GPU, and public cloud use cases really good. That's one of the reasons we can deliver such high value. So yes, this is a product for us. That means we actually support it, and we invest a huge amount of energy into it; it's one of the reasons we have such phenomenal results as well. And to your other question: one of the great things about being a drop-in replacement, at least from a customer perspective, is that you can un-drop-in. You can use our technology, and if you want to switch back, you can always switch back at any time.

At some point we'll make it back to that broader question, but I'm thinking about how we've talked about Mojo being this better Python. What makes Python usable in AI is not just core Python; it's all these other things, NumPy and Pandas and many other packages, and you mentioned that they have C at their heart. So at some point there's a significant number of packages that you'd also have to rewrite, that would need to be Mojo-native, I would think, in order to get the full performance.

Yeah, so let's dive into compatibility. Mojo is still a young language; we haven't talked about that, but it's not done, and I think it will take another year or so of development before it gets to the point of solving all the problems we want it to solve. But even today, you can import and use arbitrary packages like NumPy, Pandas, TensorFlow, PyTorch, whatever, directly in Mojo. A really important part of how our stack works is that you don't have to rewrite or touch any of your Python packages, and many people have their own Python code; it's not just big packages like NumPy. Mojo talks directly to all those packages. You don't have to write wrappers; it all just works.
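A minimal interop sketch, assuming the current name of Mojo's Python interop module: the unmodified package is imported and executed by the stock CPython interpreter underneath, which is what makes it fully compatible.

```mojo
from python import Python

fn main() raises:
    # NumPy here is ordinary, unmodified NumPy, run by CPython under the hood.
    var np = Python.import_module("numpy")
    var a = np.arange(15).reshape(3, 5)
    print(a)
```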
Now, if you choose to move your code into the Mojo universe, then you get the benefits Mojo provides. If you're just talking to an existing package, it'll still run at Python speed. It will be fully compatible, and it will run with the same default Python implementation. Moving your code to Mojo can then unlock these new capabilities, and you can choose to do that a package at a time, or however you'd like.

Understood. I guess I'm curious how much of the surface area of AI-related packaging you've built, or am I thinking about this the right way? In order to fully provide the performance benefits you're talking about, did you need to port NumPy over to something Mojo-native, or to run on MLIR, at whatever level makes sense? And Pandas, and all these others: how much did you need to do, and how much of that is done, percentage-wise, relative to what you expect will need to be done?

Well, the answer is zero. Our solution talks to the entire Python ecosystem out of the box: matplotlib, SciPy, NumPy, all that stuff just works. And that, again, comes back to being pragmatic and productive. I'll make fun of you, and I'll make fun of me, from our last call, the great language debate: the problem with any new programming language is that a new programming language has no community and no package ecosystem. So, again, like myself on that previous call and all the other lovely people there: wanting to get ML out of Python, for whatever reasons, is very exciting, but it's not very pragmatic, because the entire data science ecosystem is wrapped around Python.

Python's also pretty great, right? I think that's something people in other communities like to make fun of Python for, the indentation or whatever it is, but Python is beautiful.

Subjectively, I will say, that's my opinion too. So what Mojo does is enable you to use literally everything in the Python ecosystem, and then, if you want to invest more effort to get more performance, you can do that, but you don't have to. This is the major value proposition. Now, in the case of Modular and why we built Mojo: our business objective is to go make ML really awesome. We care about the matrix multiplications and the convolutions, the core operations that people spend all their time on in AI, and we rewrote all of that stuff in Mojo. This isn't like rewriting matplotlib; this is like rewriting the Intel MKL equivalent, or rewriting the equivalent of the CUDA implementations of these kernels. That's where we've put our energy, because that's what enables unlocking the hardware, unlocking performance, unlocking usability. And we have really exotic, fancy compiler features that enable automatic kernel fusion and things like this, which no normal ML researcher should ever have to know about. They just see that it runs 10x faster in this use case, and that's pretty cool.
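For a feel of what kernel fusion means, here is a hand-written miniature (illustrative code; the engine described here does this automatically inside the graph compiler): two element-wise kernels combined into one, so the data is only walked once and the temporary buffer disappears.

```mojo
# Unfused: two kernels, two passes over memory, one temporary buffer.
fn mul_then_add(a: List[Float64], b: List[Float64]) -> List[Float64]:
    var tmp = List[Float64]()
    for i in range(len(a)):
        tmp.append(a[i] * b[i])          # kernel 1: multiply
    var out = List[Float64]()
    for i in range(len(tmp)):
        out.append(tmp[i] + 1.0)         # kernel 2: add a constant
    return out

# Fused: one kernel, one pass, no temporary.
fn mul_add_fused(a: List[Float64], b: List[Float64]) -> List[Float64]:
    var out = List[Float64]()
    for i in range(len(a)):
        out.append(a[i] * b[i] + 1.0)
    return out
```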
Another thing I think folks are struggling with: take Transformers, for example. We all love Transformers; they're eating the world. But one of the problems is that because they became so important to so many different use cases, we got all these hyper-specialized software stacks for Transformers. They exist at the low level (Nvidia, for example, has a set of kernels called FasterTransformer) and at the high level (there are also distribution frameworks for Transformers and things like this). So you get this very Transformer-specialized stack, which again forces you into a very narrow view of what a Transformer is. It works for the benchmark, but if you're a researcher, you want to push the boundaries and try slightly different Transformers. Or maybe there's a thing beyond Transformers: I hear RNNs are coming back, and maybe FFTs will have their day. There are all these different theories, and if we can't enable people to do that research, we may be missing out on the next big step. The specialization that's inherent in things becoming important really cuts against generality. That's one of the things we've seen, and again, if you dramatically reduce the complexity of these stacks, you can make them way more hackable, and that, we believe, will enable people to invent new things.

I want to push on this one more time, to see if I can expose any fissures in my understanding. Is it the case that libraries like NumPy or pandas, the ones we all use that are kind of ubiquitous from a machine learning perspective, delegate enough of what they're doing to the underlying Python that replacing or fixing that underlying Python gives you some percentage of the performance benefit, such that you don't need to deal with the upper piece?

Who is "you" here? Are you asking how it works internally, how a user uses it, or when somebody should do something? Which piece of the elephant are you touching?

Primarily I'm trying to make sure I understand how it is that you're able to offer the performance improvements you're boasting without needing to touch any of the libraries people depend on. So I'm asking about internals, but also about how they're used. I'm imagining several scenarios. One: your 35,000x number is kind of a made-up number that doesn't actually rely on any external dependencies, which would make it a fairly useless performance-boasting metric. Another possibility: NumPy and these libraries delegate enough of their operations to the underlying Python that you can get significant performance gains even without touching them, and hey, if somebody did touch those things, maybe it would be 70,000x or whatever.
Yeah, I can break it down for you if you want.

Got it, okay.

All right, let me break it into a couple of categories. One: you have unmodified Python that is just imported. Take Matplotlib, something that's not performance-sensitive; there's no reason to rewrite Matplotlib, it's fine. So you just import it, and the way that runs is with the existing CPython interpreter. Mojo talks to the CPython interpreter, so that code runs with 100% compatibility and everything just works. This is why the entire ecosystem works, but it's no faster. What you're really getting at is: what are my trade-offs, what are the levers I'm pulling here? Full compatibility but no performance benefit; those things go together.

Another thing you can do: if you go to modular.com, you can see our video, and you can see Jeremy Howard giving a demo. What you see there is that you take some Python code, put it into Mojo, and now it runs, depending on the code, roughly 10x faster out of the box, maybe 15x or 16x. There's more we can do to push it further; we just haven't focused on that. That's running the same code, but in Mojo, and the reason you get performance is that it's compiled instead of interpreted, with a new, fancy compiler stack under the covers. But it's still running fully dynamically typed code; it's just running dynamically typed code in a better way. So you can get roughly 10x out of the box, which is quite nice.

Then you start layering things in. You say, hey, I want to add types: now you're changing the in-memory representation to something way more efficient, and that's another 10x. Give me threads: that's another 10x. Now I want to use vectors and the hardware: another 10x. If you stack all these up, this is how you get to 35,000x. And I will agree with you, by the way, that the 35,000x number is cherry-picked. It's an extreme result on Mandelbrot, a simple algorithm we can explain and that people can play with in a notebook. But we have lots of people, just random people on the internet using Mojo, who are getting hundreds and thousands of times speedups. So the 35,000x may be cherry-picked, but reasonably expecting over 100x: 100x is big. I consider that to be a pretty big deal. And you can look at that as 100x over Python, or you can look at it as Python now being 100x more relevant for keeping me out of C. Both sides of that are really cool.
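As a rough sketch of those first two tiers (a hypothetical function, written in early-release Mojo syntax that may have changed since), the same logic can be written dynamically or with progressively more static information:

```mojo
# Tier 1: a Python-style 'def', dynamically typed but compiled by Mojo
# rather than interpreted. This is roughly where the first ~10x lives.
def norm_squared(values):
    total = 0.0
    for v in values:
        total += v * v
    return total

# Tier 2: a typed 'fn' with a fixed in-memory representation, checked
# at compile time, which the compiler can turn into tight machine code.
fn norm_squared_typed(values: DTypePointer[DType.float32], n: Int) -> Float32:
    var total: Float32 = 0.0
    for i in range(n):
        let v = values.load(i)
        total += v * v
    return total

# Tiers 3 and 4 (threads and SIMD vectors) layer on via standard-library
# utilities such as algorithm.parallelize and algorithm.vectorize.
```

Each tier is opt-in, which is the point: you pay for performance with extra annotation only where you choose to.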
Anyway, coming back to your categories, and that third category, though maybe it's more a scenario than a category: I think it's also largely the case, but maybe you can validate this for me, that you're using a lot more of these libraries when you're doing EDA and the early stages of building a model. Then you finally have your model, in the form of a graph in TensorFlow or PyTorch, and at that point the things you're relying on are much lower-level, as opposed to your pandas and your SciPy and all that. So your exposure, your need to pull in all these libraries at the point of your core training loop or inference, when you want to deploy the model, is less.

Exactly. So if you zoom into that third part: the things we just talked about are actually completely generic software engineering things. We talked about using arbitrary Python packages off the shelf; we talked about taking arbitrary Python code in an arbitrary domain and just making it go fast. That's fun, but it has nothing to do with AI. Now let's talk about AI, the third category, which is super important; it turns out many of your readers and watchers want to think about AI. AI is this really fascinating technology stack where, yes, you talk to it in Python, but underneath the covers you have kernel-fusing graph compilers, accelerators, and all this other cool stuff. This is where the Modular engine comes in. Mojo is an implementation detail of the Modular engine, and Mojo makes it all super extensible and hackable, but this technology space actually has very little to do with syntax or with a programming language. It's a completely different technology stack, much more similar to the XLA compilers, or the internals of CUDA, or the internals of Intel MKL.

So these are all different, but to come back to your basic question several layers up the stack, the relationship between Mojo and the Modular engine, because that's also really important: the Modular engine is focused on high-performance production deployment, on going and solving problems in AI. It's an AI thing. Mojo, as a language, is a new member of the Python family. For Modular, we see the engine as a product and Mojo as a technology, and both of these things stand alone. You can use Mojo as just a better Python, if that's what you want to do, or you can use the Modular engine as something that drops into TensorFlow and PyTorch, and then you just have a better TensorFlow and a better way to deploy your models.

But there's a much bigger, and I believe much more important in the long term, better-together story here. Putting a custom op into TensorFlow or PyTorch is very difficult. We talked about the three-layer problem: Python, C++, CUDA. If you want to put a custom CUDA op into PyTorch, you have to write C++ and CUDA. It's a C++ thing, but a C++ thing that doesn't have a debugger, a C++ thing with a whole bunch of weird constraints where you might wedge your GPU. That complexity means people don't do the kind of research they might otherwise do. And obviously, if you have to hack on C++, or even just rebuild TensorFlow, who in their right mind knows how to do that? I know these people, and I love these people, but this is just monstrous. So the better-together story is that if you're an AI person building and deploying models, training, doing research, what Mojo inside the Modular engine allows you to do is make this whole thing hackable. You can define custom ops, you can get kernel fusion, you can get all this stuff for free, and then when you want to push boundaries, you can crack open the box and say, okay, I'm going to write a custom sparse thingy for my domain, or a custom summary function that does some fancy domain-specific reduction before I send all the data across the wire. Making that possible is, I think, really cool.
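For instance, the kind of domain-specific summary reduction Chris describes might look like the following in plain Mojo. This is a hypothetical sketch: the Modular engine's actual custom-op registration API isn't covered in this conversation, and the early-release syntax here may have changed since:

```mojo
# Hypothetical "summary" op: compress a large activation buffer down to
# (mean, max) before sending it across the wire, in a single pass.
fn summarize(data: DTypePointer[DType.float32], n: Int,
             out: DTypePointer[DType.float32]):
    var total: Float32 = 0.0
    var biggest: Float32 = data.load(0)
    for i in range(n):
        let v = data.load(i)
        total += v
        if v > biggest:
            biggest = v
    out.store(0, total / Float32(n))  # out[0] = mean
    out.store(1, biggest)             # out[1] = max
```

The point isn't this particular function; it's that the whole thing is ordinary Mojo, with a debugger, rather than a C++/CUDA extension bolted onto the framework.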
Interesting. I feel like we've covered a lot and there's still a lot to cover, particularly in this dimension of hackability, but we don't have time for all of it. To kind of wrap things up, I'd love to have you riff a little on future directions and the roadmap. What are the big things you need to attack next to build out this vision?

Absolutely. So Modular just came out of stealth, and we have a nice video on our website at modular.com if you haven't seen it. There's a whole bunch of new drops we'll be adding to the product over the coming months, and you can sign up for the newsletter on that. The thing I'll say is that Mojo is still quite early: it's not ready for production use as a general drop-in Python replacement. But we have an amazing community of people already coming together, and we're developing it in the open. I think that's a pretty big deal for something that I hope will be important to a wide range of different use cases; Python goes everywhere, after all. So I think it's really important that we as a community build and do this together. Modular is obviously driving this, because it's really important to us, but we don't have all the smart people in the world, so I'd really love for people to join us on our Discord forum and the other places where we can interact and build this together.

Awesome, awesome. You mentioned that Jeremy was a big inspiration. I'm glad he wasn't able to inspire you to build it around Perl again.

He's an incredible person. He gives a killer demo in our launch showing how to speed up matrix multiplication; he doesn't get the 35,000x, but it's 20,000x or something, in a notebook, which is pretty cool. Just don't underestimate Jeremy.

Absolutely, absolutely. Well, Chris, thanks so much for getting us up to speed on what you're working on with Mojo and Modular.

It's really great to talk to you, Sam.

Same, same. Thank you.