The State of the Llama Ecosystem - Joe Spisak, Meta

**Using Pip to Install and Work with Llama Models**

If you're using a CLI (command-line interface) for development, it's worth incorporating tools like pip into your workflow. Pip is the standard package installer for Python, and it's the easiest way to install and manage the components of the Llama toolchain. Once the toolchain is installed, its CLI can list the available models, show their prompt templates, and tell you whether they support specific modalities.
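As a concrete sketch: the package named in the talk is `llama-toolchain` (it has since been folded into the `llama-stack` project), and the subcommands below reflect its CLI at the time; check the repo docs for current syntax.

```bash
# Install the Llama CLI and toolchain from PyPI
pip install llama-toolchain

# List the models the toolchain knows about, with their key characteristics
llama model list
```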

**Benefits of Using Pip**

Pip provides several benefits for developers who want to work with Llama models. First, installing the toolchain gives you a CLI that can list the available models and describe their characteristics, such as the prompt template, the system prompt, and the context length; this information is crucial for choosing the right model for your project. Second, the same CLI tells you whether a model supports different modalities, which is essential for certain applications.
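For instance, a single CLI call can surface a model's card. The subcommands and model identifier below are illustrative, assuming the `llama-toolchain` CLI installed above:

```bash
# Print a model's metadata: description, context length, supported modalities
llama model describe -m Llama3.1-70B-Instruct

# Show the prompt template and special-token format the model expects
llama model prompt-format -m Llama3.1-70B-Instruct
```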

**Incorporating Pip into Your Workflow**

To incorporate pip into your workflow, use it to install the components of the Llama toolchain you need. For example, you can install a distribution that includes a Llama Guard safety shield and then incorporate that shield into your application. The process is relatively simple and comes down to a few commands in your terminal, as sketched below.
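A minimal sketch of that flow, assuming the `llama stack` subcommands from the llama-stack repo; the distribution name is a placeholder, and the interactive configure step is where a Llama Guard shield would be attached.

```bash
# Scaffold a distribution definition for your application (name is a placeholder)
llama stack build

# Configure its providers: inference backend, model, and safety shields
# (this is where a Llama Guard shield gets selected)
llama stack configure my-app-stack
```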

**Creating a Distribution**

A distribution is a collection of RESTful APIs that provide access to different components of the Llama toolchain. You create a distribution and then run it as a server to access its features. The components it orchestrates cover inference, fine-tuning, safety guardrails, memory (a deliberate generalization of RAG), evaluation, and reward scoring; there is even a third-party workflow for quantizing a model.
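Once created, a distribution runs as a local server exposing those REST endpoints. The port and route below follow the early llama-stack API spec but should be treated as illustrative, and the payload shape is an assumption:

```bash
# Run the distribution as a local REST server
llama stack run my-app-stack --port 5000

# Call the inference endpoint (route and payload shape are illustrative)
curl -s http://localhost:5000/inference/chat_completion \
  -H "Content-Type: application/json" \
  -d '{"model": "Llama3.1-8B-Instruct",
       "messages": [{"role": "user", "content": "Hello, Llama!"}]}'
```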

**The Importance of Distribution Orchestration**

Orchestrating the different components of the Llama toolchain (inference, fine-tuning, RAG, safety shields) is crucial for building complex applications, but stitching them together by hand is challenging, especially when working with large models like the 405B. To address this, we developed the Llama Stack API, which provides a clean way to orchestrate these components.

**The Llama Stack API**

The Llama Stack API is a collection of RESTful APIs that provide access to the different components of the Llama toolchain. It can be used to run benchmarks, evaluate models in situ within your workflow, and perform reward scoring. The API is designed to be flexible and is supported across platforms including AWS Bedrock, Microsoft Azure, Scale, Snowflake, and Groq.
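In practice, that means evaluation and scoring are just more endpoints on the same server. The route and payload below are hypothetical, shown only to convey the shape of the API:

```bash
# Hypothetical eval call: score a model against a benchmark within your workflow
curl -s http://localhost:5000/eval/run \
  -H "Content-Type: application/json" \
  -d '{"model": "Llama3.1-8B-Instruct", "benchmark": "mmlu"}'
```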

**Support for Multiple Platforms**

The Llama Stack API supports multiple platforms, making it easy to deploy models across different environments. Deploying a model at the scale of the 405B used to be difficult: before we quantized it, serving required distributed inference across multiple servers. With the quantized version, the model can run on a single instance.
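As one illustration (not from the talk): the quantized 405B weights published on Hugging Face can be served on a single multi-GPU node with an engine such as vLLM. The model ID below is the FP8 checkpoint name on Hugging Face, but the flags are assumptions to verify against current docs.

```bash
# Serve the FP8-quantized 405B on one 8-GPU node instead of a multi-node cluster
pip install vllm
vllm serve meta-llama/Llama-3.1-405B-Instruct-FP8 --tensor-parallel-size 8
```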

**Community Adoption**

The community has been incredibly receptive to the Llama Stack API, with many partners already leveraging its features and contributing pull requests. This adoption is a testament to the power of open-source development and the importance of collaboration in the development process.

**Resources for Developers**

For developers who want to learn more about pip and the Llama toolchain, we recommend visiting our website at llama.com. The site provides detailed documentation on how to build with and use the models, as well as instructions for downloading them from various sources, including Meta, Hugging Face, and Kaggle.
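Once the toolchain is installed, weights can be pulled straight from the CLI; the `--source` and `--model-id` flags below reflect the llama-stack CLI around the time of the talk and may have changed since.

```bash
# Download weights from Meta (uses the signed URL obtained via llama.com)
llama download --source meta --model-id Llama3.1-8B-Instruct

# Or pull the same model from Hugging Face (requires an access-granted HF token)
llama download --source huggingface --model-id Llama3.1-8B-Instruct
```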

**Conclusion**

In conclusion, pip is the entry point for installing and managing the components of the Llama toolchain. Its flexibility and ease of use make it an essential part of any developer's toolkit. By incorporating it into your workflow, you can streamline your development process and focus on building complex applications with confidence.

"WEBVTTKind: captionsLanguage: enall right so nice to see everyone it's uh crazy this is what 2024 I think the first P devcom was 2018 so it's really cool to see the history lesson from uh from Peter and the folks this morning um so hopefully you can see the slides I can barely see them um hopefully the sun goes down and you can see all um so I'm Joe SPAC you can't actually read any of these words um I used to work on PCH for a long time I helped uh with others uh take it to the foundation um and and build it I think to to what it is today which is something that's pretty amazing um I also uh today currently um I lead the open source efforts for llama um at meta and uh I do a lot of advising and Angel work as well um and have worked in research so U enough about me uh let's talk about llama so uh who's using llama today who's developing llama who's downloaded a model from hugging face and like prompted it okay lots of hands who uses pie torch to find tune it torch tune hell yeah okay all near and dear to my heart um we've seen some incredible adoption of llama um you can see some of the numbers um we have 350 plus million downloads every time I bug Omar it keeps getting higher and higher um we've seen like the I mean we we work with all of the the cloud providers work with a ton of Partners dozens of Partners hopefully everyone saw the four or 5B launch from I guess now not even two months ago um there's been like literally millions of downloads the usage has been like doubling and I think since uh what January to July we put out a blog post on this like 10x the usage so we're seeing some like really awesome usage of of llama and obviously we've had this like crazy uh set of releases you can kind of see like um I mean who remembers like llama one the research only right came out of fair it was actually the team who's now Mr all and we were working on the improving and I was actually in Fair at the time uh that was pretty incredible um and you know I think the model actually got leaked on to torrent which was super fun um and uh that that kind of set off a chain of events uh including the Llama 2 release which is a commercial uh commercially licensed um model which um got incredible adoption as well and then we started to release and get on this kind of train of releases we released code llama um out of fair uh Gabrielle s of and the team released that um we also uh released this thing called Purple llama which is near and dear to my heart anyone know what purple llama is hopefully red and blue right purple red teaming blue teaming so we started to incorporate uh safety and and system level safety in llama which uh turned it more into an agentic system and not just this model that we all know and love um and we also released things around cyber security um and then of course the Llama 3 moment came in April that was truthfully actually more of a pre-release uh because the context length was was pretty small um with the you know with the big bang really coming in in July with the 405b oh there we go it's better um with the 405b back in in July um um and we released number of components around it including another llama guard um the more cyber security um components including a new eval um and a prompt injection filter as well so um we started to really to to kind of create this full system which is really cool um we one of the really interesting things about the 405b that we found um internally because we use it obviously in met Ai and our own applications um is it's become a really great 
teacher for us to build smaller models with and so when we did post training for um the 8 and the 70b we actually took you know outputs of the 405b um and trained on those and that actually allowed us to improve the model um models pretty significantly so while these were like the same sizes as the the models from April um they were actually significantly better and we actually did um we actually were able to leverage our our fancy 405b for that um so the key features out of these U models were obviously a longer context length at 128k this has been pretty popular um we added multiple Lang is which is easier said than done if you've ever done post training or got involved with building an llm like the amount of safety and and work and validation work in in sft you need to do to make sure that these like these languages are really solid um is is quite difficult um tool use is uh who who knows what tool use is or has called functions out of LMS so um what was really interesting is seeing like the grock team if you know the grock the inference um Cloud team they actually like had a post even within like a few hours showing how we're a state-of-the-art in tool use which I there you go okay I see them right there um which I I really love that there you go um I mean that was really cool and I just to see that um so we really put a lot of effort into into zero shot tool use and actually fine-tuning in even um like tools like wol from and brave search and other things uh so we were excited to see that the community also saw that um so that was really fun and then um one of the things we did for the 405b is um we actually changed the license so if you not to get into the weeds or I'm not a lawyer but um I work with a lot of lawyers um and one of the things we we did was actually change the license so you could actually train on the outputs of our models so if you go to like say the terms of service for a lot of models that are out there it will say like you know you are in violation if you use the outputs to improve another model we basically looked at that and we said okay no let's not do that anymore like let's actually U make that a feature not a bug um and now basically you can use the 45b or any of our models frankly um from 3.1 onward um you know to basically improve any model you can improve a you know I hate to say it but improve a myal model or your own model or whatever it is um you can take the data and what this has actually done is created this new world of synthetic data generation and distillation um and we worked with a number of Partners to actually even deploy net new Services um like AWS and Nvidia and others um and it's been really really popular and um so you know P torch obviously was a pretty big uh part of of creating llama uh we trained in on 16, h100 gpus um over months um we used over 15,000 15 trillion tokens actually um to to train it so lots of data um lots of compute lots of failures in our data center so we actually if you look in if has anyone read the 92 page paper that we published by the way it's a really really good read so like if you're sleepy or if you want to be sleepy I guess no I'm just kidding um but you should read it seriously it's the most detailed paper the team is a labor of love putting it together we actually went through all the fault like kind of the faults that we found in training and I'm actually looking at even trying to open source that data set to show like how how real and how hard it is to actually train a model at this scale it's it's 
not like you just like throw some pytorch code at 16,000 like interconnected h100s and like magic happens in a couple months like these models need to be babysat and they and gpus do fail and they fail quite often and and more than you would think um and so getting that right and getting a model to converge at at this level of quality at that scale is actually a really really difficult problem um and so that required a lot of like collaboration across the company ac across the Llama team as well as the piech team um and with all the way down to the infra um and managing that so the results were pretty cool um so again this is like you know from Late July um you can see we compared against the state-of-the-art at the time we did not shy away we compared against gbd4 uh 40 at the time and then Claud son 3.5 uh which is uh I mean still today are are some of the best models obviously 01 is super interesting um and you know at the time like this this model was was really amazing um and it's being deployed and actually really we like pleasantly surprised at how reasonable it is actually to serve um by folks like fireworks and others that like even a few dollars um per I think $3 Blended um per million tokens um we also did human eval so um for those who develop models um like it's really cool to show how you're good on mlu it's a kind of a fun reasoning Benchmark but it's getting pretty saturated same with like Jesus M AK and some of these other like you know grade school math um type of benchmarks but the human eval is actually show like how how people actually interact with your models and how how good they find them basically and so um we we found that um you know for for certain models and again it's a little bit of preference so if you go on lmis or some of these other kind of um interactive sites like that that um you'll kind of get an idea of of how well people like to play with certain models so um we spent a lot of time uh and spent a lot of money to to have humans also interact with our models and they were quite happy with them so as I mentioned like llama is actually evolving very much to be a system and um so if you go to like our GitHub so uh So Meta Lama um on the the GitHub you'll actually see a number of repos that we're actually um actually building and and actually open sourcing tools Tools around agentic development and so I've actually um I know it's very hard to read but I've actually like I've actually enumerated the ones I think you should care about anyway so um number one like llama models is our new site basically for all things llama so this is basically the basic inference code um it has the acceptable use policy the license um model card like kind of everything that's that's kind of there for the models um and you'll see me there a lot uh we have something called new called LL stack which I'm going to get into here in a second um and this is basically uh a standard API for how to build agentic systems and and a lot more and I'll talk more about that in some detail um and you can see like from the little squigglies um on the activity on the repos these are pretty new repos so we've just kind of open open source these um you know even as as early as July and we've seen like pretty pretty uh good engagement um and then llch apps is uh is basically where we have all of our reference apps so um I'll show you an example of that later but basically these are fully open sourced um apps that support things like tool use and Rag and and other things using the API there's of 
course purple llama which um has everything from cyber security um evals to um infant time guard rails for uh insecure code um recognition or prompt prompt injection filtering those kind of things um and then llama recipes I think Hamid and and some of the folks who maintain that that uh repo are here here somewhere around here today and this is basically your One-Stop shop for anything from like how do you find ton llama models um you know to how do you work with like llama index or Lang chain or kind of any of the common um popular projects in the community uh VM you know TGI basically uh we have recipes for just about everything so as I mentioned like we we're moving quickly towards llama being an agent um and actually even more so um we're moving towards llama actually being a drro so I can't use that analogy um like in my normal day-to-day because no one knows really what a dis is but I can hear I think people know like it's a difference between like having a lenux kernel and saying like hey here's a kernel um versus hey here's a Dro right here's and upgrade the kernel and the the new model and you get this amazing you know developer experience and you can build things with it and that's really where llama is heading which is pretty exciting and so um you can kind of see like an app here that's running um in the background it's pretty simple it's like uploading a CSV and like you're able to inspect it or code or or um or visualize things um and you can see like the the reference architecture is pretty pretty simple um you have an llm um you have this executor you can call Shields um for safety um You can call tools but basically it becomes like the the LM becomes kind of this Kel and things around it um become these components that get orchestrated um and I I will dive a little bit deeper into that so here's kind of like the the the stack so you know in true um open source fashion um we did an RFC so um we we got a lot of Engagement lots of comments um so uh it was you know we wanted this to be definitely you know more more of an open Community feel um and that actually people contributed quite a few great ideas that we incorporated into into llama stack and there's poll requests you know we've seen poll requests from the old llama team from uh fireworks from from a lot of lot of folks um and it's starting to really become this great Community but you can see like um you know at the bottom you kind of have the the the hardw itself um you have the model and then you start to to see libraries and things layer on top obviously P torch if you want to fine tune or you want to deploy there's you know torch tune and torch chat um there's a a CLI an API that I'll talk about here in a second all the way up through to kind of your agentic application which kind of Builds on all of these components so one of the cool things that you know that we really happy about or I'm happy about personally is we built a CLI I thought you know i' I've heard so many Developers kind of say like give me a basic like pip install sort of like for playing with like llama models and it sounds like simple like from a software perspective but like it's actually quite difficult to do for for lot for models but we built it and you can basically pip install um you know this thing called llama tool chain or you can use cond um but I I personally like pip um and basically what it does is it gives you like um the ability to kind of you know list out the available models um you can kind of understand the models themselves 
you can it'll tell you like things like the The Prompt uh template for them or the system prompt or you know um basically any description does it support different modalities does it support you know what context length does it support those kind of things so it's um it's really quite handy if you like who uses clis for development like I'm I'm kind of expecting every freaking hand to be up right now but like come on see really so people are using goys for development no okay come on okay so yeah so if you're using a CLI you know it's really nice to incorporate into your workflow um and you can obviously like um as part of this you can incorporate different parts of the distribution so if you want to incorporate a lam guard shield for example um you know you can basically um install that basically we call it a distribution um or a a you know a different API as part of that distribution and you can kind of incorporate that into your application and it's pretty simple to add it as you can see here um and then basically uh a distribution is really just a collection of of sa restful apis and in this case it's you can create that distribution and then run it um and basically the the this llama stack API does all the kind of the orchestration for you so it's what we found is like it's it's quite difficult or quite challenging to kind of bre kind of stitch these components together whether I want to stitch together inference or fine tuning or even rag or some of these other components um this is just a really really clean way to do that um and so you can kind of go here to I I've snapshotted the the repo you can see some of the uh the folders We have that's already already there so if you want to quantize for example there's a um someone built a component to quantize the model this party workflow um but everything from inference to um you know deploying and and orchestrating safety guard rails to we call it memory but you know could be more than just rag that's why we kind of use the generalized term there um evaluation so if you want to basically in situ in your workflow through CLI start to evaluate for different things and run benchmarks kind of in in your workflow you can do that as well um you can do reward scoring I mean this is it's pretty like um it's going to be pretty full featured in terms of what what you're able to do so um a couple of things I will leave you with one is we have a really great site um called it's now llama.com so it was lm.com we just migrated over to uh to llama.com and you can go there and we have uh detailed documentation uh for like I said how to build um how to how to use U llama with ecosystem projects um how to download it from you know from meta from hugging face from kaggle um you know how to run it on your Mac uh for example how to use it with again things like code llama and others or L Lang chain llama index and others um and so on and but if you want to kind of use it out of the box you can see some of the platforms that we support this was actually the table we used for the 405b launch so every single one of these check boxes is a night a weekend um a lot of Blood Sweat and Tears to support the 405b and other models on all these platforms so whether it's AWS and and Bedrock or Microsoft on Azure or scale or uh snowflake or grock um we worked really hard with all of our partners to make sure that developers can use these things um one of the things we found is like models of this scale are quite difficult to deploy um before we quantize the 405b model 
you had to use distributed inference and again this is like a pretty pretty Savvy crowd so so who who's actually deployed a model across multiple servers for inference yeah I kind of expected some hands so but like not everyone can you guys are the cool kids so um you know so we quantitized the model can run on a single instance but um but every one of these Partners was ready to go including probably another couple of dozen um and uh it's been incredible the the community has been um really leveraging llama through all these different different partners so that is it thanks so much for your timeall right so nice to see everyone it's uh crazy this is what 2024 I think the first P devcom was 2018 so it's really cool to see the history lesson from uh from Peter and the folks this morning um so hopefully you can see the slides I can barely see them um hopefully the sun goes down and you can see all um so I'm Joe SPAC you can't actually read any of these words um I used to work on PCH for a long time I helped uh with others uh take it to the foundation um and and build it I think to to what it is today which is something that's pretty amazing um I also uh today currently um I lead the open source efforts for llama um at meta and uh I do a lot of advising and Angel work as well um and have worked in research so U enough about me uh let's talk about llama so uh who's using llama today who's developing llama who's downloaded a model from hugging face and like prompted it okay lots of hands who uses pie torch to find tune it torch tune hell yeah okay all near and dear to my heart um we've seen some incredible adoption of llama um you can see some of the numbers um we have 350 plus million downloads every time I bug Omar it keeps getting higher and higher um we've seen like the I mean we we work with all of the the cloud providers work with a ton of Partners dozens of Partners hopefully everyone saw the four or 5B launch from I guess now not even two months ago um there's been like literally millions of downloads the usage has been like doubling and I think since uh what January to July we put out a blog post on this like 10x the usage so we're seeing some like really awesome usage of of llama and obviously we've had this like crazy uh set of releases you can kind of see like um I mean who remembers like llama one the research only right came out of fair it was actually the team who's now Mr all and we were working on the improving and I was actually in Fair at the time uh that was pretty incredible um and you know I think the model actually got leaked on to torrent which was super fun um and uh that that kind of set off a chain of events uh including the Llama 2 release which is a commercial uh commercially licensed um model which um got incredible adoption as well and then we started to release and get on this kind of train of releases we released code llama um out of fair uh Gabrielle s of and the team released that um we also uh released this thing called Purple llama which is near and dear to my heart anyone know what purple llama is hopefully red and blue right purple red teaming blue teaming so we started to incorporate uh safety and and system level safety in llama which uh turned it more into an agentic system and not just this model that we all know and love um and we also released things around cyber security um and then of course the Llama 3 moment came in April that was truthfully actually more of a pre-release uh because the context length was was pretty small um with the you know 
with the big bang really coming in in July with the 405b oh there we go it's better um with the 405b back in in July um um and we released number of components around it including another llama guard um the more cyber security um components including a new eval um and a prompt injection filter as well so um we started to really to to kind of create this full system which is really cool um we one of the really interesting things about the 405b that we found um internally because we use it obviously in met Ai and our own applications um is it's become a really great teacher for us to build smaller models with and so when we did post training for um the 8 and the 70b we actually took you know outputs of the 405b um and trained on those and that actually allowed us to improve the model um models pretty significantly so while these were like the same sizes as the the models from April um they were actually significantly better and we actually did um we actually were able to leverage our our fancy 405b for that um so the key features out of these U models were obviously a longer context length at 128k this has been pretty popular um we added multiple Lang is which is easier said than done if you've ever done post training or got involved with building an llm like the amount of safety and and work and validation work in in sft you need to do to make sure that these like these languages are really solid um is is quite difficult um tool use is uh who who knows what tool use is or has called functions out of LMS so um what was really interesting is seeing like the grock team if you know the grock the inference um Cloud team they actually like had a post even within like a few hours showing how we're a state-of-the-art in tool use which I there you go okay I see them right there um which I I really love that there you go um I mean that was really cool and I just to see that um so we really put a lot of effort into into zero shot tool use and actually fine-tuning in even um like tools like wol from and brave search and other things uh so we were excited to see that the community also saw that um so that was really fun and then um one of the things we did for the 405b is um we actually changed the license so if you not to get into the weeds or I'm not a lawyer but um I work with a lot of lawyers um and one of the things we we did was actually change the license so you could actually train on the outputs of our models so if you go to like say the terms of service for a lot of models that are out there it will say like you know you are in violation if you use the outputs to improve another model we basically looked at that and we said okay no let's not do that anymore like let's actually U make that a feature not a bug um and now basically you can use the 45b or any of our models frankly um from 3.1 onward um you know to basically improve any model you can improve a you know I hate to say it but improve a myal model or your own model or whatever it is um you can take the data and what this has actually done is created this new world of synthetic data generation and distillation um and we worked with a number of Partners to actually even deploy net new Services um like AWS and Nvidia and others um and it's been really really popular and um so you know P torch obviously was a pretty big uh part of of creating llama uh we trained in on 16, h100 gpus um over months um we used over 15,000 15 trillion tokens actually um to to train it so lots of data um lots of compute lots of failures in our data center so we 
actually if you look in if has anyone read the 92 page paper that we published by the way it's a really really good read so like if you're sleepy or if you want to be sleepy I guess no I'm just kidding um but you should read it seriously it's the most detailed paper the team is a labor of love putting it together we actually went through all the fault like kind of the faults that we found in training and I'm actually looking at even trying to open source that data set to show like how how real and how hard it is to actually train a model at this scale it's it's not like you just like throw some pytorch code at 16,000 like interconnected h100s and like magic happens in a couple months like these models need to be babysat and they and gpus do fail and they fail quite often and and more than you would think um and so getting that right and getting a model to converge at at this level of quality at that scale is actually a really really difficult problem um and so that required a lot of like collaboration across the company ac across the Llama team as well as the piech team um and with all the way down to the infra um and managing that so the results were pretty cool um so again this is like you know from Late July um you can see we compared against the state-of-the-art at the time we did not shy away we compared against gbd4 uh 40 at the time and then Claud son 3.5 uh which is uh I mean still today are are some of the best models obviously 01 is super interesting um and you know at the time like this this model was was really amazing um and it's being deployed and actually really we like pleasantly surprised at how reasonable it is actually to serve um by folks like fireworks and others that like even a few dollars um per I think $3 Blended um per million tokens um we also did human eval so um for those who develop models um like it's really cool to show how you're good on mlu it's a kind of a fun reasoning Benchmark but it's getting pretty saturated same with like Jesus M AK and some of these other like you know grade school math um type of benchmarks but the human eval is actually show like how how people actually interact with your models and how how good they find them basically and so um we we found that um you know for for certain models and again it's a little bit of preference so if you go on lmis or some of these other kind of um interactive sites like that that um you'll kind of get an idea of of how well people like to play with certain models so um we spent a lot of time uh and spent a lot of money to to have humans also interact with our models and they were quite happy with them so as I mentioned like llama is actually evolving very much to be a system and um so if you go to like our GitHub so uh So Meta Lama um on the the GitHub you'll actually see a number of repos that we're actually um actually building and and actually open sourcing tools Tools around agentic development and so I've actually um I know it's very hard to read but I've actually like I've actually enumerated the ones I think you should care about anyway so um number one like llama models is our new site basically for all things llama so this is basically the basic inference code um it has the acceptable use policy the license um model card like kind of everything that's that's kind of there for the models um and you'll see me there a lot uh we have something called new called LL stack which I'm going to get into here in a second um and this is basically uh a standard API for how to build agentic systems and and 
a lot more and I'll talk more about that in some detail um and you can see like from the little squigglies um on the activity on the repos these are pretty new repos so we've just kind of open open source these um you know even as as early as July and we've seen like pretty pretty uh good engagement um and then llch apps is uh is basically where we have all of our reference apps so um I'll show you an example of that later but basically these are fully open sourced um apps that support things like tool use and Rag and and other things using the API there's of course purple llama which um has everything from cyber security um evals to um infant time guard rails for uh insecure code um recognition or prompt prompt injection filtering those kind of things um and then llama recipes I think Hamid and and some of the folks who maintain that that uh repo are here here somewhere around here today and this is basically your One-Stop shop for anything from like how do you find ton llama models um you know to how do you work with like llama index or Lang chain or kind of any of the common um popular projects in the community uh VM you know TGI basically uh we have recipes for just about everything so as I mentioned like we we're moving quickly towards llama being an agent um and actually even more so um we're moving towards llama actually being a drro so I can't use that analogy um like in my normal day-to-day because no one knows really what a dis is but I can hear I think people know like it's a difference between like having a lenux kernel and saying like hey here's a kernel um versus hey here's a Dro right here's and upgrade the kernel and the the new model and you get this amazing you know developer experience and you can build things with it and that's really where llama is heading which is pretty exciting and so um you can kind of see like an app here that's running um in the background it's pretty simple it's like uploading a CSV and like you're able to inspect it or code or or um or visualize things um and you can see like the the reference architecture is pretty pretty simple um you have an llm um you have this executor you can call Shields um for safety um You can call tools but basically it becomes like the the LM becomes kind of this Kel and things around it um become these components that get orchestrated um and I I will dive a little bit deeper into that so here's kind of like the the the stack so you know in true um open source fashion um we did an RFC so um we we got a lot of Engagement lots of comments um so uh it was you know we wanted this to be definitely you know more more of an open Community feel um and that actually people contributed quite a few great ideas that we incorporated into into llama stack and there's poll requests you know we've seen poll requests from the old llama team from uh fireworks from from a lot of lot of folks um and it's starting to really become this great Community but you can see like um you know at the bottom you kind of have the the the hardw itself um you have the model and then you start to to see libraries and things layer on top obviously P torch if you want to fine tune or you want to deploy there's you know torch tune and torch chat um there's a a CLI an API that I'll talk about here in a second all the way up through to kind of your agentic application which kind of Builds on all of these components so one of the cool things that you know that we really happy about or I'm happy about personally is we built a CLI I thought you know i' I've 
heard so many Developers kind of say like give me a basic like pip install sort of like for playing with like llama models and it sounds like simple like from a software perspective but like it's actually quite difficult to do for for lot for models but we built it and you can basically pip install um you know this thing called llama tool chain or you can use cond um but I I personally like pip um and basically what it does is it gives you like um the ability to kind of you know list out the available models um you can kind of understand the models themselves you can it'll tell you like things like the The Prompt uh template for them or the system prompt or you know um basically any description does it support different modalities does it support you know what context length does it support those kind of things so it's um it's really quite handy if you like who uses clis for development like I'm I'm kind of expecting every freaking hand to be up right now but like come on see really so people are using goys for development no okay come on okay so yeah so if you're using a CLI you know it's really nice to incorporate into your workflow um and you can obviously like um as part of this you can incorporate different parts of the distribution so if you want to incorporate a lam guard shield for example um you know you can basically um install that basically we call it a distribution um or a a you know a different API as part of that distribution and you can kind of incorporate that into your application and it's pretty simple to add it as you can see here um and then basically uh a distribution is really just a collection of of sa restful apis and in this case it's you can create that distribution and then run it um and basically the the this llama stack API does all the kind of the orchestration for you so it's what we found is like it's it's quite difficult or quite challenging to kind of bre kind of stitch these components together whether I want to stitch together inference or fine tuning or even rag or some of these other components um this is just a really really clean way to do that um and so you can kind of go here to I I've snapshotted the the repo you can see some of the uh the folders We have that's already already there so if you want to quantize for example there's a um someone built a component to quantize the model this party workflow um but everything from inference to um you know deploying and and orchestrating safety guard rails to we call it memory but you know could be more than just rag that's why we kind of use the generalized term there um evaluation so if you want to basically in situ in your workflow through CLI start to evaluate for different things and run benchmarks kind of in in your workflow you can do that as well um you can do reward scoring I mean this is it's pretty like um it's going to be pretty full featured in terms of what what you're able to do so um a couple of things I will leave you with one is we have a really great site um called it's now llama.com so it was lm.com we just migrated over to uh to llama.com and you can go there and we have uh detailed documentation uh for like I said how to build um how to how to use U llama with ecosystem projects um how to download it from you know from meta from hugging face from kaggle um you know how to run it on your Mac uh for example how to use it with again things like code llama and others or L Lang chain llama index and others um and so on and but if you want to kind of use it out of the box you can see some 
of the platforms that we support this was actually the table we used for the 405b launch so every single one of these check boxes is a night a weekend um a lot of Blood Sweat and Tears to support the 405b and other models on all these platforms so whether it's AWS and and Bedrock or Microsoft on Azure or scale or uh snowflake or grock um we worked really hard with all of our partners to make sure that developers can use these things um one of the things we found is like models of this scale are quite difficult to deploy um before we quantize the 405b model you had to use distributed inference and again this is like a pretty pretty Savvy crowd so so who who's actually deployed a model across multiple servers for inference yeah I kind of expected some hands so but like not everyone can you guys are the cool kids so um you know so we quantitized the model can run on a single instance but um but every one of these Partners was ready to go including probably another couple of dozen um and uh it's been incredible the the community has been um really leveraging llama through all these different different partners so that is it thanks so much for your time\n"