The Challenges of Building an Opinionated Open Source LLM Framework - Wing Lian, Axolotl AI
**The Challenges and Opportunities of Optimizing AI Models**
As AI researchers and developers, we face numerous challenges when fine-tuning and optimizing models. One of the most significant is the sheer number of knobs to turn, as we like to say: LoRA variants, layer freezing, optimizer choices, FSDP versus DeepSpeed, and more. As these options multiply, the space of possible configurations explodes, making it increasingly difficult to find one that is both valid and efficient.
**Validation and Composability**
To address this, we have built extensive up-front validation of training configurations, so users find out immediately, rather than an hour into a run, that a setup will fail. Techniques such as layer freezing, activation checkpointing, and the various PEFT adapters can be combined in many ways, but certain combinations simply don't work together. For example, activation checkpointing with layer freezing does not work unless non-reentrant activation checkpointing is used. Where possible we "heal" the configuration by setting the correct option automatically; otherwise we fail fast or warn that the configuration is not composable.
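The following is a minimal sketch of what this kind of validation and config "healing" can look like. It is illustrative only: the `validate_config` function and the config keys used here are assumptions, not Axolotl's actual schema.

```python
import logging

logger = logging.getLogger(__name__)

def validate_config(cfg: dict) -> dict:
    """Check a training config up front and fix or reject known-bad combinations."""
    # Layer freezing plus activation checkpointing only works with the
    # non-reentrant checkpointing implementation, so "heal" the config.
    if cfg.get("gradient_checkpointing") and cfg.get("frozen_layers"):
        ckpt_kwargs = cfg.setdefault("gradient_checkpointing_kwargs", {})
        if ckpt_kwargs.get("use_reentrant", True):
            logger.warning(
                "Layer freezing with activation checkpointing requires the "
                "non-reentrant implementation; setting use_reentrant=False."
            )
            ckpt_kwargs["use_reentrant"] = False

    # Some combinations cannot be healed: fail fast instead of an hour into
    # loading a 405B model.
    if cfg.get("fsdp") and cfg.get("deepspeed"):
        raise ValueError("FSDP and DeepSpeed are mutually exclusive; choose one.")

    return cfg
```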
There is also a network effect. As more people submit issues and feedback, we learn which combinations work and which don't, and whether a given failure is something we can fix, an upstream bug we can help resolve, or a larger upstream problem that has to wait. This collaborative loop steadily improves the framework's coverage of what actually works.
**Dependencies and Upstream Issues**
We are also acutely aware of how much we depend on upstream libraries. The framework builds on Accelerate, the Hugging Face Trainer, PEFT, bitsandbytes, and others, and these dependencies can be tricky to manage: interfaces change, releases move fast, and compatibility breakage is nobody's fault in particular. Maintaining direct communication with upstream maintainers helps us identify and tackle these issues together.
This may be a bit controversial, but we pin every dependency to an exact version. Looser constraints such as "greater than or equal to" tend to break unexpectedly when an upstream release changes something underneath us. To avoid falling behind, our nightly CI replaces all of the pinned versions with each dependency's upstream main branch and runs the full test suite against it. This gives us an early signal when something breaks, and has occasionally let us find and help fix bugs upstream before they ever reach a release.
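As a rough illustration of the nightly-CI idea, the script below rewrites pinned requirements so they point at each upstream project's main branch before the test suite runs. The package list, repository URLs, and file layout are assumptions for the sake of the example, not the project's actual CI setup.

```python
import re
from pathlib import Path

# Upstream main branches to test against nightly (illustrative list).
UPSTREAM_MAIN = {
    "transformers": "git+https://github.com/huggingface/transformers.git@main",
    "accelerate": "git+https://github.com/huggingface/accelerate.git@main",
    "peft": "git+https://github.com/huggingface/peft.git@main",
}

def unpin_requirements(path: str = "requirements.txt") -> None:
    """Replace pinned versions (e.g. 'transformers==4.44.0') with upstream main."""
    lines = Path(path).read_text().splitlines()
    rewritten = []
    for line in lines:
        # Strip version specifiers/extras to recover the bare package name.
        name = re.split(r"[=<>~!\[; ]", line.strip(), maxsplit=1)[0].lower()
        rewritten.append(UPSTREAM_MAIN.get(name, line))
    Path(path).write_text("\n".join(rewritten) + "\n")

if __name__ == "__main__":
    unpin_requirements()
    # Afterwards, CI would run: pip install -r requirements.txt && pytest
```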
**Low-Code and High-Quality Data**
Finally, we have seen the benefits of low-code and no-code approaches. When researchers don't have to write or debug modeling code, that time goes into the work that matters most: curating high-quality datasets. We saw this play out with teams that won the NeurIPS efficiency challenge, who succeeded largely on the strength of carefully curated data rather than custom training code.
Projects like Llama Storm, along with dataset-driven models such as Nous Research's Hermes and domain-specific models like Aloe and SmileyLlama, demonstrate the same pattern: state-of-the-art results driven by data quality rather than implementation details. As new techniques and libraries continue to appear, we expect these low-code approaches to shape how fine-tuning gets done.
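To make the low-code idea concrete, here is a minimal sketch of a config-driven wrapper over the Hugging Face Trainer, the general pattern described in the talk. It is not Axolotl's actual implementation, and the YAML keys (`base_model`, `dataset`, `adapter`, and so on) are illustrative assumptions rather than Axolotl's real schema.

```python
import yaml
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

def train_from_config(path: str) -> None:
    """Fine-tune a causal LM described entirely by a small YAML config."""
    with open(path) as f:
        cfg = yaml.safe_load(f)

    tokenizer = AutoTokenizer.from_pretrained(cfg["base_model"])
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForCausalLM.from_pretrained(cfg["base_model"])

    # One of the many "knobs": an optional LoRA adapter via PEFT.
    if cfg.get("adapter") == "lora":
        model = get_peft_model(model, LoraConfig(
            r=cfg.get("lora_r", 16),
            lora_alpha=cfg.get("lora_alpha", 32),
            task_type="CAUSAL_LM",
        ))

    dataset = load_dataset(cfg["dataset"], split="train")
    dataset = dataset.map(
        lambda ex: tokenizer(ex["text"], truncation=True,
                             max_length=cfg.get("sequence_len", 2048)),
        remove_columns=dataset.column_names,
    )

    trainer = Trainer(
        model=model,
        args=TrainingArguments(
            output_dir=cfg.get("output_dir", "./outputs"),
            per_device_train_batch_size=cfg.get("micro_batch_size", 1),
            gradient_accumulation_steps=cfg.get("gradient_accumulation_steps", 1),
            learning_rate=cfg.get("learning_rate", 2e-4),
            num_train_epochs=cfg.get("num_epochs", 1),
        ),
        train_dataset=dataset,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()
```

In this pattern the user only edits the YAML file, picking a base model, a dataset, and which knobs to turn, and never touches training code.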
**Conclusion**
As AI researchers and developers, we face numerous challenges when it comes to optimizing our models. From validation and composability to dependencies and upstream issues, there are many factors to consider. However, by working together and embracing new techniques and libraries, we can create high-quality models that are both efficient and effective. Whether you're a seasoned researcher or just starting out, there's always something new to learn and explore in the world of AI optimization.
"WEBVTTKind: captionsLanguage: enso my name's wingland Axel really started off as like my own learning experience of like LMS and sort of this gen geni cycle of like all right how am I going to learn how to find two naems like I want and I'm not necessarily like a person that understands like NE networks and all of that so taking all of the sort of existing pieces um that exist sort of in the in the ecosystem um and saying all right uh how how can I iterate quickly how can I build system that integrates all of the various open source data sets open source models and build something itate quickly that I can learn and eventually like this turned into something that led to more of a no code approach for a lot of people that sort of resonated because not you know a lot of people are very interested in llms and fine tuning them and they don't necessarily like come with all of the sort of um ml expertise so so first off like Axel is a hugging face rapper uh rapper over the hugging face trainer um the the main decision for this is like you know there's L mistal there's a long list of white models that come out like every other week and we can't be it's hard for like I'm pretty much the like 90% uh right uh maintainer of this so I can't be like trying to write modeling code to support all of this every time something new comes out um so really it gives us a lot of out of the box like oh this sort of works and we just have to figure out what are sort of the edge cases what are the popular things that people are like want techniques that they want to experiment with and how do we just make that work for users and that is really like the things where like oh if we can do that we don't have to like try and solve everything yes if someone says I'm trying this maybe it doesn't work we'll look into it but um at the end of the day like we we are indexing on the majority of users on the majority of use cases right um and looking for something to have basically trying to figure out what has the most impact uh all right so what are our signals um they're pretty obvious it's like disc like everybody most of people in open source AI pretty much live in either Discord or I guess now X um a lot of our users um they sort of are either individual researchers they are people who are like there's a lot of people doing sort of like these more chatbot sort of um uh sort of fine tunes um you know we have um we also have much larger users like you know news research and we so we get a lot of signals from that um if you pop it into the Discord and you ask questions or if you like have requests like we'll we'll look into it for for the most part if like it's a hot T like if you have something and you want it done just pop in the Discord and ask and it'll probably get done in a reasonable amount of time um we find that most of the um the community like they come in they learn um and then they sort of stick around so like as new users funnel in like as time has progressed I've had to like I've been able to spend a little less time in Discord which has been nice um because eventually like the community does come in step up and help so there's a lot of things to like you know going back there's a large support Matrix of things and if we try and support and make everything and every little combination work we it just moves too slow um so for us you know going back to we focus on the functionalities that addresses the largest cohort of users so you know um because we're a hugging face rapper for the most part it's like hugging 
face technically supports like py torch Sage maker tender flow Flags majority of users just care about PW work so we're throwing all of those the rest of the support out the door we're like you can try it maybe it works maybe it doesn't but we're not trying to like support those same thing um a lot of the you know stor the state-of-the-art techniques are built on Modern versions of like pytorch or um or Triton or things that require you know for the most part upto-date versions of pytorch so we're throwing out older minor versions of pytorch even um and then in this gen cycle it's like we're all like really working with sort of um you know causal causal language models so we're throwing out like you know sequence sort of like T5 support those sorts of uh models bill in the- Middle um those sorts of things so um there are a lot of supportive techniques um PF you have Laur lur plus door there's a lot of things just in PFT Alone um there's layer freezing you can experiment with optimizers fsdp versus like deep speed there's a lot of there's a lot of knobs to spin and users want to spin those knobs they're like what can I try how do I you know get the um minimize amount of vram those sorts of things um and th when the obviously those sorts of things like explodes the possible like space that we have to support um but at the same time when you're a user there there obviously things that probably aren't going to work but as a user like as an end user you don't necessarily know that um those don't work so what we do is we add a lot of validation um because sometimes you know you're 20 minutes in you're an hour in it fails or you know because it just takes that long to load the model sometimes like if you're using 405b um but you want to know that this just straight up isn't going to work so we have a lot of like um validation um there's a lot of things we know are not composable we know a lot of things are break we just say straight up no this isn't going to work try something else um there are a lot of things um oh for example like things that uh probably doesn't work is activation checkpointing with layer freezing if you don't use uh nonreentrant activation checkpointing that's not going to work um so those are also things that we can sort of like if you're not specifying that we can sort of heal the um configurations for you and say we're just set that for you right um and and also there's this network effect right so like people come in they submit issues this doesn't seem to work we can dig into it figure out is this a problem that we can solve is this an upstream problem that we can solve is this a big Upstream problem that somebody else is going to have to solve and if that's the case we'll we simply just you know we'll add warnings We'll add we'll say no this isn't going to work and maybe come back to it another day um so we have a lot of you know that Network effect of like what works and what doesn't work um so being being relying on hugging face there's a lot of like um Upstream dependencies it's like accelerate trainer PF bits and bites and a bunch of other things and as anybody who's ever like dealt with dependencies um one day it might work next day it may not right um but AI moves fast and all of these new architectures and techniques that come out and you know all of these libraries they do their best um to implement all of these building blocks that we use on a day-to-day basis um and sometimes things you know sometimes interfaces change um and and it's it's nobody's fault it's 
just that you know we're all moving fast we're all trying to like um break things so um as maintainers like the best thing is really to have direct coms with your um Upstream dependencies um um that way you can collab collaborate um and work sort of hand inand to uh identify and Tackle these issues um this next one might be a bit controversial um but we pin everything uh I know some people like to use greater than or equal to blah blah for for their versioning we find that often times like when we do this things will break unexpectedly um and the way we work around this is we just have nightly CI and we just replace all of the pin versions and point it to Upstream Main and then run all of our CI against Upstream main so what this does is like we get an early signal if something's broken but we can all there's been a couple cases where we can actually find bugs upstream and get that resolved as well um let's see so like what is why why low cod in uh or no code um if you if you don't have to think about like oh the M code the implementation this gives you all this new found time gives you all this new found time to like work on high quality data sets so you see like so like with llam storm I think these were the guys that won the nurb efficiency challenge last year they built that they were able to like just work on curating data sets like nagp align that um you know those are just that's all these are all data set driven high quality State models you know hermies with from new um and then even like vertical specif or domain specific models like aloe and Smiley llama um yeah um you know uh thanks for listening um here's a project you know we're always looking for more core contributors so um and if there's any questions feel free to find me laterso my name's wingland Axel really started off as like my own learning experience of like LMS and sort of this gen geni cycle of like all right how am I going to learn how to find two naems like I want and I'm not necessarily like a person that understands like NE networks and all of that so taking all of the sort of existing pieces um that exist sort of in the in the ecosystem um and saying all right uh how how can I iterate quickly how can I build system that integrates all of the various open source data sets open source models and build something itate quickly that I can learn and eventually like this turned into something that led to more of a no code approach for a lot of people that sort of resonated because not you know a lot of people are very interested in llms and fine tuning them and they don't necessarily like come with all of the sort of um ml expertise so so first off like Axel is a hugging face rapper uh rapper over the hugging face trainer um the the main decision for this is like you know there's L mistal there's a long list of white models that come out like every other week and we can't be it's hard for like I'm pretty much the like 90% uh right uh maintainer of this so I can't be like trying to write modeling code to support all of this every time something new comes out um so really it gives us a lot of out of the box like oh this sort of works and we just have to figure out what are sort of the edge cases what are the popular things that people are like want techniques that they want to experiment with and how do we just make that work for users and that is really like the things where like oh if we can do that we don't have to like try and solve everything yes if someone says I'm trying this maybe it doesn't work we'll look 