Open Source Generative AI at Hugging Face with Jeff Boudier - 624

The Partnership Between AWS and Hugging Face: A Game-Changer for Machine Learning Computing

The collaboration between Amazon Web Services (AWS) and Hugging Face aims to make machine learning more accessible and more affordable for practitioners, researchers, and businesses. As Jeff Boudier, head of product at Hugging Face, puts it in this episode, Hugging Face has built "sort of the GitHub of machine learning": the central place where practitioners and researchers contribute and access models, and therefore a gateway to machine learning compute.

AWS's role in this partnership is to provide the infrastructure underneath Hugging Face's open-source platform. According to Boudier, AWS is already the cloud where Hugging Face users most often run their models. Under the expanded collaboration, Hugging Face trains new open-source foundation models on a compute cluster at AWS, and the two engineering teams work together from the hardware layer (the Trainium and Inferentia accelerators) up to the platform layer (Amazon SageMaker), so that users can fine-tune and deploy Hub models while keeping production costs under control.

Hugging Face's role is to provide the open-source platform that these compute services build on. Its Hub hosts more than 150,000 freely accessible models contributed by the community, covering a wide range of tasks, domains, and languages, and it offers an easy path to deploy those models or fine-tune them on SageMaker. Hugging Face's business model is to sell compute and services on top of this platform, so the partnership with AWS extends its reach to more production users.

One area where the partnership matters is natural language processing (NLP). Demand is growing for applications such as text classification, sentiment analysis, and conversational AI, and with it the need for scalable, affordable compute. Boudier cautions against reaching for a large language model by default: for a specific, repeatable task such as classifying emails, a task-specific model from the Hub is often far cheaper, can sometimes run on a CPU or a single machine, and can be applied to all of a company's data at scale.

To take models from development to production, Hugging Face released Inference Endpoints, a production service in which a user picks a model, a cloud region (for example AWS us-east-1), and an instance type, and gets a hosted API without managing the underlying infrastructure. According to Boudier, the service had over a thousand customers within three months of its release.
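A deployed endpoint of this kind is normally called over HTTPS. The following is a minimal sketch of what a client request might look like, not an official client: it assumes a bearer-token `Authorization` header and a JSON body with an `inputs` field, and the URL, the token, and the helper name `build_inference_request` are hypothetical placeholders. The request is only constructed here, not sent.

```python
import json
import urllib.request


def build_inference_request(endpoint_url: str, token: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for a hosted model endpoint.

    Assumption: the endpoint accepts a JSON body of the form {"inputs": ...}
    and authenticates with a bearer token.
    """
    payload = json.dumps({"inputs": text}).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=payload,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical values for illustration only.
request = build_inference_request(
    "https://example.invalid/my-endpoint",  # placeholder URL, not a real endpoint
    "hf_xxx",                               # placeholder access token
    "The delivery arrived two days late.",
)
```

To actually send it, one would pass `request` to `urllib.request.urlopen` and decode the JSON response; the response schema depends on the task the deployed model performs.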

Another focus is accelerating machine learning workloads on AWS's purpose-built accelerators, Trainium and Inferentia. In Hugging Face's own testing, out-of-the-box training on Trainium reached up to 5 times the throughput of a comparably priced GPU instance, and a preview of Inferentia 2 ran BERT base up to 8 times faster for some sequence lengths. At a comparable instance price, higher throughput translates directly into lower cost per unit of work, which is what makes applying machine learning at scale affordable.
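The link between a throughput gain and a cost saving can be made concrete with a little arithmetic. The sketch below assumes the two instances cost the same per hour, as in a comparably priced comparison; the hourly price and baseline throughput are made-up illustrative numbers, not figures from the episode. Only the 5x speedup factor comes from the discussion.

```python
def cost_per_million_samples(hourly_price_usd: float, samples_per_second: float) -> float:
    """Return the cost in USD of processing one million samples."""
    if hourly_price_usd < 0 or samples_per_second <= 0:
        raise ValueError("price must be non-negative and throughput positive")
    seconds_needed = 1_000_000 / samples_per_second
    return hourly_price_usd * seconds_needed / 3600


def relative_saving(baseline_cost: float, accelerated_cost: float) -> float:
    """Return the fraction of the baseline cost that is saved."""
    return 1.0 - accelerated_cost / baseline_cost


# Hypothetical inputs: same hourly price for both instances.
HOURLY_PRICE_USD = 10.0          # assumed, illustrative only
GPU_SAMPLES_PER_SECOND = 100.0   # assumed, illustrative only
SPEEDUP = 5.0                    # "up to 5x" throughput

gpu_cost = cost_per_million_samples(HOURLY_PRICE_USD, GPU_SAMPLES_PER_SECOND)
accelerated_cost = cost_per_million_samples(
    HOURLY_PRICE_USD, GPU_SAMPLES_PER_SECOND * SPEEDUP
)
saving = relative_saving(gpu_cost, accelerated_cost)
```

With equal hourly prices the saving depends only on the speedup: a factor of `s` cuts the cost per sample to `1/s` of the baseline, so a best-case 5x throughput gain corresponds to a saving of about 80 percent.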

In terms of the future of this partnership, there are many exciting developments on the horizon. One area that is likely to see significant growth is the adoption of cloud-based machine learning services by businesses. As more companies look to apply machine learning at scale, they will need scalable and affordable computing solutions that can meet their demands. Hugging Face's platform, combined with AWS's compute services, is well-positioned to meet these needs.

The partnership also supports the development of new models. Hugging Face is training new open-source foundation models on its AWS cluster, alongside collaborative efforts such as BigCode, a code generation project with ServiceNow. On the tooling side, Hugging Face has released Optimum Neuron, an open-source package that takes models written in frameworks such as PyTorch or TensorFlow through the AWS Neuron compiler so that they can run on Trainium and Inferentia.

Overall, the partnership between AWS and Hugging Face is a game-changer for machine learning computing. By providing scalable and affordable compute services on top of a vast library of pre-trained models, this partnership is making it easier for businesses and researchers to apply machine learning at scale. As the demand for machine learning continues to grow, this partnership is likely to play an increasingly important role in shaping the future of AI.

all right everyone welcome to another episode of the twiml AI podcast I am your host Sam cherrington and today I'm joined by Jeff boutier Jeff is head of product at hugging face before we get into today's conversation be sure to take a moment to head over to Apple podcast or your listening platform of choice and if you enjoy the show please leave us a five-star rating in review Jeff welcome to the podcast thank you Sam thanks for having me I am really looking forward to our conversation we're going to be talking about open source and generative Ai and hugging face of course uh recent partnership with AWS a bunch of things on the agenda uh but before we dive into that I'd love to hear a little bit about your background oh for sure um well I'm a I'm a late bloomer to uh to AI a joint hiking face uh two and a half years ago I've known uh I've known uh Julian and Clem for uh for some time um in my first uh my first foray into AI was about like how can you automate uh the the editing in videos and so there's lots of early applications of AIS in there like trying to transcribe the speech trying to identify key moments through audio through computer vision and these things have come a long way man since that then but the last the last couple years have been really amazing and I feel like last week could have been a year in in regular time yeah that's one of the things that I wanted to maybe spend a little bit of time on uh was all the the this new news from last week um but uh you you mentioned that you've been at hugging face for two and a half years uh we were chatting before there's like the startup multiplier that multiplies that by like five or seven but then another AI multiplier on top of that uh that it's been a crazy couple of years I guess yeah it feels like it's compounding um and uh you know we've been saying that for a long time like there's been an exponential increase in like model size in like compute needs nerves papers 
yep yep everything everywhere all at once um and uh despite the past few weeks I've been uh super uh super interesting not just uh in the rate of uh new uh models new releases but also like how the the whole landscape of AI has been evolving yeah I've been mentioning to folks recently that um you know when I started the podcast six and change years ago a big part of the reason why I started it was because there was just some something in the air at the time this was like three years past alexnet uh folks were really starting to do interesting things with deep neural Nets I'd end up every week with hundreds of tabs of things that I wanted to learn more about and explore and uh you know it's been an exciting six years but like it it feels like that same energy right now that I was experiencing back then uh almost making me feel like I need to start another podcast or something you started it at the same time that uh hiking face started like uh at the time like Julian and clemo and Tomar they saw that uh the things were starting to work that were not possible before uh but he wasn't yet fully working like they had this crazy idea at the beginning of hiking phase that you could actually create an AI that would be fun to have a conversation with and you wouldn't interact it like you would text to your friends like kind of a crazy idea uh but I guess now six six years later uh it uh it came true uh but um I'm super excited to talk to you because you know when I uh three years ago when I started getting deeper into uh uh ml like I needed to catch up and uh yours was my source of inspiration and and uh and learning so yeah super super happy to uh to talk to you well that that's awesome to hear um you mentioned this past week uh we are of course referring to the release of gpt4 among a ton of other things that are that are going on um but I think gpt4 maybe provides an interesting backdrop for our broader conversation um and that it contrasts the open source theme that 
we're going to be spending some time chatting about any any Reflections on uh the the gpt4 launch and you know how you're seeing that impact the the broader Market yeah I mean uh it was kind of a fireworks right it was like this pie day like everybody in the AI Community decided to go on release mode and so of course you have gpt4 but you also uh had uh Google palm apis and you have anthropic Claude and you had Stanford coming out with the open source I don't know to say that I always get it wrong but uh yeah the instruction Tunes how do you say it I'm confused anyways everything everywhere happened all at once uh in the world of AI assembly AI uh announced a model that uh supposedly is 43 better than whisper or something like that uh just a ton of really amazing news uh this past week yeah and also on the open source side right uh together compute uh came out with open chat kit so that's uh uh Neo gptx 20 billion that's been fine-tuned on like 40 plus million uh of uh of instructions to get a an instruction fine-tuned models that like fully open source like Apache 2.0 it's like on the Hub you can use it mid Journey 5 also yeah and apparently it does a good job with hands now fingers you see I saw someone tweeted a a tweet but before you jump in I saw someone tweeted a tweet uh where they asked uh mid Journey five to create an image of a hundred you know raised hands and it did and all of the hands had five fingers but it was like 500 hands so the models still can't count but it can do hands now that's progress through this uh milestrum of announcements like what uh what became clear to me is um there's a shift for our field where sort of six months ago uh AI machine learning um was very much a scientific field with researchers building upon each other publishing papers reproducing each other results improving everything ends with the release of gpt4 and Google and anthropic announcements like we're in a different reality where the the the the the the new models 
are released kind of like uh apple style you know you have like the iPhone 10 and it's got this new feature and it's like a cool demo and uh you're gonna bring on stage some some people gonna tell you the story about it and and it's available today at this price um and that that's that's like new to me right and it's been a it's been a shift and for us like our mission is to democratize good machine learning and the way to do that is through open source uh see so the availability of the the models the training data sets the model weights the trading the the code all of that is super important for the field to progress together and make sure that uh everybody can build upon machine learning and so yeah that was like the the Bittersweet sort of uh part of uh of of those announcements and in particular Suite in that you're seeing this spread in acceleration but bitter in that uh for the most part there's a lack of openness in the major the major large models yeah it was quite stunning it was quite stunning like if you look uh at the the paper quote unquote um of the gpt4 release like the most notable thing was the absence right it was like yes you know it's for competitive commercial reasons uh we won't tell you like how big the model is like why was trained on high was trained like nothing um so that's that's a new turn yeah it is um it was interesting that they're not uh trying to position it as a safety concern as much as you know for competitive reasons like you said uh uh this is our core asset and we're you know gonna hide it behind uh you know we're gonna we're not going to be as open as maybe our name might suggest yeah yeah I think Ilia put it like very plainly uh in follow-up interviews saying that uh uh yeah open source is not the way forward for um for for them for commercial reasons um and so yeah for us that that triples our commitments to to make a open source uh uh models open source foundational models state-of-the-art models uh available to the 
community and enable the open source Community to to to to to contribute I think it's super important for for everyone involved so how do you think about the the open source landscape uh of models in particular um is there um you know yeah maybe kind of broadly uh at first and then we can dig into some examples of of some notable things on your radar yeah I mean it's uh the ecosystem is as vibrant as ever that the open source Community uses uh the hugging face Hub as a central place to calm and share and contribute and discuss uh all the all the latest models um and uh I have a slide where I say hey we have that many models available on the Hub and I have to to redo it uh like uh every other week like right now there's a hundred fifty thousand models that are free uh and openly accessible um on the hugging face Hub you can try any of them right there on the page um in the community keeps contributing and building upon each other work uh to uh to offer more Alternatives in like any task you can imagine any language you can imagine and yeah as I said like the the recent sort of commercialization sorry commercialization of uh of AI sort of um uh building a lot of momentum behind the open source Community to uh to accelerate uh their work right so of course at hagging face uh we are building a new foundational models uh to provide open source uh alternative we're doing this with AWS uh building them on our cluster there that's part of the uh a partnership that we recently announced uh but also we are enabling the community illuser uh just uh structure themselves uh to uh to do this work as well lion is building up an assistant I mentioned together computer and there's many other projects right now uh that's uh will create a new open source models kind of maybe connecting back to to Al alums in particular to what degree do you believe open source llms are kind of long-term viable given the immense cost associated with training Cutting Edge models uh do you do you feel 
like that cost is going to be or become insurmountable for open source communities or do you think that those communities will find a way to stay competitive with uh with closed commercial models yeah um I think there's a lot of research right now into um making models more performance on a model size basis right the the scaling laws that were sort of the main takeaway from the gpt3 paper and release I've sort of been uh challenged in a way or improved upon through new developments from the chinchilla paper to um to the latest uh the latest models like the the the I I'm not going to be going to be able to say it right but the alpaca yes the alpaca model from from Stanford right so you saw that so they trained that thing on 500 of compute right so they started from a seven billion parameter not a hundred plus billion parameter a seven billion parameter um and then we're able to fine tune it with instructions uh to to produce alpaca uh with uh with just 500 of compute so I don't think uh uh I don't think we're going to be in a place where the practical way of doing machine learning uh is going to require millions of dollars to train models um and uh an exorbitant amounts of compute for like every type of application of course if you want to do a Bing chat that's going to be expensive but you should also uh take a look at all the efforts uh from uh so the gdm from Gregory I forgot his last name that did a C plus plus implementation of a deployment of llama we've done the same on Bloom first uh it was whisper.cpp then llama.cpp that's right whispered and llama and then we added blooms to the to the pile so that allows you to run those models on the edge uh so that's super exciting yeah we're seeing reports now of um with the with llama in particular uh someone rented on a Raspberry Pi doing 10 tokens a second it then was like converted over to a pixel six I think at five tokens a second um pretty pretty amazing amazing work happening out there when uh when you Nest on 
our team uh are saying that they got uh he got 16 tokens a second uh on Bloom uh on on the local machine I think was a Mac I was like I was blown away wow wow um so I guess one other thought that that occurred to me you know in thinking about the broader Hub and open source models and um what's happening with uh llms now you know on the one hand llms are you know I've kind of demonstrated themselves to be this like Swiss army knife of uh of uh of machine learning in the sense that you know they can you know do classification and a bunch of other tasks uh on the other hand I think you know one of the things I'm seeing is the excitement about llms is kind of causing folks to want to treat them as the the first tool or the tool you know the de facto tool as opposed to um you know often for cases where there are our use case specific models that probably already exist on the hugging face Hub uh that do a better job are you seeing that kind of thing happening and and how are you um you know when you're talking to to folks like how are you addressing that I'm glad I'm glad you bring it up because I think uh the the hype around generative Ai and llms is creating a lot of confusion um in the in the market right like I see uh I see customers come in and say hey I tried on the playground with like gpt4 uh it's amazing it's able to parse HTML and I'm like that was that was solved like 10 years ago with like super cost efficient algorithms right and um we have so many uh tasks specific domain specific language specific models uh on the Hub that have been contributed by our community um so that when you have a specific task you can have like a very efficient way to do it that maybe you can run on CPU maybe you can run on a single machine you can apply to all of your data like all your customer tickets coming in like a whole Twitter feed whatever at scale um and that's that's the the way to approach machine learning and do data science in a more pragmatic way so in a way like 
yeah you have this uh Swiss Army knife but uh if you want to hang a painting in your wall like are you gonna use a Swiss army knife like ah you're probably gonna use a drill I don't know um right so there's there's something about picking the right tool for the job um and uh we're super happy to provide that service to the community to have like this place where all the tools for all the jobs are and if uh just like apple was saying that there's an app for that on the Hub there is a model for that um um so that uh yeah we don't uh use very very expensive uh models to do a simple repeatable tasks I spoke with uh with your colleague Thomas Wolfe um actually a year ago uh it is amazing how quickly that year has flown by but about kind of big science this was just as I think the research phase around uh Bloom was kind of coming to an end and the productionalizing phase was starting um and uh so that model's been uh released um but there's recently been I think you mentioned this the bloom C the instruction tuned version of that can you talk a little bit about uh about that model and what do you think its contribution is or will be yeah well I think uh with the big science like the the biggest deliverable of big science was to show that you can build a large-scale collaboration where you can build all you can bring all the leading experts from every corner every company every organization to work together uh thinking through all the ethics from the ground up and produce uh produce something that's like a meaningful Improvement and that was Bloom right so it's a 176 billion parameter uh that remains today probably the the best multilingual uh open source base uh llm um and but I think the the main contribution was to show like as a field uh we can collaborate together scientifically uh to really Advance uh all boats like rise the tide for all boats and to me like that's even more interesting uh than the actual you know model checkpoint itself yeah and that was a big 
theme that came up in our our conversation and it was particularly interesting because it um you know kind of a row I guess you know it's a natural consequence but it kind of arose out of a in this time when there was a lot of question around you know can a non-google you know non-aws non-microsoft research team compete in NLP um and contribute in NLP you know given that they tend to not have the Investments that are um you know required to to train these massive models and uh yeah I think I think uh we I think big science was a great experiment uh for that I do think small teams small open source teams and given uh given some amount of compute right you do need a millions or tens of millions of compute are able to provide provide meaningful improvements to the state of the art but today it's not only about uh advancing the state of the art it's also about just making it accessible to people and that's why like our efforts today are really centered around the up and reproduction of closed Source models right we have big code that's sort of uh took the uh took the torch from a big science to produce a code generation model uh we're doing this with a servicenow fully open source the way big science was fully open source we got some checkpoints already there's more to come we talked about our uh Flamingo reproduction effort we're much like gpt4 where training a new model on both text and images so there are many efforts ongoing and I don't think it's Out Of Reach for small focused open source organizations to make meaningful contribution that's why we're backing the the the great Folks at El Uther and we're collaborating with Lion stability and and all the other guys is there a um uh code focused model that you're working on or backing yeah yeah that's the that's the big code uh effort and there are already some checkpoints out um you can find that on a hiking face uh it's the big code organization everything is out there it's a kind of analogous to big science it's a 
separate organization that's going after uh code code generation model yeah it's really a collaboration between servicenow and Hiking face to build this thing so um yeah it's it's more focused in that way but you talked about uh bloomzy and I think it's cool to mention because a lot of people don't know about it so the same way that you have a T5 as a base model and then flan T5 as an instruction tuned model that can respond to your instructions like describe in one sentence the following paragraph or like translate this thing this type of uh prompts and requests so the same way we instruction tuned uh a bloom into this Bloom Z checkpoint uh that's today the the largest open source instruction based model uh so it's yeah it's 176 billion parameter um and uh I think uh not enough people know about it how do you um evaluate and characterize the performance of models like that well the the thing is with um with these uh General capable models is that you need to develop a new kind of benchmarks um thankfully that's that's a domain that's still very much a scientific domain where um everybody's uh sharing results and different benchmarks from Helm to to other things um but uh yeah for for Bloom Z what's uh what's very important to us and is inhabiting from from Bloom like all the multilingual uh components of them so yeah you have to really uh look at a wide variety of benchmarks and then uh as a user like ask yourself like what's important for for my use case yeah one of the things that came up when we were chatting was this idea of uh you know we tend to think of open you know versus closed as kind of this a switch or a binary thing but there's a in fact a spectrum of different ways to make models available um tell us a little bit more about how you think about that yeah for us um uh open source machine learning is very important because that's core to our mission to to democratize good machine learning there's various components of that there is um of course the 
open source code which is the implementation of the model but there's also the accessibility of the training data sets and the accessibility of the model weights and the transparency of the the research so all of this goes into what can be closed or open and there's there are various sort of approaches to releasing new models along that Spectrum actually uh when uh probably the the best researcher in the field is Irene Suleiman who used to work at open Ai and worked on the gpt2 release and it's now as a at hiking face and she published I think it was a two or three months ago uh the a really cool paper uh called the gradient uh the gradient of uh of release uh for for models um and she really breaks it down um very well like the the whole spectrum between a fully open release and a close release um it was interesting to see a meta uh finding the the cursor within that gradient along the the last few releases right from from Galactica to uh to Lama so here it's a it's it's a different approach where uh you get some things you approve on a case-by-case basis uh you release in open source to code but not the weight um so yeah there's like many different approaches and then you just leak everything to torrents allegedly in any in any case like on a hiking phase where we're all the way on the fully open Spectrum right we want everything to be public like we think that AI is too important not to be a common good um and uh we want we want the whole field to to progress together so how does your collaboration with AWS fit into that well we've been uh working with AWS for quite some time it's the the number one Cloud that's hiking face users apply our models in uh but recently we decided to really extend uh and deepen our collaboration um and there are I guess two main aspects to that like the the first one is as I say like there is renewed urgency around making sure that the community has access to fully open source models that they can use and so toward that we needed to 
build our own capabilities to do those uh those training and that's what we did with AWS we benchmarked a whole bunch of different solutions and we found um we found a great solution for us to build that capability so we have a super computer cluster that's running right now on on training new open source models that's one important part and we want to make sure those models are available and easy to use um to uh to AWS customers and then the second part of it is how do we drive the adoption of machine learning uh within uh companies like I think we do a pretty good job at making things accessible to uh to practitioners but how you do you take that to production how do you make sure that you can use machine learning in a way that doesn't that your your production costs don't go out of control so that's a big focus of this collaboration and it's a very um uh it's a very deep engineering collaboration like we work day to day with the engineering teams uh from the hardware to the platform layers so from the hardware uh with the uh teams that build these Hardware accelerators that are designed from the ground up for machine learning so it's trainium and inferentia and we work uh day to day closely with the engineering teams at sagemaker which is how um how data scientists and machine learning in learning Engineers can use these models to deploy them and fine-tune them Etc to build a very very easy experiences using open source to take a model from from hagging face and then build with it directly in sagemaker I'm controlling controlling costs along the way one of the the things that I struggled a little bit with reviewing that uh the blog post about the the expanded relationship was in compared to a couple of years ago or 18 months ago or so when you announced the initial relationship there was a ton of detail published like these are the things that we're going to do these These are the modules that we've created to integrate hugging face and sagemaker um whereas uh 
more recently it was higher level and I'm curious maybe a way to make the way you're planning to work together or the way you're working together more tangible for me kind of I I guess I have my analyst hat on now in in a year you know or the next 18 months um if you look back like what are the things that you all have achieved that will let you know that you know the past you know 18 months was successful so that couple a couple of things to that so the first thing as I said like we built a super complete super computer cluster uh to train new foundational models and we're not going to announce the models until we release them and we're not going to release them until they're not just ready but also that they work really well so I think that's going to be one of the one of the good ways to look at uh What uh what the impact of our partnership um will have been a year from now so when we see when we see new big models coming out we know that those were taking advantage of that supercomputer cluster uh that was built as part of this so that's one thing yeah the the open source contribution and then the other thing is uh the uh the uh developer experience that we are building uh between hugging phase and Amazon sagemaker and AWS um and some of that we had been working on uh for quite some time so already if you go to the hiking face Hub like go look at any of the 150 000 models that are out there we provide an easy way to deploy them or take them to sagemaker for fine tuning we like examples and um and Etc and so we already sort of have that platform to build upon what you're going to see is that we're going to be expanding like all the use cases that you can do this way and we're going to uh um to integrate more closely with the hardware accelerators so that you can take advantage of the cost savings um for your model so all the sort of practical uh how to get started information that we put together when we first announced our collaboration with sagemaker it's sort 
All right everyone, welcome to another episode of the TWIML AI Podcast. I am your host, Sam Charrington, and today I'm joined by Jeff Boudier. Jeff is head of product at Hugging Face. Before we get into today's conversation, be sure to take a moment to head over to Apple Podcasts or your listening platform of choice, and if you enjoy the show, please leave us a five-star rating and review. Jeff, welcome to the podcast.

Thank you, Sam. Thanks for having me.

I am really looking forward to our conversation. We're going to be talking about
open source and generative AI, and Hugging Face of course, the recent partnership with AWS, a bunch of things on the agenda. But before we dive into that, I'd love to hear a little bit about your background.

Oh, for sure. Well, I'm a late bloomer to AI. I joined Hugging Face two and a half years ago. I've known Julien and Clem for some time. My first foray into AI was about how you can automate the editing in videos, and there are lots of early applications of AI in there, like trying to transcribe the speech, trying to identify key moments through audio, through computer vision. These things have come a long way since then, but the last couple of years have been really amazing, and I feel like last week could have been a year in regular time.

Yeah, that's one of the things that I wanted to maybe spend a little bit of time on: all the news from last week. But you mentioned that you've been at Hugging Face for two and a half years. We were chatting before: there's the startup multiplier that multiplies that by five or seven, but then another AI multiplier on top of that. It's been a crazy couple of years, I guess.

Yeah, it feels like it's compounding. And, you know, we've been saying that for a long time: there's been an exponential increase in model size, in compute needs, NeurIPS papers.

Yep, yep. Everything everywhere all at once.

And these past few weeks have been super interesting, not just in the rate of new models and new releases, but also in how the whole landscape of AI has been evolving.

Yeah, I've been mentioning to folks recently that when I started the podcast six and change years ago, a big part of the reason why I started it was because there was just something in the air at the time. This was like three years past AlexNet. Folks were really starting to do interesting things
with deep neural nets. I'd end up every week with hundreds of tabs of things that I wanted to learn more about and explore. It's been an exciting six years, but it feels like that same energy right now that I was experiencing back then, almost making me feel like I need to start another podcast or something.

You started it at the same time that Hugging Face started. At the time, Julien and Clem and Thomas saw that things were starting to work that were not possible before, but it wasn't yet fully working. They had this crazy idea at the beginning of Hugging Face that you could actually create an AI that would be fun to have a conversation with, and you would interact with it like you would text your friends. Kind of a crazy idea, but I guess now, six years later, it came true. But I'm super excited to talk to you because, three years ago, when I started getting deeper into ML, I needed to catch up, and yours was my source of inspiration and learning. So yeah, super happy to talk to you.

Well, that's awesome to hear. You mentioned this past week. We are of course referring to the release of GPT-4, among a ton of other things that are going on. But I think GPT-4 maybe provides an interesting backdrop for our broader conversation, in that it contrasts with the open source theme that we're going to be spending some time chatting about. Any reflections on the GPT-4 launch and how you're seeing that impact the broader market?

Yeah, I mean, it was kind of a fireworks, right? It was like this Pi Day where everybody in the AI community decided to go on release mode. So of course you have GPT-4, but you also had the Google PaLM API, and you had Anthropic's Claude, and you had Stanford coming out with the open source... I don't know how to say that, I always get it wrong, but yeah, the instruction-tuned one.

How do you say it? I'm
confused. Anyway, everything everywhere happened all at once in the world of AI. AssemblyAI announced a model that supposedly is 43% better than Whisper or something like that. Just a ton of really amazing news this past week.

Yeah, and also on the open source side, right? Together Computer came out with OpenChatKit. That's GPT-NeoX 20 billion that's been fine-tuned on 40-plus million instructions to get an instruction fine-tuned model that's fully open source, Apache 2.0. It's on the Hub, you can use it. Midjourney 5 also.

Yeah, and apparently it does a good job with hands now.

Fingers, you see.

Before you jump in, I saw someone tweeted where they asked Midjourney 5 to create an image of a hundred raised hands, and it did, and all of the hands had five fingers, but it was like 500 hands. So the model still can't count, but it can do hands now. That's progress.

Through this maelstrom of announcements, what became clear to me is there's a shift for our field. Six months ago, AI and machine learning was very much a scientific field, with researchers building upon each other, publishing papers, reproducing each other's results, improving everything. And with the release of GPT-4 and the Google and Anthropic announcements, we're in a different reality, where the new models are released kind of Apple-style. You know, you have the iPhone 10, and it's got this new feature, and it's a cool demo, and you're going to bring on stage some people to tell you the story about it, and it's available today at this price. And that's new to me, right? It's been a shift. For us, our mission is to democratize good machine learning, and the way to do that is through open source. So the availability of the models, the training datasets, the model weights, the training
code, all of that is super important for the field to progress together and to make sure that everybody can build upon machine learning. So yeah, that was the bittersweet part of those announcements.

Sweet in that you're seeing this spread and acceleration, but bitter in that, for the most part, there's a lack of openness in the major large models.

Yeah, it was quite stunning. If you look at the paper, quote unquote, of the GPT-4 release, the most notable thing was the absence, right? It was like, yes, for competitive commercial reasons, we won't tell you how big the model is, what it was trained on, how it was trained, nothing. So that's a new turn.

Yeah, it is. It was interesting that they're not trying to position it as a safety concern as much as, like you said, for competitive reasons: this is our core asset, and we're not going to be as open as maybe our name might suggest.

Yeah, I think Ilya put it very plainly in follow-up interviews, saying that open source is not the way forward for them, for commercial reasons. And so for us, that triples our commitment to make open source models, open source foundational models, state-of-the-art models, available to the community, and to enable the open source community to contribute. I think it's super important for everyone involved.

So how do you think about the open source landscape of models in particular? Maybe kind of broadly at first, and then we can dig into some examples of notable things on your radar.

Yeah, I mean, the ecosystem is as vibrant as ever. The open source community uses the Hugging Face Hub as a central place to come and share and contribute and discuss all the latest models.
I have a slide where I say, hey, we have that many models available on the Hub, and I have to redo it every other week. Right now there's a hundred fifty thousand models that are free and openly accessible on the Hugging Face Hub. You can try any of them right there on the page. And the community keeps contributing and building upon each other's work to offer more alternatives in any task you can imagine, any language you can imagine. And as I said, the recent commercialization of AI is building a lot of momentum behind the open source community to accelerate their work. So of course at Hugging Face we are building new foundational models to provide open source alternatives. We're doing this with AWS, building them on our cluster there. That's part of the partnership that we recently announced. But we are also enabling the community. EleutherAI just structured themselves to do this work as well, LAION is building Open Assistant, I mentioned Together Computer, and there are many other projects right now that will create new open source models.

Maybe connecting back to LLMs in particular: to what degree do you believe open source LLMs are long-term viable, given the immense cost associated with training cutting-edge models? Do you feel like that cost is going to become insurmountable for open source communities, or do you think that those communities will find a way to stay competitive with closed commercial models?

Yeah, I think there's a lot of research right now into making models more performant on a model-size basis. The scaling laws that were sort of the main takeaway from the GPT-3 paper and release have been challenged in a way, or improved upon, through new developments, from the Chinchilla paper to the latest models, like the
... I'm not going to be able to say it right, but the

Alpaca.

Yes, the Alpaca model from Stanford, right? So you saw that. They trained that thing on $500 of compute. They started from a seven-billion-parameter model, not a hundred-plus-billion-parameter one, and then were able to fine-tune it with instructions to produce Alpaca with just $500 of compute. So I don't think we're going to be in a place where the practical way of doing machine learning is going to require millions of dollars to train models and exorbitant amounts of compute for every type of application. Of course, if you want to do a Bing chat, that's going to be expensive. But you should also take a look at all the efforts around ggml, from Georgi, I forgot his last name, who did a C++ implementation for deploying LLaMA. We've done the same on BLOOM.

First it was whisper.cpp, then llama.cpp.

That's right, Whisper and LLaMA, and then we added BLOOM to the pile. That allows you to run those models on the edge, so that's super exciting.

Yeah, we're seeing reports now, with LLaMA in particular, of someone running it on a Raspberry Pi doing 10 tokens a second. It was then converted over to a Pixel 6, I think at five tokens a second. Pretty amazing work happening out there.

A colleague on our team was saying that he got 16 tokens a second on BLOOM on a local machine, I think it was a Mac. I was blown away.

Wow. So I guess one other thought that occurred to me, in thinking about the broader Hub and open source models and what's happening with LLMs now: on the one hand, LLMs have kind of demonstrated themselves to be this Swiss Army knife of machine learning, in the sense that they can do classification and a bunch of other tasks. On the
other hand, I think one of the things I'm seeing is that the excitement about LLMs is causing folks to want to treat them as the first tool, the de facto tool, even for cases where there are use-case-specific models that probably already exist on the Hugging Face Hub and do a better job. Are you seeing that kind of thing happening? And when you're talking to folks, how are you addressing that?

I'm glad you bring it up, because I think the hype around generative AI and LLMs is creating a lot of confusion in the market. I see customers come in and say, hey, I tried on the playground with GPT-4, it's amazing, it's able to parse HTML. And I'm like, that was solved 10 years ago with super cost-efficient algorithms, right? We have so many task-specific, domain-specific, language-specific models on the Hub that have been contributed by our community, so that when you have a specific task, you can have a very efficient way to do it, one that maybe you can run on CPU, maybe you can run on a single machine, and that you can apply to all of your data, like all your customer tickets coming in, a whole Twitter feed, whatever, at scale. That's the way to approach machine learning and do data science in a more pragmatic way. So yeah, you have this Swiss Army knife, but if you want to hang a painting on your wall, are you going to use a Swiss Army knife? You're probably going to use a drill, I don't know. So there's something about picking the right tool for the job. We're super happy to provide that service to the community, to have this place where all the tools for all the jobs are. Just like Apple was saying that there's an app for that, on the Hub there is a model for that, so that we don't use very, very expensive models
to do simple, repeatable tasks.

I spoke with your colleague Thomas Wolf, actually a year ago. It is amazing how quickly that year has flown by. That was about BigScience, just as I think the research phase around BLOOM was coming to an end and the productionalizing phase was starting. So that model's been released, but there's recently been, I think you mentioned this, BLOOMZ, the instruction-tuned version of that. Can you talk a little bit about that model and what you think its contribution is or will be?

Yeah, well, I think the biggest deliverable of BigScience was to show that you can build a large-scale collaboration where you can bring all the leading experts from every corner, every company, every organization to work together, thinking through all the ethics from the ground up, and produce something that's a meaningful improvement. And that was BLOOM, right? It's a 176-billion-parameter model that remains today probably the best multilingual open source base LLM. But I think the main contribution was to show that, as a field, we can collaborate together scientifically to really advance all boats, raise the tide for all boats. To me, that's even more interesting than the actual model checkpoint itself.

Yeah, and that was a big theme that came up in our conversation. It was particularly interesting because it arose in this time when there was a lot of question around whether a non-Google, non-AWS, non-Microsoft research team can compete and contribute in NLP, given that they tend to not have the investments that are required to train these massive models.

Yeah, I think BigScience was a great experiment for that. I do think
small teams, small open source teams, given some amount of compute (you do need millions or tens of millions in compute), are able to provide meaningful improvements to the state of the art. But today it's not only about advancing the state of the art, it's also about just making it accessible to people. That's why our efforts today are really centered around the open reproduction of closed source models. We have BigCode, which sort of took the torch from BigScience to produce a code generation model. We're doing this with ServiceNow, fully open source, the way BigScience was fully open source. We've got some checkpoints already, and there's more to come. We talked about our Flamingo reproduction effort, where, much like GPT-4, we're training a new model on both text and images. So there are many efforts ongoing, and I don't think it's out of reach for small, focused open source organizations to make meaningful contributions. That's why we're backing the great folks at EleutherAI, and we're collaborating with LAION, Stability, and all the other guys.

Is there a code-focused model that you're working on or backing?

Yeah, that's the BigCode effort, and there are already some checkpoints out. You can find that on Hugging Face. It's the BigCode organization, everything is out there.

So it's kind of analogous to BigScience? It's a separate organization that's going after a code generation model?

Yeah, it's really a collaboration between ServiceNow and Hugging Face to build this thing, so it's more focused in that way. But you talked about BLOOMZ, and I think it's cool to mention because a lot of people don't know about it. You have T5 as a base model and then Flan-T5 as an instruction-tuned model that can respond to your instructions, like "describe in one sentence the following paragraph" or "translate this thing," this type of prompts and requests. So
in the same way, we instruction-tuned BLOOM into this BLOOMZ checkpoint, which is today the largest open source instruction-tuned model. It's 176 billion parameters, and I think not enough people know about it.

How do you evaluate and characterize the performance of models like that?

Well, the thing with these generally capable models is that you need to develop a new kind of benchmark. Thankfully, that's a domain that's still very much a scientific domain, where everybody's sharing results and different benchmarks, from HELM to other things. But for BLOOMZ, what's very important to us, and what it's inheriting from BLOOM, is all the multilingual components. So you have to really look at a wide variety of benchmarks and then, as a user, ask yourself: what's important for my use case?

Yeah, one of the things that came up when we were chatting was this idea that we tend to think of open versus closed as kind of a switch or a binary thing, but there's in fact a spectrum of different ways to make models available. Tell us a little bit more about how you think about that.

Yeah, for us, open source machine learning is very important because that's core to our mission to democratize good machine learning. There are various components of that. There is of course the open source code, which is the implementation of the model, but there's also the accessibility of the training datasets, the accessibility of the model weights, and the transparency of the research. All of this goes into what can be closed or open, and there are various approaches to releasing new models along that spectrum. Actually, probably the best researcher in the field is Irene Solaiman, who used to work at OpenAI and worked on the GPT-2 release, and is now at Hugging Face. She published, I think it was two or three
months ago, a really cool paper called the gradient of release for models, and she really breaks it down very well, the whole spectrum between a fully open release and a closed release. It was interesting to see Meta finding the cursor within that gradient along the last few releases, from Galactica to LLaMA. Here it's a different approach, where you get some things, access is approved on a case-by-case basis, and the code is released in open source but not the weights. So yeah, there are many different approaches.

And then you just leak everything to torrents.

Allegedly. In any case, at Hugging Face we're all the way on the fully open end of the spectrum. We want everything to be public. We think that AI is too important not to be a common good, and we want the whole field to progress together.

So how does your collaboration with AWS fit into that?

Well, we've been working with AWS for quite some time. It's the number one cloud that Hugging Face users apply our models in. But recently we decided to really extend and deepen our collaboration, and there are, I guess, two main aspects to that. The first one is, as I said, there is renewed urgency around making sure that the community has access to fully open source models that they can use. Toward that, we needed to build our own capabilities to do that training, and that's what we did with AWS. We benchmarked a whole bunch of different solutions, and we found a great solution for us to build that capability. So we have a supercomputer cluster that's running right now, training new open source models. That's one important part, and we want to make sure those models are available and easy to use for AWS customers. And then the second part of it is: how do we drive the adoption of machine learning within companies? I think we do a pretty good
job at making things accessible to practitioners, but how do you take that to production? How do you make sure that you can use machine learning in a way that your production costs don't go out of control? So that's a big focus of this collaboration, and it's a very deep engineering collaboration. We work day to day with the engineering teams, from the hardware to the platform layers. On the hardware side, that's with the teams that build the hardware accelerators that are designed from the ground up for machine learning, Trainium and Inferentia. And we work closely, day to day, with the engineering teams at SageMaker, which is how data scientists and machine learning engineers can use these models, deploy them, fine-tune them, et cetera, to build very easy experiences using open source: take a model from Hugging Face and then build with it directly in SageMaker, controlling costs along the way.

One of the things that I struggled a little bit with, reviewing the blog post about the expanded relationship, was that, compared to a couple of years ago, or 18 months ago or so, when you announced the initial relationship, there was a ton of detail published then: these are the things that we're going to do, these are the modules that we've created to integrate Hugging Face and SageMaker. Whereas more recently it was higher level. I'm curious about a way to make the way you're working together more tangible for me. I guess I have my analyst hat on now. In a year, or the next 18 months, if you look back, what are the things that you will have achieved that will let you know that the past 18 months was successful?

So, a couple of things to that. The first thing, as I said, is that we built a supercomputer cluster to train new foundational models, and
we're not going to announce the models until we release them, and we're not going to release them until they're not just ready but also work really well. So I think that's going to be one of the good ways to look at what the impact of our partnership will have been a year from now. When we see new big models coming out, we'll know that those were taking advantage of that supercomputer cluster that was built as part of this. So that's one thing.

Yeah, the open source contribution.

And then the other thing is the developer experience that we are building between Hugging Face and Amazon SageMaker and AWS. Some of that we had been working on for quite some time. Already, if you go to the Hugging Face Hub and look at any of the 150,000 models that are out there, we provide an easy way to deploy them or take them to SageMaker for fine-tuning, with examples, et cetera. So we already have that platform to build upon. What you're going to see is that we're going to be expanding all the use cases that you can do this way, and we're going to integrate more closely with the hardware accelerators, so that you can take advantage of the cost savings for your model. All the practical how-to-get-started information that we put together when we first announced our collaboration with SageMaker is sort of already there, right? We have full-on documentation about using Hugging Face on SageMaker on huggingface.co. We have Deep Learning Containers that are available in open source with the latest versions of PyTorch, TensorFlow, and the Hugging Face libraries. So a lot of these things already exist, and for me, the way that we're going to be measuring success a year from now is by seeing how many companies have been enabled by all these things that we've been building.

Hmm, that's awesome. You mentioned the hardware accelerators,
these are Trainium and Inferentia. Can you talk about the enablement process? What does it mean for those accelerators to better support Hugging Face, or Hugging Face models, or Transformers? What specifically needs to happen there?

Yeah, so in order to take advantage of the acceleration on Trainium and Inferentia, you have to bring your model through the Neuron compiler. So part of the work is to build the open source bridge from our models, which can be in PyTorch, TensorFlow, et cetera, to bring them down in a way that you're going to get all the acceleration. We haven't talked about it yet, but we just released an open source package, Optimum Neuron, that's going to be instrumental in enabling those experiences. In our testing, if you take Trainium head to head with a comparably priced GPU instance, like an A10G, and you try to do your typical out-of-the-box training, you get up to 5x better throughput. So in terms of cost savings, it's really, really big. And then on Inferentia, we got an early sneak preview of the next generation. Inferentia 2 is not yet generally available, but it's out there in preview. And again, out of the box, we took BERT base and ran it for various sequence lengths, and the acceleration is crazy. It's like 8x faster for some sequence lengths. So it's really compelling for companies that want to apply machine learning at scale, using the right tool for the job. Don't use an LLM to classify emails. You can really take advantage of that acceleration to reduce your costs by a lot. So I think for us that's important for our goal of democratization, making things not just accessible but also affordable.

This Open Neuron, I may have missed the name...

Optimum Neuron.

Yeah, Optimum Neuron. It sounds a
little bit like an ONNX type of competitor or alternative. I guess, more specifically, is it doing a similar kind of thing?

Well, I guess at a high level, right, it's bridging the language of the model to the language of the hardware. So there is some notion of... right, exactly. And ONNX allows that kind of intermediate representation of your model graph, so you can lower it to various types of hardware. Neuron allows you to compile your model so you can run super fast.

Yeah, okay. So it's specific to those two hardware targets, as opposed to ONNX, which is trying to be more of a general layer.

Yes.

Got it, got it. One last thing that I wanted to get your take on was Hugging Face as a business and how you see that evolving. Again, maybe this is analyst hat on, and I don't know if I should say this, but sometimes I look at Hugging Face and I see echoes of Docker: this company that's loved by customers and doing all this great work, revolutionizing user experience, but that really found it very difficult to monetize and build a sustainable business. And while they've, I think, gone through some changes and turned things around, it was a rough road for a long time. How do you see Hugging Face evolving and overcoming the challenges to scale as a business?

Yeah, well, I think the exciting thing for me is that, as a function of having built sort of the GitHub of machine learning, the central place where all practitioners and researchers are going to contribute and access models, we've built the gateway to machine learning compute, which as we know is seeing exponential growth right now. I mean, it's astronomical, right? And we are sort of at the bottom of that funnel, and our business model is to sell compute and services on top of our platform. So it's super exciting
to see the adoption of these products: to see the adoption of our open source and models on SageMaker, but also to see the adoption of our open source and models with our compute services. We released a few months ago a production service called Hugging Face Inference Endpoints, where you take a model and then click, click, click: AWS, us-east-1, take that type of instance, and you get an API up. Within three months we had over a thousand customers of that thing. So I think it's a very different opportunity that's in front of us than what was in front of Docker at the time, although I don't have a crystal ball.

Fair enough, fair enough. Well, there are a lot of us rooting for you, and it was great to have an opportunity to chat. I wish you all the best, and thank you so much for your time.

It was super fun. Same here.

Thanks, Jeff.

Goodbye.