Pre-training LLMs: Creating Your Own Language Model from Scratch
I'm delighted to introduce Pre-training LLMs, built in partnership with Upstage and taught by Upstage CEO Sung Kim, as well as Chief Scientific Officer Lucy Park. Researchers and developers are constantly announcing new models, including Llama, Grok, Solar, and many others. This course will show you how such models are created through a process called pre-training and walk you through the steps to take if you ever want to train an LLM from scratch yourself.
You'll also gain an intuition about the cost of pre-training and how it doesn't always cost millions of dollars, especially if you start with one of the smaller existing open source models rather than training a massive cutting-edge model, which still remains very expensive. Specifically, pre-training is the initial phase of self-supervised learning during which an LLM learns to repeatedly predict the next word fragment, called a token, using vast amounts of text data. The result of pre-training is a base model, which is generally pretty good at predicting the next word, or generating more text, when given an input prompt.
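To make the next-token prediction objective concrete, here is a minimal sketch that is not part of the course materials: it uses the Hugging Face transformers library and the small, publicly available gpt2 checkpoint as a stand-in base model, and relies on the convention that passing the input ids as labels makes the library compute the shifted cross-entropy loss that pre-training minimizes.

```python
# Minimal sketch of the pre-training objective: next-token prediction.
# Assumes the Hugging Face transformers library and the small public
# "gpt2" checkpoint as a stand-in base model (not the course's model).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

text = "Pre-training teaches a model to predict the next token"
inputs = tokenizer(text, return_tensors="pt")

# For causal LMs, passing input_ids as labels makes the library compute
# the shifted cross-entropy loss: each position is trained to predict
# the token that follows it.
with torch.no_grad():
    outputs = model(**inputs, labels=inputs["input_ids"])
print("next-token prediction loss:", outputs.loss.item())

# A base model trained this way can also simply continue a prompt.
generated = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```

In a real pre-training run this loss is minimized over billions of tokens of text rather than a single sentence, but the objective is the same.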
However, a base model is not always great at following instructions or behaving in a safe way. That's why we call this pre-training: it comes before the fine-tuning and alignment training that then results in the instruct, or instruction-tuned, variants of models like GPT-3.5, Llama, or Gemini. Pre-training LLMs takes specialized hardware and a huge amount of compute, which makes it expensive. If your use case can be handled by an existing model, it will pretty much always be easier and cheaper to use that model.
But there are some scenarios where you might find yourself having to pre-train, for example, creating a model with new domain knowledge, or developing a model that is better at a specific language that isn't well represented by the more general models. Further, new strategies and processes are being developed by research labs and companies like Upstage to make pre-training more efficient, which is opening up this process to more developers. At Upstage, Sung and Lucy have trained their own family of models, called Solar, using the techniques demonstrated in this course.
They also work with many customers to pre-train new models for specific needs and use cases. As one example of their training innovations, Upstage has found that a key technique called depth up-scaling, which you'll learn about in this course, can reduce training costs by around 70%. Thanks, Andrew, we're really excited to be here. We'll start the course by presenting some scenarios where pre-training your own model is a good option and discussing why fine-tuning alone may not be enough to get the performance you need from a model.
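As a rough illustration of the idea behind depth up-scaling, the sketch below initializes a deeper model by stacking overlapping copies of an existing Llama-style model's decoder layers, which is then continually pre-trained. The checkpoint name and layer counts here are illustrative assumptions, not Upstage's exact Solar recipe.

```python
# Rough sketch of depth up-scaling: build a deeper model by stacking
# overlapping copies of an existing model's layers, then continue
# pre-training. Checkpoint and layer counts are illustrative only.
import copy
import torch.nn as nn
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
layers = base.model.layers            # decoder layers of a Llama-style model
n = len(layers)                       # e.g. 22
keep = int(n * 0.75)                  # layers taken from each copy

# Bottom `keep` layers of one copy followed by the top `keep` layers of
# another copy, giving roughly 1.5x the original depth.
base.model.layers = nn.ModuleList(
    [copy.deepcopy(layer) for layer in layers[:keep]]
    + [copy.deepcopy(layer) for layer in layers[n - keep:]]
)
base.config.num_hidden_layers = len(base.model.layers)

print(f"depth up-scaled from {n} to {len(base.model.layers)} layers")
# A full implementation would also renumber per-layer indices used for
# KV caching and then continually pre-train the up-scaled model.
```

The up-scaled model starts from weights that already encode useful knowledge, which is why continued pre-training from this initialization can be far cheaper than training the larger architecture from scratch.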
Hopefully, this will give you a sense of why so many new pre-trained models are being developed when great models like GPT-4, Llama, Claude, and Gemini have already been trained. Next, you'll walk through all the steps required to train your own model, starting with data sourcing, cleaning, and preparation, and then moving on to model configuration and weight initialization. We'll finish by showing you how to set up a training run, and along the way we'll share tips on how to extend the examples from the course to larger-scale, real-world training.
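As a taste of the configuration and initialization steps, here is a minimal sketch that defines a small Llama-style architecture from scratch with the Hugging Face transformers library, initializes its weights randomly, and outlines the arguments for a training run. The sizes, hyperparameters, and paths are illustrative assumptions, not the course's exact settings.

```python
# Minimal sketch of model configuration, weight initialization, and
# training-run setup. Sizes, hyperparameters, and paths are illustrative
# assumptions, not the course's exact configuration.
from transformers import LlamaConfig, LlamaForCausalLM, TrainingArguments

config = LlamaConfig(
    vocab_size=32000,
    hidden_size=1024,
    intermediate_size=4096,
    num_hidden_layers=12,
    num_attention_heads=16,
    max_position_embeddings=2048,
)

# Instantiating from a config gives randomly initialized weights; you can
# instead copy weights from an existing checkpoint (e.g. via depth
# up-scaling) to start the run from a stronger point.
model = LlamaForCausalLM(config)
print(f"parameters: {model.num_parameters() / 1e6:.1f}M")

# Skeleton of a training run; a real run also needs a tokenized text
# dataset, a data collator for causal language modeling, and a Trainer.
args = TrainingArguments(
    output_dir="./pretraining-run",        # hypothetical output path
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,
    learning_rate=3e-4,
    max_steps=10_000,
    logging_steps=100,
    save_steps=1_000,
)
```

Whether to start from random weights or from an existing model's weights is exactly the kind of decision the course walks through when discussing weight initialization.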
If you want to better understand how the key steps of pre-training LLMs work, or if you want to take an existing model and continue the pre-training process, or if you even want to compete on LLM leaderboards by pre-training your own model from scratch, I hope you sign up for this course.
"WEBVTTKind: captionsLanguage: enI'm delighted to introduce pre-training LMS built in partnership with upstage and taught by upstage CL sunum as well as Chief scientific officer Lucy Punk new Elms are being announced by researchers and developers all the time including llama Gro solar orer and many many others this course will show you how such models are created through a process called pre trining and walk you through the steps to take if you ever want to train an LM from scratch yourself you also gain an intuition of about the cost of pre-training and how it doesn't always cost millions of dollars especially if you starts with one of the smaller existing open source models as opposed to training a massive Cutting Edge model which Still Remains very expensive of course specifically pre-training is the initial phase of supervised learning during which an LM learns to repeatedly predict the next word fragment called a token using vast amounts of text Data the result of pre-training is a model called a base model which is generally pretty good at predicting the next word or generating more text when given an input prompt but it's not always great at following instructions or behaving in a safe way that's why we call this pre-training it comes before the fine-tuning and Alignment training that then results in the instruct or instruction tune variance of models like gbt 3.5 and Llama Or Gemini it takes specialized hardware and a huge amount of compute to pre pre Trin LMS and this makes it expensive if your use case can be done with an existing model it'll pretty much always be easier and cheaper to use the existing model but there are some scenarios where you might find yourself having to pre-train for example creating a model with new domain knowledge or developing models that better at speaking a specific language that's maybe not well represented by the more General models further new strategies and processes are being developed by research labs and companies like upstage to make pre-training more efficient and this is opening up this process to more developers at upstage Sun and Lucy have trained their own family of models called solar using the techniques demonstrated in this course they also work with many customers to pre-train new models for new specific needs and specific use cases as one example of training alic innovation upstage has found that a key technique called dep upscaling which you learn about in this course can reduce training costs by maybe about 70% thanks Andrew really excited to be here we will start the course by presenting some scenarios where pre-training your own model is good option and discuss why F tuning alone may not be enough to get the performance you need from a model hopefully this will give you a sense of why so many new pre-change models are being developed when great models like G4 llama cloud and Gemini have been already trained next you'll walk through all the steps required to train your own model starting with data sourcing cleaning preparation and then moving on to model configuration and weight initialization we'll finish by showing you how to set up a training run along the way we'll share tips on how to extend the examples from the course to larger scale real world training so if you want to better understand how the key step of pre-training Els work or if you want to take an existing model and continue the pre-trading process or if you even want to compete on LM leader boards by pre-trading your own model from scratch I hope you sign up for 
this courseI'm delighted to introduce pre-training LMS built in partnership with upstage and taught by upstage CL sunum as well as Chief scientific officer Lucy Punk new Elms are being announced by researchers and developers all the time including llama Gro solar orer and many many others this course will show you how such models are created through a process called pre trining and walk you through the steps to take if you ever want to train an LM from scratch yourself you also gain an intuition of about the cost of pre-training and how it doesn't always cost millions of dollars especially if you starts with one of the smaller existing open source models as opposed to training a massive Cutting Edge model which Still Remains very expensive of course specifically pre-training is the initial phase of supervised learning during which an LM learns to repeatedly predict the next word fragment called a token using vast amounts of text Data the result of pre-training is a model called a base model which is generally pretty good at predicting the next word or generating more text when given an input prompt but it's not always great at following instructions or behaving in a safe way that's why we call this pre-training it comes before the fine-tuning and Alignment training that then results in the instruct or instruction tune variance of models like gbt 3.5 and Llama Or Gemini it takes specialized hardware and a huge amount of compute to pre pre Trin LMS and this makes it expensive if your use case can be done with an existing model it'll pretty much always be easier and cheaper to use the existing model but there are some scenarios where you might find yourself having to pre-train for example creating a model with new domain knowledge or developing models that better at speaking a specific language that's maybe not well represented by the more General models further new strategies and processes are being developed by research labs and companies like upstage to make pre-training more efficient and this is opening up this process to more developers at upstage Sun and Lucy have trained their own family of models called solar using the techniques demonstrated in this course they also work with many customers to pre-train new models for new specific needs and specific use cases as one example of training alic innovation upstage has found that a key technique called dep upscaling which you learn about in this course can reduce training costs by maybe about 70% thanks Andrew really excited to be here we will start the course by presenting some scenarios where pre-training your own model is good option and discuss why F tuning alone may not be enough to get the performance you need from a model hopefully this will give you a sense of why so many new pre-change models are being developed when great models like G4 llama cloud and Gemini have been already trained next you'll walk through all the steps required to train your own model starting with data sourcing cleaning preparation and then moving on to model configuration and weight initialization we'll finish by showing you how to set up a training run along the way we'll share tips on how to extend the examples from the course to larger scale real world training so if you want to better understand how the key step of pre-training Els work or if you want to take an existing model and continue the pre-trading process or if you even want to compete on LM leader boards by pre-trading your own model from scratch I hope you sign up for this course\n"