**Exploring the Potential of Large Language Models: Challenges, Solutions, and Future Trends**
**Introduction**
Large language models (LLMs) have revolutionized the way we approach artificial intelligence, offering unparalleled capabilities in natural language processing. From automating tasks to enhancing creativity, these models have become integral to various applications across industries. However, their potential is not without challenges, which necessitates a deeper understanding of their limitations and how they can be overcome.
**Challenges in Building Applications with LLMs**
One of the primary hurdles in leveraging LLMs is ensuring reliability. Many projects start as proofs of concept but struggle to transition into production-ready solutions. This gap often stems from the models' tendency to produce inconsistent, irrelevant, or fabricated outputs (the last of these is known as hallucination). Developers must bridge this gap by refining use cases and ensuring consistent performance across scenarios.
**Hallucination in LLMs: Causes and Mitigation**
Hallucination occurs when an LLM generates incorrect or nonsensical information, a consequence of its training data and architecture. To mitigate this, techniques like embedding-based retrieval have emerged as effective solutions. By storing knowledge externally and retrieving relevant context at query time, these systems reduce reliance on the model's internal knowledge, thereby improving accuracy.
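The retrieval-then-generate idea can be sketched in a few lines. This is a minimal, hypothetical sketch: the `embed` function below is a toy bag-of-words stand-in for a real embedding model, and the prompt format is illustrative rather than any particular provider's API.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: a bag-of-words count vector. A real system would
    call a learned embedding model; the pipeline shape is the same."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=1):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def grounded_prompt(query, docs):
    """Prepend retrieved context so the model can answer from the
    documents instead of relying on its internal knowledge."""
    context = "\n".join(retrieve(query, docs))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

docs = [
    "The Transformer architecture was introduced in 2017.",
    "Embeddings map text to vectors for semantic search.",
]
print(grounded_prompt("When was the Transformer introduced?", docs))
```

The generation model then completes the prompt; because the relevant fact is present in the context, it no longer has to invent one.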
**Embedding Techniques and Retrieval Systems**
Embeddings represent text in a numerical format, enabling semantic search capabilities. This approach allows users to retrieve the most contextually relevant information from a dataset. Combining embeddings with retrieval systems not only enhances reliability but also makes applications more efficient by reducing unnecessary computations.
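The semantic-search step itself reduces to ranking documents by vector similarity. Again a hedged toy sketch: token counts stand in for learned embeddings, and the corpus is invented for illustration.

```python
import math
from collections import Counter

def embed(text):
    # Toy numeric representation: token counts. A learned embedding
    # model captures meaning far better, but the ranking logic is the same.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def semantic_search(query, corpus):
    """Rank every document by similarity to the query, best first."""
    q = embed(query)
    scored = [(cosine(q, embed(d)), d) for d in corpus]
    return sorted(scored, reverse=True)

corpus = [
    "how to reset a forgotten password",
    "quarterly sales report for the emea region",
    "steps to recover your account password",
]
for score, doc in semantic_search("reset my password", corpus):
    print(f"{score:.2f}  {doc}")
```

Because similarities are computed over a fixed index of document vectors, the expensive embedding work is done once per document, not once per query.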
**Multi-Language Models: Training and Applications**
Multilingual models are trained on diverse datasets, encompassing multiple languages. This training process improves their ability to understand and generate text across different linguistic contexts. Such models are invaluable for global businesses seeking to deploy AI solutions that cater to a wide audience, bridging language barriers seamlessly.
**The Future of Multimodal AI**
Looking ahead, the integration of multimodal capabilities into LLMs promises exciting advancements. By incorporating inputs like images and audio, these models can perform tasks ranging from image generation to multilingual understanding. This expansion opens new avenues for innovation, particularly in areas like social interaction and embodied learning.
**AI in Personal Productivity**
Beyond enterprise applications, LLMs are transforming personal productivity. Tools like AI-powered writing assistants and brainstorming aids help non-native speakers improve their formulation skills and assist individuals in refining their ideas. These tools exemplify how AI can enhance everyday tasks, making them more efficient and accessible.
**Conclusion**
As we stand on the brink of new developments in AI, embracing these technologies while addressing their challenges is crucial. By leveraging techniques like embedding-based retrieval systems and focusing on multilingual and multimodal capabilities, we can unlock the full potential of LLMs. The future holds endless possibilities, where AI not only enhances business operations but also enriches our personal lives, fostering a more connected and intelligent world.
"WEBVTTKind: captionsLanguage: enforeign this is an interview with Jay Alamar J has a technical blog as well as a YouTube channel where he explains a lot of technical AI related stuff like his attention and Transformers blog that you definitely have heard before and he has amazing learning resources now he is focused on building llm university with Co here and we will talk about that in this episode but not only we will also dive into the different challenges in building large language model based apps and how to build them I think this episode is perfect for anyone interested into large English models but mainly for people that want to create a cool app using them I hope you enjoy this episode so I'm Jay I worked as a software engineer I've been fascinated with machine learning for so long but I really started working in it just about eight years ago when I started a Blog to document how I'm learning about about machine learning it it seemed to be this power that that that gives software capabilities to do that's quite mind-blowing if you know what the limitations of software are and you see these machine learning demos coming and doing things that really stretch the imagination in terms of what is possible to do with software so that's when I started my blog which a bunch of people have seen use for uh because I I started with introductions to machine learning and just in general how to do how to think about bad propagation and neural networks and then I moved into uh uh more advanced topics covering uh attention and how language processing is done with language generation and then it really exploded when I started sort of talking about Transformer models and birth models and GPT models from the blog I got to work with Udacity creating some of their nanodegrees for educating people on how to use these language models and how to train them these were two things that really launched a little bit of how I work with machine learning in in AI the blog I've seen has 
been has had upwards of 600 of 6 million page views so far a lot of it uh around Transformers and how they work um and yeah I think maybe tens of thousands of people have went through the the various Udacity programs on machine learning deep learning uh NLP computer vision um and most recently I get to work very closely with these models with cohere as director and Engineering fellow where I continue to learn about the capabilities of these models and explain them to people in terms of how to deploy them in real world applications so how can you build something with them that solves a real problem that you have right now and that includes education it includes talks that includes creating schematics but also like crystallizing a lot of Lessons Learned in in the industry of how to deploy these models and how to build things around them cohere so a lot of people have heard about about Transformers and clear was was built by three co-founders um some of whom were co-authors of the original Transformers paper and they've been building these you know Transformers in the cloud these hosted managed large language models um for about two or maybe two and a half years I've been with the company for for two years and I've seen this this deployment and brought out um and yeah since then the company has trained and deployed you know two families of massive language models that do text generation but also text understanding and we can go deeper into these two uh capabilities of large language models absolutely and you mentioned that you started eight years ago your blog and of course as with most people I discovered due to just the amazing blog that you wrote on Transformer and attention and it really helped me and and lots of people but since you started so long ago what made you start and made you get into the field because eight years ago AI hype wasn't really near what it is right now so how did you discover it and what made you create this this blog about it amazing yes so 
I had been working as a software engineer for a long time, and sometimes I would come across demos that felt magical to me, and a lot of them came from machine learning. If you Google "Word Lens," it's on YouTube: this was a demo that came out in 2010, of an iPhone 4 that you could point at, say, a sentence written in Spanish, and it would transform that into the English translation and superimpose it. It's 13 years later now, but that still feels like magic, especially if you know about software and how complex dealing with language and images is. To be able to do that without a server, on software running on a handheld machine, felt like an alien artifact to me. Seeing something like that, I always had this to-do list in my head: get into machine learning, find the first opportunity to get into it and understand it, because it was clearly going to be transformative for software. The moment that really gave me the jump into it was around 2015, when TensorFlow was open-sourced. It felt like, okay, this is the time: now there's open-source code. A lot of these things used to require being very close to a research lab, or working deep inside a company, and at the time I was an outsider. I wasn't in Silicon Valley or in a big tech hub. I was just a software person with a laptop and access to the internet, trying to learn on my own, without a group or a company or a research group around me, nothing academic. A lot of it was self-learning. So when TensorFlow came out, I thought: I read a paper from 2004 called MapReduce, and that launched the big data industry; everything around big data became a massive industry. The launch of TensorFlow, and everything that started happening in deep learning, felt like the beginning of this new wave of deep learning. So, yes, I started to take some tutorials, but then, how do you feel satisfied if you spend three or four months learning about something? Yes, you have more information, you've developed a bit of a skill, but I always need an artifact that solidifies it: from month one to month three, I made this thing. That's what the blog was. I struggle very hard to understand a concept, and once I understand it, I think: maybe there's an easier way I could have understood this, if it had been explained this way. So I try to chew on it, and once I understand it, hide the complexity, the things that intimidated me. I would learn something and be faced with a wall of formulas, or a lot of code, before getting the intuition, and that made me feel intimidated. Those feelings are what guide me toward these gentle approaches to these topics for the readers, where we hide the complexity, get to the intuition, get a visual sense of it. But it happens over a lot of iteration, and I'm happy to get into how the writing and visualization process develops over time with me.

**Interviewer:** I'd love to, because I mainly do the same thing with YouTube, where I actually started, when learning artificial intelligence, just to force myself to study more and learn more. But also, just like you, I wanted some kind of result or output from what I was learning, just to confirm that I actually learned things correctly, because if you can explain it, that should mean you understand it. So I perfectly understand you, and I'm on the same page. But another way is also to create something, code something, develop an application, or whatever. So what made you go down the path of trying to teach what you are learning, instead of learning in order to create something else?

**Jay:** So,
because as you develop a skill, you're not always building that skill to build one specific product. You're learning, observing, and seeing what is popular in the market, and as you develop your skills, maybe in month six you're still not able to train a massive model and deploy it to solve a problem. So I decoupled launching a product from learning and acquiring the skill. That's why writing is a really great middle ground, or artifact, of learning, because it's also a gift to people. It also comes from gratefulness: feeling grateful to the great explainers who explained things to me in the past. When I would really struggle in the beginning to understand what neural networks are, and I came across a blog post by Andrej Karpathy, or Andrew Trask, or Chris Olah, that explained something visually in a beautiful way, or in 11 lines of code where you can feel that you get it, I felt a lot of gratefulness that it brought me closer to a goal I had by simplifying. That's what we're trying to do: a lot of my learning journey is echoed in it. I'm happy to see that work of education and learning growing as a community, teaching each other but also collaborating on how we learn. This is something that I think benefits everybody, and I advise everybody to write about what they learn. A lot of people are stopped by "no, I'm just a newbie, I'm just learning this," but a lot of the time, just listing the resources you found useful is valuable on its own, let alone how much it will brand you as someone who wrote something. And the writing process helps me learn so much more deeply. Take The Illustrated Transformer blog post: there are maybe 20 visuals in there, and each visual you see is, let's say, version six or seven. I iterate over them so much, and learn in the process. I would read the paper and ask: do I understand it correctly? This is my understanding of it. Then I would read another paragraph and say: no, the way I said it conflicts with this; let me draw it again with this new understanding. And then, once I know I'm going to publish it, a part of my brain says: wait, other people are going to read this. How sure am I that this is how it actually works? Maybe I should go and read the code to verify my understanding. That depth of investigation, I would not have done if I had just read the paper and said, okay, I understand it. And then there's another thing, a good life hack that I hand out to people: if you're explaining somebody's paper, the paper has their emails, and you're helping them spread their ideas. So once you've worked on it and have it in good shape, write something, send it to them, and get some feedback from them. That is another really great source of feedback and connections that I've had through the blog, and it really helps remove some of the blind spots that you cannot see yourself. This is especially important and valuable for people who are, again, not in Silicon Valley. The majority of people are somewhere in the world without access to the people who work much closer to this technology. A lot of us, all we have is the internet, so how can we learn with online communities? That's another thing that LLMU helps with: democratizing that knowledge. Cohere also has a community-driven research arm called Cohere For AI that aims to broaden machine learning knowledge and research, accepting people from, I think, over 100 countries right now. To me that matters, because coming into this field I thought: I'm a professional working in a specific profession, I'm really excited about this and I want to learn it, but I'm nowhere near any of these very big companies that do it. How do I do it? So that's what I hope for: the opportunities people get from creating things, sharing what they learn, and learning together as a community. It's what we try to do in the Cohere community as well: let us know, let's learn together.

**Interviewer:** Yeah, it's definitely super valuable to share anything you want to share, and if you are wrong, well, in the worst case I believe nobody will see it, just because it's not high quality. But if people do see it, you will end up being corrected, and you will just learn even more, as long as you are not intentionally spreading misinformation. It's possible that you are not completely sure you fully understand what you are trying to explain, but you can still share it. It's a fear you have to get over at some point.

**Jay:** One hundred percent. That stops a lot of people. A lot of people think, "I'm not the world's best expert on this," and you don't have to be, and you can write that right in. To me, that gave a lot of license and a lot of comfort in writing, where I say: I'm learning this, let's learn it together, these are my notes, this is how I understand it. Once I do that, I learn more deeply, and if people correct it, I just update it, put in an update, or change the visual. That's a great way of learning together, so you're doing your audience a favor by learning together. And it helps your career. I advise a lot of people this way: this will help open doors for your career, and for possible jobs you can have in the future, by showing that you're passionate
about this one topic, or, later on, if you do it long enough, that you're an expert.

**Interviewer:** Absolutely. Visibility is super important, even while you are learning. Being on YouTube for three years, I've seen a lot of people asking me what to learn, where to start, etc., and especially when learning online, a lot of people get stuck doing one course, then another, and another, and they just keep trying to learn, because, just like myself and most PhD students, we almost always have impostor syndrome. We think we are not the expert, and that people should not trust or believe us. But we just need to get over that and at least try.

**Jay:** Yeah, but even if you go to the world's best experts on anything, the experts are usually experts on one very narrow thing, and they're just learning everything else. These are just our limitations as humans.

**Interviewer:** Right now, with all the experience you have teaching, with your amazing blog, as I said, on Transformers, attention, and everything related to large language models, and now with LLM University, where, as we discussed with Luis in the previous episode, you do your best to explain how they work and what you can do with them: I know this usually requires lots of visuals, and visuals are very helpful for teaching how complicated things work. But I wonder, after all this time working with this and trying to explain it, could you find a way to explain Transformers and attention relatively clearly with just the audio format? Would you be able to explain how they work to, for example, someone just getting into the field?

**Jay:** Okay, yes. We have really good content on LLMU for that, and one thing that makes LLMU special is that I'm collaborating on it with incredible people. Luis is one of the best ML explainers and educators in the world, and right now, if somebody wants to learn Transformers, I really don't refer them to The Illustrated Transformer anymore; I refer them to Luis's article on Transformers on LLMU, because The Illustrated Transformer was written in a context where I expected people to have read the previous article about attention and RNNs. If you're coming in right now, maybe you should skip learning about RNNs and LSTMs; you can come straight from neural networks into Transformers and attention. So part of what makes LLMU special for me is collaborating with Luis, but also with Meor Amer, who is one of the best visual explainers of things. Meor has a book, a visual introduction to deep learning, that has visual explanations of a lot of the concepts in deep learning. These are some of the best people at taking a concept and putting a visual picture on it, so that collaboration has been a dream come true for me. Now, to the question of how I've been explaining Transformers to different audiences over the last five years: there are different approaches, depending on who the audience is. One way is to note that right now a lot of people are used to generative models, to generative Transformers. So that's a good way in: how does a text generation model, one of these GPT models, answer a question if you ask one? The way it does it is by generating one word at a time. That's how it runs at inference time. How does it generate one word at a time? We give it the input. Let's say we ask, "What date is it today?" It breaks that down into, let's say, words (the real term for this is tokens, but let's say words) and feeds them into the model, and out of the other side of the model comes the next word that the model
expects. That word is then added back to the input, and the model generates the next word, and then the next. This is how these text generation models work. Now, what happens under the hood that makes them do that is another matter, but in the beginning I like to give people a sense of: when you're dealing with it at inference time, this is what it's doing. Then you can go into the actual components. How does it do that? Well, the input words are translated into numeric representations. Computers are computers: they compute. Transformer language models are, technically (I heard this from somebody called Sanchez), language calculators: everything has to become numbers, and then those numbers, through calculations and multiplications, become other language. That's what happens inside this box, which is the model, which was trained. We'll get to how training happens at the end; for now, just assume we have this magically trained model: you give it words, it predicts the next word, and it gives you something coherent based on the statistics of the text it was trained on. Mechanically, the input text goes through the various layers of the model. The model has components, or blocks: this was six layers in the original Transformer, but some of the large models now have 90 or 100 layers. Each layer processes the text a little, outputs numerical representations that are a little more processed, and those go on to the next layer, and the next, and by the end there is enough processing that the model is confident the next word is this one. So that's another level of breaking it down. From here we can take it different ways: we can talk about how it was trained, or we can break down these blocks and layers and talk about their various components. So I'll have you choose your destiny and steer me: which way would you like us to go next?

**Interviewer:** I think I'd rather go for how the blocks are made: what the blocks are made of and how they work.

**Jay:** Amazing. I'll give an example. There are two major capabilities that correspond to the two major components of what's called a Transformer block. Have you seen the film The Shawshank Redemption?

**Interviewer:** I haven't.

**Jay:** It's a very popular film, but what matters here is that these two words are commonly used together: "Shawshank" and "Redemption." If you give a model "Shawshank," then, based on the data it was trained on, there aren't many words that appear after "Shawshank" in the training dataset, so the highest-probability next word would be "Redemption." It's based on what the model has seen in the past. That is the job of one of the two components, what's called a feed-forward neural network, one of the two major components of the Transformer block, which works on, let's say, these statistics. If you only had that component, the model could make this kind of prediction: given the input "Shawshank," it outputs "Redemption." But language is a little more complex, and that's not the only mechanism you need to make software generate language. You need another mechanism, which is called attention. To think about attention, say we give the model this sentence and try to have it complete it: "The chicken did not cross the road because it…" Does "it" refer to the road or to the chicken? You can't just rely on the words that usually, statistically, appear after the word "it," because that would produce a meaningless sentence in a lot of cases. You need to build an understanding: are we talking about the road, or are we talking about the chicken? That's the goal, the purpose, of the second component, the attention mechanism. How does it do that? It's built in a specific way that we don't need to dissect here, but that's its goal, and it learns it from a lot of the data it was trained on, which we can go into next. So these are the two major components: a Transformer model, including a GPT model (the T in GPT is Transformer), is multiple Transformer blocks, and each Transformer block is a self-attention layer and then a feed-forward neural network, each with its own role. Once you stack them, for a model that is large enough, trained on a large enough dataset, you start to get these models that generate code, that can summarize, that can do copywriting, and you can build these new industries of AI writing assistants on top of them.

**Interviewer:** That perfectly makes sense; it's a really good explanation. I struggled for a while, even; I think it's three years ago now that GPT-3 came out. I don't know why, but I think it's always the case for new technologies: it was really hard to understand it well enough to explain it properly. You definitely mastered it, and I love how you separated the different topics and didn't dive into the details too much. I often get stuck in the details, because I like how attention calculates the attention for each word, etc., and I really noticed the details that you deliberately didn't mention. I think it's quite important not to mention them, just as you've done. I still need to learn how best to explain things, but it's really nice to see you explain something I already know and still teach me new things. It's really cool.

**Jay:** It helps to do a lot of iteration, to do it over and over again, to explain
it to people, and then notice that I said something and their eyes started defocusing a little, and go back and think: maybe this was a bit too much detail, let me delay it. You can still mention the details, but I love to layer it: you get one part of the concept, then you go a little deeper into another part, but you get the full concept first at a high level, and then a little more resolution on each part. That's a philosophy I've seen work over the years.

**Interviewer:** I think for a regular presentation it's also a good format to follow, even just to say it outright: "that's the broad overview, I will dive into the details later, but for now just focus on this." I think just saying that makes it more engaging: you may be a bit lost, but you know the explanation is going to come, so you don't feel lost. It's a better way of explaining, for sure. This is relevant even for listeners who are not teachers: if you're working and need to present something, or share knowledge in any kind of presentation, it's really worth learning or improving how you share it. Now, everyone talks about ChatGPT, so I would love it if you could go over the different steps, from the self-supervised part, to the fine-tuning, to the reinforcement learning from human feedback. How would you explain all those quite complicated steps in simple words?

**Jay:** I do intend at some point to write something about human preference training; there are different methods, with and without reinforcement learning. So, training: one of the things that makes these models work is that we can train on a lot of unlabeled data. We can just get free text from the internet, from Wikipedia for example, or books, or any dataset, and use it to create training examples in this self-supervised way. Say we take one page from Wikipedia, maybe the page about the film The Matrix, or any article, and say: this page has 10,000 words; let's create a few training examples. Take the first three words and present them to the model, and have it try to predict the fourth word. That's one training example. Then we create another example where we give it the first four words and have it try to predict the fifth. You can see that we can just slide this window along and create millions or billions of training examples. That's what happens in the beginning, and it's why they're called language models: this is a task in NLP called language modeling. Now, that turned out to be one of the most magical things, one of the biggest returns on investment the technology ecosystem has ever given us, in ways that really surprised even the people working closely with this technology. If you do this with large enough models on enough data, the model becomes able to retain world information: you can ask it about people, and it will tell you who acted in The Matrix and when it came out; that information is retained. It starts to generate very coherent text that sounds correct and is grammatically correct. How does it do that without us writing all the grammar rules into it? And if you train it on a large enough multilingual dataset, it starts being able to do that in multiple languages. So language modeling is one of the magical things really driving this massive explosion in the capability of software and AI. It's the source where all of this starts, and it's the first step in
training these large language models is the step that takes the most compute and the most data, so it can take months and months. In machine learning you start with a model with any number of parameters, but they're random in the beginning, so the predictions the model makes are junk. It learns from each training step: we give it the first three words and have it predict the fourth. Its prediction is going to be wrong, and we say, no, you said this, but this is the correct answer; let's update you, so the next time you see this you have a little bit of a better chance of getting it right. That step happens millions or billions of times. This is the training, the learning in machine learning: making a prediction, updating the model based on how wrong that prediction was, and doing it over and over again. So that is the first and most expensive step in creating a baseline model.

Once that came out and people started using it, you could make it do useful things, but you had to do a lot of prompt engineering, because if you ask the model a question, say, "How do apples taste?", the model, based on just what it's seen in the data, can ask another question: "How do oranges taste? How do strawberries taste?" Those are all reasonable continuations: you give it a question, it gives you more questions, maybe changing the fruit type. But what people actually wanted from their interactions was: if I ask you a question, give me an answer; if I give you a command and tell you to write an article about apples, I want you to write the article, not give me more commands. This is what's called preference training. To do it, you collect training examples of a question and its answer, or a command, where you say, okay, write me an article about X, and then you have the article about X, and you train the model on this data set. That's how you get the behavior people started expecting from the model: you follow my command. That's what Cohere's Command model is tuned to do, and that's what InstructGPT started doing and how it improved on GPT-3.

Then you can get a bit more behavior with another training step, which sometimes includes reinforcement learning: not just doing language modeling on this new data set, but also giving the model good examples and bad examples and saying, okay, move closer to the good examples and further from the bad examples, as rated by, say, a reward model. But I think a lot of people don't need to get into that complexity. As long as you understand the language-modeling objective and the idea of preference training, you have most of the understanding you need. Then just focus on how the model can be relevant and useful for the product you're trying to build: what kinds of prompts, what kinds of pipelines or chains are useful for that. For the vast majority of people, that's much better than working through the detailed reinforcement-learning equations.

Regarding the different products you can build with those models, I know you talk a lot about that in the LLMU, and one thing I believe is super important and promising, other than, for example, fine-tuning the command models, is to use embeddings and build applications on them, like memory- and retrieval-related applications, or any other kind of semantic search, classification, etc. I have two questions about that. The first one is: what are embeddings, and what can you do with them?
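As an aside, the predict-then-update loop described a moment ago can be sketched with a toy stand-in. A real model updates billions of neural-network weights by gradient descent rather than bigram counts, so everything below is illustrative only, but the shape of the loop, make a prediction, compare it to the correct next word, update, is the same:

```python
from collections import defaultdict

class ToyLanguageModel:
    """A toy stand-in for an LLM: it 'predicts' the next word from counts."""

    def __init__(self):
        # counts[context][next_word] -> how often next_word followed context
        self.counts = defaultdict(lambda: defaultdict(int))

    def predict(self, context):
        followers = self.counts[context]
        if not followers:
            return "<unknown>"  # untrained model: predictions are junk
        return max(followers, key=followers.get)

    def update(self, context, correct_next):
        # "No, this is the correct answer -- let's update you."
        self.counts[context][correct_next] += 1

def train(model, text):
    words = text.lower().split()
    for context, nxt in zip(words, words[1:]):
        model.predict(context)      # make a (possibly wrong) prediction
        model.update(context, nxt)  # learn from the correct answer

model = ToyLanguageModel()
train(model, "apples taste sweet and berries taste sweet and lemons taste sour")
print(model.predict("taste"))  # -> "sweet"
```

Before training, `predict` returns junk (here, `<unknown>`); after enough passes over data, the most frequent continuation wins, which is the counting analogue of the model getting "a better chance of doing it right" with each update.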
Also, what to you is more promising: trying to make the perfect ChatGPT-style model, with lots of fine-tuning and the best commands possible and human feedback and everything to make it perfect, or using a model just for embeddings and building very specific applications like those? Or are they just very different use cases, and both relevant?

Yeah, there are going to be people who use both. There are people who will just use different prompts, send them to a large model, and get the results back; you see a lot of this on LinkedIn, "these are the top 10 prompts to use." There's a class of people who will find that useful. But there's another class, which I advocate for, that thinks of these tools as components you can build more and more advanced systems with, where you're not just consuming one service or one model; you're actually a builder yourself. When I advocate for that, the idea of embeddings is one of the most powerful and most central ideas. Just like "API" is no longer only a technical term but a business term, CEOs have had to know what an API is for the last 10 or 15 years, embeddings, I believe, will become one of those things, because it's one of the central components of how you deal with large language models and build bigger and bigger systems.

Embeddings, in short, are numerical representations of text. They can be of words: methods like word2vec give each word a series of numbers that represents it and captures its meaning. From words we can also go to text embeddings, where a list of numbers represents an entire text: a sentence, an email, a book, so to speak. That concept is very important if you elect to be a builder with LLMs. Once you start to get a sense of what embeddings are, one of the best things I advise people to build is something involving semantic search. Take a data set, say the Wikipedia page for The Matrix, break it down into sentences, embed each sentence, and then you can create a simple search engine on that data set. The search engine works like this: you give it a query, say, "When was The Matrix released?" That text is also embedded: you send it to an embedding model, something like Cohere's embed endpoint, and you get the numbers back. Then you can do a simple nearest-neighbor search, which is also very simple, like two lines of code, and that will give you the top three or top five sentences closest to that query. The beautiful thing here is that, regardless of the words you use, the LLM captures the meaning; even if you don't use the same words, the model captures the intent. That's why, when these models rolled out, especially the BERT model in 2019, Google rolled it into Google Search about six months later and called it one of the biggest leaps forward in the history of search, just the addition of that one model.

So semantic search has these two capabilities you can build with. What we just described is called dense retrieval: you embed your archive, you embed your query, and you get the nearest neighbors. That's one major concept I advise people to build with. The other is called rerank. Reranking is using an LLM to change the order of search results produced by a previous step: you throw your query at your existing search engine, you get the top 10 results, you throw those at the re-ranker, and it changes the order, which dramatically improves results if you have an existing search system.
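The dense-retrieval recipe just described, embed the archive, embed the query, take the nearest neighbors, can be sketched in a few lines. The bag-of-words `embed` below is only a toy stand-in for a real embedding model (for instance an embed API endpoint, which would return dense vectors that capture meaning rather than shared words); the nearest-neighbor step, though, really is the couple of lines mentioned above:

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy "embedding": word counts. A real model returns a dense vector here.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b.get(w, 0) for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def nearest_neighbors(query, archive, top_k=3):
    # Embed the query, then rank the archive sentences by similarity.
    q = embed(query)
    return sorted(archive, key=lambda s: cosine(q, embed(s)), reverse=True)[:top_k]

archive = [
    "The Matrix is a 1999 science fiction film.",
    "It was written and directed by the Wachowskis.",
    "The film was released in the United States on March 31, 1999.",
]
print(nearest_neighbors("When was The Matrix released?", archive, top_k=1))
```

In a real system you would embed the archive once, store the vectors, and reuse them for every query, which is also why embeddings work as a cache of knowledge.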
These two components, dense retrieval and rerank, each have, let's say, their own endpoint and super-high-quality models on the Cohere side, and they are maybe the two best ways to start working with large language models, because they are the future of generation as well. Retrieval-augmented generation is absolutely one of the most exciting areas: it lets you rely on information that you serve to the model when you need it. You can update that data whenever you need to, you can give different users access to different data sets, and you're not reliant on data stored in the model. If you want to update the knowledge stored in the model, what do you do, train it for another nine months? Relying on stored knowledge also increases the model's hallucination. So there's a lot of excitement in this area that brings together semantic search and generation, and we think it's highly worthwhile to pay attention to it.

Yeah, retrieval is definitely, as you mentioned, a great way to, not avoid, but limit the hallucination problem. It doesn't work all the time, but you can try to force the model to answer only with what it retrieved, and to give references for what it responds. When it searches its memory and finds the nearest neighbors, you can ask it to answer only with what it found and to cite the source of what it found. That's really powerful compared to ChatGPT, which just gives you text, and hopefully it's true, and you don't even know where it comes from. So it's definitely safer, and also, as you said, easier to build: you don't need to retrain the whole model, and you can build multiple applications super easily. But I'm not that familiar with the rerank system. Could you give a bit more detail on how it works, how it actually reorders the answers and improves the results?

Sure. Re-rankers are these models where, let's say you are Google and you have your existing search ranker from before Transformers: you give it a query, it gives you 100 results. The easiest way to power your search with large language models is to say: for these 100 results, take the query and each one of the results, present them to the model, and have the model evaluate how relevant this result is to this query. The re-ranker is basically a classifier over two pieces of text; it's what's called a cross-encoder. You give it training examples of a query and its answer, which should get a score of one, because that's a true pair, and then a query and a document that is not relevant to it, where the training label is zero. That's how you train it, and once it's trained, you just plug it into an existing search system. The previous step can use embeddings or not, that's fine; the re-ranker gives a relevance score for each of the 100 results, and then you sort by that relevance. That becomes one signal for your search. You can sort by the most relevant directly, or, if you're rolling out an actual search system, you can combine it with other signals: say you want more recent results, so you add a signal for recent documents, or, if you're building search for Google Maps, you want things that are closer to a given point, so that's another search signal, or user preference. So that's how re-rankers work: sort by relevance directly, or use relevance as one additional signal in a more complex ranking system.

Okay, much clearer now. So that's basically the easiest way to use large language models when you already have a search system, or when you have a data set.
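To make the reranking step concrete before moving on, here is a sketch of the plumbing. The word-overlap `relevance` function is a hypothetical stand-in for a trained cross-encoder, which, as described above, learns from (query, document) pairs labeled one or zero and scores meaning rather than shared words; the surrounding logic, score every first-stage candidate against the query, then sort best-first, is the same either way:

```python
import re

def tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def relevance(query, doc):
    # Toy relevance score in [0, 1]: fraction of query words found in the doc.
    # A real cross-encoder reads the (query, doc) pair jointly and outputs a
    # learned relevance probability.
    q = tokens(query)
    return len(q & tokens(doc)) / len(q) if q else 0.0

def rerank(query, results):
    # Score each first-stage result against the query, then sort best-first.
    return sorted(results, key=lambda doc: relevance(query, doc), reverse=True)

first_stage = [
    "Keanu Reeves also starred in the John Wick films.",
    "The Matrix premiered in March 1999.",
    "Lists of 1999 films are available online.",
]
print(rerank("When did The Matrix premiere?", first_stage))
```

As mentioned, the resulting relevance score can also be blended with other signals, recency, distance, preference, before the final sort.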
someone who has an issue or a problem in their regular work: how do they know that their problem can be helped with LLMs? Are there any tricks or tips to know, oh, now I should use an LLM here, or an embedding, or a piece of code?

Yeah, that's a great point, and the common wisdom is to use the best tool for the job. LLMs are great for some use cases; they're not good for everything. There are a lot of use cases where somebody wants to use an LLM and I would advise them: no, you should use a regular expression for this, or spaCy for this use case, or just Python string and text matching. LLMs are one additional tool that adds a level of capability to your system and should augment the existing things that work alongside it. That understanding is important, because some people will be driven by the hype and want to inject AI somehow; they told their investors they would roll out AI in their next product, so let's find any way to do it. No, it really should come from user pain and the problem you're trying to solve. And you can classify two major categories there. One is that maybe you're improving a specific text-processing problem right now: you choose the metric that will improve your product or solve your pain, and then compare LLMs with existing strong baselines, because there are a lot of things that can be done with tools that are not LLMs. Once you see that the LLM is providing that value for you, that's when you move forward. Providers like Cohere make it easy in the sense that you don't need to worry about deploying a model, or about models going out of memory because they need to fit on tens of GPUs. You just worry about: okay, you want to do rerank? Send me the query, send me the 10 texts, and I will send you back the ordered list. And I will make it better the next time you send a request, because I'm updating the model, training it on newer and better data every month or with every version. So this is a new type of provider of this technology. But yeah, definitely focus on the problem. Improving existing text processing is one category: search systems, classification. Then there's the new capability of text generation: AI writing systems were not possible three years ago, so these are new categories, and you might want to innovate. I will create, I don't know, the next interactive AI game, or the next media format; I want to create a world like GTA with all of its radio stations, but all of it generated by computers. Creating something new, experimenting: that's the second category, and it allows a lot of new applications to be born.

Right, because not so long ago, when AI first took off, you needed to train your own model and, as you mentioned, host it on the cloud or somewhere, and there was a lot to manage. Now, thanks to OpenAI, Cohere, and other companies, you can basically have someone else do that for you. But there are still some challenges in building these large-language-model-based apps. For example, suppose I have a specific data set in my company, a very private company, and the data set cannot go outside the company's intranet. What can you do then, since Cohere and OpenAI, for example, are all outside the intranet? What can you do if you want to build some kind of search-based chatbot?

Yeah, that's a great question and a very common concern we come across. It's one of the biggest areas for companies in the industry, but also specifically enterprises, like
large companies and companies working in regulated spaces, and Cohere actually caters to that. There is a solution of bringing the models to your own virtual private cloud: there's a rollout with AWS SageMaker where the model can be deployed in your own cloud, so the data does not go to Cohere's infrastructure; it remains in your own data center, but it runs through the SageMaker endpoint. That's one of the use cases where we see a lot of demand, and Cohere's focus on enterprise makes it possible to focus on use cases like this: not specifically consumer-focused, but the big business problems of building the next generation of applications. I'm glad you highlighted it, because this is commonly asked for, and we'd love to see more people building with it.

It's great to know that it's possible. And for the people who maybe have different problems, who aren't large companies: say someone is learning and wants to build an app. What are the main challenges in building these Cohere- or OpenAI-based apps, where you use the very powerful models that already exist but want to fine-tune or adapt them to your specific application, either through a data set or through specific commands? What are the typical challenges that people who want to create cool things with these need to tackle?

So the challenges don't all have to be technical. Everything from the past remains true: you still need to find product-market fit, you need validation from your users, you need to really solve a problem and not build something that's merely nice to have. With generative models specifically, identifying reliable use cases is one thing a lot of people need some hand-holding on. They come across an amazing demo on Twitter, but they don't realize that a lot of demos are cherry-picked: the author had to produce 20 generations to get that one. If you're building a product, it cannot work just three out of 10 times; it needs to work nine out of ten. So how do you get it to that level? That gap, between a proof of concept, "this prompt can work, I'll take a screenshot and put it on Twitter," and a reliable system behavior that you know you can put in front of your users and that will always work: bridging that gap is one of the challenging things a lot of people have to contend with. There are solutions and playbooks, and we write and educate a lot about them. They include things like using search and using embeddings. Fine-tuning is another: the big models allow you to prototype, and once you have the model doing the behaviors you want using one example in the prompt, or five examples, you can make it cheaper and faster by collecting that data set and fine-tuning a smaller model that can do that same task as well as the larger model. That also saves on context size, because you're not sending the same five few-shot examples with every prompt. Another part, which we also talked about with semantic search, is getting the relevant bits and injecting them into the prompt. Some people may think that context length will solve all the problems: if you have a very large context window, you would send, I don't know, the documentation of your software to the language model with every question that you ask about that
documentation. You can clearly see that is wasteful: to answer a thousand questions, the model has to process the same documentation thousands and thousands of times. Embeddings really are a way of caching that knowledge and retrieving the important bits. So yeah, these are a couple of things; experimenting, and thinking about reliable behavior, is one of the learning curves a lot of people have to go through.

What are the skills and materials needed to do that? First, can one person do it alone? If I want to create an app, do I need a whole team? Do I need a server? Do I need to go through a course beforehand? What is the required skill set? Is it impossible for one person, or can someone listening right now who has an idea just start and learn in the process? How accessible is it?

In software in general, you need a user interface, right? If you're targeting users, how will they interact with it? Or are you creating, say, an API endpoint that other people can just connect to? So there are a bunch of software hurdles that are not necessarily language modeling or prompt engineering: the piping of that information, and how your users connect with it. If you know Python and JavaScript, one person can go very, very far by investing in those two things. If you only know Python and machine learning or data science, you can create a proof of concept; you can use something like Streamlit and create an application with a user interface that you can demo, maybe to investors, to help you build the next level of it. And more and more you see companies like Vercel coming along and making that front-end-to-AI pathway a little bit easier. So language models will continue to make it easier for generalists to do many things very well. We're still in the beginning of that, but it's clear that productive individuals will become massively more productive, aided by these technologies and what they can do. So yes, smaller groups of people will be able to do a lot more. But for now, there are these skill sets: how are you going to build the UI? How are you then going to put it in front of users? You can roll it out on the App Store, or through some marketplace. Can you do that individually? Do you know that customer segment? Do you really know the pain you can solve for them? A lot of people run one- or two-person companies: charge credit cards, use these various frameworks, and put some good-looking UI on top. But then the question is whether you have enough of a competitive advantage that somebody else doesn't copy you once your service is popular. That's another challenge: are you building what's called enough of a moat, a competitive advantage, around your product, so that others can't just steal your idea and your UI?

Yeah, indeed. You talked about how generalists can now do more thanks to AI, and that will only increase, which is really cool, because I believe I'm somewhat of a generalist. I really like to know about everything, and even though I'm doing a PhD, I'm not completely sure about being super-specialized in one thing and forgetting the others. It's a very recurrent topic on the podcast. It's funny how, years ago, a lot of people said that AI would increase the discrepancy between rich and poor and make things even more unfair than they were, and now, I'm not sure I have all the data and information, but I believe we see almost the opposite, where, in my case at least, and for lots of people I know, AI actually allows people to do things they couldn't do before, which is quite cool. It's pretty
much the opposite: it democratizes lots of stuff, like building applications. One of my friends is currently doing a kind of challenge where she's learning to use ChatGPT and posting daily for 30 days about what she does with it. She's in human resources and doesn't know any programming, but she still coded a to-do-list application thanks to ChatGPT, without any notion of Python, JavaScript, or coding at all. I don't know, that's incredible. It's so cool.

One hundred percent. And I predict that we'll start to see not one but many five-person companies reach a billion dollars in valuation pretty soon, aided and augmented by AI. There are definitely a lot of opportunities created, but also a lot of challenges that we need to be cautious about: there are opportunities for misuse, as well as the need for people to keep learning, keep developing their skill sets, and use these new technologies in their own workflows to make what they do better and better. You can't just rely on what you learned in college; the world keeps changing very quickly. The quicker you are at learning, adapting, and incorporating these tools into what you do, the more of the opportunity you will catch, and the better you'll weather the challenges.

Speaking of challenges, one last challenge I often struggle with when using ChatGPT or other models is hallucination. Is there any way, other than, for example, memory retrieval, to reduce hallucination, or, in general, to make these applications safer and, as OpenAI says, more aligned with what you actually want? There are two sides to this question: obviously you can address it during training, as OpenAI does, but what if you are using an OpenAI or Cohere product and you want to make it safer on your end? Is there anything you can do to help mitigate the model's hallucination even if you do not control the training process?

We already mentioned one of the big ones, which is actually injecting the correct information, so you're not relying on the model's parametric knowledge. That's one. Then there are methods that you as an engineer can build systems around. A lot of them were outlined in Google's Minerva model paper, which solves a lot of very complex mathematics. That's where we heard about things like chain-of-thought: you ask the model a complex question, and it shouldn't answer right away; it should output the steps of how it arrives at the answer. Then there's another method called majority voting, where the model outputs not just one result but maybe ten, and you look at which answers occurred more than once and use those as votes; that works specifically when there's one specific final output you expect. There's a paper on that method, majority voting. Close to this is the recent idea of tree-of-thought, which is like chain-of-thought, but with multiple chains of thought explored. So that's one way of evaluating: if the model generates the answer five or ten times and says the same thing over and over again, there's probably a good chance that it knows this; but if there's variance across those five or ten generations, that's probably an indication the model is just being creative. And then there are things like temperature: setting the right temperature and sampling arguments helps to a certain degree.
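The majority-voting idea can be sketched as follows. `toy_sampler` is a hypothetical stand-in for calling a generative model several times at a non-zero temperature (ideally with chain-of-thought, so each sample reasons its way to a final answer); agreement across samples then serves as the confidence signal described above:

```python
import random
from collections import Counter

def toy_sampler(rng):
    # Stand-in for one stochastic model call: usually answers "1999",
    # but occasionally drifts, as a hallucinating model might.
    return rng.choices(["1999", "2001", "1997"], weights=[8, 1, 1])[0]

def majority_vote(samples):
    # Count the sampled answers and pick the most common one as the vote.
    counts = Counter(samples)
    answer, votes = counts.most_common(1)[0]
    agreement = votes / len(samples)  # 1.0 = unanimous; low = high variance
    return answer, agreement

rng = random.Random(0)  # fixed seed so the sketch is reproducible
samples = [toy_sampler(rng) for _ in range(10)]
answer, agreement = majority_vote(samples)
print(answer, round(agreement, 2))
```

A simple policy is to accept the answer only when agreement clears a threshold, say 0.5, trading extra generation cost for fewer confidently wrong answers.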
Yeah, that's an easy way to at least mitigate very random answers, though it also increases the cost, since you have to generate multiple times; still, it's a very simple thing to do. One last question that I have, very specifically about large language models: how do models like ChatGPT work with many languages? How are they built and trained on many languages? There's definitely a difference between GPT models, which work with almost every language, and, for example, the model Facebook released that was trained specifically on French. So what's the difference? How does ChatGPT work with any language you can type in?

Multilingual models are built by incorporating other languages in the training data set, and by optimizing, or let's say initializing, the tokenizer, which is a step that comes before training the model, where you choose how to break down words. But it's really the same training process. It is language modeling, predict the next word, except the data set contains a lot of other languages, and we also use those languages to evaluate the models: how coherent is the model in this language and that language? You use that in the evaluations because, when you're serving these models, a model like Cohere's Command, the one we put out, is not the only model we've trained. You have to train tens or hundreds of models, with a lot of different experiments, to really find the best-performing model, and do a lot of complex evaluations. So if multilingual is one of your focus areas, and for us it very much is, a lot of it goes into incorporating it in the training data but also in the evaluations. We also have a lot of focus on multilingual on the embedding side: we have an embedding model that supports over 100 languages, which is completely geared toward and focused on search in multilingual settings. You have to pay extra attention when you're building the model to incorporate languages, because it was very easy in the beginning to just focus on English and not consider that the vast majority of people also speak other languages and need them in their day-to-day business usage.

So it's mainly just trained with even more data. And I believe, from research, maybe you can confirm, that just like for humans, training on different languages actually improves the results in English as well?

Yes, there are things that are strange like that, kind of like how training on code also enhances generating text with common sense, or reasoning capabilities. The more high-quality data you throw in there, the more it always seems to improve the results.

It's just like humans: I don't remember exactly what it does to your brain, but learning a musical instrument actually helps you understand other things better. It doesn't exactly make you more intelligent, but it definitely helps; it's not irrelevant to learn art or a musical instrument. And with different languages, you're basically a different person when you speak another language, which is also super interesting. I wonder whether that's the case for large language models, whether they act differently in different languages.

Yeah, the distributions would be different in the different languages. And I love how this also extends to multimodal models: once you throw audio in there, once you throw images in there, how that could then apply to generation as well.

Really exciting. How do you see AI evolving over the next, I don't know, it's the classic question, the next five years? Where do you see us going? Are we mostly coming back to specific applications
with retrieval systems, or building better general models with less hallucination, or something else? Where are we going?

Generation, and the embedding approaches, I feel, are here to stay. Things like semantic search and retrieval are things you can't really do with generative models alone. There will be a lot of development in the models themselves: in the quality of the data presented to them, and in the quantity and the types of data. We're now at internet-scale text data, so where do you go next? We talked about multimodality: that's one area where the models will be improved, by being able to look at images, or even generate images, or take in audio and other modalities. Beyond this, there's the idea of embodiment, where models can interact with environments and learn from those interactions; that will be another source of information and feedback to improve behavior. And then there's the idea of social interaction: how models can socially interact with large groups of people, not just one person who gives a prompt and gets a result back. These are three of the five "world scopes" that one paper I've discussed on my YouTube channel lays out for where we'll get more data now that we've trained on internet data. So that's on the modeling front: how to make the models better. There will definitely be new architectures, improvements, hardware; all of that will continue to develop, even though right now there's a bit of convergence, and there haven't been any major steps on the modeling side for a while. But there's still a lot to be done in engineering: rolling these models out, building systems around the capabilities we currently have. There's so much that can be done there; it will keep many people busy for the next two or three years. And there's also inventing other ways of using the media formats that are now possible, where you can generate images or text or full stories or full podcasts. So the world will be a little different, and a lot of people are going to be very creative in what they deploy; a lot of it is going to come from the engineering side, not only from modeling.

On your end, is there one thing that AI cannot do right now that you would love it to be able to do? Is there one thing that comes to mind?

Yes. It's a little random, but I'm really obsessed with the nature of learning and of intelligence in software, and having software solve problems in intelligent ways makes me very intrigued by other natural intelligences beyond humans. Animal intelligence: the dolphins, the octopus, the ant colony, the apes. There are efforts like Project CETI, C-E-T-I, trying to throw all the NLP technology we have at decoding the language and vocalizations of whales, to see whether we can start to understand, and maybe communicate with, these other forms of intelligent life around us that we don't yet have ways of communicating with. I'm absolutely passionate about this language technology being able to allow us to connect better with the other intelligent forms of life around us.

Yeah, it's so cool. I've always been drawn to how we understand things, and to how a cat sees the world, and all the animals and living beings. It's like neuroscience: it's a different field, well, not completely different, and now lots of people come from pure software and become interested in neuroscience and these topics just thanks to language models and how they make you think about how things understand. It's really cool, and I'm excited to see where AI, well, our field, can
help the human race understand other things. It's really cool.

I have one last question for you, just because it's a topic I'm particularly interested in. On your end, since you are a blogger and now even a YouTuber: are you using any AI help when creating educational content? Not necessarily LLMs — maybe AI editing, or generation, or asking questions, brainstorming. Are you using any AI-powered tools to make your writing process, or just your creative process, better?

Not on, let's say, a daily basis, but yes — sometimes for outlines or idea generation, which is useful, or for some artwork or thumbnails; Midjourney has been useful for some of these. But like everybody, I'm just learning how to adapt them into my workflows.

I didn't see myself using it until very recently, and now I've seen a particular use case for me. It's mainly because I'm French and not a native English speaker, but it's really helpful for improving your formulation and syntax — that's one thing, just because it helps me improve. Another thing is: I'm still learning lots of new stuff, and I try to explain it while learning, and when I see a word I don't understand, or a topic that seems a bit blurry — even if I have the paper and I think I understand it — asking GPT or any other model is quite useful. It reformulates, and it can be very useful for quickly getting a high-level understanding of specific topics. That's something I've been using recently, and it took me a while to get into it, which is weird, because we're actually explaining how these models work, yet we don't use them nearly as much. But now I see a better use case — the more time we spend with them, the better we use them, obviously. It's really promising, it's really cool. But yeah, you definitely
have to double-check the outputs and ensure it's not hallucinating or anything else. It still requires human input, but it's really useful.

So — it's not really a question, but the last thing I wanted to mention: is there anything you would like to share with the audience? Do you have any projects other than LLM University — which, of course, anyone can go through right now at Cohere, for free, to learn a lot about Transformers and everything we discussed in this interview, and which is a really good resource that I definitely recommend? Is there anything else on your end that you are excited to share, or to release soon, or to work on?

Yeah. Aside from LLM University, we have the Cohere Discord, where we answer questions. So if you have questions as you go through LLM University, join us, let us know what you want to learn about — we're happy to help you with your learning. And when you build something, we also welcome you to share it and say what problems you faced and how you solved them. It's a community to learn together, and we welcome everybody on the Cohere Discord.

That's awesome. Is there anything coming up on your YouTube channel or the blog?

I've been doing a bunch of shorts — I've been digging deeper into these tools that build on top of LLMs, like LangChain and LlamaIndex. So that's a little bit of my focus area now. But in terms of topics, if I can carve out some time to talk about human feedback and RLHF, that's high on my list.

I'd love to see that. Perfect. Well, thank you very much for all the time you gave us and for the amazing insights — it was a really cool discussion to have with you. I've only known you for two years — unfortunately I didn't know your blog before that — but it's just an amazing resource, and likewise for LLMU. I'm really thankful to you and your team for building that, but also to you personally for the YouTube channel and the blog. It's just really cool that people like you exist. So thank you, and thank you for joining the podcast.

Thank you so much, that's so kind of you. I'm just a student like any other, and we're just learning together. Thank you so much for having me, and I'm looking forward to interacting and speaking together in the future.

This is an interview with Jay Alammar. Jay has a technical blog as well as a YouTube channel where he explains a lot of technical AI-related stuff — like his blog posts on attention and Transformers that you've definitely heard of — and he has amazing learning resources. Now he is focused on building LLM University with Cohere, and we talk about that in this episode. But not only that — we also dive into the different challenges in building large-language-model-based apps, and how to build them. I think this episode is perfect for anyone interested in large language models, but mainly for people who want to create a cool app using them. I hope you enjoy this episode.

So, I'm Jay. I worked as a software engineer. I've been fascinated with machine learning for so long, but I really started working in it just about eight years ago, when I started a blog to document how I was learning about machine learning. It seemed to be this power that gives software capabilities that are quite mind-blowing — if you know what the limitations of software are, and you see these machine learning demos doing things that really stretch the imagination in terms of what is possible to do with software. So that's when I started my blog, which a bunch of people have seen and used, because I started with introductions to machine learning — in general, how to think about backpropagation and neural networks — and then I moved into more advanced topics, covering attention and how language processing is done with language
generation. Then it really exploded when I started talking about Transformer models and BERT models and GPT models. From the blog, I got to work with Udacity, creating some of their Nanodegrees for educating people on how to use these language models and how to train them. Those were two things that really launched how I work with machine learning and AI. The blog, I've seen, has had upwards of 6 million page views so far, a lot of it around Transformers and how they work, and I think maybe tens of thousands of people have gone through the various Udacity programs on machine learning, deep learning, NLP, computer vision. Most recently, I get to work very closely with these models at Cohere, as Director and Engineering Fellow, where I continue to learn about the capabilities of these models and explain them to people in terms of how to deploy them in real-world applications — how can you build something with them that solves a real problem you have right now? That includes education, it includes talks, it includes creating schematics, but also crystallizing a lot of lessons learned in the industry about how to deploy these models and how to build things around them.

Cohere — a lot of people have heard about Transformers — Cohere was built by three co-founders, some of whom were co-authors of the original Transformers paper. They've been building these Transformers in the cloud — hosted, managed large language models — for about two, maybe two and a half years. I've been with the company for two years, and I've seen this deployment and rollout. Since then, the company has trained and deployed two families of massive language models that do text generation but also text understanding, and we can go deeper into these two capabilities of large language models.

Absolutely. You mentioned that you started your blog eight years ago, and of course, like most people, I discovered it through the amazing posts you wrote on Transformers and attention — they really helped me and lots of people. But since you started so long ago: what made you start, and what made you get into the field? Eight years ago, the AI hype wasn't anywhere near what it is right now. How did you discover it, and what made you create this blog about it?

Amazing, yes. I had been working as a software engineer for a long time, and sometimes I would come across demos that, to me, felt magical — and a lot of them came from machine learning. It's on YouTube — if you Google "Word Lens" — this was a demo that came out in 2010, on an iPhone 4, where you can point the phone at, let's say, a sentence written in Spanish, and it will transform that into, let's say, the English translation and superimpose it. It's 13 years later now, and that still feels like magic — especially if you know about software and how complex dealing with language and images is. To be able to do that without a server, in software running on the device — that felt like an alien artifact to me. Seeing something like that, I'd always had this to-do list in my head: get into machine learning, find the first opportunity to get into machine learning and understand it, because it was clearly going to be transformative to software. The moment that really gave me the jump into it was around 2015, when TensorFlow was open-sourced. It felt like, okay, this is the time — now there's open-source code. Because with a lot of these things, you had to be very close to a research lab, or work deep inside a company, and at the time I was an outsider: I wasn't in Silicon Valley, I wasn't in a big tech hub. I was just a software person with a laptop and access to the internet, trying to learn on my own, without a group or a company or a research group around me that was, let's say,
academic. So a lot of it was self-learning. When TensorFlow came out, I thought: I read a paper from 2004 called MapReduce, and that launched the big-data industry — everything around big data became a massive industry. The launch of TensorFlow, and everything that started happening in deep learning, felt like the beginning of this new wave of deep learning. So I started to take some tutorials. But then, how do you feel satisfied if you spend three or four months learning about something? Yes, you have more information, you've developed a bit of a skill, but I always need an artifact that solidifies it — from month one to month three, I have this thing. And that's what the blog was, certainly. I struggle very hard to understand a concept, and once I understand it, I think: there's a better, maybe easier, way I could have understood this if it had been explained this way. So let me chew on it, and once I understand it, try to hide the complexity — the things that intimidated me. I would learn something and then be faced with a wall of formulas, for example, or a lot of code, before getting the intuition, and that made me feel intimidated. Those feelings are what guide me toward these gentle approaches to these topics for the readers: hide the complexity, get to the intuition, get a visual sense of it. But it happens over a lot of iteration, and I'm happy to get into how the writing and visualization process develops over time with me.

Yeah, I'd love to, because I do basically the same thing with YouTube. I actually started when I was learning artificial intelligence, just to force myself to study more and learn more, but also, just like you, I wanted some kind of result or output from what I was learning — to confirm that I actually correctly
learned things, because if you can explain it, it should mean that you understand it. So I perfectly understand you; I'm on the same page. But another way is also to create something, code something, develop an application, or whatever. So what made you go down the path of teaching what you were learning, instead of using what you learned to create something else?

Because as you develop a skill, you're not always building that skill toward one product. You're learning, and you're observing, and you're seeing what is popular in the market. And as you develop your skills, maybe in month six you're still not able to train a massive model and deploy it to solve a problem. So I decoupled launching a product from learning and acquiring the skill. That's why writing is a really great middle ground, or artifact, of learning — because it's also a gift to people. And it comes from gratefulness: feeling grateful to the great explainers who explained things to me in the past. When I would really struggle in the beginning to understand what neural networks are, and I came across a blog post by Andrej Karpathy, or Andrew Trask, or Chris Olah, that explains something visually in a beautiful way, or in eleven lines of code where you can feel that you have it — I feel a lot of gratitude that this brought me closer to a goal I had, by simplifying it. That's what we're trying to do, and a lot of my learning journey is echoed in it. I'm happy to see that work of education and learning growing as a community — teaching each other, but also collaborating on how we learn. This is something I think benefits everybody, and I advise everybody to write about what they learn. A lot of people are stopped by "no, I'm just a newbie, I'm just learning this," but a lot
of the time, just you listing the resources that you found useful is valuable on its own — let alone how much it will brand you that you wrote something. And the writing process helps me learn so much more deeply. Take The Illustrated Transformer blog post: there are maybe 20 visuals in there, and each visual you see is, let's say, version six or seven — I iterate over them so much, and learn in the process. I would read the paper and say, okay, do I understand this correctly? This is my understanding of it. Then I'd read another paragraph and say, no, the way I said it conflicts with this — let me draw it again with this new understanding. And then, once I know I'm going to publish it, a part of my brain says: wait, other people are going to read this — how sure am I that this is how it actually is? Maybe I should actually go and read the code to verify my understanding. This depth of investigation is something I would not have done if I had just read the paper and said, okay, I understand it. And then there's another thing — a good life hack that I hand out to people: if you're explaining somebody's paper, the paper has their emails, and you're helping them spread their ideas. So once you've worked on it and have it in good shape, write something, send it to them, and get some feedback from them. That's another really great source of feedback and connections that I've had through the blog, and it really helps remove some of the blind spots that you cannot see yourself. This is especially important and valuable for people who are, again, not in Silicon Valley. The majority of people are somewhere in the world without access to the people working closest to this technology. For a lot of us, all we have is the internet — so how can we learn with online communities?
That's another thing LLMU helps with — democratizing that knowledge. Cohere also has a community-driven research arm called Cohere For AI that aims to broaden machine learning knowledge and research, and it's accepting people from over 100 countries right now, I think. To me that matters, because coming into this, I was thinking: I'm a professional working in a specific profession, I'm really excited about this, I want to learn it, but I'm nowhere near any of these very big companies that do it — how do I do it? So that's what I hope for: the opportunities people get from creating things, sharing what they learn, and learning together as a community. It's what we try to do in the Cohere Discord as well — let us know, let's learn together as a community.

Yeah, it's definitely super valuable to share anything you want to share. And if you are wrong — well, in the worst case, I believe nobody will see it, just because it's not high quality. But if people do see it, you will end up being corrected, and you will just learn even more — as long as you are not intentionally spreading misinformation. It's possible that you're not completely sure you fully understand what you're trying to explain, but it can still get worked out, and you can still be right and share it. It's a fear you have to get over at some point.

A hundred percent. That stops a lot of people — a lot of people think, "I'm not the world's best expert on this." And you don't have to be. You can write that right in. To me, that gave me a lot of license and a lot of comfort in writing — saying, I'm learning this, let's learn it together, these are my notes, this is how I understand it. Once I do that, I learn more deeply, and if people correct it, I
just update it — I put up an update, or change the visual. That's a great way of learning together, so you're doing your audience a favor by learning together. And it helps your career. I advise a lot of people: this will help open doors for your career, and for possible jobs you can have in the future, by showing that you're passionate about this one topic — or, later on, if you do it long enough, that you're an expert.

Absolutely. Visibility is super important, even when you are learning. Being on YouTube for three years, I've seen a lot of people asking me what to learn, where to start, etc. Especially when learning online, a lot of people get stuck doing one course and then another and another, and they just keep trying to learn — because, just like myself and most PhD students, we almost always have impostor syndrome. We think we're not the expert, that people should not trust us or believe in us. But we just need to get over that and at least try.

Yeah — but even if you go to the world's best experts on anything, the experts are usually experts on one very narrow thing, and they're just learning everything else. These are just our limitations as humans.

So, with all the experience that you have teaching this — with your amazing blog, as I said, on Transformers, attention, and everything related to large language models, and now with LLM University, where, as we discussed with Luis in the previous episode, you also do your best to explain how these models work and what you can do with them — I know this usually requires lots of visuals; visuals are very helpful when teaching how complicated things work. But I wonder if, after all this time working with this and trying to explain it, you could find a
way to explain Transformers and attention relatively clearly, now, in just an audio format? Would you be able to explain how it works to, for example, someone just getting into the field?

Okay, yeah. We have really good content on LLMU for that, and one thing that makes LLMU special is that I'm collaborating on it with incredible people. Luis is one of the best ML explainers and educators in the world, and right now, if somebody wants to learn Transformers, I really don't refer them to The Illustrated Transformer anymore — I refer them to Luis's article on Transformers on LLMU. In The Illustrated Transformer there was a context where I was expecting people to have read the previous article about attention and RNNs, and if you're coming in right now, maybe you should skip learning about RNNs and LSTMs — you can come right into neural networks and then Transformers and attention. So part of what makes LLMU special for me is collaborating with Luis on it, but also with Meor Amer, who is one of the best visual explainers of things. Meor has a book called A Visual Introduction to Deep Learning, which has visual explanations of a lot of the concepts in deep learning, and he's one of the best people at taking a concept and putting a visual picture on it. So that collaboration has been a dream come true for me.

Now, to the question of how I've been explaining Transformers to different audiences over the last five years — there are different ideas, depending on who the audience is. One way is to say: right now, a lot of people are used to generative models, to generative Transformers. So that's a good place to start: how does a text generation model — one of these GPT models — answer a question if you ask it one? The way it does it is by generating one word at a time. That's how it
runs at inference. How does it generate one word at a time? We give it the input — let's say we say, "What date is it today?" It breaks that down into, I'll say, words — the real word for this is tokens, but let's say it breaks it down into words — and feeds them into the model, and out of the other side of the model comes the next word that the model expects. Then that word is added back onto the input, and the model generates the next word, and then the next. This is how these text generation models work. Now, what happens under the hood that makes them do that is another thing, but in the beginning I like to give people a sense of: when you're dealing with it at inference time, this is what it's doing. You can then go into the actual components. How does it do that? Well, the input words are translated into numeric representations. Computers are computers — they compute — and Transformer language models are technically language calculators (I heard this from somebody called Sanchez): everything has to become numbers, and then those numbers, through calculations and multiplications, become other language. That's what happens inside this box, which is the model, which was trained — we'll get to how training happens at the end, but for now just assume we have this magically trained model: you give it words, it predicts the next word, and it gives you something coherent based on the statistics of the text it was trained on. Mechanically, how it works is that the input text goes through the various layers of the model. The model has components, or blocks — this can be six layers, as in the original Transformer, but some of the large models now are 90 or 100 layers — and each layer processes the text a little bit and outputs numerical representations that are a little bit more processed.
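The word-at-a-time inference loop described above can be sketched in a few lines. This is purely illustrative: the "model" here is a made-up lookup table of next-word guesses standing in for a trained network, not any real library.

```python
# Toy stand-in for a trained language model: given the last token, "predict"
# the next one. A real model scores every token in its vocabulary using all
# of the input, but the surrounding loop is the same.
TOY_MODEL = {
    "what": "date",
    "date": "is",
    "is": "it",
    "it": "today",
    "today": "<end>",
}

def generate(prompt, max_new_tokens=10):
    tokens = prompt.split()  # crude tokenization: split on spaces
    for _ in range(max_new_tokens):
        next_token = TOY_MODEL.get(tokens[-1], "<end>")  # predict next token
        if next_token == "<end>":
            break
        tokens.append(next_token)  # feed the prediction back in as input
    return " ".join(tokens)

print(generate("what"))  # → what date is it today
```

The key point is the feedback: each predicted token is appended to the input before the next prediction is made.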
Those representations go to the next layer, and the next, and by the end there has been enough processing that the model is confident: okay, the next word is this one. So that's another level of breaking it down. From here we can take it different ways: we can talk about how it was trained, or we can break down these blocks and layers and talk about their various components. So I'll have you choose your destiny and steer me — which way would you like to go next?

I think I'd rather go for how the blocks are made — what the blocks are made of and how they work.

Amazing. So, I'll give an example. There are two major capabilities that correspond to the two major components of what's called a Transformer block. Have you seen the film The Shawshank Redemption?

I haven't.

It's a very popular film, but it's just these two words that are commonly used together: "Shawshank" and "Redemption". If you tell the model "Shawshank", then, based on the data it was trained on, there aren't a lot of words that usually come after "Shawshank" in the training dataset — the highest-probability word would be "Redemption". So it goes by what the model has seen in the past. That is the job of one of the two components, what's called the feed-forward neural network — one of the two major components of the Transformer block — which works on, let's say, these statistics. If you only have that component of the Transformer block, the model can make this kind of prediction: given the input text "Shawshank", it will output "Redemption". But language is a little more complex, and that is not the only mechanism you need to make software generate language. You need another mechanism, which is called attention. We can think about attention like this: what if we give the model this sentence and
try to have it complete it — "The chicken did not cross the road because it..." Now, does "it" refer to the road, or to the chicken? It's very difficult to just rely on the words that usually, traditionally, statistically appear after the word "it", because that would produce a meaningless sentence in a lot of cases. You need an understanding of: are we talking about the street, or are we talking about the chicken? That's the goal, the purpose, of the second layer, the attention mechanism. How does it do that? It's built in a specific way that we don't need to go into right now, but that's its goal, and it learns this from a lot of the data it was trained on, which we can go into next. So these are the two major components: a Transformer model — including a GPT model; the T in GPT is Transformer — is multiple Transformer blocks, and each Transformer block is a self-attention layer and then a feed-forward neural network. Each one has its goal, and once you stack them, for a model that is large enough, trained on a large enough dataset, you start to get these models that generate code, that can summarize, that can do copywriting — and you can build these new industries of AI writing assistants on top of them.

Yeah, that perfectly makes sense — it's a really good explanation. I struggled for a while — I think it's three years ago now that GPT-3 came out — and I don't know why, but I think it's always the case for new technologies: it was really hard to understand it well enough to explain it properly. You've definitely mastered it, and I love how you separated the different topics and didn't dive into the details too much. I often get stuck in the details — I like how attention calculates the attention for each word, and so on — and I really liked the details that you didn't even mention.
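The two components described here — self-attention followed by a feed-forward network — can be sketched as a toy block. This is a deliberately stripped-down illustration with random weights, not any specific library's implementation: it omits residual connections, layer normalization, multi-head attention, and training, and only shows how the two pieces compose.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def transformer_block(x, Wq, Wk, Wv, W1, W2):
    """One simplified Transformer block: self-attention, then a feed-forward net."""
    # 1) Self-attention: every position looks at every other position, so the
    #    representation of "it" can depend on "chicken" and "road".
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    weights = softmax(q @ k.T / np.sqrt(k.shape[-1]))  # who attends to whom
    attended = weights @ v
    # 2) Feed-forward network: a position-wise transformation that captures
    #    "Shawshank -> Redemption"-style statistics from the training data.
    return np.maximum(0, attended @ W1) @ W2  # ReLU between two linear layers

# Random weights just to show the shapes: 4 tokens, model dimension 8.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = transformer_block(x, *(rng.normal(size=(8, 8)) for _ in range(5)))
print(out.shape)  # (4, 8): one refined vector per input token
```

A full model stacks many such blocks, each one refining the per-token representations a bit further.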
I think it's relatively important not to mention them, as you've done. I still need to learn how to best explain things, but it's really nice to see you explain something that I already know and still teach me new stuff. It's really cool.

It helps to do a lot of iteration — to do it over and over again, explain it to people, and notice: I said this, and their eyes started defocusing a little, so maybe this was a bit too much detail, let me delay it. You can still mention the details, but I love to layer it: you get one part of the concept, then you go a little deeper into another part — but you get the full concept first at a high level, and then a little more resolution on another part. That's a philosophy I've seen work over the years.

I think for a regular presentation it's also a good format to follow — even just to say: "that's the broad overview, I'll dive into the details later, but for now focus on this." Just mentioning that makes it more interesting: you're a bit lost, but you know it's going to come, so you don't feel lost. It's a better way of explaining, for sure. And even for listeners who aren't teachers — if you're working and need to present something, any kind of presentation or knowledge sharing, it's really relevant to learn or improve how you share it.

Now, everyone talks about ChatGPT, so I would love it if you could go over the different steps, from the self-supervised part to the fine-tuning to the reinforcement learning from human feedback. How would you explain all those quite complicated steps in simple words?

Yeah, I
do intend to write something at some point about human preference training — either with RL or without it; there are different methods. So, training: one of the things that makes these models work now is that we can have a lot of unlabeled data to train the model on. We can just get free text from the internet — from Wikipedia, for example, or books, or any dataset — and use it to create training examples in this unsupervised, or as it's now called, semi-supervised, way. Say we take one page from Wikipedia — maybe the page about the film The Matrix, or any article — and say, okay, that page has 10,000 words; let's create a few training examples. Take the first three words and present them to the model, and have the model try to predict the fourth word — that's one training example. Then make another example where we give it the first four words and have it try to predict the fifth word. You can see that we can just slide this window along and create millions or billions of training examples. That's what happens in the beginning. This is why they're called language models — this is a task in NLP called language modeling. Now, that turned out to be one of the most magical things — one of the biggest returns on investment that technology has ever given us — because with this you can go so far, in ways that are really surprising even to the people working closely with this technology. If you do this with large enough models on enough data, the model becomes able to retain information — world information. You can ask it about people, and it will tell you who acted in The Matrix and on what date; that information starts being baked in. It starts to generate very coherent text that sounds correct and is grammatically correct. How does it do that without us ever writing the grammar rules into it?
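The sliding-window construction of training examples described here can be sketched directly. This is an illustrative helper, not a real data pipeline: first three words predict the fourth, first four predict the fifth, and so on.

```python
def make_lm_examples(words, start=3):
    # From one page of text, create many (context, next word) training pairs:
    # words[:3] -> words[3], words[:4] -> words[4], ...
    return [(words[:i], words[i]) for i in range(start, len(words))]

page = "the matrix is a science fiction film".split()
examples = make_lm_examples(page)
print(examples[0])   # (['the', 'matrix', 'is'], 'a')
print(len(examples)) # 4
```

On internet-scale text, this same trick yields billions of labeled examples without anyone labeling anything — the "label" is just the next word.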
If you train it on a large enough multilingual dataset, it starts being able to do that in multiple languages. So language modeling is one of the magical things that are really driving this massive explosion in the capability of software and AI. It's the source where all of this starts, and it's the first step in training these large language models — the one that takes the most compute and the most data; this can take months and months of training. In machine learning, you take a model — it can start with any number of parameters, but they're random in the beginning, so the predictions the model makes are junk, because they're random. But it learns from each training step: we give it the three words and have it predict the fourth; its prediction is going to be wrong; we say, no, you said this, but this is the correct answer — let's update you, so that the next time you see this, you have a little bit of a better chance of getting it right. That step happens millions or billions of times. This is the training — the learning in machine learning: making a prediction, updating the model based on how wrong that prediction was, and doing it over and over again. That is the first and most expensive step in creating a baseline model. Once that came out and people started using it, you could make it do useful things, but you had to do a lot of prompt engineering, because you can ask the model a question — say, "How do apples taste?" — and the model, based on just what it has seen in the data, might ask another question: "How do oranges taste?" "How do strawberries taste?" These are all reasonable continuations — you give it a question, it gives you more questions, maybe changing the fruit type.
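The predict-compare-update loop described here can be sketched with a toy stand-in. A real model nudges billions of parameters by gradient descent; here the "model" is just a table of next-word counts, but the shape of the loop — predict, check against the correct next word, update — is the same idea.

```python
from collections import Counter, defaultdict

# Toy "model": a table of next-word counts, standing in for the parameters
# of a real network. It starts empty, so its first predictions are junk.
model = defaultdict(Counter)

def predict(context_word):
    counts = model[context_word]
    return counts.most_common(1)[0][0] if counts else None  # None = random junk

def training_step(context_word, correct_next):
    was_wrong = predict(context_word) != correct_next  # compare with the answer
    model[context_word][correct_next] += 1             # update toward the answer
    return was_wrong

text = "the matrix is a film about the matrix".split()
wrong = sum(training_step(text[i - 1], text[i]) for i in range(1, len(text)))
print(predict("the"), wrong)  # → matrix 6  (it learned "the -> matrix")
```

Early steps are all wrong; as the counts accumulate, predictions start matching the data — the toy analogue of a model slowly fitting the statistics of its training text.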
But what people actually wanted from their interactions was: if I ask you a question, give me an answer; if I give you a command and tell you to write an article about apples, I want you to write the article, not give me more commands. This is what's called preference training. To do it, you collect training examples of a question and its answer, or a command, say "write me an article about X", paired with the article about X, and then you train the model on this dataset. That's how you get the behavior people started expecting from the model, of following what you asked for. That's what Cohere's Command model is tuned to do, and that's what InstructGPT started doing and how it improved on GPT-3 in the past. So that's the next step. Then you can get a bit more refined behavior with another training step, which sometimes includes reinforcement learning: not just doing language modeling on this new dataset, but also giving the model good examples and bad examples and saying, move closer to the good examples and further from the bad ones, as rated by, say, a reward model. But I think a lot of people don't need to get into that complexity. As long as you understand the language-modeling objective and the idea of preference training, that gets you most of the understanding you need. Then just focus on how it can be relevant and useful for the product you're trying to build: what kinds of prompts, what kinds of pipelines or chains are useful for that. For the vast majority of people, that's much better than understanding the underlying equations and the detailed reinforcement-learning steps behind the different products you build with those models.
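As a hypothetical sketch of what such a preference/instruction dataset might look like; the field names and example texts here are invented for illustration and are not any provider's actual format:

```python
# Hypothetical instruction-tuning dataset: each example pairs a command or
# question with the desired response, so the model learns to answer rather
# than continue with more questions.
instruction_data = [
    {"prompt": "Write an article about apples.",
     "completion": "Apples are one of the most widely grown fruits in the world."},
    {"prompt": "How do apples taste?",
     "completion": "Apples are typically sweet with a slightly tart edge."},
]

def to_training_text(example):
    # The pair is concatenated and trained on with the same
    # next-word-prediction objective described earlier.
    return example["prompt"] + "\n" + example["completion"]

print(to_training_text(instruction_data[0]))
```

The key point is that the objective doesn't change; only the data does, and that shift in data is what produces the command-following behavior.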
I know you talk a lot about that in the LLMU. One thing that I believe is super important and promising, other than, for example, fine-tuning and the command models, is to use embeddings and build applications on them, like memory-retrieval applications or any other kind of semantic storage, classification, etc. I have two questions on that. The first is: what are embeddings and what can you do with them? But also, what is more promising to you: trying to make the perfect ChatGPT-style model with lots of fine-tuning, the best commands possible, human feedback and everything to make it perfect, or using a model just for embeddings and building very specific applications like those? Or are they just very different use cases, and both relevant?

Yeah, there are going to be people who use both. There are people who will just use different prompts, send them to a large model and get the results back; you see a lot of these on LinkedIn, "these are the top 10 prompts to use". There's a class of people who will find that useful. But there's another class, which I advocate for, that thinks of these tools as components you can use to build more and more advanced systems, where you're not just consuming one service or one model; you're actually building with them as a builder yourself. When I advocate for that, the idea of embeddings is one of the most powerful and most central ideas. Just as "API" is no longer only a technical term but a business term (CEOs have had to know what an API is for the last 10 or 15 years), I believe embeddings are going to be one of those things, because they're one of the central components of how you deal with large language models and build bigger and bigger systems. Embeddings, in short, are numerical representations of text.
They can be representations of words: methods like word2vec give each word a series of numbers that represents it and captures its meaning. But beyond words we can also go to text embeddings, where a list of numbers represents an entire text: a sentence, an email, even a book. That concept is very important if you elect to be a builder with LLMs. To get a sense of what embeddings are, one of the best things I advise people to build is something involving semantic search. Take a dataset, say the Wikipedia page for The Matrix, break it down into sentences, embed each sentence, and then you can create a simple search engine on that dataset. The search engine works like this: you give it a query, say "when was The Matrix released?". That text is also embedded: you send it to an embedding model, something like Cohere's embed endpoint, and you get the numbers back. Then you do a simple nearest-neighbor search, which is also very simple, maybe two lines of code, and that gives you the top three or top five sentences closest to the query. The beautiful thing here is that regardless of the words you use, the model captures the meaning; even if you don't use the same words, it captures the intent. That's why, when these models were rolled out, especially the BERT model in 2019, Google rolled it into Google Search about six months later and called it one of the biggest leaps forward in the history of search, just from the addition of that one model. Most semantic search services have two capabilities you can build with. What we just described is called dense retrieval: you embed your archive, you embed your query, and you get the nearest neighbors. That's one major concept I advise people to build with.
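The dense-retrieval flow just described can be sketched as follows. The toy 3-d vectors stand in for the output of a real embedding model; in practice you would call an embedding endpoint and get back vectors with hundreds of dimensions:

```python
import math

# Minimal dense-retrieval sketch: embed the archive, embed the query,
# return the nearest neighbors by cosine similarity.
sentences = [
    "The Matrix was released in 1999.",
    "The film stars Keanu Reeves.",
    "Apples are a popular fruit.",
]
# Toy stand-in "embeddings", one vector per sentence.
doc_vectors = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.0, 0.1, 0.9],
]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def nearest_neighbors(query_vec, doc_vecs, k=2):
    # Rank documents by similarity to the query, best first.
    scores = [(cosine(query_vec, d), i) for i, d in enumerate(doc_vecs)]
    scores.sort(reverse=True)
    return [i for _, i in scores[:k]]

# Pretend this vector came from embedding "When was The Matrix released?"
query_vec = [0.85, 0.15, 0.05]
top = nearest_neighbors(query_vec, doc_vectors)
print([sentences[i] for i in top])
# ['The Matrix was released in 1999.', 'The film stars Keanu Reeves.']
```

Because the matching happens in vector space, the query and the matching sentence don't need to share any words; the embedding model is what captures the shared intent.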
The other one is called rerank. Reranking is just using an LLM to change the order of search results produced by a previous step: you throw your query at your existing search engine, get the top 10 results, and throw those at the reranker, which changes their order. If you have an existing search system, this dramatically improves the quality of those search results. These two components, each with its own endpoint and very high-quality models on the Cohere side, are maybe the two best ways to start working with large language models, because they're also the future of generation. Retrieval-augmented generation is absolutely one of the most exciting areas. It lets you rely on information that you serve to the model when you need it: you can update that data whenever you need to, you can give different users access to different datasets, and you're not reliant on data stored inside the model. If you want to update what's stored in the model, well, then you're training the model for another nine months; and relying only on the model's internal memory also increases hallucination. So there's a lot of excitement in this area that brings together semantic search and generation, and we think it's really worth paying attention to.

Yeah, retrieval is definitely, as you mentioned, a great way to, not avoid, but limit the hallucination problem, because, even if it doesn't work all the time, you can try to force the model to answer only with the retrieved response and give a reference to what it used. When it searches its memory and finds the nearest neighbors, you can ask it to answer only with what it found and also give the source. That's really powerful compared to ChatGPT, which will just give you text, and hopefully it's true, and you don't even know where it comes from.
So it's definitely safer, and also, as you said, easier to build: you don't need to retrain the whole model, and you can build multiple applications super easily. But I'm not that familiar with the rerank system. Could you give a bit more detail on how it works, how it actually reorders the answers and improves the results?

Sure. Rerankers are these models where, let's say you're Google and you have your existing search system from before Transformers. You give it a query and it gives you 100 results. The easiest way to power your search with large language models is to say: for these 100 results, take the query and each one of the results, present them to the model, and have the model evaluate how relevant this result is to the query. So the reranker is basically a classifier that takes two pieces of text; it's what's called a cross-encoder. You train it on examples of a query and a relevant document, where the label is one because that's a true match, and then a query and a document that is not relevant to it, where the training label is zero. Once you train it, you just plug it into an existing search system; the previous step can use embeddings or not, that's fine. It gives a relevance score for each of the 100 results, and you just sort by that relevance. That becomes one signal for your search: you can use it on its own and sort by the most relevant, or combine it with other signals. If you're rolling out an actual search system, you want other signals; say you want more recent documents, so you add a signal for recency, or if you're building search for Google Maps, you want things that are closer to one point, so that's another search signal, or preference signals.
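The reranking flow might be sketched like this. Note that the word-overlap `relevance_score` below is only a stand-in for a trained cross-encoder, which would read the query and document jointly and output a learned relevance score:

```python
# Sketch of the second-stage reranking step described above.
def relevance_score(query, doc):
    # Stand-in for a cross-encoder: fraction of query words found in the doc.
    q_words = set(query.lower().split())
    d_words = set(doc.lower().split())
    return len(q_words & d_words) / len(q_words)

def rerank(query, results, top_k=3):
    # Score every first-stage candidate, then sort by relevance, best first.
    scored = [(relevance_score(query, doc), doc) for doc in results]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]

# Imagine these came back from an existing (pre-LLM) search engine.
first_stage_results = [
    "how to grow apple trees",
    "the matrix 1999 release date",
    "when was the matrix released",
]
print(rerank("when was the matrix released", first_stage_results))
```

The design point is that the reranker doesn't replace the existing search engine; it sits on top of it and re-scores a shortlist, which is why it's such an easy first deployment.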
So that's how rerankers work: you can sort by relevance directly, or use it as one additional signal in a more complex system.

Okay, much clearer now. So basically that's the easiest way to use large language models when you already have a search system or a dataset. More generally, for anyone in a company, a manager or someone who has an issue or a problem in their regular work, how do they know their problem can be helped by LLMs? Are there any tricks or tips to know, like, "oh, now I should use an LLM, or embeddings, or a product like Cohere here"? How can you know a problem will be helped by an LLM?

Yeah, that's a great point, and the common wisdom is to use the best tool for the job. LLMs are great for some use cases; they're not good for everything. There are a lot of cases where somebody wants to use an LLM and I would advise them: no, you should use a regular expression for this, or spaCy for this use case, or just Python string and text matching. LLMs are one additional tool that adds a level of capability to your system; they should augment existing things that work alongside them. That understanding is important. Some people are driven by the hype and want to inject AI somehow; they told their investors they'd roll out AI in their next product, so let's find any way to do it. No, it really should come from user pain and the problem you're trying to solve. You can classify two major categories there. One is improving a specific text-processing problem you have right now, where you can get better results if you try another model. You have to choose the metric that will improve your product or solve your pain, and then compare LLMs with existing strong baselines, because
there are a lot of things that can be done with tools that are not LLMs. But once you see that the LLM is providing that value for you, that's where LLM providers like Cohere make it easy: you don't need to worry about deploying the model, or about models going out of memory because this model needs to fit on tens of GPUs. Just worry about, okay, you want to rerank: send me the query, send me the ten texts, and I'll send you back the ordered list, and I'll make it better the next time you send a request, because I'm updating the model and training it on newer and better data every month or with every version. So this is a new type of provider of this technology. But yeah, definitely focus on the problems the models can solve. Improving existing text processing is one category: search systems, classification. But there's also this new capability of text generation; AI writing systems were not possible three years ago, so these are new categories. So you might want to innovate: I will create, I don't know, the next interactive AI game, or the next media format; I want to create a world like GTA with all of its radio stations, but all of it generated by computers. Creating something new, experimenting: that's the second category, and it allows a lot of new applications to be born.

Right, because not so long ago, when AI first started, you needed to train your own model and, as you mentioned, host it in the cloud or somewhere, so there's a lot to manage. Now, thanks to OpenAI, Cohere and other companies, you can basically have someone else do that for you. But there are still some challenges in building those large-language-model-based apps. For example, say I have a specific dataset in my company, a very private company, and the dataset
cannot go outside the company's intranet. What can you do then? Cohere and OpenAI, for example, are all outside the intranet, so what do you do if you want to build some kind of search-based chatbot?

Yeah, that's a great question and a very common concern we come across; it's one of the biggest issues for companies in the industry, specifically enterprises, large companies, and companies working in regulated spaces, and Cohere actually caters to that. There is a solution for bringing the models to your own virtual private cloud. There's a rollout with AWS SageMaker where the model can be deployed in your own cloud: the data does not go to Cohere's infrastructure, it remains in your own data center, but it's run through the SageMaker endpoint. That's one of the use cases where we see a lot of demand, and Cohere's focus on enterprise makes it able to focus on use cases like this; not specifically consumer-focused, but on the big business problems of building the next generation of applications. I'm glad you highlighted it, because it's commonly asked for, and we'd love to see more people building with those.

It's great to know that it's possible. And for people who have different problems, not large companies; for example, if someone is learning and wants to build an app, what are the main challenges when it comes to building Cohere- or OpenAI-based apps, where you use the very powerful models that already exist but want to fine-tune or adapt them to your specific application, either through a dataset or just specific commands? What are the typical challenges that people who want to create cool things with these models need to tackle and work through?
Yeah, so the challenges don't all have to be technical. Everything from the past remains true: you still need to find product-market fit for your product, you need validation from your users, you need to really solve a problem and not build something that's just nice to have. With the generative models specifically, identifying reliable use cases is something a lot of people need some hand-holding on. They come across an amazing demo on Twitter, but they don't realize that a lot of demos are cherry-picked; the author may have had to generate 20 outputs to get that one. If you're building a product, it cannot work just three out of ten times; it needs to work nine out of ten times. So how do you get it to that level of production quality? That gap between a proof of concept ("this prompt can work, I'll take a screenshot and put it on Twitter") and a reliable system behavior that you could put in front of your users and that will always work: bridging that gap is one of the challenging things a lot of people have to contend with. There are solutions and playbooks, and we write and educate a lot about them. They include things like using search, using embeddings, and fine-tuning is another one. The big models allow you to prototype, and they produce amazing behaviors; once you have the model doing the behavior you want using one example in the prompt, or five examples, you can make it cheaper and faster by collecting that dataset and fine-tuning a smaller model that can do the same task as well as the larger model. That also saves on context size, because you're not sending the same five few-shot examples with every prompt. So that is helpful. And then another part of
that is what we also talked about with semantic search: getting the relevant bits and injecting them into the prompt. Some people think that context length will solve all the problems, that if you have a very large context window you can send, I don't know, the entire documentation of your software to the language model with every question you ask about it. You can clearly see that's wasteful: if you answer a thousand questions, the model has to process the same documentation thousands of times. Embeddings are really a way of caching that knowledge and retrieving the important bits. So yeah, those are a couple of things; experimenting, and thinking about reliable behavior, is one of the learning curves a lot of people have to go through.

What are the skills and materials needed to do that? First, can one person do it alone? If I want to create an app, do I need a whole team, do I need a server, do I need to go through a course beforehand? What's the required skill set and material to get into it? Is it impossible for one person, or can the person listening right now who has an idea just start and learn in the process? How accessible is it?

So, in software in general, you need a user interface, right? If you're targeting users, how will they interact with it? Or are you creating, say, an API endpoint that other people can connect to? There are a bunch of software hurdles that are not necessarily language modeling or prompt engineering: the piping of that information, and how your users connect with it. If you know Python and JavaScript, one person can go very, very far if they invest in those two things. If you only know Python and machine learning or data science, you can create a proof of concept; you can use something like Streamlit
and create an application with a user interface that you can demo, maybe to investors, to help you build the next level of it. And more and more you see companies like Vercel coming along and making that front-end-to-AI pathway a little easier. The language models will continue to make it easier for generalists to do many things very well. We're still at the beginning of that, but it's clear that productive individuals will become massively more productive, aided by these technologies and what they can do. So yes, smaller groups of people will be able to do a lot more. But for now, there are these skill sets: how are you going to build the UI? How are you then going to put it in front of users? You can roll it out on the App Store or through some marketplace; can you do that individually? Do you know that customer segment? Do you really know the pain you can solve for them? A lot of people run one- or two-person companies: charge credit cards, use these various frameworks, put some good-looking UI on top of it. But then the question is whether you have enough of a competitive advantage that somebody else doesn't copy you once your service is popular. That's another challenge: are you building enough of a moat, a competitive advantage, around your product that others can't just steal your idea and your UI?

Yeah, indeed. You talked about how generalists can now do more thanks to AI, and that will only increase, which is really cool, because I believe I'm somewhat of a generalist. I really like to know about everything, and even though I'm doing a PhD, I'm not completely sure about being super-specialized in one thing and forgetting the others; I really like to learn everything. That's a very recurrent topic on the podcast. It's funny how, years ago, a lot
of people said AI would increase the gap between rich and poor and make things even more unfair than they were, and now, I'm not sure I have all the data and information, but I believe we see almost the opposite: in my case at least, and for lots of people I know, AI actually allows people to do things they couldn't do before, which is quite cool. It democratizes lots of stuff, like building applications. One of my friends is currently doing a kind of challenge where she's learning to use ChatGPT, posting daily for 30 days about ChatGPT and what she does with it. She's in human resources and doesn't know any programming, but she still coded a to-do-list application thanks to ChatGPT, without any notion of Python, JavaScript, or coding at all. That's incredible; it's so cool.

A hundred percent. I predict we'll start to see not one but many five-person companies reach a billion dollars in valuation pretty soon, aided and augmented by AI. There are definitely a lot of opportunities created, but also a lot of challenges we need to be cautious about. There are opportunities for misuse, as well as the need for people to keep learning, keep developing their skill sets, and use these new technologies in their own workflows to augment what they do and make it better and better. You can't just rely on what you learned in college; the world keeps changing very quickly, so the quicker you are at learning, adapting, and incorporating these tools into what you do, the more of the opportunity you will catch, and the better you'll resist the challenges.

Speaking of challenges, one last challenge I often struggle with when using ChatGPT or other models is hallucination. Is there any
other way than, for example, memory retrieval, or is that the only way to solve it? Is there any other way to reduce hallucination, or in general make these applications safer and, as OpenAI says, more aligned with what you actually want? There are two sides to this question: obviously you can address it during training, as OpenAI does, but what if you're using an OpenAI or Cohere product and want to make it safer on your end? Is there anything you can do to mitigate the model's hallucination even if you don't control the training process?

We already mentioned one of the big ones: actually injecting the correct information so you're not relying on the model's parametric memory. That's one. There are also methods you as an engineer can build systems around; a lot of them were outlined in Google's Minerva model paper, which solves a lot of very complex mathematics problems. That's where we heard about things like chain-of-thought, where if you ask the model a complex question, it shouldn't answer right away; it should output the steps by which it arrives at the answer. Then there's another method called majority voting, where the model outputs not just one result but maybe ten results, and you look at which answers occurred more than once and use those as votes; that works specifically when there's one final output you can extract at the end. There's a paper around that, and the method is called majority voting. Close to this is a recent idea called tree-of-thought, which is like chain-of-thought but with multiple chains of thought. So that's one way of evaluating: if the model generates the answer five or ten times and says the same thing over and over again, there's probably a good chance that it knows the answer; but if there's variance across those five or ten generations, that's probably an indication that the model is just being creative.
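The majority-voting idea can be sketched in a few lines; the hard-coded samples below stand in for repeated generations from a model:

```python
from collections import Counter

# Majority-voting / self-consistency sketch: sample several answers to the
# same question and keep the one the model produces most often.
samples = ["42", "42", "41", "42", "40"]  # stand-ins for repeated model outputs

def majority_vote(answers):
    # Return the most common answer and the fraction of samples that agree.
    (answer, count), = Counter(answers).most_common(1)
    return answer, count / len(answers)

answer, agreement = majority_vote(samples)
print(answer, agreement)
```

A high agreement fraction suggests the model consistently "knows" the answer; low agreement (high variance across samples) is the warning sign for creative guessing described above. The cost, as noted next, is generating multiple times per question.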
Then there are things like temperature: setting the right temperature and sampling arguments mitigates this to a certain degree.

Yeah, it's an easy way to at least mitigate very random answers, though it also increases the cost, since you have to generate multiple times. But it's a very easy way to do it. One last question I have specifically about large language models: how do models like ChatGPT work with many languages? How are they built and trained on many languages? There's definitely a difference between GPT models that work with almost every language and, for example, a model Facebook released that was trained specifically on French; those are definitely not the same thing. So why does ChatGPT work with any language you can type in?

Multilingual models are done by incorporating other languages in the training dataset, and by optimizing, or let's say initializing, the tokenizer, which is a step that comes before training the model, to choose how to break down the words. But it's really just the same training process: it is language modeling, predict the next word, except the dataset contains a lot of other languages, which we also use to evaluate the models: how coherent is the model in this language and that language? You use that in your evaluations, because if you're serving these models; a model like Cohere's Command, the one we put out, is not the only model we've trained. You have to train tens or hundreds of models, with a lot of different experiments, to really find the best
performing model, and do a lot of complex evaluations. So if multilingual is one of your focus areas, which for us it very much is, there's a lot of work incorporating it into the training data, but also into the evaluations. We have a big focus on multilingual on the embedding side: we have an embedding model that supports over 100 languages, completely geared toward search in multilingual settings. You have to pay extra attention when you're building the model to incorporate languages, because it was very easy in the beginning to just focus on English and not consider that the vast majority of people also speak other languages and need them in their day-to-day business usage.

So it's mainly just trained with even more data. And I believe, from research, maybe you can confirm, but just as for humans, training on different languages actually improves the results in English as well?

Yes, there are things that are strange like that; for example, training on code also enhances generating text with common sense, or reasoning capabilities. The more high-quality data you throw in there, the more it seems to improve the results.

Yeah, it's just like humans. I don't remember exactly what it does to your brain, but learning a musical instrument actually helps you understand other things better. It doesn't exactly make you more intelligent, but it definitely helps; it's not irrelevant to learn art or a musical instrument. And with different languages, you're basically a different person when you speak another language. That's also super interesting; I wonder if that's the case for large language models, whether they act differently in different languages.

Yeah, I mean, the distributions would be different in the different languages. And I love how this also extends to
multimodal models: once you throw audio in there, once you throw images in there, how that could then feed into generation.

Yeah, really exciting. How do you see AI evolving over the next, I don't know, it's the classic question, the next five years? How do you see AI and large language models evolving in the next few years? Where are we going: mostly specific applications with retrieval systems, or better general models with less hallucination, or anything else?

Yeah, generation and embedding approaches, I feel, are here to stay. Things like semantic search and retrieval are things you can't really do with generative models alone. There will be a lot of development in the models themselves: in the quality of data presented to them, and in the quantity and types of data. We're now at internet-scale text data; where do you go next? We talked about multimodality: that's one area where the models will improve, by being able to look at images or even generate images, or by taking in audio and other modalities. Beyond this, there's the idea of embodiment, where models can interact with environments and learn from those interactions; that will be another source of information and feedback to improve behavior. And then there's the idea of social interaction: how models can socially interact with large groups of people, not just one person who sends a prompt and gets a result back. These are three of the five "world scopes" that one paper, which I've discussed on my YouTube channel, lays out as the future of where we're going to get more data now that we've trained on internet data. So that's on the modeling front, how to make the models better: definitely new architectures, improvements, hardware; that will all continue to develop, even
though right now there's a bit of a convergence; there haven't been any major steps on the modeling side for a while. But there's still a lot to be done in engineering: rolling these models out, building systems around the capabilities we currently have. There's so much that can be done there that it will keep many people busy for the next two or three years. And also inventing other ways of using the media formats that are now possible, where you can generate images or text or full stories or full podcasts. So yeah, the world will be a little different, and a lot of people are going to be very creative in what they deploy; a lot of it will come from the engineering side, not only from modeling.

On your end, is there one thing that AI cannot do right now that you would love it to be able to do? Is there one thing that comes to mind?

Yes, one thing, and it's a little random, but I'm really obsessed with the nature of learning about intelligence in software. Having software solve problems in intelligent ways makes me very intrigued about other natural intelligences beyond humans: animal intelligence, the dolphins, the octopus, the ant colony, the apes. There are efforts like Project CETI (C-E-T-I) that try to apply all the NLP technology we have to decoding the language and vocalizations of whales, to see whether we can start to understand, maybe communicate with, these other forms of intelligent life around us that we don't yet have ways of communicating with. I'm absolutely passionate about this technology being able to allow us to connect better to the intelligent forms of life around us.

Yeah, it's so cool. I've always been drawn to how we understand things, and also how a cat sees the world, and all the animals and living beings. It's
really cool that, well, neuroscience is a different field, and now lots of people who come from pure software become interested in neuroscience and these topics, just thanks to language models and how they make you think about how things understand. It's really cool, and I'm excited to see where AI, where our field, can help the human race understand other beings.

I have one last question for you, because it's a topic I'm particularly interested in. Since you are a blogger and now even a YouTuber: are you using any AI help when creating educational content? Not necessarily LLMs, but maybe AI editing, or generation, or asking questions, brainstorming. Are you using any AI-powered tools to make your writing process, or your creative process, better?

Not on, let's say, a daily basis, but yes, sometimes for outlines or idea generation they are useful, or for some artwork or thumbnails; Midjourney has been useful for some of these. But like everybody, I'm just learning how to adapt them into my workflows.

On my end, I didn't see myself using them until very recently, and now I've seen a particular use case for me. It's mainly because I'm French, not a native English speaker, and it's really helpful for improving your formulation and syntax. That's one thing, just because it helps me improve. Another thing is that I'm still constantly learning lots of new stuff, and I try to explain it while learning, and when I see a word I don't understand, or a topic that seems a bit blurry, even if I have the paper and I think I understand it, asking GPT or any other model is quite useful. It reformulates, and it can be very useful to quickly get a high-level understanding of
specific topics. That's something I've been using recently, and it took me a while to get into it, which is weird, because we actually explain how these models work, yet we don't use them nearly as much. The more time we spend with them, the better we use them, obviously, and it's really promising. But yeah, you definitely have to double-check the outputs and make sure the model isn't hallucinating; it still requires human input. But it's really useful.

So, as the last thing I wanted to mention, not really a question: is there anything you would like to share with the audience? Do you have any projects other than LLM University, which of course anyone can go through right now with Cohere for free and learn a lot about Transformers and everything we discussed in this interview? It's a really good resource and I definitely recommend it. But is there anything else on your end that you are excited to share, or release soon, or work on?

Yeah, so aside from LLM University, we have the Cohere Discord, where we answer questions. If you have questions as you go through LLM University, join us, let us know what you want to learn about; we're happy to help you with your learning. And when you build something, we also welcome you to share it and say what problems you faced and how you solved them. It's a community to learn together, and we welcome everybody on the Cohere Discord.

That's awesome. Is there anything coming up on your YouTube channel or the blog?

So I've been doing a bunch of shorts, digging deeper into the tools that build on top of LLMs, like LangChain and LlamaIndex, so that's a little bit of my focus area now. But in terms of topics, if I can carve out some time to talk about
human feedback and other RLHF topics, that's high on my list.

I'd love to see that. Perfect. Well, thank you very much for all the time you gave us and the amazing insights; it was a really cool discussion to have with you. I've only known you for two years, unfortunately, I didn't know your blog before that, but it's an amazing resource, and likewise for LLMU. I'm really thankful to you and your team for building that, but also to you personally for the YouTube channel and the blog. It's just really cool that people like you exist, so thank you, and thank you for joining the podcast.

Thank you so much, that's so kind of you. I'm just a student like any other, and we're just learning together. Thank you so much for having me, and I'm looking forward to interacting and speaking together in the future.

This is an interview with Jay Alammar. Jay has a technical blog as well as a YouTube channel where he explains a lot of technical AI topics, like his attention and Transformers posts that you have definitely heard of before, and he has amazing learning resources. Now he is focused on building LLM University with Cohere, and we will talk about that in this episode, but not only that: we will also dive into the different challenges in building large language model based apps and how to build them. I think this episode is perfect for anyone interested in large language models, but mainly for people who want to create a cool app using them. I hope you enjoy this episode.

So, I'm Jay. I worked as a software engineer, and I've been fascinated with machine learning for so long, but I really started working in it just about eight years ago, when I started a blog to document how I was learning about machine learning. It seemed to be this power that gives software capabilities that are quite mind-blowing: if you know what the limitations of software are, and you see these machine learning demos coming along and doing things that
really stretch the imagination in terms of what is possible to do with software. That's when I started my blog, which a bunch of people have seen and used, because I started with introductions to machine learning: in general, how to think about backpropagation and neural networks. Then I moved into more advanced topics, covering attention and how language processing and language generation are done, and it really exploded when I started talking about Transformer models, BERT models, and GPT models. From the blog, I got to work with Udacity, creating some of their nanodegrees for educating people on how to use these language models and how to train them. Those were two things that really launched how I work with machine learning and AI. The blog has had upwards of six million page views so far, a lot of it around Transformers and how they work, and I think tens of thousands of people have gone through the various Udacity programs on machine learning, deep learning, NLP, and computer vision. Most recently, I get to work very closely with these models at Cohere, as Director and Engineering Fellow, where I continue to learn about the capabilities of these models and explain them to people in terms of how to deploy them in real-world applications: how can you build something with them that solves a real problem you have right now? That includes education, talks, and creating schematics, but also crystallizing a lot of lessons learned in the industry about how to deploy these models and how to build things around them. As for Cohere: a lot of people have heard about Transformers, and Cohere was built by three co-founders, some of whom were co-authors of the original Transformers paper. They've been building these hosted, managed large language models in the cloud for about two or maybe two and a
half years. I've been with the company for two years, and I've seen this deployment and rollout. Since then the company has trained and deployed two families of massive language models, which do text generation but also text understanding, and we can go deeper into those two capabilities of large language models.

Absolutely. You mentioned that you started your blog eight years ago, and of course, like most people, I discovered it through the amazing posts you wrote on Transformers and attention; they really helped me and lots of people. But since you started so long ago: what made you start, and what made you get into the field? Eight years ago the AI hype wasn't really near what it is right now, so how did you discover it, and what made you create this blog about it?

Amazing, yes. So I had been working as a software engineer for a long time, and sometimes I would come across demos that, to me, felt magical, and a lot of them came from machine learning. If you Google "Word Lens", it's on YouTube: this was a demo that came out in 2010, of an iPhone 4 that you can point at, let's say, a sentence written in Spanish, and it will transform that into the English translation and superimpose it. It's 13 years later now, and that still feels like magic, especially if you know about software and how complex dealing with language and images is. Being able to do that without a server, on software running on the device, felt like an alien artifact to me. Seeing something like that, I always had this to-do list in my head: get into machine learning, find the first opportunity to get into machine learning and understand it, because it was clearly going to be transformative to software. The moment that really gave me the jump into it was around 2015, when TensorFlow was open-sourced. It felt like: okay, this is the time, now there's open source code. Because for a
lot of these things, you had to be very close to a research lab, or work deep inside a company, and at the time I was an outsider. I wasn't in Silicon Valley, I wasn't in a big tech hub; I was just a software person with a laptop and access to the internet, trying to learn on my own without a group or a company or a research group around me, academic or otherwise. A lot of it was self-learning. So when TensorFlow came out, I thought: okay, I read a paper from 2004 called MapReduce, and that launched the big data industry; everything around big data became a massive industry. The launch of TensorFlow, and everything that started happening in deep learning, felt like the beginning of this new wave of deep learning. So I started to take some tutorials, but then, how do you feel satisfied if you spend three or four months learning about something? Yes, you have more information, you've developed a little bit of a skill, but I always need an artifact that solidifies it: from month one to month three, I have this thing. That's what the blog was. I struggle very hard to understand a concept, and once I understand it, I think: there's a better, maybe easier way I could have understood this if it had been explained this way. So let me chew on it, and once I understand it, try to hide the complexity, the things that intimidated me. I would learn something and then be faced with a wall of formulas, for example, or a lot of code, before getting the intuition, and that made me feel intimidated. Those feelings are what guide me to take these gentle approaches to these topics for the readers: hide the complexity, get to the intuition, get a visual sense of it. But it happens over a lot of iteration, and I'm
happy to get into how the writing and visualization process developed over time for me.

I'd love to, because I do much the same thing with YouTube. I actually started it when learning artificial intelligence, just to force myself to study more and learn more, but also, just like you, I wanted some kind of result or output from what I was learning, to confirm that I had actually learned things correctly, because if you can explain it, that should mean you understand it. So I perfectly understand you, and I'm on the same page. But another way is to create something, or code something, or develop an application, or whatever. So what made you go down the path of teaching what you're learning, instead of using what you're learning to create something else?

Because as you develop a skill, you're not always building that skill to build one product. You're learning, and you're observing, and you're seeing what is popular in the market, and as you develop your skills, maybe in month six you're still not able to train a massive model and deploy it to solve a problem. So I decoupled launching a product from learning and acquiring the skill, and that's why writing is a really great middle ground, or artifact, of learning. It's also a gift to people, and it comes from gratefulness: feeling grateful to the great explainers who explained things to me in the past. When I would really struggle in the beginning to understand what neural networks are, and I came across a blog post by Andrej Karpathy, or by Andrew Trask, or Chris Olah, that explains something visually in a beautiful way, or in, like, 11 lines of code where you can feel that you have it, I felt a lot of gratefulness that this brought me closer to a goal I had, by simplifying. It's what we're trying to do with what we
build: a lot of my learning journey is echoed by it. I'm happy to see that work of education and learning growing as a community, teaching each other, but also collaborating on how we learn. This is something that I think benefits everybody, and I advise everybody to write about what they learn. A lot of people are stopped by "no, I'm just a newbie, I'm just learning this", but a lot of the time, just listing the resources that you found useful is valuable on its own, let alone how much it will brand you that you wrote something. And the writing process helps me learn so much more deeply. Take The Illustrated Transformer blog post: there are maybe 20 visuals in there, and each visual you see is, let's say, version six or seven. I iterate over them so much, and learn in the process. I would read the paper and ask: do I understand this correctly? This is my understanding of it. Then I'd read another paragraph and realize: no, the way I said it conflicts with this; let me draw it again with this new understanding. And then, once I know I'm going to publish it, a part of my brain says: wait, other people are going to read this; how sure am I that this is how it actually is? Maybe I should go and read the code to verify my understanding. That depth of investigation is something I would not have done if I had just read the paper and said "okay, I understand it". And then there's another thing, a good life hack that I hand out to people: if you're explaining somebody's paper, the paper has their emails, and you're helping them spread their ideas. So once you've worked on it and have it in good shape, write something and send it to them, and get some feedback from them. That is another really great source of feedback and connections that I've had through the blog, and it really
helps remove some of the blind spots that you cannot see yourself. This is especially important and valuable for people who, again, are not in Silicon Valley. The majority of people are somewhere in the world without access to the people who work closest with this technology, so for a lot of us, all we have is the internet. How can we learn with online communities? That's another thing that LLMU helps with, democratizing that knowledge. Cohere also has a community-driven research arm called Cohere For AI that aims to broaden machine learning knowledge and research, and I think it's accepting people from over 100 countries right now. To me that matters, because coming into this field I thought: I'm a professional working in a specific profession, I'm really excited about this and I want to learn it, but I'm nowhere near any of these very big companies that do it, so how do I do it? And so that's what I hope for: the opportunities people get from creating things, sharing what they learn, and learning together as a community. It's what we try to do in the Cohere Discord: let us know, let's learn together as a community.

Yeah, it's definitely super valuable to share anything you want to share, and if you are wrong, well, in the worst case I believe nobody will see it, just because it's not high quality; but if people do see it, you will end up being corrected, and you will just learn even more, as long as you are not intentionally spreading misinformation. It's possible that you are not completely sure you fully understand what you are trying to explain, but you can still share it, and you may still be right. It's a fear you have to get over at some point.

One hundred percent; that
stops a lot of people. A lot of people think "I'm not the world's best expert on this", and you don't have to be, and you can write that right in. To me, that gave a lot of license and a lot of comfort in writing: I can say, I'm learning this, let's learn it together, these are my notes, this is how I understand it. And once I do that, I learn more deeply, and if people correct something, I just update it, put in an update, or change the visual. That's a great way of learning together, so you're doing your audience a favor by learning together. And it helps your career: I advise a lot of people that this will help open doors for your career, and for possible jobs you can have in the future, by showing that you're passionate about this one topic, or, if you do it long enough, that you're an expert.

Absolutely. Visibility is super important, even when you are learning. Being on YouTube for three years, I've seen a lot of people asking me what to learn, where to start, et cetera, and especially when learning online, a lot of people get stuck doing one course and then another and another, and they just keep trying to learn, because, just like myself and most PhD students, we almost always have impostor syndrome. We think we are not the expert and that people should not trust us or believe in us, but we just need to get over that and at least try.

Yeah, but even if you go to the world's best experts on anything, the experts are usually experts in one very narrow thing, and they're just learning everything else. These are just our limitations as humans.

Right. Now, with all the experience that you have teaching, with your amazing blog, as I said, on Transformers, attention, and everything related to large language
models, and now with LLM University, where, as we discussed with Luis in the previous episode, you also do your best to explain how they work and what you can do with them. I know this usually requires lots of visuals; visuals are very helpful for teaching how complicated things work. But I wonder, after all this time working with these models and trying to explain them, whether you could find a way to explain Transformers and attention relatively clearly, right now, in just an audio format. Would you be able to explain how they work to, for example, someone just getting into the field?

Okay, yeah. We have really good content on LLMU for that, and one thing that makes LLMU special is that I'm collaborating on it with incredible people. Luis is one of the best ML explainers and educators in the world, and right now, if somebody wants to learn Transformers, I really don't refer them to The Illustrated Transformer; I refer them to Luis's article on LLMU about Transformers. In The Illustrated Transformer there was a context where I was expecting people to have read the previous article about attention and RNNs, and if you're coming in right now, maybe you should skip learning about RNNs and LSTMs; you can come right from neural networks to Transformers and attention. So part of what makes LLMU special for me is collaborating with Luis on it, but also with Meor Amer, who is one of the best visual explainers of things. Meor has a book called A Visual Introduction to Deep Learning that has visual explanations of a lot of the concepts in deep learning, and he is one of the best people at taking a concept and putting a visual picture on it. So that collaboration has been a dream come true for me. Now, on the question of how I've been explaining Transformers to different audiences over the last five years: there are
different ideas, yes, depending on who the audience is. So, one way is to say: right now a lot of people are used to generative models, to generative Transformers, so that's a good place to start. How does a text generation model, one of these GPT models, answer a question if you ask one? The way it does it is by generating one word at a time; that's how it runs at inference time. How does it generate one word at a time? We give it the input, let's say "What date is it today?", and it breaks that down into, I'll say, words (the real word for this is tokens, but let's say it breaks it down into words) and feeds them into the model, and out of the other side of the model comes the next word the model expects. Then that word is added back to the input, and the model generates the next one, and then the next. This is how these text generation models work. Now, what happens under the hood that makes them do that is another thing, but in the beginning I like to give people a sense of: when you're dealing with it at inference time, this is what it's doing. You can then go into the actual components. How does it do that? Well, the input words are translated into numeric representations. Computers are computers, they compute, and language models are, technically (I heard this from somebody called Sanchez), language calculators: everything has to become numbers, and then those numbers, through calculations and multiplications, become other language. That's what happens inside this box, which is the model, which was trained; we'll get to how training happens at the end, but for now just assume we have this magically trained model. You give it words, it predicts the next word, and it gives you something coherent, based on the statistics of the text it was trained on.
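The word-at-a-time loop just described can be sketched in a few lines. The lookup table below is a toy stand-in for the trained model (a real model scores every token in the vocabulary given the whole context, not just the last word; the table entries are invented):

```python
# Toy "model": maps the last word to the single most likely next word.
NEXT_WORD = {
    "what": "date",
    "date": "is",
    "is": "it",
    "it": "today",
    "today": "<end>",
}

def generate(prompt_tokens, max_new=10):
    tokens = list(prompt_tokens)
    for _ in range(max_new):
        nxt = NEXT_WORD.get(tokens[-1], "<end>")  # model predicts the next token
        if nxt == "<end>":
            break
        tokens.append(nxt)  # append the prediction and feed it back as input
    return tokens

print(" ".join(generate(["what"])))  # -> what date is it today
```

The essential shape is the same in real systems: predict one token, append it to the input, repeat until an end token or a length limit.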
Mechanically, how it works is that the input text goes through the various layers of the model. The model has components, or blocks; this can be six layers, as in the original Transformer, but some of the large models now are 90 or 100 layers. Each layer processes the text a little bit and outputs numerical representations that are a little more processed, and those go to the next layer, and the next, and by the end there has been enough processing that the model is confident: okay, the next word is this. So that's another level of breaking it down, and from here we can take it different ways: we can talk about how it was trained, or we can break down these blocks and layers and talk about their components. I'll have you choose your destiny and steer me: which way would you like us to go next?

I think I'd rather go for what the blocks are made of and how they work.

Amazing. So, there are two major capabilities that correspond to the two major components of what's called a Transformer block. Have you seen the film The Shawshank Redemption?

I haven't.

It's a very popular film, but what matters here is just that these two words are commonly used together: "Shawshank" and "Redemption". If you give a model "Shawshank", then, based on the data it was trained on, there aren't a lot of words that usually appear in the training dataset after "Shawshank", so the highest-probability word would be "Redemption". The prediction is based on what the model has seen in the past. That is the job of one of the two components, what's called the feed-forward neural network, one of the two major components of the Transformer block; it works on these, let's say, statistics. If you only had that component of the Transformer block, then
you'd have a model that can make this kind of prediction: given the input text "Shawshank", it outputs "Redemption". That's one kind of work. But language is a little more complex, and that is not the only mechanism you need to generate it. You need another mechanism, which is called attention. We can think about attention this way: what if we give the model this sentence and try to have it complete it: "The chicken did not cross the road because it..." Now, does "it" refer to the road or to the chicken? It's very difficult to rely on the words that usually, traditionally, statistically appear after the word "it"; that would produce a meaningless sentence in a lot of cases. You need to work out whether we are talking about the road or about the chicken, and that is the goal, the purpose, of the second layer, the attention mechanism. How does it do that? It's built in a specific way that we don't need to dissect here, but that's its goal, and it learns this from a lot of the data it was trained on, which we can go into next. So these are the two major components: a Transformer model, including a GPT model (the T in GPT is Transformer), is multiple Transformer blocks, and each Transformer block is a self-attention layer and then a feed-forward neural network, each with its own goal. Once you stack them into a model that is large enough, and train it on a large enough dataset, you start to get these models that generate code, that can summarize, that can do copywriting, and you can build these new industries of AI writing assistants on top of them.

That perfectly makes sense; it's a really good explanation. I've struggled for a while, even, I think, yeah, three years ago, when GPT-3 came out, and I don't know why, but I think it's always the case for new technologies.
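The two components Jay describes can be sketched numerically. The vectors and weights below are invented for illustration; real models learn them during training and use many attention heads over high-dimensional vectors, but the mechanics are the same: attention mixes token representations by relevance, and the feed-forward network transforms the result:

```python
import math

def softmax(xs):
    # Turn raw scores into weights that sum to 1.
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention(query, keys, values):
    # Scaled dot-product attention for a single query vector.
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    out = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]
    return out, weights

def feed_forward(x, w1, w2):
    # Two-layer network with ReLU: the component that stores
    # "Shawshank -> Redemption"-style statistics. Weights are made up.
    hidden = [max(0.0, sum(xi * w for xi, w in zip(x, row))) for row in w1]
    return [sum(hi * w for hi, w in zip(hidden, row)) for row in w2]

# Toy embeddings (invented): "it" points the same way as "chicken", not "road".
vecs = {"chicken": [1.0, 0.1], "road": [0.1, 1.0], "it": [0.9, 0.2]}
keys = values = [vecs["chicken"], vecs["road"]]
out, weights = attention(vecs["it"], keys, values)
print(weights)  # "it" attends more to "chicken" than to "road"
```

With these made-up vectors, the attention weight on "chicken" comes out higher than on "road", which is the disambiguation the mechanism exists to learn.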
It was really hard to understand it well enough to explain it properly, and you definitely mastered it. I love how you separated the different topics and didn't dive into the details too much. I often get stuck in the details, because I like how attention calculates the attention for each word, et cetera, and I really noticed the details that you deliberately didn't mention; I think it's quite important not to mention them, as you've done. I still need to learn how to best explain things, but it's really nice to see you explain something I already know and still teach me new things. It's really cool.

It helps to do a lot of iteration, to do it over and over again, to explain it to people and notice: I said this, and their eyes started to defocus a little bit, so maybe this was a bit too much detail, let me delay it. You can still mention the details, but I love to layer it: you get one part of the concept, then you go a little deeper into another part, but you get the full concept first, at a high level, and then a little more resolution on each part. That's a philosophy I've seen work over the years.

And for a regular presentation it's also a good format to follow, even just to say it explicitly: this is the broad overview, I will dive into the details later, but for now just focus on this. I think just mentioning that makes it more engaging: you may feel a bit lost, but you know the detail is going to come, so you don't feel lost. It's a better way of explaining, for sure. Even for anyone listening who isn't a teacher or doesn't create content, it's still relevant if, for example, they are working and they need to present
something, any kind of presentation or knowledge sharing; it's really relevant to learn or improve how you share it. Now, everyone talks about ChatGPT, so I would love it if you could go over the different steps, from the self-supervised part to the fine-tuning to the reinforcement learning from human feedback. How would you explain those quite complicated steps in simple words?

Yeah, I do intend to write something about human preference training at some point; there are different methods, with and without reinforcement learning. So, how training works. One of the things that makes these models work now is that we can train the model on a lot of unlabeled data. We can just get free text from the internet, from Wikipedia for example, or from books, or from any dataset, and use it to create training examples in this self-supervised (sometimes called semi-supervised) way. Say we take one page from Wikipedia, maybe the page about the film The Matrix, or any article, and say: okay, that page has 10,000 words, let's create a few training examples. Let's take the first three words and present them to the model, and have the model try to predict the fourth word; that's a training example. Then we have another example, where we give it the first four words and have it try to predict the fifth. You can see that we can just slide this window along and create millions or billions of training examples, and that's what happens in the beginning. This is why they're called language models: this is a task in NLP called language modeling. Now, that turned out to be one of the most magical things, one of the biggest returns on investment that the technology ecosystem has ever given us back, because with this you can go so far, and in ways that really surprise even the people working closely with this technology.
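The sliding-window idea can be sketched in a few lines; the "page" below is a stand-in for real training text:

```python
def make_examples(text, context_size=3):
    # Slide a window over free text to create (context, next-word) pairs:
    # the self-supervised language modeling objective needs no labels.
    words = text.split()
    examples = []
    for i in range(context_size, len(words)):
        examples.append((words[i - context_size:i], words[i]))
    return examples

page = "Neo discovers that the Matrix is a simulated reality"
for ctx, target in make_examples(page)[:3]:
    print(ctx, "->", target)
```

Every position in the text becomes a free training example, which is why a large corpus yields billions of them. Real pipelines work on tokens rather than whitespace-split words, but the principle is identical.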
If you do this with large enough models on large enough data, the model will be able to retain information, world information, so you can ask it about people and it will tell you who acted in The Matrix, and on what date, and that information starts being baked in. It starts to generate very coherent text that sounds right and is grammatically correct; how does it do that without us writing any of the grammar rules into it? And if you train it on a large enough multilingual dataset, it starts being able to do that in multiple languages. So language modeling is one of the magical things that brought this massive explosion in the capability of software and AI; it's the source where all of this starts, and it's the first step in training these large language models, the one that takes the most compute and the most data, so this can take months and months of training. In machine learning, you start with a model with, I don't know, any number of parameters, but they're random in the beginning, so the predictions the model makes are junk. But it learns from each training step: we give it the three words and have it predict the fourth, its prediction is going to be wrong, and we say: no, you said this, but this is the correct answer, let's update you, so that next time you see this you have a little better chance of getting it right. That step happens millions or billions of times; this is the training, the learning, in machine learning: making a prediction, updating the model based on how wrong that prediction was, and doing it over and over again. That is the first and most expensive step: creating the baseline model.
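That predict-check-update rhythm can be caricatured in a few lines. This is emphatically not gradient descent (real models adjust billions of continuous parameters), but the loop has the same shape: predict, compare with the right answer, nudge the model:

```python
from collections import defaultdict

# Toy "parameters": a table of scores, context word -> next word -> score.
counts = defaultdict(lambda: defaultdict(int))

def predict(word):
    nxt = counts[word]
    return max(nxt, key=nxt.get) if nxt else None

def train_step(word, target):
    if predict(word) != target:    # prediction wrong (or absent)...
        counts[word][target] += 1  # ...update so next time we do better

for w, t in [("Shawshank", "Redemption")] * 3:
    train_step(w, t)

print(predict("Shawshank"))  # -> Redemption
```

Before training, the model's prediction for "Shawshank" is nothing at all; after a few update steps it reliably produces "Redemption", which is the whole point of repeating the step billions of times at scale.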
But you have to do a lot of prompt engineering, because you can ask the model a question, say "how do apples taste?", and the model, based on just what it has seen in the data, may ask more questions back: "how do oranges taste? how do strawberries taste?" Those are all reasonable continuations: you give it a question, it gives you more questions, maybe changing the fruit type. But what people actually wanted from their interactions was: if I ask you a question, give me an answer; if I give you a command and tell you to write an article about apples, I want you to write an article, not give me more commands. This is what's called preference training. To do it, you collect training examples of a question and its answer, or a command, "write me an article about X", paired with the article about X, and then you train the model on this dataset. That's how you get the behavior people started expecting from the model: follow my instruction. That's what Cohere's Command model is tuned to do, and that's what InstructGPT started doing and how it improved on GPT-3. Then you can get a bit more out of the behavior with another training step, which sometimes includes reinforcement learning: not just doing language modeling on this new dataset, but also giving the model good examples and bad examples and saying, okay, move closer to the good examples and further from the bad examples, as rated by, say, a reward model. But a lot of people don't need to get into that complexity. As long as you understand the language modeling objective and the idea of preference training, that gets you most of the understanding you need. Then just focus on how it can be relevant and useful for your own product.
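For concreteness, here is a rough sketch of what the two kinds of training data just described might look like. The example texts and field names here are hypothetical; actual datasets differ between providers, but the shape (a prompt paired with the desired answer, and a prompt paired with a preferred and a rejected response) is the core idea.

```python
# Hypothetical shapes of the two dataset types described above.
# Preference/instruction tuning: a prompt paired with the desired
# completion, so the model learns to answer rather than continue.
instruction_examples = [
    {"prompt": "How do apples taste?",
     "completion": "Apples taste sweet and slightly tart..."},
    {"prompt": "Write an article about apples.",
     "completion": "Apples are one of the most widely grown fruits..."},
]

# Reward-model data: the same prompt with a preferred and a rejected
# response, used in the optional reinforcement-learning step.
preference_pairs = [
    {"prompt": "How do apples taste?",
     "chosen": "Apples taste sweet and slightly tart...",
     "rejected": "How do oranges taste? How do strawberries taste?"},
]
```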
For the product you're trying to build, think about what kinds of prompts and what kinds of pipelines or chains are useful; for the vast majority of people that's much better than understanding the Bellman equations and the detailed reinforcement learning steps. Regarding the different products you can build with those models, I know you talk a lot about that in LLM University, and one thing I believe is super important and promising, other than for example fine-tuning the command models, is to use embeddings and build applications on them, like memory- or retrieval-related applications, or any other kind of semantic search, classification, etc. I have two questions about that. First: what are embeddings, and what can you do with them? And second: which do you find more promising, trying to make the perfect ChatGPT-style model with lots of fine-tuning, the best commands possible, and human feedback to make it perfect, or using a model just for embeddings and building very specific applications like those? Or are they just very different, and both relevant? Yeah, there are going to be people who use both. There are people who will just use different prompts, send them to a large model, and get the results back; you see a lot of this on LinkedIn, "these are the top 10 prompts to use". There's a class of people that will find that useful. But there's another class, which I advocate for, that thinks of these tools as components you can build more and more advanced systems out of, where you're not just consuming one service or one model; you're actually building with them, as a builder yourself. When I advocate for that, the idea of embeddings is one of the most powerful and most central
ideas. Just as "API" stopped being only a technical term over the last 10 or 15 years and became a business term that CEOs have to know, embeddings, I believe, will become one of those words, because they are one of the central components of how you deal with large language models and build bigger systems. Embeddings, in short, are numerical representations of text. They can be of words: methods like word2vec give each word a series of numbers that represents it and captures its meaning. From words we can also go to text embeddings, where a list of numbers represents an entire text: a sentence, an email, even a book, so to speak. That concept is very important if you choose to be a builder with LLMs. One of the best things I advise people to build, to get a sense of what embeddings are, is something involving semantic search. Take a dataset, let's say The Matrix film's Wikipedia page, break it down into sentences, embed each sentence, and then you can create a simple search engine on that dataset. It works like this: you give it a query, say "when was The Matrix released?". That text is also embedded: you send it to an embedding model, something like Cohere's embed endpoint, and you get the numbers back. Then you do a simple nearest-neighbor search, which is also very simple, like two lines of code, and it will give you the top three or top five sentences that are closest to that query. The beautiful thing here is that regardless of the words you use, the model captures the meaning; even if you don't use the same words, it captures the intent. That's why, when these models were rolled out, especially the BERT model in 2019, about six months later Google rolled it out into Google Search.
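The semantic search recipe just described, embed the sentences, embed the query, take the nearest neighbors, can be sketched as follows. The `toy_embed` function is a stand-in I made up for illustration (a crude letter-frequency vector); in practice you would call a real embedding model such as an embed endpoint, which is what actually gives you matching by meaning rather than by letters.

```python
import math

def toy_embed(text):
    # Stand-in for a real embedding model: a crude letter-frequency
    # vector. Real embeddings capture meaning, so a query can match
    # a sentence even when they share no words.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def nearest(query, sentences, top_k=3):
    # The "two lines of code" nearest-neighbor step: score every
    # sentence against the query, return the closest ones.
    q = toy_embed(query)
    scored = [(cosine(q, toy_embed(s)), s) for s in sentences]
    return [s for _, s in sorted(scored, reverse=True)[:top_k]]

sentences = [
    "The Matrix was released in March 1999.",
    "The film stars Keanu Reeves as Neo.",
    "Bullet time became a signature visual effect.",
]
print(nearest("When was The Matrix released?", sentences, top_k=1))
```

Swapping `toy_embed` for a real embedding call turns this toy into the dense retrieval pattern discussed below.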
Google called it one of the biggest leaps forward in the history of search, just from the addition of that one model, and most semantic search services have these two capabilities you can build with. What we just described is called dense retrieval: you embed your archive, you embed your query, and you get the nearest neighbors. That's one major concept I advise people to build with. The other one is called rerank. Reranking is just using a language model to change the order of search results produced by a previous step: you throw your query at your existing search engine, you get the top 10 results, and you pass those to the reranker, which changes their order. If you have an existing search system, this dramatically improves the quality of the results. These two components, each with its own endpoint and super-high-quality models on the Cohere side, are maybe the two best ways to start working with large language models, because they are the future of generation as well. Retrieval-augmented generation is absolutely one of the most exciting areas, one that lets you rely on information you can serve to the model when you need it: you can update that data whenever you want, you can give different users access to different datasets, and you're not reliant on data stored inside the model. If you want to update what the model knows, what, train it for another nine months? And relying on the model's internal knowledge also increases hallucination. So there's a lot of excitement in this area that brings together semantic search and generation, and we think it's highly worthwhile to pay attention to it. Yeah, retrieval is definitely, as you mentioned, a great way not to avoid but to limit the hallucination problem.
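The retrieval-augmented pattern just described comes down to assembling a prompt from retrieved passages. A minimal sketch, with the generation call itself left out (any LLM API would slot in after the prompt is built); the exact wording of the instruction is my own illustration, not a prescribed template:

```python
# Sketch of retrieval-augmented generation: retrieved passages are
# injected into the prompt, and the model is instructed to answer
# only from them and to cite its sources.

def build_grounded_prompt(question, passages):
    context = "\n".join(
        f"[{i + 1}] {p}" for i, p in enumerate(passages)
    )
    return (
        "Answer the question using ONLY the numbered passages below. "
        "Cite the passage number you used. If the answer is not in "
        "the passages, say you don't know.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )

passages = ["The Matrix was released in March 1999."]
print(build_grounded_prompt("When was The Matrix released?", passages))
```

Because the model is pointed at numbered passages, its answer can carry a citation back to the source, which is exactly the "answer only with what it found, and give the source" behavior discussed next.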
It doesn't work all the time, but you can try to force the model to answer only with the retrieved response and to give references for what it answers: when it searches its memory and finds the nearest neighbors, you can ask it to answer only with what it found and to give the source. That's really powerful compared to ChatGPT, which just gives you text that is hopefully true, and you don't even know where it comes from. So it's definitely safer, and also, as you said, easier to build: you don't need to retrain the whole model, and you can build multiple applications super easily. But I'm not that familiar with the rerank system. Could you give a bit more detail on how it works, how it actually reorders the answers and improves the results? Sure. Rerankers are these models where, let's say you're Google and you have your existing search system from before Transformers: you give it a query, it gives you 100 results. The easiest way to power your search with large language models is to say: for these 100 results, take the query and each result, present them to the model, and have the model evaluate how relevant this result is to the query. The reranker is basically a classifier over two pieces of text; it's what's called a cross-encoder. You give it examples of a query and a relevant answer, where the training label is one because that's a true pair, and then a query and a document that is not relevant to it, where the training label is zero. That's how you train it, and once it's trained, you just plug it into an existing search system. The previous step can use embeddings or not, that's fine, but the reranker gives a relevance score for each of the 100 results, and then you just sort by that relevance; that becomes one signal for your search.
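The reranking step just described, score each (query, result) pair, then sort by the score, can be sketched like this. The trained cross-encoder is replaced here by a trivial word-overlap scorer of my own invention; in practice the score would come from a trained model or a rerank endpoint.

```python
# Sketch of reranking: a relevance score for each (query, document)
# pair, then sort. The toy scorer stands in for a cross-encoder.

def toy_relevance(query, document):
    # Stand-in for a trained cross-encoder's 0..1 relevance score:
    # here, just the fraction of query words found in the document.
    q = set(query.lower().split())
    d = set(document.lower().split())
    return len(q & d) / len(q) if q else 0.0

def rerank(query, results, top_k=10):
    scored = sorted(
        results,
        key=lambda doc: toy_relevance(query, doc),
        reverse=True,
    )
    return scored[:top_k]

hits = [
    "Keanu Reeves filmography and awards",
    "The Matrix 1999 release details",
    "Matrix algebra for beginners",
]
print(rerank("matrix 1999 release", hits, top_k=2))
# -> ['The Matrix 1999 release details', 'Matrix algebra for beginners']
```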
You can either use that signal alone and sort by the most relevant, or you can combine it with other signals. If you're rolling out an actual search system, you want other signals: say you prefer more recent documents, so you add a recency signal; or if you're building search for Google Maps, you want things that are closer to a given point, so that's another signal; or user preference. So that's how rerankers work: you can sort by relevance directly, or use relevance as one additional signal in a more complex ranker. Okay, much clearer now. So that's basically the easiest way to use large language models when you already have a search system or a dataset. More generally, for anyone in a company, a manager, someone who has a problem in their regular work, how do they know their problem can be helped by LLMs? Are there any tricks or tips for recognizing "oh, now I should use an LLM, or embeddings, or a bit of code here"? How can you know an LLM will help with a given problem? Yeah, that's a great point, and the common wisdom is: use the best tool for the job. LLMs are great for some use cases; they're not good for everything. There are a lot of cases where somebody wants to use an LLM and I would advise them: no, you should use a regular expression for this, or spaCy for this use case, or just Python string and text matching. LLMs are one additional tool that adds a level of capability to your system; they should augment existing things that work. That understanding is important. Some people are driven by the hype and want to inject AI somehow; they told their investors "we will roll out AI in our next product", so let's find any way to do it. No, it really
should come from, let's say, user pain and the problem you're trying to solve, and you can classify two major parts there. One is improving a specific text processing problem you have right now, where you might get better results with a different approach. You have to choose the metric that will improve your product or solve your pain, and then compare LLMs against existing strong baselines, because a lot can be done with tools that are not LLMs. Once you see that the LLM is providing value, that's when you adopt it, and providers like Cohere make that easy in the sense that you don't need to worry about deploying a model, or about models going out of memory because they need to fit on tens of GPUs. Just worry about: you want to rerank? Send me the query, send me the 10 texts, and I'll send you back the ordered list, and I'll make it better the next time you send a request, because I'm updating the model and training it on newer and better data every month or with every version. So this is a new type of provider of this technology. But yeah, definitely focus on the problems the models can solve. Improving existing text processing is one category: search systems, classification. But there's also the new capability of text generation: AI writing systems were not possible three years ago, so these are new categories. You might want to innovate: "I will create the next interactive AI game, or the next media format; I want to create a world like GTA with all of its radio stations, but all of it generated by computers." Creating something new is the second category, and experimentation allows a lot of new applications to be born. Not long ago, when AI was getting started, you needed to train your own model and, as you
mentioned, host it in the cloud or somewhere, and there was a lot to manage. Now, thanks to OpenAI, Cohere, and other companies, you can basically have someone else do that for you. But there are still challenges in building those LLM-based apps. For example, say I have a specific dataset in my company, a very private company, and the dataset cannot go outside the company's intranet. What can you do then? Cohere and OpenAI, for example, are all outside the intranet, so what can you do if you want to build some kind of search-based chatbot? Yeah, that's a great question and a very common concern we come across, one of the biggest for companies in the industry, and specifically for enterprises: large companies and companies working in regulated spaces. Cohere actually caters to that. There is a solution of bringing the models to your own virtual private cloud: there's a rollout with AWS SageMaker where the model can be deployed in your own cloud. The data does not go to Cohere's infrastructure; it remains in your own data center, but it's run through the SageMaker endpoint. That's one of the use cases where we see a lot of demand, and Cohere's focus on enterprise makes it possible to focus on use cases like this: not specifically consumer-focused, but on the big business problems of building the next generation of applications. I'm glad you highlighted it, because this is commonly asked for, and we'd love to see more people building with it. It's great to know that it's possible. And for people with different problems, not large companies, for example if someone is learning and wants to build an app,
what are the main challenges when it comes to building such Cohere- or OpenAI-based apps, where you use the very powerful models that already exist but want to fine-tune or adapt them to your specific application, either through a dataset or just specific commands? What are the typical challenges people who want to create cool things with these models need to tackle? Yeah, the challenges don't all have to be technical. Everything from before remains true: you still need to find product-market fit, you need validation from your users, you need to really solve a problem and not build something that's just nice to have. With the generative models specifically, identifying reliable use cases is something a lot of people need hand-holding on. They come across an amazing demo on Twitter, but they don't realize that a lot of demos are cherry-picked: the author had to generate 20 outputs to get that one. If you're building a product, it cannot work just three out of 10 times; it needs to work nine out of ten times. So how do you get it to that level? That gap, between a proof of concept ("this prompt can work, I'll take a screenshot and put it on Twitter") and reliable system behavior you know you can put in front of your users and that will always work: bridging that gap is one of the challenging things a lot of people have to contend with. There are solutions and playbooks, and we write and educate a lot about them. They include using search and using embeddings. Fine-tuning is another one: the big models let you prototype and produce amazing behaviors, and once you have the model able to do the behaviors that you want
using one example in the prompt, or five examples in the prompt, you can then make it cheaper and faster by collecting that dataset and fine-tuning a smaller model that can do the same task as well as the larger model. That saves on context size, because you're not sending the same five few-shot examples with every prompt. Another part of this, which we also touched on with semantic search, is getting the relevant bits and injecting them into the prompt. Some people may think that context length will solve all the problems: with a very large context window, you would send, say, the documentation of your software to the language model with every question you ask about that document. You can clearly see that that is wasteful: to answer a thousand questions, the model has to process the same documentation thousands of times. Embeddings really are a way of caching that knowledge and retrieving only the important bits. So yeah, those are a couple of things; experimenting and thinking about reliable behavior is one of the learning curves a lot of people have to go through. What skills and materials are needed for that? First, can one person do it alone? If I want to create an app, do I need a whole team, a server, a course beforehand? What is the required skill set? Is it impossible for one person, or can a listener who has an idea just start and learn along the way? How accessible is it? So, in software in general, you need a user interface, right? If you're targeting users, how will they interact with it? Or are you creating, say, an API endpoint that other people can connect to? So there are a bunch of
software hurdles that are not necessarily language modeling or prompt engineering: the piping of that information, and how your users connect with it. If you know Python and JavaScript, one person can go very, very far by investing in those two things. If you only know Python and machine learning or data science, you can create a proof of concept; you can use something like Streamlit to create an application with a user interface that you can demo, maybe to investors, to help you build the next level of it. And more and more you see companies like Vercel coming along and making that front-end-to-AI pathway a little easier. Language models will keep making it easier for generalists to do many things very well. We're still at the beginning of that, but it's clear that productive individuals will become massively more productive, aided by these technologies, so smaller groups of people will be able to do a lot more. For now, though, there are these skill sets: how are you going to build the UI, and how will you put it in front of users? You can roll it out on the App Store or through some marketplace. Can you do that individually? Do you know that customer segment? Do you really know the pain you can solve for them? A lot of people do run one- or two-person companies: charge credit cards, use various frameworks, and put a good-looking UI on top. But then the question is whether you have enough of a competitive advantage that somebody else doesn't copy you once your service is popular. That's another challenge: are you building enough of a moat, a competitive advantage around your product, so others can't just steal your idea and your UI? Yeah, indeed. You talked about how generalists
can now do more thanks to AI, and that will only increase, and that's really cool, because I believe I am somewhat of a generalist. I really like to know about everything, and even though I'm doing a PhD I'm not completely sure about being super specialized in one thing and forgetting the others; I really like to learn everything, and that's a very recurrent topic on the podcast. It's funny: years ago a lot of people said that AI would increase the discrepancy between rich and poor and make things even more unfair than they were, and now, I'm not sure I have all the data, but I believe we see almost the opposite. In my case at least, and for lots of people I know, AI actually allows people to do things they couldn't do before, which is quite cool; it democratizes lots of things, for example building applications. One of my friends is currently doing a challenge where she is learning to use ChatGPT and posting about it daily for 30 days. She works in human resources and doesn't know any programming, but she still built a to-do-list application thanks to ChatGPT, without any notion of Python, JavaScript, or coding at all. That's incredible; it's so cool. A hundred percent, and I predict that we'll start to see not one but many five-person companies reach a billion dollars in valuation pretty soon, aided and augmented by AI. Definitely a lot of opportunities created, but also a lot of challenges that we need to be cautious about: opportunities for misuse, as well as the need for people to keep learning, keep developing their skill sets, and use these new technologies in their own workflows to augment what they do and make it better and
better. You can't just rely on what you learned in college; the world keeps changing very quickly, so the quicker you are at learning, adapting, and incorporating these tools into what you do, the more of the opportunity you will catch and the better you'll resist the challenges. Speaking of challenges, one last one that I often struggle with when using ChatGPT or other models is hallucination. Is there any way other than memory retrieval, or is that the only way to solve it? Is there any other way to reduce hallucination, or in general to make these applications safer and, as OpenAI says, more aligned with what you actually want? There are two sides to this question: obviously you can address it during training, as OpenAI does, but what if you're using an OpenAI or Cohere product and want to make it safer on your end? Is there anything you can do to mitigate the model's hallucination even when you don't control the training process? We already mentioned one of the big ones: actually injecting the correct information, so you're not relying on the model's parametric knowledge. There are also methods you as an engineer can build systems around; a lot of them were outlined in Google's Minerva model paper, which solves a lot of very complex mathematics. That's where we heard about things like chain of thought: you ask the model a complex question and it shouldn't answer right away; it should output the steps by which it arrives at the answer. Then there's another method called majority voting, where the model outputs not just one result but maybe ten, and you see which answers occurred more than once and use those as votes; that's specifically when there's one specific output expected at the end.
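The voting idea just described fits in a few lines. In a real system the samples would come from repeated calls to a model with non-zero temperature; here they are just given as a list.

```python
# Sketch of majority voting (sometimes called self-consistency):
# sample several answers to the same question and keep the one
# that recurs most often.

from collections import Counter

def majority_vote(samples):
    counts = Counter(samples)
    answer, count = counts.most_common(1)[0]
    agreement = count / len(samples)
    return answer, agreement

# e.g. five sampled answers to the same math question:
samples = ["42", "42", "41", "42", "40"]
answer, agreement = majority_vote(samples)
print(answer, agreement)  # -> 42 0.6
```

A low agreement score is exactly the variance signal mentioned in the discussion: if the answers scatter across samples, the model is probably "being creative" rather than retrieving something it knows.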
So that's another way; there's a paper around it, and the method is called majority voting. Close to it is the recent idea of tree of thought, which is like chain of thought but with multiple chains. It's one way of evaluating: if the model generates the answer four or five or ten times and says the same thing over and over, there's probably a good chance it knows it; but if there's variance across those five or ten generations, that's probably an indication the model is just being creative. And then there are things like temperature: setting the right temperature mitigates that to a certain degree. Yeah, it's an easy way to at least mitigate very random answers, though it also increases the cost, since you have to generate multiple times; still, it's a very easy thing to do. One last question I have, specifically about large language models: how do models like ChatGPT work with many languages? How are they built and trained on many languages? There's definitely a difference between ChatGPT, which works with almost every language, and, say, the model Facebook released that was trained specifically on French; those are not the same thing. So what are the differences? How does ChatGPT work with any language you can type in? So, multilingual models are built by incorporating other languages in the training dataset, and by initializing the tokenizer, the step that comes before training the model that chooses how to break down the words, accordingly. But it's really just the same training process. It is language modeling, predict the next word, except in our dataset we have a lot of
other languages, which we also use to evaluate the models: how coherent is the model in this language and that language? You use that in your evaluations, because if you're serving these models, if you're serving a model like Cohere's Command, the one we put out is not the only model we've trained; you have to train tens or hundreds of models across a lot of different experiments to really find the best-performing one, and do a lot of complex evaluations. So if multilingual is one of your focus areas, and for us it very much is, there's a lot of incorporating it into the training data but also into the evaluations. We also focus a lot on multilingual on the embedding side: we have an embedding model that supports over 100 languages, completely geared toward search in multilingual settings. You have to pay extra attention when building the model to incorporate languages, because it was very easy in the beginning to just focus on English and not consider that the vast majority of people also speak other languages and need them in their day-to-day usage. So it's mainly just trained with even more data. And I believe, from research, maybe you can confirm, that just as for humans, training on different languages actually improves the results in English as well? Yes, there are strange things like that: training on code, for example, also enhances generating text with common sense or reasoning capabilities. The more high-quality data you throw in, the more it seems to improve the results. Yeah, it's just like humans: I don't remember exactly what it does to your brain, but learning a musical instrument actually helps you understand other things better. It doesn't exactly make you more intelligent, but it definitely helps you. It's not
irrelevant to learn art or a musical instrument, and it's the same with different languages: you're basically a different person when you speak another language, which is also super interesting. I wonder if that's the case for large language models, whether they act differently in different languages. Yeah, the distributions would be different in the different languages, and I love how this extends to multimodal models: once you throw audio in there, once you throw images in there, how that could then apply to generation. Really exciting. How do you see AI evolving over the next, I don't know, it's the classic question, the next five years? How do you see AI and large language models evolving in the next few years? Where are we going: mostly toward specific applications with retrieval systems, toward better general models with less hallucination, or somewhere else? Yeah, retrieval-augmented generation and embedding approaches, I feel, are here to stay; things like semantic search and retrieval are things you can't really do with generative models alone. There will be a lot of development in the models themselves, in the quality of data presented to them, and in the quantity and types of data. We're now at internet-scale text data; where do you go next? We talked about multimodality: that's one area where the models will improve, by being able to look at images or even generate them, or by taking in audio and other modalities. Beyond that, there's the idea of embodiment, where models interact with environments and learn from those interactions; that will be another source of information and feedback to improve behavior. And then there's the idea of social interaction: how models can socially interact with large groups of people, not just one person who gives it a prompt and
gets a result back. Those are three of the five "world scopes" that one paper I've discussed on my YouTube channel lays out for where we're going to get more data now that we've trained on internet data. So that's the modeling front, how to make the models better: new architectures, improvements, hardware, all of that will continue to develop, even though right now there's a bit of convergence; there haven't been major steps on the modeling side for a while. But there's still a lot to be done in engineering: rolling these models out and building systems around the capabilities we currently have. There's so much to be done there that it will keep many people busy for the next two or three years, as well as inventing ways of using the media formats that are now possible, generating images, text, full stories, full podcasts. So yeah, the world will be a little different, and a lot of people are going to be very creative in what they deploy; a lot of it will come from the engineering side, not only from modeling. On your end, is there one thing that AI cannot do right now that you would love it to be able to do? Yes, one thing, it's a little random, but I'm really obsessed with the nature of learning about intelligence in software, and having software solve problems in intelligent ways makes me very intrigued by other natural intelligences beyond humans: animal intelligence, dolphins, the octopus, the ant colony, the apes. There are efforts like Project CETI (C-E-T-I), which is trying to throw all the NLP technology we have at decoding the language and vocalizations of whales, to see whether we can start to understand and maybe communicate with these
other you know forms of of intelligent life around us that we sort of don't have yet ways of communicating to them I'm absolutely passionate about you know this language being able to allow us to to connect better to our more intelligent um uh forms of life around this yeah it's so cool I've always been drawn into how we understand things and just also how a cat sees the world and just all the animals and and living beings it's it's really cool that like it's like neuroscience and all this is a completely well not completely but it's a different field and now lots of people come from Pure software and they become interested in that in in neuroscience and these topics just thanks to language models and how it makes you think of how things understand its it's really cool and I'm excited to see where Ai and like my field well our field can can help the human race to understand other other things it's it's it's really cool I have one last question for you just because I'm it's a topic that I'm particularly interested in it's on your end since you are a blogger and now an even a YouTuber first are you using any AI help when creating educational content well not necessarily llm but maybe AI editing or just generation or asking questions brainstorming are you using any AI power tools to make your writing process better or just creating creative process better play not on let's say a daily basis but yes sometimes for like outlines or idea generation uh that are useful or like some artwork or thumbnails sometimes like mid-journey has been has been useful for for some of these um but uh like everybody I'm just learning how to adapt them into into my workflows toward the investment I didn't see myself use it until very recently and now I've I've seen a particular a particular use case for me just when I'm it's mainly because I'm French but and not a native English speaker but it it's really helpful to help improve your formulation and syntax that's one thing just because it 
helps me improve but another thing is when you I'm still currently learning lots of new stuff and I still try to explain them while learning and why I see a word that I don't understand or a topic that is that seems a bit blurry even if I have the paper and I I think I understand them asking GPT or any other model is quite useful it actually like reformulates and it can be very useful to quickly get high level understanding of specific topics that that's something I've been using recently and it took me a while to get into that which is weird because we are actually explaining how they work but we don't use it nearly as much but it's now now I see a better use case like the more the more time that we have with them the better we we use them obviously and that's yeah I see it's really promising it's it's really cool but yeah you definitely have to double check the outputs and ensure it's not hallucinating or anything else it is it still requires human inputs but it's really useful and so just as a it's not really a question but the last thing I wanted to mention is there anything you would like to share with the audience are you do you have any project projects other than the large language model University that of course um anyone can go right now through cohere for free learn a lot about Transformers and everything we discussed in this interview and it's a really good resources resource I I definitely recommend it but is there anything else on your end that you are excited to share about or to release soon or work on yeah so aside from the llm University we have the cohere Discord where we answer questions so if you have questions as you go through the llm university join us let us know what you want sort of uh to to learn about we're happy to sort of help you with your learning education and then when you build something we're also welcome that you share it and say you know what problems you faced how you solve them uh so it's a community to learn together and 
you know we welcome everybody on the uh career Discord that's awesome is there anything coming up uh on your YouTube or or the blog so I've been doing a bunch of shorts I've been digging deeper into these tools that build on top of llms like Lang chain like llama index so I've been doing a few of these of these shorts so that's a little bit of my focus area now but in terms of topics if I can carve some time to talk about uh human feedback and other Chef that's high on my list yeah I'd love to do I'd love to see that so perfect well thank you very much for all the time you you gave us and did the amazing insights it was a really cool discussion to have with you I I've known you for only two years unfortunately I didn't know your blog before that but it's just amazing resources and likewise for the llmu I'm yeah I'm really thankful for for you to you and and your team for building that but also to you personally for the YouTube and the blog it's just really cool that people like you exist so thank you and and thank you for joining the podcast thank you so much that's so kind of you uh you know I'm just a student like any other and we're just learning together thank you so much for having me and uh looking forward to uh yeah interacting it and speaking together in the future\n"
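The semantic search and retrieval idea highlighted in the discussion, ranking documents by embedding similarity to a query, can be sketched with a minimal, self-contained example. The hand-written vectors below are toy placeholders; in a real system the embeddings would come from an embedding model (for example, an API such as Cohere's embed endpoint), and the nearest-neighbor search would use a vector index rather than a linear scan.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def semantic_search(query_vec, corpus, top_k=2):
    """Rank (text, vector) corpus entries by similarity to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in corpus]
    scored.sort(reverse=True)
    return [text for _, text in scored[:top_k]]

# Toy 3-dimensional embeddings; real ones would be hundreds of dimensions.
corpus = [
    ("How to train a multilingual model", [0.9, 0.1, 0.2]),
    ("Recipe for sourdough bread",        [0.1, 0.9, 0.1]),
    ("Fine-tuning LLMs on new data",      [0.8, 0.2, 0.3]),
]
query = [0.85, 0.15, 0.25]  # pretend embedding of "training language models"

print(semantic_search(query, corpus))
```

The key property is that retrieval happens entirely outside the generative model: the most relevant texts are found by vector similarity and can then be supplied to an LLM as context, reducing reliance on its internal knowledge.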