Using Vector Databases for Better LLM Results with Ram Sriharsha, CTO at Pinecone

**The Economics of Large Language Models**

One of the significant benefits of using vector databases with LLMs is that the workflow becomes more economical. As noted in the conversation, "the more you use databases in the loop and the more you leverage them, the cheaper it's going to turn out to be." This is backed by research showing that moving knowledge the model already has into a vector database reduces the overall cost of using these models. Pinecone has found, for example, that GPT-3.5 paired with a vector database containing the very data it was trained on achieves better groundedness and retrieval than GPT-4 on its own: a weaker, cheaper model plus a strong vector database can match or beat the state of the art.
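
To make the setup concrete, here is a minimal sketch of the retrieval-augmented pattern described above, where a cheaper model answers from retrieved context. The names `embed`, `vector_db.query`, and `cheap_llm.complete` are illustrative placeholders for your embedding model, vector database client, and LLM client, not any particular library's API.

```python
def answer_with_retrieval(question, vector_db, embed, cheap_llm, top_k=5):
    """Ground a cheaper LLM on retrieved context instead of relying on a
    larger model's parametric knowledge (hypothetical clients passed in)."""
    query_vector = embed(question)                         # embed the user question
    matches = vector_db.query(query_vector, top_k=top_k)   # nearest-neighbor search
    context = "\n\n".join(m["text"] for m in matches)      # stitch retrieved passages

    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    return cheap_llm.complete(prompt)
```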

**Choosing the Right Model for Your Task**

It's also important to note that not every task requires the best available language model. Giving an LLM a large context does not guarantee it will separate what matters from what doesn't; that is usually the hard part, and the top-performing models distinguish themselves precisely in that ability and in general reasoning, partly because they have been trained on so much data. For tasks that don't demand those strengths, a lesser, cheaper model is often sufficient. And as new models emerge, particularly open-source ones, the landscape keeps shifting, so what was the best choice last month may no longer hold true.
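
One practical way to act on this is to route requests by task type, reserving the expensive model for reasoning-heavy work. The sketch below is purely illustrative: the model names and the `classify_task` helper are assumptions, not any vendor's API.

```python
# Hypothetical task-based routing: send reasoning-heavy requests to the
# expensive model and retrieval-grounded lookups to a cheaper one.
ROUTES = {
    "reasoning": "large-frontier-model",    # multi-step reasoning and synthesis
    "extraction": "small-cheap-model",      # answerable directly from retrieved context
    "summarization": "small-cheap-model",
}

def pick_model(request_text, classify_task):
    task_type = classify_task(request_text)             # e.g. keyword rules or a tiny classifier
    return ROUTES.get(task_type, "small-cheap-model")   # default to the cheaper tier
```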

**The Importance of Adaptability**

Another crucial consideration is adaptability. As the field advances, you need to be able to swap out or adjust models to keep performance and cost in balance. The speaker's advice is that the more you can treat the LLM as a black box behind your own abstraction, something you can optimize away and replace, the better, because the landscape is changing quickly. Prompt engineering still deserves care, since prompts are specific to each model, but designing your abstractions so that the LLM can be swapped out is ultimately the right call.
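
A minimal sketch of that abstraction, assuming Python and a single-method interface; the concrete provider you plug in and the prompt-template handling are placeholders for whatever clients and prompts you actually use.

```python
from typing import Protocol

class LLM(Protocol):
    """Anything that can complete a prompt; concrete providers implement this."""
    def complete(self, prompt: str) -> str: ...

class ChatbotService:
    def __init__(self, llm: LLM, prompt_template: str):
        self.llm = llm
        self.prompt_template = prompt_template  # prompts are model-specific, swap them with the model

    def answer(self, question: str, context: str) -> str:
        prompt = self.prompt_template.format(context=context, question=question)
        return self.llm.complete(prompt)
```

Swapping providers then means constructing `ChatbotService` with a different `llm` and its matching prompt template, leaving the rest of the application untouched.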

**Private Data and Evaluation Metrics**

Despite rapid progress, the speaker's own benchmarks still show a gap between open-source and proprietary models, even though it is narrowing. A major challenge is the shortage of good metrics and of private datasets the models have never seen, which makes the public leaderboard an unreliable guide. The most accurate indicator of an LLM's performance for a specific application is your own data and your own queries: try the candidate models on them. This underscores the importance of relying on real-world testing rather than leaderboard rankings when deciding which model to use.
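
A rough sketch of that kind of in-house evaluation is shown below, under the assumption that each candidate model exposes a `complete(prompt)` method and that `grade` is whatever correctness check fits your application (exact match, an LLM judge, human review); all of these names are illustrative.

```python
def evaluate(models, test_set, grade):
    """models: {name: object with .complete(prompt)}
    test_set: list of (prompt, reference_answer) pairs drawn from real usage."""
    results = {}
    for name, model in models.items():
        correct = 0
        for prompt, reference in test_set:
            prediction = model.complete(prompt)
            correct += grade(prediction, reference)  # 1 if acceptable, else 0
        results[name] = correct / len(test_set)      # fraction of acceptable answers
    return results
```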

**The Future of Language Model Development**

Language model research is moving quickly, and the field will keep evolving. The speaker notes that new models, particularly open-source ones, are appearing at such a pace that any assessment risks being obsolete within months. That churn creates challenges, but also opportunities for innovation and improvement. As a developer, it's essential to stay informed about the latest developments and adapt as the landscape shifts.

**Optimizing Workflows with Vector Databases**

Integrating vector databases with LLMs is a key lever for optimizing workflows. By moving knowledge the model already has into a vector database and retrieving from it at query time, developers can build more efficient, cost-effective applications. This approach has been shown to reduce cost while maintaining or improving answer quality.
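
The ingestion side of that workflow might look like the sketch below: chunk the source material, embed each chunk, and store the vectors so a cheaper model can retrieve them later. As before, `embed` and `vector_db.upsert` are placeholder interfaces, and the fixed-size chunking is deliberately naive.

```python
def index_documents(documents, embed, vector_db, chunk_size=500):
    """documents: {doc_id: full_text}. Chunk, embed, and store for later retrieval."""
    records = []
    for doc_id, text in documents.items():
        # naive fixed-size chunking; real pipelines usually split on document structure
        chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
        for n, chunk in enumerate(chunks):
            records.append({
                "id": f"{doc_id}-{n}",
                "values": embed(chunk),          # vector used for similarity search
                "metadata": {"text": chunk},     # keep the text for prompt construction
            })
    vector_db.upsert(records)                    # write all chunks to the index
```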

**Reasoning Tasks and the Limitations of LLMs**

Tasks that require strong reasoning are where top-performing language models clearly earn their price, largely because they have been trained on vast amounts of data. The same models also distinguish themselves at picking out what matters from a large context, which remains a hard problem for LLMs in general: handing a model the whole context is no guarantee it will focus on the relevant parts. For tasks that don't hinge on those abilities, lesser models are often perfectly adequate.

**Evolving Landscape and Recommendations**

As the landscape of language models continues to evolve, it's essential for developers to stay informed about the latest developments and adapt to changes in performance. The speaker offers several key takeaways:

* Use vector databases to optimize workflows

* Choose the right model for your task, considering both cost and performance

* Leverage prompt engineering to adapt to changing model performance

* Prioritize real-world testing and data over leaderboard metrics

* Stay informed about emerging models and technologies

By embracing these strategies, developers can unlock the full potential of language models and create more efficient, cost-effective applications.

"WEBVTTKind: captionsLanguage: enI'd like to talk about the the llm side of things as well now um because you mentioned that you're probably going want to pick like a decent like one of the best sort of llm available is when you start creating a chatbot but then you might swap it out for something um cheaper to run um can you think of when you're building an application can you think of the llm and the vector database separately or is there some kind of relationship between them that you need to worry about so first of all depending on the task at hand you might be able to use cheaper maybe less powerful LMS so that's already something that uh uh one should be aware of so you don't need gp4 for everything you don't need the most powerful llm for everything it's really T dependent but it it don't uh in some sense uh the quality of your vector database and and how scalable your VOR databases also has something to say here so one thing we have found and we've we've written about it and so one is you could take gpt3 GPD 3.5 you could TR take all the data that it was trained on just put that data into a vector database and use gbd 3.5 along with that Vector database and you have better uh groundedness and better ultimate uh retrieval capability than gbt 4 okay which means that as a weer model with a powerful Vector database even just retrieving on data that the model was already trained on can still do at least as good if not better than the state of the art models okay uh and you found this to be true even for some open source models as well so clearly this means that if you use Vector da basis together with llms you actually can uh the sum of Parts is actually bigger right in some so in some sense you can get more bank for the B by doing that um but even otherwise it's important for you to know that uh you don't you don't always need the best llm for every every task now in terms of cost the today the cost is dominated by large language models okay so uh language models really dominate this cost in fact the bigger language models provide you bigger context and that's actually expensive as well but sometimes bigger context actually helps so even understanding exactly what some workflow should cost you is pretty tricky today so you you might you might go with a bigger context with a bigger language model and end up with a cheaper workflow sometimes sometimes you can go with a smaller model with lesser context with more data in a vector datase and end up with a cheaper workflow but overall I think what I can say safely is that the more do you use up databases in the loop and more do you kind of Leverage it uh the cheaper it's going to turn out to be and in fact there's a lot of research that shows that you could even take a lot of what the language models already know and put them into the vect database and that's going to lead to a overall more economical workflow so I think economics is going to drive us in that direction any okay that's very interesting that just if you make more use of vector databases then you can go away with a cheaper language model and overall cost of GR in chatot might be less um I'm curious as to when you need like the best um sort of llm and when a lesser llm or a low power a cheaper one will do do you have a sense of which tasks uh you need for which yeah I think it's mainly uh reasoning tasks and uh so so the best LMS are really good at reasoning and really good at picking out uh in some sense paying attention to the even if you give some an llm a large context it 
it's not necessarily going to be good at picking out the things that matter from the things that don't matter okay uh this is this is usually the hard problem with giving llms the whole context okay uh so but we find that uh some of the some of the powerful El distinguish themselves in that ability okay and they distinguish themselves in the general reasoning ability that's also because they've been trained on a lot of data um so but again this is evolving as we speak as we speak there are newer models coming out particularly open source models that are increasingly challenging language models the propriety language models and so on so whatever I say here is the going to be obsolete in three months yeah okay uh yeah actually just on that note it just seemed like that the best LM just changes weekly uh and I guess you need to build your applications to make sure that it's possible to swap out the llm in order to you know optimize it is there anything you need to do in order to ensure that's possible yeah so I think the more you can treat your llm as uh in some sense of black box that you can actually optimize away the better because that landscape is changing quite a bit uh of course you really really need really need to understand prompt engineering because the prompt engineering is specific to llms and so on and so you want to be careful about that but if you can engineer your obstructions in such a way that you can swap out an llm that's that's I think eventually a good thing to do that said while we see that the leaderboard is changing quite often and like there's a lot of competition in some sense at the top for a language models one thing to keep in mind is that there is not a lot of private data there's not a lot of really good metrics and a really good data that the llms themselves have not seen so in some sense the the best indicator of an llm performance for your application is actually your data and your queries and trying it out because I can tell you from uh uh lot of benchmarks that we do and lot of data that we have collected and so on there is still a gap between open source models and proprietary models and so on even though that open that Gap is getting closed uh as we speak uh because in this in this entire space we are still missing really good metrics we still missing really good data sets that in some sense the models have no idea about and the ability to to generalize to that sort of data and those sort of queries and so on is what matters for people at the end of the dayI'd like to talk about the the llm side of things as well now um because you mentioned that you're probably going want to pick like a decent like one of the best sort of llm available is when you start creating a chatbot but then you might swap it out for something um cheaper to run um can you think of when you're building an application can you think of the llm and the vector database separately or is there some kind of relationship between them that you need to worry about so first of all depending on the task at hand you might be able to use cheaper maybe less powerful LMS so that's already something that uh uh one should be aware of so you don't need gp4 for everything you don't need the most powerful llm for everything it's really T dependent but it it don't uh in some sense uh the quality of your vector database and and how scalable your VOR databases also has something to say here so one thing we have found and we've we've written about it and so one is you could take gpt3 GPD 3.5 you could TR 
take all the data that it was trained on just put that data into a vector database and use gbd 3.5 along with that Vector database and you have better uh groundedness and better ultimate uh retrieval capability than gbt 4 okay which means that as a weer model with a powerful Vector database even just retrieving on data that the model was already trained on can still do at least as good if not better than the state of the art models okay uh and you found this to be true even for some open source models as well so clearly this means that if you use Vector da basis together with llms you actually can uh the sum of Parts is actually bigger right in some so in some sense you can get more bank for the B by doing that um but even otherwise it's important for you to know that uh you don't you don't always need the best llm for every every task now in terms of cost the today the cost is dominated by large language models okay so uh language models really dominate this cost in fact the bigger language models provide you bigger context and that's actually expensive as well but sometimes bigger context actually helps so even understanding exactly what some workflow should cost you is pretty tricky today so you you might you might go with a bigger context with a bigger language model and end up with a cheaper workflow sometimes sometimes you can go with a smaller model with lesser context with more data in a vector datase and end up with a cheaper workflow but overall I think what I can say safely is that the more do you use up databases in the loop and more do you kind of Leverage it uh the cheaper it's going to turn out to be and in fact there's a lot of research that shows that you could even take a lot of what the language models already know and put them into the vect database and that's going to lead to a overall more economical workflow so I think economics is going to drive us in that direction any okay that's very interesting that just if you make more use of vector databases then you can go away with a cheaper language model and overall cost of GR in chatot might be less um I'm curious as to when you need like the best um sort of llm and when a lesser llm or a low power a cheaper one will do do you have a sense of which tasks uh you need for which yeah I think it's mainly uh reasoning tasks and uh so so the best LMS are really good at reasoning and really good at picking out uh in some sense paying attention to the even if you give some an llm a large context it it's not necessarily going to be good at picking out the things that matter from the things that don't matter okay uh this is this is usually the hard problem with giving llms the whole context okay uh so but we find that uh some of the some of the powerful El distinguish themselves in that ability okay and they distinguish themselves in the general reasoning ability that's also because they've been trained on a lot of data um so but again this is evolving as we speak as we speak there are newer models coming out particularly open source models that are increasingly challenging language models the propriety language models and so on so whatever I say here is the going to be obsolete in three months yeah okay uh yeah actually just on that note it just seemed like that the best LM just changes weekly uh and I guess you need to build your applications to make sure that it's possible to swap out the llm in order to you know optimize it is there anything you need to do in order to ensure that's possible yeah so I think the more you can 
treat your llm as uh in some sense of black box that you can actually optimize away the better because that landscape is changing quite a bit uh of course you really really need really need to understand prompt engineering because the prompt engineering is specific to llms and so on and so you want to be careful about that but if you can engineer your obstructions in such a way that you can swap out an llm that's that's I think eventually a good thing to do that said while we see that the leaderboard is changing quite often and like there's a lot of competition in some sense at the top for a language models one thing to keep in mind is that there is not a lot of private data there's not a lot of really good metrics and a really good data that the llms themselves have not seen so in some sense the the best indicator of an llm performance for your application is actually your data and your queries and trying it out because I can tell you from uh uh lot of benchmarks that we do and lot of data that we have collected and so on there is still a gap between open source models and proprietary models and so on even though that open that Gap is getting closed uh as we speak uh because in this in this entire space we are still missing really good metrics we still missing really good data sets that in some sense the models have no idea about and the ability to to generalize to that sort of data and those sort of queries and so on is what matters for people at the end of the day\n"