What is Semantic Search (with Elan Dekel, VP of Product at Pinecone)

Semantic Search: Understanding the Difference from Traditional Search Capabilities

We've had search capabilities for documents, websites, and the whole internet for decades, so how is semantic search different? Search engines started by essentially searching over text strings: you type in a few characters, such as "Peter", and the engine searches through all the data in its database to find that exact sequence of characters. This was the initial type of search, and it has major limitations. If you misspell the name, you won't find anything relevant. The same happens if you search for a concept using a different word from the one in the document: a search for "dog" will not find an article that refers to a "canine".
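
The limitation of exact matching can be seen in a few lines of Python. This is only a minimal sketch: the three sample documents and the function name `exact_match_search` are made up for illustration and do not come from any particular search engine.

```python
# Hypothetical sample documents, invented for this illustration.
documents = [
    "Peter joined the search team in 2019.",
    "A canine unit patrols the warehouse at night.",
    "The quarterly report is due on Friday.",
]


def exact_match_search(query: str, docs: list[str]) -> list[str]:
    """Return the documents that contain the query as a literal substring."""
    needle = query.lower()
    return [doc for doc in docs if needle in doc.lower()]


print(exact_match_search("Peter", documents))  # the literal string is present
print(exact_match_search("Petre", documents))  # misspelling: nothing is found
print(exact_match_search("dog", documents))    # "canine" is not the string "dog"
```

The second and third queries return an empty list even though a human reader would consider the first two documents relevant.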

That's where semantic search comes in. A semantic search engine knows the meaning of what you're searching for, so it can handle things like misspellings and concepts that are worded differently in the document. Companies like Google have spent thousands and thousands of programmer years making their search engines aware of the semantic information in documents in order to deliver high-quality search. The great thing about using vector embeddings and large language models for search is that you can get almost to the quality of Google search just by using a model and feeding the text through it.

With vector embeddings and vector search, you don't need thousands of programmer years of work to build your own search engine. You also don't have to spend time working out which words are synonymous or nearly synonymous with other words; having that done for you automatically is a huge productivity boost. Building this kind of search by hand is very complex, and companies like Google and Microsoft have thousands of engineers making it work well. In a sense, large language models allow people to reproduce that level of quality in one fell swoop, and in many cases to exceed it.
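
To make the idea of searching by meaning concrete, here is a small sketch of vector search using cosine similarity. The three-dimensional vectors in `toy_embeddings` are hand-assigned stand-ins, an assumption made purely for illustration; in a real system an embedding model would produce much longer vectors, and a vector database would do the ranking at scale.

```python
import math

# Hypothetical, hand-assigned "embeddings". A real system would get these
# from an embedding model; the numbers here only illustrate the geometry.
toy_embeddings = {
    "dog":     [0.90, 0.10, 0.00],
    "canine":  [0.85, 0.15, 0.05],
    "puppy":   [0.80, 0.30, 0.00],
    "invoice": [0.00, 0.10, 0.95],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def vector_search(query, embeddings, top_k=2):
    """Rank every other entry by cosine similarity to the query's vector."""
    query_vec = embeddings[query]
    scored = [
        (cosine_similarity(query_vec, vec), word)
        for word, vec in embeddings.items()
        if word != query
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [word for _, word in scored[:top_k]]


print(vector_search("dog", toy_embeddings))
```

With these toy vectors, "canine" and "puppy" rank above "invoice" for the query "dog", even though none of the words share a character sequence with it.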

The Concept of Image Search

Image search is what we call a classification problem: it involves assigning images to groups based on their content or features. It comes in two forms, classification and extreme classification. Classification is the more straightforward of the two and identifies the kind of object in an image, for example "this thing here is a hamburger".

For example, if you're building a website where people upload pictures of their dinner and search over food, classification can figure out what each dish is. Extreme classification, in contrast, tries to find a specific individual. With face recognition, classification would say "this is a face" or "this is a person", whereas extreme classification would say "this is Peter" or "this is John". If you're building a system to verify identity, or some other security system, with millions of faces in its database, a picture you take of yourself can be matched to that exact face. Both kinds of problem can be supported by these techniques and by vector databases.
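
One way to picture the difference is that both problems can use the same nearest-neighbour lookup, and only the label space changes. The sketch below assumes images have already been turned into vectors; the two-dimensional vectors, the labels, and the helper `nearest_label` are all hypothetical.

```python
import math


def nearest_label(query, labeled_vectors):
    """Return the label of the stored vector closest to the query (Euclidean)."""
    best_label, _ = min(
        ((label, math.dist(query, vec)) for label, vec in labeled_vectors),
        key=lambda pair: pair[1],
    )
    return best_label


# Classification: many stored examples share a handful of category labels.
food_examples = [
    ("hamburger", [0.9, 0.1]),
    ("hamburger", [0.8, 0.2]),
    ("salad",     [0.1, 0.9]),
    ("salad",     [0.2, 0.8]),
]

# Extreme classification: every stored vector has its own label (one person each).
face_examples = [
    ("Peter", [0.70, 0.30]),
    ("Elan",  [0.72, 0.35]),
    ("John",  [0.65, 0.28]),
]

print(nearest_label([0.85, 0.15], food_examples))  # a category
print(nearest_label([0.71, 0.34], face_examples))  # a specific individual
```

In the extreme case the number of labels grows with the number of stored items, which is part of why it is the harder problem.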

Facial Recognition: A Key Component of Image Search

Facial recognition is the standard example of extreme classification. Saying whether there is a person in a video is much easier than saying which person it is; the latter means matching a face against a database that may hold millions of entries.

The process works by comparing the face in the image with the known faces in the database. If there's a close enough match, the system can identify the person in the image. This is what makes the technique useful for identity verification and other security systems.
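
The matching step can be sketched as a best-match lookup with a similarity threshold, so that an unknown face yields no identity rather than the least-bad guess. The face vectors, the `identify` helper, and the threshold of 0.95 are illustrative assumptions; a production system would use vectors from a face-embedding model and a threshold tuned on real data.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def identify(face_vec, known_faces, threshold=0.95):
    """Return the best-matching name, or None if nothing is similar enough."""
    best_name, best_score = None, -1.0
    for name, vec in known_faces.items():
        score = cosine_similarity(face_vec, vec)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


# Hypothetical face vectors, invented for this illustration.
known_faces = {
    "Peter": [0.9, 0.1, 0.2],
    "Elan":  [0.1, 0.9, 0.3],
    "John":  [0.2, 0.3, 0.9],
}

print(identify([0.88, 0.12, 0.22], known_faces))  # close to Peter's vector
print(identify([0.50, 0.50, 0.50], known_faces))  # not close to anyone
```

The linear scan over `known_faces` is fine for three entries; with millions of faces, this lookup is the part a vector database is meant to handle.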

Anomaly Detection: Using Vector Databases

Anomaly detection is another application of vector databases. It involves identifying situations that are unexpected, such as a person going through a data center doorway that people don't usually use. People have used Pinecone for security systems of this kind: a company with thousands of security cameras producing video feeds can look for these anomalies and raise an alert.

One way this can work is to store vector representations of the camera images in the database and compare new footage against what is normally seen. Footage that is unlike anything usual for that location, such as someone walking through a doorway that is rarely used, can then trigger an alert.
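
A simple version of this idea is distance-based anomaly detection: a frame is flagged when its nearest "normal" neighbour is too far away. The sketch below is one possible approach, not a description of how any specific deployment works; the frame vectors, the `is_anomaly` helper, and the `max_distance` value are assumptions chosen for illustration.

```python
import math


def is_anomaly(frame_vec, normal_vecs, max_distance=0.5):
    """Flag a frame whose nearest 'normal' neighbour is farther than max_distance."""
    nearest = min(math.dist(frame_vec, vec) for vec in normal_vecs)
    return nearest > max_distance


# Hypothetical embeddings of frames recorded during normal operation.
normal_frames = [
    [0.10, 0.10],
    [0.12, 0.08],
    [0.09, 0.11],
]

print(is_anomaly([0.11, 0.09], normal_frames))  # resembles normal footage
print(is_anomaly([0.90, 0.85], normal_frames))  # unlike anything seen before
```

In practice the threshold would have to be tuned per camera, since what counts as unusual differs from one location to another.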

Conclusion

Semantic search replaces exact string matching with search over meaning, and vector embeddings put that capability within reach without thousands of programmer years of engineering. The same techniques, backed by a vector database, extend beyond text to image classification, face recognition, and anomaly detection.

Transcript

Interviewer: We've had search capabilities for documents and websites and the whole internet for decades, so how is semantic search different?

Elan Dekel: It's a great question. Search engines started by essentially searching over text strings. You type in, say, a few characters like "Peter", and then it'll search through all the data in its database to find that exact sequence of strings. That was sort of the initial type of search. Obviously it's problematic: say you've misspelled the name, or say you're talking about a dog and the article that you're searching over refers to "canine" or something like that, you won't find any of that. So that's where we come into something called semantic search, which is where the search engine knows the meaning of what you're searching for, and it knows how to do things like handle misspellings. Those are much more sophisticated search engines, and companies like Google spent thousands and thousands of programmer years essentially making their search engine aware of the semantic information in the document to make a high-quality search. The great thing about using vector embeddings and using large language models for search is that you can essentially get almost to the quality of Google search just by using a model and feeding the text through it. Using vector embeddings and doing vector search, you don't need thousands and thousands of programmer years of work to build your own search engine.

Interviewer: Yeah, so it seems incredibly time-consuming trying to work out which words are synonymous or nearly synonymous with other words. Just having that done for you automatically has got to be a huge productivity boost.

Elan Dekel: Exactly, and it's very, very complex. Again, companies like Google and Microsoft have thousands of engineers making that work really well, and in a sense you can think of these large language models as, in one fell swoop, allowing people to reproduce that level of quality, if not better in many cases.

Interviewer: You also mentioned image search before. That seems like a very cool idea, but I'm not quite sure when you would use it. What are the sort of business applications of image search?

Elan Dekel: Yeah, a good question. There's different types of use cases. Image search is what we call a classification problem, so we have classification and then we have extreme classification. Classification would be like, "hey, this thing here is a hamburger". Say you're building a website where you can search over food or things like that, and people are uploading pictures of their dinner, so you can sort of figure out what that dish is. Extreme classification, on the other hand, is where you're actually trying to find an individual. For example, if it's people, you can do face recognition. If you have millions of faces in your database and you're building, say, a system to verify your identity or some sort of security system, and you take a picture of yourself, it can actually find an exact match of that face. Classification would say, "hey, this is a face" or "this is a person"; extreme classification would say, "oh, this is Peter, this is Elan, this is John". Both of those can be supported by these techniques and by vector databases, and again, there are many, many different use cases.

Interviewer: Yeah, I can certainly see how there's different levels of difficulty there. Just saying "is this a person in the video" is much easier than "is this Elan in the video".

Elan Dekel: Exactly. For example, people have used Pinecone for security systems. Think of a company that has security cameras, with thousands of cameras and feeds of video, and you want to find anomalies, situations that are unexpected, like "oh, a person is now going through that doorway in my data center which people don't usually go through; I'd like to get an alert". That's the kind of thing that you could build using a vector database.