[AI and the Modern Data Stack] #183 Adding AI to the Data Warehouse with Sridhar Ramaswamy

The Future of AI and Data Management: A Conversation with Sridhar Ramaswamy

In this conversation, we sat down with Sridhar Ramaswamy, CEO of Snowflake, to discuss the current state of AI and data management. We began with the recent surge in investment in AI companies, which has raised concerns about whether it is a bubble that is about to burst.

I think people are definitely going to be asking questions about revenue, about returns, and about what is actually going on over there. A 5% interest rate environment has a profound influence on startups, and I think it's hard for people to realize that the difference between zero and 5% interest rates is basically infinity. When we talk about large amounts of money being thrown at companies, it's not just investments from venture capitalists or private equity firms, but also from the big platforms themselves. These investments have largely turned into cloud spend, which isn't necessarily a real investment. As my colleague VI puts it, that's all about taking your balance sheet and converting it into revenue.

I do think that VCs are cautious about not throwing too much money into unknown companies, and I think the time of 100x revenue valuations is definitely a thing of the past. AI has perhaps another six to nine months before similar kinds of questions will be asked about revenue and revenue growth. Our customers, for example, are already asking us: how much should I invest, how should I be looking for ROI, where are we creating value? These are all good, hard questions that people need to ask, and honestly, avoiding some of the hype will keep us all in a better place. Bubbles don't do anyone any favors.

As for advice for organizations wanting to improve their data management or AI capabilities, Ramaswamy emphasized using the power of language models thoughtfully. He shared his own experience with Python date functions and how an assistant like ChatGPT makes writing code more efficient. "I have probably written code with the Python date functions for the past 20 years, but I can never remember exactly what they do," he joked.
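The task mentioned here, extracting a date from a string, can be sketched with Python's standard library. This is only an illustration: the sample string, the `YYYY-MM-DD` format, and the `extract_date` helper are assumptions, not something described in the conversation.

```python
import re
from datetime import date, datetime
from typing import Optional

# Assumed format for illustration: an ISO-style date (YYYY-MM-DD) inside free text.
ISO_DATE = re.compile(r"\b(\d{4}-\d{2}-\d{2})\b")


def extract_date(text: str) -> Optional[date]:
    """Return the first valid YYYY-MM-DD date found in text, or None."""
    for match in ISO_DATE.finditer(text):
        try:
            # strptime rejects impossible dates such as 2023-02-30.
            return datetime.strptime(match.group(1), "%Y-%m-%d").date()
        except ValueError:
            continue
    return None


print(extract_date("Invoice issued on 2024-02-29, due in 30 days"))
```

The regular expression only finds candidates; `datetime.strptime` does the actual validation, so a string that merely looks like a date is skipped rather than returned.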

Ramaswamy also highlighted the importance of embracing the value that AI provides, rather than just hyping up the latest thing. He encouraged CIOs and CDOs to think about how they can make existing processes more efficient through language understanding, knowledge understanding, and making it easier for people to talk to and query data. "I think that is a breakthrough," he said, and he would encourage every CIO and CDO to look for those opportunities.

Overall, our conversation with Sridhar Ramaswamy provided valuable insights into the current state of AI and data management. As companies continue to invest in these technologies, it's essential to approach them with caution and focus on creating real value rather than just hype.

The Breakthroughs in Language Models

One of the most exciting developments in recent years is the rapid progress being made in language models. These models can understand natural language and generate human-like responses, which has a profound impact on how we interact with data. Ramaswamy highlighted the importance of embracing this technology and using it to make existing processes more efficient.

Thoughtfully using the power of language models can make a real difference in productivity. For example, ChatGPT or similar tools can answer a specific coding question instantly, without digging through Stack Overflow posts. Ramaswamy shared his own experience: he has written code with the Python date functions for some 20 years but can never remember exactly what they do, so he simply describes the string he is trying to extract a date from and gets the answer right away.

Integration of AI infrastructure with existing systems is another area that Snowflake emphasizes. The company takes enormous pride in providing a platform that integrates with everything else, including access control and cloud spend. This allows customers to focus on creating value without getting bogged down in the technical details.

The Importance of Partnerships

Ramaswamy also emphasized the importance of partnerships in driving innovation in AI and data management. When companies partner with reputable firms like Snowflake, they can tap into a wealth of expertise and knowledge that helps them make the most of these technologies.

Partnerships are essential for creating value from AI investments rather than just throwing money at them. Ramaswamy noted that the time of 100x revenue valuations is behind us, and AI companies will need to focus on delivering real returns on investment. Working with an established partner gives companies access to the expertise that helps them achieve this goal.

The Value of AI

Sridhar Ramaswamy concluded our conversation by emphasizing the value that AI provides to organizations. While it's easy to get caught up in the hype surrounding new technologies, he encouraged CIOs and CDOs to focus on creating real value rather than just chasing after the latest thing.

The value that companies can get out of AI, he said, "basically comes from understanding language, understanding knowledge, making it easier for us to talk and query it. I think that is a breakthrough." By embracing this technology and using it to make existing processes more efficient, companies can unlock significant productivity gains and drive business growth.

I tell people I have probably written code with the Python date functions for the past 20 years, but I can never remember exactly what they do. To be able to simply type into ChatGPT, "Here, I'm trying to extract a date from a string that looks like this," and it instantly gives me that answer, so that I'm not fishing through eight Stack Overflow posts about doing that, I think that kind of productivity is super real. But I think the value that companies can get out of AI, which basically comes from understanding language, understanding knowledge, making it easier for us to talk and query it, I think that is a breakthrough. And I would definitely encourage every CIO, every CDO, to think about how they can make existing things that they do, that are tedious or difficult, more efficient.

Hi Sridhar, great to have you on the show.

Excited to be on the show with you, Richie.

Excellent. I'd love to have a bit of context on Snowflake. Your flagship product is a Data Cloud, so what makes this different to a data warehouse?

Well, data is at the center of most enterprises. That's what they run on, day in and day out. It is one thing to have a warehouse to store the data, but of course you want to do stuff with it, whether it is transforming it, or being able to run machine learning on top of it, or build applications on top, or have your partners bring their data to you so you have that context in one place, or others bring applications also. You begin to get the picture. Snowflake started as the place for all data 10-plus years ago, but over time this data has so much gravity that things like collaboration, applications, very different kinds of things that you can do with data, all begin to be part of the core offering from us and from our partners. That's what we mean when we talk about Snowflake being a Data Cloud.

Okay, so you can't just have the data warehouse where
things are stored; you need that application sort of layer and all these other bits on top of it. I'd love to discuss all these things in more detail. Before we get to that, can you tell me a bit about what sort of organizations are using Snowflake?

Well, much of the Fortune 1000, you know, the world's enterprise 2000, they are all using Snowflake. These include very large companies like Fidelity, across the board, across different industries. I would say that finance, healthcare, and media are some of our strong suits, but it spans the spectrum. Anybody that basically wants an authoritative view of data gravitates towards Snowflake, because they realize that things like our unique architecture, which offers very flexible and separated compute and storage, and also our business model, which is consumption based (you only pay for what you consume), make for a great addition to the IT space that pretty much every organization has. So we have broad adoption by a lot of people, and they love us because we just work out of the box, require very, very low maintenance, and are very cost efficient for the value that we bring to these enterprises.

Okay, so low maintenance and cost efficient sound like good things. I'd like to talk a bit about how generative AI has changed things. Obviously that's been the big story of this last year. Do you think the rise of generative AI has changed executive attitudes to data?

I think smart executives have always known that having their data story in a good place just made their job easier. If you look at some of our customers, Fidelity for example, they're open about the fact that we are the data layer for how they run their business. This is because they have a number of operational systems for doing things like, you trade stocks on Fidelity, it goes to an operational system. But those systems
are not really meant for visibility, not really meant for insight, and so they collect all of that data into Snowflake, and also have their partners bring that data to the same platform, so that they have the 360-degree view of it. I would say data, to a certain extent, has been an ongoing priority for enterprises. But to me, what really, really excites everybody, including me, including your mom, and the CEOs, about generative AI is they all go, "Wait, you mean I can talk to a computer in plain language and it's actually going to understand what I'm saying?" I think it's that that people are excited by. CEOs know, for example, that they have needed a bunch of analysts and a bunch of different tools, like dashboards and visualization tools and BI tools, in order to look at the data. I think people are super excited by the prospect of just better human-to-data communication, and that's the same attitude that we have at Snowflake. We think, "Oh wait, you mean we can create a chatbot for a specific data set, and you can just ask questions in English, and it'll do a good job of giving you answers, and if it can't give you an answer it'll just say, no, I can't do that?" We are very excited by being able to provide things like that. But I would say the core thing that all of us are, and should be, excited about is this idea that strange buttons, text boxes you have to enter data into, and magical incantations from software engineers and analysts are getting replaced by ordinary language. I think that's the real power of language models. Of course they can do a lot more, but to me, just that, if you can realize the value of it, is going to be a big, big deal for enterprises.

Yes, certainly the idea of having a natural language interface is much more, well, intuitive for many people, I think. Can you tell me how this idea translates to a competitive advantage for businesses?

Well, you know, for
Snowflake, for example, the idea of language models and AI is a great add-on, but the core advantage that we have is that thousands of enterprises trust us with their data. They bring all of the data about their businesses to their Snowflake instance. They set up different kinds of extraction pipelines, different kinds of visualization pipelines, and all of that is there. What AI now does is create this additional value on top, where this data is more easily accessible and insights are easier to get out. In that sense I see AI as a major accelerant for traditional enterprise software. Now, there is also going to be disruption. There are lots of new applications that are going to come up that were basically unimagined and unimaginable. Meaning, in 2001 and 2002 there were cell phones, you and I probably used them, and I remember this brick of a phone that I had back when I was working at Bell, it'd be a pound. But we could never really imagine Uber, because a whole bunch of other things needed to come together. Similarly, I think AI is going to throw up a whole new class of applications, everything from image generation to video generation. I don't know about you, but I don't really use memes anymore. I go to ChatGPT and type up a little description (I'll send you one after our podcast) of, "Hey, I'm talking with Richie about data, make me a little cartoon saying something funny about it," and out comes this cool cartoon. I think those kinds of applications will also be there. I think it will cause disruption in the media sphere. But for core enterprises, I think the nimble ones especially will adopt AI and use that as an accelerant for their core offering. Absolutely, that's what we're doing at Snowflake.

That's really... I hadn't really thought of the meme industry being the thing that's been disrupted. All the
millennials are like, "No, that's it, no AI for me." All right, cool. I don't know whether you have any more examples: have any of your early-adopter customers started creating some of these new applications you've been talking about?

Oh, totally. The kind of application that people are super excited by is just more fluid interaction with existing stuff. So, for example, the first project that my team launched... a little side story: I started a search engine called Neeva, among the first AI-powered search engines on the planet, and Snowflake acquired us in May last year, so we are experts in search and in AI, hot ingredients right now. But the first application that we launched was really just conversational Marketplace search. Snowflake has a marketplace where you can buy data sets, where you can buy applications, and we were like, "Ah, you should be able to type anything into that." Not just a three-word query; you can type sentences into it and we'll generate the right answers for you. Therein lies a kernel of an idea. A lot of the work that we do day to day is search over specialized corpuses, meaning we search for help when we're using a particular product, we search in Drive for specific documents, and on and on. I would say the prototypical application for AI is to take the data that is relevant to a particular context and put it into some sort of search index. You can use a vector index, you can use what's called an IR (information retrieval) index, or you can combine the two, as we're doing at Snowflake. So you search for that information, you take the output of the search, feed it into a language model, and ask it to generate a fluid, interactive chatbot. Now you're chatting with a data corpus. That's the earliest application that our customers are developing, and Snowflake makes it easy.
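The pattern described here (index the relevant data, search it, then hand the search output to a language model) can be sketched in a few lines of Python. This is a toy illustration under loud assumptions, not Snowflake's implementation: `embed` is a bag-of-words stand-in for a real embedding model, the `alpha` blend weight and the sample documents are made up, and the final call to a language model is left out, so the sketch stops at building the prompt.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text):
    """Stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def keyword_score(query, doc):
    """IR-style signal: fraction of distinct query terms found in the document."""
    q_terms = set(tokenize(query))
    d_terms = set(tokenize(doc))
    return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0


def retrieve(query, docs, k=2, alpha=0.5):
    """Blend the vector score and the keyword score, return the top-k matches."""
    q_vec = embed(query)
    scored = [
        (alpha * cosine(q_vec, embed(d)) + (1 - alpha) * keyword_score(query, d), d)
        for d in docs
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for score, d in scored[:k] if score > 0]


def build_prompt(query, passages):
    """Assemble the text that would be sent to a language model."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below. If the context does not contain "
        "the answer, say you cannot answer.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


# Hypothetical corpus, for illustration only.
docs = [
    "Snowflake Marketplace lists data sets and applications.",
    "Streamlit is a rapid prototyping environment.",
    "The penalty clause applies when a service is out of SLA.",
]
question = "What is the penalty if something is out of SLA?"
print(build_prompt(question, retrieve(question, docs)))
```

In a real system the vector score would come from learned embeddings, which capture meaning rather than shared words; the keyword score is what keeps the handful of query words that really matter from being lost.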
Just yesterday I was like, "Ah, I want to build an application using Streamlit," which is a rapid prototyping environment, and in about an hour I took a CNN news data set, stuck it into Snowflake, put up a vector index on it, and then used Streamlit and the language model so that you can search over this, you can interact with this corpus. Now, I'm not the best programmer in the world, those days are long gone, but I was able to do this, as I said, in less than an hour. That's the power that we bring. And there are also other applications: something that we call Snowpilot, which is a copilot that helps you write SQL. Lots of people are trying it out. We have another project that uses language models to extract structured information from things like contracts. Companies sign lots of contracts, with all kinds of magical numbers in them: what's the rev share, what's the penalty if something is out of SLA? They forget about these contracts and don't really know what goes into them, but people want to extract that structured information. So we have a project called Doc AI that helps people extract structured information from unstructured documents and puts it into a table, so that you can run classical analysis using SQL on top of that. We now have, I think, over 100 customers that are using it. It's in private preview and soon headed to public preview, so they can deploy it in production. Hopefully this gives you a flavor of the kinds of things. But I would say table-stakes application number one is: think customer support, think document search. How can we do this much better? How can we do it a lot more interactively? And then going all the way up to, "Oh, let's create a multimodal model that can look through PDFs and extract structured information." And there's a whole lot in between.

That's pretty amazing, that there are so many different applications there. And you mentioned that
even with fairly basic programming skills you could build something that actually added value in an hour.

Yeah, that's the dream.

All right, I'd like to get into some of these applications in a bit more detail. Maybe we'll start with search, since that's your forte. Now, I speak to an awful lot of chief data officers, and every single one of them complains that the data across their organization is stuck in silos. There's all this data, no one really knows where it is, they can't access it. It feels like AI and search are going to help with this. Can you talk me through how?

100%. Some of our larger deployments of Snowflake have 100,000 tables. That's nutty. If you're like, "Oh, I want information about this specific topic, where should I look?", it's really, really hard. Usually all of these devolve into a giant Slack channel in which you're like, "Hey, I'd like information about this project, does somebody know something?" It sort of comes down to that. And tools like Google don't really help, because they don't have the kind of deep context into specific data sets, what the semantics of them are, and so on. So at Snowflake we have an ambitious effort called Horizon, which basically makes creating shares and sharing data within an enterprise a whole lot easier. We help you figure out semantics. We help you figure out, for example, is this column email addresses, is this column some other kind of PII? Of course you can also put in information about tables, about schemas. And we have this effort to make it really easy for you to search through the data sets, again in natural language, and get to the data. Of course, access control is a big, big deal, and no company is going to say, "In the name of making data easily visible, I'm going to make everything visible." That is also a disaster.
But what we're doing is clever techniques by which you can search over the metadata and figure out, "Oh, there is this data set, but I actually don't have access to it. How do I request the owner of the data to provide me with access, because my request is a legitimate request?" So we think about the life cycle of data discovery, and then how that subsequently drives data sharing, and I think this is the kind of stuff that is going to help a whole lot. Then other aspects of AI that we will get into will make it easy for people to quickly query that data. Part of the objective of Snowpilot, the copilot effort within Snowflake, is that it should be able to use things like the previous queries, the context from the experts on any particular schema, to help future people write SQL in an easier fashion. But it all starts with having the data in the right place, having metadata attached to it, and making it super easy to discover and share data in a controlled way. That's what we're doing with Horizon.

That's really interesting, the idea that even if a data set needs to be kept private, you can still make the metadata public, or at least slightly more visible across your organization.

The metadata is searchable within the enterprise. In other words, you can separate out the two: the privilege to search over metadata is different from the privilege to actually run stuff on it. And again, we are in the business of providing data owners, enterprises, our customers, with the right tools. We think this is an interesting differentiator. By the way, we obsess about these details. We even have something called a future grant, where you basically say, "I want to give access to this particular schema, let's say revenue data from Europe, to Richie, but I also want to give the same access for all future tables that I'm going to create in the schema," because these things are living,
breathing things, and as new things come on you want to keep that access. Again, that's a choice that business owners can make.

Okay, so data access management is one of those things where it feels like it's no one's idea of a fun time, so maybe you do want all this stuff automated, so people aren't having to mess about with some of the technical details.

Yeah, but it's a matter of providing the right level of abstraction. Just saying everything is open is clearly not going to work. On the other hand, people are realizing that saying everything is closed doesn't really work either. So it becomes a question of what's the right level of abstraction that you offer to the administrators, to the business data owners, so that they can responsibly manage how data is shared. I always think of process as, it should be just enough: not too much friction, but not too little friction, where everything is open, either. That's the magical Goldilocks situation that we try to get our customers into.

So you mentioned before that you're using semantic search and these natural language interfaces to make all this work. This has been hyped up as a technology that's going to be much better than keyword search, that's going to solve a lot of our enterprise search problems. I was wondering how realistic that is. Are we at a point where all of our enterprise search problems are solved now, or is there still work to be done?

Oh, I mean, look, the basic problem is that there are a ton of applications used in the enterprise, like hundreds. We didn't realize it at Neeva: we were a 50-person company, around for only four years, and then we got bought by Snowflake, and we had to make a list of all of the software that we used and what data there was, and it just kept going and going and going. All of these are little silos. So I think that
getting all the data together in a queryable form is very much an open project. I don't think that is done. And things like access control: every application, remember, not only has data but has rules for who can access data, and the rules are typically also disjointed. So we have a number of connectors for bringing in data from different kinds of applications, like Salesforce, into Snowflake, so it really becomes more of a second brain for the enterprise, where all of this data sits. Only after you have the data does semantic search come into play and get you the right data. People are big fans of what's called vector indexing. It's an evolution of the same language model AI technology, really. What it does is take your English query, create an embedding out of it, and then look for documents that are roughly in the same space. The problem with vector search is that sometimes it lacks precision. It turns out that even if you type in 20 words, there are two or three of those words that really matter a lot, and so you need to make sure that the documents you return have those words. So I would say this is a rapidly evolving field. There is excitement about vector indexing because it can do some pretty magical stuff, but you also need to combine that with more traditional, what's called IR, information retrieval, techniques of the kind that were pioneered by Google. It is getting better, but it's not a press-this-button or sign-this-agreement-and-everything-is-done kind of situation. There's work to do.

Okay, it does sound like a lot of the success, then, is really based on the quality of the data, and how you're managing it, and how well you're working with metadata.

That's right, that's right, and how you bring it in. Let's face it, if a CIO is using
300 applications, they're not going to say, "I need a copy of each of the 300 applications somewhere else," or, "I need to figure out how to provide API access." These applications often have terrible APIs for accessing the data in them, because it's not really in their interest to provide you with the API. They're like, "Yeah, yeah, yeah, come to our application using our website." So it's work. It's tricky. It's not a gimme.

Okay, and it seems like, even beyond the tooling there, just for data quality you need to worry about processes and your organizational culture. I don't know whether you have any advice on how you might improve your culture to improve data quality and management.

You know, I think a culture of thoughtful inquiry, where you're like, "Let's use data wherever it is feasible. Let's look at the biggest needs that we have and make sure that we have the data to support them," which will drive a set of priorities for the organization. Every organization has, and you know this, all of us have more things to do than we can realistically get done. So, prioritization of what are the most important sources, how do we make sure that we have a handle on those, and then how do we provide visibility. I would say prioritization, and the mentality of really having data in a good place for the things that matter. How much revenue are you making? You'd better have good data on that. How much are you spending? You'd better have good data on that. Start by taking a top-down approach like that, and then prioritizing the biggest places where you need to invest in getting data and invest in insights on the data, is what I think is important. Too many teams, too many companies, will start these mega digital transformation projects: "We're going to be a digital-only, everything-in-one-place sort of company."
Those projects usually don't really succeed, because they basically try to do too much. So I think prioritization, thoughtfully using the tools that a company like Snowflake provides (we not only provide the data platform but also things like connectors), and prioritizing the right data sources so that they can be queried and insight can be built on top of them, there's no real substitute for that. There's no silver bullet that is going to solve data visibility problems in any complex enterprise. Life is just too complicated.

I do think you made a very good point there: most businesses should probably know how much money they're making and how much money they're spending, and just starting with that really high-value data is going to be useful. All right, I'd like to go back to applications. One thing that you were talking about earlier was SQL generation. This is one of the big promises of generative AI: instead of writing SQL, you can just write a natural language query. Can you tell me a bit about how that works in Snowflake?

Yeah. First of all, let me start out by saying that SQL generation on complicated schemas with poor metadata is not a solved problem. Don't let anyone convince you that a language model is going to look at a horribly designed schema and magically help you be proficient with it. That is just not where the tech is. However, if columns have good names, if there is additional metadata available on tables, if there are things like views, for example, that capture the essence of the data that is sitting in a schema, then language models can indeed help a whole lot. They can take all of this context, the metadata about tables, the metadata about columns, the metadata about value distribution in the columns, say they have access to previous
queries that have been run against a schema, and people have written comments on those queries; they can take all of that context and use it as an aid in generating SQL. That's what we do with Snowpilot. Because it's Snowflake, we have access to everything that I just said. We can bring all of those smarts, present them in a clever way to the language model, and tell the language model, "These are the tables you're dealing with, this is how they're normally joined, and this is the question that the user has. Can you think through the process of writing a piece of SQL for that?" In situations like that, the models do much, much better, and they're able to generate SQL for some pretty difficult problems. That's what we're doing at Snowflake. We take state-of-the-art models, whether it's a Llama 2 or a Mistral, and we have a pretty large team, several hundred engineers, that are working on things like fine-tuning these models to do better SQL. So we do a lot of work in the data prep. We're also looking into things like, can we fine-tune models with customer-specific information, but give them a copy so that their data is not mixed in with anyone else's, and can these models be much better at generating SQL for those customers because they have this additional content? So we have this team that's basically working on things like fine-tuning models for SQL generation. As I said, we have an effort that looks into understanding the metadata behind schemas, and we combine both of these into the copilot experience on Snowflake, which, unsurprisingly, is this pane on the right where you type in a query in English, and it generates a piece of SQL for you. You look at it, make sure that it is fine, and then you can hit run and the query runs in the worksheet. The next thing that we are working on is basically an API, a
programmatic version of the copilot, so that our customers can now build applications. The idea is that you point this API at a schema and embed the API into a tool, so that a user is able to ask questions, and underneath, the model generates the SQL, runs the SQL, and returns the result back to the user. That's the next thing that we are working on. But hopefully this gives you an idea of what the ingredients of Snowpilot are, how it is being deployed in practice, and where it is going to go relatively soon. The thing that I'll stress here is that this is very much software engineering. GitHub Copilot, likewise, uses models to help you write code, but really there's a lot of clever software engineering that goes into presenting the right context to the model so that it can do a great job. There's no magic of, "Hey, I have a couple of million lines of code, language model, help me do my thing." That's fiction.

It does sound very interesting that a lot of what you're doing seems to be prompt engineering. So you're basically just providing all that extra context in order to write good SQL, sort of in the background. The user says, "Well, this is my business problem," and then you provide all that background data.

It's actually multiple things. It is fine-tuning, which is where you take a model that is capable of doing a lot and teach it a bunch of additional context. For example, out of the box, none of these models are great at Snowflake SQL or other variant dialects of SQL, and so that's what you fine-tune. On the other hand, you also want to present very specific context. And it's more than... I would say, when you say prompt engineering, I think of that as a kid hacking around in ChatGPT. These are more software engineering systems that carefully construct what goes into a model, how many calls get made, and so
on, in order to solve a business problem. It's the difference between somebody writing one line of Python and getting something done, versus a team that is going to write a Python package that is going to be used by thousands and thousands of people. That's what I mean by it's real software engineering.

Yeah, I like that, putting the focus on the engineering side of things. Okay. So one thing you mentioned was that once you get to these big enterprise schemas, and if there's poorly labeled data, then that's where problems start to occur. I'm wondering, do people need to start designing databases differently in order to make this work? Do you need to have smaller schemas, or do you need to focus more on column names and metadata and things like that in order to get good SQL generation?

It's a great question. The good news is that BI tools have already been teaching enterprises to create these semantic layers. What language models struggle to do today, BI tools have been struggling with for the past 20 years. Unfortunately, it's not really standardized: dbt has its semantic layer, so does Power BI or ThoughtSpot or Looker, and so there are variants of these things. So the work that is needed to get a schema to a place where a language model is able to do a better job with it is similar to the work that enterprises do in order to get their data ready for BI tools. And we are actively looking at, if an enterprise already has this data, can we just use it? But overall, to answer your question, I would say data cleanliness, and making sure that there are not eight date columns where you need an archaeologist to figure out which one to use for a particular [unclear], that's just good software engineering practice. When I write code, for example, I write it from the perspective of (a) this code is going to live
forever, and (b) I'm going to come back to it three months from now and not remember a thing about what I did, because it's not the kind of stuff that stays in your brain forever. And so having that mentality, and really saying that things should be named appropriately, is very important.

Absolutely, I've certainly been...

I love the name next-gen, by the way. Whenever people start a new software module, they'll call it, you know, next-gen Foobar. And six months will go by and you're like, wait, what? This has been in production for six months, what do you mean this is next-gen? Because for a new person that shows up, that's the only thing that they know.

Yeah, something is last-gen, but yeah, badly named. All right. So yeah, I can certainly see how being able to maintain code is going to be incredibly important, so naming things is very useful. I'm curious as to how people use these AI features in practice. Is it people using natural language to do all the queries, or are there people who are just like, I don't want this, I just want to write SQL? Is there a mix? What do your users do?

So I would say it spans a spectrum. We wanted the power of language models to be available to all of our users, including our analysts. And we don't think of, say, Copilot as a replacement for a business analyst. It's going to make them more efficient, it's going to make some of their more mundane tasks a little bit easier. But there's more. So when we designed Snowflake Cortex, which is the AI layer that ships with every Snowflake deployment, we first of all wanted to make it super easy to use. So we in fact expose Cortex as a set of SQL functions. So for example, if you wanted to do summarization on a text column that is in a Snowflake table, that is as simple as making a SQL call to this summarize function. And you pick the model that you want to send this text to, and it'll take your instructions and generate a summary
for you. And the list sort of goes on and on. We also expose what we call Complete, which you can think of as basically assembling a prompt and sending it to a language model, but you can do this in SQL. This prototype that I was telling you about, the news search prototype, basically does that: it assembles a prompt in SQL and sends it to the language model. So that's step number one, which is that existing analysts are now able to use the power of language models to do sentiment detection, to do translation, to do summarization, to do structured data extraction from text. All of those things just come out of the box in Snowflake, courtesy of Snowflake Cortex. But we also designed these in a way that you can now begin to build applications like chatbots on top of it by combining semantic search with a language model. As I said, that's the ingredient for a chatbot. You can put all the documents for a particular topic into a Snowflake table, and then you can create a chatbot by which you can just have a conversation about those documents. But we also expose our most complex models, like the SQL generation model, to our customers, so that if they say, no, no, no, I do not want to call SQL functions, I don't want to use your easy-to-build chatbot, I want to build something amazing myself, I have the people that can do the software, we make that possible as well, using something called Container Services, which is our extensibility framework. Our customers can fine-tune models themselves, deploy them in Container Services, and then write applications that talk to these deployed models. But you get the idea, which is language models at every layer. We don't try to lock our customers into one way of doing things. We expose these at every layer so that they can mix and match what it is that they want to do. You know, my mantra is: simple things should be simple, complex things should be possible. And that's
really what Snowflake Cortex and Container Services are about.

I really like that there's a sort of gradual evolution. You mentioned there's a spectrum, so you can start with just doing simple things where everything is just generated in natural language, and then if you've got more technical skills, you can build up to doing things in Snowflake SQL.

That's right, that's right.

Excellent. And so you mentioned that often this is going to be whole data teams being involved, or maybe even software developers. So it does seem like working with data is very much a team sport these days. Can you talk about how you see your customers doing collaboration on some of these tasks?

You mean within their teams or with us?

Yes, so within teams, and I guess even in organizations, working with data and AI often tends to have several teams involved. So do you have any advice on effective collaboration techniques?

I mean, I think the first thing that I talked about when it came to collaborating was how do you make sure that silos are eliminated and data duplication is a thing of the past. This is where Horizon and all of the efforts that we're doing around collaboration come into play, which is, you know, don't reinvent the wheel. Again, most big enterprises have, you know, three implementations for every project, or things repeated over and over again, simply because communication is not all that effective. So at a very basic level, we want to make sure that we make it super easy for people to leverage the work of their colleagues and use the data. And then when you go up one step from there, to let's build applications, that is very much a collaborative team sport. Then you need one person that knows the data, probably more of an analyst type person. But if you're building a
chatbot, for example, you do want somebody that knows a little bit about prompt engineering and what language models do, and so on. And then you also need governance on these things. So you do have to get the right people together in order to do these projects. But part of our design with Cortex was that simple things, like an analyst being able to use a language model to do interesting things within the ambit of what they do day to day, should be much easier. Similarly, for an analyst that knows data and is looking at a new schema, making them productive quickly with Snowflake Copilot is something that sort of happens naturally. I think this does not take away from the need to bring people together with different skills in order to make a project succeed. On Copilot, for example, we have people that are search infrastructure engineers, we have people that know data really, really well, we have language model experts that are doing things like fine-tuning. We also have UX engineers, because you need to actually create a product that people love, that is easy to interact with. And then obviously you need product managers and designers in order to be able to get something like that done. Snowflake Copilot is a complicated project, but you get the idea: you do need to bring people together, and they need to have the right skills in order to get meaningful things done.

Okay, yes. So it really is about making sure that everyone's got access to the right information and that they're able to work at the appropriate level of technical skill for them.

That's right, that's right. And this is also where leads and managers are really important in this process. They have to have a mental idea of, this is what it takes to be successful with a project like this, these are the skills that need to be there. Sometimes with
small teams, you know, of two people, sometimes it's four people. But just thinking that through and assembling these teams is an essential ingredient for success.

All right. So I'd like to step away from the Snowflake talk now. You also have another job as a partner at Greylock.

I'm a venture partner. It's a part-time role, yeah.

Excellent, 'cause you know, you've got all that free time for an extra job. Okay, so I'd like to know what data and AI companies you are most excited about right now.

Yeah, so at Greylock we've invested in several of what we call foundation model companies. These are, we think, the infrastructure companies that will power the future. This is Mustafa's Inflection. Mustafa Suleyman is of course one of the world's renowned AI experts, used to work at DeepMind. They also invested in a company called Adept. So those are at the foundational layer. But we do think that there are a ton of enterprise application companies that are going to be interesting. We don't think AI is going to be quite like a magic sauce. As I was saying, you still need to acquire customers, you still need to have that moat, because pretty much everybody knows how to use GPT-3 or GPT-4 on an API call. The bar for excellence is quite a bit higher. And we're also excited by some of the newer generative applications. Everything from image generation to, I think, video generation is pretty exciting. I don't know about you, but I used to edit videos when my son used to play tennis. It's a mind-bogglingly tedious job to do. I used to try and make three-minute summaries out of matches that would last three hours. It's so hard to do. And whether it is video generation or video editing, I think there's just a lot of value that they're going to create. I think advertising and marketing are going to be changed in
a pretty big way. My son, who's also a software engineer, recently showed me this LLM-powered application for making experimentation on websites just a whole lot easier. And he's like, here, drop in this little piece of JavaScript, we'll run the experiment, we will generate potential variations and run experiments for you. So I think there's a set of these kinds of products that are going to be AI native, that are going to have a big impact on both enterprises and consumers.

I'm definitely very excited for all these generative video applications. It just seems like they've been kind of okay and coming soon for a while now, so yeah, I hope they get their spotlight. All right, so the other thing is that it seems like there has been a lot of money thrown at AI companies just over the last year. Is that something you think is going to continue, or do you think it's sort of a bubble that's about to burst?

I think people are definitely going to be asking questions about revenue returns and what is actually going on over there. You know, a 5% interest rate environment has a profound influence on startups. I think it's hard for people to realize that the difference between zero and 5% interest rates is basically infinity. And one of the things that happened last year, when we talk about large amounts of money being thrown at companies, is that quite a bit of it was investments by the CSPs, by the very large platforms, and these investments basically turned around into cloud spend. So I don't think of that as real investment. It's a little bit of, as my colleague VI puts it, taking your balance sheet and converting it into revenue. That's sort of what that is. I do think that VCs are cautious about not throwing too much money into unknown kinds of companies. And I
think the time of 100x revenue valuations is definitely a thing of the past. AI, I think, has perhaps another six to nine months before similar kinds of questions will be asked about revenue and revenue growth and things like that. When it comes to our customers, for example, already they are asking us, hey, how much should I invest, how should I be looking for ROI, where are we creating value? These are all good, hard questions for people to ask. And honestly, I think avoiding some of the hype will keep us all in a better place, because bubbles do not do anyone any favors.

Absolutely. So it sounds like it's good that there are some hard questions being asked before money's being thrown at it.

Of course.

Excellent. All right.

Money being thrown itself, of course, is a funny phrase, but yeah.

True. All right, so before we wrap up, do you have any final advice for organizations wanting to improve their data management or AI capabilities?

Yeah, I think that there are pretty solid breakthroughs when it comes to AI and language models. It is not hype. I think thoughtfully using the power of language models can make things more efficient. This whole writing code thing, there is a before and an after if you have an assistant, even just having access to ChatGPT. I tell people I have probably written code with the Python date functions for the past 20 years, but I can never remember exactly what they do. To be able to simply type into ChatGPT, here, I'm trying to extract a date from a string that looks like this, and it instantly gives me that answer, so that I'm not fishing through eight Stack Overflow posts about doing that, I think that kind of productivity is super real. I would say embracing what these models enable, and using a platform like Snowflake to build on top of. We take enormous pride in the fact that our AI infrastructure is seamlessly integrated with
everything else that's going on in Snowflake. The investments that you have in access control just carry over naturally to everything that we provide with AI. I think having partners like that, that are about providing real value and not just hyping up the latest thing that customers can put money into, is an important thing to keep in mind. But I think the value that companies can get out of AI, which basically comes from understanding language, understanding knowledge, making it easier for us to talk to and query it, I think that is a breakthrough. And I would definitely encourage every CIO, every CDO, to think about how they can make existing things that they do that are tedious or difficult more efficient. There's a whole bunch of those to go after.

Wonderful. All right, lots of opportunities there then. Excellent. So thank you very much for your time, Sridhar.

Thank you, Richie. This was a fun, fun, fun conversation.

Hi Sridhar, great to have you on the show.

Excited to be on the show with you, Richie.

Excellent. I'd love to have a bit of context on Snowflake. So your flagship product is a Data Cloud. So what
makes this different to a data warehouse?

Well, data is at the center of most enterprises. That's what they run on, day in and day out. It is one thing to have a warehouse to store the data, but of course you want to do stuff with it, whether it is transforming it, or being able to run machine learning on top of it, or build applications on top, or have your partners bring their data to you so you have that context in one place, or others bring applications. So you begin to get the picture. Snowflake started as the place for all data 10-plus years ago, but over time this data has so much gravity that things like collaboration, applications, very different kinds of things that you can do with data, all begin to be part of the core offering from us and from our partners. That's what we mean when we talk about Snowflake being a Data Cloud.

Okay, so you can't just have the data warehouse where things are stored, you need that application layer and all these other bits on top of it. So I'd love to discuss all these things in more detail. Before we get to that, can you tell me a bit about what sort of organizations are using Snowflake?

Well, much of the Fortune 1000, you know, the enterprise 2000, they are all using Snowflake. These include very large companies like Fidelity, across the board, across different industries. I would say that finance, healthcare, and media are some of our strong suits, but it spans the spectrum. Anybody that basically wants an authoritative view of data gravitates towards Snowflake, because they realize that things like our unique architecture, which offers very flexible and separated compute and storage, and also our business model, which is consumption based, you only pay for what you consume, make for a great addition to the IT space that pretty much every organization has. So we have broad adoption by a
lot of people, and they love us because we just work out of the box, require very, very low maintenance, and are very cost efficient for the value that we bring to these enterprises.

Okay, so yeah, low maintenance and cost efficient sound like good things. I'd like to talk a bit about how generative AI has changed things. Obviously that's been the big story of this last year. So do you think the rise of generative AI has changed executive attitudes to data?

I think smart executives have always known that having their data story in a good place just made their job easier. If you look at some of our customers, Fidelity for example, they're open about the fact that we are the data layer for how they run their business. This is because they have a number of operational systems for doing things like, you trade stocks on Fidelity, it goes to an operational system. But those systems are not really meant for visibility, not really meant for insight. And so they collect all of that data into Snowflake, and also have their partners bring their data to the same platform so that they have the 360-degree view of it. I would say data, to a certain extent, has been an ongoing priority for enterprises. But to me, what really, really excites everybody, including me, including your mom, and the CEOs, about generative AI is they all go, wait, you mean I can talk to a computer in plain language and it's actually going to understand what I'm saying? I think it's that that people are excited by. CEOs know, for example, that they have needed a bunch of analysts and a bunch of different tools like dashboards and visualization tools and BI tools in order to look at the data. I think people are super excited by the prospect of just better human-to-data communication. And that's the same attitude that we have at Snowflake. We think, oh wait, you mean we can create a chatbot for a specific data set and you can just
ask questions in English, and it'll do a good job of giving you answers, and if it can't give you an answer, it'll just say, no, I can't do that? We are very excited by being able to provide things like that. But I would say the core thing that all of us are and should be excited about is this idea that strange buttons, and text boxes you have to enter data into, and magical incantations from software engineers and analysts are getting replaced by ordinary language. I think that's the real power of language models. Of course they can do a lot more, but to me, just that, if you can realize the value of it, is going to be a big, big deal for enterprises.

Yes, certainly the idea of having a natural language interface is much more intuitive for many people, I think. Can you tell me how this idea translates to a competitive advantage for businesses?

Well, for Snowflake, for example, the idea of language models and AI is a great add-on. But the core advantage that we have is that thousands of enterprises trust us with their data. They bring all of the data about their businesses to their Snowflake instance. They set up different kinds of extraction pipelines, different kinds of visualization pipelines, and all of that is there. And what AI now does is it creates this additional value on top, where this data is more easily accessible, where insights are easier to get out. And so in that sense, I see AI as a major accelerant for traditional enterprise software. Now, there is also going to be disruption. There are lots of new applications that are going to come up that were basically unimagined and unimaginable. Meaning, in 2001 and 2002, there were cell phones, you and I probably used them, and I remember this brick of a phone that I had back when I was working at Bell, it'd be a
pound. But we could never really imagine Uber, because a whole bunch of other things needed to come together. Similarly, I think AI is going to throw up a whole new class of applications, everything from image generation to video generation. I don't know about you, but I don't really use memes anymore. I go to ChatGPT and type up a little description, I'll send you one after our podcast, of, hey, I'm talking with Richie about data, make me a little cartoon saying something funny about it, and out comes this cool cartoon. I think those kinds of applications will also be there. I think it will cause disruption in the media sphere. But for core enterprises, I think the nimble ones especially will adopt AI and use that as an accelerant for their core offering. Absolutely, that's what we're doing at Snowflake.

That's really, like, I hadn't really thought of the meme industry being the thing that's been disrupted. All the millennials are like, no, that's it, no AI for me. All right, cool. I don't know whether you have any more examples. Have any of your early adopter customers started creating some of these new applications you've been talking about?

Oh, totally. So the kinds of applications that people are super excited by are just more fluid interaction with existing stuff. So for example, the first project that my team launched, a little side story: I started a search engine called Neeva, among the first AI-powered search engines on the planet, and Snowflake acquired us in May last year. And so we are experts in search and in AI, hot ingredients right now. But the first application that we launched was really just conversational Marketplace search. Snowflake has a marketplace where you can buy data sets, where you can buy applications, and we were like, ah, you should be able to type anything into that, not just a three-word query. You can type sentences into it
and we'll generate the right answers for you. Therein lies a kernel of an idea. A lot of the work that we do day to day is search over specialized corpuses, meaning we search for help when using a particular product, and we will search in Drive for specific documents, and on and on and on. I would say the prototypical application for AI is to take the data that is relevant to a particular context and put it into some sort of search index. You can use a vector index, you can use what's called an IR, an information retrieval, index, or combine the two, as we're doing at Snowflake. So you search for that information, you take the output of the search, feed it into a language model, and ask it to generate a fluid, sort of interactive chatbot. Now you're chatting with a data corpus. That's the earliest application that our customers are developing, and Snowflake makes it easy. Just yesterday I was like, ah, I want to build an end-to-end application using Streamlit, which is a rapid prototyping environment. And in an hour, I took a CNN news data set, stuck it into Snowflake, put up a vector index on it, and then used Streamlit and the language model so that you can search over this, you can interact with this corpus. Now, I'm not the best programmer in the world, those days are long gone, but I was able to do this, as I said, in less than an hour. That's the power that we bring. And there are also other applications, something that we call Snowflake Copilot, which is a copilot that helps you write SQL. Lots of people are trying it out. We have another project that uses language models in order to extract structured information, say from things like contracts. You know, companies sign lots of contracts with all kinds of magical numbers in them: what's the rev share, what's the penalty if something is out of SLA? And they forget about these contracts and don't really know
what goes into them. But people want to extract that structured information. So we have a project called Doc AI that helps people extract structured information from unstructured documents and puts it into a table so that you can run classical analysis using SQL on top of that. We now have, I think, over 100 customers that are using it. It's in private preview and soon headed to public preview, so they can deploy it in production. Hopefully this gives you a flavor of the kinds of things. But I would say table stakes application number one is, think customer support, think document search: how can we do this much better, how can we do it a lot more interactively? And then going all the way up to, oh, let's create a multimodal model that can look through PDFs and extract structured information. And there's a whole lot in between.

That's pretty amazing, that there are so many different applications there. And you mentioned that even with some fairly basic programming skills, you could build something that actually added value in an hour. Yeah, that's the dream. All right, I'd like to get into some of these applications in a bit more detail. So maybe we'll start with search, since that's your forte. Now, I speak to an awful lot of chief data officers, and the one thing every single one of them complains about is that the data across their organization is stuck in silos. There's all this data, no one really knows where it is, they can't access it. It feels like AI and search are going to help with this. Can you talk me through how it's going to help?

100%. So, you know, some of our larger deployments of Snowflake have 100,000 tables. That's nutty. If you're like, oh, I want information about this specific topic, where should I look? It's really, really hard. Usually all of these devolve into a giant Slack channel in which you're like, hey, I'd like information about this project, does somebody know something? It sort of comes down to
it. And tools like Google don't really help because they don't have the kind of deep context into specific data sets, what the semantics of them are, and so on. So at Snowflake we have an ambitious effort called Horizon, which basically makes creating shares, sharing data within an enterprise, just a whole lot easier. You can tag, you know, we help you figure out semantics. We help you figure out, for example, is this column email addresses, is this column another kind of PII? Of course you can also put in information about tables, about schemas. And we have this effort to make it really easy for you to search through the data sets, again in natural language, and get to the data. Of course, access control is a big, big deal, and no company is going to say, in the name of making data easily visible, I'm going to make everything visible. That is also a disaster. But what we're doing are clever techniques by which you can search over the metadata and figure out, oh, there is this data set, but I actually don't have access to it, how do I request the owner of the data to provide me with access because my request is a legitimate request? So we think about the life cycle of data discovery and then how that subsequently drives data sharing. And I think this is the kind of stuff that is going to help a whole lot. And then other aspects of AI that we will get into will make it easy for people to quickly query that data. Part of the objective of Snowflake Copilot, the copilot effort within Snowflake, is that it should be able to use things like the previous queries, the context from the experts on any particular schema, to help future people write SQL in an easier fashion. But it all starts with having the data in the right place, having metadata attached to it, and making it super easy to discover and share data in a controlled way. And that's what we're doing with
Horizon.

That's really interesting, the idea that even if a data set needs to be kept private, you can still make the metadata public, or at least slightly more visible across your organization.

The metadata is searchable within the enterprise. In other words, you can separate out the two: the privilege to search over metadata is different from the privilege to actually be able to run stuff on it. And again, we are in the business of providing data owners, enterprises, our customers, with the right tools. We think this is an interesting differential. By the way, we obsess about these details. We even have something called a future grant, where you can basically say, I want to give access to this particular schema, let's say about revenue data from Europe, to Richie, but I also want to give the same access for all future tables that I'm going to create in the schema. Because these things are living, breathing things, and as new things come on, you want to keep that access. Again, that's a choice that business owners can make.

Okay, so data access management is one of those things where it feels like it's no one's idea of a fun time, so maybe you do want all this stuff automated so people aren't having to mess about with some of the technical details.

Yeah, I think it's a matter of providing the right level of abstraction. Just saying everything is open is clearly not going to work. On the other hand, people are realizing that saying everything is closed doesn't really work either. So it becomes a question of what's the right level of abstraction that you offer to the administrators, to the business data owners, so that they can responsibly manage how data is shared. I always think of process as, it should be just enough: not too much friction, but not too little friction, where everything is open, either. That's the magical Goldilocks situation that we try
to get our customers into.

So you mentioned before that you're using semantic search and these natural language interfaces to make all this work. This has been hyped up as a technology that's going to be much better than keyword search, that's going to solve a lot of our enterprise search problems. I was wondering how realistic that is. Are we at a point where all of our enterprise search problems are solved now, or is there still work to be done?

Oh, I mean, look, the basic problem is that there are a ton of applications used in the enterprise, like hundreds. We didn't realize it at Neeva. We were a 50-person company, around for only four years, and then we got bought by Snowflake. We had to make a list of all of the software that we used and what data there was, and the list kept going and going and going. And all of these are little silos. So I think that getting all the data together in a queryable form is very much an open project. I don't think that is done. And then there are things like access control: every application, remember, not only has data but has rules for who can access the data, and the rules are typically also disjointed. And so we have a number of connectors for bringing in data from different kinds of applications, like Salesforce, into Snowflake, so it really becomes more of a second brain for the enterprise, where all of this data sits.

And only after you have the data does semantic search come into play and get you the right data. People are big fans of what's called vector indexing. It's an evolution of the same language model AI technology, really. What it does is take your English query and create an embedding out of it, and then look for documents that are roughly in the same space. The problem with vector search is that sometimes it lacks precision. It turns out, even if you
type in 20 words, there are two or three of those words that really matter a lot, and so you need to make sure that the documents you return have those words. So I would say this is a rapidly evolving field. There is excitement because of vector indexing, because it can do some pretty magical stuff, but you also need to combine that with more traditional IR, information retrieval, techniques of the kind that were pioneered by Google. It is getting better, but it's not a "press this button or sign this agreement and everything is done" kind of situation. There's work to do.

Okay. It does sound like a lot of the success, then, is really based on the quality of the data, how you're managing it, and how well you're working with metadata.

That's right, that's right. And how you bring it in. Let's face it, if a CIO is using 300 applications, they're not going to say, "I need a copy of each of the 300 applications somewhere else," or "I need to figure out how to provide API access." These applications often have terrible APIs for accessing the data in them, because it's not really in their interest to provide you with the API. They're like, "Yeah, yeah, yeah, come to our application using our website." And so it's work. It's tricky. It's not a gimme.

Okay. And it seems like, even beyond the tooling, just for data quality you need to worry about processes and your organizational culture. I don't know whether you have any advice on how you might improve your culture to improve data quality and management.

You know, I think a culture of thoughtful inquiry, where you're like, "Let's use data wherever it is feasible. Let's look at the biggest needs that we have and make sure that we have the data to support them," which will drive a set of priorities for the
organization. Every organization, and you know this, all of us have more things to do than we can realistically get done. And so it's prioritization: what are the most important sources, how do we make sure that we have a handle on those, and then how do we provide visibility. I would say prioritization, and the mentality of really having data in a good place for the things that matter. How much revenue are you making? You'd better have good data on that. How much are you spending? You'd better have good data on that. Start by taking a top-down approach like that, and then prioritize the biggest places where you need to invest in getting data and in insights on the data. That is what I think is important.

Too many teams, too many companies, will start these mega digital transformation projects: "We're going to be a digital-only, everything-in-one-place sort of company." Those projects usually don't really succeed, because they basically try to do too much. So I think prioritization, thoughtfully using the tools that a company like Snowflake provides (we not only provide the data platform, we also provide things like connectors), and prioritizing the right data sources so that they can be queried and insight can be built on top of them. I think there's no real substitute for that. There's not a silver bullet that is going to solve data visibility problems in any complex enterprise. Life is just too complicated.

I do think you made a very good point there, that most businesses should probably know how much money they're making and how much money they're spending, and just starting with that really high-value data is going to be useful. All right. So I'd like to go back to applications. One thing that you were talking about earlier was SQL generation. This is one of the big promises of generative AI:
instead of writing SQL, you can just write a natural language query. Can you tell me a bit about how that works in Snowflake?

Yeah. First of all, let me start out by saying that SQL generation on complicated schemas with poor metadata is not a solved problem. Don't let anyone convince you that a language model is going to look at a horribly designed schema and magically help you be proficient with it. That is just not where the tech is. However, if columns have good names, if there is additional metadata available on tables, if there are things like views, for example, that capture the essence of the data that is sitting in a schema, then language models indeed can help a whole lot. They can take all of this context (the metadata about tables, the metadata about columns, the metadata about value distribution in the columns, and, say, previous queries that have been run against a schema, with the comments people have written on those queries) and use it as an aid in generating SQL.

That's what we do with Snowflake Copilot. Because it's Snowflake, we have access to everything that I just said. We can bring all of those smarts, present them in a clever way to the language model, and tell the language model, "These are the tables you're dealing with, this is how they're normally joined, and this is the question that the user has. Can you think through the process of writing a piece of SQL for that?" In situations like that, the models do much, much better, and they're able to generate SQL for some pretty difficult problems. And that's what we're doing at Snowflake. So we take state-of-the-art models, whether it's a Llama 2 or a Mistral, and we have a pretty large team, several hundred engineers, working on things like fine-tuning these models to do better SQL. So we do a lot of work in the
data prep. And we're also looking into things like, can we fine-tune models with customer-specific information, but give each customer their own copy so that their data is not mixed in with anyone else's? And can these models be much better at generating SQL for those customers because they have this additional context? So we have this team that's basically working on things like fine-tuning models for SQL generation. As I said, we have an effort that looks into understanding the metadata behind schemas. And we combine both of these into the Copilot experience on Snowflake, which unsurprisingly is this pane on the right, where you type in a query in English and it generates a piece of SQL for you. You look at it, make sure that it is fine, and then you can hit run, and the query runs in the worksheet.

The next thing that we are working on is basically an API, a programmatic version of Copilot, so that our customers can build applications. The idea is that you point this API at a schema and embed the API into a tool, so that a user is able to ask questions, and underneath, the model generates the SQL, runs the SQL, and returns the result back to the user. That's the next thing that we are working on. But hopefully this gives you an idea of what the ingredients of Snowflake Copilot are, how it is being deployed in practice, and where it is going to go relatively soon. The thing that I'll stress here is that this is very much software engineering. GitHub Copilot, similarly, uses models to help you write code, but really there's a lot of clever software engineering that goes into presenting the right context to the model so that it can do a great job. There's no magic wand of, "Hey, I have a couple of million lines of code. Language model, help me do my thing." That's fiction.

It does sound very interesting that a lot of
what you're doing seems to be prompt engineering. So you're basically just providing all that extra context in order to write good SQL, sort of in the background. The user says, "Well, this is my business problem," and then you provide all that background data.

It's actually multiple things. It is fine-tuning, which is where you take a model that is capable of doing a lot and teach it a bunch of additional context. So, for example, out of the box none of these models are great at Snowflake SQL, or other variant dialects of SQL, and so that's what you fine-tune. On the other hand, you also want to present very specific context. And when you say prompt engineering, I think of that as a kid hacking around in ChatGPT. These are more software engineering systems that carefully construct what goes into a model, how many calls get made, and so on, in order to solve a business problem. It's the difference between somebody writing one line of Python and getting something done, versus a team that is going to write a Python package that is going to be used by thousands and thousands of people. That's what I mean by it's real software engineering.

Yeah, I like that, focusing on the engineering side of things. Okay. All right. So one thing you mentioned was that once you get to these big enterprise schemas, if there's poorly labeled data, that's where problems start to occur. I'm wondering, do people need to start designing databases differently in order to make this work? Do you need to have smaller schemas, or do you need to focus more on column names and metadata and things like that in order to get good SQL generation?

It's a great question. The good news is that BI tools have already been teaching enterprises to create these semantic layers. What language models struggle to do today, BI tools have been struggling with for
the past 20 years. Unfortunately, it's not really standardized. dbt has its semantic layer, and so does Power BI, or ThoughtSpot, or Looker, and so there are variants of these things. So the work that is needed to get a schema to a place where a language model is able to do a better job with it is similar to the work that enterprises do in order to get their data ready for BI tools. And we are actively looking at, if an enterprise already has this data, can we just use it? But overall, to answer your question, I would say data cleanliness, and making sure that there are not eight date columns where you need an archaeologist to figure out which one to use. That's just good software engineering practice. When I write code, for example, I write it from the perspective of, A, this code is going to live forever, and B, I'm going to come back to it three months from now and not remember a thing about what I did, because it's not the kind of stuff that stays in your brain forever. And so having that mentality, and really saying things should be named appropriately, is very important.

Absolutely. I've certainly been...

I love the name NextGen, by the way. Whenever people start a new software module, they'll call it, you know, NextGen Foobar. And six months will go by, and you're like, "Wait, what? This has been in production for six months. What do you mean this is NextGen?" Because for a new person that shows up, that's the only thing that they know.

Yeah, somebody is last-gen. But yeah, badly named. All right. So I can certainly see how being able to maintain code is going to be incredibly important, so naming things is very useful. I'm curious as to how people use these AI features in practice. Is it people using natural language to do all their queries, then, or are there people who are
just like, "I don't want this, I just want to write SQL"? Is there a mix? What do your users do?

So I would say it spans a spectrum. We wanted the power of language models to be available to all of our users, including our analysts. And we don't think of, say, Copilot as a replacement for a business analyst. It's going to make them more efficient; it's going to make some of their more mundane tasks a little bit easier. But there's more. When we designed Snowflake Cortex, which is the AI layer that ships with every Snowflake deployment, we first of all wanted to make it super easy to use. So we in fact expose Cortex as a set of SQL functions. For example, if you wanted to do summarization on a text column that is in a Snowflake table, that is as simple as making a SQL call to this summarize function. You pick the model that you want to send this text to, and it'll take your instructions and generate a summary for you. And the list goes on and on. We also expose what we call complete, which you can think of as assembling a prompt and sending it to a language model, but you can do this in SQL. This prototype that I was telling you about, the news search prototype, basically does that. It assembles a prompt in SQL and sends it to the language model.

So that's step number one, which is that existing analysts are now able to use the power of language models to do sentiment detection, to do translation, to do summarization, to do structured data extraction from text. All of those things just come out of the box in Snowflake, courtesy of Snowflake Cortex. But we also designed these in a way that you can now begin to build applications like chatbots on top of it by combining semantic search with a language model. As I said, that's the ingredient for a chatbot. You can put all the documents for a particular topic into a Snowflake table, and then you can create a chatbot by which you can just have a
conversation about those documents. But we also expose our most complex models, like the SQL generation model, to our customers, so that if they say, "No, no, no, I do not want to call SQL functions, I don't want to use your easy-to-build chatbot, I want to build something amazing myself, I have the people that can do the software engineering," we make that possible as well, using something called Container Services, which is our extensibility framework. Our customers can fine-tune models themselves, deploy them in Container Services, and then write applications that talk to these deployed models. But you get the idea, which is language models at every layer. We don't try to lock our customers into one way of doing things. We expose these at every layer so that they can mix and match what it is that they want to do. My mantra is: simple things should be simple, complex things should be possible. And that's really what Snowflake Cortex and Container Services are about.

I really like that there's a sort of gradual evolution. You mentioned a spectrum, so you can start with just doing simple things, where everything is generated in natural language, and then if you've got more technical skills you can build up to doing things in Snowflake SQL.

That's right, that's right.

Excellent. And so you mentioned that often this is going to be whole data teams being involved, or maybe even software developers, so it does seem like working with data is very much a team sport these days. Can you talk about how you see your customers doing collaboration on some of these tasks?

You mean within their teams, or with us?

Yes, so within teams, and I guess even across organizations. Often working with data and AI tends to involve several teams. So do you have any advice on effective collaboration techniques?

I mean, I think the first thing that I talked about when it came to
collaborating was, how do you make sure that silos are eliminated and data duplication is a thing of the past? This is where Horizon and all of the efforts that we're doing around collaboration come into play, which is: don't reinvent the wheel. Most big enterprises have, you know, three implementations for every project, or things repeated over and over again, simply because communication is not all that effective. So at a very basic level, we want to make sure that we make it super easy for people to leverage the work of their colleagues and use the data.

And then when you go up one step from there, to "let's build applications," that is very much a collaborative team sport. Then you need one person that knows the data, probably more of an analyst type of person. But if you're building a chatbot, for example, you do want somebody that knows a little bit about prompt engineering and what language models do, and so on. And then you also need governance on these things. So you do have to get the right people together in order to do these projects. But part of our design with Cortex was that simple things, like an analyst being able to use a language model to do interesting things within the ambit of what they do day to day, should be much easier. Similarly, making an analyst who knows data, and is looking at a new schema, productive quickly with Snowflake Copilot is something that happens naturally. I think this does not take away from the need to bring people together with different skills in order to make a project succeed. On Copilot, for example, we have people that are search infrastructure engineers, we have people that know data really, really well, we have language model experts that are doing things like fine-tuning,
we also have UX engineers, because you need to actually create a product that people love, that is easy to interact with. And then obviously you need product managers and designers in order to be able to get something like that done. Snowflake Copilot is a complicated project, but you get the idea: you do need to bring people together, and they need to have the right skills in order to get meaningful things done.

Okay, yes. So it really is about making sure that everyone's got access to the right information, and that they're able to work at the appropriate level of technical skill for them.

That's right, that's right. And this is also where leads and managers are really important in this process. They have to have a mental idea of, "This is what it takes to be successful with a project like this; these are the skills that need to be there." Sometimes it's small teams of two people, sometimes it's four people, but just thinking through that and assembling these teams is an essential ingredient for success.

All right. So I'd like to step away from the Snowflake talk now. You also have another job, as a partner at Greylock.

I'm a venture partner. It's a part-time role.

Yeah, excellent, because, you know, you've got all that free time for an extra job. Okay. So I'd like to know what data and AI companies you are most excited about right now.

Yeah. So at Greylock we've invested in several of what we call foundation model companies. These are, we think, the infrastructure companies that will power the future. There is Mustafa's Inflection. Mustafa Suleyman is, of course, one of the world's renowned AI experts; he used to work at DeepMind. They also invested in a company called Adept. So those are at the foundational layer. But we do think that there are a ton of enterprise application companies that are going to be
interesting. We don't think AI is going to be quite like a magic sauce. As I was saying, you still need to acquire customers, you still need to have that moat, because pretty much everybody knows how to use GPT-3 or GPT-4 on an API call. The bar for excellence is quite a bit higher. And we're also excited by some of the newer generative applications. Everything from image generation to, I think, video generation is pretty exciting. I don't know about you, but I used to edit videos when my son used to play tennis. It's a mind-bogglingly tedious job. I used to try and make three-minute summaries out of matches that would last three hours. It's so hard to do. And I think, whether it is video generation or video editing, there's just a lot of value that they're going to create. I think advertising and marketing are going to be changed in a pretty big way. My son, who's also a software engineer, recently showed me this LLM-powered application for making experimentation on websites just a whole lot easier. It's like, "Here, drop in this little piece of JavaScript. We will generate potential variations and run experiments for you." So I think there's a set of these kinds of products that are going to be AI-native, that are going to have a big impact on both enterprises and consumers.

I'm definitely very excited for all these generative video applications. It just seems like they've been kind of okay and coming soon for a while now, so yeah, I hope they get their spotlight. All right. So the other thing is that it seems like there has been a lot of money thrown at AI companies just over the last year. Is that something you think is going to continue, or do you think it's a bubble that's about to burst?

I think people are definitely going to be asking questions about revenue returns and
what is actually going on over there. A 5% interest rate environment has a profound influence on startups. I think it's hard for people to realize that the difference between zero and 5% interest rates is basically infinity. And one of the things that happened last year, when we talk about large amounts of money being thrown at companies, is that quite a bit of it was not just venture money but also investments by the CSPs, by the very large platforms, and these investments basically turned around into cloud spend. So I don't think of that as real investment. It's a little bit of, as my colleague VI puts it, taking your balance sheet and converting it into revenue. That's what that is.

I do think that VCs are cautious about not throwing too much money into unknown kinds of companies, and I think the time of 100x revenue valuations is definitely a thing of the past. AI, I think, has perhaps another six to nine months before similar kinds of questions will be asked about revenue and revenue growth and things like that. When it comes to our customers, for example, already they are asking us, "Hey, how much should I invest? How should I be looking for ROI? Where are we creating value?" These are all good, hard questions for people to ask. And honestly, I think avoiding some of the hype will keep us all in a better place, because bubbles do not do anyone any favors.

Absolutely. So it sounds like it's good that there are some hard questions being asked before money's being thrown at it. Of course, "money being thrown" itself is a funny phrase, but yeah, true. All right. So before we wrap up, do you have any final advice for organizations wanting to improve their data management or AI capabilities?

Yeah, I think that there are pretty solid
breakthroughs when it comes to AI and language models. It is not hype. I think thoughtfully using the power of language models can make things more efficient. Take writing code: there is a before and an after if you have an assistant, even just having access to ChatGPT. I tell people I have probably written code with the Python date functions for the past 20 years, but I can never remember exactly what they do. To be able to simply type into ChatGPT, "Here, I'm trying to extract a date from a string that looks like this," and have it instantly give me that answer, so that I'm not fishing through eight Stack Overflow posts about doing that, I think that kind of productivity is super real. I would say embrace what these models enable, and use a platform like Snowflake to build on top of. We take enormous pride in the fact that our AI infrastructure is seamlessly integrated with everything else that's going on in Snowflake. The investments that you have in access control just carry over naturally to everything that we provide with AI. I think having partners like that, that are about providing real value and not just hyping up the latest thing that customers can put money into, is an important thing to keep in mind. But I think the value that companies can get out of AI, which basically comes from understanding language, understanding knowledge, and making it easier for us to talk to and query it, is a breakthrough. And I would definitely encourage every CIO, every CDO, to think about how they can make the existing things that they do that are tedious or difficult more efficient. There's a whole bunch of those to go after.

Wonderful. All right, lots of opportunities there, then. Excellent. So thank you very much for your time, Sridhar.

Thank you, Richie. This was a fun, fun, fun conversation.
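
The date-extraction task mentioned in the closing advice can be sketched in Python roughly as follows. This is a minimal illustration and not something shown in the conversation: the sample string, the assumed YYYY-MM-DD format, and the helper name `extract_date` are all hypothetical.

```python
import re
from datetime import date, datetime
from typing import Optional

# Hypothetical input; the conversation does not say what the string looked like.
SAMPLE = "Order shipped on 2024-03-15 from the Austin warehouse"

# Assumption: the date appears somewhere in the text in ISO-style YYYY-MM-DD form.
ISO_DATE = re.compile(r"\b(\d{4}-\d{2}-\d{2})\b")


def extract_date(text: str) -> Optional[date]:
    """Return the first valid YYYY-MM-DD date found in text, or None."""
    for match in ISO_DATE.finditer(text):
        try:
            # strptime validates the calendar values, not just the shape.
            return datetime.strptime(match.group(1), "%Y-%m-%d").date()
        except ValueError:
            continue  # right shape but not a real date, e.g. 2024-13-45
    return None


print(extract_date(SAMPLE))
```

The regular expression only locates candidates; `datetime.strptime` does the actual parsing, so a string with a different layout would need a different pattern and format code.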