The Unique Nature of AI Product Management with Marily Nika, Gen AI Product Lead at Google Assistant

The Challenges of AI Product Management: Navigating Probabilistic Thinking and Experimentation

Managing products built on Artificial Intelligence (AI) brings challenges that traditional product management does not. The most significant is the probabilistic nature of AI itself: the same feature can behave differently each time it is used, so a product team cannot fully specify the experience in advance. This can also lead to a clash between data scientists and software engineers, who bring different backgrounds and approaches to problem-solving.

For those coming from a data science background, much of their work is probabilistic in nature. This means that they are accustomed to dealing with uncertainty and variability in their results, and as such, they may be more comfortable with the idea of experiments and testing hypotheses. On the other hand, software engineers who come from a deterministic mindset may struggle to adapt to this way of thinking, which can lead to frustration and demotivation.

In traditional product management, success is often measured by metrics such as user acquisition and engagement numbers. However, in AI product management, success is not always so clear-cut. Instead, it may be more about testing hypotheses and iterating on the product based on the results of those tests. This means that teams must be willing to pivot quickly in response to new information, which can be challenging for some individuals.

Fostering a culture of experimentation within a team is crucial when working with AI products. This includes encouraging engineers and stakeholders to think creatively and take calculated risks, as well as providing them with the resources and support they need to experiment safely and effectively. Without this kind of cultural shift, teams may struggle to adapt to the changing nature of AI product management.

Another key challenge in AI product management is defining what constitutes "success" for a product. In traditional product management, success is usually visible at launch: the product ships and the user numbers grow. With AI products, progress is better measured in milestones. A team forms a hypothesis, tests it, and gets an answer; even when the answer is that the idea will not work and the team must pivot, reaching that answer counts as success. Useful metrics then include the quality of the model's predictions, such as how often a prediction matches what the user meant to say, and the satisfaction of the end-user.

Let's consider a hypothetical example to illustrate these challenges in action. Suppose we're working for a startup that has created an app which replaces the native keyboard on users' phones. The premise of this app is that it learns from users' texting behavior and uses that information to predict what they want to say, allowing them to communicate more efficiently without having to type out every word.

In this scenario, our team has launched the product with a certain model in place and has a base of users who like it. Behind the scenes, a research scientist uses the usage data and logs coming in from those users to improve the model, and eventually proposes a new version that they believe will improve our success metrics, such as how often the keyboard correctly predicts what the user meant to say.

To test that belief, our team runs an experiment. We split our user base into two groups: 50% of users keep the model that is currently live, while the other 50% are silently switched to the new model. We then compare the two groups on a single key metric, such as the ratio of accepted suggestions to suggestions shown to users.
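The split and the acceptance metric described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any real keyboard product does it: the function and class names, the salt string, and the shape of the event log are all hypothetical, and the metric is assumed to be accepted suggestions divided by suggestions shown.

```python
import hashlib
from dataclasses import dataclass


def assign_variant(user_id: str, salt: str = "keyboard-model-test") -> str:
    """Deterministically place a user in 'control' or 'treatment' (roughly 50/50).

    Hashing the user id (plus a per-experiment salt, a hypothetical value here)
    keeps each user in the same group for the whole experiment.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    return "treatment" if int(digest, 16) % 2 == 0 else "control"


@dataclass
class VariantStats:
    shown: int = 0      # suggestions shown to users in this group
    accepted: int = 0   # suggestions the users accepted

    def record(self, was_accepted: bool) -> None:
        self.shown += 1
        if was_accepted:
            self.accepted += 1

    @property
    def acceptance_ratio(self) -> float:
        # Assumed metric: accepted suggestions / suggestions shown.
        return self.accepted / self.shown if self.shown else 0.0


def summarize(events):
    """Aggregate (user_id, was_accepted) events per experiment group."""
    stats = {"control": VariantStats(), "treatment": VariantStats()}
    for user_id, was_accepted in events:
        stats[assign_variant(user_id)].record(was_accepted)
    return stats


if __name__ == "__main__":
    # Hypothetical event log: one entry per suggestion shown.
    events = [("user-1", True), ("user-2", False), ("user-3", True), ("user-1", False)]
    for name, s in summarize(events).items():
        print(name, s.shown, s.accepted, round(s.acceptance_ratio, 3))
```

Assigning by hash rather than by a random draw on each request means a given user always sees the same model, which keeps the two groups cleanly separated.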

If the metric goes up for the new model, we start rolling it out to the rest of our user base. If it does not, we roll back and discard the new version. The decision is not always mechanical, though: Nika notes that she has sometimes launched a new model even when it initially performed worse, because she expected it to improve as it collected more data.
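One common way to judge whether a difference between the two groups is more than noise is a two-proportion z-test on the acceptance ratios. The interview does not say which test, if any, was used, so the sketch below is an assumption for illustration only; the function names, the example counts, and the 0.05 threshold are hypothetical.

```python
import math


def two_proportion_z_test(accepted_a: int, shown_a: int,
                          accepted_b: int, shown_b: int):
    """Return (z, one-sided p-value) for the hypothesis that B's ratio exceeds A's."""
    p_a = accepted_a / shown_a
    p_b = accepted_b / shown_b
    pooled = (accepted_a + accepted_b) / (shown_a + shown_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / shown_a + 1 / shown_b))
    if se == 0:
        # Both ratios are exactly 0 or exactly 1: no evidence of a difference.
        return 0.0, 0.5
    z = (p_b - p_a) / se
    # One-sided p-value: P(Z >= z) for a standard normal Z.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value


def decide(accepted_a: int, shown_a: int,
           accepted_b: int, shown_b: int, alpha: float = 0.05) -> str:
    """Roll out the new model (B) only if it beats the live model (A) at level alpha."""
    _, p_value = two_proportion_z_test(accepted_a, shown_a, accepted_b, shown_b)
    return "roll out" if p_value < alpha else "roll back"


if __name__ == "__main__":
    # Hypothetical counts: live model 4,000 / 10,000, new model 4,300 / 10,000.
    print(decide(4000, 10000, 4300, 10000))
```

The test treats every suggestion as an independent observation, which is a simplification because one user contributes many suggestions. A rule like this also only captures the mechanical part of the decision; it says nothing about launching a model that is expected to improve with more data.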

This kind of experimentation and iteration can be challenging for teams, especially when it comes to dealing with the uncertainty inherent in AI product management. But by fostering a culture of experimentation and being willing to pivot quickly based on new information, teams can improve their products over time and provide users with better experiences as a result.

Ultimately, success in AI product management requires a solid understanding of both the technical challenges involved and the user needs the product is meant to serve. By embracing the probabilistic nature of AI and working closely with users to test hypotheses and iterate on our approach, we can build more effective products that meet the evolving needs of our users.

When I first became a PM, I was an AI PM on day one, so I thought that all the challenges I was facing were challenges everyone else was facing. I was doing it before it was cool, that type of thing. When you work with AI, there are just so many little things you need to consider. Number one: the probabilistic nature of AI. This means that every single time you use a feature, you're going to get kind of a different behavior, and when you're building a product where you hope to have a specific experience for the users, you can't really guess what that's going to look like. A lot of people are not comfortable with that.

Let me give you an example. I was part of a product review, and I was demonstrating this AI feature I was launching. Leadership was trying it out and they said, "Hey, I don't think this works well because it's not consistent. Every time I'm using it, I'm getting a different answer." And I said, "Well, that's the probabilistic nature; this is working as intended." So the probabilistic nature gives you a lot of challenges as a PM, because you have to make people around you be comfortable with this uncertainty, if you will. But it's also kind of the beauty. An example: if you use Google Gemini or DALL-E to generate an image with the same prompt, you're going to get a different image generated. Or if you're driving a car and you use self-driving, there's always a little probability that comes with it. So nothing is 100% safe, nothing is 100% accurate. That's challenge number one.

Challenge number two: there's this experimentation culture that is tied to AI products. Just because of this uncertainty, you need to do way more experiments than in classic product management, because you're making hypotheses at all times. You say, "I believe that if I deploy this model into this experience, we're going to get outcome X," but it's not always the case. So there's a ton of experiments, there's a ton of pivoting, and if you have a team that's very much used to "I'm going to do A, B, C and then achieve Y," they don't really get that. They get demotivated, they don't know if they're on the right track, they don't know if they're progressing in their career. So fostering a culture of experimentation in your team, engineers and stakeholders, is important.

I had in the past to figure out what success meant for a product. In traditional product management it's, "OK, we're launching, we're getting hundreds or millions of users, awesome, figured out." Whereas in AI product management, success may be, "We made hypothesis A, we tested it out, we figured out that it's not going to work, and we pivoted." Hitting a milestone of being able to get an answer to the hypothesis is success. So it's more of a milestone-based progress, if you will. I can speak about AI product forever, but I think these are the things that stand out the most.

OK, so certainly that first point about things being probabilistic is really interesting, because if you come from a data science background, a lot of stuff is probabilistic; if you come from a software engineering background, almost everything's deterministic. So I can see how there's a bit of a culture clash there. Your second point about experiments was really interesting. Can you go into a bit more about what these experiments look like, who performs them, and what you are expecting?

Let's assume we work for a startup, and this is totally hypothetical, by the way. Let's assume that the startup has created an app which essentially replaces the native keyboard on your phone, and that the whole premise is, "This keyboard learns from your texting behavior so that it can actually predict what you want to say." It's going to kind of speak for you and on your behalf eventually, because it knows how you speak, it knows the words you use, and you don't need to type everything. It will just ask, "Did you mean to say 'Hey, how are you doing today' with, like, two question marks?" and it gets prefilled.

So let's assume that's it. Let's assume you've launched and you have a certain number of users, and people like it. Now, in the back end, in the startup, you likely have a partner, a research scientist, who gets a lot of the data that come in from users and logs, and they can use this data in the back end to improve their model. The scientist will come to you and say, "We see the usage, we have some metrics to measure success," and these metrics can be correct prediction of what the user meant to say, satisfaction and all these things, "but I have a model, and I think that if we launch it and replace the model that is currently live, we are going to get improved metrics."

So we're going to run a little experiment. We are going to say: for 50% of the population, we are going to keep the model that's currently out, the one that can predict what Marily wants to type as she types it. For the other 50%, we're going to silently launch the new version of this model, and we're going to measure our star metric, which can be the ratio of accepted suggestions to, let's say, suggestions prompted to the user. If this metric goes up for the version we're testing, which is the improved model, awesome, we'll start rolling out to the rest of the population. If not, we'll roll back so that we just discard this. So there's a ton of that.

Sometimes I launched the new version of the model even if it was performing worse, because I knew that over time you'd get more data and it would get improved. But as a user, I've been on the side where I had a product that I loved and it was perfect, and then suddenly I upgraded to the new version and I hated it. But that's what AI and
probabilistic nature is: you kind of need to learn that there's a learning curve in the system, which needs to learn you. And you need to be comfortable as a user knowing that this thing listens to me, it understands me and how I behave, and it can improve a product for me.