Bridging the Patient-Physician Gap with ML and Expert Systems w/ Xavier Amatriain - #316

The Challenges and Opportunities of Evaluating Symptom Checkers

Evaluating symptom checkers is an essential task that requires careful consideration of various factors. Xavier Amatriain, co-founder and CTO of Curai, who previously led the machine learning algorithms team at Netflix and engineering at Quora, has been working on this problem for several years. He argues that the bar is not being better than the average doctor but being better than the best doctor, and, above all, that these systems should augment human capabilities rather than replace them.

Amatriain emphasizes benchmarking against commonly used symptom checkers and medical vignettes, while being candid about how scarce such resources are. The most widely used are the medical vignettes from Semigran et al.'s study of online symptom checkers, which groups such as Babylon Health also benchmark against, largely because they are the only commonly available option. He cautions that these vignettes fall well short of comprehensive coverage of medical conditions, so results on them should be interpreted carefully. Still, well-established benchmarks remain a useful first gauge of a symptom checker's performance. A minimal evaluation harness in this style might look like the sketch below.
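The following is a hedged sketch of vignette-style evaluation, in the spirit of the Semigran benchmark: each vignette pairs presenting symptoms with a gold diagnosis, and we measure how often that diagnosis appears in the checker's top-k differential. The vignette entries and the `dummy_checker` stand-in are invented for illustration.

```python
from typing import Callable

# Illustrative vignettes only: (presenting symptoms, gold diagnosis).
vignettes = [
    (["fever", "productive cough", "pleuritic chest pain"], "pneumonia"),
    (["polyuria", "polydipsia", "weight loss"], "type 2 diabetes"),
]

def top_k_accuracy(checker: Callable[[list[str]], list[str]],
                   cases: list[tuple[list[str], str]],
                   k: int = 3) -> float:
    """Fraction of vignettes whose gold diagnosis appears in the top-k differential."""
    hits = sum(gold in checker(symptoms)[:k] for symptoms, gold in cases)
    return hits / len(cases)

def dummy_checker(symptoms: list[str]) -> list[str]:
    # Stand-in model: always returns the same ranked differential.
    return ["pneumonia", "type 2 diabetes", "influenza"]

# Vignette studies typically report k=1 ("correct first") and k=3 or k=10
# ("listed in differential"); this toy setup scores 1.0 at k=3.
print(top_k_accuracy(dummy_checker, vignettes, k=3))
```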

One of the key challenges in evaluating symptom checkers is the lack of reliable ground truth. Amatriain cites studies from the Human Dx project in which a single physician's diagnostic accuracy averaged around sixty percent, while a consensus of twenty physicians reached just over eighty percent, which is much better but hardly comforting if you are the patient. Any benchmark therefore inherits the limitations and biases of the people and data behind it; medical perspectives shift over decades, and diagnostic practices do not apply equally across all patient groups.

Another critical aspect is how a system handles complex, messy cases. A classical expert system is brittle: if a patient misdescribes a symptom or a doctor enters the wrong finding, it can produce a confidently incorrect output. Amatriain therefore emphasizes that symptom checkers must be designed with human oversight and review mechanisms in place, with the model surfacing its top candidates and a physician confirming or rejecting them before anything reaches the patient.

A central theme of Curai's approach is combining expert systems, old-school AI, with deep learning. The team builds on medical expert systems developed over roughly fifty years, which encode the curated knowledge of hundreds of physicians as a graph linking diseases and symptoms. Neither paradigm suffices on its own: expert systems cannot tolerate noisy input, while models trained purely from data, such as electronic health records that were designed for billing rather than diagnosis, cannot be blindly trusted in a domain where the stakes are this high.

One concrete bridge between the two, described in a paper the team published, is to use the expert system as a synthetic data generator. Activating nodes in the knowledge graph yields labeled cases, and realistic noise is then injected: prevalent but uninformative symptoms such as cough are randomly added, and rare or easily missed symptoms are dropped. The resulting training data lets a deep learning model inherit decades of curated medical knowledge while staying robust to real-world messiness, and it can be mixed with natural data from EHRs and other sources. The sketch below illustrates the idea.
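Here is a hedged sketch of the expert-system-as-data-generator idea under stated assumptions: the disease-symptom graph, its probabilities, and the noise rates are all invented for illustration, whereas a real system would draw on a large curated knowledge base.

```python
import random

# Toy disease-symptom graph with activation probabilities (illustrative).
DISEASE_GRAPH = {
    "influenza": {"fever": 0.9, "myalgia": 0.7, "cough": 0.8},
    "migraine": {"headache": 0.95, "photophobia": 0.6, "nausea": 0.5},
}
# Symptoms that are prevalent regardless of the underlying disease.
COMMON_SYMPTOMS = ["cough", "fatigue"]

def sample_case(disease: str) -> tuple[list[str], str]:
    """Activate symptom nodes for one disease according to edge probabilities."""
    probs = DISEASE_GRAPH[disease]
    symptoms = [s for s, p in probs.items() if random.random() < p]
    return symptoms, disease

def inject_noise(symptoms: list[str],
                 add_rate: float = 0.3,
                 drop_rate: float = 0.1) -> list[str]:
    """Mimic real encounters: drop some reported symptoms (missed or
    misdescribed) and add common, uninformative ones."""
    noisy = [s for s in symptoms if random.random() > drop_rate]
    noisy += [s for s in COMMON_SYMPTOMS
              if s not in noisy and random.random() < add_rate]
    return noisy

# Generate a synthetic training set; in practice this would be mixed
# with natural data from EHRs and other sources.
dataset = []
for _ in range(1000):
    disease = random.choice(list(DISEASE_GRAPH))
    symptoms, label = sample_case(disease)
    dataset.append((inject_noise(symptoms), label))
```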

The Connection to Self-Driving Cars

Amatriain's favorite framing for this work is the self-driving car. A symptom checker that is merely better than the average doctor is like a car that drives better than the average teenage driver: not something you would trust with full autonomy. In both domains, the credible near-term goal is assistance rather than replacement.

He believes both projects share a common goal: to augment human capabilities rather than replace them entirely. For self-driving cars, that means giving the driver critical information and support; for symptom checkers, it means giving clinicians insights and recommendations to inform their diagnoses and treatment plans, with the physician as the expert "driver" who always makes the final call. A sketch of this human-in-the-loop pattern follows.
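Below is a hedged sketch of the always-default-to-humans pattern described above: every model output is framed as a suggestion, the physician decides, and the decision is logged as feedback. All names and interfaces here are illustrative assumptions, not Curai's actual APIs.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Suggestion:
    text: str          # e.g. a candidate diagnosis or a draft reply
    confidence: float  # the model's own estimate, used for ranking only

@dataclass
class Review:
    suggestions: list[Suggestion]
    approved: str      # what the physician actually chose

def assist_physician(suggestions: list[Suggestion],
                     physician_decision: Callable[[list[Suggestion]], str],
                     feedback_log: list[Review]) -> str:
    """Show ranked suggestions to the physician; never act autonomously."""
    ranked = sorted(suggestions, key=lambda s: s.confidence, reverse=True)
    approved = physician_decision(ranked)           # accept, edit, or reject
    feedback_log.append(Review(ranked, approved))   # becomes training data
    return approved

# Usage with a trivial stand-in physician who takes the top suggestion:
log: list[Review] = []
answer = assist_physician(
    [Suggestion("influenza", 0.72), Suggestion("common cold", 0.21)],
    physician_decision=lambda ranked: ranked[0].text,
    feedback_log=log,
)
```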

Using Transformers for Healthcare

One of the key techniques Curai's team has been exploring is transformer models. Pre-trained transformers build general-purpose representations of language, so their outputs can drive a dialogue system directly or serve as features for downstream classifiers. Fine-tuned on healthcare-specific data, they become markedly better at handling clinical language.

Amatriain highlights transfer learning as the most relevant property here. By taking a model that has learned how language works in general and fine-tuning it on medical text, the team gets a model that captures domain terminology and jargon without training from scratch. A minimal fine-tuning sketch follows.
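This is a hedged, minimal transfer-learning sketch using the Hugging Face transformers and datasets libraries: a general pre-trained encoder is fine-tuned as a text classifier on in-domain text. The checkpoint name is a real public model, but the two toy examples and labels are invented placeholders; a real run would use a large clinical corpus.

```python
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Toy in-domain examples (invented); label 0 = respiratory infection, 1 = migraine.
data = Dataset.from_dict({
    "text": ["fever and productive cough for three days",
             "throbbing unilateral headache with photophobia"],
    "label": [0, 1],
})

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=64)

train_ds = data.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="ft-out", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=train_ds,
)
trainer.train()
# The fine-tuned encoder can then back a classifier directly or supply
# features to other components of a symptom checker.
```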

Generating Assistance for Physicians

Another application of transformer models is generating assistance for physicians as they chat with patients. Much like the auto-response suggestions in email clients, or the reply-suggestion assistants built for customer service teams, a fine-tuned model drafts a response that the physician can accept, edit, or discard. This alleviates clinician workload while also producing a steady stream of expert feedback for improving the models.

The potential benefits of this approach are significant. By automating routine exchanges and surfacing critical information, these assistants can help reduce errors and let physicians focus where they are needed most. Curai's team is working to make them not only accurate but also user-friendly and intuitive. A generation sketch appears below.
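A hedged sketch of drafting a suggested reply with a causal language model: the gpt2 checkpoint here is a small public model used purely for illustration, whereas a production system would presumably use a model fine-tuned on clinical conversations, and every draft would pass through the physician review loop sketched earlier.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def suggest_reply(conversation: str, max_new_tokens: int = 40) -> str:
    """Greedily draft a continuation of the chat for the physician to review."""
    inputs = tokenizer(conversation, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=max_new_tokens,
                            do_sample=False,
                            pad_token_id=tokenizer.eos_token_id)
    # Strip the prompt tokens and return only the generated draft.
    return tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True)

draft = suggest_reply("Patient: I've had a dry cough for two weeks.\nDoctor:")
# The draft is only a suggestion: the physician approves or rewrites it
# before it reaches the patient, and that decision is logged as feedback.
```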

Conclusion

Evaluating symptom checkers is a complex task with no clean ground truth, and building them responsibly means combining curated medical knowledge with learned models under constant human oversight. Amatriain's insistence on defaulting to human physicians, designing for messy real-world cases, and grounding language models in domain-specific data offers a useful blueprint as the field matures.

For the technical details behind this work, including the dialogue, diagnosis, and synthetic-data techniques discussed above, Curai's papers from the NeurIPS machine learning for health workshop are available on arXiv.

"WEBVTTKind: captionsLanguage: enwelcome to the 1200 i podcast I'm your host Sam Cherrington did you miss Tomac on AI platforms if so you'll definitely want to check out our Tomac on video packages featuring over 25 sessions discussing expert perspectives on ml and AI at scale and in production you'll hear from industry leaders such as Facebook Levi's Zappos and more about their experiences automating accelerating and scaling machine learning and AI in each video package you'll receive our keynote interviews the exclusive team teardown panels featuring Airbnb and SurveyMonkey case studies and more over 13 hours of footage once again visits will mock on Comm slash videos for more information or to secure your advance purchase today alrighty everyone I am on the line with cha VA a Matar a chav EA is the co-founder and CTO of cure I cha VA welcome back to the 1200 I pod cast yeah thanks for having me Sam so for those that don't recognize the name cha VA was actually our third guest after switching to the interview format so this was over three years ago and so much has happened for both of us we last had an opportunity to catch up at the AWS remoras conference when was that back in June or so and I thought it makes sense to get cha VA back on the show to get a little bit of an update as to what he's been up to so when we last spoke to chav EA he was leading the engineering team at kora doing a ton of work on recommendation systems and other machine learning use cases prior to that he the machine learning algorithms team at Netflix and again he's currently the co-founder of cure I a startup in the healthcare a I space javi why don't we just jump right in and have you bring us up to date on Kyra and what you're up to there sure yeah so here I we are using state-of-the-art AI machine learning for a very big and bold mission which is to basically bring the world's best health care to everyone and of course that is a as I said a very bold and very big mission and we are making it concrete by basically focusing first on primary care so we wanna billing bring the cost down of providing good not good but best quality health care to everyone by using AI and machine learning to bring it down to a place where it can be affordable and can be a scalable and everyone in the world who has a phone can have like primary care in a very convenient accessible and affordable way and so when you're talking about allowing people to use their phones for primary care are we talking about like turning your phone into a tricorder or are we talking about using your phone as a kind of a vehicle for accessing human physicians or something in between or something totally different it's it's a combination of the above and plus something is lightly different but the the realization is that you know a lot of what mmm can be solved in primary care and in health care it really boils down to having conversations between patients and physicians and of course providing input to those conversations from different sensors and labs and other places right but the the core of happens in any kind of like medical visit is a conversation between the patient and the doctor and that's the part that can be really automated and not only automated but actually brought to a point where it's it's you can do it from anywhere that you have a phone with any kind of connection and you can start having a conversation and chatting with a with a doctor that's all you need and of course there's always there's always gonna be things that you can do over 
the phone like you can get your medication over the phone right but that's okay because we can always deliver medication to your home or we can always always refer you to a lab that is near by and get the results from the lab and whatnot but the the really key issue the part that we're focusing on is in that conversation that happens between patients and doctors and we have a service where we employ physicians to basically be on the other end of the line to have those conversations and what we're doing is applying this AI machine learning approaches to automate as much as possible this conversation so we can augment and scale the doctor so an important piece here is we're not replacing the doctors with the AI in machine learning what we're making them is we're giving them I usually say we're giving them superpowers that and in instead of being able to see say 100 days in two days they'll be able to see 10,000 the conversation the most of the easy stuff will be handled by the AI and machine learning and they'll be able to focus only in the places that they're needed the most is a hundred patients a day a typical metric for practicing physician it really depends and in primary care the numbers are roughly about the average is I think 12 minutes per patient so you can it depend on what the working hours are for for doctors that's actually that's in the in the u.s. okay all the places like India for example where it's much less than that and doctors can see way more than 100 patients in a day and and the reality is they're not even seeing them there the nurses are taking care of them before they get to see the doctor but they they count as having seen the doctor so it's yeah it's not a it's not a but but you can obviously you can imagine that a doctor with 15 minutes or less to see a patient and to remember first of all the history of the patient what's going on ask the right questions get the right answers remember everything they know about medical school and come up with a diagnosis and come up with a recommendation it's really hard for any you know human being to do that at that rate right and of course there's a lot of mistakes a lot of things that happen because of this what we're trying to do is say hey hand that off as much as possible to the algorithms and the machines and then make sure that when the doctor comes in they come at the right time and they come at the point where they have all that information laid out for them and they can verify the decisions and make sure that they're saying the right thing and at the same time that's what we mean by augmenting right the doctors we are of course giving them information that is state-of-the-art and based on real science and they can get that information in a way that they can parse it and they can say okay yeah this is the right position I agree instead of sort of like having to deal with all the messiness of gathering that information parsing it remembering things going through the electronic health record and then making a decision all of that in less than 15 minutes right now there are aspects of this that sound very much like from a the kind of technology I'd expect to see like other conversational agents where you've got some back-end resources or team that you want to optimize these of the time and allow some AI system to handle the of easy easy responses I've got to imagine that you know things get a lot more complicated and Messier different certainly more important and healthcare side of things can you talk about some of the 
unique challenges associated with applying this kind of technology in healthcare yeah yeah definitely there's there's a lot of challenges and you're right you could you could think that you know the typical approach to dialogue systems and all the advances that we're having recently on this kind of chatbot and things like transformers and birds and gbt tools and things like that are useful and they are I mean we are using all of the above in different ways but the reality is in a domain like healthcare medicine where the stakes are so high you cannot leave things out to you no chance or just to a model to rely on on this kinds of conversations to actually following the right path and there's a lot of examples out there where you can trick any of these models to say things that seem reasonable for any human being but they're medically completely wrong right and there's been a few examples of that and and of course that's that's the that's the key the key issue that we were tackling with it's like how do we combine prior knowledge about what's correct and incorrect in science and in medicine with some of this automation right we do have a key insight here a very important thing of what we're doing is we we we control the end to end so we have both sides of the conversation and we meaning the patient and the expert the doctor right in this case so the interesting thing in our in in the way that we're applying this technology is that we can deploy the conversation helpers in in both ways like we can we can decide to serve something directly to the user if we want to but we can also serve it to the doctor and the doctor can use it an assistant and make a call whether that makes sense or not if it's being helpful right that's a really important thing right because then you go you're basically walking the line between a chat bot and an assistant kind of like a gmail assistant if you will when you get an auto-response suggestion right and the doctor can decide ok yeah this makes sense I'll just take it as it is or this doesn't make sense but they're still sort of like making sure that that's medically correct and at the same time we are getting training data on how our model the model that we're building are accurate or not and in what way they are or not we've we have actually an upcoming publication in one of the works of the new ribs which basically talked about this how to constrain the flexibility of this sort of like deep neural dialogue systems with expert feedback in order to make sure that the information is well is that accurate for a domain particularly in medicine in this case so we need to combine the best of both worlds and by the way we do the same thing in other parts of our modeling strategies like diagnose diagnosis in diagnosis we're also combining expert systems which is you know old-school day i with deep learning and i think you cannot rely 100% on any of the two but you can get much better if you combine those two strategies in in some smart ways which I think it's a it's a key insight for medicine but it's also something that will happen in its being advocated for by many people in machine learning in general right it's like you can't blindly trust models that come only with it from the data with no prior no or some form of knowledge constraint and a lot of people are trying to figure out like how do you combine those two things right how do you combine all the power you get from models are basically just being trained from lots and lots of data with knowledge that we have 
and structuring that knowledge in form of prior into the models right right that is a theme that continues to recur here on the podcast and in my conversations one interesting thought there is you know certainly on the probabilistic side we've benefited from a huge recent explosion in available tools and you know algorithms and the like you've mentioned a bunch of those already Bert etc and you know we've got tools like tensor flow and pi torch and many many many others whereas expert systems you know we think of as kind of a throwback to pre winter you know AI and I can't think of you know not being deep in that space I can't think of kind of what the leading open-source expert system software might be is there a a tools ecosystem there or is it you know are people building you know when people have this realization that they need both and not one or the other are they kind of building it from scratch yeah I don't think there there is such a thing I don't think there's a you know that there's a expert system component for tensor floral pie tours or should there be does that make sense you know with something like that have benefited you or is it you know is it basic it basically just kind of rules that we know how to code them because you know it's it's not probabilistic not really I mean and and by the way they can be probabilistic right at the end of the day what you have with this expert systems is it's a graph and then you can do probabilistic inference on the graph and you can do different things on that graph so basically I'm thinking a generic tool for expert systems would be rather simple in the sense that all is a way to represent sort of like graph and make him for instance on those graphs to go wouldn't be that complicated to sort of like have a component for tensorflow or for for pi tours that basically does that for you so the key thing here is those expert systems rely a lot still on sort of like manual labor and and just to give you an exam example in the case of some of the expert systems we're using we're using some that have been developed for over 50 years right so there's a there's a couple of expert systems for medical diagnosis that go back 50 years and we we're using both of them actually and and interestingly you know they there's a lot of knowledge in there right you can think about you know 50 years of a bunch of hundreds of really well trained physicians in coding knowledge and information about medicine in a graph right and that's really valuable and and it's really something that if you can then inject it into any learning system you get a lot of a lot from it right to your question there is no you know there's no tooling for that on the other hand you can do interesting things like one of the things we've done is use this expert systems basically data generator to generate synthetic data and train the learning models from the data that is generated from the expert system right that's an example of something that I think is very useful and really really valuable because then you can you can even merge synthetic data with natural data and you can tweak it in ways that you can learn a model that actually now has some prior knowledge that has been injected in the form of ground truth data so to speak can you speak to that particular point and a little bit more detail yeah sure okay so so in this case the thought process was like that right like we know that you know it really have very good data and we train a deep learn learning neural net we could get to like a 
really high accurate diagnosis system the reality is that high quality data does not exist if you go to electronic health records which we have used ourselves I mean we have a project with Stanford where we have been working with them on using their chronic health records and this has been something that others have done like Google and did mine and you name it it's like learning predictive models from electronic health records electronic health records the data quality is really really poor notoriously so yes yes and there's a lot of reason for it but one of it is you know they weren't designed for the purpose of diagnosis they were designed for the purpose of billing and to make sure the insurance companies got their money back so there's a there's a ton of issues with them and so but but but again that data is valuable it's not like it's totally noise there is something in it so how how can you generate some kind of data that it's more you know solid and until I can treat more as a ground truth well you can go to this expert system which again they're all they are is you know a graph and you can start activating notes and generate data from that graph that basically becomes sort of other cases that you used to train your deep learning model and that's what we we showed in this desert paper we published last year where we basically generated data from the expert system we injected noise to that data because an interesting and important thing is you want to train a model that is robust to noise right the problem with expert systems one of the problems is that they're not capable of dealing with noise so in other words if you know if the patient doesn't say exactly the symptom they have and they make a mistake because they didn't understand the question or the doctor enters and a wrong thing the expert system is is basically doomed and and and it's gonna give you an incorrect output that's not the case for you know you can train machine learning models are you know relatively robust to know it's because you can even do adversarial training and you can do a lot of different things to make them robust so how do you combine both well you can you can also inject noise to the expert system that's basically what we did so we generated data from the expert system we injected different kinds of noise one example which I think will be very off bill is you can inject noise by saying hey I'm just gonna randomly inject things symptoms that are very common right I'll just add coughing to everything because you know coughing is something that people have in general no matter whether they have one disease or not right it's like it's you always can call for its knees or something like that it's very prevalent and very common it's a typical thing and can confuse really confuse an expert system but it's if you train a machine learning model on ignoring cough because it's something that's very common and it's not very it's not gonna help determine what the diagnosis is well that and then you build a robust model so we again generated synthetic data from from these systems injected noise in ways that we made the learn model more robust and we also combined that synthetic data with natural data that we had from EHRs and other sources to also prove that you can you you don't need to constrain yourself to just one single kind of data right all you need to do is combine it in smart ways to sort of like understand because there's there's obviously value to training from real world beta all you need to do is figure 
out how to combine it with more clean data and and data you can trust you mentioned the this kind of injecting noise via adding symptoms that are frequently recurring what are some other examples of the the kind of noise that you're injecting and more broadly how do you quantify the value of the synthetic data in building out your models yeah so okay so the to the first question I mean I think that the key insight to adding nodes in a domain like medicine is that you do need you need to have some domain knowledge right it's like when I when I give you the example of adding symptoms are very common that makes sense right because that it makes sense because we know about medicine like okay yeah the explanation makes sense another examples like well you can remove symptoms that are very rare or are likely to be missed right that's another thing that makes sense once you explain it right but you need to have some insight and you need to talk to doctors and that's something we do all the time right this kind of strategies don't come up by you know sheer imagination they come up because we talk to our physicians and we talk to them and say hey what how do you deal with these issues where issues are common and that lead to mistakes in diagnosis how can we make sure that our model doesn't make the same mistakes so I think that is a key and important thing is you need to work with domain experts and that leads me to answer your second question and let me just pause because that's a kind of an interesting point I think you know what I think of noise at least from a classical engineering perspective I think if noises like this junk that's you know uncorrelated from your signal but what you're suggesting is that at least when you're creating synthetic data your noise needs to be correlated with your actual noise that you need to expect you can't just have you know purely random noise because that won't help your model yeah that's pretty much it I mean here it's a slightly different right and notion of of noise if you will and but what would you have a synthetic data that is strictly true if you will because true in a scientific sense because been generated by kind of like an expert system that it's been designed on on science but what you need to do is inject noise that mimics more the reality of nature right and the messiness right but that noise yet needs to model some of the natural messiness that you see in real life and you need to not inject it yeah it's not wide nose when in that sense right it's noise that tries to turn that synthetic data into something that is more real right if you think about it I mean I use sometimes as a metaphor of life the self-driving cars also using tech data that is generated from video games and it's like well you can imagine that you're training your self-driving model on data from from Grand Theft Auto but you need to inject I don't know and you need to inject rain and you need to inject things that are not maybe in your synthetic data and they're adding noise to the capture of the image but in a way that mimics real-life situations right not just white noise mm-hmm in that sense it's a bit like the concept of domain adaptation yeah I mean you you could consider it that that for sure and that's another it is a very mean domain adaptation in itself I mean we could go into that it's another important thing that you need to do in many in many cases because and yeah youyou you're right it could be seen as that right because sometimes you are training on ideal data 
but then you're gonna be faced with real-life data that it's gonna have to be interpreted and in the context of the of the ideal data that you use for training so yeah it is yeah okay so you're about to take on that second question yeah the second question was about how do you even you know how do you know that that that the data is good or even the model that your training is good and and and you know beyond that the relative advantage you know how do you compare with and without using the synthetic data you know that uh is it a training time or is it a you know accuracy or some combination of all these things it's mostly about accuracy right and and the the problem is that the definition of the accuracy is again really tricky and and and and not that obvious right accuracy in the context of medical diagnosis is a very very tricky thing to define particularly because you would hope that by asking physician you would get a ground truth but that's not the case right there's studies out there for example the human DX project that published some studies that the on their data set the average accuracy of a single physician was sixty percent right which is really low now if you if you take the consensus of twenty fishin's that got up to over eighty percent which is much better but then of course you need to have twenty physicians agree and you're still up to eighty percent which is a lot better but not necessarily comforting if you're the patient exactly and and I think that's that's a key issue in like what do we treat as ground truth so in our case we we I mean we use a combination of a lot of things we use a combination of sort of like publicly known data set which there's not that many unfortunately for for this domain and they're just you know a few what's called medical vignettes that you can use to evaluate we also use our own physicians to QA and we make sure that we have sort of like several them agreeing on the cases so we know that we're right and then at the end it there's also this kind of like synthetic data rights like you need to treat that synthetic data as pseudo ground truth in the sense that as I mentioned if you think about it that that synthetic data is the result of as I said before 50 years of research from hundreds of physicians who have agreed that that's what you know that particular disease should be defined as and that's those are the symptoms that are related so it's it's as good as a ground truth as you can get in many other cases right so again it's I wish I had a like a great answer for this but the reality is I don't it's like it's it's a it's kind of an iterative process where you like treat one day there's a ground truth but then you compare it to your other data you let your physicians go through it and say yeah this is correct or it is not and then and you feed it back and you keep improving both overtime and I think that's that's another very important lesson learned here is that you need to design all the systems as really learning systems right so it's it's not only about what's their accuracy today it's more about how can you make sure that the accuracy and all the other metrics you care about improve over time right and in the mean time that the the important thing is like we always default to humans right it's like we'll always default to a human doctor and improve the model over time and and just tell that human doctor like hey our model thinks that this three things are important you want to consider them and the doctor will say yes or no and it's 
they're called and you know we'll be as good as the as the doctors are but over time we we are pretty sure actually even in our outline evaluation metrics we think that we're already our models are already at least as good if not better than the average doctor but even with that it's not enough right it's like they need to be better than the best doctor to even make it feasible to rely on them but they're a good assistant and a good alimentation to the human physician for sure do you have you made any attempts to benchmark the third-party expert systems with regard to you know some elusive metric around accuracy or you know I guess that the thought is that you know even if we were confident that each of the elements in this expert system you know was vetted by the 20 doctors or whatever required to you know have a consensus that you know has some sufficient level of accuracy you know medical perspectives have changed significantly over 50 years we may I don't know the extent to which this is tracked in this expert system but you know there are diagnostic practices that apply not equally across different groups of patients so you have all the potential for all kinds of biases within a data set like that have you made any attempt at kind of evaluating that I mean we are constantly evaluating that with our data but it's really hard to come up with a you know something that I I i would dare to publish right because it's it's it's the problem is the same it's like there is no no ground truth there's a there's a couple of papers on evaluating different systems and different online symptom checkers and those are the ones that everyone is using as sort of like the benchmark and there's a paper by semi gram on evaluating symptom checkers and there are some medical vignettes that she published which are commonly used by a bunch of people including some like Babylon in the UK and so on where they publish things like well we use this vignette because that's all we have that at least it's commonly available and you can benchmark against but they're far from you know something that it's that you could consider sort of like has good coverage of medical conditions and and you can trust us as being comparable but that that being said again I think that the the reality is as harsh as it may sound it's not too hard to be better than the average physician but again that's not enough that's that's not convincing like if I told you like oh I can build a self-driving car that it's better than the average teenage driver would you be okay like well probably not because the average teenage driver is not somebody I would trust on an automated driving machine so I think here is it it's it's pretty much the same it's it's not about being better than the average doctor it's about being better than the best doctor and being able to augment and always sort of like fall back on humans and I think that's exactly I like that comparison to self-driving cars a lot because I think what we're trying to build is not a completely autonomous vehicle right we were trying to build this AI automation as an assistant to the driver just like many cars do right now but in this case the driver is an expert who is a physician one more question for you you mentioned earlier that among the techniques that you're relying on you do make some use of transformers burt GPT - that kind of thing how does that play out in what you're building that plays out in in many different ways I mean there's there's a lot of great things about those 
approaches that the one that I think is probably the most relevant in in our case is the fact that it's transfer it's all about transfer learning right it's about if you have a great model that has learned in general how to speak sort of sort of say you can then fine tune it on some specific domain to become better about speaking about healthcare so a lot of the approaches we take is we look at some of this models we fine-tune them on very specific data that we have that is focus on health care and then we can use it to do a bunch of things I mean those the output of those models can be used in the context of a CAD model or a dialog system but you can also use them to generate features for anything for a classifier or you name it right and I think they because they they build a representation of language in general right so so we we use them as inputs to many of the things we do but more directly we also use them as as I was mentioning before to generate assistance to the physicians as they're chatting and they're talking to the patient right so if you if you think about mmm and that's also I think I dare to say pretty common in many applications of just customer service in general like or customer service will have sort of like assistance actually there there are some papers I think for example from Airbnb where they've done similar things for their customer service where there's basically an assistant that is telling the customer service and suggesting things they could say so they can basically accept them or not and decide whether they they want to type them out or just simply like the suggested respond so that's an example where you can almost you know you can take one of these models fine-tune it training on training on very specific data that it's more healthcare oriented and you can generate like an assistant for a physician or an expert in any given domain well cha VA it was absolutely wonderful catching up with you really excited to learn more about what you're up to there karai and I'll definitely be following along ok yeah great I would say that many of these things that with mentioned we we are publishing and we are we have I think four papers in this machine learning for healthcare works up in new ribs and if people are interested in following up in some of the details of how we use this transformer models or how do we do diagnoses and so on that's all I mean they can go to archive and find more details on some of this techniques and how we're using them and trying to solve sort of like this huge healthcare problem access so yeah fantastic we'll we'll include some links to those papers on archive in the show notes great so great talking to you thank you that's our show for today to learn more about today's show visit to Malaysia comm slash shows once again if you missed one walk on or want to share what you learned with your team be sure to visit swim walk on comm slash videos for more information about soma convey do packages thanks so much for listening peacewelcome to the 1200 i podcast I'm your host Sam Cherrington did you miss Tomac on AI platforms if so you'll definitely want to check out our Tomac on video packages featuring over 25 sessions discussing expert perspectives on ml and AI at scale and in production you'll hear from industry leaders such as Facebook Levi's Zappos and more about their experiences automating accelerating and scaling machine learning and AI in each video package you'll receive our keynote interviews the exclusive team teardown panels featuring 
Airbnb and SurveyMonkey case studies and more over 13 hours of footage once again visits will mock on Comm slash videos for more information or to secure your advance purchase today alrighty everyone I am on the line with cha VA a Matar a chav EA is the co-founder and CTO of cure I cha VA welcome back to the 1200 I pod cast yeah thanks for having me Sam so for those that don't recognize the name cha VA was actually our third guest after switching to the interview format so this was over three years ago and so much has happened for both of us we last had an opportunity to catch up at the AWS remoras conference when was that back in June or so and I thought it makes sense to get cha VA back on the show to get a little bit of an update as to what he's been up to so when we last spoke to chav EA he was leading the engineering team at kora doing a ton of work on recommendation systems and other machine learning use cases prior to that he the machine learning algorithms team at Netflix and again he's currently the co-founder of cure I a startup in the healthcare a I space javi why don't we just jump right in and have you bring us up to date on Kyra and what you're up to there sure yeah so here I we are using state-of-the-art AI machine learning for a very big and bold mission which is to basically bring the world's best health care to everyone and of course that is a as I said a very bold and very big mission and we are making it concrete by basically focusing first on primary care so we wanna billing bring the cost down of providing good not good but best quality health care to everyone by using AI and machine learning to bring it down to a place where it can be affordable and can be a scalable and everyone in the world who has a phone can have like primary care in a very convenient accessible and affordable way and so when you're talking about allowing people to use their phones for primary care are we talking about like turning your phone into a tricorder or are we talking about using your phone as a kind of a vehicle for accessing human physicians or something in between or something totally different it's it's a combination of the above and plus something is lightly different but the the realization is that you know a lot of what mmm can be solved in primary care and in health care it really boils down to having conversations between patients and physicians and of course providing input to those conversations from different sensors and labs and other places right but the the core of happens in any kind of like medical visit is a conversation between the patient and the doctor and that's the part that can be really automated and not only automated but actually brought to a point where it's it's you can do it from anywhere that you have a phone with any kind of connection and you can start having a conversation and chatting with a with a doctor that's all you need and of course there's always there's always gonna be things that you can do over the phone like you can get your medication over the phone right but that's okay because we can always deliver medication to your home or we can always always refer you to a lab that is near by and get the results from the lab and whatnot but the the really key issue the part that we're focusing on is in that conversation that happens between patients and doctors and we have a service where we employ physicians to basically be on the other end of the line to have those conversations and what we're doing is applying this AI machine learning approaches to 
automate as much as possible this conversation so we can augment and scale the doctor so an important piece here is we're not replacing the doctors with the AI in machine learning what we're making them is we're giving them I usually say we're giving them superpowers that and in instead of being able to see say 100 days in two days they'll be able to see 10,000 the conversation the most of the easy stuff will be handled by the AI and machine learning and they'll be able to focus only in the places that they're needed the most is a hundred patients a day a typical metric for practicing physician it really depends and in primary care the numbers are roughly about the average is I think 12 minutes per patient so you can it depend on what the working hours are for for doctors that's actually that's in the in the u.s. okay all the places like India for example where it's much less than that and doctors can see way more than 100 patients in a day and and the reality is they're not even seeing them there the nurses are taking care of them before they get to see the doctor but they they count as having seen the doctor so it's yeah it's not a it's not a but but you can obviously you can imagine that a doctor with 15 minutes or less to see a patient and to remember first of all the history of the patient what's going on ask the right questions get the right answers remember everything they know about medical school and come up with a diagnosis and come up with a recommendation it's really hard for any you know human being to do that at that rate right and of course there's a lot of mistakes a lot of things that happen because of this what we're trying to do is say hey hand that off as much as possible to the algorithms and the machines and then make sure that when the doctor comes in they come at the right time and they come at the point where they have all that information laid out for them and they can verify the decisions and make sure that they're saying the right thing and at the same time that's what we mean by augmenting right the doctors we are of course giving them information that is state-of-the-art and based on real science and they can get that information in a way that they can parse it and they can say okay yeah this is the right position I agree instead of sort of like having to deal with all the messiness of gathering that information parsing it remembering things going through the electronic health record and then making a decision all of that in less than 15 minutes right now there are aspects of this that sound very much like from a the kind of technology I'd expect to see like other conversational agents where you've got some back-end resources or team that you want to optimize these of the time and allow some AI system to handle the of easy easy responses I've got to imagine that you know things get a lot more complicated and Messier different certainly more important and healthcare side of things can you talk about some of the unique challenges associated with applying this kind of technology in healthcare yeah yeah definitely there's there's a lot of challenges and you're right you could you could think that you know the typical approach to dialogue systems and all the advances that we're having recently on this kind of chatbot and things like transformers and birds and gbt tools and things like that are useful and they are I mean we are using all of the above in different ways but the reality is in a domain like healthcare medicine where the stakes are so high you cannot 
leave things out to you no chance or just to a model to rely on on this kinds of conversations to actually following the right path and there's a lot of examples out there where you can trick any of these models to say things that seem reasonable for any human being but they're medically completely wrong right and there's been a few examples of that and and of course that's that's the that's the key the key issue that we were tackling with it's like how do we combine prior knowledge about what's correct and incorrect in science and in medicine with some of this automation right we do have a key insight here a very important thing of what we're doing is we we we control the end to end so we have both sides of the conversation and we meaning the patient and the expert the doctor right in this case so the interesting thing in our in in the way that we're applying this technology is that we can deploy the conversation helpers in in both ways like we can we can decide to serve something directly to the user if we want to but we can also serve it to the doctor and the doctor can use it an assistant and make a call whether that makes sense or not if it's being helpful right that's a really important thing right because then you go you're basically walking the line between a chat bot and an assistant kind of like a gmail assistant if you will when you get an auto-response suggestion right and the doctor can decide ok yeah this makes sense I'll just take it as it is or this doesn't make sense but they're still sort of like making sure that that's medically correct and at the same time we are getting training data on how our model the model that we're building are accurate or not and in what way they are or not we've we have actually an upcoming publication in one of the works of the new ribs which basically talked about this how to constrain the flexibility of this sort of like deep neural dialogue systems with expert feedback in order to make sure that the information is well is that accurate for a domain particularly in medicine in this case so we need to combine the best of both worlds and by the way we do the same thing in other parts of our modeling strategies like diagnose diagnosis in diagnosis we're also combining expert systems which is you know old-school day i with deep learning and i think you cannot rely 100% on any of the two but you can get much better if you combine those two strategies in in some smart ways which I think it's a it's a key insight for medicine but it's also something that will happen in its being advocated for by many people in machine learning in general right it's like you can't blindly trust models that come only with it from the data with no prior no or some form of knowledge constraint and a lot of people are trying to figure out like how do you combine those two things right how do you combine all the power you get from models are basically just being trained from lots and lots of data with knowledge that we have and structuring that knowledge in form of prior into the models right right that is a theme that continues to recur here on the podcast and in my conversations one interesting thought there is you know certainly on the probabilistic side we've benefited from a huge recent explosion in available tools and you know algorithms and the like you've mentioned a bunch of those already Bert etc and you know we've got tools like tensor flow and pi torch and many many many others whereas expert systems you know we think of as kind of a throwback to pre winter 
you know AI and I can't think of you know not being deep in that space I can't think of kind of what the leading open-source expert system software might be is there a a tools ecosystem there or is it you know are people building you know when people have this realization that they need both and not one or the other are they kind of building it from scratch yeah I don't think there there is such a thing I don't think there's a you know that there's a expert system component for tensor floral pie tours or should there be does that make sense you know with something like that have benefited you or is it you know is it basic it basically just kind of rules that we know how to code them because you know it's it's not probabilistic not really I mean and and by the way they can be probabilistic right at the end of the day what you have with this expert systems is it's a graph and then you can do probabilistic inference on the graph and you can do different things on that graph so basically I'm thinking a generic tool for expert systems would be rather simple in the sense that all is a way to represent sort of like graph and make him for instance on those graphs to go wouldn't be that complicated to sort of like have a component for tensorflow or for for pi tours that basically does that for you so the key thing here is those expert systems rely a lot still on sort of like manual labor and and just to give you an exam example in the case of some of the expert systems we're using we're using some that have been developed for over 50 years right so there's a there's a couple of expert systems for medical diagnosis that go back 50 years and we we're using both of them actually and and interestingly you know they there's a lot of knowledge in there right you can think about you know 50 years of a bunch of hundreds of really well trained physicians in coding knowledge and information about medicine in a graph right and that's really valuable and and it's really something that if you can then inject it into any learning system you get a lot of a lot from it right to your question there is no you know there's no tooling for that on the other hand you can do interesting things like one of the things we've done is use this expert systems basically data generator to generate synthetic data and train the learning models from the data that is generated from the expert system right that's an example of something that I think is very useful and really really valuable because then you can you can even merge synthetic data with natural data and you can tweak it in ways that you can learn a model that actually now has some prior knowledge that has been injected in the form of ground truth data so to speak can you speak to that particular point and a little bit more detail yeah sure okay so so in this case the thought process was like that right like we know that you know it really have very good data and we train a deep learn learning neural net we could get to like a really high accurate diagnosis system the reality is that high quality data does not exist if you go to electronic health records which we have used ourselves I mean we have a project with Stanford where we have been working with them on using their chronic health records and this has been something that others have done like Google and did mine and you name it it's like learning predictive models from electronic health records electronic health records the data quality is really really poor notoriously so yes yes and there's a lot of reason for it 
but one of it is you know they weren't designed for the purpose of diagnosis they were designed for the purpose of billing and to make sure the insurance companies got their money back so there's a there's a ton of issues with them and so but but but again that data is valuable it's not like it's totally noise there is something in it so how how can you generate some kind of data that it's more you know solid and until I can treat more as a ground truth well you can go to this expert system which again they're all they are is you know a graph and you can start activating notes and generate data from that graph that basically becomes sort of other cases that you used to train your deep learning model and that's what we we showed in this desert paper we published last year where we basically generated data from the expert system we injected noise to that data because an interesting and important thing is you want to train a model that is robust to noise right the problem with expert systems one of the problems is that they're not capable of dealing with noise so in other words if you know if the patient doesn't say exactly the symptom they have and they make a mistake because they didn't understand the question or the doctor enters and a wrong thing the expert system is is basically doomed and and and it's gonna give you an incorrect output that's not the case for you know you can train machine learning models are you know relatively robust to know it's because you can even do adversarial training and you can do a lot of different things to make them robust so how do you combine both well you can you can also inject noise to the expert system that's basically what we did so we generated data from the expert system we injected different kinds of noise one example which I think will be very off bill is you can inject noise by saying hey I'm just gonna randomly inject things symptoms that are very common right I'll just add coughing to everything because you know coughing is something that people have in general no matter whether they have one disease or not right it's like it's you always can call for its knees or something like that it's very prevalent and very common it's a typical thing and can confuse really confuse an expert system but it's if you train a machine learning model on ignoring cough because it's something that's very common and it's not very it's not gonna help determine what the diagnosis is well that and then you build a robust model so we again generated synthetic data from from these systems injected noise in ways that we made the learn model more robust and we also combined that synthetic data with natural data that we had from EHRs and other sources to also prove that you can you you don't need to constrain yourself to just one single kind of data right all you need to do is combine it in smart ways to sort of like understand because there's there's obviously value to training from real world beta all you need to do is figure out how to combine it with more clean data and and data you can trust you mentioned the this kind of injecting noise via adding symptoms that are frequently recurring what are some other examples of the the kind of noise that you're injecting and more broadly how do you quantify the value of the synthetic data in building out your models yeah so okay so the to the first question I mean I think that the key insight to adding nodes in a domain like medicine is that you do need you need to have some domain knowledge right it's like when I when I give 
you the example of adding symptoms are very common that makes sense right because that it makes sense because we know about medicine like okay yeah the explanation makes sense another examples like well you can remove symptoms that are very rare or are likely to be missed right that's another thing that makes sense once you explain it right but you need to have some insight and you need to talk to doctors and that's something we do all the time right this kind of strategies don't come up by you know sheer imagination they come up because we talk to our physicians and we talk to them and say hey what how do you deal with these issues where issues are common and that lead to mistakes in diagnosis how can we make sure that our model doesn't make the same mistakes so I think that is a key and important thing is you need to work with domain experts and that leads me to answer your second question and let me just pause because that's a kind of an interesting point I think you know what I think of noise at least from a classical engineering perspective I think if noises like this junk that's you know uncorrelated from your signal but what you're suggesting is that at least when you're creating synthetic data your noise needs to be correlated with your actual noise that you need to expect you can't just have you know purely random noise because that won't help your model yeah that's pretty much it I mean here it's a slightly different right and notion of of noise if you will and but what would you have a synthetic data that is strictly true if you will because true in a scientific sense because been generated by kind of like an expert system that it's been designed on on science but what you need to do is inject noise that mimics more the reality of nature right and the messiness right but that noise yet needs to model some of the natural messiness that you see in real life and you need to not inject it yeah it's not wide nose when in that sense right it's noise that tries to turn that synthetic data into something that is more real right if you think about it I mean I use sometimes as a metaphor of life the self-driving cars also using tech data that is generated from video games and it's like well you can imagine that you're training your self-driving model on data from from Grand Theft Auto but you need to inject I don't know and you need to inject rain and you need to inject things that are not maybe in your synthetic data and they're adding noise to the capture of the image but in a way that mimics real-life situations right not just white noise mm-hmm in that sense it's a bit like the concept of domain adaptation yeah I mean you you could consider it that that for sure and that's another it is a very mean domain adaptation in itself I mean we could go into that it's another important thing that you need to do in many in many cases because and yeah youyou you're right it could be seen as that right because sometimes you are training on ideal data but then you're gonna be faced with real-life data that it's gonna have to be interpreted and in the context of the of the ideal data that you use for training so yeah it is yeah okay so you're about to take on that second question yeah the second question was about how do you even you know how do you know that that that the data is good or even the model that your training is good and and and you know beyond that the relative advantage you know how do you compare with and without using the synthetic data you know that uh is it a training time or is 
Okay, so back to that second question: how do you know that the data is good, or even that the model you're training is good? And beyond that, the relative advantage: how do you compare with and without the synthetic data? Is it training time, accuracy, or some combination of all these things?

It's mostly about accuracy, and the problem is that the definition of accuracy is, again, really tricky and not at all obvious. Accuracy in the context of medical diagnosis is a very tricky thing to define, particularly because you would hope that by asking a physician you would get a ground truth, but that's not the case. There are studies out there; for example, the Human Dx project published studies showing that on their dataset the average accuracy of a single physician was sixty percent, which is really low. If you take the consensus of twenty physicians, that gets up to over eighty percent, which is much better, but then of course you need to have twenty physicians agree.

And you're still only at eighty percent, which is a lot better but not necessarily comforting if you're the patient.

Exactly, and I think that's a key issue: what do we treat as ground truth? In our case we use a combination of things. We use publicly known datasets, of which there are unfortunately not many in this domain, just a few medical vignettes you can use for evaluation. We also use our own physicians to QA, and we make sure several of them agree on each case so we know we're right. And then there's the synthetic data, which you need to treat as pseudo ground truth, in the sense that, as I mentioned, it's the result of fifty years of research from hundreds of physicians who have agreed on how a particular disease should be defined and which symptoms are related to it. That's as good a ground truth as you can get in many cases. I wish I had a great answer here, but the reality is I don't. It's an iterative process: you treat one dataset as ground truth, compare it against your other data, let your physicians go through it and say whether it's correct or not, feed that back, and keep improving both over time. That's another very important lesson learned: you need to design all of these systems as learning systems. It's not only about their accuracy today; it's about how you make sure that accuracy, and all the other metrics you care about, improve over time. In the meantime, the important thing is that we always default to humans. We'll always default to a human doctor and improve the model over time, telling that doctor, hey, our model thinks these three things are important, you may want to consider them, and the doctor says yes or no; it's their call. That way we'll be as good as the doctors are. Actually, even in our offline evaluation metrics, we think our models are already at least as good as, if not better than, the average doctor. Even that isn't enough, though. They would need to be better than the best doctor before it's feasible to rely on them alone, but they're a good assistant and a good augmentation to the human physician, for sure.
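As a rough illustration of the evaluation loop described here, the sketch below builds a consensus pseudo ground truth from several physician labels and scores a ranked differential against it. The cases, labels, and the 50% agreement threshold are all hypothetical.

```python
from collections import Counter

def consensus_label(physician_labels, min_agreement=0.5):
    """Majority-vote pseudo ground truth; returns None when no single
    diagnosis reaches the agreement threshold."""
    top, votes = Counter(physician_labels).most_common(1)[0]
    return top if votes / len(physician_labels) >= min_agreement else None

def top_k_accuracy(cases, k=3):
    """cases: list of (ranked_model_diagnoses, physician_labels) pairs."""
    hits, scored = 0, 0
    for ranked, labels in cases:
        truth = consensus_label(labels)
        if truth is None:
            continue  # no consensus: defer the case to human review
        scored += 1
        hits += truth in ranked[:k]
    return hits / scored if scored else 0.0

cases = [
    (["flu", "strep throat", "mono"], ["flu", "flu", "mono"]),
    (["migraine", "tension headache"], ["cluster headache"] * 3),
]
print(top_k_accuracy(cases))  # 0.5: one hit, one miss
```

Re-running the same metric with and without the synthetic training data is one concrete way to quantify the relative advantage raised in the question.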
Have you made any attempts to benchmark the third-party expert systems with regard to some, admittedly elusive, metric around accuracy? I guess the thought is that even if we were confident that each element in the expert system was vetted by the twenty doctors, or however many are required to reach consensus at some sufficient level of accuracy, medical perspectives have changed significantly over fifty years, and I don't know the extent to which that is tracked in the expert system. There are also diagnostic practices that don't apply equally across different groups of patients, so you have the potential for all kinds of biases within a dataset like that. Have you made any attempt at evaluating that?

We are constantly evaluating that with our own data, but it's really hard to come up with something I would dare to publish, because the problem is the same: there is no ground truth. There are a couple of papers evaluating different systems and different online symptom checkers, and those are the ones everyone uses as the benchmark. There's a paper by Semigran on evaluating symptom checkers, and the medical vignettes she published are commonly used by a bunch of groups, including Babylon in the UK, who say, in effect, we use these vignettes because that's all we have; at least they're commonly available and you can benchmark against them. But they're far from something you could consider to have good coverage of medical conditions, or that you could trust as being comparable. That being said, the reality, as harsh as it may sound, is that it's not too hard to be better than the average physician. Again, though, that's not enough. If I told you I could build a self-driving car that's better than the average teenage driver, would you be okay with that? Probably not, because the average teenage driver is not somebody I would trust with an automated driving machine. It's pretty much the same here: it's not about being better than the average doctor, it's about being better than the best doctor, and about being able to augment and always fall back on humans. And I like that comparison to self-driving cars a lot, because what we're trying to build is not a completely autonomous vehicle. We're trying to build AI automation as an assistant to the driver, just as many cars do right now, except in this case the driver is an expert: a physician.

One more question for you. You mentioned earlier that among the techniques you're relying on, you make some use of transformers: BERT, GPT-2, that kind of thing. How does that play out in what you're building?

That plays out in many different ways. There are a lot of great things about those approaches, but the one that is probably most relevant in our case is that it's all about transfer learning. If you have a great model that has learned, in general, how to speak, so to say, you can then fine-tune it on a specific domain so it becomes better at speaking about healthcare. So with a lot of the approaches we take, we look at some of these models, fine-tune them on very specific healthcare-focused data that we have, and then use them to do a bunch of things.
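The pattern described here is standard transfer learning with a pretrained transformer. Below is a minimal fine-tuning sketch using the Hugging Face transformers library; the base model, the three-level triage labels, and the two example sentences are placeholder assumptions, not Curai's actual data or task.

```python
import torch
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Start from a general-purpose pretrained language model...
name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=3)

# ...and fine-tune it on (here, toy) healthcare-specific text.
texts = ["crushing chest pain radiating to the left arm",
         "mild seasonal sneezing, no fever"]
labels = [2, 0]  # hypothetical urgency levels: 0=routine, 2=emergency

class TriageDataset(torch.utils.data.Dataset):
    def __init__(self, texts, labels):
        self.enc = tokenizer(texts, truncation=True, padding=True)
        self.labels = labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[i])
        return item

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="triage-model", num_train_epochs=3),
    train_dataset=TriageDataset(texts, labels),
)
trainer.train()
```

The same recipe works with a causal model such as GPT-2 when the goal is generation rather than classification, which is relevant to the assistant use case discussed next.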
The output of those models can be used in the context of a chat model or a dialog system, but you can also use them to generate features for a classifier, or you name it, because they build a representation of language in general. So we use them as inputs to many of the things we do. More directly, we also use them, as I was mentioning before, to generate assistance for physicians as they're chatting and talking with the patient. I'd dare to say that's pretty common in many customer-service applications in general. There are some papers, for example from Airbnb, where they've done similar things for their customer service: there is basically an assistant suggesting things the agent could say, and the agent can accept them or not, deciding whether to type out their own response or simply use the suggested one. That's an example where you can take one of these models, fine-tune it by training on very specific, healthcare-oriented data, and generate an assistant for a physician, or for an expert in any given domain. [A minimal sketch of this suggested-response pattern follows the transcript below.]

Well, Xavier, it was absolutely wonderful catching up with you. I'm really excited to learn more about what you're up to there at Curai, and I'll definitely be following along.

Great. I would say that many of the things we mentioned we are publishing; I think we have four papers in the Machine Learning for Health workshop at NeurIPS. If people are interested in following up on some of the details of how we use these transformer models, or how we do diagnosis, and so on, they can go to arXiv and find more details on some of these techniques and how we're using them to try to solve this huge problem of healthcare access.

Fantastic. We'll include links to those papers on arXiv in the show notes. It was great talking to you.

Thank you.

That's our show for today. To learn more about today's show, visit twimlai.com/shows. Once again, if you missed TWIMLcon or want to share what you learned with your team, be sure to visit twimlcon.com/videos for more information about TWIMLcon video packages. Thanks so much for listening. Peace.
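As referenced above, here is a minimal sketch of a suggested-response assistant built on a causal language model. A stock GPT-2 stands in for a model fine-tuned on physician-patient chat logs, and the prompt format is an assumption; the key design point from the conversation is preserved, namely that the model only proposes candidates and the human physician decides.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Stock GPT-2 as a stand-in for a healthcare-fine-tuned model.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def suggest_replies(conversation, n=3):
    """Propose candidate responses that the physician can accept,
    edit, or discard; the human always stays in the loop."""
    prompt = conversation + "\nPhysician:"
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        max_new_tokens=40,
        do_sample=True,
        num_return_sequences=n,
        pad_token_id=tokenizer.eos_token_id,
    )
    prompt_len = inputs["input_ids"].shape[1]
    # Decode only the newly generated continuation, not the prompt.
    return [tokenizer.decode(o[prompt_len:], skip_special_tokens=True).strip()
            for o in outputs]

chat = "Patient: I've had a sore throat and a low fever since Monday."
for suggestion in suggest_replies(chat):
    print("-", suggestion)
```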