#37 Data Science and Insurance (with JD Long)

The Power of Empathy in Data Science: A Conversation with JD Long

In this episode of our podcast, we're joined by JD Long, a data scientist and leader who has spent years working in insurance and reinsurance. JD shares his insights on the importance of empathy in data science, particularly when it comes to understanding the needs of users. He emphasizes the need for leaders within organizations to have candid conversations with their analytical teams about whether the research or analysis they're doing is truly making a difference.

JD's conversation begins by highlighting the cultural norms present within his own organization. "One of the things that I realized in the organization I work in," he says, "is there's a cultural norm here of asking the question: does it change the answer?" This simple yet powerful question becomes the foundation for JD's discussion on how to prioritize analysis and ensure that resources are being spent in the most impactful way possible. By asking whether an analysis is truly making a difference, leaders can avoid wasting time and resources on "appendix pages" of presentations.

JD also emphasizes the importance of comparing new methods or models against existing ones, rather than simply doing nothing. "Doing nothing is rarely the alternative," he notes. "Usually it's something that's a little simpler." By establishing a baseline model or approach, teams can compare their results to see if the added sophistication of a new method is truly worth it.
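A minimal sketch of this baseline idea (the numbers are invented for illustration, and plain Python stands in for whatever modeling stack a team actually uses): score a naive predict-the-historical-mean baseline against a slightly fancier alternative, then ask whether the extra sophistication changes the answer.

```python
from statistics import mean

# Hypothetical annual loss ratios (toy data, not real insurance figures)
history = [0.62, 0.58, 0.71, 0.65, 0.60, 0.68, 0.63, 0.66]
actuals = [0.64, 0.67, 0.61]  # hold-out years

def mae(preds, observed):
    """Mean absolute error between predictions and observed values."""
    return mean(abs(p - a) for p, a in zip(preds, observed))

# Baseline: always predict the long-run historical mean
baseline_pred = [mean(history)] * len(actuals)

# "More sophisticated" alternative: predict the mean of the last 3 years
recent_pred = [mean(history[-3:])] * len(actuals)

baseline_err = mae(baseline_pred, actuals)
recent_err = mae(recent_pred, actuals)

# Does the added sophistication change the answer enough to matter?
print(f"baseline MAE:    {baseline_err:.4f}")
print(f"recent-mean MAE: {recent_err:.4f}")
```

On this toy data the simpler baseline actually wins, which is exactly the kind of result the "does it change the answer?" question is designed to surface before a team invests in a more complex model.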

One of JD's favorite techniques for teaching machine learning is to have students create a simple predictive model before introducing more complex algorithms. This approach helps them develop a deeper understanding of how data is used and how predictions are made. He also highlights the importance of plotting data first, which can be a game-changer in terms of communicating insights effectively.
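The "plot your data first" advice is the same lesson behind Anscombe's quartet: series with matching summary statistics can have very different shapes. A small illustration, with values deliberately engineered so the means and standard deviations coincide:

```python
import math
from statistics import mean, pstdev

# Two hypothetical series engineered to share mean and standard deviation
steady = [1.0, 2.0, 3.0, 4.0, 5.0]          # a clean linear trend
a = math.sqrt(5)                             # spike size chosen to match variance
spiky = [3.0, 3.0, 3.0, 3.0 - a, 3.0 + a]    # flat except for two spikes

# Summary statistics are indistinguishable...
assert math.isclose(mean(steady), mean(spiky))
assert math.isclose(pstdev(steady), pstdev(spiky))

# ...but a plot (or even a glance at the raw values) tells very
# different stories about trend versus outliers.
for name, series in [("steady", steady), ("spiky", spiky)]:
    print(name, [round(x, 2) for x in series])
```

A mean-and-variance summary would treat these two series as interchangeable; a single scatter plot would not, which is why plotting comes before modeling.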

JD's conversation with our host, Hugo Bowne-Anderson, also touches on the topic of empathy in data science. "I think about having an impact on the margin," he says. "If we ask ourselves what's the next best simpler alternative, we should never compare our analysis to doing nothing because doing nothing is rarely the alternative." Instead, teams should compare their results against existing methods or approaches.

Another key takeaway from JD's conversation is the importance of rapid prototyping and getting Minimum Viable Products (MVPs) out the door. "I love that you're doing that with the class," our host notes in response to JD's suggestion to establish a baseline model. This approach can help teams quickly test hypotheses and iterate on their results.

In conclusion, JD Long's conversation with our host highlights the importance of empathy and critical thinking in data science. By asking questions like "does it change the answer?" and prioritizing analysis that truly makes a difference, teams can avoid wasting time and resources. His emphasis on rapid prototyping and shipping MVPs reinforces the same theme: test hypotheses quickly, then iterate.

Empathy Hack: Building User Stories

JD also shares his approach to empathy hacking, which involves building user stories. "I think this is a fantastic way to get into the mindset of your users," he says. By writing down what our users need and want, we can create products that truly meet those needs. This approach requires us to be patient and empathetic, but the results are well worth it.

Industry Insights: What It Means to Be a Data Consultant

In future episodes, we'll be talking to Tanya Cashorali, a founding partner of TCB Analytics, a Boston-based data consultancy. Tanya has applied her experience in bioinformatics to other industries, including healthcare, finance, retail, and sports.

Data Products: Impact and Applications

Tanya will also share her insights on what it means to be a data consultant, the wide range of industries she's worked in, and the impact of data products in her work. She'll discuss rapid prototyping and getting MVPs out the door, as well as the importance of establishing a baseline model.

Getting Started: Data Science and Machine Learning

For those interested in learning more about data science and machine learning, check out our next episode with Tanya Cashorali, where we'll dig further into what it means to be a data consultant, the industries she's worked in, and the impact of data products in her work.

About Our Host

Hugo Bowne-Anderson is your host for this podcast. You can follow him on Twitter at @hugobowne or connect with him on LinkedIn. For more episodes and show notes, check out DataCamp's website at datacamp.com/community/podcast.

"WEBVTTKind: captionsLanguage: enin this episode of data frame a data cam podcast I'll be speaking with James long VP of Risk Management for Renaissance reinsurance and a misplaced southern agricultural economist Quan stochastic modeler and cocktail party host James otherwise known as JD and I will talk applications of data science techniques to the omnipresent worlds of insurance reinsurance risk management and uncertainty what are the biggest challenges in insurance and reinsurance that data science can impact how does JD go about building risk representations of every deal how can thinking in a distributed fashion allow us to think about risk and uncertainty what is the role of empathy and data science stick around to find out I'm Hugo bond Anderson a data scientist at data camp and this is data frame welcome to data frame a weekly data count podcast exploring what data science looks like on the ground for working data scientists and what problems that can solve I'm your host Hugo von Anderson you can follow me on Twitter as you go down and data camp at data camp you can find all our episodes and show notes at data camp comm slash community slash podcast hi there JD and welcome to dataframes hey you go it's great to have you on the show really excited to have you here to talk about data science insurance reinsurance your work in the our community the role of empathy and data science which we've had great conversations about before but before we get into all of that I'd like to know a bit about you and maybe you can start off by telling us what you're known for in data community yeah that's interesting here we go I'm never completely sure when I meet new people if what they may have expressed into me or run into something that I wrote I think most common or maybe asking our questions on Stack Overflow possibly a presentation at a conference or maybe starting the Chicago our user group or maybe some foolishness on Twitter it's really hard to guess and of course 
your role in asking questions on Stack Overflow was quite early on there right yes so this this goes back to when Stack Overflow was really first starting the story there's kind of interesting that notoriously in the our community the our help email list at the time didn't suffer fools well or newbies for that matter and there was a lot of encouragement to RTFM and that sort of thing so it was a completely new be friendly and this was about the time I was learning our and I had observed that and Mike Driscoll was toying with the idea of a beginner's our mailing list and I contacted Mike I had been watching how Stack Overflow was being developed and I said Mike I'm not sure like maybe what we should do is try to get new users using Stack Overflow and because it looked innovative to me which you know it now that Stack Overflow is eating the world it's kind of quaint to think about but Stack Overflow had some social science thinking in their design it had rewards incentives thinking about the effect of design or nudges if you would sort of we think about in behavioral economics how do you nudge people towards good behavior and it seemed like a good environment for newbie type questions so Mike Driscoll and I and a handful of other people we got from the are seek website a whole bunch of questions people had typed into they are the search engine this was a website dedicated to our information and we tried to figure out what do we think people were asking so we created you know I don't know 100 questions and answers or something and we did a flash mob at the birds of a feather session at Oz con 2009 and I was not there I was in Chicago at the time living and but we I participated virtually and we seeded Stack Overflow with a bunch of questions and answers and that sort of kick-started they our our discussion there and later another one was done and I continued to be active asking questions and as of I haven't looked recently but as of a few years ago I was still one of 
the top question askers on Stack Overflow for the tag of our for the our programming language and so in terms of this initial inspiration of help didn't really suffer fools on you bees well how do you think that is played out over the past several years and where we are now yeah it's a good question I mean if we look at Stack Overflow they've had their own challenges and growth issues you know at first it was just getting to mass and then they have clearly become the the dominant you know monopoly for information on programming questions and it's a really good resource but in you alluded to this we'll talk to it a bit later there's this issue for a need for lots of empathy both in question askers and in question answers and that's proving to be a challenge in kind of some new ways so I think where we are now is Stack Overflow is a fantastic place to get information there's so much information there most new beginners are able to find their question already asked and not have to ask question or venture into that and if they do one there's the opportunity to get people to answer oh really I'm really excited to get back to this idea of empathy and data science and that's a little teaser for something coming up a bit later but tell us a bit about what you're up to now and what you do currently JD sure so what pays the bills is I'm VP of Risk Management for Renaissance reinsurance although I lawyers prefer that I state that everything I'm sharing here with you is my personal views of course I'm not representing the company I work for here but I've worked for Renasant three for over nine years now and been in in different insurance companies and reinsurance companies most of my career and can you tell us remind us what insurance is and tell us what reinsurance is sure so insurance think most people are familiar with it because of their house or car or property insurance they have it's a company that makes a payment on policies when adverse outcomes happen what 
reinsurance is is something that most people don't face or interact with and that is that individual insurance companies will buy protection from events or losses that are bigger than that insurance company could handle so an example of that would be homeowners insurance company in Florida may have more exposure to hurricanes than they have capital to pay out claims and so they would need to buy reinsurance to help make sure that they can make good on their promise to pay future claims so you're insuring the insurance we insuring the insurance so you know you must know what my next question is is it insurance all the way down it is insurance all the way down so let me tell you a language story here real quick I've done some work with the World Bank and a number of years ago I was in Mongolia and we were discussing insurance and so we asked I'm trying to learn what the word was and they said well the words dotco and so we're like oh okay well what's reinsurance and they're like well it's coal coal we're like okay now what we do this thing called retro where reinsurance companies trade with other reinsurance companies is that dot gold coal coal and they assured us that it was not but it seemed intuitive to me so yeah yes it is insurance all the way down I suppose that's like asking us is it re reinsurance whereas you call it retro right yeah we could we call it retro but that would be reinsurance and we stopped counting the reason we start trading it around between the reactors fantastic so I'm really excited about talking about insurance and reinsurance particularly framed by you know the new emergence of data science because insurance and you know actuarial sciences have been around for a lot longer than than data science so I'm interested in this pact but before we get there I just want to hear a bit about your story and how you got into data and data science originally yeah so you know like most insurance data scientist I'm an agricultural economist that 
obviously is not intuitive at all but I came into agricultural economics and in the 90s and when I graduated with an undergrad and I guess it was about 96 I was starting graduate school and I remember talking to my major professor about where the PhD graduates that year were going and one of the PhD graduates was going to American Express and I remember being like baffled like she's an ID and agricultural economics was she going to American Express for and he explained that American Express recruited explicitly agricultural economists because we have a very applied background not pure theory actually had an experience working with data they tended to have coding experience now this was 96 so that's mostly SAS in the university I went to now let's put this in perspective cran the our network started in 1997 so this was the year before cran even existed and Python was not available on dos and windows until 94 so this was just a couple of years after Python was available on Windows platforms so you know we were in SAS we were using mainframes and UNIX machines and American Express was hiring and recruiting agricultural economist because they've had some it's coding with this kind of messy real-world data and you know within agricultural economics I got an exposure to you know crop insurance model building lots of regression analysis we call it econometrics and building those models at what at the time seemed like a degree of scale it seems a little trivial in retrospect but that's kind of how I got in and I like to tell the story that this story under the pretext of agricultural economics is the kind of OG of data science because we've been doing these sort of combining programming and domain expertise in statistics you know for a long time and later the data science name sort of caught up but I've been doing that same sort of thing for a number of years and I suppose also you know working with like serious real data sets as well right and messy data yeah absolutely 
so that you know we were working with actual you know field data literally field data and sometimes long historical sets and it would require you know cleaning of outliers and a lot of the same sort of things that we talk about now taking trend out you know looking at analysis of time series and cyclicality and removing that before you start building a model to explain other things so a lot of these methodologies we've been you know using an agricultural economic for a number of years and kind of my experience with applying that to agricultural insurance is how it was my gateway to entering into financial risk and specifically insurance and reinsurance cool so what are the biggest challenges in insurance and reinsurance that you think data science can have a huge impact on or is currently having a huge impact on yeah so my view here is a little bit skewed because if we think about like the problem space there's a bunch of things going on in marketing the marketing of insurance you know where you see ads online in the claims process of how claim payments are made and how quickly those can be made by using data analysis and operations inside of companies there's big gains being made on all those I don't work particularly in those three areas I work more in what we would call underwriting and risk the distinction there underwriting is you know the decision about taking an individual risk now at an insurance level that might be whether or not a company writes a given policy to a person or a company in reinsurance it's more understanding the risk of a deal that may have hundreds or thousands of policies underneath it and then risk or risk management is kind of broadly thinking about how do all of those risks aggregate up inside of a reinsurance company you know some will be correlated some will be idiosyncratic some may be anti-correlated and then how do you think about rolling up that risk inside of the reinsurance company and being confident that you have the right 
amount of capital to hold behind that but not too much Kathryn alright and I suppose essentially that it's a huge task to as you say rolled all up and aggregate it to make one final decision based on all the data and all the modeling coming in right yeah exactly right so the a lot of little decisions get made and in the way that that feeds back in shapes the portfolio at least you know in the companies I've worked at is some feedback mechanism for a risk adjusted return on capital so when an individual deal is looked at its evaluated relative to the portfolio as a whole and there's some capital charge and it's guy that deal needs to be profitable in excess of the capital that's required to hold behind the deal so that's how we think about feeding back from corporate risk to the deal-making side great and so what industries do you insure will work in so you know what by the time you're aggregating at the reinsurance level it's very global and it's every industry because we're trying to spread risk across across really the whole globe and all industries so that we aren't concentrated in one specific area now if you think about the space for data science and insurance and reinsurance you know marketing ops claims that I mentioned earlier wellmaybe claims is but definitely marketing in ops they're not super reinsurance insurance specific those are very similar in lots of other transactional companies but the risk and underwriting is fairly domain knowledge intense so the domain knowledge there is really more about the deal understanding the type of risk how those risks fit together into a portfolio and you know for me I work both in the in the micro and the macro so the micro would be looking at individual deals and then the macro is this corporate risk management component I have a unusual job in that I do a little bit of both so JD why don't you tell me a bit about the micro scale then we can move on to the macro so for example maybe you tell me a bit about how the 
crop insurance modeling works yeah sure Hugo so if we look at crop insurance in the US which is one of the most mature crop insurance markets the current products that dominate that market have only been around since 1996 so the historic record isn't very long for that product and so we have to say well what data do we have about crop insurance and what we have is a history of agricultural yields that goes back in a long time series we have a history of agricultural commodity prices and we have a history of weather so one of the more data science e-type activities that I've engaged in is trying to take the data we do have and say ok how might the current portfolio of crop insurance have behaved all these years in the past for which we do have data right so this is a kind of a classic modelling exercise where we're taking something we know and we're trying to kind of project that into something we don't know and build up a historical understanding and once we do that we can do things like well let stochastic ly generate a whole bunch of different yield and price outcomes and see if we can build up a model of a full stochastic distribution of how this crop insurance industry and this give in a given country might work and that was you know one of my more interesting jobs for a number of years was building that model and so that's where we kind of moved from data analytics into something more data science II right we're building models to understand something we couldn't understand otherwise that's really interesting I'm gonna stop you there for a second because you used a couple of terms that I'm very interested in you talked about stochastically generating and then you talked about a distribution so I'm gonna try to tease that apart and let me know where I'm getting this incorrect so let's say we're trying to predict something concerning a market you can stochastically generate and what that essentially means to my understanding is you can simulate the behavior and 
stochastic means there's some sort of variation right so each time you simulate it you'll get a slightly different result and what you actually get in the end is a lot of different results and you may get a thousand or ten thousand or a hundred thousand that give you some idea of the distribution of the possibilities of the market is that what you're talking about that's exactly right hugo like we do very little predicting of what i think next year is gonna happen what we try to do is say what is the distribution of the potential outcomes for next year and what's the shape of that distribution and we might ask questions like what's the one in a thousand worst case scenario so it doesn't mean like we're thinking a thousand years into the future at all it means this is about next year but it's the improbable way but still possible that next year might turn out this is awesome and i actually think a lot of industries and verticals and basic science research that's adopting data science and data science techniques as methodologies could learn a lot from this conversation because they're still you know still a lot of managers will want point estimates right they'll want the average and then make a decision based around that maybe with some arrow bars but the fact that you're doing these math simulations and getting our entire distributions of predictions i think is a very robust technique as you say you can actually say 1% of the time we see something crazy that we actually do not want to happen at all yeah that's exactly right there's a really good and I'll make sure we have this in the shownotes Hugo that you have it to put in the show notes there's a book called how to measure anything that's a great name by the way in a great name and they have an introduction to this right and they take it from the idea of well initially you're estimating what do we think next year is gonna happen then you start say okay well what's next year and then a high estimate and a low 
estimate so you're beginning to think about a range around next year's outcome and from there we can start just thinking okay let's increase the resolution right let's what's an extreme event that could still happen and you could begin to think about creating like a some sort of error bars around your estimate and then ultimately move on to this idea of a full stochastic simulation where you have a whole you know thousands of possible outcomes so I want to tease apart something now that we've been using the word risk which will have an intuition of what risk means but there's something you know there's an idea that's coupled to this and I want to try to decoupling in some sense which is uncertainty in the sense that once you do these predicted simulations and get out a distribution you may not know what will actually happen and so I'm wondering is that uncertainty or risk and how do you think about this in insurance yeah that's that's a really good point I would generally think about the outcome of these models that I'm talking about as risk and then uncertainty as a separate thing then let me tease those apart and these get used in the vernacular interchangeably but in 1921 and in a book called risk uncertainty and profit The Economist Frank Knight who's sort of of the Chicago School he's a University of Chicago economist he presented this idea of risk versus uncertainty and the way he defined it is risk is when you understand the underlying distribution but you don't know what outcome you're gonna get so it's like that you know the classical urn full of marbles of you know white and black marbles you don't know which one you're gonna draw out but maybe you've been told at a time what's the ratio of white marbles to black marbles well that would be yes if an another example is if you flip a coin 10 times you can literally write the probability of seeing 10 heads or seeing nine heads or seeing eight heads or seeing seven so you know the entire distribution of 
possibilities right that's exactly right and then we have other processes where we know the underlying distribution is a Gaussian distribution so the outcomes gonna follow the and in the real world do you have risk as opposed to uncertainty because these are toy examples we have both let me just define uncertainty real quick so uncertainty is the piece where we don't know we know it's not deterministic we know it can have wild outcomes or some other outcome than what we know about but we can't put a distribution around it so that's uncertainty so let's go back to the real world if we're doing things like you know flipping a coin there is some uncertainty that maybe we have a loaded coin now we don't know how to what's the probability of this coin being loaded given no other information just we have it in our hand well we don't know right it's an uncertainty but we don't know what the probability is is probably pretty low but you don't know a better example like from the insurance world might be you know auto insurance is a pretty good example of a a scenario where they're a type of product where there's mostly risk and less uncertainty you know with the product has been around for a long time people behave in relatively predictable patterns and so most of that activity follows a well-behaved historic distribution there's a little bit of uncertainty some wild things happen and tail events happen that worked in your model distribution but it's pretty well behaved now on the flip side would be say terrorism insurance or just think of terror events the underlying distribution we'd only know what it is we know what the historic distribution of terror events looks like we can make a catalog of those but there's no reason to believe that world events are such that the next 12 months is a random draw from a historically stable distribution right we expect the distribution is probably not stable it's probably a function of changing geopolitics around the world and a 
reaction to events that are going on in real time and so there's a component of risk but there's also a much larger component of uncertainty now does that help makes perfect sense and it has kind of led me down a variety of rabbit holes my first question is to governments or corporations take out terrorism insurance they do they do there are a number of just property policies that would cover in the event of terrorism and there are of course policies that explicitly exclude active acts of terrorism so I'm if I recall I believe in certain countries it's it normal for crop insurance policies to exclude terrorism for example so JT we were led along this path talking about the micro-level you work in terms of crop insurance modeling and risk representation of single deals you tell us a bit about you know the the macro levels that you work at and thinking about insurance and reinsurance sure here you go so if we think about a reinsurance company that has a number of risks in many different lines of insurance I mentioned earlier that some of those risks are correlated in the correlation being can be caused from underlying physical relationships so all of the homeowner's insurance in New York City should be correlated in their outcome because if we have a large event like a hurricane sandy hits New York the impact is going to impact all of the insurance companies that write business in New York so that's a physical process that causes correlation or maybe on a casualty program there's an underlying risk that multiple companies have insurance for and when that turns out to be a problem and there be as a casualty claim it impacts multiple companies and other times they have connection because maybe there's a risk like changing legal framework causes all claim to increase 15% on property claims there's these relationships between the policies that we have to understand as we aggregate the risk together and think about combined risk inside of a reinsurance company sometimes 
that involves building the physical models like the hurricane and earthquake models where the policies are analyzed based on spatially where on the map the risk is and then understanding the exposure across different programs for risk in a specific geographical location and other times it may be introduced with more traditional modeling methods where the correlation is added after the modeling through something like a copula method so two distributions can be brought together and in a joint relationship be added added using a copula now it's always important to keep in mind that we add correlations at the end sometimes and our modeling the correlation is always kind of and everywhere an artifact of some other process and when we do something like a copula we're just trying to make sure our model data reflects what should be there already but we don't have any other method for putting it in place okay great so you've given me some insight into the types of tools and techniques that you use but maybe you could speak a bit more to what data science looks like in insurance and reinsurance and what I mean by that is you know in tech we know that most of our data will be in our sequel database so we'll query our sequel database and then use our or Python problem R to do a bunch of exploratory data analysis and visualization dashboards if you want to do production eyes machine learning we'll do that in Python so I'm just wondering you know what the techniques and tools that you use on a daily basis are when doing this type of modeling and data science sure so at the the initial deal level in a reinsurance company a bunch of the analysis looks like the historical data science the analysis you just described only the person doing the analysis may self-identify as a catastrophe analyst a cat analyst or they may identify as an actuary but what they're doing is analyzing data that they receive from someone maybe combining it with industry data trying to understand trends that 
are in the data in order to create this stochastic representation of a single deal so that may follow a similar pattern to other data sciency sort of modeling with the idea that what's coming out the other end is a say mean expectation but also a distribution around it for the outcome of a deal they'll then put that into a risk system and you know I think most companies use a system of some kind that then is a framework where the whole book can be rolled up and understood in a meaningful way and there's a million different approaches for doing that I've traditionally worked with an in-house tool and it handles making sure that deals that are connected because of spatial exposure get connected that way and the final modeling that deals that are not get at least correctly correlated with the other deals in their business class so that these relationships are tied together and reflected so we can get an aggregate distribution that's a reasonable view of these individual marginal distributions marginal here meaning individual deals in a portfolio that we can roll those up into one aggregate deal and understand its characteristics fantastic we'll jump right back in to our interview with JD after a short signal we're back here with Neil Brown for more insights from computational education hi Neil hi Hugo so Neil I'm interested to get some tips on teaching programming okay so one thing that can work well teaching programming is using live coding by which I mean typing in the program code that you're using to teach while the learners watch you as opposed to just having it all done beforehand yep that's right and so the benefit of live coding is that learners can actually see the process of coding a lot of learners get this idea in their head that everyone else just writes perfect code first time and if you turn up with pre-prepared code then you know that's what they're going to think I reckon it's a lot like going to see a play and thinking that all the actors just made 
up their dialogue on the spot. It's not true: you need to kind of see the process of construction. So live coding is useful when it lets you see the errors that can be made when you're entering the code, and when you can see the debugging process when things go wrong. So actually what you're saying is that live coding is best when it's not totally smooth. Yeah, pretty much. I mean, I have one workshop that I give so often that the problem has actually become that I'm so practiced I tend to live code it without making any mistakes, which kind of ruins the advantages of live coding. So if you ever do get to that point, it's actually worth deliberately engineering in a couple of mistakes. Useful points. Live coding seems to have taken off a lot in technical talks too recently. Yeah, and I'm actually a bit more skeptical about its value in tech talks. I think the main advantage there is it slows down speakers who are very nervous and tend to just rush through a whole load of slides in one go. But if you're an expert who's giving a talk to other people who know how to code, maybe about a new API or something, then where's the value in actually showing them the coding process? You know, they know how to code; they just want to learn the new details. So I think sometimes live coding is maybe just a bit of trying to show off, and if your audience can already code, it can just slow down a talk for no reason, because they're just watching you type it in. Or even worse, people get so concentrated on doing the live coding while they're giving a talk that they don't actually explain what they're doing. So what you're telling me is that live coding is best for education, right? Yeah, I think so. I think you need to make sure you have your endpoint in mind before you start; you can't just completely make it up as you go along. And talk through what you're doing and why you're doing it, and don't rush to cover up mistakes. Instead, if you make a mistake, just pause,
explain what the mistake is, explain how you're going to fix it, and that way you're teaching them. If you get embarrassed by a mistake and try to hide it, then it sends the wrong message to people who are learning, for when they themselves make a mistake. Well, we all make mistakes while programming, for sure. Exactly. My final tip on live coding is only type in the interesting bits. So if you've got a bunch of boilerplate that you need, like import statements or a skeleton for a class, just start with it or copy and paste it in; don't make people sit there watching you type in the boring parts. Couldn't agree more. Thanks, Neil, for another set of insights into computational education. Time to get straight back into our chat with JD Long.

So I'd like to step back a bit now and think about, you know, where insurance has come from, the actuarial sciences, and now the impact of data science on the discipline as a whole. Could you give us a brief history of all of these disciplines and how they intertwine? You bet, Hugo, I'd love to. Let's go back to 3000 BCE and the Babylonians. This was the earliest record I could find of a disaster contingency event: the Babylonians developed a system of loans where a person could get a loan for building a ship, and they might not have to repay that loan if a certain type of loss happened because of certain types of accidents. Well, that's kind of like insurance, right? Kind of like a builder's loan. So the idea has long been around. Now, one of the things I find interesting is that Edmond Halley, of Halley's Comet fame, created one of the first modern-style mortality tables, and that was in 1693. And then around about the same time, but completely disconnected from that, the Lloyd's coffeehouse, which was a place for sailors to hang out and shipowners to talk about what's coming into London on ships, emerged as kind of a place to drink coffee, get shipping news, and also to buy shipping insurance, and that
later became Lloyd's of London, which we've all heard of. It may not be well understood outside of the insurance community, but Lloyd's is not an actual company that takes risks; it's more of a marketplace, so it's like the Chicago Mercantile Exchange of risk. Lots of individual companies, including the one I work for, take risk at Lloyd's of London. So that was the late 1600s, and then, you know, computational tools and statistical methodologies developed alongside the actuarial process and became part of that process. But an interesting thing happened in 1992: Hurricane Andrew ripped across Florida, then kind of recharged in the Gulf of Mexico and plowed into Louisiana and Alabama, and it was a huge catastrophe for the global reinsurance market. Prior to '92, hurricane reinsurance was kind of a gentleman's game; it wasn't really a quantitative, well-understood risk business. Andrew caused many reinsurance bankruptcies, and there was a big contraction of the market; there just wasn't a lot of capacity for reinsurance because of that event. And that gap was filled by the crop of reinsurers that sprouted up on the island of Bermuda, and, you know, that market became a much more quantitative-analysis market that looked more like the quantitative finance world. That has driven the way reinsurance around the globe has been modeled and approached; it was really the turning point of reinsurance becoming much more quantitative, and also how I ended up living on Bermuda for four years.

That's incredible. So firstly, why Bermuda? Well, you know, the history there is that it's got reasonable proximity to the United States, but it's a favorable tax jurisdiction for endeavors requiring lots of capital and not a lot of people. The reinsurance companies based there, you know, it's not a tax-loophole type of jurisdiction; it's been a place where there's no corporate income tax, but it's also well regulated, so it ends up being regulated at a level that's consistent with mainland
Europe, but with a not very heavy corporate tax structure. So for activities like reinsurance, which has periods of high returns followed by a year or two of negative returns, it's pretty tax efficient to do those in Bermuda, and that's why it sort of cropped up in 1993 as a jurisdiction for global reinsurance, and especially US catastrophe reinsurance.

So something we've mentioned several times is this idea of building models, and you've said that building models is really key to your work. Could you say a bit about what model building actually means to you and what it entails? Sure, Hugo. When I think about model building in the context of insurance and reinsurance, what I'm really always thinking about is this process we've discussed a few times, where it's coming up with a distribution of outcomes that reflects the possible outcomes for a given financial contract. That's the simplest way I can think to describe it. So, you know, we might use dozens and dozens of different methods; there's different approaches to try to get our arms around the risk and uncertainty of a financial deal, and depending on what data is available, we might use complicated regression analysis, we might use a Bayesian method, we might even use a, you know, machine learning deep neural network of some kind. But ultimately what we're trying to say is: we have a potential contract we may enter, and we're trying to understand all the possible outcomes to make sure that the reinsurance company is being compensated for the risk that they're taking on as part of this contract. So the model, quote-unquote, could be lots of things that are possibly very complicated, or it may be that there's very little data and we're gonna look at the past, you know, 15 years of experience, and we're gonna fit a distribution to that, because that's all the experience information we have, and then we're gonna put a little premium on there, a little extra load for this uncertainty, because we can't fully quantify the risk.
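The sparse-data case JD mentions, fitting a distribution to a short loss history and adding a load for unquantified uncertainty, might look something like the following minimal sketch. The lognormal choice, the 15% loading, and all of the numbers are assumptions made up for illustration, not an actual pricing method.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical: only 15 years of annual loss experience for a deal
losses = rng.lognormal(mean=13.0, sigma=0.8, size=15)

# Fit a lognormal to the limited experience (location pinned at zero)
shape, loc, scale = stats.lognorm.fit(losses, floc=0)
fitted = stats.lognorm(shape, loc=loc, scale=scale)

# Expected annual loss implied by the fitted distribution
expected_loss = fitted.mean()

# Add a loading for the uncertainty that 15 data points can't capture
uncertainty_load = 0.15  # hypothetical 15% load
technical_premium = expected_loss * (1 + uncertainty_load)

# A tail measure the reinsurer might also look at: the 1-in-100 annual loss
var_99 = fitted.ppf(0.99)
```

The point of the sketch is the shape of the workflow, not the numbers: fit what the data supports, then price in the fact that the fit itself is uncertain.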
So that's what I mean when I think about modeling in this context. Okay, great. So I want to find out a bit more about how data science has impacted the insurance and reinsurance world, and actually the avenue I want to approach it from is a great quote by Robin Wigglesworth from the Financial Times, who said traders used to be first-class citizens of the financial world, but that's not true anymore; technologists are the priority now. That was in 2015, and I would say that data scientists are now first-class citizens of the financial world. In terms of insurance and reinsurance, actuaries have always been the first-class citizens of the insurance world, so how is that relationship working now with the emergence of data science?

Well, you know, Hugo, there's been a little fluke historically in actuarial science, and the historical fluke that resulted in me ending up in this industry is that catastrophe modeling did not exactly fit the historic actuarial methods very well, because sometimes in catastrophe insurance we're pricing and modeling risk that we've never observed historically. Maybe we're looking at a reinsurance deal that would be impacted by a worse hurricane than we have ever experienced, or a hurricane season with more hurricanes than we've ever experienced. And if you look at an actuarial method that's based on looking at historical data and, you know, making corrections for sample size and evaluating that using heuristics that kind of expect large sample sizes, it doesn't work very effectively for these extreme tail events. So my work in crop insurance was really around catastrophe work, and similarly property cat work, whether it's hurricanes or earthquakes, often deals with these risks that are so far out in the tail that we haven't experienced them.
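One standard way quantitative modelers attack the "risk beyond the observed record" problem JD raises is extreme value theory: fit a generalized Pareto distribution to losses above a high threshold, then extrapolate into the unobserved tail. The sketch below is purely illustrative, with synthetic data and an arbitrary threshold and return period; it is not a description of how any particular cat model works.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Synthetic heavy-tailed event losses standing in for a loss history
losses = rng.pareto(a=2.5, size=5_000) * 100.0

# Peaks over threshold: model only the exceedances above a high threshold
threshold = np.quantile(losses, 0.95)
exceedances = losses[losses > threshold] - threshold

# Fit a generalized Pareto distribution (GPD) to the exceedances
shape, _, scale = stats.genpareto.fit(exceedances, floc=0)
gpd = stats.genpareto(shape, loc=0.0, scale=scale)

# Extrapolate: estimate the loss exceeded with probability 1 in 10,000
# per event, even if no event that large appears in the record.
p_exceed = (losses > threshold).mean()   # chance an event clears threshold
target_prob = 1.0 / 10_000
return_level = threshold + gpd.ppf(1 - target_prob / p_exceed)
```

This is exactly the kind of method that sidesteps the large-sample heuristics JD mentions: instead of asking what the history says directly, it asks what tail shape the history implies.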
And so it gave a lot of opportunity for those of us with quantitative backgrounds, maybe in systems modeling, and engineers who historically do engineering modeling, to work in the space alongside actuaries. What we're seeing is a very fruitful environment, in my opinion, in insurance and reinsurance, where there's hopefully collaborative work between actuaries, who have a tremendous set of tools, experience, and knowledge that's specific to insurance (but keep in mind a lot of it is heuristics that make certain assumptions), and then data scientists and financial engineers and systems modelers, who are used to modeling slightly different things, making kind of different assumptions, often with different constraints. If we can get those two groups working together, we can make even more effective models. In my experience relatively recently, just last month I spoke at an actuarial conference, and in one of the sessions I sat in after I presented, I was really impressed, because one of the actuaries shared an actuarial methodology, and then after he shared it he said, now here's a more data-science-y way of doing this, the way our data scientist friends might approach it. He worked through the exact same example using some type of GLM, and he showed how the answers were similar but where they might differ. And I thought: that's the future inside of insurance companies. If we can get the actuaries and the data scientists talking together about the strengths and weaknesses of our different methodologies, and get the deep business understanding from the actuaries and maybe some of the methodology experience of the data scientists deployed at the same problems, I think that would be tremendously powerful. And that falls apart only if one side or the other isn't in a very collaborative place, so I'm a huge proponent of collaborative data science.

That's fantastic, and I think it actually provides a wonderful segue into what we've promised the eager listener previously, because a key component, right, of these types of collaborations, particularly with such
strong-minded communities as actuaries and data scientists, a key component of that collaboration, a requirement, a necessity in fact, is empathy. Yeah, yes siree. So you gave a wonderful talk that I saw when we first met IRL (we'd corresponded before that), when we met at rstudio::conf in San Diego earlier this year, a wonderful talk about empathy in data science, and I'd just love to hear your take once again on what the role of empathy in data science is at the moment, in your mind.

Yeah, Hugo. I don't think empathy is a panacea for all of our problems; however, I do observe on a very regular basis situations that really need empathy, in order to bridge two people who are talking past each other, or a person who's making what is obvious to other people, but not to them, a kind of boneheaded mistake, because they aren't thinking about who's consuming what they're producing. You know, the example I alluded to earlier was on Stack Overflow. I watch people ask questions on a regular basis, and they clearly are not thinking about the person who's receiving the question, who's going to answer their question, and making it easy for that answerer. Because if the asker were making it easy for the answerer, they would make an example that had code that the answerer could copy and paste into their environment, execute, and observe what the question asker is seeing, right, and immediately be able to help. But instead the asker may put up incomplete code, or maybe not even syntactically correct code, and the question is: I'm trying to do something and it doesn't work, what's wrong? And the answerer has no way to know. If we can bridge that by helping an asker inhabit that environment and think to themselves, what's it like to be on the other end of this question, what's it like to be the other person, and how can I make their life easier and basically help them help me, they'll find they're much more successful at what they're after. Well, it's
the same inside of our workplace, right? If we're doing analysis, maybe I'm doing analysis that's gonna equip an underwriter to negotiate a deal; I have to think, what information does that underwriter need to be well equipped to negotiate this deal? And that's gonna drive my thinking of how I serve that person with my analysis.

So JD, tell me about the role of empathy in data science. Sure, Hugo. I think I've just observed so many situations over the years where I felt there were two parties engaged in a conversation who were talking past each other and didn't quite appreciate where the other person's understanding was, or what they were concerned about. And I'm not so Pollyannish as to assume that empathy is the solution to all our problems, but we have a lot of business problems and data problems that could be greatly helped by a dose of empathy. A good example is the one I alluded to, observing questions and answers on Stack Overflow. I've observed any number of situations where the question asker clearly has not thought about the situation the answerer is going to be in, because if the asker had, they might have put up an example that could be copied and pasted by the answerer into their environment and executed, so the answerer could see exactly what the problem is and answer the question. But instead we often get sort of conceptual ideas: I'm trying to do this thing, here's a little piece of code, you can't actually run it because you don't have my data, but I'm not getting the answer I would expect, help me fix it. And that's really hard for an answerer to answer. This got me thinking about empathizing with the other person. Early on, as Stack Overflow grew, I felt like askers needed more empathy, and at times now I feel like the answerers could use some empathy as well. But the same is true in our business environment.
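What JD is asking for, a question the answerer can run exactly as posted, is what the community calls a minimal reproducible example. Here's a hypothetical Python version of what such a question snippet might look like; the pandas code and the inline data are invented purely to show the shape of a good question.

```python
# A question-asker's snippet the answerer can run as-is:
import pandas as pd

# Small inline data instead of "you can't run it, you don't have my data"
df = pd.DataFrame({
    "deal": ["A", "A", "B", "B"],
    "year": [2016, 2017, 2016, 2017],
    "loss": [100, 250, 75, 90],
})

# What I tried: total loss per deal
result = df.groupby("deal")["loss"].sum()
print(result.to_dict())  # prints {'A': 350, 'B': 165}
# Question: I expected one row per deal -- is groupby the right tool here?
```

Everything the answerer needs (the data, the code, the expected versus actual behavior) is in one copy-pasteable block, which is precisely the empathy move JD describes.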
If I'm working with an underwriter to do the analysis for a deal, I need to be thinking about what this person needs when they go to negotiate the deal, and what analysis I need to have done that they can have in front of them to make them more effective. Right, this isn't about me and my understanding; I'm not doing this as a science fair exercise so that I'm smarter about risk. I'm doing it towards a business purpose of providing insight for a negotiation. So that's a useful mindset, and I feel like it's one that we need to explicitly teach. A lot of people it will resonate with immediately, and others may need some more work to help them build this empathy muscle, if you will, of learning to think about: who's reading my analysis, what are they doing with it, who's my user? And so I think there's lots of ways we can build that, and it's an important part of data science, in my opinion.

Yeah, I couldn't agree more. I will say, though, that to approximate some sort of truly empathic behavior or mind frame can be really energetically consuming, so are there any ways we can approximate it or hack empathy? Absolutely. My favorite example of this actually comes from the agile development methodology, which is more of a computer programming thing than a specifically data-science-y thing, but in agile they write user stories. So, you know, it's: Hugo is a data scientist who's trying to understand X; he needs this tool to do Y so that he can understand X. What's so great about that, in my opinion, is it forces the developer who's reading it, or the data scientist who's reading it, to think about what it's like to be Hugo. I mean, it's an empathy hack. Now, none of the agile methodologies that I've ever seen use the word empathy, it's just not mentioned, but that's what we're doing with user stories. And, you know, I've had situations inside my company where a developer would be developing something, and I'm like, that's a great idea, but I know your user personally, like I have lunch with them, and they're not gonna think that's near as great,
because you're building the tool you want, not the tool they want. So think about your end user, or if you're a data scientist producing an analytic or a model outcome, think about who's consuming it. We can give lots of little nudges, whether it's something explicit like the empathy hack from agile, the user story, or sometimes it's just reminding someone: hey, remember, the person consuming this has a name, it's Bob, and we know that Bob doesn't think that way.

Right, and we actually have learner profiles at DataCamp, which is similar, with respect to what a learner's background will be, along with how advanced they are as aspiring data scientists. Whenever we build courses, we very much think about which of our set of learner profiles the courses are aimed at. That's super, Hugo. You know, the podcast 99% Invisible had a great episode on designing for average, and how basically if you design for average, you design for no one; we'll make sure that's in the show notes. But I think it's such a fantastic idea to actually give your target audience a name, so the people working on products for them can relate to them. That's a super idea. That's great, and actually we had a segment on the podcast with my friend Cort, who is a core developer and maintainer of the Stan probabilistic programming language, and he was talking about what's commonly referred to as the tyranny of the mean. Oh gosh, so true. You know, in a couple of dimensions you're fine, but as soon as you get into multi-dimensional space, if you're thinking about measuring someone's height, someone's leg length, the circumference of thighs and calves, and that type of stuff, suddenly if you have designed something for the mean there, you're absolutely lost, because nobody really is around that mean at all. Yeah, not in all dimensions, right?
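The "tyranny of the mean" is easy to demonstrate with a quick simulation: even when a decent fraction of people are near average on any single measurement, almost nobody is near average on several measurements at once. A toy sketch, assuming three independent standardized measurements (real body measurements are correlated, which would raise the fraction somewhat):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 1_000_000

# Three hypothetical independent, standardized body measurements
dims = rng.standard_normal((n, 3))

# Call someone "near average" on a dimension if they're within half a
# standard deviation of the mean (~38% of people per dimension)
near_each = np.abs(dims) < 0.5
near_all = near_each.all(axis=1)  # near average on ALL three at once

pct = near_all.mean() * 100  # roughly 0.38 cubed, i.e. about 5-6%
```

So designing for the mean in all three dimensions serves only a small sliver of the population, which is the point being made here.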
So if I remember, the 99PI episode had a statistic, and I'll probably be wrong, but the gist, my takeaway, was something like: if you take three dimensions of human body measurement, like leg length, arm length, head circumference, hand size, any three, within a small margin of error only six percent of your population is going to be near the average, because everybody's off a little bit in some dimension. It's incredible.

Okay, so we've talked a lot about data science, insurance, reinsurance, and empathy in data science, and where it's led to now. What does the future of data science in insurance, reinsurance, and otherwise look like to you? Well, I suspect we'll see the term data science wane some, and I think that's fine. It was a very, very helpful term for a number of years to help us think about bringing in, like, technology, computer-science-y skills along with business acumen and statistics. It will fade, I think, because it's become so obvious: the data analyst of the future is going to be much more data-science-y than a data analyst of five or ten years ago. That I'm confident of. And, you know, I was just having a conversation at coffee today after lunch with a friend, and we were discussing this idea of where's the market opportunity. He works in the talent acquisition, you know, the headhunter space: where's the market opportunity? And I was telling him, well, it seems like deep learning and a lot of these very complicated artificial intelligence type methodologies get a huge amount of ink spilled because they're interesting, and they do have the potential to make some revolutionary changes, and that's great, and there's good work that needs to be done there and there will be. But I think about the other tail of the distribution, and I think about your former guest Jenny Bryan and her work of trying to get people out of Excel. It's such a widely spread need, and you've got nobody else crowding the space. So I think the future is going to be building a lot more structured process and structured tools around so
many things that aren't, you know, the sexy deep-AI, blockchain-based gee-whizzery. It's gonna be a lot of more mundane things, but they're gonna fundamentally change how efficient organizations are.

Great, so I've got time for one more question, and what I really want to know is, do you have a final call to action for all our listeners out there? Yes. You know, one of the things that I've realized in the organization I work in, one of the cultural norms that's been very valuable to me, is there's a cultural norm here of asking the question: does it change the answer? Or, put another way: what's the next best simpler alternative? The idea is that if we don't ever ask ourselves whether our analysis changes the outcome, the answer, the thing we're actually trying to study, we can do infinite analysis, because there's an infinite number of things we don't know, and we can keep entire teams busy inside of organizations doing infinite analysis that may just end up as appendix pages in the back of a PowerPoint presentation and may never drive our organization. So I would like to encourage leaders within organizations to have candid conversations with their analytical teams about: does the research or the analysis we're doing now have the potential to change the answer of the decisions we make? And if the answer is probably not, ask yourself why you're throwing resources at it. Yeah, I've watched organizations do analysis just because the leader was concerned they would be standing in front of their board and be asked a question that they might not be able to answer, when the answer might be: that is not relevant to our business. We need to ask these questions so that we don't spend our precious analytical resources on solving not very important problems. And similarly, as an economist I think about having an impact on the margin. So if we ask ourselves what's the next best simpler alternative, you know, we should never compare our
analysis, our methodology, against doing nothing, because doing nothing is rarely the alternative; usually the alternative is something that's a little simpler. So if we implement a very complicated model, well, we should be comparing it not to no model at all, but to our old forecasting method, or a simpler, easier, cheaper, faster forecasting method, and then asking ourselves: is the sophistication of the new method worth the added complexity? I think that's where so many rich and important conversations in data science teams will happen in the future.

Yeah, I love that. And actually, whenever I teach machine learning, for example, I get the learners to establish a baseline model not using machine learning. I'll get them to do, you know, 20 minutes of exploratory data analysis, look at some of the features, and make a prediction themselves in a classification challenge, not using machine learning, and that will be a baseline model against which I get them to test any other machine learning model they use later on. That's such a good idea, Hugo. You know, I see this done with public policy often: there'll be some policy proposal, and the benchmarks that are given for the effect of the policy are relative to doing nothing, and it's like, that's not the right alternative. So I love that you're doing that with the class, and I also like that you mentioned plotting the data first. I think somebody already gave this as the call to action in one of your interviews, but "plot your damn data" could be a very good mantra for all of us. I love it. I'm actually going to put it up on my wall this evening. Fantastic. I'm gonna get bumper stickers made up. Fantastic. JD, you rock; it's been such a pleasure having you on the show. Thank you, Hugo; I appreciate the opportunity, and I look forward to seeing you soon.
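Hugo's baseline-first exercise, make a simple non-machine-learning prediction and force any later model to beat it, can be sketched on synthetic data. Everything below (the data, the eyeballed rule, the split) is invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic binary classification data with one informative feature
n = 2_000
x = rng.normal(size=n)
y = (x + rng.normal(scale=1.5, size=n) > 0).astype(int)

train, test = slice(0, 1_500), slice(1_500, None)

# Baseline 1: always predict the training set's majority class
majority = int(y[train].mean() >= 0.5)
acc_majority = (y[test] == majority).mean()

# Baseline 2: a one-line rule you might find by eyeballing a plot of x vs y
acc_rule = (y[test] == (x[test] > 0)).mean()

# Any machine learning model trained later has to beat acc_rule before its
# added complexity is worth anything; "better than nothing" is not the bar.
```

This is the "next best simpler alternative" made concrete: the comparison point is never doing nothing, it's the cheap rule you already have.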
Thanks for joining our conversation with JD about data science, insurance, reinsurance, and the importance of empathy in data science. We saw how quantitative disciplines such as insurance have been using data science techniques since way back when, and how essential statistical modeling is to the world of risk representations. In particular, JD specializes in the art and science of simulating the outcomes of stochastic models to get out the distributions of possible outcomes in order to quantify risk. We also saw the importance of empathy in data science, and the central notion of thinking about your users, or whoever will be on the receiving end of what you're constructing, whether it be a product or a Stack Overflow question. JD also enlightened us with the empathy hack of building user stories. Thanks, JD. Also make sure to check out our next episode, a conversation with Tanya Cashorali, a founding partner of TCB Analytics, a Boston-based data consultancy. Tanya started her career in bioinformatics and has applied her experience to other industries such as healthcare, finance, retail, and sports. We'll be talking about what it means to be a data consultant, the wide range of industries that Tanya works in, the impact of data products in her work, and the importance of rapid prototyping and getting MVPs, or minimum viable products, out the door. I'm your host, Hugo Bowne-Anderson. You can follow me on Twitter at @hugobowne and DataCamp at @DataCamp. You can find all our episodes and show notes at datacamp.com/community/podcast.

In this episode of DataFramed, a DataCamp podcast, I'll be speaking with James Long, VP of Risk Management for Renaissance Reinsurance and a misplaced southern agricultural economist, quant, stochastic modeler, and cocktail party host. James, otherwise known as JD, and I will talk applications of data science techniques to the omnipresent worlds of insurance, reinsurance, risk management, and uncertainty. What are the biggest challenges in insurance and reinsurance that data science can impact? How does JD go about building risk representations of every deal? How can thinking in a distributed fashion allow us to think about risk and uncertainty? What is the role of
empathy in data science? Stick around to find out. I'm Hugo Bowne-Anderson, a data scientist at DataCamp, and this is DataFramed. Welcome to DataFramed, a weekly DataCamp podcast exploring what data science looks like on the ground for working data scientists and what problems it can solve. I'm your host, Hugo Bowne-Anderson. You can follow me on Twitter at @hugobowne and DataCamp at @DataCamp. You can find all our episodes and show notes at datacamp.com/community/podcast.

Hi there, JD, and welcome to DataFramed. Hey, Hugo. It's great to have you on the show. I'm really excited to have you here to talk about data science, insurance, reinsurance, your work in the R community, and the role of empathy in data science, which we've had great conversations about before. But before we get into all of that, I'd like to know a bit about you, and maybe you can start off by telling us what you're known for in the data community. Yeah, that's interesting, Hugo. I'm never completely sure, when I meet new people, whether they may have been exposed to me through something that I wrote; I think most common is maybe asking R questions on Stack Overflow, possibly a presentation at a conference, or maybe starting the Chicago R user group, or maybe some foolishness on Twitter. It's really hard to guess. And of course your role in asking questions on Stack Overflow was quite early on there, right? Yes, so this goes back to when Stack Overflow was really first starting. The story there is kind of interesting: notoriously in the R community, the R-help email list at the time didn't suffer fools well, or newbies for that matter, and there was a lot of encouragement to RTFM and that sort of thing, so it was not a completely newbie-friendly place. This was about the time I was learning R, and I had observed that, and Mike Driscoll was toying with the idea of a beginner's R mailing list, and I contacted Mike. I had been watching how Stack Overflow was being developed, and I said, Mike, I'm not sure; maybe
what we should do is try to get new users using Stack Overflow, because it looked innovative to me. Now that Stack Overflow is eating the world, it's kind of quaint to think about, but Stack Overflow had some social science thinking in its design: it had rewards, incentives, thinking about the effect of design, or nudges, as we talk about in behavioral economics, how do you nudge people towards good behavior. And it seemed like a good environment for newbie-type questions. So Mike Driscoll and I and a handful of other people got, from the RSeek website (a website dedicated to R information), a whole bunch of questions people had typed into that R search engine, and we tried to figure out what we thought people were asking. So we created, I don't know, 100 questions and answers or something, and we did a flash mob at the birds-of-a-feather session at OSCON 2009. I was not there, I was living in Chicago at the time, but I participated virtually, and we seeded Stack Overflow with a bunch of questions and answers, and that sort of kick-started the R discussion there. Later another one was done, and I continued to be active asking questions, and, I haven't looked recently, but as of a few years ago I was still one of the top question askers on Stack Overflow for the R tag, for the R programming language.

And so in terms of this initial observation that R-help didn't really suffer fools or newbies, how do you think that has played out over the past several years, and where are we now? Yeah, it's a good question. I mean, if we look at Stack Overflow, they've had their own challenges and growth issues. You know, at first it was just getting to critical mass, and then they have clearly become the dominant, you know, monopoly for information on programming questions, and it's a really good resource. But, and you alluded to this, we'll talk about it a bit later, there's this issue of a need for lots of empathy, both in
question askers and in question answerers, and that's proving to be a challenge in some new ways. So I think where we are now is Stack Overflow is a fantastic place to get information. There's so much information there that most new beginners are able to find their question already asked and not have to ask it themselves, and if they do venture into asking one, there's the opportunity to get people to answer. I'm really excited to get back to this idea of empathy in data science, and that's a little teaser for something coming up a bit later, but tell us a bit about what you're up to now and what you do currently, JD.

Sure. So what pays the bills is I'm VP of Risk Management for Renaissance Reinsurance, although the lawyers prefer that I state that everything I'm sharing here with you is my personal views; of course I'm not representing the company I work for here. But I've worked for RenaissanceRe for over nine years now and have been in different insurance companies and reinsurance companies most of my career. And can you tell us, remind us, what insurance is, and tell us what reinsurance is? Sure. So insurance, I think most people are familiar with because of the house or car or property insurance they have: it's a company that makes a payment on policies when adverse outcomes happen. Reinsurance is something that most people don't face or interact with, and that is that individual insurance companies will buy protection from events or losses that are bigger than that insurance company could handle. So an example of that would be a homeowners insurance company in Florida that may have more exposure to hurricanes than it has capital to pay out claims, and so it would need to buy reinsurance to help make sure that it can make good on its promise to pay future claims. So you're insuring the insurance. We're insuring the insurance. So, you know, you must know what my next question is: is it insurance all the way down? It is insurance all the way down. So let me tell you a language
story here real quick. I've done some work with the World Bank, and a number of years ago I was in Mongolia discussing insurance. I was trying to learn what the word was, and they said, well, the word is "dotco." So we asked, okay, what's reinsurance? And they said, well, it's "coal coal." Okay, now, we do this thing called retro, where reinsurance companies trade with other reinsurance companies; is that "dotco coal coal"? They assured us it was not, but it seemed intuitive to me. So yes, it is insurance all the way down.

I suppose that's like asking, is it re-reinsurance? Whereas you call it retro, right? Yeah, we call it retro, but that would be re-reinsurance; we stopped counting the re's once we started trading it around between reinsurers.

Fantastic. So I'm really excited about talking about insurance and reinsurance, particularly framed by the emergence of data science, because insurance and the actuarial sciences have been around a lot longer than data science. I'm interested in unpacking that, but before we get there I want to hear a bit about your story and how you got into data and data science originally.

Yeah. So, like most insurance data scientists, I'm an agricultural economist. That obviously is not intuitive at all, but I came into agricultural economics in the 90s. When I graduated with my undergrad degree, I guess it was about '96, I was starting graduate school, and I remember talking to my major professor about where that year's PhD graduates were going. One of them was going to American Express, and I remember being baffled: she's a PhD in agricultural economics, what was she going to American Express for? He explained that American Express explicitly recruited agricultural economists because we have a very applied background, not pure theory; we had actual experience working with data; and we tended to have coding experience. Now, this was '96, so that mostly meant SAS at the university I went to. Let's put this in perspective: CRAN, the R archive network, started in 1997, so this was the year before CRAN even existed, and Python hadn't been available on DOS and Windows until '94, so this was just a couple of years after Python arrived on Windows platforms. We were in SAS, we were using mainframes and UNIX machines, and American Express was recruiting agricultural economists because they'd had experience coding with this kind of messy real-world data. Within agricultural economics I got exposure to crop insurance, model building, and lots of regression analysis (we call it econometrics), building those models at what at the time seemed like a degree of scale; it seems a little trivial in retrospect. But that's how I got in, and I like to tell this story under the premise that agricultural economics is the kind of OG of data science, because we'd been combining programming, domain expertise, and statistics for a long time. The data science name caught up later, but I'd been doing that same sort of thing for a number of years.

And I suppose you were also working with serious, messy, real data sets, right? Yeah, absolutely. We were working with actual field data, literally field data, and sometimes long historical sets, and it would require cleaning of outliers and a lot of the same sorts of things we talk about now: taking trend out, looking at time series and cyclicality and removing that before you start building a model to explain other things. A lot of these methodologies we'd been using in agricultural economics for a number of years, and my experience applying them to agricultural insurance was my gateway into financial risk and specifically insurance and reinsurance.

Cool. So what are the biggest challenges
in insurance and reinsurance that you think data science can have a huge impact on, or is currently having a huge impact on?

Yeah, my view here is a little skewed. If we think about the problem space, there's a bunch of things going on in marketing (the marketing of insurance, where you see ads online), in the claims process (how claim payments are made and how quickly, using data analysis), and in operations inside companies, and big gains are being made in all of those. I don't work in those three areas, though. I work more in what we would call underwriting and risk. The distinction there: underwriting is the decision about taking an individual risk. At an insurance level that might be whether a company writes a given policy for a person or a company; in reinsurance it's more about understanding the risk of a deal that may have hundreds or thousands of policies underneath it. Risk, or risk management, is broadly thinking about how all of those risks aggregate up inside a reinsurance company. Some will be correlated, some will be idiosyncratic, some may be anti-correlated; how do you roll up that risk inside the company and be confident that you have the right amount of capital to hold behind it, but not too much capital?

Right. And I suppose it's a huge task to, as you say, roll it all up and aggregate it to make one final decision based on all the data and all the modeling coming in, right?

Yeah, exactly right. A lot of little decisions get made, and the way that feeds back in and shapes the portfolio, at least in the companies I've worked at, is some feedback mechanism for a risk-adjusted return on capital. When an individual deal is looked at, it's evaluated relative to the portfolio as a whole, there's some capital charge, and the idea is that the deal needs to be profitable in excess of the capital required to hold behind it.
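The risk-adjusted return on capital feedback JD describes can be sketched in a few lines. This is a hypothetical illustration only: the function names, the 10% hurdle rate, and the figures are invented, not any company's actual method.

```python
# Hypothetical RAROC check: a deal should be profitable in excess of
# the capital the company must hold behind it.

def raroc(expected_profit, required_capital):
    """Risk-adjusted return on capital for a single deal."""
    return expected_profit / required_capital

def clears_hurdle(expected_profit, required_capital, hurdle_rate=0.10):
    """Accept the deal only if its RAROC beats an assumed hurdle rate."""
    return raroc(expected_profit, required_capital) > hurdle_rate

# A deal expected to earn $1.5M against $10M of required capital
print(clears_hurdle(1_500_000, 10_000_000))  # 15% RAROC beats a 10% hurdle
```

In practice the capital charge would itself come out of the portfolio model, since how much capital a deal requires depends on how it correlates with everything else on the book.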
That's how we think about feeding back from corporate risk to the deal-making side.

Great. And what industries do you insure or work in?

By the time you're aggregating at the reinsurance level, it's very global and it's every industry, because we're trying to spread risk across really the whole globe and all industries so that we aren't concentrated in one specific area. Now, if you think about the space for data science in insurance and reinsurance, the marketing, ops, and claims I mentioned earlier (well, maybe claims is an exception) are not super insurance- or reinsurance-specific; they're very similar in lots of other transactional companies. But the risk and underwriting side is fairly domain-knowledge intense. The domain knowledge there is really about the deal: understanding the type of risk and how those risks fit together into a portfolio. I work both in the micro and the macro: the micro is looking at individual deals, and the macro is this corporate risk management component. I have an unusual job in that I do a little bit of both.

So, JD, why don't you tell me a bit about the micro scale, and then we can move on to the macro? For example, maybe tell me a bit about how the crop insurance modeling works.

Yeah, sure, Hugo. If we look at crop insurance in the US, which is one of the most mature crop insurance markets, the current products that dominate that market have only been around since 1996, so the historic record isn't very long for that product. So we have to ask: what data do we have about crop insurance? What we have is a history of agricultural yields that goes back in a long time series, a history of agricultural commodity prices, and a history of weather. So one of the more data-science-y activities I've engaged in is taking the data we do have and asking how the current portfolio of crop insurance might have behaved in all those past years for which we do have data.
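That backcasting idea, applying today's policy terms to the long historical record, can be sketched with invented numbers. The yields, the 75% guarantee, and the fixed price below are all hypothetical, chosen only to show the mechanics.

```python
# Illustrative backcast: how would a current-style yield guarantee have
# paid out across a historical yield record?

guarantee = 0.75          # indemnify shortfalls below 75% of expected yield
expected_yield = 100.0    # bushels per acre (assumed)
price = 4.0               # dollars per bushel (held fixed for simplicity)

historical_yields = [92, 104, 61, 88, 110, 73, 95]  # hypothetical record

def payout(actual_yield):
    """Indemnity when the realized yield falls below the guaranteed level."""
    shortfall = max(0.0, guarantee * expected_yield - actual_yield)
    return shortfall * price

backcast = [payout(y) for y in historical_yields]
print(backcast)  # only the 61 and 73 yield years trigger a payment
```

A real backcast would also have to handle price risk, trends in yields, and changing policy terms, but the core move is the same: run today's contract over yesterday's data.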
Right, so this is a classic modeling exercise where we take something we know and try to project it into something we don't know, building up a historical understanding. Once we do that, we can stochastically generate a whole bunch of different yield and price outcomes and see if we can build up a full stochastic distribution of how the crop insurance industry in a given country might behave. Building that model was one of my more interesting jobs for a number of years, and that's where we moved from data analytics into something more data-science-y: building models to understand something we couldn't understand otherwise.

That's really interesting. I'm going to stop you there for a second, because you used a couple of terms I'm very interested in: you talked about stochastically generating, and then about a distribution. I'm going to try to tease that apart; let me know where I get it wrong. Say we're trying to predict something concerning a market. You can stochastically generate, and what that essentially means, to my understanding, is that you can simulate the behavior, and stochastic means there's some sort of variation, right? Each time you simulate you'll get a slightly different result, and what you actually get in the end is a lot of different results, maybe a thousand or ten thousand or a hundred thousand, that give you some idea of the distribution of the possibilities of the market. Is that what you're talking about?

That's exactly right, Hugo. We do very little predicting of what I think next year is going to happen. What we try to do is ask: what is the distribution of potential outcomes for next year, and what's the shape of that distribution? We might ask questions like, what's the one-in-a-thousand worst-case scenario? That doesn't mean we're thinking a thousand years into the future; it's about next year, but the improbable yet still possible way next year might turn out.

This is awesome, and I think a lot of industries and verticals, and basic science research adopting data science techniques, could learn a lot from this conversation, because a lot of managers still want point estimates: they want the average, and then to make a decision around that, maybe with some error bars. The fact that you're running these simulations and getting out entire distributions of predictions is a very robust technique. As you say, you can actually say that 1% of the time we see something crazy that we really do not want to happen.

Yeah, that's exactly right. There's a really good book, and I'll make sure you have this for the show notes, Hugo, called "How to Measure Anything." That's a great name, by the way. It is a great name. It has an introduction to this idea: initially you estimate what you think next year is going to happen; then you add a high estimate and a low estimate, so you're beginning to think about a range around next year's outcome; from there you keep increasing the resolution, asking what extreme event could still happen, so you create some sort of error bars around your estimate; and ultimately you move on to a full stochastic simulation, where you have thousands of possible outcomes.
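That progression, from point estimate to full stochastic simulation, can be sketched in a few lines. Everything here is an assumption for illustration: the lognormal loss model, its parameters, and the simulation size come from me, not from the episode.

```python
# Simulate many possible annual outcomes, then describe the whole
# distribution instead of a single point estimate.
import random
import statistics

random.seed(42)

n_sims = 100_000
# Hypothetical annual loss: lognormal, so most years are mild but the
# right tail is fat.
losses = sorted(random.lognormvariate(mu=0.0, sigma=1.0)
                for _ in range(n_sims))

mean_loss = statistics.fmean(losses)
# The "one-in-a-thousand worst case" is the 99.9th percentile of outcomes.
one_in_thousand = losses[int(0.999 * n_sims) - 1]

print(f"mean: {mean_loss:.2f}  1-in-1000 year: {one_in_thousand:.2f}")
```

The point of the exercise is visible in the output: the tail scenario is far larger than the average year, which is exactly the information a point estimate throws away.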
Now, we've been using the word "risk," which we all have an intuition about, but there's an idea coupled to it that I want to tease apart: uncertainty, in the sense that once you run these simulations and get out a distribution, you still may not know what will actually happen. I'm wondering, is that uncertainty or risk, and how do you think about this in insurance?

Yeah, that's a really good point. I would generally think about the outcome of these models as risk, and uncertainty as a separate thing; let me tease those apart. These get used interchangeably in the vernacular, but in 1921, in a book called "Risk, Uncertainty and Profit," the economist Frank Knight, a University of Chicago economist of the Chicago school, presented this idea of risk versus uncertainty. The way he defined it, risk is when you understand the underlying distribution but don't know what outcome you're going to get. It's like the classical urn full of white and black marbles: you don't know which one you're going to draw, but maybe you've been told ahead of time the ratio of white marbles to black marbles.

Yes, and another example: if you flip a coin ten times, you can literally write down the probability of seeing ten heads, or nine, or eight, or seven, so you know the entire distribution of possibilities.

That's exactly right. And there are other processes where we know the underlying distribution is Gaussian, so the outcome is going to follow that.

And in the real world, do you have risk as opposed to uncertainty? Because these are toy examples.

We have both, but let me define uncertainty real quick. Uncertainty is the piece where we don't know: we know it's not deterministic, we know it can have wild outcomes or some outcome other than what we know about, but we can't put a distribution around it. So let's go back to the real world. If we're doing things like flipping a coin, there is some uncertainty that maybe we have a loaded coin.
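Hugo's coin-flip example of Knightian risk, a fully known distribution, can be written out exactly:

```python
# For ten fair coin flips, the probability of every possible number of
# heads is known exactly: this is "risk" in Knight's sense.
from math import comb

def p_heads(k, n=10, p=0.5):
    """Binomial probability of exactly k heads in n flips."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

dist = {k: p_heads(k) for k in range(11)}

print(f"P(10 heads) = {dist[10]:.6f}")  # 1/1024, about 0.000977
print(f"P(5 heads)  = {dist[5]:.4f}")   # the most likely outcome
```

Uncertainty, by contrast, is exactly what this calculation cannot capture: if the coin might be loaded, `p` itself is unknown, and no table of outcomes can be written down.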
We don't know the probability of the coin being loaded given no other information, just holding it in our hand; it's probably pretty low, but we don't know. A better example from the insurance world might be auto insurance: it's a good example of a type of product with mostly risk and less uncertainty. The product has been around a long time, people behave in relatively predictable patterns, and most of that activity follows a well-behaved historic distribution. There's a little uncertainty; some wild things happen, and tail events happen that weren't in your modeled distribution, but it's pretty well behaved. The flip side would be, say, terrorism insurance. Think of terror events: we don't really know the underlying distribution. We know what the historic distribution of terror events looks like, and we can make a catalog of those, but there's no reason to believe that world events are such that the next twelve months is a random draw from a historically stable distribution. We expect the distribution is probably not stable; it's probably a function of changing geopolitics around the world and a reaction to events going on in real time. So there's a component of risk, but also a much larger component of uncertainty. Does that help?

Makes perfect sense, and it has led me down a variety of rabbit holes. My first question: do governments or corporations take out terrorism insurance?

They do, they do. There are a number of property policies that would cover in the event of terrorism, and there are of course policies that explicitly exclude acts of terrorism. If I recall, I believe in certain countries it's normal for crop insurance policies to exclude terrorism, for example.

So, JD, we were led along this path talking about the micro level you work at, in
terms of crop insurance modeling and the risk representation of single deals. Can you tell us a bit about the macro levels you work at in thinking about insurance and reinsurance?

Sure, Hugo. Think about a reinsurance company that has a number of risks in many different lines of insurance. I mentioned earlier that some of those risks are correlated, and the correlation can be caused by underlying physical relationships: all of the homeowner's insurance in New York City should be correlated in its outcome, because if a large event like Hurricane Sandy hits New York, the impact hits all of the insurance companies that write business in New York. That's a physical process causing correlation. Or maybe on a casualty program there's an underlying risk that multiple companies have insurance for, and when that turns out to be a problem and there's a casualty claim, it impacts multiple companies. Other times there's a connection because, say, a changing legal framework causes all property claims to increase 15%. These are the relationships between policies that we have to understand as we aggregate the risk together and think about combined risk inside a reinsurance company. Sometimes that involves building physical models, like the hurricane and earthquake models, where policies are analyzed based on where on the map the risk sits, and then understanding the exposure across different programs for risk in a specific geographic location. Other times correlation is introduced with more traditional modeling methods, added after the modeling through something like a copula method, so two distributions can be brought together and joined in a joint relationship. Now, it's always important to keep in mind that when we add correlations at the end of our modeling, the correlation is always and everywhere an artifact of some other process; when we use something like a copula, we're just trying to make sure our modeled data reflects what should be there already but that we have no other method of putting in place.

Okay, great. You've given me some insight into the types of tools and techniques you use, but maybe you could speak a bit more to what data science looks like in insurance and reinsurance. What I mean is: in tech, we know most of our data will be in a SQL database, so we'll query the SQL database and then use R or Python to do a bunch of exploratory data analysis, visualization, and dashboards, and if we want to do productionized machine learning we'll do that in Python. So I'm wondering what techniques and tools you use on a daily basis when doing this type of modeling and data science.

Sure. At the initial deal level in a reinsurance company, a bunch of the analysis looks like the traditional data science analysis you just described, only the person doing it may self-identify as a catastrophe analyst (a cat analyst) or as an actuary. What they're doing is analyzing data they receive from someone, maybe combining it with industry data, and trying to understand trends in the data in order to create this stochastic representation of a single deal. That may follow a similar pattern to other data-science-y modeling, with the idea that what comes out the other end is, say, a mean expectation but also a distribution around it for the outcome of the deal. They'll then put that into a risk system; I think most companies use a system of some kind that provides a framework where the whole book can be rolled up and understood in a meaningful way, and there are a million different approaches to doing that. I've traditionally worked with an in-house tool, and it handles making sure that deals connected because of spatial exposure get connected that way.
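The copula idea JD mentions, imposing correlation between two separately modeled marginal distributions, can be sketched with a toy Gaussian copula. The marginals and the 0.6 correlation are invented for illustration and stand in for whatever dependence the real process would justify.

```python
# Toy Gaussian copula: draw correlated normals, map them to uniforms,
# then push each uniform through its deal's own marginal distribution.
import math
import random
from statistics import NormalDist

random.seed(7)
std_normal = NormalDist()
rho = 0.6          # assumed dependence between the two deals
n = 50_000

deal_a, deal_b = [], []
for _ in range(n):
    z1 = random.gauss(0, 1)
    z2 = rho * z1 + math.sqrt(1 - rho**2) * random.gauss(0, 1)
    u1, u2 = std_normal.cdf(z1), std_normal.cdf(z2)
    # Each deal keeps its own marginal; only the dependence is shared.
    deal_a.append(math.exp(NormalDist(1.0, 0.5).inv_cdf(u1)))  # lognormal losses
    deal_b.append(NormalDist(10.0, 2.0).inv_cdf(u2))           # normal losses

def pearson(xs, ys):
    """Sample Pearson correlation."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

print(f"induced correlation: {pearson(deal_a, deal_b):.2f}")
```

As JD notes, the correlation here is bolted on after the fact; in a real model it stands in for some physical or legal process that wasn't captured directly.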
In the final modeling it also makes sure that deals that are not spatially connected at least get correctly correlated with the other deals in their business class, so that these relationships are tied together and reflected, and we can get an aggregate distribution that's a reasonable view of the individual marginal distributions (marginal here meaning individual deals in a portfolio). We can roll those up into one aggregate distribution and understand its characteristics.

Fantastic. We'll jump right back into our interview with JD after a short segue.

We're back here with Neil Brown for more insights from computing education. Hi, Neil. Hi, Hugo. So, Neil, I'm interested to get some tips on teaching programming. Okay, one thing that can work well when teaching programming is live coding, by which I mean typing in the program code that you're using to teach while the learners watch you, as opposed to having it all done beforehand. Yep, that's right. The benefit of live coding is that learners actually see the process of coding. A lot of learners get this idea in their head that everyone else just writes perfect code the first time, and if you turn up with pre-prepared code, that's what they're going to think. I reckon it's a lot like going to see a play and thinking the actors just made up their dialogue on the spot; it's not true, and you need to see the process of construction. So live coding is useful when it lets you see the errors that can be made while entering the code, and when you can see the debugging process when things go wrong. So what you're saying is that live coding is best when it's not totally smooth? Yeah, pretty much. I have one workshop I give so often that the problem has become that I'm so practiced I tend to live code it without making any mistakes, which kind of ruins the advantages of live coding. If you ever get to that point, it's worth deliberately engineering in a couple of mistakes. Useful points. Live coding seems to have taken off a lot in technical talks recently, too. Yeah, and I'm actually a bit more skeptical about its value in tech talks. I think the main advantage there is that it slows down speakers who are very nervous and tend to rush through a whole load of slides in one go. But if you're an expert giving a talk to other people who know how to code, maybe about a new API or something, where's the value in showing them the coding process? They know how to code; they just want to learn the new details. So sometimes live coding is a bit of showing off, and if your audience can already code it can slow down a talk for no reason, because they're just watching you type. Or even worse, people get so concentrated on doing the live coding while giving the talk that they don't actually explain what they're doing. So what you're telling me is that live coding is best for education? Right, yeah, I think so. You need to have your endpoint in mind before you start; you can't just completely make it up as you go along. Talk through what you're doing and why you're doing it, and don't rush to cover up mistakes. Instead, if you make a mistake, pause, explain what the mistake is and how you're going to fix it, and that way you're teaching them. If you get embarrassed by a mistake and try to hide it, it sends the wrong message to learners for when they themselves make mistakes. Well, we all make mistakes while programming, for sure. Exactly. My final tip on live coding is to only type in the interesting bits. If you've got a bunch of boilerplate you need, like import statements or a skeleton for a class, just start with it or copy and paste it in; don't make people sit there watching you type in the boring parts. Couldn't agree more. Thanks, Neil, for another set of insights into computing education. Time to get
straight back into our chat with JD Long.

So I'd like to step back a bit now and think about where insurance has come from, the actuarial sciences, and now the impact of data science on the discipline as a whole. Could you give us a brief history of these disciplines and how they intertwine?

You bet, Hugo. Let's go back to 3000 BCE and the Babylonians; I'd love to. This was the earliest record I could find of a disaster-contingency arrangement: the Babylonians developed a system of loans where a person could get a loan for building a ship and might not have to repay that loan if a certain type of loss happened because of certain types of accidents. Well, that's kind of like insurance, right? Kind of like a builder's loan. So the idea has been around a long time. Now, one thing I find interesting is that Edmond Halley, of Halley's Comet fame, created one of the first modern-style mortality tables, in 1693. Around the same time, but completely disconnected from that, the Lloyd's coffeehouse, a place for sailors to hang out and shipowners to talk about what was coming into London on ships, emerged as a place to drink coffee, get shipping news, and also buy shipping insurance. That later became Lloyd's of London, which we've all heard of. It may not be well understood outside the insurance community, but Lloyd's is not an actual company that takes risk; it's more of a marketplace, like the Chicago Mercantile Exchange of risk. Lots of individual companies, including the one I work for, take risk at Lloyd's of London. So that was the late 1600s, and then computational tools and statistical methodologies developed alongside the actuarial process and became part of it. But an interesting thing happened in 1992: Hurricane Andrew ripped across Florida, recharged in the Gulf of Mexico, and plowed into Louisiana and Alabama, and it was a huge catastrophe for the global reinsurance market. Prior to '92, hurricane reinsurance was kind of a gentleman's game; it wasn't really a quantitative, well-understood risk business. Andrew caused many reinsurance bankruptcies and a big contraction of the market; there just wasn't a lot of capacity for reinsurance after that event, and the gap was filled by the crop of reinsurers that sprouted up on the island of Bermuda. That market became a much more quantitative-analysis market, one that looked more like the quantitative finance world, and it has driven the way reinsurance around the globe has been modeled and approached. That was really the turning point of reinsurance becoming much more quantitative, and it's also how I ended up living on Bermuda for four years.

That's incredible. So, firstly, why Bermuda?

Well, the history there is that it has reasonable proximity to the United States, but it's a favorable tax jurisdiction for endeavors requiring lots of capital and not a lot of people. It's not a tax-loophole type of jurisdiction; it's a place with no corporate income tax that is also well regulated, at a level consistent with mainland Europe, without a very heavy corporate tax structure. Activities like reinsurance, which has periods of high returns followed by a year or two of negative returns, are pretty tax-efficient to do in Bermuda, and that's why it cropped up in 1993 as a jurisdiction for global reinsurance, especially US catastrophe reinsurance.

Something we've mentioned several times is this idea of building models, and you've said that building models is really key to your work. Could you say a bit about what model building actually means to you and what it entails?

Sure, Hugo. When I think about model building in the context of insurance and reinsurance, what I'm really always thinking
about is this process we've discussed a few times: coming up with a distribution of outcomes that reflects the possible outcomes for a given financial contract. That's the simplest way I can think to describe it. We might use dozens and dozens of different methods and approaches to try to get our arms around the risk and uncertainty of a financial deal. Depending on what data is available, we might use complicated regression analysis, we might use a Bayesian method, we might even use a machine learning deep neural network of some kind. But ultimately what we're trying to say is: we have a potential contract we may enter, and we're trying to understand all the possible outcomes to make sure the reinsurance company is being compensated for the risk it's taking on as part of the contract. So the "model," quote-unquote, could be lots of things, possibly something very complicated. Or there may be very little data, and we'll look at the past fifteen years of experience and fit a distribution to that, because that's all the experience information we have, and then put a little premium on there, a little extra load, for the uncertainty we can't fully quantify. That's what I mean when I think about modeling in this context.

Okay, great. I want to find out a bit more about how data science has impacted the insurance and reinsurance world, and the avenue I want to approach it from is a great quote from Robin Wigglesworth of the Financial Times, who said that traders used to be first-class citizens of the financial world, but that's not true anymore: technologists are the priority now. That was in 2015, and I would say that data scientists are now first-class citizens of the financial world. In insurance and reinsurance, actuaries have always been the first-class citizens, so how is that relationship working now with the emergence of data science?

Well, you know, Hugo, there's been a little historical fluke in actuarial science, the fluke that resulted in me ending up in this industry, which is that catastrophe modeling did not fit the historic actuarial methods very well, because sometimes in catastrophe insurance we're pricing and modeling risk we've never observed historically. Maybe we're looking at a reinsurance deal that would be impacted by a worse hurricane than we have ever experienced, or a hurricane season with more hurricanes than we've ever experienced. If you look at an actuarial method based on historical data, making corrections for sample size and evaluating things with heuristics that expect large sample sizes, it doesn't work very effectively for these extreme tail events. My work in crop insurance was really catastrophe work, and similarly property-cat work, whether hurricanes or earthquakes, often deals with risks so far out in the tail that we haven't experienced them. That gave a lot of opportunity for those of us with quantitative backgrounds, maybe in systems modeling, and historically engineers who do engineering modeling, to work in the space alongside actuaries. What we're seeing, in my opinion, is a very fruitful environment in insurance and reinsurance, with hopefully collaborative work between actuaries, who have a tremendous set of tools, experience, and knowledge specific to insurance (though keep in mind a lot of it is heuristics that make certain assumptions), and data scientists, financial engineers, and systems modelers, who are used to modeling slightly different things, making different assumptions, often under different constraints. If we can get those two groups working together, we can make even more effective models, and my experience relatively recently bears that out.
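The simple end of the modeling spectrum JD described earlier, fitting a distribution to fifteen years of experience and adding an uncertainty load, might look like this sketch. The loss history, the lognormal choice, and the 20% load are all hypothetical.

```python
# Fit a lognormal to a short loss record by matching log-moments, then
# load the technical premium for unquantified uncertainty.
import math
import statistics

annual_losses = [3.1, 4.8, 2.2, 7.5, 3.9, 2.8, 12.4, 4.1, 3.3, 5.6,
                 2.9, 6.2, 3.7, 8.9, 4.4]   # $M, invented 15-year record

logs = [math.log(x) for x in annual_losses]
mu, sigma = statistics.fmean(logs), statistics.stdev(logs)

# Mean of the fitted lognormal serves as the technical premium.
technical_premium = math.exp(mu + sigma**2 / 2)

uncertainty_load = 0.20   # extra margin: the risk isn't fully quantified
loaded_premium = technical_premium * (1 + uncertainty_load)

print(f"technical: {technical_premium:.2f}  loaded: {loaded_premium:.2f}")
```

The load is the honest part of the exercise: it is a priced admission that fifteen data points cannot pin down the tail, which is exactly the risk-versus-uncertainty distinction from earlier in the conversation.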
recently I was just last month I spoke at an actuarial conference and one of the sessions I sat in after I presented I was really impressed because one of the actuaries shared an actuarial methodology and then after he kind of shared it he said now here's a more data sciency way of doing this the way our data scientist friends might approach it and he shared the exact same example but working through using some type of GLM and he showed how the answers were similar but where they might differ and I thought that's the future inside of insurance companies is we can get the actuaries and the data science scientists talking together about what are the strengths and weaknesses of our different methodologies and get the deep business understanding from the actuaries and maybe some of the methodology experience of the data scientists sort of deployed at the same problems I think that would be tremendously powerful and that falls apart only if one side or the other kind of isn't in a very collaborative place so I'm a huge proponent of sort of collaborative data science that's fantastic and I think it actually provides a wonderful segue into what we've promised the the eager listener previously because a key component right of these types of collaborations particularly with such strong-minded communities such as actuaries and data scientists a key component of that collaboration a requirement a necessity in fact is empathy yeah is Sharia Syria so you gave a wonderful talk that I saw when we first met IRL with correspondent before that but when we met at our senior conf in San Diego earlier this year you'd have a wonderful talk called empathy in data science and I just love to hear your take once again on what the role of empathy and data science is at the moment in your mind yeah you go I feel like I don't think empathy is a panacea for all of our problems however I do observe on a very regular basis situations that really need empathy in order to bridge to people who are 
talking past each other, or a person making what is obvious to other people — but not to them — a kind of boneheaded mistake, because they aren't thinking about who's consuming what they're producing. The example I alluded to earlier was Stack Overflow. I watch people ask questions on a regular basis, and they clearly are not thinking about the person who's going to answer the question, or about making it easy for that answerer. If the asker were making it easy, they'd post an example with a little code that the answerer could copy and paste into their own environment, execute, observe exactly what the asker is seeing, and immediately be able to help. Instead, the asker may post incomplete code, or code that isn't even syntactically correct — often just a conceptual idea: "I'm trying to do this thing; here's a little piece of code; you can't actually run it because you don't have my data; I'm not getting the answer I'd expect; help me fix it." The answerer has no way to know what's wrong, and that's really hard to answer. If we can bridge that — help an asker inhabit that environment and think to themselves, "What's it like to be on the other end of this question? What's it like to be the other person, and how can I make their life easier and basically help them help me?" — they'll find they're much more successful at what they're after. Early on, as Stack Overflow grew, I felt the askers needed more empathy; at times now I feel the answerers could use some empathy as well.

It's the same inside our workplace. If I'm doing analysis that's going to equip an underwriter to negotiate a deal, I have to think: what information does that underwriter need to be well equipped to negotiate this deal? That drives my thinking about how I serve that person with my analysis. This isn't about me and my understanding — I'm not doing this as a science-fair exercise so that I'm smarter about risk; I'm doing it towards a business purpose of providing insight for a negotiation. That's a useful mindset, and I feel it's one we need to explicitly teach. For a lot of people it will resonate immediately; others may need more work to build this empathy muscle, if you will, of learning to think about who's reading my analysis and what they're doing with it — who my user is.
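JD's Stack Overflow point can be made concrete with a hypothetical before-and-after. The "unhelpful" version below (left as comments, with an invented file path and function names) depends on private data no answerer has; the reproducible version inlines a tiny made-up dataset so anyone can copy, paste, run, and see exactly what the asker sees.

```python
# UNHELPFUL question sketch: depends on the asker's private data,
# so an answerer can't run it or see what goes wrong.
#
#   df = load_my_data("C:/mine/sales.csv")   # answerer doesn't have this file
#   result = summarize(df)                   # "it doesn't work -- help?"
#
# HELPFUL: a minimal reproducible example with inline data.

rows = [
    {"region": "east", "sales": 100},
    {"region": "east", "sales": 150},
    {"region": "west", "sales": 200},
]

def total_by_region(rows):
    """Sum sales per region -- small enough to reason about at a glance."""
    totals = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0) + row["sales"]
    return totals

print(total_by_region(rows))
```

The content of the question barely changes; what changes is that the answerer can now reproduce the asker's situation in seconds — which is the empathy move.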
I think there's lots of ways we can build that, and it's an important part of data science, in my opinion.

Yeah, I couldn't agree more. I will say, though, that approximating truly empathic behavior, or a truly empathic frame of mind, can be really energetically consuming. So are there any ways we can approximate it, or hack empathy?

Absolutely. My favorite example actually comes from the agile development methodology, which is more of a computer programming thing than a specifically data-science thing. In agile they write user stories: "Hugo is a data scientist who's trying to understand X; he needs this tool to do Y so that he can understand X." What's so great about that, in my opinion, is that it forces the developer or data scientist reading it to think about what it's like to be Hugo. It's an empathy hack. Now, none of the agile methodologies I've ever seen use the word empathy — it's just not mentioned — but that's what we're doing with user stories. I've had situations inside my company where a developer was building something and I said, "That's a great idea, but I know your user personally — I have lunch with them — and they're not going to think it's nearly as great, because you're building the tool you want, not the tool they want." So think about your end user; if you're a data scientist producing an analytic or a model outcome, think about who's consuming it. We can give lots of little nudges, whether something explicit like the user-story empathy hack from agile, or just reminding someone, "Hey, remember, the person consuming this has a name — it's Bob — and we know Bob doesn't think that way."

Right, and we actually have learner profiles at DataCamp, which is similar: with respect to what a learner's background will be, along with how advanced they are as aspiring data scientists, whenever we build courses we very much think about which of our set of learner profiles the course is aimed at.

That's super, Hugo. You know, the podcast 99% Invisible had a great episode on designing for average, and how, basically, if you design for the average you design for no one — we'll make sure that's in the show notes — but I think it's such a fantastic idea to actually give your target audience a name, so the people working on products for them can relate to them. That's a super idea.

That's great. And actually, we had a segment on the podcast with my friend, a core developer and maintainer of Stan, the probabilistic programming language, and he was talking about what's commonly referred to as the tyranny of the mean.

Oh gosh, so true. In a couple of dimensions you're fine, but as soon as you get into multidimensional space — if you're measuring someone's height, leg length, the circumference of thighs and calves, that type of stuff — if you've designed something for the mean there, you're absolutely lost, because nobody is really near that mean at all.

Yeah, not in all dimensions, right? If I remember, the 99PI episode had a statistic — I'll probably get it wrong, but the gist, my takeaway, was something like: take any three dimensions of human body measurement — leg length, arm length, head circumference, hand size — any three, and within a small margin of error only about six percent of your population is going to be near the mean on all of them, because everybody's off a little bit in some dimension.

It's incredible. Okay, so we've talked a lot about data science in insurance and reinsurance, and about empathy in data science and where it's led. Now, what does the future of data science — in insurance, reinsurance, and otherwise — look like to you?

Well, I really suspect we'll see the term "data science" wane some, and I think that's fine. It was a very helpful term for a number of years, to help us think about bringing in
like technology and computer-science-type terms, along with business acumen and statistics. It will fade, I think, because it's become so obvious — the data analyst of the future is going to be much more data-science-y than the data analyst of five or ten years ago; I'm confident of that. I was just having a conversation over coffee today, after lunch, with a friend, and we were discussing where the market opportunity is — he works in the talent acquisition, headhunter space. I was telling him: deep learning and a lot of these very complicated artificial-intelligence-type methodologies get a huge amount of ink spilled because they're interesting, and they do have the potential to make some revolutionary changes; that's great, there's good work that needs to be done there, and there will be. But I think about the other tail of the distribution, and I think about your former guest Jenny Bryan and her work of trying to get people out of Excel. It's a widespread need, and you've got almost nobody else crowding the space. So I think the future is going to be building a lot more structured process and structured tools around so many things that aren't the sexy, deep-AI, blockchain-based gee-whizzery. It's going to be a lot of more mundane things, but they're going to fundamentally change how efficient organizations are.

Great. So I've got time for one more question, and what I really want to know is: do you have a final call to action for all our listeners out there?

Yes. One of the things I've realized in the organization I work in — one of the cultural norms that's been very valuable to me — is that there's a cultural norm here of asking the question: does it change the answer? Another way to put it would be: what's the next best simpler alternative? The idea is that if we never ask ourselves whether our analysis changes the outcome — the answer to what we're actually trying to study — we can do infinite analysis, because there's an infinite number of things we don't know, and we can keep entire teams busy inside organizations doing infinite analysis that may just end up as appendix pages in the back of a PowerPoint presentation and may never drive our organization. So I would encourage leaders within organizations to have candid conversations with their analytical teams about whether the research or analysis being done now has the potential to change the answer — to change the decisions we make — and if the answer is "probably not," to ask themselves why they're throwing resources at it. I've watched organizations do analysis just because the leader was worried they'd be standing in front of their board and be asked a question they couldn't answer, when the real answer might be, "That is not relevant to our business." We need to ask these questions so that we don't spend our precious analytical resources solving not-very-important problems.

Similarly, as an economist, I think about having an impact on the margin. So if we ask ourselves, "What's the next best simpler alternative?" — we should never compare our analysis or methodology against doing nothing, because doing nothing is rarely the alternative; usually the alternative is something a little simpler. If we want to implement a very complicated model, we should be comparing it not to no model at all, but to our old forecasting method, or to a simpler, easier, cheaper, faster forecasting method, and then asking ourselves: is the sophistication of the new method worth the added complexity? I think that's where so many rich and important conversations in data science teams will happen in the future.

Yeah, I love that. Actually, whenever I teach machine learning, I get the learners to establish a baseline model without using machine learning.
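The comparison JD describes — judging a sophisticated method against the next best simpler alternative rather than against doing nothing — can be sketched with toy numbers. The series and both baselines below are invented purely for illustration:

```python
# Toy forecasting comparison: a fancy model should beat these simple
# baselines, not "no model at all". Data is made up.
series = [10, 12, 11, 13, 14, 13, 15, 16, 15, 17]

def mae(forecasts, actuals):
    """Mean absolute error between paired forecasts and actuals."""
    return sum(abs(f - a) for f, a in zip(forecasts, actuals)) / len(actuals)

# Baseline 1: naive forecast -- tomorrow equals today.
naive = series[:-1]
actuals = series[1:]

# Baseline 2: 3-point moving average of the preceding values.
moving_avg = [sum(series[i - 3:i]) / 3 for i in range(3, len(series))]

print("naive MAE:", mae(naive, actuals))
print("moving-average MAE:", mae(moving_avg, series[3:]))
# Any more sophisticated model now has a concrete bar to clear --
# and must beat it by enough to justify its added complexity.
```

The point isn't these particular numbers; it's that the baseline turns "is the new method good?" into the answerable question "is it better than the simpler alternative we already have?"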
I'll get them to do twenty minutes of exploratory data analysis, look at some of the features, and make a prediction themselves in a classification challenge — no machine learning — and that becomes the baseline against which I have them test any machine learning model they build later on.

That's such a good idea, Hugo. I see this done with public policy often: there'll be some policy proposal, and the benchmarks given for its effect are relative to doing nothing, and that's not the right alternative. So I love that you're doing that with the class. I also like that you mentioned plotting the data first. I think somebody already gave this as the call to action in one of your interviews, but "plot your damn data" could be a very good mantra for all of us.

I love it. I'm actually going to put it up on my wall this evening.

Fantastic.

I'm going to get bumper stickers made up.

Fantastic. JD, you rock. It's been such a pleasure having you on the show.

Thank you, Hugo. I appreciate the opportunity, and I look forward to seeing you soon.

Thanks for joining our conversation with JD about data science, insurance, reinsurance, and the importance of empathy in data science. We saw how quantitative disciplines such as insurance have been using data science techniques since way back when, and how essential statistical modeling is to the world of risk. In particular, JD specializes in the art and science of simulating the outcomes of stochastic models to get at the distributions of possible outcomes in order to quantify risk. We also saw the importance of empathy in data science, and the central notion of thinking about your users — whoever will be on the receiving end of what you're constructing, whether it be a product or a Stack Overflow question. JD also enlightened us with the empathy hack of building user stories. Thanks, JD!

Also, make sure to check out our next episode, a conversation with Tanya Cashorali, a founding partner of TCB Analytics, a Boston-based data consultancy. Tanya started her career in bioinformatics and has applied her experience to other industries such as healthcare, finance, retail, and sports. We'll be talking about what it means to be a data consultant, the wide range of industries Tanya works in, the impact of data products in her work, and the importance of rapid prototyping and getting MVPs — minimum viable products — out the door.

I'm your host, Hugo Bowne-Anderson. You can follow me on Twitter at @hugobowne and DataCamp at @DataCamp. You can find all our episodes and show notes at datacamp.com/community/podcast.