The Role of Data Science in Sports Betting and Bookmaking: An Interview with Marco Blume
In this episode, we're joined by Marco Blume, trading director at Pinnacle Sports, a leading online sportsbook. Data science plays a crucial role in pricing and risk management in sports betting and bookmaking. Marco's team approaches almost everything from a Bayesian point of reference, adjusting its prices as new data and new wagers come in.
For Marco, a lack of data is a classic and significant challenge. "Every game is so short," he notes, and there is rarely enough data to work with. This makes it difficult to build accurate models, which is why Bayesian modeling has become essential for his team. With a Bayesian approach, the team starts from a prior view and updates its predictions in real time as new information arrives.
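The updating idea can be made concrete with a small sketch. This is not Pinnacle's model; it assumes a simple Beta-Binomial model of one team's win probability, and the function name and all the numbers are hypothetical. It shows why the same new evidence moves an estimate a lot when little data backs it and very little when a lot of data backs it.

```python
def posterior_mean(prior_a, prior_b, wins, losses):
    """Mean of the Beta posterior after observing `wins` and `losses`.

    prior_a and prior_b are the Beta prior's pseudo-counts (assumed values).
    """
    a = prior_a + wins
    b = prior_b + losses
    return a / (a + b)


# Early season: the prior is backed by roughly 10 games of information.
early_before = posterior_mean(5, 5, 0, 0)
early_after = posterior_mean(5, 5, 3, 0)  # then three straight wins

# Late season: the prior is backed by roughly 160 games of information.
late_before = posterior_mean(80, 80, 0, 0)
late_after = posterior_mean(80, 80, 3, 0)  # the same three straight wins

early_shift = early_after - early_before
late_shift = late_after - late_before

print(f"early-season shift: {early_shift:.3f}")
print(f"late-season shift:  {late_shift:.3f}")
```

The early-season estimate moves by more than ten times as much as the late-season one, which mirrors the point made in the conversation about lines being more responsive when certainty is low.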
Another challenge Marco faces is the classic "wet noodle" problem – having too little data to make meaningful predictions. However, by leveraging large datasets and advanced statistical techniques, his team has been able to build accurate models that have proven successful over time. This approach not only helps Pinnacle Sports make more informed decisions but also enables them to provide better odds and predictions for their customers.
One of the most interesting aspects of Marco's work is how he approaches betting on Game of Thrones. As a fan, he loves watching the show and often finds himself guessing about upcoming events. But what sets him apart from other fans is his willingness to share these predictions with others – even if they don't always come true! In fact, he's known to have placed bets on various outcomes, making it an entertaining experience for those around him.
When asked about the favorites in the upcoming season of Game of Thrones, Marco reveals that Daenerys Targaryen is currently the favorite to win the Iron Throne. However, he also notes that Jon Snow and Bran Stark are close behind, making this a highly competitive season. As for why Tyrion Lannister isn't considered a top contender – well, that's still up for debate!
In conclusion, Marco Blume's work is truly fascinating. By applying advanced statistical techniques to complex problems like sports betting and bookmaking, he's able to price events and make informed risk decisions. His commitment to democratizing data science and analytics has also had a significant impact on his colleagues, with over 150 team members trained in data fluency through DataCamp.
As we wrap up this episode, Marco emphasizes the importance of teaching others to become more data fluent. By spreading knowledge and skills across organizations, he believes that data science can be made accessible to anyone, regardless of background or experience. With his approach and expertise, it's clear that data science is here to stay – and will continue to play a vital role in fields like sports betting and bookmaking.
As our conversation comes to a close, I'd like to extend my gratitude to Marco for sharing his insights with us. His work serves as a reminder of the importance of education and training in data science, and we're excited to have him back on the show again soon. If you're interested in learning more about data science and how it's applied in sports betting and bookmaking, be sure to check out our show notes at DataCamp.com/Community/Podcast.
Finally, I'd like to take this opportunity to encourage all of our listeners to share their own experiences with data science. Whether you're a seasoned expert or just starting out, there's always room to learn and grow in this field. As Marco so eloquently put it, "Teach if you are data scientist, help your colleagues to become data scientists – that will make your life easier."
"This week I'll be speaking with Marco Blume, trading director of Pinnacle Sports. Marco and I will talk about the role of data science in large-scale bookmaking, how Marco is training an army of data scientists, and much more. At Pinnacle, Marco uses tight risk management built on cutting-edge models to provide bets not only on sports but on questions such as who will be the next pope, who will be the world hot dog eating champion, who will land on Mars first, and who will be on the Iron Throne at the end of Game of Thrones. We'll discuss the relations between risk management and uncertainty, how great forecasters are necessarily good at updating their predictions in the light of new data and evidence, how you can model this using Bayesian inference, and the future of biometric sensing in sports betting. And, as always, much, much more. For the record, we recorded this conversation in December 2018.
Welcome to DataFramed, the weekly DataCamp podcast exploring what data science looks like on the ground for working data scientists and what problems it can solve. I am your host, Hugo Bowne-Anderson. You can follow DataCamp on Twitter @DataCamp and me @hugobowne. You can find all our episodes and show notes at datacamp.com/community/podcast. This is DataFramed.
Hugo: Hi there, Marco, and welcome to DataFramed.
Marco: Hi, thanks for having me.
Hugo: Real pleasure to have you on the show. I'm really excited to have you here today to talk about sports betting, how data science plays a huge role in what you do as trading director at Pinnacle, and also the fact that sports betting in your line of work doesn't only refer to sports: at Pinnacle you offer lots of different types of bets. I'm really excited about getting into the weeds there, but before we get to all of that, I want to find out a bit about you. So I'm wondering first what your colleagues would say that you do.
Marco: Risk management. I think that's
probably the best assessment. I'm responsible for managing all the risk associated with a bet at Pinnacle: all sports, live, pre-live, any aspect of the betting. I manage the risk of Pinnacle.
Hugo: Fantastic. And since you do so much quantitative stuff, do you think your colleagues have an awareness of the ins and outs of your daily life? Or do they think it's all, let's say, lectures and whiteboards, or pen and paper, or writing code and building models?
Marco: It's a black box for them. At the end of the day, most of them have different areas of expertise, and the inner workings of the trade floor are just too complex and too specific these days. But I think that's true for most areas, if you look deep down and ask how much you actually know about other areas anymore. So I would say what the actual day-to-day job is is probably unknown to them.
Hugo: Yeah, and I think you're right that, due to increasing specialization across so many disciplines, things do become more and more black box as we head down that path. So maybe we can step back a bit and you can tell me a bit about what Pinnacle actually does.
Marco: 2018 is our 20th anniversary. We are one of the largest bookmakers in the world, and we are known for being a very efficient bookmaker in terms of pricing. Some people compare us to the Nasdaq of prices, meaning that the traditional bookmakers people know and have heard of are usually more recreationally focused, and Pinnacle is actually a true bookmaker. That means we have very low margins and very high limits. The website is not so flashy, but we have an API that people can interact with. We are a real, true bookmaker doing quantitative analysis of sports events and other events, and we allow people to build models against us and place wagers with us.
Hugo: And so then, as trading director, what does your day look like? What are the ins and outs of your actual job?
Marco: It largely depends on the season.
Sports are obviously very seasonal. You have your big events, like this summer when we had the World Cup, which changes my job dramatically. But overall, day to day would be sitting down with my managers, maybe going over the week or the month, discussing plans for products we want to roll out, discussing models we need to test, discussing some of the new strategies we want to try. Overall it's a constant striving to improve our product, and obviously doing analysis of things we tried that didn't go so well. That's the bread and butter of the day-to-day.
Hugo: So how did you get into data science initially?
Marco: By sheer force. I was always a math guy, but I wasn't a data scientist. Before we started building our quant team out, we used Excel for everything. Then the quants started using R and doing their coding in R, and I quickly picked up that the efficiency gain they had over me was orders of magnitude. They could so easily analyze data that was inaccessible to me, just because of the natural restrictions of Excel. So I started the Coursera course, followed the lectures there, and started coding in R, and pretty soon it became a bread-and-butter tool for me. I couldn't actually believe that I hadn't had these skills before and still did my job.
Hugo: And which Coursera course was it that you took?
Marco: It was the very first one. I think it's actually the data science track, with Roger Peng and Jeff Leek. Yeah, exactly, that's the original first course I took.
Hugo: I'm in exactly the same position. I actually spoke with Roger about this on this podcast: I was in one of their first cohorts, and maybe you were too, around 2012 or 2013, something like that.
Marco: Yeah, maybe around that time, for sure. It was tough, because I didn't come from coding, so for me this was brand-new. I thought it was a
really tough course for me. I actually struggled quite a lot. The thing is, though, I knew I had enough expertise on my team that the answers were available to me if I had a question, and I knew exactly what I wanted to achieve. I had a very clear goal in my mind: I wanted to interact with our data directly. I wanted to access our database directly and do analysis on it without needing to ask somebody for a data pull, where the data you get back has some missing columns or missing attributes, so it has to be transformed again and you have to go back to the analyst team. I just wanted to reduce the red tape and be self-sufficient.
Hugo: Some people might have a question revolving around the Venn diagram of data science and sports betting, and I'm wondering, historically, up until now, what the role of analytics and data science has been for bookmakers and in bookmaking.
Marco: You have a few layers of data analysis, which makes it really interesting. You have the classical sports analytics: how does a sport work? Sabermetrics, for the people who know baseball, was leading in many aspects, but sabermetrics ideas and concepts now exist in almost every other sport, especially soccer (football, for the Europeans), where the level of analysis is high right now. That is the whole field of data analysis that surrounds the sport. But since we are a trading house and we actually have a ticket flow coming in and out, we also have the traditional financial analysis, risk management assessment, basic game theory strategies, and all of that in addition. So we have a very nice overlap between those two worlds, and we have to manage both separately and then mesh them together eventually, which is often the hard part.
Hugo: I'm sure. When I came into this area, in our first conversation earlier this year, I was under the misapprehension that sports betting was really only about
sports, and you opened my eyes: at Pinnacle you do all types of bets. So I thought maybe you could run us through a few of the more interesting, to your mind, types of bets you can make inside and outside the sports space.
Marco: At Pinnacle you can bet on literally every single sport that you could possibly imagine, and this includes darts and chess and anything else you can think of. eSports, video games, is obviously very popular with us. But you also have politics; politics is a big betting field. And you have a few of the more exotic and fun things. Since you're recording this from New York, I believe: we do Nathan's hot dog eating contest. I know exactly when Kobayashi was not on top of his game anymore; I remember that. And we do the pope election, which was a fun one. It was very interesting to try to price up the pope election. So it's almost any event in the world. You could even go as far as Game of Thrones: we have a Game of Thrones prop on who ends up on the Iron Throne at the end of the season. Oscar betting, Golden Globe betting, you name it. Literally any event you could possibly think of.
Hugo: That's incredible. Of course, I don't want you to give up any of your IP here, but let's take the hot dog eating contest, or Game of Thrones, or who will be the next pope. You have the technical skills, but in terms of domain expertise, I don't suppose this is your area of expertise.
Marco: Let's talk about the pope. We read the columns of writers covering it, on what people believe to be the truth, and then we price according to this. We don't have any inside information and we don't know anything about it, but we read up a little bit and try to price as well as we can, and then you let market efficiency and the wisdom-of-the-crowd effect shape the price. Game of Thrones, I have to say, we
are fans ourselves, so we speculate ourselves, but we don't know. We don't have any insight. We read a bit, but we don't know George R.R. Martin personally, or anybody. It's guessing. But to be fair and frank, these are entertainment props to bet on. They are not comparable: on, let's say, a World Cup game in soccer, you can bet up to $500,000 or a million dollars with us without us even questioning it, and on all these kinds of props the limits are low, maybe a thousand dollars, maybe five hundred dollars. So there is a difference in the level of scrutiny that goes into pricing one or the other.
Hugo: Absolutely. So in terms of pricing them, can you talk us through the process from go to whoa? I presume you have some model which ends up with a probability distribution, a probability mass function or density function, with respect to the outcome, and then you price according to those distributions. But maybe you can spell that out in a particular example.
Marco: It really, really depends. Yes, we obviously have exactly what you just said. For some aspects it might just be market prices, so the market has a price already. What I mean by that is: if you would like to open an exchange to trade stocks, you wouldn't need to do all the analysis yourself. Every stock is traded at many, many exchanges, so you have an idea what the price should be. It's the same in sports betting: many, many bookmakers exist, and they are well connected. But especially when you talk about a live game, we have tons of models running. You feed the model tons of inputs, and it crunches the numbers and spits out something. There are several layers of models and all kinds of AI and machine learning elements, and it's very sophisticated, depending on the sport and on how much betting is done in the sport. The more betting is done, the more sophisticated we have to be, because the more sophisticated the people
on the other side are as well.
Hugo: I think once again this speaks to your job of essentially managing risk. Can you say a few more words about what risk management, or managing risk in general, amounts to for you, what it looks like, or how you think about it?
Marco: Risk is, obviously, how do you maximize equity over a probability space. Meaning: even if you have a coin flip, both sides 50/50, you don't gain equity there, but maybe somebody takes a little bit more on one side, and now you could lose a little bit of money. How do you hedge yourself against this risk? Can you take it? Are you willing to take it? What happens if you lose? Does it have an impact on the financial bottom line? Are you exposing yourself? All these kinds of questions. How do we think long term about managing our book? We're a big company, and like everyone else we have to stay afloat, and there are a lot of regulations in it, so you have to manage your risk very carefully and try to balance the book in some respects. It's not always easy to balance the book, and the way betting works is often based on news. Somebody is injured, and if you're not on top of that and you don't pick up trading frequency very, very quickly, you just get overrun by wagers and you're exposed in a very unfavorable scenario.
Hugo: That makes perfect sense. And how does this idea of risk relate to uncertainty in general?
Marco: There are a few levels of uncertainty. Obviously you have an inherent uncertainty, because it's a sports event, which is a non-perfect environment, so you never know exactly what all the parameters that matter are. But you also have uncertainty in variance, meaning some events are naturally just more volatile and less known than other events. And quite clearly it has to do with the historic data available. If two people, you know,
did the same competition 100 days in a row, you have very, very strong data, and you eventually pick up on the percentages one way or the other. And then you have events like, for example, a soccer World Cup where Germany plays against Uruguay, which has basically never happened, in the sense that these exact teams have never played against each other. At the end of an NBA season, all the teams have played against each other many, many times over, so you get a very good idea about the relative strengths of the San Antonio Spurs, the Golden State Warriors, and the Cleveland Cavaliers. Even though maybe the Cavaliers and the Spurs have only been paired off two or three times, because of cross relationships you have a very good idea of what their relative strength is. But Germany playing Uruguay, you actually have no idea. Maybe both teams have met before, Germany and Uruguay, 20 years ago, but none of the players on the pitch are the same, and even the game has changed. So you have a lot of different kinds of uncertainty in the gaming world.
Hugo: That's interesting. So it sounds like there's a distinction between uncertainty that you can quantify, which would be risk, and uncertainty where you just don't know a lot about the situation, so you can't quantify it so much.
Marco: Exactly, yeah. You have the known known, the known unknown, and the unknown unknown. It's very tricky; these peak events especially are very, very tricky. And you notice it. Baseball is probably a prime example: with 180 games a season, at the end of the season you have a very good idea what the strength is. But at the beginning of the season you will notice that the lines are much more volatile, and bookmakers, including us, are much, much more careful in taking on risk, because we don't believe that the underlying odds are as certain as you wish they would be. But at the end of the season, if you want to bet against us, we are actually much less willing to adjust our probabilities based on
betting behavior, and are willing to accept much more risk, just because the certainty has grown so significantly.
Hugo: All right, so you're saying that when there's more uncertainty, when you have less data and less knowledge about the space, your lines will be more responsive to betting behavior?
Marco: Absolutely. A wager that could move a line 3% at the beginning of the season might move the line 0% or 0.1% at the end of the season, just because the certainty is there eventually. This is the price: everybody in the entire world has spoken and has placed a wager. We know that price, and we're willing to take a bet on it.
Hugo: I don't know whether you think about this in this framework, but I think about stuff like this in an almost Bayesian sense: you have some sort of prior knowledge about the space, and once you start getting more and more data, you can update whatever you're interested in and get a more precise estimate as you keep updating.
Marco: Essentially, Bayesian thinking is predominant in our world. Almost everything we do is from a Bayesian point of reference.
Hugo: Fantastic. As I said, this idea of updating is what came to my mind.
Marco: If you price anything up, any event, say you see two people on the street and they mention they'll do a hundred-meter dash, the initial price might be 50/50 or something. Then you see one guy is actually on crutches, and suddenly the line moves 30 or 40 percent; now it's 90/10. But then you see he throws the crutches away, and you're on the move again. Eventually you have a pretty good idea: OK, now I know which guy looks fit, and you're pretty sure about your price. If somebody then tells you it's the complete opposite, you might not believe it anymore. And not believing means, in our language, that we are willing
to take a lot of risk before we actually get moved over to the new price.
Hugo: We'll jump right back into our interview with Marco Blume after a short segment. Now it's time for a segment called Data Science Best Practices. I'm here with Ben Skrainka, an independent data science consultant. Hi, Ben.
Ben: Hi, Hugo. It's great to be back on the show. So what do you do when you have a bug in your code? Maybe you've tried unit testing, but you still can't figure it out.
Hugo: Right. So a first approximation, I think, is to include some print statements in your code to see what's actually happening in there when you execute it.
Ben: Actually, that's a very common approach. We are all guilty of sprinkling a lot of print statements throughout our code to find a problem. In general, good programmers are lazy, but that is not the case here. First, you will add more and more print statements as you track your bug down. These print statements make your code harder to read, and you have to find and remove all of them later, which is a waste of time. I cannot deal with any of that. And it could be even worse in a distributed system. When I was in grad school, we had a visiting scholar who was debugging a Fortran program which ran in parallel. To find a bug, he instrumented his code to log information to a file. He didn't really think this through: because he was running at scale, he filled the head node's disk and crashed the entire cluster. I was delighted when his visit was over and I could run jobs again. Sometimes print statements can be really bad.
Hugo: That sounds like the stuff of nightmares. Ben, tell me what the alternative is.
Ben: It turns out our engineering friends invented a magic tool long ago called a debugger. A debugger allows you to run your code a line at a time to see how it executes. You can examine how variables change as your code runs. Debuggers have pretty much the same interface for every language, whether it is Python, R, C++, Java, MATLAB, or something else. Once you learn one debugger, you can operate any other. In Python, use pdb;
in R, the debugger in RStudio is a great place to start. If you are using an IDE, it should have a debugger, and for those like me who love IPython, there's also ipdb.
Hugo: Right.
Ben: Often when someone asks me to help find a bug, I ask if they have tried using the debugger. Invariably, the answer is no. I don't understand why so many people resist learning to use a debugger, or actually using it if they already know how. It is not that complicated, and you only need to master about four commands to use it. Being a professional means learning how to debug. It'll make you more productive and more self-sufficient.
Hugo: So how can our listeners get started with debugging?
Ben: Start by figuring out how to start the debugger for the language and platform you are using. For example, in Python, run with the --pdb flag, which will put you in the debugger when an error occurs, or import pdb and call set_trace at the line where you want to start debugging. Then use the debugger to step through the code and examine your variables and call stack. These commands are identical for any debugger: use step to advance one line of code, including into function calls; next to skip over function calls; finish to complete execution of the current function; and continue to resume execution.
Hugo: Right, so to recap: you use step to advance, next to skip function calls, finish to complete execution of the current function, and continue to resume.
Ben: You got it.
Hugo: So, Ben, how do you look at variables?
Ben: In Python it is easy, because you still have access to an interpreter. To examine any variable or execute any Python code, just type code into the interpreter like you normally would.
Hugo: Anything else?
Ben: That's pretty much it, other than if you want to see a local variable or function argument, you need to position the debugger at the correct frame in the function call stack. But you will have to read the fine manual for that. Learning to use the debugger will help you quickly find and fix bugs, allowing more time to focus on fun stuff, like doing science.
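For readers who want to try this, here is a minimal sketch of a debugging session using Python's built-in pdb module. Two details are worth knowing. In plain Python, a script is started under the debugger with `python -m pdb script.py`, while the `--pdb` flag is accepted by tools such as IPython and pytest. And in pdb the command that plays the role of "finish" is spelled `return` (or `r`), with `up` and `down` moving between frames of the call stack. The function `running_mean` and the helper `scripted_session` are hypothetical names made up for this illustration. The commands are fed in from a string so that the example runs without a keyboard; in everyday use you would place `breakpoint()` or `pdb.set_trace()` in your code and type the same commands at the `(Pdb)` prompt.

```python
import io
import pdb


def running_mean(values):
    """Toy function to step through; stands in for real analysis code."""
    total = 0
    for v in values:
        total += v
    return total / len(values)


def scripted_session(commands, func, *args):
    """Run func under pdb, feeding it debugger commands from a list.

    Returns the function's result and everything the debugger printed.
    """
    fake_keyboard = io.StringIO("\n".join(commands) + "\n")
    captured = io.StringIO()
    debugger = pdb.Pdb(stdin=fake_keyboard, stdout=captured,
                       nosigint=True, readrc=False)
    result = debugger.runcall(func, *args)
    return result, captured.getvalue()


# The debugger stops on the first line of running_mean. We then:
#   p values  -> print the function argument
#   next      -> execute the current line (total = 0)
#   p total   -> print a local variable
#   continue  -> resume normal execution until the function returns
result, transcript = scripted_session(
    ["p values", "next", "p total", "continue"],
    running_mean,
    [2, 4, 6],
)

print(transcript)
print("result:", result)
```

Interactively, the equivalent is to put `breakpoint()` as the first line of `running_mean`, call the function, and type those four commands yourself.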
Hugo: Thanks, Ben, for that delightful introduction to debugging your data science code. After that interlude, it's time to jump back into our chat with Marco. So we've spoken around this, but as trading director you think about everything from day-to-day odds making to everything related to markets. I'm just wondering if you could speak to how all these different aspects of your job are related, perhaps through the lens of a particular real or hypothetical project.
Marco: One of our goals is to improve our models, to have high accuracy in our models, but also to open up new betting opportunities to our clients. More interesting betting options allow people to hone their own models and give us liquidity, and then the machine gets rolling. At the end of the day, we are a very low-margin, high-volume bookmaker. I view it like Walmart: we don't want to make a lot of money selling one orange juice; we just want to make a little bit, but we want to sell a lot of orange juice. So the idea is that we only want a little bit on a new product. Let's take a hypothetical product: how many throw-ins per half are there going to be in a soccer game? You start modeling this, you start putting it out, and some bettor picks up that your model has completely wrong assumptions and bets a lot of money against you. Then you refine your model, and so on, until you get to a solid market. Once you have this, you can roll it out over many leagues. You have to do more and more refining, but eventually you get to a stable product, which now might be something that has a lot of clients and draws betting. You've created a new market that clients like, a new product that they're interested in betting on, and this product then stays mainstream for the next 10 to 15 years.
Hugo: Interesting. That speaks, once again, to this relationship between domain expertise and data
science skills, and it actually made me think: have you read a book called Superforecasting?
Marco: Yes, yes, of course.
Hugo: For the listener, this is a project by Philip Tetlock and colleagues, and the basic idea is that he found certain members of society who are better at forecasting than other people. One thing he does is look at the characteristics of these people and see what makes some better forecasters than others. I'm just wondering, Marco, do you try to hire people who are superforecasters, or instill this superforecasting culture in your organization? How do you think about that?
Marco: I'm actually paying them already. I used to call these people an army of consultants, because if you're a superforecaster in my world, what it means is you're actually better than our models, you can predict the outcome better than we can, and by betting you make a profit. So what I'm actually doing is consulting you: here's my prediction for this event, what is your prediction? By placing a wager you're telling me your opinion, which I can then incorporate in my model and use to change my prices, but I have to pay you the price. So all these people who are great at forecasting anything are basically working with me on a consulting basis.
Hugo: That's awesome. I just had a thought that brings this full circle, in the sense that we've moved from data analysis and data science to bookmaking to the idea of superforecasting. Something that becomes very apparent in this book is that you don't necessarily need super special skills to be a superforecaster, but there are several key aspects, such as being less prone to confirmation bias than other people, which of course is the hallmark of a great data analyst as well.
Marco: Yeah. One of the classic training pitches that I used to give for the longest time, when I get new trading recruits in,
and these are all bright people, successful and eager, obviously at the beginning of their career and wanting to make a mark, so they're young and willing to gamble, goes like this. I always tell them: this is the strategy, this is how it works, and obviously you have to bring in your own feel. But if you ever start gambling with our money... Then I show them a credit card: do it with your own money, and if you get rich enough, buy yourself an island. But you're not gambling with our money. We try to really separate this for them: you think you might know something because you're sitting on this side of the table, but if you were as good as you think you are, you would sit on the other side of the table. You would be a superforecaster, and if you can see everything so clearly, then your time is misspent with us. On our end, it's hard work. It's hard analytic work. Every day we have to grind, we have to refine our model, we have to get new data. It's a craft; you have to hone it over many, many years. You don't just go in, like in the classic Vegas movies, and know that the spread should be eight and a half here and the total is going to be 165 points. That's not how it works at all. Every day we go back in, and every day we have people smarter than us, people better than us, outsmarting us at our own game, so we need to improve all the time. I've seen over many, many years customers who actually have a lot of talent then become lazy and sloppy, and at that moment they're not winning anymore, because somebody else is hungrier than them and knows the numbers better than them. And that goes the other way, too.
Hugo: Yeah, absolutely. The other quality of superforecasters that I just remembered speaks to this idea of updating, of doing a Bayesian update: superforecasters are very good at updating their predictions and
beliefs with respect to new data coming in as well.
Marco: One of the key aspects of trading is that the past is the past. You cannot change past wagers; the only wager you can change is the next wager. So what you're doing is you come up with a scenario, almost a probability tree, and you say: OK, I put the line here, so I expect this to happen 80% of the time, that 20%, and something else maybe 0.1%. This is basically your tree of probabilities, built from your experience. But if something unexpected happens, then you have to update your assumptions very, very quickly, because something out of the ordinary has happened, which means all your assumptions might not be correct anymore. In politics, we always used to call it the "lady of service" problem: if any politician were ever found with a lady of service, all the prior work would be almost meaningless and their odds would be abysmal. Obviously Donald Trump might refute this now, but back in the day it was always the problem that politics is so dangerous, because if one character flaw of a politician is revealed, all the analysis before becomes completely meaningless, and all of a sudden the odds would shift from 60 percent to 5 percent in a matter of seconds. And you have to account for the possibility that somebody gets the information ahead of you. We've seen it many times over the years that somebody has good information about something that's not public yet.
Hugo: We've discussed a couple of tools and techniques, from Bayesian inference to your mentioning that historically you moved from Excel to the R programming language. I'm wondering, for people who want to enter this type of space, bookmaking, sports gambling, or these types of prediction challenges in general, what type of tools and techniques in data science would you suggest they learn? And
speak to this either as general suggestions or in terms of the type of people you want to hire; I don't have a strong preference there.
Yeah, so, I mean, what we look for is the classic R, Python data science stack, you know; machine learning is highly sought after. It doesn't really matter which framework it is, some machine learning framework; the rest we can teach. It is a good thing if you have actually done some sports modeling already. It doesn't matter which sport, but that you're familiar with how sports modeling works conceptually. Some of our people like to do Kaggle competitions, that kind of stuff. There are a lot of different ways you can come in. You can come from sabermetrics, which is baseball analytics, but you can also come from a different way. We have people who in the past were big in poker AI, and they did a lot of work on game-theoretical approaches there. Because our field is so diverse, you can actually come from a game-theoretical point of view and build game-theoretical trading models, or you can come from the sports analytics side and build sports analytics models. There are many different ways you can bring in your creativity and your knowledge to get there. But the classic computer science background is very, very highly sought after. The alternative is a strong math background: you come from the other end, you're very proficient in high-level math, and now you learn some coding skills and are able to sit down with another guy to develop a proper model, where you know the quant stuff and he knows the coding stuff.
So you've mentioned R and Python. Is there a culture of one of these more strongly in your organization than the other?
So Pinnacle is very heavily reliant on R. We're much bigger in R than we are in Python. We do Python; we're not a cult of R in the sense that we feel like we need to use R. We feel that R gives us the best bang for our buck in many aspects. We have also been
very active in the R community for a long time, speaking at conferences, and we send people to almost every R conference. It's a community that also embraces the idea of sports betting. We release data sets into the community, and we have worked with members of the community to improve packages that we maintain, which are free and available for everybody and that help with sports betting. So it's a great community. I mean, most people who like analytics like sports analytics, as far as fun analytics are concerned. In essence, the difference between analyzing sports and many other things is that sports have a finite end. You can analyze the game as much as you want, and then after 90 minutes, or one hour, or whatever, you have the result on the table, and now you have the next game to analyze. There's always something happening in sports, which makes it very interesting. And with betting you actually have a way of keeping track of your score. Betting is just a way of keeping track of your score: how good is your model? The better it is, the more money you make.
Absolutely, and as you say, there is a final result in sports, right? Someone wins and someone loses, most of the time.
Exactly. This is the big difference between financial trading and sports betting trading. The big difference is that financial trading is almost infinite, right? There's not an end to the price of oil; a commodity exists continuously, while all these sporting events are discrete events.
I'm just wondering, with all the new technology, deep learning and video analysis and that type of stuff: I know that a lot of basketball, for example, is captured on film, and people think about doing deep learning analysis on players' movements and that type of stuff. Is this something you've thought about at all?
We think about it. We haven't done OCR analysis. One of the key features for us is that whatever attributes we want to use in our models
need to be available live and fast. It doesn't help us to have a very rich data set of data points that we cannot get while the game is in play at a reasonable speed, and reasonable for us means a maximum of maybe a second or two seconds slow. If something is ten seconds slow, in our world it's basically like yesterday; it doesn't matter to us. Our world is very fast-paced, so we need to find data points that can be analyzed and transferred to us at a fast pace. This has increased. If you can imagine, 10 or 20 years ago, in most games the only data points you would get were on a very superficial level. If you talk about basketball, you might get rebounds, assists, points, blocks, the classic four. But now you get something like "dangerous attacks": a human puts an element of a concept on it and then gives you a judgment that helps you understand the game. So it has gotten a lot better. And eventually, with biometrics, we will get data super fast and super accurate that we can use. I mean, it sounds fantastic: use heart rates and basically try to see if heart rates matter, and all that kind of stuff, you know, body temperature, perspiration on the forehead, all of these things.
I wonder how many dimensions you need in order to describe these things. I think that's a cool question. But in terms of processing all this data in real time, I suppose we have this misconception in the cultural consciousness in data science that to run fast code and machine learning models in production, they'd better be in Python, and that's something we've been discussing on the podcast recently, actually. But I'm just wondering if you can tell me about your experience with productionizing R code, and efficiency, which I know you're very interested in.
We actually spent the better part of a year now thinking
exactly about the question of how to productionize R effectively. If you've done a lot of work in this, the traditional advice would be to transfer the R code to C or C# for productionizing, just because of the speed. But we have now found new APIs. We're using plumber, actually, in some aspects, which, for the people who know R, is an API interface for productionizing R. In a small-scale testing area it has been working well for us, so we are actually running R code in production, in a production environment, in trading algorithm environments.
Fantastic. And I actually had someone on the podcast recently who talked about using plumber and R Keras together, and that it worked very well for them.
Yeah, I mean, it's still a little bit in its infancy, and some of our team members are actually on the very cutting edge and working with the guys behind all of these packages to help improve stuff. But it is promising, and it allows us to generate our models even faster, because we don't have to take this extra step of actually translating them. Traditionally Python was used in our world because the AI learning models didn't have a good interface into R. The distinction basically was: if you want to do data analysis, you do it in R, but if you want to code, you have to do it in Python. But nowadays, with R being an interface to all the other big machine learning frameworks, you don't necessarily need that anymore.
And of course you were at the RStudio conference in San Diego this year, and J.J. Allaire's keynote about R Keras, and how we're seeing more and more of R being an interface language to these pretty serious packages and infrastructures, was really cool.
That was an amazing talk. I also couldn't agree more with him that the way they're pushing it is towards an open framework for everything. Let's make R interface simply with everything, and let's not try to close it up and try to do it on our own, because then you get
to this classic language wars problem of people being in a cult. People should use whatever language they want to, you know. That said, R should be able to help use all the tools that are available in all other languages.
Exactly. So you've mentioned several times, in various guises, how Pinnacle and yourself work with the R community at large, from giving talks at the RStudio conference and other conferences to working with developers on packages. I'm wondering if you can just speak to how important the sense of community in an open-source landscape is for you in your job.
Oh, that's everything to us. That's part of why we like R so much. At the very beginning, I remember, we had problems with the classic ODBC packages, and this is Jim Hester, so you have direct access to the guy who made it. You can ask him a question, and if he knows that you know what you're talking about, he's actually working with you on your environment to help troubleshoot and then improve the code. This concept that the developer of a package actually cares so much about you that he makes a bug fix that addresses a tiny problem that might only exist in your configuration, but which is a bug in his code ultimately, and wants to make it better, it's amazing. We found so much worth in that, and that's why we decided eventually also to release some packages into the world, to tell people: hey, if you want to get into sports betting, these are some tools that we used internally before that might make your life a little bit easier.
Cool, and we'll definitely link to some of those packages in the show notes as well, so interested listeners can check them out.
We'll jump right back into our conversation with Marco after a short segment. Now we've got another segment on statistical distributions and their stories with Justin Bois, a lecturer at Caltech and a DataCamp instructor. Hi there,
Justin.
Hey Hugo, it's good to be back for another season of DataFramed.
I agree. As it may have been a while since DataFramed listeners have heard one of these segments, and we may even have some new listeners this season, can you give us a quick review of what you're doing with these segments?
Sure. There are many named probability distributions out there, and it can be a challenge to make sense of all of them. I find that it's easiest to think of the distributions in terms of the stories behind them. In each segment we've introduced a distribution and a story. Along the way, we've introduced ideas such as Bernoulli trials and Poisson processes.
Right. So can you give an example of a distribution you covered last season?
How about the binomial? The number r of defective light bulbs in a production batch of n of them, each with probability theta of being defective, is binomially distributed. More generally, we can think of a light bulb being defective or not as a Bernoulli trial, which has an outcome that can be coded as true or false, or equivalently success or failure. The number r of successes in n Bernoulli trials, each with probability theta of success, is binomially distributed.
Great. So here we have stories, plus changing around the nouns. The name of the game?
Yes, with the usual caveats. It's not the whole game, but you can get really far by thinking about stories like this.
So what other distributions have we covered?
Well, we did several discrete distributions in addition to the binomial distribution we were just talking about. These include the Bernoulli, Poisson, and geometric distributions. We also did two continuous distributions, the exponential and the gamma. These are all used to model real-world phenomena, and you can review them and the other distributions, and their stories, and how to use them in Python and Stan, at the link in the show notes.
Okay, I think we're up to speed. What do you got for us today, Justin?
Well, today we're going to talk about the story of the normal distribution, which
is also known as the Gaussian distribution. I'm going to do this one a little bit differently, though. I'm actually going to tell you the story of how the normal distribution was discovered.
I love history lessons, but why do some history for this one? You haven't done that for any of the others.
Well, in this case, I find that the history of how the distribution was discovered helps us understand the story behind the distribution.
Sounds good, let's hear it. But before we do, can you remind us what the normal distribution looks like?
The probability density function of the normal distribution is the classic bell curve we are all used to seeing, a symmetric peak with tails that fall off like e to the minus x squared. The field of probability began with studies of games of chance, notably by the Bernoullis. Jacob Bernoulli had already identified the binomial distribution as an important tool in understanding outcomes of discrete events, which games of chance often involve. Binomial coefficients, which are necessary to compute probabilities using binomial distributions, were difficult to calculate, especially in the first part of the 18th century when all of this was going on. Bernoulli and others were working to find ways to approximately calculate binomial coefficients, and that's where de Moivre comes in. De Moivre found that he could approximate the binomial coefficients, at least for large n, with integrals we now know are related to the cumulative distribution function of the normal distribution.
So you're telling me de Moivre discovered the normal distribution?
Kind of. He did not understand the concept of a probability density function, and he was really only looking to approximate binomial coefficients. But looking back at it, his result is an important one. It says that the binomial distribution, which is discrete, can be approximated by a normal distribution, which is continuous.
Right. So do we have a story for the normal distribution, then?
Yes, and I like the way you used an indefinite article there. A
story for the normal distribution is this: if a variable is binomially distributed with parameters n and theta, with large n and theta not too close to 0 or 1, it is approximately normally distributed. The mean of that normal distribution is n times theta, and the variance is n times theta times the quantity 1 minus theta. So this is a story for the normal distribution.
Are there more?
There are indeed.
Great, let's have you talk about more normal stories the next time you're on the podcast.
I look forward to it.
Time to get straight back into our chat with Marco.
You mentioned earlier what you referred to as an army of consultants, and I love your military metaphors and analogies in general. One of the ones I really love is that you've stated that part of your mission is to train an army of new data scientists.
R was always predominant in the early team, but Pinnacle is a data-driven organization, and so we had this huge gap between the people who know R and the people who don't know R. My belief was, especially with the tidyverse coming along, that there was a path for people who are unqualified, in the sense that they never learned computer science before, they never coded, they don't have a math background, and they're not tech people either. Maybe they're in human resources, maybe they work in business analysis, or we've met people who worked in customer service for years. So we came up with a curriculum based on the tidyverse, based also on the tidyverse lessons that I quite enjoyed myself, and basically tried to build a curriculum with the help of DataCamp, specifically tailored to these people. And the success has been overwhelming. We have now trained over 150 people, maybe more by now. We have R being used in every aspect of the company. It puts a smile on my face when I go around and I see somebody who I know doesn't come from this background
showing me an R Markdown that he just created and a report that he sends to his colleague, which they're going to discuss. It's just amazing to me. There's a sense of empowerment at every level of the organization, which is fantastic. We have a data warehouse where people can access the data. We've made an interface which makes it very easy to get the data from the data warehouse directly into your session. We're using all kinds of tools that RStudio provides; we're using RStudio Server, using all the tools that they have, and it is amazing to see. We now have stories like, I believe, a 45-year-old woman who worked for us for over 15 years as a customer service rep and who is now a full-blown data scientist with us, actually doing some phenomenal work. You really wouldn't know that she was in customer service for years; you would have no idea.
Yeah, that's incredible. My next question you could, I suppose, answer in the framework of her story or other success stories. I'm wondering how you think about the relationship between using platforms such as DataCamp, which as you said has been very successful for you, and in-person training, and how these two can complement each other.
DataCamp was invaluable. We couldn't have done this without DataCamp. What we did is we did complement their training, definitely. We basically picked courses, graded them in terms of difficulty, and then put them together in a logical order, which now, I believe, DataCamp has themselves; they're called tracks. I think back then they didn't exist, so we did it ourselves. We did these tracks, and at the end of each track we got together in group sessions and brought up fun problems. We brought up the interesting problems that we found, often maybe a few problems that are actually real Pinnacle problems, real
Pinnacle data, and we did some analysis on it, and we showed people how efficient this can be. What I was trying to sell people was that R is nothing else than "power Excel". I tried to take the fear away from being in an interface where you have to type in something. I always tried to bring it down to them in terms of Excel: this is just like Excel, but instead of being able to work on sixty thousand data points, or sixty thousand rows, you can now work on a few million rows without breaking a sweat.
That's awesome, and I do wonder how things would have been different if it had been called Power Excel, that type of branding for an open-source language such as R.
To be quite fair, prior to the tidyverse I don't think that would have been a fair classification. But if you break down the tidyverse, and you break it down to dplyr and ggplot, just those two, I think, cover 95% of all data work needs. Everything else is specialist in many aspects, but dplyr and ggplot, that's all you need to do the vast majority of work. So we basically only teach dplyr, ggplot, and R Markdown, because the company is Markdown-based. We send R Markdowns to each other, so we are no longer in this nightmare scenario where somebody sends you data and you don't know where they pulled it from, you don't know what filters they put on, you have no idea. Now we have a Markdown; you can just look through the code, you see exactly what they did, and you can point out, oh, you came up with this, you've forgotten this in this scenario, and can help them right away to do a better job.
Absolutely. So in this particular case, though, your colleague who moved from customer support to learning a lot of R, using DataCamp and in-person training at Pinnacle, then moving to a data science role: how much math did she need to pick up, or statistics, or machine learning, all this kind of stuff? Because I understand using dplyr and
ggplot in the classical data-analyst kind of data science role, but then there is another step above that, right?
For sure. So now, I believe, we give her training on forecasting models and more sophisticated techniques, just to get her into a different mindset, because obviously she doesn't have the traditional training. But data analysis is such an important step for many companies, and for many of our areas, where just basic data analysis helps. So she was productive, I think, from day one. We actually put her into her own environment, in the customer service field, and she was writing the reporting framework for the customer service team, because she was obviously a subject matter expert; she'd been on the front end for years and years. A classic example is that we've redone our staffing. We realized that some of our local-language-speaking customer service agents were working at the wrong hours. Say the Swedish-speaking customers were having questions: the Swedish-speaking agents were working at the wrong hours. So we were now able to mesh these two data sets with each other and actually optimize our scheduling to service our clients better. So: a direct win for our clients and direct impact from her.
That's incredible. So you've spoken to a lot of different modeling techniques you use in forecasting, including a lot of machine learning. We all still write a lot of code to build machine learning models and that type of stuff, but speaking to your colleague and the work she's been doing to impact the business: there are a lot of people worldwide who could impact their businesses by developing machine learning models but may not be able to code. This is kind of a roundabout way of asking you how you think about machine learning as a service and as a platform going forward, so that people in businesses are able to build machine learning models without necessarily writing code.
It's
starting already, for sure, because some of us build frameworks for others where they actually don't really know what's happening underneath the hood anymore, but underneath it all is machine learning. We're going to see this shift over the next years, for sure. And it's also a good thing. Just because you're driving a car does not mean you need to know how the combustion engine works. It's not a requirement. You can drive a car perfectly fine, and you can do your job with it, without knowing the inner workings of the combustion engine, and this is what we're going to see with machine learning as well. Other people are going to do it for you.
Absolutely. But of course what we need in place are checks and balances and processes, so that when your car busts up it doesn't explode and kill you, right?
I've seen data analysis gone wrong in our company many times over, quite obviously. A very famous one is when the subject matter expert forgets to tell the model that there are certain outcomes which cannot exist. A classic, I think, is a volleyball game: it's first to three sets, so there cannot be a sixth set, you know, because it's impossible; it never exists. But mathematically you could easily forecast this, right? You put some density, some Poisson distributions on it, whatever it is, and you get a distribution for a team. That's a crazy thing, where you have to train people properly to tell the model a little bit better what the frame is, what the framework is that you're operating in.
And I love that you used the term "data analysis gone wrong". I think we should have a segment on the podcast at some point in the future called "Data Analysis Gone Wrong: Nightmare Stories". So speaking of data analysis and data science, I'm wondering what one of your favorite data sciencey techniques or methodologies is.
My favorite, by myself?
Yeah, just something you love to do.
You
know, in my day-to-day, because I'm actually too far detached from it, I mainly stick to the tidyverse myself: pull some data, graph it, dig a little bit deeper. That's all I do. When I speak to the quants, you know, I think the ones I always like best are genetic algorithms. I just find it so cute how they develop, you know, and how they stumble around. I just love watching models grow like this, and eventually something that for a long time had such poor results just explodes and produces results which are far greater than you would ever have expected.
That's really cool. So you've mentioned genetic algorithms, Bayesian inference and Bayesian updating, and machine learning models. I presume when thinking about time series forecasting you think about ARIMA models as well. Is there anything else that is kind of the bread and butter of building these types of models in your line of work?
Bayesian inference is huge for us. That's probably our bread and butter in many aspects, just because we have a classic lack of data. Every game is so short, so you never, ever have sufficient data, and Bayesian inference is very important for us. That might be the biggest one that we use.
Just for fun, what are some current bets that you have that you're really excited about, or that you find cute or interesting?
I actually don't know. I always like the Game of Thrones one; that's something I always liked. We always try to put up some fun bets. Fun bets just come about when some of us talk about an event and then we post some odds, mainly sometimes to see who's right between us as well; we're just guessing ourselves. It's not unheard of that people are betting on Game of Thrones events in the company, just because it's fun.
And you might not be able to answer this, but who's your pick to be on the Iron
Throne at the end of Game of Thrones?
Um, at the moment I'm Cersei-heavy.
Yeah, right, cool. Has that changed over the past several seasons, or have you been a Cersei stronghold?
I think so. Let's see, we don't know the truth here, you know. What are we at? Let me check right now. I can actually tell you who is the favorite for Game of Thrones.
Please do.
So the favorite is, so we have Jon Snow, Daenerys, oh wow, Bran.
Oh wow, Bran, interesting.
Those are the three favorites.
Interesting. I wonder why Tyrion isn't a favorite. But I think we're digressing now; we can have a Game of Thrones episode when it comes up. So my final question, Marco, is: do you have a final call to action for all our listeners out there, something you'd like to see them do or implement moving forward in their data science careers?
To me, it's just: teach. If you are a data scientist, help your colleagues to become data scientists. It will empower everybody, it will make your life easier, and it's going to change their lives. Just try to teach, teach, teach as much as you can.
I couldn't agree more. Marco, it's been an absolute pleasure having you on the show.
It was a pleasure. Any time you want to have me again, I'll be happy to.
Fantastic.
All right, thanks for joining our wild ride with Marco on the role of data science in sports betting and bookmaking, and how he's building an army of data scientists and democratizing data science and analytics. As a low-margin, high-volume bookmaker, a little bit like Walmart, Marco says, Pinnacle doesn't want to make a lot of money selling orange juice; they just want to make a little bit, but they sell a lot of orange juice. To do this, they're adept at using all the data they have, which may not be a lot, such as when Germany played Uruguay in the World Cup. But as many superforecasters do, they're ruthless at updating their models and predictions in light of new evidence, and
Bayesian modeling is a fantastic framework for both of these things. In terms of coding, Pinnacle is an R shop, and Marco, having come from the Excel world himself, proselytizes the gains in efficiency, performance, and scalability that R allows. On top of this, he busted the myth once again that R models aren't really scalable in production. He also told us about how he has used DataCamp to train over 150 colleagues to become more data fluent, and spoke of one colleague who moved from customer support to data science and has already had direct impact on optimizing scheduling to serve Pinnacle's clients better. I can't stress enough how essential this is: data fluency is becoming a skill spread more and more across organizations, and not only in the hands of the few. Next week I'll be speaking with Reshama Shaikh, a freelance data scientist and statistician who works in Python, R, and SAS. Reshama is also an organizer of the meetup groups Women in Machine Learning and Data Science, otherwise known as WiMLDS, and PyLadies. She's organized WiMLDS for four years and is a board member. We'll discuss her work at WiMLDS and what you, our listeners, can do to support and promote women and gender minorities in data science. We'll also delve into why women are flourishing in the R community but lagging in Python, and discuss more generally how NumFOCUS thinks about diversity and inclusion, including their code of conduct. All this and more. I'm your host, Hugo Bowne-Anderson. You can follow DataCamp on Twitter @DataCamp and me @hugobowne. You can find all our episodes and show notes at datacamp.com/community/podcast.
This week I'll be speaking with Marco Blume, trading director of Pinnacle Sports. Marco and I will talk about the role of data science in large-scale bookmaking, how Marco is training an army of data scientists, and much more. At Pinnacle, Marco uses tight risk management built on cutting-edge models to provide bets not only on sports but on questions such as who will be
the next pope, who will be the world hot dog eating champion, who will land on Mars first, and who will be on the Iron Throne at the end of Game of Thrones. We'll discuss the relations between risk management and uncertainty, how great forecasters are necessarily good at updating their predictions in the light of new data and evidence, how you can model this using Bayesian inference, and the future of biometric sensing in sports betting, and, as always, much, much more. For the record, we recorded this conversation in December 2018.
Welcome to DataFramed, the weekly DataCamp podcast exploring what data science looks like on the ground for working data scientists and what problems it can solve. I am your host, Hugo Bowne-Anderson. You can follow DataCamp on Twitter @DataCamp and me @hugobowne. You can find all our episodes and show notes at datacamp.com/community/podcast. This is DataFramed.
Hi there, Marco, and welcome to DataFramed.
Oh, hi, thanks for having me.
Real pleasure to have you on the show, and I'm actually really excited to have you here today to talk about sports betting, how data science plays a huge role in what you do as trading director at Pinnacle with respect to sports betting, and also the fact that sports betting, as in your line of work, doesn't only allude to sports, but that at Pinnacle you do lots of different types of bets. I'm really excited about getting into the weeds there, but before we get to all of that, I want to find out a bit about you, and so I'm wondering first what your colleagues would say that you do.
Risk management. I think that's probably the best assessment. I'm responsible for managing all the risk that is associated with wagers at Pinnacle, over all sports, live, pre-live, any aspect of the betting. I manage the risk of Pinnacle.
Fantastic. And do you think your colleagues, as you do so much quantitative stuff, have an awareness of kind of the ins and outs of your daily life, or do they think it's all, let's
say, lectures and whiteboards, or pen and paper, or writing code and building models?
It's a black box for them, you know. At the end of the day, most of them have different areas of expertise, and the inner workings of the trade floor are just too complex and too specific these days. But I think that's true for most areas, if you look deep down and ask how much you actually know about other areas anymore. So I would say the day-to-day is probably unknown to them, what my actual day-to-day job is.
Yeah, and I think you're right that, due to increasing specialization across so many disciplines, things do become more and more black box as we head down that path. So maybe we can step back a bit and you can just tell me a bit about what Pinnacle actually does.
So 2018 is our 20th anniversary. We are one of the largest bookmakers in the world, and we are known for being a very efficient bookmaker in terms of pricing. Some people compare us to the Nasdaq of prices, meaning that the traditional bookmakers that people know and have heard of are usually more in the recreational field, and Pinnacle is actually a true bookmaker. That means we have very low margins and very high limits. The website is not so flashy, but we have an API that people can interact with. We are a real, true bookmaker, trying to do quantitative analysis of sports events and other events, and we allow people to build models against us and place wagers with us.
And so then, as trading director, what does your day look like? What are the ins and outs of your actual job?
I mean, it largely depends on the season. Sports are obviously very seasonal. You have your big events; like this summer we have the World Cup, which will change my job dramatically. But overall, day to day would be sitting down with my managers, maybe going over the week or over the month, discussing some plans about some products that we want to roll out, discussing some models that we need to test, discussing some of the new strategies
we want to try. Overall it's a constant striving to improve our product, and obviously also doing analysis of things that we tried that didn't go so well. That's the bread and butter of the day-to-day.
So how did you get into data science initially?
By sheer force. I was always math-oriented, but I wasn't in data science. Once we started building our quant team out (before, we used Excel for everything), the quants started using R and coding in R, and I quickly picked up that the level of efficiency gain they had over me was orders of magnitude. They could so easily analyze data that was inaccessible to me, just because of the natural restrictions of Excel. So I started the Coursera course, followed the lectures there, and started coding in R, and pretty soon it became a bread-and-butter tool for me. I couldn't actually believe that I hadn't had these skills before and still did my job.
And which Coursera course was it that you took?
It was the very first one. I think it's actually the data science track.
So that's Roger Peng and Jeff Leek?
Yeah, Roger Peng and Jeff Leek, exactly. That's the original, first course I took.
I'm in exactly the same position. I actually spoke with Roger about this on this podcast: I was in one of their first cohorts, and maybe you were too, around 2012, 2013, something like that?
Yeah, maybe around that time, for sure. It was tough, because I didn't come from coding, so for me this was brand new. I thought it was a really tough course; I actually struggled quite a lot. The thing is, I knew I had enough expertise on my team that the answers were available to me if I had a question. And I knew exactly what I wanted to achieve, so I had a very clear goal in my mind: I wanted to interact with our data directly. I wanted to access our database
directly and do analysis on it without the need to ask somebody for a data pull, and then the data you're handed has some missing columns or missing attributes and has to be transformed again, and I need to go back to the analyst team. I just wanted to reduce the red tape and be self-sufficient.
So some people might have a question revolving around the Venn diagram of data science and sports betting, and I'm wondering what the role of analytics and data science has been, historically up until now, for bookmakers in bookmaking.
You have a few layers of data analysis, which makes it really interesting. You have the classical sports analytics: how does a sport work? Sabermetrics, for the people who know baseball, was leading in many aspects, but sabermetrics ideas and concepts now exist in almost every other sport, especially soccer (football, for the Europeans), where there's a high level of [unclear] right now. But this is all the field that surrounds sports data analysis. Since we are a trading house and we actually have a ticket flow coming in and out, we also have the traditional financial analysis, or risk management assessment, and basic game theory strategies and all of that stuff in addition. So we have a very nice overlap between those two worlds, and we have to manage both separately and then mesh them together eventually, which is often the hard part.
I'm sure. So when I came into this, in our first conversation earlier this year, I was under the misapprehension that sports betting was really only about sports, and you opened my eyes: at Pinnacle you do all types of bets. So I thought maybe you could run us through a few of the more interesting, to your mind, types of bets you can make inside and outside the sports space at Pinnacle.
You can bet on literally every single sport that you could possibly imagine, and this includes darts and chess and anything you see. Obviously eSports, so video gaming, is
very popular with us. But you also have politics; politics is a big betting field. And you have a few of the more exotic and fun things. Since you're recording this from New York, I believe: we do Nathan's Hot Dog Eating Contest. I know exactly when Kobayashi was not on top of his game anymore; I remember that. And the Pope election was a fun one. It was very interesting to try to price up the Pope election. So it's almost any event in the world. You could even go as far as doing stuff about Game of Thrones: we have a Game of Thrones prop on who ends up on the Iron Throne at the end of the season. Oscar betting, Golden Globe betting, you name it. Literally any event you could possibly think of.
That's incredible. Of course I don't want you to give up any of your IP here, and of course you won't, but let's take the hot dog eating contest, or Game of Thrones, or who will be the next pope. I'm wondering how you even approach it. I mean, you have the technical skills, but in terms of domain expertise, I don't suppose this is obviously your expertise.
I mean, let's talk about the Pope. A lot of the work is reading columns by [unclear] writers: what do people believe to be the truth? And then we price according to this. We don't have any inside information; we don't know anything about it. But we read up a little bit, we try to price as well as we can, and then you let market efficiency, the wisdom-of-the-crowd effect, shape the price. Game of Thrones is the same. We are fans ourselves, so we speculate ourselves, but we don't know. We don't have any insight. We read a bit; we don't know George R. R. Martin personally, or anybody. It's guessing. But to be fair and frank, these are entertainment props to bet on. They don't compare with, let's say, a World Cup game in soccer, where you can bet up to $500,000 or a million dollars with us without even
questioning. With all these kinds of props, the limits are low: maybe a thousand dollars, maybe five hundred dollars. So there is a difference in the level of scrutiny that goes into pricing one or the other.
Absolutely. So in terms of pricing them, can you talk us through the process from go to whoa? I presume you have some model which ends up with a probability distribution, a probability mass function or density function, with respect to outcome, and then you price according to those distributions. But maybe you can spell that out in a particular example.
It really depends. Yes, we obviously have exactly what you just said. In some respects it might just be market prices, so the market has a price already. What I mean by that is: if you wanted to open an exchange to trade stocks, you wouldn't need to do all the analysis yourself. Every stock is traded at many, many exchanges, so you have an idea of what the price should be. That's the same in sports betting: many, many bookmakers exist, and they are well connected. But especially when you talk about a live game, we have tons of models running. You feed the model tons of inputs, and then it crunches the numbers and spits out something. There are several layers of models and all kinds of AI and machine learning elements, and it's very sophisticated, depending on the sport and on how much betting is done in the sport. The more betting is done, the more sophisticated we have to be, because the more sophisticated the people on the other side are as well.
So I think once again this speaks to your job of essentially managing risk. Can you just say a few more words about what risk management, or managing risk in general, amounts to for you, or looks like, or how you think about it?
Yeah, I mean, the problem of risk is obviously: how do you maximize equity over a probability space? Meaning, even if you have a coin flip, both
sides could be 50/50, so you don't gain equity there, but maybe somebody stakes a little bit more on one side, and now you could lose a little bit of money. How do you hedge yourself against this risk? Can you take it? Are you willing to take it? What happens if you lose: does it have an impact on the financial bottom line? How are you exposing yourself? All these kinds of questions. How do we think long term about managing our book? We're a big company, but like everything else we have to stay afloat, and there are a lot of regulations in it. So you have to be very careful to manage your risk properly and then try to balance the book in some respects. It's not always easy to be able to balance the book. The way the betting works is often based on news: somebody is injured, and if you're not on top of that and you don't pick up trading frequency very, very quickly, you just get overrun by wagers and then you're exposed in a very unfavorable scenario.
That makes perfect sense. And how does this idea of risk relate to uncertainty in general?
There are a few levels of uncertainty. Obviously you have an inherent uncertainty, because it's a sports event, which is a non-perfect environment, so you never know exactly what the parameters are that matter. But you also have variance in uncertainty, meaning some events are naturally just more volatile and less known than other events. Quite clearly it has to do with the historic data available. If two people did the same competition 100 days in a row, you have very, very strong data, and you eventually pick up on the percentages one way or the other. And then you have events like, for example, a Soccer World Cup where Germany plays against Uruguay, which has basically never happened, in the sense that these exact teams have never played against each other. At the end of the NBA season, all the teams
have played against each other many, many times over, so you get a very good idea about the relative strengths of the San Antonio Spurs, the Golden State Warriors, or the Cleveland Cavaliers. Even though maybe the Cavaliers and the Spurs have only been paired off two or three times, because of cross relationships you have a very good idea what the relative strength is. But Germany playing Uruguay, you actually have no idea. I mean, both teams have met, but Germany against Uruguay was 20 years ago, none of the players on the pitch are the same, and the game has changed. So you have a lot of different kinds of uncertainty in the gaming world.
That's interesting. So it sounds like there's a distinction between uncertainty that you can quantify, which would be risk, and uncertainty where you just don't know a lot about the situation, so you can't quantify so much.
Exactly. You have the known known, the known unknown, and the unknown unknown. It's very tricky. Especially these peak events are very, very tricky. And you notice it. Baseball is probably a prime example: with 180 games per season, at the end of the season you have a very good idea what the strength is. But at the beginning of the season you will notice that the lines are much more volatile. Bookmakers, including us, are much, much more careful in taking on risk, because we don't believe that the underlying odds are as certain as you wish they would be. At the end of the season, if you want to bet against us, we are actually much less willing to adjust our probabilities based on betting behavior and are willing to accept much more risk, just because the certainty has grown so significantly.
All right, so you're saying that when there's more uncertainty, when you have less data and less knowledge about the space, your lines will be more responsive to betting behavior?
Absolutely. A wager that could move a line 3% at the beginning of the season might move the line 0% or 0.1% at the
end of the season. Absolutely, just because the certainty is there. Eventually you know this is the price: everybody has spoken, the entire world has placed a wager. We know that price, and we're willing to take the risk.
Yeah. And I don't know whether you think about this in this framework, but stuff like this I think about in an almost Bayesian sense: you have some sort of prior knowledge about the space, and then once you start getting more and more data, you can update whatever you're interested in and get a more precise estimate as you keep updating.
Essentially, yes. Bayesian thinking is predominant in our world. Almost everything we do is from a Bayesian point of reference.
Yeah, fantastic. As I said, this idea of updating is what came to my mind.
If you price anything up, any event: you see two people on the street and you imagine they'll do a hundred-meter dash. The initial price might be 50/50 or something. Then you see one guy is actually on crutches, and suddenly the line moves 30 percent or 40 percent; now it's 90/10. But then you see he throws the crutches away, and [unclear]. Eventually you have a pretty good idea: okay, now I have an assessment, this guy is overweight, this guy looks fit, and you're pretty sure about your price. If somebody tells you it's the complete opposite, you might not believe it anymore. And not believing means, in our language, that we are willing to take a lot of risk before we actually get moved over to the new price.
We'll jump right back into our interview with Marco Blume after a short segment.
Now it's time for a segment called Data Science Best Practices. I'm here with Ben Skrainka, an independent data science consultant. Hi, Ben.
Hi, Hugo. It's great to be back on the show. So what do you do when you have a bug in your code? Maybe you've tried unit
testing, but you still can't figure it out.
Right. So a first approximation, I think, is to include some print statements in your code to see what's actually happening in there when you execute it.
Actually, that's a very common approach. We are all guilty of sprinkling a lot of print statements throughout our code to find a problem. In general, good programmers are lazy, but that is not the case here. First, you will add more and more print statements as you track your bug down. These print statements make your code harder to read, and you have to find and remove all of them later, which is a waste of time.
I cannot deal with any of that.
And it could be even worse in a distributed system. When I was in grad school, we had a visiting scholar who was debugging a Fortran program which ran in parallel. To find a bug, he instrumented his code to log information to a file. He didn't really think this through: because he was running at scale, he filled the head node's disk and crashed the entire cluster. I was delighted when his visit was over and I could run jobs again. Sometimes print statements can be really bad.
That sounds like the stuff of nightmares. Ben, tell me what the alternative is.
It turns out our engineering friends invented a magic tool long ago called a debugger. A debugger allows you to run your code a line at a time to see how it executes. You can examine how variables change as your code runs. Debuggers have pretty much the same interface for every language, whether it is Python, R, C++, Java, MATLAB, or something else. Once you learn one debugger, you can operate any other. In Python, use pdb. In R, the debugger in RStudio is a great place to start. If you are using an IDE, it should have a debugger.
And for those like me who love IPython, there's also ipdb, right?
So often when someone asks me to help find a bug, I always ask if they have tried using the debugger. Invariably the answer is no. I don't understand why so many people resist learning to use a debugger, or actually using it if
they already know how. It is not that complicated, and you only need to master about four commands to use it. Being a professional means learning how to debug. It'll make you more productive and more self-sufficient.
So how can our listeners get started with debugging?
Start by figuring out how to start the debugger for the language and platform you are using. For example, in Python, run with the --pdb flag, which will put you in the debugger when an error occurs, or import pdb and call set_trace() at the line where you want to start debugging. Then use the debugger to step through the code and examine your variables and call stack. These commands are identical for any debugger: use step to advance one line of code, including into function calls; next to skip over function calls; finish to complete execution of the current function; and continue to resume execution.
Right. So to recap: you use step to advance, next to skip function calls, finish to complete execution of the current function, and continue to resume.
You got it, HBA.
So Ben, how do you look at variables?
In Python it is easy, because you still have access to an interpreter to examine any variable or execute any Python code. Just type code into the interpreter like you normally would.
Anything else?
That's pretty much it, other than that if you want to see a local variable or function argument, you need to position the debugger at the correct frame in the function call stack. But you will have to read the fine manual for that. Learning to use the debugger will help you quickly find and fix bugs, allowing more time to focus on fun stuff like doing science.
Thanks, Ben, for that delightful introduction to debugging your data science code.
After that interlude, it's time to jump back into our chat with Marco.
So we've spoken around this, but as we've said, as trading director you think about everything from R and data to odds making to everything related to markets. I'm just wondering if you could speak to how all these different aspects of
your job are related, perhaps speaking through the lens of a particular real or hypothetical project.
One of our goals is to improve our models, to get high accuracy in our models, but also to open new betting opportunities up to our clients. More interesting betting options allow people to hone their own models, which first gives us liquidity, and then the machine gets rolling. At the end of the day, we are a very low-margin, high-volume bookmaker. Think of it like Walmart: we don't want to make a lot of money selling one orange juice. We just want to make a little bit, but we want to sell a ton of orange juices. So the idea is that we only want a little bit from a new product. Let's take a hypothetical product: I don't know, how many throw-ins per half are there going to be in a soccer game? You start modeling this, you start putting it out, and some bettor picks up that your model makes completely wrong assumptions and bets a lot of money against you. Then you refine your model, and so on, until you get to a solid market. Once you have this, you can roll it out over many leagues. You have to do more and more refining, but eventually you get to a stable product, which might now be something that draws a lot of clients and betting. You've created a new market that clients like, a new product that they're interested in betting on, and this product would then stay mainstream for the next 10 to 15 years.
Interesting. This brings us once again to the relationship between domain expertise and data science skills, and it actually made me think of something: have you read a book called Superforecasting?
Yes, yes, of course.
For the listener, this is a project by Philip Tetlock and colleagues, and the basic idea is that he found certain members of society who are better at forecasting than other people. One thing he does is look at the characteristics of these people and sees
what makes some better forecasters than others. I'm just wondering, Marco, do you try to hire people who are superforecasters, or instill this superforecasting culture in your organization? How do you think about that?
I'm actually paying them already. I used to call these people an army of consultants, because if you're a superforecaster in my world, what it means is that you're actually better than our models, you can predict the outcome better than we can, and thus, by betting, you make a profit. So what I'm actually doing is consulting you: here's my prediction for this event; what is your prediction? By placing a wager you're telling me your opinion, which I can then incorporate in my model and use to change my prices. But I have to pay you the price. So all these people who are great at forecasting anything are basically working with me on a consulting basis.
That's awesome. And I just had a thought that kind of brings this full circle, in the sense that we've moved from data analysis and data science to bookmaking to the idea of superforecasting. Something that becomes very apparent in this book is that you don't necessarily need super special skills to be a superforecaster, but there are several key aspects, such as being less prone to confirmation bias than other people in the world, which of course is the hallmark of a great data analyst as well.
Yeah. So one of the classic training pitches that I used to give for the longest time, when I got new recruits in: these are all bright people, they are all successful and bright and eager, and obviously at the beginning of their career, and they want to make a mark. They're young and willing to gamble. And I always tell them: this is the strategy, this is how it works, and obviously you have to bring in your own feel, but if you ever start gambling with our money, you
know, then I show them a credit card: do it with your own money, and if you are rich enough, buy yourself an island. But you're not gambling with our money. We try to really separate it for them: you think you might know something because you're sitting on this side of the table, but you could sit on the other side of the table. If you were as good as you think you are, you would be a superforecaster, and if you can see everything so clearly, then your time isn't best spent with us. On our end it's hard work. It's hard analytic work. Every day we have to grind, we have to refine our models, we have to get new data. It's a craft; you have to hone it over many, many years. You don't just go in, like in the classic Vegas movies, and know that the spread should be eight and a half and the total is going to be 165 points. That's not how it works at all. Every day we go back in, and every day we have people smarter than us, people better than us, outsmarting us at our own game, so we need to improve all the time. I've seen over many, many years customers who actually have a lot of talent then become lazy, and their analysis becomes sloppy, and at that moment they're not winning anymore, because somebody else is hungrier than them and knows the numbers better than them. And that's the other way around.
Yeah, absolutely. And the other quality of superforecasters that I just remembered speaks to this idea of updating, of doing a Bayesian update: superforecasters are very good at updating their predictions and beliefs with respect to new data coming in as well.
So one of the key aspects of trading is that the past is the past. You cannot change past wagers; the only thing you can change is the next wager. So what you're doing is coming up with a scenario, almost a probability tree, and you say: okay, I put the line here, so I expect 80% this to
happen, 20% this to happen, and some small percent, maybe 0.1%, that to happen. This is basically your tree of probabilities, and it comes from your experience. But if something unexpected happens, then you basically have to update your assumptions very, very quickly, because something out of the ordinary has happened, which means all your assumptions might not be correct anymore. In politics (I hope this is okay to say) we always used to call it the lady of service problem: with every politician, if he were ever found with a lady of service, all the prior work would be almost meaningless. Obviously Donald Trump might refute this now, but back in the day it was always this problem that politics is so dangerous, because if one character flaw of a politician is revealed, all the analysis before becomes completely meaningless, and all of a sudden the odds would shift from 60 percent to 5 percent in a matter of seconds. And you have to account for the possibility that somebody gets the information ahead of you. We've seen it many times over the years that somebody has good information about something that's not public yet.
We've discussed a couple of tools and techniques, from Bayesian inference to mentioning that historically you moved from Excel to the R programming language. I'm wondering, for people who want to enter this type of space, bookmaking, sports gambling, or these types of prediction challenges in general, what type of tools and techniques in data science would you suggest they learn? You can speak to this from just general suggestions or the type of people you want to hire as well; I don't have a strong preference there.
Yeah. What we look for is the classic R, Python, data science stack. Machine learning is highly sought after. It doesn't really matter which framework it is: if you know one machine learning framework, we can teach you the other. It is a good
thing if you have actually done some sports modeling already. It doesn't matter which sport, just that you're familiar with how sports modeling works conceptually. Some of our people do Kaggle competitions, that kind of stuff. There are a lot of different ways you can come in. You can come from sabermetrics, that is, baseball analytics, but you can also come from a different direction. We have people who in the past were big in poker AI, and they did a lot of work on game-theoretical approaches there. Because our field is so diverse, you can actually come from a game-theoretical point of view and build game-theoretical trading models, or you can come from the sports analytics side and build sports analytics models. There are many different ways you can bring in your creativity and your knowledge to get there. But the classic computer science background is very, very highly sought after. The alternative is a strong math background: you come from the other end, you're very proficient in high-level math, and now you learn some coding skills and are able to help, or you sit down with another guy to develop a model, where you do the quant stuff and he does the coding stuff.
So you've mentioned R and Python. Is there a culture of one of these more strongly in your organization than the other?
Pinnacle is very heavily reliant on R. We are much bigger in R than we are in Python. We do Python; we're not a cult of R in the sense that we feel like we need to use R. We feel that R gives us the best bang for our buck in many respects. We have also been very active in the R community for a long time, speaking at conferences; we send people to almost every R conference. And it's a community that also embraces the idea of sports betting. We release data sets into the community, and we have worked with members of the community to improve packages that we maintain, which are free and available for everybody, that help with sports
betting. So it's a great community. Most people who like analytics like sports analytics; as far as fun analytics goes, it's great. The difference between analyzing sports and many other things is that sports have a finite end. You can analyze the game as much as you want, and then after 90 minutes or one hour or whatever, you have the result on the table, and now you have the next game to analyze. There's always something happening in sports, which makes it very interesting, and with betting you actually have a way of keeping track of your score. Betting is just a way of keeping track of your score: how good is your model? The better it is, the more money you make.
Absolutely. And as you say, there is a final result in sports, right? Someone wins and someone loses, most of the time.
Exactly. This is the big difference between financial trading and sports betting trading. Financial trading is almost infinite: there's not an end to the price of oil. A commodity exists continuously, while all these sporting events are discrete events.
I'm just wondering, with all the new technology, deep learning and video analysis and that type of stuff: I know that a lot of basketball, for example, is captured on film, and people think about doing deep learning analysis on players' movements and that type of thing. Is this something you've thought about at all?
We think about it. We haven't done OCR analysis. One of the key requirements for us is that whatever attributes we want to use in our models need to be available live and fast. It doesn't help us to have a very rich data set of data points that we cannot get while the game is in play at a reasonable speed, and reasonable for us means a maximum of maybe a second or two seconds of delay. If something is ten seconds old, in our world it's basically like yesterday; it doesn't matter to us. Our world is very fast-paced, so we
need to find data points that can be analyzed and transferred to us at a fast pace. This has improved. If you can imagine, 10 or 20 years ago, in most games the only data points that you would get were very superficial ones. If you talk about basketball, you might get rebounds, assists, points, blocks: the classic stats. But now you get something like "dangerous attacks", where a human spotter puts an element of a concept on it and gives a judgment that helps you understand the game. So it has gotten a lot better. And eventually, with biometrics, we will get data super fast and super accurate. If we can use it, I mean, it sounds fantastic: use heart rates and basically try to see if heart rates matter, and all that kind of stuff, body temperature, perspiration on the forehead, all of these things.
I wonder how many dimensions you need in order to describe these things. I think that's a cool question. But in terms of processing all this data in real time, I suppose we have this misconception in the cultural consciousness in data science that to run fast code and machine learning models in production, they had better be in Python. That's something we've been discussing on the podcast recently, actually. I'm just wondering if you can tell me about your experience with productionizing R code and with efficiency, which I know you're very interested in on these fronts.
We've actually spent the better part of a year now thinking about exactly that question of how to productionize R effectively, so we've done a lot of work on it. Traditionally, my advice would be to transfer the R code to C or C# for productionizing, just because of the speed. But we've now found new APIs. We're actually using plumber in some respects, which, for the people who know R, is an API interface to productionize R, and
in a small-scale testing area it has been working well for us. So we are actually running R code in production environments, in trading algorithm environments.
Fantastic. I actually had someone on the podcast recently who talked about using plumber and R Keras together, and it worked very well for them.
Yeah. It's still a little bit in its infancy, and some of our team members are actually on the very cutting edge, working with the guys behind all of these packages to help improve stuff. But it is promising, and it allows us to generate our models even faster, because we don't have to take this extra step of productionizing. Traditionally, Python was used in our world because the AI learning models didn't have a good interface into R. The distinction was basically: if you wanted data analysis, you did it in R, but if you wanted AI, you had to do it in Python. But nowadays, with R being an interface to all the other big machine learning frameworks, you don't necessarily need that anymore.
And of course you were at rstudio::conf in San Diego this year, and J.J. Allaire's keynote about R Keras, and how we're seeing more and more of R being an interface language to these pretty serious packages and infrastructures, was really cool.
That was an amazing talk. I also couldn't agree more with him that the way they're pushing it is towards an open framework for everything: let's make R interface simply with everything, and let's not try to close it up and do it on our own, because then you get into this classic language wars problem of people being in a cult. People should use whatever language they want to. That said, R should be able to use all the tools that are available in all other languages.
Exactly. So you've mentioned several times, in various guises, how Pinnacle and yourself work with the R community at large, from giving talks at RStudio
conf and other conferences, to working with the developers on packages. I'm wondering if you can just speak to how important the sense of community in an open-source landscape is for you in your job.

Oh, that's everything to us. I mean, that's part of what we like about R so much. At the very beginning, I remember, we had problems, you know, with the classic ODBC packages. This is Jim Hester's. So yeah, you have direct access to the guy who made it, and you can ask him a question, and if he knows that you know what you're talking about, he's actually working with you on your environment to help troubleshoot and then improve the code. Where else do you have this concept that the developer of a package actually cares so much about you, making a bug fix that addresses a tiny problem that might only exist in your configuration, but which is ultimately a bug in his code, to make it better? It's amazing. So we've found so much value in that, and that's why we decided eventually also to release some packages into the world, to tell people: hey, if you want to get into sports betting, these are some basic tools that we used internally before that might make your life a little bit easier.

Cool, and we'll definitely link to some of those packages in the show notes as well, so interested listeners can check them out.

Yeah.

We'll jump right back into our conversation with Marco after a short segment.

Now we've got another segment on statistical distributions and their stories with Justin Bois, a lecturer at Caltech and a DataCamp instructor. Hi there, Justin.

Hey Hugo, it's good to be back for another season of DataFramed.

I agree. As it may have been a while since DataFramed listeners have heard one of these segments, and we may even have some new listeners this season, can you give us a quick review of what you're doing with these segments?

Sure. There are many named probability distributions out there, and it can be a challenge to make sense of all of
them. I find that it's easiest to think of the distributions in terms of the stories behind them. In each segment, we've introduced a distribution and a story. Along the way, we've introduced ideas such as Bernoulli trials and Poisson processes.

Right. So can you give an example of a distribution you covered last season?

How about the binomial? The number r of defective light bulbs in a production batch of n of them, each with probability theta of being defective, is binomially distributed. More generally, we can think of a light bulb being defective or not as a Bernoulli trial, which has an outcome that can be coded as true or false, or equivalently, success or failure. The number r of successes in n Bernoulli trials, each with probability theta of success, is binomially distributed.

Great. So here we have stories, plus changing around the nouns. Is that the name of the game?

Yes, with the usual caveats. It's not the whole game, but you can get really far by thinking about stories like this.

So what other distributions have we covered?

Well, we did several discrete distributions in addition to the binomial distribution we were just talking about. These include the Bernoulli, Poisson, and geometric distributions. We also did two continuous distributions, the exponential and the gamma. These are all used to model real-world phenomena, and you can review them and the other distributions, their stories, and how to use them in Python and Stan at the link in the show notes.

Okay, I think we're up to speed. What do you got for us today, Justin?

Well, today we're going to talk about the story of the normal distribution, which is also known as the Gaussian distribution. I'm going to do this one a little bit differently, though. I'm actually going to tell you the story of how the normal distribution was discovered.

I love history lessons, but why do some history for this one? You haven't done that for any of the others.

Well, in this case, I find that the history of how the distribution was discovered helps us understand the
story behind the distribution.

Sounds good, let's hear it. But before we do, can you remind us what the normal distribution looks like?

The probability density function of the normal distribution is the classic bell curve we are all used to seeing: a symmetric peak with tails that fall off like e to the minus x squared.

The field of probability began with studies of games of chance, notably by the Bernoullis. Jacob Bernoulli had already identified the binomial distribution as an important tool in understanding outcomes of discrete events, which games of chance often involve. Binomial coefficients, which are necessary to compute probabilities using binomial distributions, were difficult to calculate, especially in the first part of the 18th century, when all of this was going on. Bernoulli and others were working to find ways to approximately calculate binomial coefficients, and that's where de Moivre came in. De Moivre found that he could approximate the binomial coefficients, at least for large n, with integrals we now know are related to the cumulative distribution function of the normal distribution.

So you're telling me de Moivre discovered the normal distribution?

Kind of. He did not understand the concept of a probability density function, and he was really only looking to approximate binomial coefficients. But looking back at it, his result is an important one. It says that the binomial distribution, which is discrete, can be approximated by a normal distribution, which is continuous.

Right. So do we have a story for the normal distribution, then?

Yes, and I like the way you use an indefinite article there. A story for the normal distribution is this: if a variable is binomially distributed with parameters n and theta, with large n and theta not too close to 0 or 1, it is approximately normally distributed. The mean of that normal distribution is n times theta, and the variance is n times theta times the quantity 1 minus theta.

So this is a story for the normal distribution. Are there more?

There are indeed.
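For readers following along in Python, the de Moivre story above can be checked numerically. This is only a minimal sketch using the standard library; the function names (`binomial_pmf`, `normal_pdf`, `max_abs_error`) and the choice of theta = 0.3 are illustrative assumptions, not something from the episode or its show notes. It compares the exact binomial probabilities with a normal density that has mean n times theta and variance n times theta times (1 minus theta).

```python
import math


def binomial_pmf(r, n, theta):
    """Probability of exactly r successes in n Bernoulli trials."""
    return math.comb(n, r) * theta**r * (1 - theta) ** (n - r)


def normal_pdf(x, mean, variance):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * variance)) / math.sqrt(
        2 * math.pi * variance
    )


def max_abs_error(n, theta):
    """Largest gap between the binomial PMF and its normal approximation."""
    mean = n * theta                    # mean from the story: n * theta
    variance = n * theta * (1 - theta)  # variance: n * theta * (1 - theta)
    return max(
        abs(binomial_pmf(r, n, theta) - normal_pdf(r, mean, variance))
        for r in range(n + 1)
    )


# Hypothetical example values: theta is kept away from 0 and 1,
# and the gap should shrink as n grows.
for n in (10, 100, 400):
    print(n, max_abs_error(n, 0.3))
```

The printed gap gets smaller as n grows, which matches the "large n" condition in the story. Pushing theta toward 0 or 1 makes the approximation noticeably worse for the same n.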
Great. Let's have you talk about more normal stories the next time you're on the podcast.

I look forward to it.

Time to get straight back into our chat with Marco.

You mentioned earlier what you referred to as an army of consultants, and I love your military metaphors and analogies in general. One of the ones I really love is that you've stated that part of your mission is to train an army of new data scientists.

R was always predominant in the early team, but Pinnacle is a data-driven organization, and so we had this huge gap between the people who know R and the people who don't know R. My belief was, you know, especially with the tidyverse coming along, that there was a path for people who are unqualified in the sense that they never learned computer science before, they never coded, they don't have a math background, and they're not tech people either. They just, you know, maybe work in human resources, maybe they work in business analysis, or maybe, as we've seen, they worked in customer service for years. Then we came up with a curriculum based on the tidyverse, based also on the Master the Tidyverse lessons that I quite enjoyed myself, and basically tried to build a curriculum, with the help of DataCamp, specifically tailored to these people. And the success has been overwhelming. We have trained over 150 people now, maybe more by now. We have R being used in every aspect of the company. It puts a smile on my face when I go around and I see somebody who I know doesn't come from this background showing me an R Markdown that he just created, and a report that he sends to his colleague, which they're going to discuss. It's just amazing to me.

I mean, there's a sense of empowerment at every level of the organization, which is fantastic.

We have a data warehouse where people can access the data. We've made an interface which made it very easy to get the data from
the data warehouse directly into your R session. We're using all kinds of tools that RStudio provides: we're using RStudio Server, using all the tools that they have. And it is amazing to see. We now have people who share the same story, I believe, as a 45-year-old woman who worked for us for over 15 years as a customer service rep and now is a full-blown data scientist with us, who is actually doing some phenomenal work. And really, you wouldn't know that she was in customer service for years. You would have no idea.

Yeah, that's incredible. My next question you could, I suppose, answer in the framework of her story or other success stories. I'm wondering how you think about the relationship between using platforms such as DataCamp, which as you said has been very successful for you, and in-person training, and how these two can complement each other.

Oh, DataCamp was invaluable. We couldn't have done this without DataCamp. Obviously we needed DataCamp. What we did is we did complement their training, definitely. So we basically picked the courses that we liked, graded them in terms of difficulty, and then put them together in a logical order, which now, I believe, DataCamp has themselves. They're called tracks. I think back when we did it they didn't exist, so we did it ourselves. We did these tracks, and at the end of each track we got together in group sessions. We brought up very fun problems, you know, the interesting problems that we found, often with a few problems that were actual Pinnacle problems with real Pinnacle data, and we did some analysis over it and showed people how efficient this can be. And what I was trying to sell people was that R is nothing else than Power Excel. I tried to take the fear away from being in an interface where you have to type something in. I just tried to always bring it down to them in terms of Excel: this is just like Excel, but instead of
being able to work on sixty thousand data points, or sixty thousand rows, you can now work on a few million rows without breaking a sweat.

That's awesome. And I do wonder how things would have been different if it had been called Power Excel, that type of branding for an open-source language such as R.

To be quite fair, prior to the tidyverse I don't think that would have been a fair classification. But if you break down the tidyverse, and you break it down to dplyr and ggplot, just those two, I think, cover 95% of all data work needs. Everything else is specialist in many aspects. But, you know, dplyr and ggplot: that's all you need to do the vast majority of work. So we only teach basically dplyr, ggplot, and R Markdown, because the company is Markdown-based. So we're sending R Markdowns to each other, and so we're no longer in this nightmare scenario where somebody sends you data and you don't know where they pulled it from, you don't know what filters they put on, you have no idea. Now we have a Markdown: you can just look through the code, you see exactly what they did, and you can point out, "Oh, you came up with this, you've forgotten this in this scenario," and can help them right away to do a better job.

Absolutely. So in this particular case, though, your colleague who moved from customer support to learning a lot of R using DataCamp and in-person training at Pinnacle, then moving to a data science role: how much math did she need to pick up, or statistics, or machine learning, all this type of stuff? Because I understand using dplyr and ggplot2 in the classical data analyst or data science role, but then there is another step above that, right?

For sure. So now, I believe, we give her training on forecasting models, more sophisticated techniques, you know, just to get her into a different mindset. Obviously she doesn't have the traditional training. But data analysis is such an important step for many companies, for
many of our areas, where they just need basic data analysis help. So she was productive, I think, from day one. We actually put her into her own environment in the customer service field, and she was writing the reporting framework for the customer service team, because she was obviously a subject matter expert: she'd been on the front end for years and years and years. And, you know, a classic example is we've redone our staffing. We realized that we had some of our local-language-speaking customer service agents working at the wrong hours. Say the Swedish-speaking customers were having questions: the agents were working at the wrong hours. So we were now able to match these two data sets with each other and actually optimize our scheduling to service our clients better. So, a direct win for our clients and a direct impact from her.

That's incredible. So you've spoken to a lot of different modeling techniques you use in forecasting, including a lot of machine learning. We all still write a lot of code to build machine learning models and that type of stuff, but, speaking to your colleague and the work she's been doing to impact the business, there are a lot of people worldwide who could impact their businesses by developing machine learning models but may not be able to code. This is kind of a roundabout way of asking how you think about machine learning as a service and as a platform going forward, so that people in businesses are able to build machine learning models without necessarily writing code.

It's starting to be like this, for sure, because some of us build frameworks for others where they actually don't really know what's happening underneath the hood anymore, but underneath it all it's machine learning. We're going to see this shift over the next years, for sure. And it's also a good thing. Just because you're driving a car does not mean you need to know how the combustion engine
works. That's not a requirement. You can drive a car perfectly fine, and you can do your job with it, without knowing the inner workings of the combustion engine. And this is what we're going to see with machine learning as well. Other people are going to do it for you.

Absolutely. But of course, what we need in place are checks and balances and processes, so that when your car busts up it doesn't explode and kill you, right?

I've seen data analysis going wrong in our company many times over, quite obviously. The very famous one is if the subject matter expert forgets to tell the model that there are certain parameters which cannot exist. A classical thing is a volleyball game: it's the best of three, and so there cannot be a fourth set, sorry, a sixth set, you know, because it's impossible. It would never exist. But mathematically you could easily forecast this, right? You put some density, some Poisson distributions on it, whatever it is, and you get a distribution for a team.

Yeah.

But that's a crazy thing, where you have to train people properly to tell the model a little bit better what the frames are, what the framework is that you're operating in.

And I love that you use the term "data analysis gone wrong." I think we should have a segment on the podcast at some point in the future called "Data Analysis Gone Wrong: Nightmare Stories." So, speaking of data analysis and data science, I'm wondering what one of your favorite data sciency techniques or methodologies is.

My favorite, for myself?

Yeah, just something you love to do.

You know, in my day-to-day job, because I'm actually too far detached from it, I mainly stick to the tidyverse myself: pull some data, graph a little bit, dig deep. That's all I do. When it comes to the extras, you know, I think the ones that I always like the best are genetic algorithms. I just find it so cute how they develop, you know, and how they stumble around. I just, I just
loved watching models grow like this, and eventually also something that for a long time has such poor results eventually just exploding and producing results which are far greater than you would ever have expected.

That's really cool. You've mentioned genetic algorithms, Bayesian inference and Bayesian updating, and machine learning models. I presume when thinking about time series forecasting you think about ARIMA models as well. Is there anything else that is kind of the bread and butter of building these types of models in your line of work?

Bayesian inference is huge for us. That's probably our bread and butter in many aspects, just because we have a classic lack of data. Very classical: every game is so short, so you never ever have sufficient data. So Bayesian inference is very important for us. That might be the biggest one that we use.

Just for fun, what are some current bets that you have that you're really excited about, or that you find cute or interesting?

Ooh, I actually don't know. I always like the Game of Thrones ones. That's something I always liked. But we always try to put up some fun bets, you know. The fun bets just come about when some of us talk about an event and then we post some odds, mainly, sometimes, to see who's right between us as well. You know, we're just guessing ourselves. It's not unheard of that people are betting on Game of Thrones events in the company, just because it's fun.

And you might not be able to answer this, but who's your pick to be on the Iron Throne at the end of Game of Thrones?

Um, at the moment? My favorite was Cersei, heavily.

Yeah, right, cool. Has that changed over the past several seasons, or have you been a Cersei stronghold?

I think so. Let's see. We don't know the true things here, you know. What are we at? Let me check right now. Game of Thrones: I can actually tell you who is the favorite.
Please do.

So the favorite is... so we have Jon Snow, Daenerys, and, oh wow, Bran.

Oh wow, Bran. Interesting.

Those are the three favorites.

Interesting. I wonder why Tyrion isn't a favorite, but I think we're digressing now. We can have a Game of Thrones episode when it comes out. So my final question, Marco, is: do you have a final call to action for all our listeners out there, something you'd like to see them do or implement moving forward in their data science careers?

To me, it's just: teach. If you are a data scientist, help your colleagues to become data scientists. That can empower everybody, it will make your life easier, and it's going to change their lives. You know, just try to teach, teach, teach as much as you can.

I couldn't agree more. Marco, it's been an absolute pleasure having you on the show.

It was a pleasure. Any time you want me here again, I'll be there.

Fantastic.

All right, thanks for joining our wild ride with Marco on the role of data science in sports betting and bookmaking, and how he's building an army of data scientists and democratizing data science and analytics. As a low-margin, high-volume bookmaker, a little bit like Walmart, Marco says Pinnacle doesn't want to make a lot of money selling orange juice. They just want to make a little bit, but they sell a lot of orange juice. To do this, they're adept at using all the data they have, which may not be a lot, such as when Germany played Uruguay in the World Cup. But as many superforecasters do, they're ruthless at updating their models and predictions in light of new evidence, and Bayesian modeling is a fantastic framework for both of these things. In terms of coding, Pinnacle is an R shop, and Marco, having come from the Excel world himself, proselytizes the gains in efficiency, performance, and scalability that R allows. On top of this, he busted the myth once again that R models aren't really scalable in production. He also told us about how he has used DataCamp to train
over 150 colleagues to become more data fluent, and spoke of one colleague who moved from customer support to data science and has already had direct impact on optimizing scheduling to serve Pinnacle's clients better. I can't stress enough how essential this is. Data fluency is becoming a skill spread more and more across organizations, and not only in the hands of the few.

Next week I'll be speaking with Reshama Shaikh, a freelance data scientist and statistician who works in Python, R, and SAS. Reshama is also an organizer of the meetup groups Women in Machine Learning and Data Science, otherwise known as WiMLDS, and PyLadies. She's organized WiMLDS for four years and is a board member. We'll discuss her work at WiMLDS and what you, our listeners, can do to support and promote women and gender minorities in data science. We'll also delve into why women are flourishing in the R community but lagging in Python, and discuss more generally how NumFOCUS thinks about diversity and inclusion, including their code of conduct. All this and more.

I'm your host, Hugo Bowne-Anderson. You can follow DataCamp on Twitter at @DataCamp and me at @hugobowne. You can find all our episodes and show notes at datacamp.com/community/podcast.