#95 How to Build a Data Science Team from Scratch (with Elettra DaMaggio)

**The Art of Lean Data Science: A Conversation with Elettra**

Launching a data science function from scratch works best with a lean, iterative approach: start with what the available data already supports, deliver small wins, and build from there. We sat down with Elettra DaMaggio, Director of Data Science at StoneX, to discuss this approach and how it helped her grow the team from two analysts to nine.

**The Lean Startup Approach**

Elettra believes that a Lean Startup approach is essential for data teams. "MVP first, and then you just build your way up to the top," she explains. Starting with a pilot on data that is ready to use lets a team deliver value quickly rather than sinking months into fixing messy data sets for a single ambitious project. The team can then iterate, gather feedback from business stakeholders, and keep checking that the work is still useful to the business.

**Data Science in Financial Services**

Elettra's experience is rooted in financial services: retail banking at BNP Paribas and HSBC, followed by trading at StoneX. The conversation touches on how data science contributes to commodities trading, foreign exchange trading, and other financial processes. With recent global events like the war in Ukraine and economic uncertainty, the need for accurate and timely data analysis has never been more pressing.

**The Impact on StoneX**

The international situation has not affected StoneX much directly, although as a trading company it has seen a lot of market volatility, which made the business very active. Elettra notes that the data team's involvement was limited to making sure that what it saw in its systems was not affecting other processes in the business. Sanctions and blocked accounts had little impact on StoneX, and such delicate cases are left to human handling: automating them risks embarrassing mistakes, and the volume of customers and accounts is manageable.

**Key Use Cases**

Elettra's team at StoneX has tackled a range of data science projects, most of them related to marketing: customer segmentation, attribution modeling, churn prediction, and lifetime value prediction. Last year the team built its first natural language processing (NLP) application, which classifies customer communications, and it is now working on client sentiment in trading.
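Churn prediction is the use case the episode discusses in most detail: the team chose to slightly over-predict churners, because contacting one more at-risk customer is cheap compared with losing a customer nobody called. The sketch below illustrates that trade-off by picking the decision threshold that minimizes total cost. It is a minimal illustration, not the team's implementation; the labels, scores, cost values, and function names are all made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Costs:
    # Hypothetical business costs, not figures from the episode.
    false_positive: float  # contacting a customer who would have stayed
    false_negative: float  # missing a customer who then churns


def confusion_counts(y_true, scores, threshold):
    """Return (tp, fp, fn, tn), flagging scores >= threshold as churners."""
    tp = fp = fn = tn = 0
    for actual, score in zip(y_true, scores):
        predicted = score >= threshold
        if predicted and actual:
            tp += 1
        elif predicted and not actual:
            fp += 1
        elif actual:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def total_cost(y_true, scores, threshold, costs):
    """Total cost of the mistakes made at a given threshold."""
    _, fp, fn, _ = confusion_counts(y_true, scores, threshold)
    return fp * costs.false_positive + fn * costs.false_negative


def best_threshold(y_true, scores, costs, candidates):
    """Pick the candidate threshold with the lowest total cost."""
    return min(candidates, key=lambda t: total_cost(y_true, scores, t, costs))


# Made-up labels (1 = churned) and model scores for eight customers.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.2, 0.6, 0.7, 0.1, 0.3]
candidates = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

# A missed churner is assumed to cost 20x as much as an unnecessary email.
asymmetric = Costs(false_positive=1.0, false_negative=20.0)
symmetric = Costs(false_positive=1.0, false_negative=1.0)

chosen = best_threshold(y_true, scores, asymmetric, candidates)
baseline = best_threshold(y_true, scores, symmetric, candidates)
print(f"asymmetric costs -> threshold {chosen}, equal costs -> threshold {baseline}")
```

With these toy numbers, the asymmetric costs push the threshold lower, so more customers are flagged. That is the "slightly over-predict churners" behavior described in the conversation, framed as a cost decision that a commercial stakeholder can follow without reading a confusion matrix.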

**Future Trends and Innovations**

Elettra's team is keenly aware of how quickly data science is evolving. Industrializing data science practices and building more efficient pipelines is a top priority. In terms of innovation, the team is exploring meta-learning models to enable dynamic model selection and adaptability.

**Marketing and Machine Learning**

One area that holds particular promise is the integration of machine learning into marketing campaigns. By leveraging internal and external data sources, companies can gain valuable insights into consumer behavior and preferences. Elettra notes that this is an especially pressing issue in light of growing cookie policy restrictions, which are forcing marketers to rethink their approaches.
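One of the first priorities Elettra describes receiving at StoneX was understanding how much was being spent on acquisition marketing and what that spend returned. As a rough sketch of what a first deliverable on internal data could look like, the snippet below computes cost per acquisition and return on spend per channel. The channel names, figures, and the `acquisition_summary` helper are hypothetical and only illustrate the idea.

```python
from collections import defaultdict


def acquisition_summary(spend_by_channel, customers):
    """Summarize acquisition per channel.

    spend_by_channel: dict mapping channel -> marketing spend
    customers: iterable of (channel, revenue) pairs, one per acquired customer
    """
    acquired = defaultdict(int)
    revenue = defaultdict(float)
    for channel, customer_revenue in customers:
        acquired[channel] += 1
        revenue[channel] += customer_revenue

    summary = {}
    for channel, spend in spend_by_channel.items():
        n = acquired[channel]
        summary[channel] = {
            "customers": n,
            # None when the channel acquired nobody (avoids division by zero).
            "cost_per_acquisition": spend / n if n else None,
            # Revenue earned per unit of spend on this channel.
            "return_on_spend": revenue[channel] / spend if spend else None,
        }
    return summary


# Made-up spend and customer records for illustration only.
spend = {"search": 1000.0, "social": 500.0, "email": 200.0}
customers = [
    ("search", 400.0),
    ("search", 800.0),
    ("social", 250.0),
    ("search", 300.0),
    ("social", 150.0),
]

summary = acquisition_summary(spend, customers)
for channel, stats in summary.items():
    print(channel, stats)
```

A table like this is deliberately unglamorous: it needs only spend figures and first-party customer records, which fits the "start with what you have" approach and does not depend on third-party cookies.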

**The Power of Collaboration**

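Throughout our conversation, it's evident that collaboration and teamwork are essential components of Elettra's success as a data leader: finding sponsors outside the team, working closely with dev and architecture teams, and earning trust through early deliverables. Her emphasis on patience and hard work serves as a reminder that data science is a journey, not a destination.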

**Conclusion**

Elettra's team at StoneX embodies the lean data science approach. By embracing an agile mindset, the team delivers value quickly and adapts to changing requirements. From marketing analysis to machine learning innovation, Elettra's insights offer a valuable perspective on the future of data science.

"WEBVTTKind: captionsLanguage: enyou're listening to data framed a podcast by datacamp in this show you'll hear all the latest trends and insights in data science whether you're just getting started in your data career or you're a data leader looking to scale data-driven decisions in your organization join us for in-depth discussions with data and analytics leaders at the Forefront of the data Revolution Let's Dive Right In foreign this is Adele data science evangelist and educator at datacamp as data science becomes more and more integral to the success of organizations now more than ever organizations of all sorts and sizes are building data science functions to make the most of the data that they generate however I think given all the data framed episodes we've covered thus far this year it is definitely no easy feat to launch a data science function from scratch so I am excited to have Electra DiMaggio on today's podcast letra is the director of data science at stonex she has been deeply embedded in the data and digital transformation space and financial services and played a crucial role in launching the data science function at stonex throughout the episode we talked about the main challenges associated with launching a data science function how data leaders can prioritize the roadmap between low hanging fruit and long-term Vision how to earn trust with stakeholders within the organization as a data leader use cases she's worked on advice she has for aspiring practitioner owners and much more if you enjoyed this episode make sure to rate subscribe and comment but only if you liked it now let's Dive Right In alera great to have you on the show thanks guys for having me I'm excited to talk to you about your work leading data Sciences stonex best practices for launching a data science function from scratch how to manage short-term objectives and long-term priorities and more but before can you give us a bit of a background about yourself yeah sure so I started 
studying computer science a long time ago. I have my Bachelor of Science and Master of Science in computer science, and during my master's I majored in AI and databases. I graduated in 2009, so yeah, a long time ago, and at that time I have to say that data science wasn't yet a thing, although I went through all the neural network, vision, and NLP types of projects that you might imagine. I started to work in consultancy, and after a while I got bored of that, so I won a fellowship in Paris and got my MBA. It was really interesting to get a business educational background as well. It was actually very useful to me to learn a lot about how a company works and what's behind the product or the service that a company offers. After that I went back to Italy to work at Gartner Consulting, so again in consultancy; it was a little bit of a curse on me at the time. Then I finally moved to BNP Paribas, so to the client side, as they used to say in consultancy: financial institutions, retail mostly. So retail banking first, BNP Paribas and digital transformation, then HSBC, and then finally I moved to GAIN, a.k.a. StoneX now; it was acquired in 2020 and then rebranded as StoneX. I transitioned from a more retail-banking type of service to trading services. As you might or might not know, StoneX owns two brands in the UK and worldwide, FOREX.com and City Index, that provide trading services to people.

That's really great, and I want to set the stage for today's conversation. You led the data science team at StoneX, and you led the data science function as well. It's always interesting to talk to someone who played a key role in launching a data team or a practice within an organization, because I think there are many growing-pain stories that are often missed by practitioner data scientists who join a relatively mature data team. So what are the key
ingredients of launching a data science team or function within an organization?

So when I started at StoneX, at the time it was still GAIN; it was 2019. I started as principal analyst, and the hope of my boss was for me to start the conversation about using data science and machine learning within a company that wasn't really using any of these types of applications. I started with two analysts under me, and now I have nine, so it was quite a journey. I have to say I was very happy to see that the organization was ready to use the data in a certain way, and this is one of the key points: data needs to be ready to consume. If you don't have that, you definitely cannot start data science very quickly, because first of all you need good data. I was lucky that the people in the other teams, in the enterprise data systems teams, spent a lot of time and effort to set up a good data set, a good back end from a data perspective. That helped very much; this is definitely key. I understand that sometimes in huge organizations you have the so-called data swamp issue, where a lot of people just dump their data in the cloud and then say, "OK, now do something with that." That is really one of the biggest pain points of a data science practice.

That said, when you start a data science practice, the first thing you need to understand is what your scope of action can be, and your scope of action is directly linked to the quality of data that you have within that scope. In my case, I think the secret ingredient of the recipe was to understand where I could bring value based on what was ready to be used. You cannot do top-down, because if you do top-down and say, "You know what, we should have a machine learning algorithm that does XYZ," and then you unfold this requirement into the tech spec, you understand when you go to the tech spec that, wow, to do this
we would basically need to work on all these data sets, and if all these data sets are in a huge mess, you will just spend months and months, if not years, fixing things. So in this case you need to be smart and understand where you can drive the most value with what you have. It's like opening your fridge: you have, I don't know, eggs, and maybe avocado, and you ask, "OK, what can I do with this?" instead of taking the recipe book and saying, "You know what, it would be great to do a carrot cake," when you don't have anything to make it with. That's basically the same thing: you just start with what you have. I know it maybe doesn't sound very fancy from a data perspective, but a lot of the time what brings value right away from a business perspective is good data, or linked data, or an integrated data view.

That's really awesome, and what I would really like to ascertain from some of your answers here is the challenges related to launching a new data science function. You definitely mentioned the technical challenges of data quality. What are other categories of challenges associated with building a new data science function from scratch?

Definitely talent recruitment, and also understanding the tech stack that you want to work on. The way we did that was to incrementally find use cases that we were fairly sure would provide value to the business, try to deliver a pilot of those, and then get more money from the company, more investment. It wasn't, "Here's all the money, go ahead." We had to earn it in tiny steps, and we were fine with that, because a big-bang approach might not be the best. The point of machine learning, I believe, is that it's very much experiment-driven. You need to understand everything that you can work with, you need to run all your experiments, and the more you learn, the more you have an understanding of how many people you need, what
type of tech you need. Maybe someone knows, but personally, if you have just joined a company and you don't yet know anything about the status of the data, the status of the organization, the status of the business per se... I mean, in my case, even though I was in financial institutions, I was coming from retail banking, so trading was a new thing for me, and I had to learn a new type of service. If it's a new type of service, maybe not a new industry but a new area of the industry, you need to get an understanding of that as well. So: data, organization, and business. Before you have this understanding really deep inside your head, it's very hard to say, "I need these people, I need this stack, I need this capacity." The way you need to do that, in my opinion, is to learn and readjust. It's a Lean Startup type of thinking: you start with a pilot, with your MVP, then you work on it, evolve it, add on top, and check whether you're still on the right track, whether you're doing something that is useful for the business or not. You constantly readjust, and you add on top or you fine-tune. This is definitely the way I'm doing it and the way I would suggest someone else do it. The challenge in doing this is definitely to find the right people, and not just in your team. It's really key for a data science team to have a very good dev team and architecture team that can support you by suggesting the right tools and the right architecture for your needs, for example to process streams of data. There are so many aspects in delivering a data science product that it is really hard for one person to know everything about everything, so you need to make sure you have good people advising you in all the steps that you are not an expert on.

That's really great, and let's harp on that organizational challenge, including building out your own team. Can you walk me through in more
detail how you earn trust as a new data leader within an organization when working with different stakeholders, such as a dev team or the business stakeholders? That's the first set of questions. The second set of questions would be: how do you build out the team, knowing that it's still an early juncture and that you want to be relatively disciplined in the type and amount of resources you add to a new team, while still adding value and making sure that you have the best hires? So what is the type of profile you look for in an early data team?

As I said, the success stories that you can drive in your first 6 to 12 months are key for building trust. If you can deliver success stories within, let's say, your first year in the business, people can say, "Oh, you know what, who's the person that delivered that?" and they can associate you with a certain type of deliverable. So start to build this type of trust by actually having direct contact and, how do they say, leading by competence. Make sure that everyone associates your name with something that works. That is definitely step one. If you can secure that, everything will go a lot smoother compared to someone who just barges in and says, "Oh, we should do this, we should do that," and so on and so forth.

On the second point, about what type of people to hire in your early data team: the tech stack was very, very simple at the beginning, because we were building the practice, and that is also related to the iterative approach. If you start with a very complex tech stack, a full tech stack with your cloud, your machine learning and ops platform, your data engineering ETL, and all the works, where you have GCP or AWS or Azure cloud
and your ML on top, of course you need people who are skilled in all this tech to deliver something. So automatically you will need more people, because you cannot have someone who knows everything about all this tech. We started with an easier tech stack: Python, with a server that was running our Python scripts to test them, and then we partnered with other dev teams to deliver some models in production. So we didn't do the delivery in production ourselves; we handed that over to other dev teams that had other tech stacks. With that in mind, the type of people I hired in the first place were data scientists who had a little bit of coding experience, or if not coding experience, coding appetite. They didn't mind setting up Python scripts that were getting data from APIs, scraping websites, or whatever, to get the data they needed to develop, test, and experiment with the machine learning models that we had in mind. Once we had a couple of success stories, we finally started to have our own development platform, and we were completely included in the DevOps process. When I started, the analytics team wasn't considered part of DevOps. It was an old-school Excel, BI type of team; most of the time it was just reporting. But of course there was an appetite to evolve that. In 2020 they said, "You know what, guys, you are developing software, so it's good that you are included in our DevOps." So we started to be included in DevOps, and we had some training. I already knew a little bit of Git and Bitbucket or GitHub or whatever; we switched repositories in between, but the other guys didn't. So: the type of people who are eager to learn. They definitely need to have a solid foundation from a statistical and mathematical perspective, but they also need to have that appetite to develop
things, not just to analyze things but to really develop something that is a product. So it's more of an attitude that you need to pair with a strong quantitative background. That was the type of people I was hiring at the beginning.

That's really great. And how have your hiring practices, or what you look for, evolved as the team grew, became more established, and provided ROI?

Now that the team is a little bit more established, the way I set up my team is that I have people who are more focused on data engineering and machine learning engineering, as we are finally setting up our MLOps tech stack. I don't know if this is a very clean differentiation, but the way I see it, there are people who are driven to write what someone might call production code, and there are other people who are more driven to analyze, experiment, and see things. How I see the data scientist in my team at the moment is very much as an R&D function. It's a person who needs to have business acumen, so needs to know about the business or be able to understand it, with a strong commercial, organizational, and business understanding, and who of course has the statistical and machine learning knowledge to join the dots and say, "Oh, you know what, I can use this data to solve this problem." But once the data scientist has molded the infinite space of solutions and caged it into a more manageable space, that thing is passed on to the machine learning engineering and data engineering function, which will industrialize it and set up the pipelines and everything that needs to be done in order to operationalize it and make of that mold a product that is reliable, sustainable, and reusable within the business. On top of these two groups of people in my team, I also have a BA who supports me, which I think is really useful, because a BA is the type of
person who, first of all, has a constant relationship with the different stakeholders that are customers of your products, and can gather requirements and have a conversation with the data scientist or the machine learning engineer to say, "You know what, maybe we should change an existing product in this way, or maybe design something new that would solve this type of issue." A BA is also the person who really helps you embed the product within the business: training business stakeholders, talking with them, maybe guiding them at the beginning on how to use and interpret the data and the model's workings. One of the things about developing a machine learning model is that it's very hard to explain it to non-data people, so you need to have that person who has a constant relationship with them, so he or she can wrap that up in a way that is understandable, and so that you can have sponsors outside your team. That's key: you always need to have sponsors outside your team.

I love that answer, and I love how you create the delineation within the data science team between a more research-and-development-type mini data team and a more applied engineering team that industrializes the work the data scientists do. But harping on that last note, when it comes to the business analyst role, creating a relationship with other stakeholders, and gathering requirements and feedback: oftentimes when talking to data leaders, a big obstacle they face in providing value with data science and analytics is data culture or an analytics mindset, or the lack thereof, within the organization. I'd love to understand how you approached conversations with stakeholders within the organization who may or may not have an analytics mindset or a data culture, or understand the value of data science, and how you were able to maneuver these
obstacles, whether through the use of a BA or within your own team.

First of all, this has nothing to do with your data skills; I'm just putting this as a disclaimer on top. This is all about your, I would say, political skills or relationship skills. As I said, it's key to start by understanding where you can find your sponsors. In our case, the company is organized around commercial leaders, and we have global teams as well. You have commercial leaders of the biggest regions and commercial leaders of maybe smaller regions, and you need to understand who has the most driving role within the community of executives. I'm sure that if there is a data science team in the company, you will be able to find your sponsors from day one, the ones who are really keen to get involved. It might be easier or harder in some cases. So, first thing, try to understand who your easier sponsors are, the ones who are keenest on sponsoring you. They might still be on the lookout because you haven't delivered anything yet: "I'm interested in data, I would like data science." So try to understand what their key requirements are. I remember when I started, this was a little bit of a Jedi trick: "You don't want that, you want this." When you have a conversation with them and you know what you can deliver, you need to sell, in a clever way, something that is useful for them but that you can deliver in a reasonable amount of time. You try to drive them to that type of solution; these are your personal negotiating skills, let's say. Once you have secured your good sponsor, your big sponsor, you work on the others one by one, if that makes sense. I know this might seem like, "OK, but what happens when I deliver the model and I have to explain it to them? This is not related, right?" It's
actually very related, because if you know that in their heart they're already sponsoring you, the day you go to them to explain, they will have a different attitude listening to you, so you will have your chance to explain it to them. And I would say: never be condescending, never be the lecturer. Always try something like, "You know what, I delivered this because the main goal is to provide this additional benefit for you. I'm using this; do you want me to go through the details of the model? I can." Most of the time, I have to say, they're interested in knowing the performance. So whatever type of performance metrics you use, try to save for the business stakeholders the ones that are most understandable. All the performance KPIs that you use to understand whether the model is sustainable and robust, maybe just save those for the annex. At the end of the day, commercial stakeholders want to know how often this works and, if it doesn't work, what the risk is. For example, we had a churn prediction model that we started to share. It was our first XGBoost, random forest, actual real machine learning type of model. We went through it just explaining the features, and we explained the confusion matrix to the commercial leaders, and it was already too much, because it's a new thing for them. The way we talked to them about it was: the model, on average, predicts correctly ninety percent of the time, but as for the mistakes, we worked in a way that slightly over-predicts churners. This is why we don't have higher performance, because at the end of the day it doesn't cost us a lot to send another email or call another person who is at risk of churn; it might cost more to lose someone we're not calling. Wrapped up in this way, it was very understandable for them, and they were really happy with that. It required multiple explanations, multiple times
to go through, but after that you build trust and it gets easier and easier, because they start trusting you and they say, "OK, I don't have a full understanding, but if you say it's working, it's fine. We'll review it after a couple of months of having this running." This is the type of, I would say, massaging that you have to do at the beginning, and you need to be patient and not too rushed or aggressive. Definitely not aggressive.

That's really awesome. I think at the crux of a lot of the answers you've given so far, and a key central tenet when it comes to succeeding in launching a data team, is managing the short-term priorities and short-term wins that you can get while making sure that you're working towards a long-term vision. There's always a North Star for where we want to be in the long term, and quarterly OKRs that guide the short-term objectives for a data team. Can you walk me through the process of prioritizing between these two?

I have to say it's not something that you do alone, especially if you're joining a new business. The first thing you want to do is talk to the people who have been in the business for a long time, so they can share with you: "I've been 20 years in the business, or 15 years in the business, and one of the things that would really disrupt us would be a way to predict that, to understand that." And then it's, "OK, wow." Never underestimate the fact that if the guy has been there for 15 years and they didn't manage to do that, it doesn't mean that you're going to do it in one year just because you're a data scientist and you have machine learning or whatever. It's probably harder than that. So you note that, you gather all of these thoughts, and you say, "OK, let's define a roadmap to get there." For example, one of the things that we gathered from our, I would say, key internal stakeholders is applications
that we can apply to online streams of trades and transactions. Of course, being able to apply a machine learning model to an online stream of data requires a tech stack, and we're building towards that. But if we had started to do that from day one, we wouldn't have delivered anything valuable. It would just have been a cost for the business, and we would probably still be working on it after three years, because it takes time to do that. So that's your top-down checklist, if you wish, and it allows you to understand what the roadmap is. What do we have now? I have my desktop, a SQL data warehouse, and Excel, because that's how we started, and I need to get to a machine learning online streaming setup. What do I need to do that? You can do it yourself, but I would always advise talking with other people on the architecture side as well and gathering their views, because I'm sure other people will have thought about it too. Then you start defining your roadmap and milestones. We would need to have at least an orchestrator like Airflow to run our scripts in Python and all of these things, and we would need to have a DevOps process. That is step one. Then you go, "OK, we will probably need a cloud-based approach to run our machine learning, not on our desktop but on cloud compute that is scalable, so we don't need to leave our laptops running overnight to train models." We will have something in the cloud to do that, and some platform to connect to different data sources, for example Databricks or Azure cloud, this type of platform. Then, to actually get streams of data, you will need something like Kafka, and then you start using PySpark and all of these things. So you have this plan, and this is your vision planning, which is always easy from a certain perspective: you plan and you say, "What do I need? I need all of these things." It's your grocery list. On the other side
you have the short term, and for the short term, as I said before, you need to start with what you have. What do I have? What can I do with it? And what is the priority for the business? The priorities you get from your sponsors or commercial stakeholders. When I joined, I got two priorities. One: "We need to understand how much we're spending on acquisition marketing and how much we're getting from that spend, because at the moment we have no idea." The other: "We don't know how we're targeting our customers; we need a way to segment our customers and define the journeys based on our segments." So very much acquisition-focused. I have to say that having an MBA, or whatever business or marketing course you can take, really helped me there, because I know how a marketer would think about these things: defining personas, defining the user journeys, and so on. This is knowledge I got both from my MBA and from my previous job in retail banking, because I used to work in the UX team as a quantitative BA. I was analyzing data and defining journeys with the user experience designers, so I knew how important those acquisition artifacts were for designers and for marketing in general. Thanks to that I was able to capture it, but I have to say they were also very vocal about having these issues. I said, "OK, can we do that?" and I had a look at our data warehouse. As I said at the beginning, this was really key: I was very lucky to have a neat data warehouse, even if it was just our back-end, on-premises data warehouse with just our onboarding system and customer activity. I was very lucky to have a very neat data set to start working with. Of course there were some glitches in the process, but nothing too messy. That was key to the first successes that we had, and that's how I started to prioritize the more short-term goals.

That's really great. And if you want to
abstract this out and propose a framework that can enable other data leaders to extract small wins and low-hanging fruit that demonstrate early value for a data team, how would you go about that?

I would start with the data, and don't do it alone. Start with your business stakeholders and ask them, "What are the data sets that you use in your day-to-day job, and how do you use them?" From how they use them, you can understand, "Oh, you know what, I could automate that; I could do something that will help you use that data in a more efficient way." By going this way, first of all you understand right away what data sources are involved in the process and can check whether they are usable. Second, you have your use case, and even if it's not the fanciest use case, you can start delivering something very quickly, because you have a workable data source to start with. By doing that you start building your sponsors, and once you start building your sponsors, even with tiny deliverables, you can start building things up. On the other side, depending on how messy the data situation in the company is, you can start involving other teams and raising awareness, if it's not already there, and investment of time and resources in fixing the data, so that the data will enable you to produce something of higher value. This is the way I would do it. As I said, it's a very entrepreneurial, Lean Startup type of approach: MVP first, and then you build your way up to the top.

I couldn't agree more that bias for action and having that lean approach is super useful for a lot of data teams. Now, as we end our episode, Elettra, I'd be remiss not to talk a bit about your work at StoneX, especially the data science use cases that provide value in financial services. With the recent war in Ukraine, COVID-induced supply chain issues, and economic uncertainty, I think it's
never been more important, from a data scientist's perspective, to understand the role data science plays in commodities trading, foreign exchange trading, and more. So I'd love to understand some of the ways data science has been providing value in the industry.

I have to say we haven't been asked to do much, and in our case the overall international situation doesn't affect us too much directly. Of course, we know there are some people who have been sanctioned, so accounts have been blocked. StoneX didn't see a huge impact on this point, so we've been lucky. But as a trading company we of course experience a lot of volatility in the market, and that made our business very active from a certain perspective. As a data team, in our case, we haven't been involved too much, apart from making sure that what we were seeing in our systems wasn't affecting other processes in the business. In terms of doing anything, we haven't done anything, also because when you have this, I would say, delicate situation, it's left to human handling. If you automate things, you are prone to, I would say, embarrassing mistakes, and this is something that of course no company wants. Because we have a manageable volume of customers and a manageable volume of accounts, the data team wasn't really involved in doing anything specifically.

What are some of the main use cases you've been working on as a data leader at StoneX?

We definitely have a lot of things related to marketing: segmentation, attribution modeling, churn prediction, lifetime value prediction. Last year we did our first NLP application, to classify customer communications. At the moment we're also working on client sentiment in trading. And definitely one of the things that we would like to work on, as I said before, is online streaming of data, but I don't yet have workable use cases to share. We need to build the groundwork to do that.

That's awesome. So, Elettra, as we close up our episode, I'd love to
look at any future trends and innovations that you're particularly excited about.

At the moment, I feel that data science has been a very wild type of area. There was a lot of buzz, and not many companies managed to tackle the data science practice. So the focus that I have at the moment is to try to industrialize the approach and make the data science practice solid. The type of tech that we're looking at, for example, is definitely MLOps and pipeline tech. In terms of pure innovation in machine learning, honestly, there's nothing purely innovative that we're looking for. We have so much ground to recover and to work on before we do something more innovative. But especially for marketing, there's a lot of innovation in terms of combining multiple models: ensembling, for example, but also combining multiple models to dynamically select advertisements. This is something that is in our mind, and we will definitely do that: using internal and external data to understand what the trends are, what the things are that are actually grasping people's minds at the moment, and dynamically selecting the content of your advertisement, serving it at the right time to the right person. That is definitely something that is becoming machine learning heavy, especially with all the cookie policies that are becoming more and more strict. So this is definitely something that is in my mind. I don't know when I will be able to implement it, but it is definitely one of the things on my list.

That's awesome. Finally, Elettra, as we close up, do you have any call to action before we wrap up today?

I would say it just takes patience and hard work. So if you're not ready to have patience and put in your hours to get your success stories, do something else. But it definitely gives you a lot of satisfaction in the end; it is worth your while. But it's a hard way to the top
if you want to rock and roll, as they say.

Yeah, 100 percent. Thank you so much, Elettra, for coming on the podcast.

Thank you for having me. Thank you.

You've been listening to DataFramed, a podcast by DataCamp. Keep connected with us by subscribing to the show in your favorite podcast player. Please give us a rating, leave a comment, and share episodes you love. That helps us keep delivering insights into all things data. Thanks for listening. Until next time!

You're listening to DataFramed, a podcast by DataCamp. In this show, you'll hear all the latest trends and insights in data science. Whether you're just getting started in your data career or you're a data leader looking to scale data-driven decisions in your organization, join us for in-depth discussions with data and analytics leaders at the forefront of the data revolution. Let's dive right in.

This is Adel, data science evangelist and educator at DataCamp. As data science becomes more and more integral to the success of organizations, now more than ever, organizations of all sorts and sizes are building data science functions to make the most of the data that they generate. However, I think, given all the DataFramed episodes we've covered thus far this year, it is definitely no easy feat to launch a data science function from scratch. So I am excited to have Elettra DaMaggio on today's podcast. Elettra is the Director of Data Science at StoneX. She has been deeply embedded in the data and digital transformation space in financial services and played a crucial role in launching the data science function at StoneX. Throughout the episode, we talked about the main challenges associated with launching a data science function, how data leaders can prioritize the roadmap between low-hanging fruit and long-term vision, how to earn trust with stakeholders within the organization as a data leader, use cases she's worked on, advice she has for aspiring practitioners, and much more. If you enjoyed this episode, make sure to rate, subscribe, and comment,
but only if you liked it. Now, let's dive right in.

Elettra, great to have you on the show.

Thanks, guys, for having me.

I'm excited to talk to you about your work leading data science at StoneX, best practices for launching a data science function from scratch, how to manage short-term objectives and long-term priorities, and more. But before that, can you give us a bit of background about yourself?

Yeah, sure. I started studying computer science a long time ago. I got my Bachelor of Science and Master of Science in computer science, and during my master's I majored in AI and databases. I graduated in 2009, so yeah, a long time ago, and I have to say that at that time data science wasn't yet a thing, although I went through all the neural network, vision, and NLP types of projects that you might imagine. I started to work in consultancy, and after a while I got bored of that, so I won a fellowship in Paris and got my MBA. It was really interesting to get a business educational background as well. It was actually very useful for me to learn a lot about how a company works and what's behind the product or service that a company offers. After that, I went back to Italy to work at Gartner Consulting, so again in consultancy; it was a little bit of a curse on me at the time. But then I finally moved to BNP Paribas, so to the client side, as they say in consultancy, in financial institutions, mostly retail. So retail banking first, at BNP Paribas, and digital transformation, then HSBC, and then finally I moved to GAIN, a.k.a. StoneX now. It was acquired in 2020 and then rebranded as StoneX. I actually transitioned from a more retail banking type of service to trading services instead. As you might or might not know, StoneX owns two brands in the UK and worldwide, FOREX.com and City Index, that provide trading services to people.

That's really great. I want to set the stage for today's conversation. You lead
the data science team at StoneX, and the data science function there as well. It's always interesting to talk to someone who played a key role in launching a data team or a practice within an organization, because I think there are many growing-pain stories that are often missed by practitioner data scientists who join a relatively mature data team. So what are the key ingredients of launching a data science team or function within an organization?

When I started at StoneX, at the time it was still GAIN; it was 2019. I started as a principal analyst, and the hope of my boss was for me to start the conversation about using data science and machine learning within a company that wasn't really using any of this type of application. I started with two analysts under me, and now I have nine, so it was quite a journey. I have to say I was very happy to see that the organization was ready to use the data in a certain way, and this is one of the key points: data needs to be ready to consume. If you don't have that, you definitely cannot start data science very quickly, because first of all you need good data. I was lucky enough that all the other people in the other teams, in the enterprise data system teams, spent a lot of time and effort to set up a good data set, a good back end from a data perspective. That helped very much; this is definitely key. And I understand that sometimes, in huge organizations, you have the so-called data swamp issue, where a lot of people just dump their data in the cloud and then say, "OK, now do something with that." That is really one of the biggest pain points of a data science practice. That said, when you start a data science practice, the first thing that you need to understand is what your scope of action can be, and your scope of action is directly linked to the quality of data that you have within that scope. So in my case, I think the secret
ingredient of the recipe was to understand, "OK, where can I bring value based on what is ready to be used?" You cannot do top-down, because if you do top-down and you say, "Oh, you know what, we should have a machine learning algorithm that does XYZ," and then you fold this requirement into the tech spec, you understand when you get to the tech spec that, wow, to do this we would basically need to work on all these data sets. And if all these data sets are in a huge mess, you will just spend months and months, if not years, fixing things. So in this case you need to be smart and understand, "OK, where can I drive the most value with what I have?" It's like you open your fridge and you have, I don't know, eggs, maybe avocado, and something else: "OK, what can I do with this?" Instead of taking the recipe book and saying, "You know what, it would be great to make a carrot cake," and then you don't have anything to make it with. That's basically the same thing: you just start with what you have. And I know it maybe doesn't sound very fancy from a data perspective, but a lot of the time what brings value right away from a business perspective is good data, or linked data, or an integrated data view.

That's really awesome. What I really would like to ascertain from some of your answers here is the challenges related to launching a new data science function. You definitely mentioned the technical challenges of data quality. What are other categories of challenges associated with building a new data science function from scratch?

Definitely talent recruitment, and also understanding the tech stack that you want to work on. The way we did that was to incrementally find use cases that we knew, or were fairly sure, would provide value to the business, try to deliver a pilot of those, and then get more money from the company, more investment. It wasn't "here's all the money, go ahead." We had to earn
all our tiny steps, and we were fine with that, because a big bang approach might not be the best. The point with machine learning, I believe, is that it's very much experiment driven. You need to understand everything that you can work with, you need to run all your experiments, and the more you learn, the more you understand how many people you need and what type of tech you need. Maybe someone knows up front, but personally, if you've just joined a company, you don't yet know anything about the status of the data, the status of the organization, or the status of the business per se. In my case, even though I was in financial institutions, I was coming from retail banking, so trading was a new thing for me, and I had to learn a new type of service. So if it's a new type of service, a new industry, or maybe not an industry but a new area of the industry, you need to get an understanding of that as well. So: data, organization, and business. Before you have this understanding really deep inside your head, it's very hard to say, "I need these people, I need this stack, I need this capacity." So in my opinion, you need to learn and readjust. It's lean startup type of thinking: you start with a pilot, with your MVP, and then you work on it, you evolve and add on top, and you check whether you're still on the right track, whether you're doing something that is useful for the business or not. You constantly readjust, and you add on top or you fine-tune. This is definitely the way I'm doing it, and the way I would suggest someone else do it. The challenge in doing this is definitely finding the right people, and not just in your team. It's really key for a data science team to have a very good dev team, an architecture-like team that can support you by suggesting the right tools for your needs and the right architecture, so, you know, everything that you need, for example, to
process streams of data. There are so many aspects in delivering a data science product that it's really hard for one person to know everything about everything. So you need to make sure you have good people advising you on all the steps that you are not an expert in.

That's really great. Let's harp on that organizational challenge, including building out your own team. Can you walk me through in more detail how you earn trust as a new data leader within an organization when working with different stakeholders, such as a dev team or the business stakeholders? That's the first set of questions. The second set would be: how do you build out the team, knowing that it's still an early juncture and you want to be relatively disciplined in the type and amount of resources you add to a new team, while still adding value and making sure that you have the best hires? So what is the type of profile you look for in an early data team?

As I said, those success stories that you can drive in your first 6 to 12 months are key for building trust. If you can deliver success stories within, let's say, your first year in the business, it could be something where people can say, "Oh, you know what, that's the person who delivered that," and they can associate you with a certain type of deliverable. So start to build this type of trust by actually having direct contact and, how do they say, leading by competence. Make sure that everyone associates your name with something that works. That is definitely step one, and then you start from that. I would say if you can secure that, it will be all good; everything will go a lot smoother compared to someone who just barges in and says, "Oh, we should do this, we should do that," and so on and so forth. On the second point, about what type of people to hire in your early data team: the tech stack was
very simple at the beginning, very, very simple, because we were building the practice, let's say. And that is also related to the iterative approach. If you start with a very complex tech stack, a full tech stack, from your cloud to your MLOps platform, your data engineering, ETL, and all the works, where you have GCP or AWS or Azure cloud and your ML on top, of course you need people who are skilled in all this tech to deliver something. So automatically you will need more people, because you cannot have one person who knows everything about all this tech. We started with an easier tech stack: we started with Python, with a server that was running our Python scripts to test them, and then we partnered, let's say, with other dev teams to deliver some models in production. So we didn't do the delivery in production ourselves; we handed that over to other dev teams that had a different tech stack. With that in mind, the type of people I hired in the first place were, I would say, data scientists who had a little bit of coding experience, or if not coding experience, just coding appetite. They didn't mind setting up Python scripts that were getting data from APIs, scraping websites, or whatever, to get the data that they needed to develop, or just test and experiment with, the machine learning models that we had in mind. Once we had developed a couple of success stories, we finally started to have our own development platform, and we have been completely included in the DevOps process. When I started, the analytics team wasn't considered part of DevOps. It was an old-school Excel, BI type of team, and most of the time it was just reporting. But of course there was an appetite to evolve that, so we started with that. Then in 2020 they said, "You know what, guys, you are developing software; it's good that you are included in our
DevOps." So we started to be included in DevOps, and we had some training. I already knew a little bit of Git and Bitbucket or GitHub or whatever (we switched repositories in between), but the other guys didn't. So the type of people: eager to learn, definitely. They need to have a solid foundation from a statistical and mathematical perspective, but they also need to have that appetite to develop things, to not just analyze things but to really develop something that is a product. So it's more of an attitude type of thing that you need to pair with a strong quantitative background. That was the type of people that I was hiring at the beginning.

That's really great. And how have your hiring practices, or what you look for, evolved as the team grew, became more established, and provided ROI?

Now that the team is a little bit more established, the way I set up my team is that I have guys who are more focused on data engineering and machine learning engineering, as we are finally setting up our MLOps tech stack. I don't know if this is a very neat differentiation, but the way I see it is that there are people who are driven to write what someone might call production code, and there are other people who are more driven to analyze, experiment, and see things. How I see the data scientist in my team at the moment is very much as an R&D function. It's a person who needs to have business acumen, so needs to know about the business or be able to understand it, with a strong commercial, organizational, and business understanding, and who of course has the statistical and machine learning knowledge to join the dots and say, "Oh, you know what, I can use this data to solve this problem." But once the data scientist has, I would say, molded the infinite space of solutions and caged it into a more manageable space, that thing is passed on to the machine learning engineering and the
engineering function, which will industrialize it and set up the pipelines and everything that needs to be done to operationalize it and make of that mold a product that is reliable, sustainable, and reusable within the business. On top of these two groups of people in my team, I also have a BA who supports me, which I think is really useful. A BA is the type of person who, first of all, has a constant relationship with the different stakeholders who are customers of your products, and who can gather requirements and have a conversation with the data scientist or the machine learning engineer to say, "You know what, maybe we should change an existing product in this way, or maybe design something new that would solve this type of issue." The BA is also the person who really helps you embed the product within the business: training business stakeholders, talking with them, maybe guiding them at the beginning on how to use and interpret the data and how to interpret the model's workings. One of the things about developing a machine learning model is that it's very hard to explain to non-data people, so you need to have that person who has a constant relationship with them, so he or she can wrap it up in a way that is understandable, and so that you can have sponsors outside your team. That's key: you always need to have sponsors outside your team.

I love that answer, and I love how you create the delineation within the data science team between a more research-and-development type of small data team and a more applied engineering team that industrializes the work a lot of data scientists do. But harping on that last note, when it comes to the business analyst role, creating a relationship with other stakeholders, and gathering requirements and feedback: oftentimes when talking to data leaders, a big obstacle they face when
it comes to providing value with data science and analytics is data culture or analytics mindset, or the lack thereof, within the organization. I'd love to understand how you approached conversations with the stakeholders within the organization who may or may not have an analytics mindset or a data culture, or understand the value of data science, and how you were able to maneuver around these obstacles, whether through the use of a BA or within your own team.

First of all, this has nothing to do with your data skills; I'm just putting this as a disclaimer on top. This is all about your, I would say, political skills or relationship skills. As I said, it's key for you to start understanding where you can find your sponsors. So first of all you need to have conversations. For example, in our case, our company is organized around commercial leaders, and we have global teams as well. You have commercial leaders of the biggest regions and commercial leaders of maybe smaller regions, and you need to gather an understanding of who has the most driving role within the community of executives. I'm sure that if there is a data science team in the company, you will be able to find your sponsors from day one, the ones who are really keen to get involved. It might be easier or harder in some cases. So the first thing is to understand who your easier sponsors are, the ones who may be the keenest to sponsor you but might still be on the lookout because you haven't delivered anything yet: "I'm interested in data, I would like data science." Try to understand what their key requirements are. I remember when I started, I was like, this is a little bit of your Jedi trick: "You don't want that, you want this." When you have a conversation with them and you know what you can deliver, you need to sell, in a clever way, something that is useful for them but that you can deliver in a
reasonable amount of time. So you try to drive them to that type of solution, and this is your personal negotiating skill, let's say. Once you have secured your good sponsor, your big sponsor, you just win the others over one by one, if that makes sense. I know this might seem like, "OK, but what happens when I deliver the model and have to explain it to them? This is not related, right?" It's actually very related, because if you know that in their heart they're already sponsoring you, the day you go to them to explain, they will have a different attitude listening to you. So you will have your chance to explain it to them. And I would say never be condescending, never be the lecturer. Always try something like, "You know what, I delivered this because the main goal is to provide this additional benefit for you. I'm using this. Do you want me to go through the details of the model? I can." Most of the time, I have to say, they're interested in knowing the performance. So whatever type of performance metrics you want to use, try to save for the business stakeholders the ones that are most understandable. All the performance KPIs that you use to understand whether the model is sustainable and robust, maybe just save those for the annex. At the end of the day, commercial stakeholders want to know how often this works and, if it doesn't work, what the risk is. For example, we had a churn prediction model that we started to share. It was our first XGBoost, random forest, actual real machine learning type of model. We went through it just explaining the features, and we explained the confusion matrix to the commercial leaders, and it was already too much, because it's a new thing for them. The way we talked to them about it was: the model, on average, predicts correctly ninety percent of the time, but as for the mistakes, we worked in a way that we are over-predicting churners. At the end of the day, we slightly over-predict
churners; this is why we don't have higher performance. It doesn't cost us a lot to send another email or call another person who is at risk of churn; it might cost more to lose someone that we're not calling. Wrapped up in this way, it was very understandable for them, and they were really happy with that. It required going through the explanation multiple times, but after that you just build trust, and it gets easier and easier, because they start trusting you and they say, "OK, you know, I don't have a full understanding, but if you say it's working, it's fine. We'll review it after we've had this running for a couple of months." So this is the type of, I would say, massage that you have to do at the beginning. You need to be patient and not too rushed or aggressive; definitely not aggressive.

That's really awesome. I think at the crux of a lot of the answers that you've given so far, and a central tenet when it comes to succeeding in launching a data team, is managing the short-term priorities and short-term wins that you can get while making sure that you're working towards a long-term vision. So there's always a North Star for where we want to be in the long term, and quarterly OKRs that guide the short-term objectives for a data team. Can you walk me through the process of prioritization between these two objectives?

I have to say it's not something that you do alone, especially if you're joining a new business. The first thing that you want to do is talk to the people who have been in the business a long time, so they can share with you: "I've been 20 years in the business," or 15 years, "and one of the things that would really disrupt us would be a way to predict this, to understand that." And then it's, "OK, wow." And never underestimate the fact that if the guy has been there for 15 years and they didn't manage to do that, it doesn't
mean that because you're a data scientist you're going to do that in one year, just because you have machine learning or whatever. It's probably harder than that. So you gather all of these thoughts and you say, "OK, you know what, let's define a roadmap to get there." For example, one of the things that we gathered from our, I would say, key internal stakeholders was applications that we could apply to an online stream of trades and transactions. Of course, being able to apply a machine learning model to an online stream of data requires a tech stack, and we're building towards that. But if we had started to do that from day one, we wouldn't have delivered anything valuable. It would just have been cost for the business, and we would probably still be working on it after three years, because it takes time to do that. So that's your top-down checklist, if you wish, and it allows you to understand what the roadmap is. What do we have now? "OK, now I have my desktop, a SQL data warehouse, and Excel," because that's how we started, "and I need to get to a machine learning online streaming setup. What do I need to do that?" You can do it yourself, but I would always advise talking with other people as well, on the architecture side, and gathering their views, because I'm sure other people will have thought about it too. Then you start defining your roadmap and milestones: we would need at least an orchestrator like Airflow to run our scripts in Python and all of these things, and we will need a DevOps process. That is step one. Then you go, "OK, you know what, we will probably need a cloud-based approach, to run our machine learning not on our desktops but on cloud compute that is scalable, so we don't need to leave our laptops running overnight to train models." We will have something in the cloud to do that, and some platform to connect to different data sources, like, for example, I don't know, Databricks or this
type of Azure cloud platform. And then, to actually get a stream of data, you will need something like Kafka, and then you start using PySpark and all of these things. So you have this plan, and this is your vision planning, which is always easy from a certain perspective: you plan and you say, "OK, what do I need? I need all of these things." It's your grocery list. On the other side, you have the short term, and for the short term, as I said before, you need to start with what you have. "What do I have? This. What can I do with this, and what is the priority for the business?" The priorities you get from your sponsors or commercial stakeholders, from the business. When I joined, I got two priorities. One: "We need to understand how much we're spending on acquisition marketing and how much we're getting from that spend, because at the moment we have no idea." And on the other side: "We don't know how we're targeting our customers; we need a way to segment our customers and define the journeys based on our segments." So, very much acquisition focused. I have to say that having an MBA, or whatever business or marketing course you can have, really helped me there, because I know how a marketer would think about these things: defining personas, defining the user journeys, and so on. This is knowledge that I got from both my MBA and my previous job in retail banking, because I used to work in the UX team as a quantitative BA. I was analyzing data and defining journeys with the user experience designers, so I knew how important those acquisition artifacts were for designers and for marketing in general. Thanks to that, I was able to capture that. But I have to say they were very vocal that they had these issues. I said, "OK, well, can we do that?" and I had a look at our data warehouse. As I said at the beginning, it was really key: I was very lucky to have a neat data warehouse, even if it was
just our backhand back-end on-premises data warehouse which just our onboarding system and customer activity I was very lucky to have a very neat data set to start work with of course there were some glitches in the process but nothing too messy so that was key of the first successes that we had so that's how I started to prioritize like more short-term goals that's really great and if you want to abstract this out and propose a framework that can enable other data leaders to extract small wins as well as low hanging fruit that demonstrate early value for a data team how would you go about that so the way we go about that I will start with the data don't do that alone start with your business stakeholders and ask them what are the data sets that you use in your day-by-day job and how do you use them because if how they use them you can understand oh you know what you could automate that I could do something that will help you in using that data in a more efficient way and by going this way you're able to first of all understand right away what are the data sources involved in the process and have a look if the data sources are usable and second you have your use case and even if it's not the fanciest use case you can start delivering something very quickly because you have a workable data source to to start with and by doing that you start building your sponsors and once you start building your sponsors even if it's like with tiny deliverables you can start building up things on the other side I would say based on how messy is the data situation in the company you can start involve other teams and raise awareness if it's not there ready maybe it's already their awareness but raised awareness and investment of time and resources on fixing the data so that the data will enable you producing something that is more of higher value this is the way I would do that as I said it's a very entrepreneurial Lean Startup type of approach MVP first and then you just build up 
your way to the top.

I couldn't agree more that a bias for action and having that lean approach is super useful for a lot of data teams. Now, as we end our episode, Elettra, I'd be remiss not to talk a bit about your work at StoneX, especially on the data science use cases that provide value in financial services. With the recent war in Ukraine, COVID-induced supply chain issues, and economic uncertainty, I think it's never been more important from a data scientist's perspective to understand the role data science plays in commodities trading, foreign exchange trading, and more. So I'd love to understand some of the ways data science has been providing value in the industry.

I have to say, we haven't been asked to do much, and in our case the overall international situation doesn't affect us too much directly. Of course we know there are some people who have been sanctioned, so accounts have been blocked. StoneX didn't see a huge impact on this point, so we've been lucky. But as a trading company we of course experience a lot of volatility in the market, and that made our business very active from a certain perspective. As a data team, in our case we haven't been involved too much, apart from making sure that what we were seeing in our systems wasn't affecting other processes in the business. In terms of doing anything, we haven't done anything, also because when you have this, I would say, delicate situation, it's left to humans to handle. If you automate things you are prone to, I would say, embarrassing mistakes, and this is something that of course no company wants. Because we have a manageable volume of customers and a manageable volume of accounts, the data team wasn't really involved in doing anything specific.

What are some of the main use cases you've been working on as a data leader at StoneX?

So we definitely have a lot of things related to marketing and sales: segmentation, attribution modeling, churn prediction, lifetime value prediction. Last
year we did our first NLP application, to classify customer communications. At the moment we're also working on client sentiment in trading. And definitely one of the things that we would like to work on, as I said before, is online streaming of data, but I don't yet have workable use cases to share. We need to build the groundwork to do that.

That's awesome. So, Elettra, as we close up our episode, I'd love to look at any future trends and innovations that you're particularly excited about.

At the moment I feel that data science has been a very Wild West type of area. There was a lot of buzz, and not many companies managed to untangle the data science practice. So at the moment my focus is to try to industrialize the approach and make the data science practice solid. The type of tech that we're looking at, for example, is definitely MLOps and pipeline tech. In terms of pure innovation in machine learning, honestly there's nothing purely innovative that we're looking for. We have so much ground to recover and to work on before we do something more innovative.

But especially for marketing there's a lot of innovation in terms of combining multiple models, so ensembling for example, but also combining multiple models to dynamically select advertisements. This is something that is on our mind and we will definitely do it: using internal and external data to understand what the trends are, what is actually capturing people's minds at the moment, and dynamically selecting the content of your advertisement, serving it at the right time to the right person. That is definitely something that is becoming machine learning heavy, especially with all the cookie policies that are becoming more and more strict. So this is definitely something that is on my mind. I don't know when I will be able to implement it, but it is definitely one of the things on my list.

That's awesome. Finally,
Elettra, as we close up, do you have any call to action before we wrap up today?

I would say just that it takes patience and hard work. So if you're not ready to have patience and put in, you know, your hours to get your success stories, do something else. But it definitely gives you a lot of satisfaction in the end; it's worth your while. But it's a hard way to the top if you want to rock and roll, as they say.

Yeah, 100%. Thank you so much, Elettra, for coming on the podcast.

Thank you for having me. Thank you.

You've been listening to DataFramed, a podcast by DataCamp. Keep connected with us by subscribing to the show in your favorite podcast player. Please give us a rating, leave a comment, and share episodes you love. That helps us keep delivering insights into all things data. Thanks for listening. Until next time!