#97 How Salesforce Created a High-Impact Data Science Organization (with Anjali Samani)

The Importance of Transparency and Explainability in Data-Driven Decision Making

As we navigate the complex world of data-driven decision making, it's essential to understand where our numbers come from. A lot of transparency is required when working with stakeholders, providing them with a clear understanding of the models and outputs. This includes explaining the drivers behind the numbers and how they can be changed. By doing so, we create actionability, allowing stakeholders to make informed decisions.

Effective communication between stakeholders and data teams is critical in this context. Data literacy plays a significant role in enabling business stakeholders to become more effective partners for the data team. Without data culture or literacy, stakeholders may not understand what they're being given, leading to confusion and disengagement. On the other hand, having a data literate stakeholder can lead to co-creation, where problems are identified and solutions are developed together.

The role of executive sponsorship cannot be overstated in building a data culture within an organization. It's essential for leaders to invest time in educating their teams and engaging with stakeholders. Data scientists must also take an active role in building the data science brand and fostering engagement with their audience. The burden of changing organizational culture cannot be shouldered by data scientists alone; it requires a collective effort from leadership.

In conclusion, becoming data literate is crucial for individuals to understand the world around them. This includes asking important questions of the data and the media. By doing so, we can avoid being misled by sensationalized information. For data professionals, building fundamental skills in machine learning engineering is essential, particularly when working in roles that have business impact.

Data Literacy: A Key to Unlocking Business Value

The conversation about data culture and literacy has sparked interesting discussions about the importance of having stakeholders who are well-versed in data-driven decision making. Having a data literate stakeholder can make all the difference in how effectively data is utilized within an organization. This, in turn, can lead to better decision-making and increased trust between stakeholders.

On the other hand, if stakeholders lack data literacy, they may not fully understand what they're being given, leading to confusion and disengagement. Without this level of understanding, stakeholders are unlikely to ask the right questions or provide meaningful feedback on data-driven insights. This can lead to a situation where stakeholders passively receive information without truly engaging with it.

The Impact of Data Literacy on Business Outcomes

By investing in data culture and literacy, organizations can unlock significant business value. When stakeholders have a deep understanding of data-driven decision making, they're more likely to ask the right questions and provide meaningful feedback. This leads to better outcomes, as decisions are informed by accurate and relevant data.

Moreover, having data literate stakeholders can lead to increased engagement and trust within an organization. When stakeholders feel empowered and informed, they're more likely to take ownership of data-driven initiatives, leading to a more collaborative and successful working environment. Conversely, disengaged or uninformed stakeholders can lead to missed opportunities and poorer business outcomes.

The Role of Executive Sponsorship in Building Data Culture

Executive sponsorship is critical in building a data culture within an organization. Leaders must invest time and resources in educating their teams and engaging with stakeholders. By doing so, they can create an environment where data literacy is valued and encouraged.

Moreover, leaders should foster a culture of transparency and accountability within their organizations. This includes providing clear explanations for data-driven decisions and being open to feedback from stakeholders. By doing so, leaders can build trust with their teams and stakeholders, leading to increased engagement and success.

The Power of Data Literacy in Driving Business Results

Data literacy is essential for driving business results. When stakeholders have a deep understanding of data-driven decision making, they're more likely to make informed decisions that drive business growth. By asking the right questions and providing meaningful feedback, stakeholders can identify areas for improvement and develop solutions that address business challenges.

Moreover, having data literate stakeholders can lead to increased efficiency and productivity within an organization. When stakeholders are empowered with accurate and relevant information, they're more likely to take ownership of initiatives and drive results.

Conclusion: Embracing Data Literacy for a Brighter Future

Embracing data literacy is essential for unlocking business value. By investing in data culture and literacy, organizations can create an environment where stakeholders are empowered with accurate and relevant information. This leads to better decision-making, increased trust, and improved business outcomes.

As we move forward, it's essential to recognize the importance of data literacy in driving business results. By asking questions, providing feedback, and developing solutions, stakeholders can make a significant impact on organizational success. Moreover, leaders must provide executive sponsorship and prioritize transparency within their organizations, creating an environment where data literacy is valued and encouraged.

By embracing data literacy, we can create a brighter future for businesses and individuals alike. It's time to recognize the power of data-driven decision making and invest in building a culture that values transparency, accountability, and collaboration.

You're listening to DataFramed, a podcast by DataCamp. In this show, you'll hear all the latest trends and insights in data science. Whether you're just getting started in your data career or you're a data leader looking to scale data-driven decisions in your organization, join us for in-depth discussions with data and analytics leaders at the forefront of the data revolution. Let's dive right in.

Hello everyone, this is Adel, data science educator and evangelist at DataCamp. One of the things we always think about on the podcast is how to make a data team impactful: how data teams can go beyond hype and promises they cannot keep and become an actual strategic asset that accelerates the organization's ability to provide value. To do this, data teams need to balance rigor, business impact, relationships, and more, and this is why I'm so excited to have Anjali Samani on today's podcast. Anjali is the director of data science and data intelligence at Salesforce. She is a senior data science leader with over 15 years of experience in multinational corporations, startups, and public sector organizations in the US and the UK. She's led her fair share of impactful data teams, and she brought it in spades in today's episode. Throughout our chat, we talk about how she defines an impactful data team, how to align data science projects with business value, balancing rigor with speed as a data scientist, the importance of data culture, how to manage stakeholders, and much, much more. If you enjoyed this episode, make sure to rate, comment, and subscribe, but only if you liked it. Now on to today's episode.

Anjali, it's great to have you on the show.

Thank you for having me, Adel. It's great to be here.

I'm really excited to speak with you about your experience leading data teams, how you define a mature data science organization, and how to build an impactful, diverse team. But beforehand, can you give us a bit of background about yourself and what got you into the data space?

Sure. So I work at Salesforce on a team called Data Science Applications, which is part of a broader organization called Data Intelligence. Data Intelligence is a very diverse team comprising four sub-teams, one of which is the Data Science Applications team. There's a strategy and growth team, which helps the general managers of our product lines think through product strategy, figure out what kinds of metrics they should be using, and build out frameworks to drive strategy for their individual business units. Then there's data science engineering; they take care of our data platforms and pipelines so that data scientists can actually do their work. Then we have a visualization and enablement team, who build out a lot of the interfaces for our data science products and help our users and stakeholders interact with them. And then there's the applications team. That team builds out applications centered on AI and machine learning. We develop apps for our internal stakeholders at Salesforce to help them make better decisions, so it comprises product managers, data scientists, and data engineers. Within that Data Science Applications team, I lead the US data science team, which comprises data scientists and senior data scientists.

That's really great. An awesome aspect of hosting this podcast is that I get to talk to data leaders such as yourself who've been really leading the way when it comes to making data science impactful within their organizations. You've worked at organizations with really mature data teams, and this is especially true at Salesforce in your current role. I'd love it if you could break down what you think are the hallmarks of a mature data science organization.

Yeah, sure. If I had to summarize it in one sentence, I would say that a mature data organization is one that can consistently and efficiently, or sustainably, derive value out of its data. When you break that down, the reason we think about consistency is that a lot of organizations are very good at spinning up one-off initiatives to derive value from their data, but that doesn't define maturity. What defines it is whether you're able to do it very consistently, and in a way that is both efficient and sustainable.

So what do we mean by that? Efficiency, to me, is about driving the costs down over time. A lot of the time, when teams first start out in data science, they're not really fully set up; they're still trying to find their feet. If you work out the cost of generating those insights, it can sometimes be quite high, because there's a lot of investment going into acquiring the data, setting up the tech stack, and hiring the people. But over time, as the organization matures, the costs should be going down and the value it derives out of data should be going up. And by sustainability, I mean: how is the team extracting that value? Is it always a fire drill, or do you have good processes and technology in place so that it's running like a well-oiled machine?

When I break that down, it typically comes down to people, processes, and technology, and by technology I mean both data and the actual tech stack. A mature data science organization, or mature data organization, is one where data is a first-class citizen. It's not an afterthought or a by-product of everything else. The leadership is intentional about data; data is a strategic asset, and it is treated as such. There's a lot of thinking and intentionality that goes behind what data is collected, how it's collected, how it's processed, and why it's collected, because that also impacts things downstream. Depending on the purpose a data set is being collected for, it may naturally introduce certain biases or certain nuances within that data, and if you don't understand why you're doing it and communicate that clearly to the data scientists, then you may end up with misleading insights or outcomes. So these organizations have all these strategies in place, the tech stack is in place, and the technology investments are also very intentional. The data needs to be connected with the business objectives and the strategy, which is owned by the CEO's office, but the investments in data and technology also need to trickle down from the top.

That's sort of the technology side of things. So there's the data and how you collect it. A lot of times, when organizations first start out, they'll start with "I have all this data, I have lots of big data, what can I do with it?" That's a great place to start, but it's also a hallmark of a very naive, young organization when you think about it in data maturity terms. Then there's the tech stack: all the technology that goes into collecting, organizing, and persisting that data. There is also technology that enables the people who are using the data, and that could be your expert users, like your data scientists and engineers, or it could be some of the downstream consumers of those data products. What kinds of dashboarding are you investing in? How are your users able to interact with your data products? How are you enabling your data scientists to access the data and deploy models into production? So that's the technology and data side of things.

Then there's the people side of things, which is always the most complicated and the most difficult one. This is about the whole culture of the organization as well, and that starts at the top with the exec leadership. But at a more localized level, it's about good hiring processes: knowing what the organization needs and what good looks like within each of those roles. Those roles will also vary depending on where the business is in its growth stage and where it's at in its maturity levels. As an example, when the organization is still very early on in its journey, it may require a lot of generalists, so it can hire a few people who can do a whole bunch of different things and really get the systems up and running very fast. It doesn't necessarily need specialists. But as the organization matures, you start to see the need for folks who specialize in engineering or in data science, or even specific areas within data science. So there's understanding what roles are needed and how to look for the right people. Then, once you have those people, it's about supporting them in the right way: providing them a clear career path and the right mentorship so that they continue to grow and develop. A lot of organizations today offer education stipends, which is a really great idea, because data science is a field that's evolving at such a pace that you have to constantly be learning and keeping yourself updated.

Within that people bucket, it's all about data literacy, and having an organization with high levels of data literacy, where it's not just the data scientists and engineers, the data specialists, who are highly data literate. Data informs decisions, and there is a culture of challenging assumptions, views, and beliefs if they're not supported by data. People are not so attached to their views and assumptions that, if data challenges them, they struggle to change their minds. This is a really hard thing to accomplish, because it's such a big cultural shift, but this is what keeps everything running like a well-oiled machine.

All three of these, data, processes, and people, are intricately connected, with people at the center of all of these things. If you don't have good processes, then you may not be able to derive the value that you need out of your data science investments. You may have a fantastic team of data scientists who are constantly innovating and coming up with new products or new insights, but if you don't have the processes to take that innovation from the lab to production in a way that impacts the bottom line and gives you that competitive advantage without burning out your team, then you've really failed at deriving value out of those investments. So the processes are around adopting a lot of the best practices within data science, within engineering, and within product development. It's about having a culture of experimentation even at the enterprise level, and setting up the incentives correctly so that it's okay to fail, as long as it's not catastrophic, and you're not incentivized to always look successful or to come up with positive kinds of insights. So the processes are about putting the right things in place, starting with the leadership and then trickling all the way down to your individual data scientists. This is how I think about maturity within a data science org or a data org.

That is such a great, holistic answer, and there's definitely a lot to unpack here. I want to really focus in on the people component throughout the remainder of the episode. You mentioned there's the data literacy aspect for the broader organization, and there are also the specialists within the organization, the data science team. So let's start with the data science team. One thing I've seen you break down really well is the importance of always connecting the data science team's priorities, and the solutions they develop, with the business objectives. This has multiple dimensions, something you've titled the three Rs of data science. Do you mind sharing your thinking or framework on the considerations data teams should make when designing data systems that have business impact?

Yeah. In an ideal world, everything you build should have a business impact, because otherwise, what's the point of doing it? But of course that's not realistic, especially in the data science world, where so much of it is very experimental. It's very iterative, it can be very time consuming, and there's always more you can do; you can spend forever working on a problem. So what it comes down to is: how is this thing going to be used, and what impact is it going to have on the business? That really determines where you land on what I call the risk versus time-to-ship curve.

If you imagine a pair of axes, with time to ship on the x-axis and the risk that the product carries on the y-axis, then it's an upward-sloping line going from left to right. In the bottom left corner you have things that are very low risk and take very little time, in the top right corner are high-risk analyses or data science investments that also take a long time to develop, and then you have everything in between.

If you're in the bottom left corner, then it's analyses or data science work that is low cost and low risk, but typically low impact as well. This could be things like engagement initiatives, where it's all about volume: you're producing a new sound bite every other day. So even if you get a few things wrong, or it's not very rigorous, the impact of that is pretty low. It may upset a few people, it may cause a bit of an uproar on the one day, but tomorrow, when you release the new thing, people are going to forget about it and be talking about something else. There is very little rigor that you may want to inject. In an ideal world you want to be rigorous about all the work that you do; you want all your analyses to be reproducible, at the very least. But if time and cost considerations don't allow that, then in that bottom left corner you don't have to worry too much about it.

Then you might move a little bit further along to the right, and that's where you may have insights that you produce on some kind of regular cadence. These take a little bit longer to produce, because you need to think about the business context a little bit more, you need to do some deeper-level analyses, and you may need to reproduce them every so often to check whether those insights still hold. So there you definitely want to think about rigor a little bit more, and about the reproducibility side of things. But if it's not something that you're expecting to run very frequently, then maybe repeatability and replicability are not big considerations.

You move a little bit further along the curve and you have activities that are more medium risk, medium impact. They're more mature products: dashboards and reports that are being generated on a regular basis. There, reproducibility and repeatability become very important, because it's all about efficiency. If it's something that you're doing every month, it's not requiring any new thinking, and the people who are working on it are not necessarily learning and growing from those activities, but these are important because they keep the lights on. So with those things you want to automate, and you want to make sure that it's very efficient and takes very little time to run. That's where you need to make a little bit more investment, so you're moving to the right on that time-to-ship curve and a little bit higher in terms of risk.

You move a little bit further along and you have a lot of mission-critical activities where, if you get things wrong, there could be huge implications: maybe there's legal risk, maybe there's regulatory risk, maybe there's a huge reputation and brand risk associated with getting those things wrong. There you want to inject a lot more rigor, and leadership needs to understand that these are some pretty big risks, and pushing the team to ship very quickly when it may not be possible is maybe not the right thing to do. And then, at the very top, you have very low-volume activities. It could be those one-off, amazing, once-in-a-while kinds of products that change the way things are done in a particular industry, that change the thinking in a particular industry, and there the risks are super high. Those could be catastrophic risks that could lead to loss of life or to other major implications.

That's how I think about it. Your level of sophistication increases as you go up the curve, so you need to make a lot more investments. You need to think about how rigorous your analyses are and how you're persisting these things: if there are likely to be regulatory audits, how are you persisting your data, your models, your analyses? And then there's also the timing aspect. There are times when something might take a long time, and if you need to extend that deadline by a week or two, it may just mean that the launch or shipping of the product is delayed. That's costly, but it's not an existential risk. There are other situations where it's very binary: there's a deadline, and if you don't release by that date, then you've missed the boat. Then, again, you have to balance this need for rigor and the time to ship very finely. So it's always a balancing act, and it's about taking calculated risks: thinking about what the risks are, how likely they are to occur, what the impact is, and whether there are any mitigations I can put in place. That's really how I think about it: in terms of the risk and the impact versus the time and the cost.

That's really great. Following up here on rigor: how do you approach prioritization? As a data leader, how do you ensure that you're baking rigor, as you're describing here, into the team's process while also maintaining business priorities, making sure the team is able to move fast and ship features fast?

In my view, there's a certain level of rigor that should be injected in all activities, and a lot of it comes down to process. To my mind, if you don't have rigor in your work and analyses, then at best you may not get enough value out of your data science initiatives; at worst you could make an incorrect decision, which can potentially be catastrophic. It may put the organization out of business. So my opinion is that rigor is very important to actually derive that value out of data science initiatives, and it's important to think about it, again, in terms of risk. All else being equal, you want both rigor and speed, because the business always needs speed. They need everything yesterday, because time is money and the clock is always ticking. If you have the right processes in place, then it becomes very easy. It's a little bit like learning to code: initially it's really hard, because you have all these things to follow and you have to think differently. But once it becomes a habit to always make sure that your code is well tested and reviewed, that there's a tech review and a code review, that you have feedback loops in place and checks and balances along the way, then rigor is just a very natural part of how data science is done.

Unless you're on the very left of that curve I just described, where rigor isn't that important, taking a very scientific approach is, to my mind, always the right answer. I think a lot of organizations forget that. There is a really strong bias towards speed, which is understandable, but it can come at the cost of making the right decision. I have seen organizations spend millions upon millions pursuing things that were actually based on incorrect initial analyses, and it was because nobody reviewed them: the right processes weren't in place and nobody asked the right questions. So there's always this tension between the business priorities and needs and the need for rigor, and I think a lot of it comes down to asking the right questions. You need to ask things like: if I don't inject the right level of rigor into this analysis, what are the risks? What's the probability of each risk, what is the impact, and how can I mitigate it? And I think helping the business leaders or stakeholders see that kind of analysis, to say, "Hey, I understand the need for speed, but here's what can go wrong if we don't do this right," is usually a really critical piece. So it's about building those relationships and keeping those lines of communication open, and also, as a data scientist, knowing when it's time to go really deep and when it's time to come up for air and say, "Okay, this is the right place to stop, given the risks and given the time sensitivities." There's no short or easy answer to how you do these things. It's always a little bit of an art; it's a fine balancing act.

I love that answer, especially showcasing the risk here to business stakeholders. We really value speed, but the costs of cutting corners for the sake of speed, by not applying rigor, can be so massive.

One of the really important things in this aspect is also setting up the right incentives for data scientists, because there is always this bias towards speed and towards producing what I call positive insights, rather than being able to say, "Hey, this big hypothesis that we have, or this big data set that we've just invested a lot of money in, actually has very little value coming out of it." That takes a lot of gumption, a lot of courage, and a lot of integrity to go and say to senior leadership, and I think often the incentives are set up in such a way that people don't do that. Data scientists will sacrifice rigor for speed because there is this culture in some organizations where somebody who can get the insights very quickly is always recognized and rewarded, rather than somebody who's done a more thorough analysis and actually comes back and says, "Hey, I don't think there's enough value in this." A positive insight is always far more exciting for the leadership. There's a little bit of educating that needs to happen on the leadership side as well, to set up the right culture and incentives.

I couldn't agree more, and we'll expand on that in the literacy section. Maybe something mature data teams oftentimes do really well is measuring the ROI of the data team's outputs. How do you go about measuring the value and ROI of a specific data science initiative, and how does this go into your prioritization process?

Yeah, that's a really great question, and it's something that's very close to my heart. I think that data science exists within a business, and like any other investment that a business might make, you need to do that analysis: is this investment giving me enough return? It's one of the most important considerations, in my mind, when you're thinking about whether or not to invest time and data science and technology resources in any kind of data science initiative. And this isn't just a new initiative that you might be starting from scratch. It could be something like: model X has, I don't know, 80% accuracy, and stakeholders are asking you to increase the accuracy from 80 to 85 or 90 percent, because that would really help them increase sales or achieve some other cost savings. In principle, that sounds like a great idea, because of course it's going to help us improve the bottom line, so why should we not do it? But going from 80 to 85 percent accuracy might actually take an entire team of data scientists, and it may take six months to get there, because it might be a very complex problem space. When you calculate the cost of even something as simple as a model improvement on an existing initiative, which people often don't do, you may realize that the improvement in sales or revenue that the model improvement will give doesn't actually offset the investment that's required, or it may offset it over a much longer period than you were expecting. So measuring that ROI is super important.

So how do you do that? There's the cost of the initiative, and then there's the value you get from it. The cost piece is relatively easy, because compute costs are really easy to get, and the cost of human resources is also very easy to get: how many people are working on it, how much time is being spent on it, and what the salary of those people is. That's a very simple calculation. The value piece is much harder, and that's why a lot of organizations don't really do it; they really shy away from quantifying the value they're getting out of a data science investment. It's always possible to start with how the product is going to be used and tie it back to some kind of quantifiable metric or benefit. The obvious ones are increased sales and revenue, cost saved, how much time is being saved, or some other operational metric. Now, these things aren't always easy to measure. With a lot of day-to-day processes, and even decision-making processes, so much of it is very subtle, and a lot of it is muscle memory and mental models. Getting people to stop and think about how long something is taking them versus how long it took before they had this product, or how long it would take if somebody suddenly switched the lights off on this product, is really hard. That's one of the reasons why the success of data science investments really depends on the culture of the organization, and these things have to come top down. A data scientist who's developing these things may not be able to go to a senior decision maker and say, "Hey, can you keep track of how long it takes you to do certain things?" You can build your tools in a way that tracks the journey of the user through the product, or how much time they're spending navigating through it, but again, this requires a lot of thoughtfulness ahead of time, and it's often done for external products but not for internal ones. So internally, it's always very difficult to measure that value.
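The cost side of the calculation described here (people × time × salary, plus compute) can be set against an estimated stream of value to get an ROI and a payback period. The following Python sketch is only an illustration and does not describe Salesforce's method. The class and function names are hypothetical. The figures are assumptions chosen for the hypothetical "80% to 85% accuracy" case: a four-person team for six months, $15,000 per person per month, $5,000 per month of compute, and $20,000 per month of added value. The simple ROI definition is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class InitiativeCost:
    """Cost side of a data science initiative. All inputs are hypothetical."""

    headcount: int          # people working on the initiative
    months: float           # duration of the work
    monthly_salary: float   # assumed fully loaded cost per person per month
    monthly_compute: float  # assumed compute spend per month

    def total(self) -> float:
        # People cost (headcount x time x salary) plus compute over the same period.
        people_cost = self.headcount * self.months * self.monthly_salary
        compute_cost = self.months * self.monthly_compute
        return people_cost + compute_cost


def roi(cost: float, value: float) -> float:
    """Simple ROI: net value returned per unit of cost."""
    return (value - cost) / cost


def payback_months(cost: float, monthly_value: float) -> float:
    """Months of realized value needed to recoup the cost (inf if no value)."""
    if monthly_value <= 0:
        return float("inf")
    return cost / monthly_value


if __name__ == "__main__":
    # Hypothetical model-improvement project: all numbers are illustrative.
    cost = InitiativeCost(headcount=4, months=6, monthly_salary=15_000, monthly_compute=5_000)
    monthly_uplift = 20_000  # assumed extra value per month once the better model ships

    total = cost.total()
    print(f"total cost: {total:,.0f}")  # total cost: 390,000
    print(f"payback: {payback_months(total, monthly_uplift):.1f} months")  # payback: 19.5 months
    print(f"12-month ROI: {roi(total, 12 * monthly_uplift):.1%}")  # 12-month ROI: -38.5%
    print(f"24-month ROI: {roi(total, 24 * monthly_uplift):.1%}")  # 24-month ROI: 23.1%
```

Under these made-up numbers, the improvement loses money over one year and pays back only in its second year. This is the "offset over a much longer period than you were expecting" situation. The sketch assumes the value starts immediately and ignores discounting. A fuller analysis would start the value stream after the six months of development.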
But even if you can track things like the number of decisions that are powered by this product or this insight, or what kind of decision it is, then to my mind that is very valuable, more so than just sending out a survey and getting qualitative feedback. That's very important, don't get me wrong, but it doesn't necessarily help you get to that ROI. A lot of it is about asking the right questions and really pushing the stakeholders on them. An example might be: "Okay, this tool helps me make better estimates of how much I'm going to sell this quarter." Okay, why is that important? "Oh, because it may help me understand where I need to focus." Okay, why is that important, or what happens if you don't have that information? What is the cost of focusing on the wrong areas? You have to keep digging a lot, and this isn't always easy, and it's not always welcome either, because people often don't have time. So it's a tricky one to navigate, but it's definitely possible to do, and it should be done. And then, coming to the prioritization part of it, oftentimes there is this bias towards making things very sensational. Insights can be very interesting, they can be mind-blowing, and you can really sensationalize them, but if there's no actionability coming out of them, then to my mind it's just a fun fact, and it should not be prioritized over something that is maybe less sensational but more impactful. But again, this is a cultural thing, and it requires a lot of push from the leadership to focus on the right things. In reality, a lot of other things come into play. Who's making the request? Is it coming from the CEO, or from someone much further down the chain? What is the likely adoption of this product? Who else in the company is it going to impact, and are those the people you want to get in front of and get your product in front of? So there's all this human side of things that goes into the prioritization
decision, but if you take that out, then to my mind it's always about the impact and how much it's helping the business achieve its goals and objectives.

That's really great. I love that last point on sensationalizing certain insights. It kind of mirrors how we think about data science and AI in the public as well: for example, breathtaking research results that are really important, such as an AI playing Go. I would argue a customer churn model is much more useful for an organization than something that sophisticated, right?

Yeah, and that whole culture of sensationalization also incentivizes people in the wrong way. Like I was saying earlier, people may be tempted to skip some of the best practices when it comes to how they're training their model, or even how they're selecting the data for training, validation, or testing. At one level it can lead the organization to make suboptimal decisions, but at another level it also sets the wrong expectations of what data science or AI or machine learning can do, not just within the organization but also more broadly among the general public. And that can go two ways: it can cause a lot of excitement, but it can also cause a lot of fear, and I think as a community we often do ourselves a disservice by sensationalizing the wrong things.

You see it a lot, especially today, in the AGI talk around GPT-3, DALL-E 2, et cetera. These are great models, but at the same time we need to have somewhat tempered expectations about where the field is headed and how we think about AI in the future. And of course, having a high-impact data science team also means scaling data teams and organizing them, so I'd love to segue into what you've seen as a great way to organize data teams for impact. Some organizations have a centralized data team; others have more of an embedded model. Which model did you find most effective for building high-impact data teams?

It really depends on where your
organization is in terms of maturity and its needs. There's no one-size-fits-all solution. Small organizations and startups often start with very centralized teams that do everything, and stay with them while they're small. Those teams touch all products and all systems, they do end-to-end development, and they're very close to the front end, the back end, and the customers. In terms of skill set, you're looking for breadth rather than depth, because they often don't have the time to go into a lot of in-depth work. As the organization grows and matures, the centralized team tends to start splitting into squads or sub-teams that focus on a particular product or area of the business. They begin to specialize, and at that point they become embedded within a business function, and that embedded model makes a lot more sense. So most medium to large organizations that engage in meaningful data science work will tend more towards this model. Personally, I believe that after a certain point a hybrid model works really well, so you want a combination of both: a central data team and then a number of embedded teams. The central data science team would own all things related to the actual data: data cataloging, stewardship, some level of logging the metadata and analyzing some of that. They would own a lot of the data quality and observability initiatives, documentation, data lineage tracking, and all of those things. They also do a very different kind of data science work. They might look at things like entity resolution, because they've got a lot of different data sets coming from all over the place, internally and externally. They may be engaged in building knowledge graphs, which the more functional teams can then leverage to draw insights. They may be working on things like anomaly detection, to understand what kinds of data cleaning might need to be applied, what is the right kind of
transformation to apply, what's the right level of aggregation, and things like that. And they need to be able to do all of these things at scale, because they look after the entire organization's data. They maintain that inventory, and they also own decisions like when it's time to deprecate a certain data set, what's going to replace it, or how to persist some of these things. Then you have the more embedded teams, and they do very different kinds of data science work. They leverage or build on what the centralized team is doing, and they specialize in or support specific business areas: sales, marketing, finance, or customer success. These teams don't really need to know how their particular subset of data connects to everything else in the organization. They have a narrower focus, and the business context and the specific problem they're trying to solve might be a lot more important. So a lot of their work may be more tactical than strategic; the strategic work is owned by the more centralized team. Which model works well for an organization again depends on the size, on the resources, and also on whether there is enough drive from the leadership to create this more centralized organization, because that is a huge investment in terms of data, people, and technology. If you're not already moving towards that model, then there are a lot of political and people-related matters that also need to be considered. But personally, I think that beyond a certain size of organization, that is the model that works best.

That's really great, and I completely agree with the hybrid model approach. We've seen it a lot as well in really mature data teams across the spectrum. So we've talked a lot throughout the episode about the data team itself, from the people component. I want to expand it to outside of the data team, and I want to start off with
the interaction between the data team and the rest of the organization. Collaboration with business teams is super important, because ultimately data science teams serve the business teams. What are ways you've been able to develop trust between your business partners and the data team?

It's all about relationships. Whenever there are people involved, it's all about the relationships. You may build the most amazing, cool data science product, but if you don't have the right relationships, if you're not speaking to the right people, it may never get adopted and may never see the light of day. So it's really about the relationships you build. It's about having empathy for your users, understanding what their pain points are, and solving for those, rather than doing it the other way around, which I also see a lot of teams and organizations do. They'll go and build this amazing product and then try to sell it to the customers: "Hey, you should really do this." And the customers say, "We don't need it. What we have works absolutely fine. We don't need this very cool product that is very complex to understand and will take us a long time to onboard." There's a lot of friction there. Really understanding what your users need and what their pain points are is super important. It's also about how responsive you are to their needs. By that I don't mean that you give them everything they want, because they may go away and read about some very sensational, cool thing that somebody's done and say, "Oh, I want one of those." But that's not what they need; it's going to be a huge burden on the data scientists, and it's going to drive very little or potentially no value. It's about understanding what they're trying to accomplish by getting that really cool thing: what is the actual job to be done, the problem to be solved, and addressing those needs. And it's about enabling your users to get the most value out of what you're giving them. A lot of trust also
comes from educating your stakeholders, speaking their language, and not trying to drown them in data science jargon and all the very technical language and complexities of your models and analyses. If they're interested in that, and if they're technically very capable, sure, go down that road, but if they're not, then be mindful of that. Speak their language; show them how it's going to improve their processes or make their lives better. It's about setting and managing expectations as well. Don't promise them that you can solve all their problems or get very high accuracy on your predictions if you're not going to be able to do that because you don't have the right data or enough data. Be very transparent about these things and say, "Here's what is possible and here's what isn't. This is what you will be able to do and here's what you won't, but we can give you some of these other things that can make your life better." It may not be a cool AI solution; a dashboard might be all you need. It's about having those open and honest conversations. It's about a lot of transparency as well: really helping them understand where some of these numbers are coming from, so a lot of explainability within your models and outputs. Say, "I think this is what the forecast is going to be this month," or "This is how much I forecast sales are going to be this month, and these are the main drivers behind it," so that they know what's driving those numbers, but also what they can do to change them. If you know what the drivers are, if you know what levers you can pull, that drives a lot more actionability. And when they see those results, and when they see how the product is actually impacting the business, I think that can help you earn a lot of that trust.

That's really great. And speaking of communication between the data team and its stakeholders, how important
do you find data culture or data literacy in enabling business stakeholders to become more effective partners for the data team? And should the data team invest in the data culture of the rest of the organization?

Data culture is everything. If there isn't enough data culture or literacy, then you're talking two different languages, and no matter how well-intentioned, things may not land the way you hope. If you have a data-literate stakeholder, they will be a far more engaged, informed partner, versus somebody who's very passive or overwhelmed by what you're giving them because they're not accustomed to working that way or making their decisions that way. And then at the other end of the spectrum is the very disengaged, uninterested stakeholder. They won't say, "No, this is really terrible, I don't like it, I don't want it." They also won't say, "Yeah, this is wonderful, I'm going to use it." They'll just be like, "Yeah, sure, whatever," and then it goes into this black hole, never to see the light of day. That's the difference between having a very engaged, educated, data-literate stakeholder versus not. When you have a data-literate business partner, you can actually co-create. They will come to you with problems, which creates interesting work for the data scientists and also addresses their pain points. There is this really great feedback loop, a really great virtuous circle, where the business is getting value and data scientists are engaged and doing amazing work, and those synergies are super important for keeping everyone engaged and interested. But that burden shouldn't be on the data scientists alone. They cannot change the culture of the organization by themselves. There needs to be a lot of executive sponsorship; a lot of these things need to start at the top. The data scientists definitely need to invest time in building that data science
brand, engaging with their stakeholders, and helping to educate them, but it cannot be the responsibility of the data scientists alone. It has to come from the leadership.

I couldn't agree more. Finally, as we close out, Anjali, do you have any final calls to action before we wrap up today?

Yes. If you are a non-data professional, definitely become data literate. Teach your kids how to be data literate and to ask important questions of the data and the world around them. Back in the day you could say seeing is believing; you can no longer say that. There are deepfakes, there's all kinds of stuff, and if you're not aware of these things, if you're not asking the right questions of the data and of what you're being fed by the media, which could be oversensationalized, then you're really doing yourself and future generations a disservice. And if you are a data professional, if you are a data scientist, then make sure that you are picking up all the right skills. By that I don't mean the latest and greatest deep learning models. Sure, do those things, but if your fundamentals are not in place, you're going to struggle to add value, so get your basics right. Pick up machine learning engineering (MLE) skills, because those are super important, particularly if you want to go into roles that are not R&D roles, roles that have business impact. If you don't have the right engineering skills, you're not going to be able to take your ideas and models into production and drive that value. So learn the best practices from software engineering, from product development, and from data science, and really build those skills in addition to your technical skills. Those would be my two calls to action.

That's awesome. Thank you so much, Anjali, for coming on DataFramed.

Of course. Thank you so much for having me; it's been a lot of fun.

You've been listening to DataFramed, a podcast by DataCamp. Keep connected with us by subscribing to the show in your favorite podcast player.
Please give us a rating, leave a comment, and share episodes you love. That helps us keep delivering insights into all things data. Thanks for listening. Until next time.

You're listening to DataFramed, a podcast by DataCamp. In this show, you'll hear all the latest trends and insights in data science. Whether you're just getting started in your data career or you're a data leader looking to scale data-driven decisions in your organization, join us for in-depth discussions with data and analytics leaders at the forefront of the data revolution. Let's dive right in.

Hello everyone, this is Adel, data science educator and evangelist at DataCamp. One of the things we always think about on the podcast is how to make a data team impactful: how data teams go beyond hype and promises they cannot keep and become an actual strategic asset that accelerates the organization's ability to provide value. To do this, data teams need to balance rigor, business impact, relationships, and more, and this is why I'm so excited to have Anjali Samani on today's podcast. Anjali is the director of data science and data intelligence at Salesforce. She is a senior data science leader with over 15 years of experience in multinational corporations, startups, and public sector organizations in the US and the UK. She's led her fair share of impactful data teams, and she brought it in spades in today's episode. Throughout our chat, we talk about how she defines an impactful data team, how to align data science projects with business value, balancing rigor with speed as a data scientist, the importance of data culture, how to manage stakeholders, and much, much more. If you enjoyed this episode, make sure to rate, comment, and subscribe, but only if you liked it. Now, on to today's episode.

Anjali, it's great to have you on the show.

Thank you for having me, Adel. It's great to be here.

I'm really excited to speak with you about your experience leading data teams, how you define a mature data science organization, and how to build an
impactful, diverse team. But beforehand, can you give us a bit of background about yourself and what got you into the data space?

Sure. I work at Salesforce on a team called Data Science Applications, which is part of a broader organization called Data Intelligence. Data Intelligence is a very diverse team comprising four sub-teams, one of which is the Data Science Applications team. There's a Strategy and Growth team, which helps the general managers of our product lines think through product strategy, figure out what kinds of metrics they should be using, and build out frameworks to drive strategy for their individual business units. Then there's Data Science Engineering; they take care of our data platforms and pipelines so that data scientists can actually do their work. Then we have a Visualization and Enablement team, who build out a lot of the interfaces for our data science products and help our users and stakeholders interact with them. And then there's the Applications team. That team builds out applications centered on AI and machine learning; we develop apps for our internal stakeholders at Salesforce to help them make better decisions. It comprises product managers, data scientists, and data engineers. Within that Data Science Applications team, I lead the US data science team, which comprises data scientists and senior data scientists.

That's really great. An awesome aspect of hosting this podcast is that I get to talk to data leaders such as yourself who've really been leading the way when it comes to making data science impactful within their organizations. You've worked at organizations with really mature data teams, and this is especially true at Salesforce in your current role. I'd love it if you could break down what you think are the hallmarks of a mature data science organization.

Yeah, sure. If I had to summarize it in one sentence, I would say that a mature data organization is one that can both consistently and efficiently, or sustainably, derive value
out of its data. When you break that down, the reason we think about consistency is that a lot of organizations are very good at spinning up one-off initiatives to derive value from their data, but that doesn't define maturity. Maturity is being able to do it very consistently, in a way that is both efficient and sustainable. So what do we mean by that? Efficiency, to me, is about driving the costs down over time. A lot of the time, when teams first start out in data science, they're not fully set up; they're still trying to find their feet. If you work out the cost of generating those insights, it can sometimes be quite high, because there's a lot of investment going into acquiring the data, setting up the tech stack, and hiring the people. But over time, as the organization matures, the costs should be going down and the value derived from data should be going up. And by sustainability I mean: how is the team extracting that value? Is it always a fire drill, or do you have good processes and technology in place so that it's running like a well-oiled machine? When I break that down, it typically comes down to people, processes, and technology, and by technology I mean both data and the actual tech stack. A mature data science organization, or mature data organization, is one where data is a first-class citizen. It's not an afterthought or a by-product of everything else. The leadership is intentional about data; data is a strategic asset, and it is treated as such. There's a lot of thinking and intentionality behind what data is collected, how it's collected, how it's processed, and why it's collected, because that also impacts everything downstream. Depending on the purpose a data set is being collected for, it may naturally introduce certain biases or nuances within that data, and if you don't understand why you're doing it and communicate that clearly to the data scientists, then you may end up with misleading insights or
outcomes. So they have all these strategies in place, they have the tech stack in place, and the technology investments are also very intentional. The data needs to be connected with the business objectives and the strategy, which is owned by the CEO's office, but the investments in data and technology also need to trickle down from the top. That's sort of the technology side of things. There's the data and how you collect it. A lot of the time, when organizations first start out, they'll start with, "I have all this data, I have lots of big data, what can I do with it?" That's a great place to start, but it's also a hallmark of a very naive, young organization when you think about it in data maturity terms. Then there's the tech stack: all the technology that goes into collecting, organizing, and persisting that data. There is also technology that enables the people who are using the data, and that could be your expert users, like your data scientists and engineers, and it could also be some of the downstream consumers of those data products. What kinds of dashboarding are you investing in? How are your users able to interact with your data products? How are you enabling your data scientists to access the data and deploy models into production? So that's the technology and data side of things. Then there's the people side of things, which is always the most complicated and the most difficult one. This is about the whole culture of the organization as well, and that starts at the top with the exec leadership, but at a more localized level it's about good hiring processes. It's about knowing what the organization needs and what good looks like within each of those roles, and those roles will also vary depending on where the business is in its growth stage and maturity levels. As an example, when the organization is still very early in its journey, it may require a lot of generalists, so they can
hire a few people who can do a whole bunch of different things and really get the systems up and running very fast. They don't necessarily need specialists. But as the organization matures, you start to see the need for folks who specialize in engineering or in data science, or even specific areas within data science. So there's understanding what roles are needed and how to look for the right people. Then, once you have those people, it's about supporting them in the right way: providing them a clear career path and the right mentorship, so that they continue to grow and develop. A lot of organizations today offer education stipends, which is a really great idea, because data science is a field that's evolving at such a pace that you have to constantly be learning and keeping yourself updated. Within that people bucket, it's all about data literacy: having an organization with high levels of data literacy, where it's not just the data scientists, engineers, or other data specialists who are highly data literate. Data informs decisions, and there is a culture of challenging assumptions, views, and beliefs if they're not supported by data. People are not so attached to their views and assumptions that they struggle to change their minds when data challenges them. This is a really hard thing to accomplish, because it's such a big cultural shift. This is what keeps everything running like a well-oiled machine. All three of these, data, processes, and people, are intricately connected, with people at the center of all of them. If you don't have good processes, then you may not be able to derive the value that you need out of your data science investments. You may have a fantastic team of data scientists who are constantly innovating and coming up with new products or new insights, but if you don't have the processes to take that innovation from the lab to production in a way that impacts the bottom line and gives you that
competitive advantage without burning out your team, then you've really failed at deriving value out of those investments. So the processes are around adopting a lot of the best practices within data science, engineering, and product development. It's about having a culture of experimentation, even at the enterprise level, and setting up the incentives correctly so that it's okay to fail, as long as it's not catastrophic, and you're not incentivized to always look successful or to come up with positive kinds of insights. So the processes are about putting the right things in place, starting with the leadership and then trickling all the way down to your individual data scientists. That's how I think about maturity within a data science org, or a data org.

That is such a great holistic answer, and there's definitely a lot to unpack here. I want to really focus in on the people component throughout the remainder of the episode. You mentioned there's the data literacy aspect for the broader organization, and there are also the specialists within the organization, the data science team. So let's start with the data science team. One thing I've seen you break down really well is the importance of always connecting the data science team's priorities, and the solutions they develop, to the business objectives. This has multiple dimensions, something you've titled the three R's of data science. Do you mind sharing your thinking or framework on the considerations data teams should make when designing data systems that have business impact?

Yeah. In an ideal world, everything you build should have a business impact, because otherwise what's the point of doing it? But of course that's not realistic, especially in the data science world, where so much of the work is very experimental, very iterative, and can be very time-consuming, and there's always more you can do. You can spend forever working on a problem. So what it comes down to is: how is this thing going to be
used, and what impact is it going to have on the business? Because that really determines where you land on what I call the risk versus time-to-ship curve. Imagine a pair of axes: on the x-axis you have the time to ship, and on the y-axis you have the risk that the product carries. It's an upward-sloping line going from left to right. In the bottom-left corner you have things that are very low risk and take very little time; in the top-right corner are high-risk analyses or data science investments that also take a long time to develop; and then you have everything in between. If you're in the bottom-left corner, it's analyses or data science work that is low cost and low risk but typically low impact as well. This could be things like engagement initiatives, where it's all about volume; you're producing a new sound bite every other day. So even if you get a few things wrong, or it's not very rigorous, the impact of that is pretty low. It may upset a few people, it may cause a bit of an uproar one day, but tomorrow, when you release the new thing, people are going to forget about it and be talking about something else. There's very little rigor that you may want to inject there. I mean, in an ideal world you want to be rigorous about all the work that you do; you want to make all your analyses reproducible at the very least. But if time and cost considerations don't allow it, then in that bottom-left corner you don't have to worry too much about it. Then you might move a little further along to the right, and that's where you may have insights that you produce on some kind of regular cadence. These take a little longer to produce, because you need to think about the business context a bit more, you need to do some deeper-level analyses, and you may need to reproduce them every so often to check whether those insights still hold. So
there you definitely want to think about rigor a little more, and about reproducibility, but if it's not something that you're expecting to run very frequently, then maybe repeatability and replicability are not big considerations. You move a little further along the curve, and then you have activities that are more medium risk, medium impact: more mature products, dashboards, reports that are generated on a regular basis. There, reproducibility and repeatability become very important, because it's all about efficiency. If it's something that you're doing every month, it's not requiring any new thinking, and the people who are working on it are not necessarily learning and growing from those activities, but these are important because they keep the lights on. So with those things you want to automate; you want to make sure it's very efficient and takes very little time to run. That's where you need to make a bit more investment, so you're moving to the right on that time-to-ship curve, and you're moving a little higher in terms of risk. You move a little further along and you have a lot of mission-critical activities where, if you get things wrong, there could be huge implications: maybe there's legal risk, maybe there's regulatory risk, maybe there's huge reputational and brand risk associated with getting those things wrong. There you want to inject a lot more rigor, and leadership needs to understand that these are some pretty big risks, and pushing the team to ship very quickly when it may not be possible is maybe not the right thing to do. And then, when you get to the very top, you have very low-volume activities. It could be those one-off amazing products, the once-in-a-while kind of products that change the way things are done in a particular industry, that change the thinking in a particular industry, and there the risks are super high. Those
could be catastrophic risks, where it could lead to loss of life or other major implications. That's how I think about it. Your level of sophistication increases as you go up the curve, so you need to make a lot more investments. You need to think about how rigorous your analyses are and how you're persisting these things. If there are likely to be regulatory audits, how are you persisting your data, your models, your analyses, all of these things? I really think about it in those terms. And then there's also the timing aspect of it. There are times when something might take a long time, and if you need to extend that deadline by a week or two, it may just mean that the launch of the product, or shipping, is delayed. It's costly, but it's not an existential risk. There are other situations where it's very binary: if you don't release by a certain date, because there is a deadline, then you've missed that boat. Then again you have to balance this need for rigor and the time to ship very finely. So it's always a balancing act, and it's about taking calculated risks. It's about thinking: what are the risks, how likely is this to occur, what is the impact, and are there any mitigations that I can put in place? That's really how I think about it: in terms of the risk and the impact versus the time and the cost.

That's really great. Following up on rigor, how do you approach prioritization? As a data leader, how do you ensure that you're baking rigor, as you're describing here, into the team's process, while also maintaining business priorities and making sure the team is able to move fast and ship features fast?

In my view, there's a certain level of rigor that should be injected in all activities, and a lot of it comes down to process. So to my mind, if you
don't have rigor in your work and analyses, then at best you may not get enough value out of your data science initiatives, and at worst you could make an incorrect decision that is potentially catastrophic and could put the organization out of business. So my opinion is that rigor is essential to actually derive value from data science initiatives, and, again, it's important to think about it in terms of risk. All else being equal, you want both rigor and speed, because the business always needs speed. They need everything yesterday, because time is money and the clock is always ticking. If you have the right processes in place, it becomes much easier. It's a bit like learning to code: initially it's really hard, because you have all these rules to follow and you have to think differently. But once it becomes a habit to make sure your code is well tested and reviewed, with a tech review and a code review, feedback loops, and checks and balances along the way, rigor becomes a natural part of how data science is done. Unless you're on the very left of the curve I just described, where rigor isn't that important, taking a very scientific approach is, to my mind, always the right answer. I think a lot of organizations forget that. There is a really strong bias towards speed, which is understandable, but it can come at the cost of making the right decision. I have seen organizations spend millions upon millions pursuing things that were based on incorrect initial analyses, because nobody reviewed them, the right processes weren't in place, and nobody asked the right questions. So there's always this tension between business priorities and needs and the need for rigor, and a lot of it comes down to asking the right questions. You need to ask things like: if I don't inject the
right level of rigor into this analysis, what are the risks? That means the risk itself, the probability of the risk, the impact, and how I can mitigate it. Helping business leaders and stakeholders see that kind of analysis, to say "I understand the need for speed, but here's what can go wrong if we don't do this right," is usually a really critical piece. So it's about building those relationships and keeping those lines of communication open. As a data scientist, it's also about knowing when it's time to go really deep and when it's time to come up for air and say, "Okay, this is the right place to stop, given the risks and the time sensitivities." There's no short or easy answer to how you do these things. It's always a bit of an art, a fine balancing act.

I love that answer, especially showcasing the risk to business stakeholders. We really value speed, but the cost of cutting corners for speed, of not applying rigor, can be massive.

One of the really important things here is also setting up the right incentives for data scientists. There is often a bias towards speed and towards producing what I call positive insights, rather than being able to say, "This big hypothesis we have, or this big dataset we've just invested a lot of money in, actually has very little value coming out of it." That takes a lot of gumption, a lot of courage, and a lot of integrity to say to senior leadership, and I think the incentives are often set up in such a way that people don't do it. Data scientists will sacrifice rigor for speed, because in some organizations the culture is that somebody who can get insights very quickly is always recognized and rewarded, rather than somebody who has done a thorough analysis and comes back and says, "I don't think there's enough value in this." Something that's
a positive insight is always far more exciting for leadership. So there's a bit of educating that needs to happen on the leadership side as well, to set up the right culture and incentives.

I couldn't agree more, and we'll expand on that in the literacy section. Something mature data teams often do really well is measure the ROI of the data team's outputs. How do you go about measuring the value and ROI of a specific data science initiative, and how does that feed into your prioritization process?

That's a really great question, and it's something that's very close to my heart. Data science exists within a business, and like any other investment a business makes, you need to do that analysis: is this investment giving me enough return? It's one of the most important considerations in my mind when you're deciding whether to invest time, data science, and technology resources into any kind of data science initiative. And this isn't just about a new initiative you might be starting from scratch. It could be something like: model X has, I don't know, 80% accuracy, and stakeholders are asking you to increase it to 85% or 90%, because that would really help them increase sales or deliver some other cost savings. In principle that sounds like a great idea. Of course it's going to help improve the bottom line, so why shouldn't we do it? But going from 80% to 85% accuracy might take an entire team of data scientists six months, because it might be a very complex problem space. When you calculate the cost of even something as simple as improving a model in an existing initiative, which people often don't do, you may realize that the improvement in sales or revenue that model improvement delivers doesn't actually offset the investment required, or that it only offsets it over a
much longer period than you expected. So measuring that ROI is super important. How do you do that? There's the cost of the initiative, and then there's the value you get from it. The cost piece is relatively easy. Compute costs are easy to get, and the cost of human resources is also easy to get: how many people are working on it, how much time they're spending on it, and what those people are paid. That's a very simple calculation. The value piece is much harder, and that's why a lot of organizations don't really do it. They shy away from quantifying the value they're getting out of a data science investment. But it's always possible to start with how the product is going to be used and tie it back to some kind of quantifiable metric or benefit. The obvious ones are increased sales and revenue, costs saved, time saved, or some other operational metric. Now, these things aren't always easy to measure. So much of our day-to-day processes, and even our decision-making processes, is very subtle; a lot of it is muscle memory and mental models. Getting people to stop and think about how long something takes them now versus how long it took before they had this product, or how long it would take if somebody suddenly switched the lights off on this product, is really hard. That's one of the reasons the success of data science investments really depends on the culture of the organization, and these things have to come from the top down. A data scientist who's developing these things may not be able to go to a senior decision-maker and say, "Can you keep track of how long it takes you to do certain things?" You can build your tools so that they track the user's journey through the product, or how much time users spend navigating it, but again, this requires a lot of thoughtfulness ahead of time, and
this is often done for external products but not for internal ones. So internally, it's always very difficult to measure that value. But even if you can track things like the number of decisions powered by a product or insight, or what kind of decisions they are, that is to my mind very valuable, more so than just sending out a survey and getting qualitative feedback. That's very important, don't get me wrong, but it doesn't necessarily get you to the ROI. A lot of it is about asking the right questions and really pushing stakeholders on them. An example might be: "This tool helps me make better estimates of how much I'm going to sell this quarter." Okay, why is that important? "Because it helps me understand where I need to focus." Okay, why is that important? What happens if you don't have that information? What is the cost of focusing on the wrong areas? You really have to keep digging, and this isn't always easy, nor is it always welcome, because people often don't have time. So it's a tricky one to navigate, but it's definitely possible, and it should be done. Coming to the prioritization part: there is often a bias towards making things very sensational. Insights can be very interesting, even mind-blowing, and you can really sensationalize them, but if there's no actionability coming out of them, then to my mind they're just fun facts and should not be prioritized over something that may be less sensational but more impactful. Again, this is a cultural thing, and it requires a lot of push from leadership to focus on the right things. In reality, a lot of other things come into play. Who's making the request: is it coming from the CEO or from someone much further down the chain? What is the likely adoption of this product? Who else in the company is it going to impact? Are those the people you want to get in front of, the people you
want to get your product in front of? There are all these other human factors that go into the prioritization decision, but if you take those out, then to my mind it's always about impact and how much the work helps the business achieve its goals and objectives.

That's really great. I love that last point on sensationalizing certain insights. It mirrors how we think about data science and AI in public as well: breathtaking research results that are really important, such as an AI playing Go. I would argue that a customer churn model, for example, is much more useful for an organization than something that sophisticated, right?

Yeah, and that whole culture of sensationalization also incentivizes people in the wrong way. Like I was saying earlier, people may not follow all the best practices when it comes to how they train their models, or even how they select data for training, validation, and testing. At one level, that can lead the organization to make suboptimal decisions, but at another level it also sets the wrong expectations of what data science, AI, or machine learning can do, not just within the organization but more broadly among the general public. That can go two ways: it can cause a lot of excitement, but it can also cause a lot of fear. I think as a community we often do ourselves a disservice by sensationalizing the wrong things.

You see it a lot, especially today, in the AGI talk around GPT-3, DALL-E 2, et cetera. These are great models, but at the same time we need to temper our expectations, to a certain extent, about where the field is headed and how we think about AI in the future. And of course, having a high-impact data science team also means scaling and organizing data teams. I'd love to segue into what you've seen as a great way to organize data teams for impact. Some organizations have a centralized data team; other organizations
have more of an embedded model. Which model have you found most effective for building high-impact data teams?

It really depends on where your organization is in terms of maturity and its needs. There's no one-size-fits-all solution. Small organizations and startups often start with, and while they're small stay with, very centralized teams that do everything. They touch all products and all systems, they do end-to-end development, and they're very close to the front end, the back end, and the customers. In terms of skill set, you're looking for breadth rather than depth, because often they're not required to, and don't really have the time to, go into a lot of in-depth work. As the organization grows and matures, the centralized team tends to split into squads or sub-teams that focus on a particular product or area of the business. They begin to specialize, at that point they become embedded within a business function, and the embedded model makes a lot more sense. So most medium to large organizations that engage in meaningful data science work will tend towards that model. Personally, I believe that after a certain point a hybrid model works really well: a combination of a central data team and a number of embedded teams. The central data science team would own everything related to the data itself: data cataloging and stewardship, some level of logging and analyzing the metadata, a lot of the data quality and observability initiatives, documentation, data lineage tracking, and so on. They also do a very different kind of data science work. They might look at things like entity resolution, because they have lots of different datasets coming from all over the place, internally and externally. They may build knowledge graphs that the more functional teams can then leverage to draw insights. They may be
working on things like anomaly detection to understand what kinds of data cleaning need to be applied, what the right transformations are, what the right level of aggregation is, and so on. They need to be able to do all of this at scale, because they look after the entire organization's data and maintain that inventory. They also own decisions like when it's time to deprecate a certain dataset, what will replace it, and how to persist these things. Then you have the more embedded teams, which do a very different kind of data science work. They leverage or build on what the centralized team is doing and specialize in, or support, specific business areas: sales, marketing, finance, or customer success, for example. These teams don't really need to know how their particular subset of data connects to everything else in the organization. They have a narrower focus, and the business context and the specific problem they're trying to solve matter a lot more. So a lot of their work may be more tactical, whereas the strategic work is owned by the centralized team. Which model works well for an organization again depends on its size, its resources, and whether there is enough drive from leadership to create that more centralized organization, because it is a huge investment in data, people, and technology. If you're not already moving towards that model, there are a lot of political and people-related matters to consider as well. But personally, I think that beyond a certain organizational size, that is the model that works best.

That's really great, and I completely agree with the hybrid approach. We've seen it a lot as well across really mature data teams. We've talked a lot throughout the
episode about the data team itself and its people. I want to expand beyond the data team, starting with the interaction between the data team and the rest of the organization. Collaboration with business teams is super important, because ultimately data science teams serve the business. What are some ways you've been able to build trust between your business partners and the data team?

It's all about relationships. Whenever people are involved, it's all about relationships. You may build the most amazing data science product, but if you don't have the right relationships, if you're not speaking to the right people, it may never get adopted and may never see the light of day. So it's really about the relationships you build, having empathy for your users, understanding their pain points, and solving for those, rather than doing it the other way around. I see a lot of teams and organizations do that: they build an amazing product and then try to sell it to their customers, saying, "You should really use this," and the customers respond, "We don't need it. What we have works absolutely fine. We don't need a very cool product that is complex to understand and will take us a long time to onboard." There's a lot of friction there. Really understanding what your users need and what their pain points are is super important. It's also about how responsive you are to their needs. By that I don't mean you give them everything they want, because they may go away, read about some very sensational, cool thing somebody's done, and say, "I want one of those." But that's not what they need; it would be a huge burden on the data scientists and potentially drive little or no value. It's about understanding what they're trying to accomplish by getting that really cool thing: what is the actual job to be done, the problem to be
solved, and addressing those needs. It's about enabling your users to get the most value out of what you're giving them. A lot of trust also comes from educating your stakeholders and speaking their language, rather than drowning them in data science jargon and the technical complexities of your models and analyses. If they're interested in that and technically very capable, sure, go down that road. But if they're not, be mindful of that: speak their language, and show them how it's going to improve their processes or make their lives better. It's also about setting and managing expectations. Don't promise that you can solve all their problems or achieve very high accuracy on your predictions if you won't be able to, because you don't have the right data or enough data. Be very transparent about these things: "Here's what is possible and here's what isn't. This is what you will be able to do and what you won't, but we can give you some other things that can make your life better." It may not be a cool AI solution; a dashboard might be all you need. It's about having those open and honest conversations. It's also about a lot of transparency, really helping them understand where the numbers are coming from, so a lot of explainability in your models and outputs. Say, "This is how much I forecast sales will be this month, and these are the main drivers behind it," so they know what's driving those numbers and also what they can do to change them. If you know what the drivers are and what levers you can pull, that drives a lot more actionability. And when they see those results and how the product is actually impacting the business, I think that can
help you earn a lot of that trust.

That's really great. Speaking of communication between the two sides, how important do you find data culture or data literacy in enabling business stakeholders to become more effective partners for the data team? And should the data team invest in the data culture of the rest of the organization?

Data culture is everything. If there isn't enough data culture or literacy, you're speaking two different languages, and no matter how well-intentioned you are, things may not land the way you hope. A data-literate stakeholder will be a far more engaged and informed partner than somebody who is passive or overwhelmed by what you're giving them, because they're not accustomed to working or making decisions that way. At the other end of the spectrum is the disengaged, uninterested stakeholder. They won't say, "No, this is terrible, I don't want it," and they won't say, "This is wonderful, I'm going to use it." They'll just say, "Yeah, sure, whatever," and then it goes into a black hole, never to see the light of day. That's the difference between having an engaged, data-literate stakeholder and not. When you have a data-literate business partner, you can actually co-create. They come to you with problems, which creates interesting work for the data scientists and also addresses their pain points. There's a really great feedback loop, a virtuous circle, where the business is getting value and the data scientists are engaged and doing amazing work, and those synergies are super important for keeping everyone engaged and interested. But that burden shouldn't be on the data scientists alone. They cannot change the culture of the organization by themselves. There needs to be a lot of executive
sponsorship; a lot of these things need to start at the top. Data scientists definitely need to invest time in building the data science brand, engaging with their stakeholders, and helping to educate them, but it cannot be the responsibility of data scientists alone. It has to come from leadership.

I couldn't agree more. Finally, as we close out, Anjali, do you have any final calls to action before we wrap up today?

Yes. If you are a non-data professional, definitely become data literate. Teach your kids how to be data literate and to ask important questions of the data and the world around them. Back in the day you could say seeing is believing; you can no longer say that. There are deepfakes, there's all kinds of stuff. If you're not aware of these things, if you're not asking the right questions of the data and of what you're being fed by the media, which could be over-sensationalized, then you're really doing yourself and future generations a disservice. And if you are a data professional, a data scientist, make sure you're picking up the right skills. By that I don't mean the latest and greatest deep learning models. Sure, do those things, but if your fundamentals are not in place, you're going to struggle to add value, so get your basics right. Pick up machine learning engineering (MLE) skills, because those are super important, particularly if you want to go into roles that are not R&D roles but roles with business impact. If you don't have the right engineering skills, you're not going to be able to take your ideas and models into production and drive that value. So learn the best practices from software engineering, product development, and data science, and really build those skills in addition to your technical skills. Those would be my two calls to action.

That's awesome. Thank you so much, Anjali, for coming on DataFramed.

Of course. Thank you so much for having me. It's been a lot of fun.
You've been listening to DataFramed, a podcast by DataCamp. Keep connected with us by subscribing to the show in your favorite podcast player. Please give us a rating, leave a comment, and share episodes you love; that helps us keep delivering insights into all things data. Thanks for listening. Until next time.
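Anjali's ROI framing from earlier in the episode is simple arithmetic once the inputs are written down: cost is compute plus headcount multiplied by time and pay, and value is the quantifiable benefit the work delivers. The Python sketch below applies it to her hypothetical case of lifting a model from 80% to 85% accuracy. It is a back-of-the-envelope illustration under stated assumptions, not her method: the team size, salary, compute spend, and monthly uplift are made-up numbers, and `Initiative`, `initiative_cost`, `roi`, and `payback_months` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Initiative:
    """Inputs for a back-of-the-envelope ROI estimate (all values hypothetical)."""
    people: int            # data scientists assigned to the work
    months: float          # duration of the effort
    monthly_salary: float  # fully loaded cost per person per month
    compute_cost: float    # total compute spend for the effort
    monthly_value: float   # estimated extra revenue or savings per month once shipped


def initiative_cost(i: Initiative) -> float:
    # Cost side: people x time x pay, plus compute.
    return i.people * i.months * i.monthly_salary + i.compute_cost


def roi(i: Initiative, horizon_months: float) -> float:
    # Net return over the horizon divided by cost; negative means not yet recovered.
    cost = initiative_cost(i)
    return (i.monthly_value * horizon_months - cost) / cost


def payback_months(i: Initiative) -> float:
    # Months of post-launch value needed to offset the investment.
    return initiative_cost(i) / i.monthly_value


# Assumed scenario: the 80% -> 85% accuracy upgrade takes 4 people 6 months.
upgrade = Initiative(people=4, months=6, monthly_salary=15_000,
                     compute_cost=40_000, monthly_value=20_000)

print(initiative_cost(upgrade))                   # 400000
print(payback_months(upgrade))                    # 20.0
print(round(roi(upgrade, horizon_months=12), 2))  # -0.4
```

With these invented figures the upgrade loses money over a one-year horizon and only pays for itself after roughly 20 months, the situation Anjali describes where the investment is recouped much later than expected. The `monthly_value` input is the hard-to-measure part she warns about, so it is the estimate most worth challenging with stakeholders.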

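Her point about explainability, giving stakeholders a forecast together with its main drivers and the levers they can pull, can be illustrated with an additive model in which each driver contributes its coefficient times its value. This is a minimal sketch assuming a hypothetical linear sales forecast; the baseline, driver names, and coefficients are invented for illustration. Non-linear models would need a dedicated attribution technique (SHAP values, for instance) rather than this direct decomposition.

```python
# Hypothetical linear sales forecast: forecast = baseline + sum(coef * value).
# All names and numbers are illustrative assumptions.
BASELINE = 100_000.0

COEFFICIENTS = {
    "marketing_spend": 1.5,         # extra sales per dollar of marketing spend
    "open_pipeline_deals": 800.0,   # extra sales per open deal in the pipeline
    "churned_accounts": -1_500.0,   # sales lost per churned account
}


def explain_forecast(inputs: dict[str, float]) -> tuple[float, list[tuple[str, float]]]:
    """Return the forecast and each driver's contribution, largest effect first."""
    contributions = {name: COEFFICIENTS[name] * inputs[name] for name in COEFFICIENTS}
    forecast = BASELINE + sum(contributions.values())
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return forecast, ranked


forecast, drivers = explain_forecast(
    {"marketing_spend": 20_000, "open_pipeline_deals": 40, "churned_accounts": 10}
)
print(f"Forecast: {forecast:,.0f}")
for name, effect in drivers:
    print(f"  {name}: {effect:+,.0f}")
```

Because the model is additive, each lever is explicit: under these assumed coefficients, ten more open deals would raise the forecast by 8,000, which is the kind of actionable statement a stakeholder can work with.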