The Power of Innovation and Talent Acquisition in Large Legacy Companies
In today's fast-paced business landscape, large legacy companies are under immense pressure to innovate and stay ahead of the curve. One key area where they can tap into this innovation is through their data science teams. Jack, the chief data officer at ADP, explains that his data scientists are matrixed out to project teams while a dedicated leader stays in charge of data science, "because they need career paths. I think that's the important thing." He emphasizes the importance of creating a community of practice among product managers and data scientists, which allows them to work together on projects and share ideas.
This community of practice is crucial in driving innovation within the organization. Jack notes that "people often will say, well, it's a guild, and I'm not sure it's a guild as much, because there is an org of product managers, there's an org of data scientists." The projects themselves, however, are executed in cross-functional groups, and the person leading one could be a data scientist, an engineering leader, a UX leader, or a product manager. What matters, he says, is getting "the best person in charge to execute that project," regardless of where they come from.
When it comes to innovation, Jack emphasizes the importance of giving employees the freedom to try new things. Asked whether innovation is "tucked into longstanding product road maps" or comes from opportunities created by ML, he describes a mix of top-down and bottom-up: senior management sets the trajectory, then lets "the engineers and the data scientists say, hey, I have a way to solve that problem, maybe a way you hadn't thought about." He notes that innovation often arises from ideation during development cycles, and that employees should be encouraged to share their ideas and work on them.
Jack also highlights the importance of being open to change and willing to rework projects as needed. "You don't want to rework forever, but we always give the opportunity for people to do it. So we spend a lot of time talking why we want to build something and what we want to build." The teams themselves then decide what and how they want to build. This approach helps to keep employees engaged and motivated, which is essential for driving innovation.
One of the biggest challenges facing large legacy companies when it comes to talent acquisition is the perception that they are less attractive places to work. However, Jack believes that this can be turned around by emphasizing the company's strengths and values. "I think we're really benefiting right now, and we're benefiting for two reasons. One, we took some steps to get to the cloud, and, you know, at least for what I do, for what you do, people are attracted to the data. They're attracted to the idea that they can actually work on something with meaningful data." He notes that once employees are exposed to the company's culture and values, they become excited about working there.
Jack also emphasizes the importance of finding out what motivates each individual employee. "You see these sparkles, even over all the, you know, Zooming and everything else. You see a sparkle in people's eyes when they discover something, and that's a spreading activation." He believes that this approach helps to keep employees energized and motivated, which is essential for driving innovation.
In conclusion, Jack highlights the importance of creating a community of practice among product managers and data scientists, giving employees the freedom to try new things, and emphasizing the company's strengths and values. By doing so, large legacy companies can tap into their employees' creativity and drive innovation from within.
All right, everyone, I am here with Jack Berkowitz. Jack is the chief data officer with ADP. Jack, welcome to the TWIML AI Podcast.
Thanks, thanks for having me.
I'm looking forward to digging into our conversation. We'll be talking about some of the ways that ADP is using machine learning, but to get us started, I'd love to have you share a little bit about your background and a little bit about your career.
Sure. I've got a long career; I've been at this for 30, 31 years now, in kind of three phases. The first phase was largely doing work with DARPA, so early 90s, mid-90s, doing a lot of work on what at that time were cutting-edge things like ontologies, distributed reasoning technology, all sorts of stuff that people at DARPA would normally equate. Then I spent about 12 years doing various startups, some in the semantic web, some in search, some in analytics. And then for the past 10 years, split between Oracle and now ADP, really running large-scale, enterprise-level analytics products. So we build products for other people, and my job at ADP is a combination of building products for clients and then also doing some things internally that ADP uses.
Awesome, awesome. And when you started at ADP, you started out in a product-centric role and transitioned over to CDO. I'd love to have you share a little bit about that transition, how it came about, and, you know, it sounds like you still retain some product ownership there.
Yeah, I sure do. So the CDO role really has two faces to it. On the one hand, we build product for clients. We have, you know, tens of thousands of clients that use our analytics for HR: people analytics, compensation analytics, recruiting, things like that. And then we also have a defensive nature to the CDO organization. So besides having all the data advantage, what do we do to protect the information? And one of the big messages that I spend a lot of time on is talking about data governance.
You know, it's pretty funny, right? Because as product development people, as machine learning people, we hear the words data governance and we immediately think, okay, people are trying to get in our way, right, or trying to hold us back. And actually what it means to us is it enables us to get the data into a nice clean way so that we can share, so that if somebody builds an ML algorithm to do something, I don't know what, but for job matching, the entire community of developers can then use that algorithm for job matching, right? So that's all part of that governance that we're putting in place today.
So I'd love to come back to that data governance topic. Before we do, I've jumped right in and maybe taken it for granted that folks know who ADP is, and it may be the case that most do, but why don't you give us a quick summary of ADP and the company's focus?
Yeah, so ADP is the world's largest provider of HR, payroll, and benefit solutions. We have over 920,000 clients in 140 countries, and most people know ADP because their W-2 or their paycheck has been issued by ADP, maybe not today, but maybe at some point in their career. And if it hasn't, it will be, right? One of the things that we talk a lot about at ADP is it's not your grandfather's ADP, but it's actually my grandfather's ADP. They've been around for about 70 years. He had a small gas station, and he paid his employees and the family through ADP. But, you know, there's a series of firsts at the company: really the first company to ever use computers for pay; the first company to build what we now call software as a service (it used to be called something else in the late 90s) for doing HR; the first company to do mobile apps for the employees of companies. And so every day, or every week, you can check your payroll, you can clock in and out, you can check your 401(k) balance through your mobile app. So lots of people, about 58,000 people today,
building products and then providing great expertise. Because the one thing about HR is that it's not just about the software. There's regulations that have to be maintained, there's problems for employees or clients that need to be solved: you know, what if I'm adopting cross-border, what's the impact on my pay, or any sort of things like that. And so our CEO likes to say, hey, it's about building great products with great expertise, and that's really what we try to bring to bear.
I think one of the consequences of that long history is that the company is associated with some of those early technologies, that maybe weren't so early when it first got started. To be very clear, when I think of poster children for the mainframe, ADP is one of those companies that comes to mind. And yet we're connected because you were delivering a keynote at AWS re:Invent talking about some of the things the company was doing with machine learning as part of the Data Cloud product. Talk a little bit about what it means to be building cutting-edge, data-centric product in a company with that kind of history and legacy.
Yeah, so we still have a little bit of those mainframes running around, and guess what, so does every bank and every DMV and everything else. And it's funny, right, as somebody who's always on the cutting edge, you heard, like, DARPA and cloud and everything; you know, what do you mean? Well, the data's got to come from somewhere, right? And if you want reliability... To us, it doesn't actually matter where the data comes from. What we're able to do is harness the semantics of it, the meanings of it, bring it together, and make it available in a platform, in something that other people can use and our own teams can use. And that's the essential part of it. And so when we were looking at how do we do that in a modern way, and how do we keep up with the pace of
business (and, you know, the pace of business is just unbelievable, particularly now since COVID hit), for us, moving it into the cloud made total sense, right? And so we're syncing information every single minute back and forth between, you know, those classic on-prem data centers. But their data center is just the same as any other data center; it's not like it's running under my desk or something. And we're syncing information and we're extracting value out of it by taking advantage of all the services that we can get in the AWS cloud.
Your comment about platforms ties back to the comment about governance, and what brings them together is this idea of making it easier for your teams to be productive with that data, to pull meaning out of it, make predictions with it, that kind of thing. Can you talk a little bit about the challenges with harnessing the data that the organization has at its scale?
Yeah, let me give you a great example. I'll give you two different examples. Let me start with one, though, that really summarizes not just harnessing the meaning but also the amount of machine learning we're building. So in any given month we pay about 30 million people in the US, and for those 30 million people we've got about 21 million job titles, okay? So for about 9 million we don't know what they do for a living, but for 21 million we do. We need to crunch that data down into some common job titles; in fact, we crunch it down to about 9,000 job titles. Now, we don't do that manually, and we don't do that with rules, right? We run all that in a data-driven machine learning pipeline using some pretty advanced technology to get that done. All the latest buzzwords you can imagine, plus a few, are involved in that pipeline. And you say, well, why do that? Well, if I want to know what a software engineer makes, you would think that would be easy, but company A calls it
software engineer three, company B calls it, I don't know, development specialist, company C calls it, you know, creative machine learning guru juggler, right? And we've got to have a way to bring all that information...
Yeah, you can only imagine; everybody gets to invent their own title.
Right, and we've got to have a way to bring that together. We do that for our compensation benchmarks, we use that same exact thing in our recruiting and sourcing systems, and we do the same thing in our performance review systems. That's a great example of the challenge. And then we have the same exact challenge when we try to provide the capabilities for companies to operate in multiple countries. Things that may be obvious in the US, like holiday or vacation, are called different things in different countries, and in fact the math and the calculation of it all is different depending on the country you're in. And so we have the ability, by bringing all that information together and having models on top, to combine that information so that our clients benefit through all this processing.
And so when you talk about data governance, are you thinking about it akin to a traditional enterprise architecture: big committees deciding who can do what and where data sits and that kind of thing?
You know, I'd like to say that there are no committees and we're a reflection of the distributed internet. It's not quite there. We do have a central group, but we really try to operate in more of a hub-and-spoke or a federated way. And in that sense, a lot of people are talking about data mesh these days; we think about it as a semantic mesh or a metrics mesh. So we want the teams that are responsible to be able to operate on and have the domain knowledge. There's a central group that makes sure that things can attach, because otherwise it's just pandemonium, right? So there's a central group that ensures
that, whether it's a concept, right, a domain object, or a new metric, it can attach to some parent object. But in that case the central team's very thin, and the amount of work they do is very thin. It's really distributed. We have, not tens, but hundreds and hundreds of people in development, and we want them to have the independence to be able to create in the way that they can do best. Now, related to that is also the data lineage. It does us no good for somebody to say, hey, I've got the data source of data sources for some problem, benefits, say, if they don't provide how that data has been brought together and then managed and then brought up. So we also have artifacts: if you're going to participate in that world of this federation, this combination of information, well, then you also have the responsibility to say, hey, this is where my data came from, I'm going to take care of the data quality of it, and I'm going to advertise it to all the other people that collaborate in that sense. So, when you were asking me about my career, I said I started out and did a lot of work in the semantic web. We actually bring a lot of those concepts to bear now. Whether or not it's the formal semantics and the language, I don't know, but that idea of distributed data, self-describing data, and then attachment of data is something that we try to bring to bear across our company.
And the ultimate objective is to what, allow teams to move more quickly, or...?
Yeah, I mean, I think it's three key things to us. The first one is about pace, right? The world's busy, clients are demanding, and the world's situations are changing all the time. The second thing is about reliability, because we're messing with people's paychecks. At the end of the day, that's what we do, and that's the most personal data there is, other than maybe healthcare data, right?
If you want to see somebody get excited, make a mistake on their paycheck, right? And so we have to have a little bit of reliability in terms of what we're doing. So the first thing is about pace, the second thing is about reliability, and the third thing is really about clarity and explainability, whether it's to each other inside the development teams or whether it's to our end clients. So there's an explainability and ethics overlay to all of this as to why we're so into the data governance approach.
You gave us a couple of specific examples of machine learning use cases there. I'd be curious to have you talk broadly about the various ways that ML is used. My guess is that it's used primarily in new products that are built on top of the data assets that the company has, as opposed to that core payroll processing. Or are there use cases there as well?
You'd be surprised. Yeah, no, you'd be surprised. We have machine learning throughout the entire corporation at this point, and it's come on really fast. It's come on fast because we've separated the data, but we've also exposed all the machine learning processes through microservices or through APIs. And that allows, yes, totally brand-new products. They're intelligent from the ground up, they're everything everybody could imagine; you know, there's a chatbot, or the screen changes. But we also use it to do things on the older products, almost as overlays into them, right? So the ability to check a payroll before it's submitted and all the money goes, the ability to check it for anomalies, that's all based on new machine learning capabilities, because now that we have the data in one place we can do that, and we can provide new capabilities to existing, what are thought of as legacy, applications. And so you can make something fresh for our clients at the same time as building net-new products. And so we're getting great reception on
that innovation aspect with our clients, because we've taken some pretty hard steps in terms of building a data platform and building out the machine learning. But it really is to their benefit.
Let's dig into that platform that you've built a little bit. Is that the Data Cloud?
Yeah, so the Data Cloud is really this collection of capabilities, right? Some people identify it as the products we build, and some identify it as this platform of data. It's built on top of a set of those capabilities, whether it's Amazon EMR or Amazon SageMaker, and S3's at the bottom of it. We have some partners in there as well, some of the key partners in the data space, because there are certain things, dealing with data security for example, that are still an emerging area of technology, and so we'll bring in partners as need be. We don't want to constrain people, either, so if they've built models using, for example, libraries like BERT or GPT-3, they have access to be able to use those models as well. We want to govern that use of third-party systems, because we don't want any hacker to infiltrate our supply chain of ML, if you will. But all of it's there and all of it's available. We have a sort of standard way that either data scientists or data engineers can enter and interact with the platform. Security is all managed, so they have access to certain data assets, and there may be other data assets that they need to request and get permission to use, things that get very personal about people, for example. And then everything included in there is obfuscated and encrypted. So if there are data elements that you don't have a need to know as a data scientist, well, you don't have a need to know, right? We'll obfuscate it, but the engine can still perform actions against it. And so that's a big aspect of the platform:
to make sure that, you know, we reduce the amount of data duplication but also maintain control of some very personal data that people have.
Did the platform start out as kind of abstracting things that you built for a specific product into a generalized capability, or did it start as a larger effort that pulled in various internal and external things? What's the history?
Yeah. Now, in my experience, boiling the ocean up front to build a platform just leaves you with a bunch of steam and some salt in the bottom of a container, and then you're just like, well, where did it go? It's like, well, you don't have any wrinkles in your shirt, right? And so we build very specific application elements, and we really, really focus on those. And then if you've got two, three, four of those together, you can start to line up the commonalities and then you can drop things out. So I like to say, well, we moved to the cloud in 2019, but we actually started a few years earlier on building machine learning, and that taught us enough to know, hey, we needed to move this to the cloud, and what did we need to do to move to the cloud. And so we build, and we continue to do that. You know, build an application for employment verification: if people are getting mortgages, the mortgage companies need to make sure their employment is verified and that their income is verified. Once you build that, hey, wow, there's a new type of machine learning we can do to take care of name disambiguation across multiple people named Jack Berkowitz. I have the strangest name in the world, and it turns out there's like six of us on LinkedIn right now. So that way you can deal with these types of things.
So when you say application elements in the context of the platform, are those applications or are they kind of platform capabilities?
They're kind of both, right?
So a reasoner, you know, a reasoner that decides about matching of a candidate to a job or a job to a candidate, so you tell me your skills and I'll tell you the jobs that are available: is that an application or is that an ML component? It's kind of both, right? It's actually a number of ML components all put together, but then I can abstract it as a single API so that downstream applications like our recruiting application or our internal sourcing application can hit that API. So it's at both levels.
What are some of the other components like that?
So, for example, we do an awful lot with projections. We have a lot of different recommenders and optimizers that'll deal with recommendations. So we have a capability for operational managers, non-HR people, to get insights pushed to them about their team status: how many people are leaving, how many people need raises, what's the birthday. You know, I have a large team inside the company; I'm an operational manager like anybody else, but I still need that HR data. That comes to me on that same mobile app where I can check my payroll or my 401(k); on the same exact app I get these HR insights, right? So this recommendation is there. Other areas that we build machine learning for are, again, not just for clients but for the internal company: how can we answer a client's concerns about a regulation in a much more efficient way? We can have that answer, put it out on a chatbot, put it in front of our customer service representatives, all sorts of different ways. And so these are, you know, some knowledge management, some recommendations, some projections, you know, about "are you ready for the shopping season?" That's our favorite one for retailers. So we can actually look at, oh, what's the rate at which you're hiring people, what's the rate at
which people are leaving, how many jobs do you have open, what's the demand. So we can build this small little workforce planning capability quickly by assembling our existing components, and then have an advantage for our clients to use.
As a company that is fully invested in cloud, using cloud services at all different levels, including the platform services like SageMaker, you mentioned these higher-level services. I'm curious, how do you think through what level of services to consume from your cloud partner, or conversely, what level do you target to build at versus using external services?
So I wish I could say that there was just some cookbook out there, but there's not, right? The best thing for us is that we're closely integrated with AWS in terms of engineer to engineer, and so people like me get the heck out of the way and let the engineers talk, which is the best way to have a partnership with a tech group. And so we'll look at capabilities and we'll brainstorm them together, and then we'll iterate. Sometimes we've built things and it turns out that AWS may have them; sometimes they'll say, hey, we're not building it, or a third party may say, yeah, it doesn't exist, and we'll have to build it. Sometimes we have unique capabilities that we need. We try to be more of a content team and an application team rather than a core technology developer, you know, low-level code. But just recently we've built some very low-level code to build a reporting application, because the constraints, no matter what databases are out there today, aren't quite what we need for our clients, with the hundreds of thousands of clients we have, in terms of cost. And so we had to get down and start to build something. Now, are we the first ones to say we should open source this, or...? Yeah, exactly. So whether we open source it or we find
something in the community that can replace it. Part of that is some engineering discipline about making sure that you're extracting your blocks properly, because the one thing you know about technology is somebody's going to come along with a better mousetrap, and so you need to be ready to re-architect and refactor. We think about it architecturally, and we budget time for that as we go as well, but it's the reality of the cloud.
You mentioned the scope of your customers just now. You also mentioned some scale figures during your keynote in terms of the amount of data that you were dealing with and the number of transactions. Talk a little bit about that and the broad impact of scale on the way you deal with data.
So I mentioned, right, 920,000 clients, 140 countries. You know, we do like 50 million people applying to jobs at our clients every year, we do 38 million people in payroll every year, and we do 11 million 1099s, right? That's an incredible amount of people doing contracting. So all that information comes together. The number that I put up on stage, and it was kind of fun: we move $2.3 trillion a year. And it was hard for me, so we did this kind of thing where we said, what does $2.3 trillion mean? If it was a GDP, it would be somewhere between France and Italy in terms of GDP. Now, our payments are not GDP, but it still gives you a sense of the scale. France and Italy have better food than maybe we do, but it gives you an idea of the scale. And then the elasticity to process that becomes, you know, insane, right? I can't remember the number that we put on stage, but it's like 300 trillion decisions by our machine learning every month. Does that mean 32 trillion queries? No, but it means the individual decision processes that the algorithms are
taking as they process. To somebody who's been around for a while in the business, that's just mind-boggling. I remember when people were coining the word petabyte, and now we toss around exabytes and zettabytes as if they're nothing, right? I remember buying my first hard drive, and it had 10 megabytes of storage, and I remember a guy telling me, you'll never be able to fill that up. And we fill it up with single PowerPoint files today, right? So the scale is kind of amazing. That $2.3 trillion, and everything that that entails in terms of payments, is the number that just sticks in our head every day.
Are there specific ways that you have adapted the way that you operate or organize to manage that scale?
Well, I think I talked about it a little bit: we have a real distributed nature to the development teams, and that's an important aspect of what we're doing. The other thing is, and I talked a little bit about this on stage, we really wanted to get to a data-centric culture. So rather than having pillars of, well, these are the data analysts and these are the data scientists and these are the domain experts, we got rid of all of that, right? They're all together in one team. It's modeled after this idea of a two-pizza team, but we're not quite selling the products, right? But it is this notion that we bring everybody together to go solve a specific problem. Because we have all these people inside of our company that know HR or know benefits or know payroll, we can bring people right into the development team to be part of the development team to go solve a problem. And so that's helped us create very, very tight development cycle time frames, and then it's just iterating out of it. So those are our two keys: first of all, lots of federation; second of all, very small teams that are able to iterate on a very specific problem very quickly, and
those teams are cross-domain, right? We literally want them, virtually at least, sitting alongside each other: hey, what does this mean? Yeah, this does this. As opposed to going through paperwork exercises and presentations and stuff like that.
The data-focused folks on those teams, the data scientists, data engineers: are they all embedded within product organizations, or do product folks kind of join into engineering efforts, or...?
Yeah, it's a great question. A lot of the teams in our company will look at it as triads: engineering, product, and design. I actually look at it as quads: engineering, product, design, and data science. Each one of my teams has all four represented. At one point there may not be a data science problem to be solved, but there's still somebody tracking that project from a data science perspective, just the same way as the UX on a certain project may just be the API, not some elaborate UI, or maybe documentation could be the UX. So we have all four domains represented in this quad idea.
And how long has the company had data scientists?
Probably since they started dealing with it back in 1949, to be honest, right? It's a company of accountants, so it's fun. Oh, well, it's data science using machine learning? Well, what's linear regression, right? It's fun; I interview people just getting out of school: "Oh, I've learned this thing about linear regression." You know, it's been around since the 1800s, right? So I think it's a hard question to answer. I think if you want to look specifically at machine learning as a practice, you know, just like everybody else, the past six, eight, ten years has really been the ongoing of that. It's the end of 2021; anybody who tells you they've got more than 10 years of machine learning experience, unless they were doing neural
networks in the 90s, is lying to you, right? And so I think in that six-to-eight-year time range is when we started to really add machine learning, and then by 2016, 2017 we had people really cranking on it. So the company was very nimble, I think, getting into the new technology, more than I've seen at other firms.
And when you think about this model that you have, where you have the teams kind of embedded together, for lack of a better term, as opposed to having a standalone data science organization or machine learning organization or something like that, how do you take full advantage of the...
So, yeah, we look at them as sort of cohorts, or however you want to call it, groupings. We have somebody in charge of data science, for sure, but then people are matrixed out, just the same way as we have it with UX, because they need career paths. I think that's the important thing for people, right? And there's a value for them to get together and argue out different problems and everything. I try to visit with each team every week; it drives me nuts, because they all want to meet at exactly the same time. So we create this community of practice. People often will say, well, it's a guild, and I'm not sure it's a guild as much, because there is an org of product managers, there's an org of data scientists, but the projects themselves are executed in this group. And then we have somebody leading that, and that person could be a data scientist, could be an engineering leader, could be a UX leader, or could be a product manager leading that specific project. We just want to get the best person in charge to execute that project, if you will. Yeah, we don't care where they come from.
I was asking about that, but also, you know, where innovation comes from within the organization. Is it kind of tucked into longstanding product
road maps or how do you leverage opportunities that are created by ML and yeah we we we kind of I hate the top down bottomup thing but that's in a way it's happening so yeah we definitely have where we want to be from a business perspective but but our our job particularly as Senior Management is just to give the where or the trajectory let the engineers and the day of scientists say hey I have a way to solve that problem maybe a way you hadn't thought about and so we do spend a lot of time on ideation during our development cycles and we are also always open to rework as we need to now you don't want to rework forever but we always give the opportunity for people to do it so we're really we spend a lot of time talking why we want to build something and what we want to build we let those teams go and decide what and how they want to build it maybe to wrap us up I'm curious you're you know being in this being in an organization that's very much focused on kind of HR and talent I'm most curious about how the company is dealing with uh getting the talent that it needs to innovate on the data side very competitive Marketplace often kind of large Legacy companies quote unquote are considered to be less attractive places like how do you deal with all that so the same fears that I think about every day I wake up thinking about that I go to sleep thinking about that I nightmar thinking about that but actually I think we're really we're really benefiting right now and we're benefiting for for two reasons one we took some steps to get to the Cloud and you know at least for what I do for what you do people are attracted to the data they're attracted to the idea that they can actually work on something with meaningful data and they're attracted to the culture around data to build these things the quads and everything else and I'm not saying that we don't have any problems getting talent but what I would say is is that we're really successful in beinging and great people to the 
company because once they get exposed to it whether through the interview or you know through this type of you know discussion we're having now and they they dig into it they get excited and I think that's the best part is you see these Sparkles even over over all the you know zooming and everything else you see a sparkle in people's eyes when they discover something and that's a spreading activation and so whether you're an older company like ours that's you know going through this transition or whether you're a startup you know I've worked at startups where there's not actually a lot exciting going on right um and so you really just have to ask people the question is what do you want to spend your days doing right you know do you actually want to spend your days putting out on Twitter that you got a free water bottle this week or do you actually want to go solve a hard problem right you know I'm an engineer my favorite movie in the world is Apollo 13 not because of the astronauts but because when the guy comes in he dumps the stuff on the table and says you know folks we got to put this into this to get these people home and the engineers rise to the occasion and that's the way we feel about about Talent you know we'll give you something hard to do let's go do it and uh and that helps with our recruiting and helps with with keeping people energized and so we're we're actually doing great on that if anybody wants to come join us and give it a go we're more than happy to have that discussion awesome awesome well Jack thanks so much for joining us and sharing a bit about what you're up to there great thank you I've really enjoyed the conversationall right everyone I am here with Jack burwitz Jack is the chief data officer with ADP Jack welcome to the tml AI podcast thanks thanks for having me I'm looking forward to digging into our conversation we'll be talking about some of the ways that ADP is using machine learning but to get us started I'd love to have you share 
a little bit about your background and a little bit about your career.

Sure. I've got a long career; I've been at this for 30, 31 years now, in kind of three phases. The first phase was largely doing work with DARPA, so early '90s, mid-'90s, doing a lot of work on what at that time were cutting-edge things like ontologies, distributed reasoning technology, all sorts of stuff that people would normally equate with DARPA. Then I spent about 12 years doing various startups, some in the semantic web, some in search, some in analytics. And then for the past 10 years, split between Oracle and now ADP, I've really been running large-scale, enterprise-level analytics products. So we build products for other people, and my job at ADP is a combination of building products for clients and also doing some things internally that ADP uses.

Awesome, awesome. When you started at ADP, you started out in a product-centric role and transitioned over to CDO. I'd love to have you share a little bit about that transition and how it came about. It sounds like you still retain some product ownership there.

Yeah, I sure do. The CDO role really has two faces to it. On the one hand, we build product for clients. We have tens of thousands of clients that use our analytics for HR: people analytics, compensation analytics, recruiting, things like that. And then we also have a defensive nature to the CDO organization: besides having all the data advantage, what do we do to protect the information? One of the big messages that I spend a lot of time on is talking about data governance. It's pretty funny, right, because as product development people, as machine learning people, we hear the words "data governance" and we immediately think, okay, people are trying to get in our way or trying to hold us back. Actually, what it means to us is that it enables us to get the data into a nice, clean state so that we can share it, and so that,
if somebody builds an ML algorithm to do something, say, for job matching, the entire community of developers can then use that algorithm for job matching. That's all part of the governance that we're putting in place today.

I'd love to come back to that data governance topic. Before we do, I've jumped right in and maybe taken it for granted that folks know who ADP is. It may be the case that most do, but why don't you give us a quick summary of ADP and the company's focus?

Yeah, so ADP is the world's largest provider of HR, payroll, and benefits solutions. We have over 920,000 clients in 140 countries, and most people know ADP because their W-2 or their paycheck has been issued by ADP, maybe not today, but at some point in their career. And if it hasn't, it will be. One of the things that we talk a lot about at ADP is "it's not your grandfather's ADP," but it's actually my grandfather's ADP. The company has been around for about 70 years. He had a small gas station, and he paid his employees and the family through ADP. But there's a series of firsts at the company: really the first company to ever use computers for pay; the first company to build what we now call software as a service (it used to be called something else in the late '90s) for doing HR; the first company to do mobile apps for the employees of companies. So every day, or every week, you can check your payroll, you can clock in and out, you can check your 401(k) balance through your mobile app. So lots of people, about 58,000 people today, building products and then providing great expertise, because the one thing about HR is that it's not just about the software. There are regulations that have to be maintained, and there are problems for employees or clients that need to be solved. What if I'm adopting cross-border? What's the impact on my pay? Or any sort of
things like that. And so our CEO likes to say, hey, it's about building great products with great expertise, and that's really what we try to bring to bear.

I think one of the consequences of that long history is that the company is associated with some of those early technologies that maybe weren't so early when it first got started. To be very clear, when I think of poster children for the mainframe, ADP is one of those companies that comes to mind. And yet we're connected because you were delivering a keynote at AWS re:Invent, talking about some of the things the company was doing with machine learning as part of the Data Cloud product. Talk a little bit about what it means to be building cutting-edge, data-centric product in a company with that kind of history and legacy.

Yeah, so we still have a little bit of those mainframes running around, and guess what? So does every bank and every DMV and everything else. And it's funny, right? As somebody who's always on the cutting edge (you heard DARPA and cloud and everything), people ask, what do you mean? Well, the data's got to come from somewhere, right? And if you want reliability, well, to us it doesn't actually matter where the data comes from. What we're able to do is harness the semantics of it, the meanings of it, bring it together, and make it available in a platform, in something that other people can use and our own teams can use. That's the essential part of it. And so when we were looking at how we do that in a modern way, and how we keep up with the pace of business (and the pace of business is just unbelievable, particularly since COVID hit), for us, moving it into the cloud made total sense. And so we're syncing information every single minute back and forth with those classic on-prem data centers. But they're data centers just the same way as any other data center; it's not
like it's running under my desk or something. We're syncing information, and we're extracting value out of it by taking advantage of all the services that we can get in the AWS cloud.

Your comment about platforms ties back to the comment about governance, and what brings them together is this idea of making it easier for your teams to be productive with that data: to pull meaning out of it, make predictions with it, that kind of thing. Can you talk a little bit about the challenges with harnessing the data that the organization has at its scale?

Yeah, let me give you a great example. I'll give you two different examples, but let me start with one that really summarizes not just harnessing the meaning but also the amount of machine learning we're building. In any given month, we pay about 30 million people in the US, and for those 30 million people, we've got about 21 million job titles. So for about 9 million, we don't know what they do for a living, but for 21 million we do. We need to crunch that data down into some common job titles; in fact, we crunch it down to about 9,000 job titles. Now, we don't do that manually, and we don't do that with rules. We run all of that in a data-driven machine learning pipeline, using some pretty advanced technology to get that done. All the latest buzzwords you can imagine, plus a few, are involved in that pipeline. And you say, well, why do that? Well, if I want to know what a software engineer makes, you would think that would be easy. But Company A calls it "software engineer 3," Company B calls it, I don't know, "development specialist," and Company C calls it "creative machine learning guru juggler," right? And we've got to have a way to bring all that information... Yeah, you can only imagine, everybody gets to invent their own title, right? And we've got to have a way to bring that together. We do that for our
compensation benchmarks. We use that same exact thing in our recruiting and sourcing systems, and we do the same thing in our performance review systems. That's a great example of the challenge. And then we have the same exact challenge when we try to provide the capabilities for companies to operate in multiple countries. Things that may be obvious in the US, like holiday or vacation, are called different things in different countries, and in fact the math and the calculation of it all is different depending on the country you're in. So by bringing all that information together and having models on top to combine it, we have the ability to make sure our clients benefit from all this processing.

And so when you talk about data governance, are you thinking about it as akin to a traditional enterprise architecture, with big committees deciding who can do what and where data sits and that kind of thing?

I'd like to say that there are no committees and that we're a reflection of the distributed internet. It's not quite there. We do have a central group, but we really try to operate in more of a hub-and-spoke or federated way. In that sense, a lot of people are talking about data mesh these days; we think about it as a semantic mesh or a metrics mesh. We want the teams that are responsible to be able to operate on the data and have the domain knowledge. There's a central group that makes sure that things can attach, because otherwise it's just pandemonium. So there's a central group that ensures that, whether it's a concept (a domain object) or a new metric, it can attach to some parent object. But in that case, the central team is very thin, and the amount of work they do is very thin. It's really distributed. We have not tens but hundreds and hundreds of people in development, and we want them to have the
independence to be able to create in the way that they can do best. Now, related to that is also the data lineage. It does us no good for somebody to say, hey, I've got the data source of data sources for some problem, say benefits, if they don't describe how that data has been brought together, managed, and brought up. So we also have artifacts: if you're going to participate in this world of federation, this combination of information, well, then you also have the responsibility to say, hey, this is where my data came from, I'm going to take care of its data quality, and I'm going to advertise it to all the other people I collaborate with. When you were asking me about my career, I said I did a lot of work in the semantic web. We actually bring a lot of those concepts to bear now. Whether or not it's the formal semantics and the language, I don't know, but that idea of distributed data, self-describing data, and then attachment of data is something that we try to bring to bear across our company.

And the ultimate objective is to, what, allow teams to move more quickly?

Yeah, I think it's three key things to us. The first one is about pace. The world's busy, clients are demanding, and the world's situations are changing all the time. The second thing is about reliability, because we're messing with people's paychecks. At the end of the day, that's what we do, and that's the most personal data there is, other than maybe healthcare data. If you want to see somebody get excited, make a mistake on their paycheck. So we have to have a little bit of reliability in terms of what we're doing. So the first thing is about pace, the second thing is about reliability, and the third thing is really about clarity and explainability, whether it's to each other inside the development teams or
whether it's to our end clients. So there's an explainability and ethics overlay to all of this as to why we're so into the data governance approach.

You gave us a couple of specific examples of machine learning use cases there. I'd be curious to have you talk broadly about the various ways that ML is used. My guess is that it's used primarily in new products that are built on top of the data assets that the company has, as opposed to that core payroll processing. Or are there use cases there as well?

You'd be surprised. Yeah, no, you'd be surprised. We have machine learning throughout the entire corporation at this point, and it's come on really fast. It's come on fast because we've separated the data, but we've also exposed all the machine learning processes through microservices or through APIs. That allows, yes, totally brand-new products that are intelligent from the ground up. They're everything everybody could imagine: there's a chatbot, or the screen changes. But we also use it to do things on the older products, almost as overlays onto them. So the ability to check a payroll before it's submitted and all the money goes, the ability to check it for anomalies, that's all based on new machine learning capabilities. Because now we have the data in one place, we can do that, and we can provide new capabilities to existing applications, what were thought of as legacy applications. So we can make something fresh for our clients at the same time as building net-new products. And we're getting a great reception on that innovation aspect with our clients because we've taken some pretty hard steps in terms of building a data platform and building out the machine learning, but it really is to their benefit.

Let's dig into that platform that you've built a little bit. Is that the Data Cloud?

Yeah, so the Data Cloud is really this collection of
capabilities. Some people identify it as the products we build, and some identify it as this platform of data. It's built on top of a set of those capabilities, whether it's Amazon EMR or Amazon SageMaker, and S3 is at the bottom of it. We have some partners in there as well, some of the key partners in the data space, because there are certain things, dealing with data security for example, that are still an emerging area of technology, and so we'll bring in partners as need be. We don't want to constrain people, either. If they've built models using, for example, libraries like BERT or GPT-3, they're able to use those models as well. We do want to govern the use of third-party systems, because we don't want any hacker to infiltrate our supply chain of ML, if you will. But all of it's there, and all of it's available. We have a standard way that either data scientists or data engineers can enter and interact with the platform. Security is all managed, so they have access to certain data assets, and there may be other data assets that they need to request and get permission to use, things that get very personal about people, for example. And everything included in there is obfuscated and encrypted. So if there are data elements that you don't have a need to know as a data scientist, well, you don't have a need to know. We'll obfuscate it, but the engine can still perform actions against it. That's a big aspect of the platform: to reduce the amount of data duplication, but also to maintain control of some very personal data that people have.

Did the platform start out as abstracting things that you built for a specific product into a generalized capability, or did it start as a larger effort that pulled in
various internal and external things? What's the history?

Yeah. In my experience, boiling the ocean up front to build a platform just leaves you with a bunch of steam and some salt in the bottom of a container, and then you're like, well, where did it go? Well, at least you don't have any wrinkles in your shirt, right? So we build very specific application elements, and we really, really focus on those. Then, once you've got two, three, four of those together, you can start to line up the commonalities and draw things out. I like to say we moved to the cloud in 2019, but we actually started a few years earlier on building machine learning, and that taught us enough to know that we needed to move this to the cloud, and what we needed to do to move to the cloud. And we continue to do that. Build an application for employment verification, for example: if people are getting mortgages, the mortgage companies need to make sure their employment is verified and their income is verified. Once you build that, hey, wow, there's a new type of machine learning we can do to take care of name disambiguation across multiple people named Jack Berkowitz. I have the strangest name in the world, and it turns out there are like six of us on LinkedIn right now. So that way you can deal with these types of things.

So when you say application elements in the context of the platform, are those applications, or are they platform capabilities?

They're kind of both, right? A reasoner is a reasoner. Take a reasoner that decides about matching a candidate to a job, or a job to a candidate: you tell me your skills, and I'll tell you the jobs that are available. Is that an application, or is that an ML component? It's kind of both. It's actually a number of ML components all put
together, but then I can abstract it as a single API so that downstream applications, like our recruiting application or our internal sourcing application, can hit that API. So it's at both levels.

What are some of the other components like that?

So, for example, we do an awful lot with projections, and we have a lot of different recommenders and optimizers that deal with recommendations. We have a capability for operational managers, non-HR people, to get insights pushed to them about their team's status: how many people are leaving, how many people need raises, when everybody's birthday is. I have a large team inside the company. I'm an operational manager like anybody else, but I still need that HR data. That comes to me on that same mobile app where I can check my payroll or my 401(k); on the same exact app, I get these HR insights. So that recommendation capability is there. Other areas that we build machine learning for are, again, not just for clients but for the internal company. How can we answer a client's concerns about a regulation in a much more efficient way? We can have that answer, put it out on a chatbot, put it in front of our customer service representatives, all sorts of different ways. So these are some knowledge management, some recommendations, some projections, like "Are you ready for the shopping season?" That's our favorite one for retailers. We can actually look at the rate at which you're hiring people, the rate at which people are leaving, how many jobs you have open, and what the demand is. So we can build this small workforce planning capability quickly by assembling our existing components, and then our clients have an advantage in using us.

As a company that is fully invested in cloud, using cloud services at all different levels,
including the platform services like SageMaker, you mentioned these higher-level services. I'm curious, how do you think through what level of services to consume from your cloud partner, or conversely, what level do you target to build at versus using external services?

I wish I could say that there was some cookbook out there, but there's not. The best thing for us is that we're closely integrated with AWS, engineer to engineer. People like me get the heck out of the way and let the engineers talk, which is the best way to have a partnership with a tech group. So we'll look at capabilities, we'll brainstorm them together, and then we'll iterate. Sometimes we've built things that it turns out AWS already has. Sometimes they'll say, hey, we're not building it, or a third party may say, yeah, it doesn't exist, and we'll have to build it. Sometimes we have unique capabilities that we need. We try to be more of a content team and an application team rather than a core technology developer writing low-level code. But just recently, we've built some very low-level code for a reporting application, because no matter what databases are out there today, their constraints aren't quite what we need in terms of cost for the hundreds of thousands of clients we have. So we had to get down and start to build something. Now, are we the first ones to say we should open source this? Yeah, exactly. Whether we open source it or we find something in the community that can replace it, part of that is some engineering discipline about making sure that you're extracting your blocks properly, because the one thing you know about technology is that somebody's going to come along with a better mousetrap. So you need to be ready to re-
architect and refactor. We think about it architecturally, and we budget time for that as we go as well. But it's the reality of the cloud.

Mm. You mentioned the scope of your customers just now. You also mentioned some scale figures during your keynote, in terms of the amount of data that you were dealing with and the number of transactions. Talk a little bit about that and the broad impact of scale on the way you deal with data.

So I mentioned 920,000 clients in 140 countries. Something like 50 million people apply to jobs at our clients every year. We do 38 million people in payroll every year. We do 11 million 1099s; that's an incredible number of people doing contracting. All that information comes together. The number that I put up on stage, and it was kind of fun: we move $2.3 trillion a year. That number was hard for me to grasp, and so we did this thing where we asked, what does $2.3 trillion mean? If it were a GDP, it would be somewhere between France and Italy. Now, our payments are not GDP, but it still gives you a sense of the scale. France and Italy maybe have better food than we do, but it gives you an idea of the scale. And then the elasticity to process that becomes insane. I can't remember the number that we put on stage, but it's something like 300 trillion decisions by our machine learning every month. Does that mean 32 trillion queries? No, but it means the individual decision processes that the algorithms are taking as they process. To somebody who's been around for a while in the business, that's just mind-boggling. I remember when people were coining the word "petabyte," and now we toss around exabytes and zettabytes as if they're nothing. I remember buying my first hard drive, and it had 10 megabytes of storage, and I
remember a guy telling me, "You'll never be able to fill that up." And we fill it up with single PowerPoint files today. So the scale is kind of amazing. That $2.3 trillion, and everything that entails in terms of payments, is the number that just sticks in our heads every day.

Are there specific ways that you have adapted the way that you operate or organize to manage that scale?

Well, I think I talked about it a little bit. We have a real distributed nature to the development teams, and that's an important aspect of what we're doing. The other thing, and I talked a little bit about this on stage, is that we really wanted to get to a data-centric culture. So rather than having pillars of, well, these are the data analysts, and these are the data scientists, and these are the domain experts, we got rid of all of that. They're all together in one team. A team is modeled after this idea of a two-pizza team, though we're not quite selling the products. But it is this notion that we bring everybody together to go solve a specific problem. Because we have all these people inside our company who know HR or know benefits or know payroll, we can bring people right into the development team to go solve a problem. That's helped us create very, very tight development cycle time frames, and then it's just iterating out of it. So those are our two keys: first of all, lots of federation; second of all, very small teams that are able to iterate on a very specific problem very quickly. And those teams are cross-domain. We literally want them, virtually at least, sitting alongside each other: "Hey, what does this mean?" "Yeah, this does this." As opposed to going through paperwork exercises and presentations and stuff like that.

The data-focused folks on those teams, the data scientists and data engineers,
are they all embedded within product organizations, or do product folks join into engineering efforts?

Yeah, it's a great question. A lot of the teams in our company will look at it as a triad: engineering, product, and design. I actually look at it as quads: engineering, product, design, and data science. Each one of my teams has all four representatives. At some point there may not be a data science problem to be solved, but there's still somebody tracking that project from a data science perspective, just the same way as the UX on a certain project may just be the API, not some elaborate UI, or maybe documentation could be the UX. So we have all four domains represented in this quad idea.

And how long has the company had data scientists?

Probably since they started dealing with data back in 1949, to be honest. It's a company of accountants, right? So it's fun. "Oh, well, it's data science using machine learning." Well, what's linear regression? I interview people just getting out of school: "Oh, I've learned this thing about linear regression." You know, it's been around since the 1800s. So I think it's a hard question to answer. If you want to look specifically at machine learning as a practice, then, just like everybody else, the past six, eight, ten years is really when that's been going on. It's the end of 2021; anybody who tells you they've got more than 10 years of machine learning experience, unless they were doing neural networks in the '90s, is lying to you. So I think in that six-to-eight-year time range is when we started to really add machine learning, and then by 2016, 2017, we had people really cranking on it. So the company was, I think, more nimble in moving into the new technology than I've seen at other firms.

And when you
think about this model that you have, where you have the teams embedded together, for lack of a better term, as opposed to having a standalone data science organization or machine learning organization or something like that, how do you take full advantage of the...

So, yeah, we look at them as sort of cohorts, or groupings, however you want to call it. We have somebody in charge of data science, for sure, but then people are matrixed out, just the same way as we have it with UX, because they need career paths. I think that's the important thing for people. And there's value in them getting together and arguing out different problems and everything. I try to visit with each team every week. It drives me nuts because they all want to meet at exactly the same time. So we create this community of practice. People often will say, well, it's a guild, and I'm not sure it's a guild as much, because there is an org of product managers and there's an org of data scientists. But the projects themselves are executed in this group, and then we have somebody leading that. That person could be a data scientist, could be an engineering leader, could be a UX leader, or could be a product manager leading that specific project. We just want to get the best person in charge to execute that project, if you will. We don't care where they come from.

I was asking about that, but also about where innovation comes from within the organization. Is it tucked into longstanding product roadmaps, or how do you leverage opportunities that are created by ML?

Yeah, I hate the top-down, bottom-up thing, but in a way that's what's happening. We definitely have where we want to be from a business perspective, but our job, particularly as senior management, is just to give the where, or the trajectory, and let
the engineers and the data scientists say, "Hey, I have a way to solve that problem, maybe a way you hadn't thought about." So we do spend a lot of time on ideation during our development cycles, and we are also always open to rework as we need to. Now, you don't want to rework forever, but we always give the opportunity for people to do it. So we spend a lot of time talking about why we want to build something and what we want to build, and we let those teams go and decide what and how they want to build it.

Maybe to wrap us up: being in an organization that's very much focused on HR and talent, I'm most curious about how the company is dealing with getting the talent that it needs to innovate on the data side. It's a very competitive marketplace, and large, quote-unquote legacy companies are often considered to be less attractive places. How do you deal with all that?

So, those are the same fears that I think about every day. I wake up thinking about that, I go to sleep thinking about that, I have nightmares thinking about that. But actually I think we're really benefiting right now, and we're benefiting for two reasons. One, we took some steps to get to the cloud. And, at least for what I do, for what you do, people are attracted to the data. They're attracted to the idea that they can actually work on something with meaningful data, and they're attracted to the culture around data to build these things, the quads and everything else. I'm not saying that we don't have any problems getting talent, but what I would say is that we're really successful in bringing great people in to the company, because once they get exposed to it, whether through the interview or through this type of discussion we're having now, and they dig into it, they get excited. I think that's the best part: you see these sparkles. Even over all the Zooming and everything else, you see a sparkle in people's eyes
when they discover something, and that's a spreading activation. So whether you're an older company like ours that's going through this transition, or whether you're a startup (I've worked at startups where there's not actually a lot exciting going on), you really just have to ask people the question: what do you want to spend your days doing? Do you actually want to spend your days putting out on Twitter that you got a free water bottle this week, or do you actually want to go solve a hard problem? I'm an engineer. My favorite movie in the world is Apollo 13, not because of the astronauts, but because when the guy comes in, he dumps the stuff on the table and says, "Folks, we've got to put this into this to get these people home," and the engineers rise to the occasion. That's the way we feel about talent: we'll give you something hard to do, let's go do it. That helps with our recruiting and helps with keeping people energized, so we're actually doing great on that. If anybody wants to come join us and give it a go, we're more than happy to have that discussion.

Awesome, awesome. Well, Jack, thanks so much for joining us and sharing a bit about what you're up to there.

Great, thank you. I've really enjoyed the conversation.