Pushing Back on AI Hype with Alex Hanna - 649

The Sociological Implications of Machine Translation: A Conversation with Alex Hanna

Machine translation has long been a topic of interest and debate, particularly in the context of its impact on communities and societies. As a sociologist, Alex Hanna has spent years studying how technologies like machine translation shape power dynamics, language, and culture. In this conversation, we explore the complexities of machine translation and its implications for our understanding of technology, society, and human relationships.

One of the key points that emerged from our conversation is the importance of considering the history and politics of machine translation. Hanna notes that machine translation has a long history as a tool of colonization and of Cold War spying and intelligence gathering. Amandalynne Paullada, one of Hanna's co-authors, wrote a blog post for The Gradient on this topic, highlighting how early machine translation systems were used to translate Russian into forms that would be more legible to intelligence officials. This raises important questions about the supposed value neutrality of machine translation and its potential to be twisted to serve specific interests.

This phenomenon is not unique to machine translation; it is a broader issue with many technologies. Hanna notes that "these things are not value-neutral; they're very value-laden." She emphasizes the need to recognize that these technologies have certain kinds of politics and histories that shape their use and impact. This requires us to approach technology with a critical eye, considering who benefits from its use, how it is being used, and what its drawbacks might be.

In the context of machine translation, Hanna points out that "it's easy to see that it has its benefits, as well as drawbacks." One way to approach this, she suggests, is to examine how these technologies are being used in context. That means considering the political economy of these technologies: who is making money, and who is gaining power and status through their use.

One concern is that the benefits of these technologies tend to accrue to people who already have a lot of power and influence, which can result in more harm than good. At the same time, Hanna notes that "the internet is just completely inaccessible unless you have some translation into English or German or Spanish or a Western language," and she emphasizes the importance of making machine translation work for marginalized communities and languages.

This raises an interesting question about the metaphor itself. Is machine translation the "cat out of the bag," as Hanna suggests, a technology that has become essential for accessing the digital world? Or is the cat really the dominance of English and other Western languages on the internet, with machine translation merely the bag?

For Hanna, the metaphor gets more tangled the longer she works with it. But ultimately, she suggests, machine translation is not the only technology at play here; rather, it is part of a broader system that privileges certain groups over others.

If you're interested in learning more about Hanna's work and the projects underway at DAIR, the Distributed AI Research Institute, check out the institute's website at dair-institute.org. The institute is dedicated to exploring the intersection of technology and society, with a focus on issues related to power, language, and culture.

You can also learn more about other interesting projects, such as Te Hiku Media and Lesan (lesan.ai), which are working on community-centered approaches to machine translation and its impact on society. Finally, if you're looking for a thought-provoking podcast that explores the intersection of technology and society, check out Mystery AI Hype Theater 3000, which Hanna co-hosts with Emily M. Bender.

Overall, our conversation with Alex Hanna has highlighted the complex and multifaceted nature of machine translation and its implications for our understanding of technology, society, and human relationships. By examining the politics, history, and social implications of machine translation, we can gain a deeper understanding of the ways technology shapes our world and ourselves.

"WEBVTTKind: captionsLanguage: enall right everyone welcome to another episode of the twiml AI podcast I am of course your host Sam shington and today I'm joined by Alex Hannah Alex is director of research at dare the distributed AI Research Institute before we get into today's conversation be sure to take a moment to head over to Apple podcast Spotify or your listening platform of choice and if you enjoy the show please leave us a festar rating and review Alex welcome to the podcast thanks for having me Sam I'm looking forward to digging into our conversation uh it's been maybe a year in change since I spoke to timet in kind of the I think she had just started dare or you all had just started thereare and so I'm really looking forward to kind of learning more about you know what you've been doing in that year and change uh but before we dig into that why don't you start us off with a little bit of introduction how'd you come to work in Ai and AI at ethics in particular yeah thanks for the introduction Sam and I'm looking forward to talking about dar's almost twoyear history at this point it'll be two years in December December 2nd uh I can tell you a little bit about myself so I come to AI in a very uh roundabout way my training is as a sociologist and there's not a lot of social scientists within AI although there should be more for many reasons um I got into I feel like we'll be discussing that a little bit we will be discussing that for sure I got into AI kind of through the back way I was using a lot of machine learning methods actually in my dissertation um and so I was using a supervis learning technique to uh identify news articles which mentioned protest and this is because in some of the work that I currently still do and work that I had been doing earlier uh an interest of sociologist that study social movements is identifying kind of the who what when way or why a protest for the instance of identifying you know uh you know what motivates protest or what what makes it happen um what are the demands of folks um and and how do they win and so that got me interested in using some automated methods to to look into that after that I was a professor briefly at the univiversity of t Tonto decided that the academic track wasn't for me and went to work at Google initially working as a curriculum designer within machine learning but was still very much in the conversation around machine learning fairness and algorithmic discrimination uh so I connected with some friends of mine working on that and got more into that space learned much more about it and one of the things that I was focusing on was how much of the time within the conversation machine learning there was very little attention paid to data coming from a sociology background much of the focus is how data is collected how data is constructed how that data may or may not have validity what kind of Errors happening in measurement and operationalization and so that led me very much into focusing on these things so I was getting involved with many of the academic communities around fairness fact for instance is a large conference uh had been going to the conference since 2018 um and and and started going every year since then then at Google I eventually transferred to work with doctors Tim Jau and Meg Mitchell on the ethical AI team that they had constructed at Google was very excited to do so at Google I was the first research scientist that was a social scientist ever hired on that ladder and that opened the door because many 
Alex Hanna: After that, I was briefly a professor at the University of Toronto, decided that the academic track wasn't for me, and went to work at Google, initially as a curriculum designer within machine learning. I was still very much in the conversation around machine learning fairness and algorithmic discrimination, so I connected with some friends of mine working on that, got more into that space, and learned much more about it. One of the things I noticed was how little attention was paid to data in the machine learning conversation. Coming from a sociology background, much of the focus is on how data is collected, how data is constructed, whether that data has validity, and what kinds of errors happen in measurement and operationalization. That led me very much into focusing on these things, and I got involved with many of the academic communities around fairness; FAccT, for instance, is a large conference that I had been going to since 2018, and every year since. Then at Google I eventually transferred to work with Drs. Timnit Gebru and Meg Mitchell on the Ethical AI team that they had built there, and I was very excited to do so. At Google I was the first research scientist who was a social scientist ever hired on that ladder, and it opened the door: many social scientists now work at Google Research focusing on these issues, which is great, and I'm happy it opened the door for many other folks. And then everything happened at Google. We know that story; we won't go into it. Timnit was fired.

Sam Charrington: We'll refer back to that previous podcast for more on that.

Alex Hanna: Yeah, go back to that podcast for how everything happened there. So Timnit went and started DAIR, and shortly after DAIR's announcement in December 2021, I joined, three months later, in February 2022, as director of research, employee number three. That's my background; that's what led me to where I am now and where I'm at, at DAIR.

Sam Charrington: Awesome. So, employee number three at a brand new research institute: how do you go about crafting a research agenda from nothing, essentially?

Alex Hanna: In terms of doing it, there are two things, and I'll reiterate that it wasn't quite a blank slate, just because a lot of the work we were already doing, especially around data and data documentation, was a bit prescient. If you saw the news today, Timnit and Meg and Emily Bender were all represented in TIME's AI 100, notably for the prescience of the Stochastic Parrots paper; much of that work was prescient. But in another register, a lot of the way we set the research agenda was by bringing on researchers, and especially research fellows, who had a research agenda already. I say employee number three because someone who was already at DAIR is our fellow Raesetje Sefala. Raesetje is a grad student now at Mila in Montreal, and her work at DAIR was on the spatial apartheid project, which Timnit may have mentioned when she was on this show. That project used computer vision technology to detect the persistence of segregation, the persistence of spatial apartheid, in South Africa. The history of South African apartheid is that Black people and other non-white people, so-called Coloured people and people who were not classified as Black but were typically Indian or Asian, were separated out into township areas, separate from the neighborhoods in which the wealthy white population lived. Even though South Africa formally abolished apartheid in the mid-90s, there has been a persistence of that, and the census in South Africa no longer maintains divisions between townships and neighborhoods. Raesetje's work revealed much of that persistence. It's well known that this is persisting, but now we actually have a view of where. Those data can then be used to identify things like how long it takes social services to get to certain people, the number of schools and hospitals, the time it takes for ambulances to get to a certain place, and so on. That itself is a research agenda we're still pursuing; multiple people are pursuing it. One of our full-time people, also an author on that paper, Nyalleng Moorosi, who is based in Lesotho, has been working with Raesetje on that.
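(A minimal sketch of the kind of downstream access analysis described above: the distance from each settlement to its nearest hospital. All names and coordinates are invented placeholders, and a real analysis would reproject to a metric CRS so distances come out in meters rather than degrees.)

    # Hypothetical sketch using geopandas; every geometry below is made up.
    import geopandas as gpd
    from shapely.geometry import Point

    settlements = gpd.GeoDataFrame(
        {"name": ["Township A", "Suburb B"]},
        geometry=[Point(28.05, -26.27), Point(28.03, -26.14)],
    )
    hospitals = gpd.GeoDataFrame(
        {"name": ["Hospital 1", "Hospital 2"]},
        geometry=[Point(28.04, -26.19), Point(28.06, -26.11)],
    )

    # Distance from each settlement to its nearest hospital. Units here are
    # degrees; call .to_crs() with a projected CRS to get meters in practice.
    settlements["nearest_hospital"] = settlements.geometry.apply(
        lambda pt: hospitals.distance(pt).min()
    )
    print(settlements[["name", "nearest_hospital"]])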
Alex Hanna: So fellows have come in, into areas that we want to focus on, and have set research agendas, with us being able to say: we're bringing you on because this is an important dimension we're focusing on, and we want to empower you to focus on it and do work on it. That includes people like Asmelash, who has been doing work with Lesan, developing language technology that works for the Horn of Africa; Adrienne Williams, a former charter school teacher and Amazon delivery driver, who is focused on wage theft via the surveillance of Amazon workers, especially drivers and Flex drivers; and Krystal Kauffman, an organizer with Turkopticon, who has written about and organized around the rights of data workers, the people who are fueling all the data that goes into AI. We bring folks in because we know they have that expertise, and we let them do what they need to do. So in terms of coming in with a greenfield sort of research agenda: people already have these knowledges, whether they're academics or people with lived experience, and we bring them in and help them build those skills, publish original research, and work on that.

Sam Charrington: I spoke with Timnit about this in talking about DAIR, but how do you articulate the common thread that runs through these various research efforts you've described? What does DAIR care most about?

Alex Hanna: I think the common thread is that we are focused on the notion that AI is not inevitable; it could be a tool that would be useful in some contexts, and those contexts tend to be rather narrow in some guises. For instance, things like machine translation or automated speech recognition are actually pretty useful technologies. They could be useful assistive technologies, they could expand the scope of people who can use computing, and they could provide different interfaces for people who have a very hard time typing. Asmelash and Timnit and I have a paper, a work in progress, where we talk about an internet for our grandmothers. Asmelash and Timnit talk about their grandmothers, who couldn't read or write, and I'm thinking about my grandmother, who didn't speak any English; she spoke Egyptian Arabic, and even Coptic in the home. These are languages that aren't really well supported in different kinds of technologies. Even if Google or Meta says they have automated speech technologies that work well for these languages, they actually don't; they work quite poorly. If we were able to provide interfaces for different modalities, that could be a great use for AI. But right now the energy in AI is going into these very extractive uses of large language models: trying to automate away work, trying to put people out of jobs, trying to do it in a way that threatens the labor of many people. These are things that many workers, and many people, aren't asking for. So the common thread is really finding technology that works for people, based in our communities. The second thing, I would say, is acknowledging that knowledge that comes from communities is a form of knowledge; it's a way of knowing. We sometimes use this big term from the philosophy of science, epistemology, which means how we come to know certain things. One of the things we really thrive on at DAIR is knowing that there are multiple ways of knowing. That could be lived experience, that could be a PhD, that could be both. Acknowledging that is where we start from.

Sam Charrington: And how does that particular point play out in the research?
Alex Hanna: I think it plays out in how people set agendas here. Again, we came into this project bringing people, bringing fellows, into the organization and saying: determine what is the most important thing; okay, we want you to write this out; how is this the most important thing; let's talk about what it means to develop research on this. There's a quote from General Gordon Baker of the League of Revolutionary Black Workers in which he says our focus is to turn thinkers into fighters and fighters into thinkers, and I absolutely love that. He's talking about turning organizers into people who go through political education, but also about bringing learned people into advocacy. I think about that a lot at DAIR, with a little twist on it: how do we turn researchers into fighters and fighters into researchers? We have people being brought in who have this huge wealth of knowledge, from who they are and from everything they've experienced through their labor and their activism. How do we turn them into researchers, and bring in that evidence, which has a kind of legitimacy within academic domains but is also going to be useful for this goal of ours, this North Star of ours, of building technology that works for people?

Sam Charrington: When you think about building technology that works for people, can you give us some examples of projects that are squarely focused on that particular goal, and some of their outcomes?

Alex Hanna: I want to revisit the work I mentioned earlier, Asmelash's work at Lesan, where he has been focusing on building machine translation and automated speech recognition tools for two languages of the Horn of Africa, Tigrinya and Amharic. Amharic is spoken by, I want to say, 20 or 30 million people in Ethiopia; Tigrinya is spoken by about three million people in the Tigray region of Ethiopia. These technologies work very poorly for those languages. When you look at Meta's work or Google's work on this and you use those tools, they actually work very poorly, even if the companies advertise otherwise. I think Meta even had a video in which they advertised their ability to do automated speech recognition of Amharic; Asmelash did an analysis of it and found it worked very poorly. So developing those tools with the cooperation of people in those communities has been critical. He has been working on the development of those tools with the cooperation of those speakers, sourcing the data in ethical ways and checking it with people from the community, for community use.
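(A minimal sketch of the kind of evaluation described: scoring a system's output against community-checked references, using corpus BLEU for machine translation and word error rate for speech recognition. The sentences are invented English placeholders, not real Tigrinya or Amharic evaluation data.)

    # Hypothetical sketch using sacrebleu (MT) and jiwer (ASR); data invented.
    import sacrebleu
    from jiwer import wer

    # Machine translation: corpus-level BLEU against one reference stream.
    hypotheses = ["the market opens early in the morning"]
    references = [["the market opens early each morning"]]
    bleu = sacrebleu.corpus_bleu(hypotheses, references)
    print(f"BLEU: {bleu.score:.1f}")

    # Speech recognition: word error rate against a gold transcript.
    gold = "she walked to the well before sunrise"
    system_output = "she walked to the wall before sunrise"
    print(f"WER: {wer(gold, system_output):.2f}")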
Alex Hanna: In some ways we're also very inspired by other efforts in these directions. Te Hiku Media, for instance, is an organization based in Aotearoa New Zealand in which the people involved are all from the Māori indigenous community. They're not an academic group per se; they're both an indigenous and traditional cultural knowledge preservation group and an engineering group. What they've been doing is collecting data from indigenous elders, from speakers of te reo Māori, and developing machine translation and automated speech recognition tools that work for that community. They've compared these with other tools that have been released, for instance by OpenAI, and found how poorly those do in that language. Something else they've done is safeguard the data they use to train those tools, because those data are held under a kind of data sovereignty that they want to keep and maintain. We take a lot of inspiration from that project, and we hold it up as an exemplar of building tech for people that works.

Sam Charrington: When you engage around this conversation of data collection, is it primarily an awareness-raising thing? Is there research that goes into it, a way to study it as a phenomenon and drive change around it, or is it primarily building awareness?

Alex Hanna: There are definitely ways of building research around it. Much of my research is focused on where these data come from. The paper I wrote with Morgan Klaus Scheuerman and Emily Denton, "Do Datasets Have Politics?", focused a lot on the sourcing of data and on what computer scientists and other related researchers treat as important and value-laden in datasets. There have been other studies, including work from Peng, whose first name I forget, and Arvind Narayanan, that focus on the afterlives of data that have been considered unethical: they looked at three datasets that had been retracted and found many different versions of the data still out in the wild. A lot of it is research into what exists. The effort is to change practices, and changing practices is very difficult, but the first part is understanding how pervasive the problem is. Some changes have been proffered. For instance, NeurIPS now has a Datasets and Benchmarks track, which is an effort to bring in more data work and have data be a valued contribution in its own right, especially data that meets a certain bar of ethical commitments. Every paper in that track needs to have a datasheet; the dataset needs to be explained; it needs to be open; and if there are any restrictions, those need to be explained.
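(A hypothetical stub of the datasheet-style documentation such a track asks for, loosely in the spirit of the "Datasheets for Datasets" questions; the sections are paraphrased and every answer is an invented placeholder.)

    # Hypothetical datasheet stub; sections paraphrase "Datasheets for
    # Datasets"-style questions, and every answer is an invented placeholder.
    datasheet = {
        "motivation": "Benchmark for detecting protest events in news text.",
        "composition": "10,000 English news articles with hand-coded labels.",
        "collection": "Licensed news archive, gathered with permission.",
        "consent_and_privacy": "Published text only; no private personal data.",
        "distribution": "Released under CC BY 4.0 with a usage statement.",
        "maintenance": "Maintained by the releasing lab; errata tracked publicly.",
    }
    for section, answer in datasheet.items():
        print(f"{section}: {answer}")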
Alex Hanna: But this is still not the case generally. Any organization that wants to go to market with something like ChatGPT, or any of these other tools released by large entities, has no compunction, nothing compelling them to be transparent or to release those data. They hide behind promises, or excuses of trade secrecy, or the worry that someone would build a nefarious version of the language model, and I find those arguments disingenuous. There needs to be transparency into what those datasets are and into how existing problems, bias, quote-unquote "hallucinations" (although I hate that word; they're more like misinformation and falsehoods), are perpetuated from those training data. Right now we have no way to do any auditing of that. So awareness is one aspect of it, but it's also about changing scientific practice and developing regulation and legislation that will protect different subjects in the dataset development and model development process.

Sam Charrington: Can you talk in a little more detail about the "Do Datasets Have Politics?" paper?

Alex Hanna: Absolutely. This is a paper we wrote two years ago; we started it three years ago. Its focus was a particular sector of machine learning, specifically computer vision. What we attempted to do was construct an almost population-level overview of all the computer vision datasets we could come up with. We looked into many different methods for this. We looked at citation patterns; I don't think we used Papers with Code for this one; but we did search the IEEE archives and tried to find basically every type of image dataset we could from, I want to say, the past two decades. We found around 500 to 700 datasets; I think we had 500 initially in the population. We took a sample of 100 datasets, in addition to the 14 most highly cited datasets, and we coded them on about 100 different variables, and then also did a qualitative analysis of the things computer scientists valued in constructing those datasets. Let me explain each of those. In terms of the datasets and the different kinds of data, we coded for: where did the data come from? Is there any licensing around the data instances? Are there any people in the dataset, and can you identify people from their faces? Is there any consent around it? As for the data itself, was it held in a repository with restricted use? Are any privacy considerations mentioned? Any ethical considerations? Is the data even still available: can we access it, can we audit it, are people being good data stewards?
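(A hypothetical sketch of what one coded record in such an audit might look like; these fields stand in for a much longer codebook, and the dataset named is invented.)

    # Hypothetical coding-schema sketch; fields and values are invented.
    from dataclasses import dataclass, asdict

    @dataclass
    class DatasetAuditRecord:
        name: str
        source: str                 # where the data came from
        license_stated: bool        # is licensing documented?
        contains_people: bool       # are identifiable people in instances?
        consent_documented: bool    # any record of consent?
        privacy_discussed: bool     # does the paper mention privacy?
        ethics_discussed: bool      # any ethical-considerations section?
        still_available: bool       # can the data still be accessed and audited?

    record = DatasetAuditRecord(
        name="ExampleFaces-1M",     # invented dataset for illustration
        source="scraped web images",
        license_stated=False,
        contains_people=True,
        consent_documented=False,
        privacy_discussed=False,
        ethics_discussed=False,
        still_available=True,
    )
    print(asdict(record))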
Alex Hanna: Then, as for the qualitative variables, the values of the dataset, we looked for evaluative language. For instance, if the author of a dataset writes something like "we used this dataset because we wanted a larger dataset, because larger data serves as a better benchmark," or "we wanted people in multiple different poses so we could have better out-of-sample fit," those are places where people are making a value judgment. We did an analysis using a method commonly used in social science called grounded theory, in which you look at lots of different texts, see what themes emerge, and then bin them into categories, and we found four common themes. First, dataset developers typically focused on universality rather than particularity: they want to try to cover every single instance. That, however, has the problem that people at the margins may fall out of the dataset; people who are, quote-unquote, "edge cases" may not actually be included. And it comes at the cost of particularity, of having a narrowly scoped problem in which the problems are well defined. Second, we found an intent toward speed rather than care: we need to collect as much as possible, we need to label it quickly, we relied on Amazon Mechanical Turk workers rather than having experts judge these things and take the care needed to treat these data properly. That was a common theme. The last two are a focus on impartiality rather than positionality, so trying to have a dataset that is supposedly unbiased, this kind of mythical unbiasedness, when we know that all datasets, as the title suggests, have politics; they have a particular view of the world, and actually acknowledging that view is what gets given up by claiming impartiality. And lastly, a focus on doing the work of building the model versus the work of building the dataset. So much of the work focused on building the model. This reflects a lot of the qualitative work that my former colleague at Google, Nithya Sambasivan, has also shown in interviews: people want to do the model work. They want to build the models, get the top-line metrics, and beat the state of the art, rather than do the slow, plodding work of data: ensuring the data meets a criterion of quality, that people have been paid sufficiently, and that you have consent where consent is needed and obtainable. Nobody wants to do that data work, which is much slower. You can even see that in the volume of pages given to describing datasets: a new dataset may be released and get two paragraphs in an eight-page paper, with most of the paper spent describing the math and the methods and how the paper beats the state of the art.

Sam Charrington: Maybe let's shift gears a little bit. One of the topics you've been outspoken about recently is all the hype surrounding AI. I think listeners of this podcast will be familiar with that hype; a lot of it has come about since the release of ChatGPT, so we're nine months into this latest iteration of the hype cycle. AI hype has been an issue for a while, but I think we're at new levels here. Why is the level of hype an interesting and important thing for you to talk about and highlight?

Alex Hanna: It's really interesting how we came to this, because I think it grew out of the papers we were writing around data in 2021, when Emily and I were writing and thinking together with a larger group of people. This hype cycle really launched when Blake Lemoine, who I did work with at Google, was fired by Google after claiming that the LaMDA model was sentient. Shortly after, a VP at Google, Blaise Agüera y Arcas, wrote a very long piece, literally 10,000, maybe 15,000 words, on this idea of AI sentience, and didn't refute any of Blake's claims, effectively giving some credence to the idea that these large language models are sentient. The same with Ilya Sutskever, who said something to the effect that large language models are slightly sentient, in a decontextualized sort of tweet, and Sam Altman giving some credence to this, saying "I am a stochastic parrot, and so are you." So we said, wow, let's dig into this. And I want to pick up on something you said, because I've been reading a lot of the history of AI lately, and AI hype is not only not new, it's actually very, very old; it's probably as old as AI itself. I want to give two shout-outs here, one to Abeba Birhane and a second to Ben Tarnoff. Abeba Birhane had a piece in Real Life magazine called "Fair Warning," and a lot of it was a reading of Joseph Weizenbaum's Computer Power and Human Reason, very much dealing with this idea of AI hype and the kind of risk it carries.
Ben Tarnoff has written a longer piece for The Guardian going into Weizenbaum's life and the way he dug into these questions. This is the person who wrote the ELIZA chatbot, right? And he was struck by how many people were fooled by this thing, how people were really taken in by a few simple rules given to a chatbot written in the 1960s, and by what it did. One of the things it did was make people panic, or hype, depending on which side of the coin you're on, about what this would do to jobs. ELIZA was purported to be a Rogerian psychotherapist, and many psychologists, Weizenbaum writes in Computer Power and Human Reason, were even saying, well, this thing is going to take jobs; we're actually going to be able to have a psychologist in every hospital, and it can take on any number of patients. He was very struck by that. He argues it did two things. One is that it produced this amount of hype, and unreasonably so. The second thing he felt it did is that it devalued what it means to be human, what it means to be our particular species at this point in time. So Weizenbaum was very critical of AI boosters. He was at MIT; he got into arguments with Marvin Minsky, the head of the AI Lab at MIT, and Minsky was taking oodles and oodles of defense funding to develop these different tools without being very critical or reflexive about those operations. So why is it important to tackle AI hype now? For one, it's at a fever pitch. It seems like you can't turn anywhere, the same way that two or three years ago everywhere you looked was blockchain or crypto or NFTs, without AI being deployed in every which way. Since ChatGPT became available to mass-market users, you are effectively seeing new and horrible ways in which someone thinks, let's slap a chatbot on it and use it in some business or social service use case. There needs to be someone out here countering those breathless claims, and that's where Emily and I see our role: addressing these claims with a really sober mind.

Sam Charrington: You mentioned new and horrible use cases. Are there some that come to mind for you?

Alex Hanna: So many. The ones that horrify me the most are the medical use cases, and take this as a page right out of Weizenbaum again: the cases in which these things are being used for talk therapy, or for people in mental health crisis. There's actually an article published today in the American Prospect about the National Eating Disorders Association and how, in the face of their staff's unionization efforts, the whole staff was cut in favor of a chatbot named Tessa. Tessa was quickly taken out of commission after they found out it was giving people advice like weight-loss strategies, things that people with eating disorders do not need to hear, in marked contradiction to the kinds of things people in crisis need to hear. The same thing has happened with doctors' services and diagnostics. Martin Shkreli, the guy who got arrested for jacking up the price of insulin, posted on Twitter some AI tool called Dr. Gupta, drgupta.ai, that was supposed to be helpful as a diagnostic. And this has been done by more reputable firms as well: Google said that their Med-PaLM 2 was being tested at the Mayo Clinic, and Glass as well.
Sam Charrington: Was that the drug Daraprim?

Alex Hanna: Yeah. These things have been put in medicinal and clinical settings, and that is one of the most horrifying classes of cases for me. One of the very curious cases that's been pretty alarming is mushroom identification. People have been using LLMs to generate mushroom identification books for amateur mushroom hunters. I know, it's wild. 404 Media had an article on this, I think Samantha Cole wrote it, about how these books are flooding Amazon. If a book includes some made-up mushroom and says it's safe to eat, and then someone eats it and dies, that's literally a death on the hands of this chatbot. And these things are just flooding Amazon. So there are a lot of horrible use cases, and the ones involving direct bodily harm are the ones I have top of mind, but there's a lot of other stuff out there too.

Sam Charrington: How do you parse through the "guns don't kill people, people kill people" type of argument, the claim that it's not the technology, it's the misuse of the technology?

Alex Hanna: That's where it's helpful to be a sociologist, right? This is why Emily and I work so well together: she's a linguist and I'm a sociologist. As a sociologist, what I pay attention to are the organizations and the collective incentives that drive people toward certain kinds of behavior, and how certain organizations are incentivized to act. So, okay, guns might not kill people, but you're putting this tool out into an existing incentive structure for them to kill people with.

Sam Charrington: So it's a systemic issue, and not an individual choice per se.

Alex Hanna: Yeah. And that's the situation we're in: a funding environment in which VCs are fighting hand over fist, giving out money like it's water, to try to get some ROI on some AI tool. Then people are incentivized to use these things, and to use them quickly. The last time I checked PitchBook data, this industry had something like $44 billion in investment, with a trillion dollars in valuation, and I'm sure if I went back to PitchBook that would be up another $10 billion since last quarter. When you see the sheer volume of money going out, it doesn't matter whether an individual LLM is going to kill people or not, or whether an LLM could sit in a closet and be used for a scientific purpose only. That's not what's happening. There's a whole infrastructure around trying to get a return on investment off these things.

Sam Charrington: Do you decry all medical uses of LLMs, or of AI broadly, or is it more nuanced than that? Is it just the irresponsible uses, some of which you just mentioned?

Alex Hanna: I just mentioned the most egregious versions of these things. I don't decry all of these usages. I think there can be situations in which healthcare providers or people in social services could use these to some degree. However, there has been very little evaluation of these things in clinical settings, and very little public evaluation through peer review; where it has been done, those benchmarks have their own problems. This is kind of the issue, and we recently did a show on our podcast with Dr. Roxana Daneshjou,
an incoming professor at Stanford, on the uses of LLMs in medical evaluation and diagnostics. Take, for instance, Google's evaluation of their Med-PaLM models: they found something like an initial 68% accuracy on the US medical licensing exam, and then an increased accuracy, I think up in the 80s, on that exam. But the problem is that that's not even a good evaluation for clinicians. It's the first step that allows entry into a medical program; there's much more that has to do with diagnostics and treatment plans.

Sam Charrington: It's like saying the LLM can pass the bar, so therefore it should be allowed to be a lawyer.

Alex Hanna: Right, exactly. And we have an episode on that too, with Kendra Albert, who works in the Harvard Cyberlaw Clinic. So we've talked to experts about these things, and they're very critical as well. If there's a place in which evaluation is robustly defined, where it is outlined in a way that has both construct and face validity, where the use case has some type of recourse if things go wrong, where there is close human supervision, where you have a robust process, then yeah, I wouldn't be opposed to it. But that's not what's happening. These things are being put out, and the scientific papers written about them don't look very different from press releases. It's not slow, thoughtful, agreed-upon evaluation work. That's just not what's happening.

Sam Charrington: Are there frameworks you can point to, or would suggest, for folks who say, hey, I've got this shiny LLM tool and I want to use it for thing X; how do I know if that's a good idea? Is it a you-know-it-when-you-see-it thing, or are there ten frameworks already published and you just pick one of them? What tools do folks have for seriously evaluating the applicability of any AI-driven tool, not just LLMs, to a given problem? You mentioned Abeba Birhane; we spoke, too long ago, years ago, and one of the things she really focused on at the time was being human-centric, having a view that is centered on the people impacted by whatever the tool is, as opposed to a tool-centric view. I know that's a theme carried through a lot of DAIR's work. Are there frameworks you would point people to for thinking this through, or is this an area that we need to continue to develop?

Alex Hanna: I think there are some frameworks emerging. One of them: NIST has a risk management framework that they've been working through, for assessing what it would mean to assess risk if you're thinking about a tool, and I think that could be helpful. In terms of evaluation frameworks, I think that's a bit harder; it needs to be pretty particular to a use case. I don't really believe in this idea of a general-purpose technology. That's a thing OpenAI likes to say these systems are, but that claim is itself problematic in many guises. So I think identifying things that are more commonly accepted by a particular, scoped academic community would be helpful.
So I would ask: are there things within health, or health evaluation, for particular types of goals, that would be well scoped? Are there ways of communicating with people who are providers or professionals, and could that be a process? Does that exist in a particular domain, or could it be something where you engage certain professional associations? Those are all places to start looking. But these things are so new that none of this has been developed in cooperation with the particular professional communities and societies.

Sam Charrington: Regarding LLMs as a general-purpose tool: is the objection that it leads people to believe you can take them off the shelf, tell them to do anything, and their output is valid for doing that thing?

Alex Hanna: Yeah, absolutely. This is being very critical of one particular work that OpenAI put out, the "GPTs are GPTs" paper, which got a lot of traction and suggested that these technologies would replace something like 10% of jobs and affect 20% of them. First off, that paper has many issues, one of them being that the people actually doing the ratings were OpenAI employees, which presents a face validity issue in their own internal ranking system. But many of these claims also foreclose the possibility of other technologies. I go back to Weizenbaum here, because he was surprisingly prescient: he was actually very critical of even the notion of the computer as a general-purpose technology. We use computers for everything now, but that forecloses a certain notion of how people want to be recognized and computed in certain kinds of systems. You also have to think about when Weizenbaum was writing. He wrote this in 1976, and he had fled Germany in light of the rise of Nazism. He effectively says that had the Nazis had computers, they would have used them, and they would have exterminated people faster. IBM, for instance, still has to answer for the use of its counting machines in the tallying of people in the camps. So the notion of computing as a kind of device can be seen as a certain kind of project, one that forecloses other possibilities, and I think any technology that claims to be generalizable can have that character, especially if it tries to take over traditional knowledges and traditional ways of doing things. That's a longer conversation, and I didn't mean to open that box, but I also already mentioned Weizenbaum, and I think he had some prescience in talking about the way certain technologies become generalizable and what they do to our imagination of what technology can be.

Sam Charrington: In that last response, just at the very end, you grounded on traditional ways of doing things as the touchstone, and the implication I thought I heard was that technology as a tool that replaces traditional ways of doing things, while you didn't necessarily say it was bad, starts from a perspective of being bad and needing to prove itself in some way. I'm trying to formulate a question around this, but I'm mostly trying to get your take, because that seems overly pessimistic or something.
Alex Hanna: I'm not saying that we should start from the perspective that all technology is bad. I love indoor plumbing. I love pens. I love computers; I can't lie, I've loved computers since I was four, and I can't pretend that computers don't fascinate me. I have a degree in computer science, and that was a dream of mine since I was five, and I'm glad I have that degree. At the same time, what I'm saying is: what are the ways in which these technologies will serve us without externalities that are going to harm us? What are the ways in which we're going to develop machine translation that helps our grandmothers access the internet, while also acknowledging that machine translation has a history as something of a colonizing force, or as a force of war-making and Cold War spying and intelligence? Amandalynne Paullada, one of my co-authors on other work, has a blog post she wrote for The Gradient about machine translation and the way machine translation shifts power, and she talks about how machine translation developed in the Cold War era was basically used to translate from Russian into forms that would be more legible to intelligence officials. So these things are not value-neutral; they're very value-laden. If there's a way we can twist them to our own ends, in ways that work for communities, that's great, but we also need to recognize that these things have certain kinds of politics and histories that lead them to act the way they do now.

Sam Charrington: Just continuing with translation as an example, I think it's easy to see that it has its benefits as well as its drawbacks. How do you approach balancing benefits and drawbacks as a sociologist?

Alex Hanna: That's a curious question, and the "as a sociologist" part is the thing. One aspect of it is seeing how these things are being used in context: seeing what the political economy of these things is, who is making money, and who is gaining power, status, and capital through them. If it seems to be the case that these technologies tend to accrue to people who already have a lot of power, and that is resulting in more harm, then that seems like an issue. If instead a technology is helpful in some limited sort of context and would disproportionately benefit people who are not already accruing many benefits, then that would be a benefit. But it's a trade-off in every case, and it's hard to talk about it in the general case. In the translation case, I think machine translation has gotten to a point where, to have a certain kind of access to the digital world, you need it: there are elements of the internet that are just completely inaccessible unless you have some translation into English or German or Spanish or a Western language. I guess Chinese too; translating to and from Chinese, Mandarin more specifically. Given that so much of the internet and the web, and therefore commerce and industry, is so inaccessible otherwise,
it seems like that one is a cat that's out of the bag, and in that way machine translation is making that world accessible to people, so they're able to access it and exist and live within it.

Sam Charrington: Is translation the cat that's out of the bag, or is English and Western languages being dominant on the internet the cat that's out of the bag? What's the cat and what's the bag here?

Alex Hanna: Right, yeah. I guess English being dominant is a bit of the cat that's out of the bag, and machine translation is maybe the bag. Or maybe I have that reversed. This metaphor is going to get more and more mangled the more I talk about it.

Sam Charrington: So, Alex, we've talked about a pretty broad range of things, and just a small bit of the work going on in and around DAIR. Before we wrap up, are there any other things you'd like to point us to, or projects you'd suggest our audience take a look at, perhaps as representative of some of the things we've talked about?

Alex Hanna: Definitely. You can learn more about us at dair-institute.org; that's where we've got a bit on all our projects and all our fellows. I also mentioned Te Hiku Media; check out their work, really a friend of DAIR, as well as Lesan, at lesan.ai. And check out the podcast Mystery AI Hype Theater 3000; we've talked about a lot of these kinds of things there. So a shout-out to that stuff and to the folks in our orbit.

Sam Charrington: Awesome. Well, thanks so much for taking the time to chat. It was great to catch up on DAIR and to learn a bit about the work you're doing.

Alex Hanna: Thanks, Sam. It was a pleasure.
identifying kind of the who what when way or why a protest for the instance of identifying you know uh you know what motivates protest or what what makes it happen um what are the demands of folks um and and how do they win and so that got me interested in using some automated methods to to look into that after that I was a professor briefly at the univiversity of t Tonto decided that the academic track wasn't for me and went to work at Google initially working as a curriculum designer within machine learning but was still very much in the conversation around machine learning fairness and algorithmic discrimination uh so I connected with some friends of mine working on that and got more into that space learned much more about it and one of the things that I was focusing on was how much of the time within the conversation machine learning there was very little attention paid to data coming from a sociology background much of the focus is how data is collected how data is constructed how that data may or may not have validity what kind of Errors happening in measurement and operationalization and so that led me very much into focusing on these things so I was getting involved with many of the academic communities around fairness fact for instance is a large conference uh had been going to the conference since 2018 um and and and started going every year since then then at Google I eventually transferred to work with doctors Tim Jau and Meg Mitchell on the ethical AI team that they had constructed at Google was very excited to do so at Google I was the first research scientist that was a social scientist ever hired on that ladder and that opened the door because many social scientists now work at Google research uh focusing on these issues which which is great and I'm happy that it open the door for many many other folks and so everything happened at Google we know that story we won't go into it to meet to meet was fire refer back we'll refer back to that uh that previous podcast for more on that yeah go back to the podcast on how everything happened there and um so so everything happened T went and started dare uh shortly after uh uh after dar's announcement December 2021 I I joined three months later in February 2022 as director of research employee number three so yeah so that's that was my background that's what led me to where I am now and where I'm at at Derek awesome so employee number three you know brand new Research Institute um you know how do you go about cra crafting a a research agenda from from nothing essentially I would say that in terms of in terms of doing it there are two things I mean I was again reiterate it wasn't quite a blank slate just because a lot of the work that we were already working on especially around data uh around data documentation is was a bit precent right I mean people are still if you saw the news today T and Meg and Emily B they all uh were uh represented in times AI 100 um and so you know and notably for the precence of the stochastic parrots paper because of that paper much of the you know much of that was preent but I will say in the in another register a lot of the ways we had set the research agenda is by bringing on researchers and especially research fellows that had a research agenda already and so I say employee number three because someone who is already at there is our fellow rasa safala rucha is a grad student now at Mila in Montreal and her work at dare was on the the spatial apari project so that project and I imagine t uh may have 
mentioned it uh when she was on on this show is that uh that project was using computer vision technology to detect the Persistence of segregation and the the Persistence of spatial aparti in South Africa the history of South African apide is that black people and non-white people uh uh uh so so-called colored people that were not classified as black but typically Indian in Asian uh were were separated out into uh uh Township areas uh and in those Township areas those are separate from the neighborhoods uh in which the wealthy white population lived and so even though South Africa formerly abolished apartheid in the mid 90s um there's been Persistence of that even though the census in South Africa does not maintain um divisions between townships and neighborhoods anymore reta's work revealed many of the persistence even though I mean this is this is well known that this is persisting but now we actually have a view on this of where this is um and so those data can we then be used to identify things like um how long it takes Social Services to get to certain people the amount of schools the amount of hospitals the kind of time it takes for ambulances to get to a certain place etc etc and so and so that itself is a research agenda that we're still pursuing multiple people are pursuing uing um uh our one of our full-time uh people and also an author on that paper nalii morosi um who's based in loto has been working with with with with r on that and so fellows have come in and I think into areas that we want to focus on and have set research agendas by being able to say you were bring you on because this is an important Dimension that we're focusing on and we want to empower you to focus on that and do work on that so that includes people like aska has been doing work with Lanai and developing language technology that works for the Horn of Africa Adrien Williams who is a former charter school teacher and Amazon delivery driver who's focused on wage theft via surveillance of Amazon workers especially drivers and flex drivers Crystal Kaufman who is an organizer with tropon has also written and done organizing around the rights of data workers the people who are fueling all the data that goes into Ai and so we bring in folks because we know they have those expertise and we let them do what they need to do um and that's so so in terms of coming in with kind of a green field uh uh sort of research agenda that people already have these knowledges whether they're academics or they're people with lived experience and we bring them in and and help them build those skills and publish original research and and work on that and I certainly spoke with uh timet about this in talking about dare but you know how do you articulate kind of the Common Thread that runs through these various research efforts that you described what are what does DARE care most about I think the Common Thread is that we are focused on the notion that AI is not inevitable that it could be a tool that would be useful in some contexts and those contexts tend to be rather narrow in some guises so for instance things like machine translation or automated speech recognition those are actually pretty useful Technologies they could be useful assistive Technologies they could be useful in expanding the scope of people that could use Computing they could provide different interfaces for people who maybe have a very hard time typing Asal lash and and to me and I have this kind of paper work in progress where we talk about an internet for our 
Asmelash, Timnit, and I have a work-in-progress paper where we talk about an internet for our grandmothers. Asmelash and Timnit talk about their grandmothers, who couldn't read or write; I'm thinking about my grandmother, who didn't speak any English. She spoke Egyptian Arabic, and even Coptic in the home. These are languages that aren't well supported in different kinds of technologies. Even if Google or Meta says they have automated speech technologies that work well for these languages, they actually don't; they work quite poorly. If we were able to provide interfaces for different modalities, that could be a great use for AI. But right now the push in AI is going into very extractive uses of large language models: trying to automate away work, trying to put people out of jobs, trying to do it in ways that threaten the labor of many people. These are things that many workers, and many people, aren't asking for. So the common thread is really finding technology that works for people, grounded in our communities.

The second thing, I would say, is acknowledging that knowledge that comes from communities is a form of knowledge; it's a way of knowing. We use this big term in the philosophy of science, epistemology, which means how we come to know certain things. One of the things we really thrive on at DAIR is knowing that there are multiple ways of knowing. That could be lived experience, that could be a PhD, that could be both. Acknowledging that is where we start from.

And how does that particular point play out in the research?

I think it plays out in how people set agendas here. Again, we came into this project bringing people, bringing fellows, in and saying: determine what is the most important thing here; okay, we want you to write this out; how is this the most important thing; let's talk about what it means to develop research on it. There's a quote from General Gordon Baker of the League of Revolutionary Black Workers in which he says our focus is to turn thinkers into fighters and fighters into thinkers, and I absolutely love that. He's talking about organizers going through political education, but also about bringing learned people into advocacy. I think about that a lot at DAIR, with a little twist on it: how do we turn researchers into fighters and fighters into researchers? We're bringing in people who have huge wells of knowledge from everything they've experienced through their labor and their activism. How do we turn them into researchers, and bring in evidence that has legitimacy within academic domains but is also useful for this goal of ours, this north star of ours, to build technology that works for people?

When you think about building technology that works for people, can you give us some examples of projects that are squarely focused on that goal, and some of their outcomes?

Yeah. I want to revisit the work I mentioned earlier, Asmelash's work at Lesan, where he has been focused on building machine translation and automated speech recognition tools for two languages of the Horn of Africa: Tigrinya and Amharic.
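(Claims that such tools "work well" for a language reduce to measurable quantities such as word error rate against reference transcriptions from actual speakers. A minimal sketch of that kind of spot-check using the jiwer library; the transcription pairs below are hypothetical stand-ins, not data from any real evaluation.)

```python
# Illustrative sketch: checking an ASR system's output with word error rate.
# The (reference, hypothesis) pairs are hypothetical stand-ins for
# transcriptions produced and verified by speakers of the language.
from jiwer import wer

pairs = [
    ("selam dehna neh", "selam dena ne"),  # (human reference, system output)
    ("zare min serah",  "zary min sera"),
]

references = [ref for ref, _ in pairs]
hypotheses = [hyp for _, hyp in pairs]

# Aggregate WER over the corpus: (substitutions + deletions + insertions) / words.
print(f"WER: {wer(references, hypotheses):.2%}")
# A WER anywhere near 100% means the tool is effectively unusable,
# whatever the marketing video says.
```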
Amharic is spoken by, I want to say, 20 or 30 million people in Ethiopia; Tigrinya is spoken by about three million people in the Tigray region of Ethiopia. And these technologies work very poorly for those languages. When you look at Meta's work or Google's work and you actually use those tools, they work very poorly, even where the companies advertise otherwise. I think Meta even had a video advertising their ability to do automated speech recognition of Amharic, and Asmelash did an analysis of that and found it worked very poorly. So developing those tools with the cooperation of people in those communities has been critical. He has been working on their development with the cooperation of those speakers, sourcing the data in ethical ways, and checking it with people from the community, for community use.

In some ways we're also very inspired by other efforts in this direction. Te Hiku Media, for instance, is an organization based in Aotearoa, or New Zealand, in which the people involved are all from the Māori indigenous community. They're not an academic group per se; they're both an indigenous and traditional cultural-knowledge preservation group and an engineering group. What they've been doing is collecting data from indigenous elders, from speakers of te reo Māori, and developing machine translation and automated speech recognition tools that work for that community. They've compared these with tools released by, for instance, OpenAI, and shown how poorly those do in that language. Something else they've done is safeguard the data used to train their tools, because those data fall under a kind of data sovereignty that they want to keep and maintain. We take a lot of inspiration from that project, and we hold it up as an exemplar of building tech for people that works.

When you engage in this conversation around data collection, is it primarily an awareness-raising thing? Is there research that goes into it, a way to study it as a phenomenon and drive change around it, or is it primarily about building awareness?

Well, there are definitely ways of building research around it. Much of my research is focused on where these data come from. The paper I wrote with Morgan Klaus Scheuerman and Remi Denton, "Do Datasets Have Politics?", focused a lot on the sourcing of data and on the value-laden choices computer scientists and related researchers make in datasets. There have been other studies, including the work from Kenny Peng, Arunesh Mathur, and Arvind Narayanan, that focus on the afterlives of data that have been deemed unethical: they looked at three datasets that had been retracted and found many different versions of the data still out in the wild. A lot of this is research into what exists. The effort is to change practices, and changing practices is very difficult, but the first part of it is understanding how pervasive the problem is.
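(That "afterlives" finding suggests one simple auditing pattern: fingerprinting local files against published checksums of retracted datasets. A minimal sketch; the checksum below is a placeholder, not a real hash, and exact matching only catches verbatim copies, not the renamed or modified derivatives those studies also found.)

```python
# Illustrative sketch: flag local files whose checksums match known
# retracted datasets. The hash below is a placeholder, not a real value.
import hashlib
from pathlib import Path

RETRACTED_SHA256 = {
    "0" * 64: "example-retracted-dataset-v1",  # placeholder checksum
}

def sha256_of(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB chunks
            h.update(chunk)
    return h.hexdigest()

def audit(data_dir: str) -> None:
    for path in Path(data_dir).rglob("*"):
        if path.is_file():
            digest = sha256_of(path)
            if digest in RETRACTED_SHA256:
                print(f"{path}: copy of {RETRACTED_SHA256[digest]}")

if __name__ == "__main__":
    audit("./data")
```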
So some changes have been proffered. For instance, NeurIPS now has a Datasets and Benchmarks track, which is an effort to bring in more data work and to make data a valued contribution in its own right, especially data that meets a certain bar of ethical commitment. Every paper in that track needs a datasheet; the dataset needs to be explained; it needs to be open, and if there are restrictions, those need to be explained. But this is still not the norm. Any organization that wants to go to market with something like ChatGPT, or any of these other tools released by large entities, has no such obligation. There is nothing compelling them to be transparent or to release those data. They hide behind promises or excuses of trade secrecy, or the worry that someone would build a nefarious version of the language model, and I find those arguments disingenuous. There needs to be transparency into what those datasets are, and into how pre-existing problems, bias, quote-unquote "hallucinations" (I hate that word; it's more like misinformation and falsehoods), are perpetuated from the training data. Right now we have no way to do any auditing of that. So awareness is one aspect, but it's also about changing scientific practice and developing regulation and legislation that protect the subjects involved in dataset development and model development.

Can you talk in a little more detail about the "Do Datasets Have Politics?" paper?

Yeah, absolutely. This is a paper we wrote two years ago; we started it three years ago. Its focus was a particular sector of machine learning, specifically computer vision. We attempted to construct almost a population-level overview of all the computer vision datasets we could come up with. We looked into many different methods: we looked at citation patterns (I don't think we used Papers with Code for this one), and we searched the IEEE archives, trying to find basically every image dataset from the past, I want to say, decade or two. Yes, the past twenty years. We found around 500 to 700 datasets; I think we had 500 initially in the population. We took a sample of 100 datasets, in addition to the 14 most highly cited ones, coded them for roughly 100 variables, and then also did a qualitative analysis of the things computer scientists valued in constructing those datasets.

Let me explain each of those. In terms of the data we coded for: Where did the data come from? Is there any licensing around the data instances? Are there people in the dataset, and can you identify them from their faces? Is there any consent around that? Was there licensing on the data itself? Was it held in a repository with restricted use? Are privacy considerations mentioned? Ethical considerations? Is the data even still available? Can we access it, can we audit it, are people being good data stewards?
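(That coding scheme can be pictured as a structured record filled in once per dataset. A simplified sketch of what such an instrument might look like; the field names paraphrase the variables just listed and are not the study's actual codebook.)

```python
# Simplified sketch of a per-dataset coding instrument; field names
# paraphrase the variables described and are not the actual codebook.
from dataclasses import dataclass, field

@dataclass
class DatasetCoding:
    name: str
    source: str                          # where did the data come from?
    license: str | None = None           # licensing on the instances, if any
    contains_people: bool = False        # identifiable people / faces present?
    consent_documented: bool = False     # any consent process described?
    restricted_repository: bool = False  # held behind restricted access?
    privacy_discussed: bool = False      # privacy considerations mentioned?
    ethics_discussed: bool = False       # ethical considerations mentioned?
    still_available: bool = True         # can it be accessed and audited today?
    notes: list[str] = field(default_factory=list)

example = DatasetCoding(
    name="example-faces-dataset",
    source="scraped from the web",
    contains_people=True,
    notes=["no consent process described in the paper"],
)
print(example)
```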
Then, for the qualitative variables, the values of the dataset, we looked for evaluative language. For instance, an author might write something like: we use this dataset because we wanted a larger dataset, since larger data makes for a better benchmark; or, we wanted people in multiple different poses so we could get better out-of-sample fit. These are places where people are making value judgments. We analyzed them with a method commonly used in social science called grounded theory, in which you look at lots of different texts, see what themes emerge, and bin them into categories. We found four common themes.

First, dataset developers typically focused on universality rather than particularity. They try to cover every single instance, which creates the problem that people at the margins may fall out of the dataset; people who are quote-unquote "edge cases" may not actually be included. And that comes at the cost of particularity, of having a narrowly scoped, well-defined problem.

Second, we found an orientation toward speed rather than care: we need to collect as much as possible, we need to label it quickly, we relied on Amazon Mechanical Turk workers, rather than having people, or experts, judge these things and take the care needed to treat the data properly. That was a common theme.

The last two themes: a focus on impartiality rather than positionality, meaning trying to claim an unbiased dataset, this mythical unbiasedness of a dataset, when we know that all datasets, as the title suggests, have politics. They embody a particular view of the world, and acknowledging that view is exactly what is given up by claiming impartiality. And lastly, a focus on the work of building the model versus the work of building the dataset. So much of the effort goes into building the model. This echoes the qualitative interview work my former colleague at Google, Nithya Sambasivan, has done: people want to do the model work. They want to build the models, get the top-line metrics, beat the state of the art, rather than do the slow, plodding work of data: ensuring the data meets a bar of quality, that people have been paid sufficiently, that you have consent where consent is needed and obtainable. Nobody wants to do that data work, which is much slower. You can even see it in the volume of paper space given to datasets: a new dataset may be released and get two paragraphs in an eight-page paper, with most of the paper spent describing the math, the methods, and how the work beats the state of the art.

Maybe we can shift gears a little. One of the topics you've been outspoken about recently is all the hype surrounding AI. I think listeners of this podcast will be familiar with that hype; a lot of it has come about since the release of ChatGPT, so we're nine months into this latest iteration of the hype cycle. AI hype has been an issue for a while, but I think we're at new levels here. Why is the hype cycle, or the level of hype, an interesting and important thing for you to talk about and highlight?
Well, it's really interesting how we came to this, because the papers we were writing around data grew out of 2021, when Emily and I were writing and thinking together with a larger group of people. This hype cycle really launched when Blake Lemoine, who I did work with at Google, was fired by Google after claiming that the LaMDA model was sentient. Shortly after, a VP at Google, Blaise Agüera y Arcas, wrote a very long piece, literally 10,000, maybe 15,000 words, on the idea of AI sentience, and he didn't refute any of Blake's claims; he was effectively giving some credence to the idea that these large language models were sentient. The same with Ilya Sutskever, who said in a decontextualized tweet something to the effect that large language models are slightly conscious; and Sam Altman gave it some credence by saying "I am a stochastic parrot, and so are you." So we were like: wow, let's dig into this.

And I want to pick up on something you said, because I've been reading a lot of the history of AI lately, and AI hype is not new. It's actually very, very old; it's probably as old as AI itself. I want to give two shout-outs here: one to Abeba Birhane and a second to Ben Tarnoff. Birhane had a piece in Real Life magazine called "Fair Warning," and a lot of it was a reading of Joseph Weizenbaum's Computer Power and Human Reason, very much dealing with this idea of AI hype and the risks it carries. Tarnoff has written a longer piece for The Guardian on Weizenbaum's life. This is the person who wrote the ELIZA chatbot, and he was struck by how many people were fooled by it: people were really taken in by a few simple rules given to a chatbot written in the mid-1960s. It did a few different things. One is that it made people panic, or hype, depending on which side of the coin you're on, about what this would do to jobs. ELIZA was purported to be a Rogerian psychologist, and many psychologists, Weizenbaum writes in Computer Power and Human Reason, were even saying: well, this thing is going to take jobs; we're actually going to be able to have a psychologist in every hospital, and it can take on any number of patients. He was very struck by that, and he argues it did two things. One is that it produced this amount of hype. The second thing he felt it did, and unreasonably so, is that it devalued what it means to be human, what it means to be a particular species at this point in time. So Weizenbaum was very critical of AI boosters. He was at MIT; he got into arguments with Marvin Minsky, the head of the AI lab at MIT, and Minsky was taking oodles and oodles of defense funding to develop these tools without being very critical or reflexive about those operations.

So why is it important to tackle AI hype now? For one, it's at a fever pitch. It seems you can't turn anywhere, in the same way that two or three years ago everywhere you looked was blockchain or crypto or NFTs, without AI being deployed in every which way. Since ChatGPT became available to mass-market users, you're effectively seeing new and horrible ways in which someone thinks: let's slap a chatbot on it and use it in some business or social-service use case.
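(Weizenbaum's "few simple rules" point is easy to demonstrate: a Rogerian-style responder in the ELIZA mold needs nothing beyond pattern matching and pronoun reflection. A toy sketch of the general technique, not Weizenbaum's original program.)

```python
# Toy ELIZA-style responder: a handful of regex rules plus pronoun
# reflection. A sketch of the technique, not Weizenbaum's original code.
import re

REFLECTIONS = {"i": "you", "my": "your", "am": "are", "me": "you"}

RULES = [
    (re.compile(r"i feel (.*)", re.I), "Why do you feel {0}?"),
    (re.compile(r"i am (.*)", re.I), "How long have you been {0}?"),
    (re.compile(r"my (.*)", re.I), "Tell me more about your {0}."),
]

def reflect(fragment: str) -> str:
    # Swap first-person words for second-person ones ("my" -> "your").
    return " ".join(REFLECTIONS.get(w.lower(), w) for w in fragment.split())

def respond(utterance: str) -> str:
    for pattern, template in RULES:
        match = pattern.match(utterance.strip())
        if match:
            return template.format(reflect(match.group(1)))
    return "Please go on."  # catch-all, like ELIZA's content-free prompts

print(respond("I feel that my work is ignored"))
# -> Why do you feel that your work is ignored?
```

That a mechanism this thin convinced people it understood them is exactly the dynamic being described.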
And so there needs to be someone out here countering those breathless claims, and that's where Emily and I see our role: addressing these things with a really sober mind.

You mentioned new and horrible use cases. Are there some that come to mind for you?

So many. The things that horrify me the most are the medical use cases. Take this as a page right out of Weizenbaum again, but it's the cases in which these things are used for talk therapy, or for people in mental-health crisis. There's actually an article published today in The American Prospect about the National Eating Disorders Association and how, in the face of their staff's unionization efforts, the whole staff was cut in favor of a chatbot named Tessa. Tessa was quickly taken out of commission after they found it was giving people advice like weight-loss strategies, things that people with eating disorders don't need to hear, in marked contradiction of the kinds of things people in crisis need to hear. The same has happened with doctors' services and diagnostics. Martin Shkreli, the guy who got arrested after becoming infamous for jacking up the price of a lifesaving drug, posted on Twitter an AI tool called Dr. Gupta (drgupta.ai) that was supposed to be helpful as a diagnostic. And this has been done by more reputable firms as well: Google said their Med-PaLM 2 was being tested at the Mayo Clinic, and there's Glass AI.

Was that the Daraprim guy?

Yeah. These things are being put into medicinal and clinical settings, and that is, I think, one of the most horrifying cases for me. One of the very curious cases that's been pretty alarming is mushroom identification. People have been using LLMs to generate mushroom-identification books for amateur mushroom hunters. I know, it's wild. 404 Media had an article on this, I think by Samantha Cole, about how these books are flooding Amazon. If a book describes some made-up mushroom and says it's safe to eat, and then someone eats it and dies, that is literally a death on the hands of this technology. And these things are just flooding Amazon. So there are a lot of horrible use cases, and the ones I have top of mind involve direct bodily harm, but there's a lot of other stuff out there too.

How do you parse through the, I don't know, the "guns don't kill people, people kill people" type of argument? That it's not the technology, it's the misuse of the technology?

That's where it's helpful to be a sociologist, right? This is why Emily and I work so well together: she's a linguist and I'm a sociologist. As a sociologist, what I pay attention to are organizations and collective incentives, what drives people to certain kinds of behavior, and how certain organizations are incentivized. So okay, guns might not kill people, but you're putting this tool out into an existing incentive structure for killing people with it.

So it's a systemic issue, and not an individual choice per se?

Yeah.
And I mean, that's the situation we're in: a funding environment in which funders, VCs, are fighting hand over fist and giving out money like water to try to get some ROI on some AI tool. Then, yeah, people are incentivized to use these things, and to use them quickly. The last time I checked PitchBook data, this industry had something like 44 billion dollars in investment, with a trillion dollars in valuation, and I'm sure if I went back to PitchBook that would be up another ten billion since last quarter. When you see the sheer volume of money going out, it doesn't matter whether an individual LLM is going to kill people or not, or whether an LLM sitting in a closet is being used for a scientific purpose only. That's not what's happening. There's a whole infrastructure built around trying to turn a return on these investments.

Do you decry all medical uses of LLMs, or of AI broadly? Or is it more nuanced than that: is it just the irresponsible uses, some of which you just mentioned?

I just mentioned the most egregious versions of these things, and I don't decry all of these usages. I think there are certain situations in which healthcare providers or people in social services could use these to some degree. However, there has been very little evaluation of these things in clinical settings, and very little public evaluation through peer review; where peer review has been done, those benchmarks have their own problems. We recently did a show on our podcast with Dr. Roxana Daneshjou, an incoming professor at Stanford, on the uses of LLMs in medical evaluation and diagnostics. Take, for instance, Google's evaluation of their Med-PaLM models: they found initially something like 68% accuracy on the US Medical Licensing Exam, and then an increased accuracy, I think up in the 80s, on that exam. But the problem is that that isn't even a good evaluation for clinicians; it's the first step that allows entry into a medical program. There's much more that has to do with diagnosis and treatment plans.

Yeah, it's like saying the LLM can pass the bar, so it should be allowed to be a lawyer.

Right, exactly. And we have an episode on that too, with Kendra Albert, who works in the Harvard Cyberlaw Clinic. We've talked to experts about these things, and they're very critical as well. If there's a place where evaluation is robustly defined, where it's outlined in a way that has both construct validity and face validity, where the use case has some type of recourse if it goes wrong, where there is close human supervision, and where you have a robust process, then yeah, I wouldn't be opposed to it. But that's not what's happening. These things are being put out, and the scientific papers written about them don't look very different from press releases. It's not slow, thoughtful, agreed-upon evaluation work. That's just not what's happening.
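(The exam-style evaluations in question reduce to a single multiple-choice accuracy number, which is easy to see in a sketch; the exam items and the model stub below are hypothetical.)

```python
# Illustrative sketch: an exam-style benchmark is just accuracy over
# multiple-choice items. The items and the model stub are hypothetical.
from collections.abc import Callable

EXAM = [
    {"question": "A 54-year-old presents with ...", "answer": "C"},
    {"question": "Which agent is first-line for ...", "answer": "A"},
]

def score(ask_model: Callable[[str], str]) -> float:
    correct = sum(
        ask_model(item["question"]).strip().upper() == item["answer"]
        for item in EXAM
    )
    return correct / len(EXAM)

# Stub standing in for an LLM call; a real harness would query a model.
def always_c(question: str) -> str:
    return "C"

print(f"accuracy: {score(always_c):.0%}")
# A high score here measures exam-taking, not diagnosis, treatment
# planning, or safety in a real clinical encounter.
```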
Are there frameworks you can point to, or would suggest, for folks who are thinking: hey, I've got this shiny LLM tool, I want to use it for thing X, how do I know if that's a good idea? Is that a you-know-it-when-you-see-it thing, or are there ten frameworks already published and you just pick one of them? What tools do folks have for seriously evaluating the applicability of any AI-driven tool, not just LLMs, to a given problem? You mentioned Abeba Birhane; we spoke with her too long ago, years ago, and one of the things she really focused on at the time was being human-centric, having a view centered on the people impacted by the tool, as opposed to a tool-centric view. I know that's a theme carried through a lot of DAIR's work. Are there frameworks you'd point people to for thinking this through, or is this an area we need to continue to develop?

I think there are some frameworks emerging. One of them: NIST has a risk management framework they've been working through, for assessing, if you're considering a tool, what it would mean to assess its risk. I think that could be helpful. In terms of evaluation frameworks, that's a bit harder; an evaluation needs to be pretty particular to a use case. I don't really believe in this idea of a general-purpose technology. That's a thing OpenAI likes to say about these models, but the claim is itself problematic in many guises. So I think identifying evaluations that are commonly accepted by a particular, well-scoped academic community would be helpful. Are there evaluations within health, for particular types of goals, that are well scoped? Are there processes for communicating with providers and professionals, and do they exist in a particular domain, or could you engage certain professional associations to build them? Those are all places to start looking. But these things are so new that none of this has been developed in cooperation with the relevant professional communities and societies.

Regarding LLMs as a general-purpose tool: is the objection there that it leads people to believe they can take them off the shelf, tell them to do anything, and treat the output as valid for that purpose?

Yeah, absolutely. Here I'm being very critical of one particular work OpenAI put out: "GPTs are GPTs" is the paper, and it got a lot of traction. It suggested that these technologies would replace something like 10% of jobs and affect 20% of them. First off, that paper has many issues, one being that the people actually doing the ratings were OpenAI employees, which presents a face-validity issue with their internal rating scheme. But beyond that, many of these claims foreclose the possibility of other technologies. I go back to Weizenbaum here, because he is surprisingly prescient: he's actually very critical of the notion of even the computer as a general-purpose technology. We use computers for everything now, but that forecloses a certain notion of how people want to be recognized and computed in certain kinds of systems. You also have to consider where Weizenbaum is writing from: he writes this in 1976, having fled Germany amid the rise of Nazism.
He effectively says: the Nazis had computers, they would have used them, and they would have exterminated people faster. IBM, for instance, has still yet to apologize for the use of its counting machines in the tallying of people in the camps. So the notion of computing as a kind of device can be seen as a certain kind of project, one that forecloses other possibilities. I think any technology that claims to be generalizable can carry that, especially if it tries to take over traditional knowledges and traditional ways of doing things. That's a longer conversation, and I didn't mean to open that box, but I had already mentioned Weizenbaum, and I think he had some prescience in talking about the way certain technologies become generalizable and what they do to our imagination of what technology can be.

In that last response, right at the end, you grounded things in traditional ways of doing things as the touchstone. The implication I thought I heard was that technology as a tool that replaces traditional ways of doing things, well, you didn't necessarily say it was bad, but the implication was that you start from the perspective that it's bad and it needs to prove itself in some way. I'm struggling to formulate a question around this, but I'm mostly trying to get your take, because that seems overly pessimistic or something.

I'm not saying we should start from the perspective that all technology is bad. I love indoor plumbing. I love pens. And not just non-computers: I love computers, I can't lie. I've loved computers since I was four; I can't pretend they don't fascinate me. I have a degree in computer science, which was a dream of mine since I was five, and I'm glad I have it. What I'm saying is: what are the ways these technologies can serve us without externalities that harm us? What are the ways we're going to develop, say, machine translation that helps our grandmothers access the internet, while also acknowledging that machine translation has a history as something of a colonizing force, as a force of war-making and Cold War spying and intelligence? Amandalynne Paullada, one of my co-authors on other work, has a blog post she wrote for The Gradient about machine translation and the way machine translation shifts power. She talks about machine translation's development in the Cold War era, when it was basically used to translate from Russian into forms that would be more legible to intelligence officials. So these things are not value-neutral; they're very value-laden. If there's a way we can twist them to our own ends, ends that work for communities, great. But we also need to recognize that these things have certain kinds of politics and histories that lead them to act the way they do now.

Just continuing with translation as an example: I think it's easy to see that it has its benefits as well as its drawbacks.
So how do you approach balancing the benefits and the drawbacks, as a sociologist?

That's a curious question, and the "as a sociologist" part is the interesting piece. One aspect is seeing how these things are being used in context: what the political economy of these things is, who is making money, who is gaining power, status, and capital through them. If these technologies tend to accrue to people who already have a lot of power, and that results in more harm, that seems like a problem. If instead a technology is helpful in some limited context and disproportionately benefits people who are not already accruing many benefits, then that's a benefit. But it's a trade-off in every case, and it's hard to talk about in the general case. In the translation case, I think machine translation has reached a point where, to have a certain kind of access to the digital world, you need it: there are elements of the internet that are just completely inaccessible unless you have some translation into English or German or Spanish or another Western language. I suppose translating to and from Chinese, Mandarin more specifically, belongs there as well. Given that so much of the internet and the web, and therefore commerce and industry, is inaccessible otherwise, that seems like a cat that's out of the bag, and in that way translation makes that world accessible, so people are able to access it and exist and live within it.

Is translation the cat that's out of the bag, or is English and Western languages being dominant on the internet the cat that's out of the bag? What's the cat and what's the bag here?

Right, yeah. I guess English being dominant is a bit of the cat that's out of the bag, and machine translation is maybe the bag. Or maybe I have that reversed. This metaphor is going to get more and more tangled the more I talk about it.

So, Alex, we've talked about a pretty broad range of things, and just a small slice of the work going on in and around DAIR. Before we wrap up, are there any other things you'd like to point us to, or projects you'd suggest our audience take a look at, perhaps as representative of some of what we've talked about?

Yeah, definitely. You can learn more about us at dair-institute.org; that's where we've got a bit on all our projects and all our fellows. I also mentioned Te Hiku Media; check out their work, they're really a friend of DAIR, as well as Lesan, at lesan.ai. And check out the podcast Mystery AI Hype Theater 3000; we've talked about a lot of these things there. So a shout-out to all of that, and to the folks in our orbit.

Awesome, awesome. Well, thanks so much for taking the time to chat. It was great to catch up and to learn a bit about the work you're doing.

Thanks, Sam. It was a pleasure.