Dmitry Korkin - Evolution of Proteins, Viruses, Life, and AI | Lex Fridman Podcast #153
I'm excited to talk to you about my year in review and what I'm looking forward to in 2021. As an academic, I've had the privilege of diving deep into various topics, including viruses and scientific research. One thing that's been on my mind is the impact of the pandemic on our lives and how it's affected our ability to travel and connect with others.
One thing I'm definitely looking forward to in 2021 is traveling again. Every summer, I attend an international summer school called the School for Molecular and Theoretical Biology, which is held in Europe. It's a fantastic experience that brings together gifted kids from all over the world, and it's always a highlight of my year. Unfortunately, we couldn't make it this August, but we did manage to participate remotely. While it wasn't the same as being there in person, I'm excited to go back next summer and reconnect with old friends and make new ones.
In addition to traveling, one of my personal resolutions for 2021 is to prioritize spending time with my family. As someone who's often been focused on their work and research, I've come to realize that I've missed out on quality time with my loved ones. I want to make a conscious effort to be more present and engaged when I'm not in the lab or teaching. It's not always easy to switch off from work mode, but I know it's essential for maintaining a healthy work-life balance.
I'd like to share a poem that holds a special meaning for me, written by my namesake, Dmitry Dmitriev. The poem is called "Sorceress Vedima" (also known as "Dunya" in Russian), and it's one of the few poems I can recall from memory. There's something about the way the words capture the essence of longing and love that resonates deeply with me, especially during this time of year around Christmas and New Year's. The poem is like a whispered secret, a message from afar that speaks to my soul.
Russian poetry has always been a source of inspiration for me, and I find that it offers a unique perspective on the world. There's something magical about the way words can evoke emotions and create connections between people. Whether it's through the use of metaphor or wordplay, Russian poetry seems to tap into a deeper truth that transcends language barriers.
As we look ahead to 2021, I'm excited to see what the new year will bring. Will there be breakthroughs in science? Will we find innovative solutions to the challenges we face as humans? Only time will tell, but one thing is certain – it's going to be an interesting ride. So let's raise a glass to the possibilities of 2021 and to the magic that awaits us around the corner.
I'm grateful to Dmitry for his insights and perspectives on life and science. It's clear that he's passionate about what he does, and that enthusiasm is infectious. As we wrap up this episode, I want to leave you with a quote from Jeffrey Eugenides: "Biology gives you a brain. Life turns it into a mind." These words resonate deeply with me, and I believe they offer a profound truth about the human experience.
Finally, I'd like to take a moment to acknowledge our sponsors, who make this podcast possible. Brave is a browser that prioritizes your online privacy. NetSuite provides business management software. Magic Spoon offers low-carb cereal. And Eight Sleep makes a self-cooling mattress.
So, thank you to our sponsors for their support. If you're interested in trying out any of these products, be sure to click the sponsor links below to get a discount and to support this podcast. Until next time, I bid you farewell, and I look forward to sharing more conversations with you in the future.
"The following is a conversation with Dmitry Korkin, his second time on the podcast. He's a professor of bioinformatics and computational biology at WPI, where he specializes in bioinformatics of complex disease, computational genomics, systems biology, and biomedical data analytics. He loves biology, he loves computing, plus he is Russian and recites a poem in Russian at the end of the podcast. What else could you possibly ask for in this world?
Quick mention of our sponsors: Brave browser, NetSuite business management software, Magic Spoon low-carb cereal, and Eight Sleep self-cooling mattress. So the choice is browsing privacy, business success, healthy diet, or comfortable sleep. Choose wisely, my friends, and if you wish, click the sponsor links below to get a discount and to support this podcast.
As a side note, let me say that, to me, the scientists that did the best apolitical, impactful, brilliant work of 2020 are the biologists who study viruses without an agenda, without much sleep, to be honest, just a pure passion for scientific discovery and exploration of the mysteries within viruses. Viruses are both terrifying and beautiful: terrifying because they can threaten the fabric of human civilization, both biological and psychological; beautiful because they give us insights into the nature of life on Earth, and perhaps even extraterrestrial life of the not-so-intelligent variety that might meet us one day as we explore the habitable planets and moons in our universe.
If you enjoy this thing, subscribe on YouTube, review it on Apple Podcasts, follow on Spotify, support on Patreon, or connect with me on Twitter at lexfridman. And now, here's my conversation with Dmitry Korkin.
It's often said that proteins, and the amino acid residues that make them up, are the building blocks of life. Do you think of proteins in this way, as the basic building blocks of life?
Yes and no. Proteins are indeed the basic biological unit that carries out important
functioning of the cell. However, through studying proteins and comparing them across different species, across different kingdoms, you realize that proteins are actually much more complicated. They have so-called modular complexity. What I mean by that is that an average protein consists of several structural units; we call them protein domains. You can imagine a protein as a string of beads, where each bead is a protein domain. In the past 20 years, scientists have been studying the nature of protein domains because we realized that the domain is the unit. If you look at the functions, many proteins have more than one function, and those functions are often carried out by those protein domains. We also see that in evolution those protein domains get shuffled, so they act as the unit there as well. The same holds from the structural perspective. Some people think of a protein as a sort of globular molecule, but as a matter of fact the globular part of the protein is the protein domain. So we often have, again, a collection of protein domains aligned on a string like beads, and the protein domains are made up of amino acid residues. So this is the basic build...
So you're saying the protein domain is the basic building block of the function that we think about proteins doing. Of course, you can always talk about different building blocks, turtles all the way down, but there's a point in the hierarchy where it's the cleanest element, the block which you can put together in different kinds of ways to form complex function, and you're saying that's protein domains. Why is that not talked about as often in popular culture?
Well, there are several perspectives on this. One, of course, is the historical perspective. Historically, scientists have been able to
structurally resolve, that is, obtain the 3D coordinates of, smaller proteins, and smaller proteins tend to be single-domain proteins, so there a protein is equal to a protein domain. Because of that, the initial assumption was that proteins have globular shapes, and the more small proteins you resolved structurally, the more convinced you became that that's the case. Only later did we start having alternative approaches. The traditional ones are X-ray crystallography and NMR spectroscopy; these are the two main techniques that give us the 3D coordinates. But nowadays there is a huge breakthrough in cryo-electron microscopy, a more advanced method that allows us to get the 3D shapes of much larger molecules and molecular complexes. To give you one of the common examples for this year: the first experimental structure of a SARS-CoV-2 protein was the cryo-EM structure of the S protein, the spike protein. It was solved very quickly, and the reason for that is that the advancement of this technology is pretty spectacular.
How many domains does the... is it more than one domain?
Oh yes, oh yes. It's a very complex structure, and on top of the complexity of a single protein, this structure is actually a complex. It's a trimer, so it needs to form a trimer in order to function properly.
What's a complex?
A complex is an agglomeration of multiple proteins. We can have the same protein made up in multiple copies and forming something that we call a homo-oligomer; "homo" means "the same". In this case, the spike protein is an example of a homotetramer... homotrimer, sorry.
So there are three copies, the three copies in order to...
Exactly, we have
these three chains, the three molecular chains, coupled together and performing the function. When you look at this protein from the top, you see a perfect triangle. But other complexes are made up of different proteins. Some of them are completely different, some of them are similar. Take the hemoglobin molecule: it's actually a protein complex made of four basic subunits. Two of them are identical to each other, and the two others are identical to each other, but the two pairs are also similar to each other, which gives us some ideas about the evolution of this molecule. One of the hypotheses is that in the past it was just a homotetramer, four identical copies, and then it became modified, mutated over time, and more specialized.
Can we linger on the spike protein for a little bit? Is there something interesting or beautiful you find about it?
First of all, it's an incredibly challenging protein. As a part of our research to understand the structural basis of this virus, to structurally decode every single protein in its proteome, we've been working on the spike protein. One of the main challenges was that the cryo-EM data allows us to reconstruct, to obtain the 3D coordinates of, roughly two thirds of the protein. The remaining one third is the part that is buried in the membrane of the virus, in the viral envelope, and it also has a lot of unstable structures around it.
So it's chemically interacting somehow with whatever the heck it's connecting to?
Yeah. People are still trying to understand the nature and the role of this one third, because for the top part, the primary function is to get
attached to the ACE2 receptor, the human receptor. There is also beautiful mechanics in how this thing happens. There are three different copies of these chains, and there are three different domains, since we're talking about domains: these are the receptor-binding domains, RBDs, that get untangled and get ready to attach to the receptor. And they are not necessarily going in sync mode. As a matter of fact...
It's asynchronous?
Yes. And this is where another level of complexity comes into play, because right now what we typically see is just one of the arms going out and getting ready to be attached to the ACE2 receptors. However, there was a recent mutation that people studied in the spike protein, and very recently a group from UMass Medical School that we happen to collaborate with, the group of Jeremy Luban and a number of other faculty, actually solved the mutated structure of the spike. They showed that, because of these mutations, you have more than one arm opening up, so the frequency of two arms going up increases quite drastically.
How interesting. Does that change the dynamics somehow?
It potentially can change the dynamics, because now you have two possible opportunities to get attached to the ACE2 receptor. It's a very complex molecular, mechanistic process, but the first step of this process is the attachment of the spike protein, of the spike trimer, to the human ACE2 receptor. This is a molecule that sits on the surface of the human cell, and that's essentially what initiates, what triggers, the whole process of encapsulation.
If this was dating, this would be the first date.
Yes.
Is it possible to have the spike protein just floating about on its own, or does it need that
interactive ability with the membrane?
Yeah, it needs to be attached, at least as far as I know. But when you get this thing attached on the surface, there is also a lot of dynamics in how it sits on the surface. For example, there was recent work where people used cryo-electron microscopy to get the first glimpse of the overall structure. It's very low-res, but you still get some interesting details about the surface and about what is happening inside, because until recent work we had literally no clue about how the capsid is organized. The capsid is essentially the inner core of the viral particle, where the RNA of the virus is, and it's protected by another protein, the N protein, that essentially acts as a shield. But now we are learning more and more, and it's actually not just a shield; it is potentially used for the stability of the outer shell of the virus. So it's pretty complicated.
And understanding all of this is really useful for trying to figure out, like, developing a vaccine or some kind of drug to attack any aspect of this?
Right. There are many different implications. First of all, it's important to understand the virus itself: how it acts, what the overall mechanistic process of this virus's replication is, of its proliferation in the cell. That's one aspect. The other aspect is designing new treatments. One of the possible treatments is designing nanoparticles that would resemble the viral shape and have the spike integrated, and essentially would act as a competitor to the real virus by blocking the ACE2 receptors and thus preventing the real virus from entering the cell. There is also a very interesting direction in looking at the
membrane, at the envelope portion of the virus, and attacking its M protein. To give you a brief overview, there are four structural proteins; these are the proteins that make up the structure of the virus. Spike, the S protein, acts as a trimer, so it needs three copies. E, the envelope protein, acts as a pentamer, so it needs five copies to act properly. M is the membrane protein; it forms dimers, and actually it forms a beautiful lattice. This is something that we've been studying, and we are seeing it in simulations: it actually forms a very nice grid, or threads, of different dimers attached next to each other.
Copies of each other, and they naturally, when you have a bunch of copies of each other, form an interesting lattice?
Exactly. And if you think about this, the viral shape needs to be organized somehow, self-organized somehow. If it was a completely random process, you probably wouldn't have the envelope shell with an ellipsoid shape; you would have something with a pretty random shape. So there is some regularity in how these M dimers get attached to each other, in a very specific, directed way.
Is that understood at all?
It's not understood. We've been working on this in the past six months, since we met, actually. This is when we started working on trying to understand the overall structure of the envelope and the key components that make up this structure.
Does the envelope also have the lattice structure?
No. The envelope is essentially the outer shell of the viral particle. N, the nucleocapsid protein, is something that is inside, but N is likely to interact with M.
Does it go M and E? Like, where's the E?
So those different proteins occur in different copies on
the viral particle. E, this pentamer complex, we only have two or three of, maybe, per particle. We have a thousand or so M dimers that essentially make up the entire outer shell.
So most of the outer shell is the M dimer, the M protein. When you say particle, that's the virion, the individual single element of the virus? It's a single virus?
A single virus, right. And we have roughly 50 to 90 spike trimers. So when you show a...
Per virus particle?
Per virus particle.
Sorry, what did you say, 50 to 90?
50 to 90, right. So this is how this thing is organized. Now, typically you see the antibodies that target spike proteins, certain parts of the spike protein, but there could also be some other treatments: small molecules that bind strategic parts of these proteins, disrupting their functioning. One of the promising directions, one of the newest directions, is actually targeting the M dimer, targeting the proteins that make up this outer shell. Because if you are able to destroy the outer shell, you are essentially destroying the viral particle itself, preventing it from functioning at all.
So you think, from a sort of cybersecurity perspective, virus security perspective, that's the best attack vector? Or, like, that's a promising attack vector?
I would say, yeah. There is still tons of research that needs to be done, but yes.
There's more attack surface, I guess.
More attack surface, but also, from our analysis, from other evolutionary analyses, this protein is evolutionarily more stable compared to, say, the spike protein.
And stable means a more static target?
Well, yeah. It doesn't change, it doesn't evolve, from the evolutionary perspective, as drastically as, for example, the spike protein.
There's a
bunch of stuff in the news about mutations of the virus in the United Kingdom. I also saw something in South Africa, maybe that was yesterday. You just mentioned stability and so on. Which aspects of this are mutable, and which aspects, if mutated, become more dangerous? And maybe even zooming out, what are your thoughts and knowledge and ideas about the way it's mutated, all the news that we've been hearing? Are you worried about it from a biological perspective? Are you worried about it from a human perspective?
Mutations are sort of a general way for these viruses to evolve. Essentially, this is the way they evolve; this is the way they were able to jump from one species to another. We also see some recent jumps; there were some incidents of this virus jumping from humans to dogs. There is some danger in those jumps, because every time it jumps, it also mutates. When it jumps to another species and jumps back, it acquires some mutations that are driven by the environment of the new host, which is different from the human environment. And so we don't know whether the mutations that are acquired in the new species are neutral with respect to the human host or maybe damaging.
Yeah, change is always scary. But are you worried about... I mean, it seems like, because the spread during winter now seems to be exceptionally high, and especially with a vaccine just around the corner, already being actually deployed, there's some worry that this puts evolutionary pressure, selective pressure, on the virus to mutate. Is that a source of worry?
Well, there is always this thought in the scientist's mind: what will happen? I know there have been discussions about sort of the arms race between the ability of
humanity to get vaccinated faster than the virus becomes resistant to the vaccine. I don't worry that much, simply because there is not that much evidence for that.
For aggressive mutation around the vaccine?
Exactly. Obviously there are mutations where there are vaccines; that's the reason we get vaccinated every year against the seasonal mutations. But I think it's important to study it, no doubt. To me, and again I might be biased because we've been trying to do that as well, one of the critical directions in understanding the virus is to understand its evolution, in order to understand the key mechanisms that lead the virus to jump, the zoonotic viruses to jump from one species to another, and the mechanisms that lead the virus to become resistant to vaccines and also to treatments. And hopefully that knowledge will enable us to forecast the future evolutionary traces of those viruses.
From a biological perspective, this might be a dumb question, but are there parts of the virus that, if souped up through mutation, could make it more effective at doing its job? We're talking about this specific coronavirus, because we were talking about the different... like the membrane, the M protein, the E protein, the N, and the S, the spike. Is there some...
There are 20 or so more in addition to that.
But is that a dumb way to look at it? Like, which of these, if mutated, could have the greatest impact, potentially damaging impact, on the effectiveness of the virus?
It's actually a very good question, and the short answer is we don't know yet. But of course there is capacity for this virus to become more efficient. The reason for that is, you
know, if you look at the virus, it's a machine. It's a machine that does a lot of different functions, and many of these functions are nearly perfect, but they are not perfect, and mutations can make those functions more perfect. For example, the attachment of the spike to the ACE2 receptor: has this virus reached the highest efficiency with which the attachment can be carried out, or are there some mutations still to be discovered that will make this attachment stronger, or in a way more efficient from the point of view of this virus's functioning? That's the obvious example. But if you look at each of these proteins, it's there for a reason, it performs a certain function, and it could be that certain mutations will enhance this function. It could be that some mutations will make this function much less efficient; that's also the case.
Since we're talking about the evolutionary history of a virus, let's zoom back out and look at the evolution of proteins. I glanced at this 2010 Nature paper on the, quote, "ongoing expansion of the protein universe," and it kind of implies and talks about that proteins started with a common ancestor, which is kind of interesting. It's interesting to think about even just the first organic thing that started life on Earth, and from that there's now, what is it, 3.5 billion years later, millions of proteins, and they're still evolving. And that's in part one of the things that you're researching. Is there something interesting to you about the evolution of proteins from this initial ancestor to today? Is there something beautiful, insightful about this long story?
I think if I were to pick a single keyword about protein evolution, I would pick modularity, something that we talked about in the beginning, and
that's the fact that proteins are no longer considered as just a sequence of letters. There are hierarchical complexities in the way these proteins are organized, and these complexities actually go beyond the protein sequence; they go all the way back to the gene, to the nucleotide sequence. Again, these protein domains are not only functional building blocks, they are also evolutionary building blocks. What we see in the later stages of evolution is that once these structurally and functionally stable building blocks were discovered, they essentially stay; those domains stay as such. That's why, if you start comparing different proteins, you will see that many of them have similar fragments, and those fragments correspond to something that we call protein domain families. They are still different, because you still have mutations, and different mutations are attributed to diversification of the function of these protein domains. However, you very rarely see evolutionary events that would split a domain into fragments, because once you have the domain split, you can completely cancel out its function, or at the very least reduce it, and that's not efficient from the point of view of the cell's functioning. So the protein domain level is a very important one. Now, on top of that, if you look at the proteins, you have these structural units and they carry out the function, but much less is known about the things that connect these protein domains, something that we call linkers. Those linkers are completely flexible parts of the protein that nevertheless carry out a lot of function.
It's like little tails, little heads?
We do have tails; they're called termini,
the C and N terminus. These are the things right at one and the other end of the protein sequence. They are also very important; they are attributed to very specific interactions between the proteins.
But you're referring to the links between domains?
That connect the domains, yes. Apart from just the simple perspective, that if you have a very short linker, you have two domains that are forced to be next to each other, and if you have a very long one, you have domains that are extremely flexible and carry out a lot of spatial reorganization, on top of that, the linker itself, because it's so flexible, can adapt to a lot of different shapes, and therefore it's a very good interactor when it comes to interaction between this protein and other proteins. These things also evolve, and they in a way have different driving laws that underlie their evolution, because, unlike protein domains, they no longer need to preserve a certain structure. Now, on top of that, you have something that is even less studied, and this is something attributed to the concept of alternative splicing. Alternative splicing is a very cool concept; it's something that we've been fascinated by for over a decade in my lab, and we've been trying to do research on it. Typically, the simplistic perspective is that one gene is equal to one protein product: you have a gene, you transcribe it and translate it, and it becomes a protein. In reality, when we talk about eukaryotes, especially more recent eukaryotes that are very complex, the gene is no longer equal to one protein. It can actually produce multiple functionally active protein products, and each of them is, you
know, called an alternatively spliced product. The reason it happens is that if you look at the gene, it also has blocks, and it essentially goes like this: we have a block that will later be translated, which we call an exon; then we have a block that is not translated, that is cut out, which we call an intron. So we have exon, intron, exon, intron, et cetera. Sometimes you can have dozens of these exons and introns. What happens is that during the process when the gene is converted to RNA, we have things that are cut out, the introns, and exons that now get assembled together. Sometimes we will throw out some of the exons, and the remaining protein product will be different. So now you have fragments of the protein that are no longer there; they were cut out with the introns. Sometimes you will essentially take one exon and replace it with another one. So there's some flexibility in this process, and that creates a whole new level of complexity.
Is it random, though?
It's not random. And this is where I think the appearance of modern single-cell, and before that tissue-level, sequencing, next-generation sequencing techniques such as RNA-seq, allows us to see that these events are dynamic: they often happen in response to disease, or in response to a certain developmental stage of a cell. This is an incredibly complex layer that also undergoes, because it's at the gene level, a certain evolution. And now we have this interplay between what is happening in the protein world and what is happening in the gene and RNA world. For example, we often see that the boundaries of these exons coincide with the boundaries of the protein
domains. So there is this close interplay. It's not always the case, otherwise it would be too simple, but we do see the connection between those machineries. And obviously evolution will pick up this complexity and select for whatever is successful. We see that complexity in play, and it makes this question more complex but more exciting.
As a small detour, I don't know if you think about this, into the world of computer science: Douglas Hofstadter, I think, came up with the name "quine." I don't know if you're familiar with these things, but they're computer programs that have, I guess, exons and introns, and the whole purpose of the program is to copy itself. So it prints copies of itself, but it can also carry information inside of it. It's a very kind of crude, fun exercise: can we replicate these ideas from cells? Can we have a computer program that, when you run it, just prints itself, the entirety of itself, and does it in different programming languages and so on? I've been playing around and writing them. It's a kind of fun little exercise.
You know, when I was a kid, it was essentially one of the main stages in informatics olympiads that you had to reach in order to be any good: you should be able to write a program that replicates itself. And the task then becomes even more complicated: what is the shortest program? And of course it's a function of the programming language, but yeah, I remember a long, long time ago when we tried to make it shorter and shorter and find the shortcuts.
There's actually, on Stack Exchange, an entire site called Code Golf, I think, where the entirety is just a competition. People just come up with whatever task, I don't know, like write code that reports the weather today, and the competition is about,
whatever programming language, what is the shortest program. People should check it out, because it makes you realize there are some weird programming languages out there. But just to dig into that a little deeper: in computer science we don't often think about programs this way. Even in the machine learning world now, those are still kind of basic programs. And then there are humans, who replicate themselves, and there are mutations and so on. Do you think we'll ever have a world where there are programs that have an evolutionary process? I'm not talking about evolutionary algorithms; I'm talking about programs that mate with each other, evolve, and replicate themselves on their own. The idea here is that that's how you can have a runaway thing. We think about machine learning as a system that gets smarter and smarter, but the machine learning systems of today are programs that you can turn off, as opposed to throwing a bunch of little programs out there and letting them multiply and mate and evolve and replicate. Do you ever think about that kind of world, when we jump from the biological systems that you're looking at to artificial ones?
It's almost like you take the area of intelligent agents, which are essentially independent pieces of code that run, interact, and exchange information. So I don't see why not. It could be a natural evolution in this area of computer science.
I think it's an interesting possibility. It's terrifying too, but I think it's a really powerful tool. We have social networks with millions of people, and they interact. It's interesting to inject into that, and there already have been injected into that, bots. But those bots are pretty
dumb; they're probably pretty dumb algorithms. It's interesting to think that there might be bots that evolve together with humans, a sea of humans and robots operating first in the digital space. And then you can also think about the physical space. I love the idea; I think at Harvard and at Penn there are robotics labs that take as a fundamental task to build a robot that, given extra resources, can build another copy of itself in the physical space, which is super difficult to do but super interesting. I remember there's research on robots that can build a bridge: they make copies of themselves and connect themselves, a sort of self-building bridge based on building blocks. You can imagine a building that self-assembles. So it's basically self-assembling structures from robotic parts. But it's interesting to add, within that robot, the ability to mutate and do all the interesting little things that you're referring to in evolution, to go from a single-origin protein building block to this weird complexity.
And if you think about this, the bits and pieces are there. You mentioned evolutionary algorithms. The goal there is in a way different: the goal is essentially to optimize your search. But the ideas are there. People recognize that recombination events lead to global changes in search trajectories, while a mutation event is a more refined step in the search. Then you have other nature-inspired algorithms. One that I think is among the most fun is the slime mold based algorithm. I think it was first introduced by a Japanese group, and it was able to
solve some pretty complex problems. And I think there are still a lot of things we've yet to borrow from nature. There are a lot of ideas that nature has to offer us, and it's up to us to grab them and get the best use of them.
Including neural networks. We have a very crude inspiration from nature in neural networks. Maybe there are other inspirations to be discovered in the brain, or in other systems, even the immune system and the way it interplays. I recently started to understand that the immune system has something to do with the way the brain operates. There are multiple things going on in there, all of which are not modeled in artificial neural networks, and maybe if you throw a little bit of that biological spice in there, you'll come up with something cool. I'm not sure if you're familiar with the Drake equation. I just did a video on it yesterday, because I wanted to give my own estimate of it. It's an equation that combines a bunch of factors to estimate how many alien civilizations there are.
Oh yeah, I've heard about it, yes.
So some of the interesting parameters are how many stars are born every year, how many planets there are on average per star, and how many of those are habitable. And then the one that starts being really interesting is the probability that life emerges on a habitable planet. You certainly think a lot about evolution, but do you think about the thing which evolution doesn't describe, which is the beginning of evolution, the origin of life? I think I put the probability of life developing on a habitable planet at one percent. This is very scientifically rigorous. Okay, first, at a high level, for the Drake equation, what would you put that percent at, on Earth and in
general? Do you have thoughts about how life might have started, like proteins being one of the early jumping-off points?
Yes. I think back in 2018 there was a very exciting paper published in Nature where they found one of the simplest amino acids, glycine, in comet dust. I apologize if I don't pronounce it right; it's a Russian-named comet, I think Churyumov-Gerasimenko. There was this mission to get close to this comet and collect the stardust from its tail, and when scientists analyzed it, they actually found traces of glycine, which is one of the 20 basic amino acids that make up proteins. So that was very exciting. But the question is very interesting: if there is some alien life, is it going to be made of proteins? Or maybe RNAs? We see that the RNA viruses are certainly a very well established group of molecular machines. So yeah, it's a very interesting question.
What probability would you put on it? How hard is this job? How unlikely, just on Earth, do you think this whole thing is that we got going? Are we really lucky, or is it inevitable? What's your sense when you sit back and think about life on Earth: is it higher or lower than one percent? Because one percent is pretty low, but it still is like, damn, that's a pretty good chance.
Yes, it's a pretty good chance. Personally, and again I'm probably not the best person to do such estimations, intuitively I would probably put it lower. But still, we're really lucky here on Earth, or the conditions are really good. I mean, you
know, I think that everything was right, in a way. And still, the conditions were not exactly ideal, if you try to look at what it was like several billion years ago, when life emerged.
There is something called the rare Earth hypothesis that, counter to the Drake equation, says that if you actually were to describe Earth, it's quite a special place, so special that it might be unique in our galaxy and potentially close to unique in the entire universe. It's very difficult to reconstruct those same conditions, and what the rare Earth hypothesis argues is that all those different conditions are essential for life. So that's the counter to all our thinking that Earth is pretty average. I'm trying to remember all of them, but there's the fact that it is shielded from a lot of asteroids, obviously the distance to the Sun, and also the fact that there's a perfect balance between the amount of water and land, and all those kinds of things. There's a bunch of different factors; I remember there's a long list. But it's fascinating to think that in order for something like proteins, and then DNA and RNA, and basic living organisms to emerge, you need a very close Earth-like planet, which would be sad or exciting, I don't know which.
If you ask me, in a way I draw a parallel with our own research. From the intuitive perspective, you have those two extremes, and reality very rarely falls into the extremes. The optimum is always reached somewhere in between. That's what I tend to think: we're probably somewhere in between. So we are not unique-unique, but again, the chances are reasonably small.
The problem is we don't know
the other extreme. I tend to think that we don't actually understand the basic mechanisms of what this all originated from. We think of life as this distinct thing, maybe intelligence as a distinct thing, maybe the physics from which planets and suns are born as a distinct thing. But it could be like the Stephen Wolfram thing: from simple rules emerges greater and greater complexity. So I tend to believe that life just finds a way. We don't know the extreme of how common life is, because it could be that life is everywhere, so everywhere that it's almost laughable that we're such idiots as to think we're unique. It's like ants thinking that their little colony is the unique thing and everything else doesn't exist. It's also very possible that that's the extreme and we're just not able to comprehend the nature of that life. Just to stick on alien life for a brief moment more: there are some signs of life on Venus, in gaseous form. There's hope for life on Mars, probably extinct. We're not talking about intelligent life, although that has been in the news recently. We're talking about basic, you know,
Bacteria.
Bacteria, yeah. And then also, I guess, there's a couple of moons.
Yeah, Europa.
Which is Jupiter's moon. I think there's another one. Is it exciting or is it terrifying to you that we might find life? Do you hope we find life?
I certainly do hope that we'll find life. It was very exciting to hear this news about possible life on Venus.
It'd be nice to have hard evidence of something, which is what the hope is for Mars and Europa. But do you think those organisms would be similar biologically? Would they even be carbon based, if we do find them?
I would say they would be carbon based.
How similar? It's a big question. The moment we discover things outside Earth, even if it's a tiny little single cell, there's so much there. Just imagine. I think that would be another turning point for science.
And especially if it's different in some very new way, that's exciting, because that's not a definitive but a pretty strong statement that life is everywhere in the universe. To me, at least, that's really exciting. You brought up Joshua Lederberg in an offline conversation. I'd love to talk to you about AlphaFold, and this might be an interesting way to enter that conversation. He won the 1958 Nobel Prize in Physiology or Medicine for discovering that bacteria can mate and exchange genes, but he also did a ton of other stuff, like, as we mentioned, helping NASA find life on Mars, and DENDRAL, the chemical expert system. Expert systems, remember those? What do you find interesting about this guy and his ideas about artificial intelligence in general?
I have a kind of personal story to share. I started my PhD in Canada back in 2000, and in my PhD we were developing a new language for symbolic machine learning, which is different from feature-based machine learning. One of the cleanest applications of this formalism was to cheminformatics and computer-aided drug design. As a part of my research, I developed a system that looked at chemical compounds of, say, the same therapeutic category, male hormones for example, and tried to figure out which structural building blocks are important and define this class, versus structural building blocks that are there just to complete the
structure but are not the ones that make up the key chemical properties of this therapeutic category. For me it was something new. I was trained as an applied mathematician with some machine learning background, but computer-aided drug design was completely new territory. Because of that, I often found myself asking lots of questions on one of the central forums. Back then there were no Facebooks or stuff like that; there was a forum, which is essentially like a bulletin board.
Yeah, right.
So you have a bunch of people, you post a question, and you get answers from different people. Back then one of the most popular forums was CCL, I think Computational Chemistry, not Library, but something like that. CCL was the forum, and there I asked a lot of dumb questions. I asked questions and also shared some information about what our formalism is, how we do things, and whether what we do makes sense. I still remember one of these posts. I called it something like "Desperately looking for a chemist's advice." I posted my question, I explained what our formalism is, what it does, and what kind of applications I was planning. It was the middle of the night, and I went to bed. The next morning I had a phone call from my advisor, who also looked at this forum: "You won't believe who replied to you." I said, "Who?" He said, "Well, there is a message to you from Joshua Lederberg." And my reaction was, "Who is Joshua Lederberg?"
Your advisor hung up.
So essentially Joshua wrote me that we had conceptually similar ideas in the DENDRAL
project; you may want to look it up.
And we should also say, sorry, it's a side comment, that he won the Nobel Prize at a really young age. In '58 he was, I think, what, 33?
Yeah. It's just crazy.
So anyway, hence, in the '90s, responding to young whippersnappers on the CCL forum. Okay.
Back then he was already very senior. He unfortunately passed away back in 2008, but back in 2001 he was a professor emeritus at Rockefeller University. And that was, believe it or not, one of the reasons I decided to join, as a postdoc, the group of Andrej Sali, who was at Rockefeller University: the hope that I could have a chance to meet Joshua in person. And I met him, very briefly, just because he was walking by. There's a little bridge that connects the research campus with the skyscraper that Rockefeller owns, where postdocs, faculty, and graduate students live. So I met him and had a very short conversation. But I started reading about DENDRAL, and I was amazed. We're talking about the 1960s, and the ideas were so profound.
What are the fundamental ideas of it?
The reason for making it is even crazier. Lederberg wanted to make a system that would help him study extraterrestrial molecules. The idea was that the way you study extraterrestrial molecules is you do mass spec analysis. The mass spec essentially gives you numbers that tell you about the possible fragments, atoms and maybe little pieces, that make up the molecule. So now you need to decompose this information and figure out what
was the whole before it became fragments, bits and pieces. To have this tool, Lederberg's idea was to connect chemistry and computer science and to design this so-called expert system that takes as input the mass spec data and a database of possible molecules, and essentially tries to induce the molecule that would correspond to this spectrum. What this project ended up being was a system that would provide a list of candidates, which a chemist would then look at to make the final decision.
But the original idea was to solve the entirety of this problem automatically?
Yes. And back then, he succeeded, believe it or not. It's amazing. It still blows my mind. And this was essentially the origin of modern bioinformatics and cheminformatics, back in the '60s. Every time you deal with projects like this, with research like this, the power of the intelligence of these people is just overwhelming.
Do you think about expert systems, and why they kind of didn't become successful, especially in the space of bioinformatics, where it does seem like there's a lot of expertise in humans, and it's possible to see that a system like this could be made very useful?
It's actually a great question. At my university I teach artificial intelligence, and my first two lectures are on the history of AI. There we try to go through the main stages of AI. And the question of why expert systems failed or became obsolete is actually a very
interesting one. If you read the historical perspectives, there are actually two lines of thought. One is that they were essentially not up to expectations, and therefore they were replaced by other things. The other, completely opposite, is that they were too good, and as a result they essentially became a household name and then got transformed. In both cases the outcome was the same: they evolved into something. And that's what I see if I look at modern machine learning.
So there are echoes of them in modern machine learning?
I think so, because if you think about how we design the most successful algorithms, including AlphaFold, you build in the knowledge about the domain that you study. You build in your expertise.
Speaking of AlphaFold: DeepMind's AlphaFold 2 was recently announced to have, quote unquote, solved protein folding. How exciting is this to you? It seems to be one of the exciting things that happened in 2020. It's an incredible accomplishment, from the looks of it. What part of it is amazing to you? What part would you say is overhyped, or maybe misunderstood?
It's definitely a very exciting achievement. To give you a little bit of perspective: in bioinformatics we have several competitions, and the way you often hear those competitions explained to non-bioinformaticians is that they are the bioinformatics Olympic Games. There are several disciplines. Historically, one of the first was the discipline of predicting protein structure, predicting the 3D coordinates of proteins. But there are others: predicting protein functions, predicting effects of mutations
on protein functions, and predicting protein-protein interactions. The original one was CASP, the Critical Assessment of protein Structure Prediction. Typically, what happens during these competitions is that experimental scientists solve the structures but don't put them into the Protein Data Bank, which is the centralized database that contains all the 3D coordinates. Instead, they hold them and release the protein sequences. Now the challenge for the community is to predict the 3D structures of these proteins, and then the experimentally solved structures are used to assess which prediction is the closest.
And this competition, by the way, just a bunch of different tangents, and maybe you can also say what protein folding is: the CASP competition has become the gold standard, and that's what was used to say that protein folding was solved. So if you could, whenever you say stuff, maybe throw in some of the basics for the folks that might be outside of the field. Anyway, sorry.
So the reason it's relevant to our understanding of protein folding is that we've yet to learn how the folding works mechanistically. There are different hypotheses about what happens during the fold. For example, there is a hypothesis that folding also happens in a modular fashion: we have protein domains that get folded independently, because their structure is stable, and then the whole protein structure gets formed. But within those domains we also have so-called secondary structure, the small alpha helices and beta sheets. These are elements that are structurally stable, and the question is when they get formed, because for some of the secondary structure elements you have to have a fragment in the beginning and, say,
a fragment in the middle. So you potentially cannot have the full fold from the get-go. It's still a big enigma what happens. We know that it's an extremely efficient and stable process.
So there's this long sequence, and the fold happens really quickly.
Exactly.
That's really weird, right? And it happens the same way almost every time.
Exactly, exactly.
That's freaking weird.
Yeah, that's why it is such an enigma. It's amazing. But most importantly, when you look at the translation process, when the whole protein is not yet translated, it's still being translated and coming out of the ribosome, you already see some structure forming. So folding starts happening before the whole protein gets produced. And this is obviously one of the biggest questions in modern molecular biology.
So that's bigger than the question of folding. That's the question of the deeper fundamental idea behind folding.
Exactly. So obviously, if we are able to predict the end product of protein folding, we are one step closer to understanding the mechanics of protein folding, because we can then start probing which parts of this process are critical and which are not so critical. We can start decomposing it. In a way, this protein structure prediction algorithm can be used as a tool: you modify the protein, you go back to this tool, and it predicts, okay, it's completely unstable.
Yeah, which aspects of the input will have a big impact on the output.
Exactly. So what happens is, we
typically have some sort of incremental advancement. At each stage of this CASP competition you have groups making incremental advances. Historically, the top-performing groups were not using machine learning; they were using very advanced biophysics combined with bioinformatics and data mining, and that would enable them to obtain structures of proteins that don't have any structurally solved relatives. If we have another protein, say the same protein but from a different species, we can potentially derive some ideas from it. That is so-called homology, or comparative, modeling, where we derive ideas from previously known structures, and that helps us tremendously in reconstructing the overall 3D structure. But what happens when we don't have these relatives? This is when it becomes really, really hard. That is so-called de novo protein structure prediction, and in this case those methods were traditionally very good. But what happened in the last round is that the original AlphaFold came in, and all of a sudden it was much better than everyone else.
This is 2018.
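As an aside, the distinction drawn above between comparative (homology) modeling and de novo prediction can be sketched in a few lines of Python. This is only a toy illustration, not how any CASP group or AlphaFold actually works: the function names, the made-up example sequences, and the 30% sequence-identity cutoff are all assumptions chosen for illustration, and the pairwise alignments are assumed to have been computed beforehand.

```python
from typing import Dict, Optional, Tuple


def percent_identity(aligned_target: str, aligned_template: str) -> float:
    """Percent of identical residues over columns where neither row has a gap."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_target, aligned_template):
        if a == "-" or b == "-":
            continue  # skip gap columns
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def choose_strategy(
    alignments: Dict[str, Tuple[str, str]],
    min_identity: float = 30.0,  # hypothetical cutoff, for illustration only
) -> Tuple[str, Optional[str]]:
    """Pick the most similar solved relative, or fall back to de novo."""
    best_name: Optional[str] = None
    best_identity = -1.0
    for name, (target_row, template_row) in alignments.items():
        identity = percent_identity(target_row, template_row)
        if identity > best_identity:
            best_name, best_identity = name, identity
    if best_name is not None and best_identity >= min_identity:
        return "comparative", best_name
    return "de_novo", None


# Made-up example: one close relative with a solved structure, one unrelated.
example = {
    "relative_from_other_species": ("MKT-AYIAK", "MKTQAYLAK"),
    "unrelated_protein": ("MKTAYIAK", "GSPLVNCE"),
}
print(choose_strategy(example))
```

When no candidate reaches the cutoff, the function returns "de_novo", which corresponds to the hard case described above, where no structurally solved relative is available.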
Yeah. Oh, the competition is only every two years, I think.
And so it was kind of a shock wave to the bioinformatics community that we have a state-of-the-art machine learning system that does structure prediction. Essentially, if you look at it, it actually predicts the contacts. The process of reconstructing the 3D structure starts by predicting the contacts between the different parts of the protein, and the contacts are essentially the parts of the protein that are in close proximity to each other.
Right. So the machine learning part, you can correct me if I'm wrong here, seems to be estimating the distance matrix, which is the distance between the different parts.
Yeah, we call it the contact map.
Contact map.
Right. Once you have the contact map, the reconstruction becomes more straightforward. The contact map is the key. So that is what happened. And in the most recent stage we started seeing the emergence of these ideas in other people's work. And yet here is AlphaFold 2, which again outperforms everyone else, and again by introducing yet another wave of machine learning ideas.
Yeah. First of all, the paper is not out yet, but there's a bunch of ideas already out there. There does seem to be an incorporation of this other thing, I don't know if it's something that you could speak to, which is the incorporation of other structures, evolutionarily similar structures, that are used to kind of give you hints.
Yes. Evolutionary similarity is something that we can detect at different levels. We know, for example, that the structure of proteins is more conserved than the sequence: the
sequence could be very different, but the structural shape is still very conserved. That's an intrinsic property that is, in a way, related to protein folds, to the evolution of proteins and protein domains, et cetera. There have been multiple studies, and ideally, if you have structures, you should use that information. However, sometimes we don't have this information; instead we have a bunch of sequences. Sequences we have a lot of: hundreds, thousands of different organisms have been sequenced. By taking the same protein in different organisms and aligning it, so that the corresponding positions are aligned, we can say a lot about what is conserved in this protein, and therefore structurally more stable, and what is diverse. On top of that, we can provide information about the secondary structure of this protein, et cetera. So this information is extremely useful, and it's already there. While it's tempting to do a complete ab initio prediction, where you just have a protein sequence and nothing else, the reality is that we are overwhelmed with this data, so why not use it? So yeah, I'm looking forward to reading this paper.
It does seem that in the previous version of AlphaFold, for the evolutionary similarity thing, they didn't use machine learning; or rather, they used it as the input to the entirety of the neural net, as features derived from the similarity. Now it seems like there's some kind of, quote unquote, iterative thing, where the incorporation of this evolutionary similarity is part of the learning process.
Yeah. I don't think there is a bioRxiv paper, right?
No, there's nothing. It's a blog post that's written
by a marketing team, essentially. It probably has some scientific similarity to the actual methodology used, but it's like interpreting scripture: it could be just a poetic interpretation of the actual work, as opposed to a direct connection to the work.
So now, speaking about protein folding: in order to answer the question of whether or not we have solved this, we need to go back to the beginning of our conversation, with the realization of what an average protein is. Typically, what the CASP competition has been focusing on is single-domain, maybe two-domain proteins that are still very compact, and even those are extremely challenging to solve. But now we are talking about an average protein that has two or three protein domains. If you look at the proteins that are in charge of processes in the neural system, perhaps one of the most recently evolved systems in an organism, the majority of them are highly multi-domain proteins. Some of them have five, six, seven, and more domains. And we are very far away from understanding how these proteins are folded.
So the complexity of the protein matters here?
The complexity of the protein modules, or the protein domains.
So the definition of solved here is, in particular, the CASP competition: achieving, not human-level, but experimental-level performance on these particular sets of proteins that have been used in these competitions.
Well, I do think that, especially with regard to AlphaFold, it is able to solve, at the near-experimental level, a pretty big majority of the more compact
proteins, or protein domains. Because, again, in order to understand how the overall multi-domain protein folds, we do need to understand the structure of its individual domains.

I mean, if you look at AlphaZero or even MuZero, the reinforcement learning, self-play mechanisms are nice because it's all in simulation, so you can learn from huge amounts; you don't need data. The problem with proteins is the size. I forget how many 3D structures have been mapped, but the training data is very small no matter what. It's like millions, maybe one or two million or something like that, but some very small number, and it doesn't seem like that's scalable. It feels like you want to somehow 10x the data or 100x the data.

Yes, but we can also take advantage of homology models, the models that are of very good quality because they are essentially obtained based on the evolutionary information. So there is a potential to enhance this information and use it to empower the training set. I am actually very optimistic. I think it's been one of these turning events, where you have a machine learning system that is truly better than the more conventional, biophysics-based methods.

That's a huge leap. This is one of those fun questions, but where would you put it in the ranking of the greatest breakthroughs in artificial intelligence history? So let's see who's in the running; maybe you can correct me. You've got AlphaZero and AlphaGo beating the world champion at the game of Go, thought to be impossible like 20 years ago, or at least the AI community was highly skeptical. Then you've also got Deep Blue, the original, against Kasparov. You have deep learning itself, like the
maybe, what would you say, the AlexNet, ImageNet moment. So the first neural network achieving human-level performance... no, that's not true; achieving a big leap in performance on the computer vision problem. There is OpenAI, the whole GPT-3, that whole space of transformers and language models, just achieving this incredible performance in the application of neural networks to language models. Boston Dynamics, pretty cool robotics, even though people are like, there's no AI. No, there's no machine learning currently, but AI is much bigger than machine learning. So just the engineering aspect, I would say, is one of the greatest accomplishments on the engineering side, meaning mechanical engineering of robotics, ever. Then, of course, autonomous vehicles: you can argue for Waymo, which is the Google self-driving car, or you can argue for Tesla, which is a machine learning system actually being used by hundreds of thousands of people on the road today. I don't know what else there is, but I think that's it. And then AlphaFold, many people are saying, is up there, potentially number one. Would you put it at number one?

Well, in terms of the impact on science and on society beyond, to me it would definitely be one of the top three. I'm probably not the best person to answer that. But I remember, back in, I think, 1997, when Deep Blue beat Kasparov. It was a shock, I think, for a pretty substantial part of the world, especially people who have some experience with chess and realize how incredibly human this game is, how much brain power you need to reach the grandmaster level. And it's probably one of the first times...

And how good Kasparov
was.

Yeah, Kasparov is actually one of the best ever, right? And you get a machine that beats him.

So it's the first time a machine probably beat a human at that scale of anything.

Yes. So to me that was one of the groundbreaking events in the history of AI.

That's probably number one. It's hard to remember. It's like Muhammad Ali versus, I don't know, Mike Tyson or something like that. It's like, nah, you've got to put Muhammad Ali at number one. Same with Deep Blue, even though it's not machine learning based.

Still, it uses advanced search, and search is an integral part of AI.

Yeah. People don't think of it that way at this moment. In vogue currently, search is not seen as a fundamental aspect of intelligence, but it very likely is. In fact, that's what neural networks are: they're just performing search on the space of parameters. It's all search. All of intelligence is some form of search, and you just have to become cleverer and cleverer at that search problem.

And I also have another one that you didn't mention, one of my favorites. You've probably heard of this; I think it's called Deep Rembrandt. It's the project where, I think, there was a collaboration between experts in Rembrandt painting in the Netherlands and an artificial intelligence group, where they trained an algorithm to replicate the style of Rembrandt, and they actually printed a portrait that never existed before in the style of Rembrandt. I think they printed it on canvas, using pretty much the same types of paints and stuff. To me it was mind-blowing.

Yeah, in the space of art, that's interesting. Maybe that's it, but I think there hasn't been an ImageNet moment yet
in the space of art. You haven't been able to achieve superhuman-level performance in the space of art, even though there was a big famous thing where a piece of art was purchased, I guess, for a lot of money. But still, in the space of music at least, it's clear that human-created pieces are much more popular. I would say, in the space of music, what makes a lot of money, and we're talking about serious money, is music and movies or shows and so on, entertainment. There hasn't been a moment where AI was able to create a piece of music or a piece of cinema, like a Netflix show, that is sufficiently popular to make a ton of money. And that moment would be very powerful, because that's an AI system being used to make a lot of money directly. Of course, AI tools, like even Premiere, audio editing, all the editing I do to edit this podcast, there's a lot of AI involved. Actually, there is a program, and I want to talk to those folks just because I want to nerd out, called iZotope. I don't know if you're familiar with it. They have a bunch of tools for audio processing, and I think they're Boston-based. It's so exciting to me to use it on the audio here because it's all machine learning. Most audio production stuff, any kind of processing you do, is very basic signal processing, and you're tuning knobs and so on. They have all of that, of course, but they also have all of this machine learning stuff, where you actually give it training data: you select parts of the audio, you train on it, and it figures stuff out. It's great. Its ability to separate voice and music, for example, or voice and anything, is incredible. It's clearly exceptionally
good at applying these different neural network models to separate the different kinds of signals from the audio. Okay, so that's really exciting. Photoshop, Adobe, people also use it. But to generate a piece of music that will sell millions, a piece of art...

Yeah, no, I agree. As I mentioned, I offer my AI class, and an integral part of it is a project. It's my ultimate favorite part, because we typically have these project presentations in the last two weeks of classes, right before the Christmas break, and it adds that cool excitement. Every time, I'm amazed by some of the projects that people come up with, and quite a few of them have some link to the arts. I think last year we had a group who designed an AI producing haikus, Japanese poems.

Oh, wow.

It got trained on English-based haikus, and they got to present the top selection. They were pretty good. Of course, I'm not a specialist, but you read them and...

It seems profound.

Yes, it seems so. It's kind of cool. We also had a couple of projects where people tried to teach AI how to play rock music, classical music, I think, and popular music. Interestingly enough, classical music was among the most difficult ones. Of course, if you look at the grand masters of music like Bach, there is a lot of almost math.

Yeah, he's very mathematical.

Exactly. So I would imagine that at least some style of this music could be picked up. But then you have this completely different
spectrum of classical composers. It's almost like you don't have to look at the data; you just listen to it and say, nah, that's not it, not yet.

That's not right, yeah. That's how I feel too. OpenAI has, I think, OpenMuse or something like that, the system. It's cool, but it's like, eh, it's not compelling for some reason. It could be a psychological reason too. Maybe we need to have a human being, a tortured soul, behind the music. I don't know.

Yeah, no, absolutely, I completely agree. But whether or not one day we'll have a song written by an AI engine in the top charts, musical charts, I wouldn't be surprised.

I wonder if we already have one and it just hasn't been announced. We wouldn't know. How hard is the multi-protein folding problem? Is that something you've already mentioned, which is baked into this idea of greater and greater complexity of proteins, like multi-domain proteins? Does that basically become multi-protein complexes?

Yes, you got it right. It has the components of both protein folding and protein-protein interactions, because many of these proteins actually never form a stable structure. Pretty much everyone I know who works with proteins has their favorite protein. Probably my favorite protein, the one that I worked on when I was a postdoc, is the so-called postsynaptic density 95, PSD-95, protein. It's one of the key actors in the majority of neurological processes at the molecular level. Essentially, it's a key player in the postsynaptic density, the crucial part of the synapse where a lot of these chemical
processes are happening. It has five domains, five protein domains, so it's a pretty large protein, I think 600-something amino acids. But the way it's organized, it's flexible. It acts as a scaffold: it is used to bring in other proteins so they start acting in an orchestrated manner. As for the shape of this protein, there are some stable parts, but there are some flexible ones, and this flexibility is built into the protein in order for it to become this multifunctional machine.

So do you think that kind of thing is also learnable through the AlphaFold 2 kind of approach? I mean, time will tell. Is it another level of complexity? How big of a jump in complexity is that whole thing?

To me, it's yet another level of complexity, because when we talk about protein-protein interactions, there is actually a different challenge for this, called CAPRI, which is focused specifically on macromolecular interactions: protein-protein, protein-DNA, et cetera. There are different mechanisms that govern molecular interactions and that need to be picked up, say, by a machine learning algorithm. Interestingly enough, we actually participated for a few years in this competition. We typically don't participate in competitions; we don't have enough time, because it's a very intensive process. But we participated about 10 years ago or so. The way we entered this competition was that we designed a scoring function, the function that evaluates whether or not your predicted protein-protein interaction looks like an experimentally solved one. The scoring function is a very critical part of the model prediction. We designed it to be a machine learning one, and it was one of the first machine-learning-based scoring functions used in CAPRI. And, you
know, we essentially learned what the critical components contributing to the protein-protein interaction are.

So this could be converted into a learning problem, and thereby it could be learned?

I believe so, yes.

Do you think AlphaFold 2, or something similar to it from DeepMind or somebody else, will result in a Nobel Prize or multiple Nobel Prizes? Obviously, maybe not so obviously, you can't give a Nobel Prize to a computer program; at least for now, you give it to the designers of that program. But do you see one or multiple Nobel Prizes where AlphaFold 2 is a large percentage of what that prize is given for? Would it lead to discoveries at the level of Nobel Prizes?

I think we are definitely destined to see the Nobel Prize evolving with the evolution of science, and the evolution of science is such that it now becomes really multifaceted. You don't really have a unique discipline; you have a lot of cross-disciplinary talks in order to achieve really big advancements. So I think the computational methods will be acknowledged in one way or another. As a matter of fact, they were first acknowledged back in 2013, when three people were awarded the Nobel Prize for studying protein folding, the principle, and I think all three of them are computational biophysicists. So that, I think, is unavoidable; it will come with time. As for AlphaFold and similar approaches, it's a matter of time before people embrace this principle, and we'll see more and more such tools coming into play. These methods will be critical in scientific discovery, no doubt about it.
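The idea raised in this exchange, a scoring function for protein-protein interaction models that is learned from examples rather than tuned by hand, can be sketched roughly as follows. This is only a toy illustration and not the method the guest's lab used in CAPRI: the three features (normalized buried interface area, hydrogen-bond count, and steric-clash count), the training data, and the names `train_scoring_function` and `score` are all hypothetical assumptions. The learner is plain logistic regression, swapped in for simplicity.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_scoring_function(features, labels, epochs=500, lr=0.5):
    """Fit logistic-regression weights by stochastic gradient descent.

    features: list of feature vectors, one per candidate interaction model
    labels:   1 for a near-native (correct) model, 0 for an incorrect one
    """
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            pred = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = pred - y  # gradient of the log loss w.r.t. the logit
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias

def score(model, x):
    """Return a value in (0, 1); higher means 'looks more like a real interface'."""
    weights, bias = model
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

# Hypothetical, hand-made training set. Each vector is
# [buried interface area, hydrogen bonds, steric clashes], scaled to [0, 1].
train_x = [
    [0.9, 0.8, 0.1], [0.8, 0.9, 0.2], [0.7, 0.7, 0.1],  # near-native models
    [0.2, 0.1, 0.8], [0.3, 0.2, 0.9], [0.1, 0.3, 0.7],  # incorrect models
]
train_y = [1, 1, 1, 0, 0, 0]

model = train_scoring_function(train_x, train_y)

# Rank two unseen candidate models with the learned function.
good_candidate = [0.85, 0.80, 0.15]
bad_candidate = [0.20, 0.20, 0.85]
print(score(model, good_candidate), score(model, bad_candidate))
```

In a real setting the features would be computed from predicted complexes and the labels from comparison with experimentally solved structures; the point of the sketch is only that the weights, which say what contributes to a plausible interaction, are learned from data.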
On the engineering side, maybe a dark question, but do you think it's possible to use these machine learning methods to start to engineer proteins? And the next question is something quite a few biologists are against, and some are for, for study purposes: to engineer viruses. Do you think something like AlphaFold could be used to engineer viruses?

So, to answer the first question, it has been a part of research in protein science. Protein design is a very prominent area of research. Of course, one of the pioneers is David Baker, with the Rosetta algorithm, which essentially was doing de novo design and was used to design new proteins.

And design of proteins means design of function. When you design a protein, I mean, the whole point is that with the protein structure comes a function. It's doing something.

Correct. You can look at proteins from the function perspective; you can also look at proteins from the structure perspective, the structural building blocks. If you want to have a building block of a certain shape, you can try to achieve it by introducing a new protein sequence and predicting how it will fold. So that's one of the natural applications of these algorithms.

Now, talking about engineering a virus with machine learning: luckily for us, we don't have that much data. Right now, one of the projects that we are carrying out in the lab is that we're trying to develop a machine learning algorithm that determines whether or not the current strain is pathogenic, the current strain of the coronavirus, or of one of the viruses. There are applications to coronaviruses, because we
have strains of SARS-CoV-2, and also SARS-CoV and MERS, that are pathogenic, but we also have strains of other coronaviruses that are not pathogenic, the common cold viruses and some other ones.

So pathogenic meaning spreading? Oh, but pathogenic means actually inflicting damage.

Correct. There are also seasonal versus pandemic strains of influenza. The task is determining what the molecular determinants are that are built into the protein sequence, into the gene sequence, and whether or not machine learning can determine those components.

Oh, interesting. So using machine learning where the input is the entire protein sequence, and then determining if this thing is going to be able to do damage to a biological system. And you're saying we don't have enough data for that?

For this specific one, we do. We might actually have to back up on this, because we're still in the process. There was one work that appeared on bioRxiv by Eugene Koonin, who is one of the pioneers in evolutionary genomics, and they tried to look at this, but the methods were standard supervised learning methods. Now the question is, can you advance it further by using not-so-standard methods? There's obviously a lot of hope in transfer learning, where you can try to transfer the information that the machine learns about protein sequences. So there is some promise in going in this direction. If we have this, it would be extremely useful, because then we could essentially forecast the potential mutations that would make the current strain more or less pathogenic.

And anticipate them.

Exactly. From a vaccine
development standpoint, for the treatment, for antiviral drug development, that would be a very crucial task.

But you could also use that system to then say, how would we potentially modify this virus to make it more pathogenic?

That's true. Again, the hope is, well, several things. One is that even if you design a sequence, carrying out the actual experimental biology to ensure that all the components are working is a completely different matter.

A difficult process, yes.

Then, as we've seen in the past, there could be some regulation the moment the scientific community recognizes that it's no longer just a fun puzzle for machine learning. I think back in, what, 2015, there was the issue of regulating the research on influenza strains. There were several groups that used mutation analysis to determine whether or not a strain will jump from one species to another, and I think there was something like a half-year moratorium on the research, on the paper being published, until scientists analyzed it and decided that it's actually safe.

I forgot what that's called. Gain of function?

Gain of function, yeah. Gain of function and loss of function, that's right.

It's like, let's watch this thing mutate for a while to see what kind of things we can observe. I guess I'm not so much worried about that kind of research if there's a lot of regulation and if it's done very well, with competence, and seriously. The underlying aspect of this question is more like 50 years from now. Speaking of the Drake equation, one of the parameters in the Drake equation is how long civilizations
last, and that seems to be the most important value, actually, for calculating whether there are other intelligent alien civilizations out there. That's where there's the most variability. Assuming that percentage, that life can emerge, is not zero, like if we're super unique, then how long we last is basically the most important thing. So from a selfish perspective, but also from a Drake equation perspective, I'm worried about our civilization lasting, and you kind of think about all the ways in which machine learning can be used to design greater weapons of destruction. One way to ask that: if you look 50 years from now, 100 years from now, would you be more worried about natural pandemics or engineered pandemics? Who is the better designer of viruses, nature or humans, if we look down the line?

In my view, I would still be worried about natural pandemics, simply because of the capacity of nature...

It does a pretty good job, right?

Yes.

And the motivation for using engineered viruses as a weapon is a weird one because, maybe you can correct me on this, it seems very difficult to target a virus. The whole point of a weapon, the way a rocket works, is that you have a starting point, you have an end point, and you're trying to hit a target. To hit a target with a virus is very difficult.

Right, the target would be the human species.

Oh, man. I have hope in us. I'm forever optimistic that there's insufficient evil in the world to lead to that kind of destruction.

Well, I also hope so. And with the way the world is getting connected, I think it helps the world become more transparent. So the spread of information is, I think, one of the key things for society to become more
balanced.

Yeah, one way or another. This is something that people disagree with me on, but I do think about the kind of secrecy that governments have. You're kind of speaking more to the other aspects, like the research community being more open and companies being more open. Government is still, we're talking about military secrets. I think military secrets of the kind that could destroy the world will also become a thing of the 20th century; it'll become more and more open. I think nations will lose power in the 21st century, lose sufficient power to keep secrets. Transparency is more beneficial than secrecy, but of course it's not obvious.

Let's hope so. Let's hope that the governments will become more transparent.

So we last talked, I think, in March or April. What have you learned? How has your philosophical, psychological, biological worldview changed since then? You've been studying it nonstop from a computational biology perspective. How have your understanding and thoughts about this virus changed over those months, from the beginning to today?

One thing: I was really amazed at how efficient the scientific community was, even just judging by this very narrow domain of protein structure and the structural characterization of this virus, from the components point of view and from the whole-virus point of view. If you look at SARS, something that happened less than 20, but close to 20, years ago, and you see what the response by the scientific community was, you see that the structure characterizations did occur, but it took several years. Now, the things that took several years are a matter of months. We see that the research popped up. We are at an unprecedented level in terms of the sequencing,
right? Never before have we had a single virus sequenced so many times, which allows us to trace very precisely the evolutionary nature of this virus, what happens. And it's not just this virus independently of everything; it's the sequence of this virus linked, anchored, to a specific geographic place, to specific people, because our genotype also influences the evolution. It's always a host-pathogen co-evolution that occurs.

It'd be cool if we also had a lot more data about the spread of this virus. It'd be nice if we had it for contact tracing purposes for this virus, but it would also be nice if we had it for the study of future viruses, to be able to respond and so on. But it's already nice that we have geographical data and the basic data from individual humans.

Yeah, exactly. I think contact tracing is obviously a key component in understanding the spread of this virus. There are also a number of challenges; XPRIZE is one of them. We just recently took part in this competition. It's about the prediction of the number of infections in different regions, and obviously AI is the main topic in those predictions.

Yeah, but that's a competition, and the data is weak on the training side. It's great, it's much more than probably before, but it would be nice if it was really rich. I talked to Michael Mina from Harvard. He dreams that the community comes together with something like a weather map of viruses, with really high-resolution sensors on how the viruses travel from person to person, all the different kinds of viruses, because there's a ton of them. And then you'll be able to tell the story that you've spoken about, of the
evolution of these viruses, like the day-to-day mutations that are occurring. That would be fascinating, just from the perspective of study and from the perspective of being able to respond to future pandemics. That's ultimately what I'm worried about.

People love books. Are there some three, or whatever number of, books, technical, fiction, philosophical, that brought you joy in life, had an impact on your life, and maybe some that you would recommend to others?

So I'll give you three very different books, and I also have a special runner-up, an honorable mention. It's an audiobook, and there's some specific reason behind it. The first book is something that impacted my earlier stage of life, and I'm probably not going to be very original here: it's Bulgakov's Master and Margarita.

Well, for a Russian maybe it's not super original, but it's a really powerful book, even in English. I read it in English.

It is incredibly powerful, and the way it ends... I still have goosebumps when I read the very last part, it's called the epilogue. It's just so powerful.

What impact did it have on you? What ideas, what insights did you get from it?

I was just taken by the fact that you have those parallel lives, many centuries apart, and somehow they got intertwined into one story. That, to me, was fascinating. And, of course, the romantic part of this book: it's not just romance, it's romance empowered by magic. And on top of that you have some irony, which is unavoidable, because this is the Soviet time.

But it's deeply Russian. The wit, the humor, the pain, the love, all of
that. It's one of the books that kind of captures something about Russian culture that people outside of Russia should probably read.

I agree.

What's the second one?

The second one is, again, one that I read later in my life. I think I read it for the first time when I was a graduate student, and that's Solzhenitsyn's Cancer Ward. That is an amazingly powerful book.

What is it about?

Essentially, Solzhenitsyn was diagnosed with cancer when he was reasonably young, and he made a full recovery. This is about a person who was sentenced for life in one of these camps, and he had cancer, so he was transported back to one of the Soviet republics, I think one of the south Asian republics. The book is about his experience being a prisoner and being a patient in the cancer clinic, in a cancer ward, surrounded by people, many of whom die. Later on, I read the accounts of doctors who described the experiences of the patient in the book as incredibly accurate. I read that some doctors said every single doctor should read this book to understand what the patient feels. But, again, like many of Solzhenitsyn's books, it has multiple levels of complexity. If you look above the cancer and the patient, the tumor that was growing and then disappeared in his body, with some consequences, is allegorically the Soviet Union. And when he was asked, he agreed that this is what made him think about how to combine these experiences: him being a part of the
Soviet regime, also being someone sent to a Gulag camp, and also someone who experienced cancer in his life. The Gulag Archipelago and this book are the works that actually made him receive a Nobel Prize. I've read other books by Solzhenitsyn; this one, to me, is the most powerful one.

And, by the way, both this one and the previous one you read in Russian?

Yes, yes. Now, the third book is an English book, and it's completely different, so we're switching gears completely. It's not even a book; it's an essay by John von Neumann called The Computer and the Brain. It was the book he was writing knowing that he was dying of cancer. It's a very thin book, but the intellectual power in this essay is incredible. You probably know that von Neumann is considered to be one of the biggest thinkers; his intellectual power was incredible, and you can actually feel this power in this book, where the person is writing knowing that he will die. The book got published only after his death, back in 1958; he died in 1957. He tried to put in as many ideas as he still hadn't realized, so this book is very difficult to read, because every single paragraph is compact, filled with these ideas. And the ideas are incredible, even nowadays. He tried to draw the parallels between the brain's computing power, the neural system, and the computers as they were then.

He was working on this approximately in '57?

'57.
So that was right during, you know, when he was diagnosed with cancer, and he was essentially...

Yeah, he's one of those... there's a few folks people mention, I think Ed Witten is another, where everyone that meets them says he's just an intellectual powerhouse.

Yes.

Okay, so who's the honorable mention?

The reason I put it in a separate section is that this is a book that I recently listened to, so it's an audiobook. This is a book called Lab Girl by Hope Jahren. Hope Jahren is a scientist, a geochemist who essentially studies fossil plants. She uses these fossils and chemical analysis to understand what the climate was thousands, hundreds of thousands of years ago. And something that incredibly touched me about this book: it was narrated by the author.

Nice.

And it's an incredibly personal story. In certain parts of the book you could actually hear the author crying. I never experienced anything like this reading a book, but it was like the connection between you and the author. I think this is really a must-read, or even better a must-listen-to audiobook, for anyone who wants to learn about academia, science, research in general, because it's a very personal account of her becoming a scientist.

So we're just before New Year's. We talked a lot about some difficult topics of viruses and so on. Do you have some exciting things you're looking forward to in 2021? Some New Year's resolutions, maybe silly or fun, or something very important and fundamental to the world of science, or something completely unimportant?

Well, I'm definitely looking forward to things becoming normal. I really miss traveling. Every summer I go to an international summer school; it's called the
School for Molecular and Theoretical Biology. It's held in Europe, and it's organized by very good friends of mine. This is a school for gifted kids from all over the world, and they're incredibly bright. Every time I go there, it's a highlight of the year. We couldn't make it this August, so we did the school remotely, but it's different. So I am definitely looking forward to coming there next August. Also, one of my personal resolutions: being in the house and working from home, I realized that I had apparently missed a lot of time with my family, believe it or not. Typically, with all the research and teaching and everything related to academic life, you get distracted, and so you don't feel that being away from your family affects you, because you are naturally distracted by other things. This time I realized how important that is, spending your time with the family, with your kids. So that would be my New Year's resolution: actually trying to spend as much time as possible, even when the world opens up.

That's a beautiful message, a beautiful reminder. I asked you if there's a Russian poem that I could force you to read, and you said, okay, fine, sure. Do you mind reading? And you said that no paper is needed.

Nope. So this poem was written by my namesake, another Dmitry, Dmitry Kimelfeld. It's a recent poem, and it's called "Sorceress": "Ved'ma" in Russian, or actually "Koldunya", which is another connotation of sorceress or witch. I really like it, and it's one of just a
handful of poems I can actually recall by heart. I also have a very strong association, when I read this poem, with The Master and Margarita, with the main female character, Margarita. And it's also happening about the same time we're talking now, around New Year, around Christmas.

Do you mind reading?

That's beautiful. I love how it captures a moment of longing and maybe love, even.

Yes. To me it has a lot of meaning about something that is happening, something that is far away but still very close to you. And yes, it's the winter.

There's something magical about winter, isn't there? I don't know how to translate it, but a kiss in winter is interesting. Lips in winter and all that kind of stuff. It's beautiful. Russian has a way... there's a reason Russian poetry is just... I'm a fan of poetry in both languages, but English doesn't capture some of the magic that Russian seems to. So thank you for doing that, that was awesome. Dmitry, it's great to talk to you again. It's contagious how much you love what you do, how much you love life. So I really appreciate you taking the time to talk today.

Thank you for having me.

Thanks for listening to this conversation with Dmitry Korkin, and thank you to our sponsors: Brave browser, NetSuite business management software, Magic Spoon low-carb cereal, and Eight Sleep self-cooling mattress. So the choice is browsing privacy, business success, healthy diet, or comfortable sleep. Choose wisely, my friends. And if you wish, click the sponsor links below to get a discount and to support this podcast. Now let me leave you with some words from Jeffrey Eugenides: "Biology gives you a brain. Life turns it into a mind." Thank you for listening, and hope to see you next time.

The following is a conversation with Dmitry Korkin, his second time on the podcast. He's a professor of bioinformatics and computational biology at WPI, where he specializes in bioinformatics of complex
disease, computational genomics, systems biology, and biomedical data analytics. He loves biology, he loves computing, plus he is Russian and recites a poem in Russian at the end of the podcast. What else could you possibly ask for in this world?

Quick mention of our sponsors: Brave browser, NetSuite business management software, Magic Spoon low-carb cereal, and Eight Sleep self-cooling mattress. So the choice is browsing privacy, business success, healthy diet, or comfortable sleep. Choose wisely, my friends. And if you wish, click the sponsor links below to get a discount and to support this podcast.

As a side note, let me say that to me the scientists that did the best apolitical, impactful, brilliant work of 2020 are the biologists who study viruses without an agenda, without much sleep, to be honest, just a pure passion for scientific discovery and exploration of the mysteries within viruses. Viruses are both terrifying and beautiful. Terrifying because they can threaten the fabric of human civilization, both biological and psychological. Beautiful because they give us insights into the nature of life on Earth, and perhaps even extraterrestrial life of the not-so-intelligent variety that might meet us one day as we explore the habitable planets and moons in our universe.

If you enjoy this thing, subscribe on YouTube, review it on Apple Podcasts, follow on Spotify, support on Patreon, or connect with me on Twitter @lexfridman. And now, here's my conversation with Dmitry Korkin.

It's often said that proteins and the amino acid residues that make them up are the building blocks of life. Do you think of proteins in this way, as the basic building blocks of life?

Yes and no. Proteins are indeed the basic biological unit that carries out important functions of the cell. However, through studying proteins and comparing them across different species, across different kingdoms, you realize that proteins are actually much more complicated. They have so-called
modular complexity. What I mean by that is an average protein consists of several structural units; we call them protein domains. You can imagine a protein as a string of beads, where each bead is a protein domain. In the past 20 years, scientists have been studying the nature of the protein domains because we realized that it's the unit. If you look at the functions, many proteins have more than one function, and those protein functions are often carried out by those protein domains. We also see that in evolution those protein domains get shuffled, so they actually act as the unit. Also from the structural perspective: some people think of a protein as a sort of globular molecule, but as a matter of fact the globular part of the protein is the protein domain. So we often have, again, a collection of these protein domains aligned on a string, as beads, and the protein domains are made up of amino acid residues. So this is the basic build...

So you're saying the protein domain is the basic building block of the function that we think about proteins doing. Of course you can always talk about different building blocks, turtles all the way down, but there's a point in the hierarchy where it's the cleanest element, a block that you can put together in different kinds of ways to form complex function. And you're saying that's protein domains. Why is that not talked about as often in popular culture?

Well, there are several perspectives on this. One, of course, is the historical perspective. Historically, scientists have been able to structurally resolve, to obtain the 3D coordinates of, smaller proteins, and smaller proteins tend to be single-domain proteins. So we have a protein equal to a protein domain, and because of that the initial
suspicion was that proteins have globular shapes, and the more of the smaller proteins you obtained structurally, the more you became convinced that that's the case. Only later did we start having alternative approaches. The traditional ones are X-ray crystallography and NMR spectroscopy; these are the two main techniques that give us the 3D coordinates. But nowadays there is a huge breakthrough in cryo-electron microscopy, the more advanced methods that allow us to get the 3D shapes of much larger molecules and molecular complexes. To give you one of the common examples for this year: the first experimental structure of a SARS-CoV-2 protein was the cryo-EM structure of the S protein, the spike protein. It was solved very quickly, and the reason for that is that the advancement of this technology is pretty spectacular.

How many domains does the... is it more than one domain?

Oh yes, oh yes. It's a very complex structure. And on top of the complexity of a single protein, this structure is actually a complex. It's a trimer, so it needs to form a trimer in order to function properly.

What's a complex?

A complex is an agglomeration of multiple proteins. We can have the same protein made up in multiple copies and forming something that we call a homo-oligomer. Homo means the same. So in this case the spike protein is an example of a homotetramer... a homotrimer, sorry.

So these three copies...

Exactly. We have these three chains, the three molecular chains, coupled together and performing the function. When you look at this protein from the top, you see a perfect triangle.

Yeah.

But other
complexes are made up of different proteins. Some of them are completely different, some of them are similar. The hemoglobin molecule, for example, is actually a protein complex. It's made of four basic subunits. Two of them are identical to each other, and the two others are identical to each other, but the two kinds are also similar to each other, which gives us some ideas about the evolution of this molecule. One of the hypotheses is that in the past it was just a homotetramer, four identical copies, and then it became mutated over time and became more specialized.

Can we linger on the spike protein for a little bit? Is there something interesting or beautiful you find about it?

First of all, it's an incredibly challenging protein. As a part of our research to understand the structural basis of this virus, to structurally decode every single protein in its proteome, we've been working on the spike protein. One of the main challenges was that the cryo-EM data allow us to reconstruct, or to obtain the 3D coordinates of, roughly two-thirds of the protein. The remaining one-third of the protein is a part that is buried in the membrane of the virus, in the viral envelope, and it also has a lot of unstable structures around it.

So it's chemically interacting somehow with whatever the heck it's connecting to?

Yeah. People are still trying to understand the nature and the role of this one-third, because for the top part, the primary function is to get attached to the ACE2 receptor, the human receptor. There is also the beautiful mechanics of how this thing happens. Because there are three different copies of these chains, there are three different domains, right,
so we're talking about domains. These are the receptor-binding domains, RBDs, that get untangled and get ready to attach to the receptor. And they are not necessarily going in sync mode. As a matter of fact...

It's asynchronous?

Yes. And this is where another level of complexity comes into play, because right now what we typically see is just one of the arms going out and getting ready to be attached to the ACE2 receptor. However, there was a recent mutation that people studied in the spike protein, and very recently a group from UMass Medical School that we happen to collaborate with, the group of Jeremy Luban and a number of other faculty, actually solved the mutated structure of the spike. They showed that because of these mutations you have more than one arm opening up, so the frequency of two arms going up increases quite drastically.

Interesting. Does that change the dynamics somehow?

It potentially can change the dynamics, because now you have two possible opportunities to get attached to the ACE2 receptor. It's a very complex molecular process, a mechanistic process, but the first step of this process is the attachment of the spike protein, of the spike trimer, to the human ACE2 receptor. This is a molecule that sits on the surface of the human cell, and that's essentially what initiates, what triggers, the whole process of encapsulation.

If this was dating, this would be the first date.

Yes.

Is it possible to have the spike protein just floating about on its own, or does it need that interactive ability with the membrane?

It needs to be attached, at least as far as I know. But when you get this thing attached on the surface, there is also a lot of dynamics in how it sits on the surface,
right? For example, there was a recent work where people again used cryo-electron microscopy to get the first glimpse of the overall structure. It's very low-res, but you still get some interesting details about the surface and about what is happening inside, because until recent work we had literally no clue about how the capsid is organized. The capsid is essentially the inner core of the viral particle, where the RNA of the virus is, and it's protected by another protein, the N protein, that essentially acts as a shield. But now we are learning more and more: it's actually not just a shield, it is potentially used for the stability of the outer shell of the virus. So it's pretty complicated.

And understanding all of this is really useful for trying to figure out things like developing a vaccine or some kind of drug to attack any aspect of this, right?

There are many different implications. First of all, it's important to understand the virus itself, in order to understand how it acts, what the overall mechanistic process of this virus's replication is, of this virus's proliferation in the cell. That's one aspect. The other aspect is designing new treatments. One of the possible treatments is designing nanoparticles that would resemble the viral shape, would have the spike integrated, and essentially would act as a competitor to the real virus by blocking the ACE2 receptors, thus preventing the real virus from entering the cell. There is also a very interesting direction in looking at the membrane, at the envelope portion of the virus, and attacking its M protein. To give you a brief overview, there are four structural proteins. These are the proteins that make up the structure of the
virus. So: spike, the S protein, which acts as a trimer, so it needs three copies. E, the envelope protein, which acts as a pentamer, so it needs five copies to act properly. M is a membrane protein; it forms dimers, and actually it forms a beautiful lattice. This is something that we've been studying, and we are seeing it in simulations. It actually forms a very nice grid, or threads, of different dimers attached next to each other.

So a bunch of copies of each other, and naturally, when you have a bunch of copies of each other, they form an interesting lattice.

Exactly. And if you think about this, the viral shape needs to be organized somehow, self-organized somehow. If it were a completely random process, you probably wouldn't have the envelope shell of an ellipsoid shape; you would have something pretty random. So there is some regularity in how these M dimers attach to each other in a very specific, directed way.

Is that understood at all?

It's not understood. We've been working on it in the past six months, since we met. Actually, this is when we started working on trying to understand the overall structure of the envelope and the key components that make up this structure.

Does the envelope also have the lattice structure?

No. So the envelope is essentially the outer shell of the viral particle. The N, the nucleocapsid protein, is something that is inside, but the N is likely to interact with M.

Does it go M and E? Like, where's the E?

Those different proteins occur in different numbers of copies on the viral particle. E, this pentamer complex, we only have two or three, maybe, per particle. We have a thousand or so M dimers that essentially make up the entire outer shell.

So most of the outer
shell is the M dimer protein. When you say particle, that's the virion, the individual single element of the virus?

It's a single virus.

Single virus, right.

And we have roughly 50 to 90 spike trimers.

Per virus particle?

Per virus particle.

Sorry, what did you say, 50 to 90?

50 to 90, right. So this is how this thing is organized. Now, typically you see antibodies that target spike proteins, certain parts of the spike protein. But there could also be some treatments, small molecules that bind strategic parts of these proteins, disrupting their functioning. One of the promising directions, one of the newest directions, is actually targeting the M dimer, targeting the proteins that make up this outer shell. Because if you are able to destroy the outer shell, you are essentially destroying the viral particle itself, preventing it from functioning at all.

So you think, from a sort of cybersecurity perspective, virus security perspective, that's the best attack vector? Or, like, that's a promising attack vector?

I would say yeah. There's still tons of research that needs to be done, but yes.

There's more attack surface, I guess.

More attack surface, but also, from our analysis, from our evolutionary analysis, this protein is evolutionarily more stable compared to, say, the spike protein.

And stable means a more static target?

Well, yeah. It doesn't change, it doesn't evolve, from the evolutionary perspective, as drastically as, for example, the spike protein.

There's a bunch of stuff in the news about mutations of the virus in the United Kingdom. I also saw something in South Africa, maybe that was yesterday. You just kind of mentioned stability and so on. Which aspects of this are mutable, and which
aspects, if mutated, become more dangerous? And maybe even zooming out, what are your thoughts and knowledge and ideas about the way it's mutated, all the news that we've been hearing? Are you worried about it from a biological perspective? Are you worried about it from a human perspective?

Mutations are sort of a general way for these viruses to evolve. Essentially, this is the way they evolve; this is the way they were able to jump from one species to another. We also see some recent jumps: there were some incidents of this virus jumping from humans to dogs. There is some danger in those jumps, because every time it jumps it also mutates. When it jumps to a new species and jumps back, it acquires some mutations that are driven by the environment of the new host, which is different from the human environment. And so we don't know whether the mutations that are acquired in the new species are neutral with respect to the human host or maybe damaging.

Yeah, change is always scary. But so are you worried about... I mean, it seems like the spread during winter now is exceptionally high, and especially with a vaccine just around the corner, already actually being deployed, there's some worry that this puts evolutionary pressure, selective pressure, on the virus to mutate. Is that a worry?

Well, there is always this thought in the scientist's mind: what will happen? I know there have been discussions about the arms race between the ability of humanity to get vaccinated faster than the virus becomes resistant to the vaccine. I don't worry that much, simply because there is
not that much evidence for that.

For aggressive mutation around the vaccine?

Exactly. Obviously there are mutations around vaccines; that's the reason we get vaccinated every year against the seasonal mutations. But I think it's important to study it, no doubt. And again, I might be biased, because we've been trying to do that as well, but to me one of the critical directions in understanding the virus is to understand its evolution, in order to understand the key mechanisms that lead viruses, the zoonotic viruses, to jump from one species to another, and the mechanisms that lead the virus to become resistant to vaccines and also to treatments. Hopefully that knowledge will enable us to forecast the future evolutionary traces of those viruses.

From a biological perspective, this might be a dumb question, but are there parts of the virus that, if souped up through mutation, could make it more effective at doing its job? We're talking about this specific coronavirus. We were talking about the different proteins, like the membrane, the M protein, the E protein, the N, and the S, the spike. Is there some...

There are 20 or so more in addition to that.

But is that a dumb way to look at it? Like, which of these, if mutated, could have the greatest impact, potentially damaging impact, on the effectiveness of the virus?

It's actually a very good question, and the short answer is we don't know yet. But of course there is capacity for this virus to become more efficient. The reason for that is, if you look at the virus, it's a machine. It's a machine that does a lot of different functions, and many of these functions are nearly perfect, but they are not perfect, and those mutations can make those
functions more perfect. For example, the attachment of the spike to the ACE2 receptor. Has this virus reached the maximum efficiency with which the attachment is carried out, or are there some mutations still to be discovered that will make this attachment stronger, or in a way more efficient from the point of view of this virus's functioning? That's the obvious example. But if you look at each of these proteins, it's there for a reason; it performs a certain function. It could be that certain mutations will enhance this function, and it could be that some mutations will make this function much less efficient. That's also the case.

Since we're talking about the evolutionary history of a virus, let's zoom back out and look at the evolution of proteins. I glanced at this 2010 Nature paper on the, quote, "ongoing expansion of the protein universe," and it kind of implies and talks about how proteins started with a common ancestor, which is kind of interesting. It's interesting to think about even just the first organic thing that started life on Earth. And from that, what is it, 3.5 billion years later, there are now millions of proteins, and they're still evolving. And that's in part one of the things that you're researching. Is there something interesting to you about the evolution of proteins from this initial ancestor to today? Is there something beautiful or insightful about this long story?

If I were to pick a single keyword about protein evolution, I would pick modularity, something that we talked about in the beginning. That's the fact that proteins are no longer considered just a sequence of letters. There are hierarchical complexities in the way these proteins are organized, and these complexities are actually going beyond the protein
sequence. It's actually going all the way back to the gene, to the nucleotide sequence. Again, these protein domains are not only functional building blocks, they are also evolutionary building blocks. What we see in the later stages of evolution is that once these structurally and functionally stable building blocks were discovered, those domains stayed as such. That's why, if you start comparing different proteins, you will see that many of them have similar fragments, and those fragments correspond to something that we call protein domain families. They are still different, because you still have mutations, and different mutations are attributed to diversification of the function of these protein domains. However, you very rarely see evolutionary events that would split a domain into fragments, because once you have the domain split, you can completely cancel out its function, or at the very least reduce it, and that's not efficient from the point of view of cell functioning. So the protein domain level is a very important one.

Now, on top of that, if you look at the proteins, you have these structural units and they carry out the function, but much less is known about the things that connect these protein domains, something that we call linkers. Those linkers are completely flexible parts of the protein that nevertheless carry out a lot of function.

It's like little tails, little heads.

We do have tails. They're called termini, the C- and N-terminus. These are the things right on one and the other end of the protein sequence. They are also very important; they are attributed to very specific interactions between the proteins.

So, but you're
referring to the links between domains?

That connect the domains. Apart from the simple perspective: if you have a very short linker, you have two domains that are forced to be next to each other; if you have a very long one, you have domains that are extremely flexible and carry out a lot of spatial reorganization. But on top of that, the linker itself, because it's so flexible, can actually adapt to a lot of different shapes, and therefore it's a very good interactor when it comes to interaction between this protein and other proteins. These things also evolve, and in a way they have different driving laws that underlie their evolution, because unlike protein domains they no longer need to preserve a certain structure.

Now, on top of that, you have something that is even less studied, and this is something that is attributed to the concept of alternative splicing. Alternative splicing is a very cool concept; it's something that we've been fascinated by for over a decade in my lab, trying to do research on it. Typically, a simplistic perspective is that one gene equals one protein product. You have a gene, you transcribe it and translate it, and it becomes a protein. In reality, when we talk about eukaryotes, especially the more recent eukaryotes that are very complex, the gene is no longer equal to one protein. It can actually produce multiple functionally active protein products, and each of them is called an alternatively spliced product. The reason it happens is that if you look at the gene, it also has blocks, and it essentially goes like this: we have a block that
will later be translated; we call it an exon. Then we have a block that is not translated, that is cut out; we call it an intron. So we have exon, intron, exon, intron, et cetera. Sometimes you can have dozens of these exons and introns. What happens is that during the process when the gene is converted to RNA, we have things that are cut out, the introns, and exons that get assembled together. Sometimes we will throw out some of the exons, and the resulting protein products will be different. So now you have fragments of the protein that are no longer there; they were cut out with the introns. Sometimes you will essentially take one exon and replace it with another one. So there's some flexibility in this process, and that creates a whole new level of complexity.

Is it random, though?

It's not random. This is where the appearance of modern single-cell, and before that tissue-level, sequencing, next-generation sequencing techniques such as RNA-seq, allows us to see that these are dynamic events that often happen in response to disease or in response to a certain developmental stage of a cell. This is an incredibly complex layer that, because it's at the gene level, also undergoes certain evolution. And now we have this interplay between what is happening in the protein world and what is happening in the gene and RNA world. For example, we often see that the boundaries of these exons coincide with the boundaries of the protein domains, so there is a close interplay. It's not always the case; otherwise it would be too simple. But we do see the connection between those machineries, and obviously evolution will
pick up this complexity and select for whatever is successful. We see that complexity in play, and it makes this question more complex but more exciting.
As a small detour into the world of computer science, I don't know if you think about this: Douglas Hofstadter, I think, came up with the name "quine". I don't know if you're familiar with these things, but they're computer programs that have, I guess, exons and introns. The whole purpose of the program is to copy itself, so it prints copies of itself, but it can also carry information inside of it. So it's a very kind of crude, fun exercise: can we replicate these ideas from cells? Can we have a computer program that, when you run it, just prints itself, the entirety of itself, and does it in different programming languages and so on? I've been playing around and writing them. It's a kind of fun little exercise.
You know, when I was a kid, it was essentially one of the main stages in informatics olympiads that you had to reach in order to be any good: you should be able to write a program that replicates itself. And then the task becomes even more complicated: what is the shortest program? And of course it's a function of the programming language. But yeah, I remember a long, long time ago when we tried to make it shorter and shorter and find the shortcut.
On Stack Exchange there's actually an entire site called Code Golf, I think, where the entirety is just a competition. People come up with whatever task, I don't know, like "write code that reports the weather today," and the competition is: in whatever programming language, what is the shortest program? People should check it out, because it makes you realize there are some weird programming languages out there. But just to dig on that a little
deeper: in computer science we don't often think about programs this way. Even in the machine learning world now, those are still kind of basic programs. And then there are humans that replicate themselves, and there are these mutations and so on. Do you think we'll ever have a world where there are programs that have an evolutionary process? I'm not talking about evolutionary algorithms; I'm talking about programs that kind of mate with each other and evolve and replicate themselves on their own. The idea here is that's how you can have a runaway thing. We think about machine learning as a system that gets smarter and smarter, but at least the machine learning systems of today are programs that you can turn off, as opposed to throwing a bunch of little programs out there and letting them multiply and mate and evolve and replicate. Do you ever think about that kind of world, when we jump from the biological systems that you're looking at to artificial ones?
I mean, it's almost like you take the area of intelligent agents, which are essentially independent codes that run and interact and exchange information. So I don't see why not. It could be a natural evolution in this area of computer science.
I think it's kind of an interesting possibility. It's terrifying too, but I think it's a really powerful tool to have agents like that. We have social networks with millions of people, and they interact. I think it's interesting to inject into that, and what was already injected into that is bots. But those bots are pretty dumb; they're probably pretty dumb algorithms. It's interesting to think that there might be bots that evolve together with humans, and there's this sea of humans and robots that are operating first in the
digital space. And then you can also think, and I love the idea: some people have worked on this, I think at Harvard, at Penn, there are robotics labs that take as a fundamental task building a robot that, given extra resources, can build another copy of itself in the physical space, which is super difficult to do but super interesting. I remember there's research on robots that can build a bridge: they make a copy of themselves and they connect themselves, sort of like a self-building bridge based on building blocks. You can imagine a building that self-assembles. So it's basically self-assembling structures from robotic parts. But it's interesting to add, within that robot, the ability to mutate and do all the interesting little things that you're referring to in evolution, to go from a single-origin protein building block to, like, weird complexity.
And if you think about this, the bits and pieces are there. You mentioned evolutionary algorithms. The goal there is in a way different: the goal is essentially to optimize your search. But the ideas are there. People recognize that recombination events lead to global changes in search trajectories, while a mutation event is a more refined step in the search. Then you have other nature-inspired algorithms. I think one of the funnest ones is the slime mold based algorithm. I think it was first introduced by a Japanese group, and it was able to solve some pretty complex problems. And I think there are still a lot of things we've yet to borrow from nature. There are a lot of ideas that nature, you
know, gets to offer us, and it's up to us to grab them and make the best use of them.
Including neural networks. We have a very crude inspiration from nature in neural networks. Maybe there are other inspirations to be discovered in the brain or in other aspects of the various systems, even the immune system and the way it interplays. I recently started to understand that the immune system has something to do with the way the brain operates. There are multiple things going on in there, none of which are modeled in artificial neural networks, and maybe if you throw a little bit of that biological spice in there, you'll come up with something cool. I'm not sure if you're familiar with the Drake equation. I just did a video on it yesterday because I wanted to give my own estimate of it. It's an equation that combines a bunch of factors to estimate how many alien civilizations there are.
Oh yeah, I've heard about it, yes.
So one of the interesting parameters, you know, it's like how many stars are born every year, how many planets there are on average per star, how many habitable planets there are. And then the one that starts being really interesting is the probability that life emerges on a habitable planet. I don't know if you think about this; you certainly think a lot about evolution, but do you think about the thing which evolution doesn't describe, which is the beginning of evolution, the origin of life? I think I put the probability of life developing on a habitable planet at one percent. This is very scientifically rigorous. Okay, well, first, at a high level, for the Drake equation, what would you put that percent at, on Earth and in general? And do you have thoughts about how life might have started, like proteins being one of the early jumping points?
Yes. So I think back in 2018 there was a very exciting paper published
in Nature where they found one of the simplest amino acids, glycine, in comet dust. And I apologize if I don't pronounce it right; it's a Russian-named comet, I think Churyumov-Gerasimenko. This is the comet where there was this mission to get close to the comet and collect the stardust from its tail. And when scientists analyzed it, they actually found traces of glycine, which is one of the 20 basic amino acids that make up proteins. So that was very exciting. But the question is very interesting: if there is some alien life, is it going to be made of proteins? Or maybe RNAs? We see that RNA viruses are certainly a very well-established group of molecular machines. So yeah, it's a very interesting question.
What probability would you put on it? Like, how hard is this job? How unlikely, just on Earth, do you think this whole thing is that we got going? Are we really lucky, or is it inevitable? What's your sense when you sit back and think about life on Earth? Is it higher or lower than one percent? Because one percent is pretty low, but it still is like, damn, that's a pretty good chance.
Yes, it's a pretty good chance. I mean, I'm probably not the best person to do such estimations, but intuitively I would probably put it lower. But still, I mean, we're really lucky here on Earth, or the conditions are really good. I think that everything was right, in a way. Still, the conditions were not ideal if you try to look at what it was like several billion years ago when life emerged.
So there is
something called the Rare Earth hypothesis that, in counter to the Drake equation, says that the conditions of Earth, if you actually were to describe Earth, make it quite a special place, so special it might be unique in our galaxy and potentially close to unique in the entire universe. It's very difficult to reconstruct those same conditions, and what the Rare Earth hypothesis argues is that all those different conditions are essential for life. So that's sort of the counter to all the thinking that Earth is pretty average. I can't really remember enough to go through all of them, but there's just the fact that it is shielded from a lot of asteroids, obviously the distance to the sun, but also the fact that it's like a perfect balance between the amount of water and land, and all those kinds of things. There's a bunch of different factors; I remember there's a long list. But it's fascinating to think about: if, in order for something like proteins and then DNA and RNA and basic living organisms to emerge, you need a very close, Earth-like planet, that would be sad or exciting, I don't know which.
If you ask me, in a way I draw a parallel with our own research. From the intuitive perspective, you have those two extremes, and the reality very rarely falls into the extremes. The optimum is always reached somewhere in between. And that's what I tend to think: we're probably somewhere in between. So we're not unique-unique, but again, the chances are reasonably small.
The problem is we don't know the other extreme. I tend to think that we don't actually understand the basic mechanisms of what this all originated from. It seems like we think of life as this distinct thing, maybe intelligence is a distinct thing,
maybe the physics from which planets and suns are born is a distinct thing. But that could be... it's like the Stephen Wolfram thing: from simple rules emerges greater and greater complexity. So I tend to believe that life just finds a way. We don't know the extreme of how common life is, because it could be that life is everywhere, so everywhere that it's almost laughable that we're such idiots to think we're unique. It's ridiculous to even think that; it's like ants thinking that their little colony is the unique thing and everything else doesn't exist. It's also very possible that that's the extreme and we're just not able to comprehend the nature of that life. Just to stick on alien life for a brief moment more: there are some signs of life on Venus, in gaseous form. There's hope for life on Mars, probably extinct. We're not talking about intelligent life, although that has been in the news recently; we're talking about basic, like, you know, bacteria.
Bacteria, yeah.
And then also, I guess, there's a couple of moons.
Yeah, Europa, which is Jupiter's moon.
I think there's another one. Is that exciting or is it terrifying to you, that we might find life? Do you hope we find life?
I certainly do hope that we'll find life. I mean, it was very exciting to hear this news about the possible life on Venus.
It'd be nice to have hard evidence of something, which is what the hope is for Mars and Europa. But do you think those organisms would be similar biologically, or would they even be carbon-based, if we do find them?
I would say they would be carbon-based. How similar? It's a big question. The moment we discover things outside Earth, even if it's a tiny little single cell, I mean, there's so much... just imagine that. I think that would be another
turning point for science, especially if it's different in some very new way.
That's exciting, because that's a definitive, well, not a definitive but a pretty strong statement that life is everywhere in the universe. To me, at least, that's really exciting. You brought up Joshua Lederberg in an offline conversation. I'd love to talk to you about AlphaFold, and this might be an interesting way to enter that conversation. He won the 1958 Nobel Prize in Physiology or Medicine for discovering that bacteria can mate and exchange genes, but he also did a ton of other stuff, like we mentioned, helping NASA find life on Mars, and Dendral, the chemical expert system. Expert systems, remember those? What do you find interesting about this guy and his ideas about artificial intelligence in general?
So I have a kind of personal story to share. I started my PhD in Canada back in 2000, and essentially in my PhD we were developing a new language for symbolic machine learning, which is different from feature-based machine learning. One of the cleanest applications of this approach, of this formalism, was to cheminformatics and computer-aided drug design. So essentially, as a part of my research, I developed a system that looked at chemical compounds of, say, the same therapeutic category, you know, male hormones, and tried to figure out the structural fragments, the structural building blocks, that are important, that define this class, versus structural building blocks that are there just to complete the structure but are not the ones that make up the key chemical properties of this therapeutic category. And for me it was something new. I was trained as an applied
mathematician with some machine learning background, but computer-aided drug design was a completely new territory. Because of that, I often found myself asking lots of questions on one of these central forums. Back then there were no Facebooks or stuff like that. There was a forum; it's essentially like a bulletin board.
Right, yeah.
So essentially you have a bunch of people, you post a question, and you get an answer from different people. Back then one of the most popular forums was CCL, I think Computational Chemistry... not Library, but something like that. But CCL, that was the forum, and there I asked a lot of dumb questions.
Yes.
I asked questions and also shared some information about what our formalism is and how we do things and whether whatever we do makes sense. And I remember one of these posts; I still remember, I would call it "Desperately looking for a chemist's advice," something like that. So I posted my question. I explained what our formalism is, what it does, and what kind of applications I was planning to do. It was in the middle of the night, and I went back to bed. The next morning I had a phone call from my advisor, who also looked at this forum. He said, "You won't believe who replied to you." And I said, "Who?" He said, "Well, there is a message to you from Joshua Lederberg." And my reaction was, "Who is Joshua Lederberg?" My advisor hung up. Essentially, Joshua wrote to me, "We had conceptually similar ideas in the Dendral project; you may want to look it up."
And we should also say, sorry, it's a side comment, that he won the Nobel Prize at a really young age, in '58. I think he was, what, 33?
Yeah, it's just
crazy.
So anyway, hence, in the '90s, responding to young whippersnappers on the CCL forum. Okay.
And so back then he was already very senior. I mean, he unfortunately passed away back in 2008, but back in 2001 he was a professor emeritus at Rockefeller University. And that was actually, believe it or not, one of the reasons I decided to join, as a postdoc, the group of Andrej Sali, who was at Rockefeller University, with the hope that I could actually have a chance to meet Joshua in person. And I met him very briefly, just because he was walking... there's a little bridge that connects the research campus with the skyscraper that Rockefeller owns, where postdocs and faculty and graduate students live. So I met him and had a very short conversation. But I started reading about Dendral, and I was amazed. We're talking about 1960, right? The ideas were so profound.
What are the fundamental ideas of it?
The reason to make it is even crazier. Lederberg wanted to make a system that would help him study extraterrestrial molecules. The idea was that the way you study extraterrestrial molecules is you do mass spec analysis. The mass spec gives you bits, numbers, that essentially give you ideas about the possible fragments, or atoms and maybe little fragments, pieces of this molecule that make up the molecule. So now you need to decompose this information and figure out what the whole was before it became fragments, bits and pieces. So in order to have this tool, the idea of Lederberg was to connect chemistry and computer science and to design this
so-called expert system that takes as an input the mass spec data and the database of possible molecules, and essentially tries to induce the molecule that would correspond to this spectrum. Essentially, what this project ended up being was that it would provide a list of candidates that a chemist would then look at to make the final decision.
But the original idea was supposed to solve the entirety of this problem automatically?
Yes. Back then he succeeded, believe that. It's amazing; it still blows my mind. And this was essentially the origin of modern bioinformatics and cheminformatics, back in the '60s. So every time you deal with projects like this, with research like this, the power of the intelligence of these people is just overwhelming.
Do you think about expert systems and why they kind of didn't become successful, especially in the space of bioinformatics, where it does seem like there's a lot of expertise in humans, and it's possible to see that a system like this could be made very useful?
Right, so it's actually a great question. At my university I teach artificial intelligence, and my first two lectures are on the history of AI, and there we try to go through the main stages of AI. And the question of why expert systems failed or became obsolete is actually a very interesting one. If you try to read the historical perspectives, there are actually two lines of thought. One is that they were essentially not up to the expectations, and so therefore they
were replaced by other things. The other one was the completely opposite one: that they were too good, and as a result they essentially became a household name and got transformed. In both cases the outcome was the same: they evolved into something. And that's what I see if I look at modern machine learning.
So there are echoes in modern machine learning?
I think so, because if you think about how we design the most successful algorithms, including AlphaFold, you build in the knowledge about the domain that you study. You build in your expertise.
So, speaking of AlphaFold: DeepMind's AlphaFold 2 was recently announced to have, quote unquote, solved protein folding. How exciting is this to you? It seems to be one of the exciting things that have happened in 2020. It's an incredible accomplishment, from the looks of it. What part of it is amazing to you? What part would you say is overhyped or maybe misunderstood?
It's definitely a very exciting achievement. To give you a little bit of perspective: in bioinformatics we have several competitions, and the way you often hear those competitions explained to non-bioinformaticians is that they call them the bioinformatics Olympic Games. There are several disciplines. Historically, one of the first ones was the discipline of predicting protein structure, predicting the 3D coordinates of proteins. But there are some others: predicting protein functions, predicting the effects of mutations on protein functions, predicting protein-protein interactions. The original one was CASP, or Critical Assessment of protein Structure Prediction. Typically, what happens during these
competitions is that experimental scientists solve the structures but don't put them into the Protein Data Bank, which is the centralized database that contains all the 3D coordinates. Instead, they hold them and release the protein sequences. Now the challenge for the community is to predict the 3D structures of these proteins, and then the experimentally resolved structures are used to assess which prediction is the closest one.
And this competition, by the way... just a bunch of different tangents, and maybe you can also say what protein folding is. This CASP competition has become the gold standard, and that's what was used to say that protein folding was solved. So, just to add a little... if you could, whenever you say stuff, maybe throw in some of the basics for the folks that might be outside of the field. Anyway, sorry.
So, yeah. The reason it's relevant to our understanding of protein folding is that we've yet to learn how the folding works mechanistically. There are different hypotheses about what happens in this folding. For example, there is a hypothesis that the folding happens in a modular fashion: we have protein domains that get folded independently because their structure is stable, and then the whole protein structure gets formed. But within those domains we also have so-called secondary structure, the small alpha helices and beta sheets. These are elements that are structurally stable, and the question is, when do they get formed? Because for some of the secondary structure elements you have to have a fragment in the beginning and, say, a fragment in the middle. So you cannot potentially start having the full fold from the get-go. So it's still a big enigma what happens. We know that it's an
extremely efficient and stable process.
So there's this long sequence, and the fold happens really quickly?
Exactly.
Well, that's really weird, right? And it happens the same way almost every time?
Exactly, exactly.
That's freaking weird.
Yeah, that's why it's such an enigma. It's amazing. But most importantly, when you see the translation process, when you don't have the whole protein translated, when it's still being translated, getting out from the ribosome, you already see some structure forming. So folding starts happening before the whole protein gets produced. And this is obviously one of the biggest questions in modern molecular biology.
So that's bigger than the question of folding; that's the question of the deeper, fundamental idea behind folding.
Exactly, exactly. So obviously, if we are able to predict the end product of protein folding, we are one step closer to understanding the mechanistics of protein folding, because we can then potentially look and start probing what the critical parts of this process are and what the not-so-critical parts are. We can start decomposing this. In a way, this protein structure prediction algorithm can be used as a tool: you modify the protein, you go back to this tool, and it predicts, okay, it's completely unstable.
Yeah, which aspects of the input will have a big impact on the output.
Exactly, exactly. So what happens is that we typically have some sort of incremental advancement. At each stage of this CASP competition you have groups with incremental advancement, and historically the top-performing groups, you know,
they were not using machine learning; they were using very advanced biophysics combined with bioinformatics combined with data mining, and that would enable them to obtain protein structures of those proteins that don't have any structurally solved relatives. Because if we have another protein, say the same protein but coming from a different species, we could potentially derive some ideas from it. That's so-called homology or comparative modeling, where we derive some ideas from the previously known structures, and that helps us tremendously in reconstructing the 3D structure overall. But what happens when we don't have these relatives? This is when it becomes really, really hard. That's so-called de novo protein structure prediction, and in this case those methods were traditionally very good. But what happened in the last year is that the original AlphaFold came in, and all of a sudden it's much better than everyone else.
This is 2018.
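To make the distinction just described a little more concrete, here is a minimal Python sketch of the choice between homology (comparative) modeling and de novo prediction. It is an illustration only, not how any CASP group's pipeline actually works: the function names `percent_identity` and `choose_strategy`, the toy sequences, and the 30% identity cutoff are all assumptions made for this example, and the sequences are assumed to be already aligned, with `-` marking gaps.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned, non-gap positions at which two sequences agree."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    # Ignore columns where either sequence has a gap.
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b)
             if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


def choose_strategy(alignments: dict, min_identity: float = 30.0):
    """Pick a modeling strategy for a target protein.

    `alignments` maps a template name (a protein with a solved structure)
    to a pair (aligned_target, aligned_template). The 30% default cutoff
    is an illustrative assumption, not a value taken from the conversation.
    """
    best_name, best_identity = None, 0.0
    for name, (target, template) in alignments.items():
        identity = percent_identity(target, template)
        if identity > best_identity:
            best_name, best_identity = name, identity
    if best_name is not None and best_identity >= min_identity:
        # A structurally solved relative exists: borrow ideas from it.
        return ("homology", best_name, best_identity)
    # No usable relative: the hard, de novo case.
    return ("de_novo", None, best_identity)


if __name__ == "__main__":
    # Hypothetical toy alignments; the template names are made up.
    toy = {
        "close_relative": ("MKTAYIAK", "MKTAYLAK"),
        "unrelated": ("MKTAYIAK", "GSSPQ-WL"),
    }
    print(choose_strategy(toy))
    print(choose_strategy({"unrelated": toy["unrelated"]}))
```

With the close relative available the sketch falls back on comparative modeling; with only the unrelated template it reports the de novo case, which is the regime where, per the conversation, prediction "becomes really, really hard."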
Yeah.
Oh, the competition is only every two years, I think.
And so it was kind of a shock wave to the bioinformatics community that we have a state-of-the-art machine learning system that does structure prediction. Essentially, what it does, if you look at it, is that it actually predicts the contacts. The process of reconstructing the 3D structure starts by predicting the contacts between the different parts of the protein, and the contacts are essentially the parts of the protein that are in close proximity to each other.
Right, so the machine learning part seems to be estimating, and you can correct me if I'm wrong here, the distance matrix, which is like the distance between the different parts.
Yeah, we call it the contact map.
Contact map, right.
So once you have the contact map, the reconstruction becomes more straightforward. But the contact map is the key. So that's what happened, and in the most recent round we started seeing the emergence of these ideas in other people's work. And yet it's AlphaFold 2 that again outperforms everyone else, also by introducing yet another wave of machine learning ideas.
Yeah. First of all, the paper is not out yet, but there's a bunch of ideas already out there. There does seem to be an incorporation of this other thing, I don't know if it's something that you could speak to, which is the incorporation of other structures, like evolutionarily similar structures, that are used to kind of give you hints.
Yes. So the evolutionary similarity is something that we can detect at different levels. We know, for example, that the structure of proteins is more conserved than the sequence. The
sequence could be very different, but the structural shape is actually still very conserved. That's sort of an intrinsic property that is, in a way, related to protein folds, to the evolution of proteins and protein domains, et cetera. There have been multiple studies, and ideally, if you have structures, you should use that information. However, sometimes we don't have this information; instead we have a bunch of sequences. Sequences we have a lot of. We have hundreds, thousands of different organisms sequenced. By taking the same protein in different organisms and aligning it, so making the corresponding positions aligned, we can actually say a lot about what is conserved in this protein, and therefore structurally more stable, and what is diverse in these proteins. On top of that, we could provide information about the secondary structure of this protein, et cetera. So this information is extremely useful, and it's already there. So while it's tempting to do a complete ab initio prediction, where you just have a protein sequence and nothing else, the reality is that we are overwhelmed with this data, so why not use it? And so, yeah, I'm looking forward to reading this paper.
It does seem like, in the previous version of AlphaFold, for the evolutionary similarity thing, they didn't use machine learning for that, or rather they used it as the input to the entirety of the neural net, like features derived from the similarity. Now it seems like there's some kind of, quote-unquote, iterative thing where part of the learning process is the incorporation of this evolutionary similarity.
Yeah, I don't think there is a bioRxiv paper, right?
No, there's nothing. It's a blog post that's written
by a marketing team essentially yeah which you know has some scientific similarity probably to the actual methodology used but it's like interpreting scripture it could be just poetic interpretations of the actual work as opposed to direct connection to the work so now speaking about protein folding right so in order to answer the question whether or not we have solved this right yeah we need to go back to the beginning of our conversation with the realization that an average protein is that typically what casp has been focusing on this competition has been focusing on the single maybe two domain proteins that are still very compact and even those ones are extremely challenging to solve right but now we talk about an average protein that has two three protein domains if you look at the proteins that are in charge of the processes in the neural system right perhaps one of the most recently evolved systems in an organism right all of them well the majority of them are highly multi-domain proteins so some of them have five six seven and more domains right and we are very far away from understanding how these proteins are folded so the complexity of the protein matters here the complexity of the protein modules or the protein domains so you're saying solved so the definition of solved here is particularly the casp competition achieving human level not human level achieving experimental level performance on these particular sets of proteins that have been used in these competitions well i mean i do think that especially with regards to alpha fold it is able to solve at the near experimental level a pretty big majority of the more compact
proteins or protein domains because again in order to understand how the overall multi-domain protein folds we do need to understand the structure of its individual domains i mean unlike if you look at alpha zero or even muzero if you look at that work reinforcement learning self-play mechanisms are nice because it's all in simulation so you can learn from just huge amounts like you don't need data the problem with proteins is the size i forget how many 3d structures have been mapped but the training data is very small no matter what it's like millions maybe one or two million or something like that but some very small number and it doesn't seem like that's scalable there has to be i don't know it feels like you want to somehow 10x the data or 100x the data somehow yes but we also can take advantage of homology models right so the models that are of very good quality because they are essentially obtained based on the evolutionary information right so there is a potential to enhance this information and use it again to empower the training set and i am actually very optimistic i think it's been one of these sort of turning events where you have a machine learning system that is truly better than the more conventional biophysics based methods that's a huge leap this is one of those fun questions but where would you put it in the ranking of the greatest breakthroughs in artificial intelligence history so like okay let's see who's in the running maybe you can correct me so you got like alpha zero and alpha go beating the world champion at the game of go thought to be impossible like 20 years ago or at least the ai community was highly skeptical then you got also deep blue versus kasparov you have deep learning itself like the
maybe what would you say the alexnet imagenet moment so the first neural network achieving human level performance no that's not true achieving like a big leap in performance on the computer vision problem there is openai the whole gpt-3 that whole space of transformers and language models just achieving this incredible performance of application of neural networks to language models boston dynamics pretty cool like robotics even though people are like there's no ai no there's no machine learning currently but ai is much bigger than machine learning yes so just the engineering aspect i would say is one of the greatest accomplishments on the engineering side engineering meaning like mechanical engineering of robotics ever then of course autonomous vehicles you can argue for waymo which is like the google self-driving car or you can argue for tesla which is like actually being used by hundreds of thousands of people on the road today machine learning system and i don't know what else is there but i think that's it and then alpha fold many people are saying is up there potentially number one would you put them at number one well in terms of the impact on the science and on the society beyond it definitely to me would be one of the top three i mean i'm probably not the best person to answer that but i do remember back in i think 1997 when deep blue beat kasparov i mean it was a shock and i think for a pretty substantial part of the world especially people who have some experience with chess right and realizing how incredibly human this game is how much of a brain power you need to reach those levels of grand masters right and it's probably one of the first times and how good kasparov
was and again yeah so kasparov is actually one of the best ever right and you get a machine that beats him right so it's the first time a machine probably beat a human at that scale of a thing of anything yes yes so to me that was like one of the groundbreaking events in the history of ai that's probably number one it's hard to remember it's like muhammad ali versus i don't know any other mike tyson or something like that it's like nah you got to put muhammad ali at number one same with deep blue even though it's not machine learning based it uses advanced search and search is an integral part of ai yeah right so as you said people don't think of it that way at this moment in vogue currently search is not seen as a fundamental aspect of intelligence but it very well i mean very likely is in fact i mean that's what neural networks are they're just performing search on the space of parameters and it's all search all of intelligence is some form of search and you just have to become cleverer and cleverer at that search problem and i also have another one that you didn't mention that's one of my favorite ones so you probably heard of this i think it's called deep rembrandt it's the project where i think there was a collaboration between the experts in rembrandt painting in the netherlands and an artificial intelligence group where they trained an algorithm to replicate the style of rembrandt and they actually printed a portrait that never existed before in the style of rembrandt i think they printed it on a sort of canvas using pretty much the same types of paints and stuff and to me it was mind-blowing yeah it's in the space of art that's interesting there hasn't been maybe that's it but i think there hasn't been an imagenet moment yet
in the space of art you haven't been able to achieve superhuman level performance in the space of art even though there's a big famous thing where a piece of art was purchased i guess for a lot of money yes yeah but it's still you know in the space of music at least it's clear that human created pieces are much more popular so there hasn't been a moment where it's like oh this is we're now i would say in the space of music what makes a lot of money we're talking about serious money it's music and movies or like shows and so on and entertainment there hasn't been a moment where ai was able to create a piece of music or a piece of cinema like a netflix show that is sufficiently popular to make a ton of money yeah and that moment would be very very powerful because that's an ai system being used to make a lot of money directly of course ai tools like even premiere audio editing all the editing everything i do to edit this podcast there's a lot of ai involved actually there's a program i want to talk to those folks just because i want to nerd out it's called izotope i don't know if you're familiar with it they have a bunch of tools for audio processing and i think they're boston based it's so exciting to me to use it like on the audio here because it's all machine learning because most audio production stuff any kind of processing you do is very basic signal processing and you're tuning knobs and so on they have all of that of course but they also have all of this machine learning stuff where you actually give it training data you select parts of the audio you train on it and it figures stuff out it's great the ability of it to be able to separate voice and music for example or voice and anything is incredible it's clearly exceptionally
good at applying these different neural network models to just separate the different kinds of signals from the audio okay so that's really exciting photoshop adobe people also use it but to generate a piece of music yeah that will sell millions a piece of art yeah no i agree and as i mentioned i offer my ai class and an integral part of this is a project right so it's my ultimate favorite part because typically we have these project presentations the last two weeks of the classes right before the christmas break and it adds this cool excitement and every time i'm amazed with some projects that people come up with and quite a few of them actually have some link to arts i mean i think last year we had a group who designed an ai producing haikus japanese poems oh wow so it got trained on the english based haikus right and some of them they get to present like the top selection they were pretty good i mean of course i'm not a specialist but yeah you read them and you see it seems profound yes yeah so it's kind of cool we also had a couple of projects where people tried to teach ai how to play like rock music classical music i think and popular music interestingly enough classical music was among the most difficult ones and of course if you look at the grand masters of music like bach right there is a lot of almost math yeah well he's very mathematical exactly so i would imagine that at least some style of this music could be picked up but then you have this completely different
spectrum of classical composers and it's almost like you don't have to look at the data you just listen to it and say nah that's not it not yet that's not right yeah that's how i feel too openai has i think open muse or something like that the system it's cool but it's like eh it's not compelling for some reason it could be a psychological reason too maybe we need to have a human being a tortured soul behind the music i don't know yeah no absolutely i completely agree but yeah whether or not one day we'll have a song written by an ai engine to be in like top charts yeah musical charts i wouldn't be surprised i wouldn't be surprised i wonder if we already have one and it just hasn't been announced we wouldn't know how hard is the multi-protein folding problem is that kind of something you've already mentioned which is baked into this idea of greater and greater complexity of proteins like multi-domain proteins does that basically become multi-protein like complexes yes you got it right so it has the components of both protein folding and protein-protein interactions because in order for these domains i mean many of these proteins actually never form a stable structure one of my favorite proteins and pretty much everyone whom i know who works with proteins they always have their favorite proteins right so one of my favorite proteins probably my favorite protein the one that i worked on when i was a postdoc is the so-called post-synaptic density 95 psd-95 protein so it's one of the key actors in the majority of neurological processes at the molecular level and essentially it's a key player in the postsynaptic density so this is the crucial part of the synapse where a lot of these chemical
processes are happening so it has five domains right so five protein domains so a pretty large protein i think 600 something amino acids but the way it's organized itself it's flexible right so it acts as a scaffold so it is used to bring in other proteins so they start acting in the orchestrated manner right and the shape of this protein in a way there are some stable parts of this protein but there are some flexible ones and this flexibility is built into the protein in order to become sort of this multifunctional machine so do you think that kind of thing is also learnable through the alpha fold two kind of approach i mean time will tell is it another level of complexity like how big of a jump in complexity is that whole thing to me it's yet another level of complexity because when we talk about protein-protein interactions and there is actually a different challenge for this called capri that is focused specifically on macromolecular interactions protein-protein protein-dna etc but there are different mechanisms that govern molecular interactions and that need to be picked up say by a machine learning algorithm interestingly enough we actually participated for a few years in this competition we typically don't participate in competitions we don't have enough time because it's very intensive yeah yeah it's a very intensive process but we participated back about 10 years ago or so and the way we entered this competition we designed a scoring function right so the function that evaluates whether or not your protein-protein interaction looks like an experimentally solved one right so the scoring function is a very critical part of the model prediction so we designed it to be a machine learning one and so it was one of the first machine learning based scoring functions used in capri and you
know we essentially learned what should contribute what are the critical components contributing to the protein-protein interaction so this could be converted into a learning problem and thereby it could be learned i believe so yes do you think alpha fold two or something similar to it from deepmind or somebody else will result in a nobel prize or multiple nobel prizes so obviously maybe not so obviously you can't give a nobel prize to the computer program you at least for now give it to the designers of that program but do you see one or multiple nobel prizes where alpha fold two is like a large percentage of what that prize is given for would it lead to discoveries at the level of nobel prizes i mean i think we are definitely destined to see the nobel prize evolving with the evolution of science and the evolution of science is such that it now becomes really multifaceted right so you don't really have a unique discipline you have a lot of cross-disciplinary talks in order to achieve really big advancements so i think the computational methods will be acknowledged in one way or another and as a matter of fact they were first acknowledged back in 2013 right where the first three people were awarded the nobel prize for studying the protein folding right the principle and i think all three of them are computational biophysicists right so that i think is unavoidable it will come with time the fact that alpha fold and similar approaches because again it's a matter of time that people will embrace this principle and we'll see more and more such tools coming into play but these methods will be critical in scientific discovery no doubts about it
on the engineering side maybe a dark question but do you think it's possible to use these machine learning methods to start to engineer proteins and the next question is something quite a few biologists are against some are for for study purposes is to engineer viruses do you think machine learning like something like alpha fold could be used to engineer viruses so to answer the first question it has been a part of the research in protein science protein design is a very prominent area of research of course one of the pioneers is david baker and the rosetta algorithm that essentially was doing the de novo design and was used to design new proteins and design of proteins means design of functions like when you design a protein you can control i mean the whole point of the protein with the protein structure comes a function correct it's doing something correct so you can design different things yeah so you can look at the proteins from the function perspective you can also look at the proteins from the structure perspective right so the structural building blocks so if you want to have a building block of a certain shape you can try to achieve it yes by introducing a new protein sequence and predicting how it will fold so with that i mean it's one of the natural applications of these algorithms now talking about engineering a virus with machine learning with machine learning right so well luckily for us i mean we don't have that much data right actually right now one of the projects that we are carrying on in the lab is we're trying to develop a machine learning algorithm that determines whether or not the current strain is pathogenic the current strain of the coronavirus of the virus i mean so there are applications to coronaviruses because we
have strains of sars-cov-2 also sars-cov mers that are pathogenic but we also have strains of other coronaviruses that are not pathogenic i mean the common cold viruses and some other ones right so pathogenic meaning spreading oh but pathogenic means actually inflicting damage correct there are also some seasonal versus pandemic strains of influenza right and determining what are the molecular determinants right that are built into the protein sequence into the gene sequence right and whether or not machine learning can determine those components right oh interesting so like using machine learning to do that's really interesting to give the input is like what the entire sequence the protein sequence and then determine if this thing is going to be able to do damage to a biological system yeah so i mean good machine learning you're saying we don't have enough data for that i mean for this specific one we do we might actually have to back up on this because we're still in the process there was one work that appeared in biorxiv by eugene koonin who is one of these pioneers in evolutionary genomics and they tried to look at this but the methods were sort of standard supervised learning methods and now the question is can you advance it further by using not so standard methods so there's obviously a lot of hope in transfer learning where you can actually try to transfer the information that the machine learns about the proper protein sequences right and so there is some promise in going this direction but if we have this it would be extremely useful because then we could essentially forecast the potential mutations that would make the current strain more or less pathogenic and anticipate them exactly from a vaccine
development for the treatment antiviral drug development that would be a very crucial task but you could also use that system to then say how would we potentially modify this virus to make it more pathogenic that's true that's true i mean again the hope is well several things right so one is that even if you design a sequence right to carry out the actual experimental biology to ensure that all the components are working is a completely different matter difficult process yes then we've seen in the past there could be some regulation the moment the scientific community recognizes that it's now becoming no longer a sort of a fun puzzle for machine learning it could be yeah so then there might be some regulation so i think back in what 2015 there was the issue on regulating the research on influenza strains right there were several groups that used sort of a mutation analysis to determine whether or not this strain will jump from one species to another and i think there was like a half a year moratorium on the research on the paper published until scientists analyzed it and decided that it's actually safe i forgot what that's called something of function testing function gain of function yeah gain of function and loss of function that's right sorry it's like let's watch this thing mutate for a while to see what kind of things we can observe i guess i'm not so much worried about that kind of research if there's a lot of regulation and if it's done very well and with competence and seriously i am more worried about the underlying aspect of this question is more like 50 years from now speaking to the drake equation one of the parameters in the drake equation is how long civilizations
last and that seems to be the most important value actually for calculating if there's other alien intelligent civilizations out there that's where there's most variability assuming if that percentage that life can emerge is like not zero like if we're super unique then how long we last is basically the most important thing so from a selfish perspective but also from a drake equation perspective i'm worried about our civilization lasting and you kind of think about all the ways in which machine learning can be used to design greater weapons of destruction right and i mean one way to ask that if you look sort of 50 years from now 100 years from now would you be more worried about natural pandemics or engineered pandemics like who is the better designer of viruses nature or humans if we look down the line i think in my view i would still be worried about the natural pandemics simply because i mean the capacity of nature producing yeah it does a pretty good job right yes and the motivation for using engineered viruses as a weapon is a weird one because maybe you can correct me on this but it seems very difficult to target a virus right the whole point of a weapon the way a rocket works is you have a starting point you have an end point and you're trying to hit a target to hit a target with a virus is very difficult it's basically just right the target would be the human species oh man yeah i have a hope in us i'm forever optimistic that we will not there's insufficient evil in the world to lead to that kind of destruction well i also hope that i mean that's what we see i mean with the way the world is getting connected i think it helps for the world to become more transparent yeah so the information spread is i think one of the key things for the society to become more
balanced yeah one way or another this is something that people disagree with me on but i do think that the kind of secrecy that governments have so you're kind of speaking more to the other aspects like research community being more open companies being more open government is still like we're talking about like military secrets yeah i think military secrets of the kind that could destroy the world will become also a thing of the 20th century it'll become more and more open yeah i think nations will lose power in the 21st century like lose sufficient power towards secrecy transparency is more beneficial than secrecy but of course it's not obvious let's hope so let's hope so that the governments will become more transparent so we last talked i think in march or april what have you learned how has your philosophical psychological biological worldview changed since then or you've been studying it non-stop from a computational biology perspective how has your understanding and thoughts about this virus changed over those months from the beginning to today one thing is that i was really amazed at how efficient the scientific community was i mean even just judging on this very narrow domain of protein structure and understanding the structural characterization of this virus from the components point of view from the whole virus point of view if you look at sars right something that happened less than 20 but close enough to 20 years ago and you see when it happened what was the response by the scientific community you see that the structure characterizations did occur but it took several years right now the things that took several years it's a matter of months right so we see that the research pops up we are at the unprecedented level in terms of the sequencing
right never before we had a single virus sequenced so many times which allows us to actually trace very precisely the evolutionary nature of this virus what happens and it's not just this virus independently of everything it's the sequence of this virus linked anchored to the specific geographic place to specific people because our genotype also influences the evolution of this it's always a host-pathogen co-evolution that occurs it'd be cool if we also had a lot more data about the spread of this virus well it'd be nice if we had it for like contact tracing purposes for this virus but it would be also nice if we had it for the study of future viruses to be able to respond and so on but it's already nice that we have geographical data and the basic data from individual humans yeah exactly i think contact tracing is obviously a key component in understanding the spread of this virus there is also a number of challenges right so xprize is one of them we just recently took part in this competition it's the prediction of the number of infections in different regions and obviously ai is the main topic in those predictions yeah but it's still the data i mean that's a competition but the data is weak on the training like it's great it's much more than probably before but it would be nice if it was like really rich i talked to michael mina from harvard i mean he dreams that the community comes together with like a weather map of viruses right like really high resolution sensors on how from person to person the viruses travel all the different kinds of viruses right because there's a ton of them and then you'll be able to tell the story that you've spoken about of the
evolution of these viruses like day-to-day mutations that are occurring i mean that would be fascinating just from the perspective of study and from the perspective of being able to respond to future pandemics that's ultimately what i'm worried about people love books is there some three or whatever number of books technical fiction philosophical that brought you joy in life had an impact on your life and maybe some that you would recommend to others so i'll give you three very different books and i also have a special runner-up an honorable mention yeah i mean it's an audiobook and there's some specific reason behind it okay so the first book is something that sort of impacted my earlier stage of life and i'm probably not gonna be very original here it's bulgakov's master and margarita so well for a russian maybe it's not super original but it's a really powerful book even in english so i read it in english so it is incredibly powerful and i mean the way it ends right i still have goosebumps when i read the very last sort of it's called prologue where it's just so powerful what impact did it have on you what ideas what insights did you get from it i was just taken by the fact that you have those parallel lives apart by many centuries right and somehow they got sort of intertwined into one story and that to me was fascinating and of course the romantic part of this book it's not just romance it's like the romance empowered by sort of magic right and maybe on top of that you have some irony which is unavoidable right because this is the soviet time but it's deeply russian so the wit the humor and the pain the love all of
that is one of the books that kind of captures something about russian culture that people outside of russia should probably read i agree what's the second one so the second one is again another one that i read later in my life i think i read it the first time when i was a graduate student and that's solzhenitsyn's cancer ward that is an amazingly powerful book what is it about it's about i mean essentially solzhenitsyn was diagnosed with cancer when he was reasonably young and he made a full recovery but so this is about a person who was sentenced for life in one of these camps and he had some cancer so he was transported back to one of these soviet republics i think it was one of the south asian republics and the book is about his experience being a prisoner being a patient in the cancer clinic in a cancer ward surrounded by people many of whom die right but in a way the way it reads i mean first of all later on i read the accounts of the doctors who described the experiences in the book by the patient as incredibly accurate right so i read that there were some doctors saying that every single doctor should read this book to understand what the patient feels but again as many of solzhenitsyn's books it has multiple levels of complexity and obviously if you look above the cancer and the patient i mean the tumor that was growing and then disappeared in his body with some consequences i mean this is allegorically the soviet yeah and he actually agreed when he was asked he said that this is what made him think about this how to combine these experiences him being a part of the
Soviet regime, also being someone sent to a Gulag camp, and also someone who experienced cancer in his life. The Gulag Archipelago and this book are the works that actually made him receive a Nobel Prize. But I've read other books by Solzhenitsyn, and this one is to me the most powerful one.

And by the way, both this one and the previous one you read in Russian?

Yes, yes. Now, the third book is an English book, and it's completely different, so we're switching gears completely. It's not even a book, it's an essay by John von Neumann called The Computer and the Brain. That was the book he was writing knowing that he was dying of cancer. It's a very thin book, but the intellectual power in this essay is incredible. You probably know that von Neumann is considered to be one of the biggest thinkers; his intellectual power was incredible, and you can actually feel this power in this book, where the person is writing knowing that he will die. The book actually got published only after his death, in 1958; he died in 1957. So he tried to put in as many ideas as he still hadn't realized. This book is very difficult to read because every single paragraph is just compact, filled with these ideas, and the ideas are incredible even nowadays. He tried to draw the parallels between the brain's computing power, the neural system, and the computers as they were. He was working on this in, like, approximately '57.
So that was right when he was diagnosed with cancer, and he was essentially...

Yeah, he's one of those. There are a few folks people mention, I think Ed Witten is another, where everyone who meets them says he's just an intellectual powerhouse. Okay, so who's the honorable mention?

So, the reason I put it in this separate section is that this is a book that I listened to reasonably recently. It's an audiobook, and it's called Lab Girl, by Hope Jahren. Hope Jahren is a scientist, a geochemist who essentially studies fossil plants. She uses these fossils and chemical analysis to understand what the climate was thousands, hundreds of thousands of years ago. Something that incredibly touched me about this book is that it was narrated by the author.

Nice.

And it's an excellent, incredibly personal story. In certain parts of the book, you could actually hear the author crying. I never experienced anything like this reading a book; it was like a connection between you and the author. I think this is really a must-read, or even better, a must-listen-to audiobook, for anyone who wants to learn about academia, science, and research in general, because it's a very personal account of her becoming a scientist.

So we're just before New Year's. We talked a lot about some difficult topics of viruses and so on. Do you have some exciting things you're looking forward to in 2021? Some New Year's resolutions, maybe silly or fun, or something very important and fundamental to the world of science, or something completely unimportant?

Well, I'm definitely looking forward to things becoming normal. I really miss traveling. Every summer I go to an international summer school; it's called the
School for Molecular and Theoretical Biology. It's held in Europe and organized by very good friends of mine, and it's a school for gifted kids from all over the world. They're incredibly bright; every time I go there, it's a highlight of the year. We couldn't make it this August, so we did the school remotely, but it's different. So I am definitely looking forward to coming there next August. Also, one of my personal resolutions: being in the house and working from home, I realized that I apparently missed a lot of time with my family, believe it or not. Typically, with all the research and teaching and everything related to academic life, you get distracted, so you don't feel that being away from your family affects you, because you are naturally distracted by other things. This time I realized how important that is, spending your time with the family, with your kids. So that would be my New Year's resolution: actually trying to spend as much time as possible with them, even when the world opens up.

Yeah, that's a beautiful message, a beautiful reminder. I asked you if there's a Russian poem that I could force you to read, and you said, okay, fine, sure. Do you mind reading? And you said that no paper is needed.

Nope. So this poem was written by my namesake, another Dmitry, Dmitry Kemirfeldt. It's a recent poem, and it's called Sorceress, Ved'ma in Russian, or actually Koldunya, so that's sort of another connotation of sorceress or witch. I really like it, and it's one of just a
handful of poems I can actually recall by heart. I also have a very strong association, when I read this poem, with Master and Margarita, with the main female character, Margarita. And it's happening at about the same time we're talking now, around New Year, around Christmas.

You mind reading?

That's beautiful. I love how it captures a moment of longing, and maybe love even.

Yes. To me it has a lot of meaning, about something that is happening, something that is far away but still very close to you. And yes, it's the winter.

There's something magical about winter, isn't there? I don't know how to translate it, but a kiss in winter is interesting. Lips and winter and all that kind of stuff. It's beautiful. Russian has a way; there's a reason Russian poetry is just... I'm a fan of poetry in both languages, but English doesn't capture some of the magic that Russian seems to. So thank you for doing that, that was awesome. Dmitry, it's great to talk to you again. It's contagious how much you love what you do, how much you love life. So I really appreciate you taking the time to talk today.

Thank you for having me.

Thanks for listening to this conversation with Dmitry Korkin, and thank you to our sponsors: Brave browser, NetSuite business management software, Magic Spoon low-carb cereal, and Eight Sleep self-cooling mattress. So the choice is browsing privacy, business success, healthy diet, or comfortable sleep. Choose wisely, my friends. And if you wish, click the sponsor links below to get a discount and to support this podcast. Now let me leave you with some words from Jeffrey Eugenides: "Biology gives you a brain. Life turns it into a mind." Thank you for listening, and hope to see you next time.