AI against Censorship - Genetic Algorithms, The Geneva Project, ML in Security, and more!

The Power of Collaboration: Unlocking the Potential of AI Censorship Research

In AI research, particularly in the area of censorship, there is a growing recognition of the importance of collaboration and interdisciplinary approaches. This is evident in the work of Kevin Bock, a PhD student and cybersecurity researcher at the University of Maryland and one of the main people behind the Geneva project. Geneva came out of Breakerspace, a lab at the university that involves undergraduates in security research, and it sits at the intersection of machine learning and security. The project's success is a testament to the power of collaboration and the potential for AI research to have a significant impact on real-world problems.

The team's work has not been without its challenges, however. As Bock noted, "it's huge tasks" that require careful navigation of complex issues, including national defense implications and legal considerations. To address these concerns, the team has worked closely with the institutional review board (IRB) and network security experts to ensure compliance with regulations and minimize risks. This collaborative approach has allowed them to make significant progress in their research.

One notable example of the team's success is a project that involved scanning people for potential AI-related vulnerabilities. Unfortunately, this effort was temporarily disrupted when they accidentally attacked the city of Jacksonville, Florida. The incident served as a reminder of the importance of careful planning and testing, but it also highlighted the potential benefits of their research in identifying and mitigating security risks.

The team's experience has taught them to be proactive and adaptable in the face of unexpected challenges. As Bock observed, "it definitely happens" that researchers will encounter unexpected issues or accidentally trigger security measures. However, with the right support and resources, these incidents can often be quickly rectified, and the research can continue.

The team's work has also been supported by various organizations and initiatives dedicated to monitoring censorship and promoting AI research. For example, the Open Observatory of Network Interference (OONI) is a global project that uses volunteers to monitor censorship across the world. OONI's efforts have helped to identify instances of censorship in many countries, providing valuable insights for researchers and policymakers.

For those interested in exploring the field of AI censorship research, there are several resources available. Censored Planet is one such organization that publishes data and reports on censorship incidents. The team at Censored Planet has developed a range of tools and methodologies for analyzing and identifying censorship patterns, which can be useful for researchers and practitioners alike.

Ultimately, the potential impact of AI research in the area of censorship cannot be overstated. By applying machine learning and evolutionary methods to complex security problems, researchers like Bock and his team are well-positioned to make a real-world difference. As Bock noted, "this space has the potential to make an immediate impact on the world" – making it an exciting and worthwhile area of research for those interested in AI and machine learning.

The Future of AI Censorship Research

In conclusion, the work of researchers like Kevin Bock highlights the importance of collaboration and interdisciplinary approaches in exploring the intersection of machine learning and security. By working together with experts from a range of fields, including network security, law, and policy, researchers can develop innovative solutions to complex problems.

As AI research continues to evolve, it is likely that we will see new challenges and opportunities emerge. For those interested in getting started in this field, there are several resources available, including the work of organizations like OONI and Censored Planet. These initiatives provide valuable insights and tools for analyzing censorship patterns, as well as a platform for researchers to share their findings and collaborate with others.

Ultimately, the potential impact of AI research in the area of censorship is significant, and it is an exciting space to explore. As Bock noted, "it's a great space to get involved in" – making it an ideal area for researchers, practitioners, and enthusiasts alike.

Hello there! Today I'm talking to Kevin Bock, who is a cybersecurity expert and one of the main people involved in the Geneva project. Geneva is a genetic algorithm that evades censorship by nation states. In real time, Geneva can evolve in response to the ever more present danger of censorship by really big entities such as governments. All of this is done through an evolutionary search over a program grammar. In this interview we're going to touch on a whole range of topics, including Geneva (how it works, what it does, why people research it, and what it has done so far in the world), but also the broader topics of security and its connections to AI, how people can get started in this field, and what the main questions and problems are in this space. Further, Geneva comes out of a project at the University of Maryland called Breakerspace, which is a sort of lab that includes undergraduates in security research. It's a really cool project, and I think highlighting it would be helpful to some people: maybe you're at the university and you don't know this exists. Go there, take part! All right, without further ado, I want to give over to the interview. Have fun!

All right everyone, I have with me today Kevin Bock, who is a PhD student at the University of Maryland, a cybersecurity researcher, and a member of Breakerspace, which is a pretty cool project at the University of Maryland. He has also been in the news a little bit with a project called Geneva, which uses genetic algorithms to evade censorship by nation states, and I think that's pretty cool. So Kevin, welcome to the show, and thanks for being here.

Thank you, thank you for having me. I'm excited to be here.

So the goal of today is a little bit different, because I'm a total noob at security. Most of the audience of this channel is into machine learning. Maybe some know about security, and some know about the censorship apparatus that's in place around the world and what people do about it.
I think most won't. So today I'll be asking mostly noob-ish questions, and we'll have you here to guide us through everything, to guide us through what's happening in this world. So maybe you can first start off a little bit: how did you get to the place where you are? What are the main things in security right now that draw you to it?

So I think security, and the censorship space also, is in this really cool time where AI and ML techniques have been exploding in all these other fields, and just over the last four years they have really been breaking into security. We're still figuring out all the different applications where you can apply these techniques in security. There are new techniques and new applications that people are discovering all the time: better ways to detect spam, better ways to identify that a domain is malicious, AI-based scanners that flag that binary you downloaded as probably malware, things like that. So the security field is still discovering all sorts of new ways you can apply these techniques, and that was actually one of my initial motivations for bringing this to censorship, because this project was really the entire field of censorship's first foray into using AI- and ML-like techniques.

And if you talk about censorship, what do you mean exactly by that?

Yes, there are so many forms of censorship in effect around the world today, everything from political pressure to self-censorship to taking down... there are so many different types. So I'm going to scope this discussion down a little bit to just the type of censorship that we study in this lab, and that's the type of automated censorship that happens in the network, performed by nation states. So what do I mean by this? If you're a user in certain regimes around the world, let's say in Iran or something, and you try to make a request, then as that request, as that web traffic, crosses through the border to the
country, it is scanned, parsed, and inspected by machines that physically reside in the network, called middleboxes, so called because they're in the middle of the network. These middleboxes examine your requests and ask: is this something we should allow or not? If the answer is no, they either inject traffic to take down your connection, or they drop your connection, or they do something else to disrupt what's going on. And you'll notice that in everything I just said there's no human in the loop. There's no human content review or anything like this. It's purely automated, run by these middleboxes or firewalls deployed by these nations that automatically inspect internet traffic as it goes by. So that's really the scope of what we've been studying here.

Naive question: why can't I just encrypt my traffic, so that all traffic looks the same from the outside?

Yeah, that's a great question. So why can't we just encrypt everything? People have been trying. There are a couple of different approaches to this. You might say, well, let's just use HTTPS, right? Encrypted, we're good. Unfortunately, HTTPS has a small privacy leak. When you first set up an HTTPS connection, in that very first back-and-forth, called a handshake, you as the client have to announce, as part of the protocol, the domain you're talking to, and that announcement happens unencrypted. So if you're making an HTTPS handshake to Wikipedia, the very first packet you send is going to include the word "wikipedia". That's called the Server Name Indication field: you indicate to the server the name of the server you're trying to talk to. Unfortunately, censors just read that field, and then they take down your connection if you're talking to a forbidden domain. So HTTPS is unfortunately close, but not quite finishing the job. Now, just as a quick sidebar, I will say there have been some advancements in HTTPS to try to fix this. There's
a recent proposal to encrypt that field, called encrypted SNI, and China just started censoring that last year. So you can try to encrypt things, but these censors are often just hostile to the idea of letting their citizens encrypt all their traffic.

I guess it's a little bit like this: nowadays everyone encrypts with HTTPS, so you can't conceivably block HTTPS just because you don't like some traffic. But if there's a new type of encryption, it's probably only the people who have something to hide who use that type of encryption. So is it a strategy that the rest of the world would adopt these techniques as fast as possible, to make that approach unusable?

That's exactly right. The broader topic you're actually discovering and saying out loud here is this idea of collateral damage: can we make a protocol or something so popular and so diversely used that if a censor were to try to block it, it would cause irreparable harm to good services, so there's some meaningful cost to performing that censorship? Just like you've identified, HTTPS is everywhere, so they can't just shut down all HTTPS. But when a new encryption method for HTTPS is rolling out that's not very widely deployed, they can nip that in the bud and prevent its rollout. So there's this interesting race, a game between developers and these censors, that's still being played out.

Now let's talk about more, let's say, naive approaches. What has the development of the field been like? What has been tried before, what has been thwarted, and what has the cat-and-mouse game looked like in the past? I imagine different things: there's Tor, and there are probably things that everyone installs on their end, like VPNs and tunnels and so on. What's been the general development over the years?

Yeah, so the researchers
and censors have been playing this cat-and-mouse game for two decades now, and it has evolved and been playing out on multiple fronts. You're exactly right, Tor has been a huge front in that war, if you will. We've developed Tor and continued to advance it. Unfortunately, though, there are some limitations to the Tor protocol: censors can basically enumerate the Tor entry points and just block you. So once you get into Tor you're generally great, but they try to lock you out. There have been all sorts of techniques people have proposed, like "maybe I can disguise my traffic to look like Skype", and then the censor says, "well, you didn't disguise it quite well enough": blocked. There's a whole interesting field of defeating censorship, or subfield I should say, called packet-manipulation-based censorship. This is the idea that all our communication is happening via packets, and if you tweak those packets in just the right way, you can cause the censor to miss you. Historically, that has also played out in this cat-and-mouse game: researchers will study these censorship systems, find a loophole, and deploy and use it, and then the censor says, "oh, I'll fix that", and we're back to square zero. So this game has really continued to play out. I'll call one thing out real quickly about VPNs, because a lot of people, particularly those who have been to China, say, "I've been able to use a VPN and it's been okay." VPNs work in many places, and in many places they don't. There's a country that was in the news recently because they rolled out a new law that forced their citizens to swear on the Quran that they would not use a VPN in order to get internet access installed in their homes. It's just a crazy sentence to say out loud.

Yeah.

But in China, for example, many of these VPNs work most of the time. What researchers have noticed
is that around the time politically sensitive events are happening, such as elections and things like this, a lot of VPNs will just mysteriously stop working.

Yeah.

And then after the event they'll mysteriously start working again. It points to this broader idea that some of these countries may be sitting on more censorship capability than they deploy on a daily basis; they have more power than they use. So in this cat-and-mouse game, the cat may even be stronger than we think it is.

Yeah. Can you give us an idea of what these packet manipulation evasions look like? I imagine, like you mentioned before, that if there's "wikipedia" in the header and I don't want my population to see Wikipedia, that's it, right? What can I possibly manipulate there in order to get through such censorship?

Yeah. So we can think about censors like this: as our computers are sending packets around, you can imagine that communication as writing mail, where your packets are envelopes going through the network. In order to have a communication with a server like Wikipedia, that's going to take a couple of envelopes back and forth, right? And the censor is just like the postman in the middle reading all your letters. Unfortunately, that postman has to process a lot of letters, and you can imagine that at the scale of something like China you're dealing with a huge volume of traffic on a constant basis. What that means is the censor can't just remember everything it sees. For example, if it's trying to track that this person over here is trying to talk to that server over there, and that person over there is talking to that server over there, that's state it has to maintain, right? This amount of state it has to maintain will grow, and at the size of somewhere like China it could grow pretty fast. So they have to be
really careful about what they remember and the state they maintain. So let's say we're exchanging packets. There exists a type of packet called the reset packet. These are normal packets, and our computers send them all the time, but they basically exist to tell the other side: stop talking to me immediately, I'm hanging up the connection. So you can imagine doing something like this: you and I are communicating, we're sending these packets back and forth, and I just slip one additional packet into the connection towards the beginning, and it's a reset packet. I send that packet along, and when the postman sees it, he thinks, "well, these guys have stopped communicating after this message", so he's going to ignore us forever. Then he throws away the state he's maintaining about our connection. He forgets that we're talking, because why would he need to remember anymore? He thinks we're done. And if I craft that packet in such a way that it won't make it to you, or you'll see it and ignore it, or something like this, then we'll still be able to communicate fine, right? Our communication is unimpacted, but for any other packets that go by, the censor says, "I don't know who this is."

Yeah.

And you can get through. So this is, in broad strokes, the idea of packet-manipulation-based censorship: you're tweaking the packets that go by to basically trick the censor in the middle into letting you continue to talk.

Now, do I see this correctly that there has been a giant number of these schemes proposed, and, as you say, there's a cat-and-mouse game: one is proposed, then they fix it, then another one, then they fix it? So that points to the possibility of: what if we could have something dynamic, right? What if we could have something that by itself tries to invent new things? And that's where you went with Geneva. Do I understand that correctly?

That's exactly correct, yeah, you're spot on. So over the years
there have been, I want to say, dozens of these proposed, and it's exactly this cat-and-mouse game. Researchers study the censorship system. The censorship system is not public, so they're probing it, they're trying to take measurements, and that's a lot of work. Then they get an understanding, they apply their good human intuition, they develop something cool and publish it, and the censor fixes it. And they don't tell you they fixed it.

Yeah.

They don't publish a paper that says, "hey, we just fixed your bug." So it just resets the score to zero. The idea with Geneva, which stands for Genetic Evasion, was an algorithm that could flip this process on its head. Instead of a human having to take the approach of "let's understand how the censorship works and then defeat it", let's just have some AI or fuzzer or automated system attack the censor, figure out ways through, and then give them to the human. Now, after the fact, my slow human brain can go figure out why that thing worked, and my brain is no longer the bottleneck to helping people get through the censor.

Do you want to go into a bit more detail? It sounds great on the surface, but there's a reason we need security researchers probing and making sense of this, and there's a reason that's the bottleneck. If I were just to say, "well, you know, fuzz a bit", it's probably not going to work. So what does Geneva do that allows it to be successful where maybe humans would take a long time or wouldn't be successful?

Yes, there were a couple of pretty significant challenges when we first started applying something like a genetic algorithm, or really any AI, to the space of censorship. If you think about the way censorship works, it's not hard to imagine why that's the case. Think about a censorship problem, right? A query is either censored or it's not. It's just a binary
decision. So it's not like your traditional ML or AI where you have this nice gradient descent. There's no error you get back from the censor. The censor doesn't tell you, "hey, if you tweak your query just a little bit, you're getting closer."

Yeah, there's no gradient with which you could work.

So that property alone rules out the majority of the ML field as far as approaches you can take.

Is there even a loss? Like you said, it's hard to detect if you even get through. How do you do that in the first place? How do you notice success or failure?

Yeah, so you're exactly right that capturing that could be difficult. What we do to make it easier on ourselves is obtain machines inside these censored countries and directly try to request forbidden content. So Geneva trains directly against the censor, and we know whether we got it, because when the censor takes action it's kind of obvious. Geneva will try to obtain some forbidden content while manipulating the packet stream; if it succeeds, great, and if it fails, we'll know.

Yeah.

Right. So this idea of how we apply ML, AI, or fuzzing to this space, how do we build this? There are a couple of main challenges. The first is this total lack of gradient that I mentioned, and really that only leaves you with a small number of approaches. We chose to go down the route of using a genetic algorithm for this. There are some nice properties: it's easily explainable, you can understand how it works while it runs, and it's a little less black-boxy than something like a neural net or Markov or something like this. But if you want to build a genetic algorithm, you need a couple of things. You're seeing what some of these strategies look like right here. You need some building blocks, something that the algorithm can compose and put together, and you
need some way for it to put those things together. Take us humans as an example: as far as genetics goes, we've got our DNA bases, right, A, C, T, G, and we can put those together in DNA. For the genetic algorithm, for Geneva, we needed to decide what makes sense as building blocks for the algorithm to use. That alone is a really huge initial challenge, because you could be creative and think of a million different ways an algorithm could manipulate a packet, right? You could flip this bit, you could flip that bit; there are just so many different things you could give it to do. So one of the first challenges we had to figure out was how to balance what this algorithm can and cannot do to the data it has. On one hand, we could let it flip any bit. It's super powerful, but the downside is it could take forever to learn to checksum. On the other extreme, we could just encode what previous researchers found and let it play with those together. It would be super fast, but it would be hard to learn anything new, right? We'd just be building in biases directly. So the approach we ended up taking was giving Geneva basically the same ability to change traffic as the network itself has. The network itself has just a small set of primitives it can apply to packets. It can take a packet and make multiple packets: duplicate. It can change a header to something else: that's tampering with a packet. It can take a packet and break it into multiple pieces: fragmenting. And it can take a packet and drop it, which is basically just deleting the packet. So we built out these building blocks and then allowed it to compose these things together in trees.

Yeah, so you give it a syntax, and it can assemble a little program out of this syntax, like the one we see right here.

That's exactly correct.

Can you walk us through what this particular thing does?

Sure, sure. This is kind of a
fun strategy. There are a few different components to a Geneva strategy, so I'll break down the syntax for you real fast, what these programs look like. The first component is the idea of a trigger. The trigger is what's between the square brackets, so there are two triggers in this: TCP flags S and TCP flags R. When Geneva is monitoring traffic, the trigger tells it which packets to act upon. The first trigger you see here says TCP flags S. That means that whatever actions are attached to that trigger will run on any SYN packet it sees. S stands for SYN, and SYN means the start of my connection. The very first action we see is duplicate, so that means it's going to take that packet and make two of them. The syntax of duplicate is one set of actions, comma, another set of actions, so the two actions you see here are tamper and then send. To the second duplicate we do nothing; we're just going to send it on the wire. But in the first duplicate we're going to replace the flags field in that packet with SYN-ACK, SA, and then we're going to send that packet. So basically what this little program does is it sees outgoing SYN packets from your computer, duplicates them to make two packets, and then replaces the flags in the first one with SYN-ACK. Now, any networking person listening is thinking, "this is clearly ridiculous, this should never work. Why would we even do this? Why are we talking about this?" What's going on here is that, for certain censors around the world, a SYN-ACK is a packet that's typically sent by a server; it's never sent by a client. So in this strategy, when the client sends a SYN-ACK, the censor says, "whoa, I must have missed something. This client is clearly a server, which means the server must be the client." Yeah, it reverses the roles of client
and server in the mind of the censor, and as a consequence, when the client makes the real request, since the censor is processing packets differently between client and server, you're through.

I see, so that's the idea of the strategy. So that connection, in the mind of the censor, is already established as "here's a server, here's a client", and it keeps that state for subsequent packets, more or less?

Yep, that's exactly it. So this is an example of just one strategy, one of these programs. Geneva built this program itself, and it built it through the process of evolution.

Yeah. And just to jump ahead a little bit, because we're not through yet with explaining exactly how it works: you've discovered that Geneva will actually reproduce a lot of the common or known or already discovered things that researchers have proposed, right?

Yeah, we had this really cool result initially. When we first developed this tool, we wanted to benchmark it against the rest of the field, and that's kind of challenging because censors have continued to evolve.

Yeah.

So what we did was sit down in the lab and implement our best guess, our best implementation I should say, of what these censors look like, based on what previous researchers found. Then we trained Geneva against these mock censors, and also trained it against the Great Firewall and real censors where we could. What we found was that very quickly it was able to reproduce basically the entire field.

Yeah.

Every strategy a human had come up with, this also found, and it found them pretty quickly. So it's really showing the power of automated approaches and AI and ML.

Yeah. So let's get back a little bit. You have this syntax, right, that you can build trees from, which are valid programs in Geneva. These will modify the traffic somehow. Now, to say that most
of this traffic will probably not even be traffic; the connection will somehow be bad. Some of it will go through, and some of it will actually maybe evade the censor. What do we need to get there? I guess if you just do it naively and randomize a little bit, it will just be bad: for 99.9% of all the programs you generate, you'll initiate them and after a while you'll see that your traffic isn't even getting anywhere, right? So, of the genetic algorithm components, what do we still need?

Yeah, so we're building our way up to that. Just like you said, we've got our building blocks, we've got a way to put them together, we've got a syntax that we can build these programs out of, and we can run these programs on network traffic. And you're exactly correct that if we initialize completely randomly, it's going to do terribly, and that's exactly what happens; we've tested this. So where do we need to go from here? This brings us to the idea of getting evolution in the mix. You can imagine the way this works is we have a big pool of strategies, okay? We'll call this a population. Just take for granted for now that we have some diverse set of strategies in this population, and we have a way to test them, right? We can try to make a request for something forbidden, and we can run these programs on those requests as we make them. For example, from inside China we can try to access Wikipedia, which is a censored resource, and we'll have these programs running on that connection. We'll just try to make that connection over and over again. What we'll see is that some of these strategies will destroy our connection, some of them will just not work at all and do terribly, some of them might keep our connection alive, and maybe, if we get crazy
lucky, we'll defeat censorship. But for now let's just say a whole bunch of them will just destroy our connection. What we have is a fitness function. This borrows from a much broader space in ML and AI, but it's basically this idea: if you take some individual from the population, some individual strategy, how good is this thing? Survival of the fittest: should this thing survive and continue to propagate its genetic material? This was actually the second big challenge in applying AI and ML to the space of censorship evasion: what on earth should a fitness function look like in this space? Because, just like we talked about earlier, there's no gradient, right? Even coming up with a loss function can be a little tricky.

Sorry to interrupt, but is the fitness anything other than zero? Okay, maybe some connections don't even work, even to the server next to you, and you can discard those. But other than that, the traffic either doesn't reach the target or does reach the target, and if it does, you've kind of won, right? How can you even get a meaningful signal? Is there a fitness in between zero and one?

Yeah, and part of what makes Geneva work is that we've kind of shoehorned our way into getting fitness between zero and one. Specifically, what we do is rule out those strategies that break your own connection. That's how we've gotten between zero and one, because it's not technically zero or one; it's more like negative one, zero, and one. Negative one is Geneva shooting itself in the foot, right? It's just dropping all your traffic. That's never going to work, and we shouldn't even bother exploring that space more, because we're never going to get anywhere. But if you can make it so that your packets are at least interacting with the censor, and at least have the potential to get to the server, well, now
we might be getting somewhere. So basically, we set up the fitness function in such a way that if strategies destroy the underlying connection, they'll be punished severely and basically killed off, and strategies that interact with the censor, even though they get censored, will get a slightly higher fitness than those other ones. What's going to happen is that those individuals aren't successful, but they're still the most successful in the population pool, which means some subset of them will continue to reproduce. That subset is basically just chosen randomly, but because we're just choosing randomly, mutation is still going to happen. So we're taking a set of individuals that all interact with the censor, and then we just mutate them and try again, and then mutate them and try again. Effectively, what this has turned into is a fuzzer: the fitness function basically makes Geneva a targeted fuzzer, where we can fuzz just the space of strategies, just the space of programs, that allow us to interact with the censor. Where it gets interesting is that as this fuzzer is running generation after generation, just trying different crazy things against the censor, if it finds something that gets through, suddenly that fitness is way higher than everything else, and that individual will start sharing its genetic material and propagating within the population pool.

At that point we could stop the fitness function right there, but we optionally add some additional punishments and rewards for the algorithm. Specifically, we add a punishment for strategy complexity: if an individual is successful, we optionally punish it for the number of actions and the amount of overhead it adds to the connection. This is not strictly required, but I have a very small, smooth human brain, and it's so much easier to understand a strategy that's only two
actions long compared to something that's 50 actions, for example. So we can encourage the algorithm: "Great, you've got a solution; now simplify it down for me." And it will, over the course of generations, whittle it down to its smallest form, and then at the end present you its population pool and its best individuals.

And we see here a few ways you can mutate. I think this essentially comes down to changing the syntax tree in some form.

Yep. You can imagine all the different ways you could take these programs and mix them around. If you can think about it, Geneva could probably do it.

Yeah. And just for my understanding: you're trying all of this, and you say you have some machines inside of these countries, right? And I read that, obviously, this is not going to work against IP blocking. So how do you not get IP blocked by them? I imagine there's some weird traffic that hits my censorship wall all the time. Why don't I just say, "Well, gone"?

Yeah, that's a good question, and we get this question a lot, actually. You're pointing to this broader question of what the censor's response is. You're doing all these wacky, crazy, ridiculous things. I mean, there's a strategy in there that just lights up every TCP flag. That packet shouldn't exist, flatly; it has no meaning on the network. But Geneva tried it and found that it works. So where do censors go from here? It sounds like we're talking about things like sending crazy packets, and it sounds like that should be easy to detect on the network. But it sounds easy until you try to write it, because writing something to detect abnormality when you have no idea what that abnormality looks like, especially given just how random and crazy the internet is all the time,
identifying that is actually harder than it sounds. What makes it potentially even harder is that a lot of the middleboxes that would be doing that detecting are exactly the middleboxes Geneva is mucking with using these strategies. So it may be the case that an imaginary detector would also be getting screwed up by these same strategies. So it's something they could take action against, but we haven't seen any censors roll out something like this.

Something else you could imagine: the existing fitness function I just described for Geneva kind of assumes a static adversary, an adversary that's not playing along, if you will. It's also assuming an adversary that's not doing anything special to hunt it out, and you could imagine a censor that's a little more sophisticated than that. So something we've kept an eye on is, in the future, if the censor either starts rolling out AI/ML techniques or starts hunting for traffic that looks very abnormal, you can imagine encoding additional bits into the fitness function such that you could encourage Geneva to make the strategy blend in with normal traffic: "I want this to look as normal as possible but still get through." So you could imagine all sorts of modifications to the fitness function to make an algorithm like this a stronger competitor against an adversary that's also playing along. But we haven't seen the adversaries do that yet, so we haven't needed to.

I was surprised when we talked to a bunch of people at the intersection of security and machine learning that there are, as you say, these ML-based malware detectors, and I guess also weird-traffic detectors, and people use them, for example, for company networks and so on. And these are, to my surprise, also vulnerable to adversarial attacks. So there's an entire new direction opening. Usually
people imagine adversarial attacks like, "Ah, I changed the image a little bit," and it's really this distinction between how the human sees it and how the machine sees it. But in malware it's just bits, a very small number of bits; there's nothing like "how the human sees it and how the machine sees it". It's so weird. But I think it's pretty cool.

You got some attention in the media, and the articles usually go something like, "This AI can evade censorship." Now, knowing that you use genetic algorithms, how do you think your work was received in the media? Do you feel like they are trying to put a few buzzwords in there, or were you happy with it?

In general, pretty happy. I've kind of been lucky. Even just discussions like this, where we can talk about the work in a deeper context than just throwing buzzwords around: this is an awesome way to cut through that buzzwordy fanfare, if you will. So I've been kind of lucky. You're always going to see buzzwords attached to things; there's always something like that. But I'd say overall it's been received positively, and things like this really help us get there.

Cool. And just saying, the code for Geneva is available; it's on GitHub. Anyone can look it up. Your builds fail right now, I just have to tell you. I'm sorry.

Yeah, we're switching between CI systems and haven't finished the migration.

Okay, I mean, yes, nothing new here. So there is a lot of open space here, it seems. The genetic algorithms are very cool; they're like a basis right here. Do you think there are more places where machine learning techniques could come in? You said we kind of have to draw back from the gradient-based approaches, but there are
definitely possibilities. If you think of something like AlphaGo, that's a discrete game, but they also work with neural networks that, for example, when you build your tree of modifications, guide that somehow, that have an idea which of the modifications might lead to a better algorithm or to a worse algorithm, and so on. Do you see any sort of evolvement that could happen there?

Definitely, definitely. When we first wrote Geneva, our goal was not to be the last AI approach to the space; it was to be the first, and hopefully the worst. It would be great if viewers out there take a crack at this. There are all sorts of new techniques out there just waiting to be applied. This space is rich, it's interesting, and it's impactful. This is the kind of space where, if you discover something and get it out to the world, you're helping journalists and activists right now. So we're really excited to see where the space goes as it continues to blossom. All sorts of techniques are just waiting to be applied.

And are you also actively investigating the censor's side? Because I imagine that the more capable you are in censoring things, the better you can research counter-strategies.

A bit. We've tried to tailor our research in such a way that we're not directly helping a censor. We never want to publish a paper where really the use case is just making censors better. So if we do research down that vein, it's purely in service of making evasion better. And we've tried to be very good about not releasing or publishing anything that's directly, "Hey, censors, this new technique is going to really change the game for you; you should try to roll that out." So I guess that answers your question.

Yeah. Well, if you look ahead, since we said that the space is wide open,
what do you see as maybe a bit of a north star for the field, for, let's say, censorship evasion? What would be characteristics of an ideal algorithm?

That's a really good question: an ideal algorithm, something to shoot for. I think I can answer that by talking about how the problem of censorship is getting harder and more complicated. As censorship continues to evolve, this cat-and-mouse game exists. It's not just censors patching bugs; censors themselves are getting more sophisticated, they're getting better. One direction that we think censors will start exploring in the future is this idea of more personalized censorship. Instead of censorship policies being rolled out for the entire country, you could imagine a system where users with elevated social credit scores, or different professions, things like this, could access different content online and be subjected to different forms of censorship. In cases like this, just directly applying Geneva gets a little bit harder, because you can't just apply Geneva at one vantage point and help everybody, right? You suddenly need a way to reach more people and help more people at once. So it's this question of how we can scale this up in a large way, and how we can scale it up safely, in a way that protects itself from attacks from the adversary. These nation-states can see our traffic, so in theory they could muck with the training; how can we prevent that? So in crafting this ideal algorithm, there are a lot of things you have to consider. Building towards this idea of: can we do federated training across a large population, can we do this in a way that protects users, can we make the algorithm more efficient so it needs fewer connections to figure things out. All sorts of things like this, I think, are really good goals to
shoot for. And as more people, viewers, try this out, as more people jump into the space and play with this, these are some of the problems they're going to be building towards.

Is there any work on screwing with the censors? I imagine that if I build an evasion attack that has a really low-hanging fruit of fixing it, and that fix in itself would somehow be completely devastating, but I don't know it when I implement it... Is there work in this direction?

So, is there work in the space of mucking with censors? Definitely. Crafting the kind of attack you describe is kind of tricky because we don't know what the censor's code looks like. Now, there is this idea that there are bugs and limitations that, as censors patch them, may expose them to other attacks. One quick example of this: if you go back to our analogy of sending letters back and forth, a common limitation that many less sophisticated censors experience is that if I've taken a packet, or taken a letter, and I break it into two letters, they can't put them back together. That's a huge limitation, because it's really easy for me to just take a packet, split it up, and send it through. To fix that, all the censor needs to do is remember every packet it sees and then stitch it back together based on the numbers on each of the packets. So that's a simple fix to a limitation. But when you apply that fix, you open yourself up to an entire space of attacks: maybe I can sneak a letter in there that you think belongs halfway through the message, but it actually belongs at the beginning, or it actually belongs at the end, or it actually doesn't belong in there at all. So this is one example that we've seen in the wild of this idea: I need to fix the limitation, and by fixing the limitation I've opened myself up to a dozen other potential attacks. So that
definitely exists.

I'm just thinking, from my noobish understanding right here: how much of a problem is it that our protocols are rather fixed? I imagine if I had a dynamic language where, if I communicate with anyone, the first step would actually be to negotiate a protocol in a very dynamic way, that would give me much more of a possibility to, together with the person I want to communicate with, negotiate something that could get around these censors in a completely adaptive fashion. Is that at all feasible, or is there some flaw?

So, is it feasible? Maybe. I mean, if such a thing could be built, it'd be incredible. It'd be awesome. So AI people watching, get on that, because that sounds awesome. There are definitely some challenges in rolling that out. You basically need to get in the headspace of: if I roll out this protocol and the censor knows about it, what is it going to do? There are protocols out there where, from the very first byte you send, the whole thing is encrypted, and in that case it's pretty hard to fingerprint, right? It never looks the same; it's always just a stream of random-looking bytes. But the censor can also find that, just by looking for something that looks like a random stream of bytes. And, just like you said, that protocol never changes; it always looks the same. So you really need to develop a system that's flexible and dynamic enough that today it looks like this protocol, tomorrow it looks like that protocol, and the next day it looks like nothing in between. So you really need to be very creative and very deliberate with how you do it. I'm not aware of anything like that personally. Maybe someone's working on it out there, but it would be awesome if you could do it.

Now, speaking of mucking with censors, you also have other work that uses the censorship infrastructure, so essentially anything
that's in place from the censors, to perform some attacks. As I understand it, any attack you could do is actually made potentially worse by the censorship infrastructure, such as a DDoS attack or something like this. Do you want to talk a little bit about that?

I would love to. This is an area of work that we started exploring a year or two ago. Something we noticed for a lot of these censors is that when you interact with them as a user, they need to respond to you; they need to send you some traffic, right? If I'm trying to request some resource and that resource is forbidden, maybe the censor sends me a block page, and that block page says, "Hey, you're not allowed to access this." And the thing is, in that communication, my request can often be much smaller than the block page I get back. So, as an attacker, this opens up the space of: maybe I can use the censor to launch an attack at somebody else, by making a request for forbidden things while pretending to be someone else, and then letting the censor send that huge response at that other person. This is the idea of a reflected attack, or an amplification attack, because as an attacker I can make a tiny request and get a bigger response out of it, so I'm amplifying my traffic. So we started exploring whether we could do this to censors and use these nation-state censors, or even, beyond censors, normal firewalls, things that universities or just regular networks and organizations have deployed. We discovered hundreds and hundreds, tens of thousands, millions of IP addresses that were behind these censors that we could use to launch these attacks, and found these attacks got crazy powerful.

And who does it hurt more, the censors or the final recipients of the attack?

In this case, the weight is borne by both, but the brunt of the impact will be felt by the victim. So this line
of work, it mucks with the censor, but really the purpose, or something you could distill this work down to, was that censors are causing more harm to the internet: the harm of a censor is not just restricted to the citizens within its borders. A censor anywhere is a threat to anyone everywhere. So the work was less about "let's flood a censor's network" and more about "let's prove to the world these things are dangerous when they've been deployed as carelessly as they have been."

Now, other than block pages, you have some very specific schemes, specific to these censorship infrastructures, that make these attacks even more powerful. What are examples of that?

Yeah, so discovering these attacks in the first place: I'm making it sound very simple, right? You just send a request and then the response gets through. But I'm skipping over an enormous step here, because what I've just described, "send a request pretending to be someone else", should not be possible. That sentence should not exist; it shouldn't be a thing you can do. The reason is that whenever we make requests... I think there's a GIF in there that explains exactly what I'm saying, just scroll up a little bit. There's a three-way handshake that we need to complete, and that three-way handshake is just a short exchange of packets. I think it's the one right above that. Right here: a short exchange of packets at the very beginning of our connection. As an attacker, if I try to spoof a three-way handshake, if I pretend to be my victim and start the handshake, the server is going to respond to the victim, so I won't be able to get the critical bit of information I need from that handshake to finish it. And I need to finish that handshake in order to make a request. So
throughout all of networking history, basically up until this paper, it's been assumed that TCP, the underlying protocol behind all these requests, is immune to this type of amplification attack. Largely immune; there's a small caveat there, but it's not worth getting into. So how did we go about addressing this problem? We used Geneva and AI techniques. Basically, we replaced Geneva's fitness function and told Geneva, "Hey, you can talk to these censors, but instead of rewarding you for getting forbidden content, we're going to reward you for getting content without establishing a connection, and we're going to reward you for getting the biggest content you possibly can." So we're turning the fuzzer on its head a little bit and letting it explore the space of strategies that, (a), confuse the middlebox into responding, tricking it into thinking we have a connection already, and then, (b), once we've tricked it, get the biggest possible response we can. So this is a second set of work that was really powered by the same Geneva genetic algorithm. We were able to use the same set of building blocks and primitives and programs that we had developed previously; we just applied them in a new way.

And this, if I understand it, is not a weakness in TCP. If TCP were implemented correctly, Geneva shouldn't be able to find something around this. This is specifically because these middleboxes are in there, right?

Yeah, you're spot on. TCP itself is not the problem; it's the implementation of TCP. And that's partially why we did this work. You can't just study TCP itself; you can't download the protocol specification and think really hard, because that's not going to help you. You need to actually study real-world censors. So that's what we did: we took Geneva, we trained it against hundreds, actually, of censors around the world, and then took the
results of that and were able to scan the whole internet. We scanned the internet, the IPv4 internet, almost 50 times, actually, with these different packet sequences that Geneva discovered, and effectively just attacked ourselves over and over again to see what kind of damage we could do.

And how does that square? Before, you said, "We're never going to release anything that helps the censor in any way," and now you're releasing a recipe for launching massive attacks on something, right? I mean, with that I could actually attack the censor directly and just make their life miserable using their own infrastructure, which is ironic, even. I could also use it to DDoS the Red Cross. My perspective usually is that any technology can be used for good and for bad, but before, you said a little bit in the direction of "we never want to publish anything that helps the censor." This seems to be different. What's different here?

Yes. The difference here is, and I want to note that we didn't just discover these and immediately put them out into the world. We spent almost a year just doing responsible disclosure. We emailed every middlebox manufacturer we could get in touch with and gave them advance copies of our paper, advance copies of this attack. We also emailed CERTs, country-level emergency readiness teams. These are teams that exist in various parts of the world that are designated to respond to network events pertaining to that region. We emailed all of them around the world: "Hey, that Chinese censor you guys are operating: potential problem there." So we spent months and months working with DDoS manufacturers, CERTs, and middlebox manufacturers to try to patch these things and clean them up before this ever got out into the
world. At the end of the day, this runs into the broader responsible disclosure question that a lot of the security field wrestles with: if I never publish this, there's often no incentive for the issue to be patched. If there's no downside to the network, they don't need to patch it. And if someone else discovers it before this gets out there, they can start using it without the world and the defenders knowing about it. So there's this really tricky line you've got to toe: I need to let everyone have as much time as possible to patch it, but they also need to know it's going to get out there, to incentivize them to patch it. With that in mind, we took the approach of: let's take as much time as we possibly can, let's tell every invested party about this attack, how to patch it, how to fix it. We gave them scripts to test their own networks. And then, after several months had passed and we were confident that if they were going to take action, they already had, we released the work.

Cool. Now, you're a member of something that's called Breakerspace. I've already mentioned it at the beginning. Because it's pretty unique, do you want to talk a little bit about what this is and what it does?

Yeah, I'd be happy to. Breakerspace is a lab at the University of Maryland. Any UMD students watching, come check us out. The defining feature of this lab is that undergraduate students are invited to join and participate. The goal of the lab is to broaden research and make it more accessible beyond just the PhD students and graduate students who are doing it. So this Geneva team, and the broader censorship team within the lab, has been staffed by undergraduates: I've been leading the team, but I've had a team of undergraduates who've been working with me on these projects. So every project we've talked about today and
every paper on our website, this has not just been a one-man show. This has really taken a village to get these off the ground and get them moving. They're huge, huge tasks, and I'd be remiss if I didn't mention the huge team of students who've been working on this with me.

And, not unrelated to them being undergrads or not: how often does it happen that you get into hot water? In security research there are national defense implications, there are legal implications, and so on. How do you navigate that space, and how often does it happen that you're like, "Oops, I hope no one noticed this"?

It definitely happens, and we're really lucky to have such a supportive university atmosphere in which we can do these things. We've worked closely with the IRB, the institutional review board, and our network security people. I mean, there was one week where, for that scanning paper we were talking about, we were like, "All right, let's kick off some scans," and then we immediately knocked out the university firewall. It was like, "Oh no." And they worked with us and helped to get it back, and then helped us work in such a way that it wouldn't happen again. So what you're describing absolutely happens. One time, and we didn't know this, we were accidentally attacking the city of Jacksonville, Florida, and it was like, "Whoops, let's go email them so that stops happening." The University of Kentucky, things like this. So what you're describing happens all the time. It's like, "Oh shoot, whoops," and often those whoops moments are like, "That's a cool discovery you just made; we've also got to go fix whatever you just broke." So it totally happens, all the time. We've got lots of crazy stories like that. We're really lucky to have such a supportive atmosphere in which we can do these things. It's okay to break things, as long as you work to fix them, obviously, in such a
supportive atmosphere.

Where can people go if they want to get started in this space? Let's say I'm an AI researcher. I have a good understanding of reinforcement learning and evolutionary methods and genetic algorithms and all that, but I don't have much of a clue about security. Are there resources I can go to that you can recommend?

For security in general, there are so many. I'm sure there are two dozen YouTube channels that could hook you up with incredible stuff, so maybe we can send some and link them below or something. I wish I could say that there is this amazing AI censorship resource space where everyone can come and learn how to apply AI to these techniques. Something like that doesn't quite exist, but there are great resources for learning about what censorship is happening in the world. Something like OONI, that's O-O-N-I, the Open Observatory of Network Interference. It's a spin-out from the Tor team that monitors censorship all over the world. You could pull up the website later, but they can identify censorship in basically every country. It's run by volunteers, and it's an incredible organization. So there are all sorts of groups like this that are studying censorship and monitoring for censorship. For people who want to break into this more specific field of censorship, there are all sorts of great resources. Censored Planet is another group, run by the University of Michigan. They're an awesome team, and they also publish all their data.

Cool.

So all these groups have this very open sharing: hop on their website, and they've got lots of great resources, reports, and data you can get your hands on.

Excellent. Is there anything else you want to get the word out about to machine learning and AI people? Big open questions, anything that you feel should be out there?

Really, just this whole space, this whole idea of
there's this entire space that you can apply these techniques to in a way that's immediately impactful, helping real humans on the other side, humans who kind of need this help. You have this potential to make a real, immediate impact on the world, so it's a great space to get involved in.

Excellent. Kevin, thank you so much for being here and bringing this a bit closer. I know more; I hope everyone else does too now.

Yeah, thanks so much for having me. This has been a blast.

Excellent, super, appreciate it. Bye.

Hello there! Today I'm talking to Kevin Bock, who is a cybersecurity expert and one of the main people involved in the Geneva project. Geneva is a genetic algorithm that evades censorship by nation-states, so in real time Geneva can evolve to the ever more present danger of censorship by really big entities such as governments. All of this is done through an evolutionary search over a program grammar. In this interview we're going to touch on a whole range of topics, including Geneva, how it works, what it does, why people research it, and what it has done so far in the world, but also the broader topics of security and its connections to AI, how people can get started in this field, and what the main questions and problems are in this space. Further, Geneva comes out of a project at the University of Maryland called Breakerspace, which is a sort of lab that includes undergraduates in security research. It's a really cool project, and I think highlighting this would be helpful to some people. Maybe you're at the university and you don't know this exists: go there, take part. All right, without further ado, I want to give over to the interview. Have fun.

All right, everyone. I have with me today Kevin Bock, who is a PhD student at the University of Maryland, a cybersecurity researcher, and a member of Breakerspace, which is a pretty cool project at the University of Maryland. He also has been in the news a little bit with a project called Geneva, which uses genetic
algorithms to evade censorship by nation-states, and I think that's pretty cool. So, Kevin, welcome to the show, and thanks for being here.

Thank you for having me. I'm excited to be here.

So the goal of today is a little bit different, because I'm a total noob at security. Most of the audience of this channel is into machine learning. Maybe some know about security, and some know about the censorship apparatus that's in place around the world and what people do about it, but I think most won't. So today I'll be asking mostly noobish questions, and we'll have you here to guide us through everything, through what's happening in this world. So maybe you can first start off a little bit: how did you get to the place where you are? What are the main things in security right now that draw you to it?

I think security, and the censorship space also, is in this really cool time where AI and ML techniques have been exploding in all these other fields, and they're just, over the last four years, really breaking into security. We're still figuring out all the different applications where you can apply these techniques in security. There are new techniques and new applications that people are discovering all the time, from better ways to detect spam, to better ways to identify that a domain is malicious, to AI-based scanners for that binary you downloaded that's probably malware, things like that. So the security field is still discovering all sorts of new ways you can apply these techniques, and that was one of my motivations initially for bringing this to censorship, because this project was really the entire field of censorship's first foray into using AI- and ML-like techniques.

And if you talk about censorship, what do you mean exactly by that?

Yes, there are so many forms of censorship in effect around the world today. I mean, everything from political pressure to
self-censorship to taking down po..., there are so many different types. So I'm going to scope this discussion down a little bit to just the type of censorship that we study in this lab, and that's the type of automated censorship that happens in the network, performed by nation states. So what do I mean by this? If you're a user in certain regimes around the world, let's say in Iran, and you try to make a request, then as that web traffic crosses the border of the country it is scanned, parsed, and inspected by machines that physically reside in the network, called middleboxes because they're in the middle of the network. These middleboxes examine your requests and ask: is this something we should allow or not? If the answer is no, they either inject traffic to take down your connection, or they drop your connection, or they do something else to disrupt what's going on. And you'll notice that in everything I just said there's no human in the loop. There's no human content review or anything like this. It's purely automated, run by these middleboxes or firewalls deployed by these nations, which automatically inspect internet traffic as it goes by. So that's really the scope of what we've been studying here.

Naive question: why can't I just encrypt my traffic, so that all traffic looks the same towards the outside?

Yeah, that's a great question. So why can't we just encrypt everything? People have been trying, and there are a couple of different approaches to this. You might say, well, let's just use HTTPS, right? It's encrypted, we're good. Unfortunately, HTTPS has a small privacy leak. When you first set up an HTTPS connection, in that very first back-and-forth, called a handshake, you as the client have to announce, as part of the protocol, the domain you're talking to, and that announcement happens unencrypted. So if you're making an HTTPS handshake to Wikipedia, in the very first packet you
send, it's going to include the word "wikipedia". That's called the Server Name Indication (SNI) field: you indicate to the server the name of the server you're trying to talk to. Unfortunately, censors just read that field and then take down your connection if you're talking to a forbidden domain. So HTTPS, unfortunately, is close but not quite finishing the job. Now, just as a quick sidebar, there have been some advancements in HTTPS to try to fix this. There's a recent proposal to encrypt that field, called Encrypted SNI, and China just started censoring that last year. So you can try to encrypt things, but these censors are often just hostile to the idea of letting their citizens encrypt all their traffic.

I guess it's a little bit like this: with HTTPS, nowadays everyone does it, so you can't conceivably block HTTPS just because you don't like some traffic. But if there's a new type of encryption, it's probably only the people who have something to hide that use it. So is a strategy that the rest of the world adopts these techniques as fast as possible, to make that approach unusable?

That's exactly right. The broader topic you're discovering and saying out loud here is this idea of collateral damage: can we make a protocol or something so popular, and used so diversely, that if a censor were to try to block it, it would cause irreparable harm to good services, so there's some meaningful cost to performing that censorship? Just like you've identified, HTTPS is everywhere, so they can't just shut down all HTTPS. But when you roll out a new encryption method for HTTPS that's not very widely deployed, they can nip that in the bud and prevent its rollout. So there's this interesting race, a game between developers and these censors, that's still being
played out.

Now let's talk about more, let's say, naive approaches. What has the development of the field been like? What has been tried before, and what has been thwarted? What has the cat-and-mouse game looked like in the past? I imagine different things: there's Tor, and there are probably things that everyone installs on their end, like VPNs and tunnels and so on. What's been the general development over the years?

Yeah, so researchers and censors have been playing this cat-and-mouse game for two decades now, and it's evolved and been playing out on multiple fronts. You're exactly right, Tor has been a huge front in that war, if you will. We've developed Tor and continued to advance it. Unfortunately, there are some limitations to the Tor protocol: censors can enumerate the Tor entry points and just block you. So once you get into Tor you're generally great, but they try to lock you out. There have been all sorts of techniques people have proposed, like: maybe I can disguise my traffic to look like Skype. And then the censor is like, well, you didn't disguise it quite well enough, blocked. There's a whole interesting field of defeating censorship, or subfield I should say, called packet-manipulation-based censorship evasion. This is the idea that all our communication is happening via packets, and if you tweak those packets in just the right way, you can cause the censor to miss you. Historically that's also been something that's played out in this cat-and-mouse game, where researchers will study these censorship systems, find a loophole, and deploy and use it, and then the censor is like, oh, I'll fix that, and we're back to square zero. So this game has really continued to play out. I'll call one thing out real quickly about VPNs, because a lot of people, particularly those who have been to China, are like
"I've been able to use a VPN, it's been okay." VPNs work in many places, and in many places they don't. There's a country that was in the news recently because they rolled out a new law that forced their citizens to swear on the Quran that they would not use a VPN in order to get internet access installed in their homes. It's just a crazy sentence to say out loud.

Yeah.

But in China, for example, many of these VPNs work most of the time. What researchers have noticed, though, is that around the time politically sensitive events are happening, such as elections, a lot of VPNs will just mysteriously stop working.

Yeah.

And then after the event they'll mysteriously start working again. It points to this broader idea that some of these countries may be sitting on more censorship capability than they deploy on a daily basis. They have more power than they use. So in this cat-and-mouse game, the cat may even be stronger than we think it is.

Yeah. Can you give us an idea of what these packet manipulation evasions look like? Because I imagine, as you mentioned before, if there's "wikipedia" in the header and I don't want my population to see Wikipedia, that's it, right? What can I possibly manipulate there in order to get through such censorship?

Yeah, so we can think about censors like this. As our computers are sending packets around, you can imagine that communication as writing mail: your packets are envelopes going through the network, and having a communication with a server like Wikipedia is going to take a couple of envelopes back and forth, right? The censor is just like the postman in the middle reading all your letters. Unfortunately, that postman has to process a lot of letters, and you can imagine that at the scale of something like China, you're dealing with a huge
volume of traffic on a constant basis. What that means is the censor can't just remember everything it sees. For example, if it's trying to track that this person over here is talking to that server over there, and that person over there is talking to another server, that's state it has to maintain, right? The amount of state it has to maintain will grow, and at the size of somewhere like China it could grow pretty fast. So they have to be really careful about what they remember, about the state they maintain. So you can imagine doing something like this. Let's say we're exchanging packets. There exists a type of packet called the reset packet. These are normal packets, our computers send them all the time, but they basically just exist to tell the other side: stop talking to me immediately, I'm hanging up the connection. So you and I are communicating, we're sending these packets back and forth, and I slip one additional packet into the connection towards the beginning, and it's a reset packet. I send that packet along, and when the postman sees it, he's like: well, these guys have stopped communicating after this message, I'm going to ignore them from now on. Then he throws away the state he's maintaining about our connection. He forgets that we're talking, because why would he need to remember anymore? He thinks we're done. And if I craft that packet in such a way that it won't make it to you, or you'll see it and ignore it, or something like this, then we'll still be able to communicate fine. Our communication is unimpacted, but for any other packets that go by, the censor is like, I don't know who this is.

Yeah.

And you can get through. So these are the broad strokes of this idea of packet-manipulation-based censorship evasion, where you're tweaking the packets that go by to basically trick the censor in the middle into letting you continue to talk.

Now, do
I see this correctly, that there have been a giant number of these schemes proposed, and as you say, it's a cat-and-mouse game: one is proposed, then they fix it, then another one, then they fix it? That points to the possibility of: what if we could have something dynamic, something that by itself tries to invent new things? And that's where you went with Geneva. Do I understand that correctly?

That's exactly correct, you're spot on. Over the years there have been, I want to say, dozens of these proposed, and it's exactly this cat-and-mouse game. Researchers study the censorship system. The censorship system isn't public, so they're probing it, trying to take measurements, and that's a lot of work. Then they get an understanding, apply their good human intuition, develop something cool, and publish it, and the censor fixes it. And they don't tell you they fixed it.

Yeah.

They don't publish a paper that's like, hey, we just fixed your bug. So it just resets the score to zero. The idea with Geneva, which stands for Genetic Evasion, was that it's an algorithm that could flip this process on its head. Instead of a human having to take the approach of "let's understand how the censorship works and then defeat it", let's just have some AI or fuzzer or automated system attack the censor, figure out ways through, and then give them to the human. Now, after the fact, my slow human brain can go figure out why that thing worked, and my brain is no longer the bottleneck to helping people get through the censor.

Do you want to go a bit more into detail? I mean, it sounds great on the surface, but there's a reason we need security researchers probing and making sense of things, and there's a reason that's the bottleneck. If I were just to say, well, fuzz a bit, it's probably not going to work. So what does Geneva do
that allows it to even be successful, where maybe humans take a long time or wouldn't be successful?

Yes, there were a couple of pretty significant challenges when we first started in applying something like a genetic algorithm, or really any AI, to the space of censorship. If you think about the way censorship works, it's not hard to imagine why that's the case. Think about a censorship problem: a query is either censored or it's not. It's just a binary decision. So it's not like your traditional ML or AI where you have this nice gradient descent. There's no error you get back from the censor. The censor doesn't tell you, hey, if you tweak your query just a little bit, you're getting closer.

Yeah, there's no gradient with which you could work.

So that property alone rules out the majority of the ML field as far as approaches you can take.

Is there even a loss? Like you said, it's hard to detect if you even get through. How do you do that in the first place? How do you notice success or failure?

Yeah, you're exactly right, capturing that can be difficult. What we do to make it easier on ourselves is obtain machines inside these censored countries and directly try to request forbidden content. So Geneva trains directly against the censor, and we know whether we got through, because when the censor takes action it's kind of obvious. Geneva will try to obtain some forbidden content while manipulating the packet stream, and if it succeeds, great, and if it fails, we'll know.

Yeah, right.

So, this idea of how we apply ML, AI, or some fuzzing to this space, how do we build this? There are a couple of main challenges. The first is this total lack of gradient that I mentioned, and really that only leaves you with a small number of approaches. We chose to go down the route of using a genetic algorithm for this. It has some nice properties. It's easily explainable: you
can understand how it works while it runs. It's a little less black-boxy than something like a neural net or Markov or something like this. But if you want to build a genetic algorithm, you need a couple of things. You're seeing what some of these strategies look like right here. You need some building blocks, something that the algorithm can compose and put together, and you need some way for it to put those things together. Us humans, as an example, as far as genetics goes, we've got our DNA bases, right, A, C, T, G, and we can put those together in DNA. For Geneva, we needed to decide what makes sense as building blocks for the algorithm to use. That alone is an initial, really huge challenge, because if you're creative you can think of a million different ways an algorithm could manipulate a packet: flip this bit, flip that bit, there are just so many different things you could give it to do. So one of the first challenges we had to figure out was how to balance what this algorithm can and cannot do to the data it has. On one extreme, we could let it flip any bit. The downside of that is it could take forever to learn a checksum, but it's super powerful. On the other extreme, we could just encode what previous researchers found and let it play with those together. It would be super fast, but it would be hard to learn anything new, right? We'd just be building in biases directly. So the approach we ended up taking was giving Geneva basically the same ability to change traffic that the network itself has. The network itself has just a few primitives for what it can do to packets. It can take a packet and make multiple packets, which is duplicating. It can change a header to something else, which is tampering with a packet. It can take a packet and break it into multiple pieces,
which is fragmenting. And it can take a packet and drop it, which is just basically deleting the packet. So we built out these building blocks and then allowed it to compose these things together in trees.

Yeah, so you give it a syntax, and it can assemble a little program out of this syntax, like the one we see right here.

That's exactly correct.

Can you walk us through what this particular thing does?

Sure. This is kind of a fun strategy. There are a few different components to a Geneva strategy, so I'll break down the syntax for you real fast, what these programs look like. The first component is the idea of a trigger. The trigger is what's between the square brackets, so there are two triggers in this one: TCP flags S and TCP flags R. When Geneva is monitoring traffic, the trigger tells it which packets to act upon. The first trigger you see here says TCP flags S. That means whatever actions are attached to that trigger will run on any SYN packet it sees. S stands for SYN, and SYN means the start of a connection. The very first action we see is duplicate, so it's going to take that packet and make two of them. The syntax of duplicate is one set of actions, comma, another set of actions. The two actions you see here are tamper and then send. The second duplicate we do nothing to, we just send it on the wire. But in the first duplicate, we replace the flags field in that packet with SYN-ACK, SA, and then we send that packet. So basically what this little program does is it sees outgoing SYN packets from your computer, duplicates them to make two packets, and replaces the flags in the first one with SYN-ACK. Now, any networking person listening is like: this is clearly ridiculous.
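
As a rough illustration of the strategy just described, here is a minimal Python sketch. It is a toy model written for this transcript, not Geneva's actual code or syntax: the `Packet` class, the action helpers (`send`, `drop`, `tamper`, `duplicate`, `fragment`), and the `Strategy` class are all hypothetical names, and a "packet" is reduced to a flags string and a payload.

```python
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass(frozen=True)
class Packet:
    """Toy stand-in for a TCP packet (hypothetical, not a real packet library)."""
    flags: str            # TCP flags as letters: "S" = SYN, "SA" = SYN-ACK, "R" = RST
    payload: bytes = b""


# An action takes one packet and returns the list of packets put on the wire.
Action = Callable[[Packet], List[Packet]]


def send() -> Action:
    """Leaf action: emit the packet unchanged."""
    return lambda pkt: [pkt]


def drop() -> Action:
    """Leaf action: emit nothing, i.e. delete the packet."""
    return lambda pkt: []


def tamper(field: str, value, then: Action) -> Action:
    """Overwrite one field, then hand the modified copy to the next action."""
    return lambda pkt: then(replace(pkt, **{field: value}))


def duplicate(first: Action, second: Action) -> Action:
    """Make two copies and run a separate action subtree on each."""
    return lambda pkt: first(pkt) + second(pkt)


def fragment(offset: int, first: Action, second: Action) -> Action:
    """Split the payload at `offset` and run an action subtree on each piece."""
    return lambda pkt: (
        first(replace(pkt, payload=pkt.payload[:offset]))
        + second(replace(pkt, payload=pkt.payload[offset:]))
    )


@dataclass(frozen=True)
class Strategy:
    """A trigger (which packets to act on) plus the action tree to run on them."""
    trigger_flags: str
    action: Action

    def apply(self, pkt: Packet) -> List[Packet]:
        if pkt.flags == self.trigger_flags:
            return self.action(pkt)
        return [pkt]  # packets that do not match the trigger pass through untouched


# The strategy from the interview: on an outgoing SYN, duplicate it, rewrite the
# flags of the first copy to SYN-ACK, and send both copies.
role_reversal = Strategy("S", duplicate(tamper("flags", "SA", send()), send()))

if __name__ == "__main__":
    wire = role_reversal.apply(Packet(flags="S"))
    print([p.flags for p in wire])  # ['SA', 'S']
```

Because every action is a function that returns another function, action trees compose freely, which is the property a genetic algorithm needs in order to mutate and recombine them. The second trigger mentioned in the interview (TCP flags R) is left out because its actions are not described.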
This never should work. Why would we even do this? Why are we talking about this? What's going on here is that, for certain censors around the world, SYN-ACK is the packet that's typically sent by a server, and it's never sent by a client. So in this strategy, when the client sends a SYN-ACK, the censor says: whoa, I must have missed something. This client is clearly a server, which means the server must be the client.

Yeah.

It reverses the roles of client and server in the mind of the censor, and as a consequence, when the client makes the real request, since the censor processes packets differently between client and server, you're through.

I see.

So that's the idea of this strategy.

So that connection, in the mind of the censor, is already established as "here's a server, here's a client", and it keeps that state for subsequent packets, more or less?

Yep, that's exactly it. This is an example of just one strategy, one of these programs. Geneva built this program itself, and it built it through the process of evolution.

Yeah. And just to jump ahead a little bit, because we're not through yet with explaining exactly how it works: you've discovered that Geneva will actually reproduce a lot of the common or already known things that researchers have proposed, right?

Yeah, we had this really cool result initially. When we first developed this tool, we wanted to benchmark it against the rest of the field, and that's kind of challenging because censors have continued to evolve.

Yeah.

So what we did was sit down in the lab and implement our best guess, our best implementation I should say, of what these censors looked like, based on what previous researchers had found. Then we trained Geneva against these mock censors, and also trained it against the Great Firewall and real
censors where we could. What we found was that very quickly it was able to reproduce basically the entire field.

Yeah.

Every strategy a human had come up with, this also found, and it found them pretty quickly. So it's really showing the power of automated approaches and AI and ML.

Yeah. So let's get back a little bit. You have this syntax that you can build trees from, which are valid programs in Geneva, and these will modify the traffic somehow. Now, I'd say most of this traffic will probably not even be valid traffic, and the connection will be broken somehow. Some of it will go through, and some of it will actually maybe evade the censor. What do we need to get there? I guess if you just do it naively and randomize a little bit, it will just be bad: for 99.9 percent of all the programs you generate, you'll initiate them and after a while you'll see that your traffic isn't even getting anywhere, right? So of the genetic algorithm components, what do we still need?

Yeah, so we're building our way up to that. Just like you said, we've got our building blocks, we've got a way to put them together, we've got a syntax that we can build these programs out of, and we can run these programs on network traffic. And you're exactly correct that if we initialize completely randomly, it's going to do terribly. That's exactly what happens, we've tested this. So where do we need to go from here? This brings us to the idea of getting evolution in the mix. You can imagine the way this works is we have a big pool of strategies, and we'll call this a population. For each of these populations, just take for granted for now that we have some diverse set of strategies in it, and we have a way to test them. We can try to make a request for something forbidden, and we
can run these programs on those requests as we make them. For example, from inside China we can try to access Wikipedia, which is a censored resource, and we'll have these programs running on that connection. We'll just try to make that connection over and over again. What we'll see is that some of these strategies will destroy our connection, some will just not work at all and do terribly, some might keep our connection alive, and maybe, if we get crazy lucky, one will defeat censorship. But for now, let's just say a whole bunch of them will just destroy our connection, and maybe some won't. What we have is a fitness function. This just borrows from a much broader space in ML and AI, but it's basically this idea: if you take some individual from the population, some individual strategy, how good is this thing? Survival of the fittest: should this thing survive and continue to propagate its genetic material? This was actually the second big challenge in applying AI and ML to the space of censorship evasion: what on earth should a fitness function look like in this space? Because just like we talked about earlier, there's no gradient, right? Even coming up with a loss function can be a little tricky.

Sorry to interrupt, but is the fitness anything other than zero or one? Okay, maybe some connections don't even work to, like, the server next to you, and you can discard those. But other than that, it either doesn't reach the target or it does, and if it does, you've kind of won, right? How can you even get a meaningful signal? Is there a fitness in between zero and one?

Yeah, part of what makes Geneva work is that we've kind of shoehorned our way into getting a fitness between zero and one. Specifically, what we do is rule out those strategies that break your own connection. So that's kind of
how we've gotten between zero and one, because it's not technically zero or one. It's more like negative one, zero, one, where negative one is Geneva shooting itself in the foot, just dropping all your traffic. That's never going to work, and we shouldn't even bother exploring that space more, because we're never going to get anywhere. But if you can make it so that your packets are at least interacting with the censor and at least have the potential to get to the server, well, now we might be getting somewhere. So basically we set up the fitness function in such a way that if strategies destroy the underlying connection, they'll be punished severely and basically killed off, and strategies that interact with the censor, even though they get censored, will get a slightly higher fitness than those other ones. What's going to happen is that those individuals aren't successful, but they're still the most successful in the population pool, which means some subset of them will continue to reproduce. That subset is basically just chosen randomly, but mutation is still going to happen. So we're taking a set of individuals that all interact with the censor, and we just mutate them and try again, and mutate them and try again. Effectively, what this has turned into is a fuzzer. The fitness function basically makes Geneva a targeted fuzzer, where we fuzz just the space of strategies, just the space of programs, that allow us to interact with the censor. Where it gets interesting is that, as this fuzzer is running generation after generation, just trying different crazy things against the censor, if it finds something that gets through, suddenly that fitness is way higher than everything else, and that individual will start sharing its genetic material and propagating within the population pool. At that point we could stop, we could stop the fitness function
right there. But we optionally add some additional punishments and rewards for the algorithm at this point. Specifically, we add a punishment for strategy complexity. If an individual is successful, we optionally punish it for the number of actions and the amount of overhead it adds to the connection. This is not strictly required, but I have a very small, smooth human brain, and it's so much easier to understand a strategy that's only two actions long compared to something that's 50 actions long, for example. So we can encourage the algorithm: great, you've got a solution, now simplify it down for me. And over the course of generations it will whittle it down to its smallest form, and at the end present you its population pool and its best individuals.

And we see here a few ways you can mutate. I think this essentially comes down to changing the syntax tree in some form.

Yep. You can imagine all the different ways you could take these programs and mix them around. If you can think of it, Geneva could probably do it.

Yeah. And just for my understanding: you're trying all of this, and you say you have some machines inside of these countries. Obviously this is not going to work against IP blocking, so how do you not get IP-blocked by them? I imagine there's some weird traffic that hits my censorship wall all the time. Why don't I just say, well, gone?

Yeah, that's a good question, and we get this question a lot actually. You're pointing to this broader question of what the censor's response is.

Yeah.

We're doing all these wacky, crazy, ridiculous things. I mean, there's a strategy in there that just lights up every TCP flag. That packet shouldn't exist, flatly. It has no meaning on the network. But Geneva tried it and found
that it works. So where do censors go from here? It sounds like, since we're talking about sending crazy packets, that should be something that's easy to detect on the network. But it sounds easy until you try to write it. If you think about it, writing something to detect abnormality when you have no idea what that abnormality looks like, especially given just how random and crazy the internet is all the time, is actually harder than it sounds. And what makes it potentially even harder is that a lot of the middleboxes that would be doing that detecting are exactly the middleboxes Geneva is mucking with using these strategies. So it may be the case that an imaginary detector would also be getting screwed up by these same strategies.

Yeah.

So it's something they could take action against, but we haven't seen any censors roll out something like this. Something else you could imagine: the existing fitness function I just described for Geneva kind of assumes a static adversary, an adversary that's not playing along, if you will. It also assumes an adversary that's not doing anything special to hunt it out, and you could imagine a censor that's a little more sophisticated than that. So something we've kept an eye on for the future is whether the censor starts rolling out AI and ML techniques, or starts hunting for traffic that looks very abnormal. You can imagine encoding additional bits into the fitness function to encourage Geneva to make a strategy blend in with normal traffic: I want this to look as normal as possible but still get through, things like this. So you could imagine all sorts of modifications to the fitness function to make an algorithm like this a stronger competitor against an adversary that's also playing along. But we haven't seen the adversaries do
that yet, so we haven't needed to.

I was surprised when we talked to a bunch of people at the intersection of security and machine learning that there are, as you say, these ML-based malware detectors, and I guess also weird-traffic detectors, and people use them for company networks and so on. To my surprise, these are also vulnerable to adversarial attacks. So there's an entire new direction opening up. Usually people imagine adversarial attacks like, ah, I changed the image a little bit, and it's really this distinction between how the human sees it and how the machine sees it. But in malware it's just bits, a very small number of bits, and there's nothing like "how the human sees it" versus "how the machine sees it". It's so weird. But yeah, I think it's pretty cool. You got some attention in the media, and the articles usually go something like "this AI can evade censorship". Now, knowing that you use genetic algorithms, how was your work received in the media, and what do you think about it? Do you feel like they were trying to put a few buzzwords in there, or were you happy with it?

In general, pretty happy. I've been lucky. Even discussions like this, where we can talk about the work in a deeper context than just throwing buzzwords around, are an awesome way to cut through that buzzwordy fanfare, if you will.

Yeah.

So I've been kind of lucky. You're always going to see buzzwords attached to things, there's always something like that, but I'd say overall it's been received positively, and things like this really help us get there.

Cool. And just to say it, the code for Geneva is available. It's on GitHub, so anyone can look it up. Your builds fail right now. I
just have to tell you, I'm sorry.

Yeah, we're switching between CI systems and haven't finished the migration.

Okay, I mean, yes, nothing new here. So there is a lot of open space here, it seems. The genetic algorithms are very cool, and they're like a basis right here. Do you think there are more places where machine learning techniques could come in? You said we kind of have to draw back from the gradient-based approaches, but there are definitely possibilities. If you think of something like AlphaGo, it's a discrete game, but they also work with neural networks that could, for example, when you build your tree of modifications, guide that somehow, have an idea of which modifications might lead to a better algorithm or a worse algorithm, and so on. Do you see any sort of evolution that could happen there?

Definitely. When we first wrote Geneva, our goal was not to be the last AI approach in the space. It was to be the first, and hopefully the worst.

Yeah.

It would be great if viewers out there took a crack at this. There are all sorts of new techniques out there just waiting to be applied. This space is rich, it's interesting, and it's impactful. This is the kind of space where, when you discover something and get it out to the world, you're helping journalists and activists right now. So we're really excited to see where this space goes as it continues to blossom. So yeah, all sorts of techniques are just waiting to be applied.

And are you also actively investigating the censor's side? Because I imagine that the more capable you are at censoring things, the better you can research counter-strategies.

A bit. We've tried to tailor our research in such a way that we're not directly helping a censor. We never want to publish a paper where really the use case is
just making censors better. So if we do research down that vein, it's purely in service of "let's make evasion better." We've tried to be very good about not releasing and not publishing anything that's directly "hey, censors, this new technique is really going to change the game for you, you should try and roll that out." So I guess that answers your question.

Yeah. Well, if you look ahead, we said that the space is wide open. What do you see as a bit of a North Star for the field, for, let's say, censorship evasion? What would be the characteristics of an ideal algorithm?

That's a really good question. An ideal algorithm, something to shoot for. I think I can answer that by talking about how the problem of censorship is getting harder and more complicated. As censorship continues to evolve, this cat-and-mouse game exists, and it's not just censors patching bugs. Censors themselves are getting more sophisticated; they're getting better. One direction that we think censors will start exploring in the future is this idea of more personalized censorship. Instead of censorship policies being rolled out for an entire country, you could imagine a system where users with elevated social credit scores, or different professions, things like this, could access different content online and be subjected to different forms of censorship. In cases like this, just directly applying Geneva gets a little bit harder, because you can't just apply Geneva at one vantage point and help everybody. You suddenly need a way to reach more people and help more people at once. So it's this question of how we can scale this up in a large way, and how we can scale this up safely, in a way that protects itself from attacks by the adversary. The nation-states, they can see our
traffic, so in theory they could muck with the training. How can we prevent that? So in crafting this ideal algorithm, there are a lot of things you have to consider. I think building towards this idea of: can we do federated training across a large population? Can we do this in a way that protects users? Can we make the algorithm more efficient, so it needs fewer connections to figure things out? All sorts of things like this, I think, are really good goals to shoot for. And as more people, viewers, try this out, as more people jump into the space and play with this, these are some of the problems they're going to be building towards.

Is there any work on screwing with the censors? I imagine building an evasion attack that has a really low-hanging-fruit fix, where that fix in itself would somehow be completely devastating, but the censor doesn't know that when implementing it. Is there work in this direction?

So, is there work in the space of mucking with censors? Definitely. Crafting the kind of attack you describe is kind of tricky, because we don't know what the censor's code looks like. Now, there is this idea that there are bugs and limitations that, as censors patch them, may expose them to other attacks. One quick example of this: if you go back to our analogy of sending letters back and forth, a common limitation that many less sophisticated censors experience is that if I've taken a packet, or taken a letter, and I break it into two letters, they can't put them back together. That's a huge limitation, because it's really easy for me to just take a packet, split it up, and send it through. To fix that, all the censor needs to do is remember every packet it sees and then stitch them back together based on the numbers on each of the packets. So that's a simple fix to a limitation. But when you apply
that fix, you open yourself up to an entire space of attacks: maybe I can sneak a letter in there that you think belongs halfway through the message, but it actually belongs at the beginning, or at the end, or it doesn't belong in there at all. So this is one example that we've seen in the wild of this idea: I need to fix the limitation, and by fixing the limitation I've opened myself up to a dozen other potential attacks. So that definitely exists.

I'm just thinking, from my noobish understanding here, how much of a problem is it that our protocols are rather fixed? Imagine if I had a dynamic language where, when I communicate with anyone, the first step would be to negotiate a protocol in a very dynamic way. That would give me much more of a possibility, together with the person I want to communicate with, to negotiate something that could get around these censors in a completely adaptive fashion. Is that at all feasible, or is there some flaw?

Is it feasible? Maybe. If a thing like that could be built, it'd be incredible. It'd be awesome. So AI people watching, get on that, because that sounds awesome. There are definitely some challenges in rolling that out. You basically need to get in the headspace of: if I roll out this protocol and the censor knows about it, what is it going to do? There are protocols out there where, from the very first byte you send, the whole thing is encrypted, and in that case it's pretty hard to fingerprint. It never looks the same; it's always just a stream of random-looking bytes. But the censor can also find that, just by looking for something that looks like a random stream of bytes. And just like you said, that protocol never changes; it always looks the same. So you really need to develop a system that's
flexible and dynamic enough that today it looks like this protocol, tomorrow it looks like that protocol, and the next day it looks like nothing in between. So you really need to be very creative and very deliberate with how you do it. I'm not aware of anything like that personally. Maybe someone's working on it out there, but it would be awesome if you could do it.

Now, speaking of mucking with censors, you also have other work that uses the censorship infrastructure, essentially anything that's put in place by the censors, to perform attacks. As I understand it, any attack you could do is potentially made worse by the censorship infrastructure, such as a DDoS attack. Do you want to talk a little bit about that?

I would love to. An area of work that we started exploring a year or two ago came from something we noticed about a lot of these censors: when you interact with them as a user, they need to respond to you. They need to send you some traffic. If I'm trying to request some resource and that resource is forbidden, maybe the censor sends me a block page, and that block page says, "Hey, you're not allowed to access this." The thing is, my request can often be much smaller than the block page I get back. As an attacker, this opens up the space of: maybe I can use the censor to launch an attack at somebody else, by making a request for forbidden things while pretending to be someone else, and then letting the censor send that huge response at that other person. This is the idea of a reflected attack, or an amplification attack, because as an attacker I can make a tiny request and get a bigger response out of it, so I'm amplifying my traffic. So we started exploring whether we could do this to censors, using these nation-state censors, or, even beyond censors, normal firewalls, things that universities or
just regular networks and organizations have deployed. We discovered hundreds and hundreds, tens of thousands, millions of IP addresses that were behind these censors that we could use to launch these attacks, and we found these attacks got crazy powerful.

And who does it hurt more, the censors or the final recipients of the attack?

In this case the weight is borne by both, but the brunt of the impact will be felt by the victim. So this line of work mucks with the censor, but really, what you could distill this work down to is that the harm of a censor is not restricted to the citizens within its borders. A censor anywhere is a threat to anyone, everywhere. So the work was less about "let's flood a censor's network" and more about "let's prove to the world that these things are dangerous when they've been deployed as carelessly as they have been."

Now, other than block pages, you have some very specific schemes, specific to these censorship infrastructures, that make these attacks even more powerful. What are examples of that?

Yeah. In describing the discovery of these attacks, I'm making it sound very simple: you just send a request and then the response gets through. But I'm skipping over an enormous step here, because what I've just described, sending a request while pretending to be someone else, should not be possible. That sentence should not exist; it shouldn't be a thing you can do. The reason is that whenever we make requests, and this happens all the time (I think there's a GIF in there that explains exactly what I'm saying, just scroll up a little bit), there's a three-way handshake that we need to complete. That three-way handshake is just this short exchange of packets. I think it's the one right above that,
it's the short exchange of packets at the very beginning, right here. It's a short exchange of packets at the very beginning of our connection. As an attacker, if I try to spoof the three-way handshake, if I pretend to be my victim and start the handshake, the server is going to respond to the victim, so I won't get the critical bit of information I need from that handshake to finish it. And I need to finish that handshake in order to make a request. So throughout all of networking history, basically up until this paper, it's been assumed that TCP, the underlying protocol behind all these requests, is immune to this type of amplification attack. Largely immune; there's a small caveat there, but it's not worth getting into.

So how did we go about addressing this problem? We used Geneva and AI techniques. Basically, we replaced Geneva's fitness function and told Geneva: "You can talk to these censors, but instead of rewarding you for getting forbidden content, we're going to reward you for getting content without establishing a connection, and we're going to reward you for getting the biggest content you possibly can." So we were kind of turning the fuzzer on its head a little bit and letting it explore the space of strategies that (a) confuse the middlebox into responding, tricking it into thinking we already have a connection, and (b) once we've tricked it, get the biggest possible response we can. This is a second set of work that was really powered by the same Geneva genetic algorithm, and we were able to use the same set of building blocks, primitives, and programs that we had developed previously. We just applied them in a new way.

And this, if I understand it, is not a weakness in TCP. If TCP were implemented correctly, Geneva wouldn't, or shouldn't, be able to find something around this. This is specifically because these middleboxes are in there, right?

Yeah, you're spot on.
TCP itself is not the problem; it's the implementation of TCP. That's partially why we did this paper. You can't just study TCP itself. You can't download the protocol specification and think really hard, because that's not going to help you. You need to actually study real-world censors. So that's what we did. We took Geneva, trained it against hundreds, actually, of censors around the world, and then took the results of that and were able to scan the whole internet. We scanned the entire IPv4 internet almost 50 times, actually, with the different packet sequences that Geneva discovered, and effectively just attacked ourselves over and over again to see what kind of damage we could do.

And how does that square? Before, you said, "We're never going to release anything that helps the censor in any way," and now you're releasing a recipe for launching massive attacks on something. With that, I could attack the censor directly and make their life miserable using their own infrastructure, which is even ironic. But I could also use it to DDoS the Red Cross. My perspective usually is that any technology can be used for good and for bad, but before, you leaned a bit in the direction of "we never want to publish anything that helps the censor." This seems to be different. What's different here?

Yes. The difference here is, and I want to note this, that we didn't just discover these and immediately put them out into the world. We spent almost a year doing responsible disclosure. We emailed every middlebox manufacturer we could get in touch with and gave them advance copies of our paper, advance copies of this attack. We also emailed what are called CERTs, country
level emergency readiness teams. These are teams that exist in various parts of the world that are basically designated to respond to network events pertaining to their region. So we emailed all of them around the world, like, "Hey, that Chinese censor you guys are operating, potential problem there." We spent months and months working with DDoS manufacturers, CERTs, and middlebox manufacturers to try to patch these things and clean them up before this ever got out into the world. At the end of the day, this runs into the broader responsible-disclosure question that a lot of the security field wrestles with: if I never publish this, there's often no incentive for the issue to be patched. If there's no downside to the network, they don't need to patch it. And if someone else discovers it before this gets out there, they can start using it without the world and the defenders knowing about it. So there's this really tricky line you've got to toe: I need to let everyone have as much time as possible to patch it, but I also need them to know it's going to get out there, to incentivize them to patch it. With that in mind, we took the approach of: let's take as much time as we possibly can, and let's tell every invested party about this attack, how to patch it, how to fix it. We gave them scripts to test their own networks. Then, after several months had passed and we were confident that, if they were going to take action, they already had, we released the work.

Cool. Now, you're a member of something called Breakerspace. I already mentioned it at the beginning. Because it's pretty unique, do you want to talk a little bit about what this is and what it does?

Yeah, I'd be happy to. Breakerspace is a lab at the University of Maryland. Any UMD students watching, come check us out. The defining feature of this lab is that
undergraduate students are invited to join and participate in the lab. The goal of this lab is to broaden research and make it more accessible beyond just the PhD students and graduate students who are doing it. The Geneva team, and the broader censorship team within this lab, I've been leading, but I've had a team of undergraduates working with me on these projects. So every project we've talked about today, and every paper on our website, has not been a one-man show. It has really taken a village to get these off the ground and get them moving. They're huge, huge tasks, and I'd be remiss if I didn't mention the huge team of students who've been working on this with me.

Okay. And, not unrelated to them being undergrads or not, how often does it happen that you get into hot water? In security research there are national defense implications, there are legal implications, and so on. How do you navigate that space, and how often does it happen that you're like, "Oops, I hope no one noticed this"?

It definitely happens. We're really lucky to have such a supportive university atmosphere in which we can do these things. We've worked closely with the IRB, the institutional review board, and our network security people. There was one week where, for the scanning we were talking about, we were like, "All right, let's kick off some scans," and then we immediately knocked out the university firewall. It was like, "Oh no." They worked with us and helped get it back, and then helped us work in such a way that it wouldn't happen again. So what you're describing absolutely happens. One time, we didn't know this, but we were accidentally attacking the city of Jacksonville, Florida, and it was like, "Whoops, let's go email them so that stops happening." The University of Kentucky,
things like this. So what you're describing happens all the time. It's like, "Oh shoot, whoops." And often those whoops moments are like, "That's a cool discovery you just made; we've also got to go fix whatever you just broke." So it totally happens, all the time. We've got lots of crazy stories like that. We're really lucky to have such a supportive atmosphere in which we can do these things. It's okay to break things, as long as we work to fix them, obviously, in such a supportive atmosphere.

Where can people go if they want to get started in this space? Let's say I'm an AI researcher. I have a good understanding of reinforcement learning, evolutionary methods, genetic algorithms, and all that, but I don't have much of a clue about security. Are there resources you can recommend?

For security in general, there are so many. I'm sure there are two dozen YouTube channels that could hook you up with incredible resources, so maybe we can link some of those below or something. I wish I could say that there is this amazing AI-censorship resource space where everyone can come and learn how to apply AI to these techniques. Something like that doesn't quite exist, but there are great resources for learning about what censorship is happening in the world. Something like OONI, the Open Observatory of Network Interference, a spin-out from the Tor team that monitors censorship all over the world. You could pull up the website later, but they can identify censorship in basically every country. It's run by volunteers, and it's an incredible organization. There are all sorts of groups like this that are studying censorship and monitoring for censorship. So for people who want to break into this more specific field of censorship, there are all sorts of great resources. Censored Planet is another group, run by the
University of Michigan. They're an awesome team, and they also publish all their data.

Cool.

So all these groups have this very open sharing. Hop on their websites, and they've got lots of great resources, reports, and data you can get your hands on.

Excellent. Is there anything else you want to get the word out about to machine learning and AI people? Big open questions, anything that you feel should be out there?

Really just this whole space. There's this entire space you can apply these techniques to in a way that's immediately impactful, helping real humans on the other side, humans who really need this help. You have the potential to make a real, immediate impact on the world, so it's a great space to get involved in.

Excellent. Kevin, thank you so much for being here and bringing this a bit closer. I know more, and I hope everyone else does too now.

Yeah, thanks so much for having me. This has been a blast.

Excellent. Super appreciate it. Bye.
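
The fitness-function change described in the conversation (reward a response obtained without a completed handshake, and reward the largest possible response) can be sketched roughly as follows. This is only an illustrative Python sketch: `TrialResult`, `amplification_factor`, and `fitness` are hypothetical names, not Geneva's actual API, and the scoring rule is an assumption about the general shape of such a function, not the one used in the paper.

```python
from dataclasses import dataclass


@dataclass
class TrialResult:
    """Outcome of sending one candidate packet sequence (hypothetical record)."""
    bytes_sent: int            # bytes the sender put on the wire
    bytes_received: int        # bytes the middlebox sent back
    handshake_completed: bool  # True if a full three-way handshake was needed


def amplification_factor(result: TrialResult) -> float:
    """Ratio of response size to request size; 0.0 if nothing was sent."""
    if result.bytes_sent <= 0:
        return 0.0
    return result.bytes_received / result.bytes_sent


def fitness(result: TrialResult) -> float:
    """Assumed scoring rule: strategies that need a completed handshake
    cannot be spoofed, so they score zero; otherwise bigger responses
    per byte sent score higher."""
    if result.handshake_completed:
        return 0.0
    return amplification_factor(result)


# Example: pick the best of a few made-up trial results.
trials = [
    TrialResult(bytes_sent=100, bytes_received=5000, handshake_completed=False),
    TrialResult(bytes_sent=100, bytes_received=9000, handshake_completed=True),
    TrialResult(bytes_sent=200, bytes_received=400, handshake_completed=False),
]
best = max(trials, key=fitness)
print(best, fitness(best))
```

In a genetic algorithm, a score like this would decide which candidate strategies survive into the next generation; the numbers in `trials` are invented purely for illustration.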

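
The letter-splitting example from the conversation can also be made concrete. The sketch below is a toy Python model, not real packet handling: `split_payload`, `keyword_in_any_segment`, and `reassemble` are hypothetical helpers, and the "first copy wins" versus "last copy wins" rule is just one assumed way in which a censor and an endpoint might disagree once the censor starts stitching segments back together by sequence number.

```python
def split_payload(payload: bytes, start_seq: int, size: int) -> list[tuple[int, bytes]]:
    """Split a payload into (sequence_number, data) segments of at most `size` bytes."""
    return [(start_seq + i, payload[i:i + size]) for i in range(0, len(payload), size)]


def keyword_in_any_segment(segments: list[tuple[int, bytes]], keyword: bytes) -> bool:
    """A censor that cannot reassemble only sees one segment at a time."""
    return any(keyword in data for _, data in segments)


def reassemble(segments: list[tuple[int, bytes]], start_seq: int, first_wins: bool = True) -> bytes:
    """Stitch segments together by sequence number.

    When two segments cover the same position, `first_wins` decides
    whether the earlier or the later copy is kept.
    """
    buffer: dict[int, int] = {}
    for seq, data in segments:
        for offset, byte in enumerate(data):
            pos = seq - start_seq + offset
            if pos < 0 or (first_wins and pos in buffer):
                continue
            buffer[pos] = byte
    return bytes(buffer[pos] for pos in sorted(buffer))


payload = b"GET /forbidden"
keyword = b"forbidden"
segments = split_payload(payload, start_seq=1000, size=4)

# No single segment contains the keyword, but the stitched stream does.
print(keyword_in_any_segment(segments, keyword))
print(keyword in reassemble(segments, 1000))

# Sneak in a decoy that overlaps the second real segment.
with_decoy = [segments[0], (1004, b"/xxx")] + segments[1:]
censor_view = reassemble(with_decoy, 1000, first_wins=True)
server_view = reassemble(with_decoy, 1000, first_wins=False)
print(censor_view, server_view)
```

Under these assumptions the reassembling censor keeps the decoy and never sees the keyword, while an endpoint that keeps the later copy still receives the original request. Real TCP stacks and middleboxes differ in many more ways than this toy rule captures.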
