Tuesday, 21 May 2019
9 Plenary session
BRIAN NISBET: Hello. Good morning. Wonderful people of RIPE 78, if I could ask you all to take your seats, that would be wonderful. There are sufficiently large numbers of you here this morning to start roughly on time. I am Brian, I'll be chairing this session this morning which is mostly related to how to deal with DDoSs and other such things, but hopefully lots of great presentations and interesting interactions.
You are also all part of an experiment this morning. Don't worry, it's not very detailed, but as Hans Petter called out yesterday, we're trying a different thing in regards to the questions this morning and the interaction. So, interaction at a RIPE meeting is vitally important, but we're also aware that not everybody is super enthusiastic about getting up to the mics. So, what we're going to try, just for this session, and we're going to see how it goes and see what happens at future meetings, is using a thing called Slido. So what we're doing here is, instead of people getting up to the mics, what we'd ask is that you go to slido.com and enter the event code S075 and you will have a box there on the Internet connected device of your choice, probably not your security cameras, to enter questions, and those questions will come up. Other people have the option of upvoting or downvoting some questions and then I will be reading them to the presenter. The mics will still be live, but questions on Slido will get priority, so we're going to try that this morning for this particular event. We'll see how it goes and obviously the PC would love to hear feedback on how people think it goes this morning. If you have a desperate burning need to follow up or otherwise, you can go to the mics, but again we have a limited amount of time for questions, and the ones that are on Slido get priority. So, please go and take a look at slido.com, S075, and we'll put that up as well at the end of the presentations.
So without further ado, at that point in time, we will move on to our first presentation. So, we have Matthias Wichtlhuber to speak about next gen blackholing to counter DDoS.
MATTHIAS WICHTLHUBER: Thank you for the introduction, and welcome to my talk, which is on next generation blackholing to counter DDoS, this is a joint work together with my appreciated colleagues, Chris, George and Anna. So, let me start with a bit of motivation, why we need a next generation blackholing mechanism.
So, what I did here is, I skimmed the media outlets a bit and looked at the attack volumes reported in the media, so those are the very spectacular ones that make it to the media, and if you look at the timeline you'll see that in 2015, 200 gigabits per second was a big deal. And in the meantime you see attacks of up to 1.7 terabits per second, I think that one was last year, and you may as well remember the attack on Dyn from 2016, which crippled American Internet access on the east coast. So, of course, those are peak volumes, but nevertheless for each of these very spectacular peaks we have a lot of small DDoS attacks going on in our platforms every day. I mean, you may know that from your daily operations as well.
So, this is of course not an entirely new problem. There is something like an ISP DDoS defence toolbox. We have ACLs there, which allow you to filter at nearly arbitrary granularity, that means that you can filter on everything that defines a flow: source ports, destination ports, etc. But they are vendor‑specific and they need a device config that probably looks different for each vendor.
On the other end we have traffic scrubbing services, this is the carefree package. These services are a complete industry that offers to redirect your traffic to scrubbing centres, scrub the traffic for you and send it back, either via BGP or DNS redirection, and essentially they do so in two models: on demand, that means when you need it, and always on, which means they become something like a transit provider.
Third, we have FlowSpec. This is essentially the first mechanism that really works in the context of BGP. It allows you to configure rules at your neighbour network, for example your upstream, to prevent the traffic from flowing into your network. It allows you to filter at arbitrary granularity, that means you can filter on source and destination ports as well, but it requires the cooperation of the other router. So, you need to agree with the other router's owner to enable this feature in order to be able to filter traffic.
And remote triggered blackholing as it is often offered by ISPs and IXPs is pretty similar and also works in the context of BGP. So, it also configures the filtering routes at the neighbouring network, but ‑‑ and that's the big difference ‑‑ it only filters at IP granularity. So everything that is sent against this IP or prefix is thrown away.
And as well as FlowSpec, it needs the cooperation of the other party. So you need to have a settlement with the other router's owner.
So, how should we do that at an IXP? Looking at all these current solutions, our idea for this work was to combine the properties of the existing solutions and try to eradicate their shortcomings, and I think doing a proper DDoS defence at IXPs has a lot of benefits. First, an IXP offers services to hundreds of ASes. That means if you filter there, you immediately get a lot of effect and high leverage to filter out traffic. Second, IXPs have multiple terabits of capacity, that means that we can easily absorb even peak attacks, even the largest that are known today, with our backplane capacity. Third, we are hopefully a trusted part of the Internet community. That means ASes may trust us to discard traffic for them. That's the main thing here.
So, let me recap how blackholing actually works today at Internet exchange points. Most of you probably know it or may even have implemented it yourselves in your networks. But I want to have everyone on the same level, and also to be able to explain the shortcomings later on. So what we have here is a view that is separated into the control plane on the left and the data plane on the right. The control plane is not too important on this slide. The only thing you need to know here is that we have three ASes, AS1 has announced a prefix to the route server, and the route server has redistributed the prefix to AS2 and AS3. That means there is essentially a route from AS2 and AS3 to AS1. On the data plane we have the following situation: we have customers that want to reach a service that is provided in AS1, and we have an NTP DDoS attack that is going on, and the traffic is sent over these networks and the links to the IXP. And obviously what happens is that at some point the link from the IXP to AS1 gets overloaded and, well, requests are dropped and everyone is unhappy.
So, what options does AS1 have to counter this attack with blackholing? Essentially, AS1 announces another route. Usually a very specific one for the system that is under attack, so it's usually some /32, or /128 for IPv6, and the route server ‑‑ and additionally, it attaches a community to this route that basically tells other routers to discard this traffic. So what you see here is essentially a blackholing community that is standardised, and if some router receives this, the router should drop the traffic towards this prefix.
So, the route server redistributes this, but at this point things start to get complicated, because not everyone at an IXP is accepting this announcement. In this case here, you see that AS2 accepts the announcement and AS3 does not accept it. Probably because they don't have a rule for accepting more specifics than /24. And that of course has effects on the data plane. What you see here is that AS2, which accepted the blackholing prefix, totally blocks out all the traffic, regardless of whether it's the bad NTP DDoS traffic or the good traffic, the traffic from users that want to reach the service in AS1, and for AS3 you see that traffic is still flowing through, regardless of whether it's NTP DDoS traffic or good traffic. I mean, the load on the uplink to AS1 has probably reduced, but still some people might be unhappy with the situation.
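As an illustration of the situation just described, the following is a minimal sketch of why the same blackhole announcement has different effects at different peers. It assumes the well-known BLACKHOLE community 65535:666 from RFC 7999 as the standardised community; the `Peer` policy model, the /24 limit as a configurable field, and the documentation prefix are assumptions made for this example and are not taken from the talk.

```python
import ipaddress
from dataclasses import dataclass, field

BLACKHOLE = (65535, 666)  # well-known BLACKHOLE community (RFC 7999)


@dataclass
class Route:
    prefix: str
    communities: set = field(default_factory=set)


@dataclass
class Peer:
    # Hypothetical import policy of one IXP member.
    name: str
    max_prefix_len: int      # longest IPv4 prefix normally accepted
    honours_blackhole: bool  # has an exception for blackhole routes

    def accepts(self, route: Route) -> bool:
        net = ipaddress.ip_network(route.prefix)
        if BLACKHOLE in route.communities and self.honours_blackhole:
            return True  # explicit exception for tagged more-specifics
        return net.prefixlen <= self.max_prefix_len


# AS1 announces a /32 for the attacked host, tagged with the community.
blackhole_route = Route("192.0.2.10/32", {BLACKHOLE})
peers = [Peer("AS2", 24, True), Peer("AS3", 24, False)]

result = {p.name: p.accepts(blackhole_route) for p in peers}
print(result)
```

In this sketch AS2 installs the blackhole and drops everything towards the /32, while AS3 filters the announcement and keeps forwarding both attack and legitimate traffic.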
So, to wrap that up. What does that mean? Some of the problems are: blackholing as it's implemented today blocks unwanted and wanted traffic alike. You cannot differentiate between the two. The behaviour is hard to predict, so you may have ASes that accept the blackholing announcements and you may have ASes that deny them. And that may really lead to a situation where you cannot predict how much traffic you will lose if you announce a blackhole. And you always have a subset of peerings where blackholing simply has no effect, no matter what you are doing. If they don't choose to accept it, you will receive traffic from them and there is nothing you can do at an IXP about it.
Okay, let's dig a bit more into the details of these limitations. We did some analysis to see how relevant these issues really are. So, what we did here is, we looked at a 40 gigabit IXP port, and you see here on the plot, on the timeline, that until 20:20, roughly, you see mostly web traffic like port 80, port 443 and a bit of database traffic. And at 20:20 a memcached attack is started and you suddenly see a lot of, around 70% of, memcached traffic. But what you see as well is that there still remains 30% of benign traffic. So the customer here has used a blackhole. We see the traffic anyway, but if he uses the blackhole at this point, he will lose that 30% of good traffic as well. So, essentially, blackholing causes collateral damage at this point.
So, also, that tells us that we need something like more granularity there, so the question is how much granularity do we need? In order to measure that, what we did is we looked at blackholing traffic as it is filtered on the platform. So what you see here on the right is, we compare the traffic that is not blackholed and the blackholing traffic, and on the X axis we show the UDP source port of the traffic and on the Y axis the share of the traffic in the category. And it's pretty obvious that the usual suspect ports stick out very much. So you have things like NTP, LDAP, memcached, and I mean the really interesting point here is you have these six protocols, and if you can filter them, you already get rid of 80% of the DDoS traffic. So, essentially, six rules would be enough to get rid of 80%. So, essentially, you want to be able to filter these things out. You want to have a better granularity there.
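The "six rules cover 80%" observation can be reproduced on any per‑port traffic breakdown. The sketch below is illustrative only: the byte counts are made up so that the six largest ports sum to 80%, and only NTP, LDAP and memcached are named in the talk; the other three ports are assumptions.

```python
from collections import Counter

# Hypothetical bytes per UDP source port (made-up numbers, illustration only).
bytes_by_src_port = {
    123: 400,    # NTP (named in the talk)
    389: 150,    # LDAP (named in the talk)
    11211: 120,  # memcached (named in the talk)
    53: 60,      # DNS (assumed)
    1900: 40,    # SSDP (assumed)
    19: 30,      # chargen (assumed)
    **{40000 + i: 25 for i in range(8)},  # long tail of other ports
}


def coverage_of_top_ports(counts, n):
    """Return the n largest ports and the traffic share they account for."""
    total = sum(counts.values())
    top = Counter(counts).most_common(n)
    return [port for port, _ in top], sum(b for _, b in top) / total


ports, share = coverage_of_top_ports(bytes_by_src_port, 6)
print(ports, f"{share:.0%}")
```

With real flow data in place of the made‑up dictionary, the same function shows how many port‑based rules are needed to reach a given share of the blackholed traffic.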
So, third, we looked at how ineffective this can be. What we did there is an experiment with our own research AS. We have an AS that peers over the route servers, and we ordered an NTP DDoS attack on the AS, you can buy those today, and we ran an attack on our own system for ten minutes against a /32. And after roughly 300 seconds, we announced the blackhole, and we wanted to see how much traffic we lose. This is the result. You see here, from second zero to 300, the NTP attack starts. We are getting roughly 800 Mbit of traffic from roughly 38 peers, so traffic is red, the peers are blue, and after 300 seconds we announce the blackhole and you see that traffic drops from 800 to 600 Mbit and the number of peers that are sending traffic drops from 38 to 26. So that is not too impressive, and it shows us that blackholing at IXPs is not a very effective mechanism. So essentially, the signalling is too complex and you have these ASes that never accept blackholes, and this is what you see here clearly on the plot.
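To put the quoted numbers in proportion, here is a small worked calculation using only the figures from the talk (800 to 600 Mbit, 38 to 26 peers); the helper name is an assumption made for this example.

```python
def relative_drop(before, after):
    """Fraction by which a quantity decreased."""
    return (before - after) / before


traffic_drop = relative_drop(800, 600)  # Mbit, as quoted in the talk
peer_drop = relative_drop(38, 26)       # peers still sending traffic

print(f"traffic reduced by {traffic_drop:.0%}, "
      f"sending peers reduced by {peer_drop:.0%}")
```

So the blackhole removed only about a quarter of the attack traffic and roughly a third of the sending peers.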
So, from the preceding measurements we drafted the following requirements for an advanced blackholing mechanism. What do we want there? We want higher granularity: we want to be able to filter on destination header fields, ports, protocols, etc. Basically everything that is involved in a flow. We want low signalling complexity: we want it to be easy to use, and in case of emergency you want a short set‑up time for this if the customer calls you. So preferably it should work with everything they have already.
We want to decrease the level of cooperation that is needed for that. So, we want ‑‑ it's only the affected AS and the IXP that is involved in this, no other parties.
Another thing is telemetry. I didn't show that here in the plots, but we often see behaviour like a customer announcing a blackhole, then withdrawing it to see if the attack is still going on, and reannouncing the prefix, etc., etc. This can happen with very high frequency, so we want to be able to give the customer feedback on the state of the attack at any time. Of course, the whole thing has to be scaleable, and we have different dimensions there: it has to scale in terms of performance, filters, reaction time, and configuration complexity.
And last but not least, we need to meet all these requirements at minimum cost, Capex and Opex. Ideally it works with our existing hardware and we don't have to buy anything new.
So, our advanced blackholing system works as shown in the slide. We still have roughly the same situation as before. AS1 is still under attack, and what it does now to counter the DDoS attack is, it still announces a more specific for the system under attack, and it attaches a signal, essentially a blackholing community, to this route. But instead of using the standardised blackholing community, it signals a different community that tells our system: okay, you need to apply a filter here.
This route is filtered by the route server and sent to a blackholing controller. This blackholing controller is essentially another BGP peer in the peering LAN, and it accepts this announcement. And what it does is, it looks at the announcement, looks at what the customer wants to filter, and then applies this to the data plane. And as we see here in this example, you can use this to effectively filter out NTP traffic only.
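A minimal sketch of the controller step described above, translating a signalled community into a data plane filter, might look as follows. The talk does not give the actual community values or encoding, so the ASN 64512, the rule table and the `Filter` fields are all hypothetical placeholders.

```python
from dataclasses import dataclass

# Hypothetical signalling scheme: the real community values are not given
# in the talk. 64512 is a private-use ASN used here as a placeholder.
ADVANCED_BLACKHOLE_ASN = 64512
RULES = {
    1: ("udp", 123),    # drop NTP reflection traffic
    2: ("udp", 11211),  # drop memcached reflection traffic
}


@dataclass(frozen=True)
class Filter:
    dst_prefix: str
    protocol: str
    src_port: int
    action: str = "drop"


def filters_for(prefix, communities):
    """Translate the communities on one announcement into filters."""
    out = []
    for asn, value in sorted(communities):
        if asn == ADVANCED_BLACKHOLE_ASN and value in RULES:
            protocol, port = RULES[value]
            out.append(Filter(prefix, protocol, port))
    return out


ntp_only = filters_for("192.0.2.10/32", {(64512, 1), (64500, 100)})
print(ntp_only)
```

Only the NTP rule is installed for the /32; traffic on other ports towards the same destination keeps flowing, which is the difference from classic blackholing.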
The blackholing system essentially consists of three different layers, so we have a layered architecture there in order to be able to exchange parts of it. On top of it, we have a signalling layer. At the moment this is BGP, in the near future it may be FlowSpec or something comparable. Essentially, this is used to signal what the customer wants to the IXP.
So, an IXP interface there.
Below that we have the blackholing manager. The blackholing manager is essentially a management layer that allows us to like collect all the announcements or all the signals from customers, create a consistent view of the BGP state and map this to a consistent view of the data plane state or the filters that have to be applied. And this involves things like, for example, if you get like conflicting signals, you have to somehow resolve these conflicts and then apply them to the data plane.
And last but not least, we have the filtering layer. And what the filtering layer essentially does is, it's configured by the blackholing management system on the different boxes and once this is applied, it starts to drop traffic.
So I think the most interesting part of this architecture is the layer in the middle, the management layer and I would like to look a bit more in detail at this.
So, let's zoom in a bit on that part. What we have here essentially is a BGP parser that holds the session to our route servers. Essentially it's not only one session, we have multiple route servers, so you have to handle all of these sessions at the same time. We are parsing all the BGP messages that are coming in, decoding them, and then we have a BGP processor that processes all the withdrawals and announcements and pushes them into a routing information base. So up to here, this is very similar to a normal BGP speaker. However, this routing information base is a special implementation because it allows us to take snapshots at certain points in time, and then we can calculate deltas between these snapshots, and essentially these deltas are the configuration changes that we have to apply to the data plane. So we calculate these deltas and then push the configuration changes into a token bucket queue. This token bucket queue is there essentially to shape configuration changes to a reasonable limit and to configure a maximum burst size. Very similar to routing hardware. And essentially we do that because we somehow need to protect our back end from being overwhelmed with configurations. So that's why we need a queueing mechanism there.
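The snapshot‑and‑delta idea can be sketched in a few lines. This is an illustration under assumptions, not the implementation described in the talk: the class and function names, the keying by (peer, prefix) and the community values are all hypothetical.

```python
class SnapshotRib:
    """Toy routing information base that supports point-in-time snapshots."""

    def __init__(self):
        self._routes = {}  # (peer, prefix) -> frozenset of communities

    def announce(self, peer, prefix, communities):
        self._routes[(peer, prefix)] = frozenset(communities)

    def withdraw(self, peer, prefix):
        self._routes.pop((peer, prefix), None)

    def snapshot(self):
        return dict(self._routes)  # shallow copy; values are immutable


def delta(old, new):
    """Changes needed to move the data plane from state old to state new."""
    added = {k: v for k, v in new.items() if k not in old}
    removed = {k: v for k, v in old.items() if k not in new}
    changed = {k: v for k, v in new.items() if k in old and old[k] != v}
    return added, removed, changed


rib = SnapshotRib()
rib.announce("AS1", "192.0.2.10/32", {(64512, 1)})
before = rib.snapshot()
rib.announce("AS1", "192.0.2.10/32", {(64512, 2)})  # signal changed
rib.announce("AS1", "192.0.2.11/32", {(64512, 1)})  # new signal
after = rib.snapshot()
added, removed, changed = delta(before, after)

rib.withdraw("AS1", "192.0.2.10/32")
_, removed_later, _ = delta(after, rib.snapshot())
```

Working on deltas between snapshots means that a customer who announces and withdraws the same prefix several times between two snapshots produces at most one configuration change.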
And on the other end of the queue we just pop the configuration changes out of the queue and apply them to a network management layer, and essentially we can plug in several things here. For research purposes we have implemented an SDN network manager there. What we currently use in the prototype on the platform is a QoS network manager that compiles these configuration changes into QoS policies and applies them to the platform. And from there on, everything is hardware‑specific.
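The rate‑limited queue between the delta computation and the network manager can be sketched as a token bucket. This is a minimal illustration with assumed names and parameters (`rate=2.0`, `burst=3`, and a `RecordingManager` stand‑in for the pluggable back end); time is passed in explicitly so the behaviour is deterministic.

```python
import collections


class TokenBucketQueue:
    """Queue that releases at most `rate` changes/s with bursts up to `burst`."""

    def __init__(self, rate, burst):
        self.rate = rate            # tokens added per second
        self.burst = burst          # maximum number of stored tokens
        self.tokens = float(burst)  # start with a full bucket
        self.last = 0.0             # time of the previous refill
        self.pending = collections.deque()

    def push(self, change):
        self.pending.append(change)

    def pop_ready(self, now):
        """Return the changes that may be applied at time `now` (seconds)."""
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        ready = []
        while self.pending and self.tokens >= 1:
            self.tokens -= 1
            ready.append(self.pending.popleft())
        return ready


class RecordingManager:
    """Hypothetical stand-in for a pluggable network manager back end."""

    def __init__(self):
        self.applied = []

    def apply(self, change):
        self.applied.append(change)


queue = TokenBucketQueue(rate=2.0, burst=3)
manager = RecordingManager()
for i in range(5):
    queue.push(f"change-{i}")

batch_sizes = []
for now in (0.0, 0.5, 1.0):
    ready = queue.pop_ready(now)
    for change in ready:
        manager.apply(change)
    batch_sizes.append(len(ready))
print(batch_sizes)
```

The initial burst of three changes goes out immediately, and the remaining two are released one at a time as tokens are refilled, which protects the back end from a flood of configuration changes.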
So, let's recap a bit on the building blocks, what can we do with this architecture regarding the requirements that we defined before?
Well, we can filter at nearly arbitrary granularity, UDP, TCP, transport ports, whatever you want. We use BGP communities as before for the normal blackholing mechanism, or an API that is running side by side with that. We have a very low level of cooperation, so it's essentially just us and the IXP members. We can enforce this mechanism in our switching fabric, so there are no third parties involved and that means we can make it work reliably.
We can provide telemetry, so we can provide, on the one hand, statistics, on the other hand we can also allow customers to shape traffic, DDoS traffic, that means the customer still sees what is going on in the network.
We have scaleability. I'll go into a bit more detail on the next slides, but we can do this at line rate in hardware.
And we have low cost because we can simply implement this in our existing hardware.
But, of course, this whole mechanism also comes with some challenges. I just showed you the management layer, so there is quite a bit of BGP complexity involved: you need to handle all the announcements and withdrawals accordingly. Also, we had some problems integrating this whole solution with our existing configuration proxy. We do not configure the boxes directly, but we have an intermediate system that does that for us, and that was not really up to processing configuration changes at the high frequency we needed for this mechanism. So we needed to work around that. And the third thing is actually FlowSpec. We would have liked to use FlowSpec, but currently our route servers are not really up to the task, so essentially the software stack does not support it at the moment. But this is something that we definitely have in mind and may look at in the future.
So, back to the issue of scaleability. Does it scale? Essentially we have two dimensions of scaleability there. We have scaleability with respect to the number of filters and IXP ports, and the other dimension is scaleability with respect to configuration update frequency limits. So how often can you change stuff on the box? And how much can you push onto the box? Regarding the scaleability with respect to the number of filters, what essentially binds you there is the size of the TCAM, because you need to match header fields and drop packets accordingly, and there are also a number of system limits and port limits, like the maximum number of filters per port. I'll show some more details on the next slide.
And with respect to the configuration update frequency limits, we allow up to 6.33 filter updates per second. That means that if you take our current blackholing data and simulate it, we could process 70% of the current blackholing updates in under 1 second. So it's comparable in terms of the speed that it provides.
So, in order to measure the amount of hardware resources that you have for this on the boxes, we did some stress tests on the IXP hardware. We generated a lot of configurations, so each of these fields that you see here in this plot is one configuration that we tested, and for each of these configurations we set an increasing number of Layer 3 and Layer 4 filtering criteria. This is what you see on the X axis here, and on the Y axis we set an increasing number of MAC filter criteria, because we also wanted to be able to filter traffic from certain MACs on the peering LAN. And this plot here is for 20% of the IXP member ASes using the service. So, everything is fine here. Then we increased the number of IXP member ASes that use the service to 60% and you see that the number of possibilities you have there starts to shrink, and finally, if you assume that a hundred percent of the IXP members use this service, you see that you are left with this small window on the bottom left, and this is essentially what defines the configuration limits that we apply to the platform today.
So the axis are up ‑‑ so I cannot tell you what N is here, but essentially the message here is, we have a lot of head room for that, fortunately.
We repeated our measurement experiment to show how effective the new mechanism is. Essentially it's the same setup as before: our own AS with multilateral peering and an NTP DDoS attack for ten minutes on a /32. What we first did, after 300 seconds, is announce a shaping community, and you see here that the traffic dropped from roughly 1 gigabit to 200 Mbit. We still received traffic from all the peers, that's fine, but it's shaped. And afterwards we announced a dropping community and you see that the traffic drops to nearly zero. We still received a little traffic from our peers, but the number of peers that are sending traffic also drops to nearly 0. So that works very well actually.
So, let me sum up. A number of DDoS mitigation solutions exist. We just discussed them, but they have different drawbacks, some with respect to performance, some with respect to features, some with respect to cost.
We identified and measured the limitations of the blackholing mechanisms at IXPs.
So, we proposed advanced blackholing which combines the benefits and overcomes problems of today's DDoS defence.
We implemented a new system with a BGP and API interface, and we evaluated and proved scaleability of the system.
So, that's all from my side. If you're having any questions, I am happy to take them.
BRIAN NISBET: Thank you very much. So, this is the experiment. As we said we are using Slido for this. One thing that I forgot to mention earlier, so Slido gets priority over the mics, certainly for this session. If you are putting a question in, please put a name and affiliation into the question, we're going to give the people who put it for this one a pass, but for the rest of the session, please put your name and affiliation in.
So we have a few questions for you.
So the first one from Johanas: Why do so many ISPs not accept blackholes; is it for technical reasons or do they fear abuse of blackholing?
MATTHIAS WICHTLHUBER: That has a lot of different reasons. I can only speculate on most of them. One thing is we had customers who had a network engineer ten years ago who configured their box and it was never touched afterwards. The other case is simply policies: if an ISP chooses not to accept anything more specific than a /24, that's the way it is. However, 99 percent of our blackholes are /32s. So... and probably a lot of other reasons that I don't know.
BRIAN NISBET: Okay. And from someone with no name. Like I said, from now on please name and affiliation: Did you observe issues with blackholing and RPKI max length in practice?
MATTHIAS WICHTLHUBER: Yes, we had some issues there. The problem with RPKI in particular is that people define their ROAs with a maximum prefix length. So if you want to announce something more specific than what is defined in the ROA, you need to define some exception, and that's what we did. We have discussed a lot with the community how to do that and we came up with some nice solutions for that. You can look it up on the DE‑CIX website if you are interested.
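To illustrate why a /32 blackhole collides with a ROA's maximum length, here is a minimal sketch of route origin validation in the style of RFC 6811. It only shows the problem; the exception mechanism mentioned in the answer is not described in the talk and is not modelled here. The ROA contents and the ASN are made‑up documentation values.

```python
import ipaddress


def rov_state(prefix, origin_asn, roas):
    """Classify a route as valid, invalid or not-found.

    roas is a list of (roa_prefix, max_length, asn) tuples.
    """
    net = ipaddress.ip_network(prefix)
    covered = False
    for roa_prefix, max_len, roa_asn in roas:
        roa_net = ipaddress.ip_network(roa_prefix)
        if net.version == roa_net.version and net.subnet_of(roa_net):
            covered = True  # at least one ROA covers this prefix
            if roa_asn == origin_asn and net.prefixlen <= max_len:
                return "valid"
    return "invalid" if covered else "not-found"


# Hypothetical ROA: 192.0.2.0/24, maxLength 24, origin AS64500.
roas = [("192.0.2.0/24", 24, 64500)]

aggregate = rov_state("192.0.2.0/24", 64500, roas)
blackhole = rov_state("192.0.2.10/32", 64500, roas)
unrelated = rov_state("198.51.100.0/24", 64500, roas)
print(aggregate, blackhole, unrelated)
```

The /32 is announced by the legitimate origin but is longer than the ROA's maximum length, so strict validation marks it invalid unless the route server applies an explicit exception for blackhole announcements.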
BRIAN NISBET: Rudiger Volk, Deutsche Telekom ‑‑ Requirements do not list security of the mechanism and authorisation of blackholing request, do you care about those and how?
MATTHIAS WICHTLHUBER: Okay, so essentially, it has the same security as the normal blackholing mechanism, so what we are doing at the route server is, we are doing IRR checks and we are doing RPKI checks, we immediately inherit that for our mechanism. But, yeah, I mean ‑‑ yeah, it's not more or less secure than normal blackholing.
BRIAN NISBET: Okay. Another person with no name. Is this already an available service for DE‑CIX customers?
MATTHIAS WICHTLHUBER: It is available as a closed beta service but not as an open service yet. So we are still in the testing phase, and we are not yet sure when we will be able to offer it to our customers.
BRIAN NISBET: Okay. Brian Dixon asks: What about auto blocking or auto depref triggering of spoof traffic based on RPKI or RADB, AS cone, no valid reason for spoof traffic?
MATTHIAS WICHTLHUBER: Okay, spoofed traffic is a little difficult. I mean, what we're doing here is we are allowing customers to block traffic that is destined for them, so they are the destination. That is not true for a spoofed packet, so essentially, for us it's third party traffic and we do not want to touch that.
BRIAN NISBET: Okay. Nicolai Leymann, Deutsche Telekom. Do you have any mechanism to check that a blackholing request is valid?
MATTHIAS WICHTLHUBER: As I said, we have IRR checks, we have RPKI.
BRIAN NISBET: I think for our last question maybe, but Gunner: What about random destination attacks, all packet fields random but attack directed at your AS?
MATTHIAS WICHTLHUBER: Okay. Then you can still like ‑‑ you can do everything that you could do with a normal blackholing mechanism just that it works, so you could still take out all the traffic for that specific destination.
BRIAN NISBET: And that is that. That's our time. So ‑‑ sorry, as I said, Slido gets priority and we're going to see how this works, feedback as always, and thank you very much.
MATTHIAS WICHTLHUBER: If you still have questions, you can come to me off line. Thank you.
BRIAN NISBET: So, onwards. And as I said, from now on please do put your name and affiliation into your question or into the entry.
So now we have our next speaker, with a first joint look at DoS attacks and BGP blackholing in the wild.
MATTIJS JONKER: So, hello. Thanks for that introduction. I am a Ph.D. student at the University of Twente in the Netherlands. Welcome to my talk. This is joint work done with these people. And I'm going to tell you a little bit about DoS attacks and blackholing and how these combine in the wild.
So, in the preceding presentation, there was already a good set of context for this session, because it's about DDoS, but let me just briefly recap that DoS attacks are a conceptually simple yet very effective class of attacks that have gained a lot of traction over the last years, starting roughly in 2010 with the WikiLeaks attacks, on to Spamhaus and onwards, and the bad thing about these attacks is that nowadays they are also offered as a service by so‑called booters, which was also mentioned in the previous presentation. So these allow people without any technical skills to log in on a fancy interface on the web, you pay for a subscription plan in bitcoin or whatnot and then you are able to stress test your own network, but there is no validation on whether or not it's actually your own network you are targeting, so you can abuse these services to toss others off the Internet. Then there are a couple of well known recent events that stress how severe these attacks can be, for example the attack on Dyn.
What can be said is that DoS has become one of the biggest threats to Internet stability and reliability over the last years. Now, in comes BGP blackholing. Again, this feels a little bit redundant, but it's a technique that can be used to mitigate denial of service attacks. It leverages the BGP control plane to drop network traffic. So, prefix announcements are tagged with a community, and there is a set of communities that are agreed upon to signal blackholing requests. So if you are suffering a DoS attack that you cannot handle in your own network any more, you send an announcement upstream or to ‑‑ within the IXP, you tag it with a certain value, 666 is a very commonly used value, and then hopefully your upstream provider or peer or whatnot will start dropping all the traffic destined towards that prefix.
And this is a very coarse‑grained technique and approach because you either drop everything or you let things through, like there is no middle way, as was already explained.
So, there is excellent work in the literature on blackholing. There is a paper in IMC 2017 that looked at blackholing, but one of the observations that we made at the time is that the combination of DoS attacks and blackholing hasn't been widely studied yet. And one of the things that we wondered is whether or not blackholing, which is such a coarse‑grained technique, is only used in the most extreme cases of attacks. And then we observed that there is no clear understanding of how blackholing is used in practice when DoS attacks occur. So there was this niche that we dove into, and that is what the results in the first half of this presentation are about. So we take a longitudinal dataset covering three years of DoS attacks and blackholing and we look at how these two combine with each other to infer some operational practices.
So, that's for the first part of the presentation.
There is going to be three datasets that I need to explain to you, two are on DoS attacks and one is on blackholing. So let me start with the first dataset on DoS attacks. These are attacks inferred from a large DarkNet, this is a /8 IPv4 network operated by UC San Diego, which captures backscatter from denial of service attacks in which the source IP address is spoofed randomly. Then we use a classification methodology by Moore et al, which is sort of like a dated methodology to infer attacks from packets that come into the telescope.
And then the second dataset of DoS attacks is from AmpPot, which is a honeypot project from Christian Rossow. What these honeypots do is they mimic open reflectors, so they mimic NTP servers that are open and can be used in reflection attacks, and these honeypots try to be really appealing to attackers by offering large amplification. So, you send a 40‑byte request and then you get many more bytes back, and this is appealing because it allows you to magnify your traffic quite a bit.
Within the dataset we use logs from 24 of these honeypot instances that are distributed over the world, geographically and also logically among operators. So that's it for the two DoS datasets and I'll show you later on in schematic how we measure these.
The third dataset is on blackholing events. For this we scanned BGP collector data from the RIPE RIS project and from the University of Oregon's Route Views project, and we looked for prefix announcements in this data that are tagged with a blackholing community. And we used this framework to do the data analysis. Then we matched the values that we see in the communities against a dictionary of known blackholing communities to infer whether or not there is blackholing activity, and the dictionary that we use is from the 2017 paper in IMC that I mentioned previously.
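The dictionary lookup described here can be sketched as follows. The input format (a list of dictionaries) and the provider‑specific entry 64500:666 are assumptions for illustration; 65535:666 is the well‑known BLACKHOLE community from RFC 7999. The actual dictionary and analysis framework used in the study are not reproduced.

```python
# Hypothetical dictionary of blackholing communities.
BLACKHOLE_DICTIONARY = {
    (65535, 666),  # well-known BLACKHOLE community (RFC 7999)
    (64500, 666),  # made-up provider-specific value
}


def blackholing_events(updates):
    """Yield (time, prefix) for announcements tagged with a known community."""
    for update in updates:
        if any(c in BLACKHOLE_DICTIONARY for c in update["communities"]):
            yield update["time"], update["prefix"]


# Made-up BGP announcements as they might be seen at a collector.
updates = [
    {"time": 10, "prefix": "192.0.2.10/32",
     "communities": [(64500, 100), (65535, 666)]},
    {"time": 20, "prefix": "198.51.100.0/24",
     "communities": [(64500, 100)]},
    {"time": 30, "prefix": "203.0.113.7/32",
     "communities": [(64500, 666)]},
]

events = list(blackholing_events(updates))
print(events)
```

Only the first and third announcements carry a community from the dictionary, so only those two are inferred as blackholing events.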
So to give you sort of an impression of how these three dataset sources relate to each other, I have made this schematic for you. On the top left here is the attacking infrastructure, so you can think of a botnet or a compromised host or whatnot. But this is the malicious infrastructure. Then on the very right‑hand side we have the intended victim of the DDoS attack, who have their ISP, the victim AS, and then there is this interconnecting link to the upstream provider. The first attack I want to show you is a randomly spoofed attack. In this example it's a SYN attack. The infrastructure will send packets towards the victim address and it will randomly spoof the source address in these packets to any value. Then these packets travel via the provider AS, the interconnecting link and the victim AS, and finally they reach the victim. As long as the victim is still capable of responding to these packets, because these are TCP SYN packets, it will send the second part of the TCP handshake, but instead of sending this to the attacking infrastructure it will send it to the spoofed source address. So, if the source address is within the address space of the /8 DarkNet, we actually see those packets coming in there. That's how backscatter works. And then using the methodology that I pointed out earlier we can infer attacks in the network telescope.
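A short worked calculation shows why a /8 telescope is large enough to observe such attacks. It assumes the attacker spoofs source addresses uniformly at random over the whole IPv4 space, which is an idealisation; the prefix 10.0.0.0/8 is only a placeholder for "some /8" and is not the real telescope prefix.

```python
import ipaddress

# Placeholder /8; the actual telescope prefix is not given in the talk.
telescope = ipaddress.ip_network("10.0.0.0/8")

# Share of the IPv4 address space covered by a /8: 2**24 / 2**32 = 1/256.
fraction = telescope.num_addresses / 2**32


def estimated_victim_responses(observed_backscatter_packets):
    """Scale telescope observations up to the victim's total responses.

    Assumes uniformly random spoofing across all of IPv4.
    """
    return observed_backscatter_packets / fraction


print(fraction, estimated_victim_responses(100))
```

Under this assumption, about one in every 256 response packets lands in the telescope, so 100 observed backscatter packets correspond to roughly 25,600 responses sent by the victim.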
The second source on DDoS attacks is the AmpPot data. So in this case, the attacking infrastructure will send faked, for example, DNS requests to reflectors, and among these reflectors are these honeypots. Within these requests, the source address will be spoofed to that of the intended victim because that's where the reflected packet should go to. The reflector will process the request and send an answer, as I said, towards the victim address, which was spoofed in this packet here. The answer will traverse the provider AS, the interconnecting link and so on and it will end up at the victim. So this is the reflection attack.
And this is it combined. An important note here is that these two raw data sources on DoS attacks are complementary to each other; they do not capture the same thing. The first one captures randomly spoofed attacks and the second one reflection, which has specific spoofing.
Now, let's just say that this traffic is overwhelming for the victim AS and they decide to deploy blackholing to try and stop this attack. So they will send up a prefix announcement for this /32 that corresponds to the victim address and they'll tag this request with the blackholing community. And then assuming that the provider AS is peering at some point or one of its peers is peering with a BGP collector and assuming that the announcement propagates as far as the collector, we can actually infer the blackholing activity on this collector here. So this is how the three data sources play.
So, we put this data together. So we have this large set of attacks for three years, which is roughly 28 million attacks, and then we have 1.3 million, give or take, blackholing events. And then we put these two together. And then one of the first things we see is that if we look at the time at which the attack starts and then at the time at which we observe the blackholing in BGP data, we actually find that more than half of all the attacks are mitigated within a matter of minutes, and 44.4% actually within one minute or so. And only a very small fraction of the attacks takes longer than six hours to mitigate. So it takes longer than six hours before we see the blackholing activated. So, what these numbers suggest to us is that there is automated mitigation which is very rapid.
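The matching of attack start times against blackholing times could be sketched as follows. This is an illustration only: the record layout, the rule that a blackhole must begin during the attack, and the sample data are all assumptions, not the matching procedure of the study.

```python
import ipaddress
from datetime import datetime, timedelta


def first_blackhole_delay(attack, blackholes):
    """Delay between attack start and the first blackholing of a prefix
    covering the target; None if no blackhole starts during the attack.
    (Matching only within the attack window is an assumption.)"""
    target = ipaddress.ip_address(attack["target"])
    delays = [
        b["start"] - attack["start"]
        for b in blackholes
        if target in ipaddress.ip_network(b["prefix"])
        and attack["start"] <= b["start"] <= attack["end"]
    ]
    return min(delays) if delays else None


t0 = datetime(2017, 1, 1, 12, 0, 0)  # invented timestamps
attacks = [
    {"target": "192.0.2.1", "start": t0, "end": t0 + timedelta(hours=1)},
    {"target": "198.51.100.7", "start": t0, "end": t0 + timedelta(minutes=30)},
    {"target": "203.0.113.9", "start": t0, "end": t0 + timedelta(minutes=10)},
]
blackholes = [
    {"prefix": "192.0.2.1/32", "start": t0 + timedelta(seconds=30)},
    {"prefix": "198.51.100.0/24", "start": t0 + timedelta(minutes=20)},
]

delays = [first_blackhole_delay(a, blackholes) for a in attacks]
matched = [d for d in delays if d is not None]
within_minute = sum(d <= timedelta(minutes=1) for d in matched) / len(matched)
print(delays, within_minute)
```

The share of matched attacks blackholed within one minute is then a simple ratio over the matched delays.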
What we also did is we looked at how long blackholes endure after the attack has ended. So we look at the registered end of an attack in our data and then we look at the BGP data and see how long it takes before this prefix is either announced without the blackholing community or before it's explicitly withdrawn, and then we find that roughly three quarters of all the blackholed attacks see the blackhole withdrawn within three hours following the end of an attack. However, for almost 4%, it actually takes longer than 24 hours before the blackholing is withdrawn. So, what this tells us is that the side effects of the mitigation technique, because it's like either on or off, drop all your traffic, last well beyond the end of the actual attack.
What we also did is we looked at the intensity of attacks and then see which ‑‑ how intense attacks need to be before they are blackholed. So, this is a CDF. The black top line is the intensity of all the attacks in our dataset, actually all the randomly spoofed attacks in our dataset and then the grey line is that of all the attacks for which we find corresponding blackholing in the BGP data. And what we see is that around two thirds of all the blackholed attacks have an intensity of up to 300 megabits per second. So this is inferred from the number of packets that we see coming back to the telescope per second.
Whereas about 90% of all the attacks have at most that intensity. So this confirms the intuition that blackholing is more likely to be employed when an attack is stronger.
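As a rough illustration of how a packet rate seen at the telescope can be scaled to an attack intensity: a /8 covers 1/256 of the IPv4 address space, so under uniformly random spoofing the telescope would see about 1 in 256 backscatter packets. The sketch below rests on assumptions that are not from the talk, namely that the victim answers every attack packet and that the packet size used for the bitrate conversion is known.

```python
# A /8 holds 2**24 of the 2**32 IPv4 addresses.
TELESCOPE_FRACTION = 2**24 / 2**32  # = 1/256


def estimated_attack_pps(observed_backscatter_pps):
    """Scale the rate seen at the telescope to the full address space,
    assuming uniform random spoofing and one response per attack packet."""
    return observed_backscatter_pps / TELESCOPE_FRACTION


def estimated_mbps(observed_backscatter_pps, assumed_packet_bytes):
    """Convert to megabits per second; the packet size is an assumption."""
    pps = estimated_attack_pps(observed_backscatter_pps)
    return pps * assumed_packet_bytes * 8 / 1e6


print(estimated_attack_pps(10))   # packets per second at the victim
print(estimated_mbps(10, 60))     # Mbit/s for hypothetical 60-byte packets
```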
However, if we look at, like, more towards the left‑hand side of the CDF, we actually see that for 13.1% of all the blackholed attacks, we see a rather low intensity corresponding to about 3 megabits per second of traffic volume. So what this shows us is that operators actually mitigate through blackholing rather mild attacks that might just be nuisance. And we repeated this analysis also for the reflection attacks in the dataset, but if you want to see that you have to look at the paper because I didn't include those in the slide here.
What these results also confirm is that this Moore et al methodology which dates back more than ten years, works, because people might think like okay, you use very conservative thresholds and then what's considered like an attack ten years ago might just be noise now but we actually see that what we infer to be attacks is linked to mitigation. So it confirms that there are attacks.
And it also corroborates our previous findings in which we found a significant number of attacks daily using the same methodology.
What we also did, we looked at the attacks that we do not see. So if we take the 1.3 million, give or take, blackholing events and then we look at all of them for which we see a preceding attack, we actually find that only for 27.8% of all these events do we find an attack. So there are nearly three quarters of events for which we do not find a corresponding attack, and assuming that blackholing is mostly used to mitigate DoS, there is actually a large part that we're not seeing, and we're also not able to infer, like, what other attacks are hiding in this 72% or so.
But what these results do highlight is that randomly and uniformly spoofed attacks and reflection attacks, together, form a great share of the number of attacks that operators have to deal with.
So, the second part of my presentation is about service collateral. So as Matthias mentioned in his previous presentation, if you enable blackholing, there are going to be things listening on the network that can no longer be reached, so you are not going to be letting through legitimate traffic. But rather than looking at this from a traffic ratio point of view, I'm actually going to look at it from a service point of view. So which services do we see hosted within these prefixes that are blackholed?
So, to do this I need to introduce two additional datasets. The first is a large‑scale dataset of DNS measurements; it's from a project on which there is going to be a presentation later. Essentially it provides mappings between IPv4 addresses and websites based on the www label in the DNS, mail exchangers based on MX records and authoritative name servers based on NS records. And then we use data for the three legacy gTLDs, which represent roughly 50% of the global DNS name space.
Now, if we look at the number of blackholed prefixes in which we find websites, mail servers or name servers, that's tabulated in this table here. So for a little under 10% of all the blackholed prefixes over this three year period, we find that they link to one or more websites. Overall, this involves 782,000 websites. And one of the things we also looked at is which of those websites do not have alternative IPv4 addresses, at least based on our measurements. And we find that for a large fraction of all these websites, that is actually the case. I should say that this is from a single vantage point, so results are not super conclusive.
And then for mail exchangers we find, give or take, 177,000 names which, at the time of blackholing, do not have alternative hosting, and name servers and so on.
The second dataset that I'd like to introduce to you is reactive measurements. So we augmented this study on blackholing with reactive measurements. For this we used the RIPE Atlas platform and again BGPStream. So, what we did is, when we see that a prefix is blackholed, specifically a /32, we launch a reactive measurement, and then when we observe that the blackholing is withdrawn or the prefix is reannounced without the blackholing community set, we send out another measurement. So, we use RIPE Atlas to send traceroutes to these /32s and for this we tried to select two probes in peer, customer and provider networks, so two probes in each, and we also scan a handful of IANA assigned ports, specifically for web, mail and DNS, so think about 25, 80, 443 and so on. And this is done from a single vantage point.
So, some methodological notes. How do we infer that the blackholing is effective or not? So for the port probes, if we find that a port is open only for the deactivation measurement, so the measurement that follows the withdrawal of the blackhole, we infer efficacy. However, if we find that the port is open on activation, we infer the opposite. And then in all the other cases we cannot conclude anything.
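The port probe rule described above is small enough to write down directly. A minimal sketch, with hypothetical function and label names:

```python
def infer_from_port_probe(open_on_activation, open_on_deactivation):
    """Classify one activation/deactivation pair of port probes.

    open_on_activation:   port answered shortly after blackholing started
    open_on_deactivation: port answered after the blackhole was withdrawn
    """
    if open_on_activation:
        # The service was reachable while the blackhole was in place.
        return "ineffective"
    if open_on_deactivation:
        # Unreachable during blackholing, reachable again afterwards.
        return "effective"
    # The port never answered, so nothing can be concluded.
    return "inconclusive"


for pair in [(False, True), (True, True), (False, False)]:
    print(pair, infer_from_port_probe(*pair))
```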
For traceroutes, RIPE Atlas results indicate whether the last hop was the destination. If we see that strictly on the deactivation edge we infer efficacy of the blackholing, and if we see it on the activation edge as well we infer inefficacy. Here are the results for the port probe inferences. So the way to read this table is that we got roughly 2,900 usable measurements for prefixes that host ‑‑ no, sorry, for the web ports; usable in this case means we got a result on the activating edge and the deactivating edge. Then for about 7% of those we infer inefficacy, because we see a port open state on both flanks, shortly after blackholing is announced and after it is withdrawn. However, for 92.64%, we infer that the blackholing is effective to some extent.
And then in only a very small number of cases do we see a port open only on the activating edge, which supports the chosen methodology.
Similarly, for traceroutes, we infer again efficacy if we find that the last hop responded only after the blackholing is withdrawn. And if it also responded shortly after blackholing, then we infer inefficacy. Comparing these numbers here, from within peer networks we can infer efficacy for about 30% of all of them and inefficacy for about 8%, and we find small numbers of cases in which one of the Atlas probes gives us one result and the other gives us a conflicting result, so that's this column here.
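A sketch of the traceroute rule, including the case where the two probes in a network disagree. The labels and the way results are combined are illustrative assumptions; the talk only states that conflicting results are counted in a separate column.

```python
def infer_from_traceroute(reached_on_activation, reached_on_deactivation):
    """Classify one probe's pair of traceroutes towards a blackholed /32."""
    if reached_on_deactivation and not reached_on_activation:
        return "effective"      # destination answered only after withdrawal
    if reached_on_deactivation and reached_on_activation:
        return "ineffective"    # destination answered despite the blackhole
    return "inconclusive"       # last hop did not respond after withdrawal


def combine_probes(results):
    """Combine per-probe results for one network relation (e.g. peer)."""
    conclusive = {r for r in results if r != "inconclusive"}
    if not conclusive:
        return "inconclusive"
    if len(conclusive) == 1:
        return conclusive.pop()
    return "conflicting"


probe_a = infer_from_traceroute(False, True)
probe_b = infer_from_traceroute(True, True)
print(combine_probes([probe_a, probe_b]))
```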
So while these percentages are low, we infer efficacy significantly more often than inefficacy, but unfortunately our coverage here is very limited because in many cases we see that the last hop doesn't respond.
Now linking this back to the DNS data for the prefixes for which we corroborate efficacy, we find, and this is for a month period, by the way, these reactive measurements, I forgot to mention that, about 31,000 websites that are cut off altogether based on our measurements. Similarly, we find 323 name servers that are cut off. And then also if we look at the number of affected domain names, we see that the mail server for 700 domain names is cut off, at least for part of the Internet based on these measurements.
But there is a footnote here, which is that mail transfer agents will typically try to send mail again, so this might simply incur a delay rather than constitute a self‑inflicted DoS, and then there might also be caching mechanisms that will actually partially mitigate the DNS issues.
So, to conclude my presentation. We started by addressing a lack of understanding in how blackholing is applied when attacks occur. We wondered if it's only used in the most extreme cases, and although our study only provides first insights, we do show that there is rapid and automated mitigation, and we see that there are excessive retention times, so it takes really long for blackholing to be withdrawn, which suggests that manual intervention is needed there. And then we also see evidence that less intense attacks are also mitigated. Then if we look at the second part where we augmented the study with complementary measurements, we find that in a small number of cases we were able to corroborate efficacy or inefficacy, but unfortunately, our coverage is rather limited and this has something to do with observation delays in BGP data, firewalls that don't allow ops to respond in the cases we had opened, they do to support the measurements. So, a last note on future work. So we only see about one quarter of attacks matching to blackholing activity, so this is something we would like to look further into, and consider other attack data. For example, targets derived from C&C traffic or other means to infer attacks. And then we'd also like to improve on the reactive measurements, for example by looking at the path, at the response attributes of the last hop.
And that's it for my presentation.
BRIAN NISBET: So, again, questions, if we have any; we have one question at the moment. Please, again, with this, whatever we end up doing, assume that you have come to the mic and give your name and affiliation, as you would there. But Mem from Netnod asks: you found a lot of web, DNS, etc., without alt‑adr address. Can that be due to Anycasted services?
MATTIJS JONKER: That is a question, right, of the CloudFlare presentation. I don't know. I haven't thought about it.
BRIAN NISBET: Okay. Johannes Deger from Ulm University says: You mentioned in the aftermath of a blackholing event some blackhole announcements are not withdrawn. Have you noticed blackhole routes that have never been withdrawn?
MATTIJS JONKER: No. No ‑‑ okay, no I'll try to be longer in that answer. I have seen blackholing events in the blackholing dataset that are rather long that live on for weeks, whatnot. I have also looked at these from the point of view whether or not they could be censorship‑related or not but for the blackholing events that we match up with DoS activities, so in this study most of them are, or all of them are withdrawn at one point or another, but again for very small percentage of them it takes longer than a day for them to be withdrawn.
BRIAN NISBET: So, Pierre‑Yves Maunier from Acorus Networks asks, he said: You said that less than 1% of the attacks that the realtime blackhole was activated for six hours, could this be due to the fact that the majority of the attacks last less than six hours?
MATTIJS JONKER: Could be, yeah. So, I don't have the numbers on attack duration fresh in mind because I did do that in an earlier study, yeah, it could be ‑‑ so we did not consider basically all of the attacks that are blackholed are considered in a CDF, so if there is an attack that is stopped after an hour already it would still count towards the CDF, yeah.
BRIAN NISBET: And do you have any data about the attack duration?
MATTIJS JONKER: Yes, so that's in another ‑‑ that's in a 2017 IMC paper on DoS attacks.
BRIAN NISBET: So, Emile Aben from the RIPE NCC. What type of traceroute did you use? Your scanning results suggest TCP traceroute might get you more responding targets than UDP or ICMP.
MATTIJS JONKER: So I used UDP traceroutes with Paris ID set to 1.
BRIAN NISBET: And Brian Dixon, with no attribution or affiliation even: did you do any outreach to blackhole announcing parties to confirm what you inferred or otherwise get more details on the actions on attacks?
MATTIJS JONKER: No. No.
BRIAN NISBET: Okay. Maybe something to consider. As I said, this gets priority. So, Edvinas Kairys from Adform: Do things like SYN proxy still work nowadays as a current DDoS counter measure?
MATTIJS JONKER: I don't know. I feel like I'm being tested ‑‑
BRIAN NISBET: You need to get a big black chair up here, or something. We have a couple of more minutes and a couple of more questions. Johannes Deger from Ulm University: There is the rumour that much of DDoS traffic comes from very specific locations have you looked at geoinformation of DDoS traffic and/or ASes?
MATTIJS JONKER: Yeah, so in the other paper that I mentioned, the 2017 one, we looked at the geolocation of targets, yeah, and sources.
BRIAN NISBET: That's another paper that people can look at.
So, any last questions folks? In which case, thank you very much.
So, for our last presentation this morning, which for fun and games also includes, I believe, a live demo of some kind so you can all wait for that with bated breath. So, this is Koen van Hove, also from the University of Twente, and they'll be speaking about DDoS clearing house, so solving DDoS attacks in the Netherlands, Europe and beyond by facilitating bridging solutions and stakeholders.
KOEN VAN HOVE: This is going to be, or going to include, a live demonstration, which is going to be awesome. But one of the drawbacks of that is that you have to wait a small amount of time before I can start it. So, please take this time to, in case it does go wrong, look where your nearest exit is...
So, thank you for the introduction. It is indeed what I'm going to talk about. It is about solving DDoS attacks in the Netherlands, Europe, and beyond. Facilitating bridging solutions and stakeholders and DDoS clearing house. Okay, we are going to go for the longest title competition. I think we won. But we are just going to call it solving DDoS attacks because that's easier.
My name is Koen van Hove. I don't expect you to be able to pronounce that, just make something up and I'm fine with it. I am a researcher at the University of Twente, like the last speaker. I will quickly introduce what the problem is and what our idea is. Just by show of hands, is there anyone from these companies here in the room? Anyone? I see one hand. Okay. At least I'm not doing this presentation for nothing.
Is there anyone from academia that is also doing research in DDoS attacks? I know that there are a couple, so please show your hands, it's not just me. I see the previous speaker. Okay. And one more. Okay, that's not a lot. Then I think I'm wrong here then. Okay. I am a person from academia and there are, as you can see, over 40,000 results when it comes to looking for DDoS attacks. Now, we have DDoS protection services that provide the infrastructure for mitigating DDoS attacks. So, we have the research, we have the companies that can provide the infrastructure, then we start wondering why do DDoS attacks still exist? And the key to that problem is actually that currently there is a gap between the DDoS protection services and academia. And what we are trying to do is bridge that gap between academia and the protection services.
Now, they are the key to solving the problem. But they are of course not the only ones that we care about when it comes to solving DDoS attacks. There are in total five. The first are the victims of course, the ones that are suffering the attack, because they want to make sure that their service is available again. Then there are the DDoS protection providers. Then there are the network operators and the CERT and CSIRT teams. Then there is law enforcement, because law enforcement also wants to know who is executing the DDoS attack because they would like to pay them a visit. And then of course academia, because academia likes to research the nature of DDoS attacks and hopefully find a way to mitigate them.
Now, we want to make sure that we meet the needs of all these five stakeholders, and to do that, we have the DDoS clearing house. This is quite literally a house and it consists of five components. There are four technical components and one non‑technical component, which might be the most important part. And the non‑technical component, that's this one, is cooperation. Because, if all the stakeholders try to solve DDoS attacks themselves, that's not going to work, we are going to run around in circles, reinvent the wheel five times and still end up with a square one.
Let's also consider the technical components. The first technical component is the network measurement. You need some knowledge of what is happening on your network in order to detect where your DDoS attack is coming from, what it is. Basically this is your typical PCAP or your NetFlow or whatever you use; the more fine‑grained, the more details your network measurement has, the more fine‑grained your mitigation method can be.
Then there is something that we call the DDoS dissector. It takes as an input whatever network measurement you have, and what it outputs is what we call a fingerprint, and a fingerprint is actually just a way of summarising the characteristics of a DDoS attack. So, what you would normally do as a network operator is you would look at, okay, we are currently under attack because we notice that our services are responding very slowly or are down, and we now want to find out where is that attack, what can we do about that attack? What you normally do is look at what looks off, what looks strange, what don't we expect to see here? And that's basically what we do in the DDoS dissector, and what it outputs is just the summary of what that network operator would normally conclude himself. It also outputs the filtered and anonymised network measurements that you would get from your network measurement. Basically, you have your network measurements and it says, hey, it looks like you are under a DNS attack; then we want to filter that original input only for that attack and then anonymise it so that other people can also use information about that attack to help their summaries and their protection and their attribution and their solutions.
Then there are the fingerprint converters. A converter basically takes the fingerprint that you generated from the DDoS dissector and converts it into something your hardware can use, because we have heard from two talks before this that a lot of different hardware uses a lot of different formats, and what we want is that once you have created your fingerprint you can use it in any way. So your fingerprint is just the generic format and your fingerprint converter can convert it into a format that you want to use.
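To make the converter idea concrete, here is a minimal sketch that turns a generic fingerprint into firewall rules. The fingerprint field names are hypothetical and are not the project's actual format; the output uses ordinary iptables match options (`-p`, `-s`, `--sport`, `-j DROP`) purely as one example target format.

```python
def fingerprint_to_iptables(fingerprint):
    """Emit one iptables DROP rule per (source IP, source port) pair.

    The fingerprint layout (protocol, src_ips, src_ports) is an
    illustrative assumption, not a documented schema.
    """
    proto = fingerprint["protocol"].lower()
    rules = []
    for src in fingerprint["src_ips"]:
        for port in fingerprint["src_ports"]:
            rules.append(
                f"iptables -A INPUT -p {proto} -s {src} --sport {port} -j DROP"
            )
    return rules


# Invented fingerprint of a DNS amplification attack (documentation addresses).
example_fingerprint = {
    "protocol": "UDP",
    "src_ips": ["198.51.100.5", "203.0.113.8"],
    "src_ports": [53],
}

rules = fingerprint_to_iptables(example_fingerprint)
for rule in rules:
    print(rule)
```

A second converter for another device would consume the same fingerprint and only change the output syntax, which is the point of keeping the fingerprint generic.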
Then there is the DDoSDB, which is the centralised or semi‑centralised storage, and which stores and distributes the related info. It stores your summaries and your anonymised datasets, which can then be used by others to help their mitigation or their research or whatever they do.
Like I said, cooperation is the most important part of this thing, because all these five stakeholders have one goal and that is they want to prevent DDoS attacks. They of course also have their own goals, but that's at least something that they share. So, based on that, we think that, well, we cannot necessarily make them collaborate, but at least a basis for collaboration can be found.
Well, let me just give you an example of what I just explained in the boring slides talk. Let's say that we have a DDoS attack incoming. Let's say that there is a UDP amplification attack: we see that we have a lot of UDP traffic incoming from port 53 and from a certain set of addresses. Well, our cloud here that is running the DDoS dissector notices that and it generates a summary that says, okay, from these IPs we note that there is a DNS attack, that's what's happening, and it uploads it to the DDoSDB. From the DDoSDB, other people can generate, not the fingerprints, but their own input for their own devices, firewalls, etc., so that they can then mitigate if it comes in for them. So, if now the same attack comes in at, in this case, the grey path, then we can see that the DDoS attack is fended off because he already knows about the attack. And that means that the victims now no longer suffer the attack. The DDoS protection provider is happy because the attack is stopped and the network operator is also happy, because he can now rest again. There are still three‑ish, two and a half, parties that need to be satisfied: that is the CERT and CSIRT teams, law enforcement, and academia.
Well, the CERT teams part is very simple, because it is uploaded to DDoSDB, so they can find out if there is now something coming from their network. So, if I notice, hey, this is a DDoS attack and we notice that the source of the DDoS attack is our own network, maybe let's fix it. So now they have knowledge that their systems are actually abused for attacks, so that solves the problems that the CERT teams have.
Then there is law enforcement. Well, law enforcement, because we now have a dataset of all the DDoS attacks, can start to correlate the different attacks. They say, oh hey, we have one attack coming in here and one attack coming in there and they look very similar. We know that this one is probably coming from somewhere around this area and, oh, we notice that this person made a bit more of an error and we now know his last name, hey, we finally found out who it is. So that's law enforcement. And then there is academia. Well, because we have the anonymised input data, academia can actually replay the attacks on their own testing environments and then test how well their mitigation solutions work. So that's academia checked off.
Now, there is one extra element, that's something that is currently being drafted, and that is DDoS Open Threat Signaling (DOTS) by the Internet Engineering Task Force. Basically what it is is that a computer or a system can, once it is under attack, signal to the upstream: hey, I'm under attack. And that also ties into the DDoSDB, because once the DDoSDB knows what the attack is about, the upstream provider can also block that attack coming in, so that the person that is suffering the attack will be available again.
Now, this is the very exciting, maybe, part that is going to be the demonstration, because what I showed you is not just theory, it's actually working and we have already tested this at certain parties. Now, I will first do a quick demonstration of the DDoS dissector. What we have here is we have input an attack ‑‑ this is the funny part about demonstrations, it always goes wrong when you need them.
Like I was saying, this is just the DDoS dissector. We have as input what was a test set up by the university, which is a DNS attack. I'm just going to make one, and then we quickly switch over to the demonstration of the DDoSDB because this is going to take a few minutes. DDoSDB is actually already a thing. What it is is basically a very large database where people share their attacks, and what we can see here ‑‑ I searched for DNS based attacks ‑‑ are two DNS based attacks where you can see, well, this was a DNS attack where they were querying, in the first case, this one, and they were saying give me everything. The destination was basically random. The file type that was used for the input was PCAP and the source IPs were these IPs and at the time of the attack, these were the autonomous system numbers and their country codes, which ‑‑ and then this is apparently a phone number.
Let's see how far you have come so far. Okay, you are still running...
This system, I mean you can see that I am slightly cheating here because this is running on localhost, but this system is also currently running on ddosdb.org. The system is currently invitation only. However, you can request an invitation yourself, which will most likely, for this audience, be granted, because we are currently testing out what the best way is to make this collaboration possible.
Things are always slower when you try to do them live. It's the tedious waiting or the anticipation of things to happen.
Now, well in this case I can also cheat because the logs are already available ‑‑ you see, I mean I already told you that you had to look ‑‑ I sincerely hope that you do now. Here is basically the log of what the DDoS dissector did. Effectively, it looks at what the top, the most important IP address is. So it looks, okay, this is the most prevalent IP address, which gets the most traffic. Then we look, oh, hey, what's the IP protocol, which in this case is 17, which is UDP. What is the source port? 99% are from port 53. Hey, surprise, most of them are directed to the web server, which is port 80. We can then see which DNS servers were used etc.
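The steps read out from the log (busiest destination, then dominant protocol, then dominant source and destination port) can be sketched as a chain of "most common value" filters. This is an illustration under assumptions: the packet records, field names and output layout are invented, and the real dissector may work differently.

```python
from collections import Counter


def top(values):
    """Return the most common value and its share of all values."""
    values = list(values)
    value, count = Counter(values).most_common(1)[0]
    return value, count / len(values)


def dissect(packets):
    """Narrow the traffic down step by step, as in the log walk-through."""
    target, _ = top(p["dst_ip"] for p in packets)
    to_target = [p for p in packets if p["dst_ip"] == target]
    proto, _ = top(p["proto"] for p in to_target)
    same_proto = [p for p in to_target if p["proto"] == proto]
    src_port, src_share = top(p["src_port"] for p in same_proto)
    attack = [p for p in same_proto if p["src_port"] == src_port]
    dst_port, _ = top(p["dst_port"] for p in attack)
    return {
        "target": target,
        "protocol": proto,
        "src_port": src_port,
        "src_port_share": src_share,
        "dst_port": dst_port,
        "src_ips": sorted({p["src_ip"] for p in attack}),
    }


# Invented traffic: 99 reflected DNS packets, 1 other UDP packet, 1 bystander.
packets = [
    {"src_ip": f"198.51.100.{i % 3 + 1}", "dst_ip": "192.0.2.10",
     "proto": 17, "src_port": 53, "dst_port": 80}
    for i in range(99)
]
packets.append({"src_ip": "203.0.113.5", "dst_ip": "192.0.2.10",
                "proto": 17, "src_port": 123, "dst_port": 80})
packets.append({"src_ip": "203.0.113.6", "dst_ip": "192.0.2.20",
                "proto": 6, "src_port": 40000, "dst_port": 443})

fingerprint = dissect(packets)
print(fingerprint)
```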
So that's, that was the demo part.
In 2017 this project was started by the University of Twente. In 2018 we expanded to the Dutch government, KPN and the ING bank and a couple of others that may or may not want to be named so they are not listed here. And in 2019, we hope to expand that to more European partners. Everything that you have seen so far is Open Source and you can find it online on the GitHub page, which is github.com/ddos/clearing‑house. You can see that here. The exception is DDoSDB itself, which is local to a certain part in the Netherlands and is white‑listed only, so the website won't work for you; that's also something that we are currently working on and that's one of the challenges in the future directions.
We don't expect one centralised DDoSDB, the reason being that it would be kind of, well ‑‑ I mean, you are making yourself rather vulnerable if you are the one centralised place where all DDoS attacks are located, reason being that I hope that mostly good guys are watching here but I don't expect that there are no bad guys watching either. This is one of the possible structures that we could have, that certain systems communicate with one DDoSDB and they don't share everything with each other. However, what could also be the case is that we have basically one where everyone shares and then a few smaller ones that have their own DDoSDB instance where they only share with a small select group.
So that concludes my talk. If you want to reach me, you can ask questions now or if you want to ask them later, you can find me, or that's the e‑mail address.
BRIAN NISBET: Let's see what this has in here? There are ‑‑ do we have any questions? We do not have any questions from here. People have just understood everything you have said. So we do have time for questions if anybody has any. I mean, as we said, it gets priority, but the mics are still there, they are live. Oh, we have a question. Awesome.
SPEAKER: Nicolai Leymann from Deutsche Telekom. You mentioned DOTS. Is the system based on DOTS or where does it fit into the picture?
KOEN VAN HOVE: The DDoS protector that I showed, our goal is that we automate a lot of things in the process of basically analysing the DDoS attack itself. So, what DOTS will bring into the picture is that you no longer have to contact your upstream provider, that you actually do that automatically; then that small window where you are under attack and your service is not available, you can try to minimise that.
BRIAN NISBET: Okay. So, no more questions. So ‑‑
AUDIENCE SPEAKER: Can I ask a question? Can you talk a little bit ‑‑ Alissa Cooper, Cisco. Can you talk a little bit about how you anonymise the data?
KOEN VAN HOVE: What we noticed is that the parties delivering the data don't want anything that is associated with them, or that can be attributed to where the data is coming from, so everything related to that is what we remove. So that is the destination IP, every MAC address that's found in the ‑‑ basically, everything that we don't need for the analysis.
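As a small illustration of that answer, removing the identifying fields from a packet record might look like this. Only the destination IP and MAC addresses are named in the answer; the record layout and field names below are hypothetical.

```python
# Fields that could identify the party that contributed the data.
# The names are illustrative; the answer only mentions the destination IP
# and MAC addresses.
SENSITIVE_FIELDS = {"dst_ip", "src_mac", "dst_mac"}


def anonymise(packet):
    """Return a copy of the packet record without the sensitive fields."""
    return {key: value for key, value in packet.items()
            if key not in SENSITIVE_FIELDS}


record = {
    "src_ip": "198.51.100.5",
    "dst_ip": "192.0.2.10",
    "src_mac": "00:00:5e:00:53:01",
    "dst_mac": "00:00:5e:00:53:02",
    "proto": 17,
    "src_port": 53,
}
clean = anonymise(record)
print(clean)
```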
BRIAN NISBET: Okay. Thank you very much.
So, just before you all rush off to get coffee and T‑shirts, thank you very much. Please remember to rate the talks, and please give us feedback either in person or via pc [at] ripe [dot] net or other methods on the use of Slido this morning; we'd really love to get some feedback on that and how useful it was and how to go forward from here. And please remember the PC elections: if you wish to stand for the PC or nominate a good friend or colleague, please send an e‑mail to pc [at] ripe [dot] net. Thank you all very much.
LIVE CAPTIONING BY
MARY McKEON, RMR, CRR, CBC