21 May 2019
JAN ZORZ: Okay, can everybody find a seat, please. We will start the session. Hello. There are more people coming in. Welcome back to the 11 a.m. session. Please everybody find a seat. Our next presenter is David Huberman from ICANN, and he wants to revisit the root. So, let's see what David has to say about this. The stage is yours, thank you.
DAVID HUBERMAN: Hello, good morning everybody. I work with ICANN. And today I would like to talk a little bit about the root servers, about some lesser known history and a little bit about the evolution of root service.
The DNS specification was first published in November 1983 when Paul Mockapetris published RFC 882. The hierarchy that Paul envisioned required a server to function as the root of the tree, and so in 1984 Paul set up the first root server at the University of Southern California's Information Sciences Institute, ISI. In 1985, we expanded from one root server to four: there were two root servers located in California at ISI, and there were two root servers on the US east coast, one at SRI in Washington DC, and another at the Army's Ballistic Research Laboratory in Maryland. And by 1987 we were up to seven root servers, but all of these organisations and all of these root servers were located in the continental United States. And it wasn't until July 28th 1991 that we finally got some geographic diversity, as the first root node was established outside the United States at the Nordic university network, NORDUnet, in Sweden. NORDUnet ‑‑ in the world, supporting TCP/IP and DECnet protocols, and importantly, the staff operating NORDUnet had quite a lot of DNS operating experience stemming from their work operating the national TLD of Sweden, .se.
From September 1991 until the beginning of 1993, everything was housed inside the United States Department of Defence NIC. In the beginning, Jon Postel and Mark Kosters transferred all the non‑military data out of the DoD NIC and into the InterNIC. InterNIC took over what is now known as A‑root. Later, in the summer of 1993, Paul Vixie expressed his interest in running a root server, given all of Paul's work on BIND. Jon said yes, and ISC was assigned the ninth root server.
When a DNS resolver boots up, it initialises its cache with a priming query. The priming response the resolver receives includes both a current NS record set for the root zone and the IP address information for reaching the root servers. The root server names were not standardised, however, and in their state in 1995, the response to the priming query was maxed out. So to fix this, Bill Manning, Paul Vixie and Mark Kosters applied label compression to the root server system. Mark came up with the name root-servers.net, registered it, and assigned the letters to each of the root server operators. And so with label compression applied there was now room in the priming response for four more root servers.
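To make the size argument concrete, here is a rough sketch of how DNS name compression shrinks a priming response. It is a simplified size model, not a wire-format encoder: it assumes one NS record and one IPv4 address record per server, and the "unrelated" server names are hypothetical placeholders, not the historical names. The 512-byte figure is the classic limit for a DNS message over UDP.

```python
UDP_LIMIT = 512  # classic maximum size of a DNS message over UDP, in bytes


def labels_of(name):
    """Split a domain name into its lowercase labels."""
    return [label for label in name.rstrip(".").lower().split(".") if label]


def name_size(name, seen, compress=True):
    """Bytes needed for one domain name; `seen` holds suffixes already written."""
    labels = labels_of(name)
    size = 0
    for i, label in enumerate(labels):
        suffix = ".".join(labels[i:])
        if compress and suffix in seen:
            return size + 2      # 2-byte pointer to the earlier copy ends the name
        seen.add(suffix)
        size += 1 + len(label)   # length octet + label bytes
    return size + 1              # terminating root octet


def priming_response_size(server_names, compress=True):
    """Estimated size of a priming response listing the given root server names."""
    seen = set()
    size = 12 + 1 + 4            # header + question (root name, type, class)
    for name in server_names:    # answer section: one NS record per server
        size += 1 + 10 + name_size(name, seen, compress)
    for name in server_names:    # additional section: one A record per server
        size += name_size(name, seen, compress) + 10 + 4
    return size


letters = "abcdefghijklm"
shared = [f"{c}.root-servers.net" for c in letters]
unrelated = [f"ns.operator-{c}.example" for c in letters]  # hypothetical names

print(priming_response_size(shared))     # 436 in this model
print(priming_response_size(unrelated))  # above the 512-byte limit in this model
```

With every server under one short shared suffix, each additional name costs only a few bytes, which is why thirteen servers fit where nine unrelated names were already close to the limit.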
And in 1997 we reached 13 root servers, and 22 years later in 2019 that is where we are.
So, today we have 13 root server labels, A through M, each having an IPv4 and IPv6 address, and the 13 root server labels are operated by 12 different root server operators. The root servers are Anycasted to roughly 1,100 instances or so around the world, though it's interesting to note there are not 1,100 root servers: there can be many servers inside of an instance. And I wanted to take a look at how much traffic root servers were getting these days. So in December of this past year I took a look, and it turns out I was able to pretty easily do this thanks to the Root Server System Advisory Committee, RSSAC, which is an advisory committee organised inside of ICANN that allows the 12 root server operators to coordinate on important functions of the root servers. And one of those functions is measurement, and a number of years ago they came up with a document which provides a standardised set of measurements that they publish every day, that we can use to measure performance or interesting things about the root servers. So, I did that. And on this day I found there were about 77 billion queries being received at the root servers. And as you can see, the data is pretty clear: the majority of these questions come over UDP and over IPv4, though it is encouraging to note that UDP over IPv6 is actually a pretty good amount. Back in December, there was essentially no TCP traffic, and it would be very interesting to measure over the coming months and years how that changes with some of the newer protocols.
So, we have 13 root servers, and they are operated by 12 root server operators. Verisign operates A; ISI today continues to operate B; Cogent operates C; the University of Maryland operates D; NASA Ames does E; ISC continues to run F; the United States Department of Defence continues to run two root servers, G and H; and I still stays with our friends at Netnod.
After label compression was applied to the root server system in 1995, four more labels, the 10th, 11th, 12th and 13th, were available to be added. Mark Kosters and Jon Postel sat down and discussed it and, classifying it to the other root server operators as an experiment, Jon and Mark added four new labels, J, K, L and M. Two labels were set up experimentally at Network Solutions, J and K, and two more labels were set up at ISI, L and M. And the plan was that when a new root server operator was ready to start operating a root server, the new labels would be divvied up. So the first new root server operator to come along was RIPE, and RIPE was assigned K, which comes out of the Network Solutions group of two. The next root server operator to show up was WIDE, out of Japan, and Jon Postel said to Mark Kosters, "hey, WIDE wants a root server, let's give them J for Japan", and Mark said, "no, no, first off I have already given one of mine, you should give one of yours, and the WIDE root server isn't for Japan, it's for Asia, why don't you give them one of yours?" WIDE was run by Jun Murai at the time, so Jon gave him M.
J was still ‑‑ J was with Verisign. L. L was still run by ISI at the time. And it was in the same LAN as B. B was racked and L was on the floor. Officially, L moving over to ICANN was part of ISI's support of ICANN's formation. Unofficially, it was probably a good idea to separate these two root servers topologically, and it was probably a good idea not to have a root server sitting on the floor, so one day John from ICANN happened to come over and unplugged it and moved it to a data centre and racked it up and moved it into ICANN address space, and it's running as ICANN's root server. So here we are in RIPE.
So, as of last month, this region enjoys 340 root server instances out of the 1,120 that were operating on 20 April. As defined by the RIPE NCC, 55 of the 75 economies in the RIPE NCC region enjoy a root server instance, and these instances cover 104 metropolitan areas. So I wanted to find something interesting to say about the very good root server coverage here in the NCC service region. Let's start with our local hosts: Reykjavik happens to enjoy five root server nodes, E, F, I, J and K. The RIPE NCC service region has the northernmost root server instance globally, in Sweden, and right next door across the lake is another instance in Finland. Frankfurt, very unsurprisingly, has the most root server instances of any city in the RIPE NCC service region, and everyone here knows why.
But, but, Nuuk, the capital of Greenland, has the most instances per capita. There is one root server instance for the 56,000 residents of Greenland.
So earlier we discussed the story behind J, K, L and M. But Jon Postel died shortly after M was assigned to WIDE, and with Jon's death, we were left with no system and no process to add or replace root server operators, and so J stayed with Verisign and L stayed with ICANN, even though Jon had wanted to identify other root server operators to assign those labels to.
It's been 20 years since Jon's death, and up until now we haven't had any process to deal with this, but finally, ICANN's RSSAC has worked very hard on this and published a document called RSSAC037. It's an attempt to model who should govern the root server system and how it should evolve in times of need in the future.
The proposed model has five functions. In addition to a secretariat function and a strategy, architecture and policy function, I want to highlight just really very briefly a bit about the financial function and the designation and removal function. The financial function is designed really to provide substantial funding for operations and, importantly, for emergency operations. But it's the designation and removal function that underpins this governance model and really allows us to move forward out of 1997.
It's intended of course to establish whenever there is a need for a new root server operator, and I am going to stop here and say, look, it's worth noting we don't see any technical need for any additional root server letters. We have just seen there are thousands of root servers Anycasted across six continents, we have wonderful coverage, and with such broad coverage it's difficult to justify creating new letters strictly on a technical basis. The designation and removal function says: there is a need, we will get some applicants, who wants to be a root server operator? Evaluate them, recommend who should be one based on your evaluations, and if you ‑‑ we have processes for that too. And of course, let's always talk to the community about what we are doing and how we can do it better.
So, the proposed governance model is now under consideration by the ICANN board. The ICANN board commissioned the ICANN organisation to develop a concept paper as part of its consideration. That has now been published, it's open to a public comment period, and there will be further discussion of this at the next ICANN meeting in June.
Okay. Back to the interesting technical bits. In the traditional model this goes from left to right: a TLD needs to make changes to their zone, those changes are accepted by the IANA, the IANA does an edit to the root zone, and when it's ready each day it sends the root zone to the root zone maintainer, to Verisign. That is a contracted function. Today Verisign is the root zone maintainer and tomorrow it could be anybody. Verisign however is doing it, it's done it for a long time and does a very good job with a highly redundant infrastructure, and it has the final edit on the root zone and sends it off to each of the root server operators, who distribute it to each of the instances, and resolution happens.
Interestingly, though, the really large recursive resolver operators sometimes consider, well, rather than using the root server system for root resolution, maybe it would be better if we ran a local copy of the root zone, if we ran it perhaps in the same machine or at least in the same LAN as our recursive resolver. Steve Crocker named this concept hyperlocal. It's not meant as a replacement of the root server system, it's simply a complement. The IETF over the last few years decided that since this exists in the wild, it would probably be a good idea if we described what it was and how to do it, and so RFC 7706 was published so people who wanted to do it would know how to do it and do it safely. The current Internet draft, 7706bis, was updated this spring and, helpfully, we hope, it gives examples of how to set up modern resolvers to use hyperlocal.
For recursive resolver operators, it makes sense: it's local so it's faster, much shorter RTT. From a security standpoint, it's local so the root information cannot be misrouted, and you are not really going to deal a lot with snooping on the wire because it's right there. It's important to note, though, that if you are setting up hyperlocal, you are adding in more fragility than just using traditional root service, because you are probably going to set up hyperlocal and have a series of fallbacks to use the traditional root server system should anything go wrong, and that is more complex than simply opening up a resolver and pointing it to the root.
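The fallback chain mentioned here can be sketched in a few lines. This is only an illustration of the control flow, assuming a resolver that prefers a local copy of the root zone and falls back to the regular root server system when that copy cannot be trusted. All names here (`SourceUnavailable`, `local_root_copy`, `resolve_tld`, the staleness threshold, the sample zone data) are hypothetical and do not come from RFC 7706 or from any resolver implementation.

```python
class SourceUnavailable(Exception):
    """Raised when a source of root zone data cannot be used right now."""


def local_root_copy(zone, age_seconds, max_age_seconds):
    """Build a lookup backed by a local copy of the root zone (hypothetical helper)."""
    def lookup(tld):
        if age_seconds > max_age_seconds:
            raise SourceUnavailable("local root zone copy is stale")
        return zone.get(tld)  # None is a valid answer: the TLD does not exist
    return lookup


def resolve_tld(tld, sources):
    """Return (source_name, answer) from the first source that is usable."""
    for source_name, lookup in sources:
        try:
            return source_name, lookup(tld)
        except SourceUnavailable:
            continue  # fall through to the next source in the chain
    raise SourceUnavailable("no source of root zone data is usable")


# Illustrative data only: a stale local copy and a stand-in for the root servers.
zone = {"example": ["ns.example."]}
stale_local = local_root_copy(zone, age_seconds=3 * 86400, max_age_seconds=2 * 86400)
root_servers = lambda tld: zone.get(tld)

print(resolve_tld("example", [("hyperlocal", stale_local), ("root servers", root_servers)]))
```

Even in this toy form there are two code paths and a freshness rule to maintain, which is the extra complexity the speaker is pointing at.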
So, the root server system works, it's enjoyed a strong, robust and highly redundant infrastructure that has stood the test of time since 1985 but the Internet continues to evolve, both in terms of technologies and usage and as the Internet evolves so too will the system of root service. And ICANN certainly hopes that the implementation of a new governance model should be a valuable and significant contribution to the evolution of the root server system.
And that's all we got.
JAN ZORZ: Thank you. You were quite fast. We have 12 minutes for questions.
DAVID HUBERMAN: Or we can go on to the next presentation and get to the break faster.
JIM REID: Excellent talk on the ancient history of how the DNS or root system got to where we are today. One point of observation, though: you were talking about hyperlocal versus root instances, and there is a very, very important benefit of that which is completely overlooked, namely the ability to mitigate the impact of DDoS attacks on the root server system infrastructure. If you have got a local instance of the root you can actually sink that traffic locally, and it will not crush your backbone links and will not get to the root server instances, and it takes some load off them, so maybe copy that into a future version of your slides. I think that's a very important point.
DAVID HUBERMAN: Certainly, certainly agree, but it is worth noting that the root server operators have spent literally decades building in these types of protections, especially some of the root server operators with broader coverage have worked very hard to insulate their root servers from large scale large terabit attacks in the hopes these can stand up and survive.
JIM REID: Yes, I fully appreciate that, I am not trying to denigrate the excellent work that all the operators have been doing for a long time, but we need more defence in depth and this is, I think, an important element of that. Thank you.
AUDIENCE SPEAKER: Hi David. So, you know this process of adding or detracting new root server operators is very interesting one to certain countries that feel that they somehow have some kind of a political investment in having a root server despite the presence of instances, so, I was in China recently and there was a lot of interest in the prospect of a root server, there is also a very strong feeling that "ICANN doesn't like China" and by which they sometimes mean the US doesn't like China, which is increasingly true, unfortunately. So, how much of a political hassle are ‑‑ is the RSSAC governance process prepared to deal with in creating or assigning new letters?
DAVID HUBERMAN: That sounds like a great question for members of the RSSAC who are in this audience and not for the poor network engineer standing up ‑‑
Kaveh: The proposal only focuses on technical requirements and leaves that open for the process, so the process is a three phased process, at the moment the concept paper is forming a group from larger than ICANN, includes IETF and technical community. That group has to decide on these things. RSSAC is developing metrics and basically requirements, there is a basic requirements non‑technical but again expectations from root server operators, and later on of course in Phase Two when that team is developed that's one of the things that we need to look into. But I just have to add, operating a root server instance so that there are requirements that you have to distribute IANA zone file and that is ‑‑ as soon as you make any changes you will automatically disqualify and there will be processes to monitor that. There is ongoing metrics which is going to define how you are going to measure or immediately or as soon as possible basically take someone off the list if they are providing something other than root zone file.
AUDIENCE SPEAKER: Brian Dickson, GoDaddy. One or two semi‑related things: the AS112 project, which is not tied to the root, but it's a great place to sink a lot of the garbage queries that would otherwise hit the root.
DAVID HUBERMAN: Yes. Thank you, Brian.
SPEAKER: My name is Ulka and I work for the RIPE NCC. I have a question from a remote participant. Nick, speaking for himself: Do root server operators hope that RFC 7706 reduces the load on them?
DAVID HUBERMAN: I can't speak for root server operators. I am here as a member of this community talking about what I thought was interesting history and some evolution of root servers. The root servers are run by 12 different organisations who each have their own opinions. It's a good question, Nick, but I am going to defer on that, thank you.
JAN ZORZ: Are there any other questions? Actually, I saw the graph there, and what was going on with you with G?
DAVID HUBERMAN: So, G is one of the two root servers operated by the United States Department of Defence, and they have not currently or recently been producing statistics in compliance with what they have ‑‑
JAN ZORZ: Just lack of statistics not lack of traffic?
DAVID HUBERMAN: We don't know. I can't see their traffic if they don't produce statistics.
RANDY BUSH: Are there any questions you can answer? And that includes that question.
JAN ZORZ: All right.
Kaveh: If you have any questions about K‑root I can definitely answer and also about RSSAC I can bring it back to RSSAC or to root Ops group.
SPEAKER: Ulka: We have a question from another remote participant and I am waiting to receive that person's name organisation ‑‑ it's Nick again, who is asking: What is your opinion on RFC 7706 for non‑root zones like .se publishes the entire zone?
DAVID HUBERMAN: It's a good question. One of the nice parts about this audience is there are so many really good people to save me.
AUDIENCE SPEAKER: Is there someone from SE?
AUDIENCE SPEAKER: Benno Overeinder: I think RFC 7706 might be most of the people don't have a clue what it's about, it's about serving local and authoritative zones. So locally I am running .se org, locally I am running the root zone and essentially that can improve the stability, etc., etc., of the DNS infrastructure. Just to give some clue to the rest of the room.
JAN ZORZ: The mic in the back.
AUDIENCE SPEAKER: Hi, I'm Coleen from Article 19. I was wondering if you could elaborate about the articulation between the evolution of the governance model and the root server operation? Thanks. If you could elaborate a bit about that specific connection.
DAVID HUBERMAN: I mean, we have 12 root server operators, and how well are we doing, right? At ICANN we run L root. We try very hard to do a good job, but we don't have a lot of input to tell us how well we are doing, and one of the benefits of the new root ‑‑ the proposed governance model is that it actually has a methodology to measure how we are doing, to measure our performance. Also, there are financial ‑‑ there is a very important financial component. No one is paid directly to run a root server. It's ‑‑ and that is interesting if you think about it, because it's a lot of time and money. So, it's also really interesting to take a look at where ‑‑ what I'm trying to say is, where RSSAC037 can help is taking a look at the financial underpinnings and making sure they are okay, so that we are not going to be in an emergency situation in the future.
JAN ZORZ: That was good, thank you.
CHAIR: Next up is Roland van Rijswijk from NLnet Labs.
ROLAND VAN RIJSWIJK: Good morning. I want ‑‑ this is a joint project, it was started by the University of Twente, where I also work part‑time as an assistant professor, ‑‑ that is the research lab for the .nl ccTLD.
Now, what I want to talk to you about is a small project that we started some five years ago, where we had this idea: can we measure large parts of the global DNS on a daily basis? And in this talk I will discuss why we wanted to do that, because why on earth would you want to collect that much data, how we do it, and what we have learned so far while we have been doing this. And interestingly, I think for the people that were here for the first plenary session of the day, they already saw a talk by my colleague Matthias, who used some of this data for his blackholing study.
Right. So, I assume that most of the people in this room are familiar with what the DNS is, but it's good to sort of recap why you would want to measure the DNS, right? Because it fulfils this vital role on the Internet of translating something that we humans understand, a name, to something that our machines understand, like an IP address or another server that they need to go to to deliver e‑mail or things like that. And in essence every network service that we have relies on the DNS. And consequently that means that if you start recording what is in the DNS, you learn something about the development of the Internet over time, and that is our prime motivation. We want to learn about the evolution of the Internet through the DNS.
Now the obvious question to ask is: hasn't somebody tried this before? And of course, people have been recording DNS data for a long time. Some of you may be familiar with something that is called passive DNS, something that is very popular in the security community, and basically what you do there is, and the picture tries to show this, if you have a name server here that serves your local network, so a resolver, you capture all of the traffic that it sends out to the Internet and you record it. If you do this at lots of resolvers around the globe you get a decent picture of what people are interested in, but that's the trick.
Passive DNS only shows you what clients of those resolvers are interested in, it doesn't give you full coverage of, say, a top level domain. The other issue is that you have no control over when queries arrive. So if a domain is popular you might see tons of data points for that domain every day and if a domain is unpopular it might take a day or two before you see it again in your data set and if you want to create reliable time series you cannot use that kind of information but this is very useful for security purposes because it gives you an interest ‑‑ an idea of what people are actually interested in.
Now, what we do is what we researchers like to call an active measurement, and what we do is we send a fixed set of queries, and I am not going to bore you with them, you can find them on our website. We do that once every 24 hours and at quite a large scale. So our current coverage is actually a little bit higher than this, it's now almost 218 million domains per day, which is almost all of the gTLDs. For the new gTLDs, you have to renew your subscription to get their data every time, so we have gaps every few months. But we record almost, almost all of the gTLDs and we are expanding our coverage of country code top level domains, so we have quite a few in Europe, in North America, and we recently added our first Latin American ccTLD, Guatemala, and if there are any ccTLD operators in this room who don't see themselves on this list and would like to contribute data, I am here.
Right. So, maybe it's time to grab your bingo card, because on the next slide I am going to call this something that is on this slide. Is it going to be A, B, C or D? You are so smart. Yes. I am going to call this big data. Why do I call this big data? Because I need to get my research funded and research funders love big data. But of course, on a more serious note, would our work qualify? Because you can make this claim, but as a scientist you have to sort of substantiate these claims.
Now, something that most people would associate with big data is the human genome, and then they printed it on paper and it was a stack of paper about that high for a single person, and this is all the DNA, and for one single human there are about 3 billion base pairs in a single genome. We collect round about 2.3 billion DNS records every day, so that is about three‑quarters of a human, so until about here. And of course it doesn't work like that, but hey. And since we started measuring in February 2015, we have collected over 3.1 trillion results, so we have collected this huge haystack and now we need to make some sense out of this.
If you are collecting data on this scale you need to do that in a responsible way. Because you don't want your measurement to look like a DDoS to whoever you are measuring, and especially if you have large DNS operators, I saw there is somebody from GoDaddy in the room, they have a huge number of domains registered worldwide, we don't want to overload these people with loads of queries. So we try to measure responsibly by spreading out our measurement over the day, and we have also reached out to some large operators, for example to Verisign, who operates .com and .net. Now, so far we have actually received very few complaints from people we measure, but if you are in this room and you see our traffic and feel that we are overburdening you, please talk to me.
It should be really easy to find us because all our IP blocks are listed in the RIPE database and if you see queries coming from us and type it in you should get that kind of information which has a link to our website.
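One simple way to spread a fixed daily workload over 24 hours is to give every domain a stable time slot derived from a hash of its name. The sketch below shows that idea and the resulting average rate for the coverage figure quoted in the talk. It is an illustration under assumptions, not a description of how the project actually schedules its queries, and the function names are made up for this example.

```python
import hashlib

SECONDS_PER_DAY = 24 * 60 * 60


def query_offset(domain):
    """Stable second-of-day slot for a domain, derived from a hash of its name."""
    digest = hashlib.sha256(domain.lower().encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % SECONDS_PER_DAY


def average_rate(domains_per_day):
    """Average number of domains measured per second if spread evenly over a day."""
    return domains_per_day / SECONDS_PER_DAY


# Roughly 218 million domains per day, as quoted in the talk.
print(round(average_rate(218_000_000)))  # about 2,500 domains per second on average
print(query_offset("example.com"))       # same slot every day for the same name
```

Because the slot depends only on the name, each domain is measured at roughly the same time every day, which also keeps the daily time series evenly spaced.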
Now that we have collected all this data, what can we do? I am going to illustrate this with three recent examples that we have been working on, two serious examples and, because by the time I have gone through the serious ones you will be bored, a funny one at the end.
So the first example is about DNSSEC. Some of you may be less familiar with DNSSEC, but suffice to say it's something that you use to make DNS more secure by adding digital signatures, and the .nl and .se ccTLDs both thought that DNSSEC was very important to deploy, so they gave their registrars a financial incentive to deploy DNSSEC: they give them a very small discount for every domain for which they enabled DNSSEC. Now, I had a master's student who wanted to look into this, and his hypothesis was that while such an incentive would encourage deployment en masse, it might not necessarily follow security best practices, and that is what we wanted to find out. So we looked at both .nl and .se. The first hypothesis was that it would benefit mostly large operators, this incentive, because it's a very small amount of money so you need loads of domains to recover the cost, and the other hypothesis was that they might not do this according to best practices. In .nl and .se a very small number of operators is responsible for over 80% of signed domains. So, in .nl it is just 14 operators and in .se it is just three, whereas there are in the order of 2,000 registrars for .nl and for .se there are a few hundred registrars.
Now, the other takeaway from this graph is that if you look at the table, then you see that for large operators, both in .nl and .se, 63% of the domains they have are DNSSEC signed. If you look at the other 20% it's 8.5 and 4.8, so that is an order of magnitude less. So this sort of confirms our hypothesis that large operators are way more likely to do this because of these incentives.
We compared it to a TLD where there are no such incentives any more: .org used to have one but it's not there any more, and there you can see the number of signed domains between large and small operators doesn't differ that much. They are both around the 1% mark. So this incentive is really having an impact, and it's having an impact on large operators.
So, what we did was, we checked if they do this well. And we took the guidelines from NIST, who set a standard for how you should securely sign your domain, and I am going to summarise the result and say they do most stuff correctly, but they use a zone signing key that is considered small, 1,024 bits, and whereas NIST says you can use that key as long as you replace it on a regular basis, they never did. They have been using the same key for years. And that is simply not secure any more, that needs to change.
Now of course I can spell out doom and gloom and say DNSSEC is a failure because people are not doing it correctly, but the nice thing is people take this seriously, and I gave a presentation about this particular topic at the Internet ‑‑ in Stockholm last November, and IIS, the operator, decided they wanted to change their incentive policy and have explicit security requirements included in the policy, and this is already having an effect. This graph here shows you enablement of what is called TLSA, and that is too much detail to explain, but you can use this to secure e‑mail communication basically. TLSA for MX records. This line is two days after I gave that presentation. And as you can see, this had a huge impact, so .se is the blue line and that is a single operator enabling this security feature for all of the domains they operate. This also had a secondary effect on .com, where the red line jumps up. Another thing is that we said in that presentation that one way to improve your security is to switch to more modern cryptographic algorithms, and here you see Binero, which is one of the big three operators, and they re‑enabled but switched to a different algorithm, so this research is really having an impact.
The second example I want to talk about is DNS resilience. The attack on Dyn, which is probably familiar to most of you in the room and was also discussed in the first plenary session this morning, was one of the largest DDoS events, and this hit DNS servers from Dyn. And basically, all of the east coast of the US had trouble resolving some really popular domains, including LinkedIn, PayPal and I believe some parts of Twitter as well.
Now, what we wanted to find out was whether we could see the aftermath of that attack in our data set. So what we did was, we looked at domains that used Dyn name servers and we looked at whether there was any change after this attack. And what you see in the graph, and do pay attention to the scales, because I wanted to sort of show both lines in the same graph so I have doctored the scales. But what you see here in blue are people that exclusively use Dyn name servers for their domain, and in red you see people that non‑exclusively use them, which means they use Dyn and another operator. And the black line again is the attack, and after the attack what you see is that exclusive use goes down and non‑exclusive use goes up. So people realise they have put all their eggs in one basket and they may want to diversify in terms of operators and also start using another operator. Another interesting takeaway from this was that it's not all of the Dyn customers that do this, this was about 4.5 thousand domain names in our data set, but these were all the important brands that do that.
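The "a handful of operators accounts for 80% of signed domains" observation is a simple concentration measure. As a hedged sketch, with made-up counts rather than the study's data and a hypothetical function name, it could be computed like this:

```python
def operators_covering(signed_per_operator, share=0.8):
    """Smallest number of operators that together hold `share` of all signed domains."""
    total = sum(signed_per_operator.values())
    if total == 0:
        return 0
    covered = 0
    ranked = sorted(signed_per_operator.values(), reverse=True)
    for n, count in enumerate(ranked, start=1):
        covered += count
        if covered >= share * total:
            return n
    return len(ranked)


# Made-up counts for illustration only, not data from the study.
example = {"op-a": 60, "op-b": 25, "op-c": 10, "op-d": 5}
print(operators_covering(example))  # 2: the two largest hold 85 of 100 signed domains
```

Sorting by size first matters: the measure asks for the fewest operators that reach the threshold, so the largest ones have to be counted first.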
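The exclusive versus non-exclusive split comes down to checking, per domain, whether all, some or none of its name servers belong to one provider. A minimal sketch, assuming the provider can be recognised by a name server suffix; the suffix `dynect.net` used below is an assumption made for this example, and the function name is hypothetical:

```python
def provider_use(ns_names, provider_suffix):
    """Classify a domain's NS set as 'exclusive', 'non-exclusive' or 'none'."""
    suffix = "." + provider_suffix.strip(".").lower()
    # Prefix each name with a dot so that only whole labels can match the suffix.
    hits = [("." + ns.strip(".").lower()).endswith(suffix) for ns in ns_names]
    if hits and all(hits):
        return "exclusive"
    if any(hits):
        return "non-exclusive"
    return "none"


# Illustrative NS sets; the name server names are placeholders.
print(provider_use(["ns1.p01.dynect.net.", "ns2.p01.dynect.net."], "dynect.net"))
print(provider_use(["ns1.p01.dynect.net.", "ns1.other-dns.example."], "dynect.net"))
```

Counting the two classes per day over the whole data set gives the two lines described for the graph.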
And another take away which is not on the graph is that it didn't cost Dyn any customers, a PhD student wrote a paper about this, but the behaviour of their customers changed. A collaboration between the university I work at and CAIDA and ‑‑ what we use OpenINTEL data for is to look for points of failure in the DNS because the goal is to improve DNS's resilience in the face of DDoS attacks.
I am going to give you two ‑‑ sorry, three small examples from that. The first one is this graph. We are studying what we call parent child delegation TTL mismatches. If you register a domain then the TLD operator assign a TTL to your delegation, so to the name server records for your domain that are in that TLD. That actually determines for most of the time how long your delegation stays in a cache. And in this graph that is the red line here, so this is data for.com, and the delegation TTL is two days. Now, what we wanted to look at is if we look at what people actually set as TTL in their own zone, does that match what the TLD operator does? And what you can see, that spoiler, no, it doesn't, because over 95% of domains have a TTL that is half as long as what the parent does. Now, the impact of this is that people might be setting a lower TTL because they want some agility in case they want to make changes but actually they will have to wait the full two days that the TLD operator determines for any changes to propagate over the whole of the Internet, because of the way DNS caching works.
So people are setting something and may expect to be able to be much more agile in the face of, say, a DNS hijack or DDoS attack, but they actually are not, because the TLD operator determines this.
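The parent‑child TTL comparison described above can be sketched in a few lines of Python. This is only an illustration, not the OpenINTEL analysis code: the `Delegation` class, the function names and the example values are hypothetical, and the worst‑case wait assumes, as the talk suggests, that caches may hold the delegation for the parent's TTL.

```python
from dataclasses import dataclass


@dataclass
class Delegation:
    """NS-set TTLs for one domain, in seconds (hypothetical container)."""
    domain: str
    parent_ttl: int  # TTL of the NS records in the TLD zone
    child_ttl: int   # TTL of the NS records in the domain's own zone


def classify(d: Delegation) -> str:
    """Label the relationship between the child TTL and the parent TTL."""
    if d.child_ttl == d.parent_ttl:
        return "match"
    return "child_shorter" if d.child_ttl < d.parent_ttl else "child_longer"


def worst_case_wait(d: Delegation) -> int:
    """Conservative upper bound on how long an old delegation can stay cached.

    Assumption: a resolver may honour either TTL, so the longer one bounds
    the time before a change is visible everywhere.
    """
    return max(d.parent_ttl, d.child_ttl)


# Example: a .com-style two-day parent TTL against a one-hour child TTL.
example = Delegation("example.com", parent_ttl=2 * 86400, child_ttl=3600)
print(classify(example), worst_case_wait(example))
```

Under these assumptions the operator who set one hour still has to plan for up to two days before a name server change is seen by every cache.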
Another thing that's interesting is topological diversity. If you want to protect yourself against denial of service attacks you might want to have your name servers in multiple autonomous systems, because one autonomous system might be specifically targeted for attacks. And I am going to put the caveat in here before Oliver jumps up: yes, if you are Anycasting this might be coming from a single AS. Let's assume you are not, which is true for quite a number of operators; then you may want diversity and have your name servers located in multiple ASs. Now what you see here is that the majority of .com domains, three‑quarters of them, have name servers that are in a single autonomous system. Interestingly, for .nl the picture is different: more than half have name servers in at least two autonomous systems, and that is because this is something that the .nl registry put a lot of emphasis on in the checks that they perform on DNS integrity when you register your domain. I don't believe it is a requirement any more; it used to be a requirement that you had two name servers and that they needed to be in at least two different networks. I think they have dropped that requirement, but it's still visible in the data set.
Now you could also look at IP prefixes, and here I am looking at IPv4; you could do the same for IPv6 but I had limited space in my presentation. What I look at here is how many different IP prefixes the name servers for a domain are in, and there you see that 15% of domains have name servers that are in a single IP prefix. And we are actually hypothesising that these might be two machines in the same rack. That doesn't protect you against any form of attack, or a fire, or somebody pulling the plug or doing some other stupid thing. We are using RIPE Atlas probes to check this hypothesis, whether or not these name servers are actually in the same rack in the same data centre. I don't have any results yet but I hope to have them by the summer.
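A rough way to count prefix diversity for one domain's name servers is shown below. It is a sketch under a loud assumption: the talk does not say how prefixes were determined, so this example simply groups IPv4 addresses by a fixed /24, and the function names and addresses (documentation ranges) are illustrative only.

```python
import ipaddress


def distinct_prefixes(addresses, prefix_len=24):
    """Return the set of networks of length prefix_len covering the addresses.

    Assumption: a fixed-length prefix stands in for whatever prefix
    definition (for example routed BGP prefixes) a real study would use.
    """
    return {
        ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
        for addr in addresses
    }


def in_single_prefix(addresses, prefix_len=24):
    """True when every name server address falls inside one prefix."""
    return len(distinct_prefixes(addresses, prefix_len)) == 1


# Two name servers next to each other versus two in unrelated networks.
print(in_single_prefix(["192.0.2.1", "192.0.2.53"]))
print(in_single_prefix(["192.0.2.1", "198.51.100.53"]))
```

The same counting applies to autonomous systems if each address is first mapped to its origin AS, which needs routing data that this sketch does not include.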
Right. My final example is of course going to be something a little bit funny, to leave you with something a little more lighthearted. One of the things that we are also currently studying is what people put in DNS TXT records. Now, we find a lot of stuff in TXT records: JavaScript, Python, whole X.509 certificates, parts of DNS zones. The things I highlighted in red might be malicious. We have some evidence that DNS TXT records are being used as a delivery method for malware payloads, and also these annoying coin mining scripts get deployed through DNS TXT records. I know. Imagine you have DNS over HTTPS in your browser and you can do this straight from your browser. Anyway, I will just leave that there.
Right. And on that note, "never attribute to malice that which can adequately be explained by stupidity". Because people might be putting stuff in the DNS and it can be suspicious, but it can also be plain stupid. Can we have a drum roll, what do you think it is? Here we go. People put their private key in DNS TXT records. Why, oh why, oh why, would you do that? This is ‑‑ this is one example and I sort of redacted the important bit. But why are people doing this? Anybody have any idea why you would do this? Right? Stupidity? Right.
So, we of course looked for matches, because the private key record has the public key in it, so we could match against the public key in other records. This was people trying to configure something called DKIM, which is used for e‑mail security, but if you put the private key in the DNS it's not going to protect you a lot. So, that was a bit silly. And unfortunately, we found multiple examples of this, but this demonstrates that these types of technologies are actually pretty difficult for people to deploy. They add a lot of security, but if you don't know what you are doing you are randomly going to click until somebody is happy and you might end up with this.
We found tens of different keys in DNS records.
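A very simple check for this kind of mistake is to look for PEM private key markers in TXT record contents. The sketch below is an assumption about how one might flag candidates; it is not the method used in the study (which, as described above, matched public keys across records), and the function names and sample strings are hypothetical.

```python
import re

# PEM headers used by common private key encodings.
PEM_PRIVATE_KEY = re.compile(
    r"-----BEGIN (?:RSA |EC |ENCRYPTED )?PRIVATE KEY-----"
)


def looks_like_private_key(txt_value: str) -> bool:
    """True if the TXT value contains a PEM private key header."""
    return bool(PEM_PRIVATE_KEY.search(txt_value))


def is_dkim_record(txt_value: str) -> bool:
    """True if the TXT value starts like a DKIM key record (v=DKIM1)."""
    return txt_value.strip().lower().startswith("v=dkim1")


samples = [
    "v=DKIM1; k=rsa; p=MIGfMA0GCSq...",           # public key: expected
    "-----BEGIN RSA PRIVATE KEY-----\nMIIEow...",  # private key: a mistake
]
for value in samples:
    print(is_dkim_record(value), looks_like_private_key(value))
```

A marker check like this only finds keys published in PEM form; keys pasted as bare base64 would need the public‑key matching approach mentioned in the talk.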
Right. So, I am just going to note I have five minutes so I am sort of on time. What is the future of this project going to look like? Our short‑term challenge is that we want to have robust data archival. We collect a lot of data, we passed the 100 terabytes collected mark somewhere this year, and we need to archive this, and we are talking to our national research funders to make sure we can keep this for posterity. If our disks crash, that will be unfortunate.
We would love to expand the number of ccTLDs that we cover so again if you are a ccTLD operator please come talk to us.
Now, our goal is that if somebody in 2025 wants to know what the DNS looked like in 2015, they should be able to come to us and we should be able to help them, because this is something that can really help us track changes on the Internet over time. And already, in the research that we do in our group, we have found many cases where we would have loved to have had this data for 2005, because the Internet still changes very rapidly. And of course, we want to have a real world impact with that. The example with .se illustrates that this type of research can have an impact and improve the security of the DNS and, with the security of the DNS, the security of the Internet. Equally, the research that we have just started, where we look at the resilience of the DNS, is very important. We just had a talk about the root, but there are more important DNS zones out there that we need to protect against denial of service attacks.
Right. If you have any questions, you can either come up to me or to Matthias, who is in the room and is part of this project, so you can talk to him. Can you stand up? He is over there. And we are both here all week, so please come talk to us if you have any questions. And with that, I leave you with the URL for our project, please go visit it, and thank you for your attention. If you have any questions I will be happy to take them.
CHAIR: Thank you.
JAN ZORZ: Thank you for this TXT revelation. I am really happy that people do not put crazy things just in AAAA records but also in TXT records.
ROLAND VAN RIJSWIJK: Actually, that is not ‑‑ I will show you pictures later on. We looked at that ‑‑ we have a paper about that, actually.
AUDIENCE SPEAKER: Carsten. Clearly this amount of data can be misused, what do you do against misuse, usage of the data?
ROLAND VAN RIJSWIJK: So that's a good question. We get access to some of the TLD zone files under a contract that doesn't allow us to distribute that data, so only some of our data is open access, and you can find it on our website. Abuse, maybe: you can enumerate which names exist and do naming, which is why, for instance, .com, .net and .org forbid you from distributing their zone files to others. I would argue DNS data is still public, so anybody can actually apply for access to these zone files and can collect this data if they wanted to. But as I hope I have convinced you, that is actually a lot of work. So we guard our data and have a completely isolated cluster where the data is air gapped from the Internet, so we can access it from our research lab. But the open data sets we publish every day you can use to do research, and people do that; for .se, Alexa and Umbrella the data are public and we publish those measurements every day.
RANDY BUSH: IIJ. If we really believe that secondary, primary should be on different networks and in different physical locations, maybe we should write an RFC on that. Oh, wait, somebody already did.
ROLAND VAN RIJSWIJK: Yes
RANDY BUSH: Which leads to: NLnet used to do something about that, as you said. Why did they stop? What could have made it easier for them to maintain, I don't want to use the word enforcement, but maintain the incentive to have what looked like a technically reasonable requirement? And if they could look into how it could be made easier, that could probably spread to other allocations from other TLDs.
ROLAND VAN RIJSWIJK: Yes. So that's a few questions in one but I will try to answer all of them. I can't speak for .nl because I don't work there, but I know this has their attention and they are doing quite a lot of work on promoting the use of Anycast to create more diversity for name servers, not just for themselves but also for people in the .nl domain. There are other ccTLDs that actually operate services that allow you to outsource your DNS to them, and they do the Anycasting. I think CIRA does that, but .at does that too, and I think there are people from .at present, I saw some people. They are in the back of the room. So with Anycast you can still have multiple locations across the planet, but they might all be in a single AS, and if that AS somehow makes a mistake with the prefixes they have, you are still not protected, because you want something in another autonomous system.
Your question was how can they make it easier to do that and what I am thinking is, the problem I see is that this raises ‑‑
RANDY BUSH: Not how to make it easier to Anycast, how can cc ‑‑ or TLDs make it easier to check and to encourage distribution ‑‑ Anycast is really a small segment when you count, the majority are ‑‑
ROLAND VAN RIJSWIJK: That's right.
RANDY BUSH: I do run two ccTLD registries and it's a pain in the butt.
ROLAND VAN RIJSWIJK: So they can check ‑‑ there are new checks to allow them to see if things have this type of diversity. What I don't know is why they removed those checks, probably because they create a higher bar of entry for registering a domain if you enforce them. What you could do is encourage it. So they could use something similar to the incentive that we have for DNSSEC and say: if you create this topological diversity for the names you operate or register for people, we will give you a small bonus, because we believe this improves the stability of the Internet. That could be a way to address that. And as we have seen for DNSSEC, that has worked. Does that answer your question?
RANDY BUSH: I think there are tools missing.
CHAIR: But we still have a couple of questions in line.
JIM REID: Great talk, really enjoyed that. A couple of points to make to pick up on what Randy had said. I think the idea of financial incentives for, say, uptake of DNSSEC is all very well, but sometimes that can have unexpected consequences. As you pointed out in your presentation, people are just using the same old key for the purpose of getting the discount and they are not really invested in trying to do DNSSEC in a meaningful way. They just want to sign the zone and get the discount, and we have to be careful about that kind of thing. Just a couple of points on your talk. It wasn't clear to me, but based on the questions that have already been asked, I understand you are now using the ICANN zone file access agreement contractual provisions to get copies of the gTLD zones, is that correct?
ROLAND VAN RIJSWIJK: That's correct.
JIM REID: As I am sure you know ‑‑ ICANN's remit doesn't extend to ccTLDs and several also exercise copyright over the registration database and the act of making the zone file available even for research purposes compromises that and that can create problems for certain registries. There was some other point I was going to make as well. You had measurements, it's obviously a no brainer too, if you are going to start triangulating things for the physical locations of the name server you will need to do some cleverness and make sure you are monitoring from different locations in the network so you find out the diverse locations for the same IP address.
ROLAND VAN RIJSWIJK: So we told the student about Anycast. So I am ‑‑ I hope ‑‑ I don't know if he is in the room but I hope Olafur is ready we instructed our students that Anycast exists.
AUDIENCE SPEAKER: Brian Trammell, Google. I am getting used to that still. I came up here to ask a different question, and somebody said Anycast five times so I changed the question. So, in the active measurements that you are doing, what additional data do you keep about the route, like what route was taken from the observation point to ‑‑ if I go back and want to figure out if there were Anycast instance changes that had an impact on what parts of zones you saw, because maybe there is something going on in the background about how that is distributed behind, do you have enough data to be able to pick that apart from the past?
ROLAND VAN RIJSWIJK: We have partial data to do that. Our goal was not to look at the topology of how we get our responses, but what we do record is, for every query we record an RTT, so if there is a sudden change in routing we will see that reflected in the RTTs. But that is unfortunately only a proxy for any routing changes, so we don't actually do traceroutes to all of the name servers that we have. But I have been thinking about whether we can expand this measurement, maybe not every day but on a weekly or monthly basis, where we, rather than trying to resolve every name, we try to resolve every name on that we find and possibly from multiple locations, because then you can also see this kind of diversity. But this is something that is actually quite a big project to deploy on its own.
CHAIR: Only the question from the chat, we need to cut the queue. Please be short if possible.
AUDIENCE SPEAKER: Ulka from RIPE NCC. I have two questions from Nick, and a comment from Cynthia. Nick's first question, is do you expect that encrypted DNS will reduce the visibility you have in the DNS traffic if you use passive DNS?
ROLAND VAN RIJSWIJK: Okay. Yes, of course it will, because if you are monitoring passive DNS in your local network and people switch to using DNS over https where the resolver is outside of that network, say it's the Cloudflare or Google quad 8 then the fact it's encrypted means you lose that visibility. But for passive DNS your purpose is slightly different than inspecting that traffic for security purposes, you are recording that traffic and aggregating it at a central point. So if more and more people switch to resolving through things like DNS over https and then that is going to reduce the quality of the data that flows into the big passive DNS collection points that we currently have, now I hope that answers Nick's question. I am starting to wonder which Nick this is, I have a feeling.
AUDIENCE SPEAKER: Ulka: So, it's where do you get your data from? What do you store? Do you store the IP address of the DNS client?
ROLAND VAN RIJSWIJK: So our ‑‑ we have our own prefixes from which we do the measurement. We record which instance of our measurement nodes actually performed a certain measurement, so we have multiple worker nodes and I can tell you which one performed which measurement. What was the second question? Where do we get the data from. I think that was answered because Jim asked. We have contracts under the ICANN by‑laws with the gTLD operators, and the ccTLDs are all individual contracts.
CHAIR: Thank you.
JAN ZORZ: Hello Geoff. So there was a KSK roll you say, right?
GEOFF HUSTON: Hi everyone. You know, the Internet relies on two massively engineered systems whose internal operation is the source of continual surprise and amazement. The reason why these folk do OpenINTEL is that the DNS is a miracle masquerading as a mystery. We have no idea why or how it works. Similarly with BGP, absolutely no clue, no clue at all. When we first designed BGP we were routing across 10,000‑odd networks, individual FIBs. Rudiger corrects me, just 1,000. The same protocol today is now kicking over 800,000 in v4, kicking well over 200,000 coming up in v6 if I remember rightly, the same protocol. Why does it work? If someone could answer that, good on you, I will buy you a beer. For the rest of us it's a mystery. Why does it work so well? We don't know. And the DNS: when you ask a question, why does the authoritative server on average see the same question repeated at least once? 1.8 queries hit the authoritative. Why is the DNS a DDoS amplifier? What if the answer is NXDOMAIN? No is treated differently from yes. If I ask a question for which there is no answer, that hits the authoritative servers 2.5 times. So here are these folk at OpenINTEL asking the DNS questions where there are answers, but at the root, so I have been told, more than two thirds of the questions have no answer. Maybe we should be studying the nos more than the yeses, because that's what the DNS does. It shovels nonsense in vast amounts. So we don't know what we are doing. We really don't know what we are doing. So let's roll the key.
Now, I have heard a lot of KSK roll presentations, probably so have you, and most of them come from the root server cabal, who happen to have all this wonderful data, because we don't understand QNAME minimisation, which they don't share. I see something but I can't tell you what, I wish I could share it but I won't. I am not one of that cabal. I am not part of ICANN or PTI, APNIC doesn't run a root server, I am not a member of the cabal, I can't even see root server query data, and maybe I shouldn't, but I am not part of that, so why the hell am I presenting this crap to you? Surely you would want someone who pretends to know about it. Actually it's important because, like you, I am a consumer. If it wasn't for the DNS you wouldn't have an Internet. We have stuffed up addresses so badly. The presentation by Cloudflare was a mere glimpse into how badly we have managed to reuse and recycle and hash up addresses, cramming 20 billion devices into just a couple of billion active addresses and going, what a miracle. Who owns that address? I am sorry, let's get over here, ownership, I am not sure what you mean. That address, no, they only borrowed it for a microsecond or two. The only thing that keeps the Internet together is the name space, which I recognise should get you very worried, because the name space is also a seething mass of inconsistencies and phenomenal misunderstanding. We don't really know why, but one‑third of you will do a DNSSEC validation of a name if it's signed. Now, you probably feel good about that, because there is this phenomenally obscure attack vector that might poison your cache. Now, the only time I have ever seen this has been in a lab, doing the Kaminsky attack, and I am not sure; folks, there are much easier ways to be a pain in the neck on the Internet than launching one of these obscure attacks. But nevertheless you all sign. And you all check: one‑third of users around the Internet perform some kind of check. So DNSSEC is used a lot.
So, in some ways when you change the root trust point, a lot of people could be affected, and I am one of them and so are you. So, this was always going to be, of all the things we don't know in the DNS, a lot, this is one of the bigger things that you kind of go you really sure you want to do this? Because the KSK, the key signing key of the root zone of the DNS is super special, anywhere else in the DNS you have a parent, adult supervision. So if you want to change your key you tell your parent, Hi, here is the new key, let's do a key roll, everything is good the key gets rolled and it's fine. When you want to change the root key, you have got a real problem because who do you tell? Well if you run a resolver you have a copy, so you should know. If you do DNSSEC validation, you should know. How do I tell you it's changed? Mystery. There is no parent key, I can't tell anyone else up there, tell everyone, because it just doesn't work like that. Every validating point that does DNSSEC validation needs to load this new key.
So we had a plan. If you don't have a parent, use your predecessor, because that's the only way you can yank yourself up by your boot‑straps: get the old key to sign the new key, you publish that in the root zone and it all works just fine. How do you tell the difference between an ultra swift attack and this slow deliberative process? Well, the first assumption is that no one ever turns off their DNS resolvers, or you are not meant to. And as long as you can see this new key for the hold down period, 30 days, you can trust it. So first assumption: this is not for folk who turn their devices on or off, and no one ever does that, do they? Ignore that. This is for things that are always on. And now you add the new key, it all just goes swimmingly, we change the key and life is wonderful. Here is the plan, it's got lots of colours so it must have been good. The big date was October 2017 and, as probably all of you are aware, in September 2017 the first of these "oh, my God, we have no idea what we are doing and something has just gone bang" moments happened, and the best laid plans got kind of stuffed up. I will talk about why in a second, but the result was to simply leave things as they were for 12 months, do a bit of community consultation, whatever that means, and come back and try it again. Because that's what happened: the new date was 11th October 2018 and the key was revoked in January, and basically just now, 10 ‑‑ 16th May, in one facility the old key has been destroyed, and I mean destroyed, and on the 14th August the other key will be destroyed. So long, we have all rolled, it's all done. So, the key got rolled, and when we measured this, and at APNIC we use ads, we didn't see much that was sort of impactful. A slight blip, but, you know, at the time 14% of the world's users were validating and not prepared to go to something that didn't validate, and that didn't really change, so it seemed to work just fine. I even heard the word "success".
Well, you know, nothing could be further from the truth. It wasn't a success, we were indeed incredibly lucky, because we didn't know what we were doing but things worked out kind of all right. We had no idea whether it would work or it wouldn't work. We had no idea how to measure it, and we had no idea how to predict the impact, so all of this was just shooting in the dark. Because if you really looked at the basic question while we were planning this, how many folk wouldn't come? How many folk are behind implementations that couldn't correctly implement the specification, and use resolvers that turn themselves on and off and don't hold for hold down periods? We had no idea, we can't measure that. We can only do it indirectly. Why? Because one of the odd things about the DNS, despite all the recent efforts about DNS privacy, is that even before that, the DNS is remarkably opaque. When you ask a recursive resolver a question, and it goes and asks the authoritatives a question, before the client subnet leak, thank you Akamai and others who conspired there, before that the question that hit the authoritatives had no reference to you or your identity. It was just a question, and just an anonymous question. That makes it really hard to understand the capabilities of the inside of the DNS. So there are a number of ways you can get the DNS to sort of talk about itself, and one way is to load information into the query. A query isn't necessarily a domain name, it's almost like a piece of micro code to the authoritative server: do something based on this instruction I am giving you. And one of the ways of interpreting that code is to say, I am going to tell you my Trust Anchor state, the KSKs that I trust. And this was the idea, signalling via queries, and so with RFC 8145 resolvers generated their trusted key state and sent it off to the root. Obviously we knew what was going to happen. Well, sort of, until we turned it on.
Because when it got turned on, the problem was, that after the 30 days, there was still an annoying bunch of people who were reporting, they hadn't changed their Trust Anchor, they were still trusting the old key. The reason why the whole thing got deferred was that that group there, some 5% approximate of reporting folk were going, no, not coming with you, this is all going to go really bad. Now, a bit of a strange debate, because only the root server operators saw the data, and sort of we consulted the community, they say, and the rest of us going, I don't know. Whatever. Which was kind of what happened. So at the end of 30 days things kind of happened but they didn't clear out. After 12 months, the signal was still there. After 20 months, the signal was still there on reporting breakage. Maybe the measurement tool was completely wrong and maybe this is not the best way of doing it and maybe it isn't an accurate signal, maybe we don't understand the way DNS queries work and we don't really understand about forwarding, so many questions. No idea what the answers were, because the DNS doesn't let information out easily, the DNS caches like crazy, you have no idea whether you are getting good information or old information and in some way it's measuring the resolver's uptake of the KSK but not the number of users. Don't forget around 90% of the world's users use only a few thousand resolver systems, there may be millions but that trailing edge only services one or two people each time.
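For reference, the query‑based form of RFC 8145 signalling encodes the resolver's trust anchor key tags into a query name under the root. The sketch below builds and parses such names; the helper function names are hypothetical, and the two key tags used (19036 for the old root KSK, 20326 for the new one) are the published root KSK key tags rather than anything stated in the talk.

```python
OLD_KSK_TAG = 19036  # root KSK-2010 key tag
NEW_KSK_TAG = 20326  # root KSK-2017 key tag


def key_tag_qname(key_tags):
    """Build the RFC 8145 key tag query name for a set of trust anchors.

    Tags are written as four hex digits, in ascending order, joined by '-'.
    """
    tags = "-".join(f"{tag:04x}" for tag in sorted(set(key_tags)))
    return f"_ta-{tags}."


def parse_key_tag_qname(qname):
    """Return the key tags carried in a '_ta-...' query name, else None."""
    label = qname.rstrip(".").split(".")[0].lower()
    if not label.startswith("_ta-"):
        return None
    try:
        return [int(part, 16) for part in label[4:].split("-")]
    except ValueError:
        return None


def reports_only_old_key(qname):
    """True when the signal lists the old KSK and nothing else."""
    return parse_key_tag_qname(qname) == [OLD_KSK_TAG]


print(key_tag_qname([NEW_KSK_TAG, OLD_KSK_TAG]))
print(reports_only_old_key("_ta-4a5c."))
```

A signal that lists only the old key is the "not coming with you" report described above; as the talk notes, forwarding and caching mean such a report says little about how many users sit behind it.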
So, if you can't go forward, let's go backwards and put the signal in responses, use the site measurement. I have done a lot of this: you send the resolver a label and the label is a piece of micro code; if the label has a certain value, instead of returning an answer you return SERVFAIL. This should betray what is going on. You can tell if a user is trusting a certain key by simply saying, when they ask a question in the DNS, will they get the necessary web object? Look at the DNS and the web and see if it works. So we set that up prior to the roll in 2018, when some 16% of users actually were running validating resolvers. It was only published as a spec in July, this is September, no one had implemented it, 15% of the users didn't actually do anything, and we are trying to work with fractions of data, half a percent, of which 0.005% was saying I have got a problem. Experimental noise is around 2 to 3%; this is just never going to work. So we kind of predicted that around 0.1 to 0.2% of users would be affected; to be perfectly frank, we had no clue. So what happened? This is stuff from SIDN Labs: we rolled, signatures changed and mostly nothing much happened, mostly what we saw was absolutely no impact whatsoever. What we heard was different, because the Irish Examiner ran a report about EIR, who put out a little bulletin saying whoops, sorry, and so we looked. Because what happens when you don't survive is kind of bad: if you are a validating resolver and you haven't got trust and it all expires, you can't answer any question whatsoever, everything goes black. So when we looked at EIR in detail, and we were sampling between 800 and 1,800 people every time, what we found was that all of a sudden a whole bunch of folk in EIR were thrown off the Internet for a day. They were unable to run a Google ad. Nothing happened. We couldn't even see them to measure because they had no DNS. So yes, what we saw in EIR was indeed an impact, and a sort of hang on a second, were they the only one?
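The label‑as‑instruction idea described above matches the root key trust anchor sentinel mechanism that was later published as RFC 8509; treating the two as the same is an assumption on the editor's part. In that scheme a client fetches three names and infers the resolver's state from which ones resolve. A minimal sketch of the inference step follows, with hypothetical function names and with the category names taken from the RFC.

```python
def sentinel_labels(key_tag: int):
    """Return the two sentinel labels for a key tag (five decimal digits)."""
    return (
        f"root-key-sentinel-is-ta-{key_tag:05d}",
        f"root-key-sentinel-not-ta-{key_tag:05d}",
    )


def classify_resolver(is_ta_ok: bool, not_ta_ok: bool, bogus_ok: bool) -> str:
    """Infer resolver behaviour from which of three fetches succeeded.

    is_ta_ok:  the 'is-ta' name resolved (no SERVFAIL)
    not_ta_ok: the 'not-ta' name resolved
    bogus_ok:  a deliberately badly signed name resolved
    """
    if bogus_ok:
        # A validating resolver would have failed the badly signed name.
        return "nonV" if is_ta_ok and not_ta_ok else "other"
    if is_ta_ok and not not_ta_ok:
        return "Vnew"  # validates and trusts the key in question
    if not_ta_ok and not is_ta_ok:
        return "Vold"  # validates but does not trust the key
    if is_ta_ok and not_ta_ok:
        return "Vind"  # validates but does not implement the sentinel
    return "other"


print(sentinel_labels(20326))
print(classify_resolver(True, False, False))
```

In an ad‑based measurement each "resolved" flag would come from whether the corresponding web object was fetched, which is why the small population of "Vold" users was so hard to separate from experimental noise.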
If we look at the data again at that level of detail, what we see once more in the world is this drop from 5.2 million to a little over 4.7 million. So there was an impact there somewhere. Was that related to the key roll? So the next thing: look for ISPs that actually have users, more than 400 samples a day, look for DNSSEC validation of more than a third, look for a drop right at the roll, and what do we find? Anyone on this list? Someone here from Portugal. Number 19. Go and look for yourself. If you are there, you had a problem. Most folk fixed it, really quickly. But this is not no damage, this is actually quite a lot of damage, all over the world. Some folk simply said it's DNSSEC, DNSSEC validation equals no, let's press on. And three of them quite obviously just simply turned it off because, what the hell, it solved a problem. So it wasn't exactly problem‑free. So the immediate impact that I can see was indeed 0.2 to 0.3% of users. In 32 of the 35 cases that I saw it all got fixed quickly, and the other three turned it off. Next thing, we revoked the key. This was meant to be banal, and quite frankly getting rid of the key was to simply put it in with the revoked bit set, and everyone is meant to say I don't trust that key any more. It was a big response. Should be fine. And from what we saw, no impact whatsoever. What the folk over at Verisign saw, and the other root operators also reported the same, and this is a slide I lifted from Duane Wessels' presentation last weekend, is that as soon as the key got revoked the number of queries jumped and kept on jumping right up until the point of removal. There was something going on at the roots that had nothing to do with users, it was just something going on at the roots.
After a little bit of soul searching and hunting by the root servers, this message at BIND certainly points to a case where there are some issues with old versions of BIND in a query spin. Quite logical: there is a lot of BIND out there and a lot of old BIND out there, and doing crazy things when you do things that old BIND didn't expect means you are going to get queries at the root that don't impact a lot of users but give a hiccup for a day or two.
What do we learn? Hard to say. We can roll the KSK, not without damage, but we can do it. We always knew that. The extensive contact campaign, well, you saw numerous people coming to these meetings saying we are going to roll the key, and everyone was listening except EIR and someone in Portugal. The bigger lesson: the DNS is unhelpful, it's extremely opaque, but making it less opaque compromises your own personal privacy. The more we attach credentials and trace points into DNS queries, the more you individually can be tracked through DNS logs and through things like OpenINTEL. Do you want that? It's hard to say, but there certainly seems to be a lot of sensitivity saying that is a bad idea, so maybe instrumenting the DNS is impossible. What next?
Well, it was an experiment, we really didn't know what we were doing. Our efforts to try and measure it were largely frustrating and, in the case of the TA signalling, hopelessly frustrating. As I said, we could think about making the DNS tell more tales but I am not sure we want to do that, because that's going to be a bad idea. Is the world changing? If you look at the web PKI, rogue CAs are a continuous feature of that space. Everyone attacks CAs; certificate transparency is like sticking a Band‑Aid over a gaping wound. We have no real solution to the insecurity of the web PKI. We have now got freebies out there, Let's Encrypt and others, so the problem is massive. Fast attacks that attack the registry point, seize your domain, re‑home the NS and DS records, use Let's Encrypt, grab a certificate and all your goodies and run away, happen within two hours. What's your defence? Well, quite frankly, almost the only thing we have left, which the browser folk refuse point‑blank to do anything about, is actually using DANE, domain keys in the DNS, and pinning your public key into the DNS. So DANE key extensions in TLS is actually in my mind a really, really good idea; the fact that the Chrome authors disagree with me means it's never going to happen. Thank you, Chrome. But it's still a really good idea. And if in some unlikely event Chrome ever bends, you won't have just a few thousand resolvers who track the KSK; everyone who runs HTTPS, you and me and everyone else, is going to have to keep a copy of that KSK, because that's how you validate the DANE point. So, if we are thinking about rolling the key again, think about 20 billion end devices rather than a few thousand, because this is a much bigger scale of operation.
So as always, opinions: should we roll the key and should we not? Long line of microphone queue, it was a success, it's so much better when we don't do it twice. So bizarre. I have certainly heard this said by many folk in the DNS who I know and respect: let's just roll it, let's roll the algorithm, we need an elliptic curve because it's better, we should do backup keys. With 20 billion relying parties looking at you, interesting. The other view, and there is another view of course: why are we doing this? The key was never compromised. Why are you replacing a good key with the same key, basically? You are just changing the bits and not the security structure. And if you are rehearsing for disaster, it's a wonderful disaster that gives you 30 months' advance warning, because that's what we have just rehearsed. The next time somebody breaks into your house: I would like 30 months' notice, I am going to bash the door down, I hope you are prepared. Is this going to teach us anything we didn't already know? We know nothing about the DNS. Is old‑signs‑new really the best way? And more to the point, with 20 billion relying parties on a KSK, which is almost the only way we can fix up the web security mess, how do we roll the key when we have got so many folk watching us and trying to track it? That's all I have got. We have got a few minutes for questions, thank you very much.
JAN ZORZ: Thank you, Geoff. We like to keep things interesting.
JIM REID: Geoff, I remember David Conrad talking about the political aspects of the KSK roll‑over before it happened and he was saying the ICANN board was going to have to make a decision to recommend doing the KSK roll‑over with insufficient and almost non‑existent technical or engineering data to back up that decision. That scared the hell out of me. I think the question for you is: for the next KSK roll‑over, what should we be doing about the measurements and metrics to be in a better position to assess the potential impact of that? As you say, this time we got lucky, broadly speaking, but we would need to be going to the next one with a better degree of certainty about the potential impacts and downsides in case it does go horribly wrong, and I wonder what you think we should be trying to put in place to address that particular point.
GEOFF HUSTON: The first part: should the board of ICANN have been given the responsibility to make a highly technical decision based on data and behaviours they mostly had no clue about? It seems completely bizarre and more an exercise in buck passing than a decision based on solid evaluation of engineering merits. Bad move. Who else would you ask? Secondly, what should we do next time? I would like to think QNAME minimisation gets deployed by everybody, because QNAME minimisation stops domain names hitting the root servers. If full domain names don't hit the root servers, there is no more case to keep that data secret. And that, I think, is going to be the best way of doing it. I can see a few folk going no, no, we want to keep the secret of the root. If the only queries are for TLDs, it's not exactly a secret any more. All you are seeing is resolvers doing cache misses. That would give us a much better idea, by having a broader discussion outside the root server cabal, about what the best thing to do is for all of us. Right now, you are trusting 13 operators, 12 really, to do the right thing, but they are looking at their servers, not you, and I think that's a bit weird.
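The effect of QNAME minimisation on what a root server sees can be sketched roughly as follows. This is only an illustration of the idea in the answer above, not resolver code: the function name `name_seen_by_root` is hypothetical, and the sketch assumes a resolver with an empty cache asking the root about a name, ignoring query types and every other detail of a real implementation.

```python
def name_seen_by_root(qname: str, minimised: bool) -> str:
    """Return the query name a root server would receive for qname.

    Hypothetical helper for illustration only. Without minimisation the
    resolver sends the full name; with minimisation it sends only the
    rightmost label (the TLD), which is all the root needs in order to
    return a referral.
    """
    # Drop the trailing dot, if any, and split the name into labels.
    labels = qname.rstrip(".").split(".")
    if minimised:
        # Only the top-level label is exposed to the root.
        return labels[-1] + "."
    # The full name is exposed to the root.
    return ".".join(labels) + "."


if __name__ == "__main__":
    for flag in (False, True):
        print(flag, name_seen_by_root("www.example.com", flag))
```

With minimisation switched on, the root-level query carries only the TLD label, which is the basis for the argument that such query data would no longer need to be kept secret.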
AUDIENCE SPEAKER: Can I address the ICANN board ‑‑
GEOFF HUSTON: Pleased to meet you.
AUDIENCE SPEAKER: So we did actually spend a lot of time looking at the technical risks and issues, and that is actually the reason why it was delayed one time, as we had a lot of issues that were raised. So I just wanted to correct the record: even though the board as a whole perhaps isn't that technical, we do take technical input and had a lot of briefings on what the issues and the risks are.
SHANE KERR: From Oracle Dyn. Well, I am ‑‑ I have a few things. First of all, with the QNAME minimisation, there is nothing to prevent the operators from providing the equivalent of a QNAME minimised set of results now, right? They could just strip everything out but the last label, right? So I think there are still privacy concerns, right? I realise there is less, but anyway. So, about the ISPs that you discovered that have problems, I think we need to be careful. I think that is probably a worst possible set, because the law of large numbers says you are going to see some ups and downs in those graphs and get some false positives, right? So it may be only a few of them or maybe a lot. I don't know how long a data set you looked at; did you look for equivalent patterns over a six‑month period and things like that?
GEOFF HUSTON: All I was looking for in this case was the signature that exactly matched the EIR signature across the one week period when the key was rolled. I admit there is a lot of noise in this data; the DNS is incredibly noisy. That list had a pattern of doing validation, few, more again. So, yes, could it be noise in some of those? Totally. Did it correlate? Yes.
SHANE KERR: And the final thing is, I agree that we don't know exactly what is going to happen, but that is the nature of reality, that's called life, right? You never know what is going to happen. It's always about risks and trade‑offs. We started Anycasting the root, which a lot of people had grave concerns about because past ‑‑ packet splitting across packet, TCP not working, not being able to locate where their queries are coming from, all those valid concerns, but we did it anyway. We didn't know what was going to happen, but it works, right? The root servers couldn't work like they do today, or DNS couldn't work, without something like Anycast; we need to keep in mind that is the world we live in. And worst case, you saw three operators turn off DNSSEC, that's the worst possible case. So it's not like the Internet is going to stop working even if everything goes completely wrong. The worst possible case is we have seen some operators who voluntarily turn it off rather than figuring out how to make it work, and we know responsible operators can make DNSSEC work and we know key roll‑over works. And yes, there are problems we don't know all about, but I think we need to be careful about this message saying we are not exactly sure what is going on, there could be dragons that are going to destroy the Internet; that is overselling the case.
GEOFF HUSTON: It's only been 35 years. The other answer is, it's been 35 years, surely to God we must know more by now. And both answers or responses are equally valid. The DNS does surprise us today, 35 years later. That is an observation.
JAN ZORZ: We are cutting the queue. Please be brief.
ROLAND VAN RIJSWIJK: Two points. The first one is about the telemetry and query name minimisation. I wholeheartedly support your call out for everybody to deploy QNAME minimisation, which is a good idea, but I also agree with Shane there are things we can do now, and one of the things we could do is influence the RSSAC measurement set, because they have recently put out a call for new metrics that root operators should collect. I argue we could collect more telemetry that is not necessarily privacy invasive but will tell us a lot more about what is going on, so I would encourage everybody to submit new metrics to that call.
The other thing is about standby keys. Now I know you have argued for this in the past, if I remember correctly, and effectively what this mess has given us is a whole year of standby key, so we know that works, right? There were two keys in the root for a year, so why not put a new one in there now and then in the meantime try to figure out how we are going to convince everybody to use that one and then roll to it?
GEOFF HUSTON: ICANN is certainly seeking input from the community about where to go from here. That's a fine suggestion. I will echo Paul Hoffman's exhortation all the time, there is a KSK roll‑over list, you can find it on Google, join it and say that. Thank you.
JIM REID: My point was just to go back to what Shane said a couple of minutes ago, and I paraphrase a little bit ‑‑ there are risks and downsides but we just have to sort of get over it. Well, maybe not. I think we have to do some more to try and measure and quantify and get some meaningful statistics about this. If you take some analogies with other industries, you couldn't say, for example, let's put up a new building using some new material and if the skyscraper falls down, maybe it will, what the hell, give it a try. I don't think we can take that kind of approach. You can't introduce a new drug into the market until it's been properly tested for its effectiveness and what the potential downsides are. We don't have that around the root KSK and other aspects and I think we have to try and get some better metrics in there.
GEOFF HUSTON: I think that is a wonderful suggestion, Jim. Thank you very much.
CHAIR: There is the women in tech session starting at 1 o'clock in the side room and they asked me to remind you that everyone who is interested in that topic is invited. It's for all of you who want to go, male or female, they don't care about that. They will also have a special lunch at 12:30 upstairs and the session starts at 1:00. Additionally, remember until 3:30 you can nominate yourself for the PC elections and the nominees will be presenting themselves in the 4 o'clock plenary. And another bit from the PC: Please rate the talks for us. Thank you.
LIVE CAPTIONING BY AOIFE DOWNES, RPR