Plenary session
Tuesday, 21 May 2019
2 p.m.
CHAIR: Good afternoon everybody. Please take your seats. We have three presentations in this session. The first one is by Milton Mueller, presenting together with Brenden Kuerbis: get ready for a mixed world. You have the stage.
MILTON MUELLER: I am Milton, I am from the Internet Governance Project at Georgia Institute of Technology, and this is my colleague Brenden Kuerbis, he is a post-doctoral researcher. We believe the Internet community needs to understand better the dynamics of the standards competition between IPv4 and IPv6. We have got two incompatible protocol suites, and we have a global Internet only when both of them are run in parallel or they go through some translation point. So, the big question is: is this mixed standard Internet a passing phenomenon or will we get stuck in it? How will this end? We were approached by ICANN's office of the Chief Technology Officer to do this study, and we were focussing on the economic incentives affecting the transition and not so much the technical and operational matters. Now, the research questions that guide this are listed here, you can see them. First of all, we wanted to know what are the actual economic incentives that affect the decision to deploy v6? Secondly, we are getting the APNIC data with estimates of levels of IPv6 deployment, and we wanted to know whether our economic analysis would be able to explain the observed levels and processes of IPv6 adoption. And finally, how do these translation technologies affect the economic incentives to deploy v6?
Now, standards competitions are not new. There is a robust economics literature dating back to the 1970s, and historically you have probably heard all about Betamax versus VHS, or maybe you even remember when Apple and Windows computers couldn't work together. Or maybe you have been wondering why everybody is on Facebook, and of course it's because everybody else is on Facebook. So, the key economic concept here is what's called the network externality, and that is defined as follows: for certain products and services, their value to one person depends on how many other people or places or things use the same product or service. Users can also bridge incompatible technologies by duplicate use of both at the same time. Of course, in the IPv4/IPv6 context this is called dual stack.

So the network effect arises from what economists call a demand-side economy of scope, and this is important to understand. A two-way communication network is not a single homogeneous service. It's actually a gigantic bundle of connections, and each connection, from an economic point of view, is considered a separate product or service with its own demand and supply function. So, a user who makes an investment in a particular technology or standard gains access to a wider scope of connections, a bigger bundle, when others join the same network service or adopt compatible technology. Now, the reason it's important to differentiate this and call it economies of scope rather than scale is because scope economies give the market process unique characteristics. The equilibrium is highly path-dependent; it really matters who joins a network and in what sequence. You get different equilibria. Start-up networks, as is well known, must reach a critical mass before network benefits are realised, and that may require subsidising early adopters. Competition between networks that are not compatible or interconnected is tippy. That is, once it becomes clear that a certain standard is gaining a decisive edge, actors will converge on the perceived winner in order to maximise their network benefits. And once that happens, once people converge on that winning network, a condition that economists have called inertia or lock-in sets in. And that means no one wants to sacrifice the tremendous value of widespread compatibility by abandoning the established standard for a new one, even if the new one is better in some ways.

So, in industries with strong network effects, adoption decisions and competitive outcomes can be strongly affected by interconnecting competing systems, or by gateway technologies that convert or bridge between incompatible technologies, and again here there is extensive economics literature on the impact of converters and interconnection on actor incentives and market structure. In the IPv4/IPv6 case, the gateway means you have to run dual stack, or to translate through NAT. From an economic perspective, dual stack raises serious problems. IPv4 address exhaustion was the main reason to deploy v6, yet dual-stacking maintains the demand for v4, so it looks as if there were a last-mover advantage. This is well modelled in the literature: if you literally have to wait to be the last person, nobody would convert, and of course that prediction is wrong. So, what then is the incentive to deploy? Well, we find that IPv4 is a constraint on growth. And particularly as IPv4 prices rise in the secondary market, that makes it increasingly expensive to grow within IPv4.
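To put the network externality argument in slightly more formal terms (a rough illustrative sketch, not the authors' formalism; the notation and the adoption-cost term are assumptions): if $N$ is the set of adopters of a standard, $v_{ij}$ the value user $i$ attaches to being able to reach user $j$, and $c_i$ user $i$'s cost of adopting, then the payoff of joining is roughly

$$V_i(N) = \sum_{j \in N,\, j \neq i} v_{ij} - c_i .$$

Each term in the sum is one of those separate "connection products", so the benefit grows with the scope of the bundle $N$: a start-up network needs enough members for the sum to exceed $c_i$ (the critical mass), and once one standard's adopter set is much larger than its rival's, every newcomer maximises $V_i$ by joining it, which is the tipping and lock-in behaviour just described.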
So, ever more intensive sharing and NATting of addresses in IPv4 also increases costs, and that again creates some kind of an incentive. Because of that need for backwards compatibility, IPv6 deployment does not immediately eliminate the need for IPv4 addresses, nor does it eliminate the need for NATting. The key driver is the network operator's subjective assessment of the value of network growth to its business and the relative cost of growth in v4 and v6. And as we discovered, and as Brenden will explore in more detail, as more of a network operator's traffic runs over IPv6, its NAT costs and its need for v4 eventually start to decline. So there is some good news here in terms of the network externality. I'll turn it over to Brenden now.
BRENDEN KUERBIS: This calculates, for every economy and AS, the percentage of the user population that has the capability to use v6 or prefers v6 connections. In the aggregate view, the data don't indicate v6 being ignored, nor do they indicate a technology following a normal diffusion curve, but in many respects the aggregation of all the world's networks into a single trend line can be misleading. For instance, we see substantial variance between economically developed regions and less developed regions, and even within developed regions we see variance. For instance, in 2018, Europe was at 15%, about half of the Americas, Asia stood at 17%, and Oceania at about 14.5%.
To better understand adoption, we collected the v6 capability and preference measurements from the APNIC system for 215 economies over similar 120-day periods during 2015, '16 and '17. We validated this data against other v6 measurement systems and found high correlation. From that data, we looked at 90-day smoothed averages and we identified countries with over 5% measured v6 capability in any one of the three years of study. And this figure groups the economies by the growth trends that we observed. The vast majority of countries, about 80%, had no appreciable deployment; that is, they remained at or below 5% for the entire period. The next largest group, about 12%, were economies with increasing levels of v6 capability, exhibiting consistent, strong or rapid growth.
The next group, about 8%, exhibited plateaus in deployment, with v6 capability stopping at levels anywhere between 8% and 60%. And that group included numerous mature European economies as well as countries in all other regions of the world.
The most accurate picture of v6 deployment comes from looking at the adoption levels of individual ASes operating in the same market on a disaggregated basis, and the figures here show v6 deployment levels for the top four ASes in a national market, according to the samples collected by the APNIC system, which we used as a rough proxy for market share, and we used the same measurement methodology. In the US and New Zealand, what appear in the aggregate to be gradual country-wide increases are revealed to be plateaus for earlier adopters, which are then joined by new deployments of other operators. Note that in New Zealand one of the top four ASes had not deployed v6 at all, and the earliest and most intensive deployer was a mobile operator that plateaued at around 50%.
Again, we see how aggregate views conceal a more nuanced reality of deployment. In Australia, only two of the top four ASes are deployers, and an apparent decline in the first AS's deployment level has been covered up in the aggregate statistics by a slight growth in deployment by the second AS.
Here are some more examples of disaggregated capability measurements revealing contrasting individual AS operator decisions.
So, we next looked at macro social factors and v6 adoption, and we drew on three main sources: the APNIC datasets, the World Development Indicators, and also the fixed broadband and wireless market concentration data that's indicated by the Herfindahl-Hirschman index and provided by Facebook's Inclusive Internet Index.
And so far the findings are pretty consistent with our understanding of the economic motivations or incentives to adopt v6: higher levels of v6 capability are correlated with greater country-level GDP per capita. That makes sense because v6 deployment is costly; network operators in countries with greater wealth are more likely to risk money on it, and examples would be fixed broadband networks in numerous countries. Also, higher country-level v6 capability rates were correlated with lower levels of concentration in wireless and broadband markets. Now, this isn't as straightforward to interpret, but it does seem intuitive. A market with more players increases the likelihood that one of the firms will make a random decision to deploy v6. And a less concentrated market is also more likely to permit the entry of new firms that don't have legacy infrastructure, for whom the cost of an IPv6 deployment is not much different than the cost of a v4 deployment.
Now, the supply of v4 numbers plays an important role in this standards competition. The prospect of what engineers call the v4 runout was the main reason for developing v6 in the first place. But from an economic point of view, resources never just run out; as their supply diminishes they become increasingly expensive, and consumption patterns adapt to this scarcity with greater conservation and new forms of substitution. We have seen that in a couple of ways: first with adaptation to tighter supplies by using NAT, and second the secondary market for v4 number resources.
The incentives provided by the secondary market have led to the identification of millions of unused and under-utilised v4 addresses by brokers and exchanges, and we see evidence of this adaptation in the v4 market data. From the limited publicly available price data that's out there, we see median prices doubling in four years, from around 8 dollars in 2014 to 17 dollars in 2018, and as one would expect, the range of market prices has narrowed as the market has matured and initial uncertainties about the value of v4 addresses have been resolved. These price trends were corroborated in interviews with the address brokers that we talked to.
Now, this is data from the regional Internet registries. Another important indicator is the number of transactions in which v4 blocks were transferred over the past few years. The data shows a major increase from 2014 to 2015 as the market developed, with smaller increments of growth in '16, '17 and '18. However, the number of transactions is still growing.
Now, this shows the actual number of IPs transferred from 2015 to 2018. An average of about 54 million numbers were transferred each year, and the levelling off of growth in the amount of addresses transferred per year corresponds to a steady rise in price, possibly indicating that supplies of untapped IPv4 numbers are no longer expanding.
Now, who is buying those resources? The answer is cloud service providers. CSPs are a rapidly growing market, and interviews revealed that in terms of the numbers of customers and interconnected hosts, these networks represent some of the largest networks ever connected to the Internet. Now, while the market initially served only individual developers, enterprise use of them has grown over the last three or four years. CSPs are buying v4 numbers because enterprise networks are lacking in IPv6 deployment relative to public provider networks. One interviewee said only about 5 to 7% of enterprise network traffic is IPv6 enabled, an assertion that is corroborated by Department of Commerce estimates of industry deployment of v6. And enterprises are also slow to upgrade their applications, particularly if they have current revenues from them. This drives the demand for v4 addresses among the CSPs, as each instance of a service may require a globally routed v4 address.
So, we next modelled how an operator's requirements for v4 addresses over a 15-year period were affected under two transition cases, dual stack and conversion. And in both cases we assumed conservative carrier-grade NAT configuration values. Now, the model allows us to explore the way the operator's need for v4 addresses is affected by various subscriber baseline values and growth rates, as well as the baseline and shift in the operator's v6/v4 traffic ratio. This provides insight into how v4 address scarcity might create incentives to deploy v6. Now, as we discuss the model, it's important to keep in mind that the reasonableness of the assumptions regarding growth, network size and the shift in the v6 ratio will obviously vary across 60,000 ASes; operator diversity means that any number of outcomes are possible, and the decisions regarding v4 address acquisition and v6 deployment are taking place at the individual operator level and reflect diverse conditions and incentives. So the best we can do right now is show some various scenarios and discuss their implications for the transition.
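To make the mechanism behind these scenarios concrete, here is a minimal, illustrative projection in Python. It is not the study's actual model: it simply assumes a fixed annual subscriber growth rate, a fixed annual shift of the traffic ratio towards v6, and a fixed CGN compression ratio (subscribers sharing one public v4 address); all parameter values are made up for illustration.

```python
# Illustrative sketch only, not the study's actual model. Assumptions: subscribers
# grow at a fixed annual rate, the share of traffic that still needs v4 shrinks by
# a fixed amount each year, and public v4 addresses are shared through CGN at a
# fixed compression ratio. All parameter values below are hypothetical.

def v4_addresses_needed(subscribers, v6_traffic_share, cgn_ratio):
    """Public v4 addresses needed to NAT the subscribers' remaining v4-bound traffic."""
    return subscribers * (1.0 - v6_traffic_share) / cgn_ratio

def project(years=15, subscribers=40e6, subscriber_growth=0.18,
            v6_share=0.20, v6_shift=0.05, cgn_ratio=16):
    """Yearly v4 address requirement for a dual-stacked network using CGN."""
    projection = []
    for year in range(years + 1):
        projection.append((year,
                           round(v4_addresses_needed(subscribers, v6_share, cgn_ratio))))
        subscribers *= 1 + subscriber_growth       # the network keeps growing
        v6_share = min(1.0, v6_share + v6_shift)   # traffic keeps shifting to v6
    return projection

if __name__ == "__main__":
    for year, need in project():
        print(f"year {year:2d}: ~{need:,} public v4 addresses")
```

With these made-up numbers the requirement rises while subscriber growth outruns the traffic shift and starts to fall once the shift dominates, which is the kind of inflexion point the next slides describe.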
So, this figure models a large mobile network with growing subscriber numbers, coupled with a slower, 5% annual shift in the traffic ratio towards v6. The levels of v6 traffic are assumed to begin at about 45%, which mimics the US. These assumptions produce an inflexion point after about ten years, where the network's technical requirements for v4 addresses start to decline. That happens when the traffic ratio is near 75% v6 and 25% v4.
Now, this is the same model, but it's sensitive to specific patterns of growth, and so here we play with the growth patterns. Growth or change in either critical variable could be linear, accelerating, plateauing or following an S-curve pattern. For example, if the subscriber growth rate slows down and starts to plateau after seven or eight years and the v6 traffic ratio follows a more rapid S-curve growth pattern, the operator's need for v4 numbers starts to decline after only four years and approaches zero after 15.
Now, a different set of assumptions for a different type of network, however, tells a very different story. This figure shows a small enterprise network that only has about ten thousand users and grows only 20% per year. Since the enterprise network is located in an older, legacy software and device environment, the starting point for the v6 traffic ratio is set at 22% rather than the 45% in the prior slides. So in this case, the implementation of v6 doesn't do anything for the operator. Its v4 number requirements increase modestly whether or not it implements v6. So the additional expense of deploying v6 would not reduce its demand for v4 addresses by a worthwhile amount. And while v4 number requirements would also increase without v6, the small scale of the network means that the additional numbers wouldn't be that difficult to come by.
At worst, the NAT compression ratios would be tweaked to stretch its address assignments further. So, in general, for smaller-scale networks, the gains from v6 deployment don't appear to justify the expense. For such networks, it makes sense to continue to extend v4. Only when the market value of the v4 addresses held by the network exceeds the enterprise's cost of transitioning to v6 would there really be an economic incentive to change. Not all enterprises own their v4 addresses, and in those cases the pressure to shift may come from an upstream service provider. We have seen a scenario where a growing mobile operator may be in a position to sell surplus v4 addresses into the secondary market. But it's also possible to posit scenarios where large providers might want to hang on to their v4 numbers.
This represents assumptions about a major cloud service provider. It assumes 60 million subscribers to start, with an S-curve growth pattern in their numbers over the 15 years, and the v6 traffic ratio is assumed to grow rapidly at first but then plateau at about 70%. We have assumed a lower starting point given that the current customers of CSPs are enterprise networks. The model shows the v4 requirements for a CSP decline steadily for five years while their traffic ratio shift rate exceeds the subscriber growth rate. As the shift rate levels off, however, the requirements for v4 numbers start to increase again as subscriber growth continues. Thus some operators might, despite declines in their immediate v4 requirements, continue to hold v4 resources in the face of uncertainty about future v6 traffic ratios and v4 pricing.
But such a scenario could occur, given the numerous economies that are exhibiting plateauing growth patterns right now.
So, in conclusion, there is good news and bad news in our findings. The good news is that v6 is unlikely to become an orphan. For network operators that need to grow, particularly mobile networks where the software and hardware ecosystems are mostly converted, v6 deployment can make economic sense. It mitigates a constraint on growth and can provide a path out of the operational complexities and cost of large-scale NAT. Our modelling of dual stack and conversion methods also shows that for fast-growing networks a v6 deployment can contain, but not immediately eliminate, the requirement for v4 numbers. A key variable is how quickly the traffic ratio shifts to IPv6. The rising price for v4 supplies an additional stimulus to deploy IPv6. The bad news is that the need for deployers to maintain backwards compatibility eliminates a lot of the network effects that would otherwise create pressure to convert. Many enterprise networks don't need to grow much or may still be lodged in slower-moving hardware and software ecosystems tied to v4. Another issue that emerged from our modelling is that once the IPv6 traffic ratio among v6 deployers reaches a certain level, their v4 address requirements start to decline. These operators can therefore release those v4 resources into the market, and that might alleviate shortages and facilitate continued low-level growth for legacy v4 networks.
And in sum, it's very difficult to imagine a scenario that takes less than 20 years to converge on v6, and as a result, I think the community would be well served to think about the implications of a mixed world. Thanks.
(Applause)
CHAIR: Thanks. We have a few minutes.
AUDIENCE SPEAKER: Jen Linkova. Sorry if I missed it, but did you say that the cost of deploying v6 is constant over time and not decreasing for an operator?
MILTON MUELLER: What we said was the cost is a high fixed cost at the beginning.
JEN LINKOVA: I do not think it's fixed, because it's actually decreasing over time: getting v6 deployed in a particular network now and getting v6 deployed five years ago is a completely different amount of effort and money. So, what I would expect from my experience as a network engineer is that it will get easier and easier to deploy v6 over time.
MILTON MUELLER: But again, that sort of supports the last-mover advantage idea, which may or may not be true. If you can wait it out and things will become cheaper and cheaper, then you have a tendency to put off the decision, right.
JEN LINKOVA: Well, probably, but I think it will plateau eventually, right, basically getting close to zero, so at some point it would not make any sense to wait.
AUDIENCE SPEAKER: Benedikt Stockebrand, speaking for myself. I think you missed the point that to deploy IPv4 these days you largely need IPv6 anyway, be it for DS-Lite or 464XLAT. So, even if you want to do IPv4, IPv6 basically comes for free, because otherwise you couldn't deploy IPv4 at all, and that basically means IPv6 comes at no cost at all.
MILTON MUELLER: Again, so, when we talk about the IPv4 holdouts, we're not talking about new networks, mostly, we're talking about existing networks that are already there that are on v4. And the point is, why would they ‑‑ what is the incentive to change? Our basic point was if they want to grow a lot, they have a very strong incentive to change. If they just want to sit there and continue what they have been doing, which many smaller enterprise networks do, what's to make them deploy v6?
AUDIENCE SPEAKER: But that doesn't happen. First off, bandwidth keeps increasing; even if you say we stay with our customer base, you have to increase bandwidth. What used to be considered broadband ten years ago is a pain in the ass today. And we also have the number of devices, or the number of customers staying online round the clock, still increasing in some areas. So, ten years ago people had a DSL modem, or whatever, and they'd switch it off together with the computer. These days they have a Smart TV and what else, and they need a permanent address, so even then you need more addresses for the same customer base, and there is no such thing as just keeping things running the way they are.
MILTON MUELLER: We'll discuss it off line.
AUDIENCE SPEAKER: I think you also ignored the fact that the increasing cost of maintaining v4 is going to drive the eyeball ISPs to start applying a surcharge for it eventually. They are going to have to. It's economic reality. And I think once that starts happening, you start seeing more of an incentive on the targets of those eyeballs to deploy v6 or lose customers.
BRENDEN KUERBIS: We looked a little bit, in the case of CSPs, cloud service providers, to see if there was a pricing differential anywhere, and we didn't find any public evidence. That doesn't mean it's not happening. That's a good point.
CHAIR: Thanks.
AUDIENCE SPEAKER: Geoff Huston. I have a technical question about your modelling. I appreciate exactly where you are going with this. But what is going through my head is that the effect of Happy Eyeballs in applications in dual stack networks produces a very rapid bias for v6. The triggering factor is actually in the CDNs, on the content side where the data is flowing from: as they adopt 6, the pressure, if you will, on the v4 NAT space for the ISP diminishes greatly, and that creates a pool of unused v4 addresses, not because of the ISP but actually because of the availability of 6 and the interaction with Happy Eyeballs. So, as I said, the question is, did you factor in that innate preference to use 6 and not use 4, and the pressure on the CGNATs in your models? Because I didn't see that, and that was sort of where I'm going; there is something I just don't quite get.
MILTON MUELLER: I think we did factor that in in terms of the ratio, the traffic ratio.
GEOFF HUSTON: But that should decline over time: as the service providers go dual stack, the applications on the client side prefer 6, and the pressure on the CGNAT reduces because the first interaction is always on 6, so you never pull out a 4 address to complete a connection. That was the bit that kind of ‑‑ that's what happened in a number of other environments where you had two competing technologies: you preferred A, B died, not through any sort of deliberate action but because the preference for the first dropped the demand for the second. I'd just encourage you to look at that in your model.
AUDIENCE SPEAKER: Jim Reid. I have got a couple of points. I wonder if you have looked at one or two of the IPv6 deployment success stories? I'm thinking particularly of the work that was done in Slovenia with the v6 project. They did a great job of getting IPv6 deployed, and it wasn't just economic factors; there were other external factors that were brought to bear there to make that a success story. I wonder if you have considered that. My second point is, do you consider IPv6 deployment to be an example of market failure? And if so, what should be done about that?
BRENDEN KUERBIS: To answer your first question, we did look in the paper at T-Mobile and at understanding their incentives, and we talked to them about that deployment. So, I think that answers your first question.
The second question was market failures ‑‑
MILTON MUELLER: So, that's kind of a blanket term that is not very helpful: what market has failed? The entire ISP market? And if so, what global authority is going to order everybody to do something about that? I think what we have shown is that in many ways the market is working very well. That is, it is increasing the price of v4 addresses, shifting them to the most highly valued uses and creating incentives for growing networks to actually deploy v6, and when there is no economic incentive, again the market is working well at telling certain operators, you know, don't bother.
AUDIENCE SPEAKER: Jordi Palet. I have read the article that you published at APNIC and also the complete article with the study that is linked at RIPE. While I agree with some of your considerations, I really think that there are many aspects that are not, in my opinion of course, the right way to look into this. And just to comment on two things.
You look into two different types of networks, but I don't think you are really considering the picture of IPv6-only in the access and dual stack in the LANs of the users; that's one aspect. The other one is you are not considering that while you deploy Carrier-Grade NAT to avoid moving to IPv6, you also need to keep buying v4 addresses, because some service providers, like the Sony PlayStation Network, keep blocking the IPv4 blocks that you have assigned to the Carrier-Grade NAT. So the economic considerations on that change radically, up to the point that it doesn't make sense to buy Carrier-Grade NAT; it's cheaper to just keep buying addresses.
BRENDEN KUERBIS: That's a fair point. While we looked at the pricing data and thought about the impacts it has on the incentives of operators, we didn't actually incorporate pricing into the model. So, fair point.
CHAIR: Every question fills a complete stenographer's screen and more, so thank you guys. And before I announce our next two speakers: we could talk about the v6 transition for a whole day, right, so I just want to note that until this Plenary runs out you can nominate yourself or somebody you like to the PC, and also, please don't forget to go to the main site and rate the talks you liked or didn't like. And here are Ondrej Sury and Petr Spacek.
PETR SPACEK: Hello. Welcome to the DNS flag day talk, which is this time different, because I will present the first part, about the theory and the past flag day in 2019, and Ondrej Sury will present the next part, which is coming next year or maybe a bit later.
So first the motivation: what's the problem we're trying to solve here? DNS is just so complex that it's impossible to implement correctly at the first go, which means that people doing implementations make mistakes, which is natural, and sometimes the other players in the DNS market decided, okay, we need a work-around for some time to be interoperable with the other implementation which made a mistake. That's, let's say, nice in theory because it allows everything to work. So users are happy. Everyone is happy, right?
Not really. The problem with work-arounds is that they get relied on; eventually users and the implementers who make mistakes decide, okay, it worked for a month or a year, so why should we fix this bug? It was in there, it still works, nobody cares. Not so much. In fact, we have seen problems caused by work-arounds introduced in 1999 that were causing new problems in 2018. So, the fundamental problem with work-arounds is that they interact with everything in the ecosystem, including the standard protocol on the one hand and also other work-arounds on the other hand, and nobody knows what's going to happen, including the DNS software developers.
That's the technical problem. But there is also, let's say, an economic problem with work-arounds, because if it works, nobody has the motivation to fix it. Which effectively means that the price for development of the work-around, and also for its maintenance, diagnostics and operation, is paid by the good players who try to follow the protocol and who at one point made the mistake of deciding to add a work-around for interoperability reasons.
So that was the theory of the problem. Now, what is DNS flag day? In short, it's a trash pick-up day. We can say that different software vendors and big operators gather and decide, okay, on this particular day we are going to stop supporting these broken implementations and they have to fix their stuff, otherwise we will not talk to them any more. So, this is the way the involved parties basically fix the economic problem, because now the cost of maintaining the work-around is shifted back to the non-compliant parties, the implementations which don't follow the protocol as they should.
Okay, that was the theory. Now let's have a look at what happened in February 2019.
We had this broad collaboration of software vendors and also public DNS resolver operators, which worked closely together, and I have to put emphasis on this ‑‑ together ‑‑ because some people argue that all the logos on this slide are basically competitors competing with each other. But in this case, we were cooperating closely together and managed to fix stuff. So, very quickly, so we can get to the new information.
It was the first DNS flag day in history, except for flag day zero when people actually started to use DNS, and of course it caused a lot of fear, because it was the first time: everyone was somehow worried, there was some misunderstanding among normal Internet users, there were screaming warnings in newspapers, and people didn't know what was going to happen ‑‑ oh, Facebook is going to stop working, and stuff like that. So of course, the outreach campaign had its problems, but we managed to get the information across to the operators. And of course, there was some science behind it: we were doing measurements to see what was going to happen.
So, for example, this table presents measurement numbers which were taken three months before February 2019. And the estimated breakage rate on this data sample was 5.68%, which seems like a lot, right: it's 5%, oh, what do we do? Well, we have to take into account that we are talking about the real Internet, the Internet with a capital I, which effectively means that something is always broken on the Internet. And in our measurement, we found that about 15% of domains were broken for reasons totally unrelated to the DNS flag day, and the flag day would theoretically add five more on top of that. Did that happen? Well, to understand what happened afterwards, we have to look into the breakdown of who was responsible for the breakage in the measurement. And if you look at this table, you can see that two players, which are in yellow ‑‑ because some players have more than one domain used for their name servers ‑‑ two players in the dataset were responsible for 66% of the breakage worldwide, and the top eight players, DNS operators, were responsible for 85% of the breakage. Which effectively means that we needed to convince just a couple of big operators to get the stuff fixed.
So that was the expectation three months before the event. What actually happened? Actually, not much. It did work; the DNS flag day worked very well. Thanks to the cooperative community ‑‑ not only the vendors and the resolver operators but, most importantly, the cooperation on the authoritative side, so the web hosters and the DNS hosters ‑‑ the vast majority of domains were fixed before the announced date. There is still some breakage which can be seen and measured even today, but it turns out that these domains are just not interesting, because they are just parked domains without any traffic, no users using them, so nobody cares. So of course the numbers didn't go to zero, but actually there was no reported breakage that we could measure; all the support lines were silent, and we had only anecdotal evidence that, like, four domains were reported broken in one week, and stuff like that.
So, let me say a big thank you to everyone who participated, and I suppose a lot of operators in this room worked on that as well. Thank you very much.
And to conclude the part of the talk about 2019, we can say that it worked. We have proven that we can, together, improve and fix the Internet at scale, which is something unheard of, because a lot of people in the DNS community were very sceptical: that's not going to work, you will lose users, and stuff like that. No. Actually, when we cooperate we can improve things. And this time, we will try to be much clearer when it comes to the communication, and that's the reason why we are on stage today, because we want to announce the next steps for next year as soon as the information becomes available, which was basically a week ago. So, at this point, I am giving the floor to Ondrej Sury, who will be talking about the next steps.
ONDREJ SURY: Hi. This is Ondrej from ISC. When we say DNS flag day 2020, it means we haven't set a date yet, but this is, like, preliminary, what we are going to do. So, first of all, because the 2019 flag day was such a success, we decided to keep you on your toes and do this, like, every year or every two years. And for the next problem, we ran a quick poll during the DNS-OARC meeting in Bangkok two weeks ago, and the biggest problem we face now in DNS is IP fragmentation, because IP fragmentation doesn't work. Is there anybody who thinks that IP fragmentation is a good thing and it works? I saw like one shy hand...
So, basically there are some resources about the IP fragmentation, and, even if it works, it opens up some kind of attack on the DNS where you put some weird stuff into the second fragment. There was research by Vergey Farasen [phonetic] presented at DNS-OARC, and UDP is not reliable for large DNS messages: when a message is fragmented it's prone to be attacked, and it's causing operational issues around the globe.
So, the goal of the next flag day is to eliminate the operational issues caused by IP fragmentation and also improve the security of the DNS, so you can rely more on the security of the DNS messages, because what you receive is not fragmented.
So, what does it really mean? For large DNS answers, we will have to switch to TCP, and it will mean no change for answers smaller than the default size we pick. So, most of the DNS population won't be affected anyway. There is an existing standard for DNS over TCP, RFC 7766, which basically says TCP in DNS is a thing ‑‑ you know, it was a thing before, but this is like saying you must do TCP. And the number we pick for the default EDNS buffer size, like how big a DNS message could be over UDP, is somewhere under the minimum MTU for IPv6. So, what does it mean if you are non-compliant? For an authoritative server, it means you don't listen on TCP. For authoritatives it also means you don't honour the requested EDNS buffer size, so you send larger responses which get fragmented. And on the recursive side, as a client, it means that you ignore the TC bit, which is the signal to retry over TCP to get the full answer.
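As a rough illustration of those two authoritative-side requirements (answer on TCP, and keep UDP answers within the advertised EDNS buffer size rather than relying on fragmentation), here is a minimal sketch using the dnspython library. The server address, zone name and buffer size are placeholder assumptions, and this is not the official flag day test.

```python
import dns.flags
import dns.message
import dns.query
import dns.rdatatype

SERVER = "192.0.2.53"   # placeholder: IP of the authoritative server to probe
ZONE = "example.org"    # placeholder: a zone served by that server
BUFSIZE = 1232          # candidate default: IPv6 minimum MTU 1280 - 40 (IPv6) - 8 (UDP)

def answers_on_tcp(server, zone):
    """Requirement 1: the authoritative server answers on TCP port 53."""
    query = dns.message.make_query(zone, dns.rdatatype.SOA)
    try:
        dns.query.tcp(query, server, timeout=5)
        return True
    except Exception:
        return False

def honours_edns_size(server, zone, bufsize):
    """Requirement 2: over UDP the response stays within the advertised EDNS
    buffer size, or the TC bit is set so the client retries over TCP."""
    # DNSKEY with the DO bit set tends to produce a large answer in signed zones.
    query = dns.message.make_query(zone, dns.rdatatype.DNSKEY,
                                   use_edns=0, payload=bufsize, want_dnssec=True)
    response = dns.query.udp(query, server, timeout=5)
    return len(response.to_wire()) <= bufsize or bool(response.flags & dns.flags.TC)

if __name__ == "__main__":
    print("answers on TCP/53:", answers_on_tcp(SERVER, ZONE))
    print("honours EDNS size:", honours_edns_size(SERVER, ZONE, BUFSIZE))
```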
So, advantages of TCP: it basically makes the IP fragmentation issue a non-issue. It's hard to spoof, so it makes it more suitable for high-value services like domain validation for certificate authorities, like Let's Encrypt, or DNSSEC bootstrapping, because if you make multiple TCP connections from multiple places, you are more sure that, if all the answers match, you haven't been attacked ‑‑ well, you can be attacked if the attacker is close to the source, but, you know, how often does that happen? And it's also a preparation for DNS over TLS, because that happens over TCP.
So, for the operations: there is the RFC which basically says you must do DNS over TCP. So, the message here is not only for DNS people, it's also for the networking people looking at their laptops, typing and not listening: if you are a networking person, like managing the firewall for your company, please listen now and allow port 53 on TCP too, not only on UDP. And the defaults for the software will be a number like this, 1220, to avoid fragmentation, and the defaults in the software will reflect this. So basically, the authoritative DNS servers must not send oversized answers, and basically what it means is that if you have a recent DNS server from the vendors, you don't need to make any change, because it's already there.
This is mostly for the people who think DNS is easy. Who thinks DNS is easy? Stop trolling. But, yeah, there are some non-compliant ‑‑ well, we saw this with the previous flag day: most problems came from custom implementations of the DNS, because DNS is a vast standard that is covered by many RFCs and it's easy to miss one. I don't blame those people.
So, on the resolver side, again, you must honour RFC 7766, which says DNS over TCP is a thing. Answer on TCP port 53 ‑‑ again, firewall people, please listen. And when making client connections to servers, the maximum and default EDNS buffer size will be 1220 or something like this. And basically the resolvers must fall back from UDP to TCP if the TC flag is set. Again, if you are running the software from us, from cz.nic, or the other vendors, you don't have to do anything, basically, because you are already compliant.
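The resolver-side fallback described here is essentially the following (again a minimal sketch with dnspython and a placeholder server address, not any vendor's actual implementation):

```python
import dns.flags
import dns.message
import dns.query

def resolve_with_tcp_fallback(qname, rdtype, server, bufsize=1232):
    """Query over UDP first; if the answer comes back with the TC bit set,
    retry the same query over TCP, as the flag day expects resolvers to do."""
    query = dns.message.make_query(qname, rdtype, use_edns=0, payload=bufsize)
    response = dns.query.udp(query, server, timeout=5)
    if response.flags & dns.flags.TC:
        response = dns.query.tcp(query, server, timeout=5)
    return response

# Example usage (placeholder server address):
# response = resolve_with_tcp_fallback("example.org", "TXT", "192.0.2.53")
```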
So, some preliminary measurements have been done, and there are around 7% of domains on servers that don't accept TCP. But, you know, not all domains are equal; some are just, like, parked or not used at all, so the real breakage is very ‑‑ again, 7%. So, do you remember those names? Please don't write your own DNS server. Been there, done that. Still regret it.
So, again, there is a website, dnsflagday.net. We will have a web UI for that, so even ordinary people who are not used to using dig or kdig or drill on different command lines can test their own domains to see if they are compliant. These are just examples of how you can do that with dig.
The test and the resulting configuration will become the default for newly released versions of BIND, Knot Resolver, PowerDNS Recursor, Unbound and the like. You can test it before the date. So what's missing right now? We are missing the exact date. Basically there are two groups: one says do it as fast as possible, so February 2020; the other group says make it slower, so that would be 2021. I think, because the breakage is so concentrated in these providers, that if we can fix those we can probably do February 2020, because that's nine months from now. But we have to settle on the date so we can start communication, and thanks to the Internet Society for offering to use their distribution channels to reach the members. We also need to settle on the EDNS buffer size. 1220 is the safest value, but there are reports from the PowerDNS users that 1232 also works, because they have been running it for years now without any problems. 1280 is probably on the edge where something might break because of tunnels and MTU. And this will just go into the defaults for newly released software from us, the vendors. The big DNS operators can switch it at any time from now. But basically we are saying that from February 2020, it's your problem, not our problem. So, fix your software. We will not fix our software, because we are not broken; you are. So, basically, this is the whole message of the DNS flag day: you are broken, we are not, and this is the date from which we can say that and not face the wrath of the users.
But the principle is: DNS over TCP must work. So there is also room for improvement in the DNS software world, because if we move more traffic to TCP we should probably also focus on improving TCP performance in all the software.
Okay. So, how to get in touch with us: there is the website I already told you about, dnsflagday.net. We have a Twitter account. We have an announcement list called DNS Announce, which should be low volume, and if you are an operator of a DNS service or an operator of a network that has some DNS servers inside, please join; we will not spam you, this is a moderated, announcements-only list. If you have any questions, there is the dns-operations list at DNS-OARC, and you can also talk to us; there are many DNS folks involved in this, like me, Petr, Benno from NLnet Labs and probably others here ‑‑ Olafur from Cloudflare, they were also participating in the last flag day. You can talk to us, and now it's time for the questions.
CHAIR: Thanks. We have at least five minutes.
AUDIENCE SPEAKER: Hi. Jelte Jansen, SIDN Labs. One remark and one question. I overheard a discussion about this yesterday and somebody shouted, do not call it a flag day. Should we call this something different, like 'Patch Tuesday'?
ONDREJ SURY: I heard this complaint, but last time I checked there were no zombies loaded in my computer, so it's a marketing thing; to be sure it works, it's better to exaggerate a little bit, like to move the glaciers into ‑‑
AUDIENCE SPEAKER: That actually moves to my actual question, because I suspect this one might be more controversial than the last one, as this one will have a lasting operational effect. You mentioned you were measuring things. Have you already measured how many queries on average would cross the line of 1220?
ONDREJ SURY: No. I don't know the numbers, no. Petr may have.
PETR SPACEK: I can answer that. Okay, I will not name them, but one of the very big resolver operators said it's less than 1% of queries.
AUDIENCE SPEAKER: Peter Koch, DENIC. So, first of all, I'd like to applaud your efforts for this year's flag day. I like that very much. Seeing your plans for 2020, I start to get worried for a variety of reasons. One is, for this year's flag day there was kind of an effort to get your own software sorted out and get rid of complexity in your software, and vendors doing that might be a useful thing. As Jelte already said, this is probably crossing a boundary: you seem to impose decisions on other parties, and I am wondering what the governance model of this whole thing is, like where are these decisions taken? I'm also worried about some of the papers in your references that have been highly controversial, where the data and the claims they make have not been reproducible, so the least you probably want to do is get your ‑‑ I am inclined to say propaganda, but ‑‑ get the information straightened out and give more explanation of why exactly there is a risk or not a risk. Not all of these papers claiming risks of fragmentation have actually been reproducible. And then, you don't want to overdo the flag day thing each and every year. That's putting a big stress on people, and on people that get more and more distant from the core DNS business, so please bear that in mind. And also, as Geoff said, we are not really sure what we are going to do here but let's roll, and that's what I saw on that slide: you didn't really know what the EDNS size would be, so be careful.
CHAIR: I see Shane there, you are from the remote? I'll let you ‑‑
ONDREJ SURY: We are being helpful, and it is having effects on the DNS software too, because, for example, BIND works very hard to make sure that it picks the right EDNS buffer size, so there is complexity involved. Other, newer software might not do that, but there is complexity in the DNS software related to the EDNS buffer size and picking the right one, so you could just pick one administratively. And I think the general agreement is that IP fragmentation is harmful, so we need to address that. And I think this is the right way to address it, and we are being careful; we are doing the measurements.
AUDIENCE SPEAKER: Remote question. Marco Schmidt. I have a remote question from Turma. He asked: would the tool on the dnsflagday.net website be updated so that everyone could check their domains for TCP support?
ONDREJ SURY: Sure, sure. That's what I was talking about: it will have a web UI for that, and we'll test both, like, the resolver ‑‑ whether your resolver is able to communicate using TCP ‑‑ and the authoritatives for your zone, so there will be two parts of the test. That's the plan. But we only cooked up what we are going to do, like, two weeks ago, so... we need more time to prepare the test.
CHAIR: Thanks. And a mobile app, please.
AUDIENCE SPEAKER: Shane Kerr, Oracle Dyn. It seems to me like there is ‑‑ first of all, I like the idea of the vendors getting together and trying to improve the ecosystem, but does the fact that you are doing this outside of the context of any of the existing DNS organisations mean you think that they are ineffective or have failed? Like, is there any IETF activity going on around this to make a standard saying this is the way you should do things?
ONDREJ SURY: That is RFC 7766. There is an RFC.
SHANE KERR: It doesn't say to set your EDNS buffer sizes to specific values, does it?
ONDREJ SURY: But it says, like, DNS over TCP is mandatory, and there is an ID about the IP fragmentation. So yes ‑‑
SHANE KERR: But it allows you to set the buffer size to the one that you have chosen. But for a new implementer, or someone coming in from another area who is not part of this evil secret cabal of Open Source companies, how do they know how to get started, other than reading through decades of mailing lists and presentation archives and things like that?
ONDREJ SURY: I am happy to write the BCP document.
CHAIR: You can write a draft and recommend ‑‑
SHANE KERR: Okay. I mean, the camel has not got enough load lately so I think we need more.
AUDIENCE SPEAKER: Randy Bush, IIJ. I think I just heard you set up another little boys' club that's not part of our little boys' club. So, we've got DNS, which has always been solid and reliable, DNSSEC, now over TCP and path MTU, and what could possibly go wrong? Is there a simple instruction sheet for how my users and I can hide from this crap until six months after it's settled out?
CHAIR: Nine months.
RANDY BUSH: Okay, nine months, whatever. Seriously, is there some way to opt out from this until the fallout settles?
ONDREJ SURY: I don't know what you mean by that. Petr ‑‑
PETR SPACEK: I would like to say that if you are running up-to-date software, which means, like, released in the last ten years, it will work, because it is implemented correctly in the software ‑‑ provided you are not doing something stupid in the firewall. So, I mean ‑‑
RANDY BUSH: People have been telling me it will work for 30 years, and every year it breaks. I'm trying to ask is there a way, something I can do for me and my users to protect them from this year's breakage until you fix it?
PETR SPACEK: Follow the standards.
RANDY BUSH: Geoff said yes.
GEOFF HUSTON: If you want to get out of this quickly and duck, don't do DNSSEC and keep all the answers small. The real issue right now that we face with TCP is not vendor code, it's firewalls. 17% of resolvers sit behind firewalls that don't admit TCP sessions for DNS. TCP over port 53 is blocked for 17% of resolvers. That affects 6% of the Internet's eyeballs. Most of them use another resolver and get out of the hole; 2% don't. So that flag day is actually about those 2%. We know where they are. Whether we can tell them to stop it or turn off DNSSEC or whatever, I don't know, but those are the basic stats of the problem, and the problem is not DNS; the problem, as usual, is the firewall middleware.
CHAIR: Good point, Geoff. One last question, then we have to move on.
RANDY BUSH: Just in case anybody here thought 2% was a low number, multiply that by a few million users.
AUDIENCE SPEAKER: Brian Dickson, GoDaddy. Just following up on Peter Koch's question. Is it an issue that you would like to see validation of the previous papers on fragmentation being a problem, and a demonstration that it is a very high-degree problem? If so, I have done research on that and can publish it. Just let me know if that's something ‑‑
ONDREJ SURY: Yes, please, if you have research on that, yes, please. But I don't know, Peter, if you saw the presentation from Frigi Verisson [phonetic]; he presented some new facts about the fragmentation, so it's worth, like, reading the presentation. I am sure it's worth listening to as well, but at least read the presentation.
PETER KOCH: I read this one.
CHAIR: Thank you very much. So, with that, Alex Band in his new role, tell us about the RPKI, what can possibly go wrong after DNS?
ALEX BAND: Hello. My name is Alex Band. I work for NLnet Labs, and it still feels weird saying that after one-and-a-half years working there. We have actually been making Open Source software since 1999, and I didn't even know that when I started. It's kind of incredible. So we do research and development and make free Open Source software for core Internet infrastructure. So, you probably know us from this. I think NSD is one of the first things we did as an organisation: we created an alternative authoritative DNS server because, back in the day, BIND was the only thing out there. And then later on we made the recursive resolver Unbound. We do lots of other DNS things too that I didn't know about. I was like, wow, that's a lot. That's really a lot of DNS. And this happened, and we had this chocolate-infused KSK rollover session that was hosted at our office, and we know you really, really love DNS. But the thing was that the mission of NLnet Labs was always to do something for core Internet infrastructure. So that also includes IPv6 and things like BGP, and we sat at the lunch table and we were like, what can we do? What can we do in terms of BGP and do something useful in that area? We make some tools and do some research. So, I was like: I know, I know, can we do RPKI? I have some ideas there. And they were like, oh, God ‑‑ yeah, given my history, it would be obvious that I would say that. And I was like, really, routing security is a hot thing now, because even at the RIPE meeting we're doing that.
Now, I'm not going to go into this particular thing of who is doing what. But if you look at the news posts in the last couple of days: I think XS4ALL, an ISP in the Netherlands, had an announcement today, the Moscow Internet Exchange sent out an announcement yesterday; the adoption of RPKI is actually kind of getting going. If you want to know more about this and see Job Snijders wearing a tie, I highly recommend that you attend the Routing Working Group, where he will tell you all about this. This is not really what my talk is about.
Because the guys were like: wait, hang on, so Job Snijders is wearing a tie ‑‑ how did we get there? I was like, okay, there is this thing, the Resource Public Key Infrastructure. It's been a standard for a while and it's aimed at making routing more secure. Right now it provides origin validation, with the later goal of providing path validation as well. And the idea is that every resource holder gets a certificate that they can use to issue statements about their routing intent. These statements are called Route Origin Authorisations, and other operators can download all of these statements and base routing decisions on them. You see a BGP announcement, you compare it to a ROA, and the outcome is one of three things: yes, that is valid, because it is covered by a ROA; or, I see this prefix, but it doesn't seem to be coming from an authorised autonomous system, or it is more specific than is allowed by the ROA, so it is invalid; or it is not found, meaning I see a BGP announcement but there is no ROA at all for it, so I do nothing really.
So those are your three options.
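A minimal sketch of that three-way decision, roughly following the origin validation procedure in RFC 6811 and using made-up ROA data (documentation prefixes and private AS numbers), might look like this in Python:

```python
from ipaddress import ip_network

# Hypothetical validated ROA payloads: (prefix, max length, authorised origin AS)
ROAS = [
    (ip_network("203.0.113.0/24"), 24, 64511),
    (ip_network("2001:db8::/32"), 48, 64511),
]

def validate(prefix, origin_asn):
    """Return 'valid', 'invalid' or 'not-found' for one BGP announcement."""
    route = ip_network(prefix)
    covered = False
    for roa_prefix, max_length, roa_asn in ROAS:
        if route.version != roa_prefix.version or not route.subnet_of(roa_prefix):
            continue                      # this ROA does not cover the route
        covered = True
        if route.prefixlen <= max_length and origin_asn == roa_asn:
            return "valid"                # covered, origin and length both match
    # Covered by at least one ROA but no match: wrong origin or too specific.
    return "invalid" if covered else "not-found"

print(validate("203.0.113.0/24", 64511))    # valid
print(validate("203.0.113.128/25", 64511))  # invalid: more specific than maxLength
print(validate("203.0.113.0/24", 64496))    # invalid: unauthorised origin AS
print(validate("198.51.100.0/24", 64511))   # not-found: no covering ROA
```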
And it's all aimed at answering the question: is this BGP announcement that I see authorised by the legitimate holder of the address space? That's the whole idea. So the guys at the lunch table were like: hang on a minute, we already had something for that, I distinctly remember doing this. Right, that's fine, we're done, why do we need this? I was like, there is a problem, because with this whole IRR thing there is more than one of them, they are not all really authoritative, there are some where you can get things in if you just pay some money, or you could just register whatever is still available. So this whole IRR system is not watertight enough for today's Internet. It was set up with the best intentions but it doesn't really scale.
So, there we go. We take the best bits from a route object and we just place them in this Route Origin Authorisation, which in turn is signed ‑‑ we make a signed statement ‑‑ by an X.509 certificate that contains all the resources that NLnet Labs holds, which in turn is signed by a root certificate that the resource issuer holds, and that is, in our case, the RIPE NCC.
Now, this is replicated all over the world. So, all of the RIRs can do the same, and you end up with this structure that follows the same hierarchy as the way that resources are being issued all over the world. Now, in some cases you have national Internet registries; LACNIC has them, for example, such as NIC.br and NIC.mx, and you have about 17 or 18 of them in the APNIC region, so Taiwan, China and Japan all have a national Internet registry that takes care of resource allocations for their constituents. Now, in this scenario everybody would run their own certificate authority, so they would have a certificate, they would create these Route Origin Authorisations and publish them within their own infrastructure; that's sort of how the standards were written. But the RIRs were like: you know what, if that is the way we're going to deploy this, then it might actually take a really, really long time for this to take off as a technology. Why don't we provide a hosted interface? And this is exactly what the five RIRs have been doing since 2011, so you can log into your respective portal, request a certificate, which is then hosted on the infrastructure of the RIR, and then a ROA, a signed statement, is created and published within the RIR infrastructure. And you go: great, because I don't have to worry about a thing, I don't have to install my own software, I don't have to maintain anything. That's great.
And truth be told, it's done wonders for adoption, if you look at the dataset that is currently out there, giving people something to work with in terms of useful information to base routing decisions on, and if you look at the quality of that data. The hosted interfaces have been great in order to sort of bootstrap the whole thing, because otherwise you are ultimately going to end up in a situation where people are like: yeah, I could start looking into RPKI, but there is not really any data out there, so why do I even bother? So, for the first couple of years we needed to build up and prove that the RIRs were capable of running certificate authorities properly, and that there would actually be a dataset out there which was high-quality enough to make it worth the effort. And looking at all of the traction that we're seeing right now, it looks like we're getting to that point. Right. And it's really easy to get started and use: you just have to log in with your credentials, in most cases click a button and you're done. You run a couple of commands or you click a couple of check boxes and you press publish, and you're done. You don't have the cost of any hardware, you don't worry about the key storage, no worries about the publication of all of it. And you don't really have any worries about the uptime of the thing. Well, with that I mean, you can worry about whether you can log into that portal, but that is not your problem to solve if it's a problem. Right.
If you are in the RIPE region you are probably familiar with this one. And this one is really nice, because you can create ROAs and it will give you suggestions; it will show you: look, with your resources, these are all of the BGP announcements that I see being done. And the only thing you have to do is say whether you authorise them, yes or no, and it will create all of the crypto in the background and publish it. And if something goes wrong, it will send you an alert. This is all pretty good. But this is just the RIPE NCC implementation, and there are four others and they are all really, really different ‑‑ in terms of user interface, but also in terms of functionality. So, in the RIPE NCC interface that I just showed you, if you create a statement, it will issue a ROA and it will have a certain validity, but it will automatically renew that ROA for as long as you have that entry in the portal; that's not the case for all the implementations. And there are differences in user access. For example, in this RIPE NCC interface, if you have credentials to log in to the portal and you have the so-called resources role, that allows you to set up RPKI but also to sell all of your address space ‑‑ those are the same permissions, so that kind of granularity isn't there. Two-factor authentication, yes or no, is also different in all of the different interfaces. And, for example, the publication interval for a ROA is also really different. If you do this in the RIPE NCC interface, within a couple of minutes the ROA that you have created will be available on the server, available to be downloaded. If you do this in some other interfaces, it could actually take up to six hours. So that's really different too.
And you have different support levels. Some organisations provide 24/7 support and others, well, don't. So, if there is a problem, it's not certain that you can actually reach somebody. So, if you go: this RPKI thing is great, but I'm not quite sure that I want to be dependent on all of these factors, and I want to be in control of my own destiny, then maybe you'd prefer to run your own RPKI instance on your own servers, do the publication yourself and have a little bit more control.
You can do that. You can install some software, generate your own certificate, do a little song and dance with the parent RIR to get that signed and establish communication with the parent, and take care of your own publication. But the good thing is that, the way RPKI is designed, running the CA, so the signing bit, and the publication bit are actually separated. So, as an organisation, you could say: I want to run my own certificate authority, but I don't really want to take care of publication, so I want to publish with a third party. That is an option for you. So it's good to keep in mind that those two things can be separated out.
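As a purely conceptual illustration of that separation, the following Python sketch models the CA role (signing) and the publication role (serving signed objects) as two independent parties; the class names are invented for illustration and this is not Krill's actual design or API.

# Conceptual sketch only: the CA role (signing) and the publication role
# (making signed objects available) can be operated by different parties.
class PublicationServer:
    """Serves signed objects to relying parties; can be outsourced."""
    def __init__(self):
        self.repository = []

    def publish(self, obj):
        self.repository.append(obj)          # in reality: an rsync/RRDP repository

class CertificateAuthority:
    """Holds the keys and signs ROAs; run by the resource holder."""
    def __init__(self, publisher):
        self.publisher = publisher           # may belong to a third party

    def create_roa(self, asn, prefix, max_length):
        # In reality this would be a CMS-signed object under the CA's certificate.
        signed_object = f"ROA: AS{asn} may announce {prefix} (maxLength {max_length})"
        self.publisher.publish(signed_object)

# The CA and the publication server can belong to different organisations.
third_party_repo = PublicationServer()
my_ca = CertificateAuthority(publisher=third_party_repo)
my_ca.create_roa(64496, "192.0.2.0/24", 24)
print(third_party_repo.repository)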
So, as a result, you can be completely operationally independent from the parent. And you could integrate this a little bit better, because, again, of the hosted systems, some offer an API and others don't, and there are also some that offer an API that lets you create things but not edit or delete them, so you would still have to log into the web interface and do some clicking in order to remove something that you created earlier using the API. That's not very convenient either.
So, running it on your own systems would actually give you the ability to integrate this with your provisioning systems, with your IPAM, etc., and you are in total control of the publication of your ROAs. If you create something and publish it, you know when it is there, you know when it is available. If you are trying to fix a problem in the middle of the night, clicking around in a hosted web interface where you may have to wait up to six hours for a ROA to appear and then hope it will propagate, you may not want to depend on that.
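As an illustration of that kind of integration, here is a hedged Python sketch of a provisioning hook that asks a self‑hosted CA to create a ROA over a REST API; the URL, path, token and JSON shape are hypothetical placeholders, not the documented API of Krill or of any RIR portal.

import json
import urllib.request

API = "https://rpki-ca.example.net/api/v1"   # hypothetical self-hosted CA endpoint
TOKEN = "replace-with-your-token"            # hypothetical bearer token

def add_roa(asn: int, prefix: str, max_length: int) -> None:
    """Ask our (hypothetical) CA to create and publish a ROA."""
    body = json.dumps({
        "asn": asn,
        "prefix": prefix,
        "max_length": max_length,
    }).encode()
    req = urllib.request.Request(
        f"{API}/roas",
        data=body,
        headers={
            "Authorization": f"Bearer {TOKEN}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        print("CA answered:", resp.status)

# Example: called from an IPAM hook when a new prefix is assigned to a customer.
add_roa(64496, "192.0.2.0/24", 24)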
And this is also an important thing, for example, for national Internet registries: they can offer RPKI as a service to their constituents. They can run this, and then offer such a portal where people can log in, have user management, etc., and allow them to issue these things, which is not currently possible. So, for example, in the case of the Brazilian Internet registry, they would have to tell all of their customers to log into the LACNIC portal instead of being able to offer it themselves. This was actually a trigger. So you go: okay, I may want to do this, this sounds like a good idea; what software is out there? Well, there is one: the original implementation by Dragon Research Labs, rpkid. Those are your choices. So I made my argument at the lunch table: what we need here is some diversity. We could create a toolset, it would also give us the opportunity to do research, which we love, it would be a real, live project, it would diversify the efforts that we do within NLnet Labs ourselves, and it would fulfil that long‑standing wish of doing something in both DNS and BGP. And they were like: yeah, hang on a minute, we're a small not‑for‑profit, right? How are you going to get this paid for? What are you going to do? So, I went to a lot of conferences, did a lot of talking and e‑mailing, and this is now the result. This is also a bit of acknowledgment and a bit of accountability, because the fact that NLnet Labs is now capable of developing a software toolset is in part thanks to you. Your membership contribution to the RIPE NCC enables the existence of the RIPE Community Projects Fund, and they granted us a sum of money allowing us to have some developers working on this until completion.
The other thing was a conversation that we had at the IETF with NIC.br, who said: we are actually right in the process of having a couple of engineers develop such a CA so we can start offering this to our Brazilian members, but now that we hear that you're planning to build this, we think it's a much better investment to help fund you and have you make an Open Source tool that is available for everybody. That would be a much better investment and better for the good of the Internet.
And then another party came in and said: if you want to set up a testing platform, we will make sure you have virtual routers. And DigitalOcean came in: if you need some infrastructure, if you want to set up some servers, some VMs to run all of the tests on, or in order to test publishing something, that is a possibility too. And Mozilla was like: we also think this is a good idea and we're willing to fund you. Which, as a result, now means that we can have two staff working on this for the next one‑and‑a‑half to two years, making something that we have named Krill. For the genesis of the name, I am going to refer you to Tim. Tim, raise your hand so people know how to find you. The only thing I'm going to say about it is that Tim is a biologist; he can explain the rest. It's fine.
And then Tim showed me this. This is the stage we're at. There is something living on his laptop, we are developing this out in the open, all of it is available on GitHub, so you can follow along with the development: there is a public road map and everything, so people can see what we're doing. This is sort of where we are.
So, we have an event sourcing architecture, we have an API, a command line interface and a user interface, you can create RPKI objects, and we have the whole publication thing as a separate component. That actually means that if a CDN would say: as a good‑of‑the‑Internet service, I want to offer an RPKI publication server, then you as an organisation can say: I have a CA, I would like to publish with this third party. That is actually an option that CDNs, for example, could offer.
We have an embedded trust anchor now for testing, and now we have to put all of the nuts and bolts together and do real testing of getting this to run under a real parent. So, being able to connect this to the CA that each RIR runs, see if we can make this work properly, do all of the interoperability testing, etc., with the goal of having a production‑grade service running towards the end of the year, somewhere between the autumn and Christmas. Then, of course, we will continue to develop this during the course of 2020, where we will offer the kind of user interface that you're used to, with sort of the RIPE NCC one as a starting point, so it will give you suggestions and alerts. If people want HSM support, we can build that in, etc. But then we are actually getting to the point where we have a production‑grade toolset that people can use and provide us feedback on, and that will further steer the road map that we have.
And then we were like: while we're at it, let's also make some relying party software. That is the software that you use to download all of the certificates and ROAs, validate them and feed them to your routers, and that is called Routinator. It is maintained by another developer, who is our resident space nut; we don't have a marketing department, so this is what you get when you give your developers the freedom to choose whatever name they like. I did get to make them a logo, which was a lot of fun, so there we go.
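For reference, the validation step that relying party software performs before feeding routers can be sketched roughly as follows, in the style of RFC 6811 route origin validation over validated ROA payloads; the values are made up for illustration and this is not Routinator's implementation.

import ipaddress

# Validated ROA payloads (VRPs): (origin ASN, prefix, max length).
vrps = [
    (64496, ipaddress.ip_network("192.0.2.0/24"), 24),
]

def origin_state(announced: str, origin: int) -> str:
    """Classify one announcement against the VRP set."""
    net = ipaddress.ip_network(announced)
    covering = [v for v in vrps
                if v[1].version == net.version and net.subnet_of(v[1])]
    if not covering:
        return "not-found"                   # no ROA covers this prefix
    if any(asn == origin and net.prefixlen <= maxlen
           for asn, _, maxlen in covering):
        return "valid"
    return "invalid"                         # covered, but wrong origin or too specific

print(origin_state("192.0.2.0/24", 64496))    # valid
print(origin_state("192.0.2.0/25", 64496))    # invalid: more specific than allowed
print(origin_state("203.0.113.0/24", 64496))  # not-found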
And I want to mention this because it was a really interesting discussion. We are a classic C shop; we have done everything in C forever, and in a lot of cases where we developed something, it started out as a collaboration, so you work with people from other organisations and you don't really get complete control over these kinds of choices; you just do it in C because that is what everybody knows. But for this one, it's 2018, we're starting a brand new project, and which language we do it in is completely our choice. What will we use? Will we still use C, in 2018? Or do we want to try something else? We actually did a shoot‑out between Go and Rust, and we ended up choosing Rust; it was a little bit closer to our heart and a little bit closer to what we were used to, and after about a year it has turned out to be a really, really excellent choice that we like a lot. It has been serving us really well, and with the development of the Rust language itself, we are seeing great things ahead. So that's quite a leap for us: C only for about 19 years, and then all of a sudden picking up a new language like that.
So, this actually sounds kind of compelling, having software like that. Maybe I want to run that. But what kind of hardware and setup are you going to need to run this? It's actually less complicated than you think. Ordinary hardware is fine for the signing operations. And you go: yeah, but do I now have to buy hugely expensive HSMs? Actually, that's not really needed either. You're not forced to use an HSM; if you are careful about your security, you can also store the keys on disk, that is fine. If you later want to opt for using an HSM anyway, that is a possibility you have, but it is not an absolute requirement. These are also some of the lessons that we learnt ourselves when Tim and I still worked at the RIPE NCC, and, to name another example, as far as I know LACNIC doesn't use HSMs at this time either. So it's not really a thing. Then, for the publication server, you have to think about this a little bit harder, because that is something publicly facing which you really have to think about. And what if it breaks? Well, this is not going to be like a DNSSEC horror story. If you mess it up, ultimately everything is going to fall back to unknown. So always keep that in mind: if it breaks, it's not really a problem.
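As a small illustration of keeping keys on disk rather than in an HSM, here is a Python sketch using the third‑party cryptography package; the file name is a placeholder and in practice you would protect the file appropriately (permissions, encryption at rest).

from cryptography.hazmat.primitives.asymmetric import rsa
from cryptography.hazmat.primitives import serialization

# Generate a key pair and write the private key to disk in PEM format.
key = rsa.generate_private_key(public_exponent=65537, key_size=2048)

pem = key.private_bytes(
    encoding=serialization.Encoding.PEM,
    format=serialization.PrivateFormat.PKCS8,
    # Unencrypted here only to keep the sketch short; protect the file in practice.
    encryption_algorithm=serialization.NoEncryption(),
)

with open("ca-key.pem", "wb") as f:
    f.write(pem)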
So, to summarise:
If you want to delegate ROA management to different business units, or you want to provide it as a service to customers, if you want tight integration with your own systems and an API you can talk to, or if you want to manage ROAs seamlessly across RIRs and see them as one complete pool of resources instead of having to click around in up to five web interfaces, then running your own CA is actually a viable option. You will have fine‑grained access control and complete control over the publication interval.
And that's it. Any questions?
(Applause)
ONDREJ SURY: We are running a little bit over time so please be quick with the questions.
AUDIENCE SPEAKER: Marco Schmidt again. I have a couple of remote questions. First, from Martin as an individual, can you run your own certificate authority but let RIPE NCC publish the output?
ALEX BAND: If the RIPE NCC were to offer that as a service, absolutely. It is technically possible; if we talk to the RIPE NCC and they were willing to open up their infrastructure for that, then yes. Technically possible, but not practically possible right now.
MARCO SCHMIDT: I have another one. This is from Sandy Murphy, Parsons: a question about the trust anchors. That might be more of a relying party problem, but she would be interested to hear whether the certificate authority is involved.
ALEX BAND: Whose certificate authority?
MARCO SCHMIDT: Of the trust anchors, I think.
ALEX BAND: I don't understand.
MARCO SCHMIDT: I can double‑check here.
ALEX BAND: I think if the ‑‑ or Tim has the answer. You want to give this a shot?
TIM: Tim, NLnet Labs. Maybe I can clarify. There was a slide up there that said embedded trust anchor; this is really meant for testing of the software. You want to test locally, so you need to start off somewhere, but the idea is not that this is going to create new trust anchors out there; it's just to be able to do things locally until the software can be operated under your own RIR or wherever.
ALEX BAND: Practically, this just means that relying party software connects to the five trust anchors of the five RIRs and follows the tree down, whether what it finds is something hosted or something that you run yourself.
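For context, a trust anchor locator (TAL, RFC 7730) is just a small file with one or more certificate URIs, a blank line and the trust anchor's base64‑encoded public key; the following Python sketch reads one, with the file name as a placeholder, since relying party software ships the RIR TALs itself.

import base64

def read_tal(path: str):
    """Parse a TAL: URIs first, then a blank line, then the base64 public key."""
    with open(path) as f:
        lines = [line.strip() for line in f]
    blank = lines.index("")                          # URIs come before the blank line
    uris = [l for l in lines[:blank] if l and not l.startswith("#")]
    key_b64 = "".join(lines[blank + 1:])             # key may span multiple lines
    return uris, base64.b64decode(key_b64)

uris, public_key = read_tal("example.tal")
print("certificate URIs:", uris)
print("public key length:", len(public_key), "bytes")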
AUDIENCE SPEAKER: Owen DeLong, Home technologies. I lost my train of thought. I apologise.
AUDIENCE SPEAKER: Good afternoon. My name is Gabriel, with Advania. First of all, thank you for a great presentation. My question is: with delegated RPKI, you mentioned that you are responsible for your own publication of signatures. To whom do you publish these signatures, and how do you ensure that ISPs and transit providers across the world are able to locate your signatures and verify them?
ALEX BAND: The certificate that you create is signed by the parent, either the national Internet registry or the regional Internet registry, and that ensures it only has the resources on it that you are the holder of. That is the way the system is designed, and for all of the ROAs that you create, it is essentially impossible for a ROA to exist under that certificate for address space that you're not the holder of. So the system is locked down in a way that it is only possible to have a certificate with resources that you hold, and that is signed and verified by the regional Internet registry.
AUDIENCE SPEAKER: That is pretty much what I wanted to ask.
AUDIENCE SPEAKER: Rudiger Volk, Deutsche Telekom. In my lab, I have been running the Dragon Research stuff with all the functionality. I am happy to see that additional versions are becoming available, and, well, okay, that is good news for the progress of RPKI usage. On the other hand, it's nice to see a great salesperson on the stage pitching the stuff. What should be kept in mind is that actually running a reliable, resilient, distributed service with all the parties that join the bandwagon creates challenges. Making sure that all the parties, and I'm just looking at the five RIRs at the moment, have published and scrutinised rules of operation that prevent downtime and outages becomes a more interesting problem as more parties join. So, yes, there are many use cases where you want to run your own CA. On the other hand, with that benefit also comes a large responsibility to the rest of the network, and, well, okay, if you do not properly address that responsibility, you and your customers will be hurt badly.
ALEX BAND: Absolutely.
CHAIR: Thank you for feeding the whales. So this concludes this session, and please don't forget to rate the talks, this helps the Programme Committee to have some feedback on whether we picked right or wrong. Thank you.
LIVE CAPTIONING BY
MARY McKEON, RMR, CRR, CBC
DUBLIN, IRELAND.