Thursday, 23 May 2019
2 p.m. routing and IPv6 tutorial
CHAIR: Good afternoon everyone. After the lunch, time for an experiment. No, not for another dessert but for another experiment. So this is something that we are doing for the first time and trying to improvise as we do that. This is ‑‑ what is this not? This is not the Routing Working Group invading the holy lands of IPv6 just because they ran out of time in the previous slot. This is something that is relevant to routing and relevant to IPv6, as you both use routers which forward the v6 packets, and it's an attempt to explain to the community, or initiate the discussion in the community, whether they need similar types of, say, I cannot use the word training, maybe guiding or tutorial sessions explaining the concepts or the fundamentals of how those things which you are using daily in fact work.
And this is ‑‑ why is this being discussed? In specific Working Groups we specifically focus on the rather deep problems, deep‑going problems, but don't necessarily talk about the fundamentals, the reasons why certain things are being done in a certain way, why certain things are designed as they are designed, why they operate as they operate. And the focus appears to be on actual operations and deployment, not necessarily on the architectural and conceptual design things.
Also, in the industry there doesn't appear to be a place where these topics could be discussed without the vendor marketing and the specific vendor details. Therefore, we are trying to experiment on a very ‑‑ on an initial and simple and at the same time very complex topic: how a router works. What happens inside, what are the implications, and why your big and shiny router, which potentially could do gazillions of packets a second, cannot do this and that and the other, and there are other fundamental limitations, and you are really unhappy.
The content itself is not new. A similar topic was presented and discussed back at the last IETF, and the reason for that was somewhat different from what we are trying to discuss here. It was a discussion about the protocols and the protocol design, and how that impacts the router operation. If I have a packet decorated with many tags, headers and other things and I want to look at all of them deep into the packet, I want to look deep into the content, and it appears that I cannot do that practically, so why? Is that the fault of the protocol design? Is that a fault of the router design? Is that somebody else's fault? Why do separate things that work in theory, when combined together as a practical system, appear not to work together?
Discussions with community operators here at RIPE and in some other forums indicated that this topic might be interesting. Therefore, this experiment. That's basically the reasoning why we would like to occupy around an hour of your time. So, treat it as an attempt at community education. Right now we have chosen this topic; if you see these types of events bringing value, probably this can be reused for different topics: how certain technology components work, how things which are interesting and relevant for you work.
So, in summary, we need your feedback and really, really what we would like to see is that it's not only us, the chairs, spreading the propaganda here to you; it's a bidirectional communication. So probably we can run this in the format that if you hear something and if you violently disagree or violently agree, go to the mic and let's have a discussion right on the spot. Please don't wait until the very end. The whole point of this is to have discussion, and to sense the feeling of the room, whether this brings any value to you.
With that, well, yes, I am the invader here because the IPv6 chairs have very kindly agreed to donate their slot for this discussion. This is not in the routing budget so maybe they would have any comments to add to this.
CHAIR: It's an experiment, we'll see how it goes, but I think it's good that sometimes Working Groups do work together and say, let's see what this brings up, because there are sometimes issues that cover many, or at least more than one, Working Group and, well, as Ignas already said, we need your feedback and we're very open to it.
IGNAS BAGDONAS: All right. Thanks. So I understand that you will leave me alone with the whole public ‑‑ you don't want to receive rotten tomatoes, it's only me.
Right. Modern router architectures. The first thing: the content here is not only originally mine, there is a large group of people who did many of these over the years, so that's just a compilation of those, and those are talking points, not slides. So let's have a discussion.
You might have a question: why a spherical router, and what is a spherical router in a vacuum? A spherical router in a vacuum is an infinitely powerful, infinitely featureful router that can take any protocol, any payload, and do what you want with that. Naturally, such routers exist only in theory. Therefore, a spherical router in a vacuum.
If we try to look into logical groups of the content:
What is a modern router, how it looks ‑‑ out of which components it is built and what happens there. And it is intentionally very simplified, and this is not a tutorial on how to build a high performing system or in general not a tutorial on how to build a router, so please don't take a soldering iron after listening to that and attempt to build one.
The focus is on the actual core principles of how the ‑‑ and the special focus is the data plane of the router: how that operates, and not necessarily the implementation details. Therefore, if you are interested to hear how your favourite vendor implements something or what not, you will need to look elsewhere for that; sorry, this is completely not about that.
Why do we need to talk about this? There appear to be prevailing misconceptions in the community about how the router works internally. Quite a popular and very wrong opinion is that a router is just a programmable computer. A generically programmable computer. It takes the packet. It looks into this field, that field, it looks into the payload, calculates this, calculates that, looks up something in one area, something in another area, and then forwards the packet. In theory, yes. However, if we are talking about practical and high performing systems, that's completely not true. And that is not true for very good practical reasons. You need to trade such generic flexibility for performance. Certainly you can have a router which is both implemented as a generic computer and operates on the principles of a generic computer. However, the performance of such a system would be nowhere near what you would like to see.
Then, another big area of misconception, coming, well, with the marketing flavour, is: why do we need specific hardware when we have this fashionable software defined thing of one flavour or the other? It just works, we use a more powerful server and it just solves all problems. Again, to an extent. There are very good reasons, and the attempt of this talk is to try to explain, at least briefly, why you still need specialised, and very specialised, hardware components. And on the other hand, if we are talking about software data planes, why high performing software data planes are not necessarily generic sequential programmes.
This content is adapted from what has been discussed at the IETF. However, the message is quite different. The IETF deals mostly with developing the protocols and trying to see how those protocols can be implemented. One of the reasons for this discussion at the IETF was heated discussions happening for a long time in the 6man Working Group, which deals with maintenance and extensions of IPv6, especially in the context of extension headers. There is still a wide misconception in the community that an extension header is, well, a simple and very flexible thing: you just put in another extension header and get the functionality you need. However, if we are looking from the practical perspective, that's definitely not the case. So for the IETF that was mostly done as an explanation of what the implications of the protocol design are for the actual platform. Here it's on a much wider level, simply to try to explain how the router works internally.
So, if we look at the very, very simplified, this is actually a spherical ‑‑ this is a square router, or a diagram of a square router. Three large components there. The control plane, where your favourite BGP ‑‑ well, we are joint with the Routing Working Group, so therefore BGP ‑‑ favourite routing protocols live, and that's not the focus of this talk, and that's mostly a generic computer. And then you have the forwarding components. This is where the actual packet sending from the left to the right happens, and where the most interesting things appear. And then the gluing logic, the interconnect, which is mostly needed for gluing the components together.
Control plane. Think of it as a generic computer. If you want it to be highly performant and scalable, normal software engineering rules apply. However, even if you have a powerful control plane complex, trying to use that as a backup or a main path for forwarding your data plane traffic certainly is not a good idea. We are easily talking about several orders of magnitude of performance difference here, and we will quickly come to why.
So, this is something that controls how your favourite router operates, but in fact it is not that critical to the overall platform performance.
Now the forwarding. This is where the actual processing of the data plane traffic happens and this is where the interesting things do live. So, if you look at the packet, that's a set of fields, and fields have meanings in different contexts; therefore, you need to look them up in certain sequences, you need to look them up in certain name spaces, and that ‑‑ while not a complex operation in isolation, if you look at what is needed for actually forwarding a real world IPv4 or IPv6 packet in a real network, you need to look up quite many elements, items, in quite many different locations. And all of that takes a finite amount of time. Memory access is not instantaneous; even if we are talking about tens of nanoseconds, those operations do add up.
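A minimal sketch of the arithmetic behind "those operations do add up". The link speed, the number of lookups and the memory latency are illustrative assumptions, not figures from the talk; only the Ethernet framing overhead is a standard value.

```python
# Illustrative assumptions, not figures from the talk.
LINK_BPS = 100e9          # assumed 100 Gbit/s Ethernet interface
FRAME_BYTES = 64          # minimum Ethernet frame size
OVERHEAD_BYTES = 20       # preamble and start delimiter (8) plus inter-frame gap (12)
MEM_ACCESS_NS = 20.0      # assumed latency of a single memory access
LOOKUPS_PER_PACKET = 4    # assumed number of dependent lookups per packet


def packet_time_ns(link_bps, frame_bytes, overhead_bytes=OVERHEAD_BYTES):
    """Time one frame occupies the wire, i.e. the per-packet time budget."""
    bits_on_wire = (frame_bytes + overhead_bytes) * 8
    return bits_on_wire / link_bps * 1e9


budget_ns = packet_time_ns(LINK_BPS, FRAME_BYTES)
sequential_ns = MEM_ACCESS_NS * LOOKUPS_PER_PACKET
# How many packets arrive while one packet is still being looked up sequentially.
packets_in_flight = sequential_ns / budget_ns

print(f"per-packet budget at line rate: {budget_ns:.2f} ns")
print(f"sequential lookups take:        {sequential_ns:.0f} ns")
print(f"packets arriving meanwhile:     {packets_in_flight:.1f}")
```

Under these assumptions the lookups alone take roughly an order of magnitude longer than the time a minimum-size packet spends on the wire, which is why the work has to be pipelined or done in parallel rather than one packet at a time.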
Then the important misconception that I personally quite often meet in the operators community is: why do we need buffering? Routers are fast enough, links are fast enough, therefore we don't need memory at all, just forward the packets. There are two concepts which control the need for buffering: congestion and contention. And while congestion can indeed be solved with a certain amount of buffering and interface capacity, contention, where you have two incoming packets competing to egress at exactly the same interface, cannot be solved by that. One has to wait. One or more. Even if it is for a really small amount of time, on the order of microseconds, you need to store them somewhere. The higher the speed of the interfaces, the more interfaces there are, and the higher the rate of traffic, the more you need to store these things, even for short periods of time. Therefore, buffering today and for the foreseeable future definitely will be something that needs to be taken into account in practical routers.
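A small sketch of the contention case described above, assuming several packets arrive at exactly the same instant for one egress interface. The packet size, link speed and number of arrivals are hypothetical values chosen for illustration.

```python
def contention_waits_ns(packet_bytes, link_bps, simultaneous_arrivals):
    """Waiting time of each packet when all arrive at once for one egress link.

    The first packet is sent immediately; every later one waits for the
    serialization of all packets ahead of it.
    """
    serialization_ns = packet_bytes * 8 / link_bps * 1e9
    return [position * serialization_ns for position in range(simultaneous_arrivals)]


# Hypothetical example: four 1500-byte packets contend for one 100 Gbit/s egress.
waits = contention_waits_ns(packet_bytes=1500, link_bps=100e9, simultaneous_arrivals=4)
# Every packet except the one being transmitted has to be stored somewhere.
buffer_bytes_needed = 1500 * (4 - 1)

print([round(w) for w in waits])   # waiting time per packet, in nanoseconds
print(buffer_bytes_needed)         # bytes of buffering for this single event
```

The waits are short, but they are never zero for the packets that lose the contention, so some storage is needed regardless of how fast the links are.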
Now, trying to classify what a forwarding component looks like.
It ranges from that same generically programmable processor, a general purpose processor as you might see in a control plane or in your laptop, to very dedicated, very inflexible, non‑programmable but at the same time very powerful pipeline type lookup engines.
Now, why do we need the interconnection fabric if the current silicon process allows you to have hundreds of interfaces on the same component? That is not necessarily enough. You might still have a router that has more interfaces than that. On the other hand, you have to access the memory, and that is a far bigger user of electrical interface pins than the interfaces. It's not practical, for power consumption and for heat removal purposes, to build components which have many thousands or tens of thousands of electrical signals. Therefore, we see systems which are being built out of separate discrete components interconnected via a fabric.
So, a fabric can be even within the component; you actually don't notice that if you look at the physical router. The typical, most frequent use is interconnecting the separate line cards: for the forwarding complexes which are on each and every individual line card to send the actual data plane traffic from one to the other, you need an interconnection mechanism. And then a slightly more exotic type of system is where you have dedicated shelves, dedicated chassis, interconnected and trying to act as a single network element.
Now, the differences in software and hardware world.
Software makes things look simple and sequential. You take a packet. You look up this field, you look up that field, then you look up another field, then you modify this, then you modify that. This works, this is quite simple. However, each of these operations takes time and that time is ‑‑ the hardware world is slightly different. You may have all these operations happen at the same time, and the key word here is may. Again, it depends on how that hardware is designed. But if it allows, you can easily do multiple lookups and multiple, say, modifications or statistics updates at the same time, in parallel. That is one of the reasons why hardware assisted forwarding platforms, if designed properly, can provide you substantially higher overall throughput or performance.
You might argue that multi‑core processors with on the order of hundreds of processing cores are a commodity; yes, they are. Software and the underlying operating systems have understood parallelism for a very long time, so why can we not just use that and be happy? We can, however that is not easy. Parallel programming is definitely not as simple as sequential programming, and that is one of the reasons why we don't see it ‑‑ we see a bigger take‑up of that, but not necessarily on a commodity level.
On the other hand, highly scalable parallel hardware is expensive, just because you need more resources which you can use in parallel. So, even if we have certain abilities to use that more or less efficiently, that's not necessarily what is happening today in commodity systems.
Now, specialised ‑‑ the discussion about the specialised network processors or NPUs versus, say, dedicated general purpose processors versus completely general purpose processors. You might think that a generic processor is infinitely scalable. In theory it is. In practice there are really down‑to‑earth limitations like memory bandwidth. Such a programmable processor needs instructions. You need to feed it those instructions, and that takes memory bandwidth, or in general bandwidth into the processing component, which could otherwise be used for your packets. Flexibility, yes, that's a gain. However, that flexibility comes at a cost. If all you need to do is a simple table lookup, having dedicated hardware for that will yield several orders of magnitude better performance than even a highly optimised generically programmable processor. However, if two days later you decide to deploy a new protocol, a new data plane protocol which requires a different type of lookup, you might end up in tears with your fixed pipeline type of forwarding engine, because you simply cannot use that new data plane protocol.
Practical systems come down to a balance of trade‑offs: what functionality is really critical, for which I cannot give away the performance, versus some future‑proofing. And mostly it's about the future‑proofing, because the systems which are being designed now will get deployed in several years and will stay in production for several more years. Protocol development in the data plane moves slightly faster than the hardware. Therefore, a flexible or semi‑flexible system might be more practical and reasonable, at the cost of raw performance.
So, if we try to integrate hardware and software. Anything programmable needs instructions. Getting instructions needs accessing them ‑‑ means accessing the memory. Accessing the memory means competing for the same bandwidth as you are forwarding the packets. Fixed or inflexible ‑‑ or say partially programmable or even completely non programmable lookup components don't need instructions. However, that comes at a cost that they can only do one particular function and that's it. And they still need access to the data. So, that is one other important aspect of that say platform component design.
Now, the more inflexible and at the same time the more performing the forwarding engine is, typically the less able it is to look deeper into the packets, or the less able it is to do, say, more interesting or recursive or repeated lookups. Many of the pipeline based routers chop off the initial header, or actually cut off a certain number of bytes, and they process that. The remainder of the packet, including the rest of the payload, is not reachable for them. This means that things like checksums, things like payload fix‑ups and other operations which require access to the payload may be completely out of reach for such forwarding components. And yes, there are plenty of practical systems which have these limitations, and the reason why is that you are limited by the memory bandwidth. The more of the headers you take for lookups, the more of that bandwidth is used for transferring those headers around, and therefore it takes more time for processing. And this directly relates to how many packets you can process and, well, as you know, the measurement unit of how one router is better than the other is of course packets per second. Tell that to your vendor's marketer.
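The effect of processing only the first bytes of a packet can be sketched with IPv6 extension headers, the case mentioned earlier in the talk. This is a simplified model, not any vendor's parser: the 128-byte parse window and the helper names `find_upper_layer` and `build_packet` are assumptions made for illustration. The header type numbers and the length encoding follow the IPv6 specification.

```python
# Next Header values of the extension headers handled in this sketch.
HOP_BY_HOP, ROUTING, FRAGMENT, DEST_OPTS = 0, 43, 44, 60
EXTENSION_HEADERS = {HOP_BY_HOP, ROUTING, FRAGMENT, DEST_OPTS}
IPV6_FIXED_HEADER_LEN = 40
TCP = 6


def find_upper_layer(packet, window=128):
    """Walk the extension header chain using only the first `window` bytes.

    Returns (protocol, offset) of the upper-layer header, or None when the
    chain (or the first 4 bytes of the upper layer, the ports) falls outside
    the window. The window size is a hypothetical pipeline limit.
    """
    visible = packet[:window]
    if len(visible) < IPV6_FIXED_HEADER_LEN:
        return None
    next_header = visible[6]              # Next Header field of the fixed header
    offset = IPV6_FIXED_HEADER_LEN
    while next_header in EXTENSION_HEADERS:
        if offset + 2 > len(visible):
            return None                   # header is beyond what the pipeline sees
        if next_header == FRAGMENT:
            length = 8                    # Fragment header has a fixed size
        else:
            length = (visible[offset + 1] + 1) * 8   # Hdr Ext Len is in 8-byte units
        next_header = visible[offset]
        offset += length
    if offset + 4 > len(visible):
        return None
    return next_header, offset


def build_packet(ext_headers, upper_proto=TCP, payload=b"\x00" * 20):
    """Build a skeletal IPv6 packet; ext_headers is a list of (type, length_bytes)."""
    chain = [header_type for header_type, _ in ext_headers] + [upper_proto]
    fixed = bytearray(IPV6_FIXED_HEADER_LEN)
    fixed[0] = 0x60                       # version 6; other fields left zero
    fixed[6] = chain[0]
    body = b""
    for i, (_, length) in enumerate(ext_headers):
        header = bytearray(length)
        header[0] = chain[i + 1]          # Next Header
        header[1] = length // 8 - 1       # Hdr Ext Len
        body += bytes(header)
    return bytes(fixed) + body + payload


short_chain = build_packet([(HOP_BY_HOP, 8), (DEST_OPTS, 16)])
long_chain = build_packet([(HOP_BY_HOP, 8), (ROUTING, 24), (DEST_OPTS, 96)])
print(find_upper_layer(short_chain))   # upper layer is inside the window
print(find_upper_layer(long_chain))    # upper layer is out of reach
```

With the short chain the TCP header is found inside the window; with the long chain the same packet type becomes unclassifiable at line rate, which is the kind of limitation described above.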
Instructions allow you to do very flexible things but at a big cost. Then implementation ‑‑ both the programmable and the fixed forwarding components are implemented in silicon, in terms of gates. The more functionality you need, naturally the more gates you need. Then complexity. This is a trade: either you spend gates on the programmable functionality, you need the programme interpreter, you need state machines for fetching the instructions and interpreting them, and you ‑‑ well, trade the space of the component, or you dedicate them to completely inflexible lookups, simple lookups, simple packet rewrites. And that is a design question. You cannot have both. There are practical limitations; basically, the laws of physics still hold. The amount of power consumption and the resulting heat dissipation is limited by the ‑‑ well, is governed by the frequency and the number of such gates, and therefore you cannot have both at the same time.
The less flexible the forwarding engine is, the more divided into the logical functions it is. So, if you take the generic purpose processor, you can make lookup first and queueing later or in fact you can queue first and then lookup later or you can do something in the middle. If you look at the fixed hardware forwarding components, those have a very strict sequencing of what goes first, what goes next, and that to an extent limits the functionality of what your router can do.
If you deploy a new data plane protocol which requires some, well, fancy or specific processing of certain headers, it might be that you will not be able to do that, or you might not be able to do that in a performant way. It's always possible to send that misbehaving packet to your control plane and probably it will be processed there, but at the cost of several orders of magnitude less performance.
One of the rather painful and problematic areas with fixed pipeline types of designs is recursion, which means all sorts of tunnelling and stacked encapsulations. Typically one pass through the pipeline assumes one set of headers. If you want ‑‑ if you find out that the deeper layer is yet another layer of headers and addresses, you might end up needing to recirculate that again via your forwarding pipeline, and that very effectively cuts your performance in half, or more if you need to do that multiple times.
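A back-of-the-envelope sketch of the recirculation cost, assuming a pipeline that handles one layer of headers per pass. The packet rate is a hypothetical figure and the function name is made up for illustration.

```python
def effective_pps(pipeline_pps, encapsulation_layers, layers_per_pass=1):
    """Packet rate left when each packet must traverse the pipeline several times."""
    # Ceiling division: number of passes needed to consume all header layers.
    passes = max(1, -(-encapsulation_layers // layers_per_pass))
    return pipeline_pps / passes, passes


# Hypothetical pipeline capable of one billion packets per second.
for layers in (1, 2, 3):
    rate, passes = effective_pps(1_000_000_000, layers)
    print(f"{layers} layer(s): {passes} pass(es), {rate / 1e6:.0f} Mpps")
```

Each recirculated packet occupies a pipeline slot that could have carried a new packet, so one extra pass halves the rate and two extra passes cut it to a third.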
Now, quite many of these topics are nothing more than generic, say, software engineering or generic systems engineering. They are just applied to a rather specific field of dealing with packet headers and then looking them up in different tables and other structures. However, the inflexibility and the dedication to the particular task easily allow you to achieve performance that software‑only implementations come nowhere near.
The important aspect to remember again that the laws of physics still hold. If in order to forward the packet you need to consult ten different entities, ten different locations which have information about different headers, you will end up accessing memory ten times and that is ten times slower than trying to access it once. Therefore, from a protocol design perspective, if you can compress the amount of information which is important for forwarding, that certainly is a big gain.
Even today, even if we are talking about high bandwidth links, the limiting factor is not necessarily the actual interface; today the limiting factor is still the memory bandwidth. That can be solved. You can have wider memory; that means more electrical signals, that means more power dissipation, that means more signal integrity problems, that means more complexity.
One of the major efficiencies of the custom hardware design is the performance per watt. For the generic processors you need to spend power on interpreting the commands. This is not needed for fixed pipelines. Now, power comes in two different aspects. One is that you need to feed the components, that's the actual power used, and then you need to take out the heat which is generated. Again, you cannot run away from the laws of physics. If you increase ‑‑ that's a quadratic function: if you increase the frequency, the end result increases in the square. So you cannot have ‑‑ that is a practical limitation of how fast your forwarding components can go.
Then, the number of gates that you can squeeze into a particular silicon process limits the size of the package, and that determines how many signal pins you can run out of it, and this limits the number of, say, interfaces and the number of memory channels that you can have.
Gates are the implementation in silicon of each functionality; you need them and you need to trade. If you need more interfaces, you spend gates. If you need more lookups, you spend gates. If you need more complex lookups, you spend gates. You cannot have all of them at the same time. And those are, again, natural trade‑offs in the system design.
This is one of the limiting aspects of the practical systems today. Memory has capacity, and that is something which is not necessarily the limiting factor. A far bigger limiting factor is the memory bandwidth, how quickly you can access the things there. We are talking, say, about fast static memory with access times in single digit nanoseconds. The first thing is that this comes at a huge cost to the capacity that can be sustained at this rate. The second thing is the number of electrical signals that are needed in order to get the address in and the data out. So, while in theory your router may have gigabytes of memory available, typically that is available for your control plane and your BGP tables. The memory used for actual forwarding is not that same memory; it's a dedicated type, and that comes in megabytes or tens of megabytes. And therefore, for many of the practical platforms, even today, you see limitations that it cannot have more than a rather low number of entries, or you need to manually partition that memory between one data plane protocol and a different data plane protocol.
If we take a few examples, MPLS is rather simple. That's a direct lookup. 20 bits, that results in one million entries, that's easy to do. However, it is not that easy to do if you have recursion. IPv4 can be done as a direct lookup. That's only ‑‑ well, four billion entries in theory, and that can be compressed via some clever schemes. Many of the practical implementations split the prefix length and do a direct lookup for a range, like 16 million of them, and then the rest as a second lookup.
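The numbers above and the "direct lookup for a range, then a second lookup" idea can be sketched as follows. This is an illustrative model in the spirit of schemes usually described as a 24-bit first stage plus an 8-bit second stage; the class `TwoStageTable` is hypothetical, and Python dictionaries stand in for what would be directly indexed memory arrays in hardware.

```python
import ipaddress

print(1 << 20)   # MPLS label space: about one million entries
print(1 << 32)   # full IPv4 direct lookup: about four billion entries
print(1 << 24)   # first stage on the top 24 bits: about 16 million entries


class TwoStageTable:
    """IPv4 lookup: one access on the top 24 bits, a second only for longer prefixes."""

    def __init__(self):
        self.stage1 = {}   # top 24 bits -> (prefix_len, next_hop)
        self.stage2 = {}   # top 24 bits -> 256 slots of (prefix_len, next_hop) or None

    def insert(self, prefix, next_hop):
        net = ipaddress.IPv4Network(prefix)
        base, plen = int(net.network_address), net.prefixlen
        if plen <= 24:
            # Expand the prefix into every /24 slot it covers.
            first = base >> 8
            for index in range(first, first + (1 << (24 - plen))):
                current = self.stage1.get(index)
                if current is None or current[0] <= plen:
                    self.stage1[index] = (plen, next_hop)
        else:
            block = self.stage2.setdefault(base >> 8, [None] * 256)
            low = base & 0xFF
            for slot in range(low, low + (1 << (32 - plen))):
                if block[slot] is None or block[slot][0] <= plen:
                    block[slot] = (plen, next_hop)

    def lookup(self, address):
        """Return (next_hop, memory_accesses)."""
        addr = int(ipaddress.IPv4Address(address))
        index = addr >> 8
        accesses = 1                         # the first stage is always read
        entry = self.stage1.get(index)
        block = self.stage2.get(index)
        if block is not None:
            accesses += 1                    # longer prefixes exist under this /24
            if block[addr & 0xFF] is not None:
                entry = block[addr & 0xFF]
        return (entry[1] if entry else None), accesses


table = TwoStageTable()
table.insert("10.0.0.0/8", "A")
table.insert("10.1.2.128/25", "B")
print(table.lookup("10.9.9.9"))     # resolved in the first stage
print(table.lookup("10.1.2.200"))   # needs the second stage
```

Expanding short prefixes trades memory for a bounded number of accesses; in this model a default route would occupy all 16 million first-stage slots, which is why the example avoids inserting one.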
IPv6 is a little bit of an offender here. 64 or 48 bit lookups are much less friendly to the memory. Typically that is done in two stages. And that adds quite a lot of complexity and also, at the same time, reduces the actual lookup performance by at least a factor of two.
Different types of memories are used for different purposes. The fact that you might have one million routing entries absolutely does not mean that you might have one million forwarding adjacencies. That information comes with very different requirements and is typically stored in very different memory components, different memory types, and that is one of the reasons you have different positioning of different routing platforms. Some of them might have a lot of interfaces with quite a low number of prefixes supported. Some of them might have, say, on the order of hundreds of interfaces but with a large lookup scale.
And this is an illustrative example of what the actual forwarding of a packet looks like. The context of this example is a VRF. You have a VRF where a packet comes in, gets forwarded, and the statistics get updated. So initially, you need to find out what the forwarding context is; that means looking up the data plane encapsulation, the Ethernet tag, maybe something else which you are using, and that is trying to match in ‑‑ what is called a content addressable memory. You put the address on the input and you get the actual index where the information related to that entry resides. Then you do another lookup in another memory to find out what context that packet belongs to. Then you do another lookup in the actual forwarding table to find out where to send that packet, and then you do another write to another memory to update the statistics counters.
Now with the optimised and well optimised pipeline design, that can be done in a small amount of cycles. It is not necessarily sequential, many of these things can be done in parallel. With a software implementation of course we are talking about the sequential accesses to different memories, and as a result, we use available memory bandwidth for different operations.
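The four memory accesses of the VRF example can be sketched as a toy software model. The class `Forwarder`, its table layouts and the addresses are hypothetical; a dictionary stands in for the content addressable memory, and the longest-prefix match is counted as a single access for simplicity.

```python
import ipaddress


class Forwarder:
    """Toy model of the lookup sequence: CAM, context, forwarding table, counters."""

    def __init__(self):
        self.cam = {}        # (port, vlan) -> index, models the content addressable memory
        self.context = {}    # index -> VRF name
        self.fib = {}        # VRF name -> {IPv4Network: next hop}
        self.counters = {}   # (VRF name, IPv4Network) -> packet count
        self.memory_accesses = 0

    def forward(self, port, vlan, destination):
        index = self.cam[(port, vlan)]          # 1: match the encapsulation in the CAM
        self.memory_accesses += 1
        vrf = self.context[index]               # 2: find the forwarding context
        self.memory_accesses += 1
        address = ipaddress.ip_address(destination)
        matches = [net for net in self.fib[vrf] if address in net]
        self.memory_accesses += 1               # 3: forwarding table lookup
        if not matches:
            return None
        best = max(matches, key=lambda net: net.prefixlen)
        key = (vrf, best)
        self.counters[key] = self.counters.get(key, 0) + 1   # 4: statistics write
        self.memory_accesses += 1
        return self.fib[vrf][best]


fwd = Forwarder()
fwd.cam[(1, 100)] = 7
fwd.context[7] = "customer-a"
fwd.fib["customer-a"] = {
    ipaddress.ip_network("10.0.0.0/8"): "192.0.2.1",
    ipaddress.ip_network("10.1.0.0/16"): "192.0.2.2",
}
next_hop = fwd.forward(port=1, vlan=100, destination="10.1.2.3")
print(next_hop, fwd.memory_accesses)
```

In this software model the four accesses happen strictly one after another and share one memory; a pipeline with separate memories for each table can overlap them across packets, and the statistics write need not delay the forwarding decision.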
So, another down‑to‑earth aspect is the electrical interface. The more different memory types you have, the more different interfaces you have. All of those need somehow to be connected to your forwarding component. And that means increasing the number of electrical signals. Today, several thousand pins is feasible; going beyond that you start to trade functionality against the overall cost of the solution, and that is one of the limiting factors.
A word about NPUs: you might have heard the term, and this is basically what we have been talking about previously. An NPU can range from being a generic specialised processor which is good at doing certain lookup operations, optimised for data transfers of certain sizes, to completely inflexible non‑programmable pipelines, or it can be something in between. That's just a generic term. However, that term is really highly abused in the industry. Quite often there is a misconception that if we have a slow router and we put an NPU there, that will become a fast router. Only in the spherical router, not necessarily in reality.
So with this, an open discussion. This was a quick attempt; probably there is nothing new for you here, mainly it's a little bit systematised and looked at from a slightly different engineering perspective, not from the operational side. Your thoughts, your feedback about this: do you think that this is of any value? Was this something new, or something that you can, say, use in your daily life? What do you think?
AUDIENCE SPEAKER: Benedikt Stockebrand. I haven't done any work on this for, pretty much never really, but I got a bit of background on the electronic stuff. As I understand, it is possible and there is a manufacturer currently producing these chips called Barefoot, the Barefoot chips, which kind of use, kind of FPGA kind of programmable chip inside the interface, if I understand correctly. Wouldn't that mean that we can actually move some of the complexity right into the chips there and still maintain some level of programming capability that sorts it out or is it still another limited approach that doesn't work in reality?
IGNAS BAGDONAS: First thing, FPGA, that's a lookup table. There might be different flavours, some of them might be more specialised, some of them slightly less; for example some might have dedicated logical interfaces. But the core of an FPGA is a lookup table. That means if you can implement what you want by a set of lookup tables, possibly implementing a sequential logical ‑‑ it would not make any practical sense. If you can implement what you want based on a lookup table, yes, it can help you to some extent. An FPGA is not a solution to all of your problems, just for the same reason. An FPGA is mostly generically programmable hardware. It's not necessarily generically programmable software, but it's a generically programmable hardware component. That means you may implement some functionality which will become fast, but that will be limited in complexity, or you will end up using all of your FPGA resources only for some simple but fast functionality. If you want to implement something, say, more flexible, you'll come down to the same trade‑offs. This is basically a programmable thing. Right. It needs to do, well, something comes in, you need to instruct it: if this then that, if this, then that, right. Therefore, you can ‑‑ so, to answer shortly. If you want to ‑‑ if you want to use FPGAs to offload rather simple tasks and offload them in an efficient way, yes. If you want to use them for offloading rather complex tasks, you can still do that but the performance will not be as good, just because you will end up implementing a generic programmable processor on that FPGA, which is by definition slower than a purpose‑built programmable processor.
So if we are talking about the practical implementations of this, BFD is one of the protocols which can be used in such scenarios, and yes, there are practical platforms which can assist in terminating BFD sessions without actually getting them to the line card or control plane processor. Yes, the systems do exist, and they work, they do their part. However, if you want to terminate an IP tunnel on an FPGA, theoretically you can, but that's probably not the best idea.
BENEDIKT STOCKEBRAND: I haven't got hold of these things. I just heard somebody explaining how these work, so they are kind of specialised FPGAs with building blocks, whatever you call them, designed for networking. But the real ‑‑ I think you answered the most important question, which is how much flexibility do I gain with these? And that's still sort of limited, if I understand correctly, compared to a full software implementation.
IGNAS BAGDONAS: Compared to a full software implementation, yes. You need to be specific about what kind of flexibility you are looking for. If you want to terminate a simple yet chatty protocol, yes, you can do that. If you want to implement ‑‑ to terminate a complex protocol, probably you can do that, but the question is whether that makes practical sense. Basically, it's a trade‑off.
BENEDIKT STOCKEBRAND: Okay. So, IP as such, if you only talk Layer3 or Layer2 and Layer3, it's ‑‑ well, slightly less complex than HTTP, so, of course at some point you have to deal with complexity but then there is a lot of complexity is only taken care of on the end devices not on the routers in between. So ‑‑ I think it's still not a question how far you can get. Okay, but anyway. Thank you.
AUDIENCE SPEAKER: So before I ask, or before I make the suggestion I was going to make, let me go back to the Barefoot Networks and P4 question that you asked a moment ago. So, P4 and Barefoot. It's not an FPGA, it technically is an ASIC, you could almost call it a Desic, it actually is a programmable ASIC, which is why they are cost competitive with Broadcom for the majority of their stuff, like with the Tomahawk series Broadcoms and stuff. I won't go into it a lot more here; my colleague Aaron Glynn has given a talk at the previous RIPE meeting, you can look that up if you want. And it's almost out of scope for this, because it's a much deeper dive, and also it's a whole new ball of wax, it's pretty incredible. I would say the main thing about it is that it uses the P4 language, which is a domain‑specific language, like you would have OpenCL to programme a GPU or so. P4 is a domain‑specific language specifically designed for packet processing; it looks a lot like maybe a firewall config or a matching language, this sort of thing. And it's kind of gorgeous actually; if you want to learn more about it, look at P4.org. So that being said, what I was going to suggest is I think something that ‑‑ thank you for doing this, first of all.
What I think would be really interesting as a subtopic of discussion for this is the trade‑off between sort of individual cheap data‑centre‑ish boxes, 1U, 2U type machines, which are not fancy, and the sort of more traditional router architectures that ISPs are used to, which are the big chassis type things with all of these fun components. My opinion on that is that it's kind of a trade‑off: do you want to pay your vendor to handle the complexity of dealing with lots of packet forwarding engines and fabrics and stuff, or do you want to just do this with 100 gigabit Ethernet yourself and deal with your own fabric oversubscription ratios within your data centres, your metro, your POP, whatever? And yes, that is a statement that I would be willing to volunteer for, like a ten‑minute lightning talk on the trade‑offs between chassis and 1U boxes and stuff, because I think they both have their place.
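The remark that P4 "looks a lot like a firewall config or a matching language" can be illustrated with a small Python sketch of the match-action idea. This is not P4 syntax and not how any particular switch is programmed; the class `MatchActionTable`, the actions `forward` and `drop`, and the field names are all hypothetical, and only exact matching is shown.

```python
class MatchActionTable:
    """Exact-match table: selected header fields form a key, the key selects an action."""

    def __init__(self, key_fields, default_action):
        self.key_fields = tuple(key_fields)
        self.default_action = default_action
        self.entries = {}

    def add_entry(self, key, action):
        self.entries[tuple(key)] = action

    def apply(self, packet):
        key = tuple(packet.get(field) for field in self.key_fields)
        action = self.entries.get(key, self.default_action)
        return action(packet)


def forward(port):
    """Action factory: returns an action that sets the egress port."""
    def action(packet):
        return {**packet, "egress_port": port}
    return action


def drop(packet):
    """Action: mark the packet as dropped."""
    return {**packet, "dropped": True}


def no_op(packet):
    """Action: leave the packet unchanged."""
    return packet


# Stage 1: a firewall-like filter keyed on protocol and destination port.
acl = MatchActionTable(["ip_proto", "dst_port"], default_action=no_op)
acl.add_entry((6, 23), drop)  # block TCP port 23

# Stage 2: a forwarding table keyed on destination MAC; unknown destinations are dropped.
l2 = MatchActionTable(["dst_mac"], default_action=drop)
l2.add_entry(("aa:bb:cc:00:00:01",), forward(1))


def pipeline(packet):
    """Run the packet through the tables in a fixed order."""
    packet = acl.apply(packet)
    if packet.get("dropped"):
        return packet
    return l2.apply(packet)


if __name__ == "__main__":
    print(pipeline({"dst_mac": "aa:bb:cc:00:00:01", "ip_proto": 6, "dst_port": 443}))
```

The programmable part is which fields form the key and which actions exist; the table contents are filled in at run time, much like rules in a firewall.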
IGNAS BAGDONAS: Right. Let me respond to both of these. So, first about the ASICs. This is only my view of things, and I see that as a highly, highly abused term: FPGAs and ASICs. Your favourite Xeon processor is an ASIC and, well, your 1 megabyte of static memory, 60 nanoseconds, is an ASIC; that's just a term. The engines on which P4 is run are specific switches which are able to interpret the compiled P4 byte code. Right. So that's a programmable processor. The fact that it uses a different byte code, or, well, whatever the representation it is running on, does not change that. The equivalent holds for FPGAs: you need the bit stream in order for an FPGA to be useful. That's another form of byte code which gets interpreted. It's just that it is not interpreted at run time but before run time, right. Whereas your P4 forwarding component will interpret that at run time. It might be optimised in how it does something, it might have specific hard‑coded components in it. Those are completely implementation details. So that's basically my overall comment about the term ASIC: it is completely abused, and maybe it would be better to call things by their actual names.
Now, about the specific ‑‑ say, different types of platforms used for different purposes, and why one router is big and power hungry and another one, which is 20 times more performant, fits into 1 rack unit and works perfectly fine. I would see two aspects to that. One is vendor marketing. If the only thing that you have is one single component and you want to cover different market segments, you probably have no other option than to try to use that component to cover different market segments, and therefore you end up with a platform which initially was not positioned for that particular market segment. And that probably has less to do with the actual engineering than purely with the business aspects of, well, a particular vendor.
Now, about the technical aspects of why there are different platforms and why they have different operational problems. Certainly that's a fact. You try to optimise certain things for certain functions. If your adjacency table has a few hundred, well, possible entries, that's one thing. If you are trying to build a box which has many more than those, you certainly engineer that for a different capacity.
So, yes, there are technical trade‑offs. And in my view those probably will be two different lines. One is a purely business question of how you address a specific market, how efficiently your platform can address that specific market. And again, there is no one single answer. You might subsidise specific verticals by having much better margins in other verticals, even with the same product. So, in my view, that's much more of a business question than a technical question. However, the education in this field, if I may use the word education, I certainly see that this might be needed, so basically, this is a network design question. The scalability, the rate of change, the state accumulation points in the network, if you grow from this to that, what will happen, and topics like that. And this is needed in order to make the, say, justified decision of whether you choose this platform or the other platform.
So, I heard you say you are volunteering, right.
BLAKE: Yeah, I'd be prepared to put something together at the next RIPE meeting, for 10 or 15 minutes, on architecture: where do you really need a chassis or a 1U box, and what does the grey area in between look like.
IGNAS BAGDONAS: Right, any other comments? Any other questions?
Right. If you don't have any, then I have a few.
So, again, this was an experiment. And what do you think in general, was this at least remotely useful? If this were done again on a different topic, would you come again? If you have a list of topics, what would you like to see? So, from my side, and well, I'm an invader from the Routing Working Group talking to the IPv6 people here, so... but from my side, I certainly see a lack in the industry of discussions, or, in general, of understanding about the architectural aspects of network design decoupled from vendor marketing, decoupled from specific platform and product details: how do you design certain things? What are the implications, how can that work, how can that not work? What are the risks, what will break? How does that scale, or, in general, how do you design ‑‑ for example, how do you design a network for a data centre? And yes, there are different types of data centres, but the overall constructs are quite simple: you use an overlay, you do this; you don't use an overlay, you do that; you use something else, you do that. The amount of, say, degrees of freedom in a design is not that large. However, there are many components, and that's what makes the overall system complex.
So, if we try to look at some potential examples of these deep dives: for example, the BGP protocol mechanics and how that works. How L3VPN works, again from a protocol mechanics perspective, not necessarily how you implement it on your favourite vendor, but what is happening there.
How the DNS system works. How, for example, HTTP‑based applications work in real life, the different layers, how they interact and so on. What do you think, is this of potential interest? Would you be willing to participate in such discussions? Do you see value in these types of events? Would you be willing to present on those topics? And what, in general, do you think about this experiment, both in the format and the concept of it? Was the topic of any interest, of any value? And shall we try to experiment with that further? If yes, where? Is that fitting for RIPE? Is that just completely out? Shall we go to some other forum, or just, well, abandon that and let your favourite vendor marketing department come to you weekly and brainwash you?
So the stage is yours. Please, I really, really would like to see bi‑directional communication; I have spread the propaganda here for an hour. You have had enough. I see somebody wearing a fancy BGP shirt, so, yes, are you volunteering for the next session on BGP?
AUDIENCE SPEAKER: Yes. I just wanted to say that ‑‑ my name is Ernesto and I work for LinkedIn, and I just wanted to say that I think it's a great idea, the technology deep dives idea.
With regards to the topics, and just speaking from my experience interviewing a lot of engineers for web‑scale type jobs, I concur with you that there is a big lack of understanding, even with some seasoned engineers, in regards to what you would think would be very basic topics. Like, how does BGP work, for example? A router receives a BGP route; how does that route end up in, you know, your forwarding table? Like, what happens before that happens? Like, how does TCP react to congestion, what are all the different elements that are involved in that? Basic packet forwarding as well.
So, I think that there is a lot of value in these technology deep dives. If there is an opportunity to take part, I will volunteer to help out with these efforts. Thank you.
AUDIENCE SPEAKER: Benno Overeinder, RIPE PC in this role. I really like the presentation, and also, if you just mention ‑‑ again, I think the basic principles of BGP: at the very beginning of the RIPE meeting, we have tutorial sessions in the morning from 9 to 11, and we'd be more than happy to receive more and more submissions, tutorial submissions on these topics. So it's a call‑out to everyone here in the room: if there is more need for these types of deep dives in technology, please also submit tutorials.
IGNAS BAGDONAS: One of the reasons why this went through the back door and not through the official PC path ‑‑ I know that you are the Chair of the PC ‑‑ is that this topic is rather specialised. And the whole reason why this topic is as it is, is that it was presented at another organisation, the IETF, on the particular topic of how IPv6 interacts with the router. Well, that's basically the pre‑history of this, why this was needed. And therefore, routing plus IPv6 was a natural choice for doing this experiment. I would believe that this is a little bit too specialised a topic for the general audience of RIPE.
BENNO OVEREINDER: Look at the room. I agree, I am fine with everything. The more the better. But I just want to open the floor to all of you: there are also morning slots, two hours on Monday morning, to present even more. So, it's not a critique, definitely not, I just want to make sure that there is room in the RIPE meeting for this kind of ‑‑ also more in‑depth presentation. So it's not a critique, not at all, I just want a thumbs up, and if people really want more, there is room. Submit tutorials to the RIPE PC and we will evaluate them, and deep dive technical presentations are also more than welcome. I want to give the word to Francisco.
AUDIENCE SPEAKER: Wolfgang. I think the topic was interesting and we should have technology deep dives again. However, I must admit I had a hard time not falling asleep. And I think that's not because it wasn't interesting, but, first, it's a RIPE meeting, we are already past dinner and on the third day, or fourth, and on the other hand, the lecture style of presenting the initial information is, I think, not the best possible way to get information across and keep people interested and following the programme.
So my suggestion would be, if you are going to do something like that again, don't go for a lecture style. But do something like, well, two people on stage doing questions, answers, examples thrown in, something like that. Okay.
AUDIENCE SPEAKER: Frances. Yes to what Benno and Wolfgang said, but to give you a little bit of what I just experienced ‑‑ and remember, I'm not an engineer, I am a measurement researcher, which means I have a lot of deep knowledge in a very particular field, but I am interested and I am seeking knowledge about all the other fields that are around here. So what I do all the time is I go around and I talk to people, and I find that knowledge can be rather compartmentalised. There are very, very many specialists around here and sometimes it's even hard to bridge the knowledge between these people, and I try to understand all the stuff that they are telling me. What you just did was a very good example of a thing that is very technical, but you presented it in a way that, with the patchwork knowledge that I have gathered over the years, I could easily follow. And so I think this could be a good way to bridge the different knowledge islands that we have, or that I see that we have in this community, so that we actually go out, or leave our, like, topical comfort zones, and may be able to benefit from the things we all know and be able to talk about that.
AUDIENCE SPEAKER: Spencer: Thank you for bringing this material here, and doing this. One thing I would encourage people to think about, as they are thinking about this deep dive and other possible deep dives, is: if you trip over things in the way the network works now that are not working for you, please say something.
One of the conversations we had on spherical routers in the IETF in March was about extension headers, and that seemed, from Geoff Huston's presentation this morning on IPv6, to be even darker than we had any idea of in the IETF. We really appreciate feedback, people telling us that we need to do something differently or that something that should be working isn't working. Thank you all for that.
AUDIENCE SPEAKER: Tale from Oracle, and I confess when I saw Benno get up I thought he was going to be addressing the question of "I'll volunteer for doing a DNS one". So along those lines, if there were an interest in doing DNS, I'm happy to, you know, put a presentation together, perhaps with, you know, many of the other fine DNS experts in the community here, to do a dialogue‑style presentation or something, to help people kind of understand the operational needs of the DNS on the networks.
IGNAS BAGDONAS: I will pick up on the last part, the dialogue style, and, well, I didn't really want it, but, however, I think that the after‑lunch laziness was slightly more powerful than I wished, right. On a serious note, it's not me trying to spread propaganda here. I think the community acknowledges that there are problems in this field, but in order to address those problems a bi‑directional communication is needed, with you saying: "You at the IETF, you are doing science fiction there, can you please address this and that and that; we need this problem to be solved, and not necessarily in that way; we need a solution in this particular way and not what you are thinking about." It's a little bit too late to discuss those things once they are baked into silicon and shipping, so therefore the timeliness of these discussions is also important.
In the real world, all of this room should go to the IETF and say loudly that we have problems and please fix those. Again, this will not happen, and for very good practical reasons. Therefore something like a bi‑directional communication channel between, say, different operational groups and the IETF is needed, and this is just one attempt in this direction, to try to see whether something can be done. It's not just me spreading the propaganda, it's me trying to explain some problematic areas and trying to gather the feedback from you. So the more feedback you provide, the more that feedback can be aggregated and fed back to the standards organisations.
JEN LINKOVA: First of all, I'd like to mention that bi‑directional communication exists, it's called a mailing list. I know that people are not ‑‑ but what I was thinking of when I was listening to this the second time, the first time in March and now, is that I think we have a kind of feedback loop. I remember there were things which were not possible five years ago, and I was told, no, you can never implement it in hardware, and five years later, oh, surprise, here it is. So I think ‑‑ I'm not sure that we should say, oh, we cannot ever do this. If there is enough demand for doing this, it probably could be done, but at a cost. But technology is evolving, right. And I actually walked up because you mentioned extension headers.
So, my understanding is that we have fewer technological problems with extension headers but more operational problems. Right. So, people filter them, while a significant part of hardware could deal with that length of extension header chain, so I don't think we can blame technology or design for this. It's the same reason why people filter TCP 53 to their DNS servers. Why? Nobody knows, right? I think it's the same with extension headers: because nobody asked them to do otherwise, or because they did not know they needed to do otherwise, and it comes back to incentives for them.
IGNAS BAGDONAS: So, the comment that what was not feasible some time ago is feasible today, yes, that's a very valid one. However, you should not underestimate the standards developers either. If a few extension headers today are, say, a norm on the side of an exception, 27 extension headers might be the reality after some time, in IPv6 or something, and that is to an extent a certain rate, right. Hardware capabilities probably come with some lag after the demand shows up. However, that ‑‑ in general, I agree with you. However, I cannot agree in the absolute sense, in the sense that there is a time gap between when the user community wants to deploy something and when that is generally available.
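To illustrate why the length of the extension header chain matters to a forwarding engine, here is a minimal Python sketch that walks the chain to find where the upper-layer header starts. It assumes the header layouts from RFC 8200 and handles only the Hop-by-Hop, Routing, Destination Options and Fragment headers; the function names `walk_extension_headers` and `build_sample` are hypothetical, and the sample packets are synthetic.

```python
import struct

IPV6_FIXED_LEN = 40
# Headers whose length is (Hdr Ext Len + 1) * 8 bytes: Hop-by-Hop, Routing, Destination Options.
VARIABLE_LEN_EXT = {0, 43, 60}
FRAGMENT = 44  # the Fragment header is always 8 bytes
TCP = 6


def walk_extension_headers(packet):
    """Return (upper_layer_protocol, offset_of_upper_layer_header, extension_header_count)."""
    if len(packet) < IPV6_FIXED_LEN:
        raise ValueError("truncated IPv6 header")
    next_header = packet[6]  # Next Header field of the fixed header
    offset = IPV6_FIXED_LEN
    count = 0
    while next_header in VARIABLE_LEN_EXT or next_header == FRAGMENT:
        if len(packet) < offset + 8:
            raise ValueError("truncated extension header")
        following = packet[offset]  # first byte of every handled header is Next Header
        length = 8 if next_header == FRAGMENT else (packet[offset + 1] + 1) * 8
        next_header = following
        offset += length
        count += 1
    return next_header, offset, count


def build_sample(num_dest_opts):
    """Build a synthetic IPv6 packet with a chain of 8-byte Destination Options headers."""
    ext = b""
    for i in range(num_dest_opts):
        nxt = 60 if i < num_dest_opts - 1 else TCP
        # Next Header, Hdr Ext Len = 0, then a PadN option (type 1, length 4) as filler.
        ext += bytes([nxt, 0, 1, 4, 0, 0, 0, 0])
    payload = ext + bytes(20)  # 20 zero bytes standing in for a TCP header
    first = 60 if num_dest_opts else TCP
    fixed = struct.pack("!IHBB", 0x60000000, len(payload), first, 64) + bytes(32)
    return fixed + payload


if __name__ == "__main__":
    for n in (0, 3, 27):
        print(n, walk_extension_headers(build_sample(n)))
```

Each header must be read to learn where the next one starts, so the number of sequential lookups and the depth at which the transport header sits both grow with the chain. With 27 minimal headers the TCP header in this sketch starts at byte 256, which may be beyond what a fixed-size parse window can reach.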
JEN LINKOVA: That's very true. Even for software work, right? I have had a request sitting for five years in the vendor queue before I could get it deployed in the network, and another year‑and‑a‑half of doing qualification testing, so that's true, and that's why I think the earlier the dialogue starts here, the better. I remember when I couldn't put IPv6 into MPLS because, oh my God, your TCP flags in the packet are now too far away, too deep in the packet. Now you can do this. So, I think ‑‑ it should be like feedback a little bit ‑‑ there is an operational problem, there is a protocol and the protocol is supposed to solve it, and hardware vendors might say, oh, no, it's too hard, can you revisit it. But the problem is still there. I do not see many people developing protocols and going to the IETF just because they have a crazy idea and they just like it. Quite often there is an operational reality behind it.
IGNAS BAGDONAS: Well some do.
JEN LINKOVA: Okay. Maybe, yeah, but I do not think there will be people willing to pay hardware vendors to do this. Normally, when the whole discussion starts about "oh, I want this in hardware" and "cannot do this", there are people who are willing to get it in hardware and get it deployed.
IGNAS BAGDONAS: Again, this is a cycle. You get a new wish list, right: I want this performing in hardware, absolutely, right. If there is a big critical mass, vendors will start to listen. If you are alone, nobody will pay attention to you. However, if that critical mass develops a request for functionality which is not necessarily the right one, it might happen that the vendors will happily implement that because they see a market there.
JEN LINKOVA: I am a bit concerned about saying "the right one". What do you mean by the right functionality? What's right for one operator might not be ‑‑ you know, I heard operators saying: what are you doing with all this QUIC thing, your encrypted protocol, how are we going to block that stuff, because we cannot see what's inside? So obviously encryption is not the right thing for some people, while other people are actually fighting to get it deployed. So there is not just one right thing to do. It depends who you ask.
IGNAS BAGDONAS: True. So what I mean by that is that the right things are those which do not break the remainder of the network infrastructure. And, well, we have examples of attempts to build things in a way which breaks the infrastructure. And if the vendor sees a market for that, they will follow the market. I would say that's a discussion for another day.
Any other comments? Any feedback?
Do you think that it was good investment of your 90 minutes or how many minutes did we spend here? Yes. Excellent.
JEN LINKOVA: It's not about extension headers. I think maybe you want to send a survey to people, asking for feedback and suggested topics or something, so we might hear from people who are remote or still reading e‑mails and so on.
IGNAS BAGDONAS: I think the regular survey of the session will be there automatically, the feedback on the content and all the other regular things. And, yeah, definitely, if you are listening to this remotely, please express your feedback. Probably that would be good on the IPv6 and the Routing mailing lists, or any mailing list at RIPE that you think is appropriate; you can send personal e‑mails to the Chairs, to anyone who you think would be the right contact for this.
So, I think I will then very selfishly use those 15 minutes to compensate for the overrun of the previous routing session. We overran that by 14 minutes, so that's one minute for you to spare then. All right. Colleagues, thank you. Thank you for attending. I think that's it from my side. Again, if you have anything to discuss or comment on, please do.
LIVE CAPTIONING BY
MARY McKEON, RMR, CRR, CBC