[RAM] ITR/ETR functions in hosts, NATs & servers - not in routers?
Robin Whittle <rw@firstpr.com.au> Wed, 04 July 2007 13:50 UTC
Return-path: <ram-bounces@iab.org>
Received: from [127.0.0.1] (helo=stiedprmman1.va.neustar.com) by megatron.ietf.org with esmtp (Exim 4.43) id 1I65Fg-0007fa-26; Wed, 04 Jul 2007 09:50:40 -0400
Received: from [10.91.34.44] (helo=ietf-mx.ietf.org) by megatron.ietf.org with esmtp (Exim 4.43) id 1I65Ff-0007fS-LC for ram@iab.org; Wed, 04 Jul 2007 09:50:39 -0400
Received: from gair.firstpr.com.au ([150.101.162.123]) by ietf-mx.ietf.org with esmtp (Exim 4.43) id 1I65FZ-0005HV-4l for ram@iab.org; Wed, 04 Jul 2007 09:50:39 -0400
Received: from [10.0.0.8] (zita.firstpr.com.au [10.0.0.8]) by gair.firstpr.com.au (Postfix) with ESMTP id AF45D59DA1; Wed, 4 Jul 2007 23:50:31 +1000 (EST)
Message-ID: <468BA59B.2060404@firstpr.com.au>
Date: Wed, 04 Jul 2007 23:50:19 +1000
From: Robin Whittle <rw@firstpr.com.au>
Organization: First Principles
User-Agent: Thunderbird 2.0.0.4 (Windows/20070604)
MIME-Version: 1.0
To: ram@iab.org
References: <20070703142426.AB11D872D8@mercury.lcs.mit.edu>
In-Reply-To: <20070703142426.AB11D872D8@mercury.lcs.mit.edu>
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: 7bit
X-Spam-Score: 0.0 (/)
X-Scan-Signature: e3901bdd61b234d82da85cc76f05a7e8
Cc: Christian Vogt <christian.vogt@nomadiclab.com>, Noel Chiappa <jnc@mercury.lcs.mit.edu>
Subject: [RAM] ITR/ETR functions in hosts, NATs & servers - not in routers?
X-BeenThere: ram@iab.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Routing and Addressing Mailing List <ram.iab.org>
List-Unsubscribe: <https://www1.ietf.org/mailman/listinfo/ram>, <mailto:ram-request@iab.org?subject=unsubscribe>
List-Archive: <http://www1.ietf.org/pipermail/ram>
List-Post: <mailto:ram@iab.org>
List-Help: <mailto:ram-request@iab.org?subject=help>
List-Subscribe: <https://www1.ietf.org/mailman/listinfo/ram>, <mailto:ram-request@iab.org?subject=subscribe>
Errors-To: ram-bounces@iab.org
I assumed that LISP/Ivip would always involve expensive routers with large central memories and fast ASIC-based FIBs, but now I can imagine a way the whole thing might be done, in principle, in the longer term, without any routers at all. I am not suggesting routers won't in practice be needed - just that once something like this becomes a normal part of the Internet, most of the ITR and ETR work can probably be done without routers.

To support unaltered hosts, I think caching "pull" ITR routers will be required to handle most of their outgoing packets. To support unaltered hosts in unaltered provider or AS-end-user edge networks, I think some high-volume push ITR routers will be needed in the core (not inside, or at the border of, provider or AS-end-user edge networks). But see point 8 for how such a fast full-database "push" ITRD might be made with a bunch of servers.

With 8 gigs of RAM, even a single server running full-database "push" ITRD code could use just two memory cycles to look up the mapping for individual IPv4 addresses when 1 to 1.5 billion addresses are handled by LISP/Ivip - so maybe this is the best way to do it, forgetting about caching ITRCs.

Short version:

1  "Notify" (AKA "cache-invalidation"?) as an addition to a pull system.

2  Using a first line of pull ITRs which let through the small number of packets they haven't got mapping for, so these "novel" packets are handled instantly by a full-database ITR. Caching ITRs are going to be a lot cheaper than high-bandwidth pull ITRs, so this would save money without allowing any packets to be delayed.

3  A caching ITR function in sending hosts, so the network doesn't need a separate caching ITR router for those hosts' traffic. For hosts behind NAT, this function would be in the NAT router, not in the hosts themselves.

4  Then, if the pull ITR only handles a small load, maybe implement it in software running on a cheap PC box.
5  If all the hosts/NATs were upgraded to include a caching ITR in their operating system software, maybe there wouldn't need to be an actual router for caching ITR work at all.

6  ETR functions can easily be performed in a server, so there's no absolute need to have a router for this. Also, ETR functions can be done in the receiving host, provided the host has a local "care-of" address. A NAT device can perform ETR functions for all packets it receives for its hidden hosts.

7  Maybe a high-performance "core-ITR" can be made without routers by implementing a bunch of high-volume caching ITRCs and lower-volume push ITRDs on a bunch of Linux/BSD boxes.

8  So with modest upgrades to hosts (not behind NAT) and to NAT devices (DSL and cable modems etc.), maybe an entire provider or AS-end-user edge network can be made to do high-performance LISP/Ivip sending and receiving with a bunch of special servers and no LISP/Ivip functions in routers.

In "Re: the separation of ID/RLOC", Christian Vogt wrote:

CV > Fast provider fail-over requires a very agile mapping
CV > function, however. To me, it is not clear how either a pull
CV > or a push model, or a trade-off between the two, can support
CV > this in a scalable manner.

M > This is the key part of the mapping mechanism. I believe that
M > at the first step some prototype systems based on DNS (pull
M > model) and DHT (push model) or some hybrid model
M > respectively must be deployed and test their performance,
M > then decide which mechanism is a good one.

> Michael,
>
> you are certainly right. The point I am making is that, from a
> delay perspective, a pure pull model won't support efficient
> provider fail-over. A push model might; it depends on the
> actual mechanism. But still, for one edge network to update the
> mapping, it will have to push the update to a large set of
> ingress tunnel routers. This holds for LISP, and for Ivip at a
> later transition stage.
In my recent message "5 Database <--> ITR push, pull and notify" I propose a push system to get the full database, with second-by-second updates, out to large numbers (~100,000?) of ITRDs ("push" ITRs with the full Database) and to QSDs, which are Query Servers (not routers) which also have the full Database. Then I propose that ITRCs ("pull" ITRs which Cache the result of queries, and don't have the full database) query these QSDs either directly or via one or more levels of QSC (proxy Query Servers with Cache).

In addition to this basic "push" system, with locally available responses for "pull" queries, I also propose a "Notify" function - which might be similar to what Noel called "cache-invalidation". "Notify" means that when a QSD gets a real-time database update for a mapping which it has recently (within some caching time) been asked about, it sends a notification to whatever device sent the query. That device will be either an ITRC - which therefore receives the update within seconds of the database changing - or a QSC, which does the same thing to whichever device sent it the query. Within a fraction of a second of the QSD getting a mapping update, all the ITRCs which recently queried that mapping are notified of the change. I guess this sort of approach has been used in other settings.

The example I give assumes a fixed 10 minute caching time, which should make it relatively simple to implement the timers in the QSDs and QSCs. However the principle could be extended to work with variable caching times which are specified in each response.

In that message I also propose an Ingress Tunnel Function in Host (ITFH) - to encapsulate and tunnel just that host's outgoing packets, but not if the host is behind NAT. This would be like an ITRC, making queries to a QSD or QSC. It would be an operating system upgrade and have no real cost, whilst taking a great load off ITRCs.
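To make the "Notify" idea concrete, here is a minimal Python sketch under my own assumptions - the class, method and callback names are purely illustrative, not from any specification. The QSD remembers which device (ITRC or QSC) queried each mapping and when; if an update arrives within the caching window, it pushes a notification to each recent querier.

```python
import time

CACHE_TIME = 600  # the fixed 10-minute caching window assumed in the text

class QSD:
    """Toy full-database query server with the proposed Notify behaviour."""

    def __init__(self, database):
        self.database = database          # eid_prefix -> locator address
        self.recent_queriers = {}         # eid_prefix -> {querier: query_time}

    def handle_query(self, eid_prefix, querier):
        # Remember who asked, so a later update can be pushed back to them.
        self.recent_queriers.setdefault(eid_prefix, {})[querier] = time.time()
        return self.database.get(eid_prefix)

    def handle_update(self, eid_prefix, new_locator, notify):
        self.database[eid_prefix] = new_locator
        now = time.time()
        # Notify every ITRC/QSC that queried this mapping within CACHE_TIME;
        # a QSC would in turn repeat this toward its own recent queriers.
        for querier, asked_at in self.recent_queriers.get(eid_prefix, {}).items():
            if now - asked_at < CACHE_TIME:
                notify(querier, eid_prefix, new_locator)
```

A real QSD would also expire stale querier records; this sketch only filters them at update time.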
ITFH only works on hosts or NAT devices which have a normal BGP-reachable IP address or an address which is mapped by LISP/Ivip. It won't work on hosts behind NAT. It could be an operating system upgrade for hosts, and would be especially valuable in web servers and the like which are pumping out large numbers of packets. It would cost nothing financially and involve only a small CPU load (compared to the rest of the server's work) to implement the caching ITR function in the host itself.

ITFH could be implemented as a firmware upgrade for DSL and cable modem NAT routers. However, there really needs to be a standardised autodiscovery system for finding the one or more QSDs or QSCs to direct queries to. Also, ideally, there should be an autodiscovery system so the ITFH function knows one or more addresses of full push ITRDs to tunnel packets to when it hasn't got the mapping for them at that moment. The ITRCs and ITFH functions could have two or more upstream QSDs or QSCs to query.

In "1 Ivip ITR strategies, including in the host" I suggest having these two types of caching ITR (ITRC and ITFH) only make queries and do mapping for the main volume of packets, letting "novel" packets pass, whilst analysing traffic patterns to see which destination addresses show up in more than a few novel packets, so a query can be made about the mapping of each such address.

Those packets which the ITRC or ITFH don't currently have mapping for can either be forwarded normally, so the local routing system can forward them to a full-database ITRD - or the ITRC/ITFH could tunnel these packets explicitly to an ITRD. This way, there would be no delay in tunneling any packet - and the ITRD would only handle a generally low volume of "novel" packets the ITRCs and ITFHs haven't got mapping for yet.

The path lengths would be optimal or close to optimal for packets handled by a local ITRC. The path length would be perfectly optimal for all packets encapsulated by the ITFH in the host or NAT device.
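The let-novel-packets-through strategy can be sketched as follows. Everything here is my own illustration: the class name, the three-packet threshold, and the query callback are hypothetical, not part of any proposal text.

```python
NOVEL_THRESHOLD = 3  # assumption: query only after a few novel packets to one address

class ITFH:
    """Toy caching tunnel function in a sending host (or NAT device)."""

    def __init__(self, send_query):
        self.cache = {}         # destination address -> ETR locator
        self.novel_counts = {}  # destination address -> packets seen unmapped
        self.send_query = send_query

    def handle_outgoing(self, dst):
        locator = self.cache.get(dst)
        if locator is not None:
            return ("encapsulate", locator)   # tunnel straight to the ETR
        # No mapping yet: forward the packet normally so the routing system
        # delivers it to a full-database ITRD with no delay, and count it.
        self.novel_counts[dst] = self.novel_counts.get(dst, 0) + 1
        if self.novel_counts[dst] == NOVEL_THRESHOLD:
            self.send_query(dst)              # ask a QSD/QSC for the mapping
        return ("forward_normally", None)

    def handle_map_reply(self, dst, locator):
        self.cache[dst] = locator
        self.novel_counts.pop(dst, None)
```

The key property is that no packet ever waits on a query: unmapped packets go out immediately and are caught by a full-database ITRD, while the cache warms up for the addresses that actually carry traffic.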
The path length would be longer for those "novel" packets which need to go to a full ITRD. The ITRD could be in the same network (provider or AS-end-user edge network) or (with Ivip) could be outside the network. This would mean that each network either doesn't absolutely need a full push ITRD, since it is acceptable for the novel packets to use one outside the network, or that the full push ITRD the network does have will be handling a relatively low volume of packets.

If the volume of packets going to this one ITRD is low enough - or if there are enough of these ITRDs in the network so the load on them is low enough - then the ITRD function could be implemented in a standard PC with suitable software, saving the high expense of a router with its very large FIB requirements etc.

The full push ITRD needs a lot of memory for its copy of the database (~RIB) and its FIB data - and lots of CPU power to handle incoming updates, crunch them into the database, and then crunch the database and changes to it into the FIB data and changes to the FIB data.

A device like a dual-core 64 bit CPU Linux/BSD server with 8 gigabytes or more of RAM can also do relatively simple table-based lookups to classify packets. A billion individually mapped Ivip addresses for IPv4 would need 4 gigabytes of RAM, since all that is needed is a 32 bit address for where to tunnel each packet to. Ordinary motherboards have one or two gigabit Ethernet interfaces on board, the whole thing could boot from a USB stick or flash drive, and wouldn't need a hard drive.

Its capacity to receive, classify, encapsulate and send packets would be primarily limited by the CPU's ability to shovel packet data around and to perform the FIB packet classification process. This could be simpler than usual, because there is lots of RAM. Memory latency is likely to be the bottleneck, since ordinary motherboard memory systems, DDR RAM etc.
are not optimised for fast random access, but for slower access to a bunch of words which are pumped into the cache, even if the software only needs to read a byte. I guess memory latency is 40 to 60 nsec. It might take, say, 5 random memory accesses to classify a packet and find the address to tunnel it to. This is pessimistic, since with the 1 gig x 32 bit system described above it would take two accesses: one to find the starting point of the correct table for that master-prefix, and a second to read the 32 bit "LOC" address. So if the CPU was doing nothing more than this, it would be able to classify several million packets a second. There is plenty else to do, so the packet rate would be significantly less than this. I will continue as if the ITRD is a really onerous task, and as if an ITRC would generally handle more packets per second - but I am not so sure this is the case.

If there were a billion addresses in the Ivip system, then there is no need for a rule-based FIB function which is separate from the local copy of the master database. The updates would simply be applied to the appropriate 32 bit locations in each array, and the same arrays would be used for classifying packets. It could hardly be simpler.

If a single "master-subnet" such as 13.0.0.0/8 (assigned to Xerox in 1991 and currently unadvertised) were assigned to the Ivip system, it would be represented as an array of 16,777,216 32 bit locations. When an update arrives:

   13.12.11.10 to 13.12.11.22 are now mapped to 77.77.77.77

all that is needed is 13 writes to memory, which will be cached and written out in probably 100 nsec.

When a packet comes in addressed to 13.12.11.13, the software first finds, from an array indexed by the most significant 24 bits, whether this address is in an Ivip-mapped /24 of the address space. If it is, that array contains a 32 bit starting address for the array just mentioned above, and a starting address for that master-subnet.
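The update and lookup just described can be sketched in Python. For simplicity this handles just the one master-subnet, 13.0.0.0/8, as a full 2^24-entry flat array (64 MB here; ~4 GB would cover a billion addresses), omitting the top-level /24 index; the helper names are mine.

```python
import ipaddress
from array import array

# One "master-subnet", 13.0.0.0/8, as a flat array of 2**24 32-bit
# locator addresses, indexed by (destination - subnet base).
SUBNET_BASE = int(ipaddress.IPv4Address("13.0.0.0"))
mapping = array("I", [0]) * 2 ** 24   # 0 means "not Ivip-mapped"

def apply_update(first, last, locator):
    """Apply one mapping update: a run of consecutive 32-bit writes."""
    lo = int(ipaddress.IPv4Address(first)) - SUBNET_BASE
    hi = int(ipaddress.IPv4Address(last)) - SUBNET_BASE
    loc = int(ipaddress.IPv4Address(locator))
    for i in range(lo, hi + 1):
        mapping[i] = loc

def lookup(dst):
    """Classify a packet: subtract the subnet base, index, read the locator."""
    loc = mapping[int(ipaddress.IPv4Address(dst)) - SUBNET_BASE]
    return str(ipaddress.IPv4Address(loc)) if loc else None
```

The example update 13.12.11.10 to 13.12.11.22 -> 77.77.77.77 touches exactly 13 consecutive array entries, and a lookup is one subtraction plus one indexed read - which is why, in C on real hardware, the cost per packet is dominated by one or two random memory accesses.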
Then by subtracting the master-subnet's starting address, shifting the result left by two bits and adding it to the start address of this master-subnet's array, the CPU can read the mapping data.

With enough memory, this is so fast and simple that it may be better to simply do all the ITR functions like this - forgetting about ITRCs and their need for query servers. However, an ITFH function in a host can't devote 4 gigs of RAM and a lot of CPU power to running its own full-database ITRD. Doing most of the encapsulation in the sending host would be best, and that requires the ITFHs to be able to query QSDs or QSCs.

IPv6 ITRs wouldn't work so well with a table-based system, because there are so many bits in play to get to /64 granularity. They would need a recursive table lookup scheme, using three or four bits of the destination address at a time, until eventually finding a match, and then reading a 128 bit address to tunnel the packet to.

A full-database ITRD in a 64 bit, 8 gigabyte host, even with two gigabit Ethernet ports, would be slower than a full ITRD implementation in the FIB of a high-end router, but the whole thing would only cost USD$1000 or so. Maybe a single server with two gigabit Ethernet ports could have a high enough throughput to make the cluster idea below unnecessary. In that case, just have one or more such servers as peers with a stub connection to any conventional BGP or internal router where a full ITR function is needed - assuming the router could load-balance packets between multiple such servers.

Here are some ideas about creating a relatively high-speed ITR to handle raw packets from multiple unmodified networks, which a conventional transit router has fished out and sent on a gigabit Ethernet link. In Ivip, the router would advertise all the prefixes of the "master-subnets" and forward the packets it receives whose destination addresses match any of these prefixes to this gigabit link.
The router would get back from the link encapsulated packets, which it would forward to its peers for their trips to the ETRs all over the world.

Here I assume we need a bunch of caching ITRCs with a single ITRD to handle packets they don't have mapping information for. But perhaps it would be easier to make one server the replicator (see my recent message "5 Database <--> ITR push, pull and notify") for the rest of the servers, which are simply direct table-lookup ITRDs, as described above. Then there's no need for a QSD and the whole thing is a lot simpler. The router can presumably farm out packets to the various IP addresses of each of the ITRDs, or maybe one of them can do the farming out.

 BGP peers - border & transit routers     Gigabit stub replicator and
         \      |      /                  ITR cluster - 7 plain diskless
          \     |     /                   Linux/BSD servers
           |    |    |
        RRRRRRRRRRRRR
   \----R           R
        R  Conven-  R   1G Ethernet
   -----R  tional   R---------------\
        R  Transit  R   EEEEEEEEEEE |
   -----R  Router   R   E         E |
        R           R   E         E-/
   /----R           R   E    8    E    .....................
        RRRRRRRRRRRRR   E         E----. Replicator & QSD  .
          |    |    |   E    P    E    .....................
         /     |     \  E    o    E
        /      |      \ E    r    E    .....................
                        E    t    E----. ITRD push         .
                        E         E    .....................
                        E    S    E
                        E    w    E    .....................
                        E    i    E----. ITRC1 pull cache  .
                        E    t    E    .....................
                        E    c    E
                        E    h    E    .....................
                        E         E----. ITRC2 pull cache  .
                        E         E    .....................
                        E         E
                        E         E    .....................
                        E         E----. ITRC3 pull cache  .
                        E         E    .....................
                        E         E
                        E         E    .....................
                        E         E----. ITRC4 pull cache  .
                        E         E    .....................
                        E         E
                        E         E    .....................
                        E         E----. Spare             .
                        E         E    .....................
                        EEEEEEEEEEE

Some motherboards have two gigabit Ethernet ports, so the whole system could use two links where I show one, including perhaps to and from the router. The servers would be identical and boot from a USB stick or flash IDE/SATA drive. They would all have the same software and assume roles depending on what was needed.
They would probably have 8 (or in the future 16) gigs of RAM.

The router could distribute the raw packets to the five ITRCs, which would handle most of the packets pretty quickly. Their FIB code would be optimised for a relatively low number of rules and fast classification. Packets which don't match are forwarded to the ITRD, which has a full copy of the database and an FIB which classifies all packets to every possible prefix in the Ivip system, down to /32 for IPv4 or /64 for IPv6. This would involve a lot more RAM and slower classification, so the packet rate of the ITRD would be less than for each ITRC, but it could instantly classify, encapsulate and forward any packet, without needing to query the QSD query server, which the ITRCs would be very frequently querying.

The top server is that full-database query server and the replicator, receiving two or more streams of database update information and pumping out ten or more streams to other sites, as well as feeding a stream to its own QSD function and to the ITRD.

Probably a little server cluster is not what everyone wants in their data centre, but it would be a cheap alternative to buying a new router with special ITR functions.

  - Robin      http://www.firstpr.com.au/ip/ivip/

_______________________________________________
RAM mailing list
RAM@iab.org
https://www1.ietf.org/mailman/listinfo/ram