[RAM] ITR/ETR functions in hosts, NATs & servers - not in routers?

Robin Whittle <rw@firstpr.com.au> Wed, 04 July 2007 13:50 UTC

Return-path: <ram-bounces@iab.org>
Received: from [127.0.0.1] (helo=stiedprmman1.va.neustar.com) by megatron.ietf.org with esmtp (Exim 4.43) id 1I65Fg-0007fa-26; Wed, 04 Jul 2007 09:50:40 -0400
Received: from [10.91.34.44] (helo=ietf-mx.ietf.org) by megatron.ietf.org with esmtp (Exim 4.43) id 1I65Ff-0007fS-LC for ram@iab.org; Wed, 04 Jul 2007 09:50:39 -0400
Received: from gair.firstpr.com.au ([150.101.162.123]) by ietf-mx.ietf.org with esmtp (Exim 4.43) id 1I65FZ-0005HV-4l for ram@iab.org; Wed, 04 Jul 2007 09:50:39 -0400
Received: from [10.0.0.8] (zita.firstpr.com.au [10.0.0.8]) by gair.firstpr.com.au (Postfix) with ESMTP id AF45D59DA1; Wed, 4 Jul 2007 23:50:31 +1000 (EST)
Message-ID: <468BA59B.2060404@firstpr.com.au>
Date: Wed, 04 Jul 2007 23:50:19 +1000
From: Robin Whittle <rw@firstpr.com.au>
Organization: First Principles
User-Agent: Thunderbird 2.0.0.4 (Windows/20070604)
MIME-Version: 1.0
To: ram@iab.org
References: <20070703142426.AB11D872D8@mercury.lcs.mit.edu>
In-Reply-To: <20070703142426.AB11D872D8@mercury.lcs.mit.edu>
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: 7bit
X-Spam-Score: 0.0 (/)
X-Scan-Signature: e3901bdd61b234d82da85cc76f05a7e8
Cc: Christian Vogt <christian.vogt@nomadiclab.com>, Noel Chiappa <jnc@mercury.lcs.mit.edu>
Subject: [RAM] ITR/ETR functions in hosts, NATs & servers - not in routers?
X-BeenThere: ram@iab.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Routing and Addressing Mailing List <ram.iab.org>
List-Unsubscribe: <https://www1.ietf.org/mailman/listinfo/ram>, <mailto:ram-request@iab.org?subject=unsubscribe>
List-Archive: <http://www1.ietf.org/pipermail/ram>
List-Post: <mailto:ram@iab.org>
List-Help: <mailto:ram-request@iab.org?subject=help>
List-Subscribe: <https://www1.ietf.org/mailman/listinfo/ram>, <mailto:ram-request@iab.org?subject=subscribe>
Errors-To: ram-bounces@iab.org

I assumed that LISP/Ivip would always involve expensive routers
with large central memories and fast ASIC-based FIBs, but now I
can imagine a way the whole thing might be done, in principle, in
the longer term, without any routers at all.  I am not suggesting
routers won't in practice be needed - just that once something
like this becomes a normal part of the Internet, most of the ITR
and ETR work can probably be done without routers.

To support unaltered hosts, I think caching "pull" ITR routers
will be required to handle most of their outgoing packets.  To
support unaltered hosts in unaltered provider or AS-end-user edge
networks, I think some high volume push ITR routers will be needed
in the core (not inside, or at the border of, provider or
AS-end-user edge networks).  But see point 8 for how such a fast
full database "push" ITRD might be made with a bunch of servers.

With 8 gigs of RAM, even a single server running full database
"push" ITRD code could use just 2 memory cycles to look up the
mapping for individual IPv4 addresses when 1 to 1.5 billion
addresses are handled by LISP/Ivip - so maybe this is the best way
to do it, forgetting about caching ITRCs.

Short version:

1  "Notify" (AKA "cache-invalidation"?) as an addition to a pull
   system.

2  Using a first line of pull ITRs which let through the small
   number of packets they don't yet have mapping for, so these
   "novel" packets are handled instantly by a full database ITR.
   Caching ITRs are going to be a lot cheaper than high bandwidth
   pull ITRs, so this would save money without allowing any
   packets to be delayed.

3  A caching ITR function in sending hosts, so the host doesn't
   need a separate caching ITR router.  For hosts behind NAT, this
   function would go in the NAT router, not in the hidden hosts.

4  Then, if the pull ITR only handles a small load, maybe
   implement it in software running on a cheap PC box.

5  If all the hosts/NATs were upgraded to include a caching ITR
   in their operating system software, maybe there wouldn't need
   to be an actual router for caching ITR work at all.

6  ETR functions can easily be performed in a server, so
   there's no absolute need to have a router for this.  Also
   ETR functions can be done in the receiving host, provided the
   host has a local "care-of" address.  A NAT device can perform
   ETR functions for all packets it receives for its hidden hosts.

7  Maybe a high performance "core-ITR" can be made without
   routers by implementing a bunch of high-volume caching ITRCs
   and lower volume push ITRDs on a bunch of Linux/BSD boxes.

8  So with modest upgrades to hosts (not behind NAT) and to
   NAT devices (DSL and cable modems etc.), maybe an entire
   provider or AS-end-user edge network can be made to do
   high-performance LISP/Ivip sending and receiving with a
   bunch of special servers and no LISP/Ivip functions in
   router$.


In "Re: the separation of ID/RLOC", Christian Vogt wrote:

CV > Fast provider fail-over requires a very agile mapping
CV > function, however. To me, it is not clear how either a pull
CV > or a push model, or a trade-off between the two, can support
CV > this in a scalable manner.

M > This is the key part of the mapping mechanism. I believe that
M > at the first step some prototype systems based on DNS (pull
M > model) and DHT (Push model)  or some hybrid model
M > respectively must be deployed and test their performance,
M > then  decide which mechanism is a good one.
>
> Michael,
>
> you are certainly right.  The point I am making is that, from a
> delay perspective, a pure pull model won't support efficient
> provider fail-over.  A push model might; it depends on the
> actual mechanism.  But still, for one edge network to update the
> mapping, it will have to push the update to a large set of
> ingress tunnel routers.  This holds for LISP, and for Ivip at a
> later transition stage.

In my recent message "5 Database <--> ITR push, pull and notify" I
propose a push system to get the full database with
second-by-second updates out to large numbers (~100,000?) of ITRDs
("push" ITRs with the full Database) and to QSDs, which are Query
Servers (not routers) which also have the full Database.

Then I propose that ITRCs ("pull" ITRs which Cache the result of
queries, and don't have the full database) query these QSDs either
directly or via one or more levels of QSC (proxy Query Servers
with Cache).

In addition to this basic "push" system, with locally available
responses for "pull" queries, I also propose a "Notify" function
- which might be similar to what Noel called "cache-invalidation".

"Notify" means that when a QSD gets a real-time database update
for a mapping which it has recently (some caching time) been asked
about, it sends a notification to whatever device sent the query.
 That device will be either an ITRC - which therefore receives the
update within seconds of the database changing - or a QSC, which
does the same thing to whichever device sent it the query.  Within
a fraction of a second of the QSD getting a mapping update, all
the ITRCs which recently queried that mapping are notified of the
change.
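
To make the bookkeeping concrete, here is a rough sketch in C of
the state a QSD might keep per recently queried mapping - just an
illustration with made-up names and a fixed caching time, not
anything defined by Ivip:

  #include <stdint.h>
  #include <time.h>

  #define CACHE_SECONDS 600          /* the fixed 10 minute example  */
  #define MAX_QUERIERS   64          /* arbitrary bound for a sketch */

  struct queried_mapping {
      uint32_t prefix;                  /* mapping that was queried  */
      uint32_t querier[MAX_QUERIERS];   /* ITRC or QSC which queried */
      time_t   asked_at[MAX_QUERIERS];  /* when it queried           */
      int      n_queriers;
  };

  /* Placeholder - the transport for Notify messages is left open. */
  void send_notify(uint32_t querier, uint32_t prefix, uint32_t new_loc);

  /* Called when a real-time mapping update reaches the QSD. */
  void on_mapping_update(struct queried_mapping *m, uint32_t new_loc)
  {
      time_t now = time(NULL);
      for (int i = 0; i < m->n_queriers; i++)
          if (now - m->asked_at[i] <= CACHE_SECONDS)
              send_notify(m->querier[i], m->prefix, new_loc);
  }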

I guess this sort of approach has been used in other settings.

The example I give assumes a fixed 10 minute caching time, which
should make it relatively simple to implement the timers in the
QSDs and QSCs.  However the principle could be extended to work
with variable caching times which are specified in each response.

In that message I also propose an Ingress Tunnel Function in Host
(ITFH) - to encapsulate and tunnel just that host's outgoing
packets, but not if the host is behind NAT.  This would be like an
ITRC, making queries to a QSD or QSC.  It would be an operating
system upgrade and have no real cost, whilst taking a great load
off ITRCs.

ITFH only works on hosts or NAT devices which have a normal
BGP-reachable IP address or an address which is mapped by
LISP/Ivip.  It won't work on hosts behind NAT.  It could be an
operating system upgrade for hosts, and would be especially
valuable in web servers and the like which are pumping out large
numbers of packets.  It would cost nothing financially and involve
only a small CPU load (compared to the rest of the server stuff)
to implement the caching ITR function in the host itself.

ITFH could be implemented as a firmware upgrade for DSL and Cable
modem NAT routers.  However, there really needs to be a
standardised autodiscovery system for the one or more QSDs or QSCs
to direct queries to.  Also, ideally, there should be an
autodiscovery system so the ITFH function knows one or more
addresses of full push ITRDs to tunnel packets to when it doesn't
have the mapping for them at that moment.


The ITRCs and ITFH functions could have two or more upstream QSDs
or QSCs to query.

In "1 Ivip ITR strategies, including in the host" I suggest having
these two types of caching ITR (ITRC and ITFH) only make queries
and do mapping for the main volume of packets, letting "novel"
packets pass, whilst analysing traffic patterns to see which
destination addresses show up in more than a few novel packets, so
a query can be made about the mapping of that address.  Those
packets which the ITRC or ITFH don't currently have mapping for
can either be forwarded normally so the local routing system can
forward them to a full database ITRD - or the ITRC/ITFH could
tunnel these packets explicitly to an ITRD.
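
The per-packet decision in an ITRC or ITFH might look something
like this - a rough C sketch with made-up hook functions and an
arbitrary threshold, not a defined mechanism:

  #include <stdint.h>

  #define NOVEL_THRESHOLD 3   /* "more than a few" - arbitrary here */

  /* Hooks assumed to exist elsewhere in the ITRC/ITFH. */
  int  cache_lookup(uint32_t dest, uint32_t *loc);   /* 1 = cached  */
  void encapsulate_and_send(const void *pkt, uint32_t loc);
  void forward_unmodified(const void *pkt);   /* or tunnel to ITRD  */
  void queue_query(uint32_t dest);            /* ask a QSD or QSC   */
  unsigned novel_count(uint32_t dest);        /* per-dest counter   */

  void handle_outgoing(const void *pkt, uint32_t dest)
  {
      uint32_t loc;
      if (cache_lookup(dest, &loc)) {
          encapsulate_and_send(pkt, loc);   /* normal zero-delay path */
          return;
      }
      forward_unmodified(pkt);              /* novel packet, never held */
      if (novel_count(dest) >= NOVEL_THRESHOLD)
          queue_query(dest);                /* worth caching this one */
  }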

This way, there would be no delay in tunneling any packet - and
the ITRD would only handle a generally low volume of "novel"
packets the ITRCs and ITFHs haven't got mapping for yet.

The path lengths would be optimal or close to optimal for packets
handled by a local ITRC.  The path length would be perfectly
optimal for all packets encapsulated by the ITFH in the host or
NAT device.

The path length would be longer for those "novel" packets which
need to go to a full ITRD.  The ITRD could be in the same network
(provider or AS-end-user edge network) or (with Ivip) could be
outside the network.

This would mean that each network either doesn't absolutely need a
full push ITRD, since it is acceptable for the novel packets to
use one outside the network, or that the full push ITRD the
network does have will be handling a relatively low volume of
packets.

If the volume of packets going to this one ITRD is low enough - or
if there are enough of these ITRDs in the network so the load on
them is low enough - then the ITRD function could be implemented
in a standard PC with suitable software, saving on the high
expense of a router with its very large FIB requirements etc.
The full push ITRD needs a lot of memory for its copy of the
database (~RIB) and its FIB data - and lots of CPU power to handle
incoming updates, crunch them into the database, and then turn the
database and its changes into FIB data and FIB updates.

A device like a dual-core 64 bit CPU Linux/BSD server with 8
gigabytes or more of RAM can also do relatively simple table-based
lookups to classify packets.  A billion individually mapped Ivip
addresses for IPv4 would need 4 gigabytes of RAM, since all that
is needed is a 32 bit address for where to tunnel them to.
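
(As a rough check on that figure: 10^9 entries x 4 bytes each is
4 x 10^9 bytes, so the flat locator table for a billion addresses
is about 4 gigabytes - leaving the other half of an 8 gig machine
for the OS, buffers and everything else.)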

Ordinary motherboards have one or two gigabit Ethernet interfaces
on board; the whole thing could boot from a USB stick or flash
drive, and wouldn't need a hard drive.

Its capacity to receive, classify, encapsulate and send packets
would be primarily limited by the CPU's ability to shovel packet
data around the place and to perform the FIB packet classification
process.  This could be simpler than usual, because there is lots
of RAM.  Memory latency is likely to be the bottleneck, since
ordinary motherboard memory systems (DDR RAM etc.) are not
optimised for fast random access, but for slower access to a bunch
of words which are pumped into the cache, even if the software
only needs to read a byte.  I guess memory latency is 40 to 60
nsec.  It might take, say, 5 random memory accesses to classify a
packet and find the address to tunnel it to.  This is pessimistic,
since with the 1 gig x 32 bit system described above it would take
two accesses, one to find the starting point of the correct table
for that master-prefix and the second to read the 32 bit "LOC"
address.

So if the CPU was doing nothing more than this, it would be able
to classify several million packets a second.  There is plenty to
do, so the packet rate would be significantly less than this.
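
(As a rough check on "several million": at ~50 nsec per random
access, the pessimistic 5-access case is about 250 nsec per
packet, or around 4 million classifications a second; the 2-access
layout is about 100 nsec, or around 10 million a second - before
any of the other per-packet work is counted.)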

I will continue as if the ITRD were a really onerous task, and as
if an ITRC would generally handle more packets per second - but I
am not so sure this is the case.  If there were a billion
addresses in the Ivip system, there would be no need for a
rule-based FIB function separate from the local copy of the master
database.

The updates would simply be applied to the appropriate 32 bit
locations in each array, and the same arrays would be used for
classifying packets.  It could hardly be simpler.

If a single "master-subnet" such as 13.0.0.0/8 (assigned to Xerox
in 1991 and currently unadvertised) were assigned to the Ivip
system, it would be represented as an array of 16,777,216 32 bit
locations.  When an update arrives:

    13.12.11.10 to  13.12.11.22  are now mapped to 77.77.77.77

all that is needed is 13 writes to memory, which will be cached
and be written out in probably 100 nsec.
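
In C, with the 13.0.0.0/8 master-subnet held as a flat array (a
sketch only - the names are mine, not Ivip's), that update really
is just a short loop of 32 bit stores:

  #include <stdint.h>

  /* 13.0.0.0/8 as 16,777,216 32 bit locator entries (64 Mbytes).
     The index is the low 24 bits of the destination address.     */
  static uint32_t loc_13[1 << 24];

  void example_update(void)
  {
      /* 13.12.11.10 .. 13.12.11.22 -> 77.77.77.77 (0x4D4D4D4D) */
      for (uint32_t a = 0x0C0B0A; a <= 0x0C0B16; a++)
          loc_13[a] = 0x4D4D4D4D;
  }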

When a packet comes in addressed to 13.12.11.13, the software
first checks an array indexed by the most significant 24 bits of
the destination to find whether this address is in an Ivip-mapped
/24 of the address space.  If it is, that entry contains the
starting address of the master-subnet's array just mentioned
above, and the base address of that master-subnet.  Then by
subtracting the master-subnet's base address from the destination,
shifting the result left by two bits and adding it to the start
address of the master-subnet's array, the CPU can read the 32 bit
mapping data.
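
A sketch of that two-level lookup in C (field names are my own; on
a 64 bit machine the per-/24 entry holds a pointer plus a base
address rather than a literal 32 bit start address, but the
two-memory-access behaviour is the same):

  #include <stdint.h>
  #include <stddef.h>

  struct per_24 {
      uint32_t *subnet_array;  /* NULL if this /24 isn't Ivip-mapped */
      uint32_t  subnet_base;   /* first address of the master-subnet */
  };

  /* One entry per /24, indexed by the top 24 bits of the
     destination - about 256 Mbytes on a 64 bit machine.  */
  static struct per_24 top_table[1 << 24];

  /* Returns the 32 bit address to tunnel to, or 0 for "not mapped,
     forward normally".                                             */
  uint32_t lookup_loc(uint32_t dest)
  {
      const struct per_24 *e = &top_table[dest >> 8];   /* access 1 */
      if (e->subnet_array == NULL)
          return 0;
      return e->subnet_array[dest - e->subnet_base];    /* access 2 */
  }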

With enough memory, this is so fast and simple that it may be
better to simply do all the ITR functions like this - forgetting
about ITRCs and their need for query servers.

However, an ITFH function in a host can't devote 4 gigs of RAM and
a lot of CPU power to running its own full database ITRD.  Doing
most of the encapsulation in the sending host would be best, and
that requires the ITFHs to be able to query QSDs or QSCs.


IPv6 ITRs wouldn't work so well with a table-based system, because
there are so many bits in play to get to a /64 granularity.  They
would need a recursive table lookup scheme, using three or four
bits of destination address at a time, until eventually finding a
match result, and then reading a 128 bit address to tunnel the
packet to.
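
A sketch of what that recursive lookup might look like, using 4
bit strides over the top 64 bits - the node layout and names are
my own invention, purely for illustration:

  #include <stdint.h>

  struct trie_node {
      struct trie_node *child[16];  /* indexed by the next nibble   */
      int      has_loc;             /* 1 if a mapping ends here     */
      uint8_t  loc[16];             /* 128 bit address to tunnel to */
  };

  /* Walk the top 64 bits of the destination a nibble at a time,
     remembering the longest match seen.  Returns NULL if unmapped. */
  const uint8_t *lookup_loc6(const struct trie_node *root,
                             const uint8_t dest[16])
  {
      const uint8_t *best = NULL;
      const struct trie_node *n = root;
      for (int i = 0; i < 16 && n != NULL; i++) {
          if (n->has_loc)
              best = n->loc;
          uint8_t nibble = (i & 1) ? (dest[i / 2] & 0x0F)
                                   : (dest[i / 2] >> 4);
          n = n->child[nibble];
      }
      if (n != NULL && n->has_loc)
          best = n->loc;
      return best;
  }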


A full database ITRD in a 64 bit host with 8 gigabytes of RAM,
even with two
gigabit Ethernet ports, would be slower than a full ITRD
implementation in the FIB of a high-end router, but the whole
thing would only cost USD$1000 or so.


Maybe a single server with two gigabit Ethernet ports could have a
high enough throughput to make the cluster idea below unnecessary.

In that case, just have one or more such servers as peers with a
stub connection to any conventional BGP or internal router where a
full ITR function is needed - assuming the router could
load-balance packets between multiple such servers.


Here are some ideas about creating a relatively high-speed ITR to
handle raw packets from multiple unmodified networks, which a
conventional transit router has fished out and sent on a gigabit
Ethernet link.  In Ivip, the router would advertise all the
prefixes of the "master-subnets" and forward to this gigabit link
any packets it receives whose destination addresses match these
prefixes.

The router would get back encapsulated packets from the link,
which it would forward to its peers for their trips to the ETRs
all over the world.

Here I assume we need a bunch of caching ITRCs with a single ITRD
to handle packets they don't have mapping information for.  But
perhaps it would be easier to make one server the replicator (see
my recent message: 5 Database <--> ITR push, pull and notify) for
the rest of the servers which are simply direct table-lookup
ITRDs, as described above.  Then there's no need for a QSD and the
whole thing is a lot simpler.  The router can presumably farm out
packets to the various IP addresses of the ITRDs, or maybe one of
them can do the farming out.

   BGP peers - border
   & transit routers

       \   |   /               Gigabit stub replicator and
        \  |  /                ITR cluster - 7 plain diskless
        |  |  |                Linux/BSD servers
\    RRRRRRRRRRRRR
 \---R           R
     R  Conven-  R   1G Ethernet
-----R  tional   R ---------------\
     R           R                |
-----R  Transit  R   EEEEEEEEEEE  |
     R  Router   R   E         E  |
 /---R           R   E         E--/
/    R           R   E    8    E    .....................
     RRRRRRRRRRRRR   E         E----.  Replicator & QSD .
        |  |  |      E    P    E    .....................
        /  |  \      E    o    E
       /   |   \     E    r    E    .....................
                     E    t    E----.  ITRD  push       .
                     E         E    .....................
                     E    S    E
                     E    w    E    .....................
                     E    i    E----.  ITRC1 pull cache .
                     E    t    E    .....................
                     E    c    E
                     E    h    E    .....................
                     E         E----.  ITRC2 pull cache .
                     E         E    .....................
                     E         E
                     E         E    .....................
                     E         E----.  ITRC3 pull cache .
                     E         E    .....................
                     E         E
                     E         E    .....................
                     E         E----.  ITRC4 pull cache .
                     E         E    .....................
                     E         E
                     E         E    .....................
                     E         E----.      Spare        .
                     E         E    .....................
                     EEEEEEEEEEE

Some motherboards have two gigabit Ethernet ports, so the whole
system could use two links where I show one, including perhaps to
and from the router.

The servers would be identical and boot from a USB stick or flash
IDE/SATA drive.  They would all have the same software and assume
roles depending on what was needed.  They would probably have 8
(or in the future 16) gigs of RAM.

The router could distribute the raw packets among the ITRCs,
which would handle most of the packets pretty quickly.  Their FIB
code would be optimised for a relatively low number of rules and
fast classification.  Packets which don't match are forwarded
to the ITRD which has a full copy of the database and an FIB which
classifies all packets to every possible prefix in the Ivip
system, including to /32 for IPv4 or /64 for IPv6.  This would
involve a lot more RAM and slower classification, so the packet
rate of the ITRD would be less than for each ITRC, but it could
instantly classify, encapsulate and forward any packet, without
needing to query the QSD query server, which the ITRCs would be
very frequently querying.

The top server is that full database query server, and a
replicator, receiving two or more streams of database update
information and pumping out ten or more streams to other sites, as
well as feeding a stream to its QSD function and to the ITRD.

Probably a little server cluster is not what everyone wants in
their data centre, but it would be a cheap alternative to buying a
new router with special ITR functions.


 - Robin      http://www.firstpr.com.au/ip/ivip/

_______________________________________________
RAM mailing list
RAM@iab.org
https://www1.ietf.org/mailman/listinfo/ram