Re: [RAM] Number of DFZ routers - radical improvement of BGP unlikely

Robin Whittle <rw@firstpr.com.au> Sat, 07 July 2007 01:42 UTC


Hi Iljitsch,

Short version:

I no longer have any hope that BGP could be improved so much that
DFZ routers could cope indefinitely with current growth in the
global routing table size - or with the larger growth I had in
mind which would enable routers with RAM-based FIBs to handle
millions of prefixes to /24 granularity, without concern about
route aggregation.


Long version:

You quoted something I wrote on 29 May.

Initially, I had thought that the problem with routing and
addressing was with FIBs running hot and not coping with the
number of routes.  I proposed a RAM-based lookup system for IPv4
and IPv6 which would only make sense if the routers could use BGP
to cope with 3, 10 or 20 times the ~200k routes they currently
handle.  I thought that wouldn't be impossible, because BGP is
just a matter of "CPU, router software, messages with peers"
etc., operating at much lower speeds than the gigabits per second
of the FIB handling the packets themselves.
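
To make the idea of a RAM-based FIB at /24 granularity concrete,
here is a minimal Python sketch for IPv4 only.  It is not the
lookup system I proposed, just an illustration under assumptions
of my own: the class name Slash24Fib, the one-byte next-hop index
per /24, and the use of 0 for "no route" are all hypothetical.

```python
import ipaddress


class Slash24Fib:
    """Direct-indexed IPv4 FIB sketch: one next-hop byte per /24."""

    def __init__(self):
        # 2**24 slots, one for every possible /24.  0 means "no route".
        self.table = bytearray(1 << 24)

    def install(self, routes):
        """routes: iterable of (prefix string, next-hop index 1..255)."""
        parsed = [(ipaddress.IPv4Network(p), nh) for p, nh in routes]
        # Write shorter prefixes first so that more-specific routes
        # overwrite them.  This gives longest-prefix-match results.
        for net, nh in sorted(parsed, key=lambda r: r[0].prefixlen):
            if net.prefixlen > 24:
                raise ValueError("prefixes longer than /24 not supported")
            if not 1 <= nh <= 255:
                raise ValueError("next-hop index must be in 1..255")
            start = int(net.network_address) >> 8
            count = 1 << (24 - net.prefixlen)
            self.table[start:start + count] = bytes([nh]) * count

    def lookup(self, addr):
        """Return the next-hop index for addr, or 0 if there is no route."""
        return self.table[int(ipaddress.IPv4Address(addr)) >> 8]


fib = Slash24Fib()
fib.install([("10.1.2.0/24", 2), ("10.0.0.0/8", 1)])
print(fib.lookup("10.1.2.3"), fib.lookup("10.200.0.1"))
```

The lookup is a single memory read indexed by the top 24 bits of
the address, so its cost does not depend on how many prefixes are
installed.  The sketch says nothing about the BGP side, which is
where the difficulty discussed below lies.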

By late May, I was despondent about LISP ever helping much,
because I couldn't see how it could be incrementally deployed and
because it seemed to be a patch-up overlay for a routing system
where the FIBs surely could be made (in the 5 to 7 year timeframe)
to handle IPv4 space in millions of arbitrarily located /24
granularity prefixes.  So I was still thinking the best place to
start was by souping up BGP enough to be able to slice and dice
IPv4 space at the BGP level into millions of prefixes, with
RAM-based FIBs, without concern about route aggregation.  Then, I
thought, whatever problems remained could be solved by placing
fewer demands on something like LISP.

That was when I had only a cursory understanding of BGP.

During May I learnt a lot more about the memory requirements, CPU
load, software and policy complexities of BGP.  I also learnt
something about the current implementation of the MRAI timer and
path hunting.  There was a fascinating discussion about this on
the RRG and then the IDR lists, starting 20 June:

http://psg.com/lists/rrg/2007/maillist.html#00146
http://www1.ietf.org/mail-archive/web/idr/current/maillist.html#02422

Then I came to understand what most people have known for years -
BGP uses intentionally highly simplified messages in order to help
with scaling, but this gives routers only limited abilities to
recognise which alternative paths may still work after another
path is withdrawn.  Even with the simplification, the load on CPUs
and memory is extreme, and the thought of handling five or ten
times more routes - and then ten or twenty times more in the next
ten years - is not realistic.

Now I think there could be some worthwhile improvements to BGP,
maybe to be developed and incrementally introduced over the next
few years - but these would at most "hold back the clock"
against routing table growth by a year or so.  The simplest
approach is probably Tony Li's and Geoff Huston's Path Length
Damping - yet this will rely on router manufacturers largely
restoring the RFC 1771 MRAI timer behaviour (propagate withdrawals
immediately), which I understand they were generally ignoring by
2002 and which RFC 4271 banned in 2006!

I wrote:

> So if BGP could be radically improved, . . .
>
> Unfortunately, no BGP experts think that such improvements
> are feasible.  Maybe some unconventional ideas from
> non-experts might be helpful.

I withdraw this suggestion.  I can't imagine that any
backwards-compatible upgrade to BGP, or any incrementally
deployable replacement for it, could achieve the radical
improvements I was hoping for.

Maybe someone else will have some great ideas for dramatically
souping up BGP.  BGP solves some very difficult problems.  I can't
think of any drastic improvements.


  - Robin
