[Idr] Re: BGP path hunting, MRAI timer and Path Length Damping

Tony Li <tli@cisco.com> Thu, 21 June 2007 23:28 UTC

Return-path: <idr-bounces@ietf.org>
Received: from [127.0.0.1] (helo=stiedprmman1.va.neustar.com) by megatron.ietf.org with esmtp (Exim 4.43) id 1I1W4G-0004tK-H9; Thu, 21 Jun 2007 19:28:00 -0400
Received: from [10.91.34.44] (helo=ietf-mx.ietf.org) by megatron.ietf.org with esmtp (Exim 4.43) id 1I1W4F-0004tF-LV for idr@ietf.org; Thu, 21 Jun 2007 19:27:59 -0400
Received: from sj-iport-6.cisco.com ([171.71.176.117]) by ietf-mx.ietf.org with esmtp (Exim 4.43) id 1I1W4E-0006h6-2d for idr@ietf.org; Thu, 21 Jun 2007 19:27:59 -0400
Received: from sj-dkim-3.cisco.com ([171.71.179.195]) by sj-iport-6.cisco.com with ESMTP; 21 Jun 2007 16:27:57 -0700
X-IronPort-AV: i="4.16,449,1175497200"; d="scan'208"; a="169875042:sNHT56386791"
Received: from sj-core-5.cisco.com (sj-core-5.cisco.com [171.71.177.238]) by sj-dkim-3.cisco.com (8.12.11/8.12.11) with ESMTP id l5LNRvkl023407; Thu, 21 Jun 2007 16:27:57 -0700
Received: from xbh-sjc-231.amer.cisco.com (xbh-sjc-231.cisco.com [128.107.191.100]) by sj-core-5.cisco.com (8.12.10/8.12.6) with ESMTP id l5LNRuka016693; Thu, 21 Jun 2007 23:27:56 GMT
Received: from xfe-sjc-211.amer.cisco.com ([171.70.151.174]) by xbh-sjc-231.amer.cisco.com with Microsoft SMTPSVC(6.0.3790.1830); Thu, 21 Jun 2007 16:27:56 -0700
Received: from [171.70.224.106] ([171.70.224.106]) by xfe-sjc-211.amer.cisco.com with Microsoft SMTPSVC(6.0.3790.1830); Thu, 21 Jun 2007 16:27:55 -0700
In-Reply-To: <467A393E.7040106@firstpr.com.au>
References: <467799C6.3020302@firstpr.com.au> <4678251A.5090002@apnic.net> <1182338421.46790d755a0ac@webmail.nist.gov> <467A393E.7040106@firstpr.com.au>
Mime-Version: 1.0 (Apple Message framework v752.3)
Content-Type: text/plain; charset="US-ASCII"; delsp="yes"; format="flowed"
Message-Id: <AB5476FA-C226-43BF-9C0E-93729020F276@cisco.com>
Content-Transfer-Encoding: 7bit
From: Tony Li <tli@cisco.com>
Date: Thu, 21 Jun 2007 16:27:54 -0700
To: Robin Whittle <rw@firstpr.com.au>
X-Mailer: Apple Mail (2.752.3)
X-OriginalArrivalTime: 21 Jun 2007 23:27:55.0811 (UTC) FILETIME=[CBD50F30:01C7B45B]
DKIM-Signature: v=0.5; a=rsa-sha256; q=dns/txt; l=6067; t=1182468477; x=1183332477; c=relaxed/simple; s=sjdkim3002; h=Content-Type:From:Subject:Content-Transfer-Encoding:MIME-Version; d=cisco.com; i=tli@cisco.com; z=From:=20Tony=20Li=20<tli@cisco.com> |Subject:=20Re=3A=20BGP=20path=20hunting, =20MRAI=20timer=20and=20Path=20L ength=20Damping |Sender:=20; bh=s2iZRk4R9QdDywmJk0zihzfz+jQ7wByMxJFHoPkiafs=; b=AY0sP9V6UtTb2Qcpmh53x+9kgk/8iTy2QB12Lm/lA8qSXeAwXSj0MGrn4MneFEpU5cNEW2vP UWLD/P78xtoBdPHZ72IbYwhRldvYTi574J7b1djpV4LOSaIfCgGhBfRf;
Authentication-Results: sj-dkim-3; header.From=tli@cisco.com; dkim=pass (sig from cisco.com/sjdkim3002 verified; );
X-Spam-Score: 0.0 (/)
X-Scan-Signature: 33cc095b503da4365ce57c727e553cf1
Cc: idr <idr@ietf.org>, "K. Sriram" <ksriram@nist.gov>
Subject: [Idr] Re: BGP path hunting, MRAI timer and Path Length Damping
X-BeenThere: idr@ietf.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Inter-Domain Routing <idr.ietf.org>
List-Unsubscribe: <https://www1.ietf.org/mailman/listinfo/idr>, <mailto:idr-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www1.ietf.org/pipermail/idr>
List-Post: <mailto:idr@ietf.org>
List-Help: <mailto:idr-request@ietf.org?subject=help>
List-Subscribe: <https://www1.ietf.org/mailman/listinfo/idr>, <mailto:idr-request@ietf.org?subject=subscribe>
Errors-To: idr-bounces@ietf.org

Hi Robin,


On Jun 21, 2007, at 1:39 AM, Robin Whittle wrote:

> Tony, can you advise on this?  Anyone else?


I'll try.


> Geoff's diagram :
>
>   http://www.potaroo.net/ispcol/2007-06/dampbgp.html
> 	
> explains the pattern of behavior which results in a series of
> longer best-path messages emanating out from the local part of
> the network, followed by a withdrawal.
>
> I am not sure about the time elapsed between the four sections
> of Fig 1, which I will call A, B, C and D.


The first thing to understand is that there is no absolute standard  
of time in any of this.  BGP (and routers in general) are 'soft' real- 
time systems.  That is, given a task to do, they will get around to  
it, but it will frequently be a 'best effort' approach, and not  
scheduled to a precise moment.  Other computations within the  
protocol and outside of the protocol in other management functions  
may cause significant time variances in event latency.


> Here is my understanding, based on Geoff's diagram and Tony's
> written explanation.  Is there a better explanation somewhere?
>
> A   This is the steady state before T = 0.0, when (2) detects
>     that its link to (1) goes down.


We should note that at this point, AS 5 should have paths as follows:

2: 2 1
3: 3 2 1
4: 4 3 2 1


> B, sort of . . .
>
> T =  0.1 secs
>
>    (2) sends withdrawals to (5) and (3).
>
>    This is almost immediately after the link goes down - say
>    a 0.1 second processing time after (2) determines that (1)
>    is unreachable.  This is the time it takes to generate two
>    withdrawal announcements.


Again, this would be ideal...  Reality is that you have asynchronous  
systems, updates are not computed simultaneously, and can be delayed  
for a variety of reasons.


>    (3) gets the withdrawal too.  This is where my understanding
>    departs from the diagram.  In B, (5) has sent an
>    announcement but 3 hasn't.  I figure they would typically
>    send their announcements at about the same time.


Recall 5's path table from above.  After processing the update from  
(2), it's path table should look like:

3: 3 2 1
4: 4 3 2 1

Since 5 and 3 are not synchronized at all, there's a significant race  
between the withdrawal from 3 and the advertisement of "5 3 2 1" by (5).


>    In my understanding, at 0.1 seconds, both (5) and (3) have
>    been sent the withdrawal from (2) but have not yet processed
>    them.


In figure B, we see (5) advertising its alternate path.  It has  
processed the withdrawal from 2, but not processed the to-be- 
delivered withdrawal from 3.


> In my understanding, there is no strung out sequence of longer
> and longer paths being announced by 5, as Geoff's diagram
> depicts.  Just the first announcement at T = 0.1s of a longer
> path, followed by a withdrawal at T = 30.2 seconds.  This does
> not really resemble the "path hunting" pattern Geoff and Tony
> refer to.


First, that is *exactly* path hunting.

You're assuming that 5 implements MRAI and doing so fairly  
precisely.  If 5 does not implement MRAI, you'd see exactly the set  
of updates in Geoff's diagram.


> My understanding of what Geoff writes is that "path hunting"
> occurs anyway, and is exacerbated in variations in the time
> constant of the Minimum Route Advertisement Interval (MRAI)
> Timer.  I am not sure if these are small variations of fractions
> of a second from the default 30 seconds, or some other default
> or configured values such as 20 to 50 seconds.  I don't see how
> various values of MRAI in different routers would affect my
> explanation of what happens in the situation depicted by Geoff's
> diagram, other than to change the delay time to 20.2 or 50.2
> seconds, for instance.


You just need a slightly richer topology to create more interesting  
race conditions.


> II guess this would need to replace the MRAI timer's function, or
> operate with longer time constants than that timer, but I don't
> think Tony's I-D mentions this.


The assumption was that this would subsume MRAI.


> This sounds like a great idea.  I understand that when applied
> to Geoff's diagram, here is what would happen, assuming for the
> moment that there was no MRAI timer:
>
> T = 0.1 seconds
>
>    (2) sends withdrawals to (5) and (3).
>
>
> T = 0.2 seconds.
>
>    (5) gets (2)'s withdrawal and does not announce anything,
>    since its best path "3, 2, 1" is now longer than before,
>    and Tony's (Geoff's?) PLD algorithm is invoked.


Exact ownership of the algorithm isn't clear to me.  It's been kicked  
around a bit and I thought I heard of it from someone, but I checked  
with them and they denied it.  In any case, I'll explicitly  
relinquish any claim to it.  I'm much more interested in getting it  
deployed...


> So with this PLD algorithm, as I understand it, there is only
> one message sent by (5), and it is the correct one, sent at 0.4
> seconds, not at 30.2 seconds as with current arrangements.


Sounds right.


> Tony, you co-wrote the RFC - what do you think?


Co-edited and really just for 1771.

As always, I think that the RFC is a fine goal, and that implementors  
frequently take short cuts that may or may not be warranted.  I'm not  
throwing rocks at them because frankly, many of those rocks would end  
up back in my lap.  As I noted earlier, implementing MRAI per the  
letter of the spec is non-trivial, and I sympathize with those who  
decided not to implement it.  On the flip side, I'm also leery of  
having a system that has no built-in protections against persistent,  
rapid oscillation.  If there are no protections built in, then some  
poor network operator will be awakened at 4am with a router that is  
melting down under the update load.  I've done that too and wouldn't  
wish that on anyone.  Thus, I think we have a responsibility to  
provide a better set of mechanisms.

Regards,
Tony

_______________________________________________
Idr mailing list
Idr@ietf.org
https://www1.ietf.org/mailman/listinfo/idr