[Idr] Re: BGP path hunting, MRAI timer and Path Length Damping
Tony Li <tli@cisco.com> Thu, 21 June 2007 23:28 UTC
Return-path: <idr-bounces@ietf.org>
Received: from [127.0.0.1] (helo=stiedprmman1.va.neustar.com) by megatron.ietf.org with esmtp (Exim 4.43) id 1I1W4G-0004tK-H9; Thu, 21 Jun 2007 19:28:00 -0400
Received: from [10.91.34.44] (helo=ietf-mx.ietf.org) by megatron.ietf.org with esmtp (Exim 4.43) id 1I1W4F-0004tF-LV for idr@ietf.org; Thu, 21 Jun 2007 19:27:59 -0400
Received: from sj-iport-6.cisco.com ([171.71.176.117]) by ietf-mx.ietf.org with esmtp (Exim 4.43) id 1I1W4E-0006h6-2d for idr@ietf.org; Thu, 21 Jun 2007 19:27:59 -0400
Received: from sj-dkim-3.cisco.com ([171.71.179.195]) by sj-iport-6.cisco.com with ESMTP; 21 Jun 2007 16:27:57 -0700
X-IronPort-AV: i="4.16,449,1175497200"; d="scan'208"; a="169875042:sNHT56386791"
Received: from sj-core-5.cisco.com (sj-core-5.cisco.com [171.71.177.238]) by sj-dkim-3.cisco.com (8.12.11/8.12.11) with ESMTP id l5LNRvkl023407; Thu, 21 Jun 2007 16:27:57 -0700
Received: from xbh-sjc-231.amer.cisco.com (xbh-sjc-231.cisco.com [128.107.191.100]) by sj-core-5.cisco.com (8.12.10/8.12.6) with ESMTP id l5LNRuka016693; Thu, 21 Jun 2007 23:27:56 GMT
Received: from xfe-sjc-211.amer.cisco.com ([171.70.151.174]) by xbh-sjc-231.amer.cisco.com with Microsoft SMTPSVC(6.0.3790.1830); Thu, 21 Jun 2007 16:27:56 -0700
Received: from [171.70.224.106] ([171.70.224.106]) by xfe-sjc-211.amer.cisco.com with Microsoft SMTPSVC(6.0.3790.1830); Thu, 21 Jun 2007 16:27:55 -0700
In-Reply-To: <467A393E.7040106@firstpr.com.au>
References: <467799C6.3020302@firstpr.com.au> <4678251A.5090002@apnic.net> <1182338421.46790d755a0ac@webmail.nist.gov> <467A393E.7040106@firstpr.com.au>
Mime-Version: 1.0 (Apple Message framework v752.3)
Content-Type: text/plain; charset="US-ASCII"; delsp="yes"; format="flowed"
Message-Id: <AB5476FA-C226-43BF-9C0E-93729020F276@cisco.com>
Content-Transfer-Encoding: 7bit
From: Tony Li <tli@cisco.com>
Date: Thu, 21 Jun 2007 16:27:54 -0700
To: Robin Whittle <rw@firstpr.com.au>
X-Mailer: Apple Mail (2.752.3)
X-OriginalArrivalTime: 21 Jun 2007 23:27:55.0811 (UTC) FILETIME=[CBD50F30:01C7B45B]
DKIM-Signature: v=0.5; a=rsa-sha256; q=dns/txt; l=6067; t=1182468477; x=1183332477; c=relaxed/simple; s=sjdkim3002; h=Content-Type:From:Subject:Content-Transfer-Encoding:MIME-Version; d=cisco.com; i=tli@cisco.com; z=From:=20Tony=20Li=20<tli@cisco.com> |Subject:=20Re=3A=20BGP=20path=20hunting, =20MRAI=20timer=20and=20Path=20L ength=20Damping |Sender:=20; bh=s2iZRk4R9QdDywmJk0zihzfz+jQ7wByMxJFHoPkiafs=; b=AY0sP9V6UtTb2Qcpmh53x+9kgk/8iTy2QB12Lm/lA8qSXeAwXSj0MGrn4MneFEpU5cNEW2vP UWLD/P78xtoBdPHZ72IbYwhRldvYTi574J7b1djpV4LOSaIfCgGhBfRf;
Authentication-Results: sj-dkim-3; header.From=tli@cisco.com; dkim=pass (sig from cisco.com/sjdkim3002 verified; );
X-Spam-Score: 0.0 (/)
X-Scan-Signature: 33cc095b503da4365ce57c727e553cf1
Cc: idr <idr@ietf.org>, "K. Sriram" <ksriram@nist.gov>
Subject: [Idr] Re: BGP path hunting, MRAI timer and Path Length Damping
X-BeenThere: idr@ietf.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Inter-Domain Routing <idr.ietf.org>
List-Unsubscribe: <https://www1.ietf.org/mailman/listinfo/idr>, <mailto:idr-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www1.ietf.org/pipermail/idr>
List-Post: <mailto:idr@ietf.org>
List-Help: <mailto:idr-request@ietf.org?subject=help>
List-Subscribe: <https://www1.ietf.org/mailman/listinfo/idr>, <mailto:idr-request@ietf.org?subject=subscribe>
Errors-To: idr-bounces@ietf.org
Hi Robin, On Jun 21, 2007, at 1:39 AM, Robin Whittle wrote: > Tony, can you advise on this? Anyone else? I'll try. > Geoff's diagram : > > http://www.potaroo.net/ispcol/2007-06/dampbgp.html > > explains the pattern of behavior which results in a series of > longer best-path messages emanating out from the local part of > the network, followed by a withdrawal. > > I am not sure about the time elapsed between the four sections > of Fig 1, which I will call A, B, C and D. The first thing to understand is that there is no absolute standard of time in any of this. BGP (and routers in general) are 'soft' real- time systems. That is, given a task to do, they will get around to it, but it will frequently be a 'best effort' approach, and not scheduled to a precise moment. Other computations within the protocol and outside of the protocol in other management functions may cause significant time variances in event latency. > Here is my understanding, based on Geoff's diagram and Tony's > written explanation. Is there a better explanation somewhere? > > A This is the steady state before T = 0.0, when (2) detects > that its link to (1) goes down. We should note that at this point, AS 5 should have paths as follows: 2: 2 1 3: 3 2 1 4: 4 3 2 1 > B, sort of . . . > > T = 0.1 secs > > (2) sends withdrawals to (5) and (3). > > This is almost immediately after the link goes down - say > a 0.1 second processing time after (2) determines that (1) > is unreachable. This is the time it takes to generate two > withdrawal announcements. Again, this would be ideal... Reality is that you have asynchronous systems, updates are not computed simultaneously, and can be delayed for a variety of reasons. > (3) gets the withdrawal too. This is where my understanding > departs from the diagram. In B, (5) has sent an > announcement but 3 hasn't. I figure they would typically > send their announcements at about the same time. Recall 5's path table from above. After processing the update from (2), it's path table should look like: 3: 3 2 1 4: 4 3 2 1 Since 5 and 3 are not synchronized at all, there's a significant race between the withdrawal from 3 and the advertisement of "5 3 2 1" by (5). > In my understanding, at 0.1 seconds, both (5) and (3) have > been sent the withdrawal from (2) but have not yet processed > them. In figure B, we see (5) advertising its alternate path. It has processed the withdrawal from 2, but not processed the to-be- delivered withdrawal from 3. > In my understanding, there is no strung out sequence of longer > and longer paths being announced by 5, as Geoff's diagram > depicts. Just the first announcement at T = 0.1s of a longer > path, followed by a withdrawal at T = 30.2 seconds. This does > not really resemble the "path hunting" pattern Geoff and Tony > refer to. First, that is *exactly* path hunting. You're assuming that 5 implements MRAI and doing so fairly precisely. If 5 does not implement MRAI, you'd see exactly the set of updates in Geoff's diagram. > My understanding of what Geoff writes is that "path hunting" > occurs anyway, and is exacerbated in variations in the time > constant of the Minimum Route Advertisement Interval (MRAI) > Timer. I am not sure if these are small variations of fractions > of a second from the default 30 seconds, or some other default > or configured values such as 20 to 50 seconds. I don't see how > various values of MRAI in different routers would affect my > explanation of what happens in the situation depicted by Geoff's > diagram, other than to change the delay time to 20.2 or 50.2 > seconds, for instance. You just need a slightly richer topology to create more interesting race conditions. > II guess this would need to replace the MRAI timer's function, or > operate with longer time constants than that timer, but I don't > think Tony's I-D mentions this. The assumption was that this would subsume MRAI. > This sounds like a great idea. I understand that when applied > to Geoff's diagram, here is what would happen, assuming for the > moment that there was no MRAI timer: > > T = 0.1 seconds > > (2) sends withdrawals to (5) and (3). > > > T = 0.2 seconds. > > (5) gets (2)'s withdrawal and does not announce anything, > since its best path "3, 2, 1" is now longer than before, > and Tony's (Geoff's?) PLD algorithm is invoked. Exact ownership of the algorithm isn't clear to me. It's been kicked around a bit and I thought I heard of it from someone, but I checked with them and they denied it. In any case, I'll explicitly relinquish any claim to it. I'm much more interested in getting it deployed... > So with this PLD algorithm, as I understand it, there is only > one message sent by (5), and it is the correct one, sent at 0.4 > seconds, not at 30.2 seconds as with current arrangements. Sounds right. > Tony, you co-wrote the RFC - what do you think? Co-edited and really just for 1771. As always, I think that the RFC is a fine goal, and that implementors frequently take short cuts that may or may not be warranted. I'm not throwing rocks at them because frankly, many of those rocks would end up back in my lap. As I noted earlier, implementing MRAI per the letter of the spec is non-trivial, and I sympathize with those who decided not to implement it. On the flip side, I'm also leery of having a system that has no built-in protections against persistent, rapid oscillation. If there are no protections built in, then some poor network operator will be awakened at 4am with a router that is melting down under the update load. I've done that too and wouldn't wish that on anyone. Thus, I think we have a responsibility to provide a better set of mechanisms. Regards, Tony _______________________________________________ Idr mailing list Idr@ietf.org https://www1.ietf.org/mailman/listinfo/idr
- [Idr] Re: [RRG] Re: BGP path hunting, MRAI timer … Tony Li
- [Idr] Re: [RRG] Re: BGP path hunting, MRAI timer … John G. Scudder
- [Idr] Re: BGP path hunting, MRAI timer and Path L… Tony Li
- [Idr] Re: [RRG] BGP path hunting, MRAI timer and … Tony Li
- [Idr] Re: BGP path hunting, MRAI timer and Path L… Tony Li
- [Idr] Re: BGP path hunting, MRAI timer and Path L… Robin Whittle
- Re: [Idr] Re: BGP path hunting, MRAI timer and Pa… tom.petch
- [Idr] Re: BGP path hunting, MRAI timer and Path L… John G. Scudder
- [Idr] Re: BGP path hunting, MRAI timer and Path L… Tony Li
- Re: [Idr] Re: BGP path hunting, MRAI timer and Pa… Robin Whittle
- Re: [Idr] Re: BGP path hunting, MRAI timer and Pa… Robert Raszuk
- Re: [Idr] Re: BGP path hunting, MRAI timer and Pa… Curtis Villamizar
- Re: [Idr] Re: BGP path hunting, MRAI timer and Pa… Robin Whittle
- Re: [Idr] Re: BGP path hunting, MRAI timer and Pa… Curtis Villamizar
- Re: [Idr] Re: BGP path hunting, MRAI timer and Pa… Curtis Villamizar
- Re: [Idr] Re: BGP path hunting, MRAI timer and Pa… Danny McPherson
- Re: [Idr] Re: BGP path hunting, MRAI timer and Pa… Paul Jakma
- Re: [Idr] Re: BGP path hunting, MRAI timer and Pa… Joe Abley
- Re: [Idr] Re: BGP path hunting, MRAI timer and Pa… Paul Jakma
- RE: [Idr] Re: BGP path hunting, MRAI timer and Pa… Schliesser, Benson
- Re: [Idr] Re: BGP path hunting, MRAI timer and Pa… Michael
- Re: [Idr] Re: BGP path hunting, MRAI timer and Pa… Tony Li
- Re: [Idr] Re: BGP path hunting, MRAI timer and Pa… Jakob Heitz
- Re: [Idr] Re: BGP path hunting, MRAI timer and Pa… Tony Li
- Re: [Idr] Re: BGP path hunting, MRAI timer and Pa… Paul Jakma
- Re: [Idr] Re: BGP path hunting, MRAI timer and Pa… Geoff Huston
- Re: [Idr] Re: BGP path hunting, MRAI timer and Pa… Paul Jakma
- Re: [Idr] Re: BGP path hunting, MRAI timer and Pa… Michael
- Re: [Idr] Re: BGP path hunting, MRAI timer and Pa… Paul Jakma
- Re: [Idr] Re: BGP path hunting, MRAI timer and Pa… Curtis Villamizar
- Re: [Idr] Re: BGP path hunting, MRAI timer and Pa… Curtis Villamizar
- Re: [Idr] Re: BGP path hunting, MRAI timer and Pa… Paul Jakma
- Re: [Idr] Re: BGP path hunting, MRAI timer and Pa… Curtis Villamizar