[NSIS] NSIS Interop Summary Report from Karlsruhe

Roland Bless <bless@tm.uka.de> Mon, 18 June 2007 23:47 UTC

Message-ID: <46771981.60907@tm.uka.de>
Date: Tue, 19 Jun 2007 01:47:13 +0200
From: Roland Bless <bless@tm.uka.de>
Organization: Institute of Telematics, University of Karlsruhe
To: "nsis@ietf.org" <nsis@ietf.org>, nsis-chairs@tools.ietf.org
Subject: [NSIS] NSIS Interop Summary Report from Karlsruhe

Hi all,

The WG chairs solicited feedback on the latest NSIS interop in
Karlsruhe, so here it is (sorry for the long delay):

Overview:
=========
* Implementation teams:
  - University of Coimbra (GIST, QoS NSLP, NAT/FW), each at draft -13
  - Uni Goettingen (GIST, QoS NSLP, NAT/FW), each at draft -13
  - Uni Karlsruhe (GIST, QoS NSLP, NAT/FW), each at draft -13
  - Siemens Roke Manor Research had planned to participate but
    could not make it.
* We concentrated mainly on GIST tests, because
  this spec has to be finished first.
* We used test cases defined by Christian Dickmann
  (and discussed at the last IETF among Christian, Alan Ford, Andrew
  McDonald, and partially me). You can find the document here
  (though formatted as an Internet-Draft, it has not been published
  as one yet):
http://user.informatik.uni-goettingen.de/~cdickman/draft-dickmann-nsis-ntlp-interop-00.txt

* Christian did a great job setting up tools for performing
  automated tests. This sped up testing a lot.
* GIST was tested quite thoroughly between Goettingen and Karlsruhe
  (in both directions, see below).
  We tested all the main protocol features, such as D-Mode, C-Mode,
  TLS over TCP, IPv4/IPv6, late state installation, etc.
* The GIST spec is implementable; we found no major issues
  with it, only the need for some clarifications, which can be
  found in our tracker:
  https://projekte.tm.uka.de/trac/NSIS/report/10
  (I entered some additional tickets after the interop event).
* A major open issue, however, is the NAT traversal functionality,
  which could not be tested yet.
* Interop URL: https://projekte.tm.uka.de/trac/NSIS/wiki/Interop-20070509

Details (as written in a mail from Christian)
=================================================
Christian created a report based on the data collected by the
Interop Tools. Take a look at:
http://dickmann.homeunix.org/nsis/displayReportList.php
---------------------------------------------------------------------
Christian's mail:
Of course, this is only a subset of all the tests that were run (and
only the last run of each test case), but I guess we are close to
90-95% coverage. A lot of test cases come with captures, so you can
look up the message exchange for successful or failed tests. I set the
success flag for most tests directly during the interop event, once the
test was completed. However, for some test cases I set the flag today
based on my notes (and my memory).

As you know, the test cases were based on [1].
Each test case belongs to one of three categories: "normal operation",
"recoverable problems", and "unrecoverable problems".

I analyzed the report based on these categories (notation "X vs. Y"
means X is Initiator and Y is Responder):
Normal operation:
  Goettingen vs. Karlsruhe:
    3-Way Handshake: 7 PASSED, 10 run of 11 test cases
    Refresh of soft state: 2 PASSED, 2 run of 5 test cases
    Setup of multiple sessions: 3 PASSED, 3 run of 4 test cases
    Interception on NSIS forwarders: 3 PASSED, 4 run of 4 test cases
    GIST Stateless operation: 0 PASSED, 1 run of 2 test cases
  Karlsruhe vs. Goettingen:
    3-Way Handshake: 8 PASSED, 9 run of 11 test cases
    Refresh of soft state: 0 PASSED, 1 run of 5 test cases
  Goettingen vs. Coimbra:
    3-Way Handshake: 8 PASSED, 9 run of 11 test cases
    Refresh of soft state: 2 PASSED, 2 run of 5 test cases
  Coimbra vs. Goettingen:
    3-Way Handshake: 2 PASSED, 3 run of 11 test cases
  Karlsruhe vs. Coimbra:
    3-Way Handshake: 5 PASSED, 5 run of 11 test cases

Recoverable problems:
  Goettingen vs. Karlsruhe:
    3-Way Handshake: 2 PASSED, 3 run of 3 test cases
    Refresh of soft state: 4 PASSED, 4 run of 5 test cases
  Karlsruhe vs. Goettingen:
    3-Way Handshake: 1 PASSED, 1 run of 3 test cases
  Goettingen vs. Coimbra:
    3-Way Handshake: 2 PASSED, 2 run of 3 test cases
    Refresh of soft state: 4 PASSED, 4 run of 5 test cases

Unrecoverable problems:
  Goettingen vs. Karlsruhe:
    3-Way Handshake: 5 PASSED, 5 run of 5 test cases
  Karlsruhe vs. Goettingen:
    3-Way Handshake: 3 PASSED, 4 run of 5 test cases
  Goettingen vs. Coimbra:
    3-Way Handshake: 4 PASSED, 5 run of 5 test cases



As we can see, we did not run all test cases, even for Goettingen vs.
Karlsruhe.
However, for Goe vs. Ka we can say that we covered a really good part of
the GIST specification. In particular, we tested all important features
(except for NAT traversal and stateless mode). This makes me very
confident that the spec is implementable (except for NAT traversal, see
the NSIS mailing list).
I think we tested Goettingen against Karlsruhe quite thoroughly (in both
directions). We ran 47 test cases (IPv4 only) and passed 38 of them,
a success rate of about 81%. The tests against Coimbra were not as
complete. We ran 25 test cases (IPv4 only) between Göttingen and Coimbra
and passed 22 of them (= 88%). So the test cases we did run were quite
successful, but we ran only about half as many as for Goe/Ka.
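As a sanity check, the overall figures quoted above can be recomputed from the per-category tallies earlier in this mail. A small sketch (the (passed, run) pairs are transcribed from the tables; the helper name `rate` is just illustrative):

```python
# (passed, run) pairs transcribed from the per-category tables above.
goe_ka = [
    (7, 10), (2, 2), (3, 3), (3, 4), (0, 1),  # normal, Goe vs. Ka
    (8, 9), (0, 1),                           # normal, Ka vs. Goe
    (2, 3), (4, 4),                           # recoverable, Goe vs. Ka
    (1, 1),                                   # recoverable, Ka vs. Goe
    (5, 5),                                   # unrecoverable, Goe vs. Ka
    (3, 4),                                   # unrecoverable, Ka vs. Goe
]
goe_co = [
    (8, 9), (2, 2),                           # normal, Goe vs. Co
    (2, 3),                                   # normal, Co vs. Goe
    (2, 2), (4, 4),                           # recoverable, Goe vs. Co
    (4, 5),                                   # unrecoverable, Goe vs. Co
]

def rate(pairs):
    """Sum the tallies and return (passed, run, percent passed)."""
    passed = sum(p for p, _ in pairs)
    run = sum(r for _, r in pairs)
    return passed, run, 100.0 * passed / run

print(rate(goe_ka))  # 38 passed of 47 run, roughly 81%
print(rate(goe_co))  # 22 passed of 25 run, 88%
```

Both aggregates match the totals quoted in the paragraph above.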

Also note that we ran only a very few test cases with IPv6, and only
between Göttingen and Karlsruhe. However, those were all successful.

My personal lessons learned from this interop:
- Even 3 days were not enough. Considering that we had only 3 GIST
implementations to test, this suggests we wasted too much time. I try
to identify the reasons below.
- I guess 70% of the problems we found between Göttingen and Karlsruhe
were NOT real interop problems. On the other hand, all bugs found
in our (Göttingen) implementation (very few overall) were related to
real interop issues. I guess the reason is quite simple: I ran ALL test
cases defined in [1] with Göttingen vs. Göttingen and so found and fixed
a lot of bugs before arriving in Karlsruhe. I suppose the other
participants did not test internally as thoroughly as we did, which
resulted in this rate of non-interop issues.
- More internal tests (see above), but also remote tests before the
actual interop event, can save a lot of time. I did some remote tests
with Karlsruhe, and we were able to detect and fix some problems before
meeting in person. With Coimbra I also performed some remote tests (just
basic D-Mode and C-Mode, both of which failed), but not as thoroughly as
with Karlsruhe. We ended up struggling with TLS for many hours during
the interop event. We should do much more remote testing to avoid these
lengthy debugging phases!
- The interop draft [1] helped a lot. First, it contains many test cases
that you would not think of during an interop event. Second, for complex
tests the expected behavior and useful triggers are not always obvious;
having them written down, with one or two rounds of review, saves time
and avoids mistakes during the interop event. Third, the interop draft
now helps in writing this report, as we can "measure" the success.
- The interop tools helped a lot and saved a lot of time. I guess the
tools saved roughly 50% of my time, maybe even more. Considering that we
are very short on time, this is a huge improvement. Besides that, many
test cases are simply not feasible without tools (for example, attack
scenarios, corrupted objects, etc.). We should continue to use the tools
and extend them so that they can become the standard tooling for interop
events. That would also help with remote testing.
- The organization by Roland was very good. We had enough switches,
cables, spare machines, space, etc. The network setup we discussed on
this mailing list a week before the interop was well prepared and
avoided long setup and re-plugging phases. You can see how important
this is when you look at Luis and Vitor, who tried to configure their
network on site to match the network layout; it took them quite a while
of debugging to get their machines configured correctly. More
preparation on all sides and an agreed-upon network setup help to avoid
these kinds of delays.
- Taking all these issues into account, I think that 3 days CAN be
enough time, if and only if enough preparation is done in the form of
defining test cases, writing tools to help with testing, testing
internally against the test cases, doing remote tests, and organizing
and preparing for the event itself.

Overall I am quite satisfied with the interop. I think we took a big
step towards mature and interoperable implementations. We tested and
discussed a good portion of the spec and found no (or only very few,
minor) real problems. And we ran quite a lot of test cases with a
reasonable success rate, which will improve further with the remote
tests taking place in the next few weeks.

Thank you all. I had a great time at the interop and special thanks go
to Roland and the University of Karlsruhe for doing a very good job of
hosting this interop event.

Regards,
Christian Dickmann

[1]
http://user.informatik.uni-goettingen.de/~cdickman/draft-dickmann-nsis-ntlp-interop-00.txt
--------------------------------------------------------------------


