CURRENT MEETING REPORT

Minutes of the Benchmarking Methodology Working Group (BMWG)

Reported by Jim McQuaid, Bay Networks, Inc.

SESSION ONE

1. Current Internet Draft document on "Network Element" testing

Jim McQuaid reviewed the status of this document as shown below:

   o "Nearly done" 1993-1994
   o "Final" editing pass February, 1995
   o (appendices incorporated; version 01 posted)
   o Reviewed by area director (MO) August 1995
   o (edited to reflect comments; version 02 posted)

McQuaid briefly highlighted some of the changes in the 01 to 02 edit
for the benefit of those unfamiliar with the current document. These
include:

   o the insertion of language in section 4.0, "Evaluating the
     Results," reminding readers about the statistical issues in
     testing, which are otherwise not discussed in the draft,

   o in section 7.0, "DUT Set Up," the change from SHOULD to MUST
     regarding reporting the exact configuration and set of enabled
     functions for the device under test,

   o and, in section 9.4, "Frame sizes in the presence of disparate
     MTUs," added emphasis on the need to test up to the limits of
     the largest MTU.

The group then reviewed the remaining issues/points of discussion
raised against the 02 version of the draft. Each was resolved as
follows.

1.1 Latency definition used in section 26.2, "Latency"

An inconsistency was pointed out in this section. RFC 1242 describes
two possible latency measurement definitions, identified with "store
and forward" and "bit forwarding" devices. The definition implicit in
this section is a hybrid of the two. It was agreed to remove the
definition from this section and reference the two RFC 1242
definitions instead. Test reports should state which definition was
used. (The two definitions, together with the corrected formula from
1.2 below, are illustrated in the sketch at the end of this section.)

1.2 Formula in section 26.5, "System Recovery"

The formula currently stated in this section is inadvertently
backwards. System recovery is measured from timestamp A to timestamp
B, but the draft says to compute A-B. It was agreed that B-A is the
correct statement of the method.

1.3 Scope of SNMP testing/Management Frame, section C.2.4.3,
"Management Query Frame"

The actual frame could be added to this section, but the draft is
sufficiently complete without it to go forward. It was clarified that
the scope of the information requested by this SNMP frame should be a
single interface only, not some larger list of the interfaces
installed.

It was agreed that the document should be revised to version 03 and
posted in January. At that time there will be a last call, and the
draft will be forwarded to be published as an informational RFC.
There was some discussion about whether or not this document fit the
new category of "Best Current Practice" documents, and it was agreed
that it did not really qualify.

Before the next topic was taken up, and as an important transition
issue, the group discussed the life cycle of this draft in light of
future developments. There was general agreement that the current ID
should be published and that, as with RFC 1242, it would serve as a
reference document for future efforts. A future document on switch
testing, for example, need only discuss its points of difference from
the "basic" methodology described in the current draft. In effect,
all the set up, reporting, and other aspects of the methodology could
be incorporated by reference into newer methodology drafts.
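As a purely illustrative aside (not part of the draft or these
minutes), the two RFC 1242 latency definitions from 1.1 and the
corrected System Recovery formula from 1.2 can be sketched as
follows; the timestamp names are hypothetical and assume a tester
with a single common clock.

   # Illustrative sketch only: the two RFC 1242 latency definitions
   # and the corrected formula from section 26.5. Timestamps are
   # hypothetical values read from a common tester clock.

   def latency_store_and_forward(last_bit_in, first_bit_out):
       """Store-and-forward devices: interval from the last bit of
       the input frame reaching the input port to the first bit of
       the output frame appearing on the output port."""
       return first_bit_out - last_bit_in

   def latency_bit_forwarding(first_bit_in, first_bit_out):
       """Bit-forwarding devices: interval from the first bit of the
       input frame reaching the input port to the first bit of the
       output frame appearing on the output port."""
       return first_bit_out - first_bit_in

   def system_recovery(timestamp_a, timestamp_b):
       """Section 26.5 as corrected: recovery runs from timestamp A
       to timestamp B, so compute B - A, not A - B."""
       return timestamp_b - timestamp_a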
2. Ethernet Switch testing discussion

Bob Mandeville and Ajay Shah presented some of the thinking behind
the draft document circulated to the BMWG list (but not yet posted
anywhere; read on). A lively discussion of about one hour raised a
number of questions which will be addressed in the next draft. A
partial draft that addresses some of the most contentious areas will
be circulated to the list in January and, subsequently, a new ID will
be posted.

Bob Mandeville presented several important supplemental thoughts and
figures. This material is available as a PDF file (Acrobat).

The major question raised concerned the complex relationship of
offered load, bi-directional traffic, Ethernet collisions, and media
limits versus switch limits. This is the foremost issue to be
resolved (a sketch of the media-limit arithmetic follows section
2.3). Some of the sub-issues raised in this discussion are listed
below as 2.1.

2.1 Throughput: pattern of switching and loading

2.1.1 Determinate vs. random vs. cycled addressing of load
      synchronization? collisions - media

2.1.2 Overload: define it
      Q of congestion? Q of multiport?

2.1.3 Unidirectional vs. bidirectional

2.1.4 Burst size / burst pattern / interframe/burst gap
      [too tied to tester architecture?]

2.1.5 Q of resonance / phase-lock (6 in, 6 out): in other words,
      could a specific burst pattern result in a specific output
      pattern which has some 'magic' frequency resonance for a given
      device?

The following issues were raised and proved to be much less
controversial.

2.2 Behavior tests
      Is A to B affected by congestion on C to D?
      Handling of errored frames [runts, etc.]

2.3 Address handling / learning
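To make the "media limits versus switch limits" distinction concrete,
here is a minimal sketch of the Ethernet arithmetic (not from the
draft; it assumes 10 Mb/s Ethernet with an 8-byte preamble/SFD and a
12-byte minimum interframe gap):

   # Minimal sketch: the theoretical media limit caps the load that
   # can be offered to any 10 Mb/s Ethernet port, regardless of the
   # switch behind it.

   PREAMBLE_BYTES = 8       # preamble plus start-of-frame delimiter
   IFG_BYTES = 12           # minimum interframe gap
   MEDIA_BPS = 10_000_000   # 10 Mb/s Ethernet

   def max_frame_rate(frame_size):
       """Frames per second the medium itself can carry; frame_size
       counts destination address through FCS, in bytes."""
       bits_per_frame = (frame_size + PREAMBLE_BYTES + IFG_BYTES) * 8
       return MEDIA_BPS // bits_per_frame

   def offered_load_percent(frames_per_second, frame_size):
       """Offered load expressed as a percentage of the media limit."""
       return 100.0 * frames_per_second / max_frame_rate(frame_size)

   # max_frame_rate(64) == 14880 and max_frame_rate(1518) == 812,
   # the familiar per-port limits for 10 Mb/s Ethernet.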
3. Call setup testing discussion

This was an exploratory discussion. Three scenarios were proposed as
possible benchmarking frameworks.

                          -----
      frames offered --->| DUT |-----> frames forwarded
                          -----

      Figure A, simple call setup benchmark

                          -----
      frames offered --->| DUT |
                         |     |<----> call setup
      call READY ACK <---|     |
                         |     |
                          -----

      Figure B, call setup ACK benchmark

                          -----
      frames offered --->| DUT |
                         |     |(<----> call setup)
      call READY ACK <---|     |
                         |     |-----> frames forwarded
                          -----

      Figure C, call setup and data forwarding benchmark

One of the points raised was a question about other metrics already
established in the telephony world for benchmarking setups such as
Figure C. Shikhar Bajaj volunteered to look into this matter. He
later circulated (to the BMWG list) a summary of a BOF on this topic
at the ATM Forum meeting in London. This was sent out by Gregan
Crauford, chair of the TEST Working Group.

McQuaid reviewed the ITU-T Revised Draft Recommendation I.35bcp,
dated 7/95, entitled "Call processing performance for a B-ISDN,"
which discusses benchmark setups similar to Figure C above and gives
target objectives for national, international, and end-to-end
telephone networks. A target of roughly 1350 milliseconds plus
propagation delay is cited for national networks.

SESSION TWO

Reported by Guy T. Almes

The working group met in a second session that was devoted to the
IPPM agenda and chaired by Guy Almes. The chair used a set of slides
to organize the session. These slides have been reproduced and follow
these minutes. The written minutes emphasize points not in the
slides, or points from the slides that drew significant discussion.
In the minutes, we use the notation [slide k] to refer to the kth
slide.

The session began with a review of the IPPM BOF at Danvers, the
IPPM/BMWG meeting at Stockholm, and the interim IPPM/BMWG meeting at
Pittsburgh in September [slide 3]. Further, the general qualities of
the metrics we want to define were discussed [slides 4 and 5]. There
then began a series of four brainstorming discussions.

The first discussion [slide 6] focused on Delay Metrics. We discussed
various technical issues in delay measurements, including the
importance of pinging 'through' routers rather than 'to' routers.
Another was the impact of caches on attempts to accurately measure
delay. We then discussed various motivations for measuring delay,
among which were the following:

   o To measure the presence and degree of congestion in an IP
     cloud. For example, if the baseline delay through a given
     lightly loaded cloud is known, then delay above that baseline is
     a measure of congestion of the cloud (or of fallback or
     erroneous routing through the cloud). One particularly
     interesting example of a delay measurement being used by one
     provider (Geoff Huston of Telstra) was to measure the percentage
     of time that delay across a cloud exceeds a given threshold. If
     the percentage above this threshold exceeds a given amount, then
     it is taken as an indicator that service is unsatisfactory. (A
     sketch of this indicator follows this discussion.)

   o To measure the likelihood that telnet, or some other
     delay-intolerant application, will work effectively.

In all these discussions, Mike O'Dell noted that we need an improved
understanding of the phenomenology surrounding IP clouds before we
would really understand what to measure. The importance of
understanding both the unloaded delay and the delay under load was
discussed.
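A minimal sketch of the threshold-style indicator described above
(illustrative only; the threshold and cutoff values below are
hypothetical placeholders, not Telstra's operational figures):

   # Hypothetical sketch: given round-trip delay samples across a
   # cloud, report the percentage of time delay exceeded a threshold
   # and flag service as unsatisfactory past a cutoff. The threshold
   # and cutoff are placeholder values.

   def percent_above_threshold(delay_samples_ms, threshold_ms):
       """Share of samples exceeding the threshold, as a percentage.
       Assumes evenly spaced samples, so the sample share
       approximates the share of time."""
       if not delay_samples_ms:
           return 0.0
       over = sum(1 for d in delay_samples_ms if d > threshold_ms)
       return 100.0 * over / len(delay_samples_ms)

   def service_unsatisfactory(delay_samples_ms, threshold_ms=300.0,
                              max_percent_over=5.0):
       """Unsatisfactory when delay spends more than max_percent_over
       percent of the time above threshold_ms."""
       return (percent_above_threshold(delay_samples_ms, threshold_ms)
               > max_percent_over)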
The second discussion [slide 7] focused on Flow Capacity. An
important distinction exists between the ability of a cloud to
sustain a single high-speed flow (as in remote access to a
supercomputer) and the ability of a cloud to sustain a large number
of small flows (as with a large number of http GETs). Treno attempts
to measure the former (cf. http://www.psc.edu/~pscnoc/treno.html). A
related issue includes the measures of packet loss and variance of
delay, since these will frustrate the ability of TCP flow control to
be effective. Mike O'Dell used the term 'hit and run flows' to
characterize the large number of brief connections that appear due to
the Web. Non-TCP flows, such as voice-over-UDP and MBone flows, are
also important to measure and understand.

It was noted that while users desire to understand the ability of a
given cloud to carry traffic, providers desire to understand the
nature of 'offered load'. Sean Doran noted that the techniques we
were discussing, both in delay measurement and with treno-like tools,
would work better if routers implemented an ability to rapidly
receive certain probe packets, reverse the source and destination
addresses, and fire them back to the sender; this would dramatically
improve the accuracy and reduce the overhead of such measurements
when routers are the subject of tests.

Also discussed was the conjecture raised at the Pittsburgh meeting
that an ongoing estimate of flow capacity might be possible by
combining (1) a baseline measurement of flow capacity and (2) an
ongoing measurement of variations in round-trip delay. In this
context, it was noted that the Network Time Protocol (NTP)
implementation of Dave Mills maintains such an ongoing record of
measured delay.

A third, very brief, discussion focused on Availability metrics
[slide 8]. Though a significant issue in the eyes of users (and
therefore providers), the press of time prevented a thorough
discussion.

The fourth discussion focused on the role of router surrogates, or
'transponders', located at strategic locations on the Internet.
Jamshid Mahdavi presented some slides on this topic. He noted that
the Internet could largely be modeled as a graph whose vertices were
clouds and exchange points, and whose edges were connections from the
clouds to exchange points. Two interesting issues were:

   o Whether such transponders should be placed (a) at exchange
     points, (b) just inside clouds near exchange points, (c) at user
     sites, or (d) some combination of these.

   o Whether such transponders should be (1) passive, responding,
     for example, to ping or treno probes, or (2) active, initiating
     measurements among each other and storing the results for public
     distribution (using, for example, the techniques documented by
     the OpStat Working Group) and available to users via the Web.

The session closed with a talk by Steve Corbato of the University of
Washington. He described an approach to analyzing real-time network
performance based on occasional fast (1-2 Hz) SNMP polling of router
interfaces; a sketch of this style of polling follows. His
presentation example focused on the aggregate flow characteristics
across a campus Internet border router. His slides are available at:

   http://weber.u.washington.edu/~corbato/ippmtalk/
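As a minimal sketch of that polling approach (not Corbato's actual
code): the snmp_get() helper below is hypothetical, standing in for
any SNMP library, and the counter is assumed to be the 32-bit MIB-II
ifInOctets, which wraps quickly on fast links and must be unwrapped
between polls.

   # Hypothetical sketch: poll an interface octet counter at 2 Hz
   # and turn successive deltas into short-term bits-per-second
   # estimates.

   import time

   COUNTER32_MOD = 2 ** 32

   def poll_throughput(router, if_index, interval=0.5, samples=120):
       """Poll ifInOctets every `interval` seconds and yield a
       bits-per-second estimate for each interval."""
       oid = "1.3.6.1.2.1.2.2.1.10.%d" % if_index  # ifInOctets
       prev = snmp_get(router, oid)                # hypothetical helper
       for _ in range(samples):
           time.sleep(interval)
           cur = snmp_get(router, oid)
           delta = (cur - prev) % COUNTER32_MOD    # unwrap Counter32
           prev = cur
           yield delta * 8 / interval              # octets -> bits/s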