Nov 2009, Geib / Morton / Hasslinger / Fardid, draft-geib-ippm-metrictest-01

Draft-geib-ippm-metrictest-01: inputs and ideas.
- Incorporates the RFC 2330 philosophy, draft-bradner and draft-morton.
- Draft-morton: compare a single implementation against the metric specification.
- RFC 2330 philosophy: IPPM metric implementations measuring simultaneously along an identical path should produce the same measurement result.
- Validate a single implementation as well as different, compatible implementations.
- Apply the Anderson-Darling k-sample (ADK) test with 95% confidence (see RFCs 2330 and 2679).
- To conform to a metric specification, publish the smallest resolution at which the ADK test passed with 95% confidence.
- Document the chosen implementation options (and be aware of the possibly resulting limitations for a statistical test comparing different implementations).
- Draft-morton (and the IETF in general): improve IPPM metric specifications based on implementation experience before promoting them to standards.
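The comparison step above can be sketched in a few lines. This is a hypothetical illustration, not the draft's test procedure: it uses scipy's `anderson_ksamp` for the ADK test, and the delay samples are synthetic.

```python
# Sketch: comparing delay samples from two implementations with the
# Anderson-Darling k-sample (ADK) test at 95% confidence.
# The sample data below is invented for illustration only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
delays_a = 100 + rng.exponential(2.0, size=100)  # implementation A, one-way delay [ms]
delays_b = 100 + rng.exponential(2.0, size=100)  # implementation B, one-way delay [ms]

result = stats.anderson_ksamp([delays_a, delays_b])
# The test passes at 95% confidence if the statistic stays below the
# critical value for the 5% significance level (index 2 in scipy).
passed = result.statistic < result.critical_values[2]
print("ADK statistic:", result.statistic)
print("passes at 95% confidence:", bool(passed))
```

In the draft's procedure this comparison would be repeated at successively coarser resolutions until it passes, and the smallest passing resolution published.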
Identical networking conditions for repeated measurements.
- Metric implementations will be operated in real networks; metric compliance should therefore be tested under live network conditions too.
- Identical networking conditions for multiple flows can be reached by:
  - setting up a tunnel using IP/MPLS transport between two sites;
  - simultaneously measuring with 5 or more flows per implementation;
  - ensuring that the test set-up doesn't interfere with the metric measurement.
[Figure: two instances of metric implementation A, connected across the Internet between tunnel termination 1 and tunnel termination 2.]
Example: "repeating" measurements under identical network conditions with a single implementation by measuring with two parallel flows.
Some results with two instances of a single implementation.
- Unless stated otherwise: two implementations, partially sharing a path, same packet size and same queue.
- Resolution is 1 s. Data has been normalised to the same average value (the ADK test is sensitive to variations in averages too).
- AD2 test at 95% confidence. For more details on the measurement set-up see ietf_75_ippm_geib_metrictest.pdf.

Single instance, different packet sizes, different queues, low load, normalised on the same mean:
[Table: delay (mean [ms], standard deviation [ms]) for Flows A and B; AD2 test: passed]

Two instances, same queue, same packet size, moderate load, not normalised:
[Table: jitter (mean [ms], standard deviation [ms]) for Paths A and B; AD2 test: passed]
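The normalisation step mentioned above (shifting both samples to the same average, since the ADK test reacts to mean offsets as well as to shape differences) can be sketched as follows. The helper name and the sample values are assumptions for illustration:

```python
# Sketch: shift two delay samples to a common (grand) mean before an ADK
# comparison, so a constant offset between paths does not dominate the test.
import numpy as np

def normalise_to_common_mean(a, b):
    """Shift both samples so each has the grand mean of the pooled data."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    target = np.concatenate([a, b]).mean()
    return a - a.mean() + target, b - b.mean() + target

a = np.array([10.2, 11.0, 10.7, 10.4])   # invented delay values [ms]
b = np.array([12.1, 12.9, 12.5, 12.3])
a_n, b_n = normalise_to_common_mean(a, b)
print(a_n.mean(), b_n.mean())  # both equal the grand mean, 11.5125
```

Only the means are shifted; the shape of each distribution, which is what the goodness-of-fit test should judge, is left untouched.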
More results (1).

Two instances, same queue, same packet size, moderate load, not normalised, 32 samples only:
[Table: packet loss (mean [pckts], standard deviation [pckts]) for Paths A and B; AD2 test: passed]

Single instance, single queue, low load, results split into four contiguous sets of data ("repeated measurement, single implementation"), not normalised:

  Delay        ADK passed      AD2 failed
  Interval A   with B, C       with D
  Interval B   with A, C       with D
  Interval C   with A, B, D
  Interval D   with C          with A, B
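The pairwise comparison of contiguous intervals above can be sketched as a loop over all interval pairs. This is a hypothetical illustration with synthetic data, using scipy's `anderson_ksamp` as the AD2 test:

```python
# Sketch: split one delay series into four contiguous intervals and run the
# ADK test on every pair ("repeated measurement, single implementation").
# The delay series is synthetic, for illustration only.
import itertools
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
series = 100 + rng.exponential(1.5, size=400)    # one long 1 s delay series [ms]
intervals = dict(zip("ABCD", np.split(series, 4)))

for (x, sx), (y, sy) in itertools.combinations(intervals.items(), 2):
    res = stats.anderson_ksamp([sx, sy])
    verdict = "passed" if res.statistic < res.critical_values[2] else "failed"
    print(f"interval {x} vs {y}: {verdict}")
```

With real measurement data this produces exactly the kind of pass/fail matrix shown in the table above.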
More results (2).

Two instances, same queue, same packet size, low load, data normalised. The ADK test passed after limiting the temporal resolution to 25 s:
[Table: delay (mean [ms], standard deviation [ms]) for Paths A and B; Path A mean 100.9 ms; AD2 test: failed]

Single instance, different packet sizes, different queues, low load, normalised on the same mean. The ADK test passed after limiting the temporal resolution to 150 s:
[Table: delay (mean [ms], standard deviation [ms]) for Flows A and B; AD2 test: failed]
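"Limiting the temporal resolution" amounts to averaging the 1 s samples into coarser bins before re-running the test. A minimal sketch, assuming a 25 s bin width and an invented helper name:

```python
# Sketch: coarsen a 1 s delay series to a wider temporal resolution by
# averaging consecutive samples into fixed-width bins.
import numpy as np

def coarsen(series, bin_width):
    """Average consecutive samples into bins of `bin_width` samples."""
    series = np.asarray(series, float)
    usable = len(series) - len(series) % bin_width   # drop the ragged tail
    return series[:usable].reshape(-1, bin_width).mean(axis=1)

per_second = np.arange(100, dtype=float)  # stand-in for 100 s of 1 s samples
per_25s = coarsen(per_second, 25)
print(per_25s)  # [12. 37. 62. 87.]
```

Coarsening averages out short-term differences between the two samples, which is why the ADK test can pass at 25 s or 150 s resolution while failing at 1 s.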
Next steps.
- If the concept is comprehensible and makes sense: is there support to go on?
- If "yes": complete the draft by adding more of Al's ideas, add figures and so on. Get a review by a design team (name volunteers). Improve the draft again, resubmit and propose it as a WG draft.
- If the answer is "no": read the draft and suggest changes.
Backup
Prior work: RFC 2330 repeatability (precision).
- RFC 2330: "A methodology for a metric should [be] repeatable: if the methodology is used multiple times under identical conditions, the same measurements should result."
- Draft-geib: this demands high precision. By measuring a metric multiple times, probes are drawn from the underlying (and unknown) distribution of networking conditions.
[Figure: dartboard illustration of high precision with low accuracy versus high accuracy with low precision; source: Wikipedia]
Prior work: RFC 2330 / RFC 2679, ADK test (95% confidence).
- RFC 2330: "A methodology for a given metric exhibits continuity if, for small variations in conditions, it results in small variations in the resulting measurements."
- Using a different metric implementation under otherwise identical (network) conditions should be a "small variation".
- The sample distribution of metric implementation A is taken as the "given" distribution against which the sample distribution of metric implementation B is compared by a goodness-of-fit test (proposal: Anderson-Darling k-sample test).
- RFC 2330 provides guidelines on testing for goodness of fit for calibration (quotes):
  - when summarising measurements using histograms, the "EDF" (empirical distribution function) is preferred;
  - IPPM goodness-of-fit tests are done at 5% significance (see also RFC 2679);
  - ...recommends the Anderson-Darling EDF test.
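For readers unfamiliar with the term, the EDF that RFC 2330 prefers over histograms is simply the fraction of sample values at or below a point. A minimal sketch with invented delay values:

```python
# Sketch: the empirical distribution function (EDF) of a delay sample.
import numpy as np

def edf(sample, x):
    """Fraction of sample values <= x."""
    sample = np.sort(np.asarray(sample, float))
    return np.searchsorted(sample, x, side="right") / len(sample)

delays = [10.1, 10.4, 10.4, 11.0, 12.3]   # invented delay values [ms]
print(edf(delays, 10.4))  # 0.6: three of five samples are <= 10.4 ms
```

EDF-based tests such as Anderson-Darling compare distributions directly on this step function, avoiding the bin-width choices that make histogram comparisons fragile.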