
1 Development of Verification and Validation Procedures for Computer Simulation Use in Roadside Safety Applications NCHRP 22-24 VERIFICATION AND VALIDATION METRICS Worcester Polytechnic Institute Battelle Memorial Laboratory Politecnico di Milano

2 Meeting Agenda
9:00-9:30    Introductions/Instructions (Niessner, Focke)
9:30-10:30   Definitions and V&V Procedures (Ray)
10:30-11:30  ROBUST Project Summary (Anghileri)
11:30-Noon   Survey Results
Noon-1:00    Lunch
1:00-2:30    V&V Metrics (Ray)
2:30-4:00    Future Work for Task 8 (Ray)

3 VALIDATION METRIC A mathematical measure that quantifies the level of agreement between simulation outcomes and experimental outcomes. (ASME V&V 10-2006) VERIFICATION METRIC ASME V&V 10-2006 does not define the term “verification metric,” but from the definition of validation metric we can infer the following definition: A mathematical measure that quantifies the level of agreement between simulation outcomes and known analytical or computational outcomes.

4 BACKGROUND Validation and verification metrics: – Quantify the level of agreement between two solutions. – Calculate the error (i.e., difference) between two solutions. Specification and use of validation/verification metrics is important because different metrics give different scores for the same time-history pairs. Need an automated, consistent, repeatable, non-subjective procedure.

5 PARTS OF A METRIC A V&V metric has two implicit parts: the metric parameter and the acceptance criterion. The metric parameter is the item that is calculated to compare two results. It may be domain specific or a shape comparison, and it may be deterministic or probabilistic. The acceptance criterion indicates if the level of agreement between the two results is adequate. It may be deterministic or probabilistic (i.e., a confidence interval).

6 Evolution of Metrics (from Trucano et al., 2002): qualitative comparison → deterministic metric with deterministic acceptance criteria → deterministic metric with probabilistic acceptance criteria → stochastic metric with probabilistic acceptance criteria.

7 DOMAIN SPECIFIC METRICS A measure of the “reality of interest” in the specific technical domain. If we are designing an aircraft wing, the wing tip deflection is in our reality of interest; wing-tip deflection is an appropriate domain specific metric for this problem. If we are designing an aircraft seat, the head injury criterion (HIC) of the occupant is in our reality of interest; the HIC is an appropriate domain specific metric for this problem.

8 DOMAIN SPECIFIC METRICS When the domain is one where evaluation metrics have already been defined for physical experiments, it makes sense to use the same evaluation metrics for comparisons. Examples: FAA seat back design uses the HIC. DoD ship hull design uses the peak acceleration measured at various points on the ship hull in response to a specified sub-surface blast. Rail car design uses the end crush of the car. What should we use in roadside safety? The Report 350 evaluation criteria (US) or the EN 1317 evaluation criteria (Europe).

9 DOMAIN SPECIFIC VALIDATION METRICS We already have well defined evaluation criteria for roadside safety crash tests: Report 350, the Report 350 Update and EN 1317. Which criteria would make good domain specific metrics for validation and verification? Some criteria are numerical (i.e., H and I) and some criteria are phenomenological (i.e., C, D, F, G, K and L). We can evaluate a simulation using the same domain specific metrics as we use in crash tests. Comparison: Report 350 evaluation criteria matrix for test 3-31 on the Plastic Safety System CrashGard Sand Barrel.

10 RECOMMENDATION We should use the same evaluation criteria for comparing the results of full-scale crash tests to simulations that we already use to evaluate full-scale crash tests. These would be deterministic domain specific metrics. Validation – for comparing to experiments. Verification – for comparing to known solutions (e.g., benchmarks). Discussion? Comparison: Report 350 evaluation criteria matrix for test 3-31 on the Plastic Safety System CrashGard Sand Barrel.

11 ACCEPTANCE CRITERIA For phenomenological criteria (i.e., A, B, C, D, E, F, G, K and N), the acceptance criterion is binary – YES or NO. Examples: In a simulation the vehicle penetrates the barrier but in the crash test the vehicle did not penetrate. Simulation Criterion A: Fail; Crash Test Criterion A: Pass; Valid? NO. In both the simulation and validation crash test, the vehicle rolls over as it is redirected. Simulation Criterion G: Fail; Crash Test Criterion G: Fail; Valid? YES. For numerical criteria (i.e., H, I, L and M) we could pick a criterion based on (1) engineering judgment or (2) an uncertainty analysis. Example: Assume our acceptance criterion is that the difference must be less than 20 percent. Criterion H in the table is satisfied so the comparison is valid; if the acceptance criterion were 10 percent, criterion H would be invalid. Comparison: Report 350 evaluation criteria matrix for test 3-31 on the Plastic Safety System CrashGard Sand Barrel.
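
As an illustration of the numerical acceptance check described above, here is a minimal Python sketch; the 20 percent tolerance is the slide's example value and the occupant-impact-velocity numbers are hypothetical.

```python
# Hedged sketch of a deterministic acceptance check for a numerical
# criterion (e.g., Report 350 criterion H).  The 20 percent tolerance is
# the slide's example value, not a mandated threshold.
def relative_difference(test_value, sim_value):
    """Relative difference of the simulation with respect to the crash test."""
    return abs(sim_value - test_value) / abs(test_value)

def acceptable(test_value, sim_value, tolerance=0.20):
    """True if the simulation agrees with the test within the tolerance."""
    return relative_difference(test_value, sim_value) <= tolerance

# Hypothetical occupant impact velocities (m/s) for criterion H.
print(acceptable(9.1, 10.2))                  # ~12% difference -> True
print(acceptable(9.1, 10.2, tolerance=0.10))  # -> False at a 10% tolerance
```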

12 DOMAIN SPECIFIC VERIFICATION METRICS Verification is the comparison of simulation results to a known solution. In roadside safety we do not really have known solutions, but simulations must obey some basic laws of mechanics: conservation of energy, mass and momentum. Roadside safety verification metrics: Total energy must be conserved within ±10 percent. The energy in each part must balance within ±10 percent. The hourglass energy in each part must be less than 10 percent. The time history sampling rate must ensure that the time histories yield correct data (i.e., accelerations integrate to the correct velocity). Mass scaling must not affect the final solution. These verification checks ensure that the simulation results at least conform to the basic physics represented by the model (a sketch of how the energy checks might be automated follows).
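
A minimal sketch of how the energy checks above might be automated, assuming the solver's energy time histories (for example, LS-DYNA glstat/matsum output) have already been parsed into NumPy arrays. The 10 percent tolerances are the values suggested on the slide; measuring hourglass energy against a part's peak internal energy is an assumption, since the slide does not say what the 10 percent is relative to.

```python
import numpy as np

def total_energy_conserved(total_energy, tol=0.10):
    """True if total energy stays within +/- tol of its initial value."""
    e0 = total_energy[0]
    drift = np.abs(total_energy - e0) / np.abs(e0)
    return bool(np.all(drift <= tol))

def hourglass_energy_ok(hourglass_energy, internal_energy, tol=0.10):
    """True if a part's hourglass energy stays below tol of its peak
    internal energy (assumed reference -- see the note above)."""
    peak_internal = np.max(np.abs(internal_energy))
    return bool(np.max(np.abs(hourglass_energy)) <= tol * peak_internal)

# Hypothetical energy history for illustration only.
t = np.linspace(0.0, 0.2, 201)
total = 100.0 + 2.0 * np.sin(20.0 * t)   # small, bounded drift
print(total_energy_conserved(total))      # True: drift stays under 10%
```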

13 SHAPE COMPARISON METRICS For dynamic problems it is also useful to compare time histories. How do we quantify the similarity/difference between time history data? How do we compare shapes?
Frequency domain metrics:
- Point-wise absolute difference of the amplitudes of two signals
- Root-mean-squared (RMS) log spectral difference between two signals
Time domain metrics:
- Relative absolute difference of the moments of the difference between two signals
- Root-mean-squared (RMS) log measure of the difference between two signals
- Correlation coefficient
- MPC metrics: Geers, Sprague-Geers, Knowles-Gear, Russell
- ANOVA metrics: Ray, Oberkampf and Trucano
- Global Evaluation Method (GEM)
- Velocity of the residual errors
- Weighted Integrated Factor (WIFac)
- Normalized Integral Square Error (NISE)

14 SHAPE COMPARISON METRICS We can group these into similar metrics:
NARD metrics:
- Point-wise absolute difference of the amplitudes of two signals
- Root-mean-squared (RMS) log spectral difference between two signals
- Relative absolute difference of the moments of the difference between two signals
- Root-mean-squared (RMS) log measure of the difference between two signals
- Correlation coefficient
MPC (magnitude-phase-combined) metrics:
- Geers
- Sprague-Geers
- Knowles-Gear
- Russell
ANOVA metrics:
- Ray
- Oberkampf and Trucano
Other metrics:
- Velocity of the residual errors
- Global Evaluation Method (GEM)
- Weighted Integrated Factor (WIFac)
- Normalized Integral Square Error (NISE)
The NARD frequency-domain metrics are very rarely used and the NARD time-domain metrics are seldom used; several are components of the MPC metrics, so using them is redundant. The Ray and Oberkampf methods are almost identical; we use Oberkampf’s version for illustration. GEM is part of a proprietary software package so its details are difficult to know. All four MPC metrics are similar; for illustration we will use the Sprague-Geers MPC metrics.

15 SHAPE COMPARISON METRICS Sprague-Geers MPC Metrics The magnitude component (i.e., M) measures relative differences in magnitude between the computational (i.e., c) and measured (i.e., m) responses. The phase component (i.e., P) measures relative differences in phase or time of arrival between the computational and measured responses. The combined component (i.e., C) combines the effect of P and M into one parameter; P and M are coordinates on a circle of radius C. 0 ≤ M, P and C ≤ 1.0. Perfect agreement is M = P = C = 0.0; no relation is M = P = C = 1.0.
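
A minimal Python sketch of the Sprague-Geers components in their commonly published form, with the integrals approximated by sums over equally spaced samples; m and c are assumed to be on the same time base.

```python
import numpy as np

def sprague_geers(m, c):
    """Sprague-Geers M, P and C for measured (m) and computed (c) signals
    sampled at the same, constant time interval."""
    m = np.asarray(m, dtype=float)
    c = np.asarray(c, dtype=float)
    imm = np.sum(m * m)   # integral of m^2 (up to a common dt factor)
    icc = np.sum(c * c)   # integral of c^2
    imc = np.sum(m * c)   # integral of m*c
    M = np.sqrt(icc / imm) - 1.0                        # magnitude component
    cos_arg = np.clip(imc / np.sqrt(imm * icc), -1.0, 1.0)
    P = np.arccos(cos_arg) / np.pi                      # phase component
    C = np.sqrt(M * M + P * P)                          # combined component
    return M, P, C

# Identical signals give M = P = C = 0 (perfect agreement).
t = np.linspace(0.0, 1.0, 1001)
print(sprague_geers(np.sin(2 * np.pi * t), np.sin(2 * np.pi * t)))
```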

16 SHAPE COMPARISON METRICS MPC Metric Components All the MPC metrics can be viewed as coordinates of a circle: M is the x coordinate, P is the y coordinate and C is the radius of the circle.

17 SHAPE COMPARISON METRICS ANOVA Metrics The mean residual error is the average residual error between the measured (i.e., m) and computed (i.e., c) signals at each time step, and the standard deviation of the residual errors accompanies it. If we know the mean and standard deviation and we assume the error is random (i.e., normally distributed) we can perform a standard statistical hypothesis test: calculate the T statistic for confidence level α and N samples. For crash tests, N is always very large – effectively ∞. This is a very well established statistical technique applied to error estimation. Identical curves would give a mean, standard deviation and T score of zero. All three scores run between zero (best) and one (worst) like the MPC metrics.
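
A minimal sketch of the ANOVA-style quantities described above, assuming the residuals are normalized by the peak of the measured signal (one common convention; the exact normalization and test used by Ray may differ) and using the standard one-sample t statistic for a zero-mean hypothesis.

```python
import numpy as np

def anova_metrics(m, c):
    """Mean residual error, its standard deviation and the t statistic for a
    zero-mean test; residuals are normalized by the peak measured value."""
    m = np.asarray(m, dtype=float)
    c = np.asarray(c, dtype=float)
    residuals = (c - m) / np.max(np.abs(m))  # normalized residual errors
    e_bar = residuals.mean()                 # average residual error
    sigma = residuals.std(ddof=1)            # standard deviation of residuals
    n = residuals.size
    t = e_bar / (sigma / np.sqrt(n))         # compare to a t (or z) critical value
    return e_bar, sigma, t
```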

18 SHAPE COMPARISON METRICS All the time-domain metrics are point-to-point comparisons. It is important that the measured (i.e., m) and calculated (i.e., c) data should:
- Be collected at the same time interval so each m and c value matches in time. This is highly desirable although not totally essential; values can be interpolated to accomplish the same thing, but this introduces another source of random error.
- Be filtered in the same way to avoid numerical noise giving inaccurate results. Highly desirable but not totally essential; if the data are filtered differently there may be poor magnitude or error comparisons that are a result of the filtering rather than the shape comparison.
- Be shifted so that the impact time is coordinated. Highly desirable but not totally essential; there are methods to account for a time shift (i.e., Knowles-Gear accounts for this internally).
A small resampling sketch follows.
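
A minimal sketch of putting two records on a common time base by linear interpolation, as discussed in the first bullet. Here t_m/y_m are the measured time and data arrays, t_c/y_c the computed ones; the shared grid covers only the overlapping interval, and the sampling interval dt is an illustrative choice.

```python
import numpy as np

def to_common_time_base(t_m, y_m, t_c, y_c, dt=1.0e-4):
    """Resample both signals onto one uniform time grid by interpolation.
    Note: interpolation itself adds a small additional source of error."""
    t0 = max(t_m[0], t_c[0])
    t1 = min(t_m[-1], t_c[-1])
    t = np.arange(t0, t1, dt)
    return t, np.interp(t, t_m, y_m), np.interp(t, t_c, y_c)
```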

19 SHAPE COMPARISON METRIC EVALUATIONS Three studies comparing different shape comparison metrics were found:
Schwer (2004, 2006) – Compared: Sprague-Geers MPC, Russell MPC, Oberkampf ANOVA, RMS. Selected: Sprague-Geers MPC, Knowles-Gear MPC, Oberkampf ANOVA.
Ray (1997); Plaxico, Ray & Hiranmayee (2000) – Compared: Geers MPC, relative moments, correlation coefficient, log RMS, log spectral distance, Ray ANOVA, velocity of residuals. Selected: Geers MPC, Ray ANOVA, velocity of residuals.
Moorecroft (2007) – Compared: Sprague-Geers MPC, WIFac, GEM, NISE. Selected: Sprague-Geers MPC.
All three recommended one of the Geers MPC metrics and one of the ANOVA metrics.

20 SHAPE COMPARISON METRIC EVALUATION Ray et al Ray applied the NARD metrics, the ANOVA metrics and the Geers MPC metrics to six “identical” rigid pole crash tests performed by Brown.

21 SHAPE COMPARISON METRIC EVALUATION Ray (1997) Based on the ANOVA results Ray suggested:
- e < 5%
- σ < 20%
- V < 10%
- 0th through 2nd moments less than 20%
- 3rd through 5th moments do not seem to be discerning
- RMS > 70%?
- ρ > 0.90
- ΔAB < 0.35?
- D?
- M, P and C < 10%

22 SHAPE COMPARISON METRIC EVALUATION Ray et al Plaxico, Ray and Hiranmayee (2000) used the NARD metrics, the ANOVA metrics and the Geers MPC metrics to compare test 3-11 results for a strong-post w-beam guardrail test from TTI.

23 SHAPE COMPARISON METRIC EVALUATION Ray et al Plaxico, Ray and Hiranmayee (2000) used the NARD metrics, the ANOVA metrics and the Geers MPC metrics to compare test 3-11 results for a strong-post w-beam guardrail test from TTI. Relative moments were all generally less than 20%.

24 SHAPE COMPARISON METRIC EVALUATION Ray et al Plaxico, Ray and Hiranmayee (2000) used the NARD metrics, the ANOVA metrics and the Geers MPC metrics to compare test 3-11 results for a strong-post w-beam guardrail test from TTI.

25 SHAPE COMPARISON METRIC EVALUATION Schwer Schwer examined four metrics: Oberkampf’s version of the ANOVA metrics, the Sprague-Geers MPC metrics, the Russell metrics and the root-mean-square (RMS) metric. Results should be consistent with Subject Matter Expert (SME) opinions. There is increased use of metrics, but the selection rationale is rarely specified.

26 Schwer examined: an ideal decaying sinusoidal wave form with +20% magnitude error, -20% phase error and +20% phase error; and an example experimental time history with five simulations.

27 SHAPE COMPARISON METRIC EVALUATION Schwer Magnitude comparison: All but the Russell metric are biased toward the measured value (this is good). Sprague-Geers and RMS give values similar to the actual error. Phase comparison: All four metrics are symmetric with respect to phase. RMS and ANOVA are very sensitive to the phase.

28 SHAPE COMPARISON METRIC EVALUATION Schwer The Oberkampf ANOVA metric has some interesting features: plotting the residuals shows exactly where in the time history the error is largest. Top curve: the initial peak has large error, the tail of the curve is very prone to error and the middle of the curve is pretty good. Bottom curve: the initial peak has a lot of error, the error initially relaxes to a more reasonable level and then gets progressively worse as the event goes on.
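
A minimal sketch of the residual plot described above, assuming the two signals are already on a common time base; plotting c(t) - m(t) shows where in the time history the error is largest.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_residuals(t, m, c):
    """Plot the residual error history between computed and measured data."""
    residuals = np.asarray(c, dtype=float) - np.asarray(m, dtype=float)
    plt.plot(t, residuals)
    plt.xlabel("Time")
    plt.ylabel("Residual (computed - measured)")
    plt.title("Where in the time history the error is largest")
    plt.show()
```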

29 SHAPE COMPARISON METRIC EVALUATION Schwer Metrics indicating the red simulation has the most error: Sprague-Geers, Oberkampf ANOVA and Russell (RMS indicates red as a close second). Metrics indicating the green simulation has the least error: Sprague-Geers, Russell and RMS (Oberkampf ANOVA indicates green as a close second). Schwer does not recommend any particular metric although he discourages the use of the RMS. All four metrics provide the same general results, but each metric measures slightly different things so they all have some value. When they all agree, you really have something; sometimes one will yield a good result and another metric may provide a poor result.

30 SHAPE COMPARISON METRIC EVALUATION Schwer Schwer also sent five time history comparisons to 11 experts and asked them to rate the agreement between zero and unity (unity being the best).

31 SHAPE COMPARISON METRIC EVALUATION Schwer The subject matter expert results agree with both the Sprague-Geers and Knowles-Gear metrics. The expert results generally have the same range (note the error bars) and the metrics tend to be positioned in the mid-range of expert opinions. Both the Sprague-Geers and Knowles-Gear metrics appear to be good predictors of expert opinion.

32 SHAPE COMPARISON METRIC EVALUATION Moorecroft Quantitative curve shape metrics have three components – peak, phasing, and shape – and the components are sometimes combined. The component values need to be consistent: if a 10% magnitude (peak) error is “good”, a 10% shape error should also be considered “good”. Results should be consistent with Subject Matter Expert (SME) opinions. There is increased use of metrics, but the selection rationale is rarely specified.

33 SHAPE COMPARISON METRIC EVALUATION Moorecroft Four Curve Shape Metrics Evaluated:
- Sprague and Geers (S&G) – general purpose curve shape metric; implemented in a spreadsheet.
- Weighted Integrated Factor (WIFac) – automotive curve shape metric; implemented in a spreadsheet.
- Global Evaluation Method (GEM) – automotive curve shape plus peak and timing; requires ModEval (a stand-alone program).
- Normalized Integral Square Error (NISE) – biomechanics curve shape plus magnitude and phase; implemented in a spreadsheet.

34 SHAPE COMPARISON METRIC EVALUATION Moorecroft Moorecroft examined three ideal wave forms (+20% magnitude error, -20% phase error and +20% phase error) and a typical head acceleration curve.

35 SHAPE COMPARISON METRIC EVALUATION Moorecroft Three Ideal Waveforms and Head Acceleration
Scenario      Ref. Error   S&G    WIFac   GEM    NISE
Magnitude     20%          20.0   16.7    10.9   1.6
+ Phase       ~20%         19.5   55.2    5.9    18.2
- Phase       ~20%         19.5   55.0    5.4    18.2
Head Accel.   6.3%         9.9    33.1    3.6    2.9
Sprague-Geers is a direct measure of the error and works well for phase and magnitude. NISE is more sensitive to phase errors than magnitude. GEM is more sensitive to magnitude than phase. WIFac is more sensitive to phase errors than magnitude.

36 SHAPE COMPARISON METRICS Moorecroft Moorecroft compared five simulation models to a physical test. Which is really best? A: WIFac; B: NISE; C: Sprague-Geers and GEM; D: None; E: None.

37 SHAPE COMPARISON METRIC EVALUATION Moorecroft Subject Matter Expert Survey 16 experts (industry, gov’t, academia) submitted evaluations of 39 test/simulation time history curves. Evaluations consisted of a score (excellent, good, fair, poor, very poor) for magnitude, phase, shape, and overall agreement. The data represent accel, vel, pos, angle, force, and moment time histories derived from both occupant & structural responses. Data were normalized such that the highest peak = 1.

38 SHAPE COMPARISON METRIC EVALUATION Moorecroft Example Curve (Pair 19/SME 1): of the four ratings (magnitude, phase, shape and overall), one was Good, two were Fair and one was Poor.

39 SHAPE COMPARISON METRIC EVALUATION Moorecroft Subject Matter Expert Data Analysis Qualitative scores were converted to quantitative scores: Excellent = 1, Good = 2, Fair = 3, Poor = 4, Very Poor = 5. Basic statistics were computed for each test/simulation pair (average, mode, standard deviation, etc.); the mode represents the most frequent response.
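
A minimal sketch of the conversion and summary statistics described above; the ratings shown are hypothetical, not data from the survey.

```python
from statistics import mean, mode, stdev

SCORE = {"Excellent": 1, "Good": 2, "Fair": 3, "Poor": 4, "Very Poor": 5}

# Hypothetical SME ratings for one test/simulation pair.
ratings = ["Good", "Fair", "Fair", "Poor", "Good", "Fair"]
values = [SCORE[r] for r in ratings]

print(mean(values))   # average score
print(mode(values))   # most frequent response
print(stdev(values))  # spread of the responses
```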

40 SHAPE COMPARISON METRIC EVALUATION Moorecroft Inconsistent Magnitude Pair 2: Mag. Mode = Poor, Lower Mag. = 0.79, Error = 21%. Pair 19: Mag. Mode = Fair, Lower Mag. = 0.69, Error = 31%.

41 SHAPE COMPARISON METRIC EVALUATION Moorecroft Chart: Magnitude Error vs. SME Mag. Score (categories Excellent through Very Poor).

42 SHAPE COMPARISON METRIC EVALUATION Moorecroft Mag. % Error vs. SME Mag. Score
Rating      Avg. Diff.   St Dev   Avg - 1 St Dev   Avg + 1 St Dev   Suggested Range (%)
Excellent   2.2          1.7      0.5              3.9              0 – 4
Good        7.1          2.4      4.7              9.5              4 – 10
Fair        18.4         6.9      11.6             25.3             10 – 20
Poor        34.4         12.7     21.7             47.1             20 – 40
Very Poor   51.9         9.0      42.9             60.9             40 +

43 Phasing Phasing was defined for the SME evaluation as the “timing of events.” The time of the peak is typically used within a relative error. Defining a reference time allows for a time-independent error calculation. Simple relative error (∆t / t_T): a 5 ms difference at 50 ms is a 10% error but at 150 ms is only about 3.3%. For a reference time of 100 ms (∆t / t_ref), the error is 5% regardless of location in the time history.
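
A minimal sketch of the two phasing-error conventions just described; times are in seconds and the 100 ms reference time is the slide's example value.

```python
def phase_error_relative(dt, t_event):
    """Simple relative error: depends on where the event occurs in time."""
    return dt / t_event

def phase_error_referenced(dt, t_ref=0.100):
    """Error against a fixed reference time: independent of event location."""
    return dt / t_ref

print(phase_error_relative(0.005, 0.050))   # 0.10 -> 10% at 50 ms
print(phase_error_relative(0.005, 0.150))   # ~0.033 -> ~3% at 150 ms
print(phase_error_referenced(0.005))        # 0.05 -> 5% regardless of location
```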

44 Phasing Average Difference
Rating      Average (ms)   St Dev   Number
Excellent   1.7            –        7
Good        2.0            2.7      8
Fair        14.7           11.7     15
Poor        21.2           6.2      46
Very Poor   42             DNE      1

45 Comparison of Phasing Error to Mag.
Rating      Low (% error)   High (% error)   Suggested Mag. Range
Excellent   0               5                0 – 4
Good        0               8                4 – 10
Fair        1               40               10 – 20
Poor        2               30               20 – 40
Very Poor   42              –                40 +

46 Metric Avg. vs. SME Shape
Rating      S&G    WIFac   GEM    NISE   Suggested Mag Range
Excellent   4.5    14.9    2.7    0.8    0 – 4
Good        12.9   28.1    11.1   3.3    4 – 10
Fair        25.9   45.4    23.9   14.4   10 – 20
Poor        32.1   48.7    31.7   25.2   20 – 40
Very Poor   65.6   74.2    33.6   78.0   > 40

47 SHAPE COMPARISON METRIC EVALUATION Moorecroft Curve Shape Results S&G most closely reproduced the reference errors for idealized waveforms. S&G and GEM performed best in the discrimination evaluation. S&G, GEM, and NISE were all consistent with the SME evaluations; curve shape error matched the error ranges suggested from the magnitude data.

48 SHAPE COMPARISON METRIC EVALUATION Moorecroft Moorecroft’s Curve Shape Recommendations: A simple, deterministic metric – easy to implement in a spreadsheet; limited number of seat tests. Individual error scores should be consistent – i.e., 10% is “good” for all features (e.g., M, P and C). Error metric biased towards the experiment – consistent with certification activities. Appropriate results for idealized curves. Metric results consistent with SME values. The Sprague & Geers metric meets these specifications and appears to be the best choice for validating numerical seat models.

49 Recommendation Adopt the Sprague-Geers MPC metrics and the ANOVA metrics as the shape comparison metrics for both validation and verification. Sprague-Geers ANOVA

50 Recommendation Adopt the Sprague-Geers MPC metrics and the ANOVA metrics as the shape comparison metrics for both validation and verification.

51 Phenomena Importance Ranking Tables A PIRT is a table that lists all the phenomena that a model is expected to replicate. Each phenomenon in the PIRT should be validated or verified as appropriate. There will be a PIRT for each vehicle type in Report 350 (i.e., 820C) and each type of test (i.e., 3-10).
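
A minimal sketch of computing a PIRT's validation level as shown in the example tables that follow; the phenomena list mirrors the 820C example, and counting a phenomenon once it has been verified or validated is this sketch's simplification.

```python
# Each entry records whether the phenomenon has been verified/validated.
pirt_820c = {
    "Correct overall geometry": True,
    "Correct mass and inertial properties": True,
    "Comparison to NCAP test": True,
    "Comparison to off-set frontal test": False,
    "Comparison to FMVSS 214 side impact test": False,
    "Rotating wheels": True,
    "Functional suspension system": True,
    "Functional steering system": False,
    "Failing tires": False,
    "Snagging edges (hood, door, etc.)": False,
}

validation_level = sum(pirt_820c.values()) / len(pirt_820c)
print(validation_level)  # 0.5, matching the 5/10 example table below
```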

52 Roadside Safety Simulation PIRTS Example: 820C Model
Phenomena (Verified? / Validated?):
1. Correct overall geometry
2. Correct mass and inertial properties
3. Comparison to NCAP test
4. Comparison to off-set frontal test
5. Comparison to FMVSS 214 side impact test
6. Rotating wheels
7. Functional suspension system
8. Functional steering system
9. Failing tires
10. Snagging edges (i.e., hood, door, etc.)
Validation Level of the model:

53 Roadside Safety Simulation PIRTS Example: 820C Model
Phenomena (Verified? / Validated?):
1. Correct overall geometry √
2. Correct mass and inertial properties √
3. Comparison to NCAP test √
4. Comparison to off-set frontal test
5. Comparison to FMVSS 214 side impact test
6. Rotating wheels √
7. Functional suspension system √
8. Functional steering system
9. Failing tires
10. Snagging edges (i.e., hood, door, etc.)
Validation Level of the model: 5/10 = 0.5
We can have more general confidence in models that have a higher level of validation – more specific phenomena have been represented.

54 Roadside Safety Simulation PIRTS Example: Guardrail
Phenomena (Verified? / Validated?):
1. Correct overall geometry √
2. Correct mass and inertial properties √
3. Comparison to NCAP test √
4. Comparison to off-set frontal test √
5. Comparison to FMVSS 214 side impact test
6. Rotating wheels √
7. Functional suspension system √
8. Functional steering system √
9. Failing tires
10. Snagging edges (i.e., hood, door, etc.)
Validation Level of the model: 7/10 = 0.7
We can have more general confidence in models that have a higher level of validation – more specific phenomena have been represented.

55 Roadside Safety Simulation PIRTS Example: 820C Model
Phenomena (Verified? / Validated?):
1. Correct overall geometry √
2. Correct mass and inertial properties √
3. Comparison to NCAP test √
4. Comparison to off-set frontal test √
5. Comparison to FMVSS 214 side impact test
6. Rotating wheels √
7. Functional suspension system √
8. Functional steering system √
9. Failing tires
10. Snagging edges (i.e., hood, door, etc.)
Validation Level of the model: 7/10 = 0.7
We can have more general confidence in models that have a higher level of validation – more specific phenomena have been represented.

56 Roadside Safety Simulation PIRTS Example: 820C Model
Phenomena (Verified? / Validated?):
1. Correct overall geometry √
2. Correct mass and inertial properties √
3. Comparison to NCAP test √
4. Comparison to off-set frontal test √
5. Comparison to FMVSS 214 side impact test
6. Rotating wheels √
7. Functional suspension system √
8. Functional steering system √
9. Failing tires
10. Snagging edges (i.e., hood, door, etc.)
Validation Level of the model: 7/10 = 0.7
We can have more general confidence in models that have a higher level of validation – more specific phenomena have been represented.

57

