Testing Plan Outline
Adam Leko, Hans Sherburne
HCS Research Laboratory, University of Florida
Testing plan
- Tasks to accomplish
  - Come up with general steps to test each aspect of our tool evaluation strategy
  - Select 3-5 applications that exhibit particular problems (too much synchronization, load imbalance) to use for every tool
  - Re-evaluate Paradyn, TAU? (PAPI: n/a)
  - Create listing of tools we wish to evaluate; for each:
    - Determine how to obtain
    - Contact developers for evaluation license if necessary (make contacts w/developers)
    - Gauge popularity of tool based on information available and information from developers
- General strategy
  - For each tool:
    - Describe overall architecture for tool (e.g., how instrumentation is achieved)
    - Give general comments about tool, including strong and weak points
    - Estimate (contacting developers as necessary) how hard it would be to extend tool to support UPC and SHMEM
    - Give scores (1-5, 5 best, 0 if n/a) for each feature listed in table 9.1 of last semester’s report
      - Put into “feature matrix” spreadsheet (see the sketch after this list)
      - Some scores may need to be filled in afterwards to give relative ratings (available metrics, etc.)
  - After installing a tool:
    - If tool is similar to other tools or simple/not useful, spend less time evaluating it
    - If tool is a good candidate for extending, spend more time on it
    - In each case, feature matrix and performance factors will be recorded for each tool
  - After each evaluation, put together a 5-15 slide presentation and show to group for further comments
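As a rough illustration of how the “feature matrix” spreadsheet could be populated, here is a minimal Python sketch that appends one row of scores per tool to a CSV file. The file name, the feature subset, and the field layout are placeholders for illustration, not the final matrix from table 9.1.

```python
"""Minimal sketch of recording per-tool scores into a "feature matrix" CSV.
Feature names below are a hypothetical subset of table 9.1; the file name
and column layout are placeholders, not the final spreadsheet format."""

import csv

# Hypothetical subset of features, each scored 1-5 (0 if n/a).
FEATURES = ["available_metrics", "cost", "documentation_quality",
            "extendibility", "filtering_aggregation", "hardware_support"]

def record_tool(filename, tool_name, scores, notes=""):
    """Append one tool's scores (dict: feature -> 0-5) to the matrix CSV."""
    with open(filename, "a", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([tool_name] +
                        [scores.get(feat, 0) for feat in FEATURES] +
                        [notes])

# Example: record a tool whose relative scores may be revised later.
record_tool("feature_matrix.csv", "ExampleTool",
            {"available_metrics": 4, "cost": 5}, notes="scores provisional")
```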
“Feature matrix” testing plan
- Available metrics (9.2.1.3)
  - Look at documentation, run tool, record what is available
  - Give subjective score for depth of features provided
- Cost (9.1.1)
  - Look up on web site or documentation
  - If free, give maximum score
  - If not, give score based on per-seat price (see the sketch after this list):
    - $0-$500: 4
    - $500-$1000: 3
    - $1000-$2000: 2
    - $2000+: 1
- Documentation quality (9.3.2)
  - Subjective score based on reading available documentation and manuals (web sites, PDF files, etc.)
  - Give special emphasis to tools that provide documentation for other developers wishing to extend the tool
- Extendibility (9.3.1)
  - Open-source tools may be given higher scores if the code is well written (take a quick look at existing source to determine)
  - Best guess at how much work is required to add UPC and SHMEM support to the tool
  - May have to talk to developers to get better information (leverage contacts here)
  - Would existing companies be willing to work with us to extend the tool?
- Filtering and aggregation (9.2.3.1)
  - Subjective score based on how frugal the tool is with presenting information vs. usefulness of information
- Hardware support (9.1.4)
  - Obtain from documentation
  - One full point for each architecture; give weight to architectures that support SHMEM/UPC
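The per-seat cost brackets above translate directly into a small scoring helper. A minimal sketch (the handling of prices exactly on a bracket boundary is an assumption):

```python
def cost_score(price_per_seat):
    """Map per-seat cost (USD) to a score; free tools get the maximum (5)."""
    if price_per_seat <= 0:
        return 5          # free
    if price_per_seat <= 500:
        return 4          # $0-$500
    if price_per_seat <= 1000:
        return 3          # $500-$1000
    if price_per_seat <= 2000:
        return 2          # $1000-$2000
    return 1              # $2000+
```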
“Feature matrix” testing plan
- Heterogeneity support (9.1.5)
  - Can determine from documentation and from running tool (not a necessary feature, though)
- Installation (9.1.2)
  - Subjective measurement based on time and effort needed for installation
  - One person (Adam) should do all installations to give fair comparisons
    - Leverage Adam’s SysAdmin experience
    - Can move to two people if this becomes too demanding
- Interoperability (9.2.2.2)
  - Can determine from documentation and from running tool
  - Score based on how many formats the tool can export to and how popular those formats are (e.g., 1 point for SvPablo’s format)
  - Question: which formats are useful?
- Learning curve (9.1.6)
  - Subjective score based on how long it takes to become proficient with the standard options of the tool
- Manual overhead (9.2.1.1)
  - Subjective score based on how much time it takes to insert instrumentation calls in source code
  - Fewer lines inserted = higher score
  - Tools with automatic instrumentation will receive 5
- Measurement accuracy (9.2.2.1)
  - Testing method (see the sketch after this list):
    - Run code with no instrumentation, inserting only timing calls
    - Run code with the instrumentation necessary to pinpoint bottlenecks and see how overall wall time was affected
    - Program’s correctness should not be affected: keep output from the correct run and diff it with the output of the instrumented run to make sure results are the same
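A minimal sketch of the measurement-accuracy check described above, assuming the uninstrumented and instrumented builds are separate executables (the binary names are placeholders) that write their results to stdout:

```python
"""Sketch of the measurement-accuracy test: compare wall time and output
of an uninstrumented run against an instrumented run of the same program.
Executable names below are hypothetical."""

import subprocess
import time

def run_and_time(cmd):
    """Run a command, returning (wall time in seconds, captured stdout)."""
    start = time.time()
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return time.time() - start, result.stdout

base_time, base_out = run_and_time(["./app_baseline"])       # timing calls only
inst_time, inst_out = run_and_time(["./app_instrumented"])   # tool instrumentation

# Instrumentation overhead as a fraction of the uninstrumented wall time.
overhead = (inst_time - base_time) / base_time
print(f"wall-time overhead: {overhead:.1%}")

# Correctness must not be affected: outputs should match exactly.
if base_out != inst_out:
    print("WARNING: instrumented output differs from baseline run")
```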
“Feature matrix” testing plan
- Multiple analyses (9.2.3.2)
  - Can determine from documentation and from running tool
- Multiple executions (9.3.5)
  - Can determine from documentation and from running tool
- Multiple views (9.2.4.1)
  - Can determine from documentation and from running tool
  - More views presented = higher score; special weight to hierarchical presentations
  - Keep record of which views the tool supports
- Performance bottleneck identification (9.2.5.1)
  - See how well the tool either automatically diagnoses problems or guides the user to pinpoint bottlenecks in code
  - Use our benchmark suite for test cases
  - Probably give equal weight to each application in our suite for which the tool successfully identified the bottleneck
- Profiling / tracing support (9.2.1.2)
  - Can determine from documentation and from running tool
  - For tracing: keep record of the size of each trace file for the standard benchmark applications used during evaluations
  - Assign trace-specific score after all tools have been evaluated, based on relative trace file size (see the sketch after this list)
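A rough sketch of how the relative trace-size score could be assigned once every tool’s trace sizes have been recorded; the size-ratio brackets here are illustrative placeholders, not agreed thresholds:

```python
"""Sketch of relative trace-size scoring across all evaluated tools.
Input: total trace size (MB) per tool for the standard benchmark runs.
Smaller traces score higher; the ratio brackets are assumptions."""

def trace_size_scores(sizes_mb):
    smallest = min(sizes_mb.values())
    scores = {}
    for tool, size in sizes_mb.items():
        ratio = size / smallest   # 1.0 for the most compact tool
        if ratio <= 1.5:
            scores[tool] = 5
        elif ratio <= 3:
            scores[tool] = 4
        elif ratio <= 10:
            scores[tool] = 3
        elif ratio <= 30:
            scores[tool] = 2
        else:
            scores[tool] = 1
    return scores

# Hypothetical example sizes (MB) recorded during evaluations.
print(trace_size_scores({"ToolA": 12.0, "ToolB": 95.0, "ToolC": 410.0}))
```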
“Feature matrix” testing plan
- Response time (9.2.6)
  - Subjective score based on measured observations from running tool
- Searching (9.3.6)
  - Can determine from documentation and from running tool
- Software support (9.1.3)
  - Can determine from documentation and from running tool
- Source code correlation (9.2.4.2)
  - Can determine from documentation and from running tool
- System stability (9.3.3)
  - Subjective score based on measured observations from running tool
- Technical support (9.3.4)
  - Not really sure how to score this column; will most likely have to be subjective
Benchmark suite
- 3-5 applications used during performance tool testing
- Each has a particular “bottleneck” and a decent amount of parallelism (~30-50% efficiency; see the sketch after this list)
  - Too much unnecessary synchronization (can a tool determine this?)
  - Too many small-sized messages (fix w/message aggregation and communication restructuring)
  - Etc.
- Each program should be “braindead” in that fixing the bottleneck will improve overall efficiency
- Also include a “regular” program to make sure the tool doesn’t give bogus recommendations
- Applications should be well known and mostly well implemented
- Problem size chosen so that sequential running time is about 200 s
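A small helper for checking that a candidate benchmark falls in the target ~30-50% efficiency window; the timings and process count in the example are illustrative only:

```python
def parallel_efficiency(t_seq, t_par, num_procs):
    """Parallel efficiency = sequential time / (process count * parallel time)."""
    return t_seq / (num_procs * t_par)

# Example: a ~200 s sequential run that takes 30 s on 16 processes
# gives about 42% efficiency, inside the desired 30-50% window.
eff = parallel_efficiency(200.0, 30.0, 16)
print(f"efficiency: {eff:.0%}")
```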
Task dependencies
- Tool evaluations
  - Inputs
    - Brian: ideas on how to evaluate usability
    - General rules accepted in literature (if any)
  - Outputs
    - Recommendations
      - Good ideas from existing tools
      - Things to avoid that caused problems in existing tools
      - Whether any tools should be extended to support UPC/SHMEM
- Literature searches
  - Inputs
    - None in particular
  - Outputs
    - Strategies to best provide analysis capabilities in our tool
    - Recommendations for incorporating bottleneck identification processes in our tool
    - Some ideas for Adam’s dissertation!