Slide 1
Testing Heuristics: We Have It All Wrong
J. N. Hooker (1995). Presented to EARG by davet.
Slide 2
Abstract / Summary
- Comparing two algorithms on realistic test problems is hard
- It answers the question of which is faster, but not why
- A more scientific approach is needed
- We confuse R&D: competitive testing is only suitable for the "D" (development)
Slide 3
Introduction
- For a new algorithm, an algorithmic race determines its fate and fame
- The emphasis on competition is anti-intellectual and does not build insight for the long run
- The richest observations are often informal
- Competition diverts time and resources from investigation
Slide 4
Alternative?
- Instead of competition: controlled experimentation
- For example:
  - Identify an algorithmic 'characteristic'
  - Design experiments to see how the presence / absence of this characteristic affects performance
  - Ideally, build a mathematical model that predicts behaviour, then test it experimentally (a sketch of such an experiment follows below)
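Not from Hooker's paper, but as a concrete illustration of this style of controlled experiment: the sketch below toggles a single characteristic of quicksort (median-of-three pivot selection) while holding everything else fixed, counts comparisons on constructed input classes, and checks the counts against what a simple model predicts. Function names, sizes, and input classes are illustrative assumptions; a real study would replicate each cell over many seeds.

```python
# Controlled-experiment sketch: one characteristic (pivot rule) is toggled,
# everything else is held fixed, and observed comparison counts are compared
# against a simple predictive model (~n^2/2 for first-element pivots on
# sorted/reversed input, ~n*log2(n) otherwise).
import math
import random


def pivot_first(xs):
    """Baseline: always pivot on the first element."""
    return xs[0]


def pivot_median_of_three(xs):
    """The 'characteristic' under study: median of first, middle, last element."""
    return sorted([xs[0], xs[len(xs) // 2], xs[-1]])[1]


def quicksort(xs, pivot_rule, stats):
    """Return a sorted copy of xs, counting one comparison per non-pivot element."""
    if len(xs) <= 1:
        return list(xs)
    p = pivot_rule(xs)
    stats["cmp"] += len(xs) - 1
    rest = list(xs)
    rest.remove(p)                        # drop one occurrence of the pivot value
    lo = [x for x in rest if x < p]
    hi = [x for x in rest if x >= p]
    return quicksort(lo, pivot_rule, stats) + [p] + quicksort(hi, pivot_rule, stats)


def make_input(kind, n, rng):
    """Constructed input classes: 'random', 'sorted', 'reversed'."""
    xs = list(range(n))
    if kind == "random":
        rng.shuffle(xs)
    elif kind == "reversed":
        xs.reverse()
    return xs


if __name__ == "__main__":
    rng = random.Random(0)
    for rule_name, rule in [("first", pivot_first), ("median3", pivot_median_of_three)]:
        for kind in ["random", "sorted", "reversed"]:
            for n in [100, 200, 400]:
                stats = {"cmp": 0}
                quicksort(make_input(kind, n, rng), rule, stats)
                model = n * n / 2 if (rule_name == "first" and kind != "random") else n * math.log2(n)
                print(f"{rule_name:>7} pivot, {kind:>8} input, n={n:3d}: "
                      f"{stats['cmp']:6d} comparisons (model ~{model:8.0f})")
```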
Slide 5
Evils of Competitive Testing
- Life's not fair: implementation matters
  - Coding skill, (parameter) tuning
  - The 'vanilla' paradox
- Test problem selection
  - Pitfalls of randomly generated problems
  - Problems introduced alongside an algorithm give it a selective advantage
  - Biased evolution / the tail wags the dog
  - There is no such thing as a representative problem set
Slide 6
Insight-less
- Kitchen-sink algorithms
- The informative testing occurs at the design stage
- Too much time is spent on 'code optimization'
Slide 7
A More Scientific Alternative
- Efficient code is important, but more preliminary work is required:
  - 'Bridge Competitions'
  - SAT DPLL branching case study (sketched below)
  - Need for feature-isolating, constructed benchmarks
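The following is a minimal, illustrative sketch (not the study referenced on the slide) of what feature isolation looks like for DPLL branching: one bare-bones solver, two interchangeable branching rules, constructed random 3-SAT benchmarks, and search-tree size rather than wall-clock time as the measurement. All names and instance parameters are assumptions.

```python
# Feature-isolating sketch: the same DPLL code is run with two branching rules
# on constructed random 3-SAT instances; only search-tree nodes are recorded.
import random


def simplify(clauses, lit):
    """Assign `lit` true: drop satisfied clauses, strip -lit; None signals a conflict."""
    out = []
    for c in clauses:
        if lit in c:
            continue
        reduced = [l for l in c if l != -lit]
        if not reduced:
            return None
        out.append(reduced)
    return out


def propagate_units(clauses):
    """Exhaustive unit propagation; returns simplified clauses or None on conflict."""
    while True:
        units = [c[0] for c in clauses if len(c) == 1]
        if not units:
            return clauses
        for lit in units:
            clauses = simplify(clauses, lit)
            if clauses is None:
                return None


def branch_first_literal(clauses):
    """Naive rule: branch on the first literal of the first clause."""
    return clauses[0][0]


def branch_most_frequent(clauses):
    """Crude frequency rule: branch on the literal occurring in the most clauses."""
    counts = {}
    for c in clauses:
        for l in c:
            counts[l] = counts.get(l, 0) + 1
    return max(counts, key=counts.get)


def dpll(clauses, branch_rule, stats):
    """Return True iff satisfiable, counting search-tree nodes in stats['nodes']."""
    stats["nodes"] += 1
    clauses = propagate_units(clauses)
    if clauses is None:
        return False
    if not clauses:
        return True
    lit = branch_rule(clauses)
    for choice in (lit, -lit):
        reduced = simplify(clauses, choice)
        if reduced is not None and dpll(reduced, branch_rule, stats):
            return True
    return False


def random_3sat(n_vars, n_clauses, rng):
    """Constructed benchmark: uniform random 3-SAT near the phase transition."""
    return [[v if rng.random() < 0.5 else -v
             for v in rng.sample(range(1, n_vars + 1), 3)]
            for _ in range(n_clauses)]


if __name__ == "__main__":
    for name, rule in [("first-literal", branch_first_literal),
                       ("most-frequent", branch_most_frequent)]:
        nodes = []
        for seed in range(20):
            stats = {"nodes": 0}
            dpll(random_3sat(30, 128, random.Random(seed)), rule, stats)
            nodes.append(stats["nodes"])
        print(f"{name:>13}: mean search-tree nodes = {sum(nodes) / len(nodes):.1f}")
```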
Slide 8
What to Measure
- Solution quality vs. running time: attempt to decouple the two (see the sketch below)
- References McGeoch: measure only what a model predicts
- Flip the paradigm (page 10, 2nd para.):
  - The code is the phenomenon; the algorithm is a simplified model of the phenomenon (the code)
  - Running time is immaterial w.r.t. the real phenomenon
  - Subroutine calls » subroutine details » data structures
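As an illustration of the decoupling idea (an assumption-laden sketch, not McGeoch's setup): the 2-opt heuristic below reports solution quality (tour length) and work (counted distance evaluations, i.e. subroutine calls) as two separate, machine-independent numbers, with no wall-clock timing at all. The instance size and counter names are illustrative.

```python
# Decoupling sketch: report quality and machine-independent work separately.
import math
import random


def dist(a, b, stats):
    """Euclidean distance; every call is counted as one unit of 'work'."""
    stats["dist_evals"] += 1
    return math.hypot(a[0] - b[0], a[1] - b[1])


def tour_length(tour, pts, stats):
    return sum(dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]], stats)
               for i in range(len(tour)))


def two_opt(pts, stats):
    """First-improvement 2-opt starting from the identity tour."""
    n = len(pts)
    tour = list(range(n))
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                if i == 0 and j == n - 1:
                    continue                     # would reverse the whole tour
                a, b = pts[tour[i]], pts[tour[i + 1]]
                c, d = pts[tour[j]], pts[tour[(j + 1) % n]]
                # gain from replacing edges (a,b),(c,d) with (a,c),(b,d)
                delta = (dist(a, c, stats) + dist(b, d, stats)
                         - dist(a, b, stats) - dist(c, d, stats))
                if delta < -1e-9:
                    tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])
                    improved = True
    return tour


if __name__ == "__main__":
    rng = random.Random(1)
    pts = [(rng.random(), rng.random()) for _ in range(60)]
    stats = {"dist_evals": 0}
    start_len = tour_length(list(range(len(pts))), pts, stats)
    tour = two_opt(pts, stats)
    final_len = tour_length(tour, pts, stats)
    # Quality and work are reported as two separate, machine-independent numbers.
    print(f"solution quality: tour length {start_len:.3f} -> {final_len:.3f}")
    print(f"work: {stats['dist_evals']} distance evaluations (no wall-clock timing)")
```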
Slide 9
Benefits of Scientific Testing
- Irrelevant:
  - Machine speed
  - Data structures*
  - Coding skill
  - Algorithm tuning
  - How established existing algorithm implementations are
- Removes reliance on benchmark problems: problem sets can be concocted to be deliberately atypical
Slide 10
Research vs. Development
- Benchmark suites are good for 'development', but controlled experimentation is needed for 'research'
- Evaluate research on its contribution to understanding, not on advancing the 'state of the art'