Hardware-less Testing for RAS Software
Aviad Zlotnick and Orna Raz
IBM Haifa Research Laboratory
Overview
- Early testing has many benefits
- Mocking and simulation are two extremes in enabling early testing
- Frequently neither is applicable
- We suggest a mid-way solution that can be adjusted to available resources
- We give an example from a real RAS system, one of the most challenging domains
Contents
- Early Testing
- Related Work
- Small Scale Simulation
- RAS
- HTDR – the Haifa Test Driver for RAS
- Concluding Remarks
Early Testing
- Products have faults
- The later a fault is discovered, the more expensive it is
  - Customer unhappy
  - Customer to Support to Development chain
  - Access to data
  - …
- Test early, test much, test wisely
  - Early – limited by dependencies
  - Much – limited by time and resources
  - Wisely – not in our scope today
Dependency
[Diagram: the test program drives the unit under test, which calls a collaborator. The TEST decides who is called.]
The collaborator:
- May not exist yet
- Expensive resource
- Difficult to activate
- Slow
- …
Mock: collaborator's behavior tailored to a specific test
[Diagram: the test program drives the unit under test and supplies the expected results; a selection routes calls either to the mock collaborator or to the real collaborator. The TEST decides who is called.]
Mocking
- Enables “test early” by eliminating dependencies
- Test script defines behavior
- High coupling of test and code
- Usually, many small tasks
- Done by the developers
- Painful when “many” becomes “very many”
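To make the coupling concrete, here is a minimal shell sketch of a mock. The names `count_failed_disks` (unit under test) and `list_disks` (collaborator) are hypothetical and invented for illustration; they are not taken from any real system. The mock has no logic of its own: it returns canned output that only makes sense for this one test.

```shell
#!/bin/sh
# Unit under test (hypothetical): counts the disks that the collaborator
# reports as failed.
count_failed_disks() {
    list_disks | grep -c 'state = failed'
}

# Mock collaborator (hypothetical): canned output tailored to this test.
# A real list_disks would query hardware; this one contains no logic.
list_disks() {
    printf '%s\n' 'disk0 state = ok' 'disk1 state = failed' 'disk2 state = failed'
}

# The test knows exactly what the mock returns, hence the tight coupling
# between the test and the code it exercises.
result=$(count_failed_disks)
if [ "$result" -eq 2 ]; then
    echo "PASS"
else
    echo "FAIL: expected 2, got $result"
fi
```

Every new test scenario needs its own canned `list_disks`, which is where "many" tends to become "very many".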
Simulation
[Diagram: the test program drives the unit under test; a selection routes calls either to the simulated system collaborator or to the real collaborator. The TEST decides who is called.]
Simulation
- ACM Classification: Hardware, Performance
- Assumed to be done at hardware boundaries
- External specification defines behavior
- No coupling of test and code
- Very expensive
  - A big, concentrated effort
  - Done by a separate simulation team
- Very slow
  - Conflicts with “test much”
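Small Scale Simulation - between mocking and simulation
[Diagram: the test program drives the unit under test and supplies the expected results; a selection routes calls either to a mock collaborator that contains collaborator logic or to the real collaborator. The TEST decides who is called.]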
What’s small about it?
- This is a simulation of a part of a system
  - Not necessarily on hardware boundaries
  - Some functions are just not there
- It requires fewer resources (smaller server, less memory)
  - Tests logic, not performance
  - Sometimes uses a “micro” model (e.g., a 2-byte data block)
- It is FASTER than the real thing
  - In contrast with full scale simulators
- Note: there seems to be a continuum between mocks and full simulation
  - Choosing where to break it is a challenge
Small Scale Simulation: Schematic Diagram
[Diagram components: Script (Configure, Execute, Synchronize, Inject, Verify); Environment (e.g., ODM); Driving Functions; Exercised Code; Interface Layer; Semantic Model]
- One step ahead of Mock Objects: adds external control
- Also, ideal mocks are very simple; there is no logic in them
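The difference from a mock can be sketched in shell as follows. This is an illustrative micro model under stated assumptions, not HTDR code: the `sim_*` function names, the one-file-per-disk layout, and the three states (`ok`, `failed`, `rebuilding`) are all invented for the example. Unlike a mock, the simulated collaborator keeps state, enforces some logic, and can be controlled externally by the script (inject a failure, advance time).

```shell
#!/bin/sh
# Hypothetical micro model: one file per disk, holding that disk's state.
SIM_DIR=$(mktemp -d)

# Configure: create $1 disks, all healthy.
sim_configure() {
    i=0
    while [ "$i" -lt "$1" ]; do
        echo ok > "$SIM_DIR/disk$i"
        i=$((i + 1))
    done
}

# Inject: simulate a physical event, here a disk failing.
sim_inject_failure() { echo failed > "$SIM_DIR/$1"; }

# Collaborator logic: only a failed disk may be replaced, and a replaced
# disk must rebuild before it is healthy again.
sim_replace_disk() {
    [ "$(cat "$SIM_DIR/$1")" = failed ] || return 1
    echo rebuilding > "$SIM_DIR/$1"
}

# Advance simulated time by one step: rebuilding disks become healthy.
sim_tick() {
    for f in "$SIM_DIR"/disk*; do
        [ "$(cat "$f")" = rebuilding ] && echo ok > "$f"
    done
    return 0
}

# Interface layer: what the exercised code would call instead of a real query.
sim_disk_state() { cat "$SIM_DIR/$1"; }

# Script: Configure, Inject, Execute, Verify.
sim_configure 3
sim_inject_failure disk1
sim_replace_disk disk1
state_after_replace=$(sim_disk_state disk1)
sim_tick
state_after_tick=$(sim_disk_state disk1)
echo "$state_after_replace -> $state_after_tick"
```

Because the model carries its own logic, many test scripts can share it instead of each defining canned answers, and simulated time makes a slow operation such as a rebuild effectively instantaneous.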
RAS - for the fortunate who have not been there…
- RAS – Reliability, Availability, Serviceability
- Managing a system when things go wrong
  - Prevention
  - Maintenance and service
  - Repair
- In mission critical domains
  - No single point of failure (SPOF)
  - Concurrent repair, code load, upgrade
RAS Challenges
- Many more bad paths than good paths
- Physical actions, e.g., replace a faulty component
  - Very hard to automate
- Timing is an issue – resources cannot be locked
- Time consuming
  - Restart system
  - Format disk drive
- Recover from everybody else’s bugs
  - At the end of the day, you will be blamed
- RAS really stands for “It’S All youR fault”
The Enterprise Storage Environment and the HTDR Environment
[Diagram, Enterprise Storage Environment: Regression Scripts; RMC Methods; APIs (Device driver, ODM APIs); RAS Code]
[Diagram, HTDR Environment: Regression Scripts; Model Utilities; Interface utilities; Machine State (RMC Objects, ODM, Physical State); File System Model; RAS Code]
Semantic Model (in a file system)
Automatic Regression – Capture and Replay
[Diagram: test scripts are run; the output is compared with the expected results; each test is reported as Passed or Failed.]
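The capture-and-replay flow can be sketched in a few lines of shell. This is a sketch under assumptions, not the HTDR implementation: the function names `capture` and `replay`, the file names, and the sample test script are hypothetical. Capture records a script's output as the expected results; replay reruns the script and compares byte for byte.

```shell
#!/bin/sh
# capture SCRIPT EXPECTED: run the test script once and record its output
# as the expected results.
capture() {
    sh "$1" > "$2" 2>&1
}

# replay SCRIPT EXPECTED: rerun the test script and compare its output
# with the recorded expected results. Prints Passed or Failed.
replay() {
    actual=$(mktemp)
    sh "$1" > "$actual" 2>&1
    if cmp -s "$2" "$actual"; then
        verdict=Passed
    else
        verdict=Failed
    fi
    rm -f "$actual"
    echo "$verdict"
    [ "$verdict" = Passed ]
}

# Demo with a trivial, deterministic test script.
work=$(mktemp -d)
echo 'echo "disks online: 4"' > "$work/t1.sh"
capture "$work/t1.sh" "$work/t1.expected"
replay "$work/t1.sh" "$work/t1.expected"
```

An exact comparison like this only works when the output is deterministic; timestamps or ordering differences in the output would have to be filtered out before comparing.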
Test Scripts
- Unix shell commands
- Include commands that simulate physical operations
- Synchronization:
    waitUntilGT "lsrsrc Disks | grep 'state = 0' | wc -l" 4 5 30
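The slides do not say how waitUntilGT is implemented or what its three numeric arguments mean. The sketch below is therefore an assumption throughout: it uses a differently named function, `wait_until_gt`, and guesses that the arguments are a threshold, a polling interval in seconds, and a timeout in seconds. It shows one plausible way to synchronize a script with asynchronous code without locking resources: poll a command until its numeric output exceeds the threshold, or give up.

```shell
#!/bin/sh
# wait_until_gt CMD THRESHOLD INTERVAL TIMEOUT   (hypothetical helper)
# Re-evaluate CMD every INTERVAL seconds until its numeric output is
# greater than THRESHOLD. Return 0 on success, 1 once TIMEOUT seconds
# have elapsed. The argument meanings are assumed, not taken from HTDR.
wait_until_gt() {
    cmd=$1
    threshold=$2
    interval=$3
    timeout=$4
    elapsed=0
    while :; do
        # Strip whitespace, since some wc implementations pad their output.
        value=$(eval "$cmd" | tr -d '[:space:]')
        if [ "$value" -gt "$threshold" ] 2>/dev/null; then
            return 0
        fi
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
}

# Demo: the condition already holds, so this returns immediately.
wait_until_gt "echo 5" 4 1 3 && echo "condition met"
```

With this reading, the call on the slide would wait until more than 4 disks report `state = 0`, polling every 5 seconds for at most 30 seconds.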
Status
- Alive and kicking
- Used mainly for unit tests
- High impact in integrating new hardware
- Automatic regression suite with dozens of tests
- Several bugs found, including timing
- Major issues:
  - Some code under test is not deterministic
  - Deciding why a test failed
Concluding Remarks
- Reviewed early testing
- Discussed mocking and simulation for overcoming dependencies
- Introduced Small Scale Simulation in between mocking and simulation
- Introduced RAS
- Described the Haifa Test Driver for RAS
- The issues of domain knowledge, deployment, and who-does-what are elaborated in the full paper
- Bottom line: We are still searching for the golden path