Slide 1: Debugging of Parallel Systems: A Short Introduction
Joel Huselius (joel.huselius@mdh.se)
Slide 2: Terminology
- Error (bug): an unwanted state in a product
- Fault: an unintended condition that can cause an error
- Debugging: the process of locating, analysing, and correcting suspected faults
Slide 3: Classes of Errors
- Probe effect
- Observability problem
- Livelock
- Deadlock
- Stampede effect
- Bystander effect
- Irreproducibility effects
- Completeness problem
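Deadlock, one of the error classes listed above, can be made concrete with a wait-for graph: each blocked thread points to the thread holding the resource it needs, and a cycle means that no thread in it can ever proceed. The Python sketch below is a minimal illustration that assumes a single-resource model, in which each thread waits for at most one other thread. The function name `find_deadlock` is hypothetical and does not come from any tool mentioned in this presentation.

```python
from __future__ import annotations


def find_deadlock(waits_for: dict[str, str]) -> list[str] | None:
    """Return one cycle of mutually waiting threads, or None if there is none.

    waits_for maps each blocked thread to the thread that holds the
    resource it is waiting for (assumption: at most one such thread).
    """
    for start in waits_for:
        path: list[str] = []
        seen: set[str] = set()
        node = start
        # Follow wait-for edges until we reach a thread that is not blocked
        # or come back to a thread already on this path.
        while node in waits_for and node not in seen:
            seen.add(node)
            path.append(node)
            node = waits_for[node]
        if node in seen:
            # Revisited a thread: the cycle starts at its first occurrence.
            return path[path.index(node):]
    return None


# T1 holds lock A and waits for B; T2 holds B and waits for A.
print(find_deadlock({"T1": "T2", "T2": "T1"}))  # ['T1', 'T2']
# T3 waits for T4, which is still running: a wait, but not a deadlock.
print(find_deadlock({"T3": "T4"}))  # None
```

Detection of this kind only reports a deadlock after it has happened; it says nothing about livelock, where threads keep running without making progress.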
Slide 4: Cyclic Debugging
- Repeated executions of the program
- Execute–Halt–Examine–Continue loop
- Problems: probe effect, irreproducibility problem, stampede effect
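The irreproducibility problem in cyclic debugging can be illustrated with a small deterministic model rather than real threads. Here two threads each increment a shared counter in two separate steps (read, then write). Enumerating every interleaving shows that the same program can end with 1 or 2, so a re-execution under the debugger may follow a different interleaving than the failing run. The names `run`, `all_schedules`, and `STEPS_PER_THREAD` are hypothetical and chosen only for this sketch.

```python
from itertools import combinations
from typing import List

# Fixed at 2: step 0 reads the shared counter, step 1 writes register + 1.
STEPS_PER_THREAD = 2


def run(schedule: List[int]) -> int:
    """Execute one interleaving; schedule[i] is the thread taking step i."""
    counter = 0
    register = [0, 0]  # each thread's private copy of the value it read
    pc = [0, 0]        # index of the next step of each thread
    for tid in schedule:
        if pc[tid] == 0:
            register[tid] = counter        # read shared counter
        else:
            counter = register[tid] + 1    # write back incremented copy
        pc[tid] += 1
    return counter


def all_schedules() -> List[List[int]]:
    """Every interleaving of two threads with STEPS_PER_THREAD steps each."""
    total = 2 * STEPS_PER_THREAD
    schedules = []
    for slots in combinations(range(total), STEPS_PER_THREAD):
        schedule = [1] * total
        for slot in slots:
            schedule[slot] = 0  # thread 0 takes the chosen slots
        schedules.append(schedule)
    return schedules


outcomes = {tuple(s): run(s) for s in all_schedules()}
for schedule, value in outcomes.items():
    print(schedule, "->", value)
print(sorted(set(outcomes.values())))  # [1, 2]
```

Only the two fully serial schedules produce 2; the four overlapping ones lose an update. A failure that depends on such an interleaving may therefore vanish when the program is halted and re-run.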
Slide 5: Monitoring
Recording information about a program execution so that it can be reviewed offline in a model of the target environment.
- Software monitoring
- Hardware monitoring
- Hybrid monitoring
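A software monitor can be as simple as probes inserted into the program that append events to a trace, which is then reviewed offline. The Python sketch below assumes in-process instrumentation of threads; `SoftwareMonitor`, `Event`, and `worker` are hypothetical names. It covers only the software variant and does not model hardware or hybrid monitoring.

```python
from __future__ import annotations

import threading
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    seq: int     # order in which the monitor recorded the event
    thread: str  # name of the thread that executed the probe
    label: str   # what happened, e.g. "start" or "done"
    t_ns: int    # timestamp from time.perf_counter_ns()


class SoftwareMonitor:
    """Hypothetical in-process event recorder (a software monitor)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._trace: list[Event] = []

    def record(self, label: str) -> None:
        # Probe: called from instrumented code. Taking the lock and building
        # the event costs time, which perturbs the program (probe effect).
        with self._lock:
            self._trace.append(Event(len(self._trace),
                                     threading.current_thread().name,
                                     label,
                                     time.perf_counter_ns()))

    def trace(self) -> list[Event]:
        # Copy of the trace for offline review after the run.
        with self._lock:
            return list(self._trace)


monitor = SoftwareMonitor()


def worker() -> None:
    monitor.record("start")
    monitor.record("done")


threads = [threading.Thread(target=worker, name=f"T{i}") for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

for event in monitor.trace():
    print(event.seq, event.thread, event.label)
```

If the probes are left in the deployed program rather than removed after debugging, the timing they introduce stays the same between the monitored run and production, which is one common way to contain the probe effect.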
Slide 6: Monitoring (cont.)
- Browsing
- Replay
- Simulated replay
- Probe effect
- Regression testing
- Accuracy of the model versus reality
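Replay, listed above, re-executes a program so that its nondeterministic choices follow a log recorded during an earlier run. The Python sketch below simulates this for a small lost-update race: in record mode a seeded random scheduler picks which thread runs next and logs each choice; in replay mode the scheduler reads its choices from the log instead. It is loosely in the spirit of record/replay work such as LeBlanc and Mellor-Crummey's Instant Replay, but it is a simplified simulation that logs scheduler decisions, not their algorithm. The function name `execute` and its parameters are hypothetical.

```python
import random
from typing import List, Optional, Tuple


def execute(schedule_log: Optional[List[int]] = None,
            seed: Optional[int] = None) -> Tuple[int, List[int]]:
    """Run two simulated threads that each increment a shared counter
    non-atomically (read, then write).

    Record mode (schedule_log is None): a seeded random scheduler picks the
    next thread and every choice is logged.
    Replay mode: the next thread is taken from schedule_log, so the run
    follows the recorded interleaving exactly.
    Returns (final counter value, schedule log of this run).
    """
    rng = random.Random(seed)
    replay = iter(schedule_log) if schedule_log is not None else None
    counter = 0
    register = [0, 0]  # value each thread read
    pc = [0, 0]        # 0 = about to read, 1 = about to write, 2 = finished
    log: List[int] = []
    while pc[0] < 2 or pc[1] < 2:
        runnable = [t for t in (0, 1) if pc[t] < 2]
        tid = next(replay) if replay is not None else rng.choice(runnable)
        if tid not in runnable:
            raise ValueError("schedule log does not match this execution")
        log.append(tid)
        if pc[tid] == 0:
            register[tid] = counter
        else:
            counter = register[tid] + 1
        pc[tid] += 1
    return counter, log


# Record a run, then replay it from the log alone.
recorded_value, recorded_log = execute(seed=7)
replayed_value, replayed_log = execute(schedule_log=recorded_log)
print("recorded:", recorded_log, recorded_value)
print("replayed:", replayed_log, replayed_value)
```

Because the log fully determines the simulated run, a stored log can be re-executed after a fix to check whether the failing interleaving still fails, which is one way replay can support regression testing.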
Slide 7: Major Players and Contributions
Recent disputations (doctoral theses):
- Dieter Kranzmüller, “Event Graph Analysis for Debugging Massively Parallel Programs”, 2000
- Henrik Thane, “Monitoring Testing and Debugging Distributed Real-Time Systems”, 2000
Seminal papers:
- LeBlanc and Mellor-Crummey, “Debugging Parallel Programs with Instant Replay”, 1987
- McDowell and Helmbold, “Debugging Concurrent Programs”, 1989
- Carver and Tai, “Replay and Testing for Concurrent Programs”, 1991
- Fidge, “Fundamentals of Distributed System Observation”, 1996
- Schütz, “Fundamental Issues in Testing Distributed Real-Time Systems”, 1994
Slide 8: Conferences
- IEEE Parallel and Distributed Systems
- IEEE Symposium on Reliable Distributed Systems
- ACM International Symposium on Software Testing and Analysis