1
Failure in Railway Signal Box, Altona, Germany
Real-Time System Failure: A Case Study
Vijai Raghunathan
Instructor: Dr. Lumpp
Course: EE585 Fault Tolerant Systems, Fall 2006
University of Kentucky
2
Introduction
On 12 March 1995, the German railway replaced a switch tower with a computerized system.
Computer system: supplied by Siemens.
Railway station involved: Hamburg Altona.
Hamburg Altona is an extremely busy station with heavy passenger traffic every day.
3
Altona Station
4
The Old System
250 rail shunts
A few hundred signals
7 major switch stands
50 experienced switchmen
5
Electromechanical Systems
The picture on this slide shows the old switch system, in which many switchmen were needed to operate the controls. Systems of this kind were later automated, as on the Altona railway line.
6
The New System
18 switch stands.
All of the above are controlled by Intel 486-based real-time systems.
One central operating and display system (BAR 16) coordinates all stands.
BAR 16 has a 16-bit interface.
BAR 16 uses redundant hardware with 2 processors and RAM, but no disk.
The new system is incompatible with the old one, so the two could not run in parallel.
It needs 40 fewer switchmen than the old system.
7
The Computer System
Diagram of the computer system used. The diagram is labeled in German; it is taken from a paper by a German author who describes the Altona incident.
8
Failure of New System
BAR 16 failed immediately after start-up.
The cause was not found for hours, so Altona station was temporarily shut down.
Passengers were forced to use other railway routes 25 kilometers away.
9
Cause
Programming error: a possible stack overflow condition.
The routine that handled stack overflow went into an endless loop.
The stack was implemented in software, so the memory it used had to be RAM, and only 3500 bytes of RAM were available.
The programmers overlooked this limit: their stack routine needed a few bytes more than 3500.
They also paid little attention to the problem because they felt the stack routine would never be called.
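The failure mode on this slide can be illustrated with a minimal sketch. It is written in Python for readability and is not the Siemens code: the names `BoundedStack`, `StackOverflow`, `CAPACITY_BYTES` and `fill_until_overflow` are hypothetical, and the one-byte-per-push model is an assumption. What it tries to show is an overflow path that checks the bound before writing and hands control back with an error, rather than spinning.

```python
CAPACITY_BYTES = 3500  # RAM available to the stack, as stated on the slide


class StackOverflow(Exception):
    """Raised when a push would exceed the memory backing the stack."""


class BoundedStack:
    """Software stack over a fixed-size buffer (hypothetical model)."""

    def __init__(self, capacity=CAPACITY_BYTES):
        self._buf = bytearray(capacity)
        self._top = 0  # number of bytes currently in use

    def push(self, value):
        # Check the bound BEFORE writing, and fail fast instead of looping.
        if self._top >= len(self._buf):
            raise StackOverflow(
                f"push at byte {self._top + 1} of {len(self._buf)}"
            )
        self._buf[self._top] = value & 0xFF
        self._top += 1

    def pop(self):
        if self._top == 0:
            raise IndexError("pop from empty stack")
        self._top -= 1
        return self._buf[self._top]

    @property
    def used(self):
        return self._top


def fill_until_overflow(stack, attempts):
    """Push up to `attempts` bytes; return how many succeeded."""
    pushed = 0
    for i in range(attempts):
        try:
            stack.push(i)
        except StackOverflow:
            break  # the handler terminates; it does not retry forever
        pushed += 1
    return pushed
```

With the default capacity, asking for 3510 pushes stops cleanly after 3500, and the caller can decide how to move the system to a safe state.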
10
What happened?
[Diagram: the actual hardware (RAM, 3500 bytes) shown beside the program's stack size (algorithm, more than 3500 bytes); the difference between the two is marked as the error zone *.]
* The hardware allows the stack to hold at most 3500 bytes, assuming the RAM is dedicated to stack values alone. If the stack routine is called during execution and tries to push a value into the 3501st byte, the program halts because the hardware cannot support the access.
11
Bug Fixed
The bug appeared only twice in 4 days.
The system failed on Sunday; the bug was finally fixed on Wednesday.
A new RAM of 4000 bytes was provided to hold the stack: 500 bytes more than before, to give a safe upper limit.
Rail traffic resumed at 1:59 pm on Wednesday.
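The arithmetic behind the fix can be checked with a short sketch. The capacities 3500 and 4000 come from the slides; `ASSUMED_PEAK` is a hypothetical figure, since the slides only say the stack needed "a few more bytes" than 3500, and the function name `spare_bytes` is likewise an illustration.

```python
OLD_CAPACITY = 3500  # bytes of RAM for the stack before the fix (from the slides)
NEW_CAPACITY = 4000  # bytes of RAM for the stack after the fix (from the slides)
ASSUMED_PEAK = 3510  # hypothetical peak usage: "a few bytes over 3500"


def spare_bytes(capacity, peak):
    """Bytes left over at peak usage; a negative result means overflow."""
    return capacity - peak


added = NEW_CAPACITY - OLD_CAPACITY        # extra bytes provided by the fix
relative_increase = added / OLD_CAPACITY   # growth relative to the old size

before = spare_bytes(OLD_CAPACITY, ASSUMED_PEAK)  # negative: overflow
after = spare_bytes(NEW_CAPACITY, ASSUMED_PEAK)   # positive: headroom
```

Under the assumed peak, the old size is short by 10 bytes and the new size leaves 490 bytes spare; the 500 added bytes are an increase of about 14% over the old capacity.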
12
Conclusion
Hidden faults are tough to find.
A Siemens manager said the team felt the stack routine would not be used because a good RTOS was in place.
The assumption that the "stack routine" would never be used was a mistake.
When designing real-time systems, expect your software to run in the worst situations.
The Altona incident: another lesson learnt by engineers.
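One way to act on this lesson is to exercise rarely used error paths on purpose and to bound how long they may run. The sketch below is an assumption about how such a check could look, not something described in the presentation: the handlers are modelled as Python generators that yield once per step, and the names `terminating_handler`, `looping_handler` and `finishes_within` are hypothetical.

```python
def terminating_handler():
    """Overflow handler that does a fixed amount of work and finishes."""
    yield "log overflow"
    yield "reset stack pointer"
    yield "signal safe state"


def looping_handler():
    """Handler that never finishes, like the endless loop on the slides."""
    while True:
        yield "retry"


def finishes_within(handler, budget):
    """Run a handler step by step; True if it ends within `budget` steps."""
    steps = 0
    for _ in handler():
        steps += 1
        if steps > budget:
            return False  # treat as a hang, as a watchdog timer would
    return True
```

A test suite can then call `finishes_within(handler, budget)` for every error handler, so that a handler that cannot terminate is caught before deployment rather than at start-up.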