Secure Proactive Recovery – a Hardware Based Mission Assurance Scheme
Ruchika Mehresh (1), Shambhu J. Upadhyaya (1), Kevin Kwiat (2)
rmehresh@buffalo.edu, shambhu@buffalo.edu, kwiatk@rl.af.mil
(1) Department of Computer Science and Engineering, State University of New York at Buffalo, NY, USA
(2) Air Force Research Laboratory, Rome, NY, USA
Research supported in part by ITT Grant No. 200821J and NSF Grant No. DUE-0802062
6th International Conference on Information Warfare and Security, 2011
Outline Structure Motivation Threat model System design Performance analysis Conclusion
Motivation
Mission assurance
Goals
  Survivability
  Security
  Fault tolerance
  Low cost (time overhead)
  Adaptation and evolution
Feasibility study
  Long running applications
  Prevention, detection, recovery
  Hardware-based
  Smart defender
Threat Model
  Byzantine fault tolerance
  Time diversity
  Spatial diversity
  Reactive recovery
  Proactive recovery
The Quiet Invader
Smart attacker
  Makes decisions that maximize the chance of achieving their objectives, based on dynamic information
Quiet invader
  Camouflages to buy more time
  Plans to attack the mission during its critical stage (Why?)
  Example: a long-running countdown for a space shuttle launch that runs for several hours
System Design
[Figure: system design diagram. Labels: Coordinator; Replica 1, Replica 2, Replica 3, ..., Replica n (each marked H and C); Workload; Periodic checkpoint; Hardware Signature; R.]
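The elements named in the system design slide (a coordinator, replicas, periodic checkpoints, hardware signatures) can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the authors' implementation: the `Replica` class, the `compromised` flag standing in for an intrusion verdict, the boolean `hardware_signature()`, and the rule "roll back to the last checkpoint when the signature is bad" are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    """Toy replica. All fields are illustrative assumptions."""
    name: str
    state: int = 0             # workload progress (a simple counter)
    checkpoint: int = 0        # last saved state
    compromised: bool = False  # hypothetical stand-in for an IDS verdict

    def step(self) -> None:
        """Execute one unit of workload."""
        self.state += 1

    def hardware_signature(self) -> bool:
        """Hypothetical signature: True means integrity looks intact."""
        return not self.compromised

    def save_checkpoint(self) -> None:
        self.checkpoint = self.state

    def recover(self) -> None:
        """Roll back to the last checkpoint and clear the compromise."""
        self.state = self.checkpoint
        self.compromised = False


def run(replicas, total_steps, interval, attacks=None):
    """Coordinator loop: every `interval` steps, inspect each replica's
    signature, then either accept a new checkpoint or trigger recovery.
    `attacks` maps a time step to the name of the replica hit at that step."""
    attacks = attacks or {}
    recoveries = []
    for t in range(1, total_steps + 1):
        for r in replicas:
            r.step()
            if attacks.get(t) == r.name:
                r.compromised = True
        if t % interval == 0:
            for r in replicas:
                if r.hardware_signature():
                    r.save_checkpoint()
                else:
                    r.recover()
                    recoveries.append((t, r.name))
    return recoveries


replicas = [Replica(f"replica{i}") for i in (1, 2, 3)]
print(run(replicas, total_steps=10, interval=5, attacks={3: "replica2"}))
print([r.state for r in replicas])
```

In this toy run the replica attacked at step 3 is rolled back at the step-5 checkpoint and finishes behind the others, which is the kind of time overhead the performance analysis quantifies.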
Hardware Signature Generation
[Figure labels: IDS; System reg]
Performance Analysis
Cases
  Case 1: Systems with no checkpointing
  Case 2: Systems with checkpointing, no failures/attacks
  Case 3: Systems with checkpointing, failures/attacks
Workload
  Java SciMark 2.0 benchmark workloads: FFT, SOR, Sparse, LU
Multi-step simulation based evaluation approach
[Reference: Mehresh, R., Upadhyaya, S. and Kwiat, K. (2010) “A Multi-Step Simulation Approach Toward Fault Tolerant system Evaluation”, Third International Workshop on Dependable Network Computing and Mobile Systems, October]
Results
                 FFT       LU       SOR       Sparse
Case 1           3421.09   222.69   13.6562   23.9479
Case 2           3477.46   226.36   13.8811   24.3426
Case 3 (M=10)    3824.63   249.08   15.2026   26.7313
Case 3 (M=25)    3593.39   233.83

Table 1: Execution times (in hours) for the SciMark workloads across the three cases
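To make the cost of checkpointing and recovery easier to read off Table 1, the relative overhead of each case over Case 1 can be computed directly from the reported times. The sketch below only does that arithmetic; the function name `overhead_pct` and the dictionary layout are illustrative, and the two Case 3 (M=25) values missing from the table are simply left out.

```python
# Execution times in hours, copied from Table 1.
case1 = {"FFT": 3421.09, "LU": 222.69, "SOR": 13.6562, "Sparse": 23.9479}
case2 = {"FFT": 3477.46, "LU": 226.36, "SOR": 13.8811, "Sparse": 24.3426}
case3_m10 = {"FFT": 3824.63, "LU": 249.08, "SOR": 15.2026, "Sparse": 26.7313}
case3_m25 = {"FFT": 3593.39, "LU": 233.83}  # SOR and Sparse are not given


def overhead_pct(baseline, measured):
    """Percentage increase of `measured` over `baseline`, per workload.
    Only workloads present in `measured` are reported."""
    return {
        w: 100.0 * (measured[w] - baseline[w]) / baseline[w]
        for w in measured
    }


for label, times in [
    ("Case 2", case2),
    ("Case 3 (M=10)", case3_m10),
    ("Case 3 (M=25)", case3_m25),
]:
    rounded = {w: round(p, 2) for w, p in overhead_pct(case1, times).items()}
    print(label, rounded)
```

With these numbers, checkpointing alone (Case 2) costs a little under 2 percent on every workload, while Case 3 costs roughly 11 to 12 percent at M=10 and about 5 percent at M=25 for the two workloads reported.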
                                       M=5      M=10     M=15     M=25
Optimal checkpoint interval (hours)    0.3      0.5      0.65     0.95
Execution time (hours)                 248.97   241.57   238.16   235.06

Table: Approximate optimal checkpoint interval values and their corresponding workload execution times for LU (Case 3) at different values of M
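The trend in the LU table above can be checked with a few lines of arithmetic: as M grows, the approximate optimal checkpoint interval grows and the execution time falls toward the LU Case 1 time of 222.69 hours reported in Table 1. The sketch below is illustrative only; the variable names are assumptions, and the overhead is simply the percentage increase over that Case 1 time.

```python
# Values copied from the LU (Case 3) table; LU Case 1 time is from Table 1.
LU_CASE1_HOURS = 222.69
m_values = [5, 10, 15, 25]
optimal_interval_hours = [0.3, 0.5, 0.65, 0.95]
execution_hours = [248.97, 241.57, 238.16, 235.06]


def lu_overheads():
    """Return (M, interval, overhead %) rows relative to LU Case 1."""
    rows = []
    for m, interval, hours in zip(m_values, optimal_interval_hours, execution_hours):
        overhead = 100.0 * (hours - LU_CASE1_HOURS) / LU_CASE1_HOURS
        rows.append((m, interval, overhead))
    return rows


for m, interval, overhead in lu_overheads():
    print(f"M={m:>2}  interval={interval:.2f} h  overhead={overhead:.2f}%")
```

The overhead falls from close to 12 percent at M=5 to under 6 percent at M=25.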
Conclusion
Low cost solution to secure proactive recovery
  Mission survivability
  Utilized redundant hardware
  Small overhead in absence of failures
  Effective preventive measure
Future work
  To evaluate this scheme for a distributed system
Thank You!