RABA’s Red Team Assessments 14 December 2005 QuickSilver

Agenda
Tasking for this talk…
Projects Evaluated
Approach / Methodology
Lessons Learned
 o and Validations Achieved
The Assessments
 o General Strengths / Weaknesses
 o AWDRAT (MIT)
    Success Criteria
    Assessment Strategy
    Strengths / Weaknesses
 o LRTSS (MIT)
 o QuickSilver / Ricochet (Cornell)
 o Steward (JHU)

The Tasking
“Lee would like a presentation from the Red Team perspective on the experiments you've been involved with. He's interested in a talk that's heavy on lessons learned and benefits gained. Also of interest would be red team thoughts on strengths and weaknesses of the technologies involved. Keeping in mind that no rebuttal would be able to take place beforehand, controversial observations should be either generalized (i.e., false positives as a problem across several projects) or left to the final report.”
-- John Frank (November 28, 2005)

Specific Teams We Evaluated
Architectural-Differencing, Wrappers, Diagnosis, Recovery, Adaptive Software and Trust Management (AWDRAT)
 o October 18-19, 2005
 o MIT
Learning and Repair Techniques for Self-Healing Systems (LRTSS)
 o October 25, 2005
 o MIT
QuickSilver / Ricochet
 o November 8, 2005
 o Cornell University
Steward
 o December 9, 2005
 o JHU

Basic Methodology
Planning
 o Present High Level Plan at July PI Meeting
 o Interact with White Team to schedule
 o Prepare Project Overview
 o Prepare Assessment Plan
    Coordinate with Blue Team and White Team
Learning
 o Study documentation provided by team
 o Conference Calls
 o Visit with Blue Team day prior to assessment
    Use system, examine output, gather data
Test
Formal De-Brief at end of Test Day

Lessons Learned (and VALIDATIONS achieved)

Validation / Lessons Learned
Consistent Discontinuity of Expectations
 o Scope of the Assessment + Success Criteria
    Boiling it down to “Red Team Wins” or “Blue Team Wins” on each test required significant clarity
 o Unique to these assessments because the metrics were unique
    Lee/John instituted an assessment scope conference call halfway through
 o We think that helped a lot
 o Scope of Protection for the systems
    Performer’s Assumptions vs. Red Team’s Expectations
    In all cases, we wanted to see a more holistic approach to the security model
We assert each program needs to define its security policy
 o And especially document what it assumes will be protected / provided by other components or systems

LL: Scope of Protection

Validation / Lessons Learned
More time would have helped A LOT
 o Longer Test Period (2-3 day test vice 1 day test)
    Having an evening to digest, then return to test, would have allowed more effective additional testing and insight
 o We planned an extra 1.5 days for most, and that was very helpful
    We weren’t rushing to get on an airplane
    We could reduce the data and come back for clarifications if needed
    We could defer non-controversial tests to the next day to allow focus with Government present
More Communication with Performers
 o Pre-Test Site/Team Visit (~2-3 weeks prior to test)
    Significant help in preparing testing approach
    The half-day that we implemented before the test was crucial for us
 o More conference calls would have helped, too
 o Hard to balance against performers’ main focus, though

Validation / Lessons Learned
A Series of Tests Might Be Better
 o Perhaps one day of tests similar to what we did
 o Then a follow-up test a month or two later as prototypes matured
    With the same test team, to leverage the understanding of the system already gained
We Underestimated the Effort in Our Bid
 o Systems were more unique and complex than we anticipated
 o 20-25% more hours would have helped us a lot in data reduction
Multi-talented team proved vital to success
 o We had programming (multi-lingual), traditional red team, computer security, systems engineering, OS, system admin, network engineering, etc. talent present for each test
Highly tailored approach proved appropriate and necessary
 o Using a more traditional network-oriented Red Team Assessment approach would have failed

The Assessments

Overall Strengths / Weaknesses of Projects
Strengths
 o Teams worked hard to support our assessments
 o The technologies are exciting and powerful
Weaknesses
 o Most Suffered a Lack of System Documentation
    We understand there is a balance to strike – these are essentially research prototypes, after all
    Really limited our ability to prepare for the assessment
 o All are Prototypes -- stability needed for deterministic test results
 o All provide incomplete security / protection almost by definition
 o Most Suffered a Lack of Configuration Management / Control
 o Test “Harnesses” far from optimal for Red Team use
    Of course, they are oriented around supporting development
    But we were fairly limited in using other tools due to the uniqueness of the technologies

AWDRAT Assessment October 18-19, 2005

Success Criteria
The target application can successfully and/or correctly perform its mission
The AWDRAT system can
 o detect an attacked client’s misbehavior
 o interrupt a misbehaving client
 o reconstitute a misbehaving client in such a way that the reconstituted client is not vulnerable to the attack in question
The AWDRAT system must
 o Detect / Diagnose at least 10% of attacks/root causes
 o Take effective corrective action on at least 5% of the successfully identified compromises/attacks
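To make the pass/fail arithmetic behind these two thresholds concrete, here is a minimal scoring sketch; the attack records and field names are hypothetical illustrations, not part of the AWDRAT system or the assessment tooling.

```python
# Hypothetical scoring sketch for the AWDRAT thresholds (not assessment tooling).
# Each record states whether AWDRAT diagnosed the attack and, if diagnosed,
# whether its corrective action was effective.
attacks = [
    {"name": "buffer overflow",          "diagnosed": True,  "corrected": True},
    {"name": "corrupted data injection", "diagnosed": True,  "corrected": False},
    {"name": "state disruption",         "diagnosed": False, "corrected": False},
    {"name": "recovery attack",          "diagnosed": False, "corrected": False},
]

diagnosed = [a for a in attacks if a["diagnosed"]]
corrected = [a for a in diagnosed if a["corrected"]]

detect_rate  = len(diagnosed) / len(attacks)             # threshold: at least 10%
correct_rate = len(corrected) / max(len(diagnosed), 1)   # threshold: at least 5% of those diagnosed

print(f"detected  {detect_rate:.0%} of attacks (need >= 10%)")
print(f"corrected {correct_rate:.0%} of diagnosed attacks (need >= 5%)")
print("criteria met" if detect_rate >= 0.10 and correct_rate >= 0.05 else "criteria not met")
```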

Assessment Strategy
Denial of Service
 o aimed at disabling or significantly modifying the operation of the application to an extent that mission objectives cannot be accomplished
 o attacks using buffer-overflow and corrupted data injection to gain system access
False Negative Attacks
 o a situation in which a system fails to report an occurrence of anomalous or malicious behavior
 o Red Team hoped to perform actions that would fall "under the radar"; we targeted the modules of AWDRAT that support diagnosis and detection
False Positive Attacks
 o system reports an occurrence of malicious behavior when the activity detected was non-malicious
 o Red Team sought to perform actions that would excite AWDRAT's monitors; specifically, we targeted the modules supporting diagnosis and detection
State Disruption Attacks
 o interrupt or disrupt AWDRAT's ability to maintain its internal state machines
Recovery Attacks
 o disrupt attempts to recover or regenerate a misbehaving client
 o target the Adaptive Software and Recovery and Regeneration modules in an attempt to allow a misbehaving client to continue operating

Strengths / Weaknesses
Strengths
 o With a reconsideration of the system’s scope of responsibility, we anticipate the system would have performed far better in the tests
 o We see great power in the concept of wrapping all the functions
Weaknesses
 o Scope of Responsibility / Protection far too Limited
 o Need to Develop Full Security Policy
 o Single points of failure
 o Application-Specific Limitations
 o Application Model Issues
    Incomplete – by design?
    Manually Created
    Limited Scope
    Doesn’t really enforce multi-layered defense

LRTSS Assessment October 25, 2005

Success Criteria
The instrumented Freeciv server does not core dump under a condition in which the uninstrumented Freeciv server does core dump
The LRTSS system can
 o Detect a corruption in a data structure that causes an uninstrumented Freeciv server to exit
 o Repair the data corruption in such a way that the instrumented Freeciv server can continue running (a toy detect-and-repair sketch follows below)
The LRTSS system must
 o Detect / Diagnose at least 10% of attacks/root causes
 o Take effective corrective action on at least 5% of the successfully identified compromises/attacks
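As a rough illustration of the detect-then-repair idea, here is a toy sketch of invariant checking and repair on an invented game-state structure; it is not LRTSS code, and the structure, invariants, and repair choices are assumptions made up for this example.

```python
# Hypothetical sketch: detect and repair a corrupted data structure so the
# program can keep running instead of crashing. Not actual LRTSS code.
from dataclasses import dataclass

@dataclass
class City:
    owner: int        # invariant: must index a valid player
    population: int   # invariant: must be non-negative

def check_and_repair(cities, num_players):
    """Check simple invariants; repair violations so the server can continue."""
    repairs = 0
    for city in cities:
        if not (0 <= city.owner < num_players):
            city.owner = 0            # repair: reassign to a default owner
            repairs += 1
        if city.population < 0:
            city.population = 0       # repair: clamp to a legal value
            repairs += 1
    return repairs

cities = [City(owner=1, population=5), City(owner=99, population=-3)]  # injected corruption
print("repairs applied:", check_and_repair(cities, num_players=4))
```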

Assessment Strategy
Denial of Service
 o Aimed at disabling or significantly modifying the operation of the Freeciv server to an extent that mission objectives cannot be accomplished
 o In this case, failing to achieve mission objectives is defined as the Freeciv server exiting or dumping core
 o Attacks using buffer-overflow, corrupted data injection, and resource utilization
 o Various data corruptions aimed at causing the server to exit
 o Formulated the attacks by targeting the uninstrumented server first, then running the same attack against the instrumented server (a comparison-harness sketch follows below)
State Disruption Attacks
 o interrupt or disrupt LRTSS's ability to maintain its internal state machines
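The uninstrumented-versus-instrumented comparison could be scripted roughly as below; the binary paths and attack script name are placeholders, not the actual test rig used during the assessment.

```python
# Hypothetical harness: run the same corruption attack against an uninstrumented
# and an LRTSS-instrumented server build, then compare how each one fares.
import subprocess

def run_attack(server_binary, attack_script, window=60):
    server = subprocess.Popen([server_binary])
    subprocess.run(["python", attack_script], timeout=window)  # inject the corruption
    try:
        server.wait(timeout=window)                            # does the server die?
    except subprocess.TimeoutExpired:
        server.terminate()                                     # still alive: survived the attack
        return "survived"
    return f"exited (return code {server.returncode})"

for build in ["./freeciv-server-plain", "./freeciv-server-lrtss"]:  # placeholder paths
    print(build, "->", run_attack(build, "corrupt_city_table.py"))
```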

Strengths / Weaknesses
Strengths
 o Performs very well under simple data corruptions (that would cause the system to crash under normal operation)
 o Performs well under a large number of these simple data corruptions (200 to 500 corruptions are repaired successfully)
 o Learning and Repair algorithms well thought out
Weaknesses
 o Scope of Responsibility / Protection too limited
 o Complex Data Structure Corruptions not handled well
 o Secondary Relationships are not protected against
 o Pointer Data Corruptions not entirely tested
 o Timing of Check and Repair Cycles not optimal
 o Description of “Mission Failure” as core dump may be excessive

QuickSilver Assessment November 8, 2005

Success Criteria
Ricochet can successfully and/or correctly perform its mission
 o “Ricochet must consistently achieve a fifteen-fold reduction in latency (with benign failures) for achieving consistent values of data shared among one hundred to ten thousand participants, where all participants can send and receive events."
Per client direction, elected to use average latency time as the comparative metric
 o Ricochet’s Average Recovery demonstrates 15-fold improvement over SRM (a worked comparison follows below)
 o Additional constraint levied requiring 98% update saturation (imposing the use of the NACK failover for Ricochet)
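For context, the 15-fold criterion reduces to simple arithmetic over average recovery latencies; the sample values below are invented for illustration and are not measured assessment data.

```python
# Hypothetical check of the 15-fold average-recovery-latency criterion.
# The latency samples are invented; they are not measured assessment data.
srm_recovery_ms      = [1800, 2100, 1950, 2300]   # baseline protocol (SRM)
ricochet_recovery_ms = [110, 95, 130, 105]        # Ricochet under the same loss pattern

srm_avg = sum(srm_recovery_ms) / len(srm_recovery_ms)
ric_avg = sum(ricochet_recovery_ms) / len(ricochet_recovery_ms)
speedup = srm_avg / ric_avg

print(f"SRM avg: {srm_avg:.0f} ms, Ricochet avg: {ric_avg:.0f} ms, "
      f"improvement: {speedup:.1f}x (criterion: >= 15x)")
```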

Assessment Strategy
Scalability Experiments -- test scalability in terms of number of groups per node and number of nodes per group. Here, no node failures will be simulated, and no packet losses will be induced (aside from those that occur as a by-product of normal network traffic).
 o Baseline Latency
 o Group Scalability
 o Large Repair Packet Configuration
 o Large Data Packet Storage Configuration
Simulated Node Failures – simulate benign node failures.
 o Group Membership Overhead / Intermittent Network Failure
Simulated Packet Losses – introduce packet loss into the network (an impairment-relay sketch follows after this list).
 o High Packet Loss Rates
    Node-driven Packet Loss
    Network-driven Packet Loss
    Ricochet-driven Packet Loss
 o High Ricochet Traffic Volume
 o Low Bandwidth Network
Simulated Network Anomalies – simulate benign routing and network errors that might exist on a deployed network. The tests will establish whether or not the protocol is robust in its handling of typical network anomalies, as well as those atypical network anomalies that may be induced by an attacker.
 o Out of Order Packet Delivery
 o Packet Fragmentation
 o Duplicate Packets
 o Variable Packet Sizes
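A minimal sketch of how loss, duplication, and reordering can be induced between two endpoints, assuming a simple UDP relay sits in the path; the addresses and impairment rates are illustrative, and this is not the harness used in the assessment.

```python
# Hypothetical UDP impairment relay: drops, duplicates, and reorders datagrams
# on their way to the receiver. Addresses and rates are illustrative.
import random
import socket

LISTEN  = ("127.0.0.1", 9000)   # senders point here instead of the real receiver
FORWARD = ("127.0.0.1", 9001)   # the real receiver
LOSS, DUP, REORDER = 0.10, 0.05, 0.05

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(LISTEN)
held_back = None                # one-packet buffer used to swap delivery order

while True:
    pkt, _ = sock.recvfrom(65535)
    if random.random() < LOSS:
        continue                             # drop this packet
    if held_back is None and random.random() < REORDER:
        held_back = pkt                      # delay it past the next packet
        continue
    sock.sendto(pkt, FORWARD)
    if random.random() < DUP:
        sock.sendto(pkt, FORWARD)            # deliver a duplicate
    if held_back is not None:
        sock.sendto(held_back, FORWARD)      # release the delayed packet (now out of order)
        held_back = None
```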

Strengths / Weaknesses
Strengths
 o Appears to be very resilient when operating within its assumptions
 o Very stable software
 o Significant performance gains over SRM
Weaknesses
 o FEC orientation – the focus of the reported statistics obscures valuable data regarding complete packet delivery
 o Batch-oriented Test Harness – impossible to perform interactive attacks
    Very limited insight into blow-by-blow performance
 o Metrics collected were very difficult to understand fully

STEWARD Assessment December 9, 2005

Success Criteria
The STEWARD system must:
 o Make progress in the system when under attack. Progress is defined as the eventual global ordering, execution, and reply to any request which is assigned a sequence number within the system
 o Maintain consistency of the data replicated on each of the servers in the system

Assessment Strategy
Validation Activities - tests we will perform to verify that STEWARD can endure up to five Byzantine faults while maintaining a three-fold reduction in latency with respect to BFT (the replica-count arithmetic behind this target is sketched after this list)
 o Byzantine Node Threshold
 o Benchmark Latency
Progress Attacks - attacks we will launch to prevent STEWARD from progressing to a successful resolution of an ordered client request
 o Packet Loss
 o Packet Delay
 o Packet Duplication
 o Packet Re-ordering
 o Packet Fragmentation
 o View Change Message Flood
 o Site Leader Stops Assigning Sequence Numbers
 o Site Leader Assigns Non-Contiguous Sequence Numbers
 o Suppressed New-View Messages
 o Consecutive Pre-Prepare Messages in Different Views
 o Out of Order Messages
 o Byzantine Induced Failover
Data Integrity Attacks - attempts to create an inconsistency in the data replicated on the various servers in the network
 o Arbitrarily Execute Updates
 o Multiple Pre-Prepare Messages using Same Sequence Numbers and Different Request Data
 o Spurious Prepare, Null Messages
 o Suppressed Checkpoint Messages
 o Prematurely Perform Garbage Collection
 o Invalid Threshold Signature
Protocol State Attacks - attacks focused on interrupting or disrupting STEWARD's ability to maintain its internal state machines
 o Certificate Threshold Validation Attack
 o Replay Attack
 o Manual Exploit of Client or Server
Note: We did not try to validate or break the encryption algorithms.
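For background on the "up to five Byzantine faults" target, the textbook replica and quorum bounds for flat Byzantine fault tolerance are shown below; STEWARD's hierarchical, per-site configuration differs, so treat this only as the standard arithmetic, not STEWARD's exact deployment.

```python
# Standard flat-BFT bounds: tolerating f Byzantine replicas requires
# n >= 3f + 1 replicas and quorums of 2f + 1, so any two quorums intersect
# in at least one correct replica. (Context only; not STEWARD's hierarchy.)
def bft_requirements(f):
    n = 3 * f + 1
    quorum = 2 * f + 1
    return n, quorum

for f in (1, 3, 5):
    n, q = bft_requirements(f)
    print(f"f = {f}: at least {n} replicas, quorum size {q}")
```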

Strengths / Weaknesses
Strengths
 o First system that assumes and actually tolerates corrupted components (Byzantine attack)
 o Blue Team spent extensive time up front in analysis, design, and proof of the protocol – it was clear in the performance
 o System was incredibly stable and resilient
 o We did not compromise the system
Weaknesses
 o Limited Scope of Protection
    Relies on an external entity to secure and manage keys, which are fundamental to the integrity of the system
    STEWARD implicitly and completely trusts the client
 o Client-side attacks were out of scope of the assessment

Going Forward
White Team will generate the definitive report on this Red Team Test activity
 o It will have the official scoring and results
RABA (Red Team) will generate a test report from our perspective
 o We will publish to:
    PI for the Project
    White Team (Mr. Do)
    DARPA (Mr. Badger)

Questions or Comments
Any Questions, Comments, or Concerns?