Cognitive Support for Intelligent Survivability Management CSISM TEAM June 21, 2007.


1 Cognitive Support for Intelligent Survivability Management CSISM TEAM June 21, 2007

2 Outline
Introduction
Status, results and plans for technical thrusts
– Multi-layer reasoning for cyber-defense administration
    Knowledge representation and rules for system-wide reasoning (OLC)
    Fast containment response and policies (ILC)
– Improving defense parameters and strategies by learning augmentation
– Implementation and integration
Conclusions

3 CSISM Introduction and Background

4 Problem Domain: Self-Regenerative Systems
Our focus: automated interpretation of observations and response selection.
[Figure: level of service over time from the start of a focused attack, comparing an undefended system, a survivable (3rd-generation) system, and a regenerative system]
Survivable (3rd generation) – graceful degradation: adaptive response limited to static use of diversity and policy; event interpretation and response selection by human experts.
Regenerative – retain level of service and improve defense: static and dynamic use of artificial diversity; use of wide-area distribution; automated interpretation of observations and response selection, augmented by learning from past experience.
Cyber-defense progression: survivable systems → automated → self-improving

5 Cyber-Defense Decision-Making Landscape

6 Challenges
Goal: Automate the reasoning performed by expert cyber-defense administrators
– Effective, reusable, easy to port and retarget
Challenges:
– Making sense of low-level information (alerts, observations) to drive low-level defense mechanisms (block, isolate, etc.) such that higher-level objectives (survive, continue to operate) are achieved
– Doing it as well as human experts
– Additional difficulties:
    Rapid, real-time decision-making and response
    Uncertainty due to incomplete and imperfect information
    Widely varying operating conditions (from no alerts to hundreds of alerts per second)
    New symptoms and changes in the adversary's strategy

7 Approach
– Multi-perspective, multi-hypothesis deliberation
    Keep all options open – delay the bindings
    Divide and conquer
– Response selection based on current utility as well as potential adversarial counter-response
    A simple "match" is insufficient against an intelligent adversary
    Unpredictability to counter gaming
– Contain while deliberating
    Buy time
– Learning-based dynamic modification of defense parameters and strategies
    "Immunity" against repeats and variants
[Figure: multi-layer reasoning – the ILC and OLC interpret events and select responses, augmented by learning]

8 Knowledge Representation and Rules for System-wide Reasoning

9 Objectives
Represent knowledge of cyber-defense
Allow reasoning about attack and defense, including look-ahead
Automate most reasoning
Encode enough detail to estimate the relative goodness of alternatives in most situations
Extract knowledge from Red Team encounters; attempt to generalize
Separate generic, reusable knowledge from system-specific knowledge

10 Achievements
Classification of knowledge
Classification of reasoning
Breadth-first: relationships among alerts, accusations, corruption, flooding, and failures
– Instantiated for DPASA
Depth-first: DPASA registration protocol
– Run 6, Nov 2005 Red Team exercise
Encoding knowledge and reasoning
– 1st-order logic prototype
– Soar rules and data
Representing concepts, instances and relations – use of a common ontology (Adventium's Netbase)

11 Kinds of Knowledge
1. Symptomatic: possible explanations for a given anomalous event
   – Both generic and system-specific
2. Relational: constraints that reinforce or eliminate possible explanations
   – Mostly system-specific
3. Teleological: possible attacker goals and the actions that may be used to accomplish them
   – Mostly generic
4. Reactive: possible defensive countermeasures for a given attack
   – Both generic and system-specific
Focus so far has been on 1, 2, and 4.

12 Kinds of Reasoning
Focus so far has been on restrictive reasoning.
Restrictive
– From observations of past events and knowledge of system properties, deduce good explanations and good defensive responses
– (the reasoning restricts what is possible)
Predictive
– Look ahead, comparing alternatives

13 Example from Run 11, Nov 2005
[Figure: Server 1 (Linux), Server 2 (Windows), Server 3 (Solaris), Server 4 (Linux); Servers 2 and 3 accuse Server 1 of violating the protocol]
Reasoning: Under the most likely assumption – no common-mode failure and exploit of at most one OS – Servers 2 and 3 can't both be lying, so Server 1 must be corrupt. It's not restartable, so quarantine it.
Note that no information source is completely trusted.

14 (Simplified) Example from Run 6, Nov 2005
[Figure: Monitors 1–4, Client 1, Client 2, and the Client 2 LAN; all four monitors report communication from Client 1 but accuse Client 2 of not delivering heartbeats]
Reasoning: All 4 monitors claim to have received communication from one client but accuse another client of not delivering heartbeats. They can't all be lying. The communication path for some must be OK, so either Client 2 or its LAN is bad. Ping Client 2 to determine which.

15 OLC Reasoning Flow

16 Rapid Prototyping
Use an automatic theorem prover
– "prover9" (McCune, UNM)
– 1st-order
– Encodes restrictive reasoning (a first-order sketch follows below)
Advantages over Soar:
– Existing algorithm for deep reasoning
– Easier to get started
Disadvantages compared to Soar:
– Goals are not selected automatically
– The reasoning algorithm can't be controlled
– Non-1st-order reasoning is not available
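
To make this concrete, here is a hedged first-order sketch (in standard logic notation rather than literal prover9 input) of the restrictive inference from the Run 11 example on slide 13. The predicate names are illustrative only and are not taken from the CSISM knowledge base.

```latex
% Illustrative 1st-order encoding of the Run 11 example (predicate names are hypothetical)
\begin{align*}
&\forall x\,\forall y\;\bigl(\mathit{accuses}(x,y) \rightarrow \mathit{corrupt}(x) \lor \mathit{corrupt}(y)\bigr)
    &&\text{an accusation implicates accuser or accused}\\
&\lnot\bigl(\mathit{corrupt}(s_2) \land \mathit{corrupt}(s_3)\bigr)
    &&\text{diversity assumption: at most one OS exploited}\\
&\mathit{accuses}(s_2, s_1) \land \mathit{accuses}(s_3, s_1)
    &&\text{observed accusations}\\
&\vdash\ \mathit{corrupt}(s_1)
    &&\text{restrictive conclusion}\\
&\mathit{corrupt}(s_1) \land \lnot\mathit{restartable}(s_1) \rightarrow \mathit{quarantine}(s_1)
    &&\text{response selection}
\end{align*}
```

If Server 1 were not corrupt, both accusations would have to be lies, forcing Servers 2 and 3 to be corrupt simultaneously and contradicting the diversity assumption; a resolution prover reaches the same conclusion mechanically.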

17 Encoding in Soar
Soar is based on more than 20 years of research into human cognition. It uses pattern-directed inference and hierarchical control to reason in a manner similar to human thinking.
The OLC inference engine will use coherence theory to search for a set of hypotheses that is maximally consistent with the observations and with its experience – we anticipated the need, but our implementation has not yet faced a situation that requires it (a toy sketch of the idea follows below).
Use of a standard ontology and Protégé
Managing the complexity of knowledge acquisition
Use of Herbal to generate Soar rules from a higher-level representation
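
The coherence search mentioned above can be illustrated with a toy example. The Java sketch below (class, unit, and weight names are entirely hypothetical; this is not the CSISM OLC implementation) exhaustively scores accept/reject assignments over a small set of units and keeps the assignment that satisfies the most weighted coherence constraints, which is the basic idea behind coherence-theoretic hypothesis selection.

```java
import java.util.List;

/** Toy coherence-style hypothesis selection sketch; names and weights are hypothetical. */
public class CoherenceSketch {

    /** Pairwise constraint: positive weight = units cohere, negative = they conflict. */
    record Constraint(int a, int b, double weight) {}

    /**
     * Exhaustively search accept/reject assignments over n units (hypotheses plus
     * clamped evidence) and return the bitmask that maximizes the total weight of
     * satisfied constraints. A positive constraint is satisfied when both units get
     * the same truth value; a negative constraint is satisfied when they differ.
     * Assignments that reject a clamped (observed) unit are skipped.
     */
    static int bestAssignment(int n, int clampedMask, List<Constraint> constraints) {
        int best = clampedMask;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int mask = 0; mask < (1 << n); mask++) {
            if ((mask & clampedMask) != clampedMask) continue; // evidence stays accepted
            double score = 0.0;
            for (Constraint c : constraints) {
                boolean a = (mask & (1 << c.a())) != 0;
                boolean b = (mask & (1 << c.b())) != 0;
                boolean satisfied = (c.weight() >= 0) == (a == b);
                if (satisfied) score += Math.abs(c.weight());
            }
            if (score > bestScore) { bestScore = score; best = mask; }
        }
        return best;
    }

    public static void main(String[] args) {
        // Unit 0: observed alert (clamped accepted).
        // Unit 1: hypothesis "host compromised" (explains the alert).
        // Unit 2: hypothesis "sensor misconfigured" (also explains it, but conflicts with 1).
        List<Constraint> constraints = List.of(
            new Constraint(0, 1, 1.0),    // strong explanatory link
            new Constraint(0, 2, 0.6),    // weaker explanatory link
            new Constraint(1, 2, -0.8));  // the two explanations contradict each other
        int accepted = bestAssignment(3, 1, constraints);
        System.out.println("Accepted units (bitmask): " + Integer.toBinaryString(accepted));
        // Prints "11": the alert and the "host compromised" hypothesis are accepted.
    }
}
```

A real engine would of course search far larger hypothesis sets heuristically rather than exhaustively; the point here is only the scoring of competing hypothesis sets against weighted consistency constraints.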

18 Conclusion and Next Steps
A good start: knowledge and reasoning sufficient for defense of DPASA in some Red Team exercises, e.g., Run 6
Rough estimate of coverage: existing rules would reason about all alerts and defend successfully in roughly half of the Nov 2005 runs in which human operators also defended successfully; the second half will be harder
Needed now:
– Immediately: rules for flooding, redundant groups, and phases of the mission
– Soon: attacker objectives in larger-scale attacks

19 Fast Containment Response and Policies

20 Inner Loop Controller (ILC) Objectives
Attempt to contain and correct the problem at the earliest stage possible
Policy-driven: implement policies and tactics from the OLC on a single host
Autonomous: high-speed response can work when disconnected from the OLC by an attack or failure
Flexible: policies can be updated at any time
Adaptive: use learned characteristics of the host and monitored services to tune the policy
Low impact on mission: able to back out of defensive decisions when warranted

21 Survey of ILC Work
Requirements
– Threat model, performance, range of sensing and response, OLC communications
Design
– Study typical applications and recovery needs
– Policies
First prototype
– Dynamically configurable rule-based policies
Plans for integration and testing
– With the testbed emulating the DPASA survivable JBI
– As a stand-alone program on a real host

22 ILC Prototype-1 Architecture
Java driver program (a minimal driver sketch follows below)
– Instantiate the reasoning components, start and load them
System API
– OLC communications
– Sensing and response
Jess inference engine
Policy modules
– One for each application and service monitored
[Figure: Java driver hosting the Jess rule engine and the System API (Java + Jess), policy modules A–D, saved state files, and Jess facts and rules]
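
As a rough illustration of how such a driver wires things together, the fragment below uses the public Jess API (jess.Rete) to load a rule-based policy module and assert a sensed fact. The file name and fact template are made up for the example; this is a sketch of the pattern, not the actual CSISM driver.

```java
import jess.JessException;
import jess.Rete;

/** Minimal sketch of an ILC-style driver: load rule-based policies into Jess,
 *  feed it sensed facts, and let the rules decide on responses.
 *  File names and fact templates are hypothetical. */
public class IlcDriverSketch {
    public static void main(String[] args) throws JessException {
        Rete engine = new Rete();

        // Load a policy module (Jess templates and rules) from disk.
        engine.batch("policies/selinux-policy.clp");

        // Reset working memory so deffacts and initial facts are asserted.
        engine.reset();

        // Assert an observation arriving from the sensing side of the System API.
        engine.assertString("(alert (host \"server-1\") (type enforcement-off))");

        // Fire the rules; response rules would call back into the System API.
        engine.run();
    }
}
```

In the prototype described above, the saved-state files and per-application policy modules would be additional batch-loaded rule files, and the System API would sit between the rules and the host's defense mechanisms.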

23 Components of ILC Response
[Figure: detection rules for a monitored service S consume its status, settings, and evidence E through the Detection API, classify problems into problem types, and create a problem instance P; problem types map to response policies that act through the Response API, using internal objects such as timers in implementing ILC responses]

24 ILC Status – June 2007
Requirements and design for the ILC
Working Java driver
– Initializes the Jess inference engine
– Remote access to the ILC for policy manipulation or remote debugging
Preliminary System API modules for
– ILC embedded in the emulated test environment
– Standalone ILC for a Linux host
– Initial ties with the learning/adaptation module
Sample policy modules
– For SELinux and EFWAgent (typical defense mechanisms)

25 Next Steps
Integration with the emulated test environment
– Flesh out the API, make it compatible with the ontology
– Explore interactions with the OLC, e.g. strategies involving dynamic ILC policy changes
– Complete ties to the learning module
More sample application policies
– Explore a broader range of behaviors, e.g. nondeterminism
Standalone testing
– Install the ILC on a workstation and/or server and monitor live applications/services
– Probe ILC response under failures and attacks

26 Improving Defense Parameters and Strategies

27 Learning Augmentation: Motivation
Why learning?
– Extremely difficult to capture all the complexities of the system, particularly interactions among activities
– The system is dynamic (a static configuration gets out of date)
CSISM will learn to
– Improve the defensive posture: better knowledge (about the attacks or the attacker), better policies
– Improve how the system responds to symptoms: better connection between response actions and their triggers
Adaptation is the key to survival

28 Development Plan for Learning in CSISM
1. Responses under normal conditions (calibration)
2. Situation-dependent responses under attack conditions
3. Multi-stage attacks

29 Analysis: RegTime by Quad
Quads 0 and 1 are slower than Quads 2 and 3.
Complex domain: human calibration (incorrectly) claimed that Quad 1 was slowest, missing Quad 0.

30 Analysis: Registration Times by Client Type
caf_plan, chem_haz and maf_plan are slower than the other clients.
Complex domain: human calibration (incorrectly) claimed that caf_plan and maf_plan were slowest because of the hand-typed password, missing chem_haz.

31 Step 1: Calibration
Calibrate the parameters of rules for normal operating conditions
– Important first step because it learns how to respond to normal conditions
– Initially, timing parameters from the ILC, e.g. client registration, PSQ server local probes, SELinux enforcement, SELinux flapping, file integrity checks
Core challenge – comparing training approaches:
– Human: + good data, - complex environment, - dynamic system
– Offline training: + good data, + complex environment, - dynamic system
– Online training: - unknown data, + complex environment, + dynamic system
– CSISM's experimental sandbox: + good data (self-labeled), + complex environment, + dynamic system; makes it very hard for the adversary to "train" the learner
Sandbox approach successfully tried in SRS phase 1

32 Step 1: Calibration
Using the algorithm of Last & Kandel (a simplified scoring sketch follows below)
– Calculates a membership score for each sample, based on how similar it is to nearby samples (the distance-to-density ratio). If the score is below a threshold, the sample is an outlier.
– It can make estimates even for multi-modal data.
[Figure: scatter of samples with their membership scores; samples whose score falls below the threshold are flagged as outliers]
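
To give a feel for this kind of calibration, the sketch below computes a crude density-based membership score for one-dimensional timing samples (average distance to the k nearest neighbours, inverted and normalised) and flags samples whose score falls below a threshold. It is a simplified stand-in for illustration only, not the Last & Kandel algorithm and not the CSISM learner; the sample values are invented.

```java
import java.util.Arrays;

/** Simplified density-based outlier scoring for 1-D timing samples.
 *  Illustrative stand-in only; not the Last & Kandel algorithm. */
public class CalibrationSketch {

    /** Membership score in (0, 1]: samples in dense neighbourhoods score near 1,
     *  isolated samples score near 0. */
    static double[] membershipScores(double[] samples, int k) {
        int n = samples.length;
        double[] avgDist = new double[n];
        for (int i = 0; i < n; i++) {
            double[] d = new double[n];
            for (int j = 0; j < n; j++) d[j] = Math.abs(samples[i] - samples[j]);
            Arrays.sort(d);
            double sum = 0.0;
            for (int j = 1; j <= k && j < n; j++) sum += d[j]; // skip d[0] == 0 (self)
            avgDist[i] = sum / Math.min(k, n - 1);
        }
        double maxDist = Arrays.stream(avgDist).max().orElse(1.0);
        double[] score = new double[n];
        for (int i = 0; i < n; i++) score[i] = 1.0 - avgDist[i] / (maxDist + 1e-9);
        return score;
    }

    public static void main(String[] args) {
        // Hypothetical client-registration times (seconds); 95.0 is an injected outlier.
        double[] regTimes = {12.1, 11.8, 12.4, 13.0, 12.2, 11.9, 95.0, 12.6};
        double threshold = 0.5;
        double[] scores = membershipScores(regTimes, 3);
        for (int i = 0; i < regTimes.length; i++) {
            String label = scores[i] < threshold ? "OUTLIER" : "normal";
            System.out.printf("t=%6.1f  score=%.2f  %s%n", regTimes[i], scores[i], label);
        }
    }
}
```

The published algorithm is more careful (it works with the distance-to-density ratio and handles multi-modal data), but the calibration workflow is the same: learn what "normal" timing looks like, then treat low-membership samples as anomalies worth reporting to the ILC.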

33 Results for CombOps Registration
[Figure: estimated acceptance regions for CombOps registration times at Beta = 0.001, 0.0025, and 0.005; if the threshold were 0.90, the x-values inside the green box would be considered OK]

34 Results for all Registration Times
[Figure: membership-score curves for all registration times at Beta = 0.0001 and Beta = 0.0005]

35 Results for all Registration Times (Beta = 0.0005)
In the demo, you'll see these two "shoulder" points, indicating upper and lower limits.
As more observations are collected, the estimates become more confident of the range of expected values (i.e. tighter estimates around the observations).

36 Status, Development Plan & Future Steps
1. Responses under normal conditions (calibration)
   a. Analyze DPASA data (done)
   b. Integrate with the ILC (single node) (done)
   c. Add an experimentation sandbox (single node)
   d. Calibrate across nodes
2. Situation-dependent responses under attack conditions
3. Multi-stage attacks

37 Implementation and Integration

38 Objectives and Assumptions
Objectives
– CSISM components should be reusable and portable
    Maximize genericity, with a clear demarcation between system-specific and generic parts
    Standardized representation, generating CSISM internal representations from a higher-level specification
– The evaluation framework should be "system scale", easy to construct, easy to inject attack effects into, and easy to interface with the emulation
Assumptions
– Soar can process alerts as fast as they are generated (not to say that the OLC input will not be flooded)
– The survivable system ensures that alerts make it to the OLC and the learner
– The survivable system ensures that the ILC process runs with higher privilege
– If the target is not corrupt, the OLC's command will be executed by the survivable system
– Source IP addresses are not spoofed (can be satisfied by the ADF cards)
Challenges addressed
– Standardized representation of the concepts, instances and relationships involved in a survivable system
– Time handling in reasoning and evaluation
– Thread handling in the reasoning engine

39 Integration Framework

40 Achievement Summary
OLC
– Reasoning about accusations, information flow, and some context- and protocol-specific situations, covering all alerts in half of the DPASA attack runs
– A subset of these is exercisable by the emulated testbed; the rest are tested from Soar (apart from rapid prototyping in Prover9)
ILC
– Confirmation that reactive response policies for typical defended applications or defense mechanisms can be built from small, reusable rule-based components
Learning augmentation
– Calibration: set up and initial example (e.g., registration time)
Validation framework for CSISM capabilities
– Emulation of a subset of the ODV survivable JBI implemented, ongoing
Integration
– OLC with the system under test
– Learner with the ILC

41 Next Steps
Challenges/obstacles?
– Consistent set of hypotheses (coherence theory)
Plan for next steps in individual tasks
– Outlined in earlier sections
Plan for next steps in integration
– KR work fully integrated with the OLC and the system under test
– Fuller emulation
– ILC integration with the system under test
– ILC-OLC and Learning-OLC integration
– More attack variations and support for red team access
– Improved viewport into reasoning and metrics

42 Conclusion
Good start, gathered momentum
Preliminary results are promising
– OLC coverage
– ILC feasibility
– Learning insights
Cross-project integration potential
– Looked into SPDR in more detail
    Reasoning about attack plan recognition and OLC bin 3
    ILC and DRED
    Same ontological representation
– Would like to look into other projects, for example:
    VICI defense against rootkits to protect the ILC
Other issues (e.g. timeliness)
– Of the defense
– Interference with the timeliness requirements of the system under test
Evaluation vehicle

43 Backup notes

44 Enforcement Off (no-enforcement.soar)
Current:
– Interpretation: a node reports that process protection is off; we note that as a self-accusation
– Response selection: the enforcement-off self-accusation causes blocking of all ADF NICs on that host
Next step:
– Treat the self-accusation generically – many alerts will be "self-accusations" and they will be handled by a single set of rules
– Response selection will consider other actions like restarting a process, rebooting a host, blocking the NICs, or isolating the LAN

45 Registration (callback.soar, prepare-registartion.soar, reboot.soar, gui-up.soar)
Observation that a client is invited sets up an expectation (that the GUI should appear in the future)
If the GUI does not appear, that triggers interpretation (see below)
Current:
– An intermediate condition with an ordered prescription of remedies (a rough sketch of this escalation follows below):
    Reboot the client: it's a client issue that rebooting may fix
    Re-register from another SM: if there is an SM/DC/AP issue, this may solve the problem
    If all quads are exhausted, try refreshing the AP refs and re-inviting
      – If there is a reason to suspect a quad, try isolating that SM before the refresh
Future:
– Hypotheses that the client or the inviting SM may be bad, or the path may be bad
– Restrictive reasoning considering information flow and other incoming events to narrow/eliminate hypotheses
    Maximally consistent set of hypotheses
– Select a response based on utilities (and predictive reasoning)
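
As a rough illustration of that ordered prescription, the sketch below steps through the remedies in order when the expected GUI has not appeared. The method names, the Actions interface, and the quad handling are hypothetical; the real behaviour lives in the Soar rules named above.

```java
/** Hypothetical escalation sketch for a client whose GUI did not appear after invitation.
 *  The Actions interface and its methods are illustrative, not the CSISM rules. */
public class RegistrationEscalationSketch {

    /** Abstraction over the defensive actions the reasoner could command. */
    interface Actions {
        boolean rebootClient(String client);                 // true if the GUI then appears
        boolean reRegisterFromQuad(String client, int quad); // try an SM in another quad
        boolean quadSuspected(int quad);
        void isolateSm(int quad);
        boolean refreshApRefsAndReinvite(String client);
    }

    static boolean recover(String client, Actions act, int numQuads) {
        // 1. It may be a client issue that rebooting fixes.
        if (act.rebootClient(client)) return true;

        // 2. It may be an SM/DC/AP issue: try re-registering from each remaining quad.
        for (int quad = 0; quad < numQuads; quad++) {
            if (act.reRegisterFromQuad(client, quad)) return true;
        }

        // 3. All quads exhausted: if a quad is suspect, isolate its SM first,
        //    then refresh the AP refs and re-invite the client.
        for (int quad = 0; quad < numQuads; quad++) {
            if (act.quadSuspected(quad)) act.isolateSm(quad);
        }
        return act.refreshApRefsAndReinvite(client);
    }
}
```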

