1
Self-defending software: Collaborative learning for security and repair
Michael Ernst
MIT Computer Science & AI Lab
2
Demo
Automatically fix previously unknown security vulnerabilities in COTS software.
3
Goal: Security for COTS software
Requirements:
1. Protect against unknown vulnerabilities
2. Preserve functionality
3. COTS (commercial off-the-shelf) software
Key ideas:
4. Learn from failure
5. Leverage the benefits of a community
4
1. Unknown vulnerabilities
Proactively prevent attacks via unknown vulnerabilities
– "Zero-day exploits"
– No pre-generated signatures
– No time for human reaction
– Works for bugs as well as attacks
5
2. Preserve functionality
Maintain continuity: the application continues to operate despite attacks
Important for mission-critical applications
– Web servers, air traffic control, communications
Technique: create a patch (repair the application)
Previous work: crash the program when attacked
– Loss of data, overhead of restart, attack recurs
6
3. COTS software
No modification to source code or executables
No cooperation required from developers
– No source information (no debug symbols)
– Cannot assume built-in survivability features
x86 Windows binaries
7
4. Learn from failure
Each attack provides information about the underlying vulnerability
Detect all attacks (of given types)
– Prevent negative consequences
– First few attacks may crash the application
Repairs improve over time
– Eventually, the attack is rendered harmless
– Similar to an immune system
8
5. Leverage a community
Application community = many machines running the same application (a monoculture)
– Convenient for users, sysadmins, and hackers
Problem: a single vulnerability can be exploited throughout the community
Solution: community members collaborate to detect and resolve problems
– Amortized risk: a problem in some leads to a solution for all
– Increased accuracy: exercise more behaviors during learning
– Shared burden: distribute the computational load
9
Evaluation
Adversarial Red Team exercise
– Red Team was hired by DARPA
Application: Firefox (v1.0)
– Exploits: malicious JavaScript, GC errors, stack smashing, heap buffer overflow, uninitialized memory
Results:
– Detected all attacks, prevented all exploits
– For 15/18 attacks, generated a patch that maintained functionality, after observing no more than 13 instances of the attack
– No false positives
– Low steady-state overhead
10
Collaborative learning approach
Monitoring: detect attacks
– Pluggable detector; does not depend on learning
Learning: infer normal behavior from successful executions
Repair: propose, evaluate, and distribute patches based on the behavior of the attack
All modes run on each member of the community
Modes are temporally and spatially overlapped
[Diagram: normal execution feeds Learning, which produces a model of behavior; Monitoring detects an attack (or bug), which drives Repair]
15
[Diagram: Collaborative Learning System Architecture. Client workstations (learning) run the application under the MPEE with a data-acquisition client library and local learning (Daikon), sending sample data and constraints to the Central Management System. The server merges constraints (Daikon) and performs patch/repair code generation. Client workstations (protected) run the application under the MPEE with Memory Firewall and LiveShield evaluation (observing execution); they receive patches and report patch/repair results.]
16
Structure of the system
Encrypted, authenticated communication between the server and the community machines
(Server may be replicated, distributed, etc.)
17
Learning
Observe normal behavior
Server distributes learning tasks to the community machines
18
Learning
Generalize observed behavior
Clients do local inference (e.g., one client infers copy_len < buff_size, another copy_len ≤ buff_size, another copy_len = buff_size)
Clients send inference results to the server
Server generalizes (merges results): … copy_len ≤ buff_size …
(A sketch of the merge step follows below.)
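The merge step can be viewed as keeping only the comparison relations consistent with every client's observations; the weakest surviving relation (here ≤) becomes the community-wide constraint. The following is a minimal sketch under that assumption, not the actual Daikon merging code; the function names and data format are invented for illustration.

```python
# Hypothetical sketch of server-side merging of per-client comparison
# constraints over the same variable pair (e.g., copy_len vs. buff_size).
# Each client reports the relations its observations still allow; the
# server keeps only relations allowed by every client.

ALL_RELATIONS = {"<", "<=", "==", ">=", ">", "!="}

def client_relations(observations):
    """Return the comparison relations consistent with all (x, y) observations."""
    allowed = set(ALL_RELATIONS)
    for x, y in observations:
        if not x < y:  allowed.discard("<")
        if not x <= y: allowed.discard("<=")
        if not x == y: allowed.discard("==")
        if not x >= y: allowed.discard(">=")
        if not x > y:  allowed.discard(">")
        if not x != y: allowed.discard("!=")
    return allowed

def merge(reports):
    """Server-side merge: intersect the relation sets reported by all clients."""
    merged = set(ALL_RELATIONS)
    for r in reports:
        merged &= r
    return merged

# One client only saw copy_len < buff_size, another saw equality:
a = client_relations([(22, 42)])   # {"<", "<=", "!="}
b = client_relations([(10, 10)])   # {"<=", "==", ">="}
print(merge([a, b]))               # {"<="}  i.e., copy_len <= buff_size
```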
19
Monitoring
Detect attacks
Detector collects information and terminates the application
Assumption: few false positives
Detectors used in our research:
– Code injection (Memory Firewall)
– Memory corruption (Heap Guard)
Many other possibilities exist
20
Monitoring
What was the effect of the attack?
Instrumentation continuously evaluates learned behavior
Clients send the difference in behavior: violated constraints (e.g., Violated: copy_len ≤ buff_size)
Server correlates constraints to the attack
21
Repair
Server generates a set of patches
Propose a set of patches for each behavior correlated to the attack
Correlated: copy_len ≤ buff_size
Candidate patches:
1. Set copy_len = buff_size
2. Set copy_len = 0
3. Set buff_size = copy_len
4. Return from procedure
22
Repair
Distribute patches to the community
Server sends different patches (Patch 1, Patch 2, Patch 3) to different community machines
Initial ranking: Patch 1: 0, Patch 2: 0, Patch 3: 0, …
23
Repair
Evaluate patches
Detector is still running on clients
Success = no detector is triggered
When attacked, clients send the outcome to the server (e.g., Patch 1 failed, Patch 3 succeeded)
Server ranks patches (see the sketch below)
Ranking: Patch 3: +5, Patch 2: 0, Patch 1: -5, …
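A minimal sketch of the ranking idea, assuming a simple additive score per patch (the +1/-1 increments and the report format are invented for this sketch; the slide only shows that successful patches rise and failing ones sink):

```python
# Hypothetical sketch of server-side patch ranking.  Clients report, for
# each attack they experience, which patch they were running and whether
# the detector fired again (failure) or the attack was neutralized
# (success).  The server keeps a score per patch and redistributes the
# highest-ranked ones.

from collections import defaultdict

scores = defaultdict(int)

def report(patch_id, success):
    """Record one client's outcome for one attack instance."""
    scores[patch_id] += 1 if success else -1

def best_patches(n=1):
    """Patches to redistribute: highest score first."""
    return sorted(scores, key=scores.get, reverse=True)[:n]

# Five clients running Patch 3 survive the attack; five running Patch 1 do not.
for _ in range(5):
    report("Patch 3", success=True)
    report("Patch 1", success=False)
print(dict(scores))    # {'Patch 3': 5, 'Patch 1': -5}
print(best_patches())  # ['Patch 3']
```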
24
Repair
Redistribute the best patches
Server redistributes the most effective patches (e.g., Patch 3)
Ranking: Patch 3: +5, Patch 2: 0, Patch 1: -5, …
25
Outline
Overview
Learning: create model of normal behavior
Monitoring: determine properties of attacks
Repair: propose and evaluate patches
Evaluation: adversarial Red Team exercise
Conclusion
26
Learning
Generalize observed behavior
Clients do local inference (copy_len < buff_size, copy_len ≤ buff_size, copy_len = buff_size)
Clients send inference results to the server
Server generalizes (merges results): … copy_len ≤ buff_size …
27
Dynamic invariant detection
Idea: generalize observed program executions
Initial candidates = all possible instantiations of constraint templates over program variables
Observed data quickly falsifies most candidates
Many optimizations for accuracy and speed
– Data structures, static analysis, statistical tests, …
Example (see the sketch below):
– Candidate constraints: copy_len < buff_size, copy_len ≤ buff_size, copy_len = buff_size, copy_len ≥ buff_size, copy_len > buff_size, copy_len ≠ buff_size
– Observation: copy_len: 22, buff_size: 42
– Remaining candidates: copy_len < buff_size, copy_len ≤ buff_size, copy_len ≠ buff_size
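A minimal sketch of the falsification loop, assuming binary comparison templates only (Daikon's real implementation covers many more templates and applies the optimizations listed above):

```python
# Hypothetical sketch of dynamic invariant detection by falsification.
# Start with every instantiation of the comparison templates over a pair
# of variables; each observed execution discards the candidates it
# contradicts; whatever survives is the inferred constraint set.

import operator

TEMPLATES = {
    "copy_len < buff_size":  operator.lt,
    "copy_len <= buff_size": operator.le,
    "copy_len == buff_size": operator.eq,
    "copy_len >= buff_size": operator.ge,
    "copy_len > buff_size":  operator.gt,
    "copy_len != buff_size": operator.ne,
}

def infer(observations):
    candidates = dict(TEMPLATES)
    for copy_len, buff_size in observations:
        # Drop every candidate that this observation falsifies.
        candidates = {name: op for name, op in candidates.items()
                      if op(copy_len, buff_size)}
    return set(candidates)

print(infer([(22, 42)]))
# {'copy_len < buff_size', 'copy_len <= buff_size', 'copy_len != buff_size'}
```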
28
Quality of inference results
Not sound
– Overfitting if observed executions are not representative
– Does not affect detection
– For repair, mitigated by the correlation step
– Continued learning improves results
Not complete
– Templates are not exhaustive
Useful!
– Sound in practice
– Complete enough
29
Enhancements to learning
Relations over variables in different parts of the program
– Flow-dependent constraints
Distributed learning across the community
Filtering of constraint templates and variables
30
Learning for Windows x86 binaries
No static or dynamic analysis to determine the validity of values and pointers
– Use expressions computed by the application
– Can be more or fewer variables (e.g., loop invariants)
– Static analysis to eliminate duplicated variables
New constraint templates (e.g., stack pointer)
Determine the code of a function from its entry point
31
Outline
Overview
Learning: create model of normal behavior
Monitoring: determine properties of attacks
– Detector details
– Learning from failure
Repair: propose and evaluate patches
Evaluation: adversarial Red Team exercise
Conclusion
32
Detecting attacks (or bugs)
Code injection (Determina Memory Firewall)
– Triggers if control jumps to code that was not in the original executable
Memory corruption (Heap Guard)
– Triggers if sentinel values are overwritten
These have low overhead and no false positives
Goal: detect problems close to their source (see the sketch below)
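As a conceptual sketch of the two detector conditions (the real detectors run inside the execution engine over x86 binaries; the data structures, sentinel value, and addresses below are invented for illustration):

```python
# Hypothetical sketch of the two detector conditions.
# Code injection: control may only transfer to code that came from the
# original executable and its libraries.
# Memory corruption: sentinel words bracketing a heap block must keep
# their known value; any change means an overflow walked over them.

SENTINEL = 0xDEADBEEF

def code_injection_detected(target, original_code_regions):
    """True if a control transfer targets code not in the original executable."""
    return not any(lo <= target < hi for lo, hi in original_code_regions)

def heap_corruption_detected(block):
    """True if the sentinels around a heap block were overwritten."""
    return (block["front_sentinel"] != SENTINEL or
            block["back_sentinel"] != SENTINEL)

regions = [(0x401000, 0x450000)]  # code pages from the on-disk image
print(code_injection_detected(0x7FFA1000, regions))            # True: injected code
print(heap_corruption_detected({"front_sentinel": SENTINEL,
                                "back_sentinel": 0x41414141}))  # True: overflow
```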
33
Memory Firewall as detector
Three points at which an attack can be caught:
1) ENTER (network/kernel boundary) – no ABI violation
– Monitoring is simple: port monitoring or system call monitoring
– Don't know the good guy from the bad guy; only "known criminals" can be identified
– Even known bad guys are hard to detect (encrypted channels)
– Used by IDS and application firewalls
2) COMPROMISE (actions of the compromised program, e.g., write record, update registry, open port) – no ABI violation
– Monitoring can be done: system call monitoring
– Hard to distinguish the actions of a normal program vs. a compromised program; leads to false positives
– Used by "system call interception" HIPS systems
3) HIJACK (program counter hijacked via call/br/jmp into attack code) – ABI violation
– "Catch in the act of criminal behavior"
– All programs follow strict conventions: the ABI (Application Binary Interface) and the calling convention (MS/Intel)
– All attacks violate some of these conventions; currently there is no enforcement
34
Memory Firewall security policies
Goal: closest approximation to programmer intent
Infer Control Flow Graph nodes and edges from:
– ISA requirements – x86, minimal
– OS requirements – page RW- permissions
– ABI requirements – imports, exports, SEH
– Calling conventions (inter-module)
– Compiler idiosyncrasies – C, gcc C, C++, VB, Delphi
Restricted Code Origins – prevent code injection
Restricted Control Transfers – prevent code reuse attacks
– For x86 user mode: RET, IND CALL, IND JUMP
35
Other possible attack detectors
Non-application-specific:
– Crash: page fault, assertions, etc.
– Infinite loop
Application-specific checks:
– Tainting, data injection
– Heap consistency
– User complaints
Application behavior:
– Different failures/attacks
– Unusual system calls or code execution
– Violation of learned behaviors
Collection of detectors with high false positives
Non-example: social engineering
36
Learning from failures
Each attack provides information about the underlying vulnerability:
– That it exists
– Where it can be exploited
– How the exploit operates
– What repairs are successful
37
Monitoring
Detect attacks
Assumption: few false positives
Detector collects information and terminates the application
38
Monitoring
Where did the attack happen?
Detector collects information and terminates the application
(Detector maintains a shadow stack; see the sketch below)
Client sends attack info to the server, e.g., the call stack: main → process_record → read_input → scanf
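A minimal sketch of the shadow-stack idea, assuming instrumentation hooks at function entry and exit (in the real system the shadow stack is maintained by the detector inside the execution engine, not in the application's own code):

```python
# Hypothetical sketch of a shadow stack used to report where an attack
# happened.  Instrumented function entries push the function name and
# exits pop it; when a detector fires, the current shadow stack is the
# attack-location report sent to the server.

shadow_stack = []

def on_enter(fn_name):
    shadow_stack.append(fn_name)

def on_exit():
    shadow_stack.pop()

def on_attack_detected():
    # Report the innermost frame first, mirroring the slide's example stack.
    report = list(reversed(shadow_stack))
    print("attack detected in:", " <- ".join(report))
    return report

# Simulated execution: main -> process_record -> read_input -> scanf
for fn in ["main", "process_record", "read_input", "scanf"]:
    on_enter(fn)
on_attack_detected()  # attack detected in: scanf <- read_input <- process_record <- main
```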
39
Monitoring
Server generates instrumentation for the targeted code locations (e.g., main, process_record, …)
Server sends the instrumentation to all clients
Clients install the instrumentation:
– Extra checking in the attacked code
– Evaluate learned constraints
40
Monitoring
What was the effect of the attack?
Instrumentation continuously evaluates inferred behavior
Clients send the difference in behavior: violated constraints (e.g., Violated: copy_len ≤ buff_size)
Server correlates constraints to the attack
41
Correlating attacks and constraints
Evaluate only constraints at the attack sites
– Low overhead
A constraint is correlated to an attack if the constraint is violated iff the attack occurs (see the sketch below)
Create repairs for each correlated constraint
– There may be multiple correlated constraints
– Multiple candidate repairs per constraint
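A minimal sketch of the correlation test, assuming the server receives, for each monitored execution, whether the detector fired and which constraints were violated (the report format and constraint names are invented for the sketch):

```python
# Hypothetical sketch of correlating constraints with attacks.  A
# constraint is correlated if, across the reported executions, it is
# violated exactly when the attack occurs (violated iff attacked).

def correlated_constraints(reports):
    """reports: list of dicts {"attacked": bool, "violated": set of constraint names}."""
    every = set().union(*(r["violated"] for r in reports))
    return {c for c in every
            if all((c in r["violated"]) == r["attacked"] for r in reports)}

reports = [
    {"attacked": True,  "violated": {"copy_len <= buff_size", "ptr != NULL"}},
    {"attacked": False, "violated": {"ptr != NULL"}},
    {"attacked": True,  "violated": {"copy_len <= buff_size"}},
    {"attacked": False, "violated": set()},
]
print(correlated_constraints(reports))  # {'copy_len <= buff_size'}
```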
42
Outline
Overview
Learning: create model of normal behavior
Monitoring: determine properties of attacks
Repair: propose and evaluate patches
Evaluation: adversarial Red Team exercise
Conclusion
43
Repair
Distribute patches to the community (Patch 1, Patch 2, Patch 3)
Success = no detector is triggered
Patch evaluation uses additional detectors (e.g., crash, difference in attack)
Ranking: Patch 1: 0, Patch 2: 0, Patch 3: 0, …
44
Attack example
Target: a JavaScript system routine (written in C++)
– Casts its argument to a C++ object and calls a virtual method
– Does not check the type of the argument
Attack supplies an "object" whose virtual table points to attacker-supplied code
A constraint at the method call is correlated
– Constraint: the JSRI address target is one of a known set
Possible repairs (see the sketch below):
– Skip over the call
– Return early
– Call one of the known valid methods
– No repair
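A conceptual sketch of what these repairs amount to at the vulnerable call site: check the indirect-call target against the learned set before dispatching, and fall back to one of the candidate behaviors when the check fails. This is illustrative Python pseudologic with invented addresses; the real patch is injected into the x86 binary at the call.

```python
# Hypothetical sketch of enforcing the learned constraint
# "the JSRI address target is one of a known set" at the virtual call.
# "Return early" would be applied in the enclosing routine rather than
# at the call itself, so it is omitted here.

KNOWN_TARGETS = {0x00432100, 0x00432180, 0x004321F0}  # learned valid method addresses

def guarded_dispatch(vtable_target, call, repair="skip"):
    """call(addr) stands in for the original indirect call through the vtable."""
    if vtable_target in KNOWN_TARGETS:
        return call(vtable_target)            # constraint holds: normal behavior
    # Constraint violated: an attack is probably underway; apply the chosen repair.
    if repair == "skip":                      # skip over the call
        return None
    if repair == "call_known":                # call one of the known valid methods
        return call(next(iter(KNOWN_TARGETS)))
    return call(vtable_target)                # "no repair": rely on the detector

# Example: an attacker-controlled vtable entry is neutralized.
print(guarded_dispatch(0x7FFA0000, call=lambda addr: f"called {hex(addr)}"))  # None
```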
45
Triggering a repair
A repair may depend on constraint checking (see the sketch below):
– if !(copy_len ≤ buff_size) copy_len = buff_size;
– If the constraint is not violated, no need to repair
– If the constraint is violated, an attack is (probably) underway
Repair does not depend on the detector
– Should fix the problem before the detector is triggered
Repair is not identical to what a human would write
– A stopgap until an official patch is released
– Unacceptable to wait for human response
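A minimal sketch of that check-and-enforce pattern wrapped around a copy operation (illustrative Python under invented names; in the deployed system the check is injected into the binary at the attacked code location):

```python
# Hypothetical sketch of a constraint-enforcing repair: the patch checks
# the learned constraint just before the vulnerable operation and clamps
# the value when the constraint is about to be violated.

def patched_copy(src, dst, copy_len, buff_size):
    if not (copy_len <= buff_size):  # constraint violated: attack probably underway
        copy_len = buff_size         # repair: clamp the length to the buffer size
    dst[:copy_len] = src[:copy_len]  # the original (previously vulnerable) copy
    return copy_len

buf = bytearray(8)
print(patched_copy(b"A" * 100, buf, copy_len=100, buff_size=len(buf)))  # 8, no overflow
```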
46
Evaluating a patch
In-field evaluation:
– No attack detector is triggered
– No crashes or other behavior deviations
Pre-validation, before distributing the patch:
Replay the attack
+ No need to wait for a second attack
+ Exactly reproduce the problem
– Expensive to record the log
– Log terminates abruptly
– Need to prevent irrevocable effects
Run the program's test suite
– May be too sensitive
– Not available for COTS software
47
Outline
Overview
Learning: create model of normal behavior
Monitoring: determine properties of attacks
Repair: propose and evaluate patches
Evaluation: adversarial Red Team exercise
Conclusion
48
Red Team
Red Team attempts to break our system
– Hired by DARPA
– (MIT = "Blue Team")
Red Team created 18 Firefox exploits
– Each exploit is a webpage
– Firefox executes arbitrary code
– Malicious JavaScript, GC errors, stack smashing, heap buffer overflow, uninitialized memory
49
Rules of engagement
Firefox 1.0
– Blue Team may not tune the system to known vulnerabilities
– Focus on the most security-critical components
No access to a community for learning
Focus on the research aspects
– Red Team may only attack Firefox and the Blue Team infrastructure
No pre-infected community members
– Future work
Red Team has access to the implementation
– Red Team may not discard attacks thwarted by the system
50
Results
Blue Team system:
– Detected all attacks, prevented all exploits
– Repaired 15/18 attacks, after observing no more than 13 instances of the attack
– Handled simultaneous & intermixed attacks
– Suffered no false positives
– Monitoring/repair overhead was not noticeable
51
Results: Caveats
Blue Team fixed a bug with reusing filenames
Communications infrastructure (from a subcontractor) failed during the last day
Learning overhead is high
– Optimizations are possible
– Can be distributed over the community
52
Outline
Overview
Learning: create model of normal behavior
Monitoring: determine properties of attacks
Repair: propose and evaluate patches
Evaluation: adversarial Red Team exercise
Conclusion
53
Related work
Distributed detection
– Vigilante [Costa] (for worms; proof of exploit)
– Live monitoring [Kıcıman]
– Statistical bug isolation [Liblit]
Learning (lots)
– Typically for anomaly detection
– FSMs for system calls
Repair [Demsky]
– Requires a specification; not scalable
54
Credits
Saman Amarasinghe, Jonathan Bachrach, Michael Carbin, Danny Dig, Michael Ernst, Sung Kim, Samuel Larsen, Carlos Pacheco, Jeff Perkins, Martin Rinard, Frank Sherwood, Greg Sullivan, Weng-Fai Wong, Yoav Zibin
Subcontractor: Determina, Inc.
Funding: DARPA (PM: Lee Badger)
Red Team: SPARTA, Inc.
55
Contributions
Framework for collaborative learning and repair
Pluggable detection, learning, repair
Handles unknown vulnerabilities
Preserves functionality by repairing the vulnerability
Learns from failure
Focuses resources where they are most effective
Implementation for Windows x86 binaries
Evaluation via a Red Team exercise