Outline System architecture Current work Experiments Next Steps


Outline
- System architecture
- Current work
  - Progress
  - Architecture of current work
- Experiments
  - Overhead
  - Accuracy
- Next Steps

The first slide presents the system architecture, and the second indicates the area we are currently working on. The next slides detail our progress and show how the architecture of that work differs from the planned architecture. The following slides describe the experiments and their results. The final slide describes some of the things we will be working on next.

Collaborative Learning System Architecture

[Figure: Overall architecture. Client workstations (learning): the application runs under the MPEE with the data acquisition client library, and sample data goes to a local learning process (Daikon). Central services: Merge Constraints (Daikon), Patch/Repair Code Generation, and the Central Management System, which distributes patches. Client workstations (protected): the application runs under the MPEE with Memory Firewall, Live Shield, and Evaluation (observe execution), which returns patch/repair results.]

Overall architecture of the system: items in boxes that touch run in the same process. Boxes connected by arrows communicate via messages. Dashed-line boxes indicate processes running on the same workstation. For example (blue background), learning runs on the same client workstation as the instrumented application (data acquisition, client library, MPEE, application).

Data acquisition uses the client library that Determina provides as an extension to the MPEE (DynamoRIO) to instrument the application dynamically. Sample data (the values of variables at program points) is sent to a local copy of Daikon, which calculates constraints. The resulting constraints are sent to a central location, where Daikon merges them into a complete set of constraints (constraints that hold for all client executions).

The merged constraints are used to create patches that check the constraints and repair violations. The patches are distributed via the CMS to protected workstations. Results from the patches, and any errors encountered at the client, are fed back into the patch/repair code generation process, and different patches and repairs are tested on different workstations. Working repairs are widely distributed and ineffective repairs are discarded.

The Central Management System (CMS) handles all communication between clients and the central services. Some additional Determina security checks and some details of patch/repair creation are not shown.
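The learn-locally, merge-centrally flow above can be sketched in a few lines of Python. This is a minimal sketch assuming that every constraint is a simple numeric range (lo <= value <= hi) per program point and variable; Daikon's real invariant set and merge logic are much richer, and the names `learn_local`, `merge_central`, and `Range` are hypothetical. The merged range is the smallest range that holds for every client's observations.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

# Hypothetical sample format: (program point, variable name, observed value).
# Program points use Daikon-style ":::ENTER" / ":::EXIT" suffixes for illustration.
Sample = Tuple[str, str, int]
Key = Tuple[str, str]


@dataclass(frozen=True)
class Range:
    """Simplified stand-in for a range invariant: lo <= value <= hi."""
    lo: int
    hi: int

    def widen(self, other: "Range") -> "Range":
        # Smallest range containing both self and other.
        return Range(min(self.lo, other.lo), max(self.hi, other.hi))


def learn_local(samples: Iterable[Sample]) -> Dict[Key, Range]:
    """Client side: reduce raw samples to one range per (point, variable)."""
    constraints: Dict[Key, Range] = {}
    for point, var, value in samples:
        key = (point, var)
        observed = Range(value, value)
        old = constraints.get(key)
        constraints[key] = observed if old is None else old.widen(observed)
    return constraints


def merge_central(per_client: List[Dict[Key, Range]]) -> Dict[Key, Range]:
    """Central side: combine client constraints so each holds for every client's runs."""
    merged: Dict[Key, Range] = {}
    for constraints in per_client:
        for key, rng in constraints.items():
            old = merged.get(key)
            merged[key] = rng if old is None else old.widen(rng)
    return merged


if __name__ == "__main__":
    a = learn_local([("f:::ENTER", "n", 3), ("f:::ENTER", "n", 7)])
    b = learn_local([("f:::ENTER", "n", 5), ("f:::ENTER", "n", 12)])
    print(merge_central([a, b]))  # {('f:::ENTER', 'n'): Range(lo=3, hi=12)}
```

Only the compact per-client summaries travel to the central merge, which is the same reason the architecture prefers sending constraints rather than raw sample data.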

Current Work

[Figure: the system architecture diagram, with everything outside the current work grayed out.]

Current work is focused on data acquisition (binary instrumentation) and machine learning (the areas on the left that are not grayed out).

Progress
- Preliminary instrumentation for data acquisition
  - Determina client library enhanced to support data acquisition
  - Primitive and string parameters are logged
  - Pointers are followed one level (e.g., request.field)
  - Debug information is used
  - Optimizations implemented to reduce overhead
- Daikon enhanced
  - Security-specific invariants
  - Merge sample data from multiple executions
- Community learning integration
  - Instrumentation and learning have been tested together
  - Shared file system used for communication

The application is instrumented using the MPEE (DynamoRIO) client library. Determina made a number of enhancements to the client library to enable this work. We use the debug information generated when the application is compiled to determine the types and locations of parameters. We have implemented a number of optimizations to improve performance; more can (and will) be done. Daikon was enhanced with the maximum-string-length and printable-string invariants (as discussed in October). It was also modified to merge sample data from multiple executions. As the experiments show, we have integrated these changes and have promising initial results.
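The data-acquisition and invariant changes listed on this slide can be illustrated with a small Python sketch. It assumes that records are modeled as dictionaries (standing in for C structs reached through a pointer), that "printable" means every character is in Python's `string.printable`, and that merging sample data from several executions simply means learning over the concatenated samples. `flatten_one_level`, `learn_string_invariants`, and `violations` are hypothetical names, not Determina client library or Daikon APIs.

```python
import string
from typing import Any, Dict, Iterable, List

# Assumption: "printable" means every character is in Python's string.printable.
PRINTABLE = set(string.printable)


def flatten_one_level(name: str, value: Any) -> Dict[str, Any]:
    """Log a parameter; a record (modeled as a dict) is followed one level only."""
    if isinstance(value, dict):
        # Fields that are themselves records are not followed further.
        return {f"{name}.{field}": v for field, v in value.items()
                if not isinstance(v, dict)}
    return {name: value}


def learn_string_invariants(values: Iterable[str]) -> Dict[str, Any]:
    """Learn a maximum-length and a printable-string invariant from observed strings."""
    vals = list(values)
    return {
        "max_length": max((len(v) for v in vals), default=0),
        "printable": all(set(v) <= PRINTABLE for v in vals),
    }


def violations(inv: Dict[str, Any], value: str) -> List[str]:
    """Return the names of the learned invariants that `value` violates."""
    broken = []
    if len(value) > inv["max_length"]:
        broken.append("max_length")
    if inv["printable"] and not set(value) <= PRINTABLE:
        broken.append("printable")
    return broken


if __name__ == "__main__":
    run1 = ["/index.html"]
    run2 = ["/a.gif"]
    inv = learn_string_invariants(run1 + run2)  # merged over two executions
    print(inv)                                  # {'max_length': 11, 'printable': True}
    print(violations(inv, "/" + "A" * 20))      # ['max_length']
```

An over-long or non-printable string is exactly the kind of input these two invariants are meant to flag.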

Integration Architecture

[Figure: Preliminary integration architecture. Learning clients run Apache under the MPEE with the data acquisition client library and write sample data to a shared file system; Create Constraints (Daikon) runs centrally. The remaining components of the full architecture are grayed out.]

This figure shows the preliminary architecture in use today (again, in the areas that are not grayed out). Preliminary data acquisition is working as planned. Constraints across multiple executions are being created, but constraint creation is centralized rather than performed on each client. A shared file system is being used in place of the Central Management System (CMS). Work to resolve each of these differences from the final architecture is ongoing. The CMS is being augmented to provide communications, and the data acquisition code is written so that it can easily be switched to the CMS when that becomes available. Work is also under way on the CMS to support running Daikon on client workstations, which will distribute the workload and reduce the amount of data that must be transferred (calculated constraints are much smaller than the sample data).
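Keeping data acquisition independent of the transport, as described above, amounts to writing samples through a small interface. The sketch below assumes a hypothetical `SampleTransport` interface with a shared-directory implementation that writes one JSON-lines file per execution; a CMS-backed class would implement the same two methods, but the CMS API is not described here, so it is not shown.

```python
import json
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Dict, List


class SampleTransport(ABC):
    """Hypothetical interface that data acquisition writes through."""

    @abstractmethod
    def send(self, client_id: str, run_id: int, samples: List[Dict]) -> None: ...

    @abstractmethod
    def collect(self) -> List[Dict]: ...


class SharedFileSystemTransport(SampleTransport):
    """Preliminary transport: one JSON-lines trace file per execution in a shared directory."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def send(self, client_id: str, run_id: int, samples: List[Dict]) -> None:
        final = self.root / f"{client_id}-{run_id}.jsonl"
        tmp = final.with_suffix(".tmp")
        with tmp.open("w", encoding="utf-8") as f:
            for sample in samples:
                f.write(json.dumps(sample) + "\n")
        # Rename only after the file is complete; collect() only reads *.jsonl,
        # so it does not pick up a half-written trace.
        tmp.replace(final)

    def collect(self) -> List[Dict]:
        samples: List[Dict] = []
        for path in sorted(self.root.glob("*.jsonl")):
            with path.open(encoding="utf-8") as f:
                samples.extend(json.loads(line) for line in f if line.strip())
        return samples


# Switching to the CMS would mean adding another SampleTransport subclass;
# the data-acquisition code that calls send() would not change.

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as shared_dir:
        transport: SampleTransport = SharedFileSystemTransport(Path(shared_dir))
        transport.send("ws1", 0, [{"point": "f:::ENTER", "var": "n", "value": 3}])
        transport.send("ws2", 0, [{"point": "f:::ENTER", "var": "n", "value": 9}])
        print(transport.collect())
```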

Integration Experiments
- Evaluate community effectiveness by comparing:
  - Learning from one copy of an application
  - Community-based learning (multiple executions)
- Two experiments
  - Overhead comparison
  - Accuracy comparison
- Infrastructure
  - Apache web server (HTTPD) on Windows
  - Variables are captured at function entry/exit
  - A community of ten or more executions of Apache is used

Each experiment compares a single execution against multiple community executions. We expect both lower overhead and greater accuracy from using the community. These experiments are small (only ten members of the community and a limited number of executions). Both overhead and accuracy should improve as we move to larger communities.

Overhead Experiment
- Baseline
  - Instrument 100% of Apache
  - Time a sequence of HTTP GET operations
  - Daikon processes the single output file
- Community Learning
  - Instrument a different 10% of Apache in each of 10 executions
  - Instrument a different 1% of Apache in each of 100 executions
  - Each execution creates a distinct trace of part of the program
  - The combined executions instrument all of Apache
  - Daikon processes all trace files

This experiment compares instrumentation overhead between a single execution of Apache with 100% of the functions instrumented and multiple executions with 10% or 1% of the functions instrumented.
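One way to give each execution "a different 10%" (or 1%) of Apache is to partition the list of instrumentable functions into disjoint slices, one per execution. This is a sketch under assumptions: the function list is already known, slices are assigned round-robin after a seeded shuffle, and `partition_functions` is a hypothetical name. Because the slices are disjoint and together cover the list, the combined executions instrument the whole program exactly once.

```python
import random
from typing import List, Sequence


def partition_functions(functions: Sequence[str], executions: int,
                        seed: int = 0) -> List[List[str]]:
    """Split `functions` into `executions` disjoint slices of near-equal size.

    Each execution instruments only its own slice, so every function is
    instrumented in exactly one execution and together they cover the program.
    """
    shuffled = list(functions)
    # Seeded shuffle (an assumption) so one execution does not get a single contiguous module.
    random.Random(seed).shuffle(shuffled)
    # Round-robin assignment: slice sizes differ by at most one.
    return [shuffled[i::executions] for i in range(executions)]


if __name__ == "__main__":
    funcs = [f"fn{i}" for i in range(1000)]      # placeholder function names
    ten = partition_functions(funcs, 10)         # "a different 10%" per execution
    hundred = partition_functions(funcs, 100)    # "a different 1%" per execution
    print(len(ten), {len(s) for s in ten})       # 10 {100}
    print(len(hundred), {len(s) for s in hundred})  # 100 {10}
```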

Overhead Results
- Community learning constraints match baseline constraints
- Instrumentation overhead is reduced significantly

The chart shows the total overhead (in milliseconds) added to Apache to service the requests. In the multiple-execution cases, the overhead is the average time per execution. Instrumenting only 10% of the program reduces the time significantly (by almost 90%). Instrumenting only 1% reduces the overhead further, but also shows that there is a fixed cost that cannot be reduced by decreasing the percentage of the program that is instrumented. We expect to be able to optimize the instrumentation to reduce all of these times significantly.
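The fixed-cost floor can be made explicit with a simple linear model. This is an assumed model, not one fitted to the measured data: let p be the fraction of functions instrumented, F a fixed per-execution overhead that does not depend on p, and V the variable overhead of instrumenting the whole program. T(p) is the per-execution overhead and R(p) the reduction relative to the 100% baseline.

```latex
\begin{aligned}
T(p)   &= F + pV, \qquad 0 < p \le 1 \\
R(p)   &= 1 - \frac{T(p)}{T(1)} = \frac{(1-p)\,V}{F + V} \\
R(0.1) &= \frac{0.9\,V}{F + V} \le 0.9, \qquad \lim_{p \to 0} T(p) = F
\end{aligned}
```

Under this model, the reduction at 10% can approach 90% only when F is small compared with V, and as p shrinks further the overhead levels off at F, which matches the shape described for the 1% case.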

Accuracy Experiment
- Baseline
  - Instrument 100% of Apache
  - Capture data during one HTTP operation
  - Build constraints based on the captured data
  - Test constraints against data captured during all operations
- Community Learning
  - Capture data during ten HTTP operations
  - Build constraints based on two operations, three operations, … ten operations
  - Test each set of constraints against the data captured during all operations

This experiment compares the number of false positives found when learning takes place over one execution with the number found when learning takes place over multiple executions. The number of false positives is calculated by checking all of the captured samples (for all ten operations) against the constraints created from the subset of samples captured in one operation, two operations, three operations, and so on. Any sample that violates a constraint is a false positive.
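The false-positive count described above can be sketched as follows. The sketch assumes range constraints per (program point, variable) and treats a sample at a key that was never seen during learning as having no constraint, so it is not counted; `build_ranges`, `false_positives`, and `learning_curve` are hypothetical names, and Daikon's actual invariants and checking differ.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str]              # (program point, variable)
Operation = List[Tuple[Key, int]]  # samples captured during one HTTP operation


def build_ranges(ops: List[Operation]) -> Dict[Key, Tuple[int, int]]:
    """Learn a (lo, hi) range per key from the samples in `ops`."""
    ranges: Dict[Key, Tuple[int, int]] = {}
    for op in ops:
        for key, value in op:
            lo, hi = ranges.get(key, (value, value))
            ranges[key] = (min(lo, value), max(hi, value))
    return ranges


def false_positives(ranges: Dict[Key, Tuple[int, int]], ops: List[Operation]) -> int:
    """Count samples in `ops` that violate a learned range.

    Keys never seen during learning have no constraint, so they are not counted.
    """
    count = 0
    for op in ops:
        for key, value in op:
            if key in ranges:
                lo, hi = ranges[key]
                if not lo <= value <= hi:
                    count += 1
    return count


def learning_curve(ops: List[Operation]) -> List[int]:
    """False positives when learning from the first 1, 2, ..., len(ops) operations,
    always testing against the samples from all operations."""
    return [false_positives(build_ranges(ops[:k]), ops) for k in range(1, len(ops) + 1)]


if __name__ == "__main__":
    k = ("f:::ENTER", "n")
    ops = [[(k, 5)], [(k, 8)], [(k, 2), (k, 5)]]
    print(learning_curve(ops))  # [2, 1, 0]
```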

Accuracy Experiment Results
- False positives are reduced as more community learning is used.

It is important to note that this is a very small experiment. Learning over more executions and for longer periods should (as these results suggest) drive the number of false positives very low.

Possible Next Steps
- Build constraints on the client and merge them centrally
- Use the CMS to provide communications
  - On the client, between data acquisition and Daikon
  - Between Daikon on the client and central processing
- Investigate approaches to data acquisition without debug information
- Test constraints against known attacks
- Implement simple repair algorithms

Not all of the above will be done immediately, but we expect significant progress in several of these areas.
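As one illustration of what a simple repair algorithm might look like, the sketch below checks a value against a learned constraint and, on violation, repairs it by truncating an over-long string or clamping an out-of-range integer. These repair strategies and the names `StringConstraint`, `IntRange`, `check_and_repair_string`, and `check_and_repair_int` are assumptions for illustration, not the project's chosen repairs. The returned flag is the kind of result a patch could report back for patch/repair evaluation.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class StringConstraint:
    """Hypothetical learned maximum-string-length constraint."""
    max_length: int


@dataclass(frozen=True)
class IntRange:
    """Hypothetical learned range constraint: lo <= value <= hi."""
    lo: int
    hi: int


def check_and_repair_string(value: str, c: StringConstraint) -> Tuple[str, bool]:
    """Return (possibly repaired value, violated?). Assumed repair: truncate."""
    if len(value) <= c.max_length:
        return value, False
    return value[: c.max_length], True


def check_and_repair_int(value: int, c: IntRange) -> Tuple[int, bool]:
    """Return (possibly repaired value, violated?). Assumed repair: clamp to the range."""
    repaired = min(max(value, c.lo), c.hi)
    return repaired, repaired != value


if __name__ == "__main__":
    print(check_and_repair_string("/" + "A" * 20, StringConstraint(11)))  # ('/AAAAAAAAAA', True)
    print(check_and_repair_int(42, IntRange(0, 10)))                      # (10, True)
```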