Presentation on theme: "Outline System architecture Experiments" — Presentation transcript:

1 Outline
System architecture
How to use the community to reduce learning cost
Algorithm for Merging Constraints
Experiments
Overhead
Accuracy

The first slide presents the system architecture, and the second indicates the area that we are currently working on. The next slides detail our progress and show how the architecture of that work differs from the planned architecture. The slides after that describe the experiments and the results. The final slide describes some of the things we'll be working on next.

2 System Architecture

Diagram components: Central Management System; Merge Constraints (Daikon); Patch/Repair Code Generation; Client Workstations (Learning), each running MPEE + Application + Learning (Daikon) and producing Sample Data; Client Workstations (Protected), each running MPEE + Application + Live Shield and receiving Patches; Constraints flow from the learning clients to the merge step, and Patch results flow from the protected clients back to patch/repair code generation.

1) Constraints are much smaller than sample data: roughly 4MB of constraints (e.g. httpd.exe_thread_id.inv.gz) versus roughly 20MB of sample data (e.g. httpd.exe_thread_id.dtrace.gz).
2) MPEE = Managed Program Execution Environment (DynamoRIO). The MPEE includes the Daikon front end that outputs the sample data to a file; that front end is what Jeff & Sung will talk about.

Overall architecture of the system: items in boxes that touch are running in the same process. Boxes connected by arrows communicate via files (transferred by the CMS mechanism). Dashed-line boxes indicate processes running on the same workstation; for example (blue background), Learning runs on the same client workstation as the instrumented application (data acquisition, client, MPEE, application).

Data acquisition is achieved by using the client library provided by Determina as an extension to the MPEE (DynamoRIO) to instrument the application dynamically. Sample data (the values of variables at program points) is sent to a local version of Daikon that calculates constraints. The resulting constraints are sent back to a central location where they are merged by Daikon into a complete set of constraints that are true for all client executions. The merged constraints are used to create patches that check the constraints and repair violations. The patches are distributed via the CMS to protected workstations. Results from the patches, and any errors encountered at the client, are fed back into the patch/repair code generation process, and different patches and repairs are tested on different workstations. Working repairs are widely distributed; ineffective repairs are discarded. The central management system (CMS) handles all communication between clients and the central services. Some additional Determina security checks and some details of patch/repair creation are not shown.
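
The per-client learning path described above (the instrumented application writes sample data, local Daikon turns it into constraints, and only the constraints go to the CMS) can be summarized in a short sketch. This is illustrative only: the cms_upload helper is hypothetical, and the Daikon invocation assumes the standard command-line front end rather than the project's actual integration.

```python
# Illustrative sketch of the learning-client path on this slide.
# Assumptions (not from the slide): daikon.jar is available locally, the
# standard "java daikon.Daikon" command-line front end is used, and
# cms_upload() stands in for whatever mechanism the CMS uses to move files.
import subprocess
from pathlib import Path

def learn_and_upload(dtrace: Path, daikon_jar: Path) -> Path:
    """Run Daikon locally on the sample data, then ship only the constraints."""
    inv_file = dtrace.with_name(dtrace.name.replace(".dtrace.gz", ".inv.gz"))
    # Local learning: read the (large) .dtrace.gz sample data, write the
    # (much smaller) .inv.gz constraint file.
    subprocess.run(
        ["java", "-cp", str(daikon_jar), "daikon.Daikon", "-o", str(inv_file), str(dtrace)],
        check=True,
    )
    cms_upload(inv_file)  # hypothetical: only constraints go to the central site
    return inv_file

def cms_upload(path: Path) -> None:
    # Placeholder for the Determina CMS file-transfer mechanism.
    print(f"would upload {path} to the central management system")

# Example, using the file names from the slide:
# learn_and_upload(Path("httpd.exe_thread_id.dtrace.gz"), Path("daikon.jar"))
```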

3 Merge Constraints Example

  Execution 1:  W=2, X=42, X in {42}, X%W=0, X-21*W=0, W<X, Y<8, Y<X, Y=Z|Y, Y%W=0
+ Execution 2:  W=2, X in {3,57}, W<X, Y<9, Y=Z|Y, Y%W=0, Y+3*W=0
= Merged:       W=2, X in {3,42,57}, W<X, Y<8, Y=Z|Y, Y%W=0

4 Merge Constraints
Stateless constraints
  Example: x=y, x=y|x, x%y=0
  Either true or false
  Merging algorithm: keep a constraint only if it appears in every execution (sketched below)
Sample-dependent constraints
  Example: x<42, x in {1,4,8}, 3x-7y+2z=9
  Change as new samples arrive
  Merging example: x<42 and x<56 are merged to x<42
  Update the number of samples, missing variables, etc.
Corner cases
  Suppressed invariants, constant optimization
  Small number of samples for an invariant (UpperBound, LowerBound, OneOf, num_samples)
  MISSING_FLOW / canBeMissing: each variable keeps track of the program points in which it appears and the other variables it appears with
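
The slide does not give the merging code itself; the following is a minimal sketch of two of the rules above (stateless constraints kept only when every execution reports them, and OneOf value sets combined by union), using values from the Merge Constraints Example. The data layout and function names are illustrative assumptions, not the actual Daikon merging implementation.

```python
# Minimal sketch of two merging rules: stateless constraints survive only if
# every execution reported them; OneOf ("x in {...}") sets are unioned.
# Data layout and names are illustrative, not Daikon's real structures.
from functools import reduce

def merge_stateless(per_execution: list[set[str]]) -> set[str]:
    # A stateless constraint is either true or false for an execution,
    # so the merged set is the intersection across executions.
    return reduce(set.intersection, per_execution)

def merge_oneof(per_execution: list[dict[str, set[int]]]) -> dict[str, set[int]]:
    # A OneOf constraint is sample dependent: the merged value set is the
    # union of the values observed by each execution.
    merged: dict[str, set[int]] = {}
    for execution in per_execution:
        for var, values in execution.items():
            merged.setdefault(var, set()).update(values)
    return merged

# Reproducing part of the Merge Constraints Example:
stateless = [{"W=2", "W<X", "X%W=0"}, {"W=2", "W<X"}]
oneof = [{"X": {42}}, {"X": {3, 57}}]
print(merge_stateless(stateless))  # {'W=2', 'W<X'}  (X%W=0 is dropped)
print(merge_oneof(oneof))          # {'X': {3, 42, 57}}
```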

5 Integration Experiments
Evaluate community effectiveness by comparing:
  Learning from one copy of an application
  Community-based learning (multiple executions)
Two experiments
  Overhead comparison
  Accuracy comparison
Infrastructure
  Apache web server (HTTPD) on Windows
  A community of ten or more executions of Apache

Each of the experiments will compare a single execution against multiple community executions. We expect both less overhead and greater accuracy by utilizing the community. These experiments are small (only ten members of the community and limited executions). Both overhead and accuracy should be improved as we move to larger numbers in the community.

6 Instrumentation Overhead Experiment
Baseline
  Instrument 100% of Apache
  Time a sequence of HTTP GET operations (Daikon processes the single output file)
Community learning
  Instrument a different 10% of Apache in each of 10 executions
  Instrument a different 1% of Apache in each of 100 executions
  Each execution will create a distinct trace of part of the program
  The combined executions will instrument all of Apache (Daikon processes all trace files)
Expected results
  Community learning constraints match baseline constraints
  Instrumentation overhead is reduced significantly

This experiment compares instrumentation overhead between a single execution of Apache with 100% of the functions instrumented and multiple executions with 10% and 1% of the functions instrumented. A sketch of how the functions might be partitioned across the community follows.
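
The slide does not say how each execution selects its slice of Apache; one simple possibility, assuming each learning client knows its index, the community size, and the full list of instrumentable functions, is a deterministic round-robin split. The helper below is a hypothetical illustration, not the actual MPEE/Determina mechanism.

```python
# Hypothetical partitioning helper for the overhead experiment: give each
# learning client a distinct slice of the function list so that, across the
# community, every function is instrumented exactly once.
def functions_for_client(all_functions: list[str], client_index: int, community_size: int) -> list[str]:
    # Client k instruments every community_size-th function, starting at offset k.
    return all_functions[client_index::community_size]

# Example: a community of 10 clients, each instrumenting a different ~10% of Apache.
apache_functions = [f"func_{i}" for i in range(1000)]  # stand-in for Apache's function list
slices = [functions_for_client(apache_functions, k, 10) for k in range(10)]
assert all(len(s) == 100 for s in slices)              # each client covers 10%
assert set().union(*slices) == set(apache_functions)   # together they cover everything
```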

7 Instrumentation Overhead Results
Community learning constraints match baseline constraints
Instrumentation overhead is reduced significantly
Lots of optimization:
  Mutex (thread issues)
  Buffered I/O
  Checking whether memory is valid (only at page granularity)

The chart shows the total overhead (in milliseconds) added to Apache to service the requests. In the multiple-execution cases, the overhead is the average time per execution. As can be seen, instrumenting only 10% of the program significantly reduces the time (almost 90%). Instrumenting only 1% further reduces the overhead, but also shows that there is a fixed cost that cannot be reduced by decreasing the percentage of the program that is instrumented. Note that we expect to be able to optimize the instrumentation to significantly reduce all of these times.

8 Accuracy Experiment
Community learning
  Instrument 100% of Apache during 1000 HTTP operations
  Divide the samples into two sets: learning and testing
  Build constraints based on:
    1% of the learning set
    2% of the learning set
    100% of the learning set
  A constraint is a false positive if it is violated by a sample in the testing set (see the sketch below)
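
A minimal sketch of the false-positive measurement just described, assuming each learned constraint can be evaluated as a predicate over a sample (a map of variable values at a program point). The names and data shapes are illustrative, not the project's actual representation.

```python
# Count how many learned constraints are violated by the held-out testing set.
from typing import Callable, Dict, List

Sample = Dict[str, int]
Constraint = Callable[[Sample], bool]   # True if the sample satisfies the constraint

def false_positive_rate(constraints: List[Constraint], testing_set: List[Sample]) -> float:
    """Fraction of learned constraints violated by at least one testing sample."""
    violated = sum(
        1 for holds in constraints
        if any(not holds(sample) for sample in testing_set)
    )
    return violated / len(constraints) if constraints else 0.0

# Example: a constraint learned from a small slice of the learning set ("x < 42")
# counts as a false positive if any testing sample breaks it.
constraints = [lambda s: s["x"] < 42, lambda s: s["x"] % 2 == 0]
testing_set = [{"x": 40}, {"x": 44}]
print(false_positive_rate(constraints, testing_set))   # 0.5: "x < 42" is violated by x=44
```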

9 Accuracy Experiment Results
False positives are reduced as more community learning is used. It's important to note that this is a very small experiment. Learning over more executions and for longer lengths of time should (as this seems to indicate) drive the number of false positives very low.
1.77% - mtime= force_weak=

