University of Maryland New APIs from P/D Separation James Waskiewicz.

Slides:



Advertisements
Similar presentations
Programs in Memory Bryce Boe 2012/08/29 CS32, Summer 2012 B.
Advertisements

Lists and the Collection Interface Chapter 4. Chapter Objectives To become familiar with the List interface To understand how to write an array-based.
University of Maryland Smarter Code Generation for Dyninst Nick Rutar.
CSCC69: Operating Systems
Caching and Virtual Memory. Main Points Cache concept – Hardware vs. software caches When caches work and when they don’t – Spatial/temporal locality.
Paradyn Project Paradyn / Dyninst Week Madison, Wisconsin May 2-4, 2011 ProcControlAPI and StackwalkerAPI Integration into Dyninst Todd Frederick and Dan.
Paradyn Project Paradyn / Dyninst Week College Park, Maryland March 26-28, 2012 Paradyn Project Upcoming Features in Dyninst and its Components Bill Williams.
Paradyn Project Paradyn / Dyninst Week Madison, Wisconsin May 2-3, 2011 Introduction to the PatchAPI Wenbin Fang, Drew Bernat.
OS Memory Addressing.
Chapter 6 Limited Direct Execution
C++ Sets and Multisets Set containers automatically sort their elements automatically. Multisets allow duplication of elements whereas sets do not. Usually,
Polymorphism From now on we will use g++!. Example (revisited) Goal: Graphics package Handle drawing of different shapes Maintain list of shapes.
OS2-1 Chapter 2 Computer System Structures. OS2-2 Outlines Computer System Operation I/O Structure Storage Structure Storage Hierarchy Hardware Protection.
Anomaly Detection Using Call Stack Information Security Reading Group July 2, 2004 Henry Feng, Oleg Kolesnikov, Prahlad Fogla, Wenke Lee, Weibo Gong Presenter:
Advanced Object-Oriented Programming Features
Home: Phones OFF Please Unix Kernel Parminder Singh Kang Home:
C++ fundamentals.
1.3 Executing Programs. How is Computer Code Transformed into an Executable? Interpreters Compilers Hybrid systems.
Basics of Operating Systems March 4, 2001 Adapted from Operating Systems Lecture Notes, Copyright 1997 Martin C. Rinard.
Caching and Virtual Memory. Main Points Cache concept – Hardware vs. software caches When caches work and when they don’t – Spatial/temporal locality.
Introduction to the Enterprise Library. Sounds familiar? Writing a component to encapsulate data access Building a component that allows you to log errors.
Visual C New Optimizations Ayman Shoukry Program Manager Visual C++ Microsoft Corporation.
University of Maryland Compiler-Assisted Binary Parsing Tugrul Ince PD Week – 27 March 2012.
Paradyn Project Dyninst/MRNet Users’ Meeting Madison, Wisconsin August 7, 2014 The Evolution of Dyninst in Support of Cyber Security Emily Gember-Jacobson.
University of Maryland parseThat: A Robust Arbitrary-Binary Tester for Dyninst Ray Chen.
Operating System Review September 10, 2012Introduction to Computer Security ©2004 Matt Bishop Slide #1-1.
Chapter 1: A First Program Using C#. Programming Computer program – A set of instructions that tells a computer what to do – Also called software Software.
Threads, Thread management & Resource Management.
University of Maryland The New Dyninst Event Model James Waskiewicz.
Announcements Assignment 3 due. Invite friends, co-workers to your presentations. Course evaluations on Friday.
Institute for Software Integrated Systems Vanderbilt University Copyright © Vanderbilt University/ISIS 2008 Model Interpreters Janos Mathe based on Peter.
March 17, 2005 Roadmap of Upcoming Research, Features and Releases Bart Miller & Jeff Hollingsworth.
Andrew Bernat, Bill Williams Paradyn / Dyninst Week Madison, Wisconsin April 29-May 1, 2013 New Features in Dyninst
Javadoc: Advanced Features & Limitations Presented By: Wes Toland.
Chapter 8 Object Design Reuse and Patterns. Object Design Object design is the process of adding details to the requirements analysis and making implementation.
Homework Assignment #1 J. H. Wang Oct. 13, Homework #1 Chap.1: 1.24 Chap.2: 2.13 Chap.3: 3.5, 3.13* (or 3.14*) Chap.4: 4.6, 4.12* –(*: optional.
Chapter 5: Ball Worlds Features 2 classes, constant data fields, constructors, extension through inheritance, graphics.
University of Maryland Dynamic Floating-Point Error Detection Mike Lam, Jeff Hollingsworth and Pete Stewart.
November 2005 New Features in Paradyn and Dyninst Matthew LeGendre Ray Chen
University of Maryland Paradyn as a Strict Dyninst Client James Waskiewicz.
CE Operating Systems Lecture 2 Low level hardware support for operating systems.
This recitation 1 An interesting point about A3: Using previous methods to avoid work in programming and debugging. How much time did you spend writing.
Processes and Virtual Memory
Paradyn Project Paradyn / Dyninst Week Madison, Wisconsin April 12-14, 2010 Binary Rewriting with Dyninst Madhavi Krishnan and Dan McNulty.
C++ Inheritance Data Structures & OO Development I 1 Computer Science Dept Va Tech June 2007 © McQuain Generalization versus Abstraction Abstraction:simplify.
Concurrent Hashing and Natural Parallelism Chapter 13 in The Art of Multiprocessor Programming Instructor: Erez Petrank Presented by Tomer Hermelin.
STL CSSE 250 Susan Reeder. What is the STL? Standard Template Library Standard C++ Library is an extensible framework which contains components for Language.
Tips & Tricks: Writing Performant Managed Code Rico Mariani FUNL04 Performance Architect Microsoft Corporation.
April 2007The Deconstruction of Dyninst: Part 1- the SymtabAPI The Deconstruction of Dyninst Part 1: The SymtabAPI Giridhar Ravipati University of Wisconsin,
Paradyn Project Paradyn / Dyninst Week Madison, Wisconsin May 2-4, 2011 Paradyn Project Deconstruction of Dyninst: Best Practices and Lessons Learned Bill.
OS Memory Addressing. Architecture CPU – Processing units – Caches – Interrupt controllers – MMU Memory Interconnect North bridge South bridge PCI, etc.
OCR A Level F453: The function and purpose of translators Translators a. describe the need for, and use of, translators to convert source code.
C++ Functions A bit of review (things we’ve covered so far)
Beyond Application Profiling to System Aware Analysis Elena Laskavaia, QNX Bill Graham, QNX.
7-Nov Fall 2001: copyright ©T. Pearce, D. Hutchinson, L. Marshall Oct lecture23-24-hll-interrupts 1 High Level Language vs. Assembly.
Session 2 Basics of PHP.
Introduction to Operating Systems
Instruction sets : Addressing modes and Formats
Chapter 2: Computer-System Structures(Hardware)
New Features in Dyninst 5.1
Threads vs. Events SEDA – An Event Model 5204 – Operating Systems.
Capriccio – A Thread Model
Introduction to Operating Systems
New Features in Dyninst 6.1 and 6.2
Optimizing Your Dyninst Program
Lecture Topics: 11/1 General Operating System Concepts Processes
CS510 Operating System Foundations
EE 312 Final Exam Review.
Procedure Linkages Standard procedure linkage Procedure has
Dynamic Binary Translators and Instrumenters
Presentation transcript:

University of Maryland New APIs from P/D Separation James Waskiewicz

University of Maryland “Separation” completed Paradynd now uses the Dyninst API –Formerly made calls to the low-level code hidden by Dyninst A development/testing nightmare –Now just links to libdyninstAPI like any other mutator –End of a long, several-year process Brute-force final push: –Modify paradynd to use existing APIs as much as possible –Add new APIs to Dyninst as necessary Functionality needed by Paradyn that was not previously available

University of Maryland “Active” Snippet Insertion All instrumentation is now sanity-checked vs. current process state –Requires doing full stack walk(s) for each insertion Stack walks are cached to improve performance in case of multiple insertions –Makes sure that snippets are not added to points that are currently executing inside instrumentation Would cause re-writing of currently executing code (segfault) Insertion may change process state –Changes stackwalks for specific circumstances Eg. Active call site (on the stack), –Modify stack frame to jump into instrumentation upon return.

University of Maryland “Catchup” Snippet Execution Analysis Avoid out-of-sequence errors with complex instrumentation –State-dependant snippets –Implied execution orderings –Example problem: –Snip1: At entry of foo(), turn on timer t –Snip2: At exit of foo(), turn off timer t –Program is stopped at point P, just after the entry of foo() –User inserts Snip1 and Snip2 in an atomic operation continues execution –Snip2 is executed, without Snip1 having been run

University of Maryland Catchup: flag example Flag example: –Consider the call path: main() -> foo() -> bar() -> baz() –Consider the snippets: At foo() entry, set flag At baz() exit, if (flag) then … –Upshot: conditional instrumentation can get lost If this were a dereference: [segfault] So we need a way for stateful instrumentation to be “caught up” with

University of Maryland “Catchup” Analysis, con’t… Solution: –We cannot predict the intent of user snippets –But we CAN return list of snippets that *would* have been run if inserted earlier –Snippets can be run via oneTimeCode() Requires: –Full stack walk for each thread –Per-frame address comparisons Q: Necessity or Value-add? –Most of the analysis for catchup is available by other means in Dyninst Stack walks, address comparisons

University of Maryland Added APIs Bpatch_process –Bool wasRunningWhenAttached() –Bool isMultithreadCapable() –Bool finalizeInsertionSetWithCatchup(…) –Bool oneTimeCodeAsync(…) (overload) Bpatch_snippetHandle –getProcess() Bpatch_snippet –getCostAtPoint(Bpatch_point *p)

University of Maryland Dyninst Object Serialization/Deserialization Binary for performance, XML for interoperability

University of Maryland Why Binary Serialization (Caching)? Large Binaries –We’ve had reports of existing Dyninst analyses taking a prohibitively long time for large binaries (100s of MB) Eg. Full CFG analysis of large statically linked scientific simulators More complex analyses are in the works –Dyninst continues to offer newer and more expensive- to-compute features Control Flow Graphs Data Slicing Stripped binary analysis –Complex tools that use these analyses may find them cost-prohibitive If they have to be re-performed every time the tool is run Why not just save them?

University of Maryland Caching policy Binary serialization should happen transparently –User-controlled on/off switch Bpatch_setCaching(bool) –Granularity: One binary cache file per library / executable –Checksum-based cache invalidation Rebuild cache for a given binary when the binary changes –Example: libc is large and expensive to fully analyze, but it seldom changes Needs to support incremental analysis –User calls to API functions trigger on-demand analyses –Thus caching must also support incremental additions Eg. Successive, more refined tool runs

University of Maryland Why XML Serialization? Create standardized representations for –Basic symbol table information –Abstract program objects Functions, loops, blocks…. –More complex binary analyses CFG, Data Slicing, etc… Exports Dyninst’s expertise for easy use by –Other tools –Interfacing the textual world Parse-able snapshots of programs –Cross-platform aggregation of results Allows Dyninst to use output from other tools in its own analyses –Other tools may perform different and/or richer analysis that would be valuable for Dyninst

University of Maryland Unified serialization… Multiple types of serialization can share the same infrastructure –Leverage c++ and the Dyninst class hierarchy –Keep serialization/deserialization process as extensible as possible Add new types of output down the road? Desired behavior: –serialize(filename, HierarchyRootNode, Translator); Serialize hierarchy into Traverse hierarchy in a (somewhat) generic manner Translator uses overloaded virtual translation functions that can be specialized as needed

University of Maryland … and deserialization Desired behavior: A simple interface –deserialize(file, HierarchyRootNode,Translator) Requires either: Alternative constructor hierarchy –Not consistent with extensibility requirement (need one ctor per I/O format) Default constructor with subsequent setting of values –Functions that translate from serial stream to in- memory object –Child objects can be rebuilt hierarchically, but not all data structures will be saved Hashes, indexing systems, etc. These must be rebuilt as part of deserialization

University of Maryland Simple Example Using SymtabAPI Dyn_Symbol func1 Dyn_Symbol func2 Dyn_Symbol funcN Dyn_Symbol var1 Class Dyn_Symtab { Vector syms; Bool is_a_out; String fname; };

University of Maryland Simple Example Using SymtabAPI Dyn_Symbol func1 Dyn_Symbol func2 Dyn_Symbol funcN Dyn_Symbol var1 Class Dyn_Symtab { Vector syms; Bool is_a_out; String fname; }; Serialize( symtab, toXML, f.xml ) f.xml Translator toXML Open File Write XML preamble open (f.xml) Start_symtab(f)

University of Maryland Simple Example Using SymtabAPI Dyn_Symbol func1 Dyn_Symbol func2 Dyn_Symbol funcN Dyn_Symbol var1 Class Dyn_Symtab { Vector syms; Bool is_a_out; String fname; }; Serialize( symtab, toXML, f.xml ) f.xml Translator toXML Write-out object fields (scalar) open (f.xml) Start_symtab(f) Out_val(fname) Out_val(is_a_out) nm y Translator can output all relevant types

University of Maryland Simple Example Using SymtabAPI Dyn_Symbol func1 Dyn_Symbol func2 Dyn_Symbol funcN Dyn_Symbol var1 Class Dyn_Symtab { Vector syms; Bool is_a_out; String fname; }; Serialize( symtab, toXML, f.xml ) f.xml Translator toXML Write-out object fields (vector) open (f.xml) Start_symtab(f) Out_val(fname) Out_val(is_a_out) Out_vector(syms) nm y N+1 f1 v1 Helper functions take care of container classes Foreach (syms) out_val(sym)

University of Maryland Simple Example Using SymtabAPI Dyn_Symbol func1 Dyn_Symbol func2 Dyn_Symbol funcN Dyn_Symbol var1 Class Dyn_Symtab { Vector syms; Bool is_a_out; String fname; }; Serialize( symtab, toXML, f.xml ) f.xml Translator toXML Finish up, close file open (f.xml) Start_symtab(f) Out_val(fname) Out_val(is_a_out) Out_vector(syms) nm y N+1 f1 v1 Foreach (syms) out_val(sym) End_symtab(f) Close(f)

University of Maryland Simple Example With Binary Output Translator sequence is identical (at the highest structural level) Translator toXML open (f.xml) Start_symtab(f) Out_val(fname) Out_val(is_a_out) Out_vector(syms) Foreach (syms) out_val(sym) End_symtab(f) Close(f) Translator toBin open (f.xml) Start_symtab(f) Out_val(fname) Out_val(is_a_out) Out_vector(syms) Foreach (syms) out_val(sym) End_symtab(f) Close(f)

University of Maryland Simple Example With Binary Output Lowest level data type outputs are specialized per output format Translator toXML open (f.xml) Start_symtab(f) Out_val(fname) Translator toBin open (f.bin) Start_symtab(f) Out_val(fname) TranslatorBase Virtual out_val(name) 0x18; size 0xa3; data… 0x11 0x37. nameValue Higher level outputs are generalized by default, specialized as needed

University of Maryland Recap Paradyn/Dyninst finally disentangled –After many years and many incremental efforts (not just mine) Upcoming serialization / deserialization features will: –Improve tool performance, esp. for Large binaries Repeated expensive analyses –Allow for easier interoperability with other tools via an XML interface XML spec will likely resemble the internal Dyninst class structure Please contact us if you have any specific instances of interoperability we should take into account

University of Maryland Questions?