ASE’14, 18 th September 2014 This work is supported by Microsoft Research through its PhD Scholarship Programme Microsoft is a registered trademark of.

Slides:



Advertisements
Similar presentations
A System to Generate Test Data and Symbolically Execute Programs Lori A. Clarke September 1976.
Advertisements

David Brumley, Pongsin Poosankam, Dawn Song and Jiang Zheng Presented by Nimrod Partush.
Bouncer securing software by blocking bad input Miguel Castro Manuel Costa, Lidong Zhou, Lintao Zhang, and Marcus Peinado Microsoft Research.
Debugging Techniques1. 2 Introduction Bugs How to debug Using of debugger provided by the IDE Exception Handling Techniques.
1 PROJECT Web-based Database Applications Lecture 1: Basic Internet Concepts & Databases - the History.
ARM programmer’s model and assembler Embedded Systems Programming.
1 CS6320 – Why Servlets? L. Grewe 2 What is a Servlet? Servlets are Java programs that can be run dynamically from a Web Server Servlets are Java programs.
Modules, Hierarchy Charts, and Documentation
Software Fault Injection Kalynnda Berens Science Applications International Corporation NASA Glenn Research Center.
Programming Logic and Design, Introductory, Fourth Edition1 Understanding Computer Components and Operations (continued) A program must be free of syntax.
Chapter 11 ASP.NET JavaScript, Third Edition. 2 Objectives Learn about client/server architecture Study server-side scripting Create ASP.NET applications.
20 February Detailed Design Implementation. Software Engineering Elaborated Steps Concept Requirements Architecture Design Implementation Unit test Integration.
Full Guaranteed & Safe OST Converter Software
Implementing High Availability
Zichao Qi, Fan Long, Sara Achour, and Martin Rinard MIT CSAIL
Finding Bugs in Web Applications Using Dynamic Test Generation and Explicit-State Model Checking -Shreyas Ravindra.
Google Distributed System and Hadoop Lakshmi Thyagarajan.
Fundamentals of Python: From First Programs Through Data Structures
Guide to Computer Forensics and Investigations Fourth Edition Chapter 12 Investigations.
Page 19/4/2015 CSE 30341: Operating Systems Principles Raid storage  Raid – 0: Striping  Good I/O performance if spread across disks (equivalent to n.
Fundamentals of Python: First Programs
Fruitful functions. Return values The built-in functions we have used, such as abs, pow, int, max, and range, have produced results. Calling each of these.
Map Reduce: Simplified Data Processing On Large Clusters Jeffery Dean and Sanjay Ghemawat (Google Inc.) OSDI 2004 (Operating Systems Design and Implementation)
Software Quality Assurance Lecture #8 By: Faraz Ahmed.
Testing. Definition From the dictionary- the means by which the presence, quality, or genuineness of anything is determined; a means of trial. For software.
1 Defensive Programming and Debugging (Chapters 8 and 23 of Code Complete) Tori Bowman CSSE 375, Rose-Hulman September 21, 2007.
Article: Source Code Review Systems Author: Jason Remillard Presenter: Joe Borosky Class: Principles and Applications of Software Design Date: 11/2/2005.
CS 501: Software Engineering Fall 1999 Lecture 16 Verification and Validation.
Java: Chapter 1 Computer Systems Computer Programming II.
Information Systems Analysis and Design
Chapter 5: Control Structures II (Repetition)
CHAPTER 5: CONTROL STRUCTURES II INSTRUCTOR: MOHAMMAD MOJADDAM.
EGR 2261 Unit 5 Control Structures II: Repetition  Read Malik, Chapter 5.  Homework #5 and Lab #5 due next week.  Quiz next week.
Window NT File System JianJing Cao (#98284).
Python – Part 1 Python Programming Language 1. What is Python? High-level language Interpreted – easy to test and use interactively Object-oriented Open-source.
Debugging in Java. Common Bugs Compilation or syntactical errors are the first that you will encounter and the easiest to debug They are usually the result.
Testing and Debugging Version 1.0. All kinds of things can go wrong when you are developing a program. The compiler discovers syntax errors in your code.
Oracle Data Integrator Procedures, Advanced Workflows.
The Development Process Problem Solving. Problem Solving - Dr. Struble 2 What Is Asked of Computer Programmers? Input Transformation Output Write a computer.
Security - Why Bother? Your projects in this class are not likely to be used for some critical infrastructure or real-world sensitive data. Why should.
ICOM 6115©Manuel Rodriguez-Martinez ICOM 6115 – Computer Networks and the WWW Manuel Rodriguez-Martinez, Ph.D. Lecture 14.
Fundamental Programming: Fundamental Programming K.Chinnasarn, Ph.D.
Chapter 5: Control Structures II (Repetition). Objectives In this chapter, you will: – Learn about repetition (looping) control structures – Learn how.
1 Test Selection for Result Inspection via Mining Predicate Rules Wujie Zheng
1 Ch. 1: Software Development (Read) 5 Phases of Software Life Cycle: Problem Analysis and Specification Design Implementation (Coding) Testing, Execution.
Finding Errors in.NET with Feedback-Directed Random Testing Carlos Pacheco (MIT) Shuvendu Lahiri (Microsoft) Thomas Ball (Microsoft) July 22, 2008.
Software Development Problem Analysis and Specification Design Implementation (Coding) Testing, Execution and Debugging Maintenance.
CSV 889: Concurrent Software Verification Subodh Sharma Indian Institute of Technology Delhi Scalable Symbolic Execution: KLEE.
Software Engineering1  Verification: The software should conform to its specification  Validation: The software should do what the user really requires.
Module 7: SQL Server Special Considerations. Overview SQL Server High Availability Unicode.
ASP-2-1 SERVER AND CLIENT SIDE SCRITPING Colorado Technical University IT420 Tim Peterson.
Web Browsing *TAKE NOTES*. Millions of people browse the Web every day for research, shopping, job duties and entertainment. Installing a web browser.
Unit 17: SDLC. Systems Development Life Cycle Five Major Phases Plus Documentation throughout Plus Evaluation…
Automating Configuration Troubleshooting with Dynamic Information Flow Analysis Mona Attariyan Jason Flinn University of Michigan.
MISSION CRITICAL COMPUTING SQL Server Special Considerations.
Security Attacks Tanenbaum & Bo, Modern Operating Systems:4th ed., (c) 2013 Prentice-Hall, Inc. All rights reserved.
MOPS: an Infrastructure for Examining Security Properties of Software Authors Hao Chen and David Wagner Appears in ACM Conference on Computer and Communications.
FILES AND EXCEPTIONS Topics Introduction to File Input and Output Using Loops to Process Files Processing Records Exceptions.
Shadow Shadow of a Doubt: Testing for Divergences Between Software Versions Hristina PalikarevaTomasz KuchtaCristian Cadar ICSE’16, 20 th May 2016 This.
Ocasta: Clustering Configuration Settings for Error Recovery Zhen Huang, David Lie Department of Electrical and Computer Engineering University of Toronto.
Lecture 3 – MapReduce: Implementation CSE 490h – Introduction to Distributed Computing, Spring 2009 Except as otherwise noted, the content of this presentation.
PDF Recovery Tool Fix Portable Document File Format.
YAHMD - Yet Another Heap Memory Debugger
Crash Dump Analysis - Santosh Kumar Singh.
Chapter 2: System Structures
RDE: Replay DEbugging for Diagnosing Production Site Failures
Outlook Recovery Freeware is the professional tool to fix Outlook Error and PST corruption.
CSE 1020:Software Development
Testing & Security Dr. X.
How To Repair PDF File After Disk Crash???. What is PDF file..??? PDF file is portable document file. This file format is used during the exchange .
Presentation transcript:

ASE’14, 18 th September 2014 This work is supported by Microsoft Research through its PhD Scholarship Programme Microsoft is a registered trademark of Microsoft Corporation Tomasz Kuchta Cristian Cadar Docovery : Toward Generic Automatic Document Recovery Miguel Castro Manuel Costa

The user is unable to open a document 22

Document is corrupt Storage failure, network transfer failure, power outage 33

Application has bugs Buffer overflows, divisions by zero Assertion failures, exceptions Incompatibility across versions / applications 44

Such problems are highly user-visible They account for a large number of security vulnerabilities 55

The root cause of the problem Application is unable to handle corrupt or uncommon documents Example: pine – a text mode client Special “From:” field crashes the program 66 From:

What can we do about that? Try to fix the program Automatic patch generation [GenProg, WCCI’08, ICSE’09; SemFix, ICSE’13; etc.] Try to protect the program Automatic input filter generation [Vigilante, SOSP’05; Shieldgen, S&P’07; etc.] 77

What can we do about that? Try to fix the document Use format specification [DS repair, OOPSLA’03] Learn and apply the correct values [SOAP, ICSE’12] Truncate the document Try to guess the right value Or … 88

Is it possible to fix a broken document, without assuming any input format, in a way that preserves the original contents as much as possible? 99

1111

1212

1313

1414 Byte #4: 'x' Byte #8: 'y' Identify Potentially Corrupt Bytes Taint Tracking 1

1515 Byte #4: 'z' Byte #8: 'y' Change The Bytes To Execute Another Path Symbolic Execution 2 Byte #4: 'x' Byte #8: 'y' Identify Potentially Corrupt Bytes Taint Tracking 1

1616 Pick The Best Candidate Levenshtein distance and manual inspection 3 Byte #4: 'x' Byte #8: 'y' Identify Potentially Corrupt Bytes Taint Tracking 1 Byte #4: 'z' Byte #8: 'y' Change The Bytes To Execute Another Path Symbolic Execution 2

Docovery process 1717 Broken document executionAlternative paths exploration

Taint tracking Track the flow of data from a source (input) to a sink (point of crash) Identifying potentially corrupt bytes Byte-level precision No control flow dependencies No address tainting 1818 Broken document executionAlternative paths exploration Byte #4: 'x' Byte #8: 'y'

Collecting alternative paths Mark the potentially corrupt bytes as symbolic Lazily verify feasibility 1919 Broken document executionAlternative paths exploration

Path selection Last N deepest paths are collected Start from the paths closest to the crash point 2020 Broken document executionAlternative paths exploration

Negate the K th constraint and drop the remaining Ask constraint solver for a satisfying assignment Path P 3 : C 1 ∧ C 2 ∧ C 3 Path P 2 : C 1 ∧ C Broken document executionAlternative paths exploration

Candidate execution Store the candidate Re-run the program natively Successful if not crashing 2222 Broken document executionAlternative paths exploration

Evaluating candidate documents Levenshtein distance (edit distance) Byte-level similarity metric Independent of document format Smaller distance = higher similarity Semi-automatic evaluation of program output Looking for warnings / errors, exit code Similarity to the correct output 2323

Implementation Built on top of KLEE [OSDI’08] Using ZESTI functionality [ICSE’12] Interprets LLVM bitcode of C applications 2525

Benchmarks pr – a pagination utility pine – a text-mode client dwarfdump – a debug information display tool readelf – an ELF file information display tool 2626 BenchmarkDocument typeDocument Sizes Max number of changed bytes pr Plain textup to 256 pages / 1080 KB1 pine MBOX mailboxup to 320 s / 2.3 MB24 dwarfdump DWARF executablesup to 1.1 MB1 readelf ELF object filesup to 1.5 MB8

Bugs Known, real-world bugs injected manually pr, pine, readelf – buffer overflow dwarfdump – division by zero 2727 Benchmark‘Buggy’ sequence prLorem ipsum...0x08 0x08...0x09 EOF pine...From: dwarfdump...GCC: (Ubuntu/Linaro x00 0x00... readelf...0xFD 0xFF 0xFF 0xFF 0xFF 0xFF 0xFF 0xFF...

Taint tracking results pr: Lorem ipsum EOF pine: dwarfdump:...GCC: (Ubuntu/Linaro readelf: FD FF FF FF FF FF FF FF BenchmarkDocument Number of potentially corrupt bytes pr 1 – 256 pages / 4.4 – 1080 KB1 pine 5 – 320 s / 13 KB – 2.3MB25 dwarfdump 62 KB – 1.1 MB2 readelf 54 KB – 1.5 MB16 Regardless of document size

Candidates for pr All the candidates print out correctly 2929 Document‘Buggy’ sequence Original Lorem ipsum...0x08 0x08...0x09 EOF Candidate A Lorem ipsum...0x08 0x08...0x00 EOF Candidate B Lorem ipsum...0x08 0x08...0x0C EOF Candidate C Lorem ipsum...0x08 0x08...0x0A EOF

Candidates for pine 3030 Document‘Buggy’ sequence Original From: Candidate A From: Candidate B From: Candidate C From:

Candidates for dwarfdump Candidate A: debug dump, success return code Candidate B: error 3131 Document‘Buggy’ sequence Original...GCC: (Ubuntu/Linaro x00 0x00... Candidate A...GCC: (Ubuntu/Linaro x01 0x00... Candidate B...GCC: (Ubuntu/Linaro x00 0x01...

Candidates for readelf Candidate A: most of output, but with a warning Candidate B: almost no output and an error Candidate C: almost no output (no debug data) 3232 Document‘Buggy’ sequence Original … … FD FF FF FF FF FF FF FF … Candidate A Candidate B … FE FF FF FF FF FF FF FF … FD FF FF FF FF FF FF FF … Candidate C … … FD FF FF FF FF FF FF FF …

Performance varies across applications Sometimes, the recovery is cheap 3333

Performance varies across applications Sometimes, the recovery is expensive 3434

Performance depends on the executed path 3535

Performance Most time spent on taint tracking and collecting alternative paths First recovery candidate usually within minutes after path exploration starts All collected paths usually explored within minutes 3636 Broken document executionAlternative paths exploration Most time

Limitations of Docovery Fundamental Scalability: complex, highly-structured documents Supports only byte mutations Implementation Can’t handle multiple faults Handles only generic errors No support for document modifications (read-only) Requires C source code of the program 3737

Docovery A novel technique for format-independent document recovery Uses taint tracking and symbolic execution techniques Recovery candidates explore alternative execution paths Successfully recovered Text files Mailboxes Executables Object files