VXA: A Virtual Architecture for Durable Compressed Archives Bryan Ford Computer Science and Artificial Intelligence Laboratory Massachusetts Institute.

Slides:



Advertisements
Similar presentations
Unmodified Device Driver Reuse and Improved System Dependability via Virtual Machines J. LeVasseur V. Uhlig J. Stoess S. G¨otz University of Karlsruhe,
Advertisements

Threads, SMP, and Microkernels
Copyright © 2014 Pearson Education, Inc. Publishing as Prentice Hall
LINUX-WINDOWS INTERACTION. One software allowing interaction between Linux and Windows is WINE. Wine allows Linux users to load Windows programs while.
OmniVM Efficient and Language- Independent Mobile Programs Ali-Reza Adl-Tabatabai, Geoff Langdale, Steven Lucco and Robert Wahbe from Carnegie Mellon University.
CS 300 – Lecture 22 Intro to Computer Architecture / Assembly Language Virtual Memory.
Software. Application Software performs useful work on general-purpose tasks such as word processing and data analysis. The user interacts with the application.
Exokernel: An Operating System Architecture for Application-Level Resource Management Dawson R. Engler, M. Frans Kaashoek, and James O’Toole Jr. M.I.T.
Linux+ Guide to Linux Certification Chapter 12 Compression, System Backup, and Software Installation.
Operating Systems.
Operating Systems Concepts 1. A Computer Model An operating system has to deal with the fact that a computer is made up of a CPU, random access memory.
Efficient Instruction Set Randomization Using Software Dynamic Translation Michael Crane Wei Hu.
Chapter 3.1:Operating Systems Concepts 1. A Computer Model An operating system has to deal with the fact that a computer is made up of a CPU, random access.
Computer Organization
Linux+ Guide to Linux Certification
Microkernels, virtualization, exokernels Tutorial 1 – CSC469.
CS 1308 Computer Literacy and the Internet. Introduction  Von Neumann computer  “Naked machine”  Hardware without any helpful user-oriented features.
A DSP-Based Platform for Wireless Video Compression Patrick Murphy, Vinay Bharadwaj, Erik Welsh & J. Patrick Frantz Rice University November 18, 2002.
Introduction 1-1 Introduction to Virtual Machines From “Virtual Machines” Smith and Nair Chapter 1.
Native Client: A Sandbox for Portable, Untrusted x86 Native Code
Linux+ Guide to Linux Certification Chapter Thirteen Compression, System Back-Up, and Software Installation.
Chapter 4 Storage Management (Memory Management).
Invitation to Computer Science 5 th Edition Chapter 6 An Introduction to System Software and Virtual Machine s.
Windows 2000 Course Summary Computing Department, Lancaster University, UK.
INVITATION TO COMPUTER SCIENCE, JAVA VERSION, THIRD EDITION Chapter 6: An Introduction to System Software and Virtual Machines.
How Analog and Digital Recording Works Analog converted to digital via an ADV (Analog to Digital Converter = stream of numbers) On playback: digital converted.
AUDIO AND VIDEO COMPRESSION AND IT’S IMPORTANCE ON THE INTERNET Brian Dillinger May 3, 2010.
Marwan Al-Namari 1 Digital Representations. Bits and Bytes Devices can only be in one of two states 0 or 1, yes or no, on or off, … Bit: a unit of data.
A. Frank - P. Weisberg Operating Systems Structure of Operating Systems.
Lecture 12 Virtualization Overview 1 Dec. 1, 2015 Prof. Kyu Ho Park “Understanding Full Virtualization, Paravirtualization, and Hardware Assist”, White.
Introduction Why are virtual machines interesting?
COMP135/COMP535 Digital Multimedia, 2nd edition Nigel Chapman & Jenny Chapman Chapter 2 Lecture 2 – Digital Representations.
Lesson 5 MULTIMEDIA. Multimedia on the Web has expanded rapidly as broadband connections have allowed users to connect at faster speeds. Almost all Web.
Efficient Dynamic Heap Allocation of Scratch-Pad Memory Ross McIlroy, Peter Dickman and Joe Sventek Carnegie Trust for the Universities of Scotland.
Audio Formats. Digital sound files must be organized and structured so that your media player can read them. It's just like being able to read and understand.
Software. Introduction n A computer can’t do anything without a program of instructions. n A program is a set of instructions a computer carries out.
Operating Systems A Biswas, Dept. of Information Technology.
Qin Zhao1, Joon Edward Sim2, WengFai Wong1,2 1SingaporeMIT Alliance 2Department of Computer Science National University of Singapore
Invitation to Computer Science 6th Edition
Outline Installing Gem5 SPEC2006 for Gem5 Configuring Gem5.
Efficient Software-Based Fault Isolation
Introduction to Operating Systems
Memory Protection: Kernel and User Address Spaces
Computer Architecture & Operations I
Topics Introduction Hardware and Software How Computers Store Data
Chapter 9 – Real Memory Organization and Management
Operating System Structure
Lossy vs Lossless compression
Video Compression - MPEG
IB Computer Science Topic 2.1.1
Memory Protection: Kernel and User Address Spaces
Computer Science I CSC 135.
Memory Protection: Kernel and User Address Spaces
Memory Protection: Kernel and User Address Spaces
Lecture 14 Virtual Memory and the Alpha Memory Hierarchy
Multimedia: making it Work
Virtualization Layer Virtual Hardware Virtual Networking
Chapter 2: The Linux System Part 1
Topics Introduction Hardware and Software How Computers Store Data
Outline Module 1 and 2 dealt with processes, scheduling and synchronization Next two modules will deal with memory and storage Processes require data to.
Lecture Topics: 11/1 General Operating System Concepts Processes
Language Processors Application Domain – ideas concerning the behavior of a software. Execution Domain – Ideas implemented in Computer System. Semantic.
Outline Chapter 2 (cont) OS Design OS structure
Lesson 5: Multimedia on the Web
SCONE: Secure Linux Containers Environments with Intel SGX
Introduction to Virtual Machines
System calls….. C-program->POSIX call
GCSE COMPUTER SCIENCE Topic 3 - Data 3.9 Data Compression.
Memory Protection: Kernel and User Address Spaces
Computer Applications -Generic Elective
Presentation transcript:

VXA: A Virtual Architecture for Durable Compressed Archives Bryan Ford Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology

The Ubiquity of Data Compression Everything is compressed these days – Archive/Backup/Distribution: ZIP, tar.gz,... – Multimedia streams: mp3, ogg, wmv,... – Office documents: XML-in-ZIP – Digital cameras: JPEG, proprietary RAW,... – Video camcorders: DV, MPEG-2,...

Compressed Data Formats Observation #1: Data compression formats evolve rapidly

Compressed Data Formats Observation #1: Data compression formats evolve rapidly

Compressed Data Formats Observation #1: Data compression formats evolve rapidly Problems: – Inconvenient: each new algorithm requires decoder install/upgrade – Impedes data portability: data unusable on systems without supported decoder – Threatens long-term data usability: old decoders may not run on new operating systems

Archiving Compressed Data Observation #2: Processor architectures evolve more conservatively

Archiving Compressed Data Observation #2: Processor architectures evolve more conservatively Fully Backward Compatible Extensions

Archiving Compressed Data Observation #2: Processor architectures evolve more conservatively

Archiving Compressed Data Observation #2: Processor architectures evolve more conservatively

Archiving Compressed Data Observation #2: Processor architectures evolve more conservatively

Archiving Compressed Data Observation #2: Processor architectures evolve more conservatively

Archiving Compressed Data Observation #2: Processor architectures evolve more conservatively Itanic

VXA: Virtual Executable Archives Observation 1+2: Instruction formats are historically more durable than compressed data formats Make archive self-extracting (data + executable decoder) To extract data, archive reader runs embedded decoder Archive Writer Archive Reader D D Archive Encoder Decode r

Goals of VXA Make self-extracting archives... Archive Writer Archive Reader D D Archive Encoder Decode r

Goals of VXA Make self-extracting archives... 1Safe: malicious decoders can't compromise host 2Future-proof: simple, well-defined architecture [Lorie] Archive Writer Archive Reader D Emulator D Archive Encoder Decode r

Goals of VXA Make self-extracting archives... 1Safe: malicious decoders can't compromise host 2Future-proof: simple, well-defined architecture [Lorie] 3Easy: allow reuse of existing code, languages, tools Archive Writer Archive Reader D x86 Emulator D Archive Encoder Decode r

Goals of VXA Make self-extracting archives... 1Safe: malicious decoders can't compromise host 2Future-proof: simple, well-defined architecture [Lorie] 3Easy: allow reuse of existing code, languages, tools 4Efficient: practical for short term data packaging too Archive Writer Archive Reader D Fast x86 Emulator D Archive Encoder Decode r

Outline ● Archiver Operation ● vxZIP Archive Format ● Decoder Architecture ● Emulator Design & Implementation ● Evaluation (performance, storage overhead) ● Conclusion

Archive Writer Operation Archive VXA Archiver

Archive Writer Operation Archive D1D1 Uncompressed Input Files General Compressor Decode r 1 VXA Archiver

Archive Writer Operation Archive D1D1 Uncompressed Input Files General Compressor Decode r 1 VXA Archiver

Archive Writer Operation Archive D1D1 D2D2 D3D3 Uncompressed Input Files General Compressor Decode r 1 Image Compressor Decode r 2 Audio Compressor Decode r 3

Archive Writer Operation Archive Uncompressed Input FilesPre-Compressed Input Files D1D1 D2D2 D3D3 General Compressor Image Compressor Audio Compressor Decode r 1 Decode r 2 Decode r 3

Archive Writer Operation Archive D4D4 D5D5 Uncompressed Input FilesPre-Compressed Input Files Image Format Recognizer Decode r 4 Audio Format Recognizer Decode r 5 D1D1 D2D2 D3D3 General Compressor Image Compressor Audio Compressor Decode r 1 Decode r 2 Decode r 3

Archive Reader Operation Archive VXA Archive Reader x86 Emulator D4D4 D5D5 D1D1 D2D2 D3D3

VXA Archive Reader Archive Reader Operation Archive Original Uncompressed Files x86 Emulator Decoder 1 D4D4 D5D5 D1D1 D2D2 D3D3

VXA Archive Reader Archive Reader Operation Archive Original Uncompressed Files x86 Emulator Decoder 1 Decoder 2 Decoder 3 D4D4 D5D5 D1D1 D2D2 D3D3

VXA Archive Reader Archive Reader Operation Archive Original Uncompressed FilesOriginal Pre-Compressed Files x86 Emulator D4D4 D5D5 D1D1 D2D2 D3D3 Decoder 1 Decoder 2 Decoder 3

VXA Archive Reader Archive Reader Operation Archive Original Uncompressed FilesDe-compressed Files x86 Emulator Decoder 4 Decoder 5 D4D4 D5D5 D1D1 D2D2 D3D3 Decoder 1 Decoder 2 Decoder 3

vxZIP Archive Format ● Backward compatible with legacy ZIP format Central Directory Audio file Image file vxZIP Archive

vxZIP Archive Format ● Backward compatible with legacy ZIP format ● Decoders intermixed with archived files Central Directory Audio file FLAC Decoder Audio file Image file JP2 Decoder vxZIP Archive

vxZIP Archive Format ● Backward compatible with legacy ZIP format ● Decoders intermixed with archived files ● Archived files have new extension header pointing to decoder Central Directory Audio file (FLAC-encoded) FLAC Decoder Audio file (FLAC-encoded) Image file (JP2-encoded) JP2 Decoder vxZIP Archive

vxZIP Archive Format ● Backward compatible with legacy ZIP format ● Decoders intermixed with archived files ● Archived files have new extension header pointing to decoder ● Decoders are hidden, “deflated” (gzip) Central Directory Audio file (FLAC-encoded) FLAC Decoder (deflated) Audio file (FLAC-encoded) Image file (JP2-encoded) JP2 Decoder (deflated) vxZIP Archive

vxZIP Decoder Architecture ● Decoders are ELF executables for x86-32 – Can be written in any language, safe or unsafe – Compiled using ordinary tools (GCC) ● Decoders have access to five “system calls”: – read stdin, write stdout, malloc, next file, exit ● Decoders cannot: – open files, windows, devices, network connections,... – get system info: user name, current time, OS type,...

Decoders Ported So Far (using existing implementations in C, mostly unmodified) General-purpose (lossless): ● zlib: Classic gzip/deflate algorithm ● bzip2: Burrows-Wheeler algorithm Still image codecs: ● jpeg: Classic lossy image compression scheme ● jp2: JPEG 2000 wavelet-based algorithm, lossy or lossless Audio codecs: ● flac: Free Lossless Audio Codec ● vorbis: Standard lossy audio codec for Ogg streams

vx32 Emulator Architecture Runs in vxUnZIP process ● Loads decoder into address space sandbox ● Restricts decoder's memory accesses to sandbox ● Dispatches decoder's VXA “system calls” to vxUnZIP (not to host OS!) VXA Decoder Address Space (up to 1GB) vxUnZIP Process Address Space vxUnZIP Application Decoder Address Space 0 0 VXA System Calls vx32 Emulator library

vx32 Emulator Implementation On x86-{32/64} hosts: – Secure fault isolation [Wahbe] – Data sandboxing via custom LDT segments – Code sandboxing via instruction rewriting [Sites, Nethercote] – No privileges or kernel extensions Kernel Address Space Code Rewriting VXA Decoder Address Space (up to 1GB) Flat-Model Code/Data Segment vxUnZIP Application Decoder Data Segment (LDT) 0 0 Controlled Procedure Calls vx32 Emulator library Transformed code cache

vx32 Emulator Implementation On other host architectures: – Portable but slow “fallback” instruction interpreter (mostly done) – Fast x86-to-PowerPC binary translator (in progress) – Hopefully more in the future Emulator implemented as generic library – Can be used for other sandboxing applications

Evaluation Two issues to address: ● Performance overhead of emulated decoders – not important for long-term archival storage, but... – very important for common short-term uses of archives: backups, software distribution, structured documents,... ● Storage overhead of archived decoders

Performance Test Method Run 6 ported decoders on appropriate data sets – Athlon PC running SuSE Linux 9.3 – Measure user-mode CPU time (not wall-clock time) Compare: – Emulated vs native execution – Running on x86-32 vs x86-64 host environment

Performance Overhead

Storage Overhead Archiver stores only one copy of each decoder – Storage cost amortized over all files of same type – Relative overhead depends on size of archive Therefore, measure only absolute decoder size (compressed, as stored in archive)

Storage Overhead

Conclusion VXA makes self-extracting archives... ● Safe: decoders fully sandboxed ● Future-proof: simple, OS-independent environment ● Easy: re-use existing decoders, languages, tools ● Efficient: ≤ 11% slowdown vs native x86-32 Available at: