Download presentation
Presentation is loading. Please wait.
Published byJared Stevenson Modified over 8 years ago
1
VXA: A Virtual Architecture for Durable Compressed Archives Bryan Ford Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology http://pdos.csail.mit.edu/~baford/vxa/
2
The Ubiquity of Data Compression Everything is compressed these days – Archive/Backup/Distribution: ZIP, tar.gz,... – Multimedia streams: mp3, ogg, wmv,... – Office documents: XML-in-ZIP – Digital cameras: JPEG, proprietary RAW,... – Video camcorders: DV, MPEG-2,...
3
Compressed Data Formats Observation #1: Data compression formats evolve rapidly
4
Compressed Data Formats Observation #1: Data compression formats evolve rapidly
5
Compressed Data Formats Observation #1: Data compression formats evolve rapidly Problems: – Inconvenient: each new algorithm requires decoder install/upgrade – Impedes data portability: data unusable on systems without supported decoder – Threatens long-term data usability: old decoders may not run on new operating systems
6
Archiving Compressed Data Observation #2: Processor architectures evolve more conservatively
7
Archiving Compressed Data Observation #2: Processor architectures evolve more conservatively Fully Backward Compatible Extensions
8
Archiving Compressed Data Observation #2: Processor architectures evolve more conservatively
9
Archiving Compressed Data Observation #2: Processor architectures evolve more conservatively
10
Archiving Compressed Data Observation #2: Processor architectures evolve more conservatively
11
Archiving Compressed Data Observation #2: Processor architectures evolve more conservatively
12
Archiving Compressed Data Observation #2: Processor architectures evolve more conservatively Itanic
13
VXA: Virtual Executable Archives Observation 1+2: Instruction formats are historically more durable than compressed data formats Make archive self-extracting (data + executable decoder) To extract data, archive reader runs embedded decoder Archive Writer Archive Reader D D Archive Encoder Decode r
14
Goals of VXA Make self-extracting archives... Archive Writer Archive Reader D D Archive Encoder Decode r
15
Goals of VXA Make self-extracting archives... 1Safe: malicious decoders can't compromise host 2Future-proof: simple, well-defined architecture [Lorie] Archive Writer Archive Reader D Emulator D Archive Encoder Decode r
16
Goals of VXA Make self-extracting archives... 1Safe: malicious decoders can't compromise host 2Future-proof: simple, well-defined architecture [Lorie] 3Easy: allow reuse of existing code, languages, tools Archive Writer Archive Reader D x86 Emulator D Archive Encoder Decode r
17
Goals of VXA Make self-extracting archives... 1Safe: malicious decoders can't compromise host 2Future-proof: simple, well-defined architecture [Lorie] 3Easy: allow reuse of existing code, languages, tools 4Efficient: practical for short term data packaging too Archive Writer Archive Reader D Fast x86 Emulator D Archive Encoder Decode r
18
Outline ● Archiver Operation ● vxZIP Archive Format ● Decoder Architecture ● Emulator Design & Implementation ● Evaluation (performance, storage overhead) ● Conclusion
19
Archive Writer Operation Archive VXA Archiver
20
Archive Writer Operation Archive D1D1 Uncompressed Input Files General Compressor Decode r 1 VXA Archiver
21
Archive Writer Operation Archive D1D1 Uncompressed Input Files General Compressor Decode r 1 VXA Archiver
22
Archive Writer Operation Archive D1D1 D2D2 D3D3 Uncompressed Input Files General Compressor Decode r 1 Image Compressor Decode r 2 Audio Compressor Decode r 3
23
Archive Writer Operation Archive Uncompressed Input FilesPre-Compressed Input Files D1D1 D2D2 D3D3 General Compressor Image Compressor Audio Compressor Decode r 1 Decode r 2 Decode r 3
24
Archive Writer Operation Archive D4D4 D5D5 Uncompressed Input FilesPre-Compressed Input Files Image Format Recognizer Decode r 4 Audio Format Recognizer Decode r 5 D1D1 D2D2 D3D3 General Compressor Image Compressor Audio Compressor Decode r 1 Decode r 2 Decode r 3
25
Archive Reader Operation Archive VXA Archive Reader x86 Emulator D4D4 D5D5 D1D1 D2D2 D3D3
26
VXA Archive Reader Archive Reader Operation Archive Original Uncompressed Files x86 Emulator Decoder 1 D4D4 D5D5 D1D1 D2D2 D3D3
27
VXA Archive Reader Archive Reader Operation Archive Original Uncompressed Files x86 Emulator Decoder 1 Decoder 2 Decoder 3 D4D4 D5D5 D1D1 D2D2 D3D3
28
VXA Archive Reader Archive Reader Operation Archive Original Uncompressed FilesOriginal Pre-Compressed Files x86 Emulator D4D4 D5D5 D1D1 D2D2 D3D3 Decoder 1 Decoder 2 Decoder 3
29
VXA Archive Reader Archive Reader Operation Archive Original Uncompressed FilesDe-compressed Files x86 Emulator Decoder 4 Decoder 5 D4D4 D5D5 D1D1 D2D2 D3D3 Decoder 1 Decoder 2 Decoder 3
30
vxZIP Archive Format ● Backward compatible with legacy ZIP format Central Directory Audio file Image file vxZIP Archive
31
vxZIP Archive Format ● Backward compatible with legacy ZIP format ● Decoders intermixed with archived files Central Directory Audio file FLAC Decoder Audio file Image file JP2 Decoder vxZIP Archive
32
vxZIP Archive Format ● Backward compatible with legacy ZIP format ● Decoders intermixed with archived files ● Archived files have new extension header pointing to decoder Central Directory Audio file (FLAC-encoded) FLAC Decoder Audio file (FLAC-encoded) Image file (JP2-encoded) JP2 Decoder vxZIP Archive
33
vxZIP Archive Format ● Backward compatible with legacy ZIP format ● Decoders intermixed with archived files ● Archived files have new extension header pointing to decoder ● Decoders are hidden, “deflated” (gzip) Central Directory Audio file (FLAC-encoded) FLAC Decoder (deflated) Audio file (FLAC-encoded) Image file (JP2-encoded) JP2 Decoder (deflated) vxZIP Archive
34
vxZIP Decoder Architecture ● Decoders are ELF executables for x86-32 – Can be written in any language, safe or unsafe – Compiled using ordinary tools (GCC) ● Decoders have access to five “system calls”: – read stdin, write stdout, malloc, next file, exit ● Decoders cannot: – open files, windows, devices, network connections,... – get system info: user name, current time, OS type,...
35
Decoders Ported So Far (using existing implementations in C, mostly unmodified) General-purpose (lossless): ● zlib: Classic gzip/deflate algorithm ● bzip2: Burrows-Wheeler algorithm Still image codecs: ● jpeg: Classic lossy image compression scheme ● jp2: JPEG 2000 wavelet-based algorithm, lossy or lossless Audio codecs: ● flac: Free Lossless Audio Codec ● vorbis: Standard lossy audio codec for Ogg streams
36
vx32 Emulator Architecture Runs in vxUnZIP process ● Loads decoder into address space sandbox ● Restricts decoder's memory accesses to sandbox ● Dispatches decoder's VXA “system calls” to vxUnZIP (not to host OS!) VXA Decoder Address Space (up to 1GB) vxUnZIP Process Address Space vxUnZIP Application Decoder Address Space 0 0 VXA System Calls vx32 Emulator library
37
vx32 Emulator Implementation On x86-{32/64} hosts: – Secure fault isolation [Wahbe] – Data sandboxing via custom LDT segments – Code sandboxing via instruction rewriting [Sites, Nethercote] – No privileges or kernel extensions Kernel Address Space Code Rewriting VXA Decoder Address Space (up to 1GB) Flat-Model Code/Data Segment vxUnZIP Application Decoder Data Segment (LDT) 0 0 Controlled Procedure Calls vx32 Emulator library Transformed code cache
38
vx32 Emulator Implementation On other host architectures: – Portable but slow “fallback” instruction interpreter (mostly done) – Fast x86-to-PowerPC binary translator (in progress) – Hopefully more in the future Emulator implemented as generic library – Can be used for other sandboxing applications
39
Evaluation Two issues to address: ● Performance overhead of emulated decoders – not important for long-term archival storage, but... – very important for common short-term uses of archives: backups, software distribution, structured documents,... ● Storage overhead of archived decoders
40
Performance Test Method Run 6 ported decoders on appropriate data sets – Athlon 64 3000+ PC running SuSE Linux 9.3 – Measure user-mode CPU time (not wall-clock time) Compare: – Emulated vs native execution – Running on x86-32 vs x86-64 host environment
41
Performance Overhead
43
Storage Overhead Archiver stores only one copy of each decoder – Storage cost amortized over all files of same type – Relative overhead depends on size of archive Therefore, measure only absolute decoder size (compressed, as stored in archive)
44
Storage Overhead
45
Conclusion VXA makes self-extracting archives... ● Safe: decoders fully sandboxed ● Future-proof: simple, OS-independent environment ● Easy: re-use existing decoders, languages, tools ● Efficient: ≤ 11% slowdown vs native x86-32 Available at: http://pdos.csail.mit.edu/~baford/vxa/
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.