Assembly 2005, Helsinki, July Crinkler - compressing Windows 4k intros to EXE files Aske Simon Christensen Rune L. H. Stubbe
Overview
  Background
  Compression method
  Function import
  Header layout
  Demo
  Future plans
Why another one?
  Most common method: CAB dropping
    EXE file -> EXE optimizer -> CAB compressor -> BAT inserter -> BAT file
  Dropping is a mess
  We want EXE files!
How is Crinkler different?
  The normal build process:
    C/C++ files -> Compiler -> object files
    ASM files -> Assembler -> object files
    object / library files -> Linker -> Cruncher -> EXE file
How is Crinkler different?
  The Crinkler way:
    C/C++ files -> Compiler -> object files
    ASM files -> Assembler -> object files
    object / library files -> Crinkler -> EXE file
Why another one?
  Control over code and data placement
    –Choose base address
    –Optimize order for best compression
    –Separate code and data
    –Put in extra code
  Import code
  Code transformations
Compression method
  Context modelling
    + Much better compression ratio than LZX
    + Well suited for small amounts of data
    + Small decompression code (< 250 bytes)
    + Pays off even with the extra header
    - Extremely slow
    - Very memory-hungry
Data compression basics
  Take advantage of self-similarity
  Find patterns and eliminate them
  Dictionary compression
  Statistical compression
Dictionary compression
  LZ77: Refer repetitions back to original
  Reasonable compression ratio
  Fast compression
  Very fast decompression
  Example: MISSISSIPPIMISS ISSI PPI
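To make the back-reference idea concrete, here is a minimal greedy LZ77 sketch run on the slide's example string. The token format (literal characters mixed with `(offset, length)` tuples), the window size, and the minimum match length are assumptions chosen for illustration; they do not correspond to LZX or any other real format.

```python
def lz77_compress(data: str, window: int = 255, min_match: int = 3):
    """Greedy LZ77: emit literal characters or (offset, length) back-references."""
    tokens = []
    i = 0
    while i < len(data):
        best_len, best_off = 0, 0
        # Try every earlier start position inside the window.
        for j in range(max(0, i - window), i):
            length = 0
            # A match may run past position i (overlapping copy).
            while i + length < len(data) and data[j + length] == data[i + length]:
                length += 1
            if length > best_len:
                best_len, best_off = length, i - j
        if best_len >= min_match:
            tokens.append((best_off, best_len))
            i += best_len
        else:
            tokens.append(data[i])
            i += 1
    return tokens


def lz77_decompress(tokens) -> str:
    """Rebuild the text by copying `length` characters from `offset` positions back."""
    out = []
    for t in tokens:
        if isinstance(t, tuple):
            off, length = t
            for _ in range(length):
                out.append(out[-off])
        else:
            out.append(t)
    return "".join(out)


tokens = lz77_compress("MISSISSIPPIMISS")
print(tokens)  # ['M', 'I', 'S', 'S', (3, 4), 'P', 'P', 'I', (11, 4)]
```

The second ISSI and the final MISS become references to earlier text, while PPI has no usable earlier match and stays as literals. Decompression is a plain copy loop, which is why it is so fast.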
Statistical compression
  Estimate probability distribution of each symbol based on earlier data
  PPM:
  Problem: local
  Example: MISSISSIPPI
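As a rough illustration of estimating each symbol's probability from earlier data, the sketch below computes the ideal code length (the sum of -log2 p) under a simple adaptive order-k model. The Laplace smoothing and the `alphabet_size` parameter are assumptions made for this example; a real PPM coder handles unseen symbols with escape codes instead.

```python
import math
from collections import defaultdict


def ideal_code_length(data: str, order: int = 1, alphabet_size: int = 256) -> float:
    """Bits an ideal arithmetic coder would need under an adaptive order-`order` model."""
    # counts[context][symbol] = how often `symbol` has followed `context` so far
    counts = defaultdict(lambda: defaultdict(int))
    total_bits = 0.0
    for i, sym in enumerate(data):
        ctx = data[max(0, i - order):i]
        seen = counts[ctx]
        n = sum(seen.values())
        # Laplace smoothing: every symbol starts with a pseudo-count of 1.
        p = (seen[sym] + 1) / (n + alphabet_size)
        total_bits += -math.log2(p)
        seen[sym] += 1  # update the model only after coding the symbol
    return total_bits


for order in (0, 1, 2):
    print(order, round(ideal_code_length("MISSISSIPPI" * 4, order), 1))
```

The decompressor can rebuild exactly the same counts because the model is updated only with symbols that have already been coded.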
Context modelling
  Generalization of PPM
  Look at combinations of recent symbols
  A bit mask describes a model
  Problem: Many masks to choose from
  Example: MISSISSIPPI
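One way to read "a bit mask describes a model" is sketched below: each mask bit selects one of the preceding bytes as part of the context. The bit ordering (bit k selects the byte k+1 positions back), the 8-byte window, and the zero padding before the start of the data are assumptions of this sketch.

```python
def masked_context(data: bytes, pos: int, mask: int) -> bytes:
    """Return the bytes before `pos` that `mask` selects (bit k = byte k+1 positions back)."""
    ctx = bytearray()
    for k in range(8):
        if mask & (1 << k):
            idx = pos - 1 - k
            # Positions before the start of the data are treated as zero bytes.
            ctx.append(data[idx] if idx >= 0 else 0)
    return bytes(ctx)


text = b"MISSISSIPPI"
print(masked_context(text, 4, 0b001))  # b'S'   previous byte only (order-1 PPM)
print(masked_context(text, 4, 0b011))  # b'SS'  previous two bytes (order-2 PPM)
print(masked_context(text, 4, 0b101))  # b'SI'  skips the byte two positions back
```

Contiguous masks such as `0b001` and `0b011` correspond to ordinary PPM orders; masks with gaps are the generalization. With 8 candidate bytes there are 256 possible masks, which illustrates why choosing among them is a problem.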
Implementation
  Estimation for each single bit
  Context is current byte + selection of last 8
  Estimate the best collection of masks
  Estimate the best weights of the masks
  Keep track of contexts in a hash table
  Ignore hash collisions
  Find hash table size with few collisions
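The points above can be sketched as a small size estimator: every bit is predicted from hashed contexts, one per mask, and the models' counts are combined using their weights. This is only an illustration and not Crinkler's actual code. The mixing rule (a weighted sum of bit counts), the use of Python's built-in `hash`, and the default table size are all assumptions.

```python
import math


def estimate_bits(data: bytes, models, table_size: int = 1 << 16) -> float:
    """Ideal compressed size in bits; `models` is a list of (mask, weight) pairs."""
    table = [[0, 0] for _ in range(table_size)]  # per-slot counts of 0-bits and 1-bits
    total = 0.0
    for pos, byte in enumerate(data):
        partial = 1  # bits of the current byte seen so far, with a leading marker bit
        for bit_index in range(7, -1, -1):
            bit = (byte >> bit_index) & 1
            slots = []
            n = [0.0, 0.0]
            for mask, weight in models:
                # Previous bytes selected by the mask (bit k = byte k+1 positions back).
                ctx = tuple(data[pos - 1 - k] if pos - 1 - k >= 0 else 0
                            for k in range(8) if mask & (1 << k))
                # Collisions are ignored: colliding contexts simply share one slot.
                slot = hash((mask, partial, ctx)) % table_size
                slots.append(slot)
                n[0] += weight * table[slot][0]
                n[1] += weight * table[slot][1]
            p = (n[bit] + 0.5) / (n[0] + n[1] + 1.0)
            total -= math.log2(p)
            for slot in slots:
                table[slot][bit] += 1
            partial = (partial << 1) | bit
    return total


sample = b"MISSISSIPPI" * 50
for models in ([(0b1, 1.0)], [(0b1, 1.0), (0b11, 1.0)]):
    print(models, round(estimate_bits(sample, models)))
```

Running the estimator for different collections of masks and weights and keeping the smallest result is one simple way to search for a good model set, which also hints at why compression is so slow.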
Function import
  Import by name: Name of each function
    –The import table is a big part of an EXE file
  Import by ordinal: Number instead of name
    –Much smaller but quite incompatible
  Import by hash: Hash code of each function
    –Small and compatible
    –Not supported directly
  Import by hashed ordinal range
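Import by hash can be sketched as follows: the EXE stores only a fixed-size hash per function, and the import code scans a DLL's exported names until one hashes to the stored value. The hash function below is hypothetical (a rotate-and-add chosen for brevity, not Crinkler's), and the `exports` table with its addresses is made up to stand in for a DLL's export directory.

```python
def name_hash(name: str) -> int:
    """Hypothetical 32-bit hash of an export name (illustrative only)."""
    h = 0
    for ch in name.encode("ascii"):
        h = ((h << 5) | (h >> 27)) & 0xFFFFFFFF  # rotate left by 5 bits
        h = (h + ch) & 0xFFFFFFFF
    return h


def resolve_by_hash(exports: dict, wanted_hash: int):
    """Scan the export names and return the address of the first name that matches."""
    for name, address in exports.items():
        if name_hash(name) == wanted_hash:
            return address
    return None


# Made-up stand-in for a DLL export table: name -> address.
exports = {"ExitProcess": 0x2000, "CreateWindowExA": 0x1000, "GetAsyncKeyState": 0x3000}

stored = name_hash("ExitProcess")  # the only thing the EXE needs to store
print(hex(resolve_by_hash(exports, stored)))
```

Each import costs a fixed 4 bytes in this sketch instead of a full name string, and the lookup still works when ordinals differ between DLL versions because it goes through the exported names.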
Header optimization
  DOS header, Section header, PE offset, DOS stub, PE header, Data directories
  544 bytes!
Header optimization
  DOS header, Section header, PE offset, DOS stub, PE header, Data directories
  Ignored
  196 bytes!
Header optimization
  DOS header, Section header, PE offset, DOS stub, PE header, Data directories
  Hash code
  124 bytes + 18 hash codes!
Demo
Future plans
  Windows 2000 compatibility
  Even better compression
  Section reordering
  Transformations
  More feedback
  64k specialized version
Thank you
  Questions? Comments? Suggestions?