The Anatomy of a Zip File How to Carve/Rebuild Zip Files by Hand By Jonathan Glass
Reason for this Presentation
Recently, I was charged with creating a forensic challenge at work that focused on data loss. The scenario included several instances of data exfiltration, but one seemed more challenging than the rest: a zip file that was uploaded directly from a mapped network drive to Google Docs. Participants were given only the memory dump and a dd image of the workstation hard drive to work with. The file was never logically written to the C:\ drive. No other information was provided. So I will attempt to share my disjointed process of recovering the contents of the zip file without using any prior knowledge of the file.
First strings first. I needed a file name.

strings -td mem.dmp | grep -i docs\.google\.com

718149979 ({"id":"0","rt":"3","rd":[{"version":1,"type":"change","payload":"{\"actions\":[{\"actionCategory\":\"open\",\"minimumRole\":20,\"url\":\"https://docs.google.com/file/d/0B5oGkhb7v8qKSmh3S25MMDQxTHc/edit?usp\\u003ddrive_web\"}],\"attributes\":{\"blob_versionable\":true,\"collaboratorsCanInvite\":true,\"copyable\":true,\"downloadable\":true,\"mine\":true,\"shareable\":true,\"subscribed\":true},\"cosmoType\":\"DoclistBlob\",\"fileExtension\":\"zip\",\"fileSize\":153847,\"fileSizeFormatted\":\"153,847 bytes\",\"filters\":[\"items\"],\"id\":\"0B5oGkhb7v8qKSmh3S25MMDQxTHc\",\"lastCollaborator\":{\"email\":\"email@gmail.com\",\"id\":\"06018710951436479518\",\"me\":true,\"name\":\"Hacker Jacks\"},\"lastEditedText\":\"2:52 am\",\"lastEditedUtc\":1414824731418,\"lastModByMeText\":\"2:52 am\",\"lastModByMeUtc\":1414824731418,\"mav\":0,\"mimeType\":\"application/zip\",\"myRole\":40,\"name\":\"DocumentsToRecover.zip\",\"nameKey\":[47,69,45,81,65,49,67,79,77,79,69,75,49,45,69,83,49,75,9,91,57,71,1,26,1,26,0],\"owners\":[{\"email\":\"email@gmail.com\",\"id\":\"06018710951436479518\",\"me\":true,\"name\":\"Hacker Jack"…

Among other interesting information, I found:
File Name : DocumentsToRecover.zip
File Size : 153847
Looking for evidence of the file in memory…

strings -td mem.dmp | grep -i documentstorecover

130693005 DocumentsToRecover.zip
153524527 DocumentsToRecover/ThirdFileToRecover.txt
153525327 DocumentsToRecover/FirstFileToRecover.txt
153525887 DocumentsToRecover/ThirdFileToRecover.txt
170308399 DocumentsToRecover/SecondFileToRecover.txt
170373665 DocumentsToRecover/Seconq
170373793 tion/DocumentsToRecover/y
187321391 DocumentsToRecover
187817478 DocumentsToRecover
190532734 H:\Documentation\DocumentsToRecover.zip
193347292 DocumentsToRecover
193353628 DocumentsToRecover
193420596 DocumentsToRecover.zip.lnk
193693784 DocumentsToRecover.zip.lnk
263979807 DocumentsToRecover/EighthFileToRecover.txtPK
263979895 DocumentsToRecover/FifthFileToRecover.txtPK
263979982 DocumentsToRecover/FirstFileToRecover.txtPK
263980069 DocumentsToRecover/FourthFileToRecover.txtPK
263980157 DocumentsToRecover/SecondFileToRecover.txtPK
263980245 DocumentsToRecover/SeventhFileToRecover.txtPK
263980334 DocumentsToRecover/SixthFileToRecover.txtPK
263980421 DocumentsToRecover/ThirdFileToRecover.txt

Bingo! The file is still in memory. Now what?
Let's take a step back and see what a zip file looks like…

00000000 50 4B 03 04 14 00 00 00 08 00 E2 BD 62 45 22 F2  PK..........bE".
00000010 B9 72 16 00 00 00 17 00 00 00 08 00 00 00 46 69  .r............Fi
00000020 6C 65 2E 74 78 74 73 CB CC 49 0D C9 28 4A 4D 75  le.txts..I..(JMu
00000030 CE CF 2B 49 CD 2B 29 E6 E5 02 41 00 50 4B 01 02  ..+I.+)...A.PK..
00000040 14 00 14 00 00 00 08 00 E2 BD 62 45 22 F2 B9 72  ..........bE"..r
00000050 16 00 00 00 17 00 00 00 08 00 00 00 00 00 00 00  ................
00000060 01 00 20 00 00 00 00 00 00 00 46 69 6C 65 2E 74  .. .......File.t
00000070 78 74 50 4B 05 06 00 00 00 00 01 00 01 00 36 00  xtPK..........6.
00000080 00 00 3C 00 00 00 00 00                          ..<.....

This is a zip archive containing a single file (File.txt).
Simple Summary of Zip File Parts

00000000 50 4B 03 04 14 00 00 00 08 00 E2 BD 62 45 22 F2  PK..........bE".
00000010 B9 72 16 00 00 00 17 00 00 00 08 00 00 00 46 69  .r............Fi
00000020 6C 65 2E 74 78 74 73 CB CC 49 0D C9 28 4A 4D 75  le.txts..I..(JMu
00000030 CE CF 2B 49 CD 2B 29 E6 E5 02 41 00 50 4B 01 02  ..+I.+)...A.PK..
00000040 14 00 14 00 00 00 08 00 E2 BD 62 45 22 F2 B9 72  ..........bE"..r
00000050 16 00 00 00 17 00 00 00 08 00 00 00 00 00 00 00  ................
00000060 01 00 20 00 00 00 00 00 00 00 46 69 6C 65 2E 74  .. .......File.t
00000070 78 74 50 4B 05 06 00 00 00 00 01 00 01 00 36 00  xtPK..........6.
00000080 00 00 3C 00 00 00 00 00                          ..<.....

Local File Header (0x00 to 0x25): each file in the zip gets a local file header.
File Data (0x26 to 0x3B): the compressed/encrypted contents of the file.
Central Directory (0x3C to 0x87): summarizes the local file headers and contains additional info. It ends with the End of Central Directory record at 0x72.
Zip File with 3 Files

00000000 50 4B 03 04 14 00 00 00 00 00 A9 98 6B 45 FB 98  PK..........kE..
00000010 41 19 14 00 00 00 14 00 00 00 09 00 00 00 46 69  A.............Fi
00000020 6C 65 31 2E 74 78 74 46 69 6C 65 4F 6E 65 43 6F  le1.txtFileOneCo
00000030 6E 74 65 6E 74 73 21 21 21 0D 0A 50 4B 03 04 14  ntents!!!..PK...
00000040 00 00 00 00 00 A4 98 6B 45 63 DF 9A 45 14 00 00  .......kEc..E...
00000050 00 14 00 00 00 09 00 00 00 46 69 6C 65 32 2E 74  .........File2.t
00000060 78 74 46 69 6C 65 54 77 6F 43 6F 6E 74 65 6E 74  xtFileTwoContent
00000070 73 21 21 21 0D 0A 50 4B 03 04 14 00 00 00 08 00  s!!!..PK........
00000080 E2 BD 62 45 22 F2 B9 72 16 00 00 00 17 00 00 00  ..bE"..r........
00000090 09 00 00 00 46 69 6C 65 33 2E 74 78 74 73 CB CC  ....File3.txts..
000000A0 49 0D C9 28 4A 4D 75 CE CF 2B 49 CD 2B 29 E6 E5  I..(JMu..+I.+)..
000000B0 02 41 00 50 4B 01 02 14 00 14 00 00 00 00 00 A9  .A.PK...........
000000C0 98 6B 45 FB 98 41 19 14 00 00 00 14 00 00 00 09  .kE..A..........
000000D0 00 00 00 00 00 00 00 01 00 20 00 00 00 00 00 00  ......... ......
000000E0 00 46 69 6C 65 31 2E 74 78 74 50 4B 01 02 14 00  .File1.txtPK....
000000F0 14 00 00 00 00 00 A4 98 6B 45 63 DF 9A 45 14 00  ........kEc..E..
00000100 00 00 14 00 00 00 09 00 00 00 00 00 00 00 01 00  ................
00000110 20 00 00 00 3B 00 00 00 46 69 6C 65 32 2E 74 78   ...;...File2.tx
00000120 74 50 4B 01 02 14 00 14 00 00 00 08 00 E2 BD 62  tPK............b
00000130 45 22 F2 B9 72 16 00 00 00 17 00 00 00 09 00 00  E"..r...........
00000140 00 00 00 00 00 01 00 20 00 00 00 76 00 00 00 46  ....... ...v...F
00000150 69 6C 65 33 2E 74 78 74 50 4B 05 06 00 00 00 00  ile3.txtPK......
00000160 03 00 03 00 A5 00 00 00 B3 00 00 00 00 00        ..............
Local File Header

00000000 50 4B 03 04 14 00 00 00 08 00 E2 BD 62 45 22 F2  PK..........bE".
00000010 B9 72 16 00 00 00 17 00 00 00 08 00 00 00 46 69  .r............Fi
00000020 6C 65 2E 74 78 74                                le.txt
Reading the Local File Header

00000000 50 4B 03 04 14 00 00 00 08 00 E2 BD 62 45 22 F2  PK..........bE".
00000010 B9 72 16 00 00 00 17 00 00 00 08 00 00 00 46 69  .r............Fi
00000020 6C 65 2E 74 78 74                                le.txt

Field                    Bytes        Value
Signature                50 4B 03 04  0x04034b50 (read as a little-endian number)
Version                  14 00        0x14 = 20 decimal, 20 / 10 = version 2.0 (major 2, minor 0)
Flags                    00 00        None
Compression method       08 00        08: deflated
File modification time   E2 BD        23:47:04 (see next slide)
File modification date   62 45        11/2/2014 (see next slide)
CRC-32 checksum          22 F2 B9 72  0x72B9F222 (1924788770)
Compressed size          16 00 00 00  0x16 = 22 bytes
Uncompressed size        17 00 00 00  0x17 = 23 bytes
File name length         08 00        8 chars (F i l e . t x t)
Extra field length       00 00        0 (no extra field)
File name                46 69 ...    "File.txt"
MSDOS Timestamp in 2 minutes
E2 BD 62 45

Time: E2 BD = BD E2 little endian
BD E2 = 10111 (23) 101111 (47) 00010 (2) = 23:47:04
The last field stores the seconds divided by 2, so 00010 (2) means 4 seconds.

Date: 62 45 = 45 62 little endian
45 62 = 0100010 (34) 1011 (11) 00010 (2) = 11/2/2014
0100010 (34) represents the years since 1980, so 1980 + 34 = 2014.

http://msdn.microsoft.com/en-us/library/ms724247%28v=vs.85%29.aspx
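The field-by-field reading above can be sketched in a few lines of Python. This is only an illustration, not a full parser: the function names and the dictionary keys are mine, the sample bytes are the 38 header bytes of the File.txt dump, and the name is decoded as cp437 on the assumption that the UTF-8 flag is not set. Note that the MS-DOS time word stores seconds in two-second units, so the 00010 field decodes to 4 seconds.

```python
import struct
from datetime import datetime

# Fixed 30-byte part of a local file header, little-endian:
# signature, version, flags, method, time, date, CRC-32,
# compressed size, uncompressed size, name length, extra length.
LOCAL_HEADER = struct.Struct("<IHHHHHIIIHH")
LOCAL_SIG = 0x04034B50


def dos_datetime(dos_time, dos_date):
    """Decode the packed MS-DOS time and date words."""
    return datetime(
        year=1980 + (dos_date >> 9),       # 7 bits: years since 1980
        month=(dos_date >> 5) & 0x0F,      # 4 bits
        day=dos_date & 0x1F,               # 5 bits
        hour=dos_time >> 11,               # 5 bits
        minute=(dos_time >> 5) & 0x3F,     # 6 bits
        second=(dos_time & 0x1F) * 2,      # 5 bits, stored as seconds / 2
    )


def parse_local_header(buf, pos=0):
    """Parse the local file header that starts at buf[pos]."""
    (sig, version, flags, method, mtime, mdate, crc,
     csize, usize, name_len, extra_len) = LOCAL_HEADER.unpack_from(buf, pos)
    if sig != LOCAL_SIG:
        raise ValueError("not a local file header")
    name_start = pos + LOCAL_HEADER.size
    return {
        "version": version,
        "flags": flags,
        "method": method,
        "modified": dos_datetime(mtime, mdate),
        "crc32": crc,
        "compressed_size": csize,
        "uncompressed_size": usize,
        # Assumption: names are cp437 (flag bit 11, UTF-8, is not set here).
        "name": buf[name_start:name_start + name_len].decode("cp437"),
        "data_offset": name_start + name_len + extra_len,
    }


# The local file header of File.txt, copied from the dump.
SAMPLE = bytes.fromhex(
    "50 4B 03 04 14 00 00 00 08 00 E2 BD 62 45 22 F2"
    "B9 72 16 00 00 00 17 00 00 00 08 00 00 00 46 69"
    "6C 65 2E 74 78 74"
)
info = parse_local_header(SAMPLE)
print(info["name"], info["modified"], hex(info["crc32"]))
# File.txt 2014-11-02 23:47:04 0x72b9f222
```

The "data_offset" value is where the compressed file data begins, which is what the carving steps later in the deck need.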
File headers & file data get stacked upon each other until we get to the Central Directory.
(The standard supports additional fields depending on the options used to create the archive. I'm just hitting the highlights.)

Local File 1 Header
File 1 Data
Local File 2 Header
File 2 Data
Local File 3 Header
File 3 Data
. . .
Local File N Header
File N Data
Central Directory
Central Directory
The central directory contains more metadata about the files in the archive, along with encryption information and information about Zip64 (64-bit zip) archives. It also contains information about archives that span multiple files.
It sits at the end of the file! This can be problematic for exfiltrating large archives over sketchy connections (Tor). This is why attackers often use the RAR format, which includes the metadata at the beginning of the file. That allows some content to be recovered even if only part of the archive is received.
Central Directory

000000B0 02 41 00 50 4B 01 02 14 00 14 00 00 00 00 00 A9  .A.PK...........
000000C0 98 6B 45 FB 98 41 19 14 00 00 00 14 00 00 00 09  .kE..A..........
000000D0 00 00 00 00 00 00 00 01 00 20 00 00 00 00 00 00  ......... ......
000000E0 00 46 69 6C 65 31 2E 74 78 74 50 4B 01 02 14 00  .File1.txtPK....
000000F0 14 00 00 00 00 00 A4 98 6B 45 63 DF 9A 45 14 00  ........kEc..E..
00000100 00 00 14 00 00 00 09 00 00 00 00 00 00 00 01 00  ................
00000110 20 00 00 00 3B 00 00 00 46 69 6C 65 32 2E 74 78   ...;...File2.tx
00000120 74 50 4B 01 02 14 00 14 00 00 00 08 00 E2 BD 62  tPK............b
00000130 45 22 F2 B9 72 16 00 00 00 17 00 00 00 09 00 00  E"..r...........
00000140 00 00 00 00 00 01 00 20 00 00 00 76 00 00 00 46  ....... ...v...F
00000150 69 6C 65 33 2E 74 78 74 50 4B 05 06 00 00 00 00  ile3.txtPK......
00000160 03 00 03 00 A5 00 00 00 B3 00 00 00 00 00        ..............

File header 1: starts at 0xB3 (File1.txt)
File header 2: starts at 0xEA (File2.txt)
File header 3: starts at 0x121 (File3.txt)
End of Central Directory Record: starts at 0x158
Central Directory File Header Format Just as before, not all fields are required.
Central Dir Vs Local Headers
Fields in the Central Directory Header not present in Local Headers:
File comm. len: the length of the file comment
Disk # start: the number of the disk on which this file exists
Internal attr: internal file attributes
External attr: external file attributes (operating system specific)
Offset of local header: relative offset of the local header, i.e. where to find the corresponding local file header, measured from the start of the first disk
Extra field
File comment
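To make the extra fields concrete, here is a minimal Python sketch that unpacks the fixed 46-byte part of a central directory file header. The function name and dictionary keys are my own choices, and the sample is the header found at offset 0x3C of the single-file dump (File.txt). Optional extra field and comment contents are skipped, not interpreted.

```python
import struct

# Fixed 46-byte part of a central directory file header, little-endian:
# signature, version made by, version needed, flags, method, time, date,
# CRC-32, compressed size, uncompressed size, name length, extra length,
# comment length, disk # start, internal attr, external attr, local offset.
CD_HEADER = struct.Struct("<I6H3I5H2I")
CD_SIG = 0x02014B50


def parse_cd_header(buf, pos=0):
    """Parse the central directory file header that starts at buf[pos]."""
    (sig, made_by, needed, flags, method, mtime, mdate, crc, csize, usize,
     name_len, extra_len, comment_len, disk_start, internal_attr,
     external_attr, local_offset) = CD_HEADER.unpack_from(buf, pos)
    if sig != CD_SIG:
        raise ValueError("not a central directory file header")
    name_start = pos + CD_HEADER.size
    return {
        # Assumption: cp437 name (UTF-8 flag not set).
        "name": buf[name_start:name_start + name_len].decode("cp437"),
        "compressed_size": csize,
        "uncompressed_size": usize,
        "crc32": crc,
        "comment_len": comment_len,
        "disk_start": disk_start,
        "internal_attr": internal_attr,
        "external_attr": external_attr,
        "local_header_offset": local_offset,
        # Where the next central directory header (or the EOCD) begins.
        "next": name_start + name_len + extra_len + comment_len,
    }


# Central directory header for File.txt, copied from the single-file dump.
SAMPLE_CD = bytes.fromhex(
    "50 4B 01 02 14 00 14 00 00 00 08 00 E2 BD 62 45"
    "22 F2 B9 72 16 00 00 00 17 00 00 00 08 00 00 00"
    "00 00 00 00 01 00 20 00 00 00 00 00 00 00"
    "46 69 6C 65 2E 74 78 74"
)
entry = parse_cd_header(SAMPLE_CD)
print(entry["name"], entry["local_header_offset"], hex(entry["external_attr"]))
```

The CRC and sizes repeat what the local header already says, which is why a matching pair of headers is a good sanity check when carving.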
End of central directory record
Signature: The signature of the end of central directory record. This is always '\x50\x4b\x05\x06'.
Disk Number: The number of this disk (containing the end of central directory record)
Disk # w/cd: Number of the disk on which the central directory starts
Disk entries: The number of central directory entries on this disk
Total entries: Total number of entries in the central directory
Central directory size: Size of the central directory in bytes
Offset of cd wrt starting disk: Offset of the start of the central directory on the disk on which the central directory starts
Comment len: The length of the following comment field
ZIP file comment: Optional comment for the Zip file
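A small Python sketch of how a reader could locate and unpack this record. It simply searches backwards for the signature, which is a simplification (a comment containing the same four bytes would confuse it), and the function name is mine. The sample is the 22-byte record at the end of the single-file dump.

```python
import struct

# End of central directory record, little-endian, 22 bytes plus comment:
# signature, disk number, disk # w/cd, disk entries, total entries,
# central directory size, central directory offset, comment length.
EOCD = struct.Struct("<IHHHHIIH")
EOCD_SIG = b"PK\x05\x06"


def find_eocd(buf):
    """Locate the last EOCD signature in buf and unpack the record."""
    pos = buf.rfind(EOCD_SIG)
    if pos < 0 or pos + EOCD.size > len(buf):
        raise ValueError("no end of central directory record found")
    (_sig, disk_no, cd_disk, disk_entries, total_entries,
     cd_size, cd_offset, comment_len) = EOCD.unpack_from(buf, pos)
    return {
        "position": pos,
        "disk_number": disk_no,
        "cd_disk": cd_disk,
        "disk_entries": disk_entries,
        "total_entries": total_entries,
        "cd_size": cd_size,
        "cd_offset": cd_offset,
        "comment_len": comment_len,
    }


# EOCD record of the single-file archive, copied from the dump.
SAMPLE_EOCD = bytes.fromhex(
    "50 4B 05 06 00 00 00 00 01 00 01 00 36 00 00 00 3C 00 00 00 00 00"
)
record = find_eocd(SAMPLE_EOCD)
print(record["total_entries"], hex(record["cd_size"]), hex(record["cd_offset"]))
```

For that archive the record says: one entry, a 0x36-byte central directory, starting at offset 0x3C, which matches the layout of the dump.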
Now back to our case…

strings -td mem.dmp | grep -i documentstorecover

130693005 DocumentsToRecover.zip
153524527 DocumentsToRecover/ThirdFileToRecover.txt
153525327 DocumentsToRecover/FirstFileToRecover.txt
153525887 DocumentsToRecover/ThirdFileToRecover.txt
170308399 DocumentsToRecover/SecondFileToRecover.txt
170373665 DocumentsToRecover/Seconq
170373793 tion/DocumentsToRecover/y
187321391 DocumentsToRecover
187817478 DocumentsToRecover
190532734 H:\Documentation\DocumentsToRecover.zip
193347292 DocumentsToRecover
193353628 DocumentsToRecover
193420596 DocumentsToRecover.zip.lnk
193693784 DocumentsToRecover.zip.lnk
263979807 DocumentsToRecover/EighthFileToRecover.txtPK
263979895 DocumentsToRecover/FifthFileToRecover.txtPK
263979982 DocumentsToRecover/FirstFileToRecover.txtPK
263980069 DocumentsToRecover/FourthFileToRecover.txtPK
263980157 DocumentsToRecover/SecondFileToRecover.txtPK
263980245 DocumentsToRecover/SeventhFileToRecover.txtPK
263980334 DocumentsToRecover/SixthFileToRecover.txtPK
263980421 DocumentsToRecover/ThirdFileToRecover.txt

File names all in a row? Reminds me of Local File Headers.
File names followed by "PK" all in a row? Looks like Central Directory File Headers to me.
My Game Plan
As far as I can tell, I am looking at a zip file containing 8 files inside one directory:

DocumentsToRecover
DocumentsToRecover/EighthFileToRecover.txt
DocumentsToRecover/FifthFileToRecover.txt
DocumentsToRecover/FirstFileToRecover.txt
DocumentsToRecover/FourthFileToRecover.txt
DocumentsToRecover/SecondFileToRecover.txt
DocumentsToRecover/SeventhFileToRecover.txt
DocumentsToRecover/SixthFileToRecover.txt
DocumentsToRecover/ThirdFileToRecover.txt

Given the space between the Local File Headers and the Central Directory Headers (the hits are more than 100 MB apart, while the archive is only 153,847 bytes), I am guessing this zip file is not in one contiguous chunk. Grabbing the entire zip file seems improbable.
My Game Plan
Grab each file individually by creating 8 zip files.
Carve each file's compressed File Data
Create/Carve the Local File Header
Create/Carve the Central Directory
Unzip
Repeat
Sounds easy enough, right?
**I have since discovered better & faster methods of accomplishing this, but this way is the most educational. Let's do it the long way first.
Looking for Headers

strings -td /mnt/hgfs/DEMO/zip/mem.dmp | grep -i FirstFileToRecover.txt
186356826 DocumentsToRecover/FirstFileToRecover.txtPK
Only the Central Directory header (file name followed by "PK"). No local header.

strings -td /mnt/hgfs/DEMO/zip/mem.dmp | grep -i SecondFileToRecover.txt
186357001 DocumentsToRecover/SecondFileToRecover.txtPK
186441378 DocumentsToRecover/SecondFileToRecover.txt}SKn
BOTH HEADERS!
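The same check can be scripted instead of eyeballing what follows the file name. The sketch below is my own illustration, not the method used on the slides: it relies on the fact that the file name starts 30 bytes into a local file header and 46 bytes into a central directory file header, so it looks for the matching signature that far before each hit. It assumes the full stored name (directory included) is searched for, byte for byte.

```python
LOCAL_SIG = b"PK\x03\x04"   # local file header signature
CD_SIG = b"PK\x01\x02"      # central directory file header signature


def classify_hits(buf, name):
    """Return (offset, kind) for every occurrence of name in buf."""
    hits = []
    pos = buf.find(name)
    while pos != -1:
        if pos >= 30 and buf[pos - 30:pos - 26] == LOCAL_SIG:
            kind = "local header"
        elif pos >= 46 and buf[pos - 46:pos - 42] == CD_SIG:
            kind = "central directory header"
        else:
            kind = "other"
        hits.append((pos, kind))
        pos = buf.find(name, pos + 1)
    return hits


# Tiny synthetic buffer: a fake local header, a fake central directory
# header, and a bare copy of the name (all header fields zeroed).
name = b"dir/a.txt"
demo = (b"\x00" * 5 + LOCAL_SIG + b"\x00" * 26 + name + b"junk"
        + CD_SIG + b"\x00" * 42 + name + b"PK" + name)
for offset, kind in classify_hits(demo, name):
    print(offset, kind)

# Against a real dump, something like this should work (mmap objects
# support find() and slicing); the path and name are from the slides:
#
#   import mmap
#   with open("mem.dmp", "rb") as f, \
#           mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as dump:
#       print(classify_hits(dump, b"DocumentsToRecover/SecondFileToRecover.txt"))
```

A hit classified as "local header" means the header itself begins 30 bytes before the reported offset, which is the position to seek to in the next step.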
SecondFileToRecover.txt Seek to the offset of the local header Copy local header to new file.
SecondFileToRecover.txt Seek to the offset of the local header Copy local header to new file. Look at the size of the compressed file 0x021D = 541 bytes Seek 541 bytes from the end of the local header…
Copy 541 bytes (File Data) from the end of the local header and paste them into the new file. The compressed file data ends exactly where another local header starts, which confirms the size. Now we need to build the footer.
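The seek-and-copy steps can be sketched in Python as follows. This is a hedged illustration of the manual hex editor work, with a function name of my own. It assumes the entry is contiguous in the dump and that the compressed size in the local header is filled in (archives written with a trailing data descriptor store 0 there). The demo reuses the first 64 bytes of the File.txt dump, where the name starts at offset 30.

```python
import struct


def carve_entry(buf, name_offset):
    """Return local header + compressed data for the entry whose
    file name starts at name_offset (as reported by strings -td)."""
    start = name_offset - 30            # the name sits 30 bytes into the header
    if buf[start:start + 4] != b"PK\x03\x04":
        raise ValueError("no local file header before this name")
    (csize,) = struct.unpack_from("<I", buf, start + 18)          # compressed size
    name_len, extra_len = struct.unpack_from("<HH", buf, start + 26)
    end = start + 30 + name_len + extra_len + csize
    if end > len(buf):
        raise ValueError("entry runs past the end of the buffer")
    return bytes(buf[start:end])


# First four rows of the single-file dump (File.txt).
DUMP = bytes.fromhex(
    "50 4B 03 04 14 00 00 00 08 00 E2 BD 62 45 22 F2"
    "B9 72 16 00 00 00 17 00 00 00 08 00 00 00 46 69"
    "6C 65 2E 74 78 74 73 CB CC 49 0D C9 28 4A 4D 75"
    "CE CF 2B 49 CD 2B 29 E6 E5 02 41 00 50 4B 01 02"
)
carved = carve_entry(DUMP, 30)
print(len(carved), "bytes carved")
# Sanity check from the slide: the next bytes should start another "PK" record.
print(DUMP[len(carved):len(carved) + 2] == b"PK")
```

For SecondFileToRecover.txt the same arithmetic gives 72 header bytes plus 541 data bytes.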
Seek to the offset of the "file name + PK" hit we found earlier (the Central Directory Header itself starts 46 bytes before the file name). Copy the Central Directory Header to the new file.
Everything should be good in the CD header except for the local header offset. We need to change it to 0x00000000 because this is the first/only file in the archive: the local header starts at the beginning of the file. Now we need an End of central directory record to make a complete zip file.
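In a script, the same fix is a four-byte overwrite at offset 42 of the central directory header, where the "offset of local header" field lives. The sketch below is illustrative only; the function name is mine, and the sample is the File3.txt header from the three-file dump, which originally points at 0x76.

```python
import struct

LOCAL_OFFSET_POS = 42   # position of "offset of local header" in a CD header


def patch_local_offset(cd_header, new_offset=0):
    """Return a copy of cd_header with the local header offset replaced."""
    if cd_header[:4] != b"PK\x01\x02":
        raise ValueError("not a central directory file header")
    patched = bytearray(cd_header)
    struct.pack_into("<I", patched, LOCAL_OFFSET_POS, new_offset)
    return bytes(patched)


# Central directory header for File3.txt, copied from the three-file dump.
FILE3_CD = bytes.fromhex(
    "50 4B 01 02 14 00 14 00 00 00 08 00 E2 BD 62 45"
    "22 F2 B9 72 16 00 00 00 17 00 00 00 09 00 00 00"
    "00 00 00 00 01 00 20 00 00 00 76 00 00 00"
    "46 69 6C 65 33 2E 74 78 74"
)
fixed = patch_local_offset(FILE3_CD)
before = struct.unpack_from("<I", FILE3_CD, LOCAL_OFFSET_POS)[0]
after = struct.unpack_from("<I", fixed, LOCAL_OFFSET_POS)[0]
print(hex(before), "->", hex(after))
```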
Let's Just Build One

00000000 50 4B 05 06 00 00 00 00 01 00 01 00 58 00 00 00  PK..........X...
00000010 65 02 00 00 00 00                                e.....

Signature (50 4B 05 06): The signature of the end of central directory record. This is always '\x50\x4b\x05\x06'.
Disk Number (00 00): Needs to be 00 because the archive is a single file, so this is the first/only disk.
Disk # w/cd (00 00): Also 00, because the central directory starts on that same disk.
Disk entries (01 00): Needs to be set to 01 because there is only one central directory entry on this disk.
Total entries (01 00): Total number of entries in the central directory is 01 because we only have one file.
Central directory size (58 00 00 00): The central directory header is 88 (0x58) bytes long.
Offset of cd wrt starting disk (65 02 00 00): Local Header (72 bytes) + File Data (541 bytes) = 613 (0x265), written little endian as 65 02.
Comment len (00 00): No comment needed.
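The record can also be generated instead of typed by hand. A minimal sketch with struct.pack, assuming a single-disk archive; the function and parameter names are mine. Fed the numbers from this slide (72 + 541 bytes before the central directory, 88 bytes of central directory), it reproduces the 22 bytes shown above.

```python
import struct


def build_eocd(cd_offset, cd_size, entries=1, comment=b""):
    """Build an End of central directory record for a single-disk archive."""
    return struct.pack(
        "<4sHHHHIIH",
        b"PK\x05\x06",   # signature
        0,               # disk number
        0,               # disk on which the central directory starts
        entries,         # central directory entries on this disk
        entries,         # total central directory entries
        cd_size,         # size of the central directory in bytes
        cd_offset,       # offset of the central directory
        len(comment),    # comment length
    ) + comment


record = build_eocd(cd_offset=72 + 541, cd_size=88)
print(record.hex(" "))
# 50 4b 05 06 00 00 00 00 01 00 01 00 58 00 00 00 65 02 00 00 00 00
```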
Add End of central directory record to the end of the new file and save! Also cross fingers
Extraction works!
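If you would rather not rely on crossed fingers, Python's standard zipfile module can confirm that a rebuilt archive parses and that the CRC of each member checks out. This is just one way to verify the result, not what was used for the slides; the helper name is mine, and the demo builds a small throwaway archive in memory and then damages one data byte.

```python
import io
import zipfile


def verify_zip(source):
    """Return (member names, first corrupt member or None).

    source may be a path or a file-like object.
    """
    with zipfile.ZipFile(source) as zf:
        return zf.namelist(), zf.testzip()


# Throwaway archive with one stored (uncompressed) member.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
    zf.writestr("a.txt", b"hello")
good = buf.getvalue()

# Flip one byte of the file data so the CRC-32 no longer matches.
bad = bytearray(good)
bad[good.index(b"hello")] ^= 0xFF

print(verify_zip(io.BytesIO(good)))
print(verify_zip(io.BytesIO(bytes(bad))))
```

A result of None in the second position means every member decompressed and matched its CRC-32.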
Don't feel like Looking For/Creating a Central Directory? Don't!
All of the information you need to decompress the zip is present in the local file header. If you have the local header and the file data, you don't need the Central Directory!
Not every ZIP decompressor will fall back to local file headers when the index is unavailable, but the tar and cpio front ends to libarchive (bsdtar and bsdcpio) can do streaming decompression when reading through a pipe. It will generate errors, but it does work.
Great for memdumps and pcaps.
To install on SIFT: apt-get install bsdtar
3 Local File Headers & File Data but NO CD Still Worked
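To show why this works, here is a rough Python sketch of the same idea: walk a buffer, and for every local file header signature use the header's own size, method and CRC-32 to pull the file out, never touching a central directory. It is not how bsdtar is implemented, only an illustration. It handles stored and deflated entries, skips entries that use a trailing data descriptor (flag bit 3, where the header sizes are zero), and the function name is mine.

```python
import struct
import zlib

LOCAL_HEADER = struct.Struct("<4sHHHHHIIIHH")   # fixed 30-byte part
LOCAL_SIG = b"PK\x03\x04"


def recover_from_local_headers(buf):
    """Return [(name, data)] for every intact entry found in buf."""
    recovered = []
    pos = buf.find(LOCAL_SIG)
    while pos != -1:
        resume = pos + 4                      # default: keep scanning after this hit
        if pos + LOCAL_HEADER.size <= len(buf):
            (_sig, _ver, flags, method, _time, _date, crc, csize, _usize,
             name_len, extra_len) = LOCAL_HEADER.unpack_from(buf, pos)
            name_start = pos + LOCAL_HEADER.size
            data_start = name_start + name_len + extra_len
            raw = buf[data_start:data_start + csize]
            usable = (not flags & 0x08        # sizes are in the header
                      and len(raw) == csize   # data is fully present
                      and method in (0, 8))   # stored or deflated
            if usable:
                try:
                    # wbits=-15 selects a raw deflate stream (no zlib wrapper).
                    data = raw if method == 0 else zlib.decompress(raw, -15)
                except zlib.error:
                    data = None
                if data is not None and zlib.crc32(data) == crc:
                    name = buf[name_start:name_start + name_len].decode("cp437")
                    recovered.append((name, data))
                    resume = data_start + csize
        pos = buf.find(LOCAL_SIG, resume)
    return recovered


# Usage sketch (the file name is hypothetical):
#
#   with open("carved_local_headers.bin", "rb") as f:
#       for name, data in recover_from_local_headers(f.read()):
#           print(name, len(data))
```

The CRC-32 comparison is what keeps false "PK" hits in a memory dump from producing garbage output.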
Why is it important to understand Zip files?
Many popular file types use the Zip standard:
Java JAR (EAR, RAR (Java), WAR)
Office Open XML (Microsoft) 2007 and greater (docx, xlsx, pptx)
Open Packaging Conventions
OpenDocument (ODF)
XPI (Mozilla extensions)
The only native compression method included with Windows.
References
http://en.wikipedia.org/wiki/Zip_(file_format)#Structure
https://users.cs.jmu.edu/buchhofp/forensics/formats/pkzip.html
http://www.garykessler.net/library/file_sigs.html
http://petlibrary.tripod.com/ZIP.HTM
http://www.pkware.com/documents/casestudies/APPNOTE.TXT
http://www.pelib.com/resources/luevel.txt
http://www.diskinternals.com/zip-repair/
Questions?