The Evolution of File Carving. Presenter: Muhammad Mohsin Butt (g201103010). COE589 Paper Presentation.

Contents
- Introduction
- Background
- Traditional Recovery
- File Carving
- Smart Carver
- Conclusion

Introduction
- This survey presents various file carving techniques.
- File carving is a forensic technique that recovers data based on file structure and content; no file system metadata is required.
- The main focus of the paper is on file carving techniques for fragmented data.

Background
- File system: the part of the OS that manages creation, deletion, allocation, and various other operations on files.
- FAT32 and NTFS are the most common file systems for Windows.
- The basic unit of data storage on disk is the cluster; cluster sizes are usually multiples of 512 bytes.

Background: Recovery in FAT32 (File Allocation Table)
Files can be allocated in different ways:
- Contiguous allocation
- Linked allocation
- Indexed allocation

Background (figure): diagrams of contiguous allocation and linked allocation.

Background (figure): diagram of indexed allocation.

Traditional Recovery Techniques
- These recovery techniques use the metadata of the file system to recover data.
- (Figure) Data storage in FAT32.

Traditional Recovery Techniques (figure): deletion and recovery in FAT32.

File Carving
- What if we don't have file system metadata?
- File carving recovers data without using file system information; knowledge of the structure of the files to be recovered is used instead.
- File carving can be divided into two categories:
  - File carving for non-fragmented data.
  - File carving for fragmented data.

File Carving (First Generation)
- Performed well for non-fragmented data.
- In forensics, user data (images, documents, etc.) is the most important to recover.
- The search pool is reduced by removing operating system files, which are detected using their MD5 hashes and keywords (a sketch follows this list).
- Byte sequences at prescribed offsets are used to identify files.
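
A minimal sketch of the hash-based pruning step; is_known_file and known_md5s are hypothetical names, and in practice the reference hashes would come from a known-file database:

```python
import hashlib

def is_known_file(data: bytes, known_md5s: set[str]) -> bool:
    """Return True when the data's MD5 appears in a reference set of
    known operating system files, so it can be dropped from the pool."""
    return hashlib.md5(data).hexdigest() in known_md5s
```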

File Carving (First Generation)
- Header and footer information of the files to be recovered is used (see the sketch below).
- A JPEG image's header cluster begins with the byte sequence FFD8; its footer cluster contains the sequence FFD9.
- Some file types have no footer information. A BMP image instead records the file size, number of clusters, and other information in its header; the number of unallocated clusters indicated by the BMP header are merged for recovery.
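
To make the header/footer rule concrete, here is a minimal header-to-footer carving sketch; it is not the implementation of any tool in the survey, and the max_size cap is an assumed safeguard against runaway matches:

```python
JPEG_HEADER = b"\xFF\xD8"   # start-of-image marker (FFD8)
JPEG_FOOTER = b"\xFF\xD9"   # end-of-image marker (FFD9)

def carve_jpegs(image: bytes, max_size: int = 10 * 2**20):
    """Yield candidate JPEGs from a raw disk image by pairing each header
    with the nearest following footer. Only correct for files whose
    clusters are stored contiguously, as the slides note."""
    start = image.find(JPEG_HEADER)
    while start != -1:
        end = image.find(JPEG_FOOTER, start, start + max_size)
        if end != -1:
            yield image[start:end + 2]          # include the footer bytes
        start = image.find(JPEG_HEADER, start + 1)
```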

File Carving (First Generation)
- The Foremost tool implemented both header-to-footer carving and carving based on the header plus file size information.
- Scalpel, built on the Foremost engine, improved the performance and memory usage of these carving techniques.
- Both suffer degraded performance when data is fragmented.

Fragmentation
- As files are edited, modified, and deleted, most hard drives become fragmented; the degree also depends on the allocation methodology of the file system.
- Fragmentation in forensically important files such as email and Word documents is high. Why? Because of constant editing, deletion, and addition; PST files are among the most fragmented.
- Wear-leveling algorithms in next-generation drives (SSDs) also cause fragmentation.

Fragmentation (figure): fragmented file recovery.

Graph Theoretic Carvers
- Provide recovery of fragmented files.
- Recovery is formulated as a Hamiltonian path problem, solved using alpha-beta heuristics.

Hamiltonian Path Problem
- Given a set of clusters, find a permutation of the clusters that recovers the correct file.
- Identify pairs of clusters that were adjacent in the original document.
- Assign weights between clusters representing the likelihood of one cluster following the other in the original file.
- The best permutation is the one that maximizes the weights between adjacent clusters.

Hamiltonian Path Problem
- Formulated as a graph: vertices represent clusters, and edge weights represent the adjacency likelihood between clusters.
- The problem reduces to finding a maximum-weight Hamiltonian path in this graph.

Assigning Weights
- Weight assignment is the key step in this type of carving (a sketch follows).
- The Prediction by Partial Matching (PPM) technique is used for assigning weights; PPM works well for text.
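
PPM conditions on multi-byte contexts; as a hedged, much simpler stand-in, the sketch below scores a cluster boundary with a bigram model. train_bigrams and boundary_weight are hypothetical helpers used to fill the edge weights of the graph from the previous slide:

```python
from collections import Counter
import math

def train_bigrams(sample: bytes) -> Counter:
    """Count adjacent byte pairs in representative training text."""
    return Counter(zip(sample, sample[1:]))

def boundary_weight(cluster_a: bytes, cluster_b: bytes, bigrams: Counter) -> float:
    """Log-likelihood that cluster_b directly follows cluster_a, judged
    only from the byte pair straddling the boundary. Laplace smoothing
    keeps unseen pairs from scoring minus infinity."""
    total = sum(bigrams.values())
    pair = (cluster_a[-1], cluster_b[0])
    return math.log((bigrams[pair] + 1) / (total + 256 * 256))
```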

Assigning Weights (figure): weight assignment in images.

K-Vertex Disjoint Path Problem
- The Hamiltonian path method assumed that all clusters belong to the same file; in real systems, multiple files are fragmented together.
- Headers of the various files are identified from the pool of clusters, and the weighted graph is formed again.
- K disjoint paths are then found in this graph using various algorithms, where k is the number of headers found in the previous step.
- Developed primarily for recovering images.

K-Vertex Disjoint Path Problem
- Various algorithms exist to find k disjoint paths; Unique Path (UP) algorithms provide the best performance.
- In a UP algorithm, each cluster is assigned to only one file, so an incorrect assignment may cause two files to be recovered incorrectly.
- Two UP algorithms are covered next: the Parallel Unique Path algorithm and the Shortest Path First algorithm.

Parallel Unique Path (PUP)
A variation of Dijkstra's single-source shortest path algorithm (a sketch follows this list):
1. Given k headers and a pool of clusters.
2. Find the best cluster match for each of the headers.
3. From the matches found in the previous step, take the best one and assign it to its header.
4. Remove the chosen cluster from the available cluster pool.
5. Find the best match for the newly assigned cluster and repeat from step 3 until all files are recovered.
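
A minimal sketch of this greedy loop, assuming clusters are represented by hashable IDs and weight(a, b) returns a precomputed adjacency score (higher is better); unlike the full algorithm, this simplified version does not stop a path when its file's footer is reached:

```python
def pup_reassemble(headers, pool, weight):
    """Greedy Parallel Unique Path sketch: repeatedly commit the single
    best tail-to-cluster match across all growing paths."""
    paths = {h: [h] for h in headers}   # one growing path per file
    tails = {h: h for h in headers}     # current end of each path
    pool = set(pool)
    while pool:
        # Best candidate successor for every path, then the best overall.
        h, c = max(((h, c) for h in headers for c in pool),
                   key=lambda hc: weight(tails[hc[0]], hc[1]))
        paths[h].append(c)              # extend the winning path
        tails[h] = c                    # its new tail is matched next round
        pool.remove(c)                  # each cluster is used only once
    return paths
```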

Shortest Path First
- Based on the idea that the best recoveries have the lowest average path costs, where the average path cost is the sum of the weights between the clusters of a recovered file divided by the number of clusters (a sketch follows this list).
- Takes one image at a time and reconstructs it; after reconstruction, the clusters used are not removed from the cluster pool.
- This process is repeated for all the images; of all the recovered images, the one with the lowest path cost is taken as the best recovery, and only then are its clusters removed from the pool.
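
A sketch of the cost measure, assuming cost(a, b) is a cost between consecutive clusters (lower is better), i.e. the inverse view of the likelihood weights above:

```python
def average_path_cost(path, cost):
    """Average path cost as the slides define it: the sum of the costs
    between consecutive clusters of a recovered file divided by the
    number of clusters. The recovery with the lowest value is committed,
    and only then are its clusters removed from the pool."""
    if len(path) < 2:
        return float("inf")             # no boundaries, no evidence
    total = sum(cost(a, b) for a, b in zip(path, path[1:]))
    return total / len(path)
```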

Results
- Shortest Path First achieves an accuracy of 88%; PUP achieves 83% but is faster.
- Both require edge weights to be precomputed; for large hard drives, the need to compute likelihood weights between clusters is a major drawback.

BiFragment Gap Carving
- Most fragmented files in real-world data are bi-fragmented (split into two fragments).
- This technique works for files with a known header and footer; the files must be decodable or validated via their structure.
- Works by searching the combinations of gap position and size between an identified header and footer (a sketch follows).
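
A sketch of the search under stated assumptions: clusters is the run of clusters from the identified header cluster through the footer cluster, and validate is a file-type validator (e.g. a full JPEG decode) returning True or False:

```python
def bifragment_carve(clusters: list[bytes], validate):
    """Try every split point and gap size between a header and footer,
    returning the first two-fragment combination that validates."""
    n = len(clusters)
    for split in range(1, n):               # clusters kept in fragment 1
        for gap in range(1, n - split):     # clusters skipped in between
            candidate = b"".join(clusters[:split] + clusters[split + gap:])
            if validate(candidate):
                return candidate
    return None                             # no gap placement validated
```

The contiguous case (a gap of zero) is what plain header-to-footer carving already covers, so the sketch only searches genuine gaps.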

Smart Carver
- Works on both fragmented and non-fragmented data, with a wide variety of file types supported.
- Operates in three phases:
  1. Preprocessing: data clusters are decrypted or decompressed.
  2. Collating: clusters are classified into various file types.
  3. Reassembly: the clusters of each file are pieced back together.

Smart Carver (Preprocessing)
- Compressed and encrypted drives are decompressed/decrypted in this stage.
- Known clusters are removed from the disk based on file system metadata, which increases speed and reduces the amount of data passed to the next phases.
- Allocated files and operating-system-specific data can be pruned, since they are of no use in the carving process (a sketch follows).
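
A minimal pruning sketch; prune_allocated is a hypothetical helper, with clusters keyed by ID and the allocated set taken from the file system metadata:

```python
def prune_allocated(all_clusters: dict, allocated_ids: set) -> dict:
    """Drop clusters that the file system metadata marks as allocated,
    shrinking the pool the collating and reassembly phases must handle."""
    return {cid: data for cid, data in all_clusters.items()
            if cid not in allocated_ids}
```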

Smart Carver (Collating)
- Classifies the disk clusters as belonging to certain file types, reducing the cluster pool used in recovering files of each type.
- Keyword/pattern matching: look for characteristic sequences to determine a cluster's type; e.g. tags in a cluster collate it to an HTML file.
- ASCII character frequency: a high frequency of printable ASCII characters indicates the data is not video or image content (sketch below).
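
A sketch of these two heuristics; the helper names and the 0.9 threshold are illustrative assumptions, not values from the survey:

```python
def printable_ratio(cluster: bytes) -> float:
    """Fraction of printable ASCII (plus tab/LF/CR) bytes in a cluster."""
    if not cluster:
        return 0.0
    ok = sum(1 for b in cluster if 32 <= b <= 126 or b in (9, 10, 13))
    return ok / len(cluster)

def collate(cluster: bytes) -> str:
    """Toy collation: HTML tags suggest an HTML file; mostly printable
    bytes suggest some text type; otherwise treat as binary data."""
    if b"<html" in cluster.lower() or b"</" in cluster:
        return "html"
    if printable_ratio(cluster) > 0.9:      # assumed threshold
        return "text"
    return "binary"                         # image, video, compressed, ...
```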

Smart Carver (Collating)
- File fingerprints use the Byte Frequency Distribution (BFD) to determine the type of a file; the BFD is generated by creating a histogram of the file's byte values (sketch below).
- A centroid model for each file type is created using the mean and standard deviation of each byte value's frequency.
- These fingerprints still struggle to differentiate JPEG from ZIP data; cluster classification remains a hot research topic.
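
A sketch of BFD fingerprinting, assuming a centroid (per-byte mean and standard deviation vectors) has already been trained for each file type; a cluster is then assigned to the nearest centroid:

```python
def byte_frequency_distribution(data: bytes) -> list[float]:
    """Normalized 256-bin histogram of byte values: the BFD."""
    hist = [0] * 256
    for b in data:
        hist[b] += 1
    return [h / len(data) for h in hist]

def centroid_distance(cluster: bytes, mean: list[float], std: list[float]) -> float:
    """Distance from a cluster's BFD to a file type's centroid model;
    each bin is scaled by its standard deviation so naturally noisy
    byte values count for less."""
    bfd = byte_frequency_distribution(cluster)
    return sum(abs(f - m) / (s + 1e-9)      # epsilon guards zero std dev
               for f, m, s in zip(bfd, mean, std))
```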

Smart Carver (Reassembly)
Reassembly is done by:
- Finding the starting fragment of a file, i.e. the fragment that contains the header.
- Merging clusters belonging to the same fragment.
- Finding the fragmentation point, i.e. the last cluster of the current fragment.
- Finding the starting point of the next fragment.
- Finding the ending point of the last fragment, i.e. the cluster containing the footer.

Smart Carver (Reassembly)
Merging of similar clusters can be done in two ways:
- Keyword/dictionary: a word is formed across the boundary between two clusters; e.g. if one cluster ends with "he" and the next starts with "llo World", the two can be merged (a sketch follows this list).
- File structure: length fields in headers indicate the length of the data. E.g. in a PNG file, if a chunk's length value is k, the CRC of the associated data appears k bytes later; if that CRC matches the data in between, all the clusters in between can be merged, otherwise fragmentation is present.
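
A sketch of the keyword/dictionary test, assuming words is a set of lowercase byte-string dictionary words; real merging would also handle punctuation and apply the file-structure checks above:

```python
def merge_on_word(cluster_a: bytes, cluster_b: bytes, words: set) -> bool:
    """Join the partial word ending cluster_a with the partial word
    starting cluster_b and test it against a dictionary, so that
    "he" + "llo" merges into the known word "hello"."""
    parts_a = cluster_a.rsplit(None, 1)     # trailing token of cluster_a
    parts_b = cluster_b.split(None, 1)      # leading token of cluster_b
    if not parts_a or not parts_b:
        return False
    return (parts_a[-1] + parts_b[0]).lower() in words
```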

Smart Carver (Reassembly)
- Sequential Hypothesis Testing Parallel Unique Path (SHT-PUP) is used for reassembly; it is a modification of the PUP algorithm.
- As in PUP, the best match is found for each of the k available headers, and the best of those matches is committed.
- The clusters immediately following the newly committed cluster are then tested using sequential hypothesis testing until a fragmentation point is reached.

Smart Carver (Reassembly)
Sequential hypothesis testing:
- Performed using the weight vector, i.e. the weights of all candidate clusters in the pool.
- Two hypotheses are tested: one that the clusters follow the fragment in sequence, and one that they do not.
- The likelihood ratio between the two hypotheses decides the outcome (a sketch follows).
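
A hedged sketch of the sequential test, in the spirit of a sequential probability ratio test; p_adjacent and the low/high thresholds are illustrative assumptions, not values from the paper:

```python
def find_fragmentation_point(tail, following, p_adjacent, low=0.05, high=20.0):
    """Decide how many of the clusters physically following `tail` still
    belong to the same fragment. p_adjacent(a, b) is the probability,
    under the weight model, that b really follows a."""
    ratio, tentative, prev = 1.0, [], tail
    for c in following:
        p1 = p_adjacent(prev, c)
        ratio *= p1 / max(1.0 - p1, 1e-12)  # likelihood-ratio update
        tentative.append(c)
        prev = c
        if ratio >= high:                   # accept: clusters merge in
            return tentative, True
        if ratio <= low:                    # reject: fragmentation point
            return tentative[:-1], False
    return tentative, True                  # inconclusive at end of data
```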

Conclusion
- Various file carving methods for fragmented files are presented in the survey.
- The problem of finding the best weight assignment is still an open research issue.