Forensic Analysis of Database Tampering Kyriacos Pavlou and Richard T. Snodgrass Computer Science Department The University of Arizona
Introduction The problem : How to systematically perform forensic analysis on a compromised database. Recent federal laws (HIPAA, Sarbanes-Oxley Act etc.) and incidents of corporate collusion mandate audit log security. Snodgrass et al. [VLDB04] showed how to detect database tampering. Approach: Hash using a cryptographically strong hash function, notarize data manipulated by transactions and periodically validate. Forensic analysis to ascertain: When the intrusion transpired What data was altered Who the intruder is Why has this transpired
Outline Tamper Detection Forensic Analysis Forensic Algorithms The corruption diagram Types of corruption events Forensic Algorithms Three algorithms Forensic strength Future Work
Tamper Detection Several related ideas that allow tamper detection: DBMS can maintain audit log in background Transaction-time table Append-only Data modified can be cryptographically hashed to produce a secure one-way hash of transaction. Notarize hash value with external notarization service. The hash value cannot change. Implementation optimizations: opportunistic hashing transaction ordering list linked hashing The latest hash value is a hash of all the changes made to the database since database creation.
Tamper Detection Two phases: Normal Processing Validation transactions Two phases: Normal Processing Validation The validation result is a single bit. hash value transactions transactions hashing + notary ID hash value transactions hashing + notary ID rehash hash value notary ID + result
Definitions Corruption Event (CE): any event that corrupts data and compromises database (intrusion, human intervention, bug) Corruption time (tc): actual time instant at which a CE occurred. Validation Event (VE): validation of the audit log by the Notarization Service (NS). Time of VE (tv): time instant at which a VE occurred. Validation Failure vs. Validation Success: NS’s answer to a query for a particular hash value. Denotes tampering or lack thereof respectively. Notarization Event (NE): the notarization of a document by the NS. Time of NE (tn): time instant at which a NE occurred.
Definitions (cntd) Forensic analysis involves the following: Temporal detection: determination of tc Spatial detection: determination of “where,” i.e., the location in the database of the data affected in a CE. This data is termed the corruption locus data (lc). In fact, try to ascertain locus time (tl), the time instant lc was originally stored (transaction commit time). Note that a CE can have many lc’s, termed multi-locus, or a only one lc termed single-locus CE.
The Corruption Diagram When = TRUE VE1 NE3 NE: Notarization Event VE: Validation Event link NE2 link NE1 NE0 Where
The Corruption Diagram When Actual time VE2 VE2 = TRUE IV validation interval NE6 NE: Notarization Event CE . clock time IN notarization interval NE5 VE: Validation Event NE4 CE: Corruption Event = TRUE VE1 NE3 link NE2 link NE1 Commit time NE0 commit time Where
Forensic Analysis If a corruption is detected, the forensic analyzer springs into action. The analyzer tries to ascertain a corruption region: the bounds on the uncertainty of the “where” and “when” of the corruption.
Notarization and Validation Intervals Non-aligned validation just delays detection of tampering. Validation factor IV = V·IN
Analyzing Timestamp Corruption So far considered data-only CEs. We now examine the case where the timestamps of the tuples are changed. Data-only Backdating Postdating Retroactive Introactive ×
Monochromatic Algorithm When Forensic analysis begins T F F F F VE2 = FALSE NE6 CE . time of corruption (tc) NE5 NE4 VE1 = TRUE NE3 Corruption Region: captures the uncertainty as to the position of CE NE2 NE1 tl: place of corruption (commit time) NE0 Where
Monochromatic Algorithm Central insight: data can be rehashed by validator and checked. Corruption region bounds: IV IN Area is solely dependent on the two intervals. Cannot handle CEs involving timestamp corruption. ×
The RGB Forensic Algorithm When B G T T F F F F F F F VE4 = FALSE NE8 Forensic analysis begins CE . Postdating CE IV = 4 days IN = 2 days tc NE7 T Notarization of Red R VE3 = TRUE NE6 NE5 B G T Notarization of Blue & Green VE2 = TRUE tp tp: postdating time NE4 NE3 Notarization of Red R VE1 = TRUE NE2 NE1 x x NE0 tl Where
The RGB Forensic Algorithm Introduction of RGB partial hash chains: Allows the bounding of both tl and tp Incurs extra NS cost Each of two corruption regions bounds: IV IN We would like to reduce the area of the corruption regions. ×
The Polychromatic Algorithm B G F When F T T F F F F F F VE4 = FALSE NE8 Forensic analysis begins CE . IV = 4 days IN = 2 days Desired = 1 day tc NE7 T Notarization of 2 Reds R VE3 = TRUE NE6 NE5 Backdating CE T B G F F Notarization of 2 Blues & 1 Green VE2 = TRUE NE4 Uncertainty can be arbitrarily shrunk via a logarithmic number of red and blue hash chains. NE3 Notarization of 2 Reds R VE1 = TRUE NE2 tb: backdating time NE1 x x NE0 tb tl
The Polychromatic Algorithm Introduction of extra partial hash chains: Reduces uncertainty of corruption region Incurs additional NS cost Uncertainty can be arbitrarily shrunk via a logarithmic number of red and blue hash chains. Hence, the width is no longer dependent on IV and IN .
Forensic Strength Components: Inverse Forensic Strength: Work of forensic analysis Region-area of CE Width of postdating / backdating uncertainty Inverse Forensic Strength: IFS( D , IN ,V ) = ( NumNotarizes( D , IN ,V ) + ForensicAnalysis( D , IN ,V ) ) · RegionArea( IN ,V ) · UncertaintyWidth( D , IN ) where V = IV / IN is the validation factor and D is the number of days before first validation failure. Monochromatic: O( V · D2 · IN ) RGB: O( V · D · IN2 ) We assume that D >> IN . Polychromatic: O( ( V + lg IN ) · D )
Future Work Develop a stronger lower bound for this problem. Accommodate multi-locus and complex CEs. Differentiate postdating and backdating CEs. Implement forensic analysis in validator. Consider interaction between transaction-time storage manager and underlying WORM storage.
Summary We have presented a means of performing forensic analysis. We have introduced a graphical representation to visualize CEs, termed the corruption diagram. We have designed three forensic algorithms. Monochromatic RGB Polychromatic
Acknowledgements NSF grants IIS-0100436, IIS-0415101 and EIA-0080123 and a grant from Microsoft provided partial support for this work.