Database Security and Audit
Database Basics
- Data is stored in the form of files
- Record: one related group of data (a row)
- Schema: the logical structure of the database
- Subschema: a subset of the entire logical structure
- Relation: a set of n-tuples
- Attribute: the name of a variable (column) in the n-tuples
- Query: a command that generates a subschema, e.g., select, project, join
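The relational operations named above can be sketched with a toy model in which a relation is a list of rows (dicts). The relation names, attribute values, and helper functions here are invented for illustration, not part of any real DBMS.

```python
def select(relation, predicate):
    """SELECT: keep only the rows satisfying the predicate."""
    return [row for row in relation if predicate(row)]

def project(relation, attributes):
    """PROJECT: keep only the named attributes (a subschema view)."""
    return [{a: row[a] for a in attributes} for row in relation]

def join(r1, r2, attr):
    """JOIN: combine rows of r1 and r2 that agree on a shared attribute."""
    return [{**a, **b} for a in r1 for b in r2 if a[attr] == b[attr]]

# Hypothetical sample relations:
students = [{"id": 1, "name": "Ann", "dorm": "Holmes"},
            {"id": 2, "name": "Bob", "dorm": "Grey"}]
dorms = [{"dorm": "Holmes", "capacity": 40},
         {"dorm": "Grey", "capacity": 30}]

# A query composed of join, select, and project:
view = project(select(join(students, dorms, "dorm"),
                      lambda r: r["capacity"] > 35),
               ["name", "dorm"])
```

The result `view` is itself a relation over a subschema (`name`, `dorm`), matching the definition of a query above.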
Advantages of Databases
- Shared access: one uniform logical view of the data, accessible to all users
- Minimal redundancy: prevents users from collecting/storing redundant data
- Data consistency: a change to one value is reflected throughout
- Data integrity: accidental or malicious modifications are detected
- Controlled access: only authorized users are given access to the data
However, these benefits come into conflict with one another when security is imposed
Security Requirements of Databases
- Physical database integrity: recover from power failures, disk crashes, etc.
- Logical database integrity: use backups and restore points; special means to update records and recover failed transactions
- Element integrity: field checks (type, range, bound checks), change logs
- Auditability: need to know who has made changes; track incremental access to protected data, through which data modifications can be traced
Security Requirements of Databases…
- Access control: not all data need be given to all users
  - Access control may be needed down to element granularity, beyond the schema, subschema, and attribute levels
  - Users may infer other field values from the access they are given
  - Database access control must take the size of the database into consideration
- User authentication
- Availability
Reliability and Integrity Measures in Databases
- Problem: failure of the system during data modification
- Solution: two-phase update; intuitively, do temporary computations first and apply the update later
  - Intent phase: prepare the resources needed to make the update (repeating this phase is harmless)
  - Commit phase: write a commit flag marking the end of the intent phase, then start the update; repeat if a failure occurs
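The two phases above can be sketched as follows, assuming an in-memory "database" dict and a hypothetical log file that carries the commit flag; a real DBMS would log to stable storage.

```python
import json
import os
import tempfile

def two_phase_update(db, changes, log_path):
    # Intent phase: stage the new values without touching the database.
    # Crashing and repeating this phase is harmless.
    staged = dict(changes)
    with open(log_path, "w") as f:
        json.dump({"commit": True, "staged": staged}, f)  # commit flag written

    # Commit phase: apply the staged values. If we crash here, replaying
    # the log repeats the same idempotent writes until they succeed.
    with open(log_path) as f:
        log = json.load(f)
    if log.get("commit"):
        db.update(log["staged"])
    os.remove(log_path)  # clear the flag once the update is complete

db = {"balance": 100}
log_file = os.path.join(tempfile.gettempdir(), "intent.log")  # hypothetical path
two_phase_update(db, {"balance": 150}, log_file)
```

The key property is that the commit-phase writes are repeatable: a crash between the flag and the final removal can be recovered by replaying the log.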
Redundancy / Internal Consistency Measures
- Error detection/correction codes: computed over field values, records, or the entire database; checked when deleting, retrieving, or updating
  - E.g., checksums, CRC codes
- Duplicate copies of records, used for recovery if the originals are detected to be corrupted
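A minimal sketch of the detection idea, using Python's standard `zlib.crc32` over a serialized record (the record fields are invented for illustration):

```python
import zlib

def record_checksum(record):
    """CRC-32 over a record's serialized fields, in a fixed key order."""
    data = "|".join(f"{k}={record[k]}" for k in sorted(record)).encode()
    return zlib.crc32(data)

record = {"id": 7, "name": "Grey", "balance": 120}
stored = record_checksum(record)   # stored alongside the record

# Later, on retrieval: recompute and compare to detect corruption.
record["balance"] = 999            # simulated corruption
corrupted = record_checksum(record) != stored
record["balance"] = 120            # restore from a duplicate copy
recovered = record_checksum(record) == stored
```

CRC detects accidental corruption but not malicious tampering; for the latter a keyed cryptographic hash is needed, as in the integrity locks discussed later.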
Concurrency Measures
- Two users updating a record at the same time can leave an inconsistent view of the record
- The read-modify-write cycle should be treated as an atomic operation
- Reading a record while it is being updated is handled by blocking reads until the update finishes
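The atomicity requirement can be illustrated with a lock guarding the read-modify-write cycle; the in-stock counter below is a hypothetical example.

```python
import threading

class StockCounter:
    """An in-stock quantity guarded by a lock so read-modify-write is atomic."""
    def __init__(self, value):
        self.value = value
        self._lock = threading.Lock()

    def decrement(self, n):
        with self._lock:              # other readers/writers block until done
            current = self.value      # read
            self.value = current - n  # modify + write in one critical section

stock = StockCounter(100)
threads = [threading.Thread(target=stock.decrement, args=(1,))
           for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# With the lock held across the whole cycle, the result is deterministic:
# 100 - 50 = 50, no matter how the threads interleave.
```

Without the lock, two threads could both read 100 and both write 99, losing one update; this is exactly the inconsistent view described above.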
Structural Integrity Measures
- Range comparisons: ensure that values entered lie within acceptable ranges
  - E.g., the day of a month cannot exceed 31
- State constraints: invariants that must hold throughout the database
  - E.g., uniqueness conditions
- Transition constraints: conditions that must hold for the database to change state
  - E.g., adding a record may depend on the values of other records; reducing the in-stock quantity requires that the in-stock value be at least the quantity ordered
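All three kinds of constraints can appear in a single insert routine. The schema and field names below are invented for illustration.

```python
def insert_order(db, order):
    """Insert an order only if all structural integrity checks pass."""
    # Range comparison: field value must lie in an acceptable range.
    if not 1 <= order["day"] <= 31:
        raise ValueError("day of month out of range")
    # State constraint: order ids must remain unique across the database.
    if any(o["id"] == order["id"] for o in db["orders"]):
        raise ValueError("duplicate order id")
    # Transition constraint: in-stock must cover the quantity ordered.
    if db["in_stock"] < order["qty"]:
        raise ValueError("insufficient stock")
    db["orders"].append(order)
    db["in_stock"] -= order["qty"]

db = {"orders": [], "in_stock": 10}
insert_order(db, {"id": 1, "day": 12, "qty": 4})   # passes all three checks
```

A violated constraint aborts the insert, leaving the database state unchanged.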
Sensitive Data & Disclosure Problems
- Types of sensitive data: income, identity, descriptions of missions
- Types of disclosure:
  - Exact data
  - Range bound: learning that a field value lies between known bounds
  - Negative predicates: learning that a record exists that does not satisfy some condition
  - Existence: learning that a record exists at all
  - Probability: learning a record's value with a certain probability
Inference Problem
- Definition: using non-sensitive data to infer sensitive data
- Inference techniques: direct and indirect
  - Direct: obtain information through queries on sensitive fields
  - Indirect: use statistics of the data to infer individual values (recovering individual data from aggregates)
  - E.g., sum, count, mean, median, trackers
Inference… sum
(table of sums by sex across the dorms Holmes, Grey, Adams, West, with rows Male, Female, Total; the cell values are not recoverable)
Inference… count

          Holmes   Grey   Adams   West
  Male       1       2      2      1
  Female     2       0      1      1
  Total      3       2      3      2
Inference… tracker
- Tracking: using additional queries that produce small results
- E.g., try to find the number of white females in a particular dorm
  - The direct query may be rejected: q = count((SEX=F) and (RACE=C) and (DORM=Holmes))
  - Its result is 1, so the DBMS rejects it
  - But not the following:
    - count(SEX=F) : value is 6
    - count((SEX=F) and ((RACE ≠ C) or (DORM ≠ Holmes))) : value is 5
  - Subtracting 6 - 5 = 1 yields the desired value
- More generally, queries can be combined into a system of linear equations; solving the system reveals unknown individual values
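The tracker arithmetic can be reproduced on a small hypothetical dataset chosen to match the counts in the example (six female records, exactly one of them white and in Holmes):

```python
# Invented sample records for illustration only.
records = [
    {"sex": "F", "race": "C", "dorm": "Holmes"},   # the targeted individual
    {"sex": "F", "race": "C", "dorm": "Grey"},
    {"sex": "F", "race": "B", "dorm": "Holmes"},
    {"sex": "F", "race": "B", "dorm": "West"},
    {"sex": "F", "race": "C", "dorm": "West"},
    {"sex": "F", "race": "B", "dorm": "Grey"},
    {"sex": "M", "race": "C", "dorm": "Holmes"},
]

def count(pred):
    """A statistical count query over the database."""
    return sum(1 for r in records if pred(r))

# The direct query returns 1, so the DBMS would reject it as too specific:
direct = count(lambda r: r["sex"] == "F" and r["race"] == "C"
               and r["dorm"] == "Holmes")

# Two broad queries the DBMS accepts:
q1 = count(lambda r: r["sex"] == "F")
q2 = count(lambda r: r["sex"] == "F"
           and not (r["race"] == "C" and r["dorm"] == "Holmes"))

leaked = q1 - q2   # recovers the rejected result: 6 - 5 = 1
```

Note that the second query negates the whole conjunction (RACE=C and DORM=Holmes), which by De Morgan's law is (RACE ≠ C) or (DORM ≠ Holmes).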
Controls for Inference
- Suppression
  - Suppress low-frequency data items
  - Query analysis before answering
- Concealing
  - E.g., report results only as ranges
  - Random data perturbation for statistical queries
- Much research has gone into inference control for databases, and more is forthcoming; moreover, inference controls can be defeated by collusion, which is an even more serious problem
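Both control families can be sketched in a few lines; the salary data, noise scale, and suppression threshold are invented for illustration.

```python
import random

salaries = {"ann": 52000, "bob": 61000, "eve": 58000}  # hypothetical data

def perturbed_mean(values, scale=500, seed=None):
    """Concealing: answer a statistical query over values perturbed by
    random noise, so exact individual values cannot be recovered."""
    rng = random.Random(seed)
    noisy = [v + rng.uniform(-scale, scale) for v in values]
    return sum(noisy) / len(noisy)

def suppressed_count(matching, threshold=3):
    """Suppression: refuse to answer counts below a frequency threshold,
    which blocks the small-result queries a tracker relies on."""
    n = len(matching)
    return n if n >= threshold else None

true_mean = sum(salaries.values()) / len(salaries)
approx = perturbed_mean(salaries.values(), seed=1)
```

With bounded noise the aggregate stays useful (the mean is off by at most the noise scale) while any single reconstructed value is unreliable.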
Multi-level Security
- Sensitivity of data goes beyond "sensitive vs. non-sensitive"; there are several levels of sensitivity, which can apply at:
  - Element level
  - Record level
  - Aggregate level
  - Combinations of granularities
Multi-level Security Measures
- Separation
  - Partitioning: create multiple databases, each at its own sensitivity level
  - Encryption: encrypt records with a key unique to their sensitivity level
    - Problems such as chosen-plaintext attacks, corruption of records, and malicious updates remain
  - Integrity locks and sensitivity locks
    - Assign sensitivity levels to data items
    - Encrypt the sensitivity levels
    - Use cryptographic hashes to protect integrity
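A simplified integrity-lock sketch, binding a data item to its sensitivity label with a keyed hash (HMAC-SHA256 from the standard library) so that neither can be altered without detection. The key and record values are invented for illustration.

```python
import hashlib
import hmac

SECRET = b"controller-key"   # hypothetical key held by the trusted controller

def make_lock(data, sensitivity):
    """Compute the lock: a keyed hash over the sensitivity label and data,
    so changing either one invalidates the stored tag."""
    msg = sensitivity.encode() + b"|" + data.encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()

def verify_lock(data, sensitivity, tag):
    """Check a retrieved (data, label, tag) triple before releasing it."""
    return hmac.compare_digest(make_lock(data, sensitivity), tag)

tag = make_lock("mission-report", "TOP SECRET")
```

An attacker who lowers the label to read the data (e.g., relabels the record "UNCLASSIFIED") cannot forge a matching tag without the controller's key.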
Multi-level Secure DB Design
- Integrity locks: use a trusted controller between the DBMS and the data to control access
  - Data is either encrypted or perturbed
  - Secure but inefficient; subject to Trojan horse attacks
- Trusted front end: use an existing DBMS behind a trusted front end that filters out all data the user is not cleared to see
  - Wasteful for queries that retrieve large amounts of data, most of which is then discarded
Multi-level Secure DB Design…
- Commutative filters: reformat the query so that the DBMS does not retrieve large numbers of records that the trusted front end would only reject
  - Advantage: some of the work is relegated to the DBMS (by rewriting one query into several others), keeping the filter small
  - Filtering can be done at the record, attribute, or element level
- Distributed databases: control access to two or more DBMSs with differing sensitivity levels; users' queries are routed according to their access levels
Role-based Access Control
- Organizations grant access to users based on the roles they perform
- Least privilege: a role is assigned only the permissions it requires
- Separation of duty: completing a task can require mutually exclusive roles, so no single user holds all the permissions involved
- Data abstraction: a role can be defined in terms of higher-level operations such as edit or audit
- Difference between groups and roles:
  - A group is a collection of users
  - A role is a collection of users and permissions
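A minimal RBAC sketch showing the group/role distinction: roles bundle permissions, and users acquire permissions only through role membership. The role and permission names are invented for illustration.

```python
# Roles map to permission sets; users map to the roles they hold.
roles = {
    "auditor": {"read_log", "read_table"},
    "editor":  {"read_table", "update_table"},
}
user_roles = {
    "alice": {"auditor"},
    "bob":   {"editor"},
}

def permitted(user, permission):
    """Least privilege: a user holds exactly the permissions of their roles."""
    return any(permission in roles[r] for r in user_roles.get(user, ()))
```

A plain group would be only the `user_roles` half; attaching the permission set to the role is what makes access decisions follow from role assignment.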
RBAC and DBMSs
- RBAC is natural for a DBMS to adopt
- Several commercial products support RBAC: MS Active Directory, Oracle, Sybase, etc.
- Broad implementation features:
  - User-role assignment
  - Support for role relationships and constraints
  - Assignable privileges (database level, table level, etc.)
  - Role hierarchies (using a lattice model)