A Data-Centric Approach to Insider Attack Detection in Database Systems
Sunu Mathew
Joint work with: Michalis Petropoulos, Hung Q. Ngo, Shambhu Upadhyaya
Computer Science and Engineering, Univ. at Buffalo
13th International Symposium on Recent Advances in Intrusion Detection (RAID 2010), Ottawa, Canada, September 2010
Insider Threats against Database Systems
- Insider threats: a hard problem with vast scope
- Insiders operate within existing rules and policies, with high potential for damage
- Attack classes: impersonation, information leak, data harvesting
- Most critical threat: insider access to sensitive information
- Hence, the focus on an effective solution to insider threats against database systems
Contributions
- Modeling user queries from a data-centric viewpoint (as opposed to prior syntax-centric approaches)
- Extracting feature vectors based on the results of query execution
- Sampling strategies to address performance issues
- A taxonomy for analyzing query anomalies from a data-centric viewpoint
Outline
- Limitations of Syntax-Centric Approaches
- Data-Centric User Profiling
- Taxonomy of Query Anomalies (Data-Centric)
- Experimental Evaluation
- Conclusions and Future Work
Syntax-Centric Modeling
Models queries using a characteristic or frequency vector based on parsed syntax elements (e.g., [Kamra et al., VLDB J. 2008]):

SELECT p.product_name, p.product_id FROM PRODUCT p WHERE p.cost = 100 AND p.weight > 80;

Extracted features:
- SQL command: SELECT
- Select clause relations: PRODUCT
- Select clause attributes: product_name, product_id
- Where clause relations: PRODUCT
- Where clause attributes: cost, weight
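A minimal Python sketch of this kind of syntax-centric feature extraction, using a crude regex-based parse (the function name, the feature keys, and the simplified grammar are illustrative assumptions, not the paper's implementation):

```python
import re

def syntax_features(query: str) -> dict:
    """Extract crude quiplet-style syntax features from a simple
    SELECT-FROM-WHERE query. Illustrative sketch only: real SQL
    parsing needs a proper parser, not a regex."""
    m = re.match(
        r"\s*(\w+)\s+(.*?)\s+FROM\s+(.*?)(?:\s+WHERE\s+(.*?))?\s*;?\s*$",
        query, re.IGNORECASE | re.DOTALL)
    command, select_clause, from_clause, where_clause = m.groups()
    # Relations: "PRODUCT p" -> PRODUCT (drop the alias)
    relations = [r.strip().split()[0] for r in from_clause.split(",")]
    # Attributes: "p.product_name" -> product_name
    select_attrs = [a.strip().split(".")[-1] for a in select_clause.split(",")]
    # Attributes compared in the WHERE clause, e.g. "p.cost ="
    where_attrs = re.findall(r"\b\w+\.(\w+)\s*[=<>]", where_clause or "")
    return {"command": command.upper(),
            "select_relations": relations,
            "select_attributes": select_attrs,
            "where_attributes": where_attrs}

q = ("SELECT p.product_name, p.product_id FROM PRODUCT p "
     "WHERE p.cost = 100 AND p.weight > 80;")
print(syntax_features(q))
```

Note that this representation sees only the query text, which is exactly the limitation the next slide exploits: flipping the comparison operators leaves these features almost unchanged.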
Limitations of the Syntax-Centric Approach
Two similar-looking queries can produce completely different results:

SELECT p.product_name, p.product_id FROM PRODUCT p WHERE p.cost > 100 AND p.weight = 80;

False negative: similar syntax, different results.

Two different-looking queries can produce similar results:

SELECT p.product_name, p.product_id FROM PRODUCT p WHERE p.cost = 100 AND p.weight > 80 AND p.product_name IS NOT NULL;

False positive: different syntax, same results.

Syntax analysis is hard! ([Calvanese et al., PODS '98])
Outline
- Limitations of Syntax-Centric Approaches
- Data-Centric User Profiling
- Taxonomy of Query Anomalies (Data-Centric)
- Experimental Evaluation
- Conclusions and Future Work
An Alternate Viewpoint: Data-Centric
- When are two queries similar or different? Model them based on what data is accessed (i.e., the results of query execution)
- Visualize the database as a single relation (the Universal Relation, [Ullman et al., TODS '84])
- Compute a statistical summary (S-Vector) of a query's result tuples
- The S-Vector is a fixed-size feature vector: its size depends on the DB schema, and is independent of DB size and the number of result tuples
- S-Vectors of past queries are used to profile each user
Statistics Vector for a Sample Database

Database schema:
  Product(type varchar, cost numeric)

S-Vector features:
  Product.type.ncount, Product.type.ndistinct
  Product.cost.Min, Product.cost.Max, Product.cost.Mean, Product.cost.StdDev, Product.cost.Median
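A minimal Python sketch of building this S-Vector from result tuples, following the features on the slide (count/distinct for varchar, Min/Max/Mean/StdDev/Median for numeric); the function name, the dict-based rows, and the sample data are illustrative assumptions:

```python
import statistics

def s_vector(rows, schema):
    """Statistical summary (S-Vector) of a query's result tuples.
    `schema` maps attribute name -> 'varchar' or 'numeric'.
    Illustrative sketch only."""
    sv = {}
    for attr, atype in schema.items():
        col = [r[attr] for r in rows if r.get(attr) is not None]
        if atype == "varchar":
            sv[attr + ".ncount"] = len(col)
            sv[attr + ".ndistinct"] = len(set(col))
        elif col:  # numeric statistics need at least one value
            sv[attr + ".Min"] = min(col)
            sv[attr + ".Max"] = max(col)
            sv[attr + ".Mean"] = statistics.mean(col)
            sv[attr + ".StdDev"] = statistics.pstdev(col)
            sv[attr + ".Median"] = statistics.median(col)
    return sv

rows = [{"Product.type": "toy",  "Product.cost": 100},
        {"Product.type": "toy",  "Product.cost": 200},
        {"Product.type": "tool", "Product.cost": 300}]
schema = {"Product.type": "varchar", "Product.cost": "numeric"}
print(s_vector(rows, schema))
```

The vector's size is fixed by the schema alone: however many tuples the query returns, the same set of per-attribute statistics is produced.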
Outline
- Limitations of Syntax-Centric Approaches
- Data-Centric User Profiling
- Data-Centric Analysis of Query Anomalies
- Experimental Evaluation
- Conclusions and Future Work
Analyzing Query Anomalies (1)
Based on the schema and statistics of the execution result.

Type 1: Different Result Schema / Different Result Tuples (different rows and columns)

Query 1: SELECT p.cost FROM PRODUCT p WHERE p.type = 'abc';
Query 2: SELECT p.type FROM PRODUCT p WHERE p.cost < 1000;

- Both syntax-centric and data-centric approaches should be effective
- Typically studied in the context of Role-Based Access Control (RBAC) and masquerade detection
Analyzing Query Anomalies (2)
Type 2: Similar Result Schema / Different Result Tuples (similar columns, different rows)

Base query: SELECT * FROM PRODUCT p WHERE p.cost = 1000;

Type 2(a), distinct syntax:
SELECT * FROM PRODUCT p WHERE p.cost < 1000 AND p.type = 'abc';

Type 2(b), similar syntax:
SELECT * FROM PRODUCT p WHERE p.cost < 1000;

Syntax-centric schemes are "blind" to 2(b).
Analyzing Query Anomalies (3)
Type 3: Similar Result Schema / Similar Result Tuples (similar rows and columns)

Base query: SELECT p.type FROM PRODUCT p WHERE p.cost < 1000;

Type 3(a), different syntax / similar semantics:
SELECT p.type FROM PRODUCT p WHERE p.cost < 1000 AND p.type IN (SELECT q.type FROM PRODUCT q);
- Syntax-centric schemes generate false positives here

Type 3(b), different syntax / different semantics:
SELECT p.type FROM PRODUCT p WHERE true;
- The attacker gains additional knowledge (detected by syntax-centric, undetected by data-centric)
- 3(b) is rare: the attacker has to control the result statistics
Data-Centric Classification of Query Anomalies

Anomaly case                                  | Subtype                                     | Syntax-centric? | Data-centric?
Type 1 (different schema / different results) |                                             | Yes             | Yes
Type 2 (similar schema / different results)   | (a) distinct syntax                         | Yes             | Yes
                                              | (b) similar syntax                          | No              | Yes
Type 3 (similar schema / similar results)     | (a) different syntax / similar semantics    | False positive  | Yes (allowed)
                                              | (b) different syntax / different semantics  | Yes             | No (rare)
Outline
- Limitations of Syntax-Centric Approaches
- Data-Centric User Profiling
- Data-Centric Analysis of Query Anomalies
- Experimental Evaluation
- Conclusions and Future Work
Experimental Evaluation: Goals
- Type 1 and 2(a) anomalies (detected by both syntax-centric and data-centric): show that data-centric performs at least as well as syntax-centric
- Type 2(b) anomalies (undetected by syntax-centric, detected by data-centric): show good performance for the data-centric approach
- Type 3 is outside our scope (addressed in other research)
- Compare with a syntax-centric scheme ([Kamra et al., VLDB J. 2008])
Test Setup
Query Result Sampling: Strategies
- A potential drawback of the data-centric approach: it requires query execution
- The added database server load may not be significant; however, some examination of query results is needed for online detection
- Compromise: use only k tuples of the execution results to approximate the S-Vector
  - Top (Initial)-k: the first k result tuples
  - Random-k: a random sample of k result tuples
- Suitable for the pipelined query execution model of commercial RDBMSs
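The two sampling strategies above can be sketched in Python; Initial-k just consumes the first k tuples of the result stream, while Random-k is shown here via one-pass reservoir sampling, which fits a pipelined executor (the use of reservoir sampling, and all names, are assumptions of this sketch, not the paper's stated implementation):

```python
import random

def top_k(result_iter, k):
    """Initial-k: take the first k tuples from the pipelined result stream."""
    out = []
    for row in result_iter:
        out.append(row)
        if len(out) == k:
            break
    return out

def random_k(result_iter, k, seed=None):
    """Random-k via reservoir sampling: a uniform k-sample of the result
    in a single pass, using O(k) memory and no knowledge of result size."""
    rng = random.Random(seed)
    sample = []
    for i, row in enumerate(result_iter):
        if i < k:
            sample.append(row)
        else:
            j = rng.randint(0, i)       # replace an element with prob. k/(i+1)
            if j < k:
                sample[j] = row
    return sample
```

Either sample is then fed to the S-Vector computation in place of the full result set.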
Evaluating Type 1 and 2(a) (Different Query Result Schema / Different Syntax)
- Role-based anomaly (masquerade) detection
- Algorithms: Naive Bayes, Decision Tree, SVM, and Euclidean clustering
- Syntax-centric formats: C (crude), M (medium), and F (fine) quiplets ([Kamra et al., VLDB J. 2008])
- Data-centric formats: S-Vector (full), plus S-V Initial and S-V Random with 20, 10, and 5 result tuples
Results: Type 1 and 2(a) (Different Query Result Schema / Different Syntax)

Detection accuracy (%), Chair vs. Faculty (? = value not recoverable from the source):

Algorithm  | C-quip | M-quip | F-quip | S-V all | I-20  | R-20  | I-10  | R-10  | I-5   | R-5
N-Bayes    | 81.67  | 85.33  | 75     | 85      | ?     | 82.67 | 78.33 | 77    | 81.67 | 90
Dec-Tree   | 88     | 87.67  | ?      | 96.33   | 88.3  | ?     | 89    | 88.67 | ?     | ?
SVM        | 83.3   | 81     | 87.67  | 82.33   | 74.67 | 77    | 71.33 | 75.67 | 68    | 74.33
Clustering | 73.3   | 72     | 65.6   | 92      | 92.6  | 92.33 | 94    | ?     | 92.6  | 93.33

- Similar performance as expected, with high detection rates (close to 100%)
- Clustering performs better with the S-Vector
- Good results with S-V Initial and S-V Random
Evaluating Type 2(b) Anomalies (Different Query Result Statistics / Similar Syntax)
- Volume anomalies; sensitive tuples
- Supervised two-class training is not feasible, so an "attack set" was created manually from the dataset
  Base query: SELECT * FROM vApplicants WHERE reviewStatusID = 'a' AND reviewStatusID = 'b';
  Variation: SELECT * FROM vApplicants WHERE reviewStatusID = 'b' OR reviewStatusID = 'b';
- One-class anomaly detection schemes:
  - Cluster-based outlier detection
  - Attrib-Deviation
Evaluating Type 2(b) (Different Query Result Statistics / Similar Syntax): Schemes
Cluster-based outlier detection:
- Queries are points in some space; construct per-role (e.g., Faculty) clusters
- Approximate each cluster by a single point (e.g., its centroid); use Euclidean distance and a threshold for anomaly detection

Attrib-Deviation:
- Data anomalies show up as anomalies in individual S-Vector dimensions
- Perform anomaly detection on a per-dimension basis (e.g., a 3*StdDev threshold); the number of anomalous dimensions is used to rank test queries
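Both schemes can be sketched on S-Vectors represented as plain numeric lists (function names, the epsilon guard for zero-variance dimensions, and the toy data are assumptions of this sketch):

```python
import math

def centroid_outlier(train, query, threshold):
    """Cluster-based scheme: approximate the role cluster by its centroid
    and flag a query whose S-Vector lies farther than `threshold` from it
    (Euclidean distance)."""
    dims = len(train[0])
    centroid = [sum(v[d] for v in train) / len(train) for d in range(dims)]
    return math.dist(centroid, query) > threshold

def attrib_deviation(train, query, nsigma=3.0):
    """Attrib-Deviation: count the S-Vector dimensions where the query
    deviates more than nsigma standard deviations from the training mean;
    this count ranks test queries by anomalousness."""
    anomalies = 0
    for d in range(len(query)):
        col = [v[d] for v in train]
        mean = sum(col) / len(col)
        std = (sum((x - mean) ** 2 for x in col) / len(col)) ** 0.5
        if abs(query[d] - mean) > nsigma * max(std, 1e-9):
            anomalies += 1
    return anomalies
```

The per-dimension structure of Attrib-Deviation is what makes it robust to Initial-k sampling later: a single anomalous attribute can trip several of its statistics at once.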
Results: Type 2(b) Anomalies (Different Query Result Statistics / Similar Syntax)

Detection and false-positive rates (%) (? = value not recoverable from the source):

Algorithm                         | C-qlet | M-qlet | F-qlet | S-V all | I-20 | R-20 | I-10 | R-10 | I-5  | R-5
Cluster-Outlier: detection        | 23.5   | 26.4   | 17.6   | 83.8    | 12   | 67.7 | 6.4  | 45.1 | 6.4  | 35.4
Cluster-Outlier: false positives  | 14.4   | 11.8   | 15.8   | 10.5    | 3.9  | 6.5  | 3.9  | 5.2  | 2.6  | 6.6
Attrib-Deviation: detection       | 0      | 17.6   | 2.9    | 87      | ?    | 12.9 | 64.5 | ?    | ?    | ?
Attrib-Deviation: false positives | 0      | 4.8    | ?      | 22.6    | 26   | 15   | 23.8 | 15.8 | 23.8 | 20.4

- Cluster-based: data-centric is better; Initial-k performs poorly
- Attrib-Deviation: data-centric is better, and Initial-k is effective!
Initial-k Sampling: Performance
- But why does Initial-k perform well with Attrib-Deviation?
- A single anomalous attribute yields many anomalous dimensions
- Detection works on a per-dimension basis, and selective ordering affects anomalous attributes differently
- Different statistical measurements may be affected differently (e.g., Max vs. Median)
- Performance drops only when the result sample size is too low (e.g., k = 5)
- Initial-k's performance has positive practical implications
Outline
- Limitations of Syntax-Centric Approaches
- Data-Centric User Profiling
- Data-Centric Analysis of Query Anomalies
- Experimental Evaluation
- Conclusions and Future Work
Conclusions and Future Work
- Queries beyond Select-Project-Join:
  - Union queries, e.g., SELECT g.name, g.gpa FROM GRADS g UNION SELECT u.name, u.gpa FROM UGRADS u;
  - Computed attributes (AVG, etc.)
  - GROUP BY attributes
  - ORDER BY may lead to false positives
- Dynamic databases, correlated queries
- Feasibility of a randomized result set
- Combining syntax-centric and data-centric schemes
Questions? Thank You!