HENGHA: DATA HARVESTING DETECTION ON HIDDEN DATABASES Shiyuan Wang, Divyakant Agrawal, Amr El Abbadi University of California, Santa Barbara CCSW 2010.


1 HENGHA: DATA HARVESTING DETECTION ON HIDDEN DATABASES Shiyuan Wang, Divyakant Agrawal, Amr El Abbadi University of California, Santa Barbara CCSW 2010

2 Data Security Concern: Back-End Databases of Web-based Applications. Form-based query interfaces provide an entrance for both users and attackers. Traditional attacks: submit malicious requests to break into the hidden database through vulnerabilities in the application, e.g. SQL injection [Vale05]. Many of these can be detected by prior work. 10/8/2010

3 Data Security Concern: Back-End Databases of Web-based Applications. Data harvesting attacks: iteratively submit legitimate queries to extract the data inventory or infer sensitive aggregate information. E.g. 1: a competitor of car rental company A harvested A’s inventory of a popular car. E.g. 2: terrorists inferred that a flight was relatively empty and could be a hijacking target.

4 Anatomy of Data Harvesting Attacks. General strategy: iteratively submit legitimate queries with valid fields, analyze the results, and then design new queries with the goal of maximizing information gain within a limited number of queries. Two types of harvesting attacks are considered. Crawling attack: performed by deep web crawling [Madh08]. Sampling attack: performed by uniform random sampling over query results of size no more than K [Dasg09].

5 How To Defend Against Data Harvesting Attacks. Database inference control [Denn83]? Query set restriction is not effective, especially against sampling attacks. Query set restriction and data perturbation [Dasg09] hurt usability. Web robot detection [Tan02]? Data harvesters can camouflage their requests as normal users’ HTTP traffic patterns.

6 Our Approach. Detection is based on search behaviors within sessions. Attackers’ search behaviors show two properties. Diversity: queries are not concentrated or localized, and they reflect very distinct intents. Broadness: the results of the queries cover a broad scope of the underlying data.

7 HengHa: Detecting Data Harvesting Attacks at Single Session Level. Identify data harvesting attackers by examining whether their search behaviors in a session show relatively significant diversity and broadness. Diversity -> query correlation. Broadness -> result coverage. Heng: query correlation observer. Ha: result coverage monitor. [Figure: HengHa detector; labels: Web Application, DB, query, result, suspicious]

8 Heng: Query Correlation Observer. [Figure: queries in a session that plans a trip to Chicago] Key idea: frequent predicate value sets serve as indications of correlations among queries. Intuitively, if a session has more frequent predicate value sets with higher supports, and those predicate value sets are more similar to the queries, then the queries in this session are more correlated.
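
The intuition on this slide can be illustrated with a small sketch. It is not the scoring function from the talk, which the slide does not spell out. It assumes each query is a set of (attribute, value) predicates, counts predicate value sets that occur in at least `min_support` queries, and scores a session by how well its frequent sets match each query. The names `frequent_predicate_sets` and `correlation_score`, the thresholds, and the support-times-Jaccard weighting are all hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_predicate_sets(queries, min_support=2, max_size=3):
    """Count predicate value sets (up to max_size predicates) that appear
    in at least min_support queries of the session."""
    counts = Counter()
    for query in queries:
        items = sorted(query)
        for size in range(1, min(max_size, len(items)) + 1):
            for combo in combinations(items, size):
                counts[frozenset(combo)] += 1
    return {s: c for s, c in counts.items() if c >= min_support}


def correlation_score(queries, min_support=2, max_size=3):
    """Hypothetical session score in [0, 1]: for each query, take the best
    (relative support * Jaccard similarity) over the frequent sets it
    contains, then average over the session."""
    if not queries:
        return 0.0
    frequent = frequent_predicate_sets(queries, min_support, max_size)
    n = len(queries)
    total = 0.0
    for query in queries:
        best = 0.0
        for pred_set, support in frequent.items():
            if pred_set <= query:
                # pred_set is a subset of query, so Jaccard = |pred_set| / |query|
                similarity = len(pred_set) / len(query)
                best = max(best, (support / n) * similarity)
        total += best
    return total / n


# A focused session (every query is about Chicago) versus a scattered one.
trip_session = [
    {("city", "Chicago"), ("type", "hotel")},
    {("city", "Chicago"), ("type", "flight")},
    {("city", "Chicago"), ("type", "car")},
]
harvest_session = [
    {("city", "Chicago"), ("type", "hotel")},
    {("city", "Boston"), ("type", "flight")},
    {("city", "Denver"), ("type", "car")},
]

print(correlation_score(trip_session))     # 0.5
print(correlation_score(harvest_session))  # 0.0
```

Under these assumptions, a low score marks a session whose queries share few predicate values, which is the "diversity" signal the slide associates with harvesting.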

9 Ha: Result Coverage Monitor. Key idea: sort the multi-attribute data D in a total order that preserves locality, e.g. a z-curve. Create a coverage bit vector (CBV) whose bits correspond to the data records in that total order. Accessing a data record sets its bit. Training: cluster CBVs to model different data access patterns. [Figure: x-y grid example; labels 0, 1, 2, 3; CBVs 1110, 1100, 0100, 0000]
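
A minimal sketch of the coverage bit vector idea, assuming records with two non-negative integer attributes and a standard bit-interleaving (Morton) z-order. The class name `CoverageBitVector`, the helper `z_value`, and the `coverage()` ratio are illustrative names, not taken from the talk, and the clustering step used for training is omitted.

```python
def z_value(x, y, bits=16):
    """Interleave the low `bits` bits of x and y into a z-order (Morton) key.
    Bits of x go to even positions, bits of y to odd positions."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


class CoverageBitVector:
    """One bit per record, with records laid out in z-order."""

    def __init__(self, records):
        ordered = sorted(set(records), key=lambda r: z_value(*r))
        self.position = {record: i for i, record in enumerate(ordered)}
        self.bits = [0] * len(ordered)

    def access(self, results):
        """Set the bit of every record returned to the session."""
        for record in results:
            self.bits[self.position[record]] = 1

    def coverage(self):
        """Fraction of records the session has touched so far."""
        return sum(self.bits) / len(self.bits) if self.bits else 0.0

    def __str__(self):
        return "".join(str(b) for b in self.bits)


# Four records on a 2 x 2 grid; their z-order is (0,0), (1,0), (0,1), (1,1).
records = [(0, 0), (1, 0), (0, 1), (1, 1)]
cbv = CoverageBitVector(records)
cbv.access([(0, 0), (1, 0)])
print(cbv, cbv.coverage())  # 1100 0.5
```

Because the z-order keeps nearby records close in the vector, a localized session tends to set a contiguous run of bits, while a harvesting session spreads set bits across the whole vector.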

10 Experiment. Extracted 98,564 real user query sessions and a data table of 387 records from the KDD Cup 2000 clickstream dataset. Synthesized 1000 attack sessions [Madh08, Dasg09]. Ran on a server with an Intel 2.4GHz CPU, 3GB RAM and FC 8 OS. Performed four-fold cross-validation. [Figures: Effectiveness of Detection in Four Validations; Efficiency of Detection in Four Validations]
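
For readers unfamiliar with the evaluation setup, four-fold cross-validation can be sketched as follows. This is a generic splitter, not the authors' experiment code. The function name `k_fold_splits` and the fixed shuffle seed are assumptions for illustration.

```python
import random


def k_fold_splits(items, k=4, seed=0):
    """Yield (train, test) pairs: shuffle once, cut into k folds, and let
    each fold serve as the held-out test set exactly once."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


# Example: split 12 session ids into four train/test rounds.
for train, test in k_fold_splits(range(12), k=4):
    print(len(train), len(test))  # 9 3 on every round
```

In each round the detector would be trained on three folds of sessions and evaluated on the remaining one.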

11 Conclusion & Future Work. Identified non-traditional data harvesting attacks on the back-end databases of web-based applications, i.e. the crawling attack and the sampling attack. Detection is based on identifying attackers’ special search behaviors at the single session level: diversity -> query correlation observer, broadness -> result coverage monitor. Detecting cross-session data harvesting attacks will be considered in future work.

12 References
[Vale05] F. Valeur et al. A learning-based approach to the detection of SQL attacks. In DIMVA, pages 123–140, 2005.
[Dasg09] A. Dasgupta et al. Privacy preservation of aggregates in hidden databases: why and how? In SIGMOD, pages 153–164, 2009.
[Madh08] J. Madhavan et al. Google’s deep web crawl. PVLDB, 1(2):1241–1252, 2008.
[Tan02] P.-N. Tan et al. Discovery of web robot sessions based on their navigational patterns. Data Min. Knowl. Discov., 6(1):9–35, 2002.
[Denn83] D. E. Denning et al. Inference controls for statistical databases. Computer, 16(7):69–82, 1983.

13 Thanks for Listening

