HENGHA: DATA HARVESTING DETECTION ON HIDDEN DATABASES Shiyuan Wang, Divyakant Agrawal, Amr El Abbadi University of California, Santa Barbara CCSW 2010.

Presentation transcript:

HENGHA: DATA HARVESTING DETECTION ON HIDDEN DATABASES Shiyuan Wang, Divyakant Agrawal, Amr El Abbadi University of California, Santa Barbara CCSW 2010

Data Security Concern: Back-End Databases of Web-based Applications
Form-based query interfaces provide an entrance for both users and attackers.
Traditional attacks: submit malicious requests to break into the hidden database through vulnerabilities in the application, e.g. SQL injection [Vale05]. Many of these attacks can be detected by techniques from prior work.
10/8/2010

Data Security Concern: Back-End Databases of Web-based Applications
Data harvesting attacks: iteratively submit legitimate queries to extract the data inventory or infer sensitive aggregate information.
Example 1: A competitor of car rental company A harvested A’s inventory of a popular car.
Example 2: Terrorists inferred that a flight was relatively empty and could be a hijacking target.

Anatomy of Data Harvesting Attacks
General strategy: iteratively submit legitimate queries with valid fields, analyze the results, and then design new queries so as to maximize information gain within a limited number of queries.
Two types of harvesting attacks are considered:
Crawling attack: performed by deep web crawling [Madh08].
Sampling attack: performed by uniform random sampling on query results of size no more than K [Dasg09].

How To Defend Against Data Harvesting Attacks
Database inference control [Denn83]? Query set restriction is not effective, especially against sampling attacks. Query set restriction and data perturbation [Dasg09] hurt usability.
Web robot detection [Tan02]? Data harvesters can disguise themselves by mimicking normal users’ HTTP traffic patterns.

Our Approach
Detection based on search behaviors within sessions.
Attackers’ search behaviors:
Diversity: queries are neither concentrated nor localized, and they reflect very distinct intents.
Broadness: the results of the queries cover a broad scope of the underlying data.

HengHa: Detecting Data Harvesting Attacks at Single Session Level
Identify data harvesting attackers by examining whether their search behaviors in a session show relatively significant diversity and broadness.
Diversity -> query correlation
Broadness -> result coverage
[Figure: the HengHa detector, consisting of Heng (query correlation observer) and Ha (result coverage monitor); diagram labels: Web Application, DB, query, result, suspicious]

Heng: Query Correlation Observer
Key idea: frequent predicate value sets serve as indications of correlations among queries.
Intuitively, if a session has more frequent predicate value sets with higher supports, and those predicate value sets are more similar to the queries, then the queries in this session are more correlated.
[Figure: Queries in a Session That Plans Trip to Chicago]
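
The idea of frequent predicate value sets can be illustrated with a minimal Python sketch. It is not the authors' implementation: the slides do not give the scoring formula, so the function names, the `min_support` and `max_size` parameters, and the support-weighted Jaccard score below are all assumptions chosen only to show the intuition that focused sessions share frequent (attribute, value) sets while diverse sessions do not.

```python
from collections import Counter
from itertools import combinations


def frequent_value_sets(session, min_support=0.5, max_size=2):
    """Return {frozenset of (attribute, value) pairs: support}.

    `session` is a list of queries, each a dict of predicate -> value.
    Support is the fraction of queries in the session containing the set.
    (Hypothetical helper; thresholds are illustrative assumptions.)
    """
    n = len(session)
    counts = Counter()
    for query in session:
        items = sorted(query.items())
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[frozenset(combo)] += 1
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


def correlation_score(session, min_support=0.5):
    """Assumed score: for each query, take the best support-weighted
    Jaccard similarity to any frequent set, then average over queries."""
    frequent = frequent_value_sets(session, min_support)
    if not session or not frequent:
        return 0.0
    total = 0.0
    for query in session:
        q = set(query.items())
        total += max(
            support * len(fs & q) / len(fs | q)
            for fs, support in frequent.items()
        )
    return total / len(session)


# A session focused on one trip versus a session that jumps between targets.
focused = [
    {"dest": "Chicago", "type": "hotel"},
    {"dest": "Chicago", "type": "flight"},
    {"dest": "Chicago", "type": "hotel"},
    {"dest": "Chicago", "type": "car"},
]
diverse = [
    {"dest": "Chicago", "type": "hotel"},
    {"dest": "Boston", "type": "flight"},
    {"dest": "Denver", "type": "car"},
    {"dest": "Miami", "type": "cruise"},
]
print(correlation_score(focused), correlation_score(diverse))
```

Under these assumptions the focused session scores higher than the diverse one, which has no frequent predicate value set at all; a real detector would need a calibrated threshold learned from normal user sessions.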

Ha: Result Coverage Monitor
Key idea: sort the multi-attribute data D in a total order that preserves locality, e.g. a z-curve. Create a coverage bit vector (CBV) whose bits correspond to the data records in that total order. Accessing a data record sets the corresponding bit.
Training: cluster CBVs to model different data access patterns.
[Figure: axes x and y]
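
A coverage bit vector over a z-curve ordering might look like the following Python sketch. It assumes records with two non-negative integer attributes `(x, y)` and uses bit interleaving (a Morton code) as the z-curve; the class and function names are hypothetical, and the clustering step used for training is not shown.

```python
def z_value(x, y, bits=8):
    """Interleave the low `bits` bits of x and y into one z-order key."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)      # x bits go to even positions
        z |= ((y >> i) & 1) << (2 * i + 1)  # y bits go to odd positions
    return z


def build_positions(records, bits=8):
    """Map each distinct (x, y) record to its index in z-curve order."""
    ordered = sorted(set(records), key=lambda r: z_value(r[0], r[1], bits))
    return {record: index for index, record in enumerate(ordered)}


class CoverageBitVector:
    """One bit per record, laid out in the z-curve total order."""

    def __init__(self, positions):
        self.positions = positions
        self.bits = [0] * len(positions)

    def access(self, record):
        """Mark a record returned to the session as covered."""
        self.bits[self.positions[record]] = 1

    def coverage(self):
        """Fraction of the data table touched so far."""
        return sum(self.bits) / len(self.bits)


# Illustrative data: a 2 x 2 grid of records.
table = [(0, 0), (1, 0), (0, 1), (1, 1)]
cbv = CoverageBitVector(build_positions(table))
cbv.access((0, 1))
cbv.access((1, 1))
print(cbv.bits, cbv.coverage())
```

Because the z-curve keeps nearby records close together in the ordering, a session that explores one region of the data sets a compact run of bits, whereas a harvesting session tends to set bits across the whole vector.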

Experiment
Extracted 98,564 real user query sessions and a data table of 387 records from the KDD Cup 2000 clickstream dataset.
Synthesized 1000 attack sessions [Madh08, Dasg09].
Ran on a server with an Intel 2.4GHz CPU, 3GB RAM, and FC 8 OS.
Performed four-fold cross-validation.
[Figures: Effectiveness of Detection in Four Validations; Efficiency of Detection in Four Validations]
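
For readers unfamiliar with four-fold cross-validation, the splitting step can be sketched in Python as below. This is a generic illustration, not the authors' experimental code: the function name `k_fold_splits` and the round-robin assignment of sessions to folds are assumptions, and the slides do not say how sessions were partitioned or shuffled.

```python
def k_fold_splits(items, k=4):
    """Yield (train, test) pairs; each item is in the test set exactly once."""
    folds = [items[i::k] for i in range(k)]  # round-robin assignment
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


# Stand-in session identifiers; real sessions would be shuffled first.
sessions = list(range(10))
for train, test in k_fold_splits(sessions, k=4):
    print(len(train), len(test))
```

In each of the four validations, the detector would be trained on three folds and evaluated on the remaining one.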

Conclusion & Future Work
Identified non-traditional data harvesting attacks on the back-end databases of web-based applications, i.e. the crawling attack and the sampling attack.
Detection is based on identifying attackers’ special search behaviors at the single-session level: diversity -> query correlation observer, broadness -> result coverage monitor.
Detecting cross-session data harvesting attacks will be considered in future work.

References
[Vale05] F. Valeur et al. A learning-based approach to the detection of SQL attacks. In DIMVA, pages 123–140, 2005.
[Dasg09] A. Dasgupta et al. Privacy preservation of aggregates in hidden databases: why and how? In SIGMOD, pages 153–164, 2009.
[Madh08] J. Madhavan et al. Google’s deep web crawl. PVLDB, 1(2):1241–1252, 2008.
[Tan02] P.-N. Tan et al. Discovery of web robot sessions based on their navigational patterns. Data Min. Knowl. Discov., 6(1):9–35, 2002.
[Denn83] D. E. Denning et al. Inference controls for statistical databases. Computer, 16(7):69–82, 1983.

Thanks for Listening