Detecting Web Attacks Using Multi-Stage Log Analysis

Presentation transcript:

Detecting Web Attacks Using Multi-Stage Log Analysis. Presented by Akhil Katpally. Authors: Melody Moh, Santhosh Pininti, Sindhusha Doddapaneni, Teng-sheng Moh

Agenda Goal of the paper. Introduction. Background and Related Studies. System Design and Implementation. Experiments and Results. Conclusion.

Goal of the paper The authors propose a new Multi-Stage Log Analysis system that combines Pattern Matching with supervised Machine Learning. The system is intended to detect new SQL-injection attacks effectively. The authors implemented a proof of concept of the proposed system on Amazon AWS, using Kibana for Pattern Matching and Bayes Net for Machine Learning.

Introduction With so much of daily life depending on the web, web security has become extremely important. One of the major security issues of web applications is SQL injection. According to the OWASP Top 10 security issues, SQL injection ranks at the top. Because even emerging cloud technology is accessed through web interfaces, web security is a top priority for Internet and cloud service providers (ISP/CSP).

Introduction….Continued Real-time Log Analysis is one major procedure for detecting and preventing SQL-injection attacks. It uses Pattern Matching and Machine Learning techniques. With Pattern Matching, only known injection patterns, and patterns that differ from them by small changes, are recognized. Existing Log Analysis methods for SQL injection detection are based on either Pattern Matching or Machine Learning. The proposed system uses both.

Contributions Proposed a multi-stage architecture for detecting SQL injection attacks. Implemented a prototype based on the proposed architecture, using Bayes Net and Kibana. Compared the pros and cons of Pattern Matching (Kibana) and Machine Learning (Bayes Net). Evaluated the two-stage system through a series of experiments.

Background and Related Studies SQL injection: the attacker inputs an SQL query that modifies or damages the database connected to the target web application. Types: Order Wise, Blind, and Against Database. Log Analysis (Log4j): understanding logs and extracting useful information from them. Pattern Matching: checks whether a given set of words is present in the text.
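The Pattern Matching idea above can be sketched in a few lines of Python. This is only an illustration: the signatures in `SQLI_PATTERNS` and the sample log lines are hypothetical, since the slides do not list the actual queries the authors wrote in Kibana.

```python
import re

# Hypothetical signatures, chosen for illustration only. The slides do not
# list the real Kibana queries used by the authors.
SQLI_PATTERNS = [
    re.compile(r"\bunion\b.+\bselect\b", re.IGNORECASE),
    re.compile(r"\bor\b\s+'?\d+'?\s*=\s*'?\d+'?", re.IGNORECASE),
    re.compile(r";\s*drop\s+table\b", re.IGNORECASE),
    re.compile(r"--|/\*"),
]


def looks_like_sqli(log_line: str) -> bool:
    """Return True if any known signature occurs in the log line."""
    return any(p.search(log_line) for p in SQLI_PATTERNS)


if __name__ == "__main__":
    samples = [
        "GET /login?user=admin' OR 1=1 --",
        "GET /item?id=1 UNION SELECT password FROM users",
        "GET /home?page=2",
        "GET /login?user=admin' OR 'a'='a",  # variant not covered by the signatures
    ]
    for line in samples:
        print(looks_like_sqli(line), line)
```

The last sample is an injection attempt that none of these particular signatures covers, which illustrates why a signature list alone cannot catch patterns it was not written for.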

Background and Related Studies….Continued Logstash: a data pipeline tool that connects to a variety of sources and receives different types of logs (system, web server, error and application logs). Elasticsearch: search and data analysis software that gives deep insight into streaming data. It is built on Apache Lucene. Kibana: a data visualization interface for real-time summarizing and charting of streaming data.

Background and Related Studies….Continued Machine Learning: a way of making a computer learn and take action without being explicitly programmed. Naïve Bayes Classification: a simple probabilistic classifier built on Bayes' theorem, which gives the probability of an event occurring based on given conditions that are related to the event. Bayes Networks are often used to relax the independent-attributes assumption of Naïve Bayes Classification, which can improve performance.
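To make the Naïve Bayes idea concrete, here is a minimal sketch of a token-based classifier with add-one smoothing. It is not the WEKA implementation used in the paper, and the tiny training set and the labels `sqli` and `normal` are invented for illustration. Each class is scored as log P(class) plus the sum of log P(token | class), treating tokens as independent given the class.

```python
import math
from collections import Counter


def train(samples):
    """samples: list of (tokens, label) pairs. Returns the fitted counts."""
    class_counts = Counter(label for _, label in samples)
    token_counts = {label: Counter() for label in class_counts}
    vocab = set()
    for tokens, label in samples:
        token_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, token_counts, vocab


def predict(tokens, model):
    """Pick the class with the highest log-posterior under the independence assumption."""
    class_counts, token_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)  # log prior
        denom = sum(token_counts[label].values()) + len(vocab)
        for t in tokens:
            # Add-one (Laplace) smoothing so unseen tokens do not zero out the product.
            score += math.log((token_counts[label][t] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy, made-up training data: tokens extracted from log lines.
toy_data = [
    (["select", "union", "from"], "sqli"),
    (["or", "1=1"], "sqli"),
    (["login", "ok"], "normal"),
    (["page", "view", "ok"], "normal"),
]
model = train(toy_data)

if __name__ == "__main__":
    print(predict(["union", "select"], model))
    print(predict(["login", "ok"], model))
```

A Bayes Network differs in that it can model dependencies between attributes instead of assuming they are all independent given the class.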

System Design and Implementation Single-Stage Architecture: Application logs are generated using the log4j library. Either the Machine Learning method (Bayes Net) or the Pattern Matching method (ELK system) is used for SQL injection detection.
[Diagram labels: Web application users, Web Application Logic, log4j, Web Application Log Files, Logstash, Elasticsearch, Kibana, Preprocessing for WEKA, Bayes Net Model, Bayes Net Rank, Analyst]

System Design and Implementation….Continued Multi-Stage Architecture: The proposed method combines both machine learning and pattern matching. WEKA is the Machine Learning tool used. A model is first trained on the training data. The generated Bayes Net model is tested using 5-fold cross-validation and has an accuracy of 78.8%.
[Diagram labels: Web application users, Web Application Logic, log4j, Web Application Log Files, Preprocessing for Kibana, Preprocessing for WEKA, Elasticsearch, Kibana, Bayes Net Model, Bayes Net Rank, Analyst]
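The chaining of two detectors can be sketched as follows. This is a simplified reading of the multi-stage idea, assuming that only log lines the first stage does not flag are handed to the second stage. The function `two_stage_detect` and the two stand-in detectors are hypothetical; they are not the Kibana queries or the WEKA model from the paper.

```python
from typing import Callable, Iterable, List, Tuple

Detector = Callable[[str], bool]


def two_stage_detect(
    logs: Iterable[str], first: Detector, second: Detector
) -> Tuple[List[str], List[str]]:
    """Flag a line if either stage flags it; the second stage only sees
    lines that the first stage let through (short-circuit `or`)."""
    flagged, clean = [], []
    for line in logs:
        if first(line) or second(line):
            flagged.append(line)
        else:
            clean.append(line)
    return flagged, clean


# Stand-in detectors (assumptions for illustration only).
def pattern_stage(line: str) -> bool:
    return "union select" in line.lower()


def model_stage(line: str) -> bool:
    # Placeholder for a trained classifier: unusually many quotes look suspicious.
    return line.count("'") >= 2


if __name__ == "__main__":
    logs = [
        "GET /item?id=1 UNION SELECT pw FROM users",
        "GET /login?user=admin' OR 'a'='a",
        "GET /home?page=2",
    ]
    flagged, clean = two_stage_detect(logs, pattern_stage, model_stage)
    print("flagged:", flagged)
    print("clean:", clean)
```

Swapping the two arguments reverses the stage order. In this simplified sketch the set of flagged lines is the same either way, so it does not reproduce any order-dependent behaviour of the real system.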

System Design and Implementation….Continued Log Generation: either use parsers and filters to remove unnecessary information from the logs, or use a logging library (log4j) to create custom logs. Preprocessing for WEKA (single stage): the attributes in the test set must match those in the training set. Preprocessing for WEKA (multi-stage): logs not detected by Kibana are input to WEKA. A Unix script converts the CSV file to an ARFF file for WEKA input. Preprocessing for Kibana (multi-stage): the output of WEKA is used as the input to Kibana. A Unix script converts the ARFF file to a text file.
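The CSV-to-ARFF step can be illustrated with a short Python stand-in for the Unix script, which the slides do not show. The column names, the relation name `weblogs`, and the crude type inference (numeric if every value parses as a number, otherwise string, with the class column declared nominal) are all assumptions made for this sketch.

```python
import csv
import io


def _is_number(value: str) -> bool:
    try:
        float(value)
        return True
    except ValueError:
        return False


def _quote(value: str) -> str:
    """Single-quote a string value, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def csv_to_arff(csv_text: str, relation: str, class_column: str) -> str:
    """Convert CSV text (with a header row) to ARFF text."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        raise ValueError("CSV has no data rows")
    columns = list(rows[0].keys())
    kinds = {}
    lines = ["@relation " + relation, ""]
    for col in columns:
        values = [row[col] for row in rows]
        if col == class_column:
            kinds[col] = "nominal"
            labels = ",".join(sorted(set(values)))
            lines.append("@attribute %s {%s}" % (col, labels))
        elif all(_is_number(v) for v in values):
            kinds[col] = "numeric"
            lines.append("@attribute %s numeric" % col)
        else:
            kinds[col] = "string"
            lines.append("@attribute %s string" % col)
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join(
            _quote(row[c]) if kinds[c] == "string" else row[c] for c in columns
        ))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = "length,quotes,label\n42,2,sqli\n10,0,normal\n"
    print(csv_to_arff(sample, "weblogs", "label"))
```

Because the class labels are collected from the data, a test file converted this way only matches the training file's attribute declarations if both contain the same set of labels and columns, which is the matching requirement noted above.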

Experiment Setup Dataset: web application logs generated using the Log4j framework. Web Application: developed using Java, Bootstrap, HTML, CSS, JavaScript and MySQL, and hosted on an Amazon AWS Linux instance. Kibana and Bayes Net methods are used.
Data          Total Logs  SQL Logs  Regular Logs
Training Set  2000        547       1453
Testing Set   10000       2812      7188

Kibana vs Bayes Net
Purpose
  Kibana: Used for detecting SQL injections and visualizing data.
  Bayes Net: Used for classifying logs into SQL-injection logs and other logs.
Mechanism
  Kibana: Uses Pattern Matching techniques for detection.
  Bayes Net: Uses supervised machine learning to learn and detect attacks.
Overhead
  Kibana: No file conversion is required; it reads directly from a text file. No preprocessing is required; filters can be used to extract the required data. No training is required; queries are written for detection.
  Bayes Net: Loads only ARFF files, so log files must be converted to ARFF. Preprocessing is required before logs are passed to the model for classification. Training is required, which involves manual classification.
Pros and Cons
  Kibana: A real-time system where new queries may be issued. Can detect only specified patterns; cannot detect new types of SQL injection. Results are in visualized form and easy to analyze.
  Bayes Net: Not a real-time system, as it involves offline training. Can detect new patterns, since it considers attributes such as IP address while classifying. Results are in text form and difficult to analyze.

Experiment Results
Method                            Accuracy for SQL Detection (%)
Machine Learning: Naïve Bayes     61.7
Machine Learning: Bayes Net       80.0
Pattern Matching: Kibana          85.3
Kibana followed by Bayes Net      94.7
Bayes Net followed by Kibana      95.4
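The size of the improvement can be read off the reported numbers with a little arithmetic. The snippet below uses only the accuracies from the results slide; the dictionary keys are shortened labels chosen here.

```python
# Accuracies (%) as reported on the results slide.
reported = {
    "Naive Bayes": 61.7,
    "Bayes Net": 80.0,
    "Kibana": 85.3,
    "Kibana then Bayes Net": 94.7,
    "Bayes Net then Kibana": 95.4,
}
single_stage = ["Naive Bayes", "Bayes Net", "Kibana"]

best_single = max(single_stage, key=reported.get)
best_overall = max(reported, key=reported.get)
# Gain of the best two-stage setup over the best single-stage method,
# in percentage points.
gain = round(reported[best_overall] - reported[best_single], 1)

if __name__ == "__main__":
    print(best_single, best_overall, gain)
```

The best two-stage configuration (Bayes Net followed by Kibana) is 10.1 percentage points above the best single-stage method (Kibana).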

Conclusion A multi-stage log analysis architecture has been proposed that uses both machine learning and pattern matching. Experimental results show that the two-stage architecture is more accurate than either single-stage method, particularly when the Bayes Net model precedes Kibana. Kibana can also provide the final output with visualization. Further improvements can be made to the Kibana queries, and unsupervised machine learning methods could be used, which may lead to real-time log analysis.