Smarter Searching for a Network Packet Database William (Bill) Kenworthy School of Information Technology Murdoch University Perth, Western Australia.

Content
This presentation is about an alternative way to search and/or classify data travelling over a network. I will describe the background, methodology and results of the research. The research spans two seemingly disparate disciplines, so because the conference has a communications focus, there is some background on bioinformatics to set the scene. The presentation is (almost) maths free; if you want the maths, see the paper :)

Motivation
Test and validate alternative ways to view network data. Better visualise the intrinsic relationships between packets of data, based on structure rather than content. This is part of investigating the possibilities inherent in mining data via structure-based methods, and in searching using statistical ranking of possible answers.

About
Searching for information in high-speed network traffic is difficult, but it is basically a "solved" problem. What is still a problem is searching for terms that are partial, obfuscated or spatially separated in the data stream. The work described here is a successful attempt to use characteristics more commonly associated with biological systems to identify areas of interest in a network data stream.

Searching
Traditional database search results come from exact (yes/no) matching based on some regular expression system, e.g. [Bb]ank*. Instead, the algorithms I am recommending match on the low-level structure of a sequence of characters: character value and position/relationship in the stream, and character/term substitution. Results are ranked according to identity score and include false error rate data.
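The contrast between yes/no matching and ranked matching can be sketched in a few lines of Python. This is only an illustration: `difflib.SequenceMatcher` is a stand-in similarity measure from the standard library, not the bioinformatics algorithm used in the research, and the candidate strings are made up.

```python
import re
from difflib import SequenceMatcher


def exact_match(pattern: str, text: str) -> bool:
    """Traditional yes/no search: the pattern either matches or it does not."""
    return re.search(pattern, text) is not None


def ranked_matches(term: str, candidates: list[str]) -> list[tuple[float, str]]:
    """Score every candidate against the term and return them best first.

    SequenceMatcher.ratio() is an assumed stand-in for an identity score.
    """
    scored = [(SequenceMatcher(None, term, c).ratio(), c) for c in candidates]
    return sorted(scored, reverse=True)


# Hypothetical search terms, including obfuscated and spaced-out variants.
candidates = ["bank", "b4nk", "b a n k", "router"]

# Exact matching only finds the literal form.
print([c for c in candidates if exact_match(r"[Bb]ank*", c)])

# Ranked matching still places the obfuscated variants above unrelated text.
for score, cand in ranked_matches("bank", candidates):
    print(f"{score:.2f}  {cand}")
```

The point of the sketch is the shape of the result: a ranked list with scores instead of a single hit or miss.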

Problem: dealing with raw bits on a network

What do we mean by "bioinformatics" algorithms?
There are useful parallels between the way data is structured in a stream of network data and in a biological genome. We target the structures within a data stream for searching. Very sophisticated, statistically valid search algorithms were developed for searching biological data. Results can be statistically correlated and ranked.

What is constant? - Structure!
The property of the bioinformatics algorithms that we rely on primarily targets structure. IP numbers will change in the header of an IP packet, BUT the position and placement of other tokens near the IP number do not (fixed-size fields). This property extends to data fields. Example: DNS data packets will have a similar signature, with slight differences depending on the mutable data.
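A small Python sketch makes the fixed-field argument concrete. It packs two simplified IPv4 headers that differ only in the source address and measures how many byte positions agree. The header values, the addresses and the `positional_identity` helper are assumptions chosen for illustration; the research itself uses bioinformatics alignment tools rather than a byte-by-byte comparison.

```python
import socket
import struct


def ipv4_header(src: str, dst: str) -> bytes:
    """Pack a simplified 20-byte IPv4 header (checksum left at zero)."""
    return struct.pack(
        "!BBHHHBBH4s4s",
        0x45,  # version 4, header length 5 words
        0,     # type of service
        40,    # total length (illustrative value)
        0,     # identification
        0,     # flags and fragment offset
        64,    # time to live
        6,     # protocol (TCP)
        0,     # header checksum, not computed in this sketch
        socket.inet_aton(src),
        socket.inet_aton(dst),
    )


def positional_identity(a: bytes, b: bytes) -> float:
    """Fraction of byte positions holding the same value in both inputs."""
    if len(a) != len(b):
        raise ValueError("inputs must be the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Two packets from different (made-up) sources to the same destination.
first = ipv4_header("10.0.0.1", "172.16.0.5")
second = ipv4_header("192.168.7.9", "172.16.0.5")

# Only the four source-address bytes differ, so most positions still agree.
print(positional_identity(first, second))
```

Even though the mutable field changes completely, the surrounding fixed fields keep the two headers largely identical, which is the property the search relies on.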

Structure
00 => A
01 => C
10 => G
11 => T
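The two-bit mapping on this slide can be sketched in Python as follows. The slide gives only the mapping itself; reading each byte from its most significant bit pair down to its least significant is an assumption of this sketch, as are the function names.

```python
# Index into ALPHABET is the 2-bit value: 00 -> A, 01 -> C, 10 -> G, 11 -> T.
ALPHABET = "ACGT"


def transcode(data: bytes) -> str:
    """Map every pair of bits in the input to one DNA letter.

    Assumption: bit pairs are read most significant first within each byte.
    """
    letters = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            letters.append(ALPHABET[(byte >> shift) & 0b11])
    return "".join(letters)


def decode(sequence: str) -> bytes:
    """Inverse of transcode(); the sequence length must be a multiple of 4."""
    if len(sequence) % 4 != 0:
        raise ValueError("sequence length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(sequence), 4):
        byte = 0
        for letter in sequence[i:i + 4]:
            byte = (byte << 2) | ALPHABET.index(letter)
        out.append(byte)
    return bytes(out)


# 0x1B is 00 01 10 11 in binary, so it exercises all four letters.
print(transcode(b"\x1b"))
```

Each byte becomes four letters, so a packet of n bytes yields a sequence of 4n letters, and the mapping is lossless.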

Example plot of relationships

Methodology
The software used was standard bioinformatics software, with the input data modified to suit. Most bioinformatics software has been implemented by large teams over many years; it is not practical for an individual to re-implement it for a different purpose :( The solution is to translate packetised network data into bioinformatics-compatible data files by mapping ones and zeros to the DNA alphabet: basic data abstraction.
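One plausible form for the "bioinformatics-compatible data files" is FASTA, the plain-text sequence format that most such tools accept. The slides do not name the file format, so FASTA, the record names and the 60-character line width below are all assumptions of this Python sketch.

```python
ALPHABET = "ACGT"  # 00 -> A, 01 -> C, 10 -> G, 11 -> T


def transcode(data: bytes) -> str:
    """Map each bit pair to a DNA letter (most significant pair first)."""
    return "".join(
        ALPHABET[(byte >> shift) & 0b11]
        for byte in data
        for shift in (6, 4, 2, 0)
    )


def to_fasta(packets: dict[str, bytes], width: int = 60) -> str:
    """Render packets as FASTA text: a '>' header line, then wrapped sequence."""
    lines = []
    for name, payload in packets.items():
        sequence = transcode(payload)
        lines.append(f">{name}")
        lines.extend(
            sequence[i:i + width] for i in range(0, len(sequence), width)
        )
    return "\n".join(lines) + "\n"


# Hypothetical captured payloads keyed by a made-up record name.
sample = {"pkt1": b"\x1b", "pkt2": bytes(20)}
print(to_fasta(sample), end="")
```

Writing the returned text to a file gives an input that standard tools can index and search without modification, which is the point of the abstraction.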

What?
What we are proposing is to intelligently identify network traffic using relationships between structural elements embedded in the data rather than the literal content of the data. This is used as a method to identify and classify network data into categories against which an event can be notified. We have created a database of known good and bad data samples, which allows us to place network data in one of three possible categories: known good, known bad or unknown.

Database Creation
The database was created with isolated island networks using generic PCs with various operating systems; database pollution was a problem. "Good" samples were typical , database, browsing. "Bad" samples were from PCs intentionally infected with botnet, virus and worm examples. The database is in the form of indexed motifs in a "BLAST" formatted flat file design.

Process
Processing starts by extracting a packet of data to user space (via the Linux kernel netfilter nfqueue module). The packet (as a whole) is transcoded and searched against the database. Returned is a set of "motifs", with score and false error rate statistics for each motif matched in the database. Event notification is threshold based, according to an election process over the top-rated N hits returned (hits are ranked in order of identity score).
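The slides do not spell out the election rules, so the following Python sketch shows only one way a threshold-based vote over the top N hits could assign the three categories. The `Hit` record, the default values of `top_n`, `min_identity` and `min_votes`, and the majority rule are all assumptions, not the author's implementation.

```python
from collections import Counter
from typing import NamedTuple


class Hit(NamedTuple):
    """One database match (hypothetical record layout)."""
    label: str         # "good" or "bad", from the matched database sample
    identity: float    # identity score, higher is better
    error_rate: float  # error statistic reported with the hit (unused here)


def classify(hits: list[Hit], top_n: int = 5,
             min_identity: float = 0.8, min_votes: int = 3) -> str:
    """Vote over the top-ranked hits; all thresholds are assumed values."""
    # Rank by identity score and keep only the best top_n hits.
    ranked = sorted(hits, key=lambda h: h.identity, reverse=True)[:top_n]
    # Only hits above the identity threshold are allowed to vote.
    votes = Counter(h.label for h in ranked if h.identity >= min_identity)
    if not votes:
        return "unknown"
    label, count = votes.most_common(1)[0]
    return f"known {label}" if count >= min_votes else "unknown"


# Made-up hits for a single packet.
hits = [
    Hit("bad", 0.97, 1e-9),
    Hit("bad", 0.93, 1e-7),
    Hit("good", 0.91, 1e-6),
    Hit("bad", 0.88, 1e-5),
    Hit("good", 0.42, 0.3),
]
print(classify(hits))
```

A packet with no sufficiently strong hits, or with too few agreeing votes, falls through to "unknown", which is the category that would trigger closer inspection.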

Implementation
The test design has proven less than reliable at higher packet rates, mainly due to inefficient design. The next step is to implement the reference design as a Snort IDS module and link it to Snort's event notification process, where the well-designed data handling processes will alleviate the problems mentioned above.

The Future
These techniques have wide applicability to search problems where data is structured but mutable. And for something completely different :) A similar process could detect collusion between student assignments by detecting structural similarities in software coding styles: create a database of motifs based on code … search!

Existing work?
Considering the advantages I have found, very little work has been undertaken in using these algorithms. IBM proposed "An Intrusion-Detection System Based on the Teiresias Pattern-Discovery Algorithm" in 1999. IBM has proposed using the Teiresias algorithm for SPAM filtering in [...]. Commentators thought it was interesting, but there has been little further activity...

Conclusions
It works :) Known/unknown sorting might be a unique niche application. The ability to statistically rank similarity is a useful tool, opening up access to alternative ways to view search results.

Questions?
William (Bill) Kenworthy
Thank you!