On The Effectiveness of Kolmogorov Complexity Estimation to Discriminate Semantic Types

Presentation transcript:

On The Effectiveness of Kolmogorov Complexity Estimation to Discriminate Semantic Types
Presenters: Enkh-Amgalan Baatarjav, Kalyan Pathapati Subbu, Satyajeet Nimgaonkar
SFI Workshop on Adaptive and Resilient Computing Security
Stephen Bush and Todd Hughes

Overview
- Introduction
- Innovation and security
- Challenges
- Detecting variation in the complexity landscape
- Semantic Type Classification
- Framework and Experimental Test Set
- Discrimination Results
- Conclusion
- References

Introduction
- A central problem in information systems is information assurance.
- Main idea: complexity-based vulnerability analysis.
- Kolmogorov complexity is applied to estimate and predict previously unknown vulnerabilities.
- Progress on experimental validation of the vulnerability analysis framework.
- Kolmogorov Complexity (video)

Introduction
- The salient point of complexity-based vulnerability analysis: the better one understands a phenomenon, the more concisely the phenomenon can be described.
- Goal of science: to develop theories that require minimum size to be fully described.
- Objective of this paper: to find whether estimates of complexity can be used to differentiate known types of data.

Intro: Benefit
- Motivating early work: the active network Java complexity probe toolkit.
- Tools based on Kolmogorov complexity do not require detailed a priori information about known attacks. Instead, they compute vulnerability from an inherent, underlying property of information itself: its Kolmogorov-Chaitin complexity.
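For readers who want the quantity behind these slides written out, the standard definition can be stated as follows. The notation (a universal machine U, a lossless compressor C, and the normalized estimate) is an assumption of this note and does not appear in the slides.

```latex
% Kolmogorov complexity of a string x relative to a universal machine U:
% the length of the shortest program p that makes U output x.
K_U(x) = \min \{\, |p| : U(p) = x \,\}

% K_U is not computable, but any lossless compressor C gives an upper bound,
% where c_C is a constant (the size of the decompressor) independent of x:
K_U(x) \le |C(x)| + c_C

% A practical normalized estimate is therefore the compression ratio:
\hat{K}_C(x) = \frac{|C(x)|}{|x|}
```

Because only upper bounds are available, different compressors can rank the same data differently, which is one reason to compare several estimators.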

Intro: Innovation and Security
A method for vulnerability identification:
1. Waiting for an information system to be attacked
2. Surviving the attack
3. Detecting the attack
4. Analyzing the attack
5. Adding the result to a knowledge base
Attackers and defenders of information systems are both capable of innovation.

Intro: Challenges
- Length of time required to obtain an accurate sample (performing the analysis in real time).
- The stream of data on a network link can be sampled at multiple protocol layers. OSI model: physical, data link, network, transport, session, presentation, application.
- Potential attackers target areas of both low and high complexity.
  - Low complexity: easier to observe and understand.
  - High complexity: potentially a good place to hide activities.

Detecting variation in complexity landscape
- For complexity map generation, the complexity landscape needs sufficient variation.
- Smallest descriptive length of different semantic types: equal, or vastly different?
- Approximation of the smallest descriptive length (the best descriptor):
  - No redundant information
  - Unique essence of the entity remains
- Goal: maximize discrimination
  - Smallest representation of a sequence

Semantic Type Classification
- An input stream carrying different kinds of information arrives at the complexity probe classifier.
- The classifier uses a Kolmogorov complexity estimate of the input stream to categorize incoming data into semantic types: audio, MS Word document, executable, image, ASCII text, or video.

Framework and Experimental Test Set
- Ten randomly chosen samples of each type of data.
- Data is filtered to extract the header.
- The complexity estimator returns an estimate of each sample's complexity.
- The mapper determines a semantic type based on the complexity estimate.
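The estimator-then-mapper pipeline can be sketched as a nearest-mean classifier over a single complexity estimate. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `zip_estimate`, `train_mapper`, `map_type`, and `noise` are hypothetical, the estimate is a plain zlib compression ratio, and the two "types" are synthetic stand-ins rather than the six semantic types from the test set.

```python
import random
import zlib


def zip_estimate(data):
    """Compression ratio under DEFLATE, used here as a complexity estimate."""
    return len(zlib.compress(data, 9)) / len(data) if data else 0.0


def train_mapper(samples):
    """Return the mean complexity estimate for each labelled type."""
    return {
        label: sum(zip_estimate(s) for s in items) / len(items)
        for label, items in samples.items()
    }


def map_type(data, centroids):
    """Assign the type whose mean estimate is closest to this sample's estimate."""
    k = zip_estimate(data)
    return min(centroids, key=lambda label: abs(centroids[label] - k))


def noise(n, seed):
    """Pseudo-random bytes: a hypothetical stand-in for high-complexity data."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(n))


# Synthetic training data (assumption): repetitive text vs. pseudo-random bytes.
training = {
    "text-like": [b"the quick brown fox " * 50, b"complexity probe " * 60],
    "noise-like": [noise(1000, 1), noise(1000, 2)],
}
centroids = train_mapper(training)

if __name__ == "__main__":
    print(map_type(b"lorem ipsum dolor " * 60, centroids))
    print(map_type(noise(1000, 3), centroids))
```

A single scalar feature keeps the sketch short. Types whose estimates overlap cannot be separated this way, which is consistent with the combined types reported later in the deck.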

Complexity Estimator Module
Estimation operates on bit streams using:
- a simple entropy estimator (H)
- Lempel-Ziv (LZ) compression
- Zip (Zip) compression
- bZip (bZip) compression
- a frequency-based FFT estimator technique (Psi)
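Three of these estimators can be approximated with the Python standard library. The sketch below is an assumption-laden stand-in rather than the probe's actual code: `entropy_estimate`, `zip_estimate`, and `bzip_estimate` are hypothetical names, Zip is approximated by zlib's DEFLATE (an LZ77-based scheme), bZip by the `bz2` module, and entropy is computed over bytes rather than bits. The FFT-based Psi estimator is left out because the slides do not define it.

```python
import bz2
import math
import zlib
from collections import Counter


def entropy_estimate(data):
    """Empirical byte entropy scaled to [0, 1] (bits per byte divided by 8)."""
    if not data:
        return 0.0
    n = len(data)
    bits = -sum((c / n) * math.log2(c / n) for c in Counter(data).values())
    return bits / 8.0


def zip_estimate(data):
    """Compressed size over original size using DEFLATE (zlib)."""
    if not data:
        return 0.0
    return len(zlib.compress(data, 9)) / len(data)


def bzip_estimate(data):
    """Compressed size over original size using bzip2."""
    if not data:
        return 0.0
    return len(bz2.compress(data, 9)) / len(data)


if __name__ == "__main__":
    samples = {
        "repetitive": b"a" * 10000,
        "english-like": b"the quick brown fox jumps over the lazy dog " * 200,
        "all byte values": bytes(range(256)) * 4,
    }
    for name, data in samples.items():
        print(
            f"{name:16s} H={entropy_estimate(data):.3f} "
            f"Zip={zip_estimate(data):.3f} bZip={bzip_estimate(data):.3f}"
        )
```

Note that compression ratios can exceed 1.0 on short or incompressible inputs because of container overhead, so they are best compared between inputs of similar size.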

Tunable parameters of the Complexity Probe
- Parameters: specification of filters, sampling rate, window size, and the set of estimator algorithms enabled.
- The output is either:
  - a single semantic type identifying a file, or
  - a vector of semantic types, one for each window.
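The two output modes can be illustrated with a small windowing sketch. Everything here is hypothetical: the function names, the zlib-based estimate, and the 0.5 cutoff in `threshold_mapper` are assumptions chosen for illustration, and filters and sampling rate are not modelled.

```python
import random
import zlib


def zip_estimate(data):
    """Compression ratio under DEFLATE, used as the complexity estimate."""
    return len(zlib.compress(data, 9)) / len(data) if data else 0.0


def windows(data, window_size, step=None):
    """Yield fixed-size windows; a trailing partial window is dropped."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    step = step or window_size
    for start in range(0, len(data) - window_size + 1, step):
        yield data[start:start + window_size]


def probe(data, window_size, estimator, mapper, step=None):
    """Return one label per window (a single label if the window spans the data)."""
    return [mapper(estimator(w)) for w in windows(data, window_size, step)]


def threshold_mapper(estimate, cutoff=0.5):
    """Toy mapper (assumption): split on one arbitrary cutoff."""
    return "low-complexity" if estimate < cutoff else "high-complexity"


def noise(n, seed=0):
    """Pseudo-random bytes standing in for high-complexity content."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(n))


# A stream whose second half differs sharply in complexity from the first.
stream = b"a" * 512 + noise(512)
labels = probe(stream, 256, zip_estimate, threshold_mapper)
whole_file = probe(stream, len(stream), zip_estimate, threshold_mapper)

if __name__ == "__main__":
    print(labels)
    print(whole_file)
```

Setting the window size to the full length yields the single-label mode, while smaller windows yield the per-window vector, which is what allows a change of type inside one stream to be localized.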

Discrimination Results
- Discriminant analysis using the Zip estimator.
- Squared distances between semantic types are relatively large, except for the distances circled in red in the figure.
- Those types are very close to one another, which yields a high error rate in discriminating among them.
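The slides do not give the distance formula, so the sketch below assumes the usual one from linear discriminant analysis: for a single feature, the squared difference of group means divided by the pooled within-group variance. The function names are hypothetical and the numbers are invented toy values, not results from the paper.

```python
from itertools import combinations
from statistics import mean, variance


def pooled_variance(groups):
    """Within-group variance pooled across all groups (sample variances)."""
    num = sum((len(g) - 1) * variance(g) for g in groups.values())
    den = sum(len(g) - 1 for g in groups.values())
    return num / den


def squared_distances(groups):
    """Squared distance between every pair of group means, scaled by pooled variance."""
    s2 = pooled_variance(groups)
    means = {label: mean(values) for label, values in groups.items()}
    return {
        (a, b): (means[a] - means[b]) ** 2 / s2
        for a, b in combinations(sorted(groups), 2)
    }


# Invented complexity estimates for three types (illustration only).
toy = {
    "text": [0.30, 0.32, 0.34],
    "image": [0.95, 0.97, 0.99],
    "video": [0.96, 0.98, 1.00],
}
distances = squared_distances(toy)

if __name__ == "__main__":
    for pair, d2 in distances.items():
        print(pair, round(d2, 2))
```

In this toy data the image and video groups sit close together, so their squared distance is small compared with either group's distance to text. That is the situation the slide describes for the pairs with a high error rate.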

Accuracy of the complexity-based system
- The histogram columns represent the percentage of data from the experimental test set that was correctly classified.
- Combination of entropy types:
  - audio and executables as a combined type
  - MS Word and text as a combined type
  - images and video as a combined type

Timing Profile
- For a complexity estimator, the actual complexity of the data and the window size have the greatest effect on timing.
- The figure shows the mean complexity for each estimator over the entire experimental test set.

Time (ms) vs. Window Size (bytes)
- The figure shows the expected amount of time for each semantic type as a function of window size.
- In every case, a larger window size requires more time to estimate complexity.
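A timing profile of this kind can be measured with a small harness. This is a sketch under assumptions, not the setup used for the figures: `time_ms` and `timing_profile` are hypothetical names, the estimators are the standard-library `zlib` and `bz2` compressors, and the input is synthetic text. Absolute numbers depend on the machine, so none are quoted here.

```python
import bz2
import time
import zlib


def time_ms(fn, data, repeats=5):
    """Best-of-N wall-clock time, in milliseconds, for one call of fn(data)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(data)
        best = min(best, time.perf_counter() - start)
    return best * 1000.0


def timing_profile(data, window_sizes, estimators, repeats=5):
    """Map (estimator name, window size) to the time in ms for the first window."""
    return {
        (name, size): time_ms(fn, data[:size], repeats)
        for name, fn in estimators.items()
        for size in window_sizes
    }


if __name__ == "__main__":
    sample = b"the quick brown fox jumps over the lazy dog " * 4096
    estimators = {"zip": zlib.compress, "bzip": bz2.compress}
    profile = timing_profile(sample, [1024, 16384, 131072], estimators)
    for (name, size), ms in sorted(profile.items()):
        throughput = size / ms if ms else float("inf")
        print(f"{name:5s} window={size:7d} B  time={ms:.4f} ms  {throughput:.0f} B/ms")
```

Taking the best of several repeats reduces scheduling noise. Dividing window size by time gives a throughput figure in bytes per millisecond.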

Time (ms) vs. Complexity (10 video files)
- The figure shows the expected amount of time for each semantic type as a function of the complexity of the sequence in the window.
- Estimation time decreases as complexity increases.

Time vs. % Correct Discrimination
[Figures: Accuracy vs. Time; Discrimination vs. Compression Ratio]

Throughput (b/ms) per Semantic Type
[Figures: throughput for Z and H per semantic type; throughput for Psi, LZ, and BZ per semantic type]

Conclusion
- The results in this paper analyze whether estimates of complexity have the resolution required to differentiate known types of data.
- The results indicate that data types can be identified by estimates of their complexity.
- A map of complexity can identify suspicious types, such as executable data embedded within passive data types.

References
- On The Effectiveness of Kolmogorov Complexity Estimation to Discriminate Semantic Types. Stephen F. Bush, Senior Member, IEEE.
- Complexity as a Framework for Prediction, Optimization, and Assurance, Proceedings of the 2002 DARPA Active Networks Conference and Exposition (DANCE 2002), IEEE Computer Society Press, pp , ISBN , May 29-30, 2002, San Francisco, California, USA.
- Bush, Stephen F., Extended Abstract: Complexity and Vulnerability Analysis, Complexity and Inference, June 2-5, 2003, DIMACS Center, Rutgers University, Piscataway, NJ, Organizers: Mark Hansen, Paul Vitányi, Bin Yu.
- Kirchherr W., Li M., and Vitányi P., The Miraculous Universal Distribution. The Mathematical Intelligencer, Springer-Verlag, New York, Vol. 19, No. 4,
- Ming Li and Paul Vitányi. An Introduction to Kolmogorov Complexity and Its Applications. Springer-Verlag, ISBN