Discovering Outlier Filtering Rules from Unlabeled Data Authors: Kenji Yamanishi & Jun-ichi Takeuchi Advisor: Dr. Hsu Graduate: Chia-Hsien Wu


Outline: Motivation; Objective; Introduction; Main Framework; Outlier Detector – SmartSifter; Rule Generator – DL-ESC/DL-SC; Experimentation – Network Intrusion Detection; Experimental Results; Conclusion; Opinion

Motivation Two problems motivate this work: SmartSifter's detection accuracy is limited, and SmartSifter cannot find a general pattern among the outliers it identifies.

Objective Improving the accuracy of SmartSifter. Discovering a new pattern that outliers in a specific group may commonly have.

Introduction Developing SmartSifter: an on-line outlier detection algorithm. Improving the power of SmartSifter by combining it with a supervised learning method.

Main Framework [diagram not transcribed: a classifier L and a new rule]
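The framework pairs an unsupervised detector with a supervised rule learner: the detector scores unlabeled data, the top-scored records become pseudo-labels, and a rule learner generalizes those labels into a filtering rule. A minimal sketch of that loop (all names and the `top_fraction` threshold are hypothetical, not the paper's exact procedure):

```python
def run_framework(data, detector, rule_learner, top_fraction=0.01):
    # Stage 1: score every record with the unsupervised outlier detector.
    scored = [(detector.score(x), x) for x in data]
    scored.sort(key=lambda pair: pair[0], reverse=True)

    # Label the highest-scored fraction as outliers (1), the rest as normal (0).
    cutoff = max(1, int(len(scored) * top_fraction))
    labeled = [(x, 1 if i < cutoff else 0) for i, (s, x) in enumerate(scored)]

    # Stage 2: learn a human-readable filtering rule from the pseudo-labels.
    return rule_learner.fit(labeled)
```

In the paper's setting the detector is SmartSifter and the rule learner is DL-ESC/DL-SC; the sketch only shows how the two stages hand data to each other.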

Outlier Detector – SmartSifter (SS) Using a probabilistic (Gaussian mixture) model: p(x, y) = p(x) p(y|x). Employing on-line discounting learning algorithms (SDLE/SDEM) to update the model. Giving a score to each datum.

Outlier Detector – SmartSifter (SS) (cont.) SDLE algorithm: an on-line discounting variant of Laplace-law-based estimation. SDEM algorithm: an on-line discounting variant of the incremental EM (Expectation-Maximization) algorithm.

Outlier Detector – SmartSifter (SS) (cont.) Outputting a sorted dataset: a highly scored datum has a high possibility of being an outlier.
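The score reflects how poorly a new datum fits the model learned so far, with older data gradually forgotten. A toy one-dimensional stand-in for this idea (the real SDEM updates a full Gaussian mixture; the single Gaussian, the discount rate, and the log-loss score here are illustrative only):

```python
import math

class DiscountedGaussianScorer:
    """Toy 1-D stand-in for SmartSifter's on-line discounting learner:
    exponentially forgetting mean/variance estimates plus a log-loss score."""

    def __init__(self, discount=0.02):
        self.r = discount   # discounting rate: weight given to the newest datum
        self.mean = 0.0
        self.var = 1.0

    def score(self, x):
        # Logarithmic loss (negative log-likelihood) under the current model:
        # large when x is improbable, i.e. likely an outlier.
        return 0.5 * math.log(2 * math.pi * self.var) \
            + (x - self.mean) ** 2 / (2 * self.var)

    def update(self, x):
        # Discounted moment updates: old statistics decay by (1 - r).
        self.mean = (1 - self.r) * self.mean + self.r * x
        self.var = (1 - self.r) * self.var + self.r * (x - self.mean) ** 2
```

In the on-line setting each datum is scored first and only then used to update the model, so the score always measures surprise relative to the past.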

Rule Generator – DL-ESC/DL-SC Using a stochastic decision list. Employing the principle of minimizing extended stochastic complexity (ESC) or stochastic complexity (SC).

Rule Generator – DL-ESC/DL-SC (cont.) If ξ makes t₁ true, then μ = v₁ with probability p₁; else if ξ makes t₂ true, then μ = v₂ with probability p₂; … ; else μ = vₛ with probability pₛ.

Experimentation – Network Intrusion Detection The purpose of the experiment is to detect network intrusions without making use of the intrusion labels.

Experimentation – Dataset (cont.) Using the KDD Cup 1999 dataset prepared for network intrusion detection. Using 13 attributes for DL-ESC and four attributes for SmartSifter (service, duration, src_bytes, dst_bytes); only "service" is categorical. Continuous values are transformed by y = log(x + 0.1), where the base of the logarithm is e. Generating five datasets S0, S1, S2, S3, S4.
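The continuous attributes are rescaled with the natural-log transform y = log(x + 0.1) before being fed to SmartSifter, while "service" stays categorical. A small sketch of that preprocessing step (the field names follow the KDD Cup 1999 schema; the helper itself is hypothetical):

```python
import math

CONTINUOUS = ("duration", "src_bytes", "dst_bytes")

def preprocess(record):
    """Map a raw connection record to SmartSifter's four attributes:
    'service' is kept categorical; continuous fields get y = ln(x + 0.1)."""
    out = {"service": record["service"]}
    for field in CONTINUOUS:
        # The +0.1 shift keeps the transform defined at x = 0.
        out[field] = math.log(record[field] + 0.1)
    return out
```

The shift by 0.1 matters because byte counts and durations are frequently zero, where a plain logarithm would be undefined.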

Experimentation – Dataset (cont.) [table not transcribed]

Experimentation – Illustration by an Example (cont.) [rule listings not transcribed: Update Rule – S1; First Rule – S1; Update Rule – S2]

Experimental Results SS: SmartSifter alone. R&S: Rule and SmartSifter (this framework). Using S0 as a training set to construct a filtering rule; each of S1, S2, S3, and S4 is used as a test set.

Experimental Results (cont.)

Conclusion This new framework has two features: improving the power of SmartSifter, and helping the user discover a general pattern of the outliers.

Opinion Making the detection process more effective and more understandable. This framework can be applied to other fields.