José Antonio Iglesias, Agapito Ledezma and Araceli Sanchis: Sequence Classification Using Statistical Pattern Recognition

Presentation transcript:

Sequence Classification Using Statistical Pattern Recognition
José Antonio Iglesias, Agapito Ledezma, and Araceli Sanchis
Computer Science Department, Universidad Carlos III de Madrid
Avda. de la Universidad, Leganés, Spain
{jiglesia, ledezma,

Outline
- Motivation and Introduction
- Sequence classification
- Our approach
  - Library Creation
  - Classification
- Target Environment
  - Description
  - Experiments and Results
- Conclusions and Future Works

Motivation
Opponent behavior modelling / classification (environment: soccer simulation domain).
[Diagram labels: Opponent Modeling; Pattern Recognition; Off-Line Analysis (No-Pattern LogFile, Pattern LogFile, Base Strategy, Recognized Patterns); On-Line Detection (Environment Information, Comparing Method, Pattern Detection, Pattern Recognized, Advice to Players); RoboCup Soccer Server]

Introduction
Behavior classification: a behavior is treated as a sequence of elements, so classifying behaviors becomes sequence classification.
Sequence: “set of elements ordered so that they can be labelled with the positive integers” (Merriam-Webster Dictionary)

Sequence classification
Given:
- Classes $C = \{c_1, c_2, \dots, c_n\}$
- Sequence $E = \{e_1, e_2, \dots, e_m\}$
Determine: which class $c_i \in C$ the sequence $E$ belongs to.

Our approach
Library Creation: each training sequence (Sequence 1 of Class 1, e.g. "pwd fs fg …"; Sequence 2 of Class 2, e.g. "vi man ls …"; … Sequence n of Class n, e.g. "finger more ls ...") is turned into a pattern (Pattern 1, Pattern 2, Pattern 3, …) and stored in the Pattern Library.
Classification (on-line sequence classification): the sequence to classify (e.g. "vi more ls …") is turned into a pattern to classify, which Compare_Patterns compares with the patterns in the library. The classification result is the sequence class.

Library Creation
Trie (retrieval) data structure: a special search tree used for storing elements and their prefixes.
Every node:
- represents an element
- stores useful information (times appeared, …)

Library Creation - An example trie
Sequence to insert initially in the trie: {pwd → vi → pwd → vi → pwd → ls}
Sub-sequence length: 3
Sub-sequences to insert in the trie: {pwd → vi → pwd} and {vi → pwd → ls}

Trie after each step shown on the slides (element [times appeared]):
1. Root (empty trie)
2. Root: pwd [1] → vi [1] → pwd [1]
3. Root: pwd [1] → vi [1] → pwd [1]; vi [1] → pwd [1]
4. Root: pwd [2] → vi [1] → pwd [1]; vi [1] → pwd [1]
5. Root: pwd [2] → vi [1] → pwd [1]; vi [2] → pwd [2] → ls [1]
6. Root: pwd [3] → vi [1] → pwd [1]; vi [2] → pwd [2] → ls [1]
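The trie construction in this example can be sketched in Python. This is a minimal illustration, not the authors' implementation: the names `TrieNode`, `split_into_subsequences`, `insert_with_suffixes`, `build_trie` and `count_at` are hypothetical. The rule that every sub-sequence is inserted together with each of its suffixes is an assumption inferred from the way the counters grow on the slides (pwd goes from 1 to 2 to 3). Under that assumption the sketch also creates the nodes pwd → ls and a root-level ls, which the transcript does not list.

```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    """One trie node: how many times this element was reached via its prefix."""
    count: int = 0
    children: dict = field(default_factory=dict)


def split_into_subsequences(sequence, length):
    """Cut the sequence into consecutive, non-overlapping chunks."""
    return [sequence[i:i + length] for i in range(0, len(sequence), length)]


def insert_with_suffixes(root, subsequence):
    """Insert the sub-sequence and each of its suffixes (assumed rule)."""
    for start in range(len(subsequence)):
        node = root
        for element in subsequence[start:]:
            node = node.children.setdefault(element, TrieNode())
            node.count += 1


def build_trie(sequence, length=3):
    """Build a trie from all sub-sequences of the given length."""
    root = TrieNode()
    for chunk in split_into_subsequences(sequence, length):
        insert_with_suffixes(root, chunk)
    return root


def count_at(root, path):
    """Return the counter stored at the end of `path`, or 0 if absent."""
    node = root
    for element in path:
        node = node.children.get(element)
        if node is None:
            return 0
    return node.count


trie = build_trie(["pwd", "vi", "pwd", "vi", "pwd", "ls"], length=3)
print(count_at(trie, ["pwd"]))              # counter of the root-level pwd node
print(count_at(trie, ["vi", "pwd", "ls"]))  # counter of vi -> pwd -> ls
```

Storing children in a dictionary keyed by element keeps insertion and lookup simple; the counters for the six nodes listed on the slides come out the same as in the final step of the example.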

Library Creation - Evaluating Dependences
Evaluate the relation/dependence between an element and its prefix.
Two approaches:
- Frequency-based method.
- Statistical dependence method.
Our approach: statistical dependence method. Statistical value used: chi-square value. This value is stored in every node of the trie.

Library Creation - Evaluating Dependences
2 x 2 Contingency Table

                     Event              Different event     Total
  Prefix             O_11               O_12                O_11 + O_12
  Different Prefix   O_21               O_22                O_21 + O_22
  Total              O_11 + O_21        O_12 + O_22         O_11 + O_12 + O_21 + O_22

- O_11: how many times the prefix is followed by the current node/element.
- O_12: how many times the prefix is followed by a different node.
- O_21: how many times a different prefix (of the same length) is followed by the same node.
- O_22: how many times a different prefix (of the same length) is followed by a different node.

$$E_{ij} = \frac{\text{Row}_i\ \text{Total} \times \text{Column}_j\ \text{Total}}{\text{Grand Total}}$$

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{k} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

For the 2 x 2 table, r = k = 2.
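The chi-square computation for the 2 x 2 contingency table can be sketched as follows. The function name `chi_square_2x2` and the example counts are hypothetical, and skipping cells whose expected value is zero is a choice made for this sketch rather than something the slides specify.

```python
def chi_square_2x2(o11, o12, o21, o22):
    """Chi-square statistic of a 2 x 2 table of observed counts."""
    observed = [[o11, o12], [o21, o22]]
    row_totals = [o11 + o12, o21 + o22]
    col_totals = [o11 + o21, o12 + o22]
    grand_total = sum(row_totals)
    if grand_total == 0:
        return 0.0  # assumption: an empty table carries no evidence
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand_total
            if expected > 0:  # assumption: skip cells with zero expectation
                chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


# Hypothetical counts, only to exercise the formula.
print(round(chi_square_2x2(10, 20, 30, 40), 4))
```

A table whose rows are proportional (for example 10, 20, 20, 40) gives a value of zero, meaning the element shows no dependence on that prefix.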

Library Creation - Evaluating Dependences
Sequence Pattern Trie, nodes below Root (element [times appeared] [chi-square value]): pwd [3]; vi [1] [5.1]; pwd [1] [4.3]; vi [2]; pwd [2] [3.5]; ls [1] [4.3]; ls [2]
A Sequence Pattern Trie is created for each class.

Classification
[Diagram: the pattern of each class in the Pattern Library (Pattern 1, Pattern 2, Pattern 3, …) is a Class Trie. The sequence to classify (e.g. "vi more ls …") is turned into a pattern to classify, the Testing Trie. Compare_Patterns compares the Testing Trie with each Class Trie to obtain the on-line sequence class.]

Classification - Comparing Process
Each node of the Testing Trie is compared with the Class Trie.

Testing Trie nodes (element [times appeared] [chi-square value]): pwd [3]; vi [1] [5.1]; who [1] [4.3]; vi [2]; who [2] [3.5]
Class Trie nodes (element [times appeared] [chi-square value]): pwd [3]; vi [1] [7.1]; pwd [1] [7.3]; vi [2]; pwd [2] [1.5]; ls [1] [0.3]; ls [2]; …

Similarity between both tries. If the node (and its prefix) are in both tries and abs(chi² TestingTrie - chi² ClassTrie) ≤ ThresholdValue:
  Result ← [Element TestingTrie, Prefix TestingTrie, chi² TestingTrie]
  Example: abs(5.1 - 7.1) ≤ ThresholdValue, so Result ← [vi, pwd, 5.1]

Difference between both tries. If the node (and its prefix) are only in the Testing Trie:
  Result ← [Element TestingTrie, Prefix TestingTrie, (chi² TestingTrie * -1)]
  Examples: Result ← [who, pwd → vi, (-4.3)] and Result ← [who, vi, (-3.5)]

Classification - Comparing Process
Result:
  [Element 1, Prefix 1, Value 1]
  [Element 2, Prefix 2, Value 2]
  [Element 3, Prefix 3, Value 3]
  [Element 4, Prefix 4, Value 4]
  …
  [Element n, Prefix n, Value n]
Each comparison (ClassTrie, TestingTrie) yields a comparison value, obtained from Value 1 … Value n.

Result for the example:
  [vi, pwd, +5.1]
  [who, pwd → vi, -4.3]
  [who, vi, -3.5]
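The comparing process can be sketched in Python under a few stated assumptions. Each trie is flattened into a dictionary that maps a path (prefix plus element) to the chi-square value of the node; the function name `compare_tries`, the threshold of 2.5 and the paths chosen for the Class Trie nodes are hypothetical. The slides do not say what happens when a node is in both tries but the difference exceeds the threshold, so this sketch lets it contribute nothing, and it takes the comparison value to be the sum of the result values.

```python
def compare_tries(testing, class_trie, threshold):
    """Compare a Testing Trie with a Class Trie.

    Both arguments map a path tuple (prefix..., element) to a chi-square value.
    Returns the list of [element, prefix, value] results and their sum.
    """
    results = []
    for path, chi_test in testing.items():
        *prefix, element = path
        if path in class_trie:
            if abs(chi_test - class_trie[path]) <= threshold:
                # Similarity: node and prefix are in both tries.
                results.append((element, tuple(prefix), chi_test))
            # Assumption: a shared node outside the threshold adds nothing.
        else:
            # Difference: node and prefix are only in the Testing Trie.
            results.append((element, tuple(prefix), -chi_test))
    comparison_value = sum(value for _, _, value in results)
    return results, comparison_value


# Values from the slide example; Class Trie paths are inferred for illustration.
testing_trie = {("pwd", "vi"): 5.1, ("pwd", "vi", "who"): 4.3, ("vi", "who"): 3.5}
class_trie = {("pwd", "vi"): 7.1, ("pwd", "vi", "pwd"): 7.3,
              ("vi", "pwd"): 1.5, ("vi", "pwd", "ls"): 0.3}

results, value = compare_tries(testing_trie, class_trie, threshold=2.5)
for element, prefix, v in results:
    print(element, " -> ".join(prefix), v)
print(round(value, 1))
```

With these inputs the three results match the ones listed on the slide: one similarity for vi after pwd and two differences for who.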

Classification
[Diagram: Compare_Patterns compares the pattern to classify with each pattern in the Pattern Library (Pattern 1, Pattern 2, Pattern 3, …) and yields one comparison value per class.]
The on-line sequence class is the class with the greatest comparison value.

Environment - UNIX command line sequences
Command histories of 9 UNIX computer users over 2 years. Source: UCI Repository of ML Database [Newman C., Hettich S., Merz, C. (1998)].

Raw sessions:
  # Start session 1
  cd ~/private/docs
  ls -laF | more
  cat foo.txt bar.txt zorch.txt > a.txt
  exit
  # End session 1
  # Start session 2
  cd ~/games/
  xquake &
  fg
  …

Resulting command stream, with file names replaced by the number of file name arguments:
  **SOF** cd <1> ls -laF | more cat <3> > <1> exit **EOF** …
- cd <1>: one "file name" argument
- cat <3>: three "file name" arguments
- > <1>: one "file name" argument
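Splitting such a command stream back into sessions can be sketched as below. This is an illustration only: the function name `split_sessions` is hypothetical, and the sketch assumes the stream is already available as a list of tokens, with `**SOF**` and `**EOF**` marking session boundaries and count tokens such as `<1>` standing for file name arguments, as the slide's annotations suggest.

```python
def split_sessions(tokens):
    """Group a token stream into sessions delimited by **SOF** and **EOF**."""
    sessions = []
    current = None  # None means we are outside any session
    for token in tokens:
        if token == "**SOF**":
            current = []
        elif token == "**EOF**":
            if current is not None:
                sessions.append(current)
            current = None
        elif current is not None:
            current.append(token)
    return sessions


stream = ["**SOF**", "cd", "<1>", "ls", "-laF", "|", "more",
          "cat", "<3>", ">", "<1>", "exit", "**EOF**"]
for session in split_sessions(stream):
    print(session)
```

Each session then becomes one sequence of elements that can be cut into sub-sequences and inserted into a user's trie.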

Experiments - UNIX command line sequences
9 files (users) containing from about […] to […] commands each.
1. Extracting patterns: a trie is created for each user → Pattern Library.
2. Classification algorithm: the sequence to classify (sequences of very different sizes) is classified in the class with the greatest value (result value).
3. Evaluating the result. Calculate:
   - Correct classification (+): the difference between the greatest value and the second greatest value, taken as positive.
   - Incorrect classification (-): the difference between the real classification value and the greatest value, taken as negative.
   (The greater the difference, the better the classification.)
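Steps 2 and 3 can be sketched in Python using the comparison values from the worked example at the end of this deck. The function names `classify` and `evaluation_value` are hypothetical, and the sketch assumes the comparison values for all classes are already available in a dictionary.

```python
def classify(scores):
    """Return the class with the greatest comparison value."""
    return max(scores, key=scores.get)


def evaluation_value(scores, true_class):
    """Positive margin if classified correctly, negative gap otherwise."""
    ranked = sorted(scores.values(), reverse=True)
    if classify(scores) == true_class:
        # Greatest value minus second greatest value.
        return ranked[0] - ranked[1]
    # Real class value minus greatest value (always negative or zero).
    return scores[true_class] - ranked[0]


# Comparison values of the on-line user against each class (from the slides).
scores = {"User0": 21, "User1": 49, "User2": 9, "User3": 3, "User4": 12,
          "User5": 29, "User6": -1, "User7": 0, "User8": 11}

print(classify(scores))                   # class with the greatest value
print(evaluation_value(scores, "User1"))  # real class is User1
print(evaluation_value(scores, "User2"))  # real class is User2
```

With these values the sequence is assigned to User1; the evaluation value is 20 when User1 is the real class and -40 when User2 is, which agrees with the figures on the slides.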

Results - UNIX command line sequences
[Figure: Unix Commands Classification - User 6, average of 25 simulation results. Axes: Classification Value; Length of the Sequence to classify.]

Results - UNIX command line sequences
[Figure: Minimum length for classifying a UNIX Computer User correctly. Axes: Unix Computer User (Class); Length of the Sequence to classify.]

Conclusions
- A threshold must be found.
- Creating the tries takes a long time.
- Results depend on the length of the sub-sequences used to create the trie.

Conclusions
- Effective method to classify UNIX users.
- If a behavior can be represented by sequences, the proposed classification method can be used.
- If a new class is added, only its trie must be created (the others are not modified).
- This method could be used for other tasks: sequence prediction, sequence clustering…
- RoboCup Coach 2006 Competition (successful results).

Future Works
- Pattern Library → one trie for all classes (users).
- Classification method without a threshold value.
- Analysis comparing our approach to others (HMMs).

Thank you!

Related to Questions...

Experiments - UNIX command line sequences
Pattern Library: one pattern per user, built from that user's command stream.
- USER 0 (Class0, Pattern/Class User0): **SOF** cd ls -laF | more cat > …
- USER 1 (Class1, Pattern/Class User1): **SOF** ls exit ls -laF xquake & fg …
- …
- USER 8 (Class8, Pattern/Class User8): **SOF** vi vi ls -la cat …
Test user sequence (User On-Line): **SOF** ls -laF | More cd …

Sequence classification: User On-Line ∈ Class c

  User On-Line vs Class User0 → 21
  User On-Line vs Class User1 → 49
  User On-Line vs Class User2 → 9
  User On-Line vs Class User3 → 3
  User On-Line vs Class User4 → 12
  User On-Line vs Class User5 → 29
  User On-Line vs Class User6 → -1
  User On-Line vs Class User7 → 0
  User On-Line vs Class User8 → 11

The greatest value is 49, so the sequence is classified as ClassUser1.
- If the real class is ClassUser1: correctly classified, with value 20 (49 - 29, the greatest minus the second greatest value).
- If the real class is ClassUser2: not correctly classified, with value -40 (9 - 49, the real class value minus the greatest value).