Net-Centric Software and Systems I/UCRC Self-Detection of Abnormal Event Sequences Project Lead: Farokh Bastani, I-Ling Yen, Latifur Khan Date: April 1,

Presentation transcript:

Net-Centric Software and Systems I/UCRC
Self-Detection of Abnormal Event Sequences
Project Lead: Farokh Bastani, I-Ling Yen, Latifur Khan
Date: April 1, 2010
Copyright © 2010 NSF Net-Centric I/UCRC. All rights reserved.

2009/Current Project Overview: Self-Detection of Abnormal Event Sequences

Project Scope:
Given a set of event sequences, determine the normal and abnormal transitions using data mining and automata techniques.
Develop techniques for problem-specific anomaly detection, including data collection and extraction and a suite of techniques for detecting abnormal event sequences.
Industry members can share the techniques for abnormal event sequence detection to achieve high-quality systems.

Tasks:
1. Develop a preprocessor that processes log data and extracts event sequences
2. Develop cluster-based anomaly detection techniques
3. Develop probabilistic finite state automata (PFSA) based anomaly detection techniques
4. Parallelize the algorithms to make them more efficient
5. Apply the techniques to the datasets provided by the industrial partner and report the results

Deliverables:
A suite of anomaly detection algorithms (cluster-based and PFSA-based tools)
Anomaly detection results

Success Criteria:
Identify injected anomalies with high precision and recall

Project Schedule (A M J J A S O N D J F M A, 09-10):
T1: Implemented the preprocessor
T2, 3, 5: Applied clustering and PFSA to the small dataset and obtained results
T4: Parallelized the algorithms on the large dataset
T5: Applied the algorithms to the large dataset
T1: Refined the preprocessor and obtained new results
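To make the PFSA idea in Task 3 concrete, here is a minimal sketch, not the project's algorithm. It assumes the simplest possible automaton, in which each state is just the most recent event, and it scores a sequence by its average negative log-probability under transition frequencies learned from normal traces. The names `train_pfsa`, `anomaly_score`, the smoothing constant `alpha`, and the toy event names are all hypothetical.

```python
import math
from collections import defaultdict

START, END = "<s>", "</s>"  # artificial start/end markers (an assumption)

def train_pfsa(sequences):
    """Count transitions; each state is the most recent event (simplifying assumption)."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        path = [START, *seq, END]
        for prev, nxt in zip(path, path[1:]):
            counts[prev][nxt] += 1
    return counts

def anomaly_score(counts, seq, alphabet_size, alpha=0.1):
    """Average negative log-probability per transition, with additive smoothing."""
    path = [START, *seq, END]
    total = 0.0
    for prev, nxt in zip(path, path[1:]):
        row = counts.get(prev, {})
        n = sum(row.values())
        p = (row.get(nxt, 0) + alpha) / (n + alpha * alphabet_size)
        total -= math.log(p)
    return total / (len(path) - 1)

# Toy "normal" traces with made-up event names.
normal = [["open", "read", "close"]] * 3 + [["open", "write", "close"]]
model = train_pfsa(normal)
alphabet = {e for s in normal for e in s} | {END}

score_ok = anomaly_score(model, ["open", "read", "close"], len(alphabet))
score_bad = anomaly_score(model, ["read", "open", "open"], len(alphabet))
```

A sequence whose transitions were rarely or never seen in the normal traces receives a higher score; a threshold on that score, chosen experimentally, would separate normal from abnormal sequences.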

Project Results

Task 1: Develop a preprocessor for processing log data and extracting event sequences
Progress: Used lex/yacc to implement a flexible preprocessor. The preprocessor still needs refinement to eliminate noisy data caused by initialization and concurrent execution.

Task 2: Develop cluster-based anomaly detection techniques
Progress: Completed. Parallelized the GA-based clustering technique for anomaly detection.

Task 3: Develop PFSA-based anomaly detection techniques
Progress: Completed. Enhanced MDI (minimal divergence inference) to handle event attributes and anomaly detection.

Task 4: Enhance the algorithms to make them more efficient and effective
Progress: Invented the prefix tree based approach, which facilitates the analysis of very large datasets and reduces processing time by more than 20-fold.

Task 5: Apply the techniques to the datasets provided by Cisco
Progress: The results show high precision in identifying injected anomalies.

Status legend: Complete, Partially Complete, Not Started

Significant Finding/Accomplishment: The tools from this research detected the injected anomalies with high precision.
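Since the results are reported in terms of precision on injected anomalies, a short sketch of how precision and recall could be computed may help. The function name and the sequence IDs below are hypothetical and do not come from the project's data.

```python
def precision_recall(flagged, injected):
    """Precision and recall of flagged sequence IDs against injected anomaly IDs."""
    flagged, injected = set(flagged), set(injected)
    true_positives = len(flagged & injected)
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(injected) if injected else 0.0
    return precision, recall

# Made-up example: four sequences flagged, three anomalies injected.
precision, recall = precision_recall(flagged=[3, 7, 9, 12], injected=[3, 7, 9])
```

In this made-up case every injected anomaly is found (recall 1.0), but one of the four flagged sequences is a false alarm (precision 0.75).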

Major Accomplishments, Discoveries and Surprises

1. Enhanced clustering-based anomaly detection
   Developed a multi-objective genetic algorithm to avoid getting trapped in local minima
   Parallelized the algorithm
2. Enhanced PFSA-based anomaly detection
   Implemented the prefix tree scheme
   Used MDI (minimal divergence inference)
   Used attributes as transition symbols: some attributes are used directly; others (e.g., time) are quantized and the quantized values are used
3. Applied to Cisco call signal event sequences
   Identified all the injected anomalies
4. Invented the prefix tree based approaches
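The "attributes as transition symbols" idea can be sketched as follows. This is only an illustration under assumptions: the event fields (`name`, `type`, `gap`), the bin edges, and the bucket labels are hypothetical, and the project's actual quantization scheme is not described in the slides.

```python
from bisect import bisect_right

# Hypothetical bin edges (seconds) for the time gap since the previous event.
TIME_EDGES = [0.1, 1.0, 10.0]
TIME_LABELS = ["t0", "t1", "t2", "t3"]  # one more label than there are edges

def quantize(value, edges=TIME_EDGES, labels=TIME_LABELS):
    """Map a continuous value to the label of the bucket it falls into."""
    return labels[bisect_right(edges, value)]

def to_symbol(event, direct_attrs=("type",), time_attr="gap"):
    """Build one transition symbol: event name, direct attributes, quantized time."""
    parts = [event["name"]]
    parts += [str(event[a]) for a in direct_attrs]  # attributes used as-is
    parts.append(quantize(event[time_attr]))        # attribute used after quantization
    return "|".join(parts)

# Made-up event record.
symbol = to_symbol({"name": "CALL_SETUP", "type": "req", "gap": 0.5})
```

Quantizing keeps the symbol alphabet finite, so an automaton can be inferred over the resulting symbols; coarser buckets give a smaller alphabet at the cost of timing detail.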

New Problems

Use a prefix tree to greatly enhance the efficiency of the algorithms
   Event sequences can be built into a prefix tree
   The prefix tree can be used to group event sequences at different levels of granularity (this is especially the case for datasets containing execution traces)
   The prefix tree can provide some distance information

How to achieve real-time on-the-fly anomaly detection?
   Need to determine a suitable time interval T
   Data collected in T should be sufficient to build a good anomaly detection model, while the detection latency is not significant

How to handle event sequences created due to concurrent execution?
   Concurrent execution can generate event sequences of arbitrary order and make anomaly detection difficult
   Investigate association rule mining techniques for this problem

[Figure label: "... nd closest neighbor"]
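The prefix tree points above can be sketched in a few lines. This is an illustrative structure under assumptions, not the project's implementation: `PrefixNode`, `build_prefix_tree`, `groups_at_depth`, and `shared_prefix_len` are hypothetical names, and prefix depth stands in for "level of granularity".

```python
class PrefixNode:
    def __init__(self):
        self.children = {}  # event -> PrefixNode
        self.count = 0      # number of sequences passing through this node
        self.ends = 0       # number of sequences ending exactly here

def build_prefix_tree(sequences):
    """Insert every event sequence into one shared prefix tree."""
    root = PrefixNode()
    for seq in sequences:
        node = root
        node.count += 1
        for event in seq:
            node = node.children.setdefault(event, PrefixNode())
            node.count += 1
        node.ends += 1
    return root

def groups_at_depth(root, depth):
    """Group sequences by their first `depth` events (shorter sequences are skipped)."""
    groups = {}
    stack = [((), root)]
    while stack:
        prefix, node = stack.pop()
        if len(prefix) == depth:
            groups[prefix] = node.count
            continue
        for event, child in node.children.items():
            stack.append((prefix + (event,), child))
    return groups

def shared_prefix_len(a, b):
    """Length of the common prefix: a crude closeness measure between two sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

# Made-up event sequences.
seqs = [("A", "B", "C"), ("A", "B", "D"), ("A", "E"), ("F", "G")]
tree = build_prefix_tree(seqs)
coarse = groups_at_depth(tree, 1)  # {("A",): 3, ("F",): 1}
fine = groups_at_depth(tree, 2)    # {("A", "B"): 2, ("A", "E"): 1, ("F", "G"): 1}
```

Identical prefixes are stored once, so large datasets with many repeated traces compress well, and the depth at which two sequences diverge gives cheap distance information without pairwise comparison of full sequences.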

Proposed Solution

Enhance existing tools using information provided by the prefix tree
   Clustering-based approaches: Use the prefix tree to determine the sequence groups at different granularity levels (object level, method level, exact sequence level); clustering algorithms can then be used to merge these groups into clusters
   Density-based approaches: Use the prefix tree to help determine the k-th nearest neighbor
   PFSA-based approaches: Always start from the prefix tree

Enhance existing tools for real-time on-the-fly anomaly detection
   Collect data D_t in (t, t+T], use D_t to build the anomaly detection model A_t in (t+T, t+2T], use A_t for anomaly detection in (t+2T, t+3T]
   Experimentally determine a suitable T

Interval        Collect     Build      Apply
(t, t+T]        D_t         A_{t-T}    A_{t-2T}
(t+T, t+2T]     D_{t+T}     A_t        A_{t-T}
(t+2T, t+3T]    D_{t+2T}    A_{t+T}    A_t
...             ...         ...        ...
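The collect/build/apply pipeline can be written down as a small schedule function. This is a sketch under assumptions: windows are numbered k = 0, 1, 2, ... so that window k is the interval (t + kT, t + (k+1)T], `D[k]` denotes the data collected in window k, and `A[k]` the model built from `D[k]`. The function name and the cold-start behavior (nothing to build or apply in the first windows) are illustrative choices, not taken from the slides.

```python
def pipeline_schedule(k):
    """Activities running concurrently in window k = (t + k*T, t + (k+1)*T]."""
    return {
        "collect": f"D[{k}]",                        # data gathered during this window
        "build": f"A[{k - 1}]" if k >= 1 else None,  # model built from the previous window's data
        "apply": f"A[{k - 2}]" if k >= 2 else None,  # model whose build finished in the previous window
    }

# First four windows, starting from a cold start.
schedule = [pipeline_schedule(k) for k in range(4)]
```

Under this scheme the model in use during any window was built from data that is between T and 3T old, which is why T has to be long enough to collect sufficient training data but short enough to keep the detection model current.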

2010/New Project Summary: Self-Detection of Abnormal Event Sequences

Tasks:
1. Modify the anomaly detection tools for real-time on-the-fly anomaly detection
2. Enhance the anomaly detection techniques using knowledge in the prefix tree
3. Continue to refine the preprocessor
   Apply the techniques to the datasets
   Compare the results (time/precision)
4. Develop a visualization tool for PFSA
5. Adapt the tools for different datasets

Research Goals:
1. Improve existing anomaly detection techniques, specifically for execution traces and event sequences
2. Develop a diverse set of anomaly detection techniques for handling datasets with different characteristics
3. Make the tools available for future anomaly detection tasks

Benefits to Industry Partners:
1. A comprehensive set of techniques and tools to allow the best analysis of different datasets
2. Real-time on-the-fly anomaly detection capability
3. Rapid adaptation of the tools to handle other application-specific datasets

Project Schedule (A M J J A S O N D J F M A, 10-11):
Task 1. Modification
Task 4. Visualization
Task 2. Prefix tree
Task 3. Experiment
Task 5? Additional datasets