Using Shapes of Trends in Active Data Mining Duy Lam Norris Boothe.

Shape Querying and Active Data Mining
- Historical time sequences make up a large portion of data stored in computers
- Mining trends in histories is useful
- Many applications, including observing trends in stock prices, online bids, and rule mining

Overview
- Overview of SDL
- SDL language
- Applications to data mining

A (Very) Simple History

Shape Definition Language
- SDL is a shape definition language used to query the “shapes” of histories
- Small, powerful language that allows “blurry” matching
- Designed to make it easy and natural to query
- Easily implementable
- Little non-determinism

Alphabet
- SDL allows you to specify an “alphabet” defining transitions
- Example:

    Symbol       Description
    up           Slightly increasing transition
    Up           Highly increasing transition
    down         Slightly decreasing transition
    Down         Highly decreasing transition
    appears      Transition from zero to non-zero
    disappears   Transition from non-zero to zero
    stable       The final value nearly equal to initial value
    zero         Both initial and final value are zero
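As a rough illustration of how such an alphabet could be applied, the sketch below turns a numeric history into one symbol per transition. The thresholds `small` and `large`, the use of relative change, and the function names are assumptions made for this example; the slides do not give concrete bounds, and this sketch assigns exactly one symbol to each transition.

```python
def classify(prev, curr, small=0.05, large=0.20):
    """Map one transition (prev -> curr) to a symbol of the example alphabet.

    `small` and `large` are hypothetical relative-change thresholds.
    """
    if prev == 0 and curr == 0:
        return "zero"
    if prev == 0:
        return "appears"
    if curr == 0:
        return "disappears"
    change = (curr - prev) / abs(prev)
    if abs(change) <= small:
        return "stable"
    if change > 0:
        return "Up" if change > large else "up"
    return "Down" if change < -large else "down"


def to_symbols(history):
    """Describe a history by the symbols of its consecutive transitions."""
    return [classify(a, b) for a, b in zip(history, history[1:])]


history = [0, 0, 10, 10.2, 12, 20, 18, 9, 0]
print(to_symbols(history))
```

A history of n values yields n - 1 symbols, and shape queries are then evaluated over that symbol sequence.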

Describing a shape
- So with this alphabet we can describe a shape
- Use such a description to query a history to produce all subsequences that match the shape

    (shape name(parameters) descriptor)

(shape spike() (concat Up up down Down))

Derived Shapes
- any: allows a shape to have multiple values (a match on any one of the listed shapes suffices)

    (any up Up)

- concat: shapes can be concatenated together contiguously

    (concat down up down up)
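Since the deck later notes that SDL matches what regular expressions can match, one way to get a feel for `any` and `concat` is to compile them to Python regular expressions over a one-character-per-symbol encoding. The encoding table `CODE` and the helper names are hypothetical, and `re.finditer` reports only non-overlapping matches, so this is a simplified sketch and not the evaluation strategy of the original system.

```python
import re

# Hypothetical one-character encoding of the example alphabet.
CODE = {"up": "u", "Up": "U", "down": "d", "Down": "D",
        "appears": "a", "disappears": "x", "stable": "s", "zero": "z"}


def sym(name):
    """Regex fragment for one elementary shape."""
    return CODE[name]


def any_(*shapes):
    """(any P1 ... Pn): alternation of the given fragments."""
    return "(?:" + "|".join(shapes) + ")"


def concat(*shapes):
    """(concat P1 ... Pn): the fragments matched back to back."""
    return "".join(shapes)


def find(shape, symbols):
    """Return (start, end) transition indices of non-overlapping matches."""
    text = "".join(CODE[s] for s in symbols)
    return [(m.start(), m.end()) for m in re.finditer(shape, text)]


spike = concat(sym("Up"), sym("up"), sym("down"), sym("Down"))
symbols = ["stable", "Up", "up", "down", "Down", "stable"]
print(find(spike, symbols))
print(find(any_(sym("up"), sym("Up")), symbols))
```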

Multiple Occurrence Operators
- Shapes made of multiple contiguous occurrences of the same shape P
- Resulting subsequences are such that they are neither preceded nor followed by a subsequence that matches P

    (exact 5 (any up Up))
    (atleast 3 stable)
    (atmost 2 (concat disappears appears))
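The "neither preceded nor followed" rule means these operators report maximal runs only. The sketch below shows that behaviour for the simple case where the repeated shape is a single transition (for example `(any up Up)` or `stable`), expressed as a set of allowed symbols. The function names, the `(start, length)` result format, and treating `atmost` as "runs of 1 to n" are assumptions of this example.

```python
from itertools import groupby


def maximal_runs(symbols, allowed):
    """Return (start, length) for every maximal run of symbols in `allowed`."""
    runs, pos = [], 0
    for hit, group in groupby(symbols, key=lambda s: s in allowed):
        n = len(list(group))
        if hit:
            runs.append((pos, n))
        pos += n
    return runs


def exact(n, symbols, allowed):
    return [r for r in maximal_runs(symbols, allowed) if r[1] == n]


def atleast(n, symbols, allowed):
    return [r for r in maximal_runs(symbols, allowed) if r[1] >= n]


def atmost(n, symbols, allowed):
    return [r for r in maximal_runs(symbols, allowed) if r[1] <= n]


rising = {"up", "Up"}
symbols = ["up", "Up", "up", "stable", "up", "up",
           "down", "stable", "stable", "stable"]
print(exact(2, symbols, rising))       # the run of three rises is not reported
print(atleast(3, symbols, {"stable"}))
```

Note that the leading run of three rises contains two-step windows, yet `exact(2, ...)` skips it because each such window is adjacent to another rise.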

Bounded Occurrence Operators
- in
  - permits “blurry” matching by allowing users to state an overall shape without specific details
  - within the specified time period length, we can have a specified number of occurrences of a shape
  - can have arbitrary gaps and can have overlap

    (in 7 (nomore 5 up))

- Occurrence counts used inside in:

    (precisely n P)
    (noless n P)
    (nomore n P)

Bounded Occurrence Operators
- inorder
  - specifies shapes that must appear in a specific order

    (inorder P1 P2 ... Pn)

Shape Definition Examples

    (in 5 (and (noless 2 (any up Up))
               (nomore 1 (any down Down))))
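The example above can be read as "in some window of 5 transitions, at least 2 rises and at most 1 fall". A minimal sliding-window sketch of that reading follows; it handles only single-transition shapes, and the helper names and the choice to return window start indices are assumptions of this example.

```python
def count(window, allowed):
    """Number of transitions in the window whose symbol is in `allowed`."""
    return sum(1 for s in window if s in allowed)


def in_windows(length, symbols, predicate):
    """Start indices of every window of `length` transitions satisfying predicate."""
    return [i for i in range(len(symbols) - length + 1)
            if predicate(symbols[i:i + length])]


rising, falling = {"up", "Up"}, {"down", "Down"}


def mostly_rising(window):
    # (and (noless 2 (any up Up)) (nomore 1 (any down Down)))
    return count(window, rising) >= 2 and count(window, falling) <= 1


symbols = ["up", "stable", "Up", "down", "stable", "down", "down", "up"]
print(in_windows(5, symbols, mostly_rising))
```

Because only counts within the window matter, the rises may be separated by arbitrary gaps, which is what makes the match "blurry".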

    (in 7 (inorder (atleast 2 (any up Up))
                   (in 4 (noless 3 (any down Down)))))

Parameterized Shapes
- Can parameterize shape definitions instead of using concrete values

    (shape spike(upcnt dncnt)
      (concat (exact upcnt (any up Up))
              (exact dncnt (any down Down))))

    (shape doublepeak(width ht1 ht2)
      (in width (inorder spike(ht1 ht1) spike(ht2 ht2))))
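A parameterized shape behaves much like a function that builds a pattern. As a hedged sketch, `spike(upcnt dncnt)` can be approximated by a Python function returning a compiled regular expression, with a lookbehind and a lookahead standing in for the "neither preceded nor followed" rule of `exact`. The single-character encoding in `CODE` and the helper `matches` are hypothetical, and the sketch assumes both counts are at least 1.

```python
import re

# Hypothetical one-character encoding of the symbols used here.
CODE = {"up": "u", "Up": "U", "down": "d", "Down": "D", "stable": "s"}


def spike(upcnt, dncnt):
    """Exactly `upcnt` rises followed by exactly `dncnt` falls."""
    rise, fall = "[uU]", "[dD]"
    return re.compile(
        f"(?<!{rise}){rise}{{{upcnt}}}{fall}{{{dncnt}}}(?!{fall})"
    )


def matches(pattern, symbols):
    text = "".join(CODE[s] for s in symbols)
    return [(m.start(), m.end()) for m in pattern.finditer(text)]


print(matches(spike(2, 2), ["stable", "Up", "up", "down", "Down", "stable"]))
print(matches(spike(2, 2), ["up", "Up", "up", "down", "Down", "stable"]))
```

The second history has three rises before the falls, so `spike(2, 2)` does not match it even though its last two rises and two falls would fit.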

Advantages of SDL
- natural and powerful language for expressing shape queries
- capability of blurry matching
- reduction of output clutter
- efficient implementation

SDL’s Expressive Power
- SDL is equivalent to regular expressions for regular matching
- several features enhance its effectiveness, however
- greedy matching and “lookahead” capabilities help reduce output clutter

SDL’s Expressive Power
- “blurry” matching enables a much more natural and compact specification of certain shapes
- For example, if we wanted precisely one occurrence of each a_i, in any order
  - in SDL:

    (and (precisely 1 a_1) (precisely 1 a_2) ... (precisely 1 a_n))

  - an equivalent regular expression requires at least exponential size to specify!
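To make the size gap concrete, the sketch below compares the length of the SDL expression with the length of one straightforward regular expression for the same language, built by listing every permutation. The enumeration is only the naive construction, so it illustrates the growth and does not prove the lower bound stated on the slide; the function names and the single-letter symbols are assumptions of this example.

```python
from itertools import permutations


def sdl_expr(symbols):
    """SDL form: one (precisely 1 a_i) clause per symbol, linear in n."""
    parts = " ".join(f"(precisely 1 {s})" for s in symbols)
    return f"(and {parts})"


def regex_by_enumeration(symbols):
    """Naive regular expression: an alternation of all n! orderings."""
    return "|".join("".join(p) for p in permutations(symbols))


for n in range(2, 7):
    syms = [chr(ord("a") + i) for i in range(n)]
    print(n, len(sdl_expr(syms)), len(regex_by_enumeration(syms)))
```

With single-character symbols the enumerated expression has length n!(n + 1) - 1, while the SDL expression grows by one clause per symbol.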

SDL Summary
- SDL is a small, powerful language for naturally and intuitively expressing shapes found in histories
- Equivalent in power to regular expressions, but much more effective
- Permits “blurry” matching

Using SDL in Active Data Mining

Static Data Mining
- Discovery of rules for
  - Associations
  - Sequences
  - Classification
- Entire data set is mined
- Inherent weakness: Rules are not static

Active Data Mining
- Partition into time periods
- Run data mining algorithm on each period
- Gather rules into a ‘rulebase’
- Create triggers to discover
  - Trends in rules
  - Associations between rules
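The first three steps can be sketched as a small loop that builds one history per rule. To stay short, the example below replaces the mining algorithm with a fixed list of candidate itemsets and records only their per-period support; the data layout, the function name, and the sample transactions are all assumptions of this illustration.

```python
from collections import defaultdict


def support_histories(transactions, candidates):
    """Build a per-period support history for each candidate itemset.

    transactions: iterable of (period, set_of_items) pairs.
    candidates:   iterable of frozensets standing in for mined rules.
    """
    by_period = defaultdict(list)
    for period, items in transactions:
        by_period[period].append(items)          # partition into time periods

    histories = {c: [] for c in candidates}      # the "rulebase"
    for period in sorted(by_period):
        baskets = by_period[period]
        for c in histories:
            hits = sum(1 for b in baskets if c <= b)
            histories[c].append(hits / len(baskets))
    return histories


transactions = [
    (1, {"bread", "milk"}), (1, {"bread"}),
    (2, {"bread", "milk"}), (2, {"milk"}), (2, {"bread", "milk", "eggs"}),
    (3, {"bread", "milk"}),
]
pair = frozenset({"bread", "milk"})
print(support_histories(transactions, [pair])[pair])
```

Each resulting list is a history in the sense used earlier, so shape queries and triggers can be evaluated over it.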

Active Data Mining Process
[Figure: a large database yields Period 1, Period 2, and Period 3 rules, which are collected in a table of Rule ID against History (support, confidence, etc.) for periods 1, 2, …, n]

Active Data Mining Process (cont.)
[Figure: the table of Rule ID against History (support, confidence, etc.) for periods 1, 2, …, n, with the Shape Definition Language and the Trigger Definition Language (Active Data Mining) leading to Selected Rules]

Active Data Mining Components
- Shape definitions (SDL)

    (shape name(parameters) descriptor)

  Ex:

    (shape spike(upcnt dncnt)
      (concat (atleast upcnt (any up Up))
              (atleast dncnt (any down Down))))

- Queries
- Triggers

Queries
- For rule selection
- Syntax:

    (query (shape (history-name start-time end-time)))

- ‘start’ and ‘end’ specify the end points of history
- Result: rules that match the desired shape
- Ex:

    (shape ramp() (concat Up Up))
    (query (ramp() (confidence start end)))

Larger Query Example

    (shape upramp(len cnt) (in len (noless cnt (any up Up))))
    (shape dnramp(len cnt) (in len (noless cnt (any down Down))))
    (query (and (upramp(5 3) (support start 10))
                (dnramp(5 3) (confidence start 10))))

- Results: rules where support is increasing but confidence is decreasing
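A hedged sketch of how this query might select rules from a rulebase is given below. It simplifies the alphabet to "rise", "fall", and "flat" (so `up` and `Up` are not distinguished), reads `(support start 10)` as the first 10 recorded values, and uses an invented in-memory rulebase; none of these details come from the slides.

```python
def transitions(values):
    """Simplified alphabet: each step is a rise, a fall, or flat."""
    return ["rise" if b > a else "fall" if b < a else "flat"
            for a, b in zip(values, values[1:])]


def ramp(values, length, cnt, direction):
    """True if some window of `length` steps has at least `cnt` steps in `direction`."""
    syms = transitions(values)
    return any(syms[i:i + length].count(direction) >= cnt
               for i in range(len(syms) - length + 1))


def query(rulebase, end=10):
    """Rules whose support ramps up while their confidence ramps down."""
    return [rule_id for rule_id, h in rulebase.items()
            if ramp(h["support"][:end], 5, 3, "rise")
            and ramp(h["confidence"][:end], 5, 3, "fall")]


rulebase = {  # hypothetical histories, one value per period
    "r1": {"support": [0.10, 0.12, 0.15, 0.15, 0.18, 0.21, 0.20, 0.22, 0.25, 0.30],
           "confidence": [0.90, 0.85, 0.80, 0.82, 0.75, 0.70, 0.70, 0.65, 0.60, 0.60]},
    "r2": {"support": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
           "confidence": [0.5, 0.6, 0.7, 0.8, 0.9, 0.95]},
    "r3": {"support": [0.3] * 10,
           "confidence": [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]},
}
print(query(rulebase))
```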

Triggers
- Data-stream-style functionality
- ECA (Event Condition Action) model used (Chakravarthy et al. 1989)
- Syntax:

    (trigger trigger-name
      (events events-spec)
      (condition (shape history-spec))
      (actions action-spec))

- Events:
  - Rule creation
  - History updates

Wave Execution Semantics
- Stratified execution of triggers, similar to Datalog
[Figure: cycle from Set of Events to Triggers for those Events to Queries for those Triggers to Set of Actions/Events]

Trigger Example
- Identifying rules where support is increasing, but confidence is decreasing

    (trigger detect_up
      (events updatehistory)
      (condition (upramp(5 4) (support (- end 5) end)))
      (actions upward))

    (trigger detect_dn
      (events upward)
      (condition (dnramp(5 4) (confidence (- end 5) end)))
      (actions notify))
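The chaining of these two triggers can be sketched as a wave loop: the events of one wave select triggers, their conditions are checked, and the actions of the triggers that fire become the events of the next wave. The dictionary layout of a trigger, the `max_waves` guard, and the sample history are assumptions of this example, and the real execution semantics may be richer than what is shown here.

```python
def ramp(values, length, cnt, rising):
    """Check the last `length` steps for at least `cnt` moves in one direction."""
    recent = values[-(length + 1):]              # length steps need length + 1 values
    steps = [b - a for a, b in zip(recent, recent[1:])]
    hits = sum(1 for s in steps if (s > 0 if rising else s < 0))
    return len(steps) == length and hits >= cnt


def run_waves(triggers, initial_events, history, max_waves=10):
    """Return the names of the triggers fired in each wave."""
    events, waves = set(initial_events), []
    while events and len(waves) < max_waves:
        fired = [t for t in triggers
                 if t["event"] in events and t["condition"](history)]
        if not fired:
            break
        waves.append([t["name"] for t in fired])
        events = {t["action"] for t in fired}    # actions feed the next wave
    return waves


triggers = [
    {"name": "detect_up", "event": "updatehistory", "action": "upward",
     "condition": lambda h: ramp(h["support"], 5, 4, rising=True)},
    {"name": "detect_dn", "event": "upward", "action": "notify",
     "condition": lambda h: ramp(h["confidence"], 5, 4, rising=False)},
]
history = {"support": [0.10, 0.12, 0.15, 0.17, 0.20, 0.24],
           "confidence": [0.90, 0.85, 0.80, 0.78, 0.70, 0.65]}
print(run_waves(triggers, {"updatehistory"}, history))
```

Here `detect_dn` is evaluated only after `detect_up` has raised `upward`, so the confidence check runs only for rules whose support is already known to be rising.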

Implementation
- Implemented on AIX system
- Part of IBM’s Quest project
- Successfully tested:
  - Large set (5 years) of mail order data (2.9 million records)
  - Large set (3 years) of POS (point-of-sale) transactions (6.8 million records)

Future Work
- At time of paper…
  - Integrate constructs into a SQL relational system
  - Improve incremental computations using partial results of current trigger queries
- Since then…
  - Integrated into the Quest Data Mining System
  - Subsumed into IBM’s data mining products, including Intelligent Miner
  - Referenced for work in Active Data Mining and “blurry” pattern matching

References
- “Querying Shapes of Histories”, by Rakesh Agrawal, Giuseppe Psaila, Edward L. Wimmers, and Mohamed Zait of the IBM Almaden Research Center, 1995
- “Active Data Mining”, by Rakesh Agrawal and Giuseppe Psaila of the IBM Almaden Research Center, 1995
- “The Quest Data Mining System”, by Rakesh Agrawal, Manish Mehta, John Shafer, and Ramakrishnan Srikant of the IBM Almaden Research Center in coordination with Andreas Arning and Toni Bollinger of the IBM German Software Laboratory, 1996
- IBM Almaden Research Center Website: