A Sequential Pattern Query Language for Supporting Instant Data Mining for e-Services
Reza Sadri, Carlo Zaniolo, Amir Zarkesh, Jafar Adibi

Sequential Patterns in E-Commerce Applications
Applications:
- Targeted advertising
- Instant analysis of stock market trends
- Mining web access logs
- Fraud detection
Requirements:
- An expressive query language for finding complex patterns in database sequences
- An efficient and scalable implementation: query optimization

State of the Art
- ADTs (e.g., Informix DataBlades): not flexible enough, and no optimization
- SEQ: enhanced ADTs (e.g., sets and sequences) with their own query language
- SRQL: adds sequence algebra operators to the relational model

SQL-TS
- A query language for finding complex patterns in sequences
- A minimal extension of SQL: only the FROM clause is affected
- A new query optimization technique based on extensions of the Knuth-Morris-Pratt (KMP) string-search algorithm

Example in Mining Weblogs
Consider a table

    Sessions(SessNo, ClickTime, PageNo, PageType)

that keeps track of the pages visited in a session (a sequence of requests from the same user). Possible page types: content, product description, purchase.

Example in Mining Weblogs
An SQL-TS query that finds the ideal scenario, in which a content page is followed by a product description page and then by a purchase:

    SELECT B.PageNo, C.ClickTime
    FROM Sessions
         CLUSTER BY SessNo
         SEQUENCE BY ClickTime
         AS (A, B, C)
    WHERE A.PageType = 'content'
      AND B.PageType = 'product'
      AND C.PageType = 'purchase'
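To make the roles of CLUSTER BY, SEQUENCE BY and AS (A, B, C) concrete, here is a minimal Python sketch that evaluates the same query over an in-memory list of rows. It assumes that A, B and C bind to three consecutive tuples of a session once the session is sorted by ClickTime; the names `Click` and `ideal_scenarios` are hypothetical, and this is a naive scan, not the SQL-TS implementation.

```python
from collections import defaultdict
from typing import NamedTuple


class Click(NamedTuple):
    sess_no: int
    click_time: int
    page_no: int
    page_type: str


def ideal_scenarios(sessions: list[Click]) -> list[tuple[int, int]]:
    """Return (B.PageNo, C.ClickTime) for every run of three consecutive
    clicks A, B, C in a session with page types content, product, purchase."""
    # CLUSTER BY SessNo: group the rows of each session together.
    clusters: dict[int, list[Click]] = defaultdict(list)
    for row in sessions:
        clusters[row.sess_no].append(row)

    results = []
    for rows in clusters.values():
        # SEQUENCE BY ClickTime: order each session by time.
        rows.sort(key=lambda r: r.click_time)
        # AS (A, B, C): A, B and C are assumed to be consecutive tuples.
        for a, b, c in zip(rows, rows[1:], rows[2:]):
            if (a.page_type == "content" and b.page_type == "product"
                    and c.page_type == "purchase"):
                results.append((b.page_no, c.click_time))
    return results


log = [
    Click(1, 12, 300, "purchase"),
    Click(1, 10, 100, "content"),
    Click(1, 11, 200, "product"),
    Click(2, 5, 101, "content"),
    Click(2, 6, 300, "purchase"),
]
print(ideal_scenarios(log))  # [(200, 12)]
```

Because the three page types are distinct, matches cannot overlap, so how SQL-TS resumes after a match does not matter for this particular query.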

Example in Fraud Detection
Consider a table that logs credit card transactions:

    Spending(Date, AccountNo, Amount)

A sudden surge over an account's average spending might be a sign of credit card theft.

Example in Fraud Detection
Track the 30-day average spending, and report when spending exceeds five times that average on two consecutive days:

    SELECT Z.AccountNo, Z.Date
    FROM Spending
         CLUSTER BY AccountNo
         SEQUENCE BY Date
         AS (*X, Y, Z)
    WHERE COUNT(*X) = 30
      AND Y.Amount > 5 * AVG(*X.Amount)
      AND Z.Amount > 5 * AVG(*X.Amount)

Notice the use of the star (*X matches a sequence of tuples) and of aggregates computed over it.
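The fraud query can be sketched the same way. This hypothetical Python version assumes one Spending row per account per day (so 30 rows form the 30-day window) and reads AS (*X, Y, Z) as "30 consecutive rows, immediately followed by Y and then by Z"; `Txn` and `spending_surges` are illustrative names, not part of SQL-TS.

```python
from collections import defaultdict
from typing import NamedTuple

WINDOW = 30  # COUNT(*X) = 30
FACTOR = 5   # Amount > 5 * AVG(*X.Amount)


class Txn(NamedTuple):
    date: int          # day number, assumed to be one row per account per day
    account_no: str
    amount: float


def spending_surges(spending: list[Txn]) -> list[tuple[str, int]]:
    """Return (Z.AccountNo, Z.Date) for each window *X of WINDOW rows that is
    followed by two rows Y, Z whose amounts both exceed FACTOR * AVG(*X)."""
    # CLUSTER BY AccountNo
    by_account: dict[str, list[Txn]] = defaultdict(list)
    for t in spending:
        by_account[t.account_no].append(t)

    hits = []
    for rows in by_account.values():
        rows.sort(key=lambda t: t.date)  # SEQUENCE BY Date
        # k is the start of *X; Y and Z are the two rows right after it.
        for k in range(len(rows) - WINDOW - 1):
            x = rows[k:k + WINDOW]
            y, z = rows[k + WINDOW], rows[k + WINDOW + 1]
            avg = sum(t.amount for t in x) / WINDOW
            if y.amount > FACTOR * avg and z.amount > FACTOR * avg:
                hits.append((z.account_no, z.date))
    return hits


rows = [Txn(d, "A", 10.0) for d in range(30)] + [Txn(30, "A", 60.0), Txn(31, "A", 70.0)]
rows += [Txn(d, "B", 10.0) for d in range(32)]
print(spending_surges(rows))  # [('A', 31)]
```

Recomputing the average for every window keeps the sketch short; a running sum would make each step constant-time.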

Optimized String Search: KMP
Consider a text array text and a pattern array pattern:

    text[i]:     a b a b a b c a b c a
    pattern[j]:  a b a b c a

After a failure, use the information acquired so far to:
- shift the pattern forward by shift(j) positions, rather than by a single position, and
- check only the pattern elements from next(j) onward.
In SQL-TS, however, pattern elements are general predicates and may be star patterns, so these ideas must be generalized.
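For plain strings, the classic KMP algorithm precomputes, for every pattern prefix, the length of its longest proper prefix that is also a suffix, and uses it to slide the pattern without ever moving backward in the text. A standard Python sketch, run on the text and pattern above (with 0-based indices, unlike the slide's notation):

```python
def failure_function(p: str) -> list[int]:
    """fail[q] = length of the longest proper prefix of p[:q+1] that is also its suffix."""
    fail = [0] * len(p)
    k = 0
    for q in range(1, len(p)):
        while k > 0 and p[q] != p[k]:
            k = fail[k - 1]
        if p[q] == p[k]:
            k += 1
        fail[q] = k
    return fail


def kmp_search(text: str, p: str) -> list[int]:
    """Return the 0-based start positions of all occurrences of p in text."""
    if not p:
        raise ValueError("pattern must be non-empty")
    fail = failure_function(p)
    matches, k = [], 0  # k = number of pattern characters currently matched
    for i, ch in enumerate(text):
        while k > 0 and ch != p[k]:
            k = fail[k - 1]  # slide the pattern; the text index i never moves back
        if ch == p[k]:
            k += 1
        if k == len(p):
            matches.append(i - len(p) + 1)
            k = fail[k - 1]
    return matches


print(failure_function("ababca"))             # [0, 0, 1, 2, 0, 1]
print(kmp_search("abababcabca", "ababca"))    # [2]
```

After a b a b has matched and c fails against the fifth text character, the pattern slides by two positions and comparison resumes at that same text character, which is how the occurrence starting at index 2 is found.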

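shift and next
Suppose the first j-1 elements of the pattern have matched and the jth element fails (with the input at position i). Then:
- any shift smaller than shift(j) is guaranteed to lead to failure, and
- after shifting by shift(j), matching resumes with pattern element next(j).
[Figure: the input aligned with the original and the shifted pattern. The shifted pattern starts at input position i - j + shift(j) + 1, aligned with original pattern element shift(j) + 1; matching resumes at input position i - j + shift(j) + next(j), aligned with original element shift(j) + next(j); input position i corresponds to element j - shift(j) of the shifted pattern.]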
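For plain string patterns, shift(j) and next(j) can be derived from the matched prefix alone: if the longest proper border (a prefix that is also a suffix) of pattern[1..j-1] has length b, then shift(j) = j - 1 - b and next(j) = b + 1, so for j ≥ 2 matching resumes at input position i - j + shift(j) + next(j) = i, while for j = 1 the pattern simply moves one position forward. This Python sketch uses 1-based j as on the slide; `longest_border` and `shift_and_next` are hypothetical helpers. Like classic KMP, it uses only the success information and ignores the fact that element j failed, and it does not handle the general predicates or star patterns of SQL-TS.

```python
def longest_border(s: str) -> int:
    """Length of the longest proper prefix of s that is also a suffix of s."""
    return max((k for k in range(len(s)) if s[:k] == s[len(s) - k:]), default=0)


def shift_and_next(pattern: str) -> dict[int, tuple[int, int]]:
    """Map each 1-based j to (shift(j), next(j)) for a plain string pattern,
    assuming elements 1..j-1 matched and element j failed."""
    table = {}
    for j in range(1, len(pattern) + 1):
        matched = pattern[: j - 1]       # pattern[1..j-1] in 1-based terms
        b = longest_border(matched)
        shift = max(1, (j - 1) - b)      # only j = 1 needs the max: move by one
        table[j] = (shift, b + 1)        # elements 1..b of the shifted pattern already match
    return table


print(shift_and_next("ababca"))
# {1: (1, 1), 2: (1, 1), 3: (2, 1), 4: (2, 2), 5: (2, 3), 6: (5, 1)}
```

For the pattern a b a b c a, j = 5 gives shift(5) = 2 and next(5) = 3: after a b a b has matched and c fails, the pattern slides two positions and checking resumes with its third element.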

Optimal Pattern Search (OPS)
[Figure: search path of the naive algorithm vs. the optimized algorithm]

Optimizing Star Patterns
Relaxed double bottom: only increases and decreases of more than 2% are considered.
[Figure: the relaxed double-bottom pattern, with star elements *R, *T, *V, *Y and three stretches *Z, *U, *W of less than 2% change.]
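As a rough illustration of the relaxed double bottom, the Python sketch below encodes each day-to-day price move as D (drop of more than 2%), I (rise of more than 2%) or S (any smaller change), then searches the encoded string with a regular expression. Reading the figure as four trends (down, up, down, up) separated by stretches of small change such as *Z, *U, *W is an assumption, and this naive check is not the OPS algorithm; `classify` and `has_relaxed_double_bottom` are hypothetical names.

```python
import re

THRESHOLD = 0.02  # only moves of more than 2% count as increases or decreases


def classify(prices: list[float]) -> str:
    """Encode each day-to-day move: D = drop > 2%, I = rise > 2%, S = smaller change."""
    symbols = []
    for prev, cur in zip(prices, prices[1:]):
        change = (cur - prev) / prev
        if change < -THRESHOLD:
            symbols.append("D")
        elif change > THRESHOLD:
            symbols.append("I")
        else:
            symbols.append("S")
    return "".join(symbols)


# Hypothetical reading of the figure: down, up, down, up trends, possibly
# separated by stretches of small (at most 2%) change.
RELAXED_DOUBLE_BOTTOM = re.compile(r"D+S*I+S*D+S*I+")


def has_relaxed_double_bottom(prices: list[float]) -> bool:
    return RELAXED_DOUBLE_BOTTOM.search(classify(prices)) is not None


prices = [100, 95, 90, 90.5, 95, 95.5, 90, 95]
print(classify(prices))                   # DDSISDI
print(has_relaxed_double_bottom(prices))  # True
```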

Relaxed Double Bottom: Ninety-fold improvement

Conclusions
- Disjunctive queries, partially ordered domains, and aggregates are also handled by this approach
- For old applications, more power and flexibility than the DataBlade ADTs of commercial DBMSs
- Ongoing implementation, building on the user-defined aggregates supported in AXL