Patterns in Sequences and Data Streams Carlo Zaniolo Computer Science Department UCLA.


1 Patterns in Sequences and Data Streams Carlo Zaniolo Computer Science Department UCLA

2 E-Commerce Applications
- ZAIAS Corp: decision support and e-Services for web-based auctions, with technology to:
  - monitor multiple ongoing auctions,
  - determine the right price for auctioned goods, and
  - secure well-priced items by timely bids.
- Sophisticated analysis of sequence patterns and instant response are needed for the first two points.
- Similar requirements arise in many other applications.

3 Sequence Analysis: many applications
- Mining web access logs for:
  - Customer characterization/segmentation
  - Targeted advertising and banner optimization
- Frequent sequences in market baskets
- Analysis of stock market trends
  - Double bottoms and similar patterns
- Fraud detection examples
  - Stolen credit cards, theft of cellular phones and user IDs
- Intrusion detection
  - Example: a peripatetic intruder who attempts successive logins from nearby workstations; spatio-temporal criteria are used to detect such an attack

4 State of the Art
- ADTs (e.g., Informix DataBlades): not flexible enough, no optimization
  - In particular, not suitable for infinite data streams
- SEQ: enhanced ADTs (e.g., sets and sequences) with their own query language
- SRQL: adds sequence algebra operators to the relational model

5 SQL-TS
- A query language for finding complex patterns in sequences
  - Completely based on SQL: minimal extensions, only the FROM clause is affected
  - A powerful query optimization technique based on extensions of the Knuth, Morris & Pratt (KMP) string-search algorithm
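For readers unfamiliar with KMP, the following is a minimal Python sketch of the classic string-search algorithm that the slide names as the starting point. It is not the SQL-TS optimizer itself, which generalizes the idea to patterns defined by predicates on tuples; the function names `build_failure` and `kmp_search` are chosen here for illustration only.

```python
def build_failure(pattern):
    """failure[i] = length of the longest proper prefix of pattern[:i + 1]
    that is also a suffix of it."""
    failure = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k
    return failure


def kmp_search(text, pattern):
    """Return the start index of every occurrence of pattern in text.

    The text is scanned once, left to right; on a mismatch the failure
    table tells us how much of the pattern is still matched, so no input
    symbol is ever re-read.
    """
    if not pattern:
        return []
    failure = build_failure(pattern)
    matches = []
    k = 0  # number of pattern symbols currently matched
    for i, symbol in enumerate(text):
        while k > 0 and symbol != pattern[k]:
            k = failure[k - 1]
        if symbol == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = failure[k - 1]  # keep going to find overlapping matches
    return matches
```

Because the input is never re-read, the same idea is attractive for data streams, where earlier tuples may no longer be available.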

6 Example in Mining Weblogs
Consider a table or a stream of tuples:
    Sessions(SessNo, ClickTime, PageNo, PageType)
that keeps track of the pages visited in a session (a sequence of requests from the same user).
Page types include:
- content ('c')
- description of product ('d')
- purchase ('p')
Ads a web merchant dreams of: c, d, p

7 3 clicks for a purchase
SQL-TS query to find the ideal 3-click scenario:
    SELECT B.PageNo, C.ClickTime
    FROM Sessions
      PARTITION BY SessNo
      ORDER BY ClickTime
      AS (A, B, C)
    WHERE A.PageType = 'c'
      AND B.PageType = 'd'
      AND C.PageType = 'p'
PARTITION BY and SEQUENCE BY were used in the original paper.
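The intended semantics of the 3-click query can be sketched in plain Python. This is an illustration under stated assumptions, not how SQL-TS evaluates the query: rows are assumed to be `(SessNo, ClickTime, PageNo, PageType)` tuples held in memory, and the function name `three_click_purchases` is hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def three_click_purchases(sessions):
    """Return (B.PageNo, C.ClickTime) for every run of three consecutive
    clicks in one session whose page types are 'c', 'd', 'p'.

    sessions: iterable of (sess_no, click_time, page_no, page_type).
    """
    # PARTITION BY SessNo ORDER BY ClickTime
    rows = sorted(sessions, key=itemgetter(0, 1))
    results = []
    for _, group in groupby(rows, key=itemgetter(0)):
        clicks = list(group)
        # AS (A, B, C): three adjacent tuples of the same partition
        for a, b, c in zip(clicks, clicks[1:], clicks[2:]):
            if (a[3], b[3], c[3]) == ("c", "d", "p"):
                results.append((b[2], c[1]))
    return results


example = [
    (1, 10, 100, "c"), (1, 20, 101, "d"), (1, 30, 102, "p"),
    (2, 5, 200, "c"), (2, 6, 201, "c"), (2, 7, 202, "p"),
]
print(three_click_purchases(example))
```

Only session 1 contains the c, d, p sequence, so only its description page and purchase time are reported.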

8 Credit Card Spending
Consider a table that keeps a log of credit card transactions:
    Spending(Date, AccountNo, Amount)
A surge in average spending might be a sign of credit card theft.

9 Credit Card Fraud Detection in SQL-TS
Track the 30-day average spending and detect when spending increases considerably for two consecutive days:
    SELECT Z.AccountNo, Z.Date
    FROM Spending
      PARTITION BY AccountNo
      SEQUENCE BY Date
      AS (X+, Y, Z)
    WHERE COUNT(X+) = 30
      AND Y.Amount > 5*AVG(X+.Amount)
      AND Z.Amount > 5*AVG(X+.Amount)
- X+ denotes 1 or more occurrences of X
- Aggregates can be computed on the stars
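A rough Python equivalent of the fraud query may make the roles of X+, Y and Z concrete. It is a sketch under assumptions that the slide does not state: each account has exactly one `Spending` row per day (so 30 tuples correspond to 30 days), dates are any sortable values, and the name `spending_surges` and its parameters are hypothetical.

```python
from collections import defaultdict


def spending_surges(spending, window=30, factor=5):
    """Return (account_no, date_of_Z) whenever two consecutive rows Y and Z
    both exceed `factor` times the average of the `window` rows before Y.

    spending: iterable of (date, account_no, amount).
    """
    by_account = defaultdict(list)          # PARTITION BY AccountNo
    for date, account_no, amount in spending:
        by_account[account_no].append((date, amount))

    alerts = []
    for account_no, rows in by_account.items():
        rows.sort(key=lambda row: row[0])   # SEQUENCE BY Date
        # rows[i - window:i] plays X+, rows[i] plays Y, rows[i + 1] plays Z
        for i in range(window, len(rows) - 1):
            average = sum(amount for _, amount in rows[i - window:i]) / window
            y_amount = rows[i][1]
            z_date, z_amount = rows[i + 1]
            if y_amount > factor * average and z_amount > factor * average:
                alerts.append((account_no, z_date))
    return alerts
```

For example, an account that spends 10 a day for 30 days and then 100 and 60 on days 31 and 32 is flagged on day 32, since both amounts exceed 5 times the average of 10.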

10 Example in Online Auction
- Stream containing ongoing bids: Bids(auctn_id, Amount, Time)
- Table describing auctions: auctions(auctn_id, item_id, min_bid, deadline,…)
Find bids that are converging to a fixed price during the last 15 minutes of the auction.

11 Example in Online Auction in SQL-TS
Find three successive bids that each raise the previous bid by less than 2% during the last 15 minutes of the auction:
    SELECT T.auctn_id, T.time, T.amount
    FROM Auctions AS A, Bids
      PARTITION BY auctn_id
      ORDER BY time
      AS (X, Y, Z, T)
    WHERE A.auctn_id = X.auctn_id
      AND X.time + 15 Minute > A.deadline
      AND Y.amount - X.amount < 0.02*X.amount
      AND Z.amount - Y.amount < 0.02*Y.amount
      AND T.amount - Z.amount < 0.02*Z.amount
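The auction query joins a stored table with a partitioned stream, which the following Python sketch imitates in memory. The representation is an assumption made for illustration: `deadlines` maps `auctn_id` to a deadline in seconds, bids are `(auctn_id, amount, time)` tuples with time in seconds, and `converging_bids` is a hypothetical name.

```python
from collections import defaultdict


def converging_bids(deadlines, bids, window=15 * 60, max_raise=0.02):
    """Return (auctn_id, T.time, T.amount) for every four successive bids
    X, Y, Z, T where X falls in the last `window` seconds before the
    deadline and each bid raises the previous one by less than max_raise.
    """
    by_auction = defaultdict(list)            # PARTITION BY auctn_id
    for auctn_id, amount, time in bids:
        by_auction[auctn_id].append((time, amount))

    results = []
    for auctn_id, rows in by_auction.items():
        if auctn_id not in deadlines:         # A.auctn_id = X.auctn_id
            continue
        rows.sort(key=lambda row: row[0])     # ORDER BY time
        for x, y, z, t in zip(rows, rows[1:], rows[2:], rows[3:]):
            in_window = x[0] + window > deadlines[auctn_id]
            small_raises = all(
                later[1] - earlier[1] < max_raise * earlier[1]
                for earlier, later in ((x, y), (y, z), (z, t))
            )
            if in_window and small_raises:
                results.append((auctn_id, t[0], t[1]))
    return results
```

A streaming implementation would keep only the last three bids per auction instead of sorting the whole history; the batch form is used here only to keep the sketch short.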

12 Conclusion
- SQL-TS is a simple but powerful SQL extension for searching for patterns in sequences and time series.
- SQL-TS is supported by powerful query optimization techniques based on a generalization of the Knuth, Morris and Pratt text-search algorithm.

13 References
1. Reza Sadri, Carlo Zaniolo, Amir Zarkesh, Jafar Adibi: Expressing and optimizing sequence queries in database systems. ACM Transactions on Database Systems (TODS), Volume 29, Issue 2 (June 2004).
2. Reza Sadri, Carlo Zaniolo, Amir M. Zarkesh, Jafar Adibi: Optimization of Sequence Queries in Database Systems. PODS 2001.
3. Reza Sadri, Carlo Zaniolo, Amir M. Zarkesh, Jafar Adibi: A Sequential Pattern Query Language for Supporting Instant Data Mining for e-Services. VLDB 2001.
4. R. Sadri: Optimization of Sequence Queries in Database Systems. Ph.D. Thesis, UCLA, 2001.
5. P. Seshadri, M. Livny, and R. Ramakrishnan: SEQ: A model for sequence databases. In ICDE, 1995.
6. P. Seshadri, M. Livny, and R. Ramakrishnan: Sequence query processing. ACM SIGMOD Conference on Management of Data, May 1994.

