Presentation is loading. Please wait.

Presentation is loading. Please wait.

A Sequential Pattern Query Language for Supporting Instant Data Mining for e-Services Reza Sadri Carlo Zaniolo Amir.

Similar presentations


Presentation on theme: "A Sequential Pattern Query Language for Supporting Instant Data Mining for e-Services Reza Sadri Carlo Zaniolo Amir."— Presentation transcript:

1 A Sequential Pattern Query Language for Supporting Instant Data Mining for e-Services Reza Sadri Carlo Zaniolo sadri@procom.com zaniolo@cs.ucla.edu Amir Zarkesh Jafar Adibi azarkesh@u4cast.com jabibi@u4cast.com

2 Sequential Patterns in E- Commerce Applications Applications: Targeted Advertising Instant analysis of stock market trends Mining web access logs Fraud detection Requirements: Expressive query language for finding complex patterns in database sequences Efficient and scalable implementation: Query Optimization

3 State of The Art ADT (e.g.. Informix Datablades): Not flexible enough, no Optimization SEQ: Enhanced ADTs (e. g. sets and sequences) with their own query language SRQL: Adding sequence algebra operators to relational model

4 SQL-TS A query language for finding complex patterns in sequences Minimal extension of SQL—only the from clause affected A new Query optimization technique based on extensions of the Knuth, Morris & Pratt (KMP) string-search algorithm

5 Example in Mining Weblogs Consider a table: Sessions(SessNo, ClickTime, PageNo, PageType) That keeps track of pages visited in a session (sequence of requests from the same user) Possible page type: content product description purchase

6 Example in Mining Weblogs SQL-TS queries to find the ideal scenarios: SELECT B.PageNo, C.ClickTime FROM Sessions CLUSTER BY SessNo SEQUENCE BY ClickTime AS (A, B, C) WHERE A.PageType=‘content’ ANDB.PageType=‘product’ ANDC.PageType=‘purchase’

7 Example in Fraud Detection Consider a Table log that keeps track of credit card transactions: Spending(Date, AccountNo, Amount) A surge in average spending might be sign of credit card theft.

8 Example in Fraud Detection Track 30-day average spending and when it increases considerably for two consecutive days: Select Z.AccountNo, Z.Date FROM Spending CLUSTER BY AccountNo SEQUENCE BY Date AS (*X, Y, Z) WHERE COUNT(*X)=30 ANDY.Amount > 5 * AVG(*X.Amount) ANDZ.Amount > 5 * AVG(*X.Amount) Notice the Use of star and aggregates.

9 Optimized string search:KMP Consider text array text and pattern array p: i 1 2 3 4 5 6 7 8 9 10 11 text[i] a b a b a b c a b c a j 1 2 3 4 5 6 pattern[j] a b a b c a  After failing, use the information acquired so to: - backtrack to shift(j), rather than i+1, and - only check pattern values after next(j) But in SQL-TS we have general predicates & star patterns

10 shift and next Success for first j-1 elements of pattern. Failure for jth element (when input is at i) Any shift less than shift(j) is guaranteed to lead to failure, Match elements in the pattern starting at next(j) Shifted Pattern i – j + 1 1 1 i – j + shift(j) + 1i - j + shift(j) + next(j) shift(j) + 1shift(j) + next(j) i j next(j)j - shift(j) Input Pattern shift(j)

11 Optimal Pattern Search (OPS) Search path for naive algorithm vs. optimized algorithm:

12 *Z (less than 2% change) *U (less than 2% change) *W (less than 2% change) *Y*R *V *T Optimizing Star Patterns Relaxed Double Bottom: Only considering increases and decreases that are more than 2%

13 Relaxed Double Bottom: Ninety fold improvement

14 Conclusions Disjunctive queries, partial ordered domains, aggregates also treated in this approach Old applications—more power & flexibility than Datablades ADTs of commercial DBMSs Ongoing implementation, by building on the user-defined aggregates supported in AXL.


Download ppt "A Sequential Pattern Query Language for Supporting Instant Data Mining for e-Services Reza Sadri Carlo Zaniolo Amir."

Similar presentations


Ads by Google