Published by Matilda Gray. Modified over 6 years ago.
1
When to Update the Sequential Patterns of Stream Data?
Q. Zheng, K. Xu, and S. Ma, in Proc. of the 7th Pacific-Asia Conference on Knowledge Discovery and Data Mining, 2003.
Adviser: Jia-Ling Koh
Speaker: Shu-Ning Shin
Date:
2
Introduction
The paper proposes an experimental method, called TPD (Tradeoff between Performance and Difference), to decide when to update the sequential patterns of stream data. It makes a tradeoff between the performance of incremental updating algorithms and the difference between the old and new sequential patterns.
3
Stream Data Model (1)
Stream event: Ei=<ei, tn>, where ei is the stream event type and tn is the time at which the event occurs.
Stream tuple: Qi=((ek1, ek2, …, ekm), ti)=(Ek1, Ek2, …, Ekm)
Length of stream tuple: |Qi|=|(ek1, ek2, …, ekm)|=m
4
Stream Data Model (2)
Stream queue: Sij=<Qi, Qi+1, …, Qj>, where ti< ti+1< …< tj; written as events, Sij=<(Ei1, …, Eik)…(Ej1, …, Ejm)>
Length of queue: |Sij|=|<Qi, Qi+1, …, Qj>|=j-i+1
Stream viewing window: Wk=<Qm, …, Qn|d=n-m+1>
Size of viewing window: |Wk|=n-m+1=d
5
Stream Data Model (3)
occur(seqm, Wk): the number of times seqm occurs in Wk, where seqm=<ei1, ei2, …, eim> and Wk is a stream viewing window.
support(seqm, Wk): occur(seqm, Wk) / |Wk|
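The definitions on the three Stream Data Model slides can be sketched in Python as follows. This is only an illustration: the class name `StreamTuple`, the example window, and especially the counting rule inside `occur` are assumptions. The slides say only "the number of times seqm occurs in Wk", so the sketch counts greedy, non-overlapping left-to-right matches in which each event type must appear in a strictly later tuple than the previous one.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class StreamTuple:
    """Q_i: the event types that share one timestamp t_i."""
    events: Tuple[str, ...]
    time: int

    def __len__(self) -> int:
        # |Q_i| = m, the number of event types in the tuple
        return len(self.events)


def occur(seq: Sequence[str], window: List[StreamTuple]) -> int:
    """Count occurrences of seq in window.

    ASSUMPTION: greedy, non-overlapping matches, one event type
    consumed per tuple, scanned in time order.
    """
    if not seq:
        return 0
    count, pos = 0, 0
    for q in window:
        if seq[pos] in q.events:
            pos += 1
            if pos == len(seq):  # one full match completed
                count += 1
                pos = 0
    return count


def support(seq: Sequence[str], window: List[StreamTuple]) -> float:
    """support(seq, W) = occur(seq, W) / |W|."""
    if not window:
        raise ValueError("window must contain at least one tuple")
    return occur(seq, window) / len(window)


# Hypothetical window of five tuples (not data from the paper).
window = [
    StreamTuple(("a",), 1),
    StreamTuple(("b",), 2),
    StreamTuple(("a", "c"), 3),
    StreamTuple(("b",), 4),
    StreamTuple(("a",), 5),
]
print(occur(("a", "b"), window))    # 2
print(support(("a", "b"), window))  # 0.4
```

A different counting rule (for example, allowing overlapping matches) would change `occur` but leave the `support` ratio defined the same way.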
6
Stream Data Model - Example
S18=<Q1, Q2, Q3, Q4, Q5, Q6, Q7, Q8>
S18=<E2, E5, E1, (E3, E6), E7, E9, E10>
W5=<Q1, Q2, Q3, Q4, Q5, Q6, Q7|d=7>
7
Sliding Stream viewing window
ΔWi: incremental window, i=0, 1, 2, 3, …
ΔW0: initial window
Wi+1=Wi+ΔWi+1
|ΔW1|/|W0|: incremental ratio of stream data
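The window update and the incremental ratio above amount to list concatenation and a division. A minimal sketch, assuming windows are held as Python lists of tuples; the function names and the sample sizes are hypothetical:

```python
from typing import List


def grow_window(window: List[str], increment: List[str]) -> List[str]:
    """W_{i+1} = W_i + ΔW_{i+1}: append the new tuples to the old window."""
    return window + increment


def incremental_ratio(initial_size: int, increment_size: int) -> float:
    """|ΔW_1| / |W_0|."""
    if initial_size <= 0:
        raise ValueError("initial window must not be empty")
    return increment_size / initial_size


# Hypothetical sizes: an initial window of 10 tuples and 3 new tuples.
w0 = [f"Q{i}" for i in range(1, 11)]
dw1 = ["Q11", "Q12", "Q13"]
w1 = grow_window(w0, dw1)
ratio = incremental_ratio(len(w0), len(dw1))
print(len(w1), ratio)  # 13 0.3
```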
8
Estimation of difference between the old and new sequential patterns
LWk: old frequent sequences in Wk
LWk+1: new frequent sequences in Wk+1
LWk Δ LWk+1: symmetric difference
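With frequent sequences stored as sets, the symmetric difference is a single set operation. The sketch below also turns it into a number between 0 and 1 by dividing by the size of the union; that normalization is an assumption made here for illustration, since the slide names only the symmetric difference. The sample patterns are made up.

```python
from typing import Set, Tuple

Pattern = Tuple[str, ...]


def pattern_difference(old: Set[Pattern], new: Set[Pattern]) -> float:
    """|L_Wk Δ L_Wk+1| / |L_Wk ∪ L_Wk+1| (the normalization is an assumption)."""
    union = old | new
    if not union:
        return 0.0
    return len(old ^ new) / len(union)


old_patterns = {("a",), ("b",), ("a", "b")}
new_patterns = {("a",), ("a", "b"), ("a", "c")}
print(sorted(old_patterns ^ new_patterns))  # [('a', 'c'), ('b',)]
print(pattern_difference(old_patterns, new_patterns))  # 0.5
```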
9
The Algorithm of Updating Sequential Pattern (IUS) (1)
The IUS algorithm uses the frequent and negative border sequences in DB and db as the candidates to compute the new frequent sequences and negative border sequences in the updated database U.
DB: The original database, which contains old time-related data.
db: The increment database, which contains new time-related data.
dd: The decrement database from DB, which contains deleted time-related data.
U: The updated database. When the database is incrementally updated, U equals DB+db. When the database is decrementally updated, U equals DB-dd.
Support(F, X): The support of the sequence F in the X database, where X ∈ {db, dd, DB, U}.
Min_supp: Minimum support threshold of a frequent sequence.
Min_nbd_supp: Minimum support threshold of a negative border sequence.
CX: Candidate sequences in the X database, where X ∈ {db, dd, DB, U}.
LX: Frequent sequences in the X database, where X ∈ {db, dd, DB, U}.
NBD(X)=CX-LX, where NBD(X) consists of the sequences in the X database whose sub_sets are frequent and whose Support is lower than Min_supp and greater than Min_nbd_supp.
10
IUS (2)
Property 1: Let B be a frequent sequence in Wk, if , we have occur(A, DB)>occur(B, DB).
Property 2:
Proof: Assume that occur(S, DB)<Min_sup*|DB| and occur(S, db)<Min_sup*|db|. Then occur(S, DB+db)<Min_sup*|DB+db|, so Support(S, U)<Min_sup, which contradicts the given condition.
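The proof step can be written out as a short derivation. The statement of Property 2 is not legible on the slide; the proof reads as if it says that a sequence frequent in U must be frequent in DB or in db, and the derivation below is a sketch under that reading. It also assumes that occurrence counts and sizes add across DB and db, which the slide does not state explicitly.

```latex
\begin{align*}
\mathrm{occur}(S, DB + db)
  &= \mathrm{occur}(S, DB) + \mathrm{occur}(S, db) \\
  &< \mathit{Min\_sup}\cdot|DB| + \mathit{Min\_sup}\cdot|db| \\
  &= \mathit{Min\_sup}\cdot|DB + db| \\
\Rightarrow\quad
\mathrm{Support}(S, U)
  &= \frac{\mathrm{occur}(S, DB + db)}{|DB + db|} < \mathit{Min\_sup}.
\end{align*}
```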
11
IUS – using the stream data model
Wk: The original stream view window, which contains old time-related data.
ΔWk+1: The increment stream view window, which contains new time-related data.
Wk+1: The updated stream view window. When the stream data is incrementally updated, Wk+1 equals Wk+ΔWk+1.
Support(F, X): The support of the sequence F in the X stream view window, where X ∈ {Wk+1, Wk, ΔWk+1}.
Min_supp: Minimum support threshold of a frequent sequence.
Min_nbd_supp: Minimum support threshold of a negative border sequence.
CX: Candidate sequences in the X stream view window, where X ∈ {Wk+1, Wk, ΔWk+1}.
LX: Frequent sequences in the X stream view window, where X ∈ {Wk+1, Wk, ΔWk+1}.
NBD(X)=CX-LX, where NBD(X) consists of the sequences in the X stream view window whose sub_sets are frequent and whose Support is lower than Min_supp and greater than Min_nbd_supp. Note that X ∈ {Wk+1, Wk, ΔWk+1}.
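The split of candidates into frequent sequences and negative border sequences can be illustrated with a small classifier. This is not the IUS algorithm itself, only a sketch of the two definitions. Several choices are assumptions: the helper names, the made-up supports and thresholds, treating Support equal to Min_supp as frequent, and reading "sub_sets are frequent" as "every subsequence obtained by deleting one element is frequent".

```python
from typing import Dict, List, Set, Tuple

Pattern = Tuple[str, ...]


def one_shorter(seq: Pattern) -> List[Pattern]:
    """Subsequences obtained by deleting exactly one element (none for length 1)."""
    if len(seq) <= 1:
        return []
    return [seq[:i] + seq[i + 1:] for i in range(len(seq))]


def classify(supports: Dict[Pattern, float], min_supp: float,
             min_nbd_supp: float) -> Tuple[Set[Pattern], Set[Pattern]]:
    """Return (L_X, NBD(X)) for the candidates C_X given as supports."""
    frequent = {s for s, v in supports.items() if v >= min_supp}
    border = {
        s for s, v in supports.items()
        if min_nbd_supp < v < min_supp
        and all(sub in frequent for sub in one_shorter(s))
    }
    return frequent, border


# Hypothetical candidate supports in one window.
supports = {
    ("a",): 0.60,
    ("b",): 0.50,
    ("c",): 0.10,
    ("a", "b"): 0.25,
    ("a", "c"): 0.22,
}
frequent, border = classify(supports, min_supp=0.30, min_nbd_supp=0.20)
print(sorted(frequent))  # [('a',), ('b',)]
print(sorted(border))    # [('a', 'b')]
```

Here `("a", "c")` falls between the two thresholds but is excluded from the border because its subsequence `("c",)` is not frequent.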
12
IUS – Algorithm (1)
13
IUS – Algorithm (2)
14
Tradeoff between Performance and Difference (TPD) (1)
Use the speedup to measure the performance of IUS: Speedup = the execution time of Robust_search / the execution time of IUS.
Use the difference to measure the change between the old and the new frequent sequences.
Use Min-Max normalization to bring the two measures onto the same scale.
15
TPD (2)
The TPD method plots the speedup curve and the difference curve, both as functions of the incremental window size, on the same graph under the same scale. The points where the two curves intersect give the suitable range of the incremental ratio of the initial window for IUS.
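The TPD procedure can be sketched as two steps: rescale both curves with the standard Min-Max formula, then look for the increment sizes between which the rescaled curves cross. The function names and all numbers below are hypothetical and are not measurements from the paper; the slides do not show how the intersection was located, so the sign-change test is an assumption.

```python
from typing import List, Sequence, Tuple


def min_max(values: Sequence[float], new_min: float = 0.0,
            new_max: float = 1.0) -> List[float]:
    """Standard Min-Max normalization onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [new_min for _ in values]
    return [new_min + (v - lo) * (new_max - new_min) / (hi - lo) for v in values]


def crossings(sizes: Sequence[float], speedup: Sequence[float],
              difference: Sequence[float]) -> List[Tuple[float, float]]:
    """Intervals of increment size where the normalized curves intersect."""
    gaps = [s - d for s, d in zip(min_max(speedup), min_max(difference))]
    found = []
    for i in range(len(gaps) - 1):
        if gaps[i] == 0:
            found.append((sizes[i], sizes[i]))      # curves touch at a sample
        elif gaps[i] * gaps[i + 1] < 0:
            found.append((sizes[i], sizes[i + 1]))  # sign change between samples
    if gaps and gaps[-1] == 0:
        found.append((sizes[-1], sizes[-1]))
    return found


# Made-up curves: speedup falls and difference rises as the increment grows.
sizes = [2, 4, 6, 8, 10]             # increment size, in thousands of events
speedup = [9.0, 7.0, 4.0, 2.0, 1.0]
difference = [10.0, 20.0, 30.0, 40.0, 50.0]
print(crossings(sizes, speedup, difference))  # [(4, 6)]
```

Dividing the bounds of each returned interval by the initial window size gives the range of the incremental ratio.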
16
Experiment
A set of experiments was conducted to find when to update sequential patterns for stream data.
Environment: DELL PC Server with 2 Pentium II CPUs, Memory 512M, Disk 16G
Operating system: Red Hat Linux 6.0
Data1: the alarms in GSM networks, containing 194 alarm types and 100k alarm events. The time of alarm events in Data1 ranges from to
17
Experiment 1 – on Data 1 |initial window|=20k
The intersection point: 6K The suitable range of incremental ratio of initial window: 30% of W0.
18
Experiment 2 – on Data 1 |initial window|=40k
The intersection point: 9K~10K The suitable range of incremental ratio of initial window: 22.5%~25% of W0.
19
Experiment 3 – on Data 1 |initial window|=50k
The intersection point: 15K~18K The suitable range of incremental ratio of initial window: 30%~36% of W0.
20
Experiment 4 – on Data 1 |initial window|=60k
The intersection point: 10K~12K The suitable range of incremental ratio of initial window: 16.7%~20% of W0.
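The percentages reported for the four experiments follow directly from dividing each intersection point by the initial window size. A short check, using only the figures quoted on the slides (sizes in thousands of events); the function and variable names are illustrative:

```python
from typing import Dict, Tuple


def ratio_range(initial_k: float, low_k: float, high_k: float) -> Tuple[float, float]:
    """Incremental ratio range, in percent of the initial window W0."""
    return (100 * low_k / initial_k, 100 * high_k / initial_k)


# initial window size -> (lowest, highest) intersection point, all in K events
experiments: Dict[int, Tuple[int, int]] = {
    20: (6, 6),
    40: (9, 10),
    50: (15, 18),
    60: (10, 12),
}

for initial_k, (low_k, high_k) in experiments.items():
    low, high = ratio_range(initial_k, low_k, high_k)
    print(f"|W0|={initial_k}k: {low:.1f}% to {high:.1f}%")
```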
21
Conclusion
Using the TPD method, it is shown experimentally that, for the IUS algorithm, the suitable incremental ratio at which to update is about 20 to 30 percent of the size of the initial window.