1
Bias Management in Time-Changing Data Streams
Within any given period, data is assumed to be generated randomly according to a stationary distribution, but the data arrives as a stream through time and the generating distribution can change. Examples are network monitoring, web applications, and sensor networks. This calls for adaptive learning algorithms that change their bias through time.
2
Contexts
A data stream is a sequence of contexts (Context 1, Context 2, Context 3, ...). Within each context the data comes from the same stationary distribution. How can we detect when there is a change of context?
3
Change Detection
Online algorithms detect a change in real time, as the sequence of contexts arrives. Offline algorithms analyze the whole sequence of data after the fact.
4
Tracking Drifting Concepts: Window Approach
Train the model on a window of the h most recent examples and continuously monitor its accuracy and coverage. If no change is detected, increase the window size h; if a change is detected, decrease h (a sketch follows below).
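A minimal sketch of this window approach in Python; the class name, the h_init/h_min/h_max parameters, the adjustment schedule, and the external change-detection signal are illustrative assumptions, not part of the original slides:

from collections import deque

class SlidingTrainingWindow:
    """Sketch of the window approach: keep the h most recent examples,
    grow h while the model looks stable, shrink h when a change in the
    monitored accuracy is detected."""

    def __init__(self, h_init=100, h_min=20, h_max=2000):
        self.h = h_init
        self.h_min = h_min
        self.h_max = h_max
        self.buffer = deque(maxlen=h_max)  # most recent examples

    def update(self, example, change_detected):
        self.buffer.append(example)
        if change_detected:
            # Change detected: shrink h so the next model is rebuilt
            # mostly from post-change data.
            self.h = max(self.h_min, self.h // 2)
        else:
            # No change: grow h to reduce variance with more data.
            self.h = min(self.h_max, self.h + 1)
        # The h most recent examples form the training window.
        return list(self.buffer)[-self.h:]

The shrink-by-half and grow-by-one rules are one common heuristic; the slides only state the direction of the adjustment, not the exact schedule.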
5
Dynamic Bias Selection: Very Fast Decision Tree Algorithm
Each new example updates the sufficient statistics at the leaf it reaches. If the evidence for a split is strong enough, the leaf is split and a new sub-tree is attached to it.
6
Dynamic Bias Selection: Very Fast Decision Tree Algorithm
Hoeffding Bound: after n independent observations of a random variable r with range R, with probability at least 1 - A the true mean of r lies within e of the observed mean, where e = sqrt( R^2 ln(1/A) / (2n) ).
7
Dynamic Bias Selection: Very Fast Decision Tree Algorithm
Let B = H(xa) - H(xb), where xa and xb are the attributes with the best and second-best observed values of the split evaluation function H(). If B > e after n examples seen at the leaf node, then with probability at least 1 - A, xa is truly the attribute with the highest value of H().
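A small sketch of this Hoeffding-bound split test in Python; the function names and the concrete numbers in the usage line are illustrative assumptions:

import math

def hoeffding_epsilon(value_range, delta, n):
    # With probability at least 1 - delta, after n independent
    # observations the observed mean of a variable with range
    # value_range is within epsilon of its true mean.
    return math.sqrt((value_range ** 2) * math.log(1.0 / delta) / (2.0 * n))

def should_split(h_best, h_second_best, value_range, delta, n):
    # VFDT-style test: split when the observed gap
    # B = H(x_a) - H(x_b) between the best and second-best attributes
    # exceeds epsilon, so x_a is the true best with prob. >= 1 - delta.
    return (h_best - h_second_best) > hoeffding_epsilon(value_range, delta, n)

# Example: information gain on a two-class problem has range R = 1 bit.
print(should_split(h_best=0.30, h_second_best=0.22,
                   value_range=1.0, delta=1e-6, n=5000))   # True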
8
Bayesian Network Classifiers
Start with a simple naïve Bayes classifier (no dependencies between attributes are assumed). Add dependencies if doing so improves performance, but note that too many dependencies drastically increase the number of parameters.
9
Bayesian Network Classifiers
K-DBC stands for k-Dependence Bayesian Classifier: a Bayesian network classifier in which each attribute is allowed at most k other attributes as parents, in addition to the class. Arcs between attributes can be added iteratively to maximize a score until no further improvement is achieved (a sketch of this greedy search follows).
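A hedged sketch of that greedy search in Python; the score callback, the function names, and the acyclicity check are assumptions about how such a search could be organized, not the specific algorithm from the slides:

from itertools import permutations

def creates_cycle(parents, src, dst):
    # Adding the arc src -> dst creates a cycle iff dst can already be
    # reached by following parent links upward from src.
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(parents[node])
    return False

def greedy_kdbc_structure(attributes, k, score):
    # Start from naive Bayes: the class is the only parent of every
    # attribute, so the attribute-parent sets are empty.
    parents = {x: set() for x in attributes}
    best_score = score(parents)
    improved = True
    while improved:
        improved, best_arc = False, None
        for src, dst in permutations(attributes, 2):
            if (src in parents[dst] or len(parents[dst]) >= k
                    or creates_cycle(parents, src, dst)):
                continue  # arc present, parent limit reached, or cycle
            candidate = {x: set(p) for x, p in parents.items()}
            candidate[dst].add(src)
            s = score(candidate)
            if s > best_score:
                best_score, best_arc, improved = s, (src, dst), True
        if best_arc is not None:
            parents[best_arc[1]].add(best_arc[0])
    return parents

Here score(parents) would typically be a penalized log-likelihood or cross-validated accuracy of the resulting classifier; the slides leave the concrete score open.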
10
Shewhart P-Chart for Concept Drift
Monitor the classifier's error rate with control limits and warning zones. When the error rises beyond the control limit, a new model is built. Base learner: naïve Bayes. A sketch of the drift check follows below.
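A minimal sketch of the P-chart check on the error rate in Python; the class name, the 2-sigma warning and 3-sigma control thresholds, and the numbers in the usage lines are conventional Shewhart-chart choices assumed here, not taken from the slides:

import math

class PChartDriftDetector:
    def __init__(self, p_bar, n):
        # p_bar: baseline error rate estimated while the model was stable;
        # n: number of predictions per monitored batch.
        self.p_bar = p_bar
        self.sigma = math.sqrt(p_bar * (1.0 - p_bar) / n)

    def check(self, p_observed):
        if p_observed > self.p_bar + 3.0 * self.sigma:
            return "drift"      # out of control: build a new model
        if p_observed > self.p_bar + 2.0 * self.sigma:
            return "warning"    # warning zone: start buffering new data
        return "in-control"

detector = PChartDriftDetector(p_bar=0.10, n=200)
print(detector.check(0.12))   # "in-control"
print(detector.check(0.18))   # "drift"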
11
Shewhart P-Chart
12
Lessons Learned
There is a trade-off between the cost of updating the model and the improvement in performance. Strong variance-management methods work well on small datasets, but simple methods have high bias.