Download presentation
Presentation is loading. Please wait.
1
Mihael Ankerst Boeing Phantom Works Visual Data Mining SC4Devo – July 15 th 2004
2
Visual Data Mining Information Visualization Data Mining Visual Data Mining Data Mining Algorithms ++-- Actionable Evaluation Flexibility User Interaction Visualization --++
3
Visual Data Mining Architecture: Tightly Integrated Visualization Data Knowledge DM-Algorithm Result Visualization of the data Result DM-Algorithm step 1 Data Knowledge DM-Algorithm step n Visualization + Interaction Preceding Visualization (PV) Subsequent Visualization (SV) Tightly integrated Visualization (TIV) Visualization of the result Data DM-Algorithm Knowledge Visualization of the result Result
4
One example for Preceding Visualization Exploring ~10,000 records 50 different stock prices Circle Segments Visualization of Stock Data
5
Two examples for Subsequent Visualization Decision Tree Visualizer (MineSet) pSPSS AnswerTree
6
Tightly Integrated Visualization Result DM-Algorithm step 1 Data Knowledge DM-Algorithm step n Visualization + Interaction Visualization of algorithmic decisions Data and patterns are better understood User can make decisions based on perception User can make decisions based on domain knowledge Visualization of result enables user specified feedback for next algorithmic run Tightly integrated Visualization (TIV)
7
Tightly Integrated Visualization The first prototypes which follow this architecture: Perception-Based Classification (Decision Tree Classification) HD-Eye (Clustering) DataJewel (Temporal Mining)
8
The corresponding DM Method a a a a a a b b b b b b Age Height Problem description : Given a set of objects with known class labels. Description Build model describing the data with respect to the class Prediction Use model to predict the class label of objects Classification
9
Tree Model Tutorial Problem description : Given data describing individuals, find factors that indicate buyers. Algorithm Search through all factors to find one which best divides people into buyers / not buyers Divide groups and repeat on subgroups Outcome Tree uses factors to describe people who are likely to be buyers YM8957 NF4581 YF6329 NM4042 YM1525 ClassSexSalaryAge age < 35 sal < 67 Y N Y
10
Tutorial (2) Extracting Rules from a Decision Tree IF (age < 35) THEN Class = ‘Yes’ IF (age >= 35) AND (sal < 67) THEN Class = ‘No’ IF (age >= 35) AND (sal >= 67) THEN Class = ‘Yes’ age < 35 sal < 67 Y N Y
11
Visual Classification void TreeClassifier (training set T) { if (terminationCondition(T) == true) { createLeaf(); return;} for (each Attribute A i ) FindBestSplitPointIn(A i ); SplitWithBestSplitPoint(T); T 1 = leftSon(); T 2 = rightSon(); TreeClassifier(T 1 ); TreeClassifier(T 2 ); } Binary splits State-of-the-art decision tree classifiers: No backtracking/look-ahead during the tree construction phase No use of domain knowledge
12
Visual Classification Each attribute is sorted and visualized separately Each attribute value is mapped onto a unique pixel The color of a pixel is determined by the class label of the object The order is reflected by the arrangement of the pixels... attr. 1class 0.2Y 0.3Y Y 0.5N 1.1Y... attr. 2class 0.5N 1.3Y 2.0N 2.5Y 5.1N... attr.1attr.2... class 0.323.3...Y 2.42.0...N
13
Visual Classification Attribute 1 Attribute 2 binary split n-ary split An Example: PBC Color Scale Pixel Arrangement Pixel Arrangement... Attribute 1 Attribute 2
14
Visual Classification Calculating an order for categorical attributes: SLIQext [CHH 99] Start with one empty set and one set with all categories D A B C E set 1 set 2 Move that element which yields the best split
15
Visual Classification Calculating an order for categorical attributes: SLIQext [CHH 99] Start with one empty set and one set with all categories D A B C E set 1 set 2 Move that element which yields the best split
16
Visual Classification Calculating an order for categorical attributes: SLIQext [CHH 99] Start with one empty set and one set with all categories D A B C E set 1 set 2 Move that element which yields the best split
17
Visual Classification Calculating an order for categorical attributes: SLIQext [CHH 99] Start with one empty set and one set with all categories D A B C E set 1 set 2 Move that element which yields the best split
18
Visual Classification Calculating an order for categorical attributes: SLIQext [CHH 99] Start with one empty set and one set with all categories D A B C E set 1 set 2 Move that element which yields the best split
19
Visual Classification Calculating an order for categorical attributes: SLIQext [CHH 99] Start with one empty set and one set with all categories D A B C E set 1 set 2 Move that element which yields the best split Default order for categorical attributes
20
Visual Classification Shuttle data set (9 attributes, 43,500 records)
21
Visual Classification Segment data set (19 attributes, 7 classes)
22
Visual Classification age < 35 A New Visualization of a Decision Tree
23
Visual Classification age < 35 Salary G < 40 > 80 [40,80] A New Visualization of a Decision Tree
24
Visual Classification age < 35 G P V Salary G < 40 > 80 [40,80] A New Visualization of a Decision Tree
25
Visual Classification Level 1 Level 2... Level 18 leaf split point inherited split point A Decision Tree for the Segment Data Set
26
Visual Classification Different types of algorithmic support for the user: Propose split Look-ahead Expand subtree
27
Visual Classification System visualizes active node User User removes active node User assigns class label System proposes split point System expands active node User selects split points System performs split of active node System offers look-ahead Legend: User Operation System Operation
28
Visual Classification animated split lines magnified split lines exact split point
29
Visual Classification Automatic Automatic- Manual Manual- Automatic Manual Australian84.980.986.982.7 DNA93.889.293.389.2 Satimage86.484.186.883.5 Segment95.595.096.394.8 Shuttle99.599.699.799.9 Accuracy:
30
Conclusions Tight integration of visualization and data mining algorithms is still a very new area of research Data mining algorithms and visualization technique can nicely complement each other. PBC leverages decision tree algorithms, allows the user to steer the mining process. User involvement during the mining process enables knowledge transfer and capitalizes on human’s perception
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.