Mihael Ankerst, Boeing Phantom Works. Visual Data Mining. SC4Devo, July 15th 2004.


1 Mihael Ankerst, Boeing Phantom Works. Visual Data Mining. SC4Devo, July 15th 2004

2 Visual Data Mining
[Diagram: Information Visualization and Data Mining, combined as Visual Data Mining]

                          Actionable   Evaluation   Flexibility   User Interaction
Data Mining Algorithms        +            +             -               -
Visualization                 -            -             +               +

3 Visual Data Mining Architecture: Tightly Integrated Visualization
[Diagram: three architectures and their components]
• Preceding Visualization (PV): Data; Visualization of the data; DM-Algorithm; Result; Knowledge
• Subsequent Visualization (SV): Data; DM-Algorithm; Result; Visualization of the result; Knowledge
• Tightly integrated Visualization (TIV): Data; DM-Algorithm step 1 ... DM-Algorithm step n; Visualization + Interaction; Result; Knowledge

4 One example for Preceding Visualization
• Exploring ~10,000 records, 50 different stock prices
[Figure: Circle Segments Visualization of Stock Data]

5 Two examples for Subsequent Visualization
• Decision Tree Visualizer (MineSet)
• SPSS AnswerTree

6 Tightly Integrated Visualization
[Diagram: Tightly integrated Visualization (TIV): Data; DM-Algorithm step 1 ... DM-Algorithm step n; Visualization + Interaction; Result; Knowledge]
Visualization of algorithmic decisions
• Data and patterns are better understood
• User can make decisions based on perception
• User can make decisions based on domain knowledge
• Visualization of result enables user-specified feedback for next algorithmic run

7 Tightly Integrated Visualization
The first prototypes which follow this architecture:
• Perception-Based Classification (Decision Tree Classification)
• HD-Eye (Clustering)
• DataJewel (Temporal Mining)

8 The corresponding DM Method
[Figure: scatter plot of objects labeled a and b, axes Age and Height]
Problem description: Given a set of objects with known class labels.
• Description: Build model describing the data with respect to the class
• Prediction: Use model to predict the class label of objects
• Classification

9 Tree Model Tutorial
Problem description: Given data describing individuals, find factors that indicate buyers.
• Algorithm: Search through all factors to find one which best divides people into buyers / not buyers. Divide groups and repeat on subgroups.
• Outcome: Tree uses factors to describe people who are likely to be buyers

Class   Sex   Salary   Age
Y       M     89       57
N       F     45       81
Y       F     63       29
N       M     40       42
Y       M     15       25

[Tree diagram: root test age < 35; second test sal < 67; leaves Y, N, Y]

10 Tutorial (2): Extracting Rules from a Decision Tree
IF (age < 35) THEN Class = ‘Yes’
IF (age >= 35) AND (sal < 67) THEN Class = ‘No’
IF (age >= 35) AND (sal >= 67) THEN Class = ‘Yes’
[Tree diagram: root test age < 35; second test sal < 67; leaves Y, N, Y]
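The rule extraction on this slide amounts to collecting the tests along each root-to-leaf path. A minimal Python sketch follows; the `Leaf` and `Split` classes and the `extract_rules` function are hypothetical names introduced for illustration, not taken from the presentation.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Leaf:
    label: str  # class label assigned at this leaf


@dataclass
class Split:
    attribute: str
    threshold: float
    below: "Node"        # subtree for attribute < threshold
    at_or_above: "Node"  # subtree for attribute >= threshold


Node = Union[Leaf, Split]


def extract_rules(node: Node, conditions: Tuple[str, ...] = ()) -> List[str]:
    """Emit one IF ... THEN rule per root-to-leaf path."""
    if isinstance(node, Leaf):
        body = " AND ".join(conditions) if conditions else "TRUE"
        return [f"IF {body} THEN Class = '{node.label}'"]
    lt = f"({node.attribute} < {node.threshold})"
    ge = f"({node.attribute} >= {node.threshold})"
    return (extract_rules(node.below, conditions + (lt,))
            + extract_rules(node.at_or_above, conditions + (ge,)))


# The tree from the tutorial slide: age < 35, then sal < 67.
tree = Split("age", 35, Leaf("Yes"), Split("sal", 67, Leaf("No"), Leaf("Yes")))
rules = extract_rules(tree)
for rule in rules:
    print(rule)
```

Each leaf yields exactly one rule, so the number of rules equals the number of leaves.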

11 Visual Classification
void TreeClassifier(TrainingSet T) {
    if (terminationCondition(T)) { createLeaf(); return; }
    for (each attribute A_i)
        FindBestSplitPointIn(A_i);   // best candidate split per attribute
    SplitWithBestSplitPoint(T);      // binary split of T on the best candidate
    T1 = leftSon();
    T2 = rightSon();
    TreeClassifier(T1);
    TreeClassifier(T2);
}
• Binary splits
State-of-the-art decision tree classifiers:
• No backtracking/look-ahead during the tree construction phase
• No use of domain knowledge
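The recursive scheme above can be sketched as runnable Python. This is an illustration under stated assumptions, not the presenter's implementation: the slide does not name a split criterion, so Gini impurity is assumed; candidate thresholds are taken as midpoints between adjacent distinct values; the function names are hypothetical. The data is the five-record table from the Tree Model Tutorial slide.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (assumed criterion)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def find_best_split(rows, labels):
    """Return (impurity, attribute, threshold) of the best binary split, or None."""
    best = None
    for attr in rows[0]:
        values = sorted({r[attr] for r in rows})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2  # midpoint between adjacent values (assumption)
            left = [y for r, y in zip(rows, labels) if r[attr] < threshold]
            right = [y for r, y in zip(rows, labels) if r[attr] >= threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, attr, threshold)
    return best


def tree_classifier(rows, labels):
    """Greedy recursive construction with binary splits and no backtracking."""
    split = None if len(set(labels)) == 1 else find_best_split(rows, labels)
    if split is None:  # termination: pure node or nothing left to split on
        return Counter(labels).most_common(1)[0][0]
    _, attr, threshold = split
    left = [i for i, r in enumerate(rows) if r[attr] < threshold]
    right = [i for i, r in enumerate(rows) if r[attr] >= threshold]
    return (attr, threshold,
            tree_classifier([rows[i] for i in left], [labels[i] for i in left]),
            tree_classifier([rows[i] for i in right], [labels[i] for i in right]))


def predict(tree, row):
    while isinstance(tree, tuple):
        attr, threshold, below, at_or_above = tree
        tree = below if row[attr] < threshold else at_or_above
    return tree


rows = [{"age": 57, "salary": 89}, {"age": 81, "salary": 45},
        {"age": 29, "salary": 63}, {"age": 42, "salary": 40},
        {"age": 25, "salary": 15}]
labels = ["Y", "N", "Y", "N", "Y"]
tree = tree_classifier(rows, labels)
print(tree)  # ('age', 35.5, 'Y', ('salary', 67.0, 'N', 'Y'))
```

On this data the sketch splits on age first and salary second, like the tutorial tree; the age threshold is 35.5 rather than 35 only because of the midpoint assumption.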

12 Visual Classification
• Each attribute is sorted and visualized separately
• Each attribute value is mapped onto a unique pixel
• The color of a pixel is determined by the class label of the object
• The order is reflected by the arrangement of the pixels

Original data:
attr. 1   attr. 2   ...   class
0.3       23.3      ...   Y
2.4       2.0       ...   N
...

Sorted by attr. 1:
attr. 1   class
0.2       Y
0.3       Y
          Y
0.5       N
1.1       Y
...

Sorted by attr. 2:
attr. 2   class
0.5       N
1.3       Y
2.0       N
2.5       Y
5.1       N
...
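The sorting step described on this slide can be sketched in a few lines of Python. The records below are hypothetical (only the first two rows appear on the slide), the function name `class_sequence` is made up for illustration, and one character per record stands in for one colored pixel.

```python
def class_sequence(records, attribute, class_key="class"):
    """Sort records by one attribute and return their class labels in that order.

    Each character in the result plays the role of one pixel whose color
    encodes the class label of the record at that rank.
    """
    ordered = sorted(records, key=lambda r: r[attribute])
    return "".join(r[class_key] for r in ordered)


# Hypothetical records; every attribute is sorted independently of the others.
records = [
    {"attr1": 0.3, "attr2": 23.3, "class": "Y"},
    {"attr1": 2.4, "attr2": 2.0, "class": "N"},
    {"attr1": 0.5, "attr2": 0.5, "class": "N"},
    {"attr1": 1.1, "attr2": 1.3, "class": "Y"},
    {"attr1": 0.2, "attr2": 2.5, "class": "Y"},
]

for attribute in ("attr1", "attr2"):
    print(attribute, class_sequence(records, attribute))
```

A long run of the same label in one attribute's sequence marks a value range dominated by a single class, which is where a split point is visually attractive.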

13 Visual Classification
An Example: PBC
[Figure: pixel arrangements for Attribute 1 and Attribute 2 with a color scale, showing a binary split and an n-ary split]

14 Visual Classification
Calculating an order for categorical attributes: SLIQext [CHH 99]
• Start with one empty set and one set with all categories
• Move that element which yields the best split
[Figure: categories D, A, B, C, E distributed over set 1 and set 2]

19 Visual Classification
Calculating an order for categorical attributes: SLIQext [CHH 99]
• Start with one empty set and one set with all categories
• Move that element which yields the best split
• Default order for categorical attributes
[Figure: categories D, A, B, C, E distributed over set 1 and set 2]
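One reading of the greedy procedure on this slide is sketched below in Python. It is an assumption-laden illustration, not the SLIQext algorithm as published: the split quality measure (weighted Gini impurity), the per-category class counts, and the function names are all hypothetical. The order in which categories are moved from set 2 to set 1 is taken as the resulting order.

```python
def gini(counts):
    """Gini impurity from a list of per-class counts (assumed criterion)."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)


def greedy_category_order(class_counts):
    """Move categories one at a time from set 2 to set 1, best split first."""
    n_classes = len(next(iter(class_counts.values())))

    def pooled(categories):
        return [sum(class_counts[c][k] for c in categories) for k in range(n_classes)]

    def impurity(set1, set2):
        a, b = pooled(set1), pooled(set2)
        return (sum(a) * gini(a) + sum(b) * gini(b)) / (sum(a) + sum(b))

    set1, set2 = [], list(class_counts)  # set 1 empty, set 2 holds all categories
    while set2:
        best = min(set2, key=lambda c: impurity(set1 + [c],
                                                [x for x in set2 if x != c]))
        set2.remove(best)
        set1.append(best)
    return set1


# Hypothetical (class Y, class N) counts per category.
class_counts = {"D": (2, 8), "A": (10, 0), "B": (8, 2), "C": (5, 5), "E": (1, 9)}
order = greedy_category_order(class_counts)
print(order)
```

With an order in place, a categorical attribute can be handled like a sorted numeric one, so the same pixel arrangement applies.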

20 Visual Classification
• Shuttle data set (9 attributes, 43,500 records)

21 Visual Classification
• Segment data set (19 attributes, 7 classes)

22 Visual Classification
• A New Visualization of a Decision Tree
[Figure labels: age < 35]

23 Visual Classification
• A New Visualization of a Decision Tree
[Figure labels: age < 35; Salary: < 40, [40,80], > 80; G]

24 Visual Classification
• A New Visualization of a Decision Tree
[Figure labels: age < 35; G, P, V; Salary: < 40, [40,80], > 80; G]

25 Visual Classification
• A Decision Tree for the Segment Data Set
[Figure labels: Level 1, Level 2, ..., Level 18; leaf; split point; inherited split point]

26 Visual Classification
Different types of algorithmic support for the user:
• Propose split
• Look-ahead
• Expand subtree

27 Visual Classification
[Diagram: interaction between user and system; legend: User Operation, System Operation]
System operations:
• System visualizes active node
• System proposes split point
• System expands active node
• System performs split of active node
• System offers look-ahead
User operations:
• User removes active node
• User assigns class label
• User selects split points

28 Visual Classification
[Figure labels: animated split lines; magnified split lines; exact split point]

29 Visual Classification
Accuracy:

             Automatic   Automatic-Manual   Manual-Automatic   Manual
Australian   84.9        80.9               86.9               82.7
DNA          93.8        89.2               93.3               89.2
Satimage     86.4        84.1               86.8               83.5
Segment      95.5        95.0               96.3               94.8
Shuttle      99.5        99.6               99.7               99.9

30 Conclusions
• Tight integration of visualization and data mining algorithms is still a very new area of research.
• Data mining algorithms and visualization techniques can nicely complement each other.
• PBC leverages decision tree algorithms and allows the user to steer the mining process.
• User involvement during the mining process enables knowledge transfer and capitalizes on human perception.

