Published by Samson Williamson. Modified over 9 years ago.
1
Database Implementation of a Model-Free Classifier
Konstantinos Morfonios
University of Athens
ADBIS 2007
2
Outline: Introduction, Motivation, LOCUS, Parallel Execution, Experimental Evaluation, Conclusions & Future Work
4
Introduction: Classification. Given the classes ω1 and ω2 and a new object x, a classifier assigns x to a class, ω = f(x).
5
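Introduction: "Lazy" vs. "Eager" classifiers
"Lazy" (e.g., Nearest Neighbors): no model is built in advance; each new object (x1, x2, ...) is classified directly against the stored data.
"Eager" (e.g., Decision Trees): a model is built first and then used for every decision.
(+) Faster decisions
(-) Large/complex datasets
(-) Dynamic datasets
(-) Dynamic models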
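To make the lazy/eager distinction concrete, here is a minimal sketch with one numeric feature. The eager stand-in is deliberately simple (a single threshold halfway between the two class means, not a real decision tree) and the lazy one is 1-nearest-neighbor; all class names and data are hypothetical. It illustrates the "dynamic datasets" drawback: after a new row arrives, the eager model is stale until it is rebuilt, while the lazy classifier uses the row immediately.

```python
class EagerThreshold:
    """Eager stand-in: builds a model (one threshold) once, at fit time."""

    def fit(self, data):
        w1 = [f for f, c in data if c == "w1"]
        w2 = [f for f, c in data if c == "w2"]
        # Threshold halfway between the two class means (assumes w1 values < w2 values).
        self.t = (sum(w1) / len(w1) + sum(w2) / len(w2)) / 2
        return self

    def predict(self, x):
        return "w1" if x < self.t else "w2"


class LazyNearestNeighbor:
    """Lazy: keeps a reference to the data and does all the work at query time."""

    def __init__(self, data):
        self.data = data  # no model is built; rows appended later are seen immediately

    def predict(self, x):
        _, label = min(self.data, key=lambda row: abs(row[0] - x))
        return label


data = [(1.0, "w1"), (2.0, "w1"), (8.0, "w2"), (9.0, "w2")]
eager = EagerThreshold().fit(data)  # model built once: t = 5.0
lazy = LazyNearestNeighbor(data)

data.append((5.5, "w1"))  # the dataset changes after the eager model was built
print(eager.predict(5.4))  # w2 (stale model, must be rebuilt to see the new row)
print(lazy.predict(5.4))   # w1 (the new row is used right away)
```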
8-15
Motivation
Requirements: large/complex datasets, dynamic datasets, dynamic models
Approach: lazy (model-free) classification
Candidate: Nearest Neighbors, with a disk-based implementation
16
Motivation: Nearest Neighbors suffers from the "curse of dimensionality": it is not reliable [Beyer et al., ICDT 1999] and not indexable [Shaft et al., ICDT 2005]. Proposed alternative: LOCUS (Lazy Optimal Classifier of Unlimited Scalability).
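The reliability problem noted by Beyer et al. can be illustrated with a toy experiment: as dimensionality grows, the nearest and the farthest points from a query end up at almost the same distance, so "nearest" carries little information. This sketch assumes uniformly random data in the unit hypercube; `relative_contrast` is a hypothetical helper, and the output is an illustration, not a reproduction of the cited analysis.

```python
import math
import random


def relative_contrast(dim, n_points=1000, seed=0):
    """(farthest - nearest) / nearest distance from a random query to random points."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [
        math.dist(query, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    nearest, farthest = min(dists), max(dists)
    return (farthest - nearest) / nearest


for dim in (2, 10, 100, 1000):
    # The contrast shrinks as dim grows: all points look roughly equally far away.
    print(dim, round(relative_contrast(dim), 3))
```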
17-24
Motivation: LOCUS (Lazy Optimal Classifier of Unlimited Scalability)
Category? Lazy
Scaling? Based on simple SQL queries
Accuracy? Converges to the optimal Bayes classifier
Other features? Parallelizable
27
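LOCUS: Example. Two features, f1 ∈ [0, 20] and f2 ∈ [0, 10]; two classes, ω1 and ω2; x is the object to be classified. (Figure: training objects plotted in the f1-f2 plane.)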
28-30
LOCUS: Ideally, a dense space
(Figure: f1-f2 plane.) If the space were dense, training objects would exist at (or arbitrarily close to) x, so ω(x) could be read directly from their classes.
31-32
LOCUS: In reality, a sparse space
Many features and large domains make the space sparse: typically no training object lies at x, so ω(x) cannot be read off directly.
33-34
LOCUS: The 3-NN answer
The three nearest neighbors of x are two ω1 objects and one ω2 object (ω1: 2, ω2: 1), so 3-NN assigns ω(x) = ω1.
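The 3-NN decision above can be sketched as a majority vote among the k closest training points. The points below are hypothetical, placed in the example's space so that the vote comes out as ω1: 2, ω2: 1 (class labels written as "w1"/"w2"); ties are not handled specially.

```python
import math
from collections import Counter


def knn_classify(train, x, k=3):
    """Majority vote among the k training points closest to x (Euclidean distance)."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0], votes


# Hypothetical points in the example's space (f1 in [0, 20], f2 in [0, 10]).
train = [
    ((9.0, 5.0), "w1"),
    ((11.0, 4.5), "w1"),
    ((10.0, 6.5), "w2"),
    ((2.0, 1.0), "w2"),
    ((18.0, 9.0), "w2"),
]
label, votes = knn_classify(train, x=(10.0, 5.0), k=3)
print(label, votes["w1"], votes["w2"])  # w1 2 1
```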
35-37
LOCUS: Counting in a region around x
LOCUS instead counts the training objects of each class in a region around x (ω1: 7, ω2: 3), so ω(x) = ω1.
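Before moving to the disk, the counting idea can be sketched in memory: keep every training object whose features all lie within δi of xi, count per class, and pick the majority. This is a minimal sketch, not the paper's code; the data are hypothetical and chosen to reproduce the counts ω1: 7, ω2: 3, and how δ is chosen is not covered here.

```python
from collections import Counter


def locus_classify(train, x, delta):
    """Count training points per class inside the box |f_i - x_i| <= delta_i; majority wins."""
    counts = Counter(
        label
        for point, label in train
        if all(abs(p - xi) <= d for p, xi, d in zip(point, x, delta))
    )
    if not counts:
        return None, counts  # empty box: no decision without a larger delta
    return counts.most_common(1)[0][0], counts


# Hypothetical data: 7 "w1" and 3 "w2" points fall inside the box around (10, 5).
w1_points = [(9, 5), (10, 4.5), (11, 5.5), (9.5, 4.2), (10.5, 5.8), (8.5, 5), (11.5, 4.5)]
w2_points = [(10, 6), (9, 4), (12, 5), (2, 1), (18, 9), (15, 5)]
train = [(p, "w1") for p in w1_points] + [(p, "w2") for p in w2_points]

label, counts = locus_classify(train, x=(10, 5), delta=(2, 1))
print(label, counts["w1"], counts["w2"])  # w1 7 3
```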
38
LOCUS: Disk-based implementation
39
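LOCUS: The region around x = (x1, x2) is a rectangle of width 2δ1 along f1 and 2δ2 along f2. Over a relation R(f1, f2, ω), the class counts come from a single query:

    SELECT ω, COUNT(*)
    FROM R
    WHERE f1 >= x1 - δ1 AND f1 <= x1 + δ1
      AND f2 >= x2 - δ2 AND f2 <= x2 + δ2
    GROUP BY ω

Result: ω1: 7, ω2: 3, so ω(x) = ω1.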
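The query can be tried as-is with Python's built-in sqlite3 module. This sketch assumes an in-memory table whose class column is named `w` (an ASCII stand-in for ω); `locus_sql` and the rows are hypothetical, chosen so that the result matches the slide's ω1: 7, ω2: 3.

```python
import sqlite3

# Hypothetical data: 7 "w1" and 3 "w2" rows fall inside the box around x = (10, 5)
# with delta = (2, 1); the last three rows fall outside it.
ROWS = [
    (9, 5, "w1"), (10, 4.5, "w1"), (11, 5.5, "w1"), (9.5, 4.2, "w1"),
    (10.5, 5.8, "w1"), (8.5, 5, "w1"), (11.5, 4.5, "w1"),
    (10, 6, "w2"), (9, 4, "w2"), (12, 5, "w2"),
    (2, 1, "w2"), (18, 9, "w2"), (15, 5, "w2"),
]

# The slide's query; a named parameter may appear more than once in one statement.
LOCUS_QUERY = """
    SELECT w, COUNT(*)
    FROM R
    WHERE f1 >= :x1 - :d1 AND f1 <= :x1 + :d1
      AND f2 >= :x2 - :d2 AND f2 <= :x2 + :d2
    GROUP BY w
"""


def locus_sql(conn, x1, x2, d1, d2):
    """Run the COUNT ... GROUP BY query and return (majority class, counts)."""
    params = {"x1": x1, "x2": x2, "d1": d1, "d2": d2}
    counts = dict(conn.execute(LOCUS_QUERY, params).fetchall())
    label = max(counts, key=counts.get) if counts else None
    return label, counts


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (f1 REAL, f2 REAL, w TEXT)")  # w stands in for ω
conn.executemany("INSERT INTO R VALUES (?, ?, ?)", ROWS)
label, counts = locus_sql(conn, x1=10, x2=5, d1=2, d2=1)
print(label, counts["w1"], counts["w2"])  # w1 7 3
```

Only one small row per class comes back from the database, regardless of how many rows R holds.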
40
LOCUS: What if R is large? The LOCUS query over R(f1, f2, ω) is a well-known type of range-aggregate query, so classical optimization techniques apply: indexing, presorting, and materialized views.
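One classical way to precompute answers for this kind of range COUNT query is a per-class prefix-sum array (a summed-area table), which acts as a small materialized summary of R: after one pass over the data, each query costs four lookups per class regardless of |R|. This is a sketch under assumptions not taken from the slides: two features with small integer domains (memory grows with the product of the domain sizes) and query points inside the domain. The helper names are hypothetical, and this is not necessarily how the paper optimizes the query.

```python
def build_prefix_counts(rows, size1, size2):
    """Per-class 2-D prefix sums for integer features f1 in [0, size1), f2 in [0, size2).

    After the second pass, grid[i][j] = number of rows of that class with f1 < i and f2 < j.
    """
    prefix = {}
    for f1, f2, w in rows:
        if w not in prefix:
            prefix[w] = [[0] * (size2 + 1) for _ in range(size1 + 1)]
        prefix[w][f1 + 1][f2 + 1] += 1  # raw cell counts, shifted by one
    for grid in prefix.values():
        for i in range(1, size1 + 1):
            for j in range(1, size2 + 1):
                grid[i][j] += grid[i - 1][j] + grid[i][j - 1] - grid[i - 1][j - 1]
    return prefix


def box_counts(prefix, x1, x2, d1, d2):
    """Per-class counts in the box |f1 - x1| <= d1, |f2 - x2| <= d2 (x inside the domain)."""
    out = {}
    for w, g in prefix.items():
        size1, size2 = len(g) - 1, len(g[0]) - 1
        lo1, hi1 = max(x1 - d1, 0), min(x1 + d1, size1 - 1)  # clamp to the domain
        lo2, hi2 = max(x2 - d2, 0), min(x2 + d2, size2 - 1)
        out[w] = g[hi1 + 1][hi2 + 1] - g[lo1][hi2 + 1] - g[hi1 + 1][lo2] + g[lo1][lo2]
    return out


# Hypothetical integer data in the example's space (f1 in [0, 20], f2 in [0, 10]).
ROWS = [
    (9, 5, "w1"), (10, 4, "w1"), (11, 6, "w1"), (10, 5, "w1"),
    (12, 5, "w1"), (8, 4, "w1"), (11, 5, "w1"), (0, 0, "w1"),
    (10, 6, "w2"), (9, 4, "w2"), (12, 6, "w2"),
    (2, 1, "w2"), (18, 9, "w2"), (15, 5, "w2"),
]
P = build_prefix_counts(ROWS, size1=21, size2=11)
print(box_counts(P, 10, 5, 2, 1))  # {'w1': 7, 'w2': 3}
```

The indexing alternative needs no new structure at all: a B-tree index on a feature (for example `CREATE INDEX r_f1 ON R(f1)`) lets the database scan only the rows in the f1 range.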
41
LOCUS: Method reliability? LOCUS converges to the optimal Bayes classifier as the size of the dataset increases (proof in the paper).
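The intuition behind the convergence claim can be written out as follows. This is the standard moving-window argument, stated under assumptions that are not on the slides (continuous class-conditional densities, a box that shrinks as N grows while still containing more and more training objects, no tie between the top classes at x); the paper's own proof and conditions may differ. D is the number of features and n_ω the number of class-ω training objects in the box.

```latex
% B_\delta(x): the box around x with half-widths \delta_1, \dots, \delta_D
B_\delta(x) = \{\, z : |z_i - x_i| \le \delta_i,\ i = 1, \dots, D \,\}

% n_\omega: training objects of class \omega inside B_\delta(x); n = \sum_\omega n_\omega
\hat{P}(\omega \mid x) = \frac{n_\omega}{n} \approx P\big(\omega \mid z \in B_\delta(x)\big)

% LOCUS decision: the class with the largest count in the box
\hat{\omega}(x) = \arg\max_\omega n_\omega = \arg\max_\omega \hat{P}(\omega \mid x)

% If \delta \to 0 while n \to \infty as N \to \infty, then
% P(\omega \mid z \in B_\delta(x)) \to P(\omega \mid x), and therefore
\hat{\omega}(x) \to \arg\max_\omega P(\omega \mid x) \quad \text{(the Bayes decision)}
```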
42-43
LOCUS: What if a feature, say f2, is categorical (e.g., sex)? Its range predicate becomes an equality:

    SELECT ω, COUNT(*)
    FROM R
    WHERE f1 >= x1 - δ1 AND f1 <= x1 + δ1
      AND f2 = x2
    GROUP BY ω

This is not a problem, since in practice datasets generally combine categorical and numeric features, and categorical features have small domains; hence they do not contribute to sparsity.
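The rule on this slide (range predicate for a numeric feature, equality for a categorical one) extends to any mix of features. The sketch below builds such a query and runs it with sqlite3; `locus_query`, its dictionary-based interface, and the columns `sex` and `w` (standing in for ω) are hypothetical.

```python
import sqlite3


def locus_query(numeric, categorical):
    """Build the LOCUS count query for mixed features.

    numeric:     {column: (x_value, delta)} -> x - delta <= column <= x + delta
    categorical: {column: x_value}          -> column = x_value
    Column names are pasted into the SQL text, so they must be trusted identifiers.
    """
    preds, params = [], []
    for col, (x, d) in numeric.items():
        preds.append(f"{col} BETWEEN ? AND ?")  # BETWEEN is inclusive on both ends
        params += [x - d, x + d]
    for col, value in categorical.items():
        preds.append(f"{col} = ?")
        params.append(value)
    sql = f"SELECT w, COUNT(*) FROM R WHERE {' AND '.join(preds)} GROUP BY w"
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (f1 REAL, sex TEXT, w TEXT)")
conn.executemany(
    "INSERT INTO R VALUES (?, ?, ?)",
    [(10, "F", "w1"), (11, "F", "w1"), (9, "M", "w2"), (10.5, "F", "w2"), (30, "F", "w2")],
)
sql, params = locus_query({"f1": (10, 2)}, {"sex": "F"})
counts = dict(conn.execute(sql, params).fetchall())
print(counts["w1"], counts["w2"])  # 2 1
```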
46
Parallel Execution: R is horizontally partitioned, R = R1 ∪ R2 ∪ R3 ∪ R4, and the same SELECT is sent to every partition.
47
Parallel Execution: Each partition returns its own counts (ω1: 5, ω2: 2; ω1: 7, ω2: 1; ω1: 6, ω2: 0; ω1: 5, ω2: 1). Because COUNT is a distributive function, the partial counts are simply added (running totals 5/2, 12/3, 18/3, 23/4), giving ω1: 23, ω2: 4.
48
Parallel Execution: Benefits are small network traffic (each partition returns only one count per class), load balancing, and lightweight operations on the main server.
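The merge step relies only on COUNT being distributive: the count over R equals the sum of the counts over R1, ..., R4. In this sketch, threads stand in for the separate servers of the slides, and the hypothetical partitions are built to return the slide's partial counts; all helper names are assumptions.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def local_counts(partition, x, delta):
    """Per-class counts inside the box around x on one partition (one server's SELECT)."""
    return Counter(
        label
        for point, label in partition
        if all(abs(p - xi) <= d for p, xi, d in zip(point, x, delta))
    )


def parallel_counts(partitions, x, delta):
    """Evaluate every partition concurrently, then add the partial counts."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda part: local_counts(part, x, delta), partitions))
    total = Counter()
    for partial in partials:
        total.update(partial)  # update() adds counts instead of replacing them
    return partials, total


def make_partition(n_w1, n_w2):
    """Hypothetical partition: n_w1 + n_w2 rows inside the box, plus one row outside it."""
    inside = [((10.0, 5.0), "w1")] * n_w1 + [((10.0, 5.0), "w2")] * n_w2
    return inside + [((19.0, 9.0), "w2")]


# Partial counts as on the slide: (5, 2), (7, 1), (6, 0), (5, 1).
partitions = [make_partition(5, 2), make_partition(7, 1), make_partition(6, 0), make_partition(5, 1)]
partials, total = parallel_counts(partitions, x=(10.0, 5.0), delta=(2.0, 1.0))
print(total["w1"], total["w2"])  # 23 4
```

Only the small per-class Counters cross the (simulated) network; the main server does nothing heavier than a few additions.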
51
Experimental Evaluation: Setup
LOCUS vs. decision trees (DTs) and nearest neighbors (NNs), as implemented in Weka
Synthetic datasets: ten functions [Agrawal et al., IEEE TKDE 1993], D = 9 features, N ∈ [5·10^3, 5·10^6]
Real-world datasets: UCI Repository
52
Experimental Evaluation (chart): Classification error rate (synthetic datasets, N = 5·10^4)
53
Experimental Evaluation (chart): Effect of dataset size on the classification error rate of LOCUS (synthetic datasets, N ∈ [5·10^3, 5·10^6])
54
Experimental Evaluation (chart): Effect of dataset size on the time scalability of LOCUS (synthetic datasets, N ∈ [5·10^3, 5·10^6])
55
Experimental Evaluation (chart): Classification error rate (real-world datasets)
56
Experimental Evaluation (chart): Effect of dataset size on classification error rate (dataset CovType, N ∈ [5·10^3, 5·10^5])
59
Conclusions & Future Work: LOCUS is lazy (suited to complex/dynamic datasets and models), efficient (based on simple SQL queries), reliable (converges to the optimal Bayes classifier), and parallelizable.
60
Conclusions & Future Work: Similar techniques for feature selection and regression; implementation of a parallel version.
61
Questions?
62
Thank you!