Published by Miranda Warren. Modified over 9 years ago.
1
3D Geological Modeling: Solving as a Classification Problem with Support Vector Machine
A. Smirnoff, E. Boisvert, S. J. Paradis
Earth Sciences Sector, Groundwater
2
Objectives
– Find an algorithm for automating the 3D modeling procedure from sparse data
– Test the algorithm on available data
– Draw conclusions about its applicability
3
Possible Input Data
– Well data
– Surface geology maps
– Cross-section data
These can be used alone or in combination.
4
Algorithms Currently in Use and Their Limitations
– Voronoi diagrams
– Potential fields
These normally require too much information and/or additional procedures.
What if we only have a few sections to start with?
5
3D Reconstruction as a Classification Problem
– Given: a set of points in 3D with known geological information
– For the remaining points in the reconstruction space, no information is available
– Task: based on the known points, classify the rest into a known number of units (classes)
[Figure: reconstruction space containing Unit 1 and Unit 2]
6
Available Classification Methods
– Bayesian classification
  – requires a priori knowledge of probabilities
– Nearest-neighbor classifiers
  – extremely sensitive to parameter choice and scaling
– Decision trees
  – not flexible with many samples
– Neural networks
  – slow and difficult to use
– Support Vector Machine (SVM)
  – relatively new method
  – becoming more and more popular
7
SVM Algorithm
– Input: take a set of training samples with known features and classes
– Model: build a model (boundary) separating the training samples
– Output: classify any new (unclassified) or test samples using the model
8
Binary Reconstruction
[Figure: three 3D panels (axes X, Y, Z): 1. Original; 2. Training set; 3. Output]
9
Input Data and Results
Input data:
– Total points: 389235
– Training set: 17452 (4.48%), 2 units on 11 sections
– Points to be classified: 371783
Results:
– Total classified: 371783
– Success: 361909 (97.34%)
– Failure: 9874 (2.66%)
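The percentages on this slide follow directly from the point counts. A minimal sketch that recomputes them is given here; the variable and function names are illustrative and not from the presentation.

```python
# Point counts taken from the slide.
total_points = 389235
training_points = 17452
classified_ok = 361909
classified_wrong = 9874

# Points left over once the training set is removed.
to_classify = total_points - training_points


def percent(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


training_share = percent(training_points, total_points)
success_rate = percent(classified_ok, to_classify)
failure_rate = percent(classified_wrong, to_classify)

print(to_classify, training_share, success_rate, failure_rate)
```

Note that the success and failure rates are relative to the points to be classified, not to the total number of points.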
10
Detailed Analysis (Class 1)
11
Peeking into the SVM Black Box
A simple case: two classes and two features (e.g., length of petal and sepal in flowers)
Training set: known data vectors $x_i$, where $i = 1, \dots, l$
Training Record (i)   Class Label (y_i)   Data Vector (x_i): Feature 1   Feature 2
1                     1                   2                             4
2                                         5                             3
3                     1                   6                             8
…                     …                   …                             …
l                                         7                             3
12
Maximum Margin Separating Hyperplane (MMSH)
– Linearly separable data
– Which linear separator is the best?
– V. Vapnik (1995) suggested the one with the maximum margin
[Figure: two scatter plots of Feature 1 vs. Feature 2 (axes 1 to 10) with Class +1 and Class -1; three candidate separators labeled 1, 2, 3 with margins ordered 1 < 3 < 2; the maximum margin, its half-width (1/2) and the support vectors]
13
Hard Margin Classification - HMSH
If $w^T x + b = 0$ is the separating hyperplane, then $w^T x_i + b > 0$ for samples of class +1 and $w^T x_i + b < 0$ for samples of class -1.
Decision function: $f(x) = \mathrm{sign}(w^T x + b)$, where $x$ is a test sample
[Figure: Feature 1 vs. Feature 2 scatter plot of training samples $x_1, \dots, x_l$ with the HMSH $w^T x + b = 0$ between Class +1 and Class -1]
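The decision function can be sketched in a few lines. The hyperplane coefficients and test samples below are hypothetical values chosen only for illustration; they are not taken from the presentation.

```python
def decision(w, b, x):
    """Classify x by the side of the hyperplane w.x + b = 0 it falls on."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    # sign(0) is not defined on the slide; this sketch assigns it to class +1.
    return 1 if score >= 0 else -1


# Hypothetical hyperplane "Feature 2 = 3.5", i.e. w = (0, 1), b = -3.5.
w = (0.0, 1.0)
b = -3.5

# Hypothetical test samples given as (Feature 1, Feature 2).
test_samples = [(2, 4), (5, 3), (6, 8), (7, 3)]
predictions = [decision(w, b, x) for x in test_samples]
print(predictions)
```

Samples above the line are assigned to class +1 and samples below it to class -1.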
14
How to Maximize the Margin?
For $w^T x + b = 0$, consider a pipe defined by: $w^T x + b = \pm 1$
Then: $w^T x_i + b \ge +1$ for $y_i = +1$ and $w^T x_i + b \le -1$ for $y_i = -1$
or: $y_i (w^T x_i + b) \ge 1$
Maximize the distance between: $w^T x + b = \pm 1$
[Figure: Feature 1 vs. Feature 2 scatter plot of training samples $x_1, \dots, x_l$ with the hyperplanes $w^T x + b = +1$, $w^T x + b = 0$ and $w^T x + b = -1$; Class +1 lies beyond the +1 hyperplane and Class -1 beyond the -1 hyperplane]
15
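Problem Formulation
The distance between $w^T x + b = \pm 1$ is given as: $2 / \|w\|$
Then: maximize $2 / \|w\|$ subject to $y_i (w^T x_i + b) \ge 1$, $i = 1, \dots, l$
Or: minimize $\frac{1}{2} \|w\|^2$ subject to $y_i (w^T x_i + b) \ge 1$, $i = 1, \dots, l$
– Quadratic optimization problem
– Solution exists
[Figure: Feature 1 vs. Feature 2 scatter plot of training samples $x_1, \dots, x_l$ with the hyperplanes $w^T x + b = +1$, $w^T x + b = 0$ and $w^T x + b = -1$]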
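The width of the pipe can be derived in a few steps. The sketch below uses two auxiliary points, $x_+$ and $x_-$, one on each bounding hyperplane; these symbols are introduced here for illustration and do not appear on the slide. The `aligned` environment and `\text` assume the amsmath package.

```latex
\begin{aligned}
w^T x_+ + b &= +1, \qquad w^T x_- + b = -1 \\
w^T (x_+ - x_-) &= 2 \\
\text{distance} &= \frac{w^T (x_+ - x_-)}{\|w\|} = \frac{2}{\|w\|} \\
\max_{w,b} \frac{2}{\|w\|} \;&\Longleftrightarrow\; \min_{w,b} \frac{1}{2}\|w\|^2
\quad \text{subject to } y_i (w^T x_i + b) \ge 1,\; i = 1, \dots, l
\end{aligned}
```

The third line projects the vector connecting the two hyperplanes onto the unit normal $w / \|w\|$. The last line is the step that turns margin maximization into a quadratic optimization problem.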
16
Soft Margin Classification - SMSH
– Data are noisy, not easily separable
– Allow classification errors by introducing slack variables: $\xi_i \ge 0$
Thus: minimize $\frac{1}{2} \|w\|^2 + C \sum_{i=1}^{l} \xi_i$ subject to $y_i (w^T x_i + b) \ge 1 - \xi_i$, $\xi_i \ge 0$
where $C$ is the cost or penalty parameter
– Support vectors: the samples at distance ½ from the SMSH plus the misclassified ones
[Figure: Feature 1 vs. Feature 2 scatter plot of training samples $x_1, \dots, x_l$ showing the SMSH, the HMSH and the support vectors]
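To make the role of the slack variables and of C concrete, the sketch below evaluates the standard soft-margin objective for a fixed, hypothetical hyperplane. The samples, `w` and `b` are made up for illustration, and nothing is optimized here; a real solver such as LIBSVM searches for the `w` and `b` that minimize this value.

```python
def slack(w, b, x, y):
    """Slack of one sample: max(0, 1 - y * (w.x + b)).

    0 means the sample is outside the pipe on its own side,
    a value above 1 means it is misclassified.
    """
    margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
    return max(0.0, 1.0 - margin)


def soft_margin_objective(w, b, samples, C):
    """Value of 0.5 * ||w||^2 + C * sum of slacks for a given hyperplane."""
    norm_sq = sum(wi * wi for wi in w)
    total_slack = sum(slack(w, b, x, y) for x, y in samples)
    return 0.5 * norm_sq + C * total_slack


# Hypothetical hyperplane and labelled samples ((Feature 1, Feature 2), class).
w, b = (0.0, 1.0), -3.5
samples = [((2, 4), 1), ((6, 8), 1), ((5, 3), -1), ((6, 5), -1)]

slacks = [slack(w, b, x, y) for x, y in samples]
low_penalty = soft_margin_objective(w, b, samples, C=1.0)
high_penalty = soft_margin_objective(w, b, samples, C=10.0)
print(slacks, low_penalty, high_penalty)
```

With a larger C the same classification errors cost more, which pushes the optimizer toward boundaries that follow the training data more closely.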
17
Non-Separable Data
– Data are separable, or separable with some noise: no problem (HMSH or SMSH)
– What if the data are not linearly separable in data space?
– Find a function to re-map the data into a higher-dimensional space (feature space) where they are separable, e.g., $x \in R^1 \to R^2$
[Figure: Class +1 and Class -1 samples along the $x$ axis, and the same samples re-mapped as $f(x)$ vs. $x$]
18
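Non-Linear SVM
[Figure panels: 1. Problem, Class +1 and Class -1 samples along $x$ in the data (input) space $R^1$; 2. Solution and 3. Solution, the same samples separated after mapping to the feature space $R^2$ with $\varphi(x) = (x, x^2)$ (axes $x$ and $x^2$) and shown as $f(x)$ vs. $x$]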
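The effect of the mapping $(x, x^2)$ can be checked on a small made-up data set. In the sketch below the sample values, the separating line in feature space and all function names are assumptions for illustration only.

```python
def phi(x):
    """Map a 1D sample into the 2D feature space (x, x^2)."""
    return (x, x * x)


# Hypothetical 1D samples as (x, class): class -1 sits between two groups of +1.
samples = [(-3, 1), (-2, 1), (-1, -1), (0, -1), (1, -1), (2, 1), (3, 1)]


def separable_in_1d(samples):
    """True if a single threshold on the x axis splits the two classes."""
    labels = [y for _, y in sorted(samples)]
    # One threshold suffices only if the sorted labels switch class at most once.
    changes = sum(1 for a, b in zip(labels, labels[1:]) if a != b)
    return changes <= 1


def classify_in_feature_space(x, w=(0.0, 1.0), b=-2.5):
    """Linear decision in feature space; the line x^2 = 2.5 is a hand-picked separator."""
    f1, f2 = phi(x)
    score = w[0] * f1 + w[1] * f2 + b
    return 1 if score >= 0 else -1


linear_in_input_space = separable_in_1d(samples)
all_correct = all(classify_in_feature_space(x) == y for x, y in samples)
print(linear_in_input_space, all_correct)
```

No threshold on $x$ separates these samples, but a straight line does once $x^2$ is added as a second coordinate.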
19
Kernel Trick
– How do we find the function in a more complicated situation?
– We do not need to know the function explicitly!
– The formulation and solution of the optimization problem use only inner products of vectors: $f(x) = \sum_i \alpha_i y_i x_i^T x + b$
– A kernel function is the inner product of some function in its feature space: $K(x_i, x) = \varphi(x_i)^T \varphi(x)$
– Thus the final decision function is: $f(x) = \sum_i \alpha_i y_i K(x_i, x) + b$
($\alpha_i$ are weighting factors; $\alpha_i > 0$ only for support vectors)
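A small numerical check illustrates why the mapping never has to be computed explicitly. The sketch below uses a degree-2 polynomial kernel as an example; this kernel and its explicit feature map are a textbook illustration and are not taken from the presentation.

```python
def dot(u, v):
    """Inner product of two equally long vectors."""
    return sum(a * b for a, b in zip(u, v))


def poly2_kernel(x, z):
    """Degree-2 polynomial kernel K(x, z) = (x.z)^2, computed in input space."""
    return dot(x, z) ** 2


def phi(x):
    """Explicit feature map whose inner product equals poly2_kernel for 2D inputs."""
    x1, x2 = x
    return (x1 * x1, x1 * x2, x2 * x1, x2 * x2)


x, z = (1, 2), (3, 4)
via_kernel = poly2_kernel(x, z)             # one 2D inner product, then a square
via_feature_space = dot(phi(x), phi(z))     # explicit 4D mapping, then an inner product
print(via_kernel, via_feature_space)
```

Both routes give the same number, but the kernel route never builds the higher-dimensional vectors.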
20
Kernel Functions
– Known kernel functions: linear, polynomial, radial-basis function (RBF), etc.
– The RBF is the most general form of kernel: $K(x_i, x_j) = \exp(-\gamma \|x_i - x_j\|^2)$
– The decision function is then: $f(x) = \sum_i \alpha_i y_i \exp(-\gamma \|x_i - x\|^2) + b$
– The only adjustable kernel parameter is $\gamma$
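A sketch of the RBF kernel and the resulting decision value follows. It assumes the LIBSVM-style parameterization exp(-gamma * ||xi - xj||^2); the support vectors, weights and bias are hypothetical, whereas in practice they come out of training.

```python
import math


def rbf_kernel(xi, xj, gamma):
    """RBF kernel exp(-gamma * ||xi - xj||^2) for two equally long vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-gamma * sq_dist)


def decision_value(x, support_vectors, alphas, labels, b, gamma):
    """Sum of alpha_i * y_i * K(x_i, x) over the support vectors, plus b."""
    total = sum(
        a * y * rbf_kernel(sv, x, gamma)
        for sv, a, y in zip(support_vectors, alphas, labels)
    )
    return total + b


# Hypothetical model: one support vector per class in (X, Y, Z) coordinates.
support_vectors = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
alphas = [1.0, 1.0]
labels = [1, -1]
b, gamma = 0.0, 0.5

near_first = decision_value((0.1, 0.0, 0.0), support_vectors, alphas, labels, b, gamma)
near_second = decision_value((1.9, 0.0, 0.0), support_vectors, alphas, labels, b, gamma)
print(near_first > 0, near_second < 0)
```

A point close to the class +1 support vector gets a positive value and a point close to the class -1 support vector gets a negative one. A larger gamma makes each support vector's influence more local.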
21
How Did We Use SVM?
– Using geological units as classes
– Using X, Y, Z coordinates as features
– Using non-linear soft margin (SM) SVM with RBF kernel
– Using LIBSVM from National Taiwan University
– Only two parameters to control: $C$ and $\gamma$
– Selecting parameters is a black art, done on a try-and-see basis
– Simple grid search with validation is recommended, e.g., $C = 2^{-8}, 2^{-7}, \dots, 2^{15}$; $\gamma = 2^{-15}, 2^{-14}, \dots, 2^{12}$
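The grid search can be sketched as a loop over all (C, gamma) pairs in the exponent ranges quoted on the slide. The scoring function below is a placeholder invented for this example; in a real run it would train an SVM with the given parameters and return its validation accuracy.

```python
from itertools import product

C_EXPONENTS = range(-8, 16)       # C = 2^-8, 2^-7, ..., 2^15
GAMMA_EXPONENTS = range(-15, 13)  # gamma = 2^-15, 2^-14, ..., 2^12


def parameter_grid(c_exponents=C_EXPONENTS, gamma_exponents=GAMMA_EXPONENTS):
    """Return every (C, gamma) pair of the grid as powers of two."""
    return [(2.0 ** c, 2.0 ** g) for c, g in product(c_exponents, gamma_exponents)]


def grid_search(score, grid):
    """Evaluate score(C, gamma) on every grid point and return (best_score, C, gamma)."""
    best = None
    for C, gamma in grid:
        s = score(C, gamma)
        if best is None or s > best[0]:
            best = (s, C, gamma)
    return best


def toy_score(C, gamma):
    """Placeholder score with an arbitrary peak at C = 2, gamma = 64.

    This is NOT a validation accuracy; replace it with train-and-validate code.
    """
    return -(abs(C - 2.0) + abs(gamma - 64.0))


grid = parameter_grid()
best = grid_search(toy_score, grid)
print(len(grid), best[1], best[2])
```

Stepping the exponents rather than the values keeps the grid evenly spaced on the logarithmic axes used to display the results.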
22
C and γ Grid Search
[Figure: grid-search results on lg(C) vs. lg(γ) axes. All experiments: lg(C) from -8 to 15, lg(γ) from -15 to 12. Zoomed panel: lg(C) from -3 to 15, lg(γ) from 4 to 9]
– Proposed range: $C = 2^{-3}$ to $2^{15}$; $\gamma = 2^{4}$ to $2^{9}$
– Best binary result: 97.79% at $C = 2^{1}$, $\gamma = 2^{6}$
– Previous example: 97.34%
23
Influence of C and γ
[Figure panels: Low C, High γ; Low C, Low γ; High C, Low γ; High C, High γ; Avg C, Avg γ]
24
Multi-Class Classification
Units: 1 - Organic; 2 - Littoral; 3 - Clay; 4 - Esker; 5 - Till; 6 - Bedrock
[Figure: three 3D panels (axes X, Y, Z): 1. Original; 2. Training set; 3. Output]
25
Data Statistics and Results
Class         To Classify   Training Set   % of Total   Success %
1. Organic           1162             48         0.01       18.76
2. Littoral          3626            193         0.05       37.20
3. Clay             12667            628         0.16       57.10
4. Esker            19305            995         0.26       67.65
5. Till             15118            747         0.19       45.72
6. Bedrock         319905          14841         3.81       95.45
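The per-class table can be sanity-checked against the totals reported for the binary example. The sketch below assumes each flattened row splits into "to classify", "training set", "% of total" and "success %" as shown in the code. That split is an interpretation of the extracted text, supported by the column sums matching the 371783 points to classify and the 17452 training points reported earlier.

```python
# (class name, points to classify, training points, success %) per unit.
rows = [
    ("Organic", 1162, 48, 18.76),
    ("Littoral", 3626, 193, 37.20),
    ("Clay", 12667, 628, 57.10),
    ("Esker", 19305, 995, 67.65),
    ("Till", 15118, 747, 45.72),
    ("Bedrock", 319905, 14841, 95.45),
]

total_to_classify = sum(n for _, n, _, _ in rows)
total_training = sum(t for _, _, t, _ in rows)
total_points = total_to_classify + total_training

# Training points of each class as a percentage of all points.
training_share = {
    name: round(100.0 * t / total_points, 2) for name, _, t, _ in rows
}

# Success rate over all classes, weighted by the number of points to classify.
overall_success = sum(n * s for _, n, _, s in rows) / total_to_classify

print(total_to_classify, total_training, training_share, round(overall_success, 2))
```

The recomputed shares reproduce the "% of Total" column. They also show how thinly the small units are sampled compared with bedrock.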
26
Success per Class
[Figure: Success (%) from 0 to 100 vs. Training Points per Class (%) on a logarithmic axis from 0.01 to 10, for Organic, Littoral, Till, Clay, Esker and Bedrock]
27
Area and Volume Comparison
[Figure: two log-log plots of Reconstructed vs. Original values for Bedrock, Esker, Till, Clay, Littoral and Organic: Area (axes 1.00E+07 to 1.00E+09) and Volume (axes 1.00E+07 to 1.00E+11)]
28
Conclusions
The SVM can successfully be used in single- and multi-unit 3D geological reconstructions:
– Reasonable results are obtained with just a few training sections
– Parameters must be picked from the range: $C = 2^{-3}$ to $2^{15}$; $\gamma = 2^{4}$ to $2^{9}$
– Low C values: less detail, more generalized model
– High C values: more detail, less generalized model
– Additional experiments demonstrated:
  – The number of units can vary (all units must be represented in the training set)
  – Sections can be arbitrarily located
  – Other types of information (well data, surface geology maps) can be used
29
References
Abe, S., 2005. Support Vector Machines for Pattern Classification. Springer-Verlag, London, 343 pp.
Cristianini, N., Shawe-Taylor, J., 2000. Support Vector Machines. Cambridge University Press, 189 pp.
Vapnik, V., 1995. The Nature of Statistical Learning Theory. Springer-Verlag, New York, 311 pp.