1
COMP3503 Intro to Inductive Modeling
with Daniel L. Silver
2
Agenda
- Deductive and Inductive Modeling
- Learning Theory and Generalization
- Common Statistical Methods
3
The KDD Process
[Diagram: the KDD process pipeline: Data Sources → Data Consolidation → Data Warehouse → Selection and Preprocessing → Prepared Data → Data Mining → Patterns & Models (e.g. p(x)=0.02) → Interpretation and Evaluation → Knowledge]
4
Deductive and Inductive Modeling
5
Induction versus Deduction
[Diagram: Deduction is top-down verification, from a Model or General Rule down to Examples A, B, C; Induction is bottom-up construction, from the examples up to the model.]
6
Deductive Modeling
- Top-down (toward the data) verification of an hypothesis
- The hypothesis is generated within the mind of the data miner
- Exploratory tools such as OLAP and data visualization software are used
- Models tend to be used for description
7
Inductive Modeling
- Bottom-up (from the data) development of an hypothesis
- The hypothesis is generated by the technology directly from the data
- Statistical and machine learning tools such as regression, decision trees and artificial neural networks are used
- Models can be used for prediction
8
Inductive Modeling
Objective: Develop a general model or hypothesis from specific examples
- Function approximation (curve fitting)
- Classification (concept learning, pattern recognition)
[Figures: a curve fit of f(x) versus x; a two-class scatter of A and B over attributes x1 and x2]
9
Learning Theory and Generalization
10
Inductive Modeling = Learning
Basic Framework for Inductive Learning
[Diagram: the Environment supplies Training Examples (x, f(x)) to the Inductive Learning System, which induces a Model or Hypothesis h(x); Testing Examples produce an Output Classification (x, h(x)) that is checked by asking: h(x) = f(x)?]
A problem of representation and search for the best hypothesis, h(x).
11
Inductive Modeling = Data Mining
Ideally, an hypothesis (model) is:
- Complete – covers all potential examples
- Consistent – no conflicts with the examples
- Accurate – able to generalize to previously unseen examples
- Valid – presents a truth
- Transparent – human-readable knowledge
12
Inductive Modeling Generalization
The objective of learning is to achieve good generalization to new cases; otherwise, just use a look-up table. Generalization can be defined as a mathematical interpolation or regression over a set of training points.
[Figure: a smooth curve f(x) interpolating a set of training points plotted against x]
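A minimal Python sketch of this idea (the sine target and noise level are illustrative assumptions, not from the slides): fit a low-degree polynomial to a handful of training points, then evaluate it at an x value never seen in training, which a pure look-up table could not do.

```python
import numpy as np

# Illustrative training points: noisy samples of an unknown function.
rng = np.random.default_rng(0)
x_train = np.linspace(0, 4, 8)
y_train = np.sin(x_train) + rng.normal(scale=0.05, size=x_train.shape)

# Induce a simple model: a degree-3 polynomial fit (regression).
coeffs = np.polyfit(x_train, y_train, deg=3)
model = np.poly1d(coeffs)

# Generalization: predict at a point not in the training set.
x_new = 2.5
print(f"model({x_new}) = {model(x_new):.3f}, true value = {np.sin(x_new):.3f}")
```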
13
Inductive Modeling Generalization
- Generalization accuracy can be guaranteed, for a specified confidence level, given a sufficient number of examples
- Models can be validated for accuracy by using a previously unseen test set of examples
14
Learning Theory
Probably Approximately Correct (PAC) theory of learning (Leslie Valiant, 1984)
Poses questions such as:
- How many examples are needed for good generalization?
- How long will it take to create a good model?
Answers depend on:
- The complexity of the actual function
- The desired level of accuracy of the model (e.g. 75%)
- The desired confidence of finding a model with this accuracy (e.g. 19 times out of 20 = 95%)
15
Learning Theory
[Figure: the space of all possible examples, with the true concept c and the hypothesis h drawn as overlapping regions of + and - instances; the regions where c and h disagree are shaded]
The true error of a hypothesis h is the probability that h will misclassify an instance drawn at random from X: error(h) = P[c(x) ≠ h(x)]
16
Learning Theory
Three notions of error:
- Training Error: how often the training set is misclassified
- Test Error: how often an independent test set is misclassified
- True Error: how often the entire population of possible examples would be misclassified; must be estimated from the Test Error
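A short Python sketch of the distinction (the synthetic dataset and choice of model are illustrative assumptions): training error is measured on the data the model was fit to, while the error on a held-out test set serves as the estimate of the true error.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Illustrative synthetic classification data.
X, y = make_classification(n_samples=500, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

model = LogisticRegression().fit(X_train, y_train)

train_error = 1 - model.score(X_train, y_train)  # error on the data used to fit
test_error = 1 - model.score(X_test, y_test)     # estimate of the true error
print(f"training error: {train_error:.3f}, test error: {test_error:.3f}")
```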
17
Linear and Non-Linear Problems
Linear problems:
- Linear functions
- Linearly separable classifications
Non-linear problems:
- Non-linear functions
- Classifications that are not linearly separable
[Figures: linear and non-linear fits of f(x) versus x; classes A and B in the (x1, x2) plane, one arrangement separable by a straight line and one not]
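A hedged illustration in Python (XOR is my example choice; the slides do not name it): XOR is the classic classification that is not linearly separable, so a linear classifier cannot get all four points right, while one added non-linear feature makes the problem separable.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# XOR: the textbook non-linearly-separable classification.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

# A linear decision boundary gets at most 3 of the 4 XOR points right.
linear = LogisticRegression().fit(X, y)
print("linear model accuracy on XOR:", linear.score(X, y))

# Adding a non-linear feature (x1*x2) makes the classes linearly separable.
X_aug = np.hstack([X, (X[:, 0] * X[:, 1]).reshape(-1, 1)])
augmented = LogisticRegression(C=1e5).fit(X_aug, y)
print("with x1*x2 feature:", augmented.score(X_aug, y))  # perfect separation
```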
18
Inductive Bias Every inductive modeling system has an Inductive Bias
Consider a simple set of training examples like the following: [in-class demo using the generalize.xls Excel spreadsheet, checking candidate fits of f(x) versus x]
19
Inductive Bias
- Can you think of any biases that you commonly use when you are learning something new?
- Is there one best inductive bias? KISS – Occam's Razor: prefer the simplest hypothesis that fits the data, e.g. a simple linear function with a least squares fit
20
Inductive Modeling Methods
Automated Exploration/Discovery
- e.g. discovering new market segments
- distance and probabilistic clustering algorithms
Prediction/Classification
- e.g. forecasting gross sales given current factors
- statistics (regression, K-nearest neighbour), artificial neural networks, genetic algorithms
Explanation/Description
- e.g. characterizing customers by demographics
- inductive decision trees/rules, rough sets, Bayesian belief nets (producing rules such as: if age > 35 and income < $35k then ...)
[Figures: a two-class scatter of A and B over x1 and x2; a curve fit of f(x) versus x]
21
Common Statistical Methods
22
Linear Regression
Y = b0 + b1X1 + b2X2 + ...
- The coefficients b0, b1, ... determine a line (or a hyperplane in higher dimensions) that fits the data
- Closed-form solution via least squares (minimizes the sum of squared distances between the examples and the predicted values of Y)
- Inductive bias: the solution can be modeled by a straight line or hyperplane
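A minimal Python sketch of the closed-form least-squares fit (the data here is a made-up illustration with known coefficients, so the recovered values can be checked):

```python
import numpy as np

# Illustrative data: Y depends roughly linearly on two inputs X1, X2.
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(50, 2))
Y = 3.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.1, size=50)

# Closed-form least squares: prepend a column of ones for the intercept b0.
A = np.column_stack([np.ones(len(X)), X])
coeffs, *_ = np.linalg.lstsq(A, Y, rcond=None)
b0, b1, b2 = coeffs
print(f"b0={b0:.2f}, b1={b1:.2f}, b2={b2:.2f}")  # close to 3.0, 2.0, -0.5
```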
23
Linear Regression
Y = b0 + b1X1 + b2X2 + ...
A great way to start, since it assumes you are modeling a simple function ... Why? (Recall Occam's Razor: try the simplest inductive bias first.)
24
Logistic Regression
Y = 1/(1 + e^-Z), where Z = b0 + b1X1 + b2X2 + ...
[Figure: the S-shaped (sigmoid) curve of Y versus Z]
- Output is in [0,1] and represents a probability
- The coefficients b0, b1, ... determine an S-shaped non-linear curve that best fits the data
- The coefficients are estimated using an iterative maximum-likelihood method
- Inductive bias: the solution can be modeled by this S-shaped non-linear surface
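A brief Python sketch (the synthetic data is an illustrative assumption): scikit-learn's LogisticRegression estimates the coefficients by an iterative maximum-likelihood method, and predict_proba returns the sigmoid output Y = 1/(1 + e^-Z).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative data: one input, class 1 becomes likelier as X grows.
rng = np.random.default_rng(2)
X = rng.uniform(-4, 4, size=(200, 1))
y = (X[:, 0] + rng.normal(scale=1.0, size=200) > 0).astype(int)

model = LogisticRegression().fit(X, y)  # iterative maximum likelihood
b0, b1 = model.intercept_[0], model.coef_[0, 0]

# predict_proba applies the sigmoid 1/(1 + e^-Z) with Z = b0 + b1*X.
z = b0 + b1 * 0.5
print(1 / (1 + np.exp(-z)))                 # manual sigmoid
print(model.predict_proba([[0.5]])[0, 1])   # same value from sklearn
```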
25
Logistic Regression
Can be used for classification problems:
- The output can be used as the probability of being of the class (i.e. positive)
- Alternatively, any value above a cut-off (typically 0.5) is classified as a positive example
[Link: a logistic regression JavaScript page]
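A hedged usage example of the cut-off rule in Python (the two-feature data is a made-up illustration; the 0.5 threshold is the slide's typical value):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative two-feature data for a binary classification.
rng = np.random.default_rng(3)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

model = LogisticRegression().fit(X, y)

# Interpretation 1: output as the probability of the positive class.
probs = model.predict_proba(X)[:, 1]

# Interpretation 2: classify anything above the 0.5 cut-off as positive.
labels = (probs > 0.5).astype(int)
print(labels[:10], probs[:10].round(2))
```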
26
THE END danny.silver@acadiau.ca
27
Learning Theory
[Figure: the Example Space X of pairs (x, c(x)), with the true class function c and the hypothesis h drawn as overlapping regions of + and - instances; the regions where c and h disagree are shaded]
- x = input attributes
- c = true class function (e.g. "likes product")
- h = hypothesis (model)
The true error of a hypothesis h is the probability that h will misclassify an instance drawn at random from X: err(h) = P[c(x) ≠ h(x)]
28
PAC - A Probabilistic Guarantee
Generalization: a PAC probabilistic guarantee
- |H| = the number of possible hypotheses in the modeling system
- ε = the desired true error, where 0 < ε < 1
- δ = the complement of the desired confidence (confidence = 1 - δ), where 0 < δ < 1
Then the number of training examples m required to select, with confidence 1 - δ, a hypothesis h with err(h) < ε is given by:
m ≥ (1/ε)(ln |H| + ln(1/δ))
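A tiny Python check of this bound (the numbers plugged in reuse the earlier slide's example values of 75% accuracy and 95% confidence; the hypothesis-space size of 2^20 is my illustrative assumption):

```python
import math

def pac_sample_size(num_hypotheses: int, epsilon: float, delta: float) -> int:
    """Examples needed so that, with confidence 1 - delta, a consistent
    hypothesis from a space of num_hypotheses has true error < epsilon."""
    m = (1 / epsilon) * (math.log(num_hypotheses) + math.log(1 / delta))
    return math.ceil(m)

# 75% accuracy target (epsilon = 0.25), 95% confidence (delta = 0.05),
# and an assumed hypothesis space of 2**20 boolean hypotheses.
print(pac_sample_size(2**20, epsilon=0.25, delta=0.05))  # ~68 examples
```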