COMP3503 Intro to Inductive Modeling with Daniel L. Silver
Agenda: Deductive and Inductive Modeling; Learning Theory and Generalization; Common Statistical Methods
The KDD Process: Data Sources -> Data Consolidation (Data Warehouse) -> Selection and Preprocessing (Prepared Data) -> Data Mining (Patterns & Models) -> Interpretation and Evaluation -> Knowledge
Deductive and Inductive Modeling
Induction versus Deduction: Deduction is top-down verification, moving from a model or general rule down to specific examples (A, B, C); induction is bottom-up construction, moving from specific examples up to a model or general rule.
Deductive Modeling: Top-down (toward the data) verification of an hypothesis. The hypothesis is generated within the mind of the data miner. Exploratory tools such as OLAP and data visualization software are used. Models tend to be used for description.
Inductive Modeling: Bottom-up (from the data) development of an hypothesis. The hypothesis is generated by the technology directly from the data. Statistical and machine learning tools such as regression, decision trees and artificial neural networks are used. Models can be used for prediction.
Inductive Modeling Objective: Develop a general model or hypothesis from specific examples, either for function approximation (curve fitting) or for classification (concept learning, pattern recognition).
Learning Theory and Generalization
Inductive Modeling = Learning. Basic framework for inductive learning: the environment supplies training examples (x, f(x)) to an inductive learning system, which produces an induced model or hypothesis h(x). Testing examples from the environment are then fed to the induced model, and its output classification is compared with the true values to ask whether h(x) = f(x). Inductive learning is a problem of representation and search for the best hypothesis, h(x).
Inductive Modeling = Data Mining. Ideally, an hypothesis (model) is: Complete - covers all potential examples; Consistent - no conflicts; Accurate - able to generalize to previously unseen examples; Valid - presents a truth; Transparent - human-readable knowledge.
Inductive Modeling Generalization: The objective of learning is to achieve good generalization to new cases; otherwise, just use a look-up table. Generalization can be defined as a mathematical interpolation or regression over a set of training points.
Inductive Modeling Generalization: Generalization accuracy can be guaranteed for a specified confidence level given a sufficient number of examples. Models can be validated for accuracy by using a previously unseen test set of examples.
Learning Theory: Probably Approximately Correct (PAC) theory of learning (Leslie Valiant, 1984). It poses questions such as: How many examples are needed for good generalization? How long will it take to create a good model? The answers depend on the complexity of the actual function, the desired level of accuracy of the model (e.g. 75%), and the desired confidence in finding a model with this accuracy (e.g. 19 times out of 20 = 95%).
Learning Theory: Consider the space of all possible examples X, with the true concept c and a hypothesis h; the region where c and h disagree is where h errs. The true error of a hypothesis h is the probability that h will misclassify an instance drawn at random from X: error(h) = P[c(x) ≠ h(x)].
Learning Theory: Three notions of error. Training Error: how often the training set is misclassified. Test Error: how often an independent test set is misclassified. True Error: how often the entire population of possible examples would be misclassified; it must be estimated from the test error.
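As a minimal illustration of the difference between these errors (not from the lecture; the labels and predictions below are invented), a short Python sketch that computes misclassification rates on a training set and a held-out test set:

```python
import numpy as np

def misclassification_rate(y_true, y_pred):
    # Fraction of examples on which the predicted class differs from the true class.
    return float(np.mean(np.asarray(y_true) != np.asarray(y_pred)))

# Hypothetical true labels and model predictions (illustrative values only).
y_train, y_train_pred = np.array([1, 0, 1, 1, 0, 1, 0, 0]), np.array([1, 0, 1, 0, 0, 1, 0, 0])
y_test,  y_test_pred  = np.array([1, 1, 0, 0]),             np.array([1, 0, 0, 1])

print("training error:", misclassification_rate(y_train, y_train_pred))  # 0.125
print("test error:", misclassification_rate(y_test, y_test_pred))        # 0.5, an estimate of the true error
```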
Linear and Non-Linear Problems: Linear problems involve linear functions and linearly separable classifications; non-linear problems involve non-linear functions and classifications that are not linearly separable.
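To make "not linearly separable" concrete, here is a small sketch (not from the lecture) that fits the best linear surface to the XOR classification problem with NumPy's least squares and shows that no cut-off on it can separate the two classes:

```python
import numpy as np

# XOR: the classic two-class problem that no straight line can separate.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

# Best linear fit y ~ b0 + b1*x1 + b2*x2 by least squares.
A = np.column_stack([np.ones(len(X)), X])
b, *_ = np.linalg.lstsq(A, y, rcond=None)

print("coefficients:", b)      # approximately [0.5, 0.0, 0.0]
print("predictions:", A @ b)   # approximately 0.5 for every example
# Any threshold on this linear surface misclassifies at least one of the four points.
```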
Inductive Bias: Every inductive modeling system has an inductive bias. Consider a simple set of training examples for f(x), such as the in-class spreadsheet exercise (generalize.xls).
Inductive Bias: Can you think of any biases that you commonly use when you are learning something new? Is there one best inductive bias? KISS, Occam's Razor: prefer the simplest hypothesis that fits the data, e.g. a simple linear function fit by least squares.
Inductive Modeling Methods: Automated Exploration/Discovery (e.g. discovering new market segments) uses distance and probabilistic clustering algorithms. Prediction/Classification (e.g. forecasting gross sales given current factors) uses statistics (regression, K-nearest neighbour), artificial neural networks and genetic algorithms. Explanation/Description (e.g. characterizing customers by demographics) uses inductive decision trees/rules, rough sets and Bayesian belief nets, yielding human-readable rules such as: if age > 35 and income < $35k then ...
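As one concrete instance of the prediction/classification group, a minimal K-nearest-neighbour sketch (the customer data and labels below are invented for illustration):

```python
import numpy as np

def knn_predict(X_train, y_train, x_new, k=3):
    # Classify x_new by a majority vote of the k closest training examples (Euclidean distance).
    dists = np.linalg.norm(X_train - x_new, axis=1)
    nearest = y_train[np.argsort(dists)[:k]]
    return np.bincount(nearest).argmax()

# Hypothetical customers: columns are (age, income in $k); labels 1 = responds, 0 = does not.
X_train = np.array([[25, 30], [30, 28], [45, 80], [50, 95], [23, 25], [48, 90]])
y_train = np.array([1, 1, 0, 0, 1, 0])

print(knn_predict(X_train, y_train, np.array([27, 32])))   # 1: near the younger, lower-income group
```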
Common Statistical Methods
Linear Regression: Y = b0 + b1 X1 + b2 X2 + ... The coefficients b0, b1, ... determine a line (or a hyperplane in higher dimensions) that fits the data. A closed-form solution exists via least squares, which minimizes the sum of squared differences between the observed and predicted values of Y. Inductive bias: the solution can be modeled by a straight line or hyperplane.
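A minimal sketch of the closed-form least-squares fit, assuming NumPy and a synthetic two-input data set (the coefficients 3.0, 1.5 and -2.0 are invented for illustration):

```python
import numpy as np

# Synthetic data: Y = 3.0 + 1.5*X1 - 2.0*X2 plus noise (invented for illustration).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 2))
Y = 3.0 + 1.5 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0, 0.5, size=50)

# Closed-form least-squares solution for b0, b1, b2.
A = np.column_stack([np.ones(len(X)), X])   # leading column of 1s for the intercept b0
b, *_ = np.linalg.lstsq(A, Y, rcond=None)

print("estimated b0, b1, b2:", b)            # close to 3.0, 1.5, -2.0
print("prediction at X1=4, X2=7:", b @ [1, 4, 7])
```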
Linear Regression: Y = b0 + b1 X1 + b2 X2 + ... A great way to start, since it assumes you are modeling a simple function ... Why?
Logistic Regression: Y = 1/(1 + e^(-Z)), where Z = b0 + b1 X1 + b2 X2 + ... The output lies in [0,1] and represents a probability. The coefficients b0, b1, ... determine an S-shaped non-linear curve that best fits the data; they are estimated using an iterative maximum-likelihood method. Inductive bias: the solution can be modeled by this S-shaped non-linear surface.
Logistic Regression: Can be used for classification problems. The output can be interpreted as the probability that an example belongs to the class (is positive). Alternatively, any value above a cut-off (typically 0.5) is classified as a positive example. (See the logistic regression JavaScript demo page.)
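A minimal sketch, assuming NumPy and a synthetic one-input data set, of fitting the coefficients by simple gradient ascent on the log-likelihood (a stand-in for the Newton/IRLS routine a statistics package would use) and then classifying with the 0.5 cut-off:

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, n_iter=5000):
    # Iterative maximum likelihood via gradient ascent on the log-likelihood.
    A = np.column_stack([np.ones(len(X)), X])   # leading 1s for the intercept b0
    b = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(A @ b)))      # Y = 1 / (1 + e^(-Z)), Z = A @ b
        b += lr * (A.T @ (y - p)) / len(y)      # gradient of the log-likelihood
    return b

# Synthetic one-input classification data (invented for illustration).
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(200, 1))
y = (X[:, 0] + rng.normal(0, 0.5, size=200) > 0).astype(float)

b = fit_logistic(X, y)
p = 1.0 / (1.0 + np.exp(-(b[0] + b[1] * 1.5)))  # probability that x = 1.5 is positive
print("coefficients:", b)
print("P(positive | x=1.5):", p, "-> positive" if p > 0.5 else "-> negative")
```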
THE END danny.silver@acadiau.ca
Learning Theory: Example space X of examples (x, c(x)), where x = input attributes, c = true class function (e.g. "likes product") and h = hypothesis (model); c and h disagree on some region of X. The true error of a hypothesis h is the probability that h will misclassify an instance drawn at random from X: err(h) = P[c(x) ≠ h(x)].
PAC - A Probabilistic Guarantee of Generalization: Let |H| = the number of possible hypotheses in the modeling system, let ε be the desired true error (0 < ε < 1), and let δ define the desired confidence 1 - δ (0 < δ < 1). The number of training examples m required to select, with confidence 1 - δ, a hypothesis h with err(h) < ε is given by m ≥ (1/ε)(ln|H| + ln(1/δ)).
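A small sketch that plugs the numbers from the earlier slide into this bound, i.e. 75% accuracy (ε = 0.25) and 95% confidence (δ = 0.05), with a hypothetical hypothesis space of 2^16 hypotheses:

```python
import math

def pac_sample_size(num_hypotheses, epsilon, delta):
    # m >= (1/epsilon) * (ln|H| + ln(1/delta)) examples guarantee, with confidence
    # 1 - delta, that a hypothesis consistent with them has true error below epsilon.
    return math.ceil((math.log(num_hypotheses) + math.log(1.0 / delta)) / epsilon)

# |H| = 2**16 is a hypothetical hypothesis-space size chosen for illustration.
print(pac_sample_size(num_hypotheses=2**16, epsilon=0.25, delta=0.05))   # 57 examples
```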