Pattern Analysis Prof. Bennett Math Model of Learning and Discovery 1/17/03 Based on Chapter 1 of Shawe-Taylor and Cristianini
Outline What is pattern analysis? Illustrate issues via example Pattern definitions Examples of practical tasks Pattern algorithms Summary
Pattern Analysis The automatic detection of patterns in data from the same source. Make predictions about new data coming from the same source. Data may take many forms: images, text, records of commercial transactions, genome sequences, family trees.
Data Driven Analysis Kepler analyzed Brahe’s planetary motion data. P = period, D = average distance from the Sun.

Planet      P       D      P^2      D^3
Mercury    0.24    0.39    0.058    0.059
Venus      0.62    0.72    0.38
Earth      1.00
Mars       1.88    1.53    3.53     3.58
Jupiter   11.90    5.31  142.0    141.00
Saturn    29.30    9.55  870.0    871.00

Found “Regularities” Observed P^2 = D^3. Developed three laws of planetary motion. Compressible: the data can be represented by one column. Predictable: discovering hidden relations allows us to predict the other columns. The Third Law is exact.
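The regularity can be checked directly from the tabulated values. The sketch below is illustrative only: it uses the rows that list both P (period) and D (average distance), and it assumes the units are chosen so that the ratio P^2 / D^3 should be close to 1 (the slide does not state the units). The powers are recomputed from the rounded P and D entries, so they can differ slightly from the tabulated columns.

```python
# (P, D) pairs copied from the table rows that list both values.
planets = {
    "Mercury": (0.24, 0.39),
    "Venus": (0.62, 0.72),
    "Mars": (1.88, 1.53),
    "Jupiter": (11.90, 5.31),
    "Saturn": (29.30, 9.55),
}


def kepler_ratio(period, distance):
    """Return P^2 / D^3, which equals 1 when the relation holds exactly."""
    return period ** 2 / distance ** 3


ratios = {name: kepler_ratio(p, d) for name, (p, d) in planets.items()}
for name, ratio in ratios.items():
    print(f"{name:8s} P^2 / D^3 = {ratio:.3f}")
```

Every ratio stays within a few percent of 1, which is the sense in which one column can be predicted from another.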
Data Representation I Nonlinear Model of D and P Linear Model of
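The slide does not show which recoding it uses for the linear model. One common choice, assumed here, is to take logarithms, which turns a power law between P and D into a straight line. The sketch below fits that line by ordinary least squares; the helper name `fit_line` is hypothetical.

```python
import math

# (P, D) pairs copied from the planetary table.
data = [(0.24, 0.39), (0.62, 0.72), (1.88, 1.53), (11.90, 5.31), (29.30, 9.55)]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Recode the data: work with (log D, log P) instead of (D, P).
log_d = [math.log(d) for _, d in data]
log_p = [math.log(p) for p, _ in data]
slope, intercept = fit_line(log_d, log_p)
print(f"log P = {slope:.3f} * log D + {intercept:.3f}")
```

A slope near 3/2 and an intercept near 0 correspond to P^2 = D^3 in the original variables.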
Data Representation II Suppose we know the plane of the orbit, so we can represent positions as (x, y) pairs. We also know the orbit is an ellipse.
Data Representation Pattern is nonlinear function of x,y Pattern is linear function of Linear relationships are easier to find.
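As a rough illustration of why a linear representation helps, the sketch below assumes an axis-aligned ellipse centred at the origin (a simplification not stated on the slide). The curve x^2/a^2 + y^2/b^2 = 1 is nonlinear in (x, y) but linear in the recoded features (x^2, y^2), so plain least squares recovers it. The function name and the sample points are hypothetical.

```python
import math


def fit_axis_aligned_ellipse(points):
    """Least-squares fit of u*x^2 + v*y^2 = 1; returns the semi-axes (a, b)."""
    sxx = sxy = syy = sx = sy = 0.0
    for x, y in points:
        fx, fy = x * x, y * y  # recoded features: the model is linear in these
        sxx += fx * fx
        sxy += fx * fy
        syy += fy * fy
        sx += fx
        sy += fy
    # Solve the 2x2 normal equations with Cramer's rule.
    det = sxx * syy - sxy * sxy
    u = (sx * syy - sy * sxy) / det
    v = (sxx * sy - sxy * sx) / det
    return 1 / math.sqrt(u), 1 / math.sqrt(v)


# Hypothetical noise-free points on an ellipse with semi-axes 3 and 2.
points = [(3 * math.cos(0.1 * k), 2 * math.sin(0.1 * k)) for k in range(1, 60)]
a, b = fit_axis_aligned_ellipse(points)
print(f"a = {a:.4f}, b = {b:.4f}")
```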
Set of Hypotheses Hypothesis Ellipse compute Hypothesis Circle compute UNDERFITS
Set of Hypotheses Hypothesis: any continuous function. OVERFITS! The outcome depends on the size of the hypothesis class. Use domain knowledge to limit the hypotheses.
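The effect of hypothesis-class size can be seen in a small experiment. This is a sketch under stated assumptions: the data source (y = 2x plus Gaussian noise), the sample sizes, and both learners are hypothetical and not taken from the slides. A very large class is represented by a "memorizer" that reproduces the nearest training point; a small class is represented by straight lines.

```python
import random

random.seed(0)


def sample(n):
    """Noisy data from the hypothetical source y = 2x + noise."""
    xs = [random.uniform(0, 10) for _ in range(n)]
    return [(x, 2 * x + random.gauss(0, 0.5)) for x in xs]


train, test = sample(50), sample(200)


def fit_linear(data):
    """Small hypothesis class: lines fitted by least squares."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    slope = (sum((x - mx) * (y - my) for x, y in data)
             / sum((x - mx) ** 2 for x, _ in data))
    return lambda x: slope * x + (my - slope * mx)


def fit_memorizer(data):
    """Very large hypothesis class: copy the label of the nearest training point."""
    return lambda x: min(data, key=lambda p: abs(p[0] - x))[1]


def mse(f, data):
    """Mean squared error of predictor f on a list of (x, y) pairs."""
    return sum((f(x) - y) ** 2 for x, y in data) / len(data)


linear, memorizer = fit_linear(train), fit_memorizer(train)
for name, f in [("linear", linear), ("memorizer", memorizer)]:
    print(f"{name:9s} train MSE = {mse(f, train):.3f}  test MSE = {mse(f, test):.3f}")
```

The memorizer fits the training set perfectly, noise included, and does worse than the line on fresh data from the same source.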
Approximate Pattern Noisy Data
Typical Pattern Analysis Patterns are approximate, not exact. Data has errors and omissions. We cannot predict graduate school performance from GRE scores and grades alone. The best representation/model is unknown. Predictions are approximate, so we need to address how accurate the estimates are.
Definition: Exact Pattern A general exact pattern, f, for data source S satisfies f(x) = 0 for all data x from source S.
Approximate Pattern A general approximate pattern, f, for data source S satisfies f(x) ≈ 0 for all data x from source S.
Statistical Pattern A general statistical pattern, f, for data source S generated i.i.d. according to distribution D satisfies E_D[f(x)] ≈ 0, where the expectation is taken over data x from source S.
Two and Multiclass Classification Example – Character Recognition two class - is it an A or not? multiclass – what letter is it ?
Regression Example –Determine drug bioavailability through the intestine. Estimate apparent permeability as assayed via intestinal cell line.
Density Estimation Estimate the probability that a particular event occurs, p(x). Use it to detect improbable events like fraud.
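A minimal sketch of this idea, assuming a one-dimensional Gaussian model for p(x): the transaction amounts, the function names, and the threshold below are all hypothetical and chosen only for illustration.

```python
import math
import statistics


def fit_gaussian(values):
    """Estimate a one-dimensional Gaussian density from sample values."""
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)

    def density(x):
        z = (x - mu) / sigma
        return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

    return density


# Hypothetical transaction amounts; not taken from the slides.
amounts = [42.0, 38.5, 51.2, 47.9, 40.3, 44.8, 49.1, 39.7, 45.5, 43.0]
p = fit_gaussian(amounts)


def is_improbable(x, threshold=1e-4):
    """Flag events whose estimated density falls below the threshold."""
    return p(x) < threshold


print(is_improbable(45.0), is_improbable(500.0))
```

Real fraud detection would need a richer density model; the point here is only that a low estimated p(x) marks an event as unusual.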
Principal Component Analysis Find a projection of the data that captures the major variance in the data. Eigenfaces - capture essential qualities of faces to help ID and reduce storage needs.
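For two-dimensional data the direction of largest variance has a closed form, which keeps the following sketch free of linear-algebra libraries. The data points and the function name are hypothetical; for images such as faces one would use an eigendecomposition or SVD routine instead.

```python
import math


def first_principal_direction(points):
    """Unit vector along the direction of largest variance for 2-D points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Angle of the leading eigenvector of the covariance [[sxx, sxy], [sxy, syy]].
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return math.cos(theta), math.sin(theta)


# Hypothetical points scattered closely around the line y = x.
points = [(t, t + 0.1 * (-1) ** t) for t in range(10)]
direction = first_principal_direction(points)
print(direction)
```

Projecting each centred point onto `direction` gives a one-number summary that keeps most of the variance of the two original coordinates.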
Other Tasks Reinforcement Learning: a robot senses the state of the world and must learn which action to take. It periodically receives rewards (delivers mail) and punishments (hits wall). What is the learning model?
Pattern Analysis Algorithm A Pattern Analysis Algorithm input = finite set of data from source S a.k.a. the training set output = detector function f or no patterns detected
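The input/output contract above can be sketched as a function that takes a finite training set and returns either a detector f or nothing. Everything specific here is an assumption made for illustration: the hypothesis class (relations of the form y = c * x), the tolerance, and the name `find_linear_pattern`.

```python
from typing import Callable, Optional, Sequence, Tuple

Sample = Tuple[float, float]


def find_linear_pattern(
    training_set: Sequence[Sample], tolerance: float = 0.05
) -> Optional[Callable[[Sample], float]]:
    """Return a detector f with f(sample) close to 0 on the data, or None.

    Only patterns of the form y = c * x are searched (an assumed,
    deliberately small hypothesis class).
    """
    sxx = sum(x * x for x, _ in training_set)
    if sxx == 0:
        return None
    c = sum(x * y for x, y in training_set) / sxx  # least-squares slope

    def f(sample: Sample) -> float:
        x, y = sample
        return abs(y - c * x)

    worst = max(f(s) for s in training_set)
    return f if worst <= tolerance else None


# Hypothetical training sets: one with a near-linear relation, one without.
regular = [(1.0, 2.01), (2.0, 3.99), (3.0, 6.0)]
irregular = [(1.0, 1.0), (2.0, 5.0), (3.0, 2.0)]
print(find_linear_pattern(regular) is not None)
print(find_linear_pattern(irregular) is None)
```

Returning `None` corresponds to the "no patterns detected" outcome.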
Pattern Algorithm Issues Efficiency and scalability: memory and CPU requirements, large data sets. Robustness: find approximate patterns in noisy data. Stability: discover genuine patterns, and find the same patterns on different views of the dataset.
Stability Generalization: find patterns that hold on future data. A pattern may exist by chance in a finite sample. Provide a statistical guarantee that the pattern truly exists, with the caveat that with small probability the algorithm may have been misled.
Example Suppose a state agency observes that all 20 babies adopted in the last 10 years from country x are girls. Pattern: only girls are available for adoption from that country. With probability p = (0.5)^20 we could observe this data even if girls and boys are equally likely. So with chance p, we were misled.
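The size of this chance probability is easy to compute. The sketch assumes, as the example does, that each adoption is independent and that girls and boys are equally likely.

```python
n_babies = 20
p_girl = 0.5  # assumed: girls and boys equally likely, adoptions independent

# Probability that all 20 adopted babies are girls purely by chance.
p_all_girls = p_girl ** n_babies
print(f"P(all {n_babies} girls by chance) = {p_all_girls:.2e}")
```

The result is 1 in 1,048,576, roughly one in a million, so the pattern is very unlikely to be an accident of the sample, although that possibility is never zero.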
Statistical Learning Theory Produce a pattern based on a finite sample. Provide bounds on the probability that the pattern approximately represents a true pattern: Probably Approximately Correct (PAC).
Recoding Strategy With the proper representation, the problem can become easier (a linear model works). Develop general-purpose linear learning methods. Change the recoding using “kernel functions”.
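To make the link between recoding and kernel functions concrete, the sketch below uses one standard example, the degree-2 polynomial kernel on two-dimensional inputs. The slides do not name a specific kernel, so this choice and the function names are assumptions. The kernel returns the inner product of the recoded vectors without ever constructing them.

```python
import math


def phi(x):
    """Explicit degree-2 feature map for a 2-D input (the recoding)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def dot(u, v):
    """Standard inner product of two equal-length tuples."""
    return sum(a * b for a, b in zip(u, v))


def poly_kernel(x, z):
    """k(x, z) = (x . z)^2, computed without building phi(x) or phi(z)."""
    return dot(x, z) ** 2


x, z = (1.0, 2.0), (3.0, -1.0)
print(poly_kernel(x, z), dot(phi(x), phi(z)))
```

Because a linear method only needs inner products between data points, swapping `dot` for a kernel changes the representation while the learning algorithm stays the same.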
Key Ideas Patterns are regularities in data from a specified source Algorithm takes finite sample and computes pattern Efficiency, robustness, and stability Representation -- Kernels Strategy = Generic Algorithms + Recoding Many Learning Tasks in this framework