Artificial Intelligence CMSC 25000 January 27, 2004
Evolutionary Search Artificial Intelligence CMSC 25000 January 27, 2004

Agenda
Motivation: Evolving a solution
Genetic Algorithms: Modeling search as evolution
  Mutation
  Crossover
  Survival of the fittest
  Survival of the most diverse
Conclusions

Genetic Algorithms Applications
Search parameter space for optimal assignment
  Not guaranteed to find optimal, but can approach
Classic optimization problems
  E.g. Traveling Salesman Problem
Program design (“Genetic Programming”)
Aircraft carrier landings

Genetic Algorithm Example
Cookie recipes (Winston, AI, 1993)
As evolving populations
  Individual = batch of cookies
  Quality: 0-9
  Chromosomes = 2 genes: 1 chromosome each
    Flour Quantity, Sugar Quantity: 1-9
Mutation: Randomly select Flour/Sugar: +/- 1 [1-9]
Crossover: Split 2 chromosomes & rejoin; keeping both

Mutation & Crossover
[Figure: example chromosomes before and after mutation, and a pair of chromosomes before and after crossover]
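
The two operators from the cookie example can be sketched as follows. This is a minimal illustration, not the lecture's own code: the names `mutate` and `crossover` are hypothetical, a chromosome is assumed to be a (flour, sugar) tuple, and reflecting a step back off the 1-9 boundary is an assumption about how out-of-range mutations are handled.

```python
import random

GENE_MIN, GENE_MAX = 1, 9  # flour and sugar quantities range over 1-9

def mutate(chromosome, rng=random):
    """Pick one gene at random and change it by +1 or -1, staying inside 1-9."""
    genes = list(chromosome)
    i = rng.randrange(len(genes))
    step = rng.choice((-1, 1))
    if not GENE_MIN <= genes[i] + step <= GENE_MAX:
        step = -step  # assumption: reflect off the boundary instead of leaving the range
    genes[i] += step
    return tuple(genes)

def crossover(a, b, point=1):
    """Split two chromosomes at `point` and rejoin the halves, keeping both children."""
    return a[:point] + b[point:], b[:point] + a[point:]
```

With two genes, `point=1` is the only interior split, which corresponds to crossing at the middle.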

Fitness
Natural selection: Most fit survive
Fitness = Probability of survival to next generation
Question: How do we measure fitness?
“Standard method”: Relate fitness to quality
  Fitness f_i: 0-1; Quality q_i: 1-9: f_i = q_i / (sum of q_j over all chromosomes)

Chromosome  Quality  Fitness
1 4         4        0.4
3 1         3        0.3
1 2         2        0.2
1 1         1        0.1
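
As a small worked check of the standard method, the sketch below divides each quality by the total quality of the population. The function name `standard_fitness` is hypothetical; the qualities 4, 3, 2, 1 are the ones listed on the slide.

```python
def standard_fitness(qualities):
    """Standard method: fitness is each quality's share of the total quality."""
    total = sum(qualities)
    if total == 0:
        raise ValueError("every quality is 0, so no survival probabilities exist")
    return [q / total for q in qualities]

# Qualities of the four chromosomes on the slide.
fitness = standard_fitness([4, 3, 2, 1])
```

A chromosome with quality 0 receives fitness 0 under this rule, so it can never be selected.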

Genetic Algorithms Procedure
Create an initial population (1 chromosome)
Mutate 1+ genes in 1+ chromosomes
  Produce one offspring for each chromosome
Mate 1+ pairs of chromosomes with crossover
Add mutated & offspring chromosomes to pop
Create new population
  Best + randomly selected (biased by fitness)
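
One generation of this procedure might look like the sketch below. It is an illustration under stated assumptions: `next_generation` is a hypothetical name, `quality` and `mutate` are caller-supplied functions, crossover is left out, duplicates are dropped, and survivors are drawn without replacement with weights proportional to quality.

```python
import random

def next_generation(population, quality, mutate, max_size=4, rng=random):
    """Mutate every chromosome, pool parents and offspring, then keep the
    best chromosome plus fitness-biased random picks up to max_size."""
    offspring = [mutate(c) for c in population]
    pool = list(dict.fromkeys(population + offspring))  # drop duplicates, keep order
    pool.sort(key=quality, reverse=True)
    survivors = [pool.pop(0)]  # the best chromosome always survives
    while pool and len(survivors) < max_size:
        weights = [quality(c) for c in pool]
        if sum(weights) == 0:
            break  # standard method: quality 0 means survival probability 0
        pick = rng.choices(pool, weights=weights, k=1)[0]
        pool.remove(pick)
        survivors.append(pick)
    return survivors
```

Calling this repeatedly until some chromosome reaches the maximum quality gives the outer loop of the algorithm.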

GA Design Issues
Genetic design:
  Identify sets of features = genes; Constraints?
Population: How many chromosomes?
  Too few => inbreeding; Too many => too slow
Mutation: How frequent?
  Too few => slow change; Too many => wild
Crossover: Allowed? How selected?
Duplicates?

GA Design: Basic Cookie GA
Genetic design:
  Identify sets of features: 2 genes: flour+sugar; 1-9
Population: How many chromosomes?
  1 initial, 4 max
Mutation: How frequent?
  1 gene randomly selected, randomly mutated
Crossover: Allowed? No
Duplicates? No
Survival: Standard method

Example
[Tables: Chromosome and Quality columns for Generations 0, 1, 2, and 3]

Basic Cookie GA Results
Results are for 1000 random trials
  Initial state: one 1-1 chromosome, quality 1
On average, reaches max quality (9) in 16 generations
Best: max quality in 8 generations
Conclusion:
  Low dimensionality search
    Successful even without crossover

Adding Crossover
Genetic design:
  Identify sets of features: 2 genes: flour+sugar; 1-9
Population: How many chromosomes?
  1 initial, 4 max
Mutation: How frequent?
  1 gene randomly selected, randomly mutated
Crossover: Allowed?
  Yes, select random mates; cross at middle
Duplicates? No
Survival: Standard method

Basic Cookie GA+Crossover Results
Results are for 1000 random trials
  Initial state: one 1-1 chromosome, quality 1
On average, reaches max quality (9) in 14 generations
Conclusion:
  Faster with crossover: combine good in each gene
  Key: Global max achievable by maximizing each dimension independently - reduce dimensionality

Solving the Moat Problem
No single step mutation can reach optimal values using standard fitness (quality=0 => probability=0)
Solution A: Crossover can combine fit parents in EACH gene
  However, still slow: 155 generations on average

Quality grid (cells in the moat have quality 0):
1 2 3 4 5 4 3 2 1
2 0 0 0 0 0 0 0 2
3 0 0 0 0 0 0 0 3
4 0 0 7 8 7 0 0 4
5 0 0 8 9 8 0 0 5
4 0 0 7 8 7 0 0 4
3 0 0 0 0 0 0 0 3
2 0 0 0 0 0 0 0 2
1 2 3 4 5 4 3 2 1

Questions
How can we avoid the 0 quality problem?
How can we avoid local maxima?

Rethinking Fitness
Goal: Explicit bias to best
  Remove implicit biases based on quality scale
Solution: Rank method
  Ignore actual quality values except for ranking
  Step 1: Rank candidates by quality
  Step 2: Probability of selecting the ith candidate, given that candidates 1 through i-1 were not selected, is constant p.
  Step 2b: Last candidate is selected if no other has been
  Step 3: Select candidates using the probabilities

Rank Method

Chromosome  Quality  Rank  Std. Fitness  Rank Fitness
1 4         4        1     0.4           0.667
1 3         3        2     0.3           0.222
1 2         2        3     0.2           0.074
5 2         1        4     0.1           0.025
7 5         0        5     0.0           0.012

Results: Average over 1000 random runs on Moat problem
  75 generations (vs 155 for standard method)
No 0 probability entries: Based on rank not absolute quality
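
The rank fitness values 0.667, 0.222, 0.074, 0.025, 0.012 can be reproduced with a short sketch of the rank method. The function name `rank_fitness` is hypothetical, and p = 2/3 is an assumption inferred from the first entry (0.667); the slides do not state p explicitly.

```python
def rank_fitness(n, p=2 / 3):
    """Selection probability for ranks 1..n under the rank method.

    Rank i is chosen with probability p, given that ranks 1..i-1 were
    passed over; the last rank takes whatever probability remains.
    """
    probs = [p * (1 - p) ** i for i in range(n - 1)]
    probs.append(1.0 - sum(probs))  # Step 2b: last candidate if no other was chosen
    return probs

rounded = [round(x, 3) for x in rank_fitness(5)]
```

Because the probabilities depend only on rank, a candidate with quality 0 still has a nonzero chance of surviving.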

Diversity Diversity: Degree to which chromosomes exhibit different genes Rank & Standard methods look only at quality Need diversity: escape local min, variety for crossover “As good to be different as to be fit”

Rank-Space Method Combines diversity and quality in fitness
Diversity measure: Sum of inverse squared distances in genes Diversity rank: Avoids inadvertent bias Rank-space: Sort on sum of diversity AND quality ranks Best: lower left: high diversity & quality

Rank-Space Method
W.r.t. highest ranked 5-1

Chromosome  Q  D      D Rank  Q Rank  Comb Rank  R-S Fitness
1 4         4  0.04   1       1       1          0.667
3 1         3  0.25   5       2       4          0.025
1 2         2  0.059  3       3       2          0.222
1 1         1  0.062  4       4       5          0.012
7 5         0  0.05   2       5       3          0.074

Diversity rank breaks ties
After selecting a second chromosome, sum distances to both
Results: Average (Moat) 15 generations
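
The combined ranks for this example can be recomputed with the sketch below. It assumes that 5-1 has already been selected, that diversity is measured as the sum of inverse squared gene distances to the selected chromosomes (smaller means more diverse), and that a quality of 0 belongs to chromosome 7-5. The function name `rank_space_ranks` is hypothetical.

```python
def rank_space_ranks(candidates, qualities, selected):
    """Combined rank (1 = best) from quality rank plus diversity rank.

    Ties in the rank sum are broken by diversity rank. Candidates must
    differ from every selected chromosome, or a distance would be zero.
    """
    def crowding(c):
        return sum(1.0 / sum((a - b) ** 2 for a, b in zip(c, s)) for s in selected)

    idx = range(len(candidates))
    q_order = sorted(idx, key=lambda i: -qualities[i])
    d_order = sorted(idx, key=lambda i: crowding(candidates[i]))
    q_rank = {i: r for r, i in enumerate(q_order, 1)}
    d_rank = {i: r for r, i in enumerate(d_order, 1)}
    combined = sorted(idx, key=lambda i: (q_rank[i] + d_rank[i], d_rank[i]))
    comb_rank = {i: r for r, i in enumerate(combined, 1)}
    return [comb_rank[i] for i in idx]

ranks = rank_space_ranks(
    candidates=[(1, 4), (3, 1), (1, 2), (1, 1), (7, 5)],
    qualities=[4, 3, 2, 1, 0],
    selected=[(5, 1)],
)
```

Feeding the combined ranks into the rank method's probabilities yields the rank-space fitness.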

GA’s and Local Maxima
Quality metrics only:
  Susceptible to local max problems
Quality + Diversity:
  Can populate all local maxima
    Including global max
Key: Population must be large enough

Genetic Algorithms
Evolution mechanisms as search technique
  Produce offspring with variation
    Mutation, Crossover
  Select “fittest” to continue to next generation
    Fitness: Probability of survival
      Standard: Quality values only
      Rank: Quality rank only
      Rank-space: Rank of sum of quality & diversity ranks
Large population can be robust to local max

Machine Learning: Nearest Neighbor & Information Retrieval Search
Artificial Intelligence CMSC 25000 January 27, 2004

Agenda Machine learning: Introduction Nearest neighbor techniques
Applications: Robotic motion, Credit rating Information retrieval search Efficient implementations: k-d trees, parallelism Extensions: K-nearest neighbor Limitations: Distance, dimensions, & irrelevant attributes

Machine Learning Learning: Acquiring a function, based on past inputs and values, from new inputs to values. Learn concepts, classifications, values Identify regularities in data

Machine Learning Examples
Pronunciation: Spelling of word => sounds Speech recognition: Acoustic signals => sentences Robot arm manipulation: Target => torques Credit rating: Financial data => loan qualification

Machine Learning Characterization
Distinctions: Are output values known for any inputs? Supervised vs unsupervised learning Supervised: training consists of inputs + true output value E.g. letters+pronunciation Unsupervised: training consists only of inputs E.g. letters only Course studies supervised methods

Distinctions: Are output values discrete or continuous? Discrete: “Classification” E.g. Qualified/Unqualified for a loan application Continuous: “Regression” E.g. Torques for robot arm motion Characteristic of task

Distinctions: What form of function is learned?
  Also called “inductive bias”
  Graphically, decision boundary
  E.g. Single, linear separator
    Rectangular boundaries - ID trees
    Voronoi spaces…etc…

Machine Learning Functions
Problem: Can the representation effectively model the class to be learned?
Motivates selection of learning algorithm
[Figure: scatter of - and + instances]
For this function,
  Linear discriminant is GREAT!
  Rectangular boundaries (e.g. ID trees) TERRIBLE!
Pick the right representation!

Machine Learning Features
Inputs: E.g.words, acoustic measurements, financial data Vectors of features: E.g. word: letters ‘cat’: L1=c; L2 = a; L3 = t Financial data: F1= # late payments/yr : Integer F2 = Ratio of income to expense: Real

Machine Learning Features
Question: Which features should be used? How should they relate to each other? Issue 1: How do we define relation in feature space if features have different scales? Solution: Scaling/normalization Issue 2: Which ones are important? If differ in irrelevant feature, should ignore

Complexity & Generalization
Goal: Predict values accurately on new inputs Problem: Train on sample data Can make arbitrarily complex model to fit BUT, will probably perform badly on NEW data Strategy: Limit complexity of model (e.g. degree of equ’n) Split training and validation sets Hold out data to check for overfitting

Nearest Neighbor
Memory- or case-based learning
Supervised method:
  Training: Record labeled instances and feature-value vectors
  For each new, unlabeled instance
    Identify “nearest” labeled instance
    Assign same label
Consistency heuristic: Assume that a property is the same as that of the nearest reference case.

Nearest Neighbor Example
Problem: Robot arm motion Difficult to model analytically Kinematic equations Relate joint angles and manipulator positions Dynamics equations Relate motor torques to joint angles Difficult to achieve good results modeling robotic arms or human arm Many factors & measurements

Solution: Move robot arm around Record parameters and trajectory segment Table: torques, positions,velocities, squared velocities, velocity products, accelerations To follow a new path: Break into segments Find closest segments in table Get those torques (interpolate as necessary)

Issue: Big table First time with new trajectory “Closest” isn’t close Table is sparse - few entries Solution: Practice As attempt trajectory, fill in more of table After few attempts, very close

Nearest Neighbor Example II
Credit Rating:
  Classifier: Good / Poor
  Features: L = # late payments/yr; R = Income/Expenses

Name  L  R  G/P
A           G
B           P
C           G
D           P
E           P
F           G
G           G
H           P
[L and R values not recoverable]

[Figure: instances A-H plotted with L on the horizontal axis (ticks at 10, 20, 30) and R on the vertical axis (tick at 1)]

New instances:
Name  L  R  G/P
I           G
J           P
K           ??

[Figure: instances A-K plotted on the same axes]

Distance Measure: Sqrt((L1-L2)^2 + [sqrt(10)*(R1-R2)]^2) - Scaled distance
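
The scaled distance and the nearest-neighbor lookup might be sketched as below. The names `scaled_distance` and `nearest_label` are hypothetical, and the example points in the usage note are made up, since the L and R values of the credit table are not available here.

```python
import math

def scaled_distance(a, b, r_scale=math.sqrt(10)):
    """Distance between two (L, R) points with the R difference scaled by sqrt(10)."""
    (l1, r1), (l2, r2) = a, b
    return math.sqrt((l1 - l2) ** 2 + (r_scale * (r1 - r2)) ** 2)

def nearest_label(query, examples):
    """Return the label of the closest example; examples are ((L, R), label) pairs."""
    _, label = min(examples, key=lambda ex: scaled_distance(query, ex[0]))
    return label

# Hypothetical training data, not the values from the slide.
examples = [((2, 1.5), "G"), ((20, 0.3), "P")]
prediction = nearest_label((4, 1.2), examples)
```

Scaling R keeps the feature with the smaller numeric range from being swamped by L.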

Models for Retrieval and Classification
Plethora of models are used Here: Vector Space Model N-grams HMMs

Vector Space Information Retrieval
Task: Document collection Query specifies information need: free text Relevance judgments: 0/1 for all docs Word evidence: Bag of words No ordering information

Vector Space Model
[Figure: vectors plotted on axes Tv, Program, Computer]
Two documents: computer program, tv program
Query:
  computer program: matches 1st doc: exact: distance=2 vs 0
  educational program: matches both equally: distance=1

Vector Space Model
Represent documents and queries as
  Vectors of term-based features
    Features: tied to occurrence of terms in collection
  E.g. Solution 1: Binary features: t=1 if present, 0 otherwise
    Similarity: number of terms in common
      Dot product
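
A minimal sketch of the binary-feature model, using the two documents and two queries from the earlier slide. The helper names `binary_vector` and `dot` are hypothetical, and whitespace tokenization is an assumption.

```python
def binary_vector(text, vocabulary):
    """1 if the term occurs in the text, 0 otherwise, for each vocabulary term."""
    words = set(text.lower().split())
    return [1 if term in words else 0 for term in vocabulary]

def dot(u, v):
    """Dot product; for binary vectors this counts the terms in common."""
    return sum(a * b for a, b in zip(u, v))

vocabulary = ["computer", "program", "tv"]
docs = [binary_vector(d, vocabulary) for d in ("computer program", "tv program")]

query_1 = binary_vector("computer program", vocabulary)
query_2 = binary_vector("educational program", vocabulary)
scores_1 = [dot(query_1, d) for d in docs]
scores_2 = [dot(query_2, d) for d in docs]
```

The first query shares two terms with the first document and one with the second, while the second query shares one term with each, so it cannot tell them apart.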

Vector Space Model II Problem: Not all terms equally interesting
E.g. the vs dog vs Levow Solution: Replace binary term features with weights Document collection: term-by-document matrix View as vector in multidimensional space Nearby vectors are related Normalize for vector length

Vector Similarity Computation
Similarity = Dot product Normalization: Normalize weights in advance Normalize post-hoc
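
Post-hoc length normalization can be sketched as cosine similarity: the dot product divided by the product of the two vector lengths. The function name `cosine_similarity` is hypothetical, and returning 0.0 for an all-zero vector is an assumption.

```python
import math

def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their lengths."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumption: an empty vector matches nothing
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)
```

Dividing every weight by the vector length in advance and then taking a plain dot product gives the same score.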

Term Weighting
“Aboutness”
  To what degree is this term what document is about?
  Within document measure
  Term frequency (tf): # occurrences of t in doc j
“Specificity”
  How surprised are you to see this term?
  Collection frequency
  Inverse document frequency (idf):

Term Selection & Formation
Some terms are truly useless Too frequent, no content E.g. the, a, and,… Stop words: ignore such terms altogether Creation: Too many surface forms for same concepts E.g. inflections of words: verb conjugations, plural Stem terms: treat all forms as same underlying

Matching Topics and Documents
Two main perspectives: Pre-defined, fixed, finite topics: “Text Classification” Arbitrary topics, typically defined by statement of information need (aka query) “Information Retrieval”

Matching Topics and Documents
Documents are “about” some topic(s) Question: Evidence of “aboutness”? Words !! Possibly also meta-data in documents Tags, etc Model encodes how words capture topic E.g. “Bag of words” model, Boolean matching What information is captured? How is similarity computed?

Efficient Implementations
Classification cost: Find nearest neighbor: O(n) Compute distance between unknown and all instances Compare distances Problematic for large data sets Alternative: Use binary search to reduce to O(log n)

Efficient Implementation: K-D Trees
Divide instances into sets based on features
  Binary branching: E.g. feature > value
  2^d leaves with a split path of depth d: 2^d = n, so d = O(log n)
To split cases into sets,
  If there is one element in the set, stop
  Otherwise pick a feature to split on
  Find average position of two middle objects on that dimension
  Split remaining objects based on average position
  Recursively split subsets
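
The splitting procedure can be sketched as a recursive build plus a descent to a leaf. This is an illustration under assumptions: the names `build_kd_tree` and `classify` are hypothetical, features are cycled by depth (the slide only says to pick a feature), the input list must be non-empty, and the sample points are made up.

```python
def build_kd_tree(points, depth=0):
    """points: non-empty list of (feature_vector, label) pairs."""
    if len(points) == 1:
        return {"leaf": points[0]}
    axis = depth % len(points[0][0])  # assumption: cycle through the features
    ordered = sorted(points, key=lambda p: p[0][axis])
    mid = len(ordered) // 2
    # Threshold is the average position of the two middle objects.
    threshold = (ordered[mid - 1][0][axis] + ordered[mid][0][axis]) / 2
    return {
        "axis": axis,
        "threshold": threshold,
        "low": build_kd_tree(ordered[:mid], depth + 1),
        "high": build_kd_tree(ordered[mid:], depth + 1),
    }

def classify(tree, x):
    """Follow the threshold tests down to a leaf and return its label."""
    while "leaf" not in tree:
        branch = "high" if x[tree["axis"]] > tree["threshold"] else "low"
        tree = tree[branch]
    return tree["leaf"][1]

# Hypothetical (L, R) instances with Good/Poor labels.
tree = build_kd_tree([((2, 1.5), "G"), ((5, 0.4), "P"), ((12, 1.0), "G"), ((25, 0.5), "P")])
```

Descending to a single leaf takes O(log n) tests, but the leaf reached is not guaranteed to be the true nearest neighbor for queries near a split boundary.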

K-D Trees: Classification
[Figure: decision tree with Yes/No branches. Tests shown: L > 17.5?, L > 9?, R > 0.6?, R > 0.75?, R > ?, R > ?. Leaf labels: Poor, Good, Good, Poor, Good, Good, Poor, Good]

Efficient Implementation: Parallel Hardware
Classification cost: # distance computations Const time if O(n) processors Cost of finding closest Compute pairwise minimum, successively O(log n) time

Nearest Neighbor: Issues
Prediction can be expensive if many features Affected by classification, feature noise One entry can change prediction Definition of distance metric How to combine different features Different types, ranges of values Sensitive to feature selection

Nearest Neighbor Analysis
Problem: Ambiguous labeling, Training Noise Solution: K-nearest neighbors Not just single nearest instance Compare to K nearest neighbors Label according to majority of K What should K be? Often 3, can train as well
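
A K-nearest-neighbor vote can be sketched in a few lines. The function name `knn_label` is hypothetical, plain Euclidean distance is assumed, and the data points are made up to show one noisy label being outvoted.

```python
import math
from collections import Counter

def knn_label(query, examples, k=3):
    """Label the query by majority vote among its k closest examples."""
    nearest = sorted(examples, key=lambda ex: math.dist(query, ex[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical data: the point at (0, 0) carries a noisy "P" label.
examples = [((0, 0), "P"), ((1, 0), "G"), ((0, 1), "G"), ((5, 5), "P")]
single = knn_label((0.1, 0.1), examples, k=1)
voted = knn_label((0.1, 0.1), examples, k=3)
```

With K = 1 the noisy point decides the label; with K = 3 its two neighbors outvote it. An odd K avoids ties in two-class problems.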

Nearest Neighbor: Analysis
Issue: What is a good distance metric? How should features be combined? Strategy: (Typically weighted) Euclidean distance Feature scaling: Normalization Good starting point: (Feature - Feature_mean)/Feature_standard_deviation Rescales all values - Centered on 0 with std_dev 1
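
The suggested rescaling, (Feature - Feature_mean)/Feature_standard_deviation, can be sketched as follows. The function name `standardize` is hypothetical, and using the population standard deviation (rather than the sample one) and mapping a constant feature to zeros are assumptions.

```python
import statistics

def standardize(values):
    """Rescale one feature column to mean 0 and standard deviation 1."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # assumption: population standard deviation
    if std == 0:
        return [0.0 for _ in values]  # constant feature carries no information
    return [(v - mean) / std for v in values]

scaled = standardize([2, 4, 6])
```

Each feature is standardized separately, so features with large raw ranges no longer dominate the distance.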

Nearest Neighbor: Analysis
Issue: What features should we use? E.g. Credit rating: Many possible features Tax bracket, debt burden, retirement savings, etc.. Nearest neighbor uses ALL Irrelevant feature(s) could mislead Fundamental problem with nearest neighbor

Nearest Neighbor: Advantages
Fast training: Just record feature vector - output value set Can model wide variety of functions Complex decision boundaries Weak inductive bias Very generally applicable

Summary Machine learning:
Acquire function from input features to value Based on prior training instances Supervised vs Unsupervised learning Classification and Regression Inductive bias: Representation of function to learn Complexity, Generalization, & Validation

Learning: Nearest Neighbor
Artificial Intelligence CMSC 25000 March 6, 2003

Summary: Nearest Neighbor
Training: record input vectors + output value Prediction: closest training instance to new data Efficient implementations Pros: fast training, very general, little bias Cons: distance metric (scaling), sensitivity to noise & extraneous features

The Information Retrieval Task
Goal: Match the information need expressed by user (the Query) With concepts in documents (the Document collection) Issues: How do we represent documents and queries ? How do we know if they're “similar”? Match?

Vector Space Model
Represent documents and queries with
  Pattern of words
    I.e. queries and documents with lots of the same words
  Vector of word occurrences:
    Each position in vector = word
    Value of position x in vector = # times word x occurs
Similarity:
  Dot product of document vector & query vector
  Biggest wins

Three Steps to IR
Three phases:
  Indexing: Build collection of document representations
    Convert web pages to doc-rep
      Vectors of word counts
  Query construction:
    Convert query text to vector of word counts
  Retrieval:
    Compute similarity between query and doc representation
    Return closest match: Distance = 1 - similarity

Issues
Normalization:
  Document lengths, query lengths affect score
Weight calculation:
  Term Frequency (tf) * Inverse Document Frequency (idf) [log N/n]
  Additional Scaling
Efficiency:
  Inverted indexes; sparse matrix computation
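
The tf * idf weight can be sketched as below, reading N as the number of documents in the collection and n as the number of documents containing the term. The function name `tf_idf_vectors` is hypothetical; whitespace tokenization and the natural logarithm are assumptions, since the slide does not specify a base.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Return one {term: tf * log(N / n)} mapping per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # n: number of documents that contain each term at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: tf[t] * math.log(n_docs / doc_freq[t]) for t in tf})
    return vectors

weights = tf_idf_vectors(["computer program", "tv program"])
```

A term that occurs in every document, such as "program" here, gets weight log(1) = 0 and stops contributing to the similarity score.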
