Machine Learning Foundations of Artificial Intelligence.

Machine Learning Foundations of Artificial Intelligence

2 Learning  What is Learning?  Learning in AI is also called machine learning or pattern recognition.  The basic objective is to allow an intelligent agent to discover autonomously knowledge from experience.  Let’s examine the definition more closely:  “an intelligent agent”: The ability to learn requires a prior level of intelligence and knowledge. Learning has to start from an existing level of capability.  “to discover autonomously”: Learning is fundamentally about an agent recognizing new facts for its own use and acquiring new abilities that reinforce its own existing abilities. Literal programming, i.e. rote learning from instruction, is not useful.  “knowledge”: Whatever is learned has to be represented in some way that the agent can use. “If you can't represent it, you can't learn it” is a corollary of the slogan “Knowledge is power”.  “from experience”: Experience is typically a set of so-called training examples; examples may be categorized or not. They may be random or selected by a teacher. They may include explanations or not.

Foundations of Artificial Intelligence 3 Learning Agent environment agent ? sensors actuators Problem solver Learning element Critic Percepts Actions KB

Foundations of Artificial Intelligence 4 Learning element  Design of a learning element is affected by  Which components of the performance element are to be learned  What feedback is available to learn these components  What representation is used for the components  Type of feedback:  Supervised learning: correct answers for each training example  Unsupervised learning: correct answers not given  Reinforcement learning: occasional rewards/feedback

Foundations of Artificial Intelligence 5 Inductive Learning  Inductive Learning  inductive learning involves learning generalized rules from specific examples (can think of this as the “inverse” of deduction)  main task: given a set of examples, each classified as positive or negative produce a concept description that matches exactly the positive examples  Some Notes:  The examples are coded in some representation language, e.g. they are coded by a finite set of real-valued features.  The concept description is in a certain language that is presumably a superset of the language of possible example encodings.  A “correct” concept description is one that classifies correctly ALL possible examples, not just those given in the training set.  Fundamental Difficulties with Induction  can’t generalize with perfect certainty  examples and concepts are NOT available “directly”; they are only available through representations which may be more or less adequate to capture them  some examples may be classified as both positive and negative  the features supplied may not be sufficient to discriminate between positive and negative examples

Foundations of Artificial Intelligence 6 Inductive Learning Frameworks 1. Function-learning formulation 2. Logic-inference formulation

Foundations of Artificial Intelligence 7 Inductive learning  Simplest form: learn a function from examples f is the target function An example is a pair (x, f(x)) Problem: find a hypothesis h such that h ≈ f given a training set of examples  This is a highly simplified model of real learning:  Ignores prior knowledge  Assumes examples are given

Foundations of Artificial Intelligence 8 Inductive learning  Construct/adjust h to agree with f on training set  h is consistent if it agrees with f on all examples  E.g., curve fitting:

Foundations of Artificial Intelligence 13 Inductive learning  Construct/adjust h to agree with f on training set  h is consistent if it agrees with f on all examples  E.g., curve fitting:  Ockham’s razor: prefer the simplest hypothesis consistent with data

Foundations of Artificial Intelligence 14 Logic-Inference Formulation  Background knowledge KB  Training set  (observed knowledge) that is not logically implied by KB  Inductive inference: Find h  (inductive hypothesis) such that KB and h imply  Usually, not a sound inference h =  is a trivial, but uninteresting solution (data caching)

Foundations of Artificial Intelligence 15 Rewarded Card Example  Deck of cards, with each card designated by [r,s], its rank and suit, and some cards “rewarded”  Background knowledge KB: ((r=1) v … v (r=10))  NUM(r) ((r=J) v (r=Q) v (r=K))  FACE(r) ((s=S) v (s=C))  BLACK(s) ((s=D) v (s=H))  RED(s)  Training set  : REWARD([4,C])  REWARD([7,C])  REWARD([2,S])   REWARD([5,H])   REWARD([J,S])  Possible inductive hypothesis: h  (NUM(r)  BLACK(s)  REWARD([r,s])) Note: There are several possible inductive hypotheses

Foundations of Artificial Intelligence 16 Learning a Predicate  Set E of objects (e.g., cards)  Goal predicate CONCEPT(x), where x is an object in E,  takes the value True or False (e.g., REWARD)  Observable predicates A(x), B(X), …  e.g., NUM, RED  Training set  values of CONCEPT for some combinations of values of the observable predicates

Foundations of Artificial Intelligence 17 A Possible Training Set Ex. #ABCDECONCEPT 1 True FalseTrueFalse 2 TrueFalse True 3 False True False 4 True FalseTrue 5 FalseTrue False 6 True FalseTrue False 7 TrueFalseTrueFalse 8 TrueFalseTrueFalseTrue 9 False True False 10 True FalseTrue Note that the training set does not say whether an observable predicate A, …, E is pertinent or not

Foundations of Artificial Intelligence 18 Learning a Predicate  Set E of objects (e.g., cards)  Goal predicate CONCEPT(x), where x is an object in E,  takes the value True or False (e.g., REWARD)  Observable predicates A(x), B(X), …  e.g., NUM, RED  Training set  values of CONCEPT for some combinations of values of the observable predicates  Find a representation of CONCEPT in the form: CONCEPT(x)  S(A,B, …) where S(A,B, … ) is a sentence built with the observable predicates, e.g.: CONCEPT(x)  A(x)  (  B(x) v C(x))

Foundations of Artificial Intelligence 19 Example set  An example consists of the values of CONCEPT and the observable predicates for some object x  A example is positive if CONCEPT is True, else it is negative  The set X of all examples is the example set  The training set is a subset of X

Foundations of Artificial Intelligence 20  An hypothesis is any sentence h of the form: CONCEPT(x)  S(A,B, …) where S(A,B, … ) is a sentence built with the observable predicates  The set of all hypotheses is called the hypothesis space H  An hypothesis h agrees with an example if it gives the correct value of CONCEPT Hypothesis Space

Foundations of Artificial Intelligence 21 + + + + + + + + + + + + - - - - - - - - - - - - Example set X {[A, B, …, CONCEPT]} Inductive Learning Scheme Hypothesis space H {[CONCEPT(x)  S(A,B, …)]} Training set  Inductive hypothesis h

Foundations of Artificial Intelligence 22 Size of Hypothesis Space  n observable predicates  2 n entries in truth table  In the absence of any restriction (bias), there are 2 2 n hypotheses to choose from  n = 6  2x10 19 hypotheses!

Foundations of Artificial Intelligence 23 h1  NUM(x)  BLACK(x)  REWARD(x) h2  BLACK([r,s])   (r=J)  REWARD([r,s]) h3  ([r,s]=[4,C])  ([r,s]=[7,C])  [r,s]=[2,S])  REWARD([r,s]) h4   ([r,s]=[5,H])   ([r,s]=[J,S])  REWARD([r,s]) agree with all the examples in the training set Multiple Inductive Hypotheses Rewarded Card Example (Continued)

Foundations of Artificial Intelligence 24 Inductive Bias  Need for a system of preferences – called a bias – to compare possible hypotheses  Keep-It-Simple (KIS) Bias  If an hypothesis is too complex it may not be worth learning it  There are much fewer simple hypotheses than complex ones, hence the hypothesis space is smaller  Examples:  Use much fewer observable predicates than suggested by the training set  Constrain the learnt predicate, e.g., to use only “high-level” observable predicates such as NUM, FACE, BLACK, and RED and/or to have simple syntax (e.g., conjunction of literals) If the bias allows only sentences S that are conjunctions of k << n predicates picked from the n observable predicates, then the size of H is O(n k )

Foundations of Artificial Intelligence 25 Version Spaces  Idea: assume you are looking for a CONJUNCTIVE CONCEPT  e.g., spade A, club 7, club 9yes club 8, heart 5 no  concept: odd and black  now notice that the set of conjunctive concepts is partially ordered by specificity any card black odd black spade odd spade 3 of spade at any point, keep most specific and least specific conjuncts consistent with data: most specific: anything more specific misses some positive instances always exists -- conjoin all OK conjunctions least specific: anything less specific admits some negative instances may not be unique -- imagine all you know is club 4 not ok, odd black ok, spade ok, black not ok Idea is to gradually merge least and most specific as data comes in.

Foundations of Artificial Intelligence 26 Version Spaces: Example  Step 0: most specific concept (msc) is the empty set; least specific concept (lsc) is the set of all cards.  Step 1: A-spade is found to be in target set:  msc = {A-spade}  lsc = set of all cards  Step 2: 7-club is found to be in target set:  msc = odd black cards  lsc = set of all cards  Step 3: 8-heart is not in target set  msc = odd black cards  lsc = all odd cards OR all black cards ... The training examples (obtained) incrementally:

Foundations of Artificial Intelligence 27 Predicate as a Decision Tree The predicate CONCEPT(x)  A(x)  (  B(x) v C(x)) can be represented by the following decision tree: A? B? C? True FalseTrue False Example: A mushroom is poisonous iff it is yellow and small, or yellow, big and spotted x is a mushroom CONCEPT = POISONOUS A = YELLOW B = BIG C = SPOTTED

Foundations of Artificial Intelligence 28 Decision Trees  What is a Decision Tree  it takes as input the description of a situation as a set of attributes (features) and outputs a yes/no decision (so it represents a Boolean function)  each leaf is labeled "positive” or "negative", each node is labeled with an attribute (or feature), and each edge is labeled with a value for the feature of its parent node  Attribute-value language for examples  in many inductive tasks, especially learning decision trees, we need a representation language for examples  each example is a finite feature vector  a concept is a decision tree where nodes are features

Foundations of Artificial Intelligence 29 Decision Trees  Example: “is it a good day to play golf?”  a set of attributes and their possible values: outlooksunny, overcast, rain temperaturecool, mild, hot humidityhigh, normal windytrue, false A particular instance in the training set might be: : play In this case, the target class is a binary attribute, so each instance represents a positive or a negative example.

Foundations of Artificial Intelligence 30 Using Decision Trees for Classification  Examples can be classified as follows  1. look at the example's value for the feature specified  2. move along the edge labeled with this value  3. if you reach a leaf, return the label of the leaf  4. otherwise, repeat from step 1  Example (a decision tree to decide whether to go play golf): outlook humiditywindy yes no yes no sunny overcast rain highnormal true false

Foundations of Artificial Intelligence 31 Classification: 3 Step Process  1. Model construction (Learning):  Each record (instance) is assumed to belong to a predefined class, as determined by one of the attributes, called the class label  The set of records used for construction of the model is called training set  The model is usually represented in the form of classification rules, (IF- THEN statements) or decision trees  2. Model Evaluation (Accuracy):  Estimate accuracy rate of the model based on a test set  The known label of test sample is compared to classified result from model  Accuracy rate: percentage of test set samples correctly classified by the model  Test set is independent of training set otherwise over-fitting will occur  3. Model Use (Classification):  The model is used to classify unseen instances (assigning class labels)  Predict the value of an actual attribute

Foundations of Artificial Intelligence 32 Memory-Based Reasoning  Basic Idea: classify new instances based on their similarity to instances we have seen before  also called “instance-based learning”  Simplest form of MBR: Rote Learning  learning by memorization  save all previously encountered instance; given a new instance, find one from the memorized set that most closely “resembles” the new one; assign new instance to the same class as the “nearest neighbor”  more general methods try to find k nearest neighbors rather than just one  but, how do we define “resembles?”  MBR is “lazy”  defers all of the real work until new instance is obtained; no attempts are made to learn a generalized model from the training set  less data preprocessing and model evaluation, but more work has to be done at classification time

Foundations of Artificial Intelligence 33  Collaborative Filtering or “Social Learning”  idea is to give recommendations to a user based on the “ratings” of objects by other users  usually assumes that features in the data are similar objects (e.g., Web pages, music, movies, etc.)  usually requires “explicit” ratings of objects by users based on a rating scale  there have been some attempts to obtain ratings implicitly based on user behavior (mixed results; problem is that implicit ratings are often binary)  Nearest Neighbors Strategy:  Find similar users and predicted (weighted) average of user ratings  We can use any distance or similarity measure to compute similarity among users (user ratings on items viewed as a vector)  In case of ratings, often the Pearson r algorithm is used to compute correlations MBR & Collaborative Filtering

Foundations of Artificial Intelligence 34 MBR & Collaborative Filtering  Collaborative Filtering Example  A movie rating system  Ratings scale: 1 = “detest”; 7 = “love it”  Historical DB of users includes ratings of movies by Sally, Bob, Chris, and Lynn  Karen is a new user who has rated 3 movies, but has not yet seen “Independence Day”; should we recommend it to her? Will Karen like “Independence Day?”

Foundations of Artificial Intelligence 35 Clustering  Cluster:  a collection of data objects that are “similar” to one another and thus can be treated collectively as one group  but as a collection, they are sufficiently different from other groups  Clustering  unsupervised classification  no predefined classes Clustering is a process of partitioning a set of data (or objects) in a set of meaningful sub-classes, called clusters Helps users understand the natural grouping or structure in a data set

Foundations of Artificial Intelligence 36 Distance or Similarity Measures  Measuring Distance  In order to group similar items, we need a way to measure the distance between objects (e.g., records)  Note: distance = inverse of similarity  Often based on the representation of objects as “feature vectors” An Employee DB Term Frequencies for Documents

Foundations of Artificial Intelligence 37 Distance or Similarity Measures  Common Distance Measures:  Manhattan distance:  Euclidean distance:  Cosine similarity:

Foundations of Artificial Intelligence 38 What Is Good Clustering?  A good clustering will produce high quality clusters in which:  the intra-class (that is, intra-cluster) similarity is high  the inter-class similarity is low  The quality of a clustering result also depends on both the similarity measure used by the method and its implementation  The quality of a clustering method is also measured by its ability to discover some or all of the hidden patterns  The quality of a clustering result also depends on the definition and representation of cluster chosen

Foundations of Artificial Intelligence 39 Applications of Clustering  Clustering has wide applications in Pattern Recognition  Spatial Data Analysis:  create thematic maps in GIS by clustering feature spaces  detect spatial clusters and explain them in spatial data mining  Image Processing  Market Research  Information Retrieval  Document or term categorization  Information visualization and IR interfaces  Web Mining  Cluster Web usage data to discover groups of similar access patterns  Web Personalization

Foundations of Artificial Intelligence 40 Learning by Discovery  One example: AM by Doug Lenat at Stanford  a mathematical system  inputs: set theory (union, intersection, etc); “how to do mathematics” (based on a book by Polya), e.g., if f is an interesting function of two arguments, then f(x,x) is an interesting function on one, etc.  speculated about what was interesting an made conjectures, etc.  What AM discovered  integers (as equivalence relation on cardinality of sets)  addition (using disjoint union of sets)  multiplication  primes: 1 was interesting, the function returning the cardinality of set of divisors was interesting, etc.  Glodbach’s conjecture: “all even numbers are the sum of two prime numbers”; (note that AM did not prove it, just discovered that it was interesting)  Why was AM so successful?  Connection between LISP and mathematics (mutations of small bits of LISP code are likely to be interesting)  Doesn’t extend to other domains  Lessons from EURISKO (fleet game)

Foundations of Artificial Intelligence 41 Explanation-Based Learning  Explanation- based learning (EBL) systems try to explain why each training instance belongs to the target concept.  The resulting “proof” is then generalized and saved.  If a new instance can be explained in the same manner as a previous instance, then it is also assumed to be a member of the target concept.  Like macro- operators, EBL systems never learn to solve a problem that they couldn’t solve before (in principle).  However, they can become much more efficient at problem-solving by reorganizing the search space.  One of the strengths of EBL is that the resulting “explanations” are typically easy to understand.  One of the weaknesses of EBL is that they rely on a domain theory to generate the explanations.

Foundations of Artificial Intelligence 42 Case-Based Learning  Case-based reasoning (CBR) systems keep track of previously seen instances and apply them directly to new ones.  In general, a CBR system simply stores each “case” that it experiences in a “case base” which represents its memory of previous episodes.  To reason about a new instance, the system consults its case base and finds the most similar case that it’s seen before. The old case is then adapted and applied to the new situation.  CBR is similar to reasoning by analogy. Many people believe that much of human learning is case- based in nature.

Foundations of Artificial Intelligence 43 Connectionist Algorithms  Connectionist models (also called neural networks) are inspired by the interconnectivity of the brain.  Connectionist networks typically consist of many nodes that are highly interconnected. When a node is activated, it sends signals to other nodes so that they are activated in turn.  Using layers of nodes allows connectionist models to learn fairly complex functions.  Neural networks are loosely modeled after the biological processes involved in cognition:  1. Information processing involves many simple elements called neurons.  2. Signals are transmitted between neurons using connecting links.  3. Each link has a weight that controls the strength of its signal.  4. Each neuron applies an activation function to the input that it receives from other neurons. This function determines its output.

Machine Learning Foundations of Artificial Intelligence.

Similar presentations

Presentation on theme: "Machine Learning Foundations of Artificial Intelligence."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Machine Learning Foundations of Artificial Intelligence.

Similar presentations

Presentation on theme: "Machine Learning Foundations of Artificial Intelligence."— Presentation transcript:

Similar presentations

About project

Feedback