Knowledge Representation

Presentation transcript:

Knowledge Representation. 573 Topics: Perception, NLP, Robotics, Multi-agent, Reinforcement Learning, MDPs, Supervised Learning, Planning, Uncertainty, Search, Knowledge Representation, Expressiveness, Tractability, Problem Spaces, Agency. © Daniel S. Weld

Weekly Course Outline: Overview, agents, problem spaces; Search; Learning; Planning (and CSPs); Uncertainty; Planning under uncertainty; Reinforcement learning; Topics; Project Presentations & Contest. (Brace labels on the slide: Search, Knowledge Representation.) © Daniel S. Weld

Search. Problem spaces. Blind: depth-first, breadth-first, iterative-deepening, iterative broadening. Informed: best-first, Dijkstra's, A*, IDA*, SMA*, DFB&B, beam, hill climbing, limited discrepancy, AO*, LAO*, RTDP. Local search. Heuristics: evaluation, construction, learning; pattern databases. Online methods. Techniques from operations research. Constraint satisfaction. Adversary search. © Daniel S. Weld

KR. Propositional logic: syntax (CNF, Horn clauses, …), semantics (truth tables), inference (FC, Resolution, DPLL, WalkSAT). First-order logic: syntax (quantifiers, Skolem functions, …), semantics (interpretations), inference (FC, Resolution, Compilation). Frame systems. Two theses… Representing events, action & change. © Daniel S. Weld

Machine Learning Overview What is Inductive Learning? Understand a ML technique in depth Induction of Decision Trees Bagging Overfitting Applications © Daniel S. Weld

Inductive Learning. Inductive learning, or “prediction”: given examples of a function (X, F(X)), predict the function F(X) for new examples X. Classification: F(X) = discrete. Regression: F(X) = continuous. Probability estimation: F(X) = Probability(X). © Daniel S. Weld

Widely-used Approaches Decision trees Rule induction Bayesian learning Neural networks Genetic algorithms Instance-based learning Support Vector Machines Etc. © Daniel S. Weld

Defining a Learning Problem. A program is said to learn from experience E with respect to task T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E. Task T: Playing checkers. Performance Measure P: Percent of games won against opponents. Experience E: Playing practice games against itself. © Daniel S. Weld

Issues. What feedback (experience) is available? How should these features be represented? What kind of knowledge is being increased? How is that knowledge represented? What prior information is available? What is the right learning algorithm? How to avoid overfitting? © Daniel S. Weld

Choosing the Training Experience. Credit assignment problem. Direct training examples, e.g. individual checker boards plus the correct move for each: supervised learning. Indirect training examples, e.g. a complete sequence of moves and the final result: reinforcement learning. Which examples: random, teacher chooses, learner chooses. © Daniel S. Weld

Choosing the Target Function. What type of knowledge will be learned? How will the knowledge be used by the performance program? E.g. a checkers program: assume it knows the legal moves and needs to choose the best move. So learn the function F: Boards -> Moves (hard to learn). Alternative: F: Boards -> R. Note the similarity to the choice of problem space. © Daniel S. Weld

The Ideal Evaluation Function. V(b) = 100 if b is a final, won board; V(b) = -100 if b is a final, lost board; V(b) = 0 if b is a final, drawn board; otherwise, if b is not final, V(b) = V(s) where s is the best reachable final board. Nonoperational… Want an operational approximation V̂ of V. © Daniel S. Weld

How Represent Target Function x1 = number of black pieces on the board x2 = number of red pieces on the board x3 = number of black kings on the board x4 = number of red kings on the board x5 = num of black pieces threatened by red x6 = num of red pieces threatened by black V(b) = a + bx1 + cx2 + dx3 + ex4 + fx5 + gx6 Now just need to learn 7 numbers! © Daniel S. Weld
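As a concrete illustration, evaluating this representation is just a weighted sum of the six board features. The sketch below is mine: the feature extractor, the example weights, and the board encoding are hypothetical, not from the slide.

def board_features(board):
    # Hypothetical encoding: counts of black/red pieces, black/red kings,
    # and black/red pieces under threat (x1..x6 from the slide).
    return (board["bp"], board["rp"], board["bk"], board["rk"], board["bt"], board["rt"])

def evaluate(board, w):
    # V(b) = a + b*x1 + c*x2 + d*x3 + e*x4 + f*x5 + g*x6,
    # with w = (a, b, c, d, e, f, g): the seven numbers to be learned.
    x = board_features(board)
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))

w = (0.0, 1.0, -1.0, 3.0, -3.0, -0.5, 0.5)                        # illustrative weights, not learned ones
board = {"bp": 12, "rp": 11, "bk": 0, "rk": 1, "bt": 2, "rt": 1}  # hypothetical position summary
print(evaluate(board, w))                                         # 12 - 11 + 0 - 3 - 1 + 0.5 = -2.5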

Example: Checkers. Task T: Playing checkers. Performance Measure P: Percent of games won against opponents. Experience E: Playing practice games against itself. Target Function V: board -> R. Representation of the approximation of the target function: V(b) = a + bx1 + cx2 + dx3 + ex4 + fx5 + gx6. © Daniel S. Weld

Target Function Profound Formulation: Can express any type of inductive learning as approximating a function E.g., Checkers V: boards -> evaluation E.g., Handwriting recognition V: image -> word E.g., Mushrooms V: mushroom-attributes -> {E, P} © Daniel S. Weld

More Examples Collaborative Filtering (Parts of) Robocode Controller © Daniel S. Weld

Bias (concept introduced by name here). The problem of disjunction. Two types of bias: restriction and preference. © Daniel S. Weld

Decision Trees. Convenient representation: developed with learning in mind, deterministic. Expressive: equivalent to propositional DNF; handles discrete and continuous parameters. Simple learning algorithm: handles noise well. Classify the algorithm as follows: constructive (build the DT by adding nodes), eager, batch (but incremental versions exist). © Daniel S. Weld

Concept Learning. E.g. learn the concept “edible mushroom”; the target function has two values: T or F. Represent concepts as decision trees. Use hill-climbing search through the space of decision trees: start with a simple concept and refine it into a complex concept as needed. © Daniel S. Weld

Example: “Good day for tennis” Attributes of instances Wind Temperature Humidity Outlook Feature = attribute with one value E.g. outlook = sunny Sample instance wind=weak, temp=hot, humidity=high, outlook=sunny © Daniel S. Weld

Experience: “Good day for tennis”

Day  Outlook  Temp  Humid  Wind  PlayTennis?
d1   s        h     h      w     n
d2   s        h     h      s     n
d3   o        h     h      w     y
d4   r        m     h      w     y
d5   r        c     n      w     y
d6   r        c     n      s     y
d7   o        c     n      s     y
d8   s        m     h      w     n
d9   s        c     n      w     y
d10  r        m     n      w     y
d11  s        m     n      s     y
d12  o        m     h      s     y
d13  o        h     n      w     y
d14  r        m     h      s     n
© Daniel S. Weld
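For reference in the sketches that follow, the table above can be encoded directly in Python; the variable name and field layout are my own choices.

# Each example: (day, outlook, temp, humid, wind, play_tennis)
tennis_data = [
    ("d1",  "s", "h", "h", "w", "n"),
    ("d2",  "s", "h", "h", "s", "n"),
    ("d3",  "o", "h", "h", "w", "y"),
    ("d4",  "r", "m", "h", "w", "y"),
    ("d5",  "r", "c", "n", "w", "y"),
    ("d6",  "r", "c", "n", "s", "y"),
    ("d7",  "o", "c", "n", "s", "y"),
    ("d8",  "s", "m", "h", "w", "n"),
    ("d9",  "s", "c", "n", "w", "y"),
    ("d10", "r", "m", "n", "w", "y"),
    ("d11", "s", "m", "n", "s", "y"),
    ("d12", "o", "m", "h", "s", "y"),
    ("d13", "o", "h", "n", "w", "y"),
    ("d14", "r", "m", "h", "s", "n"),
]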

Decision Tree Representation. Good day for tennis? Leaves = classification. Arcs = choice of value for parent attribute.

Outlook
  Sunny    -> Humidity
                High   -> No
                Normal -> Yes
  Overcast -> Yes
  Rain     -> Wind
                Strong -> No
                Weak   -> Yes

A decision tree is equivalent to logic in disjunctive normal form:
G-Day ≡ (Sunny ∧ Normal) ∨ Overcast ∨ (Rain ∧ Weak)
© Daniel S. Weld
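One way to hold the tree shown above in code is a nested dictionary, with classification done by walking from the root to a leaf. This is a minimal sketch of my own; the representation choice is not from the deck.

# Leaves are class labels; internal nodes map an attribute name to its branches.
tennis_tree = {"outlook": {
    "sunny":    {"humidity": {"high": "no", "normal": "yes"}},
    "overcast": "yes",
    "rain":     {"wind": {"strong": "no", "weak": "yes"}},
}}

def classify(tree, instance):
    # Follow the branch matching the instance's value until a leaf (a plain label) is reached.
    while isinstance(tree, dict):
        attribute = next(iter(tree))
        tree = tree[attribute][instance[attribute]]
    return tree

print(classify(tennis_tree, {"outlook": "sunny", "humidity": "normal"}))  # yes
print(classify(tennis_tree, {"outlook": "rain", "wind": "strong"}))       # no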

DT Learning as Search. Nodes: decision trees. Operators: tree refinement (sprouting the tree). Initial node: smallest tree possible, a single leaf. Heuristic?: information gain. Goal?: best tree possible (???). © Daniel S. Weld

What is the Simplest Tree? (Training data table as on the earlier slide.) How good? [10+, 4-] means: correct on 10 examples, incorrect on 4 examples. © Daniel S. Weld

Successors: the single leaf Yes, and trees splitting on Humid, Wind, Outlook, or Temp. Which attribute should we use to split? © Daniel S. Weld

To be decided: How to choose best attribute? Information gain Entropy (disorder) When to stop growing tree? © Daniel S. Weld

Disorder is bad; homogeneity is good. (Figure: example splits labeled "No", "Better", "Good".) © Daniel S. Weld

Entropy. (Figure: plot of entropy, from 0 to 1.0, against the proportion of positive examples P, from .00 to 1.00; entropy is maximal at P = .50.) © Daniel S. Weld

Entropy (disorder) is bad; homogeneity is good. Let S be a set of examples. Entropy(S) = -P log2(P) - N log2(N), where P is the proportion of positive examples, N is the proportion of negative examples, and 0 log2 0 == 0. Example: S has 10 pos and 4 neg. Entropy([10+, 4-]) = -(10/14) log2(10/14) - (4/14) log2(4/14) = 0.863. © Daniel S. Weld
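A one-function sketch of the same calculation, assuming nothing beyond the formula above:

from math import log2

def entropy(p_count, n_count):
    # Entropy(S) = -P log2(P) - N log2(N), with 0 log2 0 taken to be 0.
    total = p_count + n_count
    result = 0.0
    for count in (p_count, n_count):
        if count:
            fraction = count / total
            result -= fraction * log2(fraction)
    return result

print(round(entropy(10, 4), 3))  # 0.863, the [10+, 4-] example above
print(entropy(7, 7))             # 1.0: maximal disorder
print(entropy(14, 0))            # 0.0: perfectly homogeneous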

Information Gain. Measure of the expected reduction in entropy resulting from splitting along an attribute A: Gain(S, A) = Entropy(S) - Σ_{v ∈ Values(A)} (|S_v| / |S|) Entropy(S_v), where Entropy(S) = -P log2(P) - N log2(N). © Daniel S. Weld

Gain of Splitting on Wind

Day  Wind  Tennis?
d1   weak  n
d2   s     n
d3   weak  yes
d4   weak  yes
d5   weak  yes
d6   s     yes
d7   s     yes
d8   weak  n
d9   weak  yes
d10  weak  yes
d11  s     yes
d12  s     yes
d13  weak  yes
d14  s     n

Values(wind) = {weak, s}. S = [10+, 4-], Sweak = [6+, 2-], Ss = [3+, 3-].
Gain(S, wind) = Entropy(S) - Σ_{v ∈ {weak, s}} (|S_v| / |S|) Entropy(S_v)
             = Entropy(S) - (8/14) Entropy(Sweak) - (6/14) Entropy(Ss)
             = 0.863 - (8/14)(0.811) - (6/14)(1.00) = -0.029
© Daniel S. Weld
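The following sketch (mine, not from the deck) reproduces the slide's arithmetic from the printed counts. Note that the printed Ss = [3+, 3-] does not match the wind column of the table, which is how the computed gain can come out negative; the next slide acknowledges the value is off.

from math import log2

def entropy(pos, neg):
    total = pos + neg
    return -sum((c / total) * log2(c / total) for c in (pos, neg) if c)

def gain(s_counts, subset_counts):
    # Gain(S, A) = Entropy(S) - sum over v of (|S_v| / |S|) Entropy(S_v)
    s_size = sum(s_counts)
    remainder = sum((sum(sub) / s_size) * entropy(*sub) for sub in subset_counts)
    return entropy(*s_counts) - remainder

# Counts as printed: S = [10+, 4-], Sweak = [6+, 2-], Ss = [3+, 3-]
print(round(gain((10, 4), [(6, 2), (3, 3)]), 3))  # -0.029, the slide's value
# Reading Ss off the table as [4+, 2-] instead gives roughly 0.006.
print(round(gain((10, 4), [(6, 2), (4, 2)]), 3))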

Evaluating Attributes. Gain(S, Wind) = -0.029, Gain(S, Humid) = 0.151, Gain(S, Temp) = 0.029, Gain(S, Outlook) = 0.170. (One value is actually different than this, but ignore this detail.) © Daniel S. Weld

Resulting Tree… Good day for tennis? Outlook: Sunny -> No [2+, 3-]; Overcast -> Yes [4+]; Rain -> No [2+, 3-]. © Daniel S. Weld

Recurse! Outlook = Sunny:

Day  Temp  Humid  Wind  Tennis?
d1   h     h      weak  n
d2   h     h      s     n
d8   m     h      weak  n
d9   c     n      weak  yes
d11  m     n      s     yes
© Daniel S. Weld

One Step Later… Outlook: Sunny -> Humidity (Normal -> Yes [2+]; High -> No [3-]); Overcast -> Yes [4+]; Rain -> No [2+, 3-]. © Daniel S. Weld

Decision Tree Algorithm

BuildTree(TrainingData)
  Split(TrainingData)

Split(D)
  If (all points in D are of the same class) Then Return
  For each attribute A
    Evaluate splits on attribute A
  Use best split to partition D into D1, D2
  Split(D1)
  Split(D2)
© Daniel S. Weld
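Below is a runnable Python sketch of the recursive procedure, using information gain to "evaluate splits" and splitting multi-way on the chosen attribute. The function and variable names are mine; it illustrates the idea rather than the deck's own code.

from collections import Counter
from math import log2

def entropy(labels):
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def info_gain(examples, labels, attribute):
    # Expected reduction in entropy from splitting on this attribute.
    total = len(labels)
    remainder = 0.0
    for value in set(ex[attribute] for ex in examples):
        subset = [lab for ex, lab in zip(examples, labels) if ex[attribute] == value]
        remainder += (len(subset) / total) * entropy(subset)
    return entropy(labels) - remainder

def build_tree(examples, labels, attributes):
    if len(set(labels)) == 1:                       # all points in D are of the same class
        return labels[0]
    if not attributes:                              # nothing left to split on: majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: info_gain(examples, labels, a))
    tree = {best: {}}
    for value in set(ex[best] for ex in examples):
        keep = [i for i, ex in enumerate(examples) if ex[best] == value]
        tree[best][value] = build_tree(
            [examples[i] for i in keep],
            [labels[i] for i in keep],
            [a for a in attributes if a != best],
        )
    return tree

# Usage: examples are dicts such as {"outlook": "s", "temp": "h", "humid": "h", "wind": "w"},
# labels the matching PlayTennis values ("y"/"n"), attributes the list of attribute names.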

Application to Robocode. Assume a fixed architecture: If <in-danger> Then escape: If <ok-escape-dir> Then forward 45 Else … Else attack: If <shooting-best> Then if <gun-aimed> Then … Else ram: … Learn these classifiers from a training set generated by trial battles. © Daniel S. Weld

Movie Recommendation. Features? (Example movies: Rambo, Matrix, Rambo 2.) © Daniel S. Weld

Issues Content vs. Social Non-Boolean Attributes Missing Data Scaling up © Daniel S. Weld

Stopped here © Daniel S. Weld

Missing Data 1. Assign the attribute its most common value at the node:

Day  Temp  Humid  Wind  Tennis?
d1   h     h      weak  n
d2   h     h      s     n
d8   m     h      weak  n
d9   c     ?      weak  yes
d11  m     n      s     yes

Or the most common value, … given the classification. © Daniel S. Weld

Fractional Values. Give d9's missing Humid value fractionally: 75% h and 25% n (3 of the 4 known values are h), and use these fractions in the gain calculations, so the h branch counts [0.75+, 3-] and the n branch counts [1.25+, 0-].

Day  Temp  Humid  Wind  Tennis?
d1   h     h      weak  n
d2   h     h      s     n
d8   m     h      weak  n
d9   c     ?      weak  yes
d11  m     n      s     yes

Further subdivide if other attributes are missing. Use the same approach to classify a test example with a missing attribute: the classification is the most probable classification, summing over the leaves where it got divided. © Daniel S. Weld
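A small sketch (my own illustration) of how the fractional counts above can be accumulated when an example's attribute value is missing:

from collections import defaultdict

# The sunny-branch examples as printed above; None marks d9's missing Humid value.
rows = [("d1", "h", "n"), ("d2", "h", "n"), ("d8", "h", "n"),
        ("d9", None, "yes"), ("d11", "n", "yes")]   # (day, humid, tennis)

# A missing value is spread over the branches in proportion to the known values
# (here 3 of 4 known values are 'h', so 0.75 goes to 'h' and 0.25 to 'n').
known = [h for _, h, _ in rows if h is not None]
weights = {v: known.count(v) / len(known) for v in set(known)}

counts = defaultdict(lambda: defaultdict(float))    # branch value -> class -> fractional count
for _, humid, tennis in rows:
    if humid is None:
        for value, w in weights.items():
            counts[value][tennis] += w
    else:
        counts[humid][tennis] += 1.0

print(dict(counts["h"]))   # {'n': 3.0, 'yes': 0.75}  i.e. [0.75+, 3-]
print(dict(counts["n"]))   # {'yes': 1.25}            i.e. [1.25+, 0-]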

Non-Boolean Features Features with multiple discrete values Construct a multi-way split Test for one value vs. all of the others? Group values into two disjoint subsets? Real-valued Features Discretize? Consider a threshold split using observed values? © Daniel S. Weld
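For the last option, a threshold split on a real-valued feature, here is a sketch that scores candidate thresholds (midpoints between consecutive distinct observed values) by information gain; the numbers in the example call are made up.

from math import log2

def entropy(labels):
    total = len(labels)
    return -sum((labels.count(c) / total) * log2(labels.count(c) / total)
                for c in set(labels))

def best_threshold(values, labels):
    # Try a split 'value <= t' for every midpoint between consecutive distinct sorted values.
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best = (None, -1.0)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for v, lab in pairs if v <= t]
        right = [lab for v, lab in pairs if v > t]
        g = base - (len(left) / len(pairs)) * entropy(left) \
                 - (len(right) / len(pairs)) * entropy(right)
        if g > best[1]:
            best = (t, g)
    return best  # (threshold, information gain)

print(best_threshold([64, 65, 68, 69, 70, 71], ["y", "n", "y", "y", "y", "n"]))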

Skipping remaining slides © Daniel S. Weld

What if the data is too big for main memory? How to evaluate split points (discrete attributes, continuous attributes)? How to partition the data once a split point is found? Post-pruning. © Daniel S. Weld

Attribute Lists. From the training data, build attribute lists, one per attribute, written to disk. If an attribute is continuous, sort its list on that attribute. © Daniel S. Weld
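A toy sketch of the layout (the records and column names are hypothetical): each attribute list holds (attribute value, class, record id) entries, and the list for a continuous attribute is kept sorted on its value.

# Hypothetical training records: (rid, age, car_type, risk_class)
records = [(0, 23, "family", "high"),
           (1, 17, "sports", "high"),
           (2, 43, "sports", "high"),
           (3, 68, "family", "low")]

# One attribute list per attribute: (value, class, rid)
age_list = sorted((age, cls, rid) for rid, age, car, cls in records)   # continuous: sorted by value
car_list = [(car, cls, rid) for rid, age, car, cls in records]         # discrete: original order
print(age_list)
print(car_list)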

Evaluating an Index: the Gini Index. Easy to evaluate (only requires a histogram); a low value is good. Summing over the classes j in D: gini(D) = 1 - Σ_j p_j², and for a binary split, gini_split(D) = (|D1|/|D|) gini(D1) + (|D2|/|D|) gini(D2). © Daniel S. Weld
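The same index in a few lines of Python, a direct sketch of the formula above and nothing more:

def gini(labels):
    # gini(D) = 1 - sum over classes j of p_j^2, with p_j the relative frequency of class j in D
    total = len(labels)
    return 1.0 - sum((labels.count(c) / total) ** 2 for c in set(labels))

def gini_split(left, right):
    # gini_split(D) = (|D1|/|D|) gini(D1) + (|D2|/|D|) gini(D2); a low value is good
    total = len(left) + len(right)
    return (len(left) / total) * gini(left) + (len(right) / total) * gini(right)

print(gini(["y", "y", "n", "n"]))          # 0.5: maximally mixed two-class set
print(gini_split(["y", "y"], ["n", "n"]))  # 0.0: a pure split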

Histograms. Associated with each node in the (growing) tree; record the distribution of class values; used to evaluate splits. Continuous attributes: C_above and C_below histograms. Discrete attributes: count matrix. © Daniel S. Weld

Evaluating continuous split points © Daniel S. Weld

Evaluating discrete split points © Daniel S. Weld

Performing the Split Must split every attribute list in two Easy for the ‘winning’ attribute Just scan through old list Add each entry to correct one of two new lists Remaining attributes are harder (No test to apply to those attributes to find correct part) Use RID Details coming up…. © Daniel S. Weld

Splitting a node’s attribute list © Daniel S. Weld

Splitting with Limited Memory. Scan through the list for the 'winning' attribute: just scan through the old list, add each entry to the correct one of the two new lists, and insert its RID into a hash table, noting the destination child (L or R). Later, scan each of the other attribute lists: probe the hash table with the RID to find the destination and write the current record to one of the two child lists. If the hash table is too big for memory, do it in stages: only partition the splitting attribute part way, e.g. until one has scanned as many records as would fit in the hash table given memory constraints. © Daniel S. Weld
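A compact sketch of that two-pass split, with in-memory lists standing in for the disk-resident attribute lists; the attribute names, records, and split point are hypothetical.

# Attribute lists of (value, class, rid); 'age' is the winning attribute here.
age_list = [(17, "high", 1), (23, "high", 0), (43, "high", 2), (68, "low", 3)]
car_list = [("family", "high", 0), ("sports", "high", 1),
            ("sports", "high", 2), ("family", "low", 3)]
split_point = 27.5   # assumed winning split: age <= 27.5

# Pass 1: split the winning attribute's list, recording each RID's destination child.
destination = {}                 # the hash table: rid -> 'L' or 'R'
age_left, age_right = [], []
for value, cls, rid in age_list:
    side = "L" if value <= split_point else "R"
    destination[rid] = side
    (age_left if side == "L" else age_right).append((value, cls, rid))

# Pass 2: split every other attribute list by probing the hash table with the RID.
car_left  = [rec for rec in car_list if destination[rec[2]] == "L"]
car_right = [rec for rec in car_list if destination[rec[2]] == "R"]

print(age_left, car_left)    # records for the left child
print(age_right, car_right)  # records for the right child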

Evaluation Machine had only 16MB RAM © Daniel S. Weld