Knowledge Representation

Knowledge Representation
573 Topics Perception NLP Robotics Multi-agent Reinforcement Learning MDPs Supervised Learning Planning Uncertainty Search Knowledge Representation Expressiveness Tractability Problem Spaces Agency © Daniel S. Weld

} Weekly Course Outline Overview, agents, problem spaces Search
Learning Planning (and CSPs) Uncertainty Planning under uncertainty Reinforcement learning Topics Project Presentations & Contest } Search Knowledge Representation © Daniel S. Weld

Search Problem spaces Blind
Depth-first, breadth-first, iterative-deepening, iterative broadening Informed Best-first, Dijkstra's, A*, IDA*, SMA*, DFB&B, Beam, hill climb,limited discrep, AO*, LAO*, RTDP Local search Heuristics Evaluation, construction, learning Pattern databases Online methods Techniques from operations research Constraint satisfaction Adversary search © Daniel S. Weld

KR Propositional logic First-order logic
Syntax (CNF, Horn clauses, …) Semantics (Truth Tables) Inference (FC, Resolution, DPLL, WalkSAT) First-order logic Syntax (quantifiers, skolem functions, … Semantics (Interpretations) Inference (FC, Resolution, Compilation) Frame systems Two theses… Representing events, action & change © Daniel S. Weld

Machine Learning Overview
What is Inductive Learning? Understand a ML technique in depth Induction of Decision Trees Bagging Overfitting Applications © Daniel S. Weld

Inductive Learning Inductive learning or “Prediction”: Classification
Given examples of a function (X, F(X)) Predict function F(X) for new examples X Classification F(X) = Discrete Regression F(X) = Continuous Probability estimation F(X) = Probability(X): © Daniel S. Weld

Widely-used Approaches
Decision trees Rule induction Bayesian learning Neural networks Genetic algorithms Instance-based learning Support Vector Machines Etc. © Daniel S. Weld

Defining a Learning Problem
A program is said to learn from experience E with respect to task T and performance measure P, if it’s performance at tasks in T, as measured by P, improves with experience E. Task T: Playing checkers Performance Measure P: Percent of games won against opponents Experience E: Playing practice games against itself © Daniel S. Weld

Issues What feedback (experience) is available?
How should these features be represented? What kind of knowledge is being increased? How is that knowledge represented? What prior information is available? What is the right learning algorithm? How avoid overfitting? © Daniel S. Weld

Choosing the Training Experience
Credit assignment problem: Direct training examples: E.g. individual checker boards + correct move for each Supervised learning Indirect training examples : E.g. complete sequence of moves and final result Reinforcement learning Which examples: Random, teacher chooses, learner chooses © Daniel S. Weld

Choosing the Target Function
What type of knowledge will be learned? How will the knowledge be used by the performance program? E.g. checkers program Assume it knows legal moves Needs to choose best move So learn function: F: Boards -> Moves hard to learn Alternative: F: Boards -> R Note similarity to choice of problem space © Daniel S. Weld

The Ideal Evaluation Function
V(b) = 100 if b is a final, won board V(b) = -100 if b is a final, lost board V(b) = 0 if b is a final, drawn board Otherwise, if b is not final V(b) = V(s) where s is best, reachable final board Nonoperational… Want operational approximation of V: V © Daniel S. Weld

How Represent Target Function
x1 = number of black pieces on the board x2 = number of red pieces on the board x3 = number of black kings on the board x4 = number of red kings on the board x5 = num of black pieces threatened by red x6 = num of red pieces threatened by black V(b) = a + bx1 + cx2 + dx3 + ex4 + fx5 + gx6 Now just need to learn 7 numbers! © Daniel S. Weld

Example: Checkers Task T: Performance Measure P: Experience E:
Playing checkers Performance Measure P: Percent of games won against opponents Experience E: Playing practice games against itself Target Function V: board -> R Representation of approx. of target function V(b) = a + bx1 + cx2 + dx3 + ex4 + fx5 + gx6 © Daniel S. Weld

Target Function Profound Formulation:
Can express any type of inductive learning as approximating a function E.g., Checkers V: boards -> evaluation E.g., Handwriting recognition V: image -> word E.g., Mushrooms V: mushroom-attributes -> {E, P} © Daniel S. Weld

More Examples Collaborative Filtering (Parts of) Robocode Controller
© Daniel S. Weld

Bias Introduced concept by name here Two types
The problem of disjunction Two types Restriction Preference © Daniel S. Weld

Decision Trees Convenient Representation Expressive
Developed with learning in mind Deterministic Expressive Equivalent to propositional DNF Handles discrete and continuous parameters Simple learning algorithm Handles noise well Classify as follows Constructive (build DT by adding nodes) Eager Batch (but incremental versions exist) © Daniel S. Weld

Concept Learning E.g. Learn concept “Edible mushroom”
Target Function has two values: T or F Represent concepts as decision trees Use hill climbing search Thru space of decision trees Start with simple concept Refine it into a complex concept as needed © Daniel S. Weld

Example: “Good day for tennis”
Attributes of instances Wind Temperature Humidity Outlook Feature = attribute with one value E.g. outlook = sunny Sample instance wind=weak, temp=hot, humidity=high, outlook=sunny © Daniel S. Weld

Experience: “Good day for tennis”
Day Outlook Temp Humid Wind PlayTennis? d1 s h h w n d2 s h h s n d3 o h h w y d4 r m h w y d5 r c n w y d6 r c n s y d7 o c n s y d8 s m h w n d9 s c n w y d10 r m n w y d11 s m n s y d12 o m h s y d13 o h n w y d14 r m h s n © Daniel S. Weld

Decision Tree Representation
Good day for tennis? Leaves = classification Arcs = choice of value for parent attribute Outlook Sunny Overcast Rain Humidity Wind Yes Strong Normal High Weak Yes No No Yes Decision tree is equivalent to logic in disjunctive normal form G-Day  (Sunny  Normal)  Overcast  (Rain  Weak) © Daniel S. Weld

DT Learning as Search Nodes Operators Initial node Heuristic? Goal?
Decision Trees Tree Refinement: Sprouting the tree Smallest tree possible: a single leaf Information Gain Best tree possible (???) © Daniel S. Weld

What is the Simplest Tree?
Day Outlook Temp Humid Wind Play? d1 s h h w n d2 s h h s n d3 o h h w y d4 r m h w y d5 r c n w y d6 r c n s y d7 o c n s y d8 s m h w n d9 s c n w y d10 r m n w y d11 s m n s y d12 o m h s y d13 o h n w y d14 r m h s n What is the Simplest Tree? How good? [10+, 4-] Means: correct on 10 examples incorrect on 4 examples © Daniel S. Weld

Which attribute should we use to split?
Successors Yes Humid Wind Outlook Temp Which attribute should we use to split? © Daniel S. Weld

To be decided: How to choose best attribute?
Information gain Entropy (disorder) When to stop growing tree? © Daniel S. Weld

Disorder is bad Homogeneity is good
No Better Good © Daniel S. Weld

Entropy 1.0 0.5 P as % © Daniel S. Weld

Entropy (disorder) is bad Homogeneity is good
Let S be a set of examples Entropy(S) = -P log2(P) - N log2(N) where P is proportion of pos example and N is proportion of neg examples and 0 log 0 == 0 Example: S has 10 pos and 4 neg Entropy([10+, 4-]) = -(10/14) log2(10/14) - (4/14)log2(4/14) = 0.863 © Daniel S. Weld

 Information Gain Measure of expected reduction in entropy
Resulting from splitting along an attribute  v  Values(A) Gain(S,A) = Entropy(S) (|Sv| / |S|) Entropy(Sv) Where Entropy(S) = -P log2(P) - N log2(N) © Daniel S. Weld

Gain of Splitting on Wind
Day Wind Tennis? d1 weak n d2 s n d3 weak yes d4 weak yes d5 weak yes d6 s yes d7 s yes d8 weak n d9 weak yes d10 weak yes d11 s yes d12 s yes d13 weak yes d14 s n Values(wind)=weak, strong S = [10+, 4-] Sweak = [6+, 2-] Ss = [3+, 3-] Gain(S, wind) = Entropy(S) (|Sv| / |S|) Entropy(Sv) = Entropy(S) - 8/14 Entropy(Sweak) - 6/14 Entropy(Ss) = (8/14) (6/14) 1.00 =  v  {weak, s} © Daniel S. Weld

Evaluating Attributes
Yes Humid Wind Gain(S,Wind) =-0.029 Gain(S,Humid) =0.151 Outlook Temp Value is actually different than this, but ignore this detail Gain(S,Temp) =0.029 Gain(S,Outlook) =0.170 © Daniel S. Weld

Resulting Tree …. Good day for tennis? Outlook Sunny Overcast Rain No
[2+, 3-] Yes [4+] No [2+, 3-] © Daniel S. Weld

Recurse! Outlook Sunny Day Temp Humid Wind Tennis? d1 h h weak n
d2 h h s n d8 m h weak n d9 c n weak yes d11 m n s yes © Daniel S. Weld

One Step Later… Outlook Sunny Overcast Rain Humidity Yes [4+] No
[2+, 3-] Normal High Yes [2+] No [3-] © Daniel S. Weld

Decision Tree Algorithm
BuildTree(TraingData) Split(TrainingData) Split(D) If (all points in D are of the same class) Then Return For each attribute A Evaluate splits on attribute A Use best split to partition D into D1, D2 Split (D1) Split (D2) © Daniel S. Weld

Application to Robocode
Assume Fixed Architecture If <in-danger> Then escape: If <ok-escape-dir> Then forward 45 Else … Else attack: If <shooting-best> Then if <gun-aimed> Then … Else ram: … From training set generated Learn these classifiers By trial battles © Daniel S. Weld

Missing Data 1 Assign attribute most common value at node
Day Temp Humid Wind Tennis? d1 h h weak n d2 h h s n d8 m h weak n d9 c weak yes d11 m n s yes Or most common value, … given classification © Daniel S. Weld

Fractional Values 75% h and 25% n Use in gain calculations
[0.75+, 3-] Day Temp Humid Wind Tennis? d1 h h weak n d2 h h s n d8 m h weak n d9 c weak yes d11 m n s yes [1.25+, 0-] 75% h and 25% n Use in gain calculations Further subdivide if other missing attributes Same approach to classify test ex with missing attr Classification is most probable classification Summing over leaves where it got divided © Daniel S. Weld

Non-Boolean Features Features with multiple discrete values
Construct a multi-way split Test for one value vs. all of the others? Group values into two disjoint subsets? Real-valued Features Discretize? Consider a threshold split using observed values? © Daniel S. Weld

What if data is too big for main memory?
How evaluate split points? Discrete attributes Continuous attributes How to partition data once a split point is found? Post pruning © Daniel S. Weld

=> Attribute Lists Training data Attribute lists one per attribute
written to disk If attribute is continuous, Sort on that attribute © Daniel S. Weld

Evaluating an Index Gini Index
Easy to evaluate – only requires histogram Low value is good Sum over j atttributes gini(D) = 1 - pj2 ginisplit(D) = |D1|/|D| gini(D1) + |D2|/|D| gini(D2) © Daniel S. Weld

Histograms Associated with each node in (growing) tree
Record distribution of class values Used to evaluate splits Continuous Attributes C_above C_below Discrete Attributes Count matrix © Daniel S. Weld

Performing the Split Must split every attribute list in two
Easy for the ‘winning’ attribute Just scan through old list Add each entry to correct one of two new lists Remaining attributes are harder (No test to apply to those attributes to find correct part) Use RID Details coming up…. © Daniel S. Weld

Splitting with Limited Memory
Scan thru list for ‘winning’ attribute Just scan through old list Add each entry to correct one of two new lists Insert RID into hash table, noting destination child: L, R Later, scan each of the other attribute lists Probe hash table with RID to find destination Write current record to one of two child lists If hash table is too big for memory, do in stages Only partition splitting attribute part way E.g. until one has scanned as many records as would fit in hash table given memory constraints © Daniel S. Weld

Knowledge Representation

Similar presentations

Presentation on theme: "Knowledge Representation"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Knowledge Representation

Similar presentations

Presentation on theme: "Knowledge Representation"— Presentation transcript:

Similar presentations

About project

Feedback