Download presentation
Presentation is loading. Please wait.
1
Knowledge Representation
573 Topics Perception NLP Robotics Multi-agent Reinforcement Learning MDPs Supervised Learning Planning Uncertainty Search Knowledge Representation Expressiveness Tractability Problem Spaces Agency © Daniel S. Weld
2
} Weekly Course Outline Overview, agents, problem spaces Search
Learning Planning (and CSPs) Uncertainty Planning under uncertainty Reinforcement learning Topics Project Presentations & Contest } Search Knowledge Representation © Daniel S. Weld
3
Search Problem spaces Blind
Depth-first, breadth-first, iterative-deepening, iterative broadening Informed Best-first, Dijkstra's, A*, IDA*, SMA*, DFB&B, Beam, hill climb,limited discrep, AO*, LAO*, RTDP Local search Heuristics Evaluation, construction, learning Pattern databases Online methods Techniques from operations research Constraint satisfaction Adversary search © Daniel S. Weld
4
KR Propositional logic First-order logic
Syntax (CNF, Horn clauses, …) Semantics (Truth Tables) Inference (FC, Resolution, DPLL, WalkSAT) First-order logic Syntax (quantifiers, skolem functions, … Semantics (Interpretations) Inference (FC, Resolution, Compilation) Frame systems Two theses… Representing events, action & change © Daniel S. Weld
5
Machine Learning Overview
What is Inductive Learning? Understand a ML technique in depth Induction of Decision Trees Bagging Overfitting Applications © Daniel S. Weld
6
Inductive Learning Inductive learning or “Prediction”: Classification
Given examples of a function (X, F(X)) Predict function F(X) for new examples X Classification F(X) = Discrete Regression F(X) = Continuous Probability estimation F(X) = Probability(X): © Daniel S. Weld
7
Widely-used Approaches
Decision trees Rule induction Bayesian learning Neural networks Genetic algorithms Instance-based learning Support Vector Machines Etc. © Daniel S. Weld
8
Defining a Learning Problem
A program is said to learn from experience E with respect to task T and performance measure P, if it’s performance at tasks in T, as measured by P, improves with experience E. Task T: Playing checkers Performance Measure P: Percent of games won against opponents Experience E: Playing practice games against itself © Daniel S. Weld
9
Issues What feedback (experience) is available?
How should these features be represented? What kind of knowledge is being increased? How is that knowledge represented? What prior information is available? What is the right learning algorithm? How avoid overfitting? © Daniel S. Weld
10
Choosing the Training Experience
Credit assignment problem: Direct training examples: E.g. individual checker boards + correct move for each Supervised learning Indirect training examples : E.g. complete sequence of moves and final result Reinforcement learning Which examples: Random, teacher chooses, learner chooses © Daniel S. Weld
11
Choosing the Target Function
What type of knowledge will be learned? How will the knowledge be used by the performance program? E.g. checkers program Assume it knows legal moves Needs to choose best move So learn function: F: Boards -> Moves hard to learn Alternative: F: Boards -> R Note similarity to choice of problem space © Daniel S. Weld
12
The Ideal Evaluation Function
V(b) = 100 if b is a final, won board V(b) = -100 if b is a final, lost board V(b) = 0 if b is a final, drawn board Otherwise, if b is not final V(b) = V(s) where s is best, reachable final board Nonoperational… Want operational approximation of V: V © Daniel S. Weld
13
How Represent Target Function
x1 = number of black pieces on the board x2 = number of red pieces on the board x3 = number of black kings on the board x4 = number of red kings on the board x5 = num of black pieces threatened by red x6 = num of red pieces threatened by black V(b) = a + bx1 + cx2 + dx3 + ex4 + fx5 + gx6 Now just need to learn 7 numbers! © Daniel S. Weld
14
Example: Checkers Task T: Performance Measure P: Experience E:
Playing checkers Performance Measure P: Percent of games won against opponents Experience E: Playing practice games against itself Target Function V: board -> R Representation of approx. of target function V(b) = a + bx1 + cx2 + dx3 + ex4 + fx5 + gx6 © Daniel S. Weld
15
Target Function Profound Formulation:
Can express any type of inductive learning as approximating a function E.g., Checkers V: boards -> evaluation E.g., Handwriting recognition V: image -> word E.g., Mushrooms V: mushroom-attributes -> {E, P} © Daniel S. Weld
16
© Daniel S. Weld
17
More Examples Collaborative Filtering (Parts of) Robocode Controller
© Daniel S. Weld
18
© Daniel S. Weld
19
Bias Introduced concept by name here Two types
The problem of disjunction Two types Restriction Preference © Daniel S. Weld
20
© Daniel S. Weld
21
© Daniel S. Weld
22
© Daniel S. Weld
23
© Daniel S. Weld
24
© Daniel S. Weld
25
© Daniel S. Weld
26
© Daniel S. Weld
27
© Daniel S. Weld
28
© Daniel S. Weld
29
Decision Trees Convenient Representation Expressive
Developed with learning in mind Deterministic Expressive Equivalent to propositional DNF Handles discrete and continuous parameters Simple learning algorithm Handles noise well Classify as follows Constructive (build DT by adding nodes) Eager Batch (but incremental versions exist) © Daniel S. Weld
30
Concept Learning E.g. Learn concept “Edible mushroom”
Target Function has two values: T or F Represent concepts as decision trees Use hill climbing search Thru space of decision trees Start with simple concept Refine it into a complex concept as needed © Daniel S. Weld
31
Example: “Good day for tennis”
Attributes of instances Wind Temperature Humidity Outlook Feature = attribute with one value E.g. outlook = sunny Sample instance wind=weak, temp=hot, humidity=high, outlook=sunny © Daniel S. Weld
32
Experience: “Good day for tennis”
Day Outlook Temp Humid Wind PlayTennis? d1 s h h w n d2 s h h s n d3 o h h w y d4 r m h w y d5 r c n w y d6 r c n s y d7 o c n s y d8 s m h w n d9 s c n w y d10 r m n w y d11 s m n s y d12 o m h s y d13 o h n w y d14 r m h s n © Daniel S. Weld
33
Decision Tree Representation
Good day for tennis? Leaves = classification Arcs = choice of value for parent attribute Outlook Sunny Overcast Rain Humidity Wind Yes Strong Normal High Weak Yes No No Yes Decision tree is equivalent to logic in disjunctive normal form G-Day (Sunny Normal) Overcast (Rain Weak) © Daniel S. Weld
34
© Daniel S. Weld
35
© Daniel S. Weld
36
© Daniel S. Weld
37
DT Learning as Search Nodes Operators Initial node Heuristic? Goal?
Decision Trees Tree Refinement: Sprouting the tree Smallest tree possible: a single leaf Information Gain Best tree possible (???) © Daniel S. Weld
38
What is the Simplest Tree?
Day Outlook Temp Humid Wind Play? d1 s h h w n d2 s h h s n d3 o h h w y d4 r m h w y d5 r c n w y d6 r c n s y d7 o c n s y d8 s m h w n d9 s c n w y d10 r m n w y d11 s m n s y d12 o m h s y d13 o h n w y d14 r m h s n What is the Simplest Tree? How good? [10+, 4-] Means: correct on 10 examples incorrect on 4 examples © Daniel S. Weld
39
Which attribute should we use to split?
Successors Yes Humid Wind Outlook Temp Which attribute should we use to split? © Daniel S. Weld
40
To be decided: How to choose best attribute?
Information gain Entropy (disorder) When to stop growing tree? © Daniel S. Weld
41
Disorder is bad Homogeneity is good
No Better Good © Daniel S. Weld
42
Entropy 1.0 0.5 P as % © Daniel S. Weld
43
Entropy (disorder) is bad Homogeneity is good
Let S be a set of examples Entropy(S) = -P log2(P) - N log2(N) where P is proportion of pos example and N is proportion of neg examples and 0 log 0 == 0 Example: S has 10 pos and 4 neg Entropy([10+, 4-]) = -(10/14) log2(10/14) - (4/14)log2(4/14) = 0.863 © Daniel S. Weld
44
Information Gain Measure of expected reduction in entropy
Resulting from splitting along an attribute v Values(A) Gain(S,A) = Entropy(S) (|Sv| / |S|) Entropy(Sv) Where Entropy(S) = -P log2(P) - N log2(N) © Daniel S. Weld
45
Gain of Splitting on Wind
Day Wind Tennis? d1 weak n d2 s n d3 weak yes d4 weak yes d5 weak yes d6 s yes d7 s yes d8 weak n d9 weak yes d10 weak yes d11 s yes d12 s yes d13 weak yes d14 s n Values(wind)=weak, strong S = [10+, 4-] Sweak = [6+, 2-] Ss = [3+, 3-] Gain(S, wind) = Entropy(S) (|Sv| / |S|) Entropy(Sv) = Entropy(S) - 8/14 Entropy(Sweak) - 6/14 Entropy(Ss) = (8/14) (6/14) 1.00 = v {weak, s} © Daniel S. Weld
46
Evaluating Attributes
Yes Humid Wind Gain(S,Wind) =-0.029 Gain(S,Humid) =0.151 Outlook Temp Value is actually different than this, but ignore this detail Gain(S,Temp) =0.029 Gain(S,Outlook) =0.170 © Daniel S. Weld
47
Resulting Tree …. Good day for tennis? Outlook Sunny Overcast Rain No
[2+, 3-] Yes [4+] No [2+, 3-] © Daniel S. Weld
48
Recurse! Outlook Sunny Day Temp Humid Wind Tennis? d1 h h weak n
d2 h h s n d8 m h weak n d9 c n weak yes d11 m n s yes © Daniel S. Weld
49
One Step Later… Outlook Sunny Overcast Rain Humidity Yes [4+] No
[2+, 3-] Normal High Yes [2+] No [3-] © Daniel S. Weld
50
Decision Tree Algorithm
BuildTree(TraingData) Split(TrainingData) Split(D) If (all points in D are of the same class) Then Return For each attribute A Evaluate splits on attribute A Use best split to partition D into D1, D2 Split (D1) Split (D2) © Daniel S. Weld
51
Application to Robocode
Assume Fixed Architecture If <in-danger> Then escape: If <ok-escape-dir> Then forward 45 Else … Else attack: If <shooting-best> Then if <gun-aimed> Then … Else ram: … From training set generated Learn these classifiers By trial battles © Daniel S. Weld
52
Movie Recommendation Features? Rambo Matrix Rambo 2 © Daniel S. Weld
53
Issues Content vs. Social Non-Boolean Attributes Missing Data
Scaling up © Daniel S. Weld
54
Stopped here © Daniel S. Weld
55
Missing Data 1 Assign attribute most common value at node
Day Temp Humid Wind Tennis? d1 h h weak n d2 h h s n d8 m h weak n d9 c weak yes d11 m n s yes Or most common value, … given classification © Daniel S. Weld
56
Fractional Values 75% h and 25% n Use in gain calculations
[0.75+, 3-] Day Temp Humid Wind Tennis? d1 h h weak n d2 h h s n d8 m h weak n d9 c weak yes d11 m n s yes [1.25+, 0-] 75% h and 25% n Use in gain calculations Further subdivide if other missing attributes Same approach to classify test ex with missing attr Classification is most probable classification Summing over leaves where it got divided © Daniel S. Weld
57
Non-Boolean Features Features with multiple discrete values
Construct a multi-way split Test for one value vs. all of the others? Group values into two disjoint subsets? Real-valued Features Discretize? Consider a threshold split using observed values? © Daniel S. Weld
58
© Daniel S. Weld
59
Skipping remaining slides
© Daniel S. Weld
60
© Daniel S. Weld
61
What if data is too big for main memory?
How evaluate split points? Discrete attributes Continuous attributes How to partition data once a split point is found? Post pruning © Daniel S. Weld
62
=> Attribute Lists Training data Attribute lists one per attribute
written to disk If attribute is continuous, Sort on that attribute © Daniel S. Weld
63
Evaluating an Index Gini Index
Easy to evaluate – only requires histogram Low value is good Sum over j atttributes gini(D) = 1 - pj2 ginisplit(D) = |D1|/|D| gini(D1) + |D2|/|D| gini(D2) © Daniel S. Weld
64
Histograms Associated with each node in (growing) tree
Record distribution of class values Used to evaluate splits Continuous Attributes C_above C_below Discrete Attributes Count matrix © Daniel S. Weld
65
Evaluating continuous split points
© Daniel S. Weld
66
Evaluating discrete split points
© Daniel S. Weld
67
Performing the Split Must split every attribute list in two
Easy for the ‘winning’ attribute Just scan through old list Add each entry to correct one of two new lists Remaining attributes are harder (No test to apply to those attributes to find correct part) Use RID Details coming up…. © Daniel S. Weld
68
Splitting a node’s attribute list
© Daniel S. Weld
69
Splitting with Limited Memory
Scan thru list for ‘winning’ attribute Just scan through old list Add each entry to correct one of two new lists Insert RID into hash table, noting destination child: L, R Later, scan each of the other attribute lists Probe hash table with RID to find destination Write current record to one of two child lists If hash table is too big for memory, do in stages Only partition splitting attribute part way E.g. until one has scanned as many records as would fit in hash table given memory constraints © Daniel S. Weld
70
Evaluation Machine had only 16MB RAM © Daniel S. Weld
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.