Knowledge Representation


1 Knowledge Representation
573 Topics: Perception, NLP, Robotics, Multi-agent, Reinforcement Learning, MDPs, Supervised Learning, Planning, Uncertainty, Search, Knowledge Representation, Expressiveness, Tractability, Problem Spaces, Agency
© Daniel S. Weld

2 Weekly Course Outline
Overview, agents, problem spaces
Search
Learning
Planning (and CSPs)
Uncertainty
Planning under uncertainty
Reinforcement learning
Topics
Project Presentations & Contest
Bracket labels: Search; Knowledge Representation

3 Search
Problem spaces
Blind
  Depth-first, breadth-first, iterative-deepening, iterative broadening
Informed
  Best-first, Dijkstra's, A*, IDA*, SMA*, DFB&B, Beam, hill climbing, limited discrepancy, AO*, LAO*, RTDP
Local search
Heuristics
  Evaluation, construction, learning
  Pattern databases
Online methods
Techniques from operations research
Constraint satisfaction
Adversary search

4 KR
Propositional logic
  Syntax (CNF, Horn clauses, …)
  Semantics (Truth Tables)
  Inference (FC, Resolution, DPLL, WalkSAT)
First-order logic
  Syntax (quantifiers, Skolem functions, …)
  Semantics (Interpretations)
  Inference (FC, Resolution, Compilation)
Frame systems
Two theses…
Representing events, action & change

5 Machine Learning Overview
What is Inductive Learning?
Understand a ML technique in depth
  Induction of Decision Trees
  Bagging
  Overfitting
Applications

6 Inductive Learning
Inductive learning or “Prediction”:
  Given examples of a function (X, F(X))
  Predict function F(X) for new examples X
Classification
  F(X) = Discrete
Regression
  F(X) = Continuous
Probability estimation
  F(X) = Probability(X)

7 Widely-used Approaches
Decision trees
Rule induction
Bayesian learning
Neural networks
Genetic algorithms
Instance-based learning
Support Vector Machines
Etc.

8 Defining a Learning Problem
A program is said to learn from experience E with respect to task T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
Task T: Playing checkers
Performance Measure P: Percent of games won against opponents
Experience E: Playing practice games against itself

9 Issues
What feedback (experience) is available?
How should these features be represented?
What kind of knowledge is being increased?
How is that knowledge represented?
What prior information is available?
What is the right learning algorithm?
How to avoid overfitting?

10 Choosing the Training Experience
Direct training examples:
  E.g. individual checker boards + correct move for each
  Supervised learning
Indirect training examples:
  E.g. complete sequence of moves and final result
  Reinforcement learning
  Credit assignment problem
Which examples:
  Random, teacher chooses, learner chooses

11 Choosing the Target Function
What type of knowledge will be learned?
How will the knowledge be used by the performance program?
E.g. checkers program
  Assume it knows legal moves
  Needs to choose best move
  So learn function: F: Boards -> Moves (hard to learn)
  Alternative: F: Boards -> R
Note similarity to choice of problem space

12 The Ideal Evaluation Function
V(b) = 100 if b is a final, won board
V(b) = -100 if b is a final, lost board
V(b) = 0 if b is a final, drawn board
Otherwise, if b is not final
  V(b) = V(s) where s is the best reachable final board
Nonoperational…
Want operational approximation of V: V̂

13 How Represent Target Function
x1 = number of black pieces on the board
x2 = number of red pieces on the board
x3 = number of black kings on the board
x4 = number of red kings on the board
x5 = num of black pieces threatened by red
x6 = num of red pieces threatened by black
V(b) = a + bx1 + cx2 + dx3 + ex4 + fx5 + gx6
Now just need to learn 7 numbers!
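
A minimal sketch of the linear evaluation function on this slide, assuming the six features x1..x6 have already been extracted from a board. The function name `evaluate_board`, the feature values, and the weights are hypothetical and only illustrate the shape of the computation; the seven weights play the role of the coefficients a through g.

```python
from typing import Sequence


def evaluate_board(features: Sequence[float], weights: Sequence[float]) -> float:
    """Return weights[0] + sum(weights[i] * features[i - 1]) for i = 1..n."""
    if len(weights) != len(features) + 1:
        raise ValueError("need exactly one weight per feature plus a constant term")
    constant, coefficients = weights[0], weights[1:]
    return constant + sum(w * x for w, x in zip(coefficients, features))


# Hypothetical feature vector (x1..x6) and weights (a..g), chosen only for illustration.
features = [12, 12, 0, 0, 1, 2]
weights = [0.0, 1.0, -1.0, 3.0, -3.0, -0.5, 0.5]
score = evaluate_board(features, weights)
print(score)
```

Learning then amounts to adjusting the seven numbers in `weights` so that the score tracks game outcomes.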

14 Example: Checkers
Task T: Playing checkers
Performance Measure P: Percent of games won against opponents
Experience E: Playing practice games against itself
Target Function V: board -> R
Representation of approx. of target function:
  V(b) = a + bx1 + cx2 + dx3 + ex4 + fx5 + gx6

15 Target Function
Profound Formulation: Can express any type of inductive learning as approximating a function
  E.g., Checkers: V: boards -> evaluation
  E.g., Handwriting recognition: V: image -> word
  E.g., Mushrooms: V: mushroom-attributes -> {E, P}

17 More Examples
Collaborative Filtering
(Parts of) Robocode Controller

19 Bias
Introduced concept by name here
The problem of disjunction
Two types
  Restriction
  Preference

29 Decision Trees
Convenient Representation
  Developed with learning in mind
  Deterministic
Expressive
  Equivalent to propositional DNF
  Handles discrete and continuous parameters
Simple learning algorithm
  Handles noise well
  Classify as follows
    Constructive (build DT by adding nodes)
    Eager
    Batch (but incremental versions exist)

30 Concept Learning
E.g. Learn concept “Edible mushroom”
  Target Function has two values: T or F
Represent concepts as decision trees
Use hill climbing search
  Thru space of decision trees
  Start with simple concept
  Refine it into a complex concept as needed

31 Example: “Good day for tennis”
Attributes of instances
  Wind
  Temperature
  Humidity
  Outlook
Feature = attribute with one value
  E.g. outlook = sunny
Sample instance
  wind=weak, temp=hot, humidity=high, outlook=sunny

32 Experience: “Good day for tennis”
Day  Outlook  Temp  Humid  Wind  PlayTennis?
d1   s        h     h      w     n
d2   s        h     h      s     n
d3   o        h     h      w     y
d4   r        m     h      w     y
d5   r        c     n      w     y
d6   r        c     n      s     y
d7   o        c     n      s     y
d8   s        m     h      w     n
d9   s        c     n      w     y
d10  r        m     n      w     y
d11  s        m     n      s     y
d12  o        m     h      s     y
d13  o        h     n      w     y
d14  r        m     h      s     n

33 Decision Tree Representation
Good day for tennis?
Leaves = classification
Arcs = choice of value for parent attribute

Outlook
  Sunny -> Humidity
    Normal -> Yes
    High -> No
  Overcast -> Yes
  Rain -> Wind
    Strong -> No
    Weak -> Yes

Decision tree is equivalent to logic in disjunctive normal form
G-Day ⇔ (Sunny ∧ Normal) ∨ Overcast ∨ (Rain ∧ Weak)
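
The tree on this slide and its DNF reading can be sketched in a few lines. This is only an illustration: the nested-dict encoding and the names `TREE`, `classify`, and `good_day_dnf` are assumptions, not part of the lecture.

```python
# Internal nodes are dicts with an attribute to test; leaves are class labels.
TREE = {
    "attribute": "outlook",
    "branches": {
        "sunny": {
            "attribute": "humidity",
            "branches": {"normal": "yes", "high": "no"},
        },
        "overcast": "yes",
        "rain": {
            "attribute": "wind",
            "branches": {"weak": "yes", "strong": "no"},
        },
    },
}


def classify(tree, instance):
    """Follow the arc matching the instance's value until a leaf is reached."""
    node = tree
    while isinstance(node, dict):
        node = node["branches"][instance[node["attribute"]]]
    return node


def good_day_dnf(instance):
    """The same concept written as a disjunction of conjunctions."""
    return (
        (instance["outlook"] == "sunny" and instance["humidity"] == "normal")
        or instance["outlook"] == "overcast"
        or (instance["outlook"] == "rain" and instance["wind"] == "weak")
    )


# The sample instance from the "Good day for tennis" example.
sample = {"wind": "weak", "temp": "hot", "humidity": "high", "outlook": "sunny"}
print(classify(TREE, sample), good_day_dnf(sample))
```

Each root-to-"yes" path in the tree corresponds to one conjunction in the DNF formula.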

37 DT Learning as Search
Nodes: Decision Trees
Operators: Tree Refinement (sprouting the tree)
Initial node: Smallest tree possible (a single leaf)
Heuristic? Information Gain
Goal? Best tree possible (???)

38 What is the Simplest Tree?
(Training data: the 14 days d1 to d14 from slide 32.)
How good? [10+, 4-]
Means: correct on 10 examples, incorrect on 4 examples

39 Successors
Current tree: Yes
Candidate splits: Humid, Wind, Outlook, Temp
Which attribute should we use to split?

40 To be decided:
How to choose best attribute?
  Information gain
  Entropy (disorder)
When to stop growing tree?

41 Disorder is bad, homogeneity is good
[Figure labels: No, Better, Good]

42 Entropy
[Figure: entropy as a function of P as %; axis ticks 0.5 and 1.0]

43 Entropy (disorder) is bad, homogeneity is good
Let S be a set of examples
Entropy(S) = -P log2(P) - N log2(N)
  where P is proportion of pos examples
  and N is proportion of neg examples
  and 0 log 0 == 0
Example: S has 10 pos and 4 neg
Entropy([10+, 4-]) = -(10/14) log2(10/14) - (4/14) log2(4/14) = 0.863
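
The entropy formula can be checked with a short sketch. The function name `entropy` and its count-based signature are assumptions made for illustration; the `0 log 0 == 0` convention is handled by skipping empty classes.

```python
from math import log2


def entropy(pos: int, neg: int) -> float:
    """Two-class entropy in bits from positive and negative example counts."""
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count > 0:  # convention: 0 * log2(0) contributes 0
            p = count / total
            result -= p * log2(p)
    return result


print(round(entropy(10, 4), 3))  # the [10+, 4-] example: 0.863
print(entropy(7, 7))             # evenly mixed set
print(entropy(14, 0))            # homogeneous set
```

Evenly mixed sets give the maximum of 1 bit and homogeneous sets give 0, which matches the "disorder is bad, homogeneity is good" reading.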

44 Information Gain
Measure of expected reduction in entropy
Resulting from splitting along an attribute
Gain(S, A) = Entropy(S) - Σ_{v ∈ Values(A)} (|Sv| / |S|) Entropy(Sv)
Where Entropy(S) = -P log2(P) - N log2(N)
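
A minimal sketch of the gain computation, assuming each example is given as an attribute value paired with a class label. The names `entropy` and `information_gain` and the four-example toy data are hypothetical and not taken from the lecture.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    """Entropy in bits of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(values, labels):
    """Entropy(S) minus the size-weighted entropy of each subset Sv."""
    subsets = defaultdict(list)
    for value, label in zip(values, labels):
        subsets[value].append(label)
    remainder = sum(
        (len(subset) / len(labels)) * entropy(subset) for subset in subsets.values()
    )
    return entropy(labels) - remainder


# Hypothetical toy data: one attribute separates the classes, the other does not.
labels = ["y", "y", "n", "n"]
perfect = information_gain(["a", "a", "b", "b"], labels)
useless = information_gain(["a", "b", "a", "b"], labels)
print(perfect, useless)
```

An attribute that separates the classes completely recovers the full entropy of S, while one that leaves every subset as mixed as S gains nothing.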

45 Gain of Splitting on Wind
Day  Wind  Tennis?
d1   weak  n
d2   s     n
d3   weak  yes
d4   weak  yes
d5   weak  yes
d6   s     yes
d7   s     yes
d8   weak  n
d9   weak  yes
d10  weak  yes
d11  s     yes
d12  s     yes
d13  weak  yes
d14  s     n
Values(wind) = weak, strong
S = [10+, 4-]
Sweak = [6+, 2-]
Ss = [3+, 3-]
Gain(S, wind) = Entropy(S) - Σ_{v ∈ {weak, s}} (|Sv| / |S|) Entropy(Sv)
              = Entropy(S) - 8/14 Entropy(Sweak) - 6/14 Entropy(Ss)
              = 0.863 - (8/14) 0.811 - (6/14) 1.00 = -0.029

46 Evaluating Attributes
Current tree: Yes
Gain(S, Humid) = 0.151
Gain(S, Wind) = -0.029
Gain(S, Outlook) = 0.170
Gain(S, Temp) = 0.029
Value is actually different than this, but ignore this detail

47 Resulting Tree ….
Good day for tennis?
Outlook
  Sunny -> No [2+, 3-]
  Overcast -> Yes [4+]
  Rain -> No [2+, 3-]

48 Recurse!
Outlook = Sunny
Day  Temp  Humid  Wind  Tennis?
d1   h     h      weak  n
d2   h     h      s     n
d8   m     h      weak  n
d9   c     n      weak  yes
d11  m     n      s     yes

49 One Step Later…
Outlook
  Sunny -> Humidity
    Normal -> Yes [2+]
    High -> No [3-]
  Overcast -> Yes [4+]
  Rain -> No [2+, 3-]

50 Decision Tree Algorithm
BuildTree(TrainingData)
    Split(TrainingData)

Split(D)
    If (all points in D are of the same class)
        Then Return
    For each attribute A
        Evaluate splits on attribute A
    Use best split to partition D into D1, D2
    Split(D1)
    Split(D2)
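
The pseudocode above can be sketched as runnable code. Two things here are assumptions rather than the slide's method: the sketch makes a multi-way split on each discrete attribute instead of a binary partition into D1 and D2, and it also stops when no attributes remain. All names and the five-row toy dataset are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def partition(rows, attribute):
    """Group rows by their value for the given attribute."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row)
    return groups


def gain(rows, attribute, target):
    labels = [row[target] for row in rows]
    remainder = sum(
        (len(group) / len(rows)) * entropy([row[target] for row in group])
        for group in partition(rows, attribute).values()
    )
    return entropy(labels) - remainder


def build_tree(rows, attributes, target):
    labels = [row[target] for row in rows]
    # Stop when all points are of the same class (or nothing is left to split on).
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Evaluate every attribute and use the best split.
    best = max(attributes, key=lambda a: gain(rows, a, target))
    remaining = [a for a in attributes if a != best]
    return {
        "attribute": best,
        "branches": {
            value: build_tree(subset, remaining, target)
            for value, subset in partition(rows, best).items()
        },
    }


# Hypothetical toy dataset, not the lecture's 14-day table.
rows = [
    {"outlook": "sunny", "wind": "weak", "play": "no"},
    {"outlook": "sunny", "wind": "strong", "play": "no"},
    {"outlook": "overcast", "wind": "weak", "play": "yes"},
    {"outlook": "rain", "wind": "weak", "play": "yes"},
    {"outlook": "rain", "wind": "strong", "play": "no"},
]
tree = build_tree(rows, ["outlook", "wind"], "play")
print(tree)
```

On this toy data the sketch picks outlook at the root and only needs wind under the rain branch.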

51 Application to Robocode
Assume Fixed Architecture
  If <in-danger>
    Then escape:
      If <ok-escape-dir> Then forward 45
      Else …
    Else attack:
      If <shooting-best>
        Then if <gun-aimed> Then …
        Else ram:
Learn these classifiers
  From training set generated by trial battles

52 Movie Recommendation
Features?
Rambo, Matrix, Rambo 2

53 Issues
Content vs. Social
Non-Boolean Attributes
Missing Data
Scaling up

54 Stopped here

55 Missing Data 1
Day  Temp  Humid  Wind  Tennis?
d1   h     h      weak  n
d2   h     h      s     n
d8   m     h      weak  n
d9   c     ?      weak  yes
d11  m     n      s     yes
(? = missing value)
Assign attribute most common value at node
Or most common value, … given classification
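
Both fill-in rules on this slide can be sketched as follows, assuming a missing value is stored as `None`. The function name `most_common_value` and the row encoding are hypothetical; the five rows mirror the Sunny subset, with d9's humidity missing.

```python
from collections import Counter


def most_common_value(rows, attribute, target=None, target_value=None):
    """Most frequent known value of an attribute among the rows at a node.

    If target and target_value are given, only rows with that classification
    are counted ("most common value given classification").
    """
    known = [
        row[attribute]
        for row in rows
        if row[attribute] is not None
        and (target is None or row[target] == target_value)
    ]
    return Counter(known).most_common(1)[0][0]


rows = [
    {"day": "d1", "humid": "high", "tennis": "no"},
    {"day": "d2", "humid": "high", "tennis": "no"},
    {"day": "d8", "humid": "high", "tennis": "no"},
    {"day": "d9", "humid": None, "tennis": "yes"},
    {"day": "d11", "humid": "normal", "tennis": "yes"},
]
at_node = most_common_value(rows, "humid")
given_class = most_common_value(rows, "humid", target="tennis", target_value="yes")
print(at_node, given_class)
```

The two rules disagree here: the node as a whole is mostly high humidity, but the only other "yes" example has normal humidity.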

56 Fractional Values
Day  Temp  Humid  Wind  Tennis?
d1   h     h      weak  n
d2   h     h      s     n
d8   m     h      weak  n
d9   c     ?      weak  yes
d11  m     n      s     yes
(? = missing value)
75% h and 25% n
Use in gain calculations
  Humid = h: [0.75+, 3-]
  Humid = n: [1.25+, 0-]
Further subdivide if other missing attributes
Same approach to classify test ex with missing attr
  Classification is most probable classification
  Summing over leaves where it got divided
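
The fractional counts on this slide can be reproduced with a short sketch, assuming a missing value is stored as `None` and is spread across the known values in proportion to their frequency at the node. The name `fractional_counts` and the row encoding are hypothetical.

```python
from collections import Counter, defaultdict


def fractional_counts(rows, attribute, target):
    """Per-value class counts, with missing values split fractionally."""
    frequency = Counter(row[attribute] for row in rows if row[attribute] is not None)
    known_total = sum(frequency.values())
    counts = defaultdict(lambda: defaultdict(float))
    for row in rows:
        if row[attribute] is not None:
            counts[row[attribute]][row[target]] += 1.0
        else:
            # Send a fraction of this example down every branch.
            for value, n in frequency.items():
                counts[value][row[target]] += n / known_total
    return counts


rows = [
    {"day": "d1", "humid": "high", "tennis": "no"},
    {"day": "d2", "humid": "high", "tennis": "no"},
    {"day": "d8", "humid": "high", "tennis": "no"},
    {"day": "d9", "humid": None, "tennis": "yes"},
    {"day": "d11", "humid": "normal", "tennis": "yes"},
]
counts = fractional_counts(rows, "humid", "tennis")
print(dict(counts["high"]), dict(counts["normal"]))
```

These weighted counts can then be fed into the entropy and gain calculations in place of whole-number counts.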

57 Non-Boolean Features
Features with multiple discrete values
  Construct a multi-way split
  Test for one value vs. all of the others?
  Group values into two disjoint subsets?
Real-valued Features
  Discretize?
  Consider a threshold split using observed values?
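
One way to read "threshold split using observed values" is to try a cut between each pair of adjacent distinct values. The sketch below assumes midpoints as candidate thresholds, which is a common choice but not stated on the slide; the function name and the sample readings are hypothetical.

```python
def candidate_thresholds(values):
    """Midpoints between adjacent distinct observed values, in ascending order."""
    distinct = sorted(set(values))
    return [(low + high) / 2 for low, high in zip(distinct, distinct[1:])]


# Hypothetical temperature readings for illustration.
thresholds = candidate_thresholds([70, 65, 80, 70])
print(thresholds)
```

Each candidate t defines a Boolean test (value <= t), so the real-valued feature can be scored with the same gain measure as a discrete one.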

59 Skipping remaining slides

61 What if data is too big for main memory?
How to evaluate split points?
  Discrete attributes
  Continuous attributes
How to partition data once a split point is found?
Post pruning

62 Attribute Lists
Training data => Attribute lists
  One per attribute, written to disk
  If attribute is continuous, sort on that attribute

63 Evaluating an Index
Gini Index
  Easy to evaluate: only requires histogram
  Low value is good
gini(D) = 1 - Σ_j pj^2   (sum over classes j; pj = relative frequency of class j in D)
ginisplit(D) = |D1|/|D| gini(D1) + |D2|/|D| gini(D2)
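
A minimal sketch of both formulas, assuming each partition is summarized by a histogram of class counts. The function names `gini` and `gini_split` are chosen for illustration.

```python
def gini(class_counts):
    """1 minus the sum of squared class frequencies; 0 for an empty or pure set."""
    total = sum(class_counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in class_counts)


def gini_split(left_counts, right_counts):
    """Size-weighted gini of the two partitions D1 and D2."""
    n1, n2 = sum(left_counts), sum(right_counts)
    n = n1 + n2
    return (n1 / n) * gini(left_counts) + (n2 / n) * gini(right_counts)


print(gini([5, 5]))                 # evenly mixed, two classes
print(gini([10, 0]))                # pure
print(gini_split([4, 0], [0, 4]))   # split that separates the classes
print(gini_split([2, 2], [2, 2]))   # split that separates nothing
```

Only the class histograms are needed, which is why the index is cheap to evaluate while scanning an attribute list.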

64 Histograms
Associated with each node in (growing) tree
Record distribution of class values
Used to evaluate splits
Continuous Attributes
  C_above
  C_below
Discrete Attributes
  Count matrix
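
For a continuous attribute, the two histograms can be updated incrementally during one pass over the sorted attribute list. The sketch below is an assumption about how C_below and C_above might be used with the gini index; the entry layout (value, class label, RID), the midpoint thresholds, and the six-entry list are hypothetical.

```python
def gini(counts):
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def best_continuous_split(attribute_list):
    """Scan a value-sorted list of (value, class_label, rid) entries once.

    c_below holds class counts for entries already scanned, c_above for the
    rest. Returns (best weighted gini, threshold).
    """
    c_below = {}
    c_above = {}
    for _, label, _ in attribute_list:
        c_above[label] = c_above.get(label, 0) + 1
    n = len(attribute_list)
    best_score, best_threshold = float("inf"), None
    for i in range(n - 1):
        value, label, _ = attribute_list[i]
        c_below[label] = c_below.get(label, 0) + 1
        c_above[label] -= 1
        next_value = attribute_list[i + 1][0]
        if next_value == value:
            continue  # no cut point between equal values
        n_below = i + 1
        score = (n_below / n) * gini(c_below) + ((n - n_below) / n) * gini(c_above)
        if score < best_score:
            best_score, best_threshold = score, (value + next_value) / 2
    return best_score, best_threshold


# Hypothetical sorted attribute list: (value, class, rid).
entries = [(1, "a", 0), (2, "a", 4), (3, "a", 2), (4, "b", 5), (5, "a", 1), (6, "b", 3)]
score, threshold = best_continuous_split(entries)
print(score, threshold)
```

Because the list is pre-sorted, each candidate split costs only a constant-time histogram update plus a gini evaluation.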

65 Evaluating continuous split points

66 Evaluating discrete split points

67 Performing the Split
Must split every attribute list in two
Easy for the ‘winning’ attribute
  Just scan through old list
  Add each entry to correct one of two new lists
Remaining attributes are harder
  (No test to apply to those attributes to find correct part)
  Use RID
Details coming up….

68 Splitting a node’s attribute list

69 Splitting with Limited Memory
Scan thru list for ‘winning’ attribute
  Just scan through old list
  Add each entry to correct one of two new lists
  Insert RID into hash table, noting destination child: L, R
Later, scan each of the other attribute lists
  Probe hash table with RID to find destination
  Write current record to one of two child lists
If hash table is too big for memory, do in stages
  Only partition splitting attribute part way
  E.g. until one has scanned as many records as would fit in hash table given memory constraints
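
The RID hash-table procedure can be sketched in memory as follows. This is an illustration under assumptions: lists are held in Python lists rather than on disk, the winning attribute is continuous with a `value <= threshold` test, and the staged variant for an oversized hash table is not shown. All names and the sample lists are hypothetical.

```python
def split_attribute_lists(attribute_lists, winner, threshold):
    """Split every attribute list in two, using RIDs to route the non-winning lists.

    attribute_lists maps attribute name -> list of (value, class_label, rid).
    """
    destination = {}  # hash table: rid -> "L" or "R"
    left = {name: [] for name in attribute_lists}
    right = {name: [] for name in attribute_lists}

    # Scan the winning attribute's list, apply the test, and record each destination.
    for value, label, rid in attribute_lists[winner]:
        side = "L" if value <= threshold else "R"
        destination[rid] = side
        (left if side == "L" else right)[winner].append((value, label, rid))

    # Scan the other lists, probing the hash table with each entry's RID.
    for name, entries in attribute_lists.items():
        if name == winner:
            continue
        for entry in entries:
            side = destination[entry[2]]
            (left if side == "L" else right)[name].append(entry)
    return left, right


# Hypothetical attribute lists: the continuous one is sorted by value.
lists = {
    "age": [(20, "a", 0), (30, "a", 2), (40, "b", 1)],
    "car": [("sports", "a", 0), ("truck", "b", 1), ("family", "a", 2)],
}
left, right = split_attribute_lists(lists, "age", 35)
print(left)
print(right)
```

Entries keep their original order within each child list, so a sorted continuous list stays sorted after the split.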

70 Evaluation
Machine had only 16MB RAM

