1
Learning
2
What is learning?
3
Supervised Learning
Training data that has been classified
Examples:
Concept learning
Decision trees
Markov models
Nearest neighbor
Neural Nets (in coming weeks)
Inductive Bias - limits imposed by assumptions! Especially what factors we choose as inputs
4
Rote Learning
Store the training data verbatim
Limitation - does not extend beyond what has been seen
Example: concept learning
5
Concept Learning
Inductive learning with generalization
Given training data: tuples <a1, a2, a3, …> plus a Boolean classification
Each ai can be any value
? is used for a don't-care that matches any value (always positive)
null is used for a don't-care that matches no value (always negative)
6
A hypothesis is a tuple that classifies an example as true when it matches
hg = <?, ?, …> - most general - always true
hs = <null, null, …> - most specific - always false
hg >= hs
This ordering defines a partially ordered lattice
7
Training Method
Use the lattice to generate the most general hypothesis consistent with the training data
Weaknesses:
Inconsistent data
Data errors
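As a concrete illustration, here is a minimal Python sketch of one lattice-based learner. It climbs the lattice from the most specific hypothesis hs, widening each slot only as far as the positive examples force it (a Find-S-style approach; the generalize helper, the attribute values, and the example data are all illustrative assumptions, not from the slides):

    NULL = object()   # the "null" value: matches nothing (most specific slot)
    ANY = "?"         # the "?" value: matches anything (most general slot)

    def matches(hypothesis, example):
        # A hypothesis covers an example if every slot is '?' or an exact match
        return all(h == ANY or h == e for h, e in zip(hypothesis, example))

    def generalize(examples):
        # Walk up the lattice from hs = <null, null, ...>, generalizing
        # just enough to cover each positive example
        h = [NULL] * len(examples[0][0])
        for example, positive in examples:
            if not positive:
                continue              # this simple sketch ignores negatives
            for i, value in enumerate(example):
                if h[i] is NULL:
                    h[i] = value      # first positive pins the slot to a value
                elif h[i] != value:
                    h[i] = ANY        # conflicting values widen the slot to '?'
        return h

    h = generalize([(("sunny", "warm"), True), (("sunny", "cold"), True)])
    print(h)                              # ['sunny', '?']
    print(matches(h, ("sunny", "mild")))  # True: '?' accepts any second value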
9
Decision Trees
ID3 algorithm
Entropy: a measure of information
-p(I) log2 p(I) is the entropy contribution of one element
Entropy of the whole system: Entropy(S) = Σ -p(I) log2 p(I)
p(I) = instances of I / total instances
This is computed over the outputs of the tree
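A minimal Python sketch of this computation (the entropy function name and the label-list input format are illustrative choices):

    from collections import Counter
    from math import log2

    def entropy(labels):
        # Entropy(S) = sum over classes I of -p(I) * log2 p(I)
        total = len(labels)
        return -sum((n / total) * log2(n / total)
                    for n in Counter(labels).values())

    # 9 positive and 5 negative outputs, a common textbook split
    print(entropy(["yes"] * 9 + ["no"] * 5))   # ~0.940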
10
Gain is a measure of the effectiveness of an attribute:
Gain(S, A) = Entropy(S) - Σ ((|Sv| / |S|) * Entropy(Sv))
Sv is the subset of outputs for which attribute A has value v
|S| is the number of elements in the outputs
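Building on the entropy sketch above, a possible gain computation over (attribute value, label) pairs (the pair-list input format is my assumption):

    def gain(pairs):
        # Gain(S, A) = Entropy(S) - sum over v of (|Sv| / |S|) * Entropy(Sv)
        labels = [label for _, label in pairs]
        subsets = {}
        for value, label in pairs:
            subsets.setdefault(value, []).append(label)   # Sv for each value v
        return entropy(labels) - sum(
            (len(sv) / len(pairs)) * entropy(sv) for sv in subsets.values())

    pairs = [("sunny", "play"), ("sunny", "play"),
             ("rain", "stay"), ("rain", "play")]
    print(gain(pairs))   # ~0.311: splitting on this attribute reduces entropy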
11
ID3
A greedy algorithm: select the attribute with the largest gain
Split on it and iterate on each subset until done
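A minimal recursive sketch of that greedy loop, reusing entropy and gain from the sketches above (the dict-based example format and the label key are assumptions):

    from collections import Counter

    def id3(examples, attributes, label_key="label"):
        # Greedy ID3: pick the highest-gain attribute, split, recurse
        labels = [ex[label_key] for ex in examples]
        if len(set(labels)) == 1:          # pure node: stop
            return labels[0]
        if not attributes:                 # nothing left to split: majority vote
            return Counter(labels).most_common(1)[0][0]
        best = max(attributes, key=lambda a: gain(
            [(ex[a], ex[label_key]) for ex in examples]))
        tree = {best: {}}
        for value in {ex[best] for ex in examples}:
            subset = [ex for ex in examples if ex[best] == value]
            rest = [a for a in attributes if a != best]
            tree[best][value] = id3(subset, rest, label_key)
        return tree

    examples = [
        {"outlook": "sunny", "windy": "no",  "label": "play"},
        {"outlook": "rain",  "windy": "yes", "label": "stay"},
        {"outlook": "sunny", "windy": "yes", "label": "play"},
    ]
    print(id3(examples, ["outlook", "windy"]))
    # {'outlook': {'sunny': 'play', 'rain': 'stay'}}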
12
Markov Models
A Markov chain is a set of states
State transitions are probabilistic: state xi goes to state xj with probability P(xj | xi)
This can be extended to let the probability depend on a set of past states (memory)
13
Example from the Text
Given a set of words, build a Markov chain that generates similar words
For each letter position in the words, compute transition probabilities
Use a matrix of counts: Count[from][to]
Normalize each row by the total count in the row
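A possible sketch of that letter-level chain (the start/stop markers and the generate helper are my additions for illustration):

    import random
    from collections import defaultdict

    def train(words):
        # Build Count[from][to], then normalize each row to get P(to | from)
        counts = defaultdict(lambda: defaultdict(int))
        for word in words:
            chain = "^" + word + "$"        # '^' starts a word, '$' ends it
            for a, b in zip(chain, chain[1:]):
                counts[a][b] += 1
        return {a: {b: n / sum(row.values()) for b, n in row.items()}
                for a, row in counts.items()}

    def generate(probs):
        # Walk the chain from the start marker until the stop marker
        out, state = [], "^"
        while True:
            nexts = probs[state]
            state = random.choices(list(nexts), weights=list(nexts.values()))[0]
            if state == "$":
                return "".join(out)
            out.append(state)

    probs = train(["cat", "car", "cart", "bat", "bar"])
    print(generate(probs))   # e.g. "cat", or a novel blend like "bart"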
14
Nearest Neighbor
1NN: use vectors to represent entities
Use a distance measure between vectors to locate the closest known entity
Can be affected by noisy data
15
kNN - better: use the k closest neighbors and take a majority vote
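A minimal sketch using Euclidean distance (the (vector, label) data format and the k=3 default are assumptions):

    from collections import Counter
    from math import dist                  # Euclidean distance (Python 3.8+)

    def knn(query, data, k=3):
        # Classify query by majority vote among the k nearest labeled vectors
        nearest = sorted(data, key=lambda item: dist(query, item[0]))[:k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]

    data = [((1.0, 1.0), "A"), ((1.2, 0.9), "A"),
            ((5.0, 5.0), "B"), ((4.8, 5.1), "B")]
    print(knn((1.1, 1.0), data))   # "A"; with k=1 this is the noise-prone 1NN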
16
Other Techniques - yet to cover!
Evolutionary algorithms
Neural nets