Rutgers CS440, Fall 2003
Review Session
Topics

Final will cover the following topics (after midterm):
1. Uncertainty & introduction to probability
2. Bayesian networks
3. Hidden Markov models & Kalman filters
4. Dynamic Bayesian networks
5. Decision making under uncertainty (static)
6. Markov decision processes
7. Decision trees
8. Statistical learning in BNs
9. Learning with incomplete data (EM)
10. Neural networks
11. (Support vector machines)
12. (Reinforcement learning)
Uncertainty & probability
  Random variables (discrete & continuous)
  Joint, marginal, prior, conditional probabilities
  Bayes’ rule (restated below)
  Independence & conditional independence
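As a quick reference for the Bayes’ rule and conditional-independence bullets above (standard statements, not taken from the slides):

    P(H \mid E) = \frac{P(E \mid H)\, P(H)}{P(E)},
    \qquad P(E) = \sum_{h} P(E \mid H = h)\, P(H = h)

    X \perp Y \mid Z \;\iff\; P(X, Y \mid Z) = P(X \mid Z)\, P(Y \mid Z)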
Bayesian networks
  Representation of joint probability distributions & densities
  Dependence / independence
  Markov blanket
  Bayes ball rules
  Inference in BNs (sampling sketch below)
    – Enumeration
    – Variable elimination
    – Sampling (simulation)
    – Rejection sampling
    – Likelihood weighting
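As a concrete reminder of how the sampling-based inference methods work, here is a minimal likelihood-weighting sketch in Python. The network encoding and the toy sprinkler-style numbers are my own illustration, not the course's notation.

    import random

    # A minimal likelihood-weighting sketch for Boolean Bayesian networks.
    # Network format (an assumption for illustration): nodes listed in
    # topological order as (name, parents, cpt), where cpt maps a tuple of
    # parent values to P(node = True | parents).

    def likelihood_weighting(network, query, evidence, n_samples=10000):
        """Estimate P(query = True | evidence) by likelihood weighting."""
        totals = {True: 0.0, False: 0.0}
        for _ in range(n_samples):
            sample, weight = dict(evidence), 1.0
            for name, parents, cpt in network:
                p_true = cpt[tuple(sample[p] for p in parents)]
                if name in evidence:
                    # Evidence node: keep its observed value, weight by its likelihood.
                    weight *= p_true if evidence[name] else 1.0 - p_true
                else:
                    # Non-evidence node: sample from P(node | parents).
                    sample[name] = random.random() < p_true
            totals[sample[query]] += weight
        return totals[True] / (totals[True] + totals[False])

    # Toy sprinkler-style network (made-up numbers, just to exercise the code).
    network = [
        ("Rain",      (),        {(): 0.2}),
        ("Sprinkler", ("Rain",), {(True,): 0.01, (False,): 0.4}),
        ("WetGrass",  ("Sprinkler", "Rain"),
         {(True, True): 0.99, (True, False): 0.9,
          (False, True): 0.8, (False, False): 0.0}),
    ]
    print(likelihood_weighting(network, "Rain", {"WetGrass": True}))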
Hidden Markov models & Kalman filters
  Hidden Markov models
    – Representation
    – Inference (forward, backward, Viterbi; forward sketch below)
  Kalman filters
    – Representation
    – Inference (forward)
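A minimal sketch of the HMM forward (filtering) recursion, assuming a small discrete model; the pi/A/B numbers below are made up for illustration.

    # pi[i]   : P(X_1 = i)          (initial distribution)
    # A[i][j] : P(X_{t+1} = j | X_t = i)   (transition model)
    # B[i][o] : P(O_t = o | X_t = i)       (sensor model)

    def forward(pi, A, B, observations):
        """Return P(o_1..o_T) and the (unnormalized) filtered alphas."""
        alpha = [pi[i] * B[i][observations[0]] for i in range(len(pi))]
        alphas = [alpha]
        for o in observations[1:]:
            # alpha_t(j) = [sum_i alpha_{t-1}(i) A[i][j]] * B[j][o]
            alpha = [sum(alpha[i] * A[i][j] for i in range(len(pi))) * B[j][o]
                     for j in range(len(pi))]
            alphas.append(alpha)
        return sum(alpha), alphas

    # Two hidden states, two observation symbols (toy numbers).
    pi = [0.6, 0.4]
    A  = [[0.7, 0.3], [0.4, 0.6]]
    B  = [[0.9, 0.1], [0.2, 0.8]]
    likelihood, alphas = forward(pi, A, B, [0, 1, 0])
    print(likelihood)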
Dynamic Bayesian networks
  Representation
  Reduction to HMMs (discrete cases)
  Particle filtering (sketch below)
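For the particle-filtering bullet, a minimal bootstrap (sampling-importance-resampling) sketch; the 1-D random-walk transition model and the Gaussian-like sensor model are invented purely to exercise the code.

    import math
    import random

    def particle_filter(particles, observation, transition_sample, obs_likelihood):
        """One predict-weight-resample step; returns the new particle set."""
        # 1. Predict: propagate each particle through the transition model.
        predicted = [transition_sample(x) for x in particles]
        # 2. Weight: score each particle by the likelihood of the observation.
        weights = [obs_likelihood(observation, x) for x in predicted]
        # 3. Resample: draw N particles in proportion to their weights.
        return random.choices(predicted, weights=weights, k=len(particles))

    # Toy 1-D example (made up): Gaussian random walk, noisy position sensor.
    def transition_sample(x):
        return x + random.gauss(0.0, 1.0)

    def obs_likelihood(z, x):
        return math.exp(-0.5 * (z - x) ** 2)   # unnormalized Gaussian

    particles = [0.0] * 500
    for z in [1.0, 2.1, 2.9, 4.2]:
        particles = particle_filter(particles, z, transition_sample, obs_likelihood)
    print(sum(particles) / len(particles))     # rough posterior mean estimate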
Decision making under uncertainty (static)
  Preferences & utility
  Utility of money
  Maximum expected utility principle (restated below)
  Decision graphs
  Value of perfect information
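The two central formulas here, in their standard form (not copied from the slides): expected utility / MEU, and the value of perfect information for observing a new evidence variable E_j given current evidence e, where \alpha is the best action under e alone and \alpha_{e_j} is the best action after also seeing E_j = e_j.

    EU(a \mid e) = \sum_{s} P(s \mid a, e)\, U(s),
    \qquad a^{*} = \arg\max_{a} EU(a \mid e)

    VPI_{e}(E_j) = \Big[ \sum_{e_j} P(e_j \mid e)\, EU(\alpha_{e_j} \mid e, e_j) \Big] - EU(\alpha \mid e)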
Markov decision processes
  Decision making in dynamic situations
  Bellman equations
  Value iteration (sketch below)
  Policy iteration
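A minimal value-iteration sketch; the dictionary-based MDP encoding and the toy two-state example are assumptions for illustration only.

    def value_iteration(states, actions, P, R, gamma=0.9, eps=1e-6):
        """Return V satisfying the Bellman optimality equation.

        P[s][a] : list of (probability, next_state) pairs
        R[s]    : immediate reward for being in state s
        """
        V = {s: 0.0 for s in states}
        while True:
            # Bellman backup: V(s) = R(s) + gamma * max_a sum_s' P(s'|s,a) V(s')
            V_new = {
                s: R[s] + gamma * max(
                    sum(p * V[s2] for p, s2 in P[s][a]) for a in actions)
                for s in states
            }
            if max(abs(V_new[s] - V[s]) for s in states) < eps:
                return V_new
            V = V_new

    # Toy two-state MDP (made-up numbers).
    states, actions = ["A", "B"], ["stay", "go"]
    P = {"A": {"stay": [(1.0, "A")], "go": [(0.8, "B"), (0.2, "A")]},
         "B": {"stay": [(1.0, "B")], "go": [(1.0, "A")]}}
    R = {"A": 0.0, "B": 1.0}
    print(value_iteration(states, actions, P, R))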
Decision trees
  Inductive learning
    – Test & training set
  Ockham’s razor
  Decision trees
    – Representation
    – Learning
  Attribute selection (sketch below)
    – Entropy
    – Information gain
  Realizable, non-realizable, redundant
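A small sketch of the entropy and information-gain computations used for attribute selection; the toy examples, attribute names, and labels are made up.

    import math
    from collections import Counter

    def entropy(labels):
        """H(labels) in bits."""
        counts, total = Counter(labels), len(labels)
        return -sum((c / total) * math.log2(c / total) for c in counts.values())

    def information_gain(examples, labels, attribute):
        """Gain(A) = H(S) - sum_v (|S_v|/|S|) H(S_v)."""
        total, remainder = len(labels), 0.0
        for value in set(ex[attribute] for ex in examples):
            subset = [lab for ex, lab in zip(examples, labels)
                      if ex[attribute] == value]
            remainder += (len(subset) / total) * entropy(subset)
        return entropy(labels) - remainder

    # Toy data: pick the attribute with the largest gain as the root test.
    examples = [{"Outlook": "sun", "Windy": "no"},  {"Outlook": "sun", "Windy": "yes"},
                {"Outlook": "rain", "Windy": "no"}, {"Outlook": "rain", "Windy": "yes"}]
    labels = ["yes", "no", "yes", "yes"]
    for a in ["Outlook", "Windy"]:
        print(a, information_gain(examples, labels, a))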
Statistical learning
  Optimal prediction: Bayesian prediction
  Maximum likelihood (ML) and maximum a posteriori (MAP) learning
  ML learning of Bayesian network parameters for complete datasets (formula below)
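For complete data, ML learning of a Bayesian-network CPT reduces to the relative frequencies of the observed counts N(.); MAP learning adds pseudo-counts from the (e.g. Dirichlet) prior to the same ratio. In standard notation (not copied from the slides):

    \hat{\theta}_{x \mid u}
      = \hat{P}(X = x \mid \mathrm{Pa}(X) = u)
      = \frac{N(X = x,\ \mathrm{Pa}(X) = u)}{\sum_{x'} N(X = x',\ \mathrm{Pa}(X) = u)}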
EM & incomplete data
  Incomplete/missing data
  Data completion, completed (log) likelihood
  Expectation-maximization algorithm (sketch below)
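To illustrate the E-step/M-step cycle, here is a minimal EM sketch for a two-component, unit-variance 1-D Gaussian mixture; the data and initial guesses are made up, and this particular mixture model is only one example of learning with hidden (component-membership) variables.

    import math

    def gauss(x, mu):
        # Unit-variance Gaussian density (an assumed, simplified model).
        return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2 * math.pi)

    def em(data, mu1, mu2, w1=0.5, iters=50):
        for _ in range(iters):
            # E-step: expected component memberships (data completion).
            r1 = [w1 * gauss(x, mu1) /
                  (w1 * gauss(x, mu1) + (1 - w1) * gauss(x, mu2)) for x in data]
            # M-step: re-estimate parameters to maximize the completed log likelihood.
            n1 = sum(r1)
            mu1 = sum(r * x for r, x in zip(r1, data)) / n1
            mu2 = sum((1 - r) * x for r, x in zip(r1, data)) / (len(data) - n1)
            w1 = n1 / len(data)
        return mu1, mu2, w1

    data = [-2.1, -1.8, -2.4, 1.9, 2.2, 2.5, 1.7]   # made-up observations
    print(em(data, mu1=-1.0, mu2=1.0))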
Neural networks
  Artificial neurons – perceptron
  Representation & linear separability
  Perceptron (gradient) learning (sketch below)
  Feed-forward, multilayer and recurrent networks (Hopfield)
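A minimal perceptron-learning sketch; the logical-AND training set is an illustrative choice (it is linearly separable, so the update rule converges).

    def perceptron_train(examples, labels, lr=0.1, epochs=20):
        """Learn weights w (with w[0] as the bias) for a threshold unit."""
        n = len(examples[0])
        w = [0.0] * (n + 1)
        for _ in range(epochs):
            for x, y in zip(examples, labels):
                x = [1.0] + list(x)                      # prepend bias input
                y_hat = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0
                # Perceptron rule: w <- w + lr * (y - y_hat) * x
                w = [wi + lr * (y - y_hat) * xi for wi, xi in zip(w, x)]
        return w

    # Logical AND is linearly separable, so a single perceptron can represent it.
    examples = [(0, 0), (0, 1), (1, 0), (1, 1)]
    labels   = [0, 0, 0, 1]
    print(perceptron_train(examples, labels))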
Sample Problem 1

Imagine you wish to recognize bad “widgets” produced by your factory. You’re able to measure two numeric properties of each widget: P1 and P2. The value of each property is discretized to be one of {low (L), normal (N), high (H)}. You randomly grab 5 widgets off of your assembly line and extensively test whether or not they are good, obtaining the following results:

  P1  P2  Result
  L   N   good
  H   L   bad
  N   H   good
  L   H   bad
  N   N   good

Explain how you could use this data and Bayes’ Rule to determine whether the following new widget is more likely to be a good or a bad one (be sure to show your work and explain any assumptions/simplifications you make):

  P1  P2  Result
  L   L   ?

Solution: Assuming P1 and P2 are conditionally independent given the result, the best prediction for (L, L) is bad (regardless of whether P1 and P2 have the same or different distributions). A worked calculation follows.
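A worked version of the calculation under the conditional-independence (Naive Bayes) assumption stated in the solution; the code is my own sketch, not part of the original handout.

    # Training data from the table above: (P1, P2, Result).
    data = [("L", "N", "good"), ("H", "L", "bad"), ("N", "H", "good"),
            ("L", "H", "bad"), ("N", "N", "good")]

    def score(result, p1, p2):
        """Unnormalized P(result) * P(P1=p1 | result) * P(P2=p2 | result)."""
        rows = [r for r in data if r[2] == result]
        prior = len(rows) / len(data)
        l1 = sum(1 for r in rows if r[0] == p1) / len(rows)
        l2 = sum(1 for r in rows if r[1] == p2) / len(rows)
        return prior * l1 * l2

    for result in ("good", "bad"):
        print(result, score(result, "L", "L"))
    # good: 3/5 * 1/3 * 0/3 = 0;  bad: 2/5 * 1/2 * 1/2 = 0.1  ->  predict bad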
Sample Problem 2

Assume that User A and User B equally share a computer, and that you wish to write a program that determines which person is currently using the computer. You choose to create a (first-order) Markov model that characterizes each user’s typing behavior. You decide to group their keystrokes into three classes and have estimated the transition probabilities, producing the two graphs below. Both users always start in the Other state upon logging in.

  [Two state-transition diagrams, one for User A and one for User B, over the states Letter, Digit, and Other; the transition probabilities are not reproduced here.]

Now imagine that the current user logs on and immediately types the following:

  IOU $15

Who is more likely to be the current user? Show and explain your calculations.

Solution: User A. (The likelihood for each user is the product of that user’s transition probabilities along the keystroke-class sequence; a sketch of the computation follows.)
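The structure of that calculation, as a hedged sketch: the actual transition probabilities live in the two diagrams and are not reproduced in the text, so the matrices below are placeholders only, and classifying the space and '$' keystrokes as Other is also my assumption.

    def keystroke_class(ch):
        # Assumed grouping: letters, digits, everything else is Other.
        if ch.isalpha():
            return "Letter"
        if ch.isdigit():
            return "Digit"
        return "Other"

    def sequence_likelihood(text, T, start="Other"):
        """P(class sequence | user) = product of first-order transition probs."""
        p, state = 1.0, start
        for ch in text:
            nxt = keystroke_class(ch)
            p *= T[state][nxt]
            state = nxt
        return p

    # Placeholder transition matrices T[current][next]; substitute the values
    # read off the User A and User B diagrams.
    T_A = {"Letter": {"Letter": 0.5, "Digit": 0.2, "Other": 0.3},
           "Digit":  {"Letter": 0.3, "Digit": 0.4, "Other": 0.3},
           "Other":  {"Letter": 0.4, "Digit": 0.3, "Other": 0.3}}
    T_B = {"Letter": {"Letter": 0.4, "Digit": 0.3, "Other": 0.3},
           "Digit":  {"Letter": 0.4, "Digit": 0.2, "Other": 0.4},
           "Other":  {"Letter": 0.3, "Digit": 0.4, "Other": 0.3}}

    text = "IOU $15"
    print(sequence_likelihood(text, T_A), sequence_likelihood(text, T_B))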
Sample Problem 3

First-grader Maggie has divided her books into two groups: those she likes and those she doesn’t. The five (5) books that Maggie likes contain (only) the following words:

  animal (5 times), mineral (15 times), vegetable (1 time), see (1 time)

The ten (10) books that Maggie does not like contain (only) the following words:

  animal (5 times), mineral (10 times), vegetable (30 times), spot (1 time)

Using the Naïve Bayes assumption, determine whether it is more probable that Maggie likes the following book than that she dislikes it. Show and explain your work.

  see mineral vegetable

(These three words are the entire contents of this new book.)

Solution: Maggie is more likely to like the book (true even if one assumes Maggie has no prior preference for liking or disliking books). A worked calculation follows.
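A worked version using the word counts given above, with unsmoothed relative frequencies (no Laplace correction is mentioned on the slide); the code itself is my own sketch.

    # Word counts per class, from the problem statement.
    likes    = {"animal": 5, "mineral": 15, "vegetable": 1, "see": 1}    # 5 books
    dislikes = {"animal": 5, "mineral": 10, "vegetable": 30, "spot": 1}  # 10 books

    def score(word_counts, prior, book_words):
        """Unnormalized P(class) * prod_w P(w | class)."""
        total = sum(word_counts.values())
        p = prior
        for w in book_words:
            p *= word_counts.get(w, 0) / total
        return p

    book = ["see", "mineral", "vegetable"]
    print("like:   ", score(likes, 5 / 15, book))     # (1/3)(1/22)(15/22)(1/22) > 0
    print("dislike:", score(dislikes, 10 / 15, book)) # "see" never occurs -> 0
    # "like" wins for any nonzero prior, matching the stated solution.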
Homework discussion

Which grading method is “better”?
  – Full average (“ave”)
  – Drop lowest score (“drop”)
  – Extra credit (“extra”)