Exam Review Session William Cohen

What can I tell you about the exam?
- It's qualitatively similar to the midterm.
- Sample/old exams for 10-601 are probably pretty representative.
- About 80% is on topics from after the midterm.
- Questions might overlap with the HWs. Everyone can drop one HW, but you should read through the HW you skipped carefully: be familiar with the algorithms implemented and understand the questions.
- Look over the review points in the lecture pages from the wiki.

General hints on exam taking
- You can bring in one 8½ x 11 inch sheet (front and back).
- Look over everything quickly and skip around: probably nobody will know everything on the test, and if something seems really complicated you're likely missing something.
- If you're not sure what we've got in mind, state your assumptions clearly in your answer. There's room for this even on true/false questions.
- If you look at a question and don't know the answer: we probably haven't told you the answer, but we've told you enough to work it out. Imagine arguing for some answer and see if you like it.

Post-midterm topics
- Clustering
- Active learning
- Semi-supervised learning
- Graphical models
- HMMs and sequential data
- Topic models
- Deep learning
- PCA and matrix factorization
- Reinforcement learning

Post-midterm topics: clustering
- We went in-depth into one method: ___
- You should understand how it works and the differences between the variants we discussed.
- You should also understand the definitions and algorithmic outlines of the hierarchical methods.

Post-midterm topics: clustering
- Later (3/28) we came back and looked at EM as a learning method for a "soft version" of k-means: _______
- You should know how EM and k-means relate (see the sketch below).
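The blank above is left as on the original slide. Purely as an illustration of the hard-versus-soft contrast the slide points at, here is a minimal sketch (made-up data and parameter values, not from the lecture) of a k-means assignment step next to an EM-style E-step for an equal-weight mixture of unit-variance Gaussians:

    import numpy as np

    # Toy 1-D data and two cluster centers (made-up values for illustration).
    x = np.array([0.0, 0.5, 2.9, 3.1, 10.0])
    centers = np.array([0.25, 3.0])

    # k-means: hard assignment -- each point goes entirely to its nearest center.
    dist = np.abs(x[:, None] - centers[None, :])     # (n_points, n_centers)
    hard = np.argmin(dist, axis=1)
    print("hard assignments:", hard)                 # e.g. [0 0 1 1 1]

    # EM E-step for a mixture of two equal-weight unit-variance Gaussians:
    # soft "responsibilities" split each point fractionally between clusters.
    logp = -0.5 * dist ** 2                          # log N(x | mu, 1) up to a constant
    resp = np.exp(logp - logp.max(axis=1, keepdims=True))
    resp /= resp.sum(axis=1, keepdims=True)
    print("soft responsibilities:\n", resp.round(3))

    # The M-step re-estimates each center as a responsibility-weighted mean;
    # this reduces to the ordinary k-means center update as the variance -> 0.
    new_centers = (resp * x[:, None]).sum(axis=0) / resp.sum(axis=0)
    print("updated centers:", new_centers.round(3))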

Post-midterm topics: active learning
Understand the key points and the examples in the lecture:
- What's the formal model (the API) for an active learner?
- How can active learning be used to distinguish between two hypotheses that sometimes agree and sometimes disagree?
- What kinds of queries does an active learner make? How do you design an algorithm to use those queries effectively?
- How do you implement binary-search-like algorithms for different types of concept classes? (A sketch for one concept class follows.)
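Here is a minimal sketch of the binary-search idea for one simple concept class, threshold functions on [0, 1], assuming a noiseless label oracle (the oracle and epsilon value are hypothetical stand-ins, not from the slides). Each query halves the interval that can contain the threshold, so about log2(1/eps) labels suffice where a passive learner would need on the order of 1/eps:

    def learn_threshold(oracle, eps=1e-3):
        """Actively learn a threshold concept h(x) = [x >= t] on [0, 1].

        oracle(x) returns the true label of x (assumed noiseless).
        Each query halves the interval that can contain t, so about
        log2(1/eps) labels pin t down to precision eps.
        """
        lo, hi = 0.0, 1.0          # invariant: t lies in (lo, hi]
        while hi - lo > eps:
            mid = (lo + hi) / 2.0
            if oracle(mid):        # mid labeled positive -> t <= mid
                hi = mid
            else:                  # mid labeled negative -> t > mid
                lo = mid
        return hi                  # estimate of the threshold t

    # Usage with a made-up true threshold:
    true_t = 0.37
    estimate = learn_threshold(lambda x: x >= true_t)
    print(round(estimate, 3))      # ~0.37 after ~10 queries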

Post-midterm topics: SSL
Understand the key points and the examples in the lecture:
- What's the formal model (the API) for SSL?
- Remember we also came back to this topic later: SSL with naïve Bayes and k-means variants, using EM and DGMs.
- Free advice: there's nothing on graph-based SSL on the exam.

Post-midterm topics: DGMs
Boy, did we talk about these a lot!

Post-midterm topics: DGMs
Semantics:
- How does a graph encode a joint probability? (See the factorization below.)
- How compact (how many parameters) is a DGM compared to a naïve encoding?
- How do you determine conditional independence in a DGM?
- How do you learn parameters for a DGM? When do you use counts and MAP estimates, and when do you need to use EM?
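For reference, the factorization a DGM encodes, together with a worked parameter count on a small made-up example:

    P(X_1, \dots, X_n) \;=\; \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Pa}(X_i)\bigr)

For binary variables in the chain A -> B -> C, a naive joint table needs 2^3 - 1 = 7 free parameters, while the DGM needs only 1 + 2 + 2 = 5 (one for P(A), two for P(B | A), two for P(C | B)). In general a binary node with k binary parents contributes 2^k free parameters.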

Post-midterm topics: DGMs
Lots of (sadly important) terminology: blocking, d-separation, explaining away, Markov blanket, message passing, belief propagation, polytree, ...
- Can you define these?
- Can you recognize them or draw them? (A numeric example of explaining away follows.)
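To make "explaining away" concrete, a brute-force enumeration on the classic burglary/earthquake/alarm v-structure (the CPT numbers are textbook-style values chosen for illustration, not from the lecture). B and E are marginally independent, but once the alarm is observed, learning that an earthquake occurred drives the probability of a burglary back down:

    import itertools

    # v-structure B -> A <- E with illustrative CPTs.
    pB = {1: 0.01, 0: 0.99}                 # P(Burglary)
    pE = {1: 0.02, 0: 0.98}                 # P(Earthquake)
    pA = {(0, 0): 0.001, (1, 0): 0.94,      # P(Alarm=1 | B, E)
          (0, 1): 0.29,  (1, 1): 0.95}

    def joint(b, e, a):
        p_alarm = pA[(b, e)]
        return pB[b] * pE[e] * (p_alarm if a == 1 else 1.0 - p_alarm)

    def posterior_burglary(**evidence):
        """P(B=1 | evidence) by enumeration over all settings of B, E, A."""
        num = den = 0.0
        for b, e, a in itertools.product([0, 1], repeat=3):
            world = {"B": b, "E": e, "A": a}
            if any(world[k] != v for k, v in evidence.items()):
                continue
            p = joint(b, e, a)
            den += p
            if b == 1:
                num += p
        return num / den

    print(posterior_burglary(A=1))        # ~0.58: the alarm alone suggests burglary
    print(posterior_burglary(A=1, E=1))   # ~0.03: the earthquake "explains away" the alarm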

Post-midterm topics: DGMs
DGMs for describing learning algorithms:
- parameters in a DGM
- inference and learning
- What does the DGM for ___ look like, and what do the individual variables mean?
  - supervised naïve Bayes?
  - mixture of Gaussians?
  - HMMs?
  - LDA?

Post-midterm topics: DGMs
HMMs:
- How are they like (?) DGMs?
- How do you do inference and learning? (A forward-algorithm sketch follows.)
- What are they useful for?
- Free advice: there are no questions on CRFs on the final.
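As one concrete piece of HMM inference, a minimal sketch of the forward algorithm (the transition/emission matrices below are made up for illustration). It computes the likelihood of an observation sequence in O(T K^2) time rather than summing over all K^T hidden state sequences:

    import numpy as np

    def forward(pi, A, B, obs):
        """Forward algorithm: returns P(obs_1, ..., obs_T) for an HMM.

        pi  -- (K,)   initial state distribution
        A   -- (K, K) transitions, A[i, j] = P(s_t = j | s_{t-1} = i)
        B   -- (K, V) emissions,   B[j, o] = P(obs = o | state = j)
        obs -- sequence of observation indices in {0, ..., V-1}
        """
        alpha = pi * B[:, obs[0]]             # alpha_1(j) = pi_j * P(o_1 | j)
        for o in obs[1:]:
            alpha = (alpha @ A) * B[:, o]     # sum out the previous state, then emit
        return alpha.sum()

    # Made-up 2-state, 2-symbol HMM for illustration.
    pi = np.array([0.6, 0.4])
    A = np.array([[0.7, 0.3],
                  [0.2, 0.8]])
    B = np.array([[0.9, 0.1],
                  [0.3, 0.7]])
    print(forward(pi, A, B, [0, 1, 1]))       # likelihood of the sequence 0, 1, 1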

Post-midterm topics: DGMs
DGMs and topic models:
- What are topic models?
- What sort of DGMs can be used to model text?
- What is Gibbs sampling, and when would you use it? (The standard update for LDA is given below for reference.)
- Connection to belief propagation: what is it and when do you use it? When would you prefer Gibbs sampling to BP?
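For reference, the collapsed Gibbs sampling update commonly used for LDA (this is the standard form, not quoted from the lecture): to resample the topic of word position i in document d, with the n counts excluding position i and V the vocabulary size,

    P(z_i = k \mid z_{-i}, w) \;\propto\; \bigl(n_{d,k}^{-i} + \alpha\bigr)\,
        \frac{n_{k, w_i}^{-i} + \beta}{n_{k}^{-i} + V\beta}

The first factor favors topics already frequent in the document; the second favors topics that often generate this word. Belief propagation, by contrast, computes (approximate) marginals deterministically by message passing; Gibbs sampling is often the simpler choice when the model has many coupled discrete variables.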

Post-midterm topics: deep learning
- Lots of terminology & equations: vanishing gradients, saturation, ReLU, ... Don't spend too much time memorizing these; they will all be different in a year or two. (A tiny numeric illustration of vanishing gradients follows.)
- Some architectures I went into more detail on, with pictures and such: LSTMs and CNNs (which was a HW). You should definitely understand how these are supposed to work and what the different pieces are.
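A tiny numeric illustration (made-up depth and input, not from the slides) of why deep sigmoid stacks vanish: the sigmoid derivative sigma(x)(1 - sigma(x)) never exceeds 0.25, so the backpropagated gradient through n sigmoid layers shrinks at least like 0.25^n, while the ReLU derivative is exactly 1 wherever the unit is active:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    # Gradient magnitude through 20 elementwise layers (weights of 1, input 0.5):
    # each sigmoid layer multiplies the gradient by sigma'(x) <= 0.25.
    grad = 1.0
    x = 0.5
    for layer in range(20):
        s = sigmoid(x)
        grad *= s * (1.0 - s)      # derivative of sigmoid at this activation
        x = s
    print(grad)                    # on the order of 1e-13: vanished

    relu_grad = 1.0 ** 20          # ReLU derivative is 1 on its active region
    print(relu_grad)               # 1.0: the gradient passes through unchanged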

PCA and MF and CF
- What is PCA? What do those pictures mean?
- How do the different PCA vectors relate to each other? Are they always orthogonal?
- If I take data in N dimensions and re-express it in the first K dimensions formed by PCA, what does the new data look like?
- How are PCA and MF related? What are the algorithms used for PCA and for MF?
- What is collaborative filtering, and how is MF used for it? E.g., what M gets F-ed? (A PCA-via-SVD sketch follows.)
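A minimal PCA-via-SVD sketch on made-up data, showing the two properties the questions point at: the principal directions are orthonormal (so yes, always orthogonal), and projecting onto the first K of them gives the best rank-K reconstruction in squared error:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))            # made-up data: 100 points in 5-D
    Xc = X - X.mean(axis=0)                  # PCA works on centered data

    # SVD of the centered data: the rows of Vt are the principal directions.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

    # The principal directions are orthonormal.
    print(np.allclose(Vt @ Vt.T, np.eye(5)))          # True

    # Re-express the data in the first K principal dimensions.
    K = 2
    Z = Xc @ Vt[:K].T                        # (100, K): the new coordinates

    # Mapping back gives the best rank-K approximation in squared error;
    # the residual equals the sum of the squared discarded singular values.
    Xhat = Z @ Vt[:K]
    print(((Xc - Xhat) ** 2).sum(), (S[K:] ** 2).sum())   # equal

For collaborative filtering, matrix factorization similarly approximates the ratings matrix by a product of low-rank factors, but it is typically fit by gradient descent or alternating least squares over the observed entries only.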

Post-midterm topics
- Clustering ✔
- Active learning ✔
- Semi-supervised learning ✔
- Graphical models ✔
- HMMs and sequential data
- Topic models
- Deep learning ✔
- PCA and matrix factorization ✔
- Reinforcement learning

Final notes
- I'm having office hours this week: Tuesday at 11am, not Thursday at 11am.
- Good luck!

Sample questions

Sample questions (these refer to a Bayes net drawn on the slide, which is not reproduced in this transcript):
- List some pairs X, Y such that I<X, A, Y> holds, i.e., X is conditionally independent of Y given A.
- How many parameters are needed to define the joint?

Sample questions (again for a Bayes net drawn on the slide, not reproduced here):
- Write down the joint distribution P(X1, ..., X7).

Sample questions
Complete the diagram (the partially drawn plate diagram is not reproduced in this transcript) for:
- N documents of length L
- K topics
- Zij: topic indicator for doc i, position j
- Wij: word for doc i, position j
- α: prior on topic multinomials
- θi: topic distribution for doc i
- β: prior on word multinomials
- φk: word multinomial for topic k
Which variables have a discrete domain? (See the generative story below.)
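For reference, one standard way to complete the LDA generative story in the slide's notation (hedging on details the missing diagram would pin down):

    \theta_i \sim \mathrm{Dirichlet}(\alpha)          for each doc i = 1, ..., N
    \phi_k   \sim \mathrm{Dirichlet}(\beta)           for each topic k = 1, ..., K
    Z_{ij}   \sim \mathrm{Categorical}(\theta_i)      for each position j = 1, ..., L
    W_{ij}   \sim \mathrm{Categorical}(\phi_{Z_{ij}})

On this reading, Z_ij and W_ij have discrete domains, while θ_i and φ_k range over continuous probability simplices.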

Sample questions (the accompanying PCA scatterplot, including the first-principal-component question this refers back to, is not reproduced in this transcript)
- Repeat for the second principal component.
