COMP 328: Final Review, Spring 2010
Nevin L. Zhang
Department of Computer Science & Engineering
The Hong Kong University of Science & Technology
http://www.cse.ust.hk/~lzhang/
Can be used as a cheat sheet.
Pre-Midterm
- Algorithms for supervised learning
  - Decision trees
  - Instance-based learning
  - Naïve Bayes classifiers
  - Neural networks
  - Support vector machines
- General issues regarding supervised learning
  - Classification error and confidence intervals
  - Bias-variance tradeoff
  - PAC learning theory
Post-Midterm
- Clustering
  - Distance-Based Clustering
  - Model-Based Clustering
- Dimension Reduction
  - Principal Component Analysis
- Reinforcement Learning
- Ensemble Learning
Clustering
Distance/Similarity Measures
Distance-Based Clustering
- Partitional and hierarchical clustering
K-Means: Partitional Clustering
- Different initial points might lead to different partitions
- Solution (see the sketch below):
  - Multiple runs
  - Use an evaluation criterion such as SSE to pick the best one
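A minimal K-means sketch in NumPy illustrating the multiple-runs idea: each run starts from different random initial points, and the partition with the smallest SSE is kept. The function names are illustrative, not from the course code.

```python
import numpy as np

def kmeans(X, k, n_iter=100, rng=None):
    rng = np.random.default_rng(rng)
    centers = X[rng.choice(len(X), k, replace=False)]  # random initial points
    for _ in range(n_iter):
        # Assign each point to its nearest center.
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        # Recompute each center as the mean of its assigned points.
        new = np.array([X[labels == j].mean(0) if (labels == j).any()
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    sse = ((X - centers[labels]) ** 2).sum()
    return centers, labels, sse

def kmeans_best_of(X, k, n_runs=10):
    # Different initializations can give different partitions;
    # run several times and keep the partition with the smallest SSE.
    return min((kmeans(X, k, rng=r) for r in range(n_runs)),
               key=lambda t: t[2])
```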
Hierarchical Clustering
- Agglomerative and divisive
Cluster Similarity
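The usual cluster-similarity choices are single link (minimum distance), complete link (maximum distance), and average link (mean distance). A quick agglomerative-clustering sketch using SciPy's hierarchy module, where the linkage method selects among these; the toy data here are made up.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.random.default_rng(0).normal(size=(30, 2))  # toy data, assumed
# method='single' | 'complete' | 'average' picks the cluster-similarity rule
Z = linkage(X, method='average')                   # bottom-up merge tree
labels = fcluster(Z, t=3, criterion='maxclust')    # cut the tree into 3 clusters
```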
Cluster Validation
- External indices
  - Entropy: average purity of the clusters obtained
  - Mutual information between class label and cluster label
Cluster Validation
- External measures
  - Jaccard index
  - Rand index
- Both measure the agreement between two pair relations: in-same-class and in-same-cluster.

                         # pairs in same cluster   # pairs in diff cluster
  # pairs w/ same label            a                         b
  # pairs w/ diff label            c                         d

- Jaccard index: a / (a + b + c)
- Rand index: (a + d) / (a + b + c + d)
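A sketch that counts the four pair types (a, b, c, d) from the table above and computes both indices; only the standard library is used, and the function names are illustrative.

```python
from itertools import combinations

def pair_counts(labels, clusters):
    a = b = c = d = 0
    for i, j in combinations(range(len(labels)), 2):
        same_label = labels[i] == labels[j]
        same_cluster = clusters[i] == clusters[j]
        if same_label and same_cluster:       a += 1  # agree: together
        elif same_label and not same_cluster: b += 1  # split by clustering
        elif same_cluster:                    c += 1  # merged by clustering
        else:                                 d += 1  # agree: apart
    return a, b, c, d

def rand_index(labels, clusters):
    a, b, c, d = pair_counts(labels, clusters)
    return (a + d) / (a + b + c + d)

def jaccard_index(labels, clusters):
    a, b, c, d = pair_counts(labels, clusters)
    return a / (a + b + c)
```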
Cluster Validation
- Internal measure
  - Dunn's index
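As a rough sketch of Dunn's index: the smallest between-cluster separation divided by the largest within-cluster diameter (larger is better). Exact definitions vary by textbook; this version uses minimum pairwise distance for separation.

```python
import numpy as np

def dunn_index(X, labels):
    groups = [X[labels == c] for c in np.unique(labels)]
    def dist(A, B):  # pairwise Euclidean distances between two groups
        return np.sqrt(((A[:, None] - B) ** 2).sum(-1))
    diam = max(dist(g, g).max() for g in groups)  # largest intra-cluster diameter
    sep = min(dist(groups[i], groups[j]).min()    # smallest inter-cluster distance
              for i in range(len(groups)) for j in range(i + 1, len(groups)))
    return sep / diam
```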
Model-Based Clustering
- Assume the data are generated from a mixture model with K components
- Estimate the parameters of the model from the data
- Assign objects to clusters based on the posterior probability: soft assignment
Gaussian Mixtures
Learning Gaussian Mixture Models
EM
- l(t): log-likelihood of the model after the t-th iteration
- l(t) increases monotonically with t
- But it might go to infinity in cases of singularity
  - Solution: place a bound on the eigenvalues of the covariance matrices (see the EM sketch below)
- Local maxima
  - Multiple restarts
  - Use the likelihood to pick the best model
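A compact EM sketch for a Gaussian mixture, using NumPy and SciPy. A small ridge added to each covariance matrix serves as a crude stand-in for bounding its eigenvalues against singularities; all names here are assumptions, not the course's notation.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, k, n_iter=50, eps=1e-6, rng=0):
    n, d = X.shape
    rng = np.random.default_rng(rng)
    pi = np.full(k, 1.0 / k)                    # mixing weights
    mu = X[rng.choice(n, k, replace=False)]     # initial means
    cov = np.stack([np.cov(X.T) + eps * np.eye(d)] * k)
    for _ in range(n_iter):
        # E-step: posterior responsibility of component j for each point.
        r = np.column_stack([pi[j] * multivariate_normal.pdf(X, mu[j], cov[j])
                             for j in range(k)])
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, covariances.
        nk = r.sum(axis=0)
        pi = nk / n
        mu = (r.T @ X) / nk[:, None]
        for j in range(k):
            diff = X - mu[j]
            # Ridge keeps the covariance away from singularity.
            cov[j] = (r[:, j, None] * diff).T @ diff / nk[j] + eps * np.eye(d)
    return pi, mu, cov, r   # r holds the soft assignments
```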
EM and K-Means
- K-means is hard-assignment EM
Mixture Variable for Discrete Data
Latent Class Model
Learning Latent Class Models
- Always converges
Dimension Reduction
- Necessary because some data sets have large numbers of attributes that are difficult for learning algorithms to handle.
Principal Component Analysis
PCA Solution
PCA Illustration
Eigenvalues and Projection Error
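A PCA sketch via the eigendecomposition of the sample covariance. The sum of the discarded eigenvalues gives the expected squared projection error, which is the relationship this slide refers to; names and layout here are assumed.

```python
import numpy as np

def pca(X, m):
    Xc = X - X.mean(axis=0)          # center the data
    C = np.cov(Xc.T)                 # sample covariance matrix
    vals, vecs = np.linalg.eigh(C)   # eigenvalues in ascending order
    order = np.argsort(vals)[::-1]   # re-sort descending
    vals, vecs = vals[order], vecs[:, order]
    W = vecs[:, :m]                  # top-m principal directions
    Z = Xc @ W                       # projected coordinates
    error = vals[m:].sum()           # sum of discarded eigenvalues
    return Z, W, error
```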
Reinforcement Learning
Markov Decision Process
- A model of how an agent interacts with its environment
Markov Decision Process
Value Iteration
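A value-iteration sketch for a finite MDP. The transition tensor P[s, a, s'] and reward array R[s, a] are assumed representations, not the slides' notation.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-6):
    n_states, n_actions, _ = P.shape
    V = np.zeros(n_states)
    while True:
        # Bellman optimality backup: Q(s,a) = R(s,a) + gamma * E[V(s')]
        Q = R + gamma * P @ V
        V_new = Q.max(axis=1)
        if np.abs(V_new - V).max() < tol:
            return V_new, Q.argmax(axis=1)  # optimal values and greedy policy
        V = V_new
```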
Reinforcement Learning
Q-Learning
- Derived from Q-function-based value iteration
- Ideas (see the sketch below):
  - In-place/asynchronous value iteration
  - Approximate the expectation using samples
  - ε-greedy policy (for the exploration/exploitation tradeoff)
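A tabular Q-learning sketch combining the three ideas above: in-place backups, sample-based targets, and ε-greedy action selection. The `env` object (reset/step returning state, reward, done) is an assumed Gym-style interface, not something defined in the slides.

```python
import numpy as np

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.9, eps=0.1, rng=0):
    rng = np.random.default_rng(rng)
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # ε-greedy: explore with probability eps, else exploit.
            a = rng.integers(n_actions) if rng.random() < eps else Q[s].argmax()
            s2, r, done = env.step(a)  # assumed interface
            # Sample-based, in-place backup of the Q-function.
            Q[s, a] += alpha * (r + gamma * (0.0 if done else Q[s2].max())
                                - Q[s, a])
            s = s2
    return Q
```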
Temporal Difference Learning
SARSA is also a temporal-difference learning method
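The only difference from the Q-learning backup is the target: SARSA uses the action actually taken next (on-policy) rather than the greedy max. A sketch of the update, with Q the same tabular array as in the Q-learning sketch above:

```python
def sarsa_update(Q, s, a, r, s2, a2, alpha=0.1, gamma=0.9, done=False):
    # On-policy TD target: uses Q of the next action actually chosen (a2),
    # where Q-learning would use max over a of Q[s2, a].
    target = r + gamma * (0.0 if done else Q[s2, a2])
    Q[s, a] += alpha * (target - Q[s, a])
```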
Ensemble Learning
Bagging: Reduce Variance
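A bagging sketch: each base learner is fit on a bootstrap resample, and predictions are combined by vote, which is how averaging many high-variance learners reduces variance. The `base_fit` interface (returning a predictor callable) and the ±1 labels are assumptions.

```python
import numpy as np

def bagging(base_fit, X, y, n_models=25, rng=0):
    rng = np.random.default_rng(rng)
    models = []
    for _ in range(n_models):
        idx = rng.integers(len(X), size=len(X))  # bootstrap sample (with replacement)
        models.append(base_fit(X[idx], y[idx]))
    def predict(Xq):
        votes = np.stack([m(Xq) for m in models])
        # Majority vote over the ensemble; labels assumed in {-1, +1}.
        return np.sign(votes.sum(axis=0))
    return predict
```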
Boosting: Reduce Classification Error
AdaBoost: Exponential Error
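An AdaBoost sketch for labels in {-1, +1}. The weight update w_i <- w_i * exp(-alpha * y_i * h(x_i)) is where minimizing the exponential error shows up: misclassified points are up-weighted by a factor of e^alpha each round. The `weak_fit` interface (fitting on weighted data) is an assumption.

```python
import numpy as np

def adaboost(weak_fit, X, y, n_rounds=50):
    n = len(X)
    w = np.full(n, 1.0 / n)                    # uniform initial weights
    learners, alphas = [], []
    for _ in range(n_rounds):
        h = weak_fit(X, y, w)                  # weak learner fit on weighted data
        pred = h(X)
        err = np.clip(w[pred != y].sum(), 1e-10, 1 - 1e-10)
        if err >= 0.5:                         # no better than chance: stop
            break
        alpha = 0.5 * np.log((1 - err) / err)  # this learner's vote weight
        w *= np.exp(-alpha * y * pred)         # up-weight the mistakes
        w /= w.sum()                           # renormalize
        learners.append(h)
        alphas.append(alpha)
    # Final classifier: sign of the weighted vote of the weak learners.
    return lambda Xq: np.sign(sum(a * h(Xq) for a, h in zip(alphas, learners)))
```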