CHAPTER 1: Introduction
Why “Learn”?
Machine learning is programming computers to optimize a performance criterion using example data or past experience.
Applications
Association Analysis
Supervised Learning
Unsupervised Learning
Learning Associations
Basket analysis: P(Y | X), the probability that somebody who buys X also buys Y, where X and Y are products/services.
Example: P(chips | beer) = 0.7
Market-basket transactions
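A minimal sketch of how such a conditional probability can be estimated from transaction data; the baskets and product names below are made up for illustration.

```python
# Estimate P(Y | X) = count(baskets with X and Y) / count(baskets with X)
# from a list of market-basket transactions (hypothetical example data).

transactions = [
    {"beer", "chips", "diapers"},
    {"beer", "chips"},
    {"beer", "bread"},
    {"chips", "soda"},
    {"beer", "chips", "soda"},
]

def conditional_probability(transactions, x, y):
    """P(y | x): fraction of baskets containing x that also contain y."""
    with_x = [t for t in transactions if x in t]
    if not with_x:
        return 0.0
    return sum(1 for t in with_x if y in t) / len(with_x)

print(conditional_probability(transactions, "beer", "chips"))  # 0.75 for this toy data
```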
Classification: Applications
Also known as pattern recognition.
Face recognition: pose, lighting, occlusion (glasses, beard), make-up, hair style.
Character recognition: different handwriting styles.
Speech recognition: temporal dependency; use of a dictionary or the syntax of the language.
Sensor fusion: combine multiple modalities, e.g., visual (lip image) and acoustic signals for speech.
Medical diagnosis: from symptoms to illnesses.
Web advertising: predict whether a user clicks on an ad on the Internet.
Face Recognition
Training examples of a person and test images (face database from AT&T Laboratories, Cambridge UK, http://www.uk.research.att.com/facedatabase.html).
Unsupervised Learning
Learning “what normally happens”; no output (labels).
Clustering: grouping similar instances.
Other applications: summarization, association analysis.
Example applications:
Customer segmentation in CRM
Image compression: color quantization
Bioinformatics: learning motifs
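As an illustration of the color-quantization application, here is a small sketch using K-means from scikit-learn; the file name, K = 16, and the use of scikit-learn/PIL are assumptions for the example, not part of the slides.

```python
# Color quantization via K-means (sketch): replace every pixel with the
# nearest of K representative colors learned by clustering the RGB values.
import numpy as np
from sklearn.cluster import KMeans
from PIL import Image

img = np.asarray(Image.open("photo.png").convert("RGB"), dtype=float) / 255.0
pixels = img.reshape(-1, 3)                      # one row per pixel (R, G, B)

kmeans = KMeans(n_clusters=16, n_init=10, random_state=0).fit(pixels)
palette = kmeans.cluster_centers_                # 16 representative colors
quantized = palette[kmeans.labels_].reshape(img.shape)

Image.fromarray((quantized * 255).astype(np.uint8)).save("photo_16_colors.png")
```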
K-means Overview
An unsupervised clustering algorithm.
“K” stands for the number of clusters; it is typically a user input to the algorithm, although some criteria can be used to estimate K automatically.
It is an approximation to an NP-hard combinatorial optimization problem.
The K-means algorithm is iterative in nature.
It converges, but only to a local minimum.
Works only for numerical data.
Easy to implement.
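A short sketch of the iterative procedure in plain NumPy, alternating an assignment step and a centroid-update step; the toy data, K, and stopping tolerance are illustrative choices.

```python
import numpy as np

def kmeans(X, k, n_iter=100, tol=1e-6, seed=0):
    """Basic K-means: alternate assignment and centroid updates until the
    centroids stop moving. Converges to a local minimum of the within-cluster
    sum of squared distances (not necessarily the global one)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # random initial centroids
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid becomes the mean of its assigned points.
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.linalg.norm(new_centers - centers) < tol:
            break
        centers = new_centers
    return labels, centers

# Toy usage: two well-separated blobs, K = 2.
X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
labels, centers = kmeans(X, k=2)
```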
K-means clustering example
K-means: Summary
Algorithmically, very simple to implement.
K-means converges, but it finds only a local minimum of the cost function.
Works only for numerical observations.
K is a user input; alternatively, BIC (Bayesian information criterion) or MDL (minimum description length) can be used to estimate K.
Outliers can cause considerable trouble for K-means.
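One practical sketch of BIC-based selection of K, assuming scikit-learn: fit a Gaussian mixture for each candidate K and keep the value with the lowest BIC (a mixture-model proxy for choosing the number of clusters; the data and candidate range below are made up).

```python
# Estimate K with BIC (sketch): lower BIC is better; the mixture model here
# stands in for K-means when scoring different candidate values of K.
import numpy as np
from sklearn.mixture import GaussianMixture

X = np.vstack([np.random.randn(100, 2) + c for c in ([0, 0], [6, 0], [0, 6])])

bic_scores = {}
for k in range(1, 8):
    gm = GaussianMixture(n_components=k, random_state=0).fit(X)
    bic_scores[k] = gm.bic(X)

best_k = min(bic_scores, key=bic_scores.get)
print(best_k)  # expected to be around 3 for this toy data
```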
Hierarchical Clustering
Two types: (1) agglomerative (bottom up), (2) divisive (top down).
Agglomerative: two groups are merged if the distance between them is less than a threshold.
Divisive: one group is split into two if the intergroup distance is more than a threshold.
Can be expressed by an excellent graphical representation called a “dendrogram” when the process is monotonic, i.e., the dissimilarity between merged clusters increases with each merge. Agglomerative clustering possesses this property; not all divisive methods do.
Heights of the nodes in a dendrogram are proportional to the threshold value that produced them.
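A small sketch of bottom-up (agglomerative) clustering and the corresponding dendrogram, assuming SciPy and matplotlib are available; the toy data and the choice of average linkage are illustrative assumptions.

```python
# Agglomerative (bottom-up) clustering sketch using SciPy's hierarchy tools.
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

X = np.vstack([np.random.randn(20, 2), np.random.randn(20, 2) + 4])

Z = linkage(X, method="average")                  # merge history; heights are monotone
labels = fcluster(Z, t=2, criterion="maxclust")   # cut the tree into 2 clusters

dendrogram(Z)   # node heights show the dissimilarity at each merge
plt.show()
```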
Hierarchical Clustering