PCA, population structure, and K-means clustering BNFO 601.


Publicly available real data

Datasets (Noah Rosenberg's lab):
– East Asian admixture: 10 individuals from Cambodia, 15 from Siberia, 49 from China, and 16 from Japan; 459,188 SNPs
– African admixture: 32 Biaka Pygmy individuals, 15 Mbuti Pygmy, 24 Mandenka, 25 Yoruba, 7 San from Namibia, 8 Bantu of South Africa, and 12 Bantu of Kenya; 454,732 SNPs
– Middle Eastern admixture: 43 Druze from Israel-Carmel, 47 Bedouins from Israel-Negev, 26 Palestinians from Israel-Central, and 30 Mozabite from Algeria-Mzab; 438,596 SNPs

East Asian admixture

African admixture

Middle Eastern admixture
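The three slides above presumably show the individuals of each dataset projected onto their top two principal components, with points colored by population. As a rough illustration of how such a plot could be produced, here is a minimal sketch assuming the genotypes have already been coded as a numeric matrix X (individuals × SNPs, entries 0/1/2) with a matching list of population labels; the variable names and the use of scikit-learn and matplotlib are illustrative assumptions, not the exact pipeline behind the slides.

```python
# Sketch: project individuals onto the top two principal components of a
# genotype matrix and color the points by population label.
# Assumes X is an (n_individuals x n_SNPs) NumPy array of 0/1/2 genotype
# counts and labels is a length-n list of population names (illustrative).
import numpy as np
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

def plot_population_pca(X, labels):
    # Center each SNP column; scikit-learn's PCA also centers internally,
    # but doing it explicitly makes the preprocessing obvious.
    Xc = X - X.mean(axis=0)

    pca = PCA(n_components=2)
    pcs = pca.fit_transform(Xc)          # n_individuals x 2 projection

    # One scatter series per population so the legend shows the groups.
    for pop in sorted(set(labels)):
        mask = np.array([l == pop for l in labels])
        plt.scatter(pcs[mask, 0], pcs[mask, 1], s=10, label=pop)

    plt.xlabel("PC1 (%.1f%% variance)" % (100 * pca.explained_variance_ratio_[0]))
    plt.ylabel("PC2 (%.1f%% variance)" % (100 * pca.explained_variance_ratio_[1]))
    plt.legend()
    plt.show()
    return pcs
```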

Clustering

Suppose we want to cluster n vectors in R^d into two groups. Define C_1 and C_2 as the two groups. Our objective is to find the C_1 and C_2 that minimize

$$\sum_{i=1}^{2} \;\sum_{x_j \in C_i} \lVert x_j - m_i \rVert^2$$

where m_i is the mean of cluster C_i.
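In code, this objective is just the sum of squared distances from each point to the mean of its assigned cluster. The sketch below assumes the data are rows of a NumPy array and the assignment is given as a boolean array; the function and argument names are illustrative.

```python
import numpy as np

def two_cluster_objective(X, in_c1):
    """Sum of squared distances of each point to the mean of its cluster.

    X     : (n, d) NumPy array of points.
    in_c1 : boolean array of length n; True means the point is in C1.
    """
    obj = 0.0
    for mask in (in_c1, ~in_c1):
        if mask.any():
            m = X[mask].mean(axis=0)             # cluster mean m_i
            obj += ((X[mask] - m) ** 2).sum()    # sum of ||x_j - m_i||^2
    return obj
```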

K-means algorithm for two clusters

Input: n vectors x_1, ..., x_n in R^d and a convergence threshold ε > 0.

Algorithm:
1. Initialize: assign each x_i to C_1 or C_2 with equal probability and compute the means m_1 and m_2.
2. Recompute clusters: assign x_i to C_1 if ||x_i − m_1|| < ||x_i − m_2||, otherwise assign it to C_2.
3. Recompute the means m_1 and m_2.
4. Compute the objective of the new clustering.
5. If the objective differs from that of the previous clustering by less than ε, stop; otherwise go to step 2.
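These steps translate almost directly into code. The following is a minimal sketch of the two-cluster version that reuses the two_cluster_objective helper from the previous sketch; the default tolerance eps, the iteration cap, and the seed argument are assumptions for illustration, and empty clusters are not handled.

```python
import numpy as np

def kmeans_two_clusters(X, eps=1e-6, max_iter=100, seed=None):
    """Two-cluster K-means following the five steps above (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]

    # Step 1: assign each point to C1 or C2 with equal probability.
    in_c1 = rng.random(n) < 0.5
    prev_obj = two_cluster_objective(X, in_c1)

    for _ in range(max_iter):
        # Means of the current clusters (empty clusters are not handled here).
        m1 = X[in_c1].mean(axis=0)
        m2 = X[~in_c1].mean(axis=0)

        # Step 2: reassign each point to the cluster with the closer mean.
        d1 = ((X - m1) ** 2).sum(axis=1)
        d2 = ((X - m2) ** 2).sum(axis=1)
        in_c1 = d1 < d2

        # Steps 3-5: recompute the means (inside the objective), compute the
        # new objective, and stop if it changed by less than eps.
        obj = two_cluster_objective(X, in_c1)
        if abs(prev_obj - obj) < eps:
            break
        prev_obj = obj

    return in_c1, obj
```

For example, kmeans_two_clusters(pcs, seed=0) run on the PCA-projected individuals from the earlier sketch would produce a hard two-way split that can be compared against the population labels.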

K-means

Is K-means guaranteed to find the clustering that optimizes the objective? No: it is only guaranteed to find a local optimum. We can prove that the objective decreases (or stays the same) with successive iterations.

Proof sketch of convergence of k-means

Let J(C, m) denote the objective for a clustering C with means m. One iteration of k-means first reassigns points to their closest current means and then recomputes the means, so

$$J(C^{(t)}, m^{(t)}) \;\ge\; J(C^{(t+1)}, m^{(t)}) \;\ge\; J(C^{(t+1)}, m^{(t+1)}).$$

Justification of the first inequality: assigning each x_j to its closest mean decreases the objective or leaves it unchanged.
Justification of the second inequality: for a given cluster, its mean minimizes the squared error loss.
Since the objective never increases and is bounded below by zero, the iterations converge.
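The second inequality uses the fact that, for a fixed cluster, the squared-error loss over choices of the center is minimized at the cluster mean. A short derivation (standard calculus, included here for completeness):

```latex
\begin{align*}
\nabla_m \sum_{x_j \in C_i} \lVert x_j - m \rVert^2
  &= \sum_{x_j \in C_i} -2\,(x_j - m) \;=\; 0 \\
\Longrightarrow \quad m &= \frac{1}{\lvert C_i \rvert} \sum_{x_j \in C_i} x_j \;=\; m_i .
\end{align*}
```

Because the loss is convex in m, this stationary point is the global minimum over m, which gives the second inequality.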