Understanding the Principal Component Approach to Detecting Population Structure. Jianzhong Ma. PI: Chris Amos.

Introduction
Association analysis, which exploits strong linkage between markers and disease-causing loci (i.e. linkage disequilibrium), is more efficient than linkage analysis.
When samples arise from different ethnic groups, or from an admixed population, spurious associations occur, resulting in false positives.

Introduction
Approaches to handling population structure:
Genomic control (GC)
Transmission/disequilibrium test (TDT)
Structured association (MCMC)
Principal component approaches
–Traditional: marker-oriented
–Eigenstrat: sample-oriented
Eigenstrat theory is implemented in EIGENSTRAT and HelixTree.

Eigenstrat References
1. Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D (2006). "Principal Components Analysis Corrects for Stratification in Genome-Wide Association Studies". Nature Genetics 38.
2. Patterson N, Price AL, Reich D (2006). "Population Structure and Eigenanalysis". PLoS Genet 2(12): e190.

Eigenstrat Theory: Model
Data for the association test (M individuals genotyped at N markers):

        MK1   MK2   ...   MKN
Ind1    g11   g12   ...   g1N
...     ...   ...   ...   ...
IndM    gM1   gM2   ...   gMN

Eigenstrat Theory: Model
Define a random vector with M components, one for each of the M individuals.
The genotypes of the M individuals at any given marker are one realization of this random vector.

Eigenstrat Theory: Model
The randomness comes both from drawing genotypes and from choosing allele frequencies.
Under this model, genetically unrelated individuals are nevertheless not independent of each other.
The covariance between individuals from different subpopulations is smaller than the covariance between individuals from the same subpopulation.
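This covariance claim can be checked numerically. The sketch below is illustrative and not from the slides: subpopulation allele frequencies are drifted from ancestral ones under a Balding-Nichols model (my choice of drift model; the Fst value and sample sizes are arbitrary), genotypes are drawn binomially, and the individual-by-individual sample covariance across markers is compared within and between subpopulations.

```python
import numpy as np

rng = np.random.default_rng(0)
n_markers, m = 20000, 10                 # markers, individuals per subpopulation
p0 = rng.uniform(0.1, 0.9, n_markers)    # ancestral allele frequencies
fst = 0.05                               # illustrative drift parameter

def subpop_freqs(p0, fst, rng):
    # Balding-Nichols: subpopulation frequencies drift around p0 with variance fst*p0*(1-p0)
    a = p0 * (1 - fst) / fst
    b = (1 - p0) * (1 - fst) / fst
    return rng.beta(a, b)

G = []
for _ in range(2):                       # two subpopulations
    p = subpop_freqs(p0, fst, rng)
    G.append(rng.binomial(2, p, size=(m, n_markers)))
G = np.vstack(G)                         # (2m x n_markers) genotype matrix

C = np.cov(G)                            # covariance between individuals, across markers
within = np.mean([C[i, j] for i in range(m) for j in range(m) if i != j])
between = np.mean(C[:m, m:])
print(within > between)                  # within-subpopulation covariance is larger
```

With a fixed seed the comparison is reproducible; the within-pair covariances exceed the between-pair covariances because individuals in the same subpopulation share the drifted allele frequencies.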

Eigenstrat Theory: Model
Only the population properties of the PCA are considered (sample properties are not), in order to obtain theoretical guidelines for interpreting PC-PC plots.

Case 1: one subpopulation
Covariance matrix: every individual has the same variance and every pair of individuals has the same covariance, since all individuals share one allele-frequency distribution.

Case 1: one subpopulation
There is one large eigenvalue, whose eigenvector is constant across individuals, and M-1 small eigenvalues.

Case 1: one subpopulation
The large eigenvalue reflects the co-variation of individuals; the small eigenvalues reflect variation between individuals.
Neither reflects population stratification!
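A numerical check of this eigenstructure (the parameter values d, c, M are mine, chosen for illustration): a covariance matrix in which all individuals are exchangeable can be written Sigma = (d - c)I + cJ, and it has exactly one large eigenvalue d + (M-1)c with a constant eigenvector, plus M-1 small eigenvalues d - c.

```python
import numpy as np

M, d, c = 5, 1.0, 0.4                        # illustrative variance d, covariance c
Sigma = (d - c) * np.eye(M) + c * np.ones((M, M))

eigvals, eigvecs = np.linalg.eigh(Sigma)     # eigenvalues in ascending order
assert np.allclose(eigvals[-1], d + (M - 1) * c)   # one large eigenvalue: 2.6
assert np.allclose(eigvals[:-1], d - c)            # M-1 small eigenvalues: 0.6

v = eigvecs[:, -1]                           # top eigenvector is constant
assert np.allclose(np.abs(v), 1 / np.sqrt(M))
```

The constant top eigenvector is the "co-variation of individuals" direction; it carries no information about group membership, which is why it does not indicate stratification.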

Case 1: one subpopulation
Zero-mean transform: after centering over individuals, the common-mode large eigenvalue vanishes and only the small eigenvalues remain.
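A sketch of the zero-mean transform, assuming it means centering over individuals with the matrix H = I - J/M (my reading of the slide): the transformed covariance H Sigma H loses the large common-mode eigenvalue entirely.

```python
import numpy as np

M, d, c = 5, 1.0, 0.4                        # same illustrative values as above
Sigma = (d - c) * np.eye(M) + c * np.ones((M, M))
H = np.eye(M) - np.ones((M, M)) / M          # centering (zero-mean) matrix

eigvals = np.linalg.eigvalsh(H @ Sigma @ H)  # ascending order
assert np.allclose(eigvals[0], 0.0, atol=1e-12)   # the large eigenvalue is gone
assert np.allclose(eigvals[1:], d - c)            # only the small ones remain
```

Algebraically H J = 0, so H Sigma H = (d - c)H, which has eigenvalues 0 (once) and d - c (M-1 times).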

Case 2: two subpopulations
Random vector: the components are grouped by subpopulation.
Covariance matrix: block-structured, with within-subpopulation covariances larger than between-subpopulation covariances.

Case 2: two subpopulations
There are two large eigenvalues, and their eigenvectors take a constant value for all individuals in the same subpopulation.
They are a mixture of variance caused by stratification and intra-population co-variation.
The small eigenvalues are the same as in a homogeneous population.

Case 2: two subpopulations
Zero-mean transform

Case 2: two subpopulations
The two large eigenvalues and their corresponding eigenvectors

Case 2: two subpopulations
These reflect the variation caused by stratification.
If there are only two subpopulations, do NOT plot a PC-vs-PC figure; only the eigenvector of the largest eigenvalue shows the population structure.
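The two-subpopulation case can also be checked numerically (block sizes and covariance levels below are illustrative): with within-subpopulation covariance c_w larger than cross-subpopulation covariance c_b, there are two large eigenvalues before centering; after the zero-mean transform a single large eigenvalue remains, and its eigenvector is constant within each subpopulation with opposite signs, exactly the direction that separates the two groups.

```python
import numpy as np

m, d, c_w, c_b = 3, 1.0, 0.5, 0.1            # m individuals per subpopulation
M = 2 * m
block = (d - c_w) * np.eye(m) + c_w * np.ones((m, m))
cross = c_b * np.ones((m, m))
Sigma = np.block([[block, cross], [cross, block]])

eigvals = np.linalg.eigvalsh(Sigma)          # ascending order
assert np.allclose(eigvals[-1], d + (m - 1) * c_w + m * c_b)  # 2.3
assert np.allclose(eigvals[-2], d + (m - 1) * c_w - m * c_b)  # 1.7
assert np.allclose(eigvals[:-2], d - c_w)                     # small: 0.5

H = np.eye(M) - np.ones((M, M)) / M          # zero-mean transform
vals, vecs = np.linalg.eigh(H @ Sigma @ H)
assert np.allclose(vals[-1], d + (m - 1) * c_w - m * c_b)     # one large left
top = vecs[:, -1]
assert np.allclose(np.abs(top), 1 / np.sqrt(M))               # blockwise constant
assert np.sign(top[0]) != np.sign(top[-1])                    # opposite signs per block
```

Centering removes the all-ones eigenvector (the co-variation direction), so the surviving large eigenvalue is purely the stratification component; its sign pattern classifies the individuals.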

Case 3: three subpopulations
There are now three subpopulations.

Case 3: three subpopulations
Zero-mean transform

Case 3

General Case: K subpopulations

Summary
Only the large eigenvalues reflect variation caused by stratification.
There are K-1 large eigenvalues if there are K subpopulations.
If there are only two subpopulations, only the eigenvector of the largest eigenvalue shows the population structure; no two-dimensional PC-PC plot should be inspected.
With more than two subpopulations, all K-1 eigenvectors of the large eigenvalues should be carefully inspected in order to classify individuals into the K subpopulations and infer the inter-population relationships.
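The K-1 rule from the summary can be verified for the block-covariance model (the helper function and its parameter values are mine, for illustration only): after the zero-mean transform, the centered covariance matrix of K equal-sized subpopulations has exactly K-1 eigenvalues above the within-population level d - c_w.

```python
import numpy as np

def centered_large_eigvals(K, m, d, c_w, c_b):
    """Count eigenvalues of the centered K-block covariance above the small-eigenvalue level."""
    M = K * m
    Sigma = np.full((M, M), c_b)             # cross-subpopulation covariance
    for k in range(K):
        s = slice(k * m, (k + 1) * m)
        Sigma[s, s] = c_w                    # within-subpopulation covariance
    Sigma[np.diag_indices(M)] = d            # per-individual variance
    H = np.eye(M) - np.ones((M, M)) / M      # zero-mean transform
    vals = np.linalg.eigvalsh(H @ Sigma @ H)
    small = d - c_w                          # level of the within-population eigenvalues
    return int(np.sum(vals > small + 1e-9))

assert centered_large_eigvals(K=3, m=4, d=1.0, c_w=0.5, c_b=0.1) == 2
assert centered_large_eigvals(K=5, m=4, d=1.0, c_w=0.5, c_b=0.1) == 4
```

Each between-block contrast vector (constant per block, summing to zero) survives centering with a large eigenvalue, and there are K-1 independent such contrasts.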

From the HelixTree manual:
"…… First off, if you choose as many components as there are markers, if that's possible, you will wind up subtracting out ALL effects, thus getting nothing from your tests! The best answer consists of first simply obtaining the components themselves and their corresponding eigenvectors. (Do this either while running uncorrected tests or from the separate PCA window.) Then look at the pattern of the eigenvalues. If the first few are very large compared with the remaining eigenvalues, then use that many components in a second analysis in which you DO apply the PCA technique. ……"
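The manual's advice, "use as many components as there are clearly large eigenvalues", can be sketched as a simple heuristic. The function name and the ratio-to-median threshold below are my own, not part of HelixTree:

```python
import numpy as np

def n_components_to_use(eigvals, ratio=3.0):
    """Count leading eigenvalues at least `ratio` times the median of the spectrum."""
    vals = np.sort(np.asarray(eigvals, dtype=float))[::-1]   # descending
    bulk = np.median(vals)                                    # typical "small" eigenvalue
    return int(np.sum(vals >= ratio * bulk))

# e.g. a spectrum with two clearly dominant eigenvalues
print(n_components_to_use([9.5, 7.2, 1.1, 1.0, 0.9, 0.8]))   # 2 components
```

Any gap-based rule of this kind should be sanity-checked against the scree plot itself, as the manual suggests, rather than applied blindly.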