Separation of the largest eigenvalues in eigenanalysis of genotype data from discrete populations Katarzyna Bryc Postdoctoral Fellow, Reich Lab, Harvard.

Separation of the largest eigenvalues in eigenanalysis of genotype data from discrete populations
Katarzyna Bryc
Postdoctoral Fellow, Reich Lab, Harvard Medical School
Visiting Postdoctoral Fellow, 23andMe
Rosenberg lab meeting, Stanford University, January 22, 2014

Goal: think a lot about PCA
Its role in population genetics
  – Exploratory data analysis
  – Population structure inference
Its relationship to other methods
Deepen understanding of the math
  – i.e., what exactly is an eigenvalue?
Better interpret, understand, and judge PCA results

Principal Components Analysis (PCA)
Invented in 1901 by Karl Pearson
Goes by many names; lots of overlap with methods used in other fields:
  – Singular Value Decomposition (SVD)
  – Eigenvalue decomposition of the covariance matrix
  – Factor analysis
  – Spectral decomposition in signal processing
Nothing about PCA is specific to genetic data: it's just a method
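
The equivalence of the first two names above can be checked numerically. The sketch below (Python with NumPy; the helper name `pca_two_ways` and the simulated data are illustrative assumptions, not from the talk) computes the eigenvalues of the sample covariance matrix and the squared singular values of the centered data matrix divided by n - 1, and confirms that they agree. It also checks that each eigenvalue equals the variance of the data projected onto its eigenvector, which is one concrete answer to "what exactly is an eigenvalue?" in the PCA setting.

```python
import numpy as np


def pca_two_ways(X):
    """Return PCA eigenvalues computed two ways, both sorted in descending order.

    X is an (n_samples, n_features) array with one observation per row.
    Illustrative helper, not a function from any library.
    """
    Xc = X - X.mean(axis=0)              # center each feature (column)
    n = Xc.shape[0]
    k = min(Xc.shape)                    # number of singular values the SVD returns

    # Route 1: eigendecomposition of the sample covariance matrix.
    C = Xc.T @ Xc / (n - 1)
    evals_cov = np.linalg.eigvalsh(C)[::-1]   # eigvalsh returns ascending order

    # Route 2: SVD of the centered data; squared singular values / (n - 1).
    s = np.linalg.svd(Xc, compute_uv=False)   # already in descending order
    evals_svd = s ** 2 / (n - 1)

    return evals_cov[:k], evals_svd


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5)) @ rng.normal(size=(5, 5))   # simulated, correlated features
a, b = pca_two_ways(X)
print(np.allclose(a, b))                                 # True

# Each eigenvalue is the variance of the data projected onto its eigenvector.
w, V = np.linalg.eigh(np.cov(X, rowvar=False))
scores = (X - X.mean(axis=0)) @ V
print(np.allclose(scores.var(axis=0, ddof=1), w))        # True
```

When there are fewer samples than features, only the first min(n_samples, n_features) eigenvalues are compared; the remaining covariance eigenvalues are zero.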

Role of PCA
Population genetics: natural selection, genetic drift, mutation, gene flow, recombination, and population structure shape allele frequencies
PCA summarizes the resulting patterns of allele frequency variation

PCA in population genetics
Learning about human history
Visualization
  – Luigi Luca Cavalli-Sforza, The History and Geography of Human Genes (1994): based on 194 blood polymorphisms from 42 populations; suggested waves of expansion
  – Novembre et al. (2008) Nature, "Genes mirror geography within Europe": based on 500K SNPs from 3,000 Europeans

PCA in population genetics
Demography, sampling, and admixture all shape PCA results: McVean (2009) PLoS Gen
Viewing PCA as a matrix factorization unifies PCA and ADMIXTURE/STRUCTURE: Engelhardt & Stephens (2010) PLoS Gen
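
The matrix-factorization view can be made concrete: PCA approximates the centered genotype matrix by a product L F of an individuals-by-K matrix and a K-by-SNPs matrix, the same shape of decomposition that admixture models fit under extra constraints (non-negative ancestry proportions that sum to one). The Python/NumPy sketch below is an illustration under assumptions: it uses a truncated SVD, a hypothetical helper name `pca_factorize`, and simulated 0/1/2 genotypes from two populations with made-up allele frequencies.

```python
import numpy as np


def pca_factorize(G, k):
    """Rank-k factorization (G - column means) ~= L @ F via truncated SVD.

    G: (n_individuals, n_snps) genotype matrix of 0/1/2 allele counts.
    Returns L (n_individuals x k, PC scores), F (k x n_snps, orthonormal SNP
    loadings) and the column means used for centering. Hypothetical helper.
    """
    mu = G.mean(axis=0)
    U, s, Vt = np.linalg.svd(G - mu, full_matrices=False)
    L = U[:, :k] * s[:k]   # individual coordinates on the top k PCs
    F = Vt[:k, :]          # SNP loadings; rows are orthonormal
    return L, F, mu


# Simulated example: two populations, 25 individuals each, 500 SNPs.
rng = np.random.default_rng(1)
n_snps = 500
p1 = rng.uniform(0.1, 0.9, size=n_snps)                            # population 1 frequencies
p2 = np.clip(p1 + rng.normal(0.0, 0.25, size=n_snps), 0.05, 0.95)  # population 2 frequencies
G = np.vstack([rng.binomial(2, p1, size=(25, n_snps)),
               rng.binomial(2, p2, size=(25, n_snps))]).astype(float)

L, F, mu = pca_factorize(G, k=2)
residual = (G - mu) - L @ F
s = np.linalg.svd(G - mu, compute_uv=False)
# Eckart-Young: the squared error equals the sum of the discarded squared singular values.
print(np.isclose((residual ** 2).sum(), (s[2:] ** 2).sum()))        # True
```

Unlike an admixture model, nothing constrains L or F here except the orthonormality of the rows of F, which is why PC coordinates can be negative and are not directly ancestry proportions.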

PCA in population genetics
Test for correlation with geography
  – Wang et al. (2010) Stat. Appl. Genet. Mol. Biol.: after a Procrustes transformation, PCA coordinates are significantly similar to geographic coordinates
Eigenanalysis: detecting and quantifying structure
  – Formal test for structure: under the null hypothesis of no structure, a normalized version x of the largest eigenvalue is approximately Tracy-Widom distributed (Patterson et al. (2006) PLoS Gen)
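
A minimal sketch of the normalized largest-eigenvalue statistic, assuming the centering and scaling constants of Johnstone (2001) as adopted by Patterson et al. (2006). With the m nonzero eigenvalues λ1 ≥ λ2 ≥ ... ≥ λm of the individuals-by-individuals covariance matrix built from n markers: ℓ = m·λ1 / (λ1 + ... + λm), μ = (sqrt(n - 1) + sqrt(m))² / n, σ = ((sqrt(n - 1) + sqrt(m)) / n)·(1/sqrt(n - 1) + 1/sqrt(m))^(1/3), and x = (ℓ - μ) / σ. Two simplifications are assumptions of this sketch rather than the published procedure: m is taken to be the number of eigenvalues left after mean-centering (one fewer than the number of individuals), and n is the raw marker count, whereas Patterson et al. recommend an effective number of markers when markers are correlated (e.g. by linkage disequilibrium). Standard-normal values stand in for genotypes, and the helper names are hypothetical.

```python
import numpy as np


def covariance_eigenvalues(G):
    """Nonzero eigenvalues (descending) of the individual-by-individual covariance.

    G: (m_individuals, n_markers). Each marker is centered across individuals,
    which leaves at most m_individuals - 1 nonzero eigenvalues.
    """
    Gc = G - G.mean(axis=0)
    C = Gc @ Gc.T / G.shape[1]
    lam = np.linalg.eigvalsh(C)[::-1]    # descending order
    return lam[:-1]                      # drop the eigenvalue removed by centering


def tw_statistic(evals, n_markers):
    """Normalized largest eigenvalue x; approximately TW1 when there is no structure."""
    lam = np.sort(np.asarray(evals, dtype=float))[::-1]
    m = lam.size
    n = float(n_markers)
    ell = m * lam[0] / lam.sum()          # rescale so the eigenvalues average to 1
    a = np.sqrt(n - 1.0) + np.sqrt(m)
    mu = a ** 2 / n
    sigma = (a / n) * (1.0 / np.sqrt(n - 1.0) + 1.0 / np.sqrt(m)) ** (1.0 / 3.0)
    return (ell - mu) / sigma


rng = np.random.default_rng(0)
m, n = 50, 2000
G0 = rng.normal(size=(m, n))                      # no structure
G1 = G0.copy()
G1[25:] += rng.normal(0.0, 0.5, size=n)           # second "population" shifted
x0 = tw_statistic(covariance_eigenvalues(G0), n)
x1 = tw_statistic(covariance_eigenvalues(G1), n)
print(f"no structure: x = {x0:.2f}; two groups: x = {x1:.2f}")
```

For reference, the 95th and 99th percentiles of TW1 are about 0.98 and 2.02, so the unstructured matrix should give an unremarkable x while the two-group matrix should land far in the upper tail.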

To scale or not to scale
PCA is not scale-invariant
Typically each attribute (SNP) is normalized
  – Makes sense if you want each SNP to be "weighted" equally
  – But normalizing a SNP by its sample standard deviation means normalizing by a random variable. Eek!
For mathematical tractability, we do not normalize.
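
The claim that PCA is not scale-invariant is easy to check. In the Python/NumPy sketch below (hypothetical helper `pc1_share`, simulated genotypes), multiplying one SNP's column by 10 changes the share of variance captured by PC1 when the data are only centered, but not when each SNP is divided by its sample standard deviation, because that data-dependent divisor absorbs the rescaling. Monomorphic SNPs are dropped before scaling to avoid dividing by zero; that filtering rule is an assumption of the sketch.

```python
import numpy as np


def pc1_share(G, scale):
    """Fraction of total variance on PC1 for genotype matrix G (individuals x SNPs).

    scale=False: center each SNP only.
    scale=True:  center, then divide each SNP by its sample standard deviation.
    """
    Gc = G - G.mean(axis=0)
    if scale:
        sd = Gc.std(axis=0, ddof=1)
        keep = sd > 0                     # monomorphic SNPs have zero variance
        Gc = Gc[:, keep] / sd[keep]
    lam = np.linalg.svd(Gc, compute_uv=False) ** 2
    return lam[0] / lam.sum()


rng = np.random.default_rng(2)
G = rng.binomial(2, 0.5, size=(30, 100)).astype(float)
G_rescaled = G.copy()
G_rescaled[:, 0] *= 10.0                  # hypothetical change of units for one SNP

print(pc1_share(G, scale=False), pc1_share(G_rescaled, scale=False))  # differ
print(pc1_share(G, scale=True), pc1_share(G_rescaled, scale=True))    # agree
```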