Association Modeling With iPlant


Goals of this Section
Familiarize yourself with the basic concepts of quantitative genetics: traits, phenotypes, genotypes
Understand the basics of trait mapping
Understand the conceptual foundations of association studies
Learn how to perform a genome-wide association study in the iPlant Discovery Environment:
  Obtain genotypes
  Run a Mixed Linear Model

Phenotype
Observable (measurable) trait (character) of an organism
Trait: eye color
Phenotype: wild type (red), white-eyed, orange-eyed
http://www.unc.edu/depts/our/hhmi/hhmi-ft_learning_modules/fruitflymodule/phenotypes.html

Qualitative Traits Campbell, 8e

Controlled by One Locus

Co-segregation in Pedigree Donahue, R. P., et al., Probable assignment of the Duffy blood group locus to chromosome 1 in man, Proceedings of the National Academy of Sciences 61, 949-955 (1968).

Quantitative Trait Carlos Harjes

Trait Varies on a Continuous Scale
[Figure: distribution of trait values; axes: Trait Value, Frequency]

Quantitative Traits
Probably caused by:
  Multiple loci
  Interaction effects
  Environment
If the mean trait value for individuals with marker state MM is different from the mean trait value of individuals with marker state mm (i.e. the marker is associated with the phenotype), then the marker is linked to a quantitative trait locus.

[Figure: individuals scored for trait value and markers]
Marker #6, mean trait value: present 110 ± 10, absent 115 ± 13
Marker #3, mean trait value: present 99 ± 5, absent 118 ± 8
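The comparison of mean trait values between marker classes can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used to produce the numbers on the slide: the trait values, the function names, and the choice of a permutation test are all assumptions made for the example.

```python
import random
from statistics import mean

def mean_difference(trait, marker):
    """Mean trait value of marker-present minus marker-absent individuals."""
    present = [t for t, m in zip(trait, marker) if m == 1]
    absent = [t for t, m in zip(trait, marker) if m == 0]
    return mean(present) - mean(absent)

def permutation_p_value(trait, marker, n_perm=2000, seed=1):
    """Two-sided permutation p-value for the difference in means."""
    rng = random.Random(seed)
    observed = abs(mean_difference(trait, marker))
    shuffled = list(marker)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any marker-trait relationship
        if abs(mean_difference(trait, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Hypothetical data: 1 = marker present, 0 = marker absent.
marker = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
trait = [98, 101, 97, 103, 96, 117, 120, 115, 121, 118]

diff = mean_difference(trait, marker)
p = permutation_p_value(trait, marker)
print(f"difference in means = {diff:.1f}, permutation p = {p:.4f}")
```

A small p-value says only that the marker is associated with the trait in this sample; whether that reflects linkage to a QTL depends on the population, as the later slides on population structure discuss.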

Quantitative Genetics
Exploring the Genetic Architecture* Underlying Quantitative Traits
*Genetic architecture: How many loci? Which location? How strong?

Tools for Statistical Genetics in the DE
Tool: Purpose
Genotype by Sequencing Workflow: Automatic pipeline for extracting SNPs from GBS data (with genome from user or from iPlant database)
UNEAK pipeline: Automatic pipeline for extracting SNPs from GBS data without reference genomes
MLM workflow: Automatic workflow for fitting a Mixed Linear Model
GLM workflow: Automatic workflow for fitting a General Linear Model
QTLC workflow: Automatic workflow for composite interval mapping
QTL simulation workflow: Automatic workflow for simulating trait data with a given linkage map
PLINK: PLINK implementation of various association models
Zmapqtl: Interval mapping and composite interval mapping with the option to perform a permutation test
LRmapqtl: Linear regression modeling
SRmapqtl: Stepwise regression modeling
AntEpiSeeker: Epistatic interaction modeling
Random Jungle: Random Forest implementation for GWAS
FaST-LMM: Factored Spectrally Transformed Linear Mixed Modeling
Qxpak: Versatile mixed modeling
gluH2P: Convert Hapmap format to Ped format
LD: Linkage Disequilibrium plot
Structure: Estimation of population structure
PGDSpider: Data conversion tool
GLMstructure: GLM with population structure as fixed effect

A Model for Quantitative Traits
P = G + E + GG + GE
P = G + e
P = Phenotype
G = Genotype
E = Environment
GG = Interaction between genotypes
GE = Interaction between genotype and environment

A Statistical Model for QTLs
P = G + e
y_ij = β0 + β1 x_i + ε_ij
  y_ij: trait value in individual j with genotype i
  β0: population average of trait value
  β1: effect of marker i on trait value
  x_i: marker genotype i
  ε_ij: error term
General Linear Model (in matrix notation): Y = Xb + e
Note: If errors are not normally distributed, use generalized linear models
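The single-marker model (trait value = baseline + marker effect × marker genotype + error) can be fitted by ordinary least squares. The sketch below is a minimal illustration under stated assumptions: the data are made up, genotypes are coded as allele counts 0/1/2 (so the intercept is the expected trait value at genotype 0 rather than the overall population average), and the function name is hypothetical.

```python
from statistics import mean

def fit_single_marker(x, y):
    """Least-squares estimates (b0, b1) for y = b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx            # estimated marker effect per allele copy
    b0 = y_bar - b1 * x_bar   # estimated trait value at genotype 0
    return b0, b1

# Hypothetical data: genotypes coded as copies of one allele (0, 1, 2).
genotype = [0, 0, 1, 1, 2, 2]
trait = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]

b0, b1 = fit_single_marker(genotype, trait)
print(f"intercept = {b0:.2f}, marker effect = {b1:.2f}")
```

In a genome scan, a fit like this is repeated for every marker, and the estimated marker effect is tested against zero.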

http://concord.org/publications/newsletter/2009-spring/genetics

Linkage Mapping (QTL Mapping)
Designed population:
  F2
  Recombinant inbred (RIL)
  Doubled haploid (DH)
  Back-cross (B2)

Limitations of Linkage Mapping
Needs a large number of related individuals
Resolution is limited (an interval contains 100s of genes)
QTL position and effect are confounded

Association Mapping
Uses a random collection of individuals from a natural population
Very dense marker map = very high resolution

Linkage & Recombination
Recombination causes linkage decay
Other factors affecting LD:
  Selection (artificial or natural)
  Drift
  Mutations
  Population structure
  Demography

Linkage Disequilibrium
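As a worked illustration of linkage disequilibrium between two biallelic loci, the sketch below computes the standard coefficient D = p_AB − p_A·p_B and its squared correlation r², and the expected decay of D under random mating, D_t = D_0·(1 − c)^t, where c is the recombination fraction. The frequencies used are made up and the function names are hypothetical.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, r_squared) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b
    r_squared = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, r_squared

def decay(d0, recombination_fraction, generations):
    """Expected D after t generations of random mating: D0 * (1 - c)^t."""
    return d0 * (1 - recombination_fraction) ** generations

# Hypothetical frequencies.
d, r2 = linkage_disequilibrium(p_ab=0.4, p_a=0.5, p_b=0.5)
print(f"D = {d:.3f}, r^2 = {r2:.3f}")
print(f"D after 10 generations at c = 0.1: {decay(d, 0.1, 10):.4f}")
```

The decay function captures only recombination; selection, drift, mutation, population structure and demography can all move LD away from this expectation.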

Pitfalls: Population Structure
Difference in allele frequencies between subpopulations
Due to neutral or adaptive processes
Can create spurious association

No association within groups

Similar effect due to presence of related individuals (esp. in plants)
Can be accounted for using the data:
  Estimate number of subpopulations
  Assign individuals to subpopulation
  Estimate kinship
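How population structure creates a spurious association can be shown with a small constructed example. In the sketch below, the marker has no effect on the trait within either subpopulation, yet pooling the two produces a clear difference in means, because the subpopulations differ in both allele frequency and average trait value. All values and names are hypothetical and chosen to make the effect exact.

```python
from statistics import mean

def mean_difference(trait, marker):
    """Mean trait value of marker-present minus marker-absent individuals."""
    present = [t for t, m in zip(trait, marker) if m == 1]
    absent = [t for t, m in zip(trait, marker) if m == 0]
    return mean(present) - mean(absent)

# Hypothetical subpopulation 1: marker common, trait values around 20.
marker_1 = [1, 1, 1, 1, 1, 1, 0, 0]
trait_1 = [19, 20, 21, 19, 20, 21, 19, 21]
# Hypothetical subpopulation 2: marker rare, trait values around 10.
marker_2 = [1, 1, 0, 0, 0, 0, 0, 0]
trait_2 = [9, 11, 9, 10, 11, 9, 10, 11]

within_1 = mean_difference(trait_1, marker_1)
within_2 = mean_difference(trait_2, marker_2)
pooled = mean_difference(trait_1 + trait_2, marker_1 + marker_2)

print(f"within subpopulation 1: {within_1}")
print(f"within subpopulation 2: {within_2}")
print(f"pooled sample:          {pooled}")
```

Assigning individuals to subpopulations and including that assignment in the model removes this kind of artifact, which is the motivation for the structure and kinship terms in the mixed model.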

Accounting for Random Effects: Mixed Linear Models
There is a "cost" associated with estimating a parameter
For random effects we are not interested in the value of the parameter, only its variance
Q-K method (structured association): y = Xβ + Sα + Qv + Zu + e
Fixed effects:
  β: vector of fixed effects
  α: vector of SNP effects
  v: vector of subpopulation effects
Random effects:
  u: vector of kinship effects
  e: residuals
Q: matrix of population association (from STRUCTURE)
X, S, Z: incidence matrices
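To make the mixed model y = Xβ + Sα + Qv + Zu + e more concrete, the sketch below estimates the fixed effects by generalized least squares for a given kinship matrix. It is a simplified illustration, not TASSEL's algorithm: the variance components are assumed known (in practice they are estimated, for example by restricted maximum likelihood), Z is taken to be the identity (one observation per individual), all fixed-effect columns (intercept, SNP, and any Q columns) are stacked into a single matrix W, and the data and function name are hypothetical.

```python
import numpy as np

def gls_fixed_effects(y, W, K, var_g, var_e):
    """Generalized least squares estimate of the fixed effects.

    y: (n,) trait values
    W: (n, p) design matrix of all fixed effects (intercept, SNP, Q columns)
    K: (n, n) kinship matrix
    var_g, var_e: genetic and residual variances, assumed known here
    """
    n = len(y)
    V = var_g * K + var_e * np.eye(n)   # covariance of y, taking Z = I
    V_inv_W = np.linalg.solve(V, W)     # V^-1 W without forming the inverse
    V_inv_y = np.linalg.solve(V, y)     # V^-1 y
    return np.linalg.solve(W.T @ V_inv_W, W.T @ V_inv_y)

# Hypothetical data: six individuals forming three pairs of relatives.
snp = np.array([0.0, 1.0, 0.0, 1.0, 2.0, 2.0])
W = np.column_stack([np.ones(6), snp])                     # intercept, SNP
K = np.kron(np.eye(3), np.array([[1.0, 0.5], [0.5, 1.0]]))
y = 5.0 + 2.0 * snp                                        # noise-free trait

estimates = gls_fixed_effects(y, W, K, var_g=1.0, var_e=1.0)
print(estimates)
```

Because the example trait is noise-free, the estimates equal the intercept and SNP effect used to build it. With real data the kinship term changes the weight each individual receives, so that groups of close relatives do not count as independent evidence.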

[Diagram: MLM inputs. Traits; Markers; Population Structure (STRUCTURE); Kinship (TASSEL)]

Obtain Markers
Genome Resequencing Workflow
Genotyping By Sequencing

MLM Pipeline for GWAS
TASSEL: Ed Buckler (Cornell University)
[Diagram: marker, trait; filter, convert, impute; K; GLM; MLM]
Zhang et al. Nature Genetics. 2010; doi:10.1038/ng.546
http://www.maizegenetics.net/statistical-genetics
http://www.maizegenetics.net/tassel/docs/Tassel_User_Guide_3.0.pdf

MLM Input Files
Hapmap file
Phenotype data (strains by traits)
Kinship matrix*
Population structure* (one row per strain; the values for the 3 populations sum to 1)
* Kinship matrix & population structure data can be generated using TASSEL or with the “MLM Workflow” App in the DE

Origin
Hapmap file:
  Download (e.g. http://triticeaetoolbox.org/)
  Convert from PLINK (.map/.ped) using Tassel 3 Conversion
  Impute with NPUTE
  Transform to numerical format with NumericalTransform
Phenotype data
Kinship matrix:
  Generate from hapmap marker data with Kinship
Population structure:
  Generate using ParallelStructure
  Convert to matrix with Structure2Tassel
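To give a sense of what a kinship matrix computed from numeric marker data looks like, the sketch below builds a centered genomic relationship matrix from genotypes coded as allele counts (0, 1, 2). This is one common estimator (VanRaden-style) chosen for illustration; it is an assumption and not necessarily what the Kinship tool computes. The genotypes and the function name are hypothetical.

```python
def kinship_matrix(genotypes):
    """Centered genomic relationship matrix (VanRaden-style).

    genotypes: list of individuals, each a list of allele counts (0, 1, 2)
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Allele frequency at each marker.
    freqs = [sum(ind[j] for ind in genotypes) / (2 * n) for j in range(m)]
    # Center each genotype by its expected value, 2p.
    centered = [[ind[j] - 2 * freqs[j] for j in range(m)] for ind in genotypes]
    # Scale by the expected genotype variance summed over markers.
    scale = 2 * sum(p * (1 - p) for p in freqs)
    return [[sum(a * b for a, b in zip(centered[i], centered[k])) / scale
             for k in range(n)] for i in range(n)]

# Hypothetical genotypes: 4 individuals by 4 markers.
genotypes = [
    [0, 2, 1, 0],
    [0, 2, 1, 0],
    [2, 0, 1, 2],
    [2, 0, 1, 0],
]
K = kinship_matrix(genotypes)
for row in K:
    print(" ".join(f"{value:6.2f}" for value in row))
```

Individuals with identical genotypes (the first two rows) get the same kinship with each other as with themselves, while individuals carrying opposite alleles get negative values, meaning they are less alike than a random pair from this sample.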

MLM Output
MLM1.txt: columns include
  Marker
  “df”: degrees of freedom
  “F”: F statistic for the test of the marker
  “p”: p-value
  “errordf”: df used for the denominator of the F-test
  etc.
MLM2.txt: estimated effect of each allele for each marker
MLM3.txt: compression results, showing the likelihood, genetic variance, and error variance for each compression level tested during the optimization process
See TASSEL manual for details: http://www.maizegenetics.net/tassel/docs/Tassel_User_Guide_3.0.pdf
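A typical next step after the run is to pick out the markers whose p-values survive a multiple-testing correction. The sketch below does this with a Bonferroni threshold. It assumes the statistics file is tab-delimited with a header row containing the "Marker" and "p" columns named above; that layout, the function name, and the example rows (all values made up) are assumptions, so check the actual file and the TASSEL manual before relying on it.

```python
import csv
import io
import math

def significant_markers(handle, alpha=0.05):
    """Return (marker, p) pairs passing a Bonferroni-corrected threshold."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        try:
            p = float(row["p"])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows without a usable p-value
        if not math.isnan(p):
            rows.append((row["Marker"], p))
    if not rows:
        return []
    threshold = alpha / len(rows)  # Bonferroni: alpha divided by tests run
    return [(marker, p) for marker, p in rows if p <= threshold]

# Hypothetical file content standing in for a marker statistics file.
example = io.StringIO(
    "Marker\tdf\tF\tp\terrordf\n"
    "snp1\t2\t0.8\t0.45\t200\n"
    "snp2\t2\t15.2\t0.0000007\t200\n"
    "snp3\t2\t3.1\t0.047\t200\n"
)
hits = significant_markers(example)
print(hits)
```

With a real file, replace the `io.StringIO` object with `open(path, newline="")`. Bonferroni is conservative when markers are in linkage disequilibrium; it is used here only because it is the simplest correction to show.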

THANKS!