networking in biochemistry: building a mouse model of diabetes
Brian S. Yandell, UW-Madison
BMI Chair Talk, October 2008 © Brian S. Yandell

Presentation transcript:

Slide 1: networking in biochemistry: building a mouse model of diabetes
Brian S. Yandell, UW-Madison, October 2008
Real knowledge is to know the extent of one’s ignorance. Confucius (on a bench in Seattle)

Slide 2: outline
1. how did I get here?
2. what problems caught my eye?
3. what have I done, anyway?
4. how do I work in teams?
5. what challenges remain?

Slide 3: how did I get here?
Biostatistics, School of Public Health, UC-Berkeley 1981
  – RA/TA with EL Scott, J Neyman, CL Chiang, S Selvin
  – PhD 1981 non-parametric inference for hazard rates (Kjell A Doksum)
  – Annals of Statistics (1983) 50 citations to date (2 in 2008)
research evolution
  – early career focus on survival analysis
  – shift to non-parametric regression ( )
  – shift to statistical genomics (1991--)
joined Biometry Program at UW-Madison in 1982
  – attracted by chance to blend statistics, computing and biology
  – valued balance of mathematical theory against practice
  – enjoyed developing methodology driven by collaboration

Slide 4: Yandell “Lab” Projects
Bayesian QTL Model Selection
  – R software development (Whipple Neely)
  – collaboration with UAB & Jackson Labs
  – data analysis of SCD1, ins10
meta-analysis for fine mapping Sorcs1
  – Chr 19 QTL introgressed as congenic lines
  – combined analysis across to increase power
QTL-based causal biochemical networks
  – algorithm development (Elias Chaibub)
  – data analysis with Christine Ferrara, Duke U

Slide 5: [collaborator diagram]
Rosetta: Schadt, Zhang, Zhu
UAB: Allison, Yi
stat/hort: Yandell
BMI: Kendziorski, Broman, Craven
Jax: Churchill, von Smith
Duke: Newgaard, Ferrara
biochem: Attie, Keller, Zhu

Slide 6: Pareto diagram of QTL effects
[figure labels: major QTL on linkage map; major QTL; minor QTL; polygenes (modifiers)]

Slide 7: problems of single QTL approach
wrong model: biased view
  – fool yourself: bad guess at locations, effects
  – detect ghost QTL between linked loci
  – miss epistasis completely
low power
bad science
  – use best tools for the job
  – maximize scarce research resources
  – leverage already big investment in experiment

Slide 8: advantages of multiple QTL approach
improve statistical power, precision
  – increase number of QTL detected
  – better estimates of loci: less bias, smaller intervals
improve inference of complex genetic architecture
  – patterns and individual elements of epistasis
  – appropriate estimates of means, variances, covariances
    asymptotically unbiased, efficient
  – assess relative contributions of different QTL
improve estimates of genotypic values
  – less bias (more accurate) and smaller variance (more precise)
  – mean squared error = MSE = (bias)^2 + variance
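
The MSE identity on the last line of this slide follows from expanding the squared error around the estimator's mean. A short derivation, writing \hat\theta for an estimate of a genotypic value \theta (this notation is assumed here and does not appear on the slides):

```latex
\begin{align*}
\mathrm{MSE}(\hat\theta)
  &= E\big[(\hat\theta - \theta)^2\big] \\
  &= E\big[(\hat\theta - E\hat\theta)^2\big]
     + 2\,(E\hat\theta - \theta)\,E\big[\hat\theta - E\hat\theta\big]
     + (E\hat\theta - \theta)^2 \\
  &= \underbrace{E\big[(\hat\theta - E\hat\theta)^2\big]}_{\text{variance}}
     + \underbrace{(E\hat\theta - \theta)^2}_{(\text{bias})^2}
\end{align*}
```

The cross term vanishes because E[\hat\theta - E\hat\theta] = 0.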

Slide 9: QTL mapping idea
observe phenotype y, marker genotypes m
genetic architecture γ identifies model
  – number and location of QTL
  – gene action and epistasis (pairwise interactions)
missing data: genotypes q at loci λ may be unknown
  – pr(q | m, λ, γ)
  – form of genotype model well known
phenotype y depends on genotype q
  – pr(y | q, µ, γ)
  – often linear model in q
  – possible interactions among QTL (epistasis)

Slide 11: how does phenotype y improve guess of QTL genotypes q?
what are probabilities for genotype q between markers?
recombinants AA:AB
all 1:1 if ignore y
and if we use y?
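
The question on this slide can be made concrete with a small Python sketch. It is not taken from the talk or from R/qtl. It assumes a backcross with genotypes AA and AB, the Haldane map function (no interference), and a normal phenotype model with known genotype means and standard deviation; all function names and numeric values are hypothetical.

```python
import math


def haldane_r(d_cM):
    """Recombination fraction for a map distance in cM (Haldane map function)."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cM / 100.0))


def prior_genotype_probs(m_left, m_right, d_left_cM, d_right_cM):
    """pr(q | flanking markers) at a backcross locus between two markers."""
    r1, r2 = haldane_r(d_left_cM), haldane_r(d_right_cM)
    weights = {}
    for q in ("AA", "AB"):
        p_left = (1.0 - r1) if q == m_left else r1     # pr(q | left marker)
        p_right = (1.0 - r2) if q == m_right else r2   # pr(right marker | q)
        weights[q] = p_left * p_right
    total = sum(weights.values())
    return {q: w / total for q, w in weights.items()}


def normal_pdf(y, mean, sd):
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_genotype_probs(prior, y, means, sd):
    """Reweight the marker-based genotype probabilities by the phenotype density."""
    weights = {q: prior[q] * normal_pdf(y, means[q], sd) for q in prior}
    total = sum(weights.values())
    return {q: w / total for q, w in weights.items()}


# Recombinant interval (AA on the left, AB on the right), locus at the midpoint.
prior = prior_genotype_probs("AA", "AB", 5.0, 5.0)
# Hypothetical phenotype model: AA mean 10, AB mean 12, sd 1; observed y = 12.
post = posterior_genotype_probs(prior, y=12.0, means={"AA": 10.0, "AB": 12.0}, sd=1.0)
```

Ignoring y, the midpoint of a recombinant interval gives AA:AB odds of 1:1. Once y is used, the genotype whose mean is closer to the observed phenotype gains probability.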

Slide 12: Gibbs sampler for loci indicators
QTL at pseudomarkers
loci indicators γ
  – γ = 1 if QTL present
  – γ = 0 if no QTL present
Gibbs sampler on loci indicators γ
  – relatively easy to incorporate epistasis
  – Yi et al. (2005, 2007 Genetics) (earlier work of Yi, Ina Hoeschele)

Slide 13: likelihood and posterior
likelihood relates “known” data (y,m,q) to unknown values of interest (µ,λ,γ)
  – pr(y,q|m,µ,λ,γ) = pr(y|q,µ,γ) pr(q|m,λ,γ)
  – mix over unknown genotypes (q)
posterior turns likelihood into a distribution
  – weight likelihood by priors
  – rescale to sum to 1.0
  – posterior = likelihood * prior / constant
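
The last three bullets describe a weight-and-rescale step. A minimal Python sketch on a discrete set of candidate models follows; the three likelihood and prior values are made up for illustration and are not from the talk.

```python
def posterior(likelihood, prior):
    """posterior = likelihood * prior / constant, on a discrete set of candidates."""
    weights = [lik * pri for lik, pri in zip(likelihood, prior)]  # weight by priors
    constant = sum(weights)                                       # normalizing constant
    if constant <= 0.0:
        raise ValueError("likelihood * prior must have positive total mass")
    return [w / constant for w in weights]                        # rescale to sum to 1.0


# Hypothetical values for three candidate models.
likelihood = [0.2, 0.6, 0.2]
prior = [0.5, 0.25, 0.25]
post = posterior(likelihood, prior)
```

With a flat prior the posterior is the rescaled likelihood. Here the prior favors the first candidate, so its posterior weight rises from 0.2 to about one third.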

Slide 14: Bayes theorem for QTLs
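
The formula on this slide did not survive extraction. One form consistent with the factorization on the previous slide is sketched below. It assumes µ denotes effects, λ loci and γ the genetic architecture, and it leaves the prior pr(µ, λ, γ | m) unfactored, so it may differ in detail from what the slide showed.

```latex
\[
\operatorname{pr}(\mu,\lambda,\gamma \mid y,m)
  = \frac{\Big[\sum_{q} \operatorname{pr}(y \mid q,\mu,\gamma)\,
          \operatorname{pr}(q \mid m,\lambda,\gamma)\Big]\,
          \operatorname{pr}(\mu,\lambda,\gamma \mid m)}
         {\operatorname{pr}(y \mid m)}
\]
```

The sum over q is the "mix over unknown genotypes" step, and pr(y | m) is the constant that rescales the posterior to total 1.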

Slide 15: why use a Bayesian approach?
first, do both classical and Bayesian
  – always nice to have a separate validation
  – each approach has its strengths and weaknesses
classical approach works quite well
  – selects large effect QTL easily
  – directly builds on regression ideas for model selection
Bayesian approach is comprehensive
  – samples most probable genetic architectures
  – formalizes model selection within one framework
  – readily (!) extends to more complicated problems

Slide 16: Markov chain sampling
construct Markov chain around posterior
  – posterior is stable distribution of Markov chain
  – use MC samples to estimate posterior
sample QTL model unknowns from full conditionals
  – update unknowns one at a time or in batches
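
The "one unknown at a time from its full conditional" idea can be illustrated with a toy Gibbs sampler in Python. This is not the QTL sampler from the talk. The target is a standard bivariate normal with correlation rho, chosen only because its full conditionals are simple: x | y ~ N(rho*y, 1 - rho^2), and symmetrically for y | x.

```python
import math
import random


def gibbs_bivariate_normal(rho, n_samples, seed=0):
    """Toy Gibbs sampler: alternate draws from the two full conditionals."""
    rng = random.Random(seed)
    sd = math.sqrt(1.0 - rho * rho)
    x, y = 0.0, 0.0
    draws = []
    for _ in range(n_samples):
        x = rng.gauss(rho * y, sd)  # update x given the current y
        y = rng.gauss(rho * x, sd)  # update y given the new x
        draws.append((x, y))
    return draws


def sample_correlation(pairs):
    n = len(pairs)
    mx = sum(p[0] for p in pairs) / n
    my = sum(p[1] for p in pairs) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in pairs)
    sxx = sum((p[0] - mx) ** 2 for p in pairs)
    syy = sum((p[1] - my) ** 2 for p in pairs)
    return sxy / math.sqrt(sxx * syy)


draws = gibbs_bivariate_normal(rho=0.5, n_samples=20000)
estimated_rho = sample_correlation(draws)
```

The chain's stable distribution is the target, so summaries of the draws (here the sample correlation) estimate features of that distribution.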

Slide 17: Bayes posterior vs. maximum likelihood
LOD: classical Log ODds
  – maximize likelihood over effects µ
  – R/qtl scanone/scantwo: method = "em"
LPD: Bayesian Log Posterior Density
  – average posterior over effects µ
  – R/qtl scanone/scantwo: method = "imp"
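
To make "maximize likelihood over effects" concrete, here is a minimal Python sketch of a LOD score at a single fully genotyped marker under a normal model. It is an illustration, not the R/qtl implementation. With effects and variance set to their maximum likelihood estimates, the log10 likelihood ratio reduces to (n/2) log10(RSS0/RSS1). The LPD would average over effects instead and is not shown.

```python
import math


def rss(values):
    """Residual sum of squares around the sample mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def marker_lod(y, genotypes):
    """LOD at one marker: log10 likelihood ratio of 'QTL at marker' vs 'no QTL'.

    Assumes normal errors with a common variance and no missing genotypes.
    Undefined (division by zero) if the genotype groups fit the data perfectly.
    """
    groups = {}
    for yi, gi in zip(y, genotypes):
        groups.setdefault(gi, []).append(yi)
    rss0 = rss(y)                                  # null model: one overall mean
    rss1 = sum(rss(g) for g in groups.values())    # one mean per genotype
    return (len(y) / 2.0) * math.log10(rss0 / rss1)


# Hypothetical data for four individuals.
lod_effect = marker_lod([1.0, 2.0, 3.0, 4.0], ["AA", "AA", "AB", "AB"])
lod_none = marker_lod([1.0, 2.0, 1.0, 2.0], ["AA", "AA", "AB", "AB"])
```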

Slide 18: LOD & LPD: 1 QTL
n.ind = 100, 10 cM marker spacing

Slide 19: marginal LOD or LPD
what is contribution of a QTL adjusting for all others?
  – improvement in LPD due to QTL at locus
  – contribution due to main effects, epistasis, GxE?
how does adjusted LPD differ from unadjusted LPD?
  – raised by removing variance due to unlinked QTL
  – raised or lowered due to bias of linked QTL
  – analogous to Type III adjusted ANOVA tests
can ask these same questions using classical LOD
  – see Broman’s newer tools for multiple QTL inference

Slide 20: 1-QTL LOD vs. marginal LPD
[figure label: 1-QTL LOD]

Slide 21: hyper data: scanone

Slide 22: what is best estimate of QTL?
find most probable pattern
  – 1,4,6,15,6:15 has posterior of 3.4%
estimate locus across all nested patterns
  – Exact pattern seen ~100/3000 samples
  – Nested pattern seen ~2000/3000 samples
estimate 95% confidence interval using quantiles
    > best <- qb.best(qbHyper)
    > summary(best)$best
    chrom locus locus.LCL locus.UCL n.qtl
    > plot(best)
Manichaikul et al Genetics (in review)

Slide 23: what patterns are “near” the best?
size & shade ~ posterior
distance between patterns
  – sum of squared attenuation
  – match loci between patterns
  – squared attenuation = (1-2r)^2
  – sq.atten in scale of LOD & LPD
multidimensional scaling
  – MDS projects distance onto 2-D
  – think mileage between cities
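
The squared attenuation formula can be evaluated directly. The Python sketch below covers only that one ingredient: it does not reproduce how R/qtlbim matches loci or sums terms into a pattern distance. The conversion from map distance to recombination fraction r assumes the Haldane map function, which the slide does not specify.

```python
import math


def squared_attenuation(r):
    """(1 - 2r)^2 for a recombination fraction r between two loci."""
    return (1.0 - 2.0 * r) ** 2


def squared_attenuation_cM(d_cM):
    """Same quantity from a map distance in cM, assuming the Haldane map function."""
    r = 0.5 * (1.0 - math.exp(-2.0 * d_cM / 100.0))
    return squared_attenuation(r)


same_locus = squared_attenuation(0.0)        # identical loci: no attenuation
unlinked = squared_attenuation(0.5)          # unlinked loci: fully attenuated
apart_25cM = squared_attenuation_cM(25.0)    # equals exp(-1) under Haldane
```

Under Haldane, 1 - 2r = exp(-2d) with d in Morgans, so the squared attenuation decays as exp(-4d) with the distance between matched loci.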

Slide 24: Software for Bayesian QTLs: R/qtlbim
Properties
  – cross-compatible with R/qtl
  – new MCMC algorithms
    Gibbs with loci indicators; no reversible jump
  – epistasis, fixed & random covariates, GxE
  – extensive graphics
Software history
  – initially designed (Satagopan, Yandell 1996)
  – major revision and extension (Gaffney 2001)
  – R/bim to CRAN (Wu, Gaffney, Jin, Yandell 2003)
  – R/qtlbim to CRAN (Yi, Yandell et al. 2006)
Publications
  – Yi et al. (2005); Yandell et al. (2007); Yi et al. (2007ab)

Slide 25: [figure panels: glucose, insulin (courtesy AD Attie)]
BTBR mouse is insulin resistant
B6 is not
make both obese…

Slide 26: studying diabetes in an F2
mouse model: segregating panel from inbred lines
  – B6.ob x BTBR.ob → F1 → F2
  – selected mice with ob/ob alleles at leptin gene (Chr 6)
  – sacrificed at 14 weeks, tissues preserved
physiological study (Stoehr et al Diabetes)
  – mapped body weight, insulin, glucose at various ages
gene expression studies
  – RT-PCR for a few mRNA on 108 F2 mice liver tissues (Lan et al Diabetes; Lan et al Genetics)
  – Affymetrix microarrays on 60 F2 mice liver tissues
    U47 A & B chips, RMA normalization
    design: selective phenotyping (Jin et al Genetics)

Slide 27: log10(ins10), Chr 19
[figure legend: black=all, blue=male, red=female, purple=sex-adjusted; solid=512 mice, dashed=311 mice]

Slide 28: Sorcs1 study in mice: 11 sub-congenic strains
marker regression
meta-analysis
within-strain permutations
Nature Genetics 2006 Clee, Yandell et al.

Slide 29: we were lucky!
BTBR background needed to see SORCS1
epistatic interaction of chr 19 and 8 … discovered much later

Slide 30: Sorcs1 gene & SNPs

Slide 31: Sorcs1 study in humans
Diabetes 2007 Goodarzi et al.

Slide 32: 2M observations, 30,000 traits, 60 mice

Slide 33: experimental context
B6 x BTBR obese mouse cross
  – model for diabetes and obesity
  – 500+ mice from intercross (F2)
  – collaboration with Rosetta/Merck
genotypes
  – 5K SNP Affymetrix mouse chip
  – care in curating genotypes! (map version, errors, …)
phenotypes
  – clinical phenotypes (>100 / mouse)
  – gene expression traits (>40,000 / mouse / 4-6 tissues)
  – other molecular traits (proteomic, miRNA, metabolomic)

Slide 34: QTL mapping thousands of gene expression traits
PLoS Genetics 2006 Lan, Chen et al.

Slide 35: [figure legend: red=trans, blue=cis; QTLs on chr n; gray scale for variance]

Slide 36: Chaibub Neto et al. (2008) Genetics

Slide 37: causal phenotype networks
goal: mimic biochemical pathways with directed (causal) networks
problem: association (correlation) does not imply causation
resolution: bring in driving causes
  – genotypes (at conception)
  – processes earlier in time

Slide 38: Causal vs Reactive? (Elias Chaibub, Brian Yandell)
y1 causes y2: y1 ~ g1 and y2 ~ g2*y1
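
One way to see why genotypes help orient an edge is a small simulation. The Python sketch below is a simplified illustration, not the method of Chaibub Neto et al. It assumes purely additive effects (no g2 by y1 interaction), unit effect sizes, standard normal noise, and binary genotypes; all of these are hypothetical choices. When y1 causes y2, the driver g1 of y1 is associated with y2, but that association disappears after conditioning on y1.

```python
import math
import random


def correlation(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def partial_correlation(a, b, c):
    """Correlation of a and b after adjusting both for c."""
    r_ab, r_ac, r_bc = correlation(a, b), correlation(a, c), correlation(b, c)
    return (r_ab - r_ac * r_bc) / math.sqrt((1.0 - r_ac ** 2) * (1.0 - r_bc ** 2))


rng = random.Random(1)
n = 2000
g1 = [rng.randint(0, 1) for _ in range(n)]                      # QTL driving y1
g2 = [rng.randint(0, 1) for _ in range(n)]                      # QTL driving y2
y1 = [g + rng.gauss(0.0, 1.0) for g in g1]                      # y1 ~ g1
y2 = [a + g + rng.gauss(0.0, 1.0) for a, g in zip(y1, g2)]      # y2 ~ g2 + y1

marginal = correlation(g1, y2)                 # clearly positive under y1 -> y2
conditional = partial_correlation(g1, y2, y1)  # near zero once y1 is accounted for
```

If the direction were reversed (y2 causes y1), the roles of g1 and g2 in this check would swap, which is what makes the comparison informative.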

Slide 39: Ferrara et al.

Slide 40: inferring phenotype networks
build in prior pathway knowledge (PPI, TF)
  – co-map correlated traits
    Banerjee, Yandell, Yi (2008 Genetics)
  – pathways induce correlation structure
ramp up to 100s, 1000s of phenotypes?
  – danger of mixing unrelated pathways
  – want closely linked upstream (causal) drivers

Slide 41: [collaborator diagram]
Rosetta: Schadt, Zhang, Zhu
UAB: Allison, Yi
stat/hort: Yandell
BMI: Kendziorski, Broman, Craven
Jax: Churchill, von Smith
Duke: Newgaard, Ferrara
biochem: Attie, Keller, Zhu

Slide 42: why build Web eQTL tools?
common storage/maintenance of data
  – one well-curated copy
  – central repository
  – reduce errors, ensure analysis on same data
automate commonly used methods
  – biologist gets immediate feedback
  – statistician can focus on new methods
  – codify standard choices

Slide 43: how does one build tools?
no one solution for all situations
use existing tools wherever possible
  – new tools take time and care to build!
  – downloaded databases must be updated regularly
human component is key
  – need informatics expertise
  – need continual dialog with biologists
build bridges (interfaces) between tools
  – Web interface uses PHP
  – commands are created dynamically for R
continually rethink & redesign organization

Slide 44: steps in using Web tools
user enters data on Web page
PHP tool interprets user data
PHP builds R script
R run on script
  – creates plots, summaries, warnings
PHP grabs results & displays on page
user examines, saves
user modifies data and reruns
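
The "interpret user data, then build an R script" steps might look roughly like the sketch below. The talk's tool is written in PHP; Python is used here only for illustration, and the function name, the `cross.RData` file, the object name `cross`, and the output file name are all hypothetical. The generated script calls the R/qtl function `scanone`. User input is validated before it is pasted into R code, because dynamically built commands are an injection risk.

```python
import re

ALLOWED_METHODS = {"em", "imp", "hk"}  # subset of R/qtl scanone methods


def build_r_script(trait, chromosomes, method="em", cross_file="cross.RData"):
    """Turn validated form input into the text of an R script (not executed here)."""
    if not re.fullmatch(r"[A-Za-z][A-Za-z0-9._]*", trait):
        raise ValueError("trait name contains unexpected characters")
    if method not in ALLOWED_METHODS:
        raise ValueError("unsupported scan method")
    chr_vec = "c(" + ", ".join(str(int(c)) for c in chromosomes) + ")"
    lines = [
        "library(qtl)",
        # hypothetical server-side file assumed to hold a cross object named 'cross'
        f'load("{cross_file}")',
        f'out <- scanone(cross, chr = {chr_vec}, pheno.col = "{trait}", method = "{method}")',
        'png("scan.png")',
        "plot(out)",
        "dev.off()",
        "print(summary(out))",
    ]
    return "\n".join(lines) + "\n"


script = build_r_script("ins10", [19], method="em")
```

A web front end would write `script` to a file, run it with R in batch mode, and then collect the plot and printed summary for display. Those steps are omitted here.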

Slide 45: raw data or fancy results?
raw data flexible but slow
  – LOD profiles for 100 (1000) traits?
fancy results from sophisticated analysis
  – IM, MIM, BIM, MOM analysis
  – too complicated to put in biologists’ hands?
    methods are unrefined, state-of-art, research tools
    use of methods involved many subtle choices
  – batch computation over weeks
compute once, save, display many times

Slide 47: LOD profiles: many traits

Slide 48: LOD interval
approximate 95% CI
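
A LOD support interval can be read off a LOD profile by finding the region around the peak where the curve stays within a fixed drop of its maximum. The Python sketch below is a simplified illustration. The default drop of 1.5 is an assumption, since the value on the slide did not survive extraction, and the function returns the outermost scanned positions still above the threshold rather than interpolating or extending to the next position as some implementations do.

```python
def lod_support_interval(positions, lods, drop=1.5):
    """Return (low, high) positions around the peak where LOD >= max LOD - drop.

    'drop' defaults to 1.5, an assumed value; positions and lods are parallel
    lists for one chromosome, ordered by position.
    """
    peak = max(range(len(lods)), key=lods.__getitem__)
    threshold = lods[peak] - drop
    lo = peak
    while lo > 0 and lods[lo - 1] >= threshold:                # walk left from the peak
        lo -= 1
    hi = peak
    while hi < len(lods) - 1 and lods[hi + 1] >= threshold:    # walk right from the peak
        hi += 1
    return positions[lo], positions[hi]


# Hypothetical LOD profile on a 10 cM grid.
interval = lod_support_interval([0, 10, 20, 30, 40], [0.5, 2.0, 4.0, 3.0, 1.0])
```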

Slide 49: [figure legend: red=trans, blue=cis; QTLs on chr n; gray scale for variance]

Slide 50: what challenges remain?
from eQTL to candidate pathways
  – statistical issues
    networks, correlated traits
    better model selection approaches
  – biological evidence (Weiss 2007 Genetics)
    Mouse to human to mouse
    KOs, etc.
upgrade informatics environment
  – harden local code (R, Python, PHP, …)
  – build on other high throughput systems
    Swertz, Jansen (2007); Stein (2008) Nat Rev Gen

Slide 51: many thanks
Karl Broman
Jackson Labs: Gary Churchill, Hao Wu, Randy von Smith
U AL Birmingham: David Allison, Nengjun Yi, Tapan Mehta, Samprit Banerjee, Ram Venkataraman, Daniel Shriner
Michael Newton, Hyuna Yang, Daniel Sorensen, Daniel Gianola, Liang Li
my students: Jaya Satagopan, Fei Zou, Patrick Gaffney, Chunfang Jin, Elias Chaibub Neto, W Whipple Neely, Jee Young Moon
Tom Osborn, David Butruille, Marcio Ferrera, Josh Udahl, Pablo Quijada
Alan Attie, Jonathan Stoehr, Hong Lan, Susie Clee, Jessica Byers, Mark Keller
USDA Hatch, NIH/NIDDK (Attie), NIH/R01 (Yi, Broman)