High Density Oligo Arrays for Single Feature Polymorphism Genotyping and Mapping Justin Borevitz Ecology & Evolution University of Chicago

Presentation transcript:


Which arrays should be used? Spotted arrays: Arizona 29, mers. ATH1, Affymetrix expression GeneChip: 202,806 unique 25 bp oligonucleotide features. AtTILE1, universal whole genome array: every ~35 bp, >3 million PM features. Re-sequencing array: 120M*8bp; 20 accessions, Perlegen; Max Planck (Weigel), USC (Nordborg). [Image label: GeneChip]

Universal Whole Genome Array (RNA, DNA). Transcriptome Atlas: expression levels, tissue specificity. Gene Discovery: gene model correction, non-coding/micro-RNA, antisense transcription. Alternative Splicing. Comparative Genome Hybridization (CGH): insertions/deletions. Methylation. Chromatin Immunoprecipitation: ChIP chip. Polymorphism: SFP discovery/genotyping. Design: ~35 bp tile, non-repetitive regions, “good” binding oligos, evenly spaced.

ChipViewer: mapping of transcriptional units of the ORFeome. From 2000v At1g09750 (MIPS) to the latest AGI At1g v. [Figure panels: Annotation (MIPS); The latest AGI Annotation]

Improved Genome Annotation. [Figure: array features along Chromosome (bp); labels: SNP, SFP, conservation, ORFa start, ORFb, deletion, AAAAA, Transcriptome Atlas]

Talk Outline: Single Feature Polymorphisms (SFPs); Barley SFPs; Uses of SFPs; Haplotype analysis; Expression

Potential Deletions

Spatial Correction: spatial artifacts; improved reproducibility. Next: Quantile Normalization
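
The slide names quantile normalization as the step after spatial correction. A minimal sketch of that step, assuming each array is a list of probe intensities of equal length; the function name is hypothetical and ties are simply broken by probe position:

```python
def quantile_normalize(arrays):
    """Force every array to share the same intensity distribution.

    arrays: list of equal-length lists of probe intensities (one list per chip).
    Returns new lists; the inputs are not modified.
    """
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same number of probes")

    # Mean intensity at each rank across arrays (the reference distribution).
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [sum(s[r] for s in sorted_arrays) / len(arrays) for r in range(n)]

    normalized = []
    for a in arrays:
        # Probe indices ordered from lowest to highest intensity on this chip.
        order = sorted(range(n), key=lambda i: a[i])
        out = [0.0] * n
        for rank, probe in enumerate(order):
            out[probe] = rank_means[rank]
        normalized.append(out)
    return normalized


# Toy intensities (hypothetical): two chips, three probes.
normalized = quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]])
print(normalized)  # [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

After this step each chip has the same set of values, so per-probe differences between genotypes are less likely to reflect overall chip brightness.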

False Discovery and Sensitivity (PM only)
SAM threshold 5% FDR. [Table: GeneChip (SFPs, nonSFPs) vs. Sequence (Polymorphic, Non-polymorphic); Cereon marker accuracy %; Sensitivity %] False Discovery rate: 3%. Test for independence of all factors: Chisq = , df = 1, p-value = 1.845e-40
SAM threshold 18% FDR. [Table: GeneChip (SFPs, nonSFPs) vs. Sequence (Polymorphic, Non-polymorphic); Cereon marker accuracy %; Sensitivity %] False Discovery rate: 13%. Test for independence of all factors: Chisq = , df = 1, p-value = 1.309e-59
3/4 Cvi markers were also confirmed in PHYB
90% 80% 70%; 41% 53% 85%; 90% 80% 70%; 67% 85% 100%
Cereon may be a sequencing error; TIGR match is a match
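
The "test for independence" with df = 1 on this slide is a chi-square test on a 2x2 table of array calls against sequence. The sketch below shows how such a statistic and p-value can be computed; the counts are hypothetical (the slide's own counts did not survive in the transcript), the function name is an assumption, and no continuity correction is applied:

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test of independence for the table [[a, b], [c, d]].

    Rows: array call (SFP, nonSFP). Columns: sequence (polymorphic, non-polymorphic).
    Returns (chi2, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For df = 1, the chi-square tail probability equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts, for illustration only.
chi2, p_value = chi_square_2x2(80, 5, 40, 300)
print(round(chi2, 1), p_value)
```

A very small p-value, as on the slide, says the array calls and the sequence polymorphisms are strongly associated; it does not by itself give the false discovery rate or sensitivity.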

Effect of SNP position: 340 candidate polymorphisms. [Figure legend: False negative, True Positive]

Complex Genomes? Signal to noise with large genomes. RNA: less complex, but differential expression

Barley SFPs

RNA: 2 genotypes, 18 replicates

False Discovery Rate (RNA). RNA hybridization: 17 Golden Promise, 19 Morex, 6 tissues. SAM Analysis for the Two-Class Unpaired Case Assuming Unequal Variances. s0 = (The 5% quantile of the s values.) Number of permutations: 500. MEAN number of falsely called genes is computed. [Table columns: Delta, p0, Called, FALSE, FDR]
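
The SAM output above reports, for each threshold, how many features are called, the mean number called falsely under permutation, and the resulting FDR. The following is a simplified sketch of that idea, not the SAM software itself: it uses a fixed cutoff on |d| instead of SAM's Delta, an unequal-variance standard error, and an assumed p0 of 1. All names and the toy intensities are hypothetical.

```python
import random
from statistics import mean, variance


def sam_d(values, labels, s0):
    """SAM-style statistic for one probe: mean difference / (standard error + s0)."""
    a = [v for v, g in zip(values, labels) if g == 0]
    b = [v for v, g in zip(values, labels) if g == 1]
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return (mean(b) - mean(a)) / (se + s0)


def sam_fdr(data, labels, cutoff, s0=0.1, n_perm=200, p0=1.0, seed=0):
    """Return (called, mean_false, fdr) for probes with |d| >= cutoff."""
    rng = random.Random(seed)
    called = sum(abs(sam_d(x, labels, s0)) >= cutoff for x in data)
    false_counts = []
    for _ in range(n_perm):
        perm = labels[:]
        rng.shuffle(perm)  # break the link between genotype and intensity
        false_counts.append(sum(abs(sam_d(x, perm, s0)) >= cutoff for x in data))
    mean_false = mean(false_counts)
    fdr = min(1.0, p0 * mean_false / called) if called else 0.0
    return called, mean_false, fdr


# Hypothetical log intensities: 3 replicates of genotype 0, then 3 of genotype 1.
labels = [0, 0, 0, 1, 1, 1]
features = [
    [10.0, 10.2, 9.8, 5.0, 5.1, 4.9],   # probe with an SFP-like drop in genotype 1
    [7.0, 7.1, 6.9, 7.05, 6.95, 7.0],   # probe with no genotype effect
]
called, mean_false, fdr = sam_fdr(features, labels, cutoff=10.0)
print(called, mean_false, fdr)
```

The constant s0 keeps probes with tiny variance from producing extreme statistics, which is why the slide reports it as a quantile of the s values.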

Barley SFPs: genomic DNA, 3 genotypes, 3 replicates

False Discovery Rate (DNA). Genomic DNA hybridization: 3 replicates, 3 genotypes. SAM Analysis for the Multi-Class Case with 3 Classes. s0 = (The 25% quantile of the s values.) Number of permutations: 100. MEAN number of falsely called genes is computed. [Table columns: Delta, p0, Called, FALSE, FDR]

Sequence Verification of SFPs (RNA). [Table: GeneChip calls (mxSFP, nonSFP, gpSFP) vs. Sequence (MX, Non-polymorphic, GP)] Chisq = , df = 4, p-value = 0

Position of SNP

Barley SFPs per probeset

Uses of SFPs: recombination events; mapping Mendelian mutations; mapping QTL; deletions; haplotyping

Chip genotyping of a Recombinant Inbred Line: 29 kb interval. Discovery: 6 replicates x $500 / 12,000 SFPs = $0.25 per SFP. Typing: 1 replicate x $500 / 12,000 SFPs = $0.041 per SFP
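
The per-SFP costs on this slide follow from dividing the total chip cost by the number of SFPs scored. A small sketch of that arithmetic, using the slide's figures (the function name is hypothetical):

```python
def cost_per_sfp(replicates, cost_per_chip, n_sfps):
    """Total hybridization cost divided by the number of SFP markers scored."""
    return replicates * cost_per_chip / n_sfps


discovery = cost_per_sfp(replicates=6, cost_per_chip=500, n_sfps=12000)
typing = cost_per_sfp(replicates=1, cost_per_chip=500, n_sfps=12000)
print(discovery)         # 0.25
print(round(typing, 4))  # 0.0417
```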

Map bibb: 100 bibb mutant plants, 100 wt mutant plants

bibb mapping (ChipMap): bulk segregant mapping using chip hybridization. bibb maps to Chromosome 2 near ASYMMETRIC LEAVES1 (AS1)

BIBB = ASYMMETRIC LEAVES1. Sequenced AS1 coding region from bib-1 … found g -> a change that would introduce a stop codon in the MYB domain. [Figure labels: bibb, as1-101; MYB; bib-1 W49*; as-101 Q107*; as1; bibb] AS1 (ASYMMETRIC LEAVES1) = MYB closely related to PHANTASTICA, located at 64 cM

Array Mapping (Hazen et al., Plant Physiology 2005). [Figure panels: chr1, chr2, chr3, chr4, chr5]

eXtreme Array Mapping: 15 tallest RILs pooled vs. 15 shortest RILs pooled

eXtreme Array Mapping: allele frequencies determined by SFP genotyping. Thresholds set by simulations. Red light QTL RED2 from 100 Kas/Col RILs (Wolyn et al., Genetics 2004). [Figure: LOD vs. cM, Chromosome 2; panels: eXtreme Array Mapping, Composite Interval Mapping; RED2 QTL, 12 cM]
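
The logic of eXtreme Array Mapping (pool the phenotypic extremes, estimate allele frequencies per marker, compare against a simulated threshold) can be sketched as follows. This is an illustration under stated assumptions, not the analysis used for the slide: genotypes are taken as known 0/1 calls per line rather than estimated from pooled hybridization, the threshold comes from random pools, and the toy data and all names are hypothetical.

```python
import random


def pool_frequency(genotypes, members):
    """Frequency of allele 1 at each marker among the pooled lines."""
    n_markers = len(genotypes[0])
    return [sum(genotypes[i][m] for i in members) / len(members)
            for m in range(n_markers)]


def xam_scan(genotypes, phenotypes, pool_size=15):
    """Allele frequency difference (high pool minus low pool) per marker."""
    order = sorted(range(len(phenotypes)), key=lambda i: phenotypes[i])
    low, high = order[:pool_size], order[-pool_size:]
    f_low = pool_frequency(genotypes, low)
    f_high = pool_frequency(genotypes, high)
    return [h - l for h, l in zip(f_high, f_low)]


def simulated_threshold(genotypes, pool_size=15, n_sim=200, quantile=0.95, seed=0):
    """Quantile of the largest |difference| seen when pools are chosen at random."""
    rng = random.Random(seed)
    maxima = []
    for _ in range(n_sim):
        picks = rng.sample(range(len(genotypes)), 2 * pool_size)
        a = pool_frequency(genotypes, picks[:pool_size])
        b = pool_frequency(genotypes, picks[pool_size:])
        maxima.append(max(abs(x - y) for x, y in zip(a, b)))
    maxima.sort()
    return maxima[int(quantile * (n_sim - 1))]


# Toy population: 64 lines, 5 unlinked markers; marker 2 alone controls the trait.
genotypes = [[(i >> m) & 1 for m in range(5)] for i in range(64)]
phenotypes = [10.0 * g[2] for g in genotypes]

diffs = xam_scan(genotypes, phenotypes)
threshold = simulated_threshold(genotypes)
peak = max(range(len(diffs)), key=lambda m: abs(diffs[m]))
print(peak, diffs[peak], threshold)
```

In real data neighbouring SFPs are linked, so the frequency difference forms a peak around the QTL rather than a single spike.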

eXtreme Array Mapping BurC F2

XAM: Lz x Col F2. QTL: Lz x Ler F2 (Werner et al., Genetics 2005)

eXtreme Array Fine Mapping: RED2 QTL flanked by mark1 and mark2 (~2 Mb, ~8 cM, >400 SFPs). Select recombinants by PCR: >200 from >1250 plants; High and Low pools. [Figure: genotype classes at mark1 and mark2 (Col, Kas, het) with approximate counts ~2, ~43, ~268, ~539]

Potential Deletions: >500 potential deletions; 45 confirmed by Ler sequence; 23 (of 114) transposons; Disease Resistance (R) gene clusters; single R gene deletions; genes involved in secondary metabolism; unknown genes

Potential Deletions Suggest Candidate Genes. FLOWERING1 QTL: flowering time QTL caused by a natural deletion in FLM (Werner et al., PNAS 2005). [Figure: Chr1 (bp); labels: MAF1, FLM, FLM natural deletion]

Fast Neutron deletions: FKF1, 80 kb deletion, CHR1; cry2, 10 kb deletion, CHR1; Het

Array Haplotyping. What about diversity/selection across the genome? A genome-wide estimate of population genetics parameters: θw, π, Tajima's D, ρ; LD decay, haplotype block size. Deep population structure? Accessions: Col, Lz, Bur, Ler, Bay, Shah, Cvi, Kas, C24, Est, Kin, Mt, Nd, Sorbo, Van, Ws2, Fl-1, Ita-0, Mr-0, St-0, Sah-0
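
Three of the parameters named on this slide (Watterson's θw, nucleotide diversity π and Tajima's D) can be computed from a presence/absence matrix using the standard formulas. The sketch below assumes each accession is a row of 0/1 SFP calls with no missing data; the function name and the toy matrix are hypothetical. Because SFP calls are made relative to the reference, values obtained this way are only analogues of the sequence-based statistics.

```python
import math


def diversity_stats(haplotypes):
    """Return (theta_w, pi, tajima_d) for a list of 0/1 haplotype rows."""
    n = len(haplotypes)
    counts = [sum(column) for column in zip(*haplotypes)]
    segregating = [j for j in counts if 0 < j < n]
    s = len(segregating)
    if s == 0:
        raise ValueError("no segregating sites")

    # pi: mean number of pairwise differences, summed over sites.
    pi = sum(2.0 * j * (n - j) for j in segregating) / (n * (n - 1))

    # Watterson's estimator from the number of segregating sites.
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    theta_w = s / a1

    # Tajima's (1989) variance constants.
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    tajima_d = (pi - theta_w) / math.sqrt(e1 * s + e2 * s * (s - 1))
    return theta_w, pi, tajima_d


# Toy matrix (hypothetical): 4 accessions x 4 features; the last feature is monomorphic.
toy = [
    [1, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
theta_w, pi, tajima_d = diversity_stats(toy)
print(theta_w, pi, tajima_d)
```

Applying the same function to the SFPs inside successive windows along a chromosome gives a windowed profile of each statistic.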

Array Haplotyping: inbred lines; low effective recombination due to partial selfing; extensive LD blocks. [Figure: Col, Ler, Cvi, Kas, Bay, Shah, Lz, Nd; Chromosome 1, ~500 kb]

Distribution of T-stats: null (permutation) vs. actual. [Figure labels: Not Col, Col, NA, NA, duplications; 32,427; Calls 208,729; 12,250 SFPs]

Sequence confirmation of SFPs
Accession  SFP  SNP  Total  FPR  FDR    Sensitivity
bay                         %    25.0%  54.1%
bur                         %    29.8%  57.9%
cvi                         %    21.7%  58.7%
ler                         %    22.0%  62.7%
lz                          %    18.9%  75.0%
mr                          %    17.9%  63.2%
mt                          %    26.1%  70.8%
sorbo                       %    29.7%  49.1%
ws                          %    13.8%  53.2%
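
The three rates in this table have standard definitions in terms of array calls checked against sequence. A short sketch with hypothetical counts (the slide's SFP, SNP and Total counts were not preserved); the function name is an assumption:

```python
def confirmation_rates(tp, fp, fn, tn):
    """Rates for array calls compared with sequence.

    tp: SFP called, sequence polymorphic      fp: SFP called, sequence identical
    fn: no SFP called, sequence polymorphic   tn: no SFP called, sequence identical
    """
    return {
        "FPR": fp / (fp + tn),          # fraction of identical probes called SFP
        "FDR": fp / (tp + fp),          # fraction of SFP calls that are wrong
        "sensitivity": tp / (tp + fn),  # fraction of polymorphic probes detected
    }


# Hypothetical counts, for illustration only.
rates = confirmation_rates(tp=30, fp=10, fn=20, tn=140)
print(rates)
```

FDR and sensitivity trade off against each other as the calling threshold moves, which is why the table reports both per accession.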

SFPs for reverse genetics: 14 accessions, 30,950 SFPs

Chromosome Wide Diversity

Diversity 50kb windows

Tajima's D-like statistic, 50 kb windows. [Figure labels: RPS4, unknown]

R genes vs bHLH

Consider SFPs during expression: remove SFPs; allele-specific expression

Differences may be due to expression or hybridization

PAG1 down-regulated in Cvi. PALE GREEN1 knockout has long hypocotyl in red light

References
Hazen SP, Borevitz JO, Harmon FG, Pruneda-Paz JL, Schultz TF, Yanovsky MJ, Liljegren SJ, Ecker JR, Kay SA. Rapid array mapping of circadian clock and developmental mutations in Arabidopsis. Plant Physiology (in press).
Rostoks N, Borevitz JO, Hedley PE, Russell J, Mudie S, Morris J, Cardle L, Marshall DF, Waugh R. Single Feature Polymorphism discovery in the barley transcriptome. Genome Biology (in press).
Werner JD, Borevitz JO, Uhlenhaut H, Ecker JR, Chory J, Weigel D. FRIGIDA-independent variation in flowering time of natural A. thaliana accessions. Genetics (in press).
Werner JD, Borevitz JO, Warthmann N, Trainer GT, Ecker JR, Chory J, Weigel D. Quantitative trait locus mapping and DNA array hybridization identify an FLM deletion as a cause for natural flowering-time variation. Proc Natl Acad Sci U S A Feb 15;102(7): Supplemental data and analysis scripts.
Wolyn DJ, Borevitz JO, Loudet O, Schwartz C, Maloof J, Ecker J, Berry CC, Chory J. Light Response QTL Identified with Composite Interval and eXtreme Array Mapping in Arabidopsis thaliana. Genetics 2004 Jun;167(2): Supplemental data and analysis scripts.
Borevitz J, Liang D, Plouffe D, Chang H, Zhu T, Weigel D, Berry C, Winzeler E, Chory J. Large Scale Identification of Single Feature Polymorphisms in Complex Genomes. Genome Research Mar; 13(3): Supplemental data and analysis scripts.

Review: Single Feature Polymorphisms (SFPs) can be used to: identify recombination breakpoints; eXtreme Array Mapping; potential deletions (candidate genes); haplotyping; diversity/selection; association mapping. PostDoc Positions

NaturalVariation.org. Salk: Jon Werner, Joanne Chory, Joseph Ecker. Max Planck: Detlef Weigel. UC San Diego: Charles Berry. Scripps: Sam Hazen, Elizabeth Winzeler. University of Chicago: Xu Zhang, Evadne Smith. UC Davis: Julin Maloof. University of Guelph, Canada: Dave Wolyn. Sainsbury Laboratory: Jonathan Jones