Markers, mapping, and expression using arrays Justin Borevitz Salk Institute naturalvariation.org
Talk Outline: Intro Natural Variation / QTL; Transcription profiling to identify QTL candidate genes; Identification of Single Feature Polymorphisms; Future Prospects
PHYB
differences may be due to expression or hybridization
Deletions
Spatial Correction Spatial Artifacts Improved reproducibility
False Discovery and Sensitivity (PM only). SAM threshold 5% FDR: GeneChip SFPs vs. nonSFPs compared against Cereon marker accuracy and sequence (polymorphic / non-polymorphic); sensitivity; false discovery rate: 3%; test for independence of all factors: Chisq = , df = 1, p-value = 1.845e-40. SAM threshold 18% FDR: same comparison; false discovery rate: 13%; test for independence of all factors: Chisq = , df = 1, p-value = 1.309e-59. Table values as extracted: 90% 80% 70%; 41% 53% 85%; 90% 80% 70%; 67% 85% 100%. Cereon may be a sequencing error; TIGR match is a match.
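The quantities on this slide (sensitivity, false discovery rate, and a 1-df chi-square test of independence) can all be derived from a 2x2 table of chip calls against a reference set. The following is a minimal sketch under that assumption; the function name and the counts in the usage note are hypothetical and are not the values behind the slide.

```python
import math

def two_by_two_summary(tp, fp, fn, tn):
    """Summarise SFP calls against a reference polymorphism set.

    tp: called SFP,     reference polymorphic
    fp: called SFP,     reference non-polymorphic
    fn: called non-SFP, reference polymorphic
    tn: called non-SFP, reference non-polymorphic
    (All four counts are hypothetical inputs, not data from the slides.)
    """
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)   # fraction of true polymorphisms detected
    fdr = fp / (tp + fp)           # fraction of SFP calls that are wrong
    # Pearson chi-square for a 2x2 table, no continuity correction.
    chi2 = n * (tp * tn - fp * fn) ** 2 / (
        (tp + fp) * (fn + tn) * (tp + fn) * (fp + tn)
    )
    # Upper-tail p-value of a chi-square distribution with df = 1.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return sensitivity, fdr, chi2, p_value
```

For example, `two_by_two_summary(40, 10, 60, 890)` describes a hypothetical run with 40% sensitivity and 20% false discovery rate; a stricter threshold would be expected to trade sensitivity for a lower false discovery rate, which is the contrast the slide draws between the 5% and 18% SAM thresholds.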
Chip genotyping of a Recombinant Inbred Line 29kb interval
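One way to read a breakpoint interval such as the 29 kb one above from chip genotypes is to look for adjacent called markers whose genotypes differ. This is a rough sketch, not the method used in the talk; the function name, genotype labels, and marker positions are hypothetical.

```python
def recombination_intervals(markers):
    """Find intervals in which the genotype switches along a chromosome.

    markers: list of (position_bp, genotype); genotype is a parental label
             such as "Col" or "Ler", or None when the SFP gave no call.
    Returns a list of (left_bp, right_bp, left_genotype, right_genotype).
    """
    # Drop uncalled markers and order the rest by position.
    called = sorted((pos, gt) for pos, gt in markers if gt is not None)
    intervals = []
    for (p1, g1), (p2, g2) in zip(called, called[1:]):
        if g1 != g2:
            # The crossover lies somewhere between these two flanking markers.
            intervals.append((p1, p2, g1, g2))
    return intervals
```

With hypothetical markers `[(1000, "Col"), (5000, "Col"), (12000, None), (34000, "Ler"), (40000, "Ler")]` the function returns one interval from 5,000 to 34,000 bp, that is, a 29 kb window bounded by the nearest informative SFPs.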
bibb mutant phenotypes: three independent recessive alleles; medial sepals remain attached; “cabbage-like” rosette leaves; flowers open prematurely; fruit appear more slender. [Figure panel labels: Col, bib-1, det, days, bib-3, Ler]
bibb mapping (ChipMap): bulk segregant mapping using chip hybridization; bibb maps to chromosome 2 near ASYMMETRIC LEAVES1 (AS1)
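The idea behind bulk segregant mapping on a chip can be sketched as follows: in a pool of recessive mutant F2 plants, markers linked to the mutation approach 100% mutant-parent allele, while unlinked markers sit near 50%. The code below is an illustrative sketch under that assumption, not the ChipMap implementation; the function name, the linear intensity-to-frequency conversion, and the window size are all hypothetical.

```python
def map_by_bulk(positions, pool, parent_mut, parent_other, window=3):
    """Locate the peak of mutant-parent allele frequency in a bulked pool.

    positions:    marker (SFP) positions along one chromosome
    pool:         log intensity of the bulked mutant pool at each marker
    parent_mut:   log intensity of the mutant's parental accession
    parent_other: log intensity of the other parent (must differ from
                  parent_mut at every marker, since these are SFPs)
    Returns (peak_position, peak_frequency, smoothed_frequencies).
    """
    freqs = []
    for x, a, b in zip(pool, parent_mut, parent_other):
        # Assumed linear interpolation between the two parental signals.
        f = (x - b) / (a - b)
        freqs.append(min(1.0, max(0.0, f)))  # clamp to [0, 1]
    half = window // 2
    smoothed = []
    for i in range(len(freqs)):
        lo, hi = max(0, i - half), min(len(freqs), i + half + 1)
        smoothed.append(sum(freqs[lo:hi]) / (hi - lo))  # moving average
    best = max(range(len(smoothed)), key=smoothed.__getitem__)
    return positions[best], smoothed[best], smoothed
```

The moving average is only there to damp single-feature noise; the chromosome region where the smoothed frequency is highest is the candidate interval for the mutation.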
BIBB = AS1: Sequenced the AS1 coding region from bib-1 … found a g -> a change that would introduce a stop codon in the MYB domain. Alleles: bib-1 W49*; as1-101 Q107*. [Diagram labels: bibb, as1-101, MYB, as1] AS1 (ASYMMETRIC LEAVES1) = MYB closely related to PHANTASTICA, located at 64 cM
Future: Bulk segregant mapping. Extend to quantitative traits; map multiple genes (mutation modifiers); fine mapping after PCR identification of recombinants; multiple models (epistasis); changes in F2, F3 allele frequency (selection). [Simulation figure: 20% variance, 20 simulations]
Potential Deletions: 111 potential deletions; 45 confirmed by Ler sequence; 23 (of 114) transposons; disease resistance (R) gene clusters; single R gene deletions; genes involved in secondary metabolism; unknown genes
Gene expression revisited: Now that we know which features are polymorphic, we can determine which gene expression differences are real and which are due to polymorphism. New RNA analysis algorithm: accounts for spatial correction, polymorphisms, and feature differences
Gene expression for a candidate gene: features (probes) for At1g22360; spatially corrected log PM intensity; gene expression index that accounts for feature differences and polymorphisms
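An expression index of this kind can be sketched as an additive model, log PM = feature effect + sample effect, fitted while skipping features known to be polymorphic in a given sample. The code below is one possible sketch and is not the algorithm from the talk; the function name, the alternating-means fit, and the masking scheme are assumptions.

```python
def expression_index(log_pm, masked, n_iter=25):
    """Fit log_pm[sample][feature] ~ feature_effect + sample_effect.

    log_pm: dict mapping sample name -> list of spatially corrected
            log PM intensities, one per feature of the gene
    masked: set of (sample, feature_index) pairs to ignore, e.g. features
            carrying an SFP in that sample
    Assumes every feature is unmasked in at least one sample and every
    sample keeps at least one feature.
    Returns (sample_effect, feature_effect); only differences between
    sample effects are meaningful, because the overall level is shared
    with the feature effects.
    """
    samples = list(log_pm)
    n_feat = len(log_pm[samples[0]])
    sample_eff = {s: 0.0 for s in samples}
    feature_eff = [0.0] * n_feat
    for _ in range(n_iter):
        # Update each feature effect from the samples where it is usable.
        for j in range(n_feat):
            vals = [log_pm[s][j] - sample_eff[s]
                    for s in samples if (s, j) not in masked]
            feature_eff[j] = sum(vals) / len(vals)
        # Update each sample effect from its usable features.
        for s in samples:
            vals = [log_pm[s][j] - feature_eff[j]
                    for j in range(n_feat) if (s, j) not in masked]
            sample_eff[s] = sum(vals) / len(vals)
    return sample_eff, feature_eff
```

In a hypothetical two-sample case where one bright feature hybridizes poorly in the second accession because of a polymorphism, a plain mean over features understates that accession's expression, whereas masking the feature and modelling the feature effects recovers the underlying difference.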
Gene expression revisited [Figure: SAM plot; labels: Difference, FDR threshold, True Positive, False Positive]
Gene expression revisited FLC controls flowering time Difference detected at 3 day old seedling stage
Gene expression revisited PAG1 down regulated in Cvi pag1 Knock Out is pale
pag1 KO is light insensitive pag1 KO has long hypocotyl in red light
pag1 KO is early flowering
Complete Genome Tiling Chip [Tiling diagram labels: 25 bp, 16 bp, 25 bp, 8 bp; 1st Set, 2nd Set, 3rd Set]. Uses: polymorphisms (re-sequencing); global methylation (methylome); comparative genomics (Brassica); new gene discovery; improve annotation; alternative splicing; micro RNAs. 9 whole-genome expression chips; 2 splicing chips; 2 5’ mapping chips; validate features. Extra chips: ChIP-chip (DNA binding sites)
ChipViewer: Mapping of transcriptional units of the ORFeome. From the 2000 annotation of At1g09750 (MIPS) to the latest AGI annotation. [Panel labels: Annotation (MIPS); The latest AGI Annotation]
Syngenta: Hur-Song Chang, Tong Zhu. Salk: Jon Werner, Todd Mockler, Sarah Liljegren, Joanne Chory, Detlef Weigel, Joseph Ecker. UC Davis: Julin Maloof. UC San Diego: Charles Berry. Scripps: Elizabeth Winzeler. NaturalVariation.org
Future Projects: Design 2nd-generation expression array ($ ,000); 25mer features, expect 12,000 SFPs (2 accessions); validated gene models (exon, intron); 2 features per exon, alternative splicing; 5’ and 3’ untranslated regions for gene family / polymorphism; micro RNAs; validated “good” hybridization intensities
Future Projects: DNA. Haplotype Map: 20 accessions, 3 replicates, SFP discovery; estimated 85,000 SFPs, 1.4 kb resolution. Association Studies: 120 accessions, 1 replicate, genotyping. Bulk Segregant Mapping: confirm associations in specific crosses
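The 1.4 kb resolution figure is consistent with spreading 85,000 SFPs evenly over the genome. The short check below assumes a genome size of roughly 120 Mb for Arabidopsis; that figure is not stated in the slides and is introduced here only for the arithmetic.

```python
# Assumption: approximate Arabidopsis genome size in base pairs
# (not given in the slides; used only for this back-of-envelope check).
GENOME_BP = 120_000_000

def mean_marker_spacing_kb(n_markers, genome_bp=GENOME_BP):
    """Average distance between markers, in kb, if evenly spaced."""
    return genome_bp / n_markers / 1000

spacing = mean_marker_spacing_kb(85_000)  # roughly 1.4 kb
```

Real SFPs are not evenly spaced, so this is an average; local resolution would be expected to be coarser in regions with few polymorphic features.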
Future work with Natural Variation VanC advanced intercross RIL population Backcross collections
Future Projects: RNA. True natural variation in gene expression: 20 accessions, 3 replicates (polymorphism accounted for). Cis-regulatory variation / imprinting: reciprocal F1s, 3 replicates. Transcriptome QTL map: 100 best VanC advanced intercross lines. How many loci control the variation in gene transcription? Candidate TFs and binding sites?