Mapping Sites of Transcription Across the Drosophila Genome Using High Resolution Tiling Microarrays. LBNL, Berkeley CA, August 20, 2007. A. Willingham, Affymetrix, Inc.

Presentation transcript:

Mapping Sites of Transcription Across the Drosophila Genome Using High Resolution Tiling Microarrays
LBNL, Berkeley CA, August 20, 2007
A. Willingham, Affymetrix, Inc.

Presentation Outline
I. Affymetrix's Contribution to Specific Aims and Milestones
II. Previous Studies
  - Manak et al. analysis of developmental transcriptome
III. Initial Results for Aim I
  - sample preparation & data processing
  - first look at cell line data on 35bp arrays
  - pilot analysis of brand-new 7bp arrays
IV. RACE-array
  - example of ENCODE extension analysis of genes on Chr21 & 22
V. Summary and Steps for Moving Forward

Specific Aim 1
  - samples on 35-bp genome tiling arrays
  - 24 samples on 7-bp genome tiling array sets
  - 160 RACE-fragment pools (16,000 products)
Specific Aim 2
  - RNAi of 120 RNA binding proteins on arrays
Specific Aim 3
  - Northern blotting of ncRNA models

RNA Samples and Genome Tiling Arrays

Milestones

Timeline for Milestones
  - stepwise nature of individual aims & responsibilities
  - involvement & interdependencies of each step
  - propose shifting milestones to more of a "ramp-up" model

Previous Studies
Manak et al., Nature Genetics, v38, Sep 2006

Transcription Analysis of Early (0-24 hr) Drosophila Embryogenesis
  - 70% annotated
  - 30% unannotated
Manak et al., Nature Genetics, v38, Sep 2006

Differential expression in Drosophila embryogenesis (~40 kb region of Chromosome 3R)
[Figure: expression tracks by embryonic time point (0-2 hr, 2-4 hr, 4-6 hr, 6-8 hr, 8-10 hr, ...). Labels: 5' TSS; 19 kb; Maternally Expressed Genes (Restarted in two patterns)]

Unannotated transcription updates known gene annotations
Drosophila: 5' sites predicted by transcriptional co-regulation
  - ~1500 genes
  - avg 1st intron size = ~20 kb
  - avg 1st annotated intron = ~1.7 kb
Manak et al., Nature Genetics, v38, Sep 2006

Initial Results of Aim I

Affymetrix sample preparation & data generation pipeline
sample treatment & QC
  - DNase-treat
  - BioAnalyzer
1st-strand cDNA synthesis
  - random primed
  - Superscript-II
2nd-strand cDNA synthesis
  - DNA Pol-I
  - save aliquot for downstream QC
label & hybridize to arrays
  - TdT-based end labeling
  - CEL file generation
signal graph generation
  - median-scaling
  - q-norm bioreps
  - select bandwidth
transfrag generation
  - select min-run
  - select max-gap
data distribution
  - tomeweb hosting
  - FTP to servers?
  - deliver to DCC, GEO, etc.
quality control
  - overlap w/ RACE
  - Northern blots
  - QPCR of cDNA
This example highlights the method for generating RNA maps, but the pipeline is similar for other applications:
  - RNA maps of long and short RNAs
  - RACE-array maps
  - RNAi knockdown experiments
  - chromatin immunoprecipitation
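The two normalization steps named in the signal-graph stage (median-scaling and quantile normalization of biological replicates) can be sketched as follows. This is only an illustration under stated assumptions, not the TAS implementation: the function names, the toy intensities, and the scaling target passed in are all hypothetical, and tied intensities are ranked in arbitrary sort order.

```python
from statistics import median


def quantile_normalize(replicates):
    """Give every replicate the same intensity distribution.

    `replicates` is a list of equal-length lists of probe intensities.
    The shared reference distribution is the mean intensity at each rank.
    """
    n = len(replicates[0])
    sorted_reps = [sorted(rep) for rep in replicates]
    reference = [sum(column) / len(column) for column in zip(*sorted_reps)]

    normalized = []
    for rep in replicates:
        # Probe indices ordered from lowest to highest intensity.
        order = sorted(range(n), key=lambda i: rep[i])
        out = [0.0] * n
        for rank, probe_index in enumerate(order):
            out[probe_index] = reference[rank]
        normalized.append(out)
    return normalized


def median_scale(values, target):
    """Rescale intensities so that their median equals `target`."""
    current = median(values)
    return [v * target / current for v in values]


# Toy data (made up): two replicates, three probes each.
reps = quantile_normalize([[5, 2, 3], [4, 1, 6]])
print(reps)                             # [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
print(median_scale([1, 2, 3], 200))     # [100.0, 200.0, 300.0]
```

After quantile normalization the replicates differ only in which probe holds which rank, so averaging or smoothing across them is no longer dominated by array-to-array intensity differences.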

Current Sample Prep
  - 5 cell line samples completed in triplicate
  - for 3 other cell lines, several samples failed
Hosted at

RNA QC by Agilent BioAnalyzer

Chr2L: Transcription Expression Maps Across ~50 kb
Cell lines: ML-DmD4-c1, ML-DmBG3-c2, Kc167, CME-W1-Cl8

Chr2L: Transcription Expression Maps Across ~25 kb
Cell lines: ML-DmD4-c1, ML-DmBG3-c2, Kc167, CME-W1-Cl8

transcription in 4 Drosophila cell lines: overlapping transcription

transcription in 4 Drosophila cell lines: overlapping annotation

RNA Samples and Genome Tiling Arrays
7 nt resolution arrays
  - new 7bp design: 5 arrays, total of ~14.4 million probes
  - by comparison, the 35bp array has ~3.1 million probes
  - a 5bp design required 7 arrays… 40% more chips required (1512 arrays instead of 1080)
  - replicates & strand not calculated in original budget
  - updated genome version (release 5) used for design
  - repeats can be masked or unmasked
  - virtual probes
existing 35-bp design
  - 1 array, total of ~3.1 million probes
  - Affy commercial group will produce an "updated" 2.0 design: 39bp resolution, release5-based design
  - however, we will continue using the current design
35bp resolution more optimal for RNA maps
  - 7bp arrays have better coverage & newer design
  - question of $ cost per array?
comparison of nucleotide coverage (dm3, release5)
  - 35bp array = 111,117,940 nt
  - 7bp array masked = 107,355,171 nt
  - 7bp array unmasked = 118,523,115 nt
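The array-count and coverage figures on this slide can be checked with a little arithmetic. The sketch below assumes one reading of the slide: the 1080-array figure corresponds to 5-array sets of the 7bp design, and a 5bp design would need 7 arrays for each of the same sets. Variable names are illustrative only.

```python
# Figures taken from the slide.
ARRAYS_PER_SET_7BP = 5
ARRAYS_PER_SET_5BP = 7
ARRAYS_WITH_7BP_DESIGN = 1080
PROBES_7BP_SET = 14_400_000          # "~14.4 million probes"

# Assumption: the same number of array sets would be run with either design.
sets_run = ARRAYS_WITH_7BP_DESIGN // ARRAYS_PER_SET_7BP
arrays_with_5bp_design = sets_run * ARRAYS_PER_SET_5BP
extra_fraction = arrays_with_5bp_design / ARRAYS_WITH_7BP_DESIGN - 1

print(sets_run)                      # 216
print(arrays_with_5bp_design)        # 1512
print(round(extra_fraction, 2))      # 0.4, i.e. 40% more chips

# Average probes per array in the 7bp set, for comparison with ~3.1 million
# probes on the single 35bp array.
print(PROBES_7BP_SET / ARRAYS_PER_SET_7BP)   # 2880000.0

# Nucleotide coverage relative to the 35bp array (dm3, release 5).
COVERAGE_35BP = 111_117_940
COVERAGE_7BP_MASKED = 107_355_171
COVERAGE_7BP_UNMASKED = 118_523_115
print(round(COVERAGE_7BP_MASKED / COVERAGE_35BP, 3))     # 0.966
print(round(COVERAGE_7BP_UNMASKED / COVERAGE_35BP, 3))   # 1.067
```

On these numbers the masked 7bp design covers about 3% fewer nucleotides than the 35bp array and the unmasked design about 7% more, so the gain from the 7bp set is mainly probe density rather than breadth of coverage.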

New 7-bp 5-chip array compared to 35-bp 1-chip array
  - Cherbas total RNA samples from 2 cell lines (KC & clone8)
  - Same labeled reactions hybridized to 35bp and 7bp arrays
  - Signal graphs generated in TAS:
      - 2 technical replicates for each sample were quantile-normalized together
      - Bandwidth = 30 (7bp) or 50 (35bp), Norm target = 200
  - Transfrags generated in TAS using bacterial negative controls at a 5% false-positive threshold
      - 7bp arrays: min-run 50, max-gap 10
      - 35bp arrays: min-run 50, max-gap 90
  - Intersections of 7bp vs 35bp and overlap with FlyBase annotations performed in Galaxy
  - Hosted at: 7bp-pilot/
  - Share with modENCODE DCC & ArrayExpress to determine whole-chromosome vs whole-chip data hosting
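To make the bandwidth, min-run and max-gap parameters concrete, here is a minimal sketch of smoothing and transfrag calling. It is not the TAS algorithm: a plain sliding-window median stands in for whatever estimator TAS applies within the bandwidth, and the semantics assumed here are that max-gap is the largest distance between neighbouring above-threshold probes that may still be bridged, while min-run is the minimum span a fragment must reach to be reported. Probe positions, signals and the threshold are toy values.

```python
from statistics import median


def smooth(positions, signals, bandwidth):
    """Replace each probe signal by the median of all probes within +/- bandwidth bp."""
    smoothed = []
    for center in positions:
        window = [s for p, s in zip(positions, signals) if abs(p - center) <= bandwidth]
        smoothed.append(median(window))
    return smoothed


def call_transfrags(positions, signals, threshold, min_run, max_gap):
    """Join above-threshold probes into (start, end) fragments.

    `positions` must be sorted. Probes below the threshold are skipped, so a
    fragment may span them as long as the gap rule is satisfied.
    """
    transfrags = []
    start = end = None
    for pos, sig in zip(positions, signals):
        if sig < threshold:
            continue
        if start is None:
            start = end = pos
        elif pos - end <= max_gap:
            end = pos                      # extend the current fragment
        else:
            if end - start >= min_run:
                transfrags.append((start, end))
            start = end = pos              # open a new fragment
    if start is not None and end - start >= min_run:
        transfrags.append((start, end))
    return transfrags


# A single outlier probe is removed by a 7 bp window.
print(smooth([0, 7, 14, 21], [10, 1000, 10, 10], bandwidth=7))   # [505.0, 10, 10, 10]

# Toy probes every 7 bp: one long expressed block and one short one.
positions = list(range(0, 210, 7))
signals = [300 if p <= 70 or 140 <= p <= 154 else 20 for p in positions]

# Rules quoted on the slide for the two array types, applied to the same toy data.
print(call_transfrags(positions, signals, 100, min_run=50, max_gap=10))   # [(0, 70)]
print(call_transfrags(positions, signals, 100, min_run=50, max_gap=90))   # [(0, 154)]
```

With the small max-gap the two expressed blocks stay separate and the short one is discarded by min-run; with the large max-gap they are bridged into one fragment. The toy data cannot say which behaviour is right for a given locus, only how the parameters trade fragmentation against merging.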

Improved exon discrimination by transfrags from 7bp arrays

Pseudo-ROC curves comparing base-pair coverage & overlap with annotated exons
  - five different thresholds for calculated probe false-positive rate were used: 1%, 3%, 5%, 7%, 10% (7% and 10% not shown for 35bp array)
  - 7bp arrays clearly have a significantly lower false-positive rate for forming transfrags from bacterial negative regions
      - ~4-5 fold lower than 35bp arrays
      - attributable to higher probe density and different min-run & max-gap rules
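One way to turn a probe false-positive rate into an intensity threshold is to take it from the distribution of negative-control probes. The sketch below assumes that approach; the slides do not spell out how TAS derives its thresholds, and the function name and control intensities are hypothetical.

```python
def threshold_for_fpr(negative_signals, fpr):
    """Return the intensity that only a fraction `fpr` of negative controls exceed."""
    ordered = sorted(negative_signals)
    n = len(ordered)
    # Number of control probes allowed to sit strictly above the threshold.
    allowed = min(round(n * fpr), n - 1)
    return ordered[n - allowed - 1]


# Made-up bacterial control intensities: 1, 2, ..., 100.
controls = list(range(1, 101))
for rate in (0.01, 0.05, 0.10):
    print(rate, threshold_for_fpr(controls, rate))
# 0.01 99
# 0.05 95
# 0.1 90
```

Sweeping the rate through the five values on the slide and calling transfrags at each resulting threshold gives one point per rate, which is how a pseudo-ROC curve of this kind could be traced.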

Summary: 7bp arrays
  - 35bp and 7bp arrays have similar amounts of bp coverage in transfrags, BUT 7bp arrays have 50-65% more transfrags
  - 7bp transfrags are more "fragmented" and do a better job of delineating exons with small introns
  - 7-bp array has better "resolution" of small exons
  - Intersection with annotations shows both 35bp and 7bp arrays are detecting similar amounts of transcription as measured by bp coverage

Improved exon discrimination by transfrags from 7bp arrays

modENCODE RACE array methodology
  - 5' RACE for 16,000 Drosophila genes
      - choice of tissues?
  - hybridize products (in pools of 100) to 35bp arrays
      - 1 Mb separation between genes
  - confirm presence of transfrags
  - identify new, "rare" transfrags due to PCR amplification
  - human ENCODE project has done a similar study on the genes present on chromosomes 21 & 22

RACE Analysis of Coding Genes
DiGeorge Critical Region 14 gene
Kapranov et al., Genome Res. (2005)

Conclusions
array types & applications
  - pilot analysis of 7bp arrays
  - updated for dm3-release5 genome annotation: bpmaps & IGB
sample processing pipeline & data generation
  - multiple applications require different types of graphs & transfrags
  - bandwidth0 versus smoothing (e.g. bandwidth50)
  - RACE array lessons learned by ENCODE
QC and validation
  - some of the specific aims (Northerns, RACE) will address these
  - additional analysis such as RT-PCR and QPCR validation of novel transcripts
data hosted at affy-transcriptome website:
  - sharing pilot data with DCC (Nicole Washington) to facilitate the process

Steps Moving Forward
  - adjusting milestones?
  - changes in samples? (usage of 7bp versus 35bp)
  - shifting focus in favor of more analysis of small RNAs?
  - data hosting and transfer issues?

Acknowledgements
Computation: S. Ghosh, H. Tammana, N. Garg, S. Dike, J. Cheng
Molecular Biology: I. Bell, J. Drenkow, E. Dumais, J. Dumais, R. Duttagupta, P. Kapranov, A. Willingham, J. Manak
AFFX Transcriptome Group: Tom Gingeras

supplemental slides

Kapranov et al. Science, v316 Jun 2007

same intronic expression seen by all arrays

value-of-probe-density

value-of-smoothing

value-of-unmasked

masking-issue-in-exons

unmasked regions are frequently higher