modENCODE August 20-21, 2007 Drosophila Transcriptome: Aim 2.2

Aim 2.2: Experimental Validation of Transcript Models
1. Experimental verification of selected splice sites in transcript models (short RT-PCR)
2. Mapping transcript ends using RACE
3. Screening cDNA libraries for transcripts
4. Recovering cDNA clones using long RT-PCR
5. High-throughput sequencing of small RNAs
6. Submitting sequence data to databases
7. Reviewing the transcriptome annotation

Experiments at LBNL
Transcript Ends
- TSSs: 20,000 targeted 5’ RACE experiments
- poly-A: 1,000 targeted 3’ RACE experiments
Full-Length Transcript Structures
- 6,000 cDNA screens and full-insert sequencing
- 3,000 long RT-PCRs and full-insert sequencing
Small RNA Sequencing
- 15 runs on a 454 Life Sciences device
- Size fractionate < 500 nt (larger range than Eric Lai)

Mapping TSSs
- 5’ RLM-RACE is a simple, scalable method
- RLM primer replaces the 5’ cap structure
- Gene-specific primers are nested and near the 5’ end
- Sequence 8 clones
- Direct sequencing is also proposed but is difficult
- We are prioritizing transcripts and tissues using our 5’ EST data

TSSs: Slippery vs Discrete
[Figure tracks: head RACE products, larval RACE products, cDNAs]

Cap-Trapped 5’ ESTs Define Discrete…

…and Slippery Transcription Start Sites
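The contrast between discrete and slippery start sites could be put on a numeric scale. The sketch below uses Shannon entropy of the observed 5’ end positions as one possible measure; this is an illustrative choice, not the statistic used in the project, and the function names and the 1-bit cutoff are assumptions.

```python
import math
from collections import Counter


def tss_entropy(positions):
    """Shannon entropy (in bits) of observed 5' end positions.

    positions: genomic coordinates of 5' ends (e.g. from ESTs or RACE clones).
    0 bits means every read starts at the same base (fully discrete);
    larger values mean the starts are spread over more positions.
    """
    if not positions:
        raise ValueError("need at least one 5' end position")
    counts = Counter(positions)
    total = sum(counts.values())
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def classify_tss(positions, max_discrete_bits=1.0):
    """Label a start site; the 1-bit cutoff is an arbitrary assumption."""
    if tss_entropy(positions) <= max_discrete_bits:
        return "discrete"
    return "slippery"
```

For example, `classify_tss([100] * 7 + [101])` reports a discrete site, while eight reads starting at eight different bases are reported as slippery. Entropy ignores how far apart the positions are, so a width-based measure may suit some loci better.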

How Many TSSs Does bowl Have?

5’ RACE Plans
- Identify TSSs that are well mapped by 5’ EST data
- Test RLM-RACE production protocol on 96 well-mapped TSSs to measure experimental success rate
- Prioritize 5’ RACE experiments:
  1. Transcripts with < 8 RE ESTs, using mixed embryo RNA
  2. Transcripts with ESTs from other embryo-derived libraries
  3. Transcripts with < 8 RH/TA ESTs
  4. Transcripts with larval/pupal ESTs
  5. Transcripts without ESTs. Use appropriate RNA samples.
- Develop statistical description of “slipperiness”
- Biological validation with microarrays & P elements
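The five priority tiers above can be read as a simple decision rule over per-library EST counts. The following is a minimal sketch under several assumptions: "< 8" is taken to mean 1 to 7 ESTs, RH and TA counts are summed, transcripts with 8 or more RE or RH/TA ESTs are treated as already well mapped, and the sets of "other embryo-derived" and "larval/pupal" libraries are supplied by the caller because the slides do not name them.

```python
def race_priority(est_counts, embryo_libs, larval_pupal_libs, well_mapped=8):
    """Return the 5' RACE priority tier (1-5) for one transcript.

    est_counts: dict mapping library prefix (e.g. "RE", "RH", "TA") to the
        number of 5' ESTs supporting the transcript.
    embryo_libs, larval_pupal_libs: caller-supplied sets of library
        prefixes (hypothetical; not specified in the slides).
    Returns None when the transcript looks well mapped already (assumption).
    """
    re_count = est_counts.get("RE", 0)
    rh_ta_count = est_counts.get("RH", 0) + est_counts.get("TA", 0)

    if re_count >= well_mapped or rh_ta_count >= well_mapped:
        return None  # assumed well mapped by EST data; no RACE needed
    if re_count > 0:
        return 1  # fewer than 8 RE ESTs: use mixed embryo RNA
    if any(est_counts.get(lib, 0) > 0 for lib in embryo_libs):
        return 2  # ESTs from other embryo-derived libraries
    if rh_ta_count > 0:
        return 3  # fewer than 8 RH/TA ESTs
    if any(est_counts.get(lib, 0) > 0 for lib in larval_pupal_libs):
        return 4  # larval/pupal ESTs
    if sum(est_counts.values()) == 0:
        return 5  # no EST support at all
    return None  # ESTs only from libraries outside the listed groups
```

A call such as `race_priority({"RE": 3}, {"EMB_X"}, {"LP_X"})` returns tier 1; `EMB_X` and `LP_X` are placeholder library names, not real library prefixes.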

Computationally predicted conserved exons validated by cDNA screening and sequencing
I. Gene modifications
II. Identification of New Genes

cDNA and Long RT-PCR Plans
- Identify all transcripts that are well defined by cDNA sequence
  - complete & spliced ORF, poly-A tail (not necessarily a defined TSS)
- Identify targets for cDNA screening (DGC goals in parentheses)
  (Transcripts with a community cDNA but no BDGP cDNA)
  (Transcripts with truncated ORFs)
  (Alternative transcripts that encode alternative coding sequences)
  1. Conserved ORFs that failed on the first SLIP attempt: choose best RNA
  2. Transfrags & RACEfrags that are not captured in sequenced transcripts
- Identify targets for long RT-PCR
  - targets that fail in SLIP screening on the best RNA sample
  - RT-PCR is probably more sensitive than SLIP but seems limited to ~2 kb
- cDNA and RT-PCR design depends on Aim 1 & Aim 2.1 and should be an iterative process
- Biological validation using integrated description of all data
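The hand-off from SLIP screening to long RT-PCR described above can be sketched as a small routing function. This is only one reading of the plan: the step names, the hard 2,000 bp cutoff standing in for "~2 kb", and the "manual review" fallback are assumptions.

```python
LONG_RT_PCR_MAX_BP = 2000  # the slide says "~2 kb"; exact cutoff is assumed


def next_step(slip_attempted, slip_succeeded, used_best_rna, expected_length_bp):
    """Suggest the next experiment for one target transcript.

    slip_attempted / slip_succeeded: outcome of SLIP cDNA screening so far.
    used_best_rna: whether the screen used the best available RNA sample.
    expected_length_bp: predicted transcript length from the current model.
    """
    if not slip_attempted:
        return "SLIP screen"
    if slip_succeeded:
        return "full-insert sequencing"
    if not used_best_rna:
        return "repeat SLIP screen on best RNA"
    # SLIP failed even on the best RNA sample: fall back to long RT-PCR
    # when the target is within the apparent size limit.
    if expected_length_bp <= LONG_RT_PCR_MAX_BP:
        return "long RT-PCR"
    return "manual review"  # hypothetical fallback, not from the slides
```

Because the plan calls for iterating with Aim 1 and Aim 2.1, `expected_length_bp` would be recomputed whenever the transcript models change.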

An Unannotated Transfrag

A Relatively Rare Transcript
CG31036: chordotonal neurons, lateral and head sensory neurons

High Throughput Sequencing Plan
- Pyrosequence RNA samples on 454 Life Sciences device
  - consider alternative platforms, e.g. Solexa
- Select 15 target tissues for analysis
- Define a transcript size range to target
  - avoid redundancy with Eric Lai: < 50 bases vs bases
  - consider avoiding tRNAs
- Align transcript sequences and integrate with models
- Biological validation:
  - Compare to microarray data
  - Conservation in other species, including structure for ncRNAs
  - Functional genomics in Aim 3
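Choosing a target size range could start from the length distribution of a pilot run. The sketch below bins read lengths and then keeps reads inside a chosen window; the 500 nt upper bound echoes the size fractionation stated for the LBNL experiments, while the lower bound, the bin width, and the function names are assumptions.

```python
from collections import Counter


def length_histogram(reads, bin_width=50):
    """Count reads per length bin; keys are the lower edge of each bin.

    reads: iterable of (read_id, sequence) pairs.
    """
    return Counter((len(seq) // bin_width) * bin_width for _, seq in reads)


def select_by_length(reads, min_nt=0, max_nt=500):
    """Keep reads with min_nt <= length < max_nt.

    max_nt=500 follows the "< 500 nt" fractionation; min_nt is left to
    the caller because the slides do not give a usable lower bound.
    """
    return [(rid, seq) for rid, seq in reads if min_nt <= len(seq) < max_nt]
```

Setting `min_nt=50` would be one way to avoid overlap with a data set that already covers RNAs shorter than 50 bases.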

Some Questions for Discussion
- How many genes & transcripts in Drosophila?
- How many genes with multiple transcripts? CDSs?
- Are these expressed in different cell types?
- Can we segregate them in different RNA samples to avoid mixed RACE, cDNA and RT-PCR products?
- How do we prioritize screening?
- What will we miss?
- How do we know when we’re done?

Future Directions
- Do different promoter motifs correlate with “slipperiness”, tissue, stage?
- Confidence scores associated with exons, transcripts and gene models:
  - How do we measure confidence?
  - How confident can we be?
  - How much data do we need per gene?