Alignment of mRNAs to Genomic DNA Sequence
Martin Berglund, Khanh Huy Bui, Md. Asaduzzaman, Jean-Luc Leblond

Objectives
To examine how the ESTs collected in the public EST division of GenBank/EMBL compare with the known structure of a gene. The analysis is used to identify exons and to show evidence of alternative splicing. The NCBI alignment tool Spidey is used for the analysis.

Outline
- ESTs and gene prediction
- Alternative splicing
- UniGene and Spidey
- HIP2
- Analysis of HIP2's ESTs

Expressed Sequence Tags (ESTs)
To determine which genes (or parts of genes) are expressed in a particular cell type or tissue, mRNAs are isolated and reverse transcribed into cDNA. Short stretches of these cDNAs are then sequenced; these short reads are the ESTs. The resulting EST sequences are compared with the nucleotide sequence of the entire genome (or of a single gene) to locate the gene (or part of a gene) that contains each EST.
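A toy illustration of the last step, locating an EST on a genomic sequence: the Python sketch below indexes every k-mer of the genome and reports the genomic span hit by the EST's k-mers. Real EST-to-genome tools use far more elaborate seeding, scoring and splice-aware extension; the function names, the k-mer size and any example sequences are hypothetical and chosen for illustration only.

```python
from __future__ import annotations

from collections import defaultdict


def kmer_index(genome: str, k: int) -> dict[str, list[int]]:
    """Map each k-mer of the genomic sequence to its 0-based start positions."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(genome) - k + 1):
        index[genome[i:i + k]].append(i)
    return index


def seed_hits(est: str, index: dict[str, list[int]], k: int) -> list[tuple[int, int]]:
    """Return (est_offset, genome_position) pairs for every exact k-mer match."""
    hits = []
    for j in range(len(est) - k + 1):
        # .get() avoids inserting empty entries into the defaultdict
        for pos in index.get(est[j:j + k], []):
            hits.append((j, pos))
    return hits


def locate(est: str, genome: str, k: int = 8) -> tuple[int, int] | None:
    """Return the genomic span (0-based, half-open) covered by seed hits, or None."""
    hits = seed_hits(est, kmer_index(genome, k), k)
    if not hits:
        return None
    positions = [pos for _, pos in hits]
    return min(positions), max(positions) + k
```

Because an EST that spans an intron produces hits in two separate genomic blocks, the span reported here would include the intron; splice-aware aligners such as Spidey resolve such hits into individual exons.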

ESTs and Gene Prediction
Unlike genomic DNA, cDNA contains only expressed sequences, because the introns have already been spliced out of the mRNA. If a genomic region matches ESTs with high stringency, the region is probably an exon of a gene.
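One way to turn "matches ESTs with high stringency" into a concrete rule is to count, for every genomic base, how many high-identity EST blocks cover it, and to report the maximal runs with enough support as candidate exons. In this sketch the 98% identity cutoff and the requirement of two supporting ESTs are assumptions, not values from the original analysis, and the EST blocks are taken as already aligned to the genome.

```python
from __future__ import annotations

from typing import NamedTuple


class Block(NamedTuple):
    """One aligned EST segment on the genome (0-based, half-open)."""
    start: int
    end: int
    identity: float  # percent identity of this segment


def candidate_exons(
    blocks: list[Block],
    genome_length: int,
    min_identity: float = 98.0,  # assumed "high stringency" cutoff
    min_support: int = 2,        # assumed minimum number of supporting ESTs (must be >= 1)
) -> list[tuple[int, int]]:
    """Return maximal runs where at least min_support stringent EST blocks overlap."""
    coverage = [0] * genome_length
    for b in blocks:
        if b.identity >= min_identity:
            for i in range(b.start, b.end):
                coverage[i] += 1

    exons = []
    run_start = None
    # A trailing 0 acts as a sentinel so that a run reaching the sequence end is closed.
    for i, depth in enumerate(coverage + [0]):
        if depth >= min_support and run_start is None:
            run_start = i
        elif depth < min_support and run_start is not None:
            exons.append((run_start, i))
            run_start = None
    return exons
```

Blocks below the identity cutoff are ignored entirely, so low-quality alignments cannot create candidate exons no matter how many of them pile up.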

NCBI UniGene Database
UniGene partitions GenBank sequences into a non-redundant set of gene-oriented clusters. Each UniGene cluster contains sequences that represent a unique gene, together with related information such as the tissue types in which the gene has been expressed and its map location.
Features:
- Clusters in the UniGene database are generated automatically.
- The sequences in one cluster cannot always be assembled into a single contiguous sequence.
- UniGene clusters are available only for a limited set of organisms.
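The idea of partitioning sequences into gene-oriented clusters can be illustrated with single-linkage clustering on shared k-mers, implemented with a union-find structure. This is not UniGene's actual procedure, which is considerably more elaborate; the k-mer size, the sharing threshold and the function names are hypothetical.

```python
from __future__ import annotations

from itertools import combinations


def kmers(seq: str, k: int) -> set[str]:
    """All distinct k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster(seqs: dict[str, str], k: int = 12, min_shared: int = 1) -> list[set[str]]:
    """Single-linkage clustering: sequences sharing >= min_shared k-mers are joined."""
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    profiles = {name: kmers(s, k) for name, s in seqs.items()}
    for a, b in combinations(seqs, 2):
        if len(profiles[a] & profiles[b]) >= min_shared:
            parent[find(a)] = find(b)  # merge the two clusters

    groups: dict[str, set[str]] = {}
    for name in seqs:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())
```

With single linkage, two sequences can end up in the same cluster only because each overlaps a third one, so members of one cluster need not overlap each other directly.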

Spidey
Spidey aligns one or more mRNA sequences to a single genomic sequence. It tries to determine the exon/intron structure, returning one or more models of the genomic structure, including the genomic/mRNA alignment for each exon.
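Once an aligner such as Spidey has reported exon coordinates on both the genomic and the mRNA sequence, the intron positions follow directly, and the first and last two bases of each intron can be checked against the common GT-AG splice-site consensus. The sketch below assumes plus-strand, 0-based half-open coordinates; the `Exon` type and the function names are hypothetical and do not reproduce Spidey's own data structures.

```python
from __future__ import annotations

from typing import NamedTuple


class Exon(NamedTuple):
    """Exon coordinates, 0-based and half-open, on both sequences."""
    gen_start: int
    gen_end: int
    mrna_start: int
    mrna_end: int


def spliced_sequence(exons: list[Exon], genome: str) -> str:
    """Concatenate the genomic exon segments; this should reproduce the mRNA if the model is right."""
    return "".join(genome[e.gen_start:e.gen_end] for e in exons)


def introns(exons: list[Exon], genome: str) -> list[tuple[int, int, str, str, bool]]:
    """For each gap between consecutive exons, report (start, end, donor, acceptor, is_GT_AG)."""
    result = []
    for left, right in zip(exons, exons[1:]):
        start, end = left.gen_end, right.gen_start
        donor = genome[start:start + 2]   # first two intron bases
        acceptor = genome[end - 2:end]    # last two intron bases
        result.append((start, end, donor, acceptor, donor == "GT" and acceptor == "AG"))
    return result
```

Checking that the spliced exons reproduce the mRNA and that the introns carry GT-AG ends is a quick sanity test of an exon model; a minus-strand alignment would first need the genomic sequence reverse-complemented.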

Alternative Splicing

ESTs and RNA Splicing
[Figure: ESTs, cDNA, and the intron/exon structure of the gene]

HIP2: Huntingtin Interacting Protein 2
This protein has been implicated in the degradation of huntingtin and in the suppression of apoptosis. Huntingtin appears to play a critical role in nerve cells. In humans, HIP2 is located on chromosome 4.

Alternative Splicing of HIP2
Different human mRNAs for this protein show alternative splicing.
[Figure: schematic comparison of the HIP2 gene structure with several human HIP2 mRNAs, including mRNAs 1, 2, 4, 5, 8 and 10]
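Comparing mRNAs against the exon structure of a gene can be sketched by numbering the gene's exons and listing, for each mRNA, the exons it skips between its first and last exon. The isoform names and exon lists in this sketch are hypothetical, not the actual HIP2 mRNAs, and it detects exon skipping only; other events, such as alternative 5' or 3' splice sites, need a coordinate-level comparison.

```python
from __future__ import annotations


def skipped_exons(isoforms: dict[str, list[int]]) -> dict[str, list[int]]:
    """For each isoform, list the gene exons skipped between its first and last exon.

    Exons are numbered 1..N along the gene; each isoform is given as the
    exon numbers it contains (hypothetical input format).
    """
    patterns = {}
    for name, exons in isoforms.items():
        used = set(exons)
        patterns[name] = [e for e in range(min(exons), max(exons) + 1) if e not in used]
    return patterns
```

Restricting the search to the range between the first and last used exon avoids reporting exons that are merely outside a partial transcript, which matters for short ESTs.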

Analysis of HIP2's ESTs
1. Download 614 HIP2 ESTs from GenBank using a BioPerl module.
2. Align all ESTs to HIP2 locally with Spidey.
3. Parse the Spidey results with a Perl script.
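Step 3 could look roughly like the following Python sketch (the original analysis used a Perl script). It extracts exon coordinates from Spidey's plain-text reports and counts how many ESTs support each genomic exon interval. The exon-line pattern is an assumption about Spidey's output layout and should be checked against real reports before use; the function names are hypothetical.

```python
from __future__ import annotations

import re
from collections import Counter

# ASSUMED exon-line layout, e.g. "Exon 1: 1001-1100 (gen)  1-100 (mRNA)".
# Verify against the output of your Spidey version before relying on this.
EXON_LINE = re.compile(r"Exon\s+(\d+):\s+(\d+)-(\d+)\s+\(gen\)\s+(\d+)-(\d+)\s+\(mRNA\)")


def parse_exons(report: str) -> list[tuple[int, ...]]:
    """Return (exon_no, gen_start, gen_end, mrna_start, mrna_end) for each exon line."""
    exons = []
    for line in report.splitlines():
        m = EXON_LINE.match(line.strip())
        if m:
            exons.append(tuple(int(g) for g in m.groups()))
    return exons


def exon_support(reports: dict[str, str]) -> Counter:
    """Count how many EST reports contain each genomic exon interval."""
    support: Counter = Counter()
    for report in reports.values():
        for _, gen_start, gen_end, _, _ in parse_exons(report):
            support[(gen_start, gen_end)] += 1
    return support
```

Lines that do not match the assumed pattern are skipped silently, so a format mismatch shows up as empty results rather than as an error; counting lines that start with "Exon" is a cheap way to detect that.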

Analysis of HIP2's ESTs

Conclusion
Alternative splicing is evident from the EST data. Some of the ESTs show coding regions other than the exons specified in HIP2's annotation.
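A finding like the last one can be made systematic by classifying each aligned EST block against the annotated exons: fully inside an exon, overlapping an exon but crossing its boundary (which may point to an alternative splice site or a retained intron), or overlapping no annotated exon at all. Intervals are assumed to be 0-based and half-open; the category names and the function are hypothetical.

```python
from __future__ import annotations


def classify_block(block: tuple[int, int], exons: list[tuple[int, int]]) -> str:
    """Classify one EST block relative to annotated exons (0-based, half-open)."""
    start, end = block
    if any(s <= start and end <= e for s, e in exons):
        return "within"            # fully contained in an annotated exon
    if any(start < e and s < end for s, e in exons):
        return "boundary_shift"    # overlaps an exon but extends past its edge
    return "novel"                 # no overlap with any annotated exon
```

Blocks classified as "novel" or "boundary_shift" are the candidates worth inspecting by hand as possible unannotated exons or alternative splice sites.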

Thank you for listening