On the biological significance of alternative splicing: a bioinformatics approach Sandro J. de Souza TDR, 07/05/2004 RNA 10:757-765, 2004.

Genomics Bioinformatics Large-scale Biology

The Real Revolution
Early 20th century: Mendel and the inheritance laws (rediscovered in 1900)
Mid 20th century: DNA as the genetic element (Avery)
Mid 20th century: Watson and Crick and the structure of DNA
70's and 80's: Molecular biology/biotechnology
90's and 21st century: Genomics and Bioinformatics
Paradigm in Biology: Evolution by means of natural selection (Darwin and Wallace, mid 19th century)

Bioinformatics
Development of tools
Gateway to explore new datasets
Processing of data derived from large-scale projects
A new way to do hypothesis-driven science

Splicing (1977) Roberts and Sharp (Nobel 1993)

[Figure: exons (coding) and introns (non-coding) in a gene, and the resulting mRNA]

Splicing
Splicing depends on recognition of exon-intron boundaries.
Splice sites are generic and consist solely of:
5' boundary
3' boundary
[Figure labels: acceptor site, polypyrimidine tract]

.....if they occur at the boundaries of the regions to be spliced out, can change the splicing pattern, resulting in the deletion or addition of whole sequences of amino acids. Walter Gilbert. Why genes in pieces. Nature 271:501, 1978.

At least half of all human genes undergo alternative splicing.
Biological significance or spurious events?

Alternative splicing
1. Chromosomal ratio activates transcription of Sxl in females only.
2. SXL controls splicing of tra mRNA.
3. Females: exon 2 (which has a stop codon) is removed via SXL. Males: exon 2 is not removed.
4. Males: no active TRA. Females: TRA is made.
5. TRA directs splicing of dsx mRNA in a specific manner; in males default splicing occurs.

Alternative Splicing: Auditory Hair Cells
The Ca2+ concentration at which the K+ channel opens depends on alternative splicing of the K+ channel: 576 possible alternative splicing combinations.
[Figure: K+ channel spanning the plasma membrane (PM) and cytosol; dotted lines show regions of the protein dependent on splicing. Sequence labels: AVSGRK; AVSGRKAMFARYVPEIAALILNRKKYGGTFNSTRGRK]
[Picture of human cochlear hair cells]
Sound frequency -> cytosolic Ca2+ concentration -> K+ channel opens. Therefore Ca2+ concentration 'decodes' frequency.

Types of alternative splicing:
Exon skipping
Intron retention
Alternative 5' splice site
Alternative 3' splice site
[Figure: each type drawn on an mRNA oriented 5' to 3']
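To make two of these event types concrete, here is a minimal sketch that compares a variant transcript against a reference transcript. It is an illustration only, not the method used in the talk: the coordinate convention (inclusive start and end), the function names and the example coordinates are all assumptions, and alternative 5' and 3' splice sites are deliberately left out because they require strand information.

```python
from typing import List, Tuple

# Hypothetical convention: an exon is (start, end) in inclusive genomic coordinates.
Exon = Tuple[int, int]


def introns(exons: List[Exon]) -> List[Exon]:
    """Return the gaps between consecutive exons, sorted by start."""
    ordered = sorted(exons)
    return [(a_end + 1, b_start - 1)
            for (_, a_end), (b_start, _) in zip(ordered, ordered[1:])]


def classify(reference: List[Exon], variant: List[Exon]) -> List[str]:
    """Report intron retention and exon skipping in variant relative to reference."""
    events = []
    # Intron retention: one variant exon covers a whole reference intron.
    for i_start, i_end in introns(reference):
        if any(s < i_start and e > i_end for s, e in variant):
            events.append("intron retention")
    # Exon skipping: a reference exon falls entirely inside a variant intron.
    variant_introns = introns(variant)
    for s, e in reference:
        if any(i_start <= s and e <= i_end for i_start, i_end in variant_introns):
            events.append("exon skipping")
    return events


reference = [(100, 200), (300, 400), (500, 600)]
print(classify(reference, [(100, 400), (500, 600)]))  # first intron kept
print(classify(reference, [(100, 200), (500, 600)]))  # middle exon skipped
```

A real pipeline would work from spliced alignments of cDNAs to the genome and would tolerate small alignment errors at exon boundaries.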

Large-scale analysis of intron retention in the human transcriptome Pedro F.A. Galante, Noboru Jo Sakabe, Natanja Slager, Sandro J. de Souza

Examples of intron retention events with biological significance
Msl2 in Drosophila
P element in Drosophila
Retroviruses

In immature B cells an intron containing an early translational stop signal is removed, yielding a long transcript. The additional sequence encodes a transmembrane region. This intron is not removed in activated B cells, giving rise to a truncated (secreted) product.
[Figure: Ig gene in an immature B cell and after activation; labels: stop codons, hydrophilic stretch, hydrophilic tail, transmembrane domain]
Immature B cells express membrane-bound Ig. Activation leads to production of the secreted form.

Intron retention and cancer
CD44: several tumors
Gastrin receptor: pancreas
Ret tyrosine kinase: pheochromocytomas
Fas receptor: T-cell lymphoma

Transcriptome Database
EST data
Known mRNAs
SAGE data
Genome data

Genome-based cDNA clustering
[Figure: mRNA cluster aligned to genomic DNA, spanning Exon 1, Exon 2 and Exon 3]
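The slide does not spell out the clustering procedure. One common way to cluster cDNAs on the genome is to group sequences whose alignments overlap on the same chromosome and strand; the sketch below assumes that approach. The tuple layout, the function name and the example records are hypothetical.

```python
from typing import List, Tuple

# Hypothetical record: (sequence id, chromosome, start, end, strand).
Alignment = Tuple[str, str, int, int, str]


def cluster_by_genome(alignments: List[Alignment]) -> List[List[str]]:
    """Group sequence ids whose genomic spans overlap on the same chromosome and strand."""
    clusters: List[List[str]] = []
    current: List[str] = []
    current_key = None
    current_end = -1
    # Sort so that overlapping alignments on the same chromosome/strand are adjacent.
    for seq_id, chrom, start, end, strand in sorted(
            alignments, key=lambda a: (a[1], a[4], a[2])):
        key = (chrom, strand)
        if key != current_key or start > current_end:
            # No overlap with the open cluster: close it and start a new one.
            if current:
                clusters.append(current)
            current = []
            current_key = key
            current_end = end
        current.append(seq_id)
        current_end = max(current_end, end)
    if current:
        clusters.append(current)
    return clusters


example = [
    ("mRNA1", "chr17", 100, 900, "+"),
    ("EST1", "chr17", 850, 1200, "+"),
    ("EST2", "chr17", 5000, 5400, "+"),
    ("EST3", "chr17", 150, 300, "-"),
]
print(cluster_by_genome(example))
```

Overlap of the whole genomic span is a loose criterion; stricter schemes require shared exon-intron boundaries before joining two sequences.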

Transcript Mapping P53

Types of Data

Dataset
[Table residue: Retention against Prototype, with categories Full length, EST and Total; surviving values: EST 2594, n.d., 2594]

Experimental validation

14% of all human genes show evidence of intron retention.
Kan, States & Gish (2002)
36% of RefSeq database!
After sample statistics: 5%

Distribution of events along transcripts
Elite group
Events in   Observed     Expected
CDS         287 (53%)    502 (93%)
5' UTR       84 (15%)     27 (5%)
3' UTR      170 (32%)     12 (2%)
MGC
Events in   Observed     Expected
CDS          87 (52%)    155 (93%)
5' UTR       15 (9%)       8 (5%)
3' UTR       65 (39%)      4 (2%)
p << 0.005
This bias can be a product of:
Underreporting of sequences
Nonsense-mediated decay (NMD)
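The slide does not name the test behind the reported p-value. As a sketch, assuming a chi-square goodness-of-fit test over the three regions (an assumption, not the authors' stated method), the observed and expected counts from the slide can be compared as follows. With three categories there are 2 degrees of freedom, for which the chi-square tail probability has the closed form exp(-x/2).

```python
import math
from typing import Sequence


def chi_square_gof(observed: Sequence[float], expected: Sequence[float]) -> float:
    """Chi-square goodness-of-fit statistic: sum of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def p_value_df2(chi2: float) -> float:
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    return math.exp(-chi2 / 2.0)


# Counts in the order CDS, 5' UTR, 3' UTR, taken from the slide.
elite_observed, elite_expected = [287, 84, 170], [502, 27, 12]
mgc_observed, mgc_expected = [87, 15, 65], [155, 8, 4]

for name, obs, exp in [("elite", elite_observed, elite_expected),
                       ("MGC", mgc_observed, mgc_expected)]:
    stat = chi_square_gof(obs, exp)
    print(name, round(stat, 1), p_value_df2(stat) < 0.005)
```

Both statistics are far above the critical value for 2 degrees of freedom at the 0.005 level, which agrees with the direction of the p-value shown on the slide.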

2563 out of 3195 (80%) sequences with a retained intron had an exon/exon boundary downstream of the retention event.

Retained introns are shorter (P << 0.001)

Domains encoded by retained introns

Number of domains entirely encoded by:
Retained introns only: 02
Exon-intron-exon: 31
Number of domains partially encoded by:
Retained introns only: 25
Exon-intron-exon: 10

Retained introns have a higher GC content (P << 0.001)

Do retained introns encode protein domains?
Only retained introns in the CDS were used.
Only retained introns defined by full-length mRNAs were used.
Protein sequences were searched against the Pfam database.

Codon Usage

Conservation of intron retention in mouse cDNA sequences
40%-57% of all retained introns present a mouse hit.
Identity of orthologous retained introns is 84%; non-retained introns, 60%; exons, 87%.
Mouse cDNA also corresponds to a retention variant: 26% - 10 out of 46

Frequency of stop codon
Expected: cases where the retention generates a putative truncated protein
Stop codons: TAG, TGA, TAA
Found 651 stop codons
p-value <<
[Figure: mRNA with CDS, exon and retained intron, and the position of the stop. Example sequences: TACTTGTGCGTAGTCCCCGCGATCTAACGCCACGATGGATGACACTGTGA and TACTTGTGCGTAGTCCCCGCGATCTAACGCCACGATGGATGACAC]
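Looking for a premature stop amounts to scanning the retained sequence codon by codon in the reading frame inherited from the upstream exon. The sketch below does that for the three stop codons listed on the slide. The function name is hypothetical, and treating the first base of the example sequence as the start of a codon (frame 0) is an assumption made only for illustration.

```python
from typing import Optional, Tuple

STOP_CODONS = {"TAG", "TGA", "TAA"}


def first_inframe_stop(seq: str, frame: int = 0) -> Optional[Tuple[int, str]]:
    """Return (0-based position, codon) of the first in-frame stop, or None."""
    seq = seq.upper()
    # Step through complete codons only, starting at the given frame offset.
    for pos in range(frame, len(seq) - 2, 3):
        codon = seq[pos:pos + 3]
        if codon in STOP_CODONS:
            return pos, codon
    return None


# Example sequence from the slide; frame 0 is an assumed reading frame.
example = "TACTTGTGCGTAGTCCCCGCGATCTAACGCCACGATGGATGACACTGTGA"
print(first_inframe_stop(example))  # (24, 'TAA')
```

Note that the TAG at position 10 is skipped in frame 0 because it is out of frame; only the frame set by the upstream coding sequence decides whether a retention truncates the protein.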

GC content for sequences upstream and downstream of the premature stop codon: 88 cases
[Figure: 5' exon and retained intron with a premature stop, 3'; GC 58% before the stop, GC 49% after it]
Sequences upstream of the stop are under selective pressure for coding potential.
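The comparison on this slide can be reproduced for a single sequence by splitting it at the stop codon and computing the GC fraction on each side. This is a minimal sketch; the function names and the toy sequence are hypothetical, and the position of the stop is assumed to be known already.

```python
from typing import Tuple


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_around_stop(seq: str, stop_pos: int) -> Tuple[float, float]:
    """GC content upstream and downstream of a stop codon starting at stop_pos."""
    upstream = seq[:stop_pos]
    downstream = seq[stop_pos + 3:]  # skip the three bases of the stop codon
    return gc_content(upstream), gc_content(downstream)


# Toy sequence: GC-rich before the TAA stop, AT-rich after it.
up, down = gc_around_stop("GGCCTAAATAT", 4)
print(f"upstream {up:.0%}, downstream {down:.0%}")
```

Averaging the two values over all retained introns that carry a premature stop would give a summary of the kind shown on the slide.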

Why is the argument of 'selection' important?
As noted originally by Gilbert (1978), mutations that affect splicing can allow the production of new proteins without the loss of the original one.
If, however, the new variant has some biological significance, selection will act to maintain the function of this variant. Therefore, there should not be any "negative selection" on this variant.

Intron Retention in Tumors
Tissue     T/N   IR
Breast     T     1.52*
           N     0.62
Prostate   T     1.45*
           N     0.44
Brain      T     2.52*
           N     3.16
Colon      T     0.85
           N     0.60

Towards a reliable set of intron retention events
w/ downstream spliced intron: 2563/ %
w/ hit w/ mouse cDNAs*: 74/ %
encoding protein domains*: 47/ %
experimentally validated (both forms): 2/2
* full-length vs full-length set and retained intron entirely in the CDS

Second International Conference on Bioinformatics and Computational Biology /10/2004 Angra dos Reis

Group of Computational Biology
Sandro J. de Souza, tennis player
Helena Samaia, Research Assistant
Ana C. Pereira, Admin. Assistant
Maarten Leerkes, Ph.D. student
Noboru Sakabe, Ph.D. student
Maria Vibranovski, Ph.D. student
Elza Helena, Ph.D. student
Natanja Slager, Ph.D. student
Pedro Galante, Ph.D. student
Elisson C. Osorio, programmer
Jorge E. de Souza, Ph.D. student
Rodrigo Soares, programmer
Andre Zaiats, system admin.