On the biological significance of alternative splicing: a bioinformatics approach Sandro J. de Souza TDR, 07/05/2004 RNA 10:757-765, 2004.

1 On the biological significance of alternative splicing: a bioinformatics approach Sandro J. de Souza TDR, 07/05/2004 RNA 10:757-765, 2004

2 Genomics Bioinformatics Large-scale Biology

3 The Real Revolution. Early 20th century: Mendel and the inheritance laws. Mid 20th century: DNA as the genetic element (Avery). Mid 20th century: Watson and Crick and the structure of DNA. 70’s and 80’s: Molecular biology/biotechnology. 90’s and 21st century: Genomics and Bioinformatics. Paradigm in Biology: Evolution by means of natural selection (Darwin and Wallace, mid 19th century).

4 Bioinformatics: Development of tools; Gateway to explore new datasets; Processing of data derived from large-scale projects; A new way to do hypothesis-driven science

5

6 Splicing (1977) Roberts and Sharp (Nobel 1993)

7 Exons Introns mRNA Coding Non-coding

8

9

10 Splicing. Splicing depends on recognition of exon-intron boundaries. Splice sites are generic and consist solely of: 5’ boundary, 3’ boundary, acceptor site, polypyrimidine tract.

11 .....if they occur at the boundaries of the regions to be spliced out, can change the splicing pattern, resulting in the deletion or addition of whole sequences of amino acids. Walter Gilbert. Why genes in pieces. Nature 271:501, 1978.

12 At least half of all human genes undergo alternative splicing Biological significance or spurious events?

13 Alternative splicing. 1. The chromosomal ratio activates transcription of Sxl in females only. 2. SXL controls splicing of tra mRNA. 3. Females: exon 2 (which has a stop codon) is removed via SXL. Males: exon 2 is not removed. 4. Males: no active TRA. Females: TRA is made. 5. TRA directs splicing of dsx mRNA in a specific manner; in males, default splicing occurs.

14 Alternative Splicing – Auditory Hair Cells. The Ca2+ concentration at which the K+ channel opens depends on alternative splicing of the K+ channel (576 possible alternative splicing combinations). Sound frequency → cytosolic Ca2+ concentration → K+ channel opens; therefore Ca2+ concentration ‘decodes’ frequency. Figure labels: Cytosol; PM; K+ channel; AVSGRK; AVSGRKAMFARYVPEIAALILNRKKYGGTFNSTRGRK; dotted lines show regions of the protein dependent on splicing. Picture of human cochlear hair cells from http://www.sickkids.on.ca/otolaryngology/Hearloss.asp

15 Types of alternative splicing: exon skipping; intron retention; alternative 5’ splice site; alternative 3’ splice site. Diagram labels: 5’, 3’, mRNA.
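Of the event types listed on this slide, intron retention is the one the rest of the talk focuses on. A minimal sketch of how such an event might be recognised from exon coordinates follows. It assumes two transcripts already aligned to the same genomic region; the names `introns`, `retained_introns`, `prototype` and `candidate`, and the coordinates, are illustrative and not taken from the slides.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # (start, end), inclusive genomic coordinates (assumed convention)


def introns(exons: List[Exon]) -> List[Exon]:
    """Return the introns implied by the gaps between consecutive exons."""
    ordered = sorted(exons)
    return [(a_end + 1, b_start - 1)
            for (_, a_end), (b_start, _) in zip(ordered, ordered[1:])]


def retained_introns(prototype: List[Exon], candidate: List[Exon]) -> List[Exon]:
    """Introns of `prototype` that lie entirely inside a single exon of `candidate`."""
    return [(i_start, i_end)
            for i_start, i_end in introns(prototype)
            for e_start, e_end in candidate
            if e_start <= i_start and i_end <= e_end]


# Hypothetical coordinates: the candidate reads straight through the first intron.
prototype = [(100, 200), (301, 400), (501, 600)]
candidate = [(100, 400), (501, 600)]
print(retained_introns(prototype, candidate))  # [(201, 300)]
```

Exon skipping and alternative 5’/3’ splice sites would need different coordinate comparisons; this sketch covers only the retention case.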

16 Large-scale analysis of intron retention in the human transcriptome Pedro F.A. Galante, Noboru Jo Sakabe, Natanja Slager, Sandro J. de Souza

17 Examples of intron retention events with biological significance: Msl2 in Drosophila; P element in Drosophila; retroviruses

18 Ig gene. Immature B cells express membrane-bound Ig; activation leads to production of the secreted form. In immature B cells an intron containing an early translational stop signal is removed, yielding a long transcript. The additional sequence encodes a transmembrane region. This intron is not removed in activated B cells, giving rise to a truncated (secreted) product. Figure labels: Immature B cell; Activation; Stop codons; Hydrophilic stretch; Hydrophilic tail; Transmembrane domain.

19 Intron retention and cancer. CD44: several tumors. Gastrin receptor: pancreas. Ret tyrosine kinase: pheochromocytomas. Fas receptor: T-cell lymphoma.

20 Transcriptome Database EST data Known mRNAs SAGE data Genome Data

21 Genome-based cDNA clustering. Diagram labels: DNA (Exon 1, Exon 2, Exon 3); mRNA cluster.
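The slide gives no clustering criteria, so the following is only a rough sketch of one common way to do genome-based clustering: transcripts whose genomic spans overlap are placed in the same cluster. The function name `cluster_by_overlap`, the tuple layout, and the example coordinates are all assumptions made for illustration; a real pipeline would also consider chromosome and strand.

```python
from typing import List, Tuple

Transcript = Tuple[str, int, int]  # (name, genomic start, genomic end), all hypothetical


def cluster_by_overlap(transcripts: List[Transcript]) -> List[List[str]]:
    """Group transcripts into clusters of overlapping genomic spans (single chromosome/strand)."""
    clusters: List[List[str]] = []
    current_end = None  # right-most coordinate covered by the cluster being built
    for name, start, end in sorted(transcripts, key=lambda t: (t[1], t[2])):
        if current_end is None or start > current_end:
            clusters.append([name])       # no overlap: open a new cluster
            current_end = end
        else:
            clusters[-1].append(name)     # overlap: extend the current cluster
            current_end = max(current_end, end)
    return clusters


example = [("mRNA_A", 100, 900), ("EST_1", 850, 1200), ("EST_2", 5000, 5400)]
print(cluster_by_overlap(example))  # [['mRNA_A', 'EST_1'], ['EST_2']]
```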

22 Transcript Mapping P53

23 Types of Data

24 Dataset. Retention / Prototype: Full length, EST, Total. 640, 691, 1120. EST: 2594, n.d., 2594. Total: 2793, 691, 3127.

25 Experimental validation

26 14% of all human genes show evidence of intron retention Kan, States & Gish (2002) 36% of RefSeq database! After sample statistics: 5%

27 Distribution of events along transcripts.
Elite group, events in:   Observed     Expected
CDS                       287 (53%)    502 (93%)
5’ UTR                    84 (15%)     27 (5%)
3’ UTR                    170 (32%)    12 (2%)
MGC, events in:           Observed     Expected
CDS                       87 (52%)     155 (93%)
5’ UTR                    15 (9%)      8 (5%)
3’ UTR                    65 (39%)     4 (2%)
p << 0.005
This bias can be a product of: underreporting of sequences; nonsense-mediated decay (NMD).
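The slide does not say which test produced the reported p-value. As an illustration only, a chi-square goodness-of-fit test on the observed and expected counts above can be sketched as follows. With three regions there are 2 degrees of freedom, for which the upper-tail p-value has the closed form exp(-x/2), so no statistics library is needed. The function names are hypothetical.

```python
import math


def chi_square_gof(observed, expected):
    """Chi-square goodness-of-fit statistic and degrees of freedom."""
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1


def p_value_df2(stat):
    """Exact upper-tail p-value for a chi-square statistic with 2 degrees of freedom."""
    return math.exp(-stat / 2)


# Counts from the slide, in the order CDS, 5' UTR, 3' UTR.
elite_stat, elite_df = chi_square_gof([287, 84, 170], [502, 27, 12])
mgc_stat, mgc_df = chi_square_gof([87, 15, 65], [155, 8, 4])

for label, stat in (("elite group", elite_stat), ("MGC", mgc_stat)):
    print(label, round(stat, 1), p_value_df2(stat) < 0.005)
```

Both statistics are far above the critical value for 2 degrees of freedom at the 0.005 level (about 10.6), which is consistent with the slide's p << 0.005. One caveat: the MGC set has an expected count below 5 in the 3’ UTR, where the chi-square approximation is less reliable.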

28 2563 out of 3195 (80%) sequences with a retained intron had an exon/exon boundary downstream of the retention event.

29 Retained introns are shorter (p << 0.001)

30 Domains encoded by retained introns

31 Number of domains entirely encoded by: Retained introns only: 02 Exon-intron-exon:31 Number of domains partially encoded by: Retained introns only: 25 Exon-intron-exon: 10

32 Retained introns have a higher GC content (p << 0.001)

33 Did retained introns encode protein domains? Only retained introns in the CDS were used. Only retained introns defined by full-length mRNAs were used. Protein sequences were searched against the Pfam database.

34 Codon Usage

35 Conservation of intron retention in mouse cDNA sequences. 40%-57% of all retained introns present a mouse hit. Identity of orthologous retained introns is 84%; non-retained introns, 60%; exons, 87%. Mouse cDNA also corresponds to a retention variant: 26% - 10 out of 46.

36 Frequency of stop codons. Stop codons: TAG, TGA, TAA. Expected: 1064. Found: 651 stop codons (p-value << 0.005). 88 cases where the retention generates a putative truncated protein. Diagram labels: mRNA; cds; stop; cds; exon; retained intron. Example sequences: TACTTGTGCGTAGTCCCCGCGATCTAACGCCACGATGGATGACACTGTGA and TACTTGTGCGTAGTCCCCGCGATCTAACGCCACGATGGATGACAC
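Counting stop codons of this kind comes down to scanning a sequence codon by codon in a chosen reading frame. A minimal sketch, using the example sequence from the slide, is given below. The reading frame of that sequence is not stated on the slide, so frame 0 here is an assumption, and the function name `in_frame_stops` is illustrative. How the expected count of 1064 was derived is not described, so it is not reproduced.

```python
STOP_CODONS = {"TAG", "TGA", "TAA"}


def in_frame_stops(seq: str, frame: int = 0):
    """Return 0-based start positions of stop codons in the given reading frame."""
    seq = seq.upper()
    return [i for i in range(frame, len(seq) - 2, 3)
            if seq[i:i + 3] in STOP_CODONS]


# Example sequence copied from the slide; its true reading frame is unknown (assumption: frame 0).
slide_seq = "TACTTGTGCGTAGTCCCCGCGATCTAACGCCACGATGGATGACACTGTGA"
print(in_frame_stops(slide_seq, frame=0))  # [24, 39]
print(in_frame_stops(slide_seq, frame=1))  # [10]
```

The first position returned for the true frame would mark the premature stop that truncates the predicted protein.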

37 GC content for sequences upstream and downstream of the premature stop codon (88 cases). Diagram (5’ to 3’): exon, retained intron; GC 58% upstream of the stop, GC 49% downstream of the stop. The sequences upstream of the stop are under selective pressure for coding potential.
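The comparison on this slide amounts to computing GC content separately on each side of the premature stop codon. A small sketch of that calculation follows; the function names and the toy sequence are hypothetical, and the slide does not specify how much flanking sequence was used.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def gc_around_stop(seq: str, stop_index: int):
    """GC content upstream and downstream of a stop codon starting at `stop_index`."""
    upstream = seq[:stop_index]
    downstream = seq[stop_index + 3:]  # skip the three bases of the stop codon itself
    return gc_content(upstream), gc_content(downstream)


# Toy sequence: GC-rich before the TAA stop, AT-rich after it.
print(gc_around_stop("GGCCTAAATAT", 4))  # (1.0, 0.0)
```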

38 Why is the argument of ‘selection’ important? As noted originally by Gilbert (1978), mutations that affect splicing can allow the production of new proteins without the loss of the original one. If, however, the new variant has some biological significance, selection will act to maintain the function of this variant. Therefore, there should not be any “negative selection” on this variant.

39 Intron Retention in Tumors
Tissue      T/N   IR
Breast      T     1.52*
            N     0.62
Prostate    T     1.45*
            N     0.44
Brain       T     2.52*
            N     3.16
Colon       T     0.85
            N     0.60

40 Towards a reliable set of intron retention events
w/ downstream spliced intron             2563/3195   80%
w/ hit w/ mouse cDNAs*                   74/152      49%
encoding protein domains*                47/151      31%
experimentally validated (both forms)    2/2
* full-length vs full-length set and retained intron entirely in the CDS

41 Second International Conference on Bioinformatics and Computational Biology www.icobicobi.com.br 25-28/10/2004 Angra dos Reis

42 Group of Computational Biology. Sandro J. de Souza: tennis player. Helena Samaia: Research Assistant. Ana C. Pereira: Admin. Assistant. Maarten Leerkes: Ph.D student. Noboru Sakabe: Ph.D student. Maria Vibranovski: Ph.D student. Elza Helena: Ph.D student. Natanja Slager: Ph.D student. Pedro Galante: Ph.D student. Elisson C. Osorio: programmer. Jorge E. de Souza: Ph.D student. Rodrigo Soares: programmer. Andre Zaiats: system admin.

43

