Published by Ференц Стојиљковић. Modified over 6 years ago.
1
closing in on the set of human genes. The ENCODE project.
Roderic Guigó i Serra. Bioinformatics UPF, academic year 2004/2005. 2/23/2019, Bioinformatics UPF, March 2005
2
3
gene number estimates (ii): the genome era
from Harrison et al. (2002)
4
gene classes underrepresented in the current gene sets
 - intronless genes
 - fast-evolving genes
 - genes with atypical coding content
 - low or rare transcripts
 - transcripts of unknown function (TUFs)
 - genes undergoing non-canonical splicing
 - selenoproteins
 - …
5
fast-evolving genes
4-helical cytokine family: the exonic structure is conserved in the absence of sequence conservation
6
non-canonical splicing
Two major exceptions to the almost universal U2 GT-AG rule:
 - U2 GC-AG introns
 - U12 AT-AC introns
But there is a plethora of other minor exceptions.
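The boundary classes named on this slide can be illustrated with a minimal sketch that labels an intron by its first and last two bases. The function name and the label strings are hypothetical, chosen only for illustration. The terminal dinucleotides alone do not establish which spliceosome (U2 or U12) processes an intron, so the labels describe the boundaries only.

```python
# Hypothetical helper (not from the slides): label an intron by its
# terminal dinucleotides.
KNOWN_BOUNDARIES = {
    ("GT", "AG"): "GT-AG (canonical)",
    ("GC", "AG"): "GC-AG",
    ("AT", "AC"): "AT-AC",
}


def classify_intron(intron_seq: str) -> str:
    """Return a boundary label for an intron given 5'->3' on the coding strand."""
    # Accept RNA or DNA input, in any letter case.
    seq = intron_seq.upper().replace("U", "T")
    if len(seq) < 4:
        raise ValueError("intron too short to hold a donor and an acceptor site")
    donor, acceptor = seq[:2], seq[-2:]
    # Anything outside the three named classes is reported as a minor exception.
    return KNOWN_BOUNDARIES.get((donor, acceptor), f"{donor}-{acceptor} (other)")
```

For example, `classify_intron("GTAAGTCCCAG")` falls in the canonical class, while an intron starting with AT and ending with AC is reported as AT-AC.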
7
SelU: a novel selenoprotein family, Castellano et al., EMBO reports 2004
Selenoproteins are proteins that incorporate the amino acid selenocysteine (Sec), the 21st amino acid. Sec is encoded by UGA. The recoding of UGA is mediated by the SECIS element.
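Because Sec is encoded by UGA, which is normally a stop codon, a first step when screening a coding sequence is to list the in-frame UGA codons that precede the final codon. The sketch below does only that. The function name is hypothetical, the final codon is assumed to be the terminator, and no SECIS search is attempted, so the output is a list of candidates and not a prediction.

```python
def in_frame_uga_positions(cds: str) -> list[int]:
    """Return 0-based codon indices of in-frame UGA (TGA) codons.

    Assumption: the sequence starts in frame and its last complete codon
    is the terminator, so that codon is never reported.
    """
    seq = cds.upper().replace("T", "U")
    n_codons = len(seq) // 3  # ignore a trailing partial codon
    codons = [seq[3 * i:3 * i + 3] for i in range(n_codons)]
    return [i for i, codon in enumerate(codons[:-1]) if codon == "UGA"]
```

Out-of-frame occurrences of UGA, such as the one spanning the first two codons of `ATGAAATAA`, are ignored by construction.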
8
alternative splicing
9
transcription not associated with known genes
10
11
12
ENCODE pilot phase: 1% of the genome, 44 regions
target selection: a committee selected the sequence targets
 - manual targets: regions with a lot of existing information
 - random targets: stratified by non-exonic conservation with mouse and by gene density
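Stratified random selection of the kind described for the random targets can be sketched as follows. Everything specific here is an assumption made for illustration: the three-level binning, the tuple layout, the function names, and the seed. The slides do not state how the strata were defined or how many regions were drawn from each.

```python
import random
from collections import defaultdict


def tercile(value, sorted_values):
    """Return 0, 1 or 2 (low, medium, high) relative to the sorted values."""
    rank = sum(v < value for v in sorted_values)
    return min(2, rank * 3 // len(sorted_values))


def pick_stratified(regions, per_stratum=1, seed=0):
    """Pick regions at random within each (conservation, density) stratum.

    regions: list of (name, non_exonic_conservation, gene_density) tuples.
    Returns {(conservation_level, density_level): [picked names]}.
    """
    cons = sorted(r[1] for r in regions)
    dens = sorted(r[2] for r in regions)
    strata = defaultdict(list)
    for name, c, d in regions:
        strata[(tercile(c, cons), tercile(d, dens))].append(name)
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return {
        key: rng.sample(names, min(per_stratum, len(names)))
        for key, names in sorted(strata.items())
    }
```

Drawing within strata, instead of from the whole pool, keeps rare combinations (for example high conservation with low gene density) represented in a small sample.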
13
14
gene prediction in ENCODE: a collaboration between HAVANA and ENCODE
GOAL: identify all protein-coding genes in the ENCODE regions
 - Roderic Guigó, IMIM
 - Stylianos Antonarakis and Alexandre Reymond, Geneva
 - Ewan Birney, EBI
 - Michael Brent, WashU
 - Lior Pachter, Berkeley
 - Manolis Dermitzakis, Jennifer Ashurst and Tim Hubbard, Sanger
15
experimental validation of genes annotated in VEGA, first 13 regions:
[Diagram: validation workflow; counts shown: 138, 49, 6]
 - Experimental validation of the annotated single-exon genes: in process
 - 5' RACE to obtain full-length mRNA(s): in process
 - RT-PCR to check the 99 junctions: 59 done => 9 positive; 40 in process
 - Bidirectional RACE to obtain full-length mRNAs: in process
16
first 13 regions annotated in VEGA
 - 1 to 34 transcripts per locus (34: RP11-353C18.2, RNPC2, ENr333); values shown: 6.86, 1.67; whole genome (known): 1.68
 - 1 to 44 exons per transcript (44: RP11-167N , NUP188, ENr232); values shown: 2.51, 7.59; whole genome (known): 9.65
17
experimental validation of genes annotated in VEGA
99 RT-PCRs performed to check introns from 49 novel or putative transcripts:
 - results available for 59 RT-PCRs: 9 positive
 - the other 40 RT-PCRs are in process
18
gene predictions outside of VEGA
Gene predictions from 6 computational gene prediction programs and 3 EST-based methods.
[Figure: lists of the computational and the EST-based methods]
19
Gene predictions outside of VEGA annotations
In 13 ENCODE regions, 1255 unique predicted introns (predicted by one or more of the 9 methods) are not annotated in VEGA:
 - 380 (30%) extend VEGA objects (1)
 - 530 (42%) are in introns of VEGA objects (2)
 - 11 (1%) link exons from distinct VEGA objects (3)
 - 334 (27%) are completely outside of VEGA annotations (4)
[Figure: VEGA annotation versus predictions, illustrating cases (1) to (4)]
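As a quick consistency check, the four categories add up to the 1255 unannotated introns, and the rounded shares match the percentages on the slide. The dictionary labels below are shorthand for the slide's wording, not names taken from any tool.

```python
# Counts copied from the slide; labels are shorthand for categories (1) to (4).
counts = {
    "extend VEGA objects": 380,
    "in introns of VEGA objects": 530,
    "link exons from distinct VEGA objects": 11,
    "completely outside VEGA annotations": 334,
}

total = sum(counts.values())

# Share of each category, rounded to whole percent as on the slide.
shares = {label: round(100 * n / total) for label, n in counts.items()}

for label, pct in shares.items():
    print(f"{counts[label]:>4}  {pct:>3}%  {label}")
```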
20
Gene predictions outside of VEGA annotations
RT-PCR on intron junctions (exon pairs):
 - 1255 predicted intron junctions tested
 - 44 successfully amplified (but 20 gave intron lengths different from those expected)
 - only 15 of the 44 are in new loci, and only 5 do not overlap pseudogenes
 - overall, only about 3.5% tested positive, and as little as 0.5% may correspond to novel genes
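The rates quoted on this slide follow directly from the counts. The short calculation below reproduces them; the variable names are illustrative. Note that 5 out of 1255 is about 0.4%, which the slide states as "as little as 0.5%".

```python
def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100 * part / whole


tested = 1255            # predicted intron junctions tested by RT-PCR
amplified = 44           # successfully amplified
unexpected_length = 20   # amplified, but intron length differed from the prediction
in_new_loci = 15         # amplified junctions located in new loci
not_pseudogene = 5       # of those, junctions not overlapping pseudogenes

positive_rate = percent(amplified, tested)
novel_gene_rate = percent(not_pseudogene, tested)
as_predicted = amplified - unexpected_length  # amplified with the expected length

print(f"positive: {positive_rate:.1f}%  possible novel genes: {novel_gene_rate:.1f}%")
```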
21
chimeras
22
KUA and UEV, Thomson et al., Genome Research 2000
23
EST-based prediction of chimeras
                                          human    mouse
total non-overlapping                    14,959   15,106
adjacent in the same orientation          7,679    7,865
linked by ESTs maintaining the ORF           56       37
including no new intervening exons           42       26
RT-PCR positive: 11
24
systematic search for functional chimeras in ENCODE
 - 321 non-overlapping transcripts
 - 165 adjacent pairs in the same orientation
 - force GENEID to predict single complete transcripts spanning the two genes
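The step that reduces the non-overlapping transcripts to adjacent pairs in the same orientation can be sketched as below. The `Transcript` record and the function name are hypothetical, and the input is assumed to be already non-overlapping, as on the slide. The GENEID step itself is not reproduced here.

```python
from typing import NamedTuple


class Transcript(NamedTuple):
    # Hypothetical record layout, chosen for illustration only.
    name: str
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"


def adjacent_same_strand_pairs(transcripts):
    """Return (left, right) name pairs of neighbours on the same strand.

    Assumes the input transcripts do not overlap each other.
    """
    ordered = sorted(transcripts, key=lambda t: (t.chrom, t.start))
    pairs = []
    for left, right in zip(ordered, ordered[1:]):
        same_place = left.chrom == right.chrom and left.end < right.start
        if same_place and left.strand == right.strand:
            pairs.append((left.name, right.name))
    return pairs
```

Each pair returned this way is a candidate for the forced single-transcript prediction described on the slide.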
25
126 predictions obtained; 98 tested; 4 positive
26
one example
27
one novel transcript appears to produce a chimeric form with a known gene
[Figure: region ENr333, 5' to 3': novel transcript (RP4-614O4.5) and known gene (ITGB4BP); junction validated by RT-PCR]
28
chimeric genes
 - results in the ENCODE regions indicate that chimerism could affect at least 5% of tandem human genes
 - chimerism could be a means to create additional gene diversity
 - it challenges the concept of the gene (a more dynamic view of the genome)
 - we need to validate their functional meaning: proteomics data, comparative genomics analysis
 - after learning from the ENCODE regions, extrapolate to the whole genome
29
http://genome.imim.es/gencode
 - IMIM (Barcelona): Roderic Guigó, France Denoeud, Julien Lagarde, Eduardo Eyras, Jan-Jaap Wesselink, Robert Castelo, Genis Parra, Noura Dabouseh
 - University of Geneva: Stylianos Antonarakis, Alexandre Reymond, Catherine Ucla
 - EBI: Ewan Birney, Damian Keefe
 - Washington University: Michael Brent, Michael Stevens
 - Berkeley: Lior Pachter, Bernd Sturmfels, Nicolas Bray, Marta Casanellas, Sourav Chatterji, Colin Dewey, Mathias Drton, Nicholas Eriksson, Sagi Snir
 - The Wellcome Trust Sanger Institute, Population and comparative genomics: Manolis Dermitzakis
 - The Wellcome Trust Sanger Institute, Informatics (HAVANA annotation group): Jennifer Ashurst, Tim Hubbard, Adam Frankish, David Swarbreck, James Gilbert