Published by Willis Black. Modified over 9 years ago.
1 Many to 1 Gene Associations. The following slides show a few examples of gene predictions by one annotation group that overlap one or more genes from another group. Some of the examples also illustrate issues related to differences in annotation type (e.g., pseudogene versus gene) and confusing nomenclature (e.g., different genes assigned the same official gene name).
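As a rough illustration of how such many-to-one associations could be flagged, the sketch below parses region strings in the `chrom:start..end` form used on these slides and reports genes from one annotation set that overlap two or more genes from another. The `Gene` tuple, the function names, and any identifiers passed in are hypothetical; strand is ignored, and this is not the comparison pipeline used by the annotation groups.

```python
from collections import defaultdict
from typing import NamedTuple


class Gene(NamedTuple):
    gene_id: str  # hypothetical identifier
    chrom: str
    start: int    # 1-based, inclusive
    end: int      # 1-based, inclusive


def parse_region(region: str):
    """Split a 'chrom:start..end' string into (chrom, start, end)."""
    chrom, span = region.split(":")
    start, end = span.split("..")
    return chrom, int(start), int(end)


def overlaps(a: Gene, b: Gene) -> bool:
    """True if two genes on the same chromosome share at least one base."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def many_to_one(group_a, group_b):
    """Map each group_a gene to the group_b genes it overlaps,
    keeping only genes that overlap two or more partners."""
    hits = defaultdict(list)
    for a in group_a:
        for b in group_b:
            if overlaps(a, b):
                hits[a.gene_id].append(b.gene_id)
    return {gid: ids for gid, ids in hits.items() if len(ids) > 1}
```

The nested loop is quadratic and only suitable for small examples; a genome-wide comparison would normally sort the intervals or use an interval tree.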
2 2:110788585..110968584 One gene or two? Orientation issue for OTT15152?
3 8:4238129..4254528 One gene or two?
4 3:105659594..105759593 One gene or two?
5 11:69491920..69516919 One gene or two or three?
6 5:106920574..107155573 One gene or two or three?
7 6:145313224..145563223 One gene or two? The VEGA gene model seems to unite two separate gene models in NCBI (see mRNA).
8 9:15109186..15189185 One gene or two?
9 7:127057560..127247315 One gene or two?
10 7:52670474..52680473 One gene or two? EST CX236436 has a 257 aa upstream CDS. Another joining variant (rat mRNA U25653, mouse EST CF172660) displays an upstream CDS (not actually annotated like that). None of the evidence (mouse or rat) shows a distinct upstream gene at the moment.
11 4:146600055..146731054 One gene or two? It’s a heavily duplicated region: 10666 is more or less a duplicate of 10670.
12 2:37243166..37343165 n:m ENSMUSG00000050714 and ENSMUSG00000066798 overlap OTTMUSG00000012648 and OTTMUSG00000012652. Zbtb26 and Zbtb6 share a 5’ UTR exon but have non-overlapping CDSs.
13 2:155895575..155939706 n:m ENSMUSG00000074643 overlaps ENSMUSG00000038171 OTTMUSG00000016087, OTTMUSG00000016088, and OTTMUSG00000019746; moved to Cpne1 (19746); already part of Cpne1 (19746); coding regions don’t overlap.
14 6:113326848..113366847 n:m OTTMUSG00000017554 overlaps OTTMUSG00000016376, EG68089, and EG101100. The overlaps are limited to UTR/non-coding regions.
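The n:m slides distinguish overlaps that involve coding sequence from overlaps confined to UTR or other non-coding exons. A minimal sketch of that distinction follows, assuming each gene is described by lists of inclusive `(start, end)` exon and CDS intervals on the same chromosome. The function names and the three labels are illustrative choices, not terms taken from any of the annotation pipelines.

```python
def intervals_overlap(xs, ys):
    """True if any (start, end) interval in xs overlaps any interval in ys.
    Coordinates are inclusive."""
    return any(s1 <= e2 and s2 <= e1 for s1, e1 in xs for s2, e2 in ys)


def classify_overlap(exons_a, cds_a, exons_b, cds_b):
    """Classify the exonic overlap between two genes.

    'none'            : no exon of A overlaps an exon of B
    'coding'          : the CDS of A overlaps the CDS of B
    'non-coding only' : exons overlap, but the two CDSs do not
    """
    if not intervals_overlap(exons_a, exons_b):
        return "none"
    if intervals_overlap(cds_a, cds_b):
        return "coding"
    return "non-coding only"
```

Under this scheme, two genes that share a 5’ UTR exon but have non-overlapping CDSs would be reported as "non-coding only".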
15 8:47538900..47638899 Are EG667337 and EG14081 different genes? Can’t see any evidence for that structure.
16 7:126992968..127042967 Are EG233805 and EG1000043396 different genes?
17 6:122655579..122665578 Are EG71950 and EG100038891 different genes? Can’t see any evidence for that micro-intron.
18 16:84828048..84836547 Are EG11957 and EG100039950 different genes?
19 7:87385985..87410984 Are EG61000042379 and EG269954 different genes?
20 9:43622106..43900000 Are EG61000042548 and EG21838 different genes?
21 13:22073239..22080498 Are OTT00466 and OTT13227 different genes? A new mRNA, BC097347, has appeared which extends the 5’ end to include the ATG of 00466 (similar to human POM121). So now they’re variants of the same locus, 00466, even though they don’t share a splice.
22 4:122937497..122988738 Are OTT08975 and OTT08978 different genes? Yes. They share a splice, so 08975 is now a variant of 08978.
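Several of these answers rest on whether two transcripts share a splice: transcripts that do are treated as variants of one locus, and those that do not are normally kept separate. The check can be sketched as below, assuming both transcripts lie on the same chromosome and strand and are given as inclusive `(start, end)` exon intervals. Here "sharing a splice" is interpreted as sharing at least one intron boundary position, which is an assumption about the annotators' usage; the function names are hypothetical.

```python
def splice_sites(exons):
    """Return the intron boundary positions of a transcript.

    left  : exon ends that are followed by an intron
    right : exon starts that are preceded by an intron
    On the forward strand these are donors and acceptors; on the
    reverse strand the roles swap, but the comparison is unaffected.
    """
    exons = sorted(exons)
    left = {end for _, end in exons[:-1]}
    right = {start for start, _ in exons[1:]}
    return left, right


def share_splice(exons_a, exons_b):
    """True if the two transcripts have at least one splice site in common."""
    left_a, right_a = splice_sites(exons_a)
    left_b, right_b = splice_sites(exons_b)
    return bool(left_a & left_b) or bool(right_a & right_b)
```

A single-exon transcript has no splice sites, so it can never be merged by this rule alone, which matches the need for other evidence (such as a shared start codon) in some of the cases on these slides.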
23 3:94933437..94938148 Are OTT22306 and OTT19657 different genes? Yes. I’ve put them all under 22306 (Scnm1). But there’s more to this picture: in BL/6 this gene is possibly a pseudogene because of a strain-specific premature stop codon about 30 bp from the end of the penultimate exon, supported by mRNA AK013948.
24 3:107728458..107736457 Are OTT25890 and OTT07101 the same gene? Yes. Made 25890 part of 07101.
25 1:172123537..172148536 Are OTT21542 and OTT21543 different genes? Yes. But made 21542 an artefact. 7 bp of the mRNA are repeated in the genomic sequence around the ggc and cag “splices”.
26 1:173164903..173177002 Are OTT21571 and OTT21573 different genes? No. Already fixed.
27 2:90744183..90753182 Are OTT14319 and OTT14315 different genes? I’ve made OTT14315 part of 14319. Normally, when the transcripts don’t share a splice, they’re kept separate. 14315 is based on EST AV283747. I’ve found its companion, BY716767. Aligned against the BAC, it matches the first exon of transcript 33789 and, very vaguely, the second exon as well. Oddly, the homology is very weak, while AV283747 is a 100% match (?).
28 4:42236997..42261996 Are ENS78738 and ENS78736 different genes? Are the predicted genes new members of the chemokine (C-C motif) ligand family? In Ensembl, multiple gene predictions are assigned to the same gene symbol/MGI ID.
29 15:79611961..79691960 One gene or two or three? Are Nptxr and Cbx6 overlapping? Artefact (has two non-splices). A case can be made for all three options! The merged transcripts are currently annotated as part of Nptxr, as the proportion of that CDS is bigger. The option to make it three genes is attractive.
30 2:120535197..120698446 One gene or two? Are Cdan1 and Ttbk2 overlapping? cDNA AK220258: retained intron (in the Cdan1 portion), and apart from that the CDSs do not join up anyway. Both loci have their own CpG and pA features.
31 X:9598695..9848694 Srpx and Rpgr overlapping? One gene or two? cDNA BC036959 and AK046821: the last exon is in frame with the 2nd coding exon of Srpx, but continues beyond the exon to end in pA features.
32 2:181092767..181132366 Zgpat and Lime1 overlapping? One gene or two? mRNA AK173276: retained intron. EST BQ552943: CDSs in-frame. Not shown: cDNA BC034599, which contains all exons of both genes, but because the joining splice is beyond the Zgpat 3’ UTR (in the Lime1 5’ UTR), it is NMD. Both loci have pA features.
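The NMD call above depends on a splice junction lying downstream of the stop codon. The slides do not say which criterion was applied; the sketch below uses the commonly cited heuristic that a transcript is a nonsense-mediated decay candidate when its stop codon ends more than about 50 nt upstream of the last exon-exon junction. The function name, the argument layout, and the default threshold are assumptions made for illustration.

```python
def is_nmd_candidate(exon_lengths, stop_pos, threshold=50):
    """Apply the '50 nt rule' heuristic for nonsense-mediated decay.

    exon_lengths : exon lengths in transcript order (5' to 3')
    stop_pos     : 1-based transcript coordinate of the last base
                   of the stop codon
    threshold    : minimum distance (nt) between the stop codon and
                   the last exon-exon junction to call NMD
    """
    if len(exon_lengths) < 2:
        return False  # no junction, so the rule cannot apply
    # Transcript coordinate of the last base before the final junction.
    last_junction = sum(exon_lengths[:-1])
    return last_junction - stop_pos > threshold
```

For a read-through transcript in which the upstream gene's stop codon is retained and splicing continues into the downstream gene, `stop_pos` falls well before the last junction, so the heuristic flags it.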
33 5:31435474..31485473 One gene or two? Mpv17 and Gtf3c2 overlapping? CDSs are in-frame, but additional variation in a downstream exon would cause NMD; based on EST AA111369. CDSs are in-frame; based on cDNA AK138760. pA features and CpG island. Mpv17 is very conserved in human, rat, cow, frog, and zebrafish (same length +/- 1 aa; >70% identity), but it has no CpG island of its own.
34 16:96582252..96792251 One gene or two? Are Pcp4 and Igsf5 two different genes? cDNA AK164699: 100%, also in rat and human. (Figure label: Next slide)
35 In Ensembl currently it looks as though Pcp4 and Igsf5 are considered synonyms for the same gene?
36 6:87895874..87954921 One gene or two? The NCBI gene is a pseudogene; the Ensembl gene is a protein-coding gene. (Figure labels: Pseudogene; Protein coding gene)
37 13:75781991..75782990 (Figure labels: Pseudogene; Protein coding gene)
38 14:3046445..3080444 (Figure labels: Pseudogene; Protein coding gene)
39 Retrotransposed vs pseudogene 6:128882645..128993644 (Figure labels: Pseudogene; Retrotransposed)
40 Gene Family Challenges. Gene families present many challenges to determining equivalency among gene predictions and for nomenclature. Examples from four gene families are shown in the following slides: killer cell lectin-like receptor (Klra) family; UDP glucuronosyltransferase 1 family; cysteine-rich perinuclear theca; C-type lectin domain family 2.
41 6:129837719..130337718 killer cell lectin-like receptor (Klra) family (Figure label: Next slide)
42 6:130198815..130298814 killer cell lectin-like receptor (Klra) family. Gene identity crisis! (Figure labels: Pseudogene; Protein coding gene; Next slide; transcript; pseudogene; stop codon; stop codon supported by 100% cDNA)
43 6:130275414..130375413 1. Overlapping NCBI annotation. 2. Overlapping features of different types. (Figure labels: 1.; 2.; Pseudogene; Protein coding gene; currently a pseudogene in otter?!)
44 1:89943192..90125441 UDP glucuronosyltransferase 1 family. Ensembl maintains a single gene ID for all of the members of the family. (Figure label: Next slide)
45 9:24428665..24431164 cysteine-rich perinuclear theca. Gene identity crisis!
46 C-type lectin domain family 2 6:128882645..128993644 Ensembl and VEGA predict only a single gene with multiple transcripts rather than two genes, Clec2g and Clec2f.
47 Vega hasn’t annotated Clec2f, period. In actual fact, that gene doesn’t exist as such. The “Clec2f” locus is a partial duplication of the Clec2g locus (last four exons). Though the duplicate exons have diverged from the parent, they are still open. However, there is no trace of the first exon and no locus-specific transcriptional evidence. We would annotate this as an unprocessed pseudogene. The three-exon gene between Clec2g and “Clec2f” actually overlaps another Clec2 pseudogene (in this case a duplication of the last three exons). And just 200 bp further there’s another Clec2 pseudogene, consisting of a duplication of the penultimate exon broken into two fragments plus part of the last exon. This pseudogene overlaps the big terminal exon. (Figure labels: Clec2g; “Clec2f”; pseudo)
48 Clec2g “Clec2f”
49 Unique to MGI. MGI does not have a high-throughput computational genome annotation pipeline. However, we integrated the results of high-throughput cDNA sequencing projects into the database prior to the availability of the mouse genome. Many of these genes have remained unique to MGI. The following slides illustrate several cases where MGI has a gene that has not been predicted by one of the three major annotation groups. Many (most) of these MGI-unique genes are from the RIKEN cDNA sequencing initiative. Many of them likely represent non-protein-coding genes.
50 11:79796866..79857365
51 (Figure labels: move “c” splice site no corresponding splice site 205 splice site missing base “c” would destroy either splice) When aligned against the forward strand (same as Suz12), there are no splice sites; when aligned in reverse against the reverse strand, there are some (questionable) splice sites. Conclusion: garbage!
52 9:106742778..106752777 Unique to MGI
53 11:69491920..69516919