1
Analyses of ORFans in microbial and viral genomes
Journal club presentation on Mar. 14
Albert Yu
2
ORFan
Definition: an ORF with no detectable sequence similarity to other ORFs in the database considered
Nearly all genomes have ORFans (different %)
The more genomes are sequenced, the more ORFans are found
Most are annotated as hypothetical proteins of unknown function (no experimental data)
3
ORFans (continued)
More data suggest that ORFans are real, functional proteins:
3D structure
Conserved in closely related species (Ka/Ks)
Origin of ORFans?
[Diagram: viral genome, microbial genome]
Viral laterally transferred genes (especially from phages)
4
Viral genome Microbial genome
5
Question: the origin of ORFans
Hypothesis to test: ORFans have been acquired through lateral gene transfer from viruses
Approach: search for homologs of these microbial ORFans in the virus sequence database
6
Genome-wide quantitative study
BLASTP: 277 microbial genomes, 1456 viral genomes
H(g): the number of genomes having at least one homolog of ORFan g
U(g): uniqueness, the genomic distance between the genomes that contain ORFan g
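As a rough illustration of how H(g) could be tallied from BLASTP results, the sketch below counts, for each query ORF, the distinct genomes in which a hit was found, along with the raw number of hits. The input layout (pairs of query ORF and genome of the hit, with the self-hit included) and the name `h_values` are assumptions made for this example; the slides do not describe the actual pipeline. U(g) is left out because the slides do not specify the genomic distance measure.

```python
from collections import defaultdict

def h_values(hits):
    """Return {orf: (H, n_hits)} from (query_orf, hit_genome) pairs.

    H is the number of distinct genomes with at least one hit;
    n_hits is the total number of hits (self-hit included).
    The input format is hypothetical, chosen for this sketch.
    """
    genomes = defaultdict(set)
    n_hits = defaultdict(int)
    for orf, genome in hits:
        genomes[orf].add(genome)
        n_hits[orf] += 1
    return {orf: (len(genomes[orf]), n_hits[orf]) for orf in genomes}

# Toy data, not taken from the study.
toy_hits = [
    ("orfA", "genome1"),                        # self-hit only
    ("orfB", "genome1"), ("orfB", "genome1"),   # self-hit plus a paralog
    ("orfC", "genome1"), ("orfC", "genome2"),   # hit in a second genome
]
result = h_values(toy_hits)
```

Counting the self-hit means an ORF with no other homolog gets H = 1, which matches the way H is used on the classification slide.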
7
Classification of ORFans
Singleton: no homolog anywhere (H = 1, BLASTP = 1)
Paralogous: homologs only in the same genome (H = 1, BLASTP > 1)
Orthologous: homologs only in very closely related microbial genomes (H > 1, U <= 0.1, cutoff chosen by observation)
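The three classes can be written as a small decision rule. This is a minimal sketch based only on the thresholds stated on the slide; the function name `classify_orf`, its argument names, and the "non-ORFan" fallback label are assumptions for illustration.

```python
def classify_orf(h, blast_hits, u=None, u_cutoff=0.1):
    """Classify an ORF using the slide's H, BLASTP-hit and U criteria.

    h          -- number of genomes with at least one homolog (>= 1)
    blast_hits -- number of BLASTP hits, self-hit included (>= 1)
    u          -- uniqueness value; only needed when h > 1
    """
    if h < 1 or blast_hits < 1:
        raise ValueError("h and blast_hits must be at least 1")
    if h == 1:
        # Only the ORF's own genome has hits.
        return "singleton" if blast_hits == 1 else "paralogous"
    if u is not None and u <= u_cutoff:
        # Homologs exist, but only in very closely related genomes.
        return "orthologous"
    return "non-ORFan"
```

For example, `classify_orf(1, 3)` gives `"paralogous"` and `classify_orf(4, 6, u=0.05)` gives `"orthologous"`.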
8
The U-value for all ORFs in prokaryote genomes
In total: ORFs: 818906; ORFans: 110186
S (singleton): 64324 (7.8%)
P (paralogous): 10419 (1.3%)
O (orthologous): 35443 (4.3%)
[Figure labels: 0.64; S or P; O]
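The percentages on this slide can be reproduced from the counts. The snippet below is a quick arithmetic check, assuming each percentage is the class count divided by the total number of ORFs; the variable names are made up for this example.

```python
total_orfs = 818906
orfan_counts = {"S": 64324, "P": 10419, "O": 35443}

def share(count, total):
    """Percentage of `total` represented by `count`."""
    return 100.0 * count / total

total_orfans = sum(orfan_counts.values())
shares = {k: share(v, total_orfs) for k, v in orfan_counts.items()}

for label, pct in shares.items():
    print(f"{label}: {pct:.2f}%")
print(f"all ORFans: {total_orfans} ({share(total_orfans, total_orfs):.2f}%)")
```

The three classes add up to the 110186 ORFans given on the slide. The singleton share works out to roughly 7.85%, which the slide shows as 7.8%.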
9
ORFans-VH% (OVH): % of ORFans having homologs in viruses (0% ~ 63.8%)
Non-ORFans-VH% (NOVH): % of non-ORFans having homologs in viruses (4.1% ~ 18.2%)
The strength of the hypothesis = the difference between these two VH%
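A minimal sketch of how the two percentages could be computed for one genome, assuming each ORF has already been flagged as ORFan or non-ORFan and as having a viral homolog or not. The function name `vh_percentages` and the input layout are hypothetical, and the data are toy values rather than results from the study.

```python
def vh_percentages(orfs):
    """Return (OVH, NOVH) in percent.

    orfs -- iterable of (is_orfan, has_viral_homolog) boolean pairs.
    """
    # counts[is_orfan] = [ORFs with a viral homolog, all ORFs in the class]
    counts = {True: [0, 0], False: [0, 0]}
    for is_orfan, has_viral_homolog in orfs:
        counts[is_orfan][0] += int(has_viral_homolog)
        counts[is_orfan][1] += 1

    def pct(with_homolog, total):
        return 100.0 * with_homolog / total if total else 0.0

    return pct(*counts[True]), pct(*counts[False])

# Toy genome: 4 ORFans (1 with a viral homolog), 5 non-ORFans (2 with one).
toy = [(True, True)] + [(True, False)] * 3 + [(False, True)] * 2 + [(False, False)] * 3
ovh, novh = vh_percentages(toy)
```

In this toy case OVH is lower than NOVH, which is the pattern the slides describe as weak support for the viral-transfer hypothesis.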
10
Percentages of microbial ORFs with homologs in viruses
Red: OVH; Blue: NOVH
24 phylogenetic clades
[Figure labels: Bacteria, Archaea, Firmicutes, Gamma-proteobacteria]
11
The average % of OVH and NOVH in various groups
[Figure values: 148, 66, 63; 10% vs 9%; 8.5% vs 2.7%; 6.6% vs 0.8%]
12
Conclusion
Most OVH << NOVH: current evidence supporting the hypothesis is weak
Firmicutes and Gamma-proteobacteria have the highest numbers of homologs in viruses (the viral database is biased)
Viral database bias: 1456 viruses, 280 phages (109 Gamma; 102 Firmicutes; 69 others)
Sampling?
13
Viral genome Microbial genome
14
277 microbial genomes; 1456 viruses, of which 280 are phages (20%)
All-virus-DB: 43566 ORFs
Phage-DB: 18368 ORFs (42%)
ORFans, all-virus: 13078 (30%) (vs. all-virus-DB); 8200 (vs. all nr, env-nr)
ORFans, all-phage: 6765 (vs. all-virus-DB); 7047 (vs. phage-DB)
15
Some characteristics of ORFans
Bacterial ORFans are shorter than non-ORFans on average
Bacterial ORFans have significantly lower GC3 content than non-ORFans
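GC3 is the fraction of G or C at third codon positions. The sketch below shows one straightforward way to compute it for a single in-frame coding sequence; the function name `gc3_content` is an assumption, and the slides do not say which tool was actually used.

```python
def gc3_content(cds):
    """Fraction of G/C at third codon positions of an in-frame CDS.

    Trailing bases that do not form a complete codon are ignored.
    """
    seq = cds.upper()
    # Indices 2, 5, 8, ... are the third positions of complete codons.
    third_positions = seq[2::3]
    if not third_positions:
        raise ValueError("sequence is shorter than one codon")
    gc = sum(1 for base in third_positions if base in "GC")
    return gc / len(third_positions)

# Codons ATG, GCC, AAA have third bases G, C, A.
example = gc3_content("ATGGCCAAA")
```

Comparing ORFans and non-ORFans would then come down to comparing the distributions of this value (and of sequence length) across the two sets.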
16
The length of viral ORFans and non-ORFans
Length: non-ORFans > ORFans
17
Length: ORFans < non-ORFans
GC3%: ORFans < non-ORFans
18
The number of ORFs per genome in 1456 viruses
Focusing on phage: higher %
19
The growth in the number of phage ORFans (consistent)
Drop to 0? Keep increasing?
38.4%
20
Each microbial species is a host for at least 10 phage species, so phage diversity is at least 10 times higher than microbial diversity
Only 280 phage genomes in the database (low phage sampling)
21
Fewer than 5 phages
Virus sampling bias between and within groups
22
The H-value percentages for all phage ORFs and prokaryotic ORFs
Prokaryotes: 9.1% ORFans; 11.3% ortho
Phages: 38.4% ORFans; 32.4% ortho
23
The H-value percentages of phage ORFs
24
Prophage / prokaryotic homologs / total:
Phage non-ORFans: 4397 (61.5%) / 7150 (63.8%) / 11212
Phage ORFans: 589 (44.7%) / 1317 (18.7%) / 7047
All phage ORFs: 4987 (58.9%) / 8467 (46.4%) / 18248
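The percentages in these three rows are consistent with a particular reading: the first is prophage hits as a share of prokaryotic homologs, and the second is prokaryotic homologs as a share of the row total. That reading is inferred from the arithmetic, not stated on the slide, and the names below are made up for this check.

```python
# (prophage hits, prokaryotic homologs, total) per row, as given on the slide.
rows = {
    "phage non-ORFans": (4397, 7150, 11212),
    "phage ORFans": (589, 1317, 7047),
    "all phage ORFs": (4987, 8467, 18248),
}

def row_percentages(prophage, prok_homologs, total):
    """Return (prophage % of prokaryotic homologs, homologs % of total)."""
    return (
        round(100.0 * prophage / prok_homologs, 1),
        round(100.0 * prok_homologs / total, 1),
    )

percentages = {name: row_percentages(*counts) for name, counts in rows.items()}
for name, (prophage_pct, homolog_pct) in percentages.items():
    print(f"{name}: prophage {prophage_pct}%, prokaryotic homologs {homolog_pct}%")
```

Under this reading, only about a fifth of phage ORFans have any prokaryotic homolog, compared with nearly two thirds of phage non-ORFans.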