Published by Lorin Alexina Stanley. Modified over 9 years ago.
1
1 Computational functional genomics Lital Haham Sivan Pearl
2
2 Introduction Piles of information but only flakes of knowledge. The existing information: collections of genomic sequences, expression profiles, protein-protein interactions, and many more.
3
3 Introduction Computational biology strives to extract the maximal possible information from known sequences, by classifying them according to their homologous relationships, predicting their biochemical activity, cellular function, 3-dimensional structures and evolutionary origin.
4
4 The COG - Clusters of Orthologous Groups of proteins Identification of orthologs is critical for reliable prediction of gene function in newly sequenced genomes. COGs reflect one-to-one, one-to-many and many-to-many orthologous relationships. The purpose of COG is to serve as a platform for functional annotation of newly sequenced genomes and for the study of genome evolution.
5
5 The COG - statistics In 2003, there were 3307 COGs including 74059 proteins from 43 genomes. The genomes come from Bacteria, Archaea and Eukaryota. The database includes 17 functional groups.
6
6 The COG - make on your own The COG construction procedure is based on the notion that any group of at least 3 proteins from distant genomes that are more similar to each other than to any other protein from the same genomes is most likely to belong to an orthologous family.
7
7 The COG - make on your own (1) All-against-all protein sequence comparison. (2) Detect and collapse paralogs. (3) Detect triangles of mutually genome-specific best hits. (4) Merge triangles with a common side to form a COG.
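The triangle and merge steps can be illustrated with a minimal sketch. It assumes the all-against-all comparison has already been reduced to a hypothetical `best_hit` table and that paralogs were collapsed beforehand; the function names and data layout are illustrative, not taken from the COG software.

```python
from itertools import combinations

def find_triangles(best_hit):
    """Find triangles of mutually genome-specific best hits.

    Proteins are (genome, name) tuples. best_hit[(protein, genome)] is the
    best hit of `protein` in `genome` (assumed layout for this sketch).
    """
    proteins = sorted({protein for protein, _ in best_hit})

    def mutual(a, b):
        # a's best hit in b's genome is b, and the reverse also holds
        return best_hit.get((a, b[0])) == b and best_hit.get((b, a[0])) == a

    return [
        frozenset(trio)
        for trio in combinations(proteins, 3)
        if len({p[0] for p in trio}) == 3  # three different genomes
        and all(mutual(a, b) for a, b in combinations(trio, 2))
    ]

def merge_triangles(triangles):
    """Merge triangles that share a side (two proteins) into clusters."""
    parent = list(range(len(triangles)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(triangles)), 2):
        if len(triangles[i] & triangles[j]) >= 2:
            parent[find(i)] = find(j)

    clusters = {}
    for i, triangle in enumerate(triangles):
        clusters.setdefault(find(i), set()).update(triangle)
    return list(clusters.values())
```

The brute-force search over all protein triples is only meant for toy inputs; a real run over whole genomes would need an indexed search.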
8
8 The COG - make on your own
9
9 The COG - adding new genomes The COGNITOR program adds new proteins to pre-existing COGs on the basis of multiple best hits. 60-80% of the proteins of prokaryotes could be included.
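A rough sketch of assignment by multiple best hits follows. It is not the COGNITOR program itself: the `min_hits` threshold, the function name and the input layout are assumptions made for illustration.

```python
from collections import Counter

def assign_to_cog(best_hits_by_genome, cog_of, min_hits=2):
    """Assign a query protein to the COG that collects most of its best hits.

    best_hits_by_genome: genome -> best-hit protein of the query in that genome.
    cog_of: protein -> COG identifier for proteins already in a COG.
    min_hits: assumed minimum number of best hits falling in the same COG.
    Returns the COG identifier, or None if no COG reaches the threshold.
    """
    votes = Counter(
        cog_of[hit] for hit in best_hits_by_genome.values() if hit in cog_of
    )
    if not votes:
        return None
    cog, count = votes.most_common(1)[0]
    return cog if count >= min_hits else None
```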
10
10 The COG - more applications Detecting missed genes. Convenient for a variety of evolutionarily oriented analyses of protein families.
11
11 Methods Experimental methods: biochemical and genetic experiments. Computational methods: homology method (BLAST), mRNA expression, phylogenetic profile, fusion method (Rosetta Stone analysis), gene neighbour method.
12
12 Homology method Searches for proteins whose amino acid sequences are similar. 40-70% of the proteins in a new genome can be assigned some function. It involves identification of some molecular function.
13
13 mRNA expression Analysis of correlated mRNA expression levels makes it possible to establish functional linkages by detecting changes in mRNA expression across different cell types or different environments.
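One common way to quantify correlated expression is the Pearson correlation between expression profiles. The sketch below assumes that measure and a hypothetical cutoff of 0.9; the slides do not specify either.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles.

    Raises ZeroDivisionError if either profile is constant.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def coexpressed_pairs(profiles, threshold=0.9):
    """Return gene pairs whose profiles correlate at or above the threshold.

    profiles: gene -> list of expression levels, one per condition.
    threshold: assumed cutoff, chosen only for illustration.
    """
    genes = sorted(profiles)
    return [
        (g, h)
        for i, g in enumerate(genes)
        for h in genes[i + 1:]
        if pearson(profiles[g], profiles[h]) >= threshold
    ]
```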
14
14 Phylogenetic profile Describes the pattern of presence or absence of a particular protein across a set of organisms. Number of possible profiles: 2^n for n organisms. This number far exceeds the number of protein families.
15
15 Phylogenetic profile Why would two proteins always both be inherited into new species or neither inherited, unless the two function together? If two proteins have the same phylogenetic profile, it is inferred that they have a functional link: engaged in a common pathway or complex.
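A minimal sketch of the idea: encode each protein as a presence/absence bit vector over a fixed list of genomes, then group proteins with identical vectors. The input layout (`presence`) and the function names are assumptions for illustration.

```python
from collections import defaultdict

def build_profiles(presence, genomes):
    """Build one bit vector per protein.

    presence: protein -> set of genomes in which a homolog was found.
    genomes: ordered list of genomes defining the bit positions.
    """
    return {
        protein: tuple(int(genome in found) for genome in genomes)
        for protein, found in presence.items()
    }

def group_by_profile(profiles):
    """Return groups of two or more proteins sharing an identical profile."""
    groups = defaultdict(list)
    for protein, profile in profiles.items():
        groups[profile].append(protein)
    return [sorted(members) for members in groups.values() if len(members) > 1]
```

Proteins that end up in the same group are the candidates for a functional link. With n genomes there are `2 ** n` possible vectors, so identical profiles are unlikely to arise by chance when n is large.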
16
16 Phylogenetic profile
17
17 Phylogenetic profile - example Analysis of three proteins, RL7, FlgL and His5, according to their phylogenetic profiles. RL7: more than half of the proteins with a similar profile have functions associated with the ribosome. FlgL: more than half are flagellar proteins or cell-wall maintenance proteins. His5: more than half are involved in amino acid metabolism.
18
18 Phylogenetic profile - example RL7: ribosome L7; RL15: ribosome L15; RL17: ribosome L17; PTH: peptidyl-tRNA hydrolase; RNC: ribonuclease III; PgsA: phospholipid synthesis; YGGH: hypothetical; YBEX: hypothetical; RL34: ribosome L34; RL36: ribosome L36; RL27: ribosome L27; RL25: ribosome L25; YQCB: hypothetical; YABO: hypothetical; YCEC: hypothetical; RFH: peptide release factor; ClpB: heat shock protein; YJFH: hypothetical; RS14: ribosome S14; G3P3: dehydrogenase; RL4: ribosome L4; NONE: hypothetical; GrpE: co-chaperone; GidB: glucose-inhibited division; RL24: ribosome L24; DEF: polypeptide deformylase; RL20: ribosome L20; MesJ: cell cycle protein; RL19: ribosome L19; RL21: ribosome L21; RL9: ribosome L9; SmpB: small protein B
19
19 Phylogenetic profile Phylogenetic profiles link proteins with similar keywords.
Keyword | No. proteins | No. neighbors in keyword group | No. neighbors in random group
Ribosome | 60 | 197 | 27
Transcription | 36 | 17 | 10
tRNA synthase and ligase | 26 | 11 | 5
Membrane proteins | 25 | 89 | 5
Flagellar | 21 | 89 | 3
Iron, ferric, and ferritin | 19 | 31 | 2
Galactose metabolism | 18 | 31 | 2
Molybdopterin and molybdenum, and molybdopterin | 12 | 6 | 1
Hypothetical | 1084 | 108226 | 8440
20
20 Fusion method or the Rosetta stone analysis Some pairs of interacting proteins have homologs in another organism, fused into a single protein chain. When two separate proteins in one organism, A and B, are expressed as a fused protein in some other species, there is a high probability that A and B are linked in function.
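A sketch of how such fusion links could be collected from homology hits. It assumes a hypothetical list of hits, each giving a query protein, the composite (fused) protein it matches in another organism, and the matched region on that composite; two different query proteins are linked when they match non-overlapping regions of the same composite.

```python
from collections import defaultdict
from itertools import combinations

def rosetta_links(hits):
    """Link query proteins that match separate regions of one composite protein.

    hits: iterable of (query, composite, start, end), where start and end are
    the matched coordinates on the composite protein (assumed input format).
    Returns a set of (protein_a, protein_b, composite) with a < b.
    """
    regions_by_composite = defaultdict(list)
    for query, composite, start, end in hits:
        regions_by_composite[composite].append((query, start, end))

    links = set()
    for composite, regions in regions_by_composite.items():
        for (a, a_start, a_end), (b, b_start, b_end) in combinations(regions, 2):
            non_overlapping = a_end < b_start or b_end < a_start
            if a != b and non_overlapping:
                links.add((min(a, b), max(a, b), composite))
    return links
```

Requiring non-overlapping regions is what separates a fused A+B chain from two homologs of the same domain hitting the same stretch of sequence.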
21
21 Fusion method
22
22 The Rosetta Stone model
23
23 Fusion method – what is it good for? Predicts protein pairs that have related biological functions. Predicts potential protein-protein interactions. Can turn up complexes of proteins, or protein pathways.
24
24 Fusion method – what is it good for?
25
25 Fusion method The group searched the 4290 protein sequences of the E. coli genome. These proteins could form at most (4290)(4289)/2 pairwise interactions, but far fewer are expected. 6809 candidate pairwise interactions were found.
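The size of that upper bound is easy to check. The short calculation below uses only the two numbers given on the slide; the variable names are illustrative.

```python
n_proteins = 4290
candidate_pairs = 6809

# Number of unordered pairs among n proteins: n * (n - 1) / 2
max_pairs = n_proteins * (n_proteins - 1) // 2

# Share of all possible pairs that the analysis flags as candidates
candidate_fraction = candidate_pairs / max_pairs

print(max_pairs)  # 9199905
print(f"{candidate_fraction:.2%}")
```

The candidates therefore cover well under 0.1% of the roughly 9.2 million possible pairs.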
26
26 Fusion method – validation Existing annotations were checked for similar functions, which would imply at least a functional interaction. Of the E. coli pairs found in the Rosetta Stone analysis, 68% share at least one keyword in their annotations, whereas only 15% of randomly selected E. coli protein pairs share a keyword.
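The keyword comparison can be sketched as follows, assuming each protein's annotation has been reduced to a set of keywords. The function name and data layout are hypothetical. Running it once on the predicted pairs and once on randomly drawn pairs gives the two percentages being compared.

```python
def fraction_sharing_keyword(pairs, keywords):
    """Fraction of protein pairs whose annotations share at least one keyword.

    pairs: list of (protein_a, protein_b).
    keywords: protein -> set of annotation keywords (assumed input format).
    """
    if not pairs:
        return 0.0
    shared = sum(1 for a, b in pairs if keywords[a] & keywords[b])
    return shared / len(pairs)
```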
27
27 Fusion method – validation In a database of protein pairs that have been found experimentally to interact, 6.4% are linked by Rosetta Stone sequences. The phylogenetic profile method was also applied to the interactions predicted by the fusion method: more than 8 times as many of them were also suggested by the phylogenetic profile method as in randomly chosen sets of interactions.
28
28 Fusion method – missing pairs False negatives: the interacting proteins were never fused, or the fused protein disappeared during the course of evolution.
29
29 Fusion method – False alarms False positives: a physical interaction is falsely predicted when the fused proteins are co-regulated but do not interact. The method also cannot distinguish between homologs that bind and those that do not.
30
30 Fusion method – False alarms The false positive rate in E. coli due to the inability to distinguish homologs is about 82%. To reduce these errors, the “promiscuous” domains were identified and removed during the analysis. Filtering out only 5% of all domains removes the majority of falsely predicted interactions.
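One way to picture the filtering step is sketched below. It assumes that promiscuity is measured by the number of distinct partners a domain is linked to, and that the top 5% of domains by that count are removed; the slides do not state the exact criterion, so both choices are assumptions.

```python
from collections import defaultdict
from math import ceil

def filter_promiscuous(links, top_fraction=0.05):
    """Drop links that involve the most highly connected domains.

    links: list of (domain_a, domain_b) predicted links.
    top_fraction: assumed share of domains to treat as promiscuous.
    Returns (kept_links, promiscuous_domains).
    """
    partners = defaultdict(set)
    for a, b in links:
        partners[a].add(b)
        partners[b].add(a)

    # Rank domains by number of distinct partners, most connected first
    ranked = sorted(partners, key=lambda d: (-len(partners[d]), d))
    n_remove = ceil(top_fraction * len(ranked))
    promiscuous = set(ranked[:n_remove])

    kept = [
        (a, b) for a, b in links
        if a not in promiscuous and b not in promiscuous
    ]
    return kept, promiscuous
```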
31
31 Fusion method – False alarms
32
32 Neighbour method Functional links between genes can be identified by examining whether the proximity of the genes is conserved across multiple genomes. Powerful in uncovering functional linkages in prokaryotes where operons are common.
33
33 Neighbour method
34
34 Neighbour method - definitions ‘Close’: two genes are on the same strand, within 300 bp of each other, and transcribed in the same direction. Direct link: two close genes whose orthologs are also close in at least two other genomes of different phylogenetic groups. Inferred link: two genes that are not close but whose orthologs are close in at least three other genomes of different phylogenetic groups.
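These definitions translate fairly directly into code. The sketch below assumes each gene is a dict with hypothetical `strand`, `start` and `end` fields, and reads "at least two other genomes of different phylogenetic groups" as at least two distinct groups among the other genomes; that reading is an interpretation, not something the slide spells out.

```python
def are_close(gene_a, gene_b, max_gap=300):
    """True if the genes are on the same strand and at most max_gap bp apart."""
    first, second = sorted((gene_a, gene_b), key=lambda g: g["start"])
    gap = second["start"] - first["end"]  # negative if the genes overlap
    return gene_a["strand"] == gene_b["strand"] and gap <= max_gap

def is_direct_link(pair_by_genome, group_of, home_genome, min_other_groups=2):
    """Check the direct-link definition for one pair of orthologous genes.

    pair_by_genome: genome -> (gene_a, gene_b), the pair as found in that genome.
    group_of: genome -> phylogenetic group.
    """
    if not are_close(*pair_by_genome[home_genome]):
        return False
    groups_with_proximity = {
        group_of[genome]
        for genome, pair in pair_by_genome.items()
        if genome != home_genome and are_close(*pair)
    }
    return len(groups_with_proximity) >= min_other_groups
```

An inferred link would use the same proximity test but skip the requirement on the home genome and raise `min_other_groups` to three.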
35
35 Neighbour method - definitions
36
36 Neighbour method Proximity between genes is maintained mostly because it facilitates their co-transfer to another organism. Example: restriction-modification systems.
37
37 Neighbour method - validation Links between genes that are annotated in KEGG or COG are identified, and the fraction of those links that fall in the same functional pathway / category is calculated. The functional correspondence correlates with the minimal number of phylogenetic groups in which the proximity is detected.
38
38 Neighbour method - validation N tradeoff
39
39 Neighbour method - example
40
40 Happy end??? The group analyzed the 6,217 proteins of the yeast Saccharomyces cerevisiae by combining several methods. One can expect each protein to be functionally linked to perhaps 5 – 50 other proteins, giving 30,000 – 300,000 biologically meaningful links.
41
41 Happy end???
42
42 Networks When methods of detecting functional linkages are applied to all the proteins of an organism, a network of interacting, functionally linked proteins can be traced. As methods improve for detecting protein linkages, it seems likely that most of the proteins will be included in the network.
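Tracing such a network amounts to treating proteins as nodes and predicted links as edges, then following the edges. The sketch below collects the connected components of that graph; the function name and the input format (a list of protein pairs) are assumptions for illustration.

```python
from collections import defaultdict

def connected_components(links):
    """Group proteins into sets connected by chains of functional links.

    links: iterable of (protein_a, protein_b) pairs from any linkage method.
    """
    graph = defaultdict(set)
    for a, b in links:
        graph[a].add(b)
        graph[b].add(a)

    seen, components = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        # Depth-first traversal from an unvisited protein
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        components.append(component)
    return components
```

The share of proteins that fall inside the largest component is one simple way to track how much of the proteome the network covers as the linkage methods improve.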
43
43 Networks
44
44 Happy Purim