
Functional Associations of Protein in Entire Genomes Sequences 2002. 1. 21. Bioinformatics Center of Shanghai Institutes for Biological Sciences Bingding.


1 Functional Associations of Protein in Entire Genomes Sequences 2002. 1. 21. Bioinformatics Center of Shanghai Institutes for Biological Sciences Bingding Huang

2 Contents: Introduction; Methods of prediction; Results and Discussion; Next work

3 Introduction Motivation: Large-scale genome projects generate a rapidly increasing number of sequences, most of them biochemically uncharacterized. Characterizing them by experimental methods is tedious, labour-intensive and inaccurate.

4 Introduction Key idea: Sequence similarity correlates with functional similarity. This is the basis for transferring functional knowledge from a characterized protein to a homologous but uncharacterized one. Functional linkage and protein interaction. Many programs exist to do this...

5 Introduction Protein function linkage: Functionally linked proteins are proteins that participate in a common structural complex or metabolic pathway. During evolution, all such functionally linked proteins tend to be either preserved or eliminated together in a new species.

6 Introduction Protein-protein interaction (gene fusion): Some interacting proteins, such as the Gyr A and Gyr B subunits of E. coli DNA gyrase, are fused into a single chain in another organism, in this case the topoisomerase of yeast. Thus the sequence similarities of Gyr A (804 amino acid residues) and Gyr B (875) to different segments of the topoisomerase (1429) might be used to predict that Gyr A and Gyr B interact in E. coli.

7 Methods to predict function of protein: Traditional homology search; Phylogenetic profiles; Rosetta Stone method; Gene neighbor method; Gene fusion method; Machine learning; Structure prediction

8 Methods to predict function of protein Homology method: The function of a query protein can be deduced by comparing its amino-acid sequence with those of homologous proteins of known function. However, function prediction by homology search has limitations. By its initial assumption, it cannot assign a "novel" function to the query protein, nor "any" function at all if no homolog of known function can be found in the database. In addition, sequence identity does not always correspond to functional resemblance.

9 Methods to predict function of protein Phylogenetic profiles (Marcotte): Based on the hypothesis that functionally linked proteins evolve in a correlated fashion and therefore have homologs in the same subset of organisms. A phylogenetic profile describes the pattern of presence or absence of a particular protein across a set of sequenced organisms. If two proteins have the same phylogenetic profile in all surveyed genomes, it is inferred that the two proteins are functionally linked. Pairs of functionally linked proteins generally have no amino acid sequence similarity with each other and so cannot be linked by conventional sequence-alignment techniques.
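The profile comparison described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the published implementation: the genome names, protein names and the `presence` table are hypothetical toy data, and the homolog detection step (deciding whether a protein "has a homolog" in a genome) is assumed to have been done already.

```python
from collections import defaultdict

def build_profile(hit_genomes, genomes):
    """Return a tuple of 0/1 flags: 1 if the protein has a homolog in that genome."""
    return tuple(1 if g in hit_genomes else 0 for g in genomes)

def link_by_profile(presence, genomes):
    """Group proteins whose phylogenetic profiles are identical."""
    groups = defaultdict(list)
    for protein, hit_genomes in presence.items():
        groups[build_profile(hit_genomes, genomes)].append(protein)
    # Keep only profiles shared by two or more proteins.
    return {profile: sorted(names) for profile, names in groups.items() if len(names) > 1}

# Hypothetical toy data: protein -> set of genomes in which a homolog was found.
genomes = ["G1", "G2", "G3", "G4"]
presence = {
    "P1": {"G1", "G3"},
    "P2": {"G1", "G3"},
    "P3": {"G2", "G4"},
    "P4": {"G1", "G2", "G3"},
}

linked = link_by_profile(presence, genomes)
print(linked)
```

Here P1 and P2 share the profile (1, 0, 1, 0) and would be predicted to be functionally linked. Requiring an exact match is the simplest choice; tolerating a small number of mismatched positions is a natural relaxation.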

10 Methods to predict function of protein

11 Table Phylogenetic profiles link proteins with similar keywords

12 Methods to predict function of protein Table 2. Phylogenetic profiles link proteins in EcoCyc classes

13 Methods to predict function of protein

14 Gene Fusion method (Enright)

15 Methods to predict function of protein Domain-fusion analysis: Supported by the observation that a single protein chain in one organism can show homology with separate interacting proteins in another organism, as if the interacting proteins had been fused into a single peptide chain. Detecting gene fusions in one genome (defined as 'composite' proteins) allows the prediction of functional associations between homologous genes that remain separate in another genome (defined as 'component' proteins).
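The composite/component idea can be sketched as follows. This is an illustrative simplification under stated assumptions, not the algorithm used in the work presented: the function names, the hit tuples and the protein names are hypothetical, similarity hits are assumed to be precomputed, and a pair of query proteins is reported only if the two proteins match non-overlapping regions of the same reference protein and are not similar to each other.

```python
def overlap(a, b):
    """Length of the overlap between two closed intervals (start, end)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)

def find_fusions(hits, similar_pairs, max_overlap=0):
    """Return (component_a, component_b, composite) triples.

    hits: list of (query_protein, reference_protein, ref_start, ref_end).
    similar_pairs: set of frozensets naming query proteins similar to each other.
    """
    by_ref = {}
    for query, ref, start, end in hits:
        by_ref.setdefault(ref, []).append((query, (start, end)))

    found = set()
    for ref, matches in by_ref.items():
        for i in range(len(matches)):
            for j in range(i + 1, len(matches)):
                (qa, region_a), (qb, region_b) = matches[i], matches[j]
                if qa == qb:
                    continue  # same protein matching twice
                if frozenset((qa, qb)) in similar_pairs:
                    continue  # homologous components are not a fusion signal
                if overlap(region_a, region_b) <= max_overlap:
                    a, b = sorted((qa, qb))
                    found.add((a, b, ref))
    return found

# Hypothetical toy hits: A and B cover different parts of AB_fused; C resembles B.
hits = [
    ("A", "AB_fused", 1, 400),
    ("B", "AB_fused", 420, 900),
    ("C", "AB_fused", 410, 880),
]
similar_pairs = {frozenset(("B", "C"))}

fusions = find_fusions(hits, similar_pairs)
print(sorted(fusions))
```

In this toy case A is linked to both B and C through the composite protein, while B and C are not linked to each other because they are similar and match the same region.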

16 Methods to predict function of protein Flowchart of the DifFuse algorithm [figure; labels only: Query genome BLAST vs reference genome; Query genome BLAST vs query genome; Matrix T; Matrix Y; Symmetrification and sequence clustering algorithm; Smith-Waterman; Fusion detection algorithm]

17 Methods to predict function of protein Results of detection

18 Methods to predict function of protein Materials and methods Genome sequences: Complete genome sequences for the 24 species were obtained from their original sources. Genome comparison: 1. The 24 genomes were filtered using the CAST compositional bias filtering algorithm. 2. Each genome was compared against itself and against each of the other 23 genomes using BLASTP with a cut-off E-value of 1e-10. 3. The DifFuse algorithm was then applied to each genome in turn as a query against the other 23 (reference) genomes. 4. Using other protein databases as the reference yields fewer composite cases.
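The E-value cut-off in step 2 can be illustrated with a small filter over BLAST hits. This sketch assumes the hits are available in the standard 12-column tab-separated tabular format (E-value in the 11th column) and that hits at or below the cut-off are kept; the slides do not state either detail, and the two sample lines are made-up data.

```python
def filter_hits(lines, max_evalue=1e-10):
    """Yield (query, subject, evalue) for tabular BLAST lines that pass the cut-off."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue <= max_evalue:
            yield query, subject, evalue

# Hypothetical sample hits: the first passes the 1e-10 cut-off, the second does not.
sample = [
    "p1\tr7\t45.2\t300\t150\t4\t1\t300\t10\t310\t1e-50\t210",
    "p2\tr9\t28.0\t90\t60\t2\t5\t94\t40\t129\t3e-05\t48.1",
]

kept = list(filter_hits(sample))
print(kept)
```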

19 Methods to predict function of protein Result: The all-against-all genome comparison yielded 132,812 component and 66,406 composite proteins, representing multiple occurrences of the same proteins across species. Of these, 7,224 component and 2,365 composite proteins are unique across the 24 genomes. On average, 9% of genes in a given genome appear to code for single-domain, component proteins predicted to be functionally associated. These proteins are detected by an additional 4% of genes that code for fused, composite proteins.

20 Methods to predict function of protein Discussion: This approach to predicting functional associations of proteins yields robust predictions of physical interaction, pathway involvement, complex formation and other types of functional association between protein molecules. The landscape of gene fusions appears to be a complex one, affected by paralogy, genome size and phylogenetic distance.

21 Methods to predict function of protein Gene neighbor method: If two genes (blue and yellow in the figure) are found to be neighbors in several genomes, a functional linkage may be inferred between the proteins they encode.
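A minimal sketch of the neighbor test is shown below. It assumes that genes have already been mapped to shared ortholog labels across genomes, treats each genome as a linear list of genes, and uses a hypothetical threshold `min_genomes` for "several genomes"; none of these choices comes from the slides.

```python
from collections import Counter

def neighbor_counts(gene_orders):
    """Count, for each gene pair, the number of genomes in which the two are adjacent."""
    counts = Counter()
    for order in gene_orders.values():
        # A set ensures each pair is counted at most once per genome.
        pairs = {frozenset((x, y)) for x, y in zip(order, order[1:]) if x != y}
        counts.update(pairs)
    return counts

def linked_pairs(gene_orders, min_genomes=2):
    """Return gene pairs that are neighbors in at least min_genomes genomes."""
    counts = neighbor_counts(gene_orders)
    return {tuple(sorted(pair)) for pair, n in counts.items() if n >= min_genomes}

# Hypothetical toy gene orders, using ortholog labels shared across genomes.
gene_orders = {
    "G1": ["a", "b", "c", "d"],
    "G2": ["c", "a", "b", "e"],
    "G3": ["b", "a", "d", "c"],
}

links = linked_pairs(gene_orders)
print(sorted(links))
```

In this toy data a and b are adjacent in all three genomes and c and d in two, so both pairs are reported. A real analysis would also need to handle circular chromosomes and gene orientation.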

22 Methods to predict function of protein Discussion: This method is most robust for microbial genomes but may work to some extent even for human genes where operon-like clusters are observed. It can be powerful in uncovering functional linkages in prokaryotes, where operons are common, and also shows promise for analysing interacting proteins in eukaryotes.

23 Methods to predict function of protein Finding functional features of proteins using machine learning techniques. Hypothesis: A protein's function arises from its physical structure. Since protein structures are built from physico-chemical interactions among amino acids, amino-acid sequences may carry features that reflect those interactions. These features are called 'functional features'.

24 Methods to predict function of protein Overview of the method

25 Methods to predict function of protein The procedure of machine learning. Analogical reasoning: to make assumptions about functional features. Inductive reasoning: to generalize the hypothesis made by analogical reasoning and to decide which localization pattern is most useful for classifying protein functions. Deductive reasoning: to refine the localization pattern into classification rules; knowledge about protein functions and structures is used to make a logical description of the classification rules.

26 Methods to predict function of protein Result and Discussion: These features can discriminate between different functions of proteins that have similar amino-acid sequences. Furthermore, the features can recognize proteins with the same function that do not have similar sequences. Further work: refine the classification rules and integrate the three machine learning techniques.

27 Methods to predict function of protein How to predict protein function more precisely? By three-dimensional structure, because a protein's function is determined more directly by its structure and dynamics than by its sequence.

28 Methods to predict function of protein Two disadvantages of this method. First, three-dimensional structures are available for only a fraction of proteins, although this limitation should be reduced by structural genomics within a few years. Second, functional details that can be extracted from structure but not from sequence often depend on the protein's environment, as well as on its dynamics and energetics, all of which are difficult to obtain by existing experimental and theoretical techniques.

29 Results and Discussion: It is conceivable that prediction of protein functions will be more precise when the above methods are combined. Prediction methods need to be evaluated rigorously and made accessible over the Internet. Varied experimental data and theoretical predictions must be integrated, because no single experimental or computational approach is likely to result in accurate and complete models of protein assemblies and pathways.

30 Results and Discussion System limitations: Several sources of error are not currently addressed in GeneQuiz. False positives: a transfer is made on the basis of a wrongly inferred homology. Inaccurate transfer: the wrong information is transferred although the homology is correct. Misleading database information: the database source is itself misleading.

