Function, evolution, motifs and hierarchy Ashwin Sivakumar.


1 Function, evolution, motifs and hierarchy Ashwin Sivakumar

2 Today’s outline… A fairly short presentation. Quick wrap-up of what we learnt yesterday. Some basic methodological and biological background for today’s practice session. After a short break, we do the “real” stuff: PRACTICALS. As before, I practice along with you, and we work together to script a nice ‘story’, building upon what we scripted yesterday. If all goes as planned, we will finish early with a good take-home message.

3 Cement, bricks, and accessories like windows, doors and roofs represent functions of the house/apartment at different levels. Sequence motifs/signatures/patterns represent different levels of function at the four levels of structural hierarchy.

4 What is function? Function, like structure, is hierarchical. -Molecular functions (metabolic reactions, fit into structural associates) -Post-translational modifications (e.g. glycosylation sites) -Phenotype (physiological sub-systems and influence of environmental factors) (phenotype property/disease). -Physiological function (set of proteins) (metabolic pathway, signal transduction). Change in function can broadly be a change in biochemistry, structure, gene network or phenotype.

5 Inference of function: hypothesis and assumptions. Since a large selection intensity S at an amino acid residue implies functional importance (low evolutionary rate), and vice versa (Kimura, 1983), a site-specific change in evolutionary rate (or selection intensity S) can naturally be interpreted as a ‘change of functional importance’.

6 Function of a domain is not function of a sequence. Repercussions on public databases: The annotations in publicly available databases can be erroneous because: a) The annotations are made at the submitter’s discretion. At times the annotation is that of a domain; in other cases it is that of the whole sequence. b) Thus homology-based function assignment through public databases might propagate errors. c) Sequence similarity does not mean functional similarity.

7 The assumptions. Even limited sequence identity (~20%) might be enough to place unknown proteins into enzyme super-families for which the catalytic strategy is known. Functional importance is directly proportional to evolutionary conservation, F ≈ E. Thus ΔF = ΔE. There are two types of change that can occur in evolutionary conservation (ΔE): TYPE I: change in evolutionary constraints (evolutionary rate), ΔS ≠ 0. TYPE II: change in amino acid properties, e.g. +ve vs –ve charge, ΔA ≠ 0.

8 Cont… Change in function can broadly be a change in biochemistry, structure, gene network or phenotype. Functional divergence at a residue can be of two types: TYPE I involves a site-specific rate difference (a residue is conserved in one sub-family and variable in the other) (DIVERGE). TYPE II involves a site-specific amino acid type difference (positive vs negative charges) (SequenceSpace/NMF/Evol.trace etc.)
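
The Type I / Type II distinction can be illustrated with a small Python sketch. This is not how DIVERGE or the other tools named above work internally; it assumes a hypothetical 90% conservation threshold and a deliberately coarse, charge-only property table, and it simply labels one alignment column given the sequences of two sub-families.

```python
from collections import Counter

# Hypothetical, deliberately coarse property table (charge only), for illustration.
CHARGE = {"K": "+", "R": "+", "H": "+", "D": "-", "E": "-"}


def consensus(residues):
    """Return (most common residue, its frequency) for one sub-family column."""
    residue, count = Counter(residues).most_common(1)[0]
    return residue, count / len(residues)


def classify_site(group_a, group_b, i, threshold=0.9):
    """Label alignment column i as 'type I', 'type II' or 'none'.

    group_a and group_b are lists of aligned sequences of equal length.
    The threshold is an assumed cut-off for calling a column conserved.
    """
    res_a, freq_a = consensus([s[i] for s in group_a])
    res_b, freq_b = consensus([s[i] for s in group_b])
    cons_a, cons_b = freq_a >= threshold, freq_b >= threshold
    if cons_a != cons_b:
        # Conserved in one sub-family, variable in the other.
        return "type I"
    if cons_a and cons_b and CHARGE.get(res_a, "0") != CHARGE.get(res_b, "0"):
        # Conserved in both, but with a different charge class.
        return "type II"
    return "none"


family_a = ["MKAD", "MKGD", "MKSD"]
family_b = ["MDAE", "MDAE", "MDAE"]
labels = [classify_site(family_a, family_b, i) for i in range(4)]
print(labels)
```

In this toy alignment the second column switches from a conserved lysine to a conserved aspartate, and the third column is variable in one family only.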

9 Enzyme super-families Most super-families adopt common catalytic strategies S: substrate, P: product

10 Specificity determining residues (SDPpred). The most suitable method depends on the test set. SDPpred: for prediction of residues in protein sequences that determine functional differences between proteins having the same general biochemical function. Basis: amino acid residues that determine differences in protein functional specificity, and account for correct recognition of interaction partners, are usually thought to correspond to those positions of a protein multiple alignment where the distribution of amino acids is closely associated with the grouping of proteins by specificity. SDPpred can analyze alignments of up to 2000 positions, containing at most 1000 proteins. There can be up to 1000 specificity groups. The predicted SDPs are mapped on to the multiple alignment of the family.
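
One way to make "closely associated with the grouping by specificity" concrete is to score each alignment column by the mutual information between its residues and the specificity-group labels. The sketch below is only an illustration of that idea, not the SDPpred implementation; the function names and the toy alignment are hypothetical, and no significance test is applied.

```python
import math
from collections import Counter


def mutual_information(column, groups):
    """Mutual information (bits) between residues in a column and group labels."""
    n = len(column)
    res_counts = Counter(column)
    grp_counts = Counter(groups)
    joint_counts = Counter(zip(column, groups))
    mi = 0.0
    for (res, grp), count in joint_counts.items():
        p_xy = count / n
        p_x = res_counts[res] / n
        p_y = grp_counts[grp] / n
        mi += p_xy * math.log2(p_xy / (p_x * p_y))
    return mi


def score_columns(alignment, groups):
    """Score every column of an alignment (list of equal-length strings)."""
    length = len(alignment[0])
    return [mutual_information([s[i] for s in alignment], groups)
            for i in range(length)]


# Toy data: two specificity groups, "a" and "b".
alignment = ["AKL", "AKV", "ADL", "ADV"]
groups = ["a", "a", "b", "b"]
scores = score_columns(alignment, groups)
print(scores)
```

Columns whose residues follow the grouping exactly get the highest score; fully conserved columns and columns that vary independently of the grouping score zero.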

11 PHYLOGENY

12 When would phylogeny work? Provided your sequences share reasonable homology and similarity: a) Place the query sequence in its respective family (e.g. based on ADDA). b) Get a reliable and consistent multiple sequence alignment, usually using progressive alignment, which is best suited for tree building and making phylogenetic inferences. c) Adjust your alignment manually. When it comes to phylogeny, there is no strict definition of a "good" alignment. d) Choose an appropriate phylogenetic method.

13 There are a number of phylogenetic packages… Clustal W/X (quick and dirty tree) MEGA (Integrated package with an intuitive interface) Phylip (Arguably the most popular phylogenetic tool) Alibee (automated improvement over clustal alignment and subsequent tree building) PAUP # Beast and MrBayes (Bayesian inference of phylogeny) # Bete

14 Some basics… Most phylogenetic methods assume that each position in a sequence can change independently of the other positions. Gaps in alignments represent mutations in sequences such as insertions, deletions and genetic rearrangements. Gaps are treated in various ways by the phylogenetic methods. Most of them ignore gaps.

15 METHODS (Max. Likelihood) Maximum likelihood: In this method, the bases (nucleotides or amino acids) of all sequences at each site are considered separately (as independent), and the log-likelihood of observing these bases is computed for a given topology using a particular probability model. The log-likelihoods are summed over all sites, and the sum is maximized to estimate the branch lengths of the tree. This procedure is repeated for all possible topologies, and the topology with the highest likelihood is chosen as the final tree. Notes: ML estimates the branch lengths of the final tree; ML methods are usually consistent; ML can be extended to allow different rates for transitions and transversions. Drawback: a long computation time is needed to construct a tree.
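
The "sum of per-site log-likelihoods, maximized over branch length" step can be sketched for the simplest possible case: two aligned nucleotide sequences, for which there is only one topology, under the Jukes-Cantor model. The choice of model, the grid search and the function names are assumptions made for illustration; real ML packages optimize over many branches and topologies.

```python
import math


def jc_site_log_likelihood(x, y, d):
    """Log-likelihood of one site under Jukes-Cantor at distance d (subs/site)."""
    e = math.exp(-4.0 * d / 3.0)
    # Probability that base x ends up as base y after distance d.
    p = 0.25 + 0.75 * e if x == y else 0.25 - 0.25 * e
    # 0.25 is the equilibrium frequency of the starting base.
    return math.log(0.25 * p)


def log_likelihood(seq1, seq2, d):
    """Sites are treated as independent, so per-site log-likelihoods add up."""
    return sum(jc_site_log_likelihood(x, y, d) for x, y in zip(seq1, seq2))


def best_branch_length(seq1, seq2, step=0.001, max_d=2.0):
    """Grid search for the distance that maximizes the summed log-likelihood."""
    candidates = [step * k for k in range(1, int(max_d / step) + 1)]
    return max(candidates, key=lambda d: log_likelihood(seq1, seq2, d))


seq1 = "ACGTACGTAC"
seq2 = "ACGTACGTTT"  # differs from seq1 at 2 of 10 sites
d_hat = best_branch_length(seq1, seq2)
print(round(d_hat, 3))
```

For this model the maximum can also be written in closed form, d = -3/4 ln(1 - 4p/3) with p the fraction of differing sites, which gives a convenient check on the grid search.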

16 METHODS (Maximum Parsimony) Parsimony criterion: it consists of determining the minimum number of changes (substitutions) required to transform a sequence into its nearest neighbor. Maximum parsimony: the maximum parsimony algorithm searches for the minimum number of genetic events (nucleotide substitutions or amino acid changes) to infer the most parsimonious tree from a set of sequences. The best tree is the one which needs the fewest changes. Problems: within practical computational limits, this often leads to the generation of tens or more "equally most parsimonious trees", which makes it difficult to justify the choice of a particular tree; long computation time to construct a tree.
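
Counting the minimum number of changes on a given tree can be done with Fitch's algorithm. The sketch below scores a fixed rooted binary tree, written as nested tuples of leaf names; that tree representation and the toy alignment are assumptions for illustration, and the search over topologies that a real parsimony program performs is left out.

```python
def fitch_site_score(tree, states):
    """Minimum number of changes at one site on a rooted binary tree.

    tree is a nested tuple of leaf names, e.g. (("A", "B"), ("C", "D")).
    states maps each leaf name to its character at this site.
    """
    changes = 0

    def visit(node):
        nonlocal changes
        if isinstance(node, str):
            return {states[node]}
        left, right = (visit(child) for child in node)
        common = left & right
        if common:
            return common
        # No shared state: one substitution is needed on this split.
        changes += 1
        return left | right

    visit(tree)
    return changes


def parsimony_score(tree, alignment):
    """Sum the Fitch score over all columns of a name -> sequence alignment."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch_site_score(tree, {name: seq[i] for name, seq in alignment.items()})
        for i in range(length)
    )


alignment = {"A": "AAG", "B": "AAG", "C": "CAT", "D": "CAG"}
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))
print(parsimony_score(tree_1, alignment), parsimony_score(tree_2, alignment))
```

The tree that groups A with B and C with D needs fewer changes for this toy alignment, so it would be preferred under the parsimony criterion.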

17 METHODS (Distance matrix) Distance matrix methods (UPGMA, NJ, Fitch...) convert sequence data into a set of discrete pairwise distance values, arranged into a matrix. Distance methods fit a tree to this matrix. The tree topology is constructed using a cluster analysis method (like UPGMA or NJ). The phylogeny estimates the distance for each pair as the sum of branch lengths in the path from one sequence to the other through the tree. Advantages: easy to perform; quick calculation; fit for sequences having high similarity scores. Drawbacks: the sequences are not considered as such (loss of information); all sites are generally treated equally (differences in substitution rates are not taken into account); not applicable to distantly divergent sequences.
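
The two steps described above, converting sequences to a pairwise distance matrix and then clustering, can be sketched as follows. The sketch uses the simple p-distance (fraction of differing sites) and UPGMA, returns only the topology as nested tuples, and omits branch lengths; the sequence names and data are made up for illustration.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing positions, skipping columns with a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in pairs) / len(pairs)


def upgma(seqs):
    """Cluster a name -> aligned sequence dict with UPGMA; return nested tuples."""
    clusters = {name: 1 for name in seqs}  # cluster label -> number of leaves
    dist = {(a, b): p_distance(seqs[a], seqs[b])
            for a, b in combinations(seqs, 2)}

    def d(x, y):
        return dist[(x, y)] if (x, y) in dist else dist[(y, x)]

    while len(clusters) > 1:
        # Merge the closest pair of current clusters.
        a, b = min(combinations(clusters, 2), key=lambda pair: d(*pair))
        merged = (a, b)
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        for c in clusters:
            # Size-weighted average keeps the distance an average over leaves.
            dist[(merged, c)] = (size_a * d(a, c) + size_b * d(b, c)) / (size_a + size_b)
        clusters[merged] = size_a + size_b
    return next(iter(clusters))


seqs = {
    "A": "ACGTACGT",
    "B": "ACGTACGA",
    "C": "TTGTACCA",
    "D": "TTGTACCT",
}
print(upgma(seqs))
```

Note that each sequence pair is reduced to a single number before clustering, which is exactly the loss of information listed among the drawbacks.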

18 Levels of functional annotation Gene product function Domain annotation for sequences Evolutionary hypothesis through similarity Functional motifs/signatures Sub-cellular localization

19 Sub-cellular localization (cont…) Intracellular, extracellular, membrane related… Proteins that sit on the inner or outer surface of the membrane are called extrinsic or peripheral, and have a large percentage of hydrophobic amino acids in the portion of the molecule that is close to the hydrophobic membrane structure.

20 The eukaryotic cell Image source: http://ridge.icu.ac.jp/gen-ed/cell-lect-gifs/04-eucaryote-plant-cell.GIF Top left: plant cell Top right: animal cell Bottom left: prokaryotes

21 Sub-cellular localization of proteins. Subcellular localization is a key functional characteristic of proteins. To perform a common physiological role, proteins must be localized in the same cellular compartment. (plasma membrane…extracellular…cytoplasmic…mitochondrial…chloroplast…endoplasm…peroxisomal…)

22 Methods for predicting sub-cellular locations. Homology-based assignments: growing sequence data, thus impractical as well as error-prone. Artificial learning combining amino acid properties/composition and sequence signals. Applications in a biological context: for example, in a search for virulence factors of pathogenic bacteria, or for easily accessible entry points for pharmaceutical drugs, extracellular proteins are good candidates, while proteins at other subcellular locations may not be considered for such purposes at the beginning.

23 TargetP. Olof Emanuelsson, Henrik Nielsen, Søren Brunak, and Gunnar von Heijne. Neural network based artificial classifier for sub-cellular location in eukaryotes. The further from the N-terminus the sequence starts, the less reliable the predictions are.

24 ProtComp/ProtCompB. Combination of a number of methods: neural network-based prediction; direct comparison with an updated database of homologous proteins of known localization; comparison of tetramer distributions calculated for query and DB sequences; prediction of certain functional peptide sequences, such as signal peptides, signal-anchors, GPI-anchors, transit peptides of mitochondria and chloroplasts, and transmembrane segments; and search for certain localization-specific motifs.

25 Sub-cellular locations: Bacterial compartments. Cytoplasmic, Membrane, Periplasmic and Extracellular (Secreted). Please note: there are separate server pages for a) Animals/fungi, b) Plants, c) Bacteria.

26

27

28 Interpretation. Prediction accuracy will depend on various factors: biological environment of compartments, known homologues, strength of signals/patterns etc. Nucleus: 91%; Plasma membrane: 100%; Extra-cellular: 86%; Cytoplasm: 88%; Mitochondria: 89%; Endoplasmic reticulum: 89%; Peroxisome: 91%; Lysosome: 100%; Golgi bodies: 91%.

29 Interpretation (cont…) The first check would of course be the prediction reliability statistics of the various compartments (previous slide). Terminologies: ProtlocDB, LocDB, NeuralNets, Tetramers, Integral.

30 Making sense out of the numbers. Neural net statistics are based on preferential weights, so they should be looked at seriously if there are no statistical pointers from the other three sources. If both neural network and homology predictions point to the same compartment, this is a very reliable prediction. In the case of NN, the predictions are more reliable if the second-best hit gets a much lower weight than the one with the highest probability. NN should be the last option, used when there is no clear picture from Integral and the other homology statistics.

31 Interpretation cont… Rules of thumb: First look for supporting evidence from LocDB. This is the strongest evidence. Otherwise, look at the Integral support statistics. If the Integral statistics are conflicting, look at the information from ProtLocDB. In the absence of other evidence, you can use the weighted statistics from NN to make a hypothesis. If NN and the other sources point towards the same compartment, it is obviously very strong evidence.
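
The order of precedence in these rules of thumb can be written down as a small decision function. This is only a sketch of the reasoning: the function name, its arguments and the returned labels are hypothetical, it does not parse real ProtComp output, and the caller is assumed to pass None for any source that gives no clear or a conflicting answer.

```python
def pick_localization(locdb=None, integral=None, protlocdb=None, nn=None):
    """Return (compartment, source, strength) following the rules of thumb.

    Each argument is the compartment suggested by that evidence source,
    or None when the source is silent or internally conflicting.
    """
    ranked_sources = (
        ("LocDB", locdb),          # strongest evidence
        ("Integral", integral),    # next best
        ("ProtLocDB", protlocdb),  # used when Integral is conflicting
    )
    for source, call in ranked_sources:
        if call is not None:
            # Agreement with the neural nets makes the call very strong.
            strength = "very strong" if nn == call else "supported"
            return call, source, strength
    if nn is not None:
        # Neural nets alone only justify a working hypothesis.
        return nn, "NN", "hypothesis only"
    return None, None, "no prediction"


print(pick_localization(locdb="Nucleus", nn="Nucleus"))
print(pick_localization(integral=None, protlocdb="Cytoplasm", nn="Nucleus"))
print(pick_localization(nn="Mitochondria"))
```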

32

33 We recommend adjusting your alignment, so that a reference sequence (a query sequence) would have no gaps or deletions in the original alignment file. Some editing, in particular, removing sequences with gaps, removing unknown residues, removing redundant sequences can be done using the SRP server using the "Filter your alignment" page. We recommend removing sequences with more than 10-20% gaps. We also recommend removing sequences with similarity of 90-95% or higher to other sequences in the alignment. Sometimes in the alignment, the reference sequence may have long N- and C- termini with gappy columns. One can remove these gappy columns first by using the "remove column" button, and then adjust the rest of the alignment removing gappy and redundant sequences http://consurf.tau.ac.il/results/1164197646/index.htm
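
The filtering recommendations above can be approximated offline with a few lines of Python. This is a sketch, not the server's "Filter your alignment" implementation: it assumes the alignment is already loaded as a name-to-sequence dictionary, uses a 20% gap cut-off and a 95% cut-off from the recommended ranges, and uses plain sequence identity as a stand-in for the similarity measure.

```python
def gap_fraction(seq):
    """Fraction of gap characters in one aligned sequence."""
    return seq.count("-") / len(seq)


def identity(a, b):
    """Fraction of identical residues over columns where neither has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def filter_alignment(records, reference, max_gaps=0.2, max_identity=0.95):
    """Drop gappy and redundant sequences; the reference is always kept.

    records maps sequence names to aligned sequences of equal length.
    """
    kept = {reference: records[reference]}
    for name, seq in records.items():
        if name == reference:
            continue
        if gap_fraction(seq) > max_gaps:
            continue  # too many gaps
        if any(identity(seq, other) >= max_identity for other in kept.values()):
            continue  # redundant with a sequence already kept
        kept[name] = seq
    return kept


records = {
    "query": "ACDEFGHIKL",
    "dup": "ACDEFGHIKL",    # identical to the query
    "gappy": "AC----HIKL",  # 40% gaps
    "ok": "ACDQFGHLKV",     # 70% identical to the query
}
filtered = filter_alignment(records, reference="query")
print(list(filtered))
```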

