
1 HW Clarifications Homology implies shared ancestry Partial sequence identity does not necessarily imply homology A high coverage of sequence identity.

Presentation transcript:

1 HW Clarifications: Identity and Homology
- Homology implies shared ancestry
- Partial sequence identity does not necessarily imply homology
- A high coverage of sequence identity can imply homology

2 HW Clarifications: Insertions and Deletions

3 Prediction of functional/structural sites in a protein using conservation and hyper-variation (ConSeq, ConSurf, Selecton)

4 Empirical findings of conservation variation among sites: functional/structural sites evolve more slowly than nonfunctional/nonstructural sites

5 Conservation = functional/structural importance

6 Histone 3 protein

7 Alignment: pre-pro-insulin

Xenopus MALWMQCLP-LVLVLLFSTPNTEALANQHL
Bos     MALWTRLRPLLALLALWPPPPARAFVNQHL
        **** :  * *.*: *:..* :. : ****

Xenopus CGSHLVEALYLVCGDRGFFYYPKIKRDIEQ
Bos     CGSHLVEALYLVCGERGFFYTPKARREVEG
        **************:***** ** :*::*

Xenopus AQVNGPQDNELDG-MQFQPQEYQKMKRGIV
Bos     PQVG---ALELAGGPGAGGLEGPPQKRGIV
        .**.     ** *       *    *****

Xenopus EQCCHSTCSLFQLENYCN
Bos     EQCCASVCSLYQLENYCN
        **** *.***:*******
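As a rough illustration of how identity can be read off a pairwise alignment like the one above, the sketch below counts identical residues over the aligned, non-gap columns. The function name `percent_identity` and the choice to skip gapped columns are assumptions made for this example; other definitions of identity (for instance, dividing by the full alignment length) are also in use.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # assumption: gapped columns are not counted
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Last block of the pre-pro-insulin alignment from the slide.
xenopus_block = "EQCCHSTCSLFQLENYCN"
bos_block = "EQCCASVCSLYQLENYCN"
print(round(percent_identity(xenopus_block, bos_block), 1))
```

The same function can be applied to the full concatenated alignment rows; the gap-handling convention should be stated whenever an identity figure is reported.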


10 Conservation based inference
Conserved sites:
- Important for the function or structure
- Not allowed to mutate
- "Slow evolving" sites: low rate of evolution
Variable sites:
- Less important (usually)
- Change more easily
- "Fast evolving" sites: high rate of evolution

11 Detecting conservation: evolutionary rates
Rate = distance/time
Distance (d) = number of substitutions per site
Time = 2 * #years (doubled because the two sequences evolved independently since their common ancestor)
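The rate formula on this slide can be written out as a small calculation. This is a minimal sketch: the function name and the example numbers (0.2 substitutions per site, 100 million years since divergence) are hypothetical values chosen only to show the arithmetic.

```python
def evolutionary_rate(distance: float, years_since_divergence: float) -> float:
    """Rate = distance / time, where time = 2 * years since divergence.

    distance: number of substitutions per site between the two sequences.
    The factor of 2 accounts for both lineages evolving independently.
    """
    if years_since_divergence <= 0:
        raise ValueError("years_since_divergence must be positive")
    return distance / (2.0 * years_since_divergence)


# Hypothetical example: d = 0.2 substitutions/site, divergence 100 million years ago.
rate = evolutionary_rate(0.2, 100e6)
print(rate)  # substitutions per site per year
```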

12 Rate computation (diagram labels: MSA, Phylogeny, Evolutionary Model)

Site           1234567
Human          DMAAHAM
Chimp          DEAAGGC
Cow            DQAAWAP
Fish           DLAACAL
S. cerevisiae  DDGAFAA
S. pombe       DDGALGE
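To get a feel for which columns of the example MSA are conserved, the sketch below counts the distinct residues in each column. This is only a naive proxy introduced for illustration: it ignores the phylogeny and the evolutionary model that the slide lists alongside the MSA, so it is not the rate computation itself.

```python
# Example MSA from the slide (7 sites, 6 species).
msa = {
    "Human": "DMAAHAM",
    "Chimp": "DEAAGGC",
    "Cow": "DQAAWAP",
    "Fish": "DLAACAL",
    "S. cerevisiae": "DDGAFAA",
    "S. pombe": "DDGALGE",
}


def distinct_residues_per_site(alignment: dict) -> list:
    """Return, for each column, the number of different residues observed."""
    rows = list(alignment.values())
    if len({len(row) for row in rows}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    # zip(*rows) iterates over the columns of the alignment.
    return [len(set(column)) for column in zip(*rows)]


counts = distinct_residues_per_site(msa)
for site, n in enumerate(counts, start=1):
    print(f"site {site}: {n} distinct residue(s)")
```

Sites 1 and 4 show a single residue in every species, while sites 5 and 7 differ in every species; a model-based method would additionally weigh these differences by the tree and the substitution model.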

13 ConSeq (http://conseq.tau.ac.il): site-specific rate computation tool

14 Locating the active site of pyruvate kinase (glycolysis pathway)


18 Conservation scores:
- The scores are standardized: the average score of all residues is 0, and the standard deviation is 1
- Negative values: slowly evolving (= low evolutionary rate), conserved sites. The most conserved site in the protein has the lowest score
- Positive values: rapidly evolving (= high evolutionary rate), variable sites. The most variable site in the protein has the highest score
- Scores are relative to the protein and cannot be compared between different proteins!
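The standardization described on this slide corresponds to a z-score transform of the per-site rates. The sketch below assumes the population standard deviation is used and works on made-up rates; whether the tool uses the population or the sample formula is not stated in the slides.

```python
from statistics import mean, pstdev


def standardize(rates: list) -> list:
    """Shift and scale per-site rates so the mean is 0 and the standard deviation is 1."""
    mu = mean(rates)
    sigma = pstdev(rates)  # assumption: population standard deviation
    if sigma == 0:
        raise ValueError("all rates are identical; scores are undefined")
    return [(r - mu) / sigma for r in rates]


# Hypothetical raw rates for four sites of one protein.
raw_rates = [0.1, 0.5, 1.0, 2.4]
scores = standardize(raw_rates)
most_conserved_site = scores.index(min(scores)) + 1  # lowest score
most_variable_site = scores.index(max(scores)) + 1   # highest score
print(scores, most_conserved_site, most_variable_site)
```

Because the mean and standard deviation are computed per protein, the same raw rate can map to different scores in different proteins, which is why the scores are not comparable across proteins.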


20 SWISS-PROT

21 Combining protein structure
- Each protein has a particular 3D structure that determines its function
- Protein structure is better conserved than protein sequence and more closely related to function
- Analyzing a protein structure is more informative than analyzing its sequence for function inference

22 Conservation in the structure
- Protein core: structurally constrained, usually conserved
- Active site: functionally constrained, usually conserved
- Surface: tolerant to mutations, usually variable

23 ConSurf (http://consurf.tau.ac.il): same algorithm as ConSeq, but here the results are projected onto the 3D structure of the protein

24 The structure-function of the potassium channel (figure labels: transmembrane region, cytoplasm)


29 ConSeq/ConSurf user intervention (advanced options)
1. Choosing the method for calculating the amino-acid conservation scores: (Bayesian/Maximum Likelihood)
2. Entering your own MSA file
3. Performing the MSA using: (MUSCLE/CLUSTALW)
4. Collecting the homologs from: (SWISS-PROT/UniProt)
5. Max. number of homologs: (50)
6. No. of PSI-BLAST iterations: (1)
7. PSI-BLAST E-value cutoff: (0.001)
8. Model of substitution for proteins: (JTT/Dayhoff/mtREV/cpREV/WAG)
9. Entering your own PDB file
10. Entering your own TREE file

30 Codon-level selection
ConSeq/ConSurf:
- Compute the evolutionary rate of amino-acid sites → the data are amino acids
- Compute only the rate of non-synonymous substitutions
Examples:
UUU → UUC (Phe → Phe): synonymous
UUU → CUU (Phe → Leu): non-synonymous
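The two codon examples on this slide can be checked mechanically by translating both codons and comparing the amino acids. The sketch below builds the standard genetic code from its usual compact string (bases ordered U, C, A, G); the function name `classify_substitution` is an assumption for this example.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons enumerated with bases in U, C, A, G order; '*' marks a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(codon_before: str, codon_after: str) -> str:
    """Label a codon change as 'synonymous' or 'non-synonymous'."""
    before = codon_before.upper().replace("T", "U")
    after = codon_after.upper().replace("T", "U")
    if CODON_TABLE[before] == CODON_TABLE[after]:
        return "synonymous"
    return "non-synonymous"


print(classify_substitution("UUU", "UUC"))  # Phe -> Phe
print(classify_substitution("UUU", "CUU"))  # Phe -> Leu
```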

31 Synonymous vs. non-synonymous substitutions
For most proteins, the rate of synonymous substitutions is much higher than the non-synonymous rate. This is called purifying selection (= conservation in ConSeq/ConSurf).

32 Synonymous vs. non-synonymous substitutions
There are rare cases where the non-synonymous rate is much higher than the synonymous rate. This is called positive (Darwinian) selection.

33 Positive Selection
The hypothesis: promotes the fitness of the organism
Examples:
- Pathogen proteins evading the host immune system
- Proteins of the immune system detecting pathogen proteins
- Pathogen proteins that are drug targets
- Proteins that are products of gene duplication
- Proteins involved in the reproductive system

34 Computing synonymous and non-synonymous rates (diagram labels: Evolutionary Model, Codon MSA, Phylogeny)

35 Inferring positive selection
Look at the ratio between the non-synonymous rate (Ka) and the synonymous rate (Ks)

36 Inferring positive selection
Ka/Ks < 1: purifying selection
Ka/Ks > 1: positive selection
Ka/Ks = 1: no selection (neutral)
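The three Ka/Ks cases can be expressed as a small classifier. This is a sketch only: the function name and the `tolerance` parameter (used because an estimated ratio is essentially never exactly 1) are assumptions for illustration, and a real analysis would rely on a statistical test rather than on the raw ratio.

```python
def classify_selection(ka: float, ks: float, tolerance: float = 1e-9) -> str:
    """Interpret the Ka/Ks ratio for a site or gene.

    ka: non-synonymous rate, ks: synonymous rate.
    tolerance: hypothetical width around 1 that is treated as neutral.
    """
    if ks <= 0:
        raise ValueError("Ks must be positive to form the ratio")
    ratio = ka / ks
    if abs(ratio - 1.0) <= tolerance:
        return "neutral"
    return "positive" if ratio > 1.0 else "purifying"


print(classify_selection(0.1, 0.5))
print(classify_selection(0.9, 0.3))
print(classify_selection(0.2, 0.2))
```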

37 Maybe there's no positive selection after all? (diagram labels: Evolutionary Model, Codon MSA, Phylogeny)
- Our evolutionary model assumes there is positive selection in the data
- By chance alone we expect our model to find a few sites with Ka/Ks > 1
- Is this really indicative of positive selection or plain randomness?

38 Solution: statistically compare between hypotheses
- H0: There's no positive selection
- H1: There is positive selection
- H0: compute the probability (likelihood) of the data using a model that does not account for positive selection
- H1: compute the probability (likelihood) of the data using a model that does account for positive selection
- Perform a statistical test to accept or reject H0 (likelihood ratio test)
- P-value > 0.05: accept H0; P-value < 0.05: reject H0
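A likelihood ratio test of this kind can be sketched as follows. The test statistic is twice the difference in log-likelihood between the two models, and the sketch assumes it is compared to a chi-square distribution with 1 or 2 degrees of freedom (closed-form tail probabilities exist for both). The degrees of freedom depend on which pair of models is compared, and the log-likelihood values below are hypothetical.

```python
import math


def lrt_p_value(lnl_h0: float, lnl_h1: float, df: int = 1) -> float:
    """P-value of a likelihood ratio test under a chi-square approximation.

    lnl_h0: log-likelihood of the model without positive selection.
    lnl_h1: log-likelihood of the model that allows positive selection.
    df: degrees of freedom (assumption: only 1 or 2 supported here).
    """
    statistic = max(2.0 * (lnl_h1 - lnl_h0), 0.0)
    if df == 1:
        # Chi-square(1) survival function: erfc(sqrt(x / 2)).
        return math.erfc(math.sqrt(statistic / 2.0))
    if df == 2:
        # Chi-square(2) survival function: exp(-x / 2).
        return math.exp(-statistic / 2.0)
    raise ValueError("this sketch only supports df = 1 or df = 2")


# Hypothetical log-likelihoods for the two models.
p = lrt_p_value(lnl_h0=-2050.0, lnl_h1=-2040.0, df=2)
print("reject H0" if p < 0.05 else "accept H0", p)
```

With these made-up numbers the statistic is 20, so H0 would be rejected at the 0.05 level; with nearly equal log-likelihoods the p-value approaches 1 and H0 is kept.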

39 Note: saturation of synonymous substitutions (figure labels: Syn., Nonsyn.)
Human and wheat are too evolutionarily remote: the synonymous substitutions are saturated. Pick closer sequences for positive selection analysis.

40 Selecton: http://selecton.tau.ac.il

41 Selecton input: codon-level sequences!
- Coding sequences, only ORFs
- No stop codons
- If an MSA is provided it must be codon aligned (RevTrans)
- The user must provide the sequences: no PSI-BLAST option
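Before submitting sequences, the first two input requirements can be pre-checked locally. The sketch below is a hypothetical helper, not part of Selecton: it reports a sequence whose length is not a multiple of three and any stop codon found in frame.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def check_coding_sequence(seq: str) -> list:
    """Return a list of problems found in a nucleotide coding sequence (empty if none)."""
    seq = seq.upper().replace("U", "T")
    problems = []
    if len(seq) % 3 != 0:
        problems.append("length is not a multiple of 3")
    # Walk the sequence codon by codon, ignoring any trailing partial codon.
    for start in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[start:start + 3]
        if codon in STOP_CODONS:
            problems.append(f"stop codon {codon} at codon {start // 3 + 1}")
    return problems


print(check_coding_sequence("ATGGCTTGG"))
print(check_coding_sequence("ATGTAAGCT"))
```

A terminal stop codon is reported like any other, so it would need to be trimmed from each ORF before the sequences are used.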

42 Positive selection in the primate TRIM5α

43 Primate TRIM5α
- The TRIM5α proteins from humans, rhesus monkeys, and African green monkeys are all unable to restrict retroviruses isolated from their own species, yet are able to restrict retroviruses from the other species
- TRIM5α is an important natural barrier to cross-species retrovirus transmission
- TRIM5α is in an antagonistic conflict with the retroviral capsid proteins
- TRIM5α is under positive selection

44 Positive selection analysis

45 Positive selection analysis in Selecton: H0, H1

46 Comparing H0 and H1 in Selecton

47 Comparing H0 and H1 in Selecton


49 Selecton results:


51 Results: human/rhesus swaps at sites 332, 335-340 (SPRY) significantly elevate human resistance to HIV and rhesus resistance to SIV

