1
Chap. 5 AA & Scoring Matrix
2
DNA Bases
[Table: the ambiguity codes S, R, Y, W, M, K, B, N, H, V, D, each defined over the bases A, T, G, C]
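The slide's table did not survive extraction, so the following is a minimal sketch of the standard IUPAC nucleotide ambiguity codes it appears to list. The helper names `matches` and `expand` are illustrative and not taken from the slides.

```python
from itertools import product

# IUPAC nucleotide codes: each symbol stands for a set of possible bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",    # puRine
    "Y": "CT",    # pYrimidine
    "S": "CG",    # Strong pairing (three hydrogen bonds)
    "W": "AT",    # Weak pairing (two hydrogen bonds)
    "K": "GT",    # Keto
    "M": "AC",    # aMino
    "B": "CGT",   # not A
    "D": "AGT",   # not C
    "H": "ACT",   # not G
    "V": "ACG",   # not T
    "N": "ACGT",  # aNy base
}

def matches(symbol, base):
    """True if the concrete base is allowed by the ambiguity symbol."""
    return base.upper() in IUPAC[symbol.upper()]

def expand(pattern):
    """List every concrete sequence an ambiguous pattern can stand for."""
    choices = (IUPAC[s] for s in pattern.upper())
    return ["".join(p) for p in product(*choices)]

print(expand("RY"))  # purine followed by pyrimidine
```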
3
Models of nucleotide substitution
[Diagram: transitions connect the two purines (A, G) or the two pyrimidines (T, C); transversions connect a purine to a pyrimidine]
4
Jukes-Cantor (JC) Kimura 2P Kimura
5
Point Mutation (Substitution)
Point mutation: the simplest form of mutation; occurs all over DNA sequences
  Transition: mutation within the purines (A, G) or within the pyrimidines (C, T/U)
  Transversion: mutation between the two nucleotide groups
Effects depend on where mutations occur
  Non-coding region: no effect on proteins, and neutral
    But may have significant effects if occurring in a control region
  Coding region
    Synonymous substitution: the mutation does not change the AA
    Non-synonymous substitution: the AA is replaced by another, or a stop codon is introduced
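The two classifications on this slide can be sketched as follows. This is only an illustration: it assumes the standard genetic code (written in the compact T, C, A, G ordering), DNA codons for sense amino acids as input, and function names that are not from the slides.

```python
from itertools import product

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

# Standard genetic code, codons enumerated in T, C, A, G order; '*' is a stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def substitution_type(old, new):
    """Classify a single-base change as a transition or a transversion."""
    if old == new:
        raise ValueError("no substitution")
    pair = {old, new}
    same_group = pair <= PURINES or pair <= PYRIMIDINES
    return "transition" if same_group else "transversion"

def coding_effect(codon, position, new_base):
    """Effect of changing one base (0-based position) of a sense codon."""
    mutated = codon[:position] + new_base + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutated]
    if before == after:
        return "synonymous"
    if after == "*":
        return "non-synonymous (stop codon introduced)"
    return "non-synonymous (AA replaced)"

print(substitution_type("A", "G"), coding_effect("GAA", 2, "G"))
```

For example, GAA to GAG is a transition that leaves Glu unchanged, while TAT to TAA turns a Tyr codon into a stop codon.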
6
Other Mutations Indel mutation
Small indels of a single base or a few bases are frequent
  Particularly frequent in repeated sequences
  GCGC…: slight slippage can insert an extra GC or delete one
  The CAG repeat region in the gene for the huntingtin protein can expand, causing Huntington’s disease
  Indels cause a frame shift if their length is not a multiple of three
Gene inversion
  Whole genes are copied to offspring in the reverse direction
Translocation
  Whole genes can be deleted from one genome and inserted into another
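A small sketch of the two indel points above: whether an indel shifts the reading frame, and how long a tandem repeat such as CAG runs in a sequence. The function names and the example sequence are made up for illustration.

```python
import re

def causes_frameshift(indel_length):
    """An indel shifts the reading frame unless its length is a multiple of 3."""
    return indel_length % 3 != 0

def longest_tandem_repeat(seq, unit="CAG"):
    """Number of copies in the longest uninterrupted run of `unit`."""
    runs = re.finditer(f"(?:{re.escape(unit)})+", seq)
    return max((len(m.group(0)) // len(unit) for m in runs), default=0)

# Hypothetical sequence with three consecutive CAG copies.
example = "ATGCAGCAGCAGTT"
print(causes_frameshift(2), causes_frameshift(3), longest_tandem_repeat(example))
```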
7
Amino Acids General structure of amino acids
an amino group
a carboxyl group
an α-carbon bonded to a hydrogen and a side-chain group, R
  R determines the identity of the particular amino acid
Colors in the figure: R: large white and gray; C: black; Nitrogen: blue; Oxygen: red; Hydrogen: white
9
AA Groups
Classification of R groups
  Polar/nonpolar
  Acidic/basic
Polar
  Polar bonds share electrons unequally
  O-H bond is polar: O is more electronegative, so the bonding electrons are closer to O
  C-H bond is nonpolar
Acidic/basic

Element      Electronegativity
Oxygen       3.5
Nitrogen     3.0
Sulfur       2.6
Carbon       2.5
Phosphorus   2.2
Hydrogen     2.1
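The O-H versus C-H comparison can be reproduced from the electronegativity values on this slide. The cutoff of 0.5 used below to call a bond polar is an assumption chosen for illustration; the slide gives no threshold.

```python
# Electronegativity values as listed on the slide.
ELECTRONEGATIVITY = {"O": 3.5, "N": 3.0, "S": 2.6, "C": 2.5, "P": 2.2, "H": 2.1}

# Assumed cutoff (not from the slide): differences at or above this count as polar.
POLAR_THRESHOLD = 0.5

def electronegativity_difference(a, b):
    """Absolute electronegativity difference between two bonded elements."""
    return round(abs(ELECTRONEGATIVITY[a] - ELECTRONEGATIVITY[b]), 2)

def is_polar_bond(a, b, threshold=POLAR_THRESHOLD):
    return electronegativity_difference(a, b) >= threshold

for bond in [("O", "H"), ("N", "H"), ("C", "H")]:
    print(bond, electronegativity_difference(*bond), is_polar_bond(*bond))
```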
10
Group 1: Nonpolar (hydrophobic)
Sometimes, Gly (G) is included because C-H bond is nonpolar
11
Group 2: Polar
Side chains are electrically neutral (uncharged)
Ser (S), Thr (T), Cys (C), Asn (N), Gln (Q), Tyr (Y)
Asn (N) and Gln (Q) are considered derivatives of the group 3 residues Asp (D) and Glu (E)
12
Group 3: Acidic Side chains have carboxyl group Asp (D) and Glu (E)
Side chains are negatively charged
13
Group 4: Basic His (H), Lys (K), Arg (R)
Side chain is positively charged His (H), Lys (K), Arg (R)
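The four groups can be written down as a lookup table. Groups 2 to 4 follow the slides; the nonpolar members are not listed in this transcript, so the set below is an assumption obtained by elimination from the 20 standard amino acids (with Gly included, as the Group 1 slide allows). Counting every basic residue as +1 is the slides' simplification.

```python
GROUPS = {
    "nonpolar": set("AVLIPFWMG"),  # assumed: the residues left over, Gly included
    "polar": set("STCNQY"),        # Group 2
    "acidic": set("DE"),           # Group 3, negatively charged side chains
    "basic": set("HKR"),           # Group 4, positively charged side chains
}

# Reverse lookup: one-letter code -> group name.
GROUP_OF = {aa: name for name, members in GROUPS.items() for aa in members}

SIDE_CHAIN_CHARGE = {"nonpolar": 0, "polar": 0, "acidic": -1, "basic": +1}

def net_side_chain_charge(peptide):
    """Sum of side-chain charges under the slides' simple grouping."""
    return sum(SIDE_CHAIN_CHARGE[GROUP_OF[aa]] for aa in peptide.upper())

print(GROUP_OF["Y"], net_side_chain_charge("KKD"))
```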
14
Physico-Chemical Properties
                          Vol.
Alanine         Ala  A     67
Arginine        Arg  R    148
Asparagine      Asn  N     96
Aspartic acid   Asp  D     91
Cysteine        Cys  C     86
Glutamine       Gln  Q    114
Glutamic acid   Glu  E    109
Glycine         Gly  G     48
Histidine       His  H    118
Isoleucine      Ile  I    124
Leucine         Leu  L
Lysine          Lys  K    135
Methionine      Met  M
Phenylalanine   Phe  F
Proline         Pro  P     90
Serine          Ser  S     73
Threonine       Thr  T     93
Tryptophan      Trp  W    163
Tyrosine        Tyr  Y    141
Valine          Val  V    105
Mean                      109

Physico-chemical properties of AAs determine protein structures
  Bioinformatics can exploit them through pattern recognition
Properties
(1) Size in volume
  The volume occupied by the side groups is important (also for molecular evolution); it is difficult to substitute a large AA for a small one
  The van der Waals radius (the distance at which atoms are pushed to repulsion) is used to measure the volume of the sphere (in Å³)
  W has 3.4 times the volume of G
15
(5) pH of isoelectric point of AA (pI)
(2) Partial Vol.
  Measures the expanded volume in solution when dissolved
(3) Bulkiness
  The ratio of side-chain volume to its length
  Measure of the average cross-sectional area of the side chain
  Relevant to protein folding
(4) Polarity index
  Electrostatic force acting on its surroundings at a distance of 10 Å
(5) pH of isoelectric point of AA (pI)
  Acidic (Asp and Glu) have pI in 2-3: the side chain is negatively charged at neutral pH due to ionization of the COOH group to COO-; they must be put in an acid solution to shift the equilibrium and balance this charge (side chain is charged -)
  Basic (Arg, Lys and His) have pI > 7 (side chain is charged +)
  All others have uncharged side chains (pI in 5-6)
16
(6) Hydrophobicity (7) Surface area (8) Fraction of area
When molecules are dissolved in water, the hydrogen-bonded structure of water is disrupted
  Polar AA residues can form hydrogen bonds with water: hydrophilic
  Non-polar residues cannot form these bonds: hydrophobic
  Polar residues disrupt the structure less than non-polar ones
  Polar residues are usually at the exterior of a structure; non-polar residues, in the interior
  Hydrophobicity (hydropathy) scale: estimate of the difference in free energy of an AA between being buried in the hydrophobic interior of a protein and being in water solution (+ for hydrophobic: it costs free energy to take the residue out of the protein and put it in water)
(7) Surface area
  Surface area of the AA that is exposed (accessible) to water in an unfolded peptide chain and becomes buried when the chain folds
  Relevant to protein folding
(8) Fraction of area
  Fraction of the accessible surface area that is buried in the interior in a set of known crystal structures
  Hydrophobic residues have a larger fraction
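One common use of a hydropathy scale is a sliding-window average along a sequence, where positive stretches suggest buried segments. The sketch below uses Hydro values from this deck's property table for a subset of residues only; the window size and the example peptide are arbitrary choices for illustration.

```python
# Hydro column values from the deck's property table (subset of residues).
HYDRO = {
    "A": 1.8, "R": -4.5, "N": -3.5, "C": 2.5, "G": -0.4, "H": -3.2,
    "I": 4.5, "L": 3.8, "K": -3.9, "S": -0.8, "T": -0.7, "V": 4.2,
}

def mean_hydropathy(peptide):
    """Average hydropathy of a peptide; positive means hydrophobic overall."""
    return sum(HYDRO[aa] for aa in peptide) / len(peptide)

def window_profile(seq, window=3):
    """Mean hydropathy of every window of the given length along the sequence."""
    return [mean_hydropathy(seq[i:i + window])
            for i in range(len(seq) - window + 1)]

# Hypothetical peptide: a hydrophobic stretch followed by a hydrophilic one.
profile = window_profile("ILVRKN", window=3)
print([round(x, 2) for x in profile])
```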
17
Red: acidic; Orange: basic; Green: polar; Yellow: non-polar

Columns, in order: Vol., Bulk, Polarity, pI, Hydro, Surf2, Frac
Rows listing fewer than seven values are missing entries; the surviving values appear in their original order.

Alanine        Ala A: 67, 11.5, 0.0, 6.0, 1.8, 113, 0.74
Arginine       Arg R: 148, 14.3, 52.0, 10.8, -4.5, 241, 0.64
Asparagine     Asn N: 96, 12.3, 3.4, 5.4, -3.5, 158, 0.63
Aspartic acid  Asp D: 91, 11.7, 49.7, 2.8, 151, 0.62
Cysteine       Cys C: 86, 13.5, 1.5, 5.1, 2.5, 140, 0.91
Glutamine      Gln Q: 114, 14.5, 3.5, 5.7, 189
Glutamic acid  Glu E: 109, 13.6, 49.9, 3.2, 183
Glycine        Gly G: 48, -0.4, 85, 0.72
Histidine      His H: 118, 13.7, 51.6, 7.6, -3.2, 194, 0.78
Isoleucine     Ile I: 124, 21.4, 0.1, 4.5, 182, 0.88
Leucine        Leu L: 3.8, 180, 0.85
Lysine         Lys K: 135, 49.5, 9.7, -3.9, 211, 0.52
Methionine     Met M: 16.3, 1.4, 1.9, 204
Phenylalanine  Phe F: 0.4, 5.5, 2.9, 218
Proline        Pro P: 90, 17.4, 1.6, 6.3, -1.6, 143
Serine         Ser S: 73, 9.5, 1.7, -0.8, 122, 0.66
Threonine      Thr T: 93, 15.8, -0.7, 146, 0.70
Tryptophan     Trp W: 163, 21.7, 2.1, 5.9, -0.9, 259
Tyrosine       Tyr Y: 141, 18.0, -1.3, 229, 0.76
Valine         Val V: 105, 21.6, 4.2, 160, 0.86
Mean: 15.4, -0.5, 175
18
“Universal” Genetic Code
19
Genetic Code 2
20
Properties
Purines (A, G) are heavier than pyrimidines (C, T)
  A transition within a type (purines or pyrimidines) is more likely than a transversion between types
All AAs have more than one codon, except for Met and Trp
Codons for an AA are clustered
  Two codons for an AA: same in the first 2 positions, differing only by a transition at the 3rd position
  Four codons: differ only in the 3rd position
  Six codons: form one four-codon box and one two-codon box
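These properties can be checked directly against the standard genetic code. The sketch below assumes the standard code in the compact T, C, A, G ordering; variable and function names are illustrative.

```python
from collections import defaultdict
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Group codons by the amino acid they encode ('*' marks stop codons).
CODONS_OF = defaultdict(list)
for codon, aa in zip(product(BASES, repeat=3), AMINO):
    CODONS_OF[aa].append("".join(codon))

def is_transition(a, b):
    return {a, b} in ({"A", "G"}, {"C", "T"})

# Amino acids with a single codon.
single_codon = sorted(aa for aa, cs in CODONS_OF.items() if len(cs) == 1)

# Two-codon amino acids: same first two bases, transition at the 3rd position.
two_codon_rule_holds = all(
    cs[0][:2] == cs[1][:2] and is_transition(cs[0][2], cs[1][2])
    for aa, cs in CODONS_OF.items() if aa != "*" and len(cs) == 2
)

def box_sizes(aa):
    """Sizes of the codon groups sharing the same first two bases."""
    by_doublet = defaultdict(int)
    for codon in CODONS_OF[aa]:
        by_doublet[codon[:2]] += 1
    return sorted(by_doublet.values())

six_codon = sorted(aa for aa, cs in CODONS_OF.items() if len(cs) == 6)
print(single_codon, two_codon_rule_holds, {aa: box_sizes(aa) for aa in six_codon})
```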
21
Genetic Code
Degeneracy is controlled by the GC content of codons
  G-C binding is stronger
  First two bases (doublet) are both G/C: form four-codon boxes (red X in the figure)
  Doublets are both A/U: split boxes (blue X in the figure)
  Doublets are mixed: four-codon boxes if the 2nd base is a pyrimidine, split otherwise
    A larger purine at the 2nd position reduces binding at the 3rd position
A doublet that forms a four-codon box has a ‘conjugate’ that forms a split box
  Conjugate: opposite size and opposite number of hydrogen bonds; A-C and G-U are conjugates
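The doublet rule and the conjugate rule can both be tested against the standard genetic code. This sketch assumes the standard code written over the RNA alphabet; the function names are illustrative.

```python
from itertools import product

BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
DOUBLETS = ["".join(d) for d in product(BASES, repeat=2)]

def is_four_codon_box(doublet):
    """True if all four codons starting with the doublet encode the same AA."""
    return len({CODE[doublet + third] for third in BASES}) == 1

def predicted_four_codon_box(doublet):
    """The slide's rule, based on the G/C content of the first two bases."""
    strong = sum(base in "GC" for base in doublet)
    if strong == 2:
        return True              # both G/C: four-codon box
    if strong == 0:
        return False             # both A/U: split box
    return doublet[1] in "UC"    # mixed: decided by a pyrimidine at position 2

# Conjugate bases: opposite size and opposite number of hydrogen bonds.
CONJUGATE = {"A": "C", "C": "A", "G": "U", "U": "G"}

def conjugate(doublet):
    return "".join(CONJUGATE[base] for base in doublet)

rule_holds = all(is_four_codon_box(d) == predicted_four_codon_box(d) for d in DOUBLETS)
conjugates_differ = all(is_four_codon_box(d) != is_four_codon_box(conjugate(d))
                        for d in DOUBLETS)
print(rule_holds, conjugates_differ)
```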
22
Genetic Code
Five most hydrophobic: Phe, Leu, Ile, Met, Val
  U at the 2nd position
Three most similar: Leu, Ile, Val
  Single-base mutation at the 1st position
Six most hydrophilic: His, Gln, Asn, Lys, Asp, Glu
  A at the 2nd position (Tyr is hydrophobic and has A in the 2nd position)
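A quick check of these second-position patterns, assuming the standard genetic code over the RNA alphabet (names are illustrative):

```python
from collections import defaultdict
from itertools import product

BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

CODONS_OF = defaultdict(set)
for codon, aa in CODE.items():
    CODONS_OF[aa].add(codon)

def second_position_bases(aa):
    """Set of bases found at the 2nd codon position for an amino acid."""
    return {codon[1] for codon in CODONS_OF[aa]}

HYDROPHOBIC = "FLIMV"    # Phe, Leu, Ile, Met, Val
HYDROPHILIC = "HQNKDE"   # His, Gln, Asn, Lys, Asp, Glu

print({aa: second_position_bases(aa) for aa in HYDROPHOBIC + HYDROPHILIC})

# Leu, Ile and Val are one first-position change apart, e.g. CUU, AUU, GUU.
print([CODE[first + "UU"] for first in "CAG"])
```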
23
Evolution of Genetic Code
From what did the current genetic code become stable? (Robin Knight)
24
AA Substitutions 1978 – Dayhoff, Schwartz, Orcutt
Which AA substitutions are observed to occur when two homologous protein sequences are aligned?
  From aligned sequences of 71 families of closely related proteins (sharing more than 85% sequence identity), 1572 substitutions were tabulated
An AA substitution is accepted by natural selection when
  A gene undergoes a DNA mutation that translates to a different AA and does not significantly alter the gene function
  The entire species adopts the change as a predominant form of the protein
Frequencies can represent expected mutations over short evolutionary distances
  Called PAM (Point Accepted Mutation)
  One PAM unit corresponds to one AA change per 100 residues (1% divergence)
25
Dayhoff counting Most freq. subs.: Glu to Asp (both acidic)
26
Protein Substitution Rates
Example
  Six letters: I, K, L, Q, T, V
  Seven sequences
  Form an evolutionary tree
A: T L K K V Q K T
B: T L K K V Q K T
C: T L K K I Q K Q
D: I I T K L Q K Q
E: T I T K L Q K Q
F: T L T K I Q K Q
G: T L T Q I Q K Q
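The tree itself is not reproduced in this transcript. As a first look at how the seven sequences relate, one can count pairwise differences. This is only a starting point for tree building, not necessarily the procedure used on the slide.

```python
from itertools import combinations

SEQUENCES = {
    "A": "TLKKVQKT",
    "B": "TLKKVQKT",
    "C": "TLKKIQKQ",
    "D": "IITKLQKQ",
    "E": "TITKLQKQ",
    "F": "TLTKIQKQ",
    "G": "TLTQIQKQ",
}

def differences(s1, s2):
    """Number of aligned positions at which two equal-length sequences differ."""
    if len(s1) != len(s2):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(s1, s2))

# Pairwise difference counts keyed by (name1, name2).
PAIRWISE = {
    (x, y): differences(SEQUENCES[x], SEQUENCES[y])
    for x, y in combinations(SEQUENCES, 2)
}

for pair, d in sorted(PAIRWISE.items(), key=lambda item: item[1]):
    print(pair, d)
```

Pairs with few differences (A and B, D and E, F and G) are natural candidates for neighbours in the tree.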
27
Protein Substitution Rates
Determine Aij
  Count AA j being substituted by i
[Table: 6 x 6 count matrix with rows and columns I, K, L, Q, T, V]
28
Sub. Frequency to Score Matrix
AA mutation prob. Mij: prob. of original AA j mutating to AA i in one PAM distance
  PAM distance: unit of evolutionary divergence in which 1% of AAs have changed between two protein sequences
  Mij = Aij / Ni (Ni: count of amino acid i), normalized by the prob. of AA i occurring
Pij(t): prob. that a site has AA i at time t when it had j at time 0
  Pij(dt) = Mij
29
Mutation Prob. Matrix
Each entry is scaled by 10^5
Two most freq. substitutions are highlighted
30
Sub. Frequency to Score Matrix
2. Mutation Prob. Matrix to Log-Odds Scoring
  qij: prob. of aligning j to i
  pi: prob. of observing AA i by chance
Odds: related to probability
  prob = 0: odds = 0
  prob = 0.5: odds = 1
  prob = 0.75: odds = 3 (75:25)
  odds = prob/(1-prob) and prob = odds/(1+odds)
Log-odds: sij = 10*log(qij/pi)
  e.g., sED = 10*log(qED/0.062)
  Log-odds scores can be added
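The odds conversions and the log-odds score are easy to verify numerically. The sketch assumes that "log" means the base-10 logarithm, and the value of q used in the example is made up, since the slide's numerator is missing from this transcript.

```python
import math

def odds(prob):
    """Convert a probability to odds."""
    return prob / (1 - prob)

def prob_from_odds(o):
    """Convert odds back to a probability."""
    return o / (1 + o)

def log_odds_score(q_ij, p_i):
    """s_ij = 10 * log10(q_ij / p_i); base 10 is an assumption."""
    return 10 * math.log10(q_ij / p_i)

print(odds(0.5), odds(0.75), prob_from_odds(3))

# Hypothetical q: an alignment twice as likely as chance scores about +3.
print(round(log_odds_score(0.124, 0.062), 2))

# Scores add because logarithms turn products of ratios into sums.
combined = log_odds_score(0.124 * 0.031, 0.062 * 0.062)
separate = log_odds_score(0.124, 0.062) + log_odds_score(0.031, 0.062)
print(round(combined, 6) == round(separate, 6))
```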
31
PAM matrix assumptions
Most important assumption: each AA replacement is independent of previous mutations at the same position
  The matrix can be extrapolated to predict substitution frequencies at longer evolutionary distances
  PAM1 multiplied by itself 100 times represents what one would expect if there were 100 AA changes per 100 residues: PAM100
All sites are equally mutable, independent of neighboring residues
  No consideration of conserved blocks or motifs
Forces responsible for sequence evolution over shorter time spans are identical to those over longer time spans
1992 – Jones, Taylor, Thornton (JTT)
  59,190 substitutions in all sequences in Swiss-Prot
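Extrapolation by repeated multiplication can be illustrated with a toy matrix. The sketch below is not Dayhoff's PAM1: it assumes 20 equally frequent amino acids that all mutate to each other at the same rate, with a 1% chance of change per step. Under that assumption it shows that 100 steps do not erase all identity, because later changes can hit the same site again or revert it.

```python
N_STATES = 20   # number of amino acids
CHANGE = 0.01   # probability that a residue changes in one PAM step

def uniform_pam1(n=N_STATES, change=CHANGE):
    """Toy one-step matrix: every replacement is equally likely (an assumption)."""
    off_diagonal = change / (n - 1)
    return [[1 - change if i == j else off_diagonal for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matrix_power(m, power):
    """m multiplied by itself `power` times, by repeated squaring."""
    n = len(m)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    base = m
    while power:
        if power & 1:
            result = matmul(result, base)
        base = matmul(base, base)
        power >>= 1
    return result

pam1 = uniform_pam1()
pam100 = matrix_power(pam1, 100)
pam250 = matrix_power(pam1, 250)

# Diagonal entry: probability that a site shows the same AA after n steps.
print(round(pam100[0][0], 3), round(pam250[0][0], 3))
```

Real PAM matrices give different numbers because amino acids differ in frequency and mutability.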
32
PAM1 matrix Mutation Prob. Matrix has Pij(t)
PAM1 matrix: for closely related proteins with 1% mutation, i.e. the two sequences are 99% identical
For distantly related proteins, other PAM matrices are obtained by successively multiplying PAM1
[Table: PAM distance vs. % identity]
33
PAM 250 matrix
34
Pairwise Alignment vs. PAM Distance
Two sequences of 100 AAs
  After 80 PAM distances (80 mutations), 50 AAs are different
  After 250 hits, 20 AAs remain the same
35
PAM matrices Closely related: Human vs. Chimpanzee (100% AA identical)
Distantly related: HBA vs. HBB (43% AA identical)
36
BLOSUM S.Henikoff and J.G. Henikoff
Devised to perform best in identifying distant relationships
Based on the BLOCKS database of aligned protein sequences
  BLOcks SUbstitution Matrix
The ratio (# of observed pairs of AAs at any position)/(# of pairs expected from the overall AA frequencies) is computed from regions of closely related proteins alignable without gaps
To avoid overweighting closely related sequences, groups of proteins with sequence identities higher than a threshold are replaced by either a single representative or a weighted average
37
BLOSUM 62
Threshold set at 62%
  Protein sequences sharing less than 62% identity
Default matrix in BLAST
38
BLOSUM62
Most popular
Diagonal: score for exact match
  W-W: score 11, because W is rare, so aligning W with W by chance is unlikely
Off-diagonal
  W (tryptophan) – Y (tyrosine): score 2
    Positive score: occurs more often than by chance, but the replacement is not as good as if W is preserved (2 < 11) or if Y is preserved (2 < 7)
  W – V (valine): score -3
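Because log-odds scores add, an ungapped alignment is scored by summing the matrix entries of its aligned pairs. The sketch below uses only the four BLOSUM62 entries quoted on this slide; a real application would load the full matrix. The function names and example sequences are illustrative.

```python
# The BLOSUM62 entries quoted on the slide; the matrix is symmetric.
SCORES = {
    ("W", "W"): 11,
    ("Y", "Y"): 7,
    ("W", "Y"): 2,
    ("W", "V"): -3,
}

def pair_score(a, b):
    """Look up a score regardless of the order of the two residues."""
    if (a, b) in SCORES:
        return SCORES[(a, b)]
    return SCORES[(b, a)]

def alignment_score(seq1, seq2):
    """Sum of pair scores over an ungapped alignment of equal-length sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have equal length")
    return sum(pair_score(a, b) for a, b in zip(seq1, seq2))

# W-W (11) + Y-W (2) + W-V (-3)
print(alignment_score("WYW", "WWV"))
```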
40
PAM vs. BLOSUM