Download presentation
Presentation is loading. Please wait.
1
Blosum matrices What are they
Blosum matrices What are they? Morten Nielsen Depart of systems biology, DTU IIB-INTECH, UNSAM, Argentina
2
Alignment scoring matrices
Outline Alignment scoring matrices How are Blosum matrices constructed? What is a BLOSUM50 matrix and how is it different from a BLOSUM80 matrix? What is the difference between a Blosum scoring matrix and the Blosum frequency substitution matrix?
3
Sequence Alignment 1PLC._ 1PLB._
4
Where is the active site?
Sequence alignment. Infer function (and functional residues) from one protein to another 1K7C.A TVYLAGDSTMAKNGGGSGTNGWGEYLASYLSATVVNDAVAGRSARSYTREGRFENIADVVTAGDYVIVEFGHNDGGSLSTDN S G N 1WAB._ EVVFIGDSLVQLMHQCE---IWRELFS---PLHALNFGIGGDSTQHVLW--RLENGELEHIRPKIVVVWVGTNNHG------ 1K7C.A GRTDCSGTGAEVCYSVYDGVNETILTFPAYLENAAKLFTAK--GAKVILSSQTPNNPWETGTFVNSPTRFVEYAEL-AAEVA 1WAB._ HTAEQVTGGIKAIVQLVNERQPQARVVVLGLLPRGQ-HPNPLREKNRRVNELVRAALAGHP 1K7C.A GVEYVDHWSYVDSIYETLGNATVNSYFPIDHTHTSPAGAEVVAEAFLKAVVCTGTSL H 1WAB._ RAHFLDADPG---FVHSDG--TISHHDMYDYLHLSRLGYTPVCRALHSLLLRL---L
5
Homology modeling and the human genome
6
BLOSUM = BLOck SUbstitution Matrices
Focus on conserved domains, MSA's (multiple sequence alignment) are ungapped blocks. Compute pairwise amino acid alignment counts Count amino acid replacement frequencies directly from columns in blocks Sample bias: Cluster sequences that are x% similar. Do not count amino acid pairs within a cluster. Do count amino acid pairs across clusters, treating clusters as an "average sequence". Normalize by the number of sequences in the cluster. BLOSUMXX matrices Sequences that are xx% similar are clustered during the construction of the matrix.
7
Log-odds scores Log-odds scores are given by
Log( Observation/Expected) The log-odd score of matching amino acid j with amino acid i in an alignment is where Pij is the frequency of observing amino i aligned with j, and Qi, Qj are the frequencies of amino acids i and j in the data set. The log-odd score is (in half bit units)
8
So what does this mean? An example
NAA = 14 NAD = 5 NAV = 5 NDA = 5 NDD = 8 NDV = 2 NVA = 5 NVD = 2 NVV = 2 PAA = 14/48 PAD = 5/48 PAV = 5/48 PDA = 5/48 PDD = 8/48 PDV = 2/48 PVA = 5/48 PVD = 2/48 PVV = 2/48 1: VVAD 2: AAAD 3: DVAD 4: DAAA MSA QA = 8/16 QD = 5/16 QV = 3/16
9
So what does this mean? PAA = 0.29 QAQA = 0.25 PAD = 0.10 QAQD = 0.16
PAV = 0.10 PDA = 0.10 PDD = 0.17 PDV = 0.04 PVA = 0.10 PVD = 0.04 PVV = 0.04 QAQA = 0.25 QAQD = 0.16 QAQV = 0.09 QDQA = 0.16 QDQD = 0.10 QDQV = 0.06 QVQA = 0.09 QVQD = 0.06 QVQV = 0.03 1: VVAD 2: AAAD 3: DVAD 4: DAAA MSA QA=0.50 QD=0.31 QV=0.19
10
So what does this mean? PAA = 0.29 PAD = 0.10 PAV = 0.10 PDA = 0.10
PDD = 0.17 PDV = 0.04 PVA = 0.10 PVD = 0.04 PVV = 0.04 QAQA = 0.25 QAQD = 0.16 QAQV = 0.09 QDQA = 0.16 QDQD = 0.10 QDQV = 0.06 QVQA = 0.09 QVQD = 0.06 QVQV = 0.03 SAA = 0.44 SAD =-1.17 SAV = 0.30 SDA =-1.17 SDD = 1.54 SDV =-0.98 SVA = 0.30 SVD =-0.98 SVV = 0.49 BLOSUM is a log-likelihood matrix: Sij = 2log2(Pij/(QiQj))
11
The Scoring matrix A D V 0.44 -1.17 0.30 1.54 -0.98 0.49 1: VVAD
2: AAAD 3: DVAD 4: DAAA MSA
12
And what does the BLOSUMXX mean?
Cluster sequence Blocks at XX% identity Do statistics only across clusters Normalize statistics according to cluster size AV AP AL VL } min XX % identify A) B) AV GL GV
13
And what does the BLOSUMXX mean?
AV AP AL VL A) B) AV GL GV
14
And what does the BLOSUMXX mean?
High Blosum values mean high similarity between clusters Conserved substitution dominate Low Blosum values mean low similarity between clusters Less conserved substitutions dominate
15
BLOSUM80 <Sii> = 9.4 <Sij> = -2.9
A R N D C Q E G H I L K M F P S T W Y V A R N D C Q E G H I L K M F P S T W Y V <Sii> = 9.4 <Sij> = -2.9
16
BLOSUM30 <Sii> = 8.3 <Sij> = -1.16
A R N D C Q E G H I L K M F P S T W Y V A R N D C Q E G H I L K M F P S T W Y V <Sii> = 8.3 <Sij> = -1.16
17
The different Blosum matrices
The BLOSUM alignment scoring matrix is a log-likelihood matrix The Blosum frequency substitution matrix, is a conditional probability matrix of matching amino acids j given you have amino acid i and
18
The way from frequencies to log-odds
A R N D C Q E G H I L K M F P S T W Y V A R N D C Q E G H I L K M F P S T W Y V The weight matrix is identical to the Blosum row for each amino adis
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.