Signals in Sequences The number of sequences available for analysis is rapidly approaching infinity. We need new ways to look at all this information.
Rule 1 First rule of sequence analysis: If a residue is conserved, it is important.
Rule 2 Second rule of sequence analysis: If a residue is very conserved, it is very important.
GPCR Project GPCRs are THE drug target. Lots of data available. You have ~630 GPCRs. Little structure data, but sequences known. ‘Easy’ to align.
The GPCR (rhodopsin)
1 conserved aa / helix!
Laerte about modelling: “Use the sequence, Luke”
Conserved, CMA, variable
QWERTYASDFGRGH
QWERTYASDTHRPM
QWERTNMKDFGRKC
QWERTNMKDTHRVW
Black = conserved; White = variable; Green = correlated mutations (CMA)
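The three residue classes in the toy alignment above can be recovered with a few lines of Python. This is a minimal sketch, not the CMA method used in the GPCR work: it assumes a column is "correlated" when at least one other column splits the sequences in exactly the same way, and the names `column_pattern` and `classify_columns` are made up for this illustration. Real alignments need a statistical score rather than an exact match.

```python
from collections import Counter

def column_pattern(column):
    """Relabel residues by first appearance, e.g. 'YYNN' -> (0, 0, 1, 1)."""
    first_seen = {}
    return tuple(first_seen.setdefault(res, len(first_seen)) for res in column)

def classify_columns(alignment):
    """Label each column 'conserved', 'correlated' or 'variable' (toy rule)."""
    patterns = [column_pattern(col) for col in zip(*alignment)]
    shared = Counter(patterns)  # how many columns show each pattern
    labels = []
    for pattern in patterns:
        n_types = max(pattern) + 1
        if n_types == 1:
            labels.append("conserved")
        elif n_types < len(pattern) and shared[pattern] > 1:
            # Same non-trivial pattern in at least two columns: they co-vary.
            labels.append("correlated")
        else:
            labels.append("variable")
    return labels

# Toy alignment from the slide.
alignment = [
    "QWERTYASDFGRGH",
    "QWERTYASDTHRPM",
    "QWERTNMKDFGRKC",
    "QWERTNMKDTHRVW",
]
labels = classify_columns(alignment)
code = {"conserved": "C", "correlated": "M", "variable": "V"}
summary = "".join(code[label] for label in labels)
print(summary)
```

On this alignment the sketch marks QWERT, D and R as conserved, columns 6-8 and 10-11 as correlated, and the last two columns as variable.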
CMA and tree
1 ASASDFDFGHKM
2 ASASDFDFRRRL
3 ASLPDFLPGHSI
4 ASLPDFLPRRRV
CMA versus tree
1 ASASDFDFGHKMGHS
2 ASASDFDFRRRLRHS
3 ASLPDFLPGHSIGHS
4 ASLPDFLPRRRVRIT
5 ASASDFDFRRRLRIT
6 ASLPDFLPGHSIGIT
Red: 1,2,5 vs 3,4,6
Black: 1,3,6 vs 2,4,5
Yellow: 1,2,3 vs 4,5,6
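The red, black and yellow groupings can be reproduced by asking, for every column, which sequences share a residue, and then collecting the columns that induce the same split. The sketch below is only an illustration on the six toy sequences; `partition_of` and `columns_by_partition` are hypothetical helper names, and exact matching of splits is an assumption that would be too strict for real data.

```python
from collections import defaultdict

def partition_of(column):
    """Return the groups of sequence numbers (1-based) that share a residue."""
    groups = defaultdict(list)
    for seq_id, residue in enumerate(column, start=1):
        groups[residue].append(seq_id)
    return frozenset(tuple(ids) for ids in groups.values())

def columns_by_partition(alignment):
    """Map each split seen in two or more columns to those column numbers."""
    found = defaultdict(list)
    for position, column in enumerate(zip(*alignment), start=1):
        split = partition_of(column)
        # Skip fully conserved columns and columns where every residue differs.
        if 1 < len(split) < len(column):
            found[split].append(position)
    return {split: cols for split, cols in found.items() if len(cols) > 1}

# Toy alignment from the slide.
alignment = [
    "ASASDFDFGHKMGHS",
    "ASASDFDFRRRLRHS",
    "ASLPDFLPGHSIGHS",
    "ASLPDFLPRRRVRIT",
    "ASASDFDFRRRLRIT",
    "ASLPDFLPGHSIGIT",
]
splits = columns_by_partition(alignment)
for split, cols in splits.items():
    print(sorted(split), "columns", cols)
```

The three splits that come out are 1,2,5 vs 3,4,6, then 1,3,6 vs 2,4,5, then 1,2,3 vs 4,5,6. Each pair of these splits cuts across the other, so no single tree can contain all three as branches, which appears to be the point of the slide title.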
CMA on GPCR
CMA on GPCR
Florence Horn
Class B Ligands
Class B – ligand docking
G protein-coupling?
Sequence Signals Three classes of residues: 1) Conserved; 2) CMA; 3) Variable.
Conservation Artefacts Conservation can result from: not enough sequences; too conserved sequences; over-alignment.
Variability Artefacts Variability can result from: wrong sequence choice; variable loops; alignment errors.
CMA Artefacts CMA can result from: wrong sequence choice; poor sequence homogeneity; over-fitting.
Recalcitrant residues
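Sequence Entropy $E = -\sum_{i=1}^{20} p_i \ln(p_i)$, where $p_i$ is the fraction of sequences that have residue type $i$ at the alignment position.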
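As a worked illustration of the entropy formula, the sketch below computes it for one alignment column given as a string of one-letter residue codes. It assumes the usual Shannon form with a leading minus sign and treats every character as a residue, so gaps would have to be removed beforehand; `column_entropy` is a name chosen for this example.

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy (natural log) of the residue types in one column."""
    counts = Counter(column)
    total = len(column)
    # Residue types that do not occur contribute nothing to the sum.
    return -sum((n / total) * math.log(n / total) for n in counts.values())

conserved = column_entropy("DDDD")  # one residue type only
two_types = column_entropy("FTFT")  # two types, half each
all_types = column_entropy("ACDEFGHIKLMNPQRSTVWY")  # all 20 types once
print(conserved, two_types, all_types)
```

A fully conserved column gives 0, and a column with all 20 residue types in equal amounts gives the maximum, ln(20), which is about 3.0.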
Sequence Variability Sequence variability is the number of different residue types that are present at a position in more than 0.5% of all sequences.
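The variability definition translates directly into a count with a frequency cut-off. This is a minimal sketch for a single column; the function name `column_variability` and the parameter `min_fraction` are assumptions made for the example, with the 0.5% threshold taken from the slide.

```python
from collections import Counter

def column_variability(column, min_fraction=0.005):
    """Count residue types seen in more than min_fraction of the sequences."""
    counts = Counter(column)
    total = len(column)
    return sum(1 for n in counts.values() if n / total > min_fraction)

# 1000 sequences: A in 99.0%, C in 0.6%, D in 0.4% of them.
column = "A" * 990 + "C" * 6 + "D" * 4
variability = column_variability(column)
print(variability)
```

In this example D falls below the 0.5% cut-off and is ignored, so the variability is 2. The cut-off keeps rare residues, which may be sequencing or alignment errors, from inflating the count.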
Entropy - Variability Entropy = Information; Variability = Chaos.
Entropy - Variability Variability is the result of evolution. Entropy is the protein’s brake on evolutionary speed.
GPCR Entropy - Variability 11 Red 12 Orange 22 Yellow 23 Green 33 Blue
GPCR Location 11 Red 12 Orange 22 Yellow 23 Green 33 Blue
Ras Entropy - Variability
Ras Location 11 Red 12 Orange 22 Yellow 23 Green 33 Blue
Protease Entropy - Variability
Protease Location 11 Red 12 Orange 22 Yellow 23 Green 33 Blue
Globin Entropy - Variability GPCR
Globin Location 11 Red 12 Orange 22 Yellow 23 Green 33 Blue
GPCR Again….
GPCR Location (Again) 11 Red 12 Orange 22 Yellow 23 Green 33 Blue
GPCR signaling 11 Purple 12 Red 22 ‘Yellow’ 23 Green 33 Blue
Summary Given infinitely many sequences: every residue’s role is known; signaling paths are detectable. So, sequences contain many signals.
Thanks to: Laerte Oliveira (Sao Paulo), Wilma Kuipers (Weesp), Florence Horn (San Francisco), Bob Bywater (Copenhagen), Nora vd Wenden (The Hague), Mike Singer (New Haven), Ad IJzerman (Leiden), Margot Beukers (Leiden), Amos Bairoch (Geneva), Fabien Campagne (New York)