Bio/Chem-informatics © José R. Valverde, 2014 CC-BY-NC-SA
From sequence to atoms Cheminformatics
Index: Goals; Obtaining protein structures; Obtaining protein sequences; Comparing structures and sequences; Obtaining ligand structures; Limits
Goal: Learn as much as you can about your protein. Identify relevant properties: function, active site(s), modifications, conserved features, relevant amino acids. Cheminformatics: the application of informatic methods to solve chemical problems.
Read the bibliography: http://www.ncbi.nlm.nih.gov/pubmed It should be the initial step in all cases and should already have been done, yet it is likely to be neglected, because it is more fun to start playing right away. It guides all subsequent analysis and experiments and lets you make informed decisions. It IS worth the trouble!
Sequence analysis Compare sequences and look for similarities and differences Match to experimental observation
Predict, predict, predict... Secondary Structure Properties (ProSite, PFAM, InterPro...)
ProSiteDoc
{PS00433; PHOSPHOFRUCTOKINASE}
{BEGIN}
*********************************
* Phosphofructokinase signature *
Phosphofructokinase (EC 2.7.1.11) (PFK) [1,2] is a key regulatory enzyme in the glycolytic pathway. It catalyzes the phosphorylation by ATP of fructose 6-phosphate to fructose 1,6-bisphosphate. In bacteria PFK is a tetramer of identical 36 Kd subunits. In mammals it is a tetramer of 80 Kd subunits. Each 80 Kd subunit consist of two homologous domains which are highly related to the bacterial 36 Kd subunits. In Human there are three, tissue-specific, types of PFK isozymes: PFKM (muscle), PFKL (liver), and PFKP (platelet). In yeast PFK is an octamer composed of four 100 Kd alpha chains (gene PFK1) and four 100 Kd beta chains (gene PFK2); like the mammalian 80 Kd subunits, the yeast 100 Kd subunits are composed of two homologous domains.
As a signature pattern for PFK we selected a region that contains three basic residues involved in fructose-6-phosphate binding.
-Consensus pattern: [RK]-x(4)-G-H-x-Q-[QR]-G-G-x(5)-D-R
 [The R/K, the H and the Q/R are involved in fructose-6-P binding]
-Sequences known to belong to this class detected by the pattern: ALL.
-Other sequence(s) detected in Swiss-Prot: NONE.
-Note: Escherichia coli has two phosphofructokinase isozymes which are encoded by genes pfkA (major) and pfkB (minor). The pfkB isozyme is not evolutionary related to other prokaryotic or eukaryotic PFK's (see <PDOC00504>).
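The consensus pattern in the entry above can be applied to a sequence by hand. What follows is a minimal sketch, not the way ScanProsite or any other official tool works: it translates the common PROSITE pattern elements into a Python regular expression and scans a sequence with it. The function names `prosite_to_regex` and `find_motifs` are illustrative, and the test sequence is a made-up string built to contain the motif, not a real PFK sequence.

```python
import re

# One PROSITE element: optional '<' anchor, a residue / [class] / {exclusion} / x,
# an optional repeat count such as (4) or (2,5), and an optional '>' anchor.
ELEMENT = re.compile(
    r"^(<?)(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?(>?)$"
)


def prosite_to_regex(pattern):
    """Translate a PROSITE pattern into an equivalent regular expression."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        m = ELEMENT.match(element)
        if m is None:
            raise ValueError(f"Unsupported PROSITE element: {element!r}")
        start, residue, low, high, end = m.groups()
        if residue == "x":
            core = "."                          # any amino acid
        elif residue.startswith("{"):
            core = "[^" + residue[1:-1] + "]"   # any residue except these
        else:
            core = residue                      # single letter or [ABC] class
        if low and high:
            core += "{%s,%s}" % (low, high)
        elif low:
            core += "{%s}" % low
        parts.append(("^" if start else "") + core + ("$" if end else ""))
    return "".join(parts)


def find_motifs(pattern, sequence):
    """Return (1-based start, matched text) for each non-overlapping hit."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in regex.finditer(sequence)]


PFK_PATTERN = "[RK]-x(4)-G-H-x-Q-[QR]-G-G-x(5)-D-R"

# Hypothetical sequence, constructed only to exercise the pattern.
toy_sequence = "MSTL" + "RAAAAGHAQQGGAAAAADR" + "LL"

print(prosite_to_regex(PFK_PATTERN))
print(find_motifs(PFK_PATTERN, toy_sequence))
```

Only the pattern elements handled by `ELEMENT` are supported; anything else raises an error rather than being silently misread.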
InterPro: database of protein families, domains and functional sites. Integrates other databases: PROSITE, PRINTS, ProDom, SMART, TIGRFAMs, PIRSF, SUPERFAMILY, PANTHER, GENE3D... http://www.ebi.ac.uk/interpro/
InterProScan
PredictProtein: automatic prediction of structural and functional properties of proteins. Runs a battery of tests and gives a detailed report.
Look for known structure
Search for homologs: search in structural databases (PDB/RCSB); search in sequence databases (BLAST against SwissProt, BLAST against EMBL/GenBank/DDBJ)
Blast vs. PDB (EBI) Search for sequence-related structures
NCBI BlastPDB: search for structures of sequence-related proteins
ModBase Search for possible 3-D models of the protein
Nature's SBKB Search for models from a number of servers
Alignment of mt ATP6 Spot a few, well-preserved, amino acids with a major role.
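Spotting well-preserved amino acids in an alignment can be sketched in a few lines. The example below is a minimal illustration, assuming the alignment is already available as equal-length strings with `-` for gaps. The function names and the toy alignment are hypothetical; it is not the mt ATP6 alignment from the slide.

```python
from collections import Counter


def column_conservation(alignment):
    """For each column, return (most common residue, its frequency in the column)."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("All aligned sequences must have the same length")
    result = []
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]
        if not residues:
            result.append(("-", 0.0))   # column made only of gaps
            continue
        residue, count = Counter(residues).most_common(1)[0]
        # Gaps count against conservation: divide by the full column height.
        result.append((residue, count / len(column)))
    return result


def conserved_positions(alignment, threshold=1.0):
    """Return (1-based position, residue) for columns at or above the threshold."""
    return [
        (i + 1, residue)
        for i, (residue, freq) in enumerate(column_conservation(alignment))
        if freq >= threshold
    ]


# Hypothetical toy alignment, for illustration only.
toy_alignment = [
    "MKLHE",
    "MRLHE",
    "MKIHD",
    "MK-HE",
]

print(conserved_positions(toy_alignment))        # strictly invariant columns
print(conserved_positions(toy_alignment, 0.75))  # allow one deviating sequence
```

Strict invariance is only one possible criterion; lowering the threshold, or scoring by amino acid similarity rather than identity, will surface more candidate positions.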
Multiple Alignment Problems. Homologous proteins: risk of too high conservation. Same family: risk of too little conservation.
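One rough way to notice either problem is to look at the mean pairwise identity of the aligned sequences. The sketch below assumes equal-length aligned strings with `-` for gaps. The cut-offs (`low=0.25`, `high=0.90`) and the function names are illustrative assumptions, not established limits; they should be tuned to the protein family at hand.

```python
from itertools import combinations


def pairwise_identity(a, b):
    """Fraction of identical residues over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def mean_identity(alignment):
    """Average identity over all pairs of sequences in the alignment."""
    scores = [pairwise_identity(a, b) for a, b in combinations(alignment, 2)]
    return sum(scores) / len(scores)


def conservation_warning(alignment, low=0.25, high=0.90):
    """Classify an alignment using assumed, adjustable identity thresholds."""
    identity = mean_identity(alignment)
    if identity > high:
        return "too high conservation"
    if identity < low:
        return "too little conservation"
    return "usable conservation"


# Hypothetical toy alignments, for illustration only.
print(conservation_warning(["ACDE", "ACDE", "ACDE"]))
print(conservation_warning(["ACDE", "FGHI"]))
print(conservation_warning(["ACDE", "ACFG"]))
```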
Analyze coevolution Co-evolving amino acids highlight interactions See review at CNB
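A simple way to see what "co-evolving" means is to compute the mutual information between pairs of alignment columns. This is a minimal sketch of one classical measure, not necessarily the method covered in the review mentioned on the slide. The function names and the toy alignment are hypothetical. Raw mutual information is also biased by shared ancestry and small sample sizes, so dedicated coevolution tools apply corrections that are omitted here.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_a)
    freq_a = Counter(col_a)
    freq_b = Counter(col_b)
    freq_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), count in freq_ab.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((freq_a[a] / n) * (freq_b[b] / n)))
    return mi


def rank_column_pairs(alignment):
    """Return ((i, j), MI) for all column pairs, 1-based, highest MI first."""
    columns = list(zip(*alignment))
    scored = [
        ((i + 1, j + 1), mutual_information(columns[i], columns[j]))
        for i, j in combinations(range(len(columns)), 2)
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical toy alignment: columns 1 and 3 change together, column 2 is invariant.
toy_alignment = ["KAE", "KAE", "RAD", "RAD"]

for pair, score in rank_column_pairs(toy_alignment):
    print(pair, round(score, 3))
```

Invariant columns carry no mutual information with anything, which is one reason an alignment with too high conservation is of little use for this kind of analysis.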
Structural matching. Protein Function Prediction Server: uses structural data from known files to make predictions. Catalytic Site Atlas: uses structural models of active sites.
Compare, compare, compare... The answer may already be there. If not, similarities and differences allow you to scan genomes for useful targets, and proteins for target sites. There are many tools. There are “supertools” combining many tools, e.g. STING Millennium. Information is often cheaper than calculation.
Limits: Knowledge of 3-D structures is still limited. Prediction accuracy needs to be assessed. Check the database metadata. Available models may be outdated or incorrect. Too high or too low conservation precludes specific assignment. New, unknown proteins and functions are possible.
But, wait! There is more... much more! Image by geralt. CC0. http://pixabay.com/en/ball-http-www-crash-administrator-63527/