Bridging cheminformatics and bioinformatics using protein structures Edith Chan Inpharmatica London 10 April 2001.

Bridging cheminformatics and bioinformatics using protein structures Edith Chan Inpharmatica London 10 April 2001

1 BioinformaticsCheminformatics SELECTING THE BEST TARGETS  Disease-association doesn’t make a protein a target - requires validation as point of intervention in pathway  Having good biological rationale doesn’t make a protein tractable to chemistry (drugable)  Genomics, HTS and Combichem have increased numerical throughput many hundred fold - overload of poorly integrated data, shortfall in productivity Target Validation Process DiseaseTarget Selection Drug Discovery Process ClinicLeads Inpharmatica’s protein structure focus - uniquely placed to assess both parameters High Validity and Drugability Requires a Unifying Informatics Framework

2 BIOPENDIUM AND CHEMATICA Genome Data Target Structure Lead Hypotheses BiopendiumChematica ctgacaagtatgaaaac aacaagctgattg tccgcagagggcagtct ttctatgtgcaga ttgacctcagtcgtc protein target validation drug discovery and selection

3 % SEQUENCE ID Advanced Approaches AHHLDRPGHNMCEAGFWQPILL Test Sequence 100% 30% 0 Standard Approaches STRUCTURE-BASED METHODS FIND MANY HOMOLOGUES (AND PUTATIVE TARGETS) NOT DETECTABLE FROM SEQUENCE SIMILARITY Biochemical function and drugability defined by 3D structure, not sequence - structure is better conserved Inpharmatica

4 BIOPENDIUM  Inputs - all public (or proprietary) protein data  Proprietary methods  Genome-Threader  QBI-  -Blast  Reverse Search Maximisation  Massive computation  1 million cpu hour set of calculations employing the most advanced algorithms (1100 processor farm)  Applied to 600,000 sequences, 14,000 structures + bound ligands  Yields 670m precalculated protein relationships  Query results in 15 minutes vs. two weeks with traditional bioinformatics in an Oracle database Protein Information  Structures  Sequences  Bound ligands  Families  Functions

5 Link complementary data in the 7 resources Precalculated data for 600,000 protein sequences. (scores and alignments for each hit) Pairwise sequence searches Profile based searches Threading based approaches Inpharmatica Workbench Ligplot ligand interaction editor Inpharmatica enhanced RasMol 3D viewer Interactive sequence alignment editor Relational Database Taxonomy Processed PDB to XMAS data Mask sequences THE INPHARMATICA BIOPENDIUM GenbankPDBPrositePrintsEnzymeSwissprot Ligplot Proprietary seq. ORF prediction Proprietary structures

7 CHEMATICA  Drugable site identified DRUGABLE TARGET DISCOVERY Finding a novel brain metalloprotease BIOPENDIUM  Novel brain protein identified

8 CHEMATICA IS…. Site Mapping Site Identification Fragment Mapping Pharmacophore Generation Database of putative/known binding sites site mapping and pharmacophore generation similarity searching/clustering of sites large scale virtual screening resource Gene Family Data Views Chemical annotation of PDB ‘real’ ligand structures Ligand 2-D structures Gene family structures consensus family analysis

9 a.Sphere is placed between the VDW surfaces of each atom pair. b.Any neighbouring atoms penetrating sphere cause its size to be reduced. c.Repeat for all possible atom pairs. d.Generate surface around surviving sphere to define site region. SURFNET: A program for visualizing molecular surfaces, cavities and intermolecular interactions. Laskowski R A (1995), J. Mol. Graph., 13, 323-330. Site identification - How sites in a protein structure are delineated?

10 Volume Hydrophobic content Polar content surface accessibility …… In total - 20 parameters calculated. Physical Parameters of the clefts 8 largest sites are stored together with their physical parameters

11 Prediction of binding/active sites Rule driven: use of Neural Nets a on a training set of 100 ligand/protein PDBs Validation: success rate = 90% on a extended set of 500 PDBs a backpropagation net -7-5-1 network

12 3-D distributions of 20 different atom types about the 20 amino acids are calculated. No assumption of energy terms. How XSITE potential is derived? X-SITE: use of empirically derived atomic packing preferences to identify favourable interaction regions in the binding sites of proteins. Laskowski R A, Thornton J M, Humblet C & Singh J (1996), Journal of Molecular Biology, 259, 175-201.

13 Data set Used (1) 521 non-homologus protein chains* from PDB that satisfy no two sequence identity is > 20% resolution <1.8Å R factor < 0.2 AND (2) 376 protein-ligand PDB structures for studying additional atom types other than those from peptides and proteins, such as Cl, F. Note: The PDB has about 14K entries! *cullpdb_pc20_res1.8_R0.2_d001130_chains521 (R. Dunbrack, Jr.) U. Hobohm, M. Scharf, R. Schneider, "Selection of representative protein data sets." Protein Science, 1, 409-417 (1993).

14 Application of XSITE distributions to side-chains making up the calculated protein binding site Projecting XSITE distributions onto the predicted binding site

15 How Pharmacophore is generated? a.Compare the XSITE predictions generated for the different probe atoms at a 3D grid of densities encompassing the region of the binding site. b.The higher the value at a given grid-point the higher the likelihood of finding that type of atom at that location. c.For each probe atom, it derives a “best” map. d.The net result is a new set of 3D grid maps, one per probe atom, holding only those regions where that atom scored higher than the others.

16 What is fragments mapping? a.In-built database of more than 100 small molecule fragments - most common functional groups and represent the common building blocks that satisfy drug-like elements used in chemistry. b.Privileged structures from companies.

17 How is fragments mapping done? Each atom in a fragment is assigned one of the 20 atom type. Each fragment is placed at every grid-point within the binding site and subjected to 300 rotations. At each rotation a score is calculated using the appropriate X-SITE predictions for the atom types that the fragment contains.

18 CHEMATICA U Curated, high-quality annotation and presentation of important ‘drugable’ gene families U NHRs, kinases, caspases, GPCRs,…. U Contains ligand structure information U Contains crystal environment classification U Automatic alerts for newly released structures U Multiple structure comparison options Gene Family Data Views

19 Consensus Family Analysis MMP-1 MMP-8 MMP-13 MMP-3  Size and topology of binding sites for MMP-1 & MMP-8 are similar, but detailed interactions differ U Spheres signify negative charge requirement in different areas of the binding pockets U provides potential for specificity CHEMATICA

20 Taken two sets of data from literature 1) GOLD (Jones, Willett, Glen, Leach and Taylor) Genetic Optimization for Ligand Docking (71% success rate in ligand binding mode in 100 pdbs) our method - 70% 2) SUPERSTAR (Verdonk, Cole and Taylor) Empirical method for interactions in proteins (67% success rate for original 4 probes ~67% in 122 pdbs) our method - 84% Validation Study 1. Jones et al. J. Mol. Biol. (1997) 267, 727-748 2. Verdonk et al. J. Mol. Biol. (1999) 289, 1093-1108

21 Acknowledgements Inpharmatica Alex Michie John Overington Simon Skidmore UCL Roman Laskowski Adrian Shepherd Janet Thornton

Bridging cheminformatics and bioinformatics using protein structures Edith Chan Inpharmatica London 10 April 2001.

Similar presentations

Presentation on theme: "Bridging cheminformatics and bioinformatics using protein structures Edith Chan Inpharmatica London 10 April 2001."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Bridging cheminformatics and bioinformatics using protein structures Edith Chan Inpharmatica London 10 April 2001.

Similar presentations

Presentation on theme: "Bridging cheminformatics and bioinformatics using protein structures Edith Chan Inpharmatica London 10 April 2001."— Presentation transcript:

Similar presentations

About project

Feedback