D7.4 A REFMAC server for EM and NMR

D7.7: Quality analysis workflow for predicted complexes
Joint project: EMBL-EBI and Marc F. Lensink (University of Science and Technology of Lille)

Background & rationale
- Protein–protein interactions and protein assemblies play a crucial role in all cellular processes.
- Only a small fraction of protein complexes has been solved experimentally.
- Computational procedures to generate models of macromolecular assemblies are important to supplement experimental methods.
- Validation of experimentally determined structures is coordinated by the wwPDB structure validation task forces (X-ray, NMR, EM, hybrid/integrative methods). These are available to users worldwide.
- Validation of predicted complexes is established by the CAPRI (Critical Assessment of Predicted Interactions) community, but it is not available to users.
- This deliverable (D7.7) aims to make this analysis protocol available to users worldwide.

CAPRI + evaluation criteria
- The Critical Assessment of Predicted Interactions (CAPRI) is a community-wide initiative that organizes worldwide experiments in macromolecular complex prediction.
- Established in 2001.
- More than 130 complexes have been predicted by groups worldwide.
- Since its inception, the CAPRI committee has organized six evaluation meetings.
- Discussions within the community during these meetings have led to established standards for the parameters and criteria used to evaluate the quality of predicted complexes.
- Models are assessed on geometric (L-rms, I-rms) and biological (f(nat)) properties.
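The slide names the three quantities but not the cut-offs. As a minimal sketch, the function below ranks a model using the thresholds commonly quoted for CAPRI assessments. These numbers, the cascade order and the function name `capri_class` are assumptions for illustration, not taken from the slides, and should be checked against the CAPRI evaluation papers before use.

```python
def capri_class(fnat, l_rms, i_rms):
    """Rank a predicted complex from three CAPRI-style measures.

    fnat  : fraction of native contacts reproduced by the model (0..1)
    l_rms : ligand RMSD in angstroms after superposing the receptor
    i_rms : interface RMSD in angstroms

    Thresholds are the commonly quoted CAPRI values (an assumption here).
    """
    if fnat >= 0.5 and (l_rms <= 1.0 or i_rms <= 1.0):
        return "high"
    if fnat >= 0.3 and (l_rms <= 5.0 or i_rms <= 2.0):
        return "medium"
    if fnat >= 0.1 and (l_rms <= 10.0 or i_rms <= 4.0):
        return "acceptable"
    return "incorrect"


# Hypothetical models, given as (fnat, L-rms, I-rms)
models = {
    "model_1": (0.80, 0.9, 0.7),
    "model_2": (0.40, 4.0, 1.8),
    "model_3": (0.15, 9.0, 3.5),
    "model_4": (0.05, 15.0, 6.0),
}
classification = {name: capri_class(*values) for name, values in models.items()}
```

A per-model table of this kind is what the result page described later in the talk would display.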

Web server
- Input 1: structures of the unbound proteins
- Input 2: structure of the complex
- Input 3: predicted models
- Run workflow on EBI farm

Landing page + expert interface

Result
- The user provides an e-mail address and receives a unique link to the result.
- Currently, the result page shows a table with the classification of each model.
- Download of superposition files, etc. will be added.

D7.8 Report on prototypes using Big Data approaches
Selected projects:
- Implementation of a Convolutional Neural Network for structural biology maps
- Clustering PDB entries
- Extracting mentions of residues from literature

Machine learning for cryoEM
- First prototype for "Big Data" technologies in structural biology
- Other examples:
  - Steve Ludtke: CNNs for cryoET
  - SuRVoS for segmentation
  - Several attempts at particle picking
- Use case: analyse 3D maps to learn and recognise features at different scales, e.g. missing components, domains, side chains
- Protein vs nucleic acid vs solvent
- EM: effect of map sharpening
- xtal: effect of phase quality

CNN for maps
Data
- EMD-2984 betagal, 2.2 Å
- 48*48 pixel 2D slices
- Fitted model 5a1a used to annotate slices as protein or not
- Compare deposited map with blurred map: 33,148 and 18,750 slices respectively
Network
- Classify as protein or noise
- Implemented with Keras
- Tried standard VGG16 model, but reverted to 5-layer model

Experiment                                         No. images  Protein  Non-protein  Loss    Accuracy
Trained and tested on blurred map                  33148       15696    17452        0.099   0.978
Trained on blurred map, tested on unblurred map    18750       4736     14014        11.842  0.252
Trained and tested on unblurred map                                                  0.176   0.938
Trained on unblurred map, tested on blurred map                                      1.631   0.587

[Figure panels: deposited, blurred]
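A quick arithmetic check helps in reading the cross-map rows of the table. The sketch below computes the class balance from the slice counts given on the slide, assuming those counts describe the images used for testing (the slide does not say so explicitly). The function name `class_fractions` is introduced here for illustration.

```python
def class_fractions(n_protein, n_non_protein):
    """Return (protein fraction, non-protein fraction) for a set of slices."""
    total = n_protein + n_non_protein
    return n_protein / total, n_non_protein / total


# Slice counts as listed in the table on the slide
first_row = class_fractions(15696, 17452)    # 33148 images
second_row = class_fractions(4736, 14014)    # 18750 images

# Accuracy reported for "trained on blurred, tested on unblurred"
reported_cross_accuracy = 0.252
gap = abs(second_row[0] - reported_cross_accuracy)
```

The protein fraction of the second set is about 0.253, almost identical to the reported accuracy of 0.252. That is what one would expect from a classifier that labels nearly every slice as protein, although the slide does not report per-class results, so this reading is only a plausible interpretation.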

Similarity and clustering in PDB
The problem
- Given a new protein/chain, find similar ones in the PDB
- Several methods to assess similarity, e.g.
  - GESAMT: E. Krissinel, Enhanced fold recognition using efficient short fragment clustering, J Mol Biochem., 2012, 1: 76-85
  - RCC: R. Corral-Corral et al., Machine Learnable Fold Space Representation Based on Residue Cluster Classes, Comp. biol. and chem., 2015, 59: 1-7
- Too many (140k entries/500k chains) to compare with all
- Representative clean set (65k chains)
- Thorough statistics (space coverage, similarity measure correlation, pairwise similarity distribution, …)
Two proposed approaches
- Dissimilarity search
- Dimensionality reduction and clustering

Similarity and clustering in PDB
"Dissimilarity search"
- If the new structure t is dissimilar to some r, anything similar to r needn't be considered
- Precompute the similarity of all entries with some fixed representative set N
- Still feasible: done once, extensible with new entries
- On a query with t, choose random samples of N
- Based on the precomputed similarities, use them to identify candidates
- Compute the similarity of t with all candidates, choose the best one
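The steps listed on this slide can be sketched as follows. This is an illustration only, not the authors' implementation. It stands in a Euclidean distance on toy coordinate tuples for a real structural similarity score such as GESAMT's. The `far` and `near` cut-offs, the function names and the data are hypothetical.

```python
import math
import random


def distance(a, b):
    """Toy dissimilarity: Euclidean distance between coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def precompute(entries, reps):
    """Distance of every entry to every representative (done once)."""
    return {name: {r: distance(vec, entries[r]) for r in reps}
            for name, vec in entries.items()}


def dissimilarity_search(query, entries, reps, table,
                         n_samples=2, far=5.0, near=2.0, rng=None):
    """Prune entries close to representatives that are far from the query."""
    rng = rng or random.Random(0)
    sampled = rng.sample(reps, min(n_samples, len(reps)))
    candidates = set(entries)
    for r in sampled:
        if distance(query, entries[r]) > far:
            # query is dissimilar to r, so drop everything similar to r
            candidates -= {name for name in candidates if table[name][r] < near}
    # compare the query only against the surviving candidates
    best = min(candidates, key=lambda name: distance(query, entries[name]),
               default=None)
    return best, candidates


entries = {"a": (0.0, 0.0), "b": (0.5, 0.0), "c": (10.0, 0.0),
           "d": (10.5, 0.5), "e": (0.0, 10.0)}
reps = ["a", "c"]
table = precompute(entries, reps)
best, candidates = dissimilarity_search((9.8, 0.1), entries, reps, table)
```

For a true metric, the triangle inequality guarantees that every pruned entry lies more than `far - near` away from the query, so the cut-offs control how aggressive the pruning can be without losing close matches. Whether the structural similarity scores used in practice behave like a metric is a separate question that this sketch does not address.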

Similarity and clustering in PDB
Dimensionality reduction and clustering
- Compute residue cluster classes: 26 integer descriptors
- Embed in real space, perform linear and non-linear dimensionality reductions
- Apply multiple clustering techniques
- Evaluate quality of resulting clusters
- Framework to plug in different algorithms in all steps is being developed
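To illustrate the clustering and evaluation steps only, the sketch below runs a plain k-means on integer descriptor vectors and reports the within-cluster sum of squares as one simple quality measure. The slides do not say which algorithms were used. The choice of k-means, the farthest-point initialisation and the three-component toy vectors (standing in for the 26 residue cluster class counts) are all assumptions, and the dimensionality reduction step is omitted.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return tuple(sum(col) / n for col in zip(*vectors))


def kmeans(vectors, k, n_iter=20):
    """Plain k-means with deterministic farthest-point initialisation."""
    centres = [tuple(float(x) for x in vectors[0])]
    while len(centres) < k:
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centres))
        centres.append(tuple(float(x) for x in far))
    labels = [0] * len(vectors)
    for _ in range(n_iter):
        # assign each vector to its nearest centre
        labels = [min(range(k), key=lambda j: sq_dist(v, centres[j]))
                  for v in vectors]
        new_centres = []
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            new_centres.append(mean(members) if members else centres[j])
        if new_centres == centres:
            break
        centres = new_centres
    return labels, centres


def within_cluster_ss(vectors, labels, centres):
    """Sum of squared distances to the assigned centre (lower is tighter)."""
    return sum(sq_dist(v, centres[lab]) for v, lab in zip(vectors, labels))


# Toy descriptors; real residue cluster class vectors would have 26 components
descriptors = [(5, 0, 1), (6, 0, 1), (5, 1, 0), (0, 7, 3), (0, 8, 3), (1, 7, 4)]
labels, centres = kmeans(descriptors, k=2)
score = within_cluster_ss(descriptors, labels, centres)
```

Swapping `kmeans` or `within_cluster_ss` for another function with the same inputs and outputs is the kind of plug-in arrangement the last bullet describes.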

Similarity and clustering in PDB
Plans
- Develop prototypes
- Cross-validate results
- Use independent similarity assessment techniques
- Publish!
- Integrate with PDB

Annotations for EuropePMC

Natural Language Processing
- Annotations will also be imported to PDBe
- Dashboard at https://pyresid-dash.herokuapp.com/
- Uses spaCy
- Software pyresid available via pip
- IUCr considering use
- Work done by Rob Firth (STFC) and Francisco Talo (EBI)
- Rob will attend the OpenMinTeD conference
- Challenges: sentence boundary recognition
- Possible continuation by future grant application
- Ingest from Proteopedia?
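As a rough illustration of what extracting residue mentions involves, the sketch below finds three-letter residue names followed by a sequence number using a regular expression. This is not pyresid's API or method, which according to the slide builds on spaCy. The pattern, the function name `find_residue_mentions` and the example sentence are assumptions made for this example.

```python
import re

# Standard three-letter codes for the 20 common amino acids
THREE_LETTER = (
    "Ala", "Arg", "Asn", "Asp", "Cys", "Gln", "Glu", "Gly", "His", "Ile",
    "Leu", "Lys", "Met", "Phe", "Pro", "Ser", "Thr", "Trp", "Tyr", "Val",
)

# Residue name, an optional hyphen or space, then the residue number
RESIDUE_PATTERN = re.compile(r"\b(" + "|".join(THREE_LETTER) + r")[- ]?(\d+)\b")


def find_residue_mentions(text):
    """Return (residue name, residue number) pairs found in the text."""
    return [(m.group(1), int(m.group(2))) for m in RESIDUE_PATTERN.finditer(text)]


sentence = "The catalytic triad comprises Ser195, His-57 and Asp 102."
mentions = find_residue_mentions(sentence)
```

A pattern like this misses one-letter codes, full residue names and ranges, and it cannot tell which protein or chain a mention refers to. That is where sentence boundaries and wider context, listed among the challenges above, start to matter.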

D7.9 Report on existing metadata standards, and proposals for new vocabularies
- Provenance: D6.4 should add PROV-O to Virtual Folder
- Projects: translate ARIA metadata to CERIF
- Workflows: CSIC EOSC Pilot on cryoEM workflows
- NUTS metadata in Repository, ?
- NMR-STAR mmCIF…

UU Contribution to data standards
- Utrecht is a member of the integrative modelling task force of the PDB
- Contributing to extending the mmCIF dictionary: https://github.com/ihmwg/IHM-dictionary
- Visit to Sali's lab at UCSF and to RCSB at Rutgers in January
- First HADDOCK integrative model deposited in PDB-Dev: https://pdb-dev.wwpdb.org

mmCIF
- DipCheck now accepts mmCIF input
- ARP/wARP accepts mmCIF for ligands but not proteins (EMBL-Hamburg)
- The PDB-REDO databank now stores mmCIF files
- The PDB-REDO service reads and writes mmCIF
- Version 8 of PDB-REDO will use mmCIF internally (NKI)
- 3DBionotes reads mmCIF

D7.5 A HADDOCK server for EM