Modelling Workshop - Some Relevant Questions
Prof. David Jones, University College London

Presentation transcript:

Modelling Workshop - Some Relevant Questions
Prof. David Jones, University College London
Where are we now?
Where are we going?
Where should we be going?
New ideas?
Can we combine theory with experiment?
What do users need?
What are we trying to achieve?

Reasons to Predict Protein Structure
I want a picture of my protein for my thesis –EASY – but does anyone care if it’s wrong?
I want to identify possible domain boundaries –POSSIBLE
I want to identify surface residues –POSSIBLE
I want some clues as to the function of my protein –POSSIBLE?
I want to dock drug molecules into the structure –FORGET IT!
NEED TO CLEARLY DEFINE WHO WANTS THE MODELS AND WHAT THEY WANT TO DO WITH THEM!

How good is comparative modelling really?

Types of Model and Production Costs

Comparative models
–Automatic all atom model (… models per day) $0.30/model
–Sophisticated (multi-template) automatic model (1 30 models per day) $2/model
–Human assisted (e.g. CASP) model (… models per year) $200/model
–“Deluxe” model – incorporating experimental data (… models per year) $4000-$50000/model

Fold recognition models
–Automatic low resolution model … models per day $0.30/model
–Meta model … models per day $5/model

Ab initio models
–Automatic low resolution model (e.g. Robetta) Beowulf 5 models per day $50/model
–Hand built topological model – perhaps incorporating experimental data 1 10 models per year $2000/model

Docking models
–Automatic: similar to ab initio folding $50/model
–“Deluxe” docking: equivalent to “Deluxe” modelling $4000-$50000/model

Assumptions:
Scientist salary: $50000/year
Server costs (inc. maintenance & support): $10000/year
Beowulf costs (inc. maintenance, support and electricity): $100000/year
Costs of experimental data not included

Archiving costs should reflect cost of generating the models!
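The per-model figures appear to follow from dividing an annual resource cost by annual throughput. The sketch below checks this for the one entry where both numbers survive in the transcript (ab initio on a Beowulf cluster at 5 models per day). The function name, the 365-day working year, and the reading that the Beowulf cost alone drives that entry are assumptions for illustration, not something the slide states.

```python
def cost_per_model(annual_cost_usd: float, models_per_year: float) -> float:
    """Annual resource cost divided by annual model throughput."""
    if models_per_year <= 0:
        raise ValueError("models_per_year must be positive")
    return annual_cost_usd / models_per_year


# Assumption: the cluster produces models every day of the year.
DAYS_PER_YEAR = 365

# Figure taken from the slide's list of assumptions.
BEOWULF_COST_USD = 100_000

# Ab initio low-resolution models: 5 models per day on the Beowulf cluster.
ab_initio_cost = cost_per_model(BEOWULF_COST_USD, 5 * DAYS_PER_YEAR)
print(f"ab initio: ${ab_initio_cost:.2f} per model")  # about $54.79
```

Under these assumptions the result is roughly $55 per model, in line with the slide's rounded $50/model.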

Methods for Quality Control

CASE 1 - Single PDB file with no supporting evidence
–This is clearly of limited use
–Can apply standard QC methods developed for X-ray structures (e.g. PROCHECK)
    Many incorrect models pass these checks
–Methods do exist which can generate reliability estimates (e.g. MODCHECK or ProQ)
    However, these methods have reliability issues of their own
    Can present a summary of various quality measures, but how can these be interpreted?
    Which quality estimators do you believe?

CASE 2 – Model submitted with supporting evidence
–Much more useful
–What evidence?
    Alignments
    Method description
    Experimental data (good – but how to evaluate)
    Generating consistent quality measures based on a wide variety of methods and supporting evidence is going to require a lot of hard research

CASE 3 – Community modelling (many models for same target)
–Example: CASP experiments or meta servers
–Can generate global and local reliability scores from a large population of models
    Cluster all structures using a structural similarity measure (which one?)
    Derive "fold confidence" from the relative size of the largest cluster and some measure of tightness of the cluster according to a metric (GDT/RMSD/MaxSub/TM?)
    Derive positional confidence in a similar way, i.e. looking at the RMSD "scatter" for each equivalent position within the ensemble of superposed models
    Side-chain confidence could also be estimated in a similar way, e.g. looking at the scatter of chi angles or side-chain RMSDs
    Problem here is that looking at 100 models from 100 different methods is very different from looking at 100 models from a single method, e.g. the Robetta server
–Would allow a running "community quality measure" to be maintained for a particular target
    So, as different models are submitted from different groups or servers, quality statistics could be compiled automatically - as more models are submitted the quality estimator will change over time
    Would require models to be indexed according to target
    Would need checks in place to stop poisoning of data (deliberate or accidental)
    Might need to record performance histories of methods (sensitive issue to developers)
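The CASE 3 idea of deriving fold and positional confidence from an ensemble can be sketched as follows. This is a minimal illustration and not a method from the slides. It assumes the models are already superposed, have equal length, and are reduced to one coordinate per residue. It uses plain coordinate RMSD as the similarity measure (the slide leaves that choice open), and it takes the "largest cluster" to be the model with the most neighbours within a hypothetical cutoff. All function names and the toy data are invented for the example.

```python
import numpy as np


def pairwise_rmsd(models: np.ndarray) -> np.ndarray:
    """RMSD between every pair of models; models has shape (n_models, n_residues, 3)."""
    diff = models[:, None, :, :] - models[None, :, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1).mean(axis=-1))


def largest_cluster(dist: np.ndarray, cutoff: float) -> np.ndarray:
    """Indices of the models within `cutoff` of the best-connected model."""
    neighbours = dist <= cutoff          # each model counts as its own neighbour
    centre = int(neighbours.sum(axis=1).argmax())
    return np.flatnonzero(neighbours[centre])


def fold_confidence(dist: np.ndarray, members: np.ndarray) -> tuple[float, float]:
    """Relative cluster size and tightness (mean pairwise RMSD inside the cluster)."""
    n = len(members)
    size_fraction = n / dist.shape[0]
    sub = dist[np.ix_(members, members)]
    tightness = float(sub.sum() / (n * (n - 1))) if n > 1 else 0.0
    return size_fraction, tightness


def positional_scatter(models: np.ndarray) -> np.ndarray:
    """Per-residue RMS deviation from the ensemble mean position."""
    mean = models.mean(axis=0)
    return np.sqrt(((models - mean) ** 2).sum(axis=-1).mean(axis=0))


# Toy ensemble: 8 models scattered around one fold plus 2 unrelated models.
rng = np.random.default_rng(0)
base = rng.normal(scale=10.0, size=(50, 3))
consensus = base + rng.normal(scale=0.5, size=(8, 50, 3))
outliers = rng.normal(scale=10.0, size=(2, 50, 3))
models = np.concatenate([consensus, outliers])

dist = pairwise_rmsd(models)
members = largest_cluster(dist, cutoff=2.0)   # cutoff is an arbitrary choice
size_fraction, tightness = fold_confidence(dist, members)
scatter = positional_scatter(models[members])  # one value per residue
print(size_fraction, round(tightness, 2), scatter.shape)
```

A large `size_fraction` with a small `tightness` would suggest agreement on the fold, and low `scatter` values mark positions where the clustered models agree. As the slide cautions, agreement among models from a single method means less than agreement across independent methods, and this sketch does not account for that.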