TEXTAL: A System for Automated Model Building Based on Pattern Recognition Thomas R. Ioerger Department of Computer Science Texas A&M University.

Presentation transcript:

Main Stages of TEXTAL
electron density map
–CAPRA ==> C-alpha chains
–LOOKUP ==> model (initial coordinates); build-in side-chain and main-chain atoms locally around each CA
–Post-processing routines ==> model (final coordinates); example: real-space refinement
Reciprocal-space refinement/ML DM
Human Crystallographer (editing)

CAPRA: C-Alpha Pattern Recognition Algorithm

Overview of CAPRA
goal: predict CA chains from density map
not just “tracing” - more than Bones
desire 1:1 correspondence, ~3.8A apart
based on principles of pattern recognition
–use neural net to estimate which pseudo-atoms in trace “look” closest to true C-alphas
–use feature extraction to capture 3D patterns in density for input to neural net
–use other heuristics for “linking” together into chains, including geometric analysis (s.s.)

CAPRA: C-Alpha Pattern-Recognition Algorithm
Tracer - remove lattice points from map (lowest density first) without breaking connectivity
Neural network - for each pseudo atom, extract features, input to network, predict distances to CAs (1:10 in trace), trained on example points in real maps
Linking - desire long chains, good CA predictions (not in side-chains), “structurally plausible” (e.g. linear, helical)
Flow: map ==> Density Trace ==> pseudo atoms ==> Neural Network ==> predictions of distance to true CA ==> Linking into C-alpha chains ==> C-alpha coordinates

Steps in CAPRA

Examples of CAPRA Steps

Tracer

Neural Network

Feature Extraction
characterize 3D patterns in local density
must be “rotation invariant”
examples:
–average density in region
–standard deviation, kurtosis...
–distance to center of mass
–moments of inertia, ratios of moments
–“spoke angles”
calculated over spheres of 3A and 4A radius
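A minimal sketch of a few of the features listed above, to show why they are rotation invariant: each depends only on density values and on distances from the sphere center. The sample layout `(x, y, z, rho)` and the name `local_features` are assumptions for illustration, not TEXTAL's actual data structures; moments of inertia and spoke angles are left out.

```python
import math

def local_features(samples, center, radius):
    """Rotation-invariant summary of the density within `radius` of `center`.

    samples: iterable of (x, y, z, rho) grid samples (hypothetical layout).
    center:  (x, y, z) tuple, e.g. the position of a pseudo-atom.
    """
    # Keep only samples inside the sphere, as coordinates relative to center.
    pts = [(x - center[0], y - center[1], z - center[2], rho)
           for x, y, z, rho in samples
           if math.dist((x, y, z), center) <= radius]
    if not pts:
        raise ValueError("no density samples inside the sphere")
    n = len(pts)
    rhos = [p[3] for p in pts]
    mean = sum(rhos) / n
    var = sum((r - mean) ** 2 for r in rhos) / n
    std = math.sqrt(var)
    # Kurtosis as the fourth standardized moment (0.0 for a flat region).
    kurt = (sum((r - mean) ** 4 for r in rhos) / n) / var ** 2 if var > 0 else 0.0
    # Distance from the sphere center to the density-weighted center of mass.
    total = sum(rhos)
    if total != 0:
        com = [sum(p[i] * p[3] for p in pts) / total for i in range(3)]
        dist_com = math.sqrt(sum(c * c for c in com))
    else:
        dist_com = 0.0
    return {"mean": mean, "std": std, "kurtosis": kurt, "dist_to_com": dist_com}
```

Calling it once with `radius=3.0` and once with `radius=4.0` would give the two feature sets suggested by the slide; rotating all samples about the center leaves every value unchanged.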

Forward Propagation: Backward Propagation:
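The equations for this slide did not survive in the transcript. As a generic illustration only, here is a sketch of forward and backward propagation for a network with one sigmoid hidden layer and a single linear output trained on squared error. The architecture, the learning rate, and all names are assumptions; the actual CAPRA network may differ.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W1, b1, w2, b2):
    """Forward propagation: return hidden activations and the scalar output."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    y = sum(w * hj for w, hj in zip(w2, h)) + b2
    return h, y

def backward(x, target, W1, b1, w2, b2, lr=0.1):
    """Backward propagation: one gradient step on E = 0.5 * (y - target)**2.

    W1, b1 and w2 are updated in place; the updated b2 is returned.
    """
    h, y = forward(x, W1, b1, w2, b2)
    err = y - target                                 # dE/dy
    for j, hj in enumerate(h):
        # Error signal at hidden unit j, computed with the old w2[j].
        delta_j = err * w2[j] * hj * (1.0 - hj)
        w2[j] -= lr * err * hj
        for i, xi in enumerate(x):
            W1[j][i] -= lr * delta_j * xi
        b1[j] -= lr * delta_j
    return b2 - lr * err
```

In the CAPRA setting, `x` would be the feature vector of one pseudo-atom and `target` its true distance to the nearest C-alpha in a training map.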

Selection of Candidate C-alpha’s
method:
–pick candidates in order of lowest predicted distance first,
–among all pseudo-atoms in trace,
–as long as not closer than 2.5A to an already-picked candidate
notes:
–no 3.8A constraint; distance can be as high as 5A
–don’t rely on branch points (though often near)
–picked in random order throughout map
–initially covers whole map, including side-chains and disconnected regions (e.g. noise in solvent)
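The selection rule above amounts to a greedy pass over the pseudo-atoms sorted by predicted distance. A minimal sketch, assuming each pseudo-atom is a `(xyz, predicted_distance)` pair and that the 2.5A limit is measured against candidates already picked (the function name is hypothetical):

```python
import math

def select_candidates(pseudo_atoms, min_sep=2.5):
    """Greedily pick candidate C-alphas.

    pseudo_atoms: list of ((x, y, z), predicted_distance_to_true_CA).
    Atoms are visited in order of lowest predicted distance; an atom is kept
    only if it is at least `min_sep` Angstroms from every atom kept so far.
    """
    chosen = []
    for xyz, pred in sorted(pseudo_atoms, key=lambda a: a[1]):
        if all(math.dist(xyz, kept_xyz) >= min_sep for kept_xyz, _ in chosen):
            chosen.append((xyz, pred))
    return chosen
```

For example, with atoms at x = 0, 1, 4, 5 on a line and predictions 0.5, 0.2, 1.0, 0.9, the atoms at x = 1 and x = 5 are kept and their closer, worse-scoring neighbors are skipped.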

Linking into Chains
initial connectivity of CA candidates based on the trace
“over-connected” graph - branches, cycles...
start by computing connected components (islands, or clusters)
two strategies:
–for small clusters (<=20 candidates), find longest internal chain with “good” atoms
–for large clusters (>20 candidates), incrementally clip branch points using heuristics
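The first step, computing connected components and routing each cluster to one of the two strategies by size, can be sketched as follows. The adjacency-dict representation and the function names are assumptions for illustration.

```python
from collections import deque

def connected_components(links):
    """links: dict mapping each candidate to an iterable of linked candidates."""
    seen, comps = set(), []
    for start in links:
        if start in seen:
            continue
        comp, queue = [], deque([start])
        seen.add(start)
        while queue:                      # breadth-first search from `start`
            u = queue.popleft()
            comp.append(u)
            for v in links[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        comps.append(comp)
    return comps

def split_by_size(comps, limit=20):
    """Small clusters (<= limit) get exhaustive search; large ones get clipping."""
    small = [c for c in comps if len(c) <= limit]
    large = [c for c in comps if len(c) > limit]
    return small, large
```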

Extracting Chains from Small Clusters
exhaustive depth-first search of all paths
scoring function:
–length
–penalty for inclusion of points with high predicted distance to true CA by neural net
–preference for following secondary structure (locally straight or helical)
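A minimal sketch of the exhaustive search, assuming a simple score of chain length minus a fixed penalty per “bad” atom. The cutoff `bad_cutoff`, the weight `penalty`, and the function name are hypothetical, and the secondary-structure preference is omitted. Exhaustive search is only practical because the clusters are small.

```python
def best_chain(links, pred, bad_cutoff=3.0, penalty=2.0):
    """Return the best-scoring simple path in a small cluster.

    links: dict candidate -> list of linked candidates.
    pred:  dict candidate -> predicted distance to the true CA.
    """
    def score(path):
        bad = sum(1 for a in path if pred[a] > bad_cutoff)
        return len(path) - penalty * bad

    best = []

    def dfs(path, visited):
        nonlocal best
        if not best or score(path) > score(best):
            best = list(path)
        for nxt in links[path[-1]]:
            if nxt not in visited:
                visited.add(nxt)
                path.append(nxt)
                dfs(path, visited)        # extend the path, then backtrack
                path.pop()
                visited.remove(nxt)

    for start in links:                   # try every candidate as a chain end
        dfs([start], {start})
    return best
```

With a chain a-b-c-d and a side branch c-e-f in which e has a high predicted distance, the search prefers a-b-c-d over the longer a-b-c-e-f.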

Secondary Structure Analysis
generate all 7-mers (connected fragments of candidate CAs of length 7)
evaluate “straightness”
–ratio of end-to-end distance to sum of link lengths
–straightness>0.8 ==> potential beta-strand
evaluate “helicity”
–average absolute deviation of angles and torsions along 7-mer from ideal values (95° and 50°)
–helicity below a cutoff ==> potential alpha-helix
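A sketch of the two measures for one 7-mer of CA coordinates. Straightness is taken here as end-to-end distance divided by the summed link lengths, since only that direction of the ratio can fall below 1 and so make a 0.8 threshold meaningful. The torsion uses the usual IUPAC sign convention. The function names are hypothetical, and the helicity cutoff is not given in the transcript, so none is applied.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def straightness(ca):
    """End-to-end distance over path length; 1.0 for a perfectly straight chain."""
    path = sum(math.dist(p, q) for p, q in zip(ca, ca[1:]))
    return math.dist(ca[0], ca[-1]) / path

def angle(a, b, c):
    """Angle a-b-c in degrees."""
    u, v = _sub(a, b), _sub(c, b)
    cosv = _dot(u, v) / math.sqrt(_dot(u, u) * _dot(v, v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cosv))))

def torsion(a, b, c, d):
    """Torsion a-b-c-d in degrees."""
    b1, b2, b3 = _sub(b, a), _sub(c, b), _sub(d, c)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, _dot(n1, n2)))

def helicity(ca, ideal_angle=95.0, ideal_torsion=50.0):
    """Mean absolute deviation of CA angles and torsions from the ideal values."""
    devs = [abs(angle(*ca[i:i + 3]) - ideal_angle) for i in range(len(ca) - 2)]
    devs += [abs(torsion(*ca[i:i + 4]) - ideal_torsion) for i in range(len(ca) - 3)]
    return sum(devs) / len(devs)
```

A fully extended 7-mer scores straightness 1.0 and a large helicity value; an idealized helical 7-mer scores straightness well under 0.8 and a helicity of only a few degrees.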

Handling Large Clusters
start by breaking cycles (near “bad” atoms)
clip links at branch points till only linear chains remain
clip the most “obvious” links first, e.g.
–if other two links are part of sec. struct.
–if clipped branch has “bad” atom nearby
–if clipped branch is small and other 2 are large
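A simplified sketch of branch clipping that implements only the last heuristic (clip the smallest branch) and assumes cycles have already been broken. It processes branch points in dictionary order rather than ranking clips by how “obvious” they are, and it ignores secondary structure and “bad” atoms; the function names are hypothetical.

```python
def branch_size(links, node, nbr):
    """Number of candidates reachable from `nbr` without passing through `node`."""
    seen, stack = {node, nbr}, [nbr]
    while stack:
        u = stack.pop()
        for v in links[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) - 1                  # exclude `node` itself

def clip_branches(links):
    """Clip links until no candidate has more than two neighbors.

    links: dict candidate -> set of linked candidates (acyclic, undirected).
    Returns a new dict; the input is not modified.
    """
    links = {u: set(vs) for u, vs in links.items()}
    while True:
        branch = next((u for u in links if len(links[u]) > 2), None)
        if branch is None:
            return links                  # only linear chains remain
        smallest = min(links[branch],
                       key=lambda v: branch_size(links, branch, v))
        links[branch].discard(smallest)
        links[smallest].discard(branch)
```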

Example of CA-chains for CzrA fit by CAPRA

Results for MVK

Results

Availability
TEXTAL web site:
–server-side processing
–free access to CAPRA
–beta-testing of TEXTAL
To contact us,

Acknowledgements
Funding
–National Institutes of Health
–Welch Foundation
People
–Dr. James C. Sacchettini
–The rest of the TEXTAL Group: Tod Romo, Kreshna Gopal, Reetal Pai