A Geometric Clustering Algorithm


A Geometric Clustering Algorithm and Its Applications to Structural Data. Guido Muscioni, Lorenzo Semeria, Shuangxi Zhu. 10/05/2017 @UIC

Outline: Context introduction; Datasets description; The algorithm; Results. From: “A Geometric Clustering Algorithm and Its Applications to Structural Data”, Shutan Xu, Shuxue Zou, and Lincong Wang

Context: Incredible growth of complex structured data in molecular biology. Sources: Nuclear Magnetic Resonance (NMR) spectroscopy; protein-ligand docking; Molecular Dynamics (MD) simulations.

Nuclear Magnetic Resonance spectroscopy. Image: NMR machine at the École polytechnique fédérale de Lausanne. Based on the magnetic properties of atomic nuclei. Determines properties of atoms and molecules. Produces a unique signature for each molecule.

Protein-ligand docking. Protein modelling technique. Predicts the structure of a protein-ligand complex. Image from: https://www.intechopen.com/books/protein-engineering-technology-and-application/protein-protein-and-protein-ligand-docking

Molecular dynamics simulation. Based on the forces between molecules. Computes and updates positions based on the interactions between molecules. Uses N-body simulation techniques.

Datasets. NMR dataset: based on SiR5; two sets of intermediates, computed on 101 residues; a large number of them. Protein-ligand dataset: 22 sets of poses (retrieved by the GOLD suite); 500 poses available for each initial protein-ligand complex.

Algorithm. Cluster-based algorithm. RMSD: similarity measure between two structures in the same created cluster.
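Assuming the similarity measure named on this slide is the root-mean-square deviation (RMSD) commonly used to compare molecular structures, it can be sketched as follows. The function name `rmsd` and the toy coordinates are illustrative, not taken from the paper, and the sketch assumes the two structures are already superimposed (no optimal alignment is performed).

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equally sized lists of (x, y, z) atom positions.

    Assumption: the structures are already superimposed, so no
    rotation or translation fitting is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("structures must be non-empty and have the same number of atoms")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared Euclidean distance between corresponding atoms.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Toy example: every atom is shifted by 1 along z, so the RMSD is 1.0.
structure_1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
structure_2 = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(structure_1, structure_2))  # prints 1.0
```

A smaller value means the two structures are more alike; identical structures give 0.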

Steps (flowchart; node labels as extracted): Check for dmax; seed another cluster; reclustering based on d; create two more seeds; initializing two clusters; cluster the former; decision dcc < dmax (yes / no); cluster the latter.

Results. Metrics: number of structures Ns; VDW energy; NOE violation.

Baseline: geometric clustering vs. complete link, average link, and K-medoid.
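For readers unfamiliar with the baselines, the standard textbook definitions behind complete link, average link, and K-medoid can be sketched as below. This is a minimal illustration of the criteria only, not the implementation used in the paper; the function names and the 1-D toy data are assumptions made for the example.

```python
def complete_link(cluster_a, cluster_b, dist):
    """Complete link: distance between the two farthest members."""
    return max(dist(a, b) for a in cluster_a for b in cluster_b)

def average_link(cluster_a, cluster_b, dist):
    """Average link: mean distance over all cross-cluster pairs."""
    pairs = [dist(a, b) for a in cluster_a for b in cluster_b]
    return sum(pairs) / len(pairs)

def medoid(cluster, dist):
    """Medoid: the member with the smallest total distance to the others.

    K-medoid methods use such members as cluster centers.
    """
    return min(cluster, key=lambda c: sum(dist(c, other) for other in cluster))

# Toy 1-D data with absolute difference as the distance (illustrative only).
def abs_dist(a, b):
    return abs(a - b)

left = [0.0, 1.0]
right = [4.0, 6.0]
print(complete_link(left, right, abs_dist))  # prints 6.0
print(average_link(left, right, abs_dist))   # prints 4.5
print(medoid([0.0, 1.0, 5.0], abs_dist))     # prints 1.0
```

For structural data, `dist` would be a pairwise structure distance rather than an absolute difference.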

Results (NMR): geometric clustering vs. complete link, average link, and K-medoid.

Results (protein-ligand): geometric clustering vs. GOLD score. Better identification of clusters. Sometimes fails in recognizing the best cluster. Both show problems; the level of confidence in the results is not high.

Discussion. Pros (what we liked): It is simple (a “standard” iterative clustering algorithm). The results are better than those of other clustering algorithms for these problems. Besides implementing a new algorithm, the paper also proposes a relatively new scoring function for best-cluster selection. Cons (what we disliked): The algorithm is not clearly explained (elements are assigned to new clusters with no apparent criteria). Complexity is high (O(n^2 * log(n))). Prior knowledge is needed to get a good result. The datasets may not be big enough to be realistic. The datasets may not represent real data. The second part of the results section does not have a good description.

Questions