Structural Bioinformatics Seminar Dina Schneidman

Outline
- Seminar requirements
- Biological introduction
- How to prepare a seminar lecture

Seminar Requirements
- No prior knowledge of biology is assumed or required!
- Attend ALL lectures.
- Prepare one of the lectures.

Seminar Goals
- Learn how to study a new subject from articles.
- Learn how to present work in Computer Science.

Biological Introduction

Schedule
- Introduction to molecular structure.
- Introduction to pattern matching.
- Introduction to protein structure alignment (comparison).
- Protein docking.

Small Ligands
- Small organic molecules, composed of tens of atoms.
- Highly flexible: can have many torsional degrees of freedom.

DNA – The code of life
- DNA is a polymer.
- The monomer units of DNA are nucleotides: A, T, C, G.
- DNA is normally a double-stranded macromolecule.

RNA
- RNA is a polymer too.
- The monomer units of RNA are nucleotides: A, U (instead of T), C, G.
- DNA serves as the template for the synthesis of RNA.

Protein
- A protein is a polymer too.
- The monomer units of proteins are the 20 amino acids.
- Each amino acid is encoded by 3 RNA nucleotides.
Hemoglobin sequence:
VHLTPEEKSAVTALWGKVNVDEVGGEAL GRLLVVYPWTQRFFESFGDLSTPDAVMG NPKVKAHGKKVLGA FSDGLAHLDNLKGTFATLSELHXDKLHVD PENFRLLGNVLVCVLAHHFGKEFTPPVQ AAYQKVVAGVANA LAHKYH

The Central Dogma
Gene (DNA) --Transcription--> mRNA --Translation--> Protein --> Symptoms (Phenotype)
Cells express different subsets of the genes in different tissues and under different conditions.

The central dogma
DNA ---> mRNA ---> Protein
- DNA: sequence of nucleic acids over the 4-letter alphabet {A,C,G,T}. Base pairs: Guanine-Cytosine, Thymine-Adenine.
- mRNA: sequence of nucleic acids over the 4-letter alphabet {A,C,G,U} (T->U).
- Protein: sequence of amino acids over the 20-letter alphabet {A,C,D,...,Y}.
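The two arrows above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names `transcribe` and `translate` are hypothetical, the input is assumed to be the coding strand (so transcription reduces to T->U), and the table is the standard genetic code written in the usual U, C, A, G order.

```python
# Standard genetic code, listed in U, C, A, G order for each codon position.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def transcribe(coding_dna):
    """DNA -> mRNA, assuming the coding strand is given (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """mRNA -> protein: read 3 nucleotides at a time, stop at a stop codon ('*')."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGGTGCACCTGACTCCTGAGTAA")
print(mrna)             # AUGGUGCACCUGACUCCUGAGUAA
print(translate(mrna))  # MVHLTPE
```

The example input is an illustrative string chosen so that the translated product begins like the hemoglobin sequence shown earlier, preceded by the start codon's M.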

Bioinformatics - Computational Genomics
- DNA mapping.
- Protein or DNA sequence comparisons.
- Exploration of huge textual databases.
- In essence one-dimensional methods and intuition.

Structural Bioinformatics - Structural Genomics
- Elucidation of the 3D structures of biomolecules.
- Analysis and comparison of biomolecular structures.
- Prediction of biomolecular recognition.
- Handles three-dimensional (3-D) structures.
- Geometric computing (a methodology shared by Computational Geometry, Computer Vision, Computer Graphics, Pattern Recognition, etc.).

Protein Structural Comparison
- ApoAmicyanin - 1aaj
- Pseudoazurin - 1pmy

Algorithmic Solution
About 1 sec.
Fischer, Nussinov, Wolfson, ~1990.

Introduction to Protein Structure

Amino acids and the peptide bond
- Cα atoms.
- Cβ: first side-chain carbon (except for glycine).

Backbone or Secondary structure display

Wire-frame or ribbons display

Spacefill model

Geometric Representation
3-D curve $\{v_i\}$, $i = 1, \dots, n$

Secondary structure

Hydrogen bonds. β strands and sheets

The Holy Grail - Protein Folding
- From sequence to structure.
- Relatively primitive computational folding models have proved to be NP-hard even in the 2-D case.

Determination of protein structures
- X-ray crystallography
- NMR (Nuclear Magnetic Resonance)
- EM (Electron Microscopy)

An NMR result is an ensemble of models
Cystatin (1a67)

The Protein Data Bank (PDB)
- International repository of 3D molecular data.
- Contains x-y-z coordinates of all atoms of the molecule and additional data.
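As a rough sketch of how the x-y-z coordinates can be read, the snippet below extracts the Cα atoms from `ATOM` records, which gives the 3-D curve {v_i} used as the geometric representation. It assumes the legacy fixed-column PDB text format (atom name in columns 13-16, x, y, z in columns 31-38, 39-46 and 47-54). The function name `parse_ca_coordinates` is hypothetical, and the sample records contain made-up coordinates for illustration only.

```python
def parse_ca_coordinates(pdb_lines):
    """Return [(x, y, z), ...] for the C-alpha atoms found in ATOM records."""
    coords = []
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue  # skip HETATM, REMARK, TER, ...
        if line[12:16].strip() != "CA":
            continue  # keep only C-alpha atoms
        x = float(line[30:38])
        y = float(line[38:46])
        z = float(line[46:54])
        coords.append((x, y, z))
    return coords


# Made-up records laid out in the fixed-column format (not taken from a real entry).
SAMPLE_LINES = [
    "ATOM      1  N   VAL A   1       6.204  16.869   4.854",
    "ATOM      2  CA  VAL A   1       6.913  17.759   4.607",
    "ATOM      8  CA  HIS A   2       9.440  18.270   7.321",
]

for v in parse_ca_coordinates(SAMPLE_LINES):
    print(v)
```

Slicing by column rather than splitting on whitespace is deliberate: in real files adjacent fields can touch when the numbers are wide.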

Why bother with structures when we have sequences?
- In evolutionarily related proteins, structure is much better preserved than sequence.
- Structural motifs may predict similar biological function.
- Getting insight into protein folding. Recovering the limited (?) number of protein folds.

Applications
- Classification of protein databases by structure.
- Search of partial and disconnected structural patterns in large databases.
- Extracting structure information is difficult; we want to extract “new” folds.

Applications (continued)
- Speed-up of drug discovery.
- Detection of structural pharmacophores in an ensemble of drugs (a pharmacophore is a similar substructure in drugs acting on a given receptor).
- Comparison and detection of drug receptor active sites (structurally similar receptor cavities could bind similar drugs).

Object Recognition

Model Database

Scene

Recognition
Lamdan, Schwartz, Wolfson, “Geometric Hashing”, 1988.

Protein Alignment = Geometric Pattern Discovery

Protein Alignment
- The superimposition pattern is not known a priori: this is pattern discovery.
- The matching recovered can be inexact.
- We are not necessarily looking for the largest superimposition, since other matchings may have biological meaning.

Geometric Task: Given two configurations of points in three-dimensional space, find those rotations and translations (T) of one of the point sets which produce “large” superimpositions of corresponding 3-D points.

Geometric Task (continued)
Aspects:
- Object representation (points, vectors, segments)
- Object resemblance (distance function)
- Transformation (translations, rotations, scaling)
-> Optimization technique

Transformations
- Translation
- Translation and rotation: rigid motion (Euclidean transformation)
- Translation, rotation + scaling
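The transformation classes listed above can be illustrated with a small sketch. The helper names (`rotation_about_z`, `apply_rigid_motion`, `apply_similarity`) are hypothetical, and the rotation is restricted to the z-axis only to keep the example short. The point is that a rigid motion preserves distances, while adding scaling multiplies them by the scale factor.

```python
import math


def rotation_about_z(theta):
    """3x3 rotation matrix for an angle theta (radians) about the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]


def apply_rigid_motion(R, t, p):
    """Rigid motion: rotate the point p by R, then translate by t."""
    return tuple(sum(R[r][k] * p[k] for k in range(3)) + t[r] for r in range(3))


def apply_similarity(scale, R, t, p):
    """Rotation and uniform scaling followed by translation."""
    rotated = apply_rigid_motion(R, (0.0, 0.0, 0.0), p)
    return tuple(scale * rotated[k] + t[k] for k in range(3))


R = rotation_about_z(math.pi / 2)
t = (1.0, 0.0, 0.0)
p, q = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0)

# Distances are preserved by the rigid motion and doubled when scale = 2.
print(math.dist(p, q))
print(math.dist(apply_rigid_motion(R, t, p), apply_rigid_motion(R, t, q)))
print(math.dist(apply_similarity(2.0, R, t, p), apply_similarity(2.0, R, t, q)))
```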

Inexact Alignment
Simple case: two closely related proteins with the same number of amino acids, related by a transformation T.
Question: how to measure alignment error?

Superposition - best least squares (RMSD: Root Mean Square Deviation)
Given two sets of 3-D points $P = \{p_i\}$, $Q = \{q_i\}$, $i = 1, \dots, n$:
$$\mathrm{rmsd}(P, Q) = \sqrt{\frac{1}{n} \sum_i |p_i - q_i|^2}$$
Find a 3-D rigid transformation $T^*$ such that:
$$\mathrm{rmsd}(T^*(P), Q) = \min_T \sqrt{\frac{1}{n} \sum_i |T(p_i) - q_i|^2}$$
A closed-form solution exists for this task. It can be computed in O(n) time.
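A minimal sketch of the RMSD formula for two equally sized, already paired point sets follows. The function names are hypothetical. Only the translational part of the superposition is shown (moving the centroid of P onto the centroid of Q, which is the best choice when only translations are allowed); the optimal rotation of the closed-form solution is not implemented here.

```python
import math


def rmsd(P, Q):
    """Root mean square deviation between paired 3-D points P[i] and Q[i]."""
    if len(P) != len(Q) or not P:
        raise ValueError("P and Q must be non-empty and of equal length")
    total = sum(sum((a - b) ** 2 for a, b in zip(p, q)) for p, q in zip(P, Q))
    return math.sqrt(total / len(P))


def centroid(points):
    """Arithmetic mean of a list of 3-D points."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def translate_onto_centroid(P, Q):
    """Shift P so that its centroid coincides with the centroid of Q."""
    cp, cq = centroid(P), centroid(Q)
    shift = tuple(cq[k] - cp[k] for k in range(3))
    return [tuple(p[k] + shift[k] for k in range(3)) for p in P]


P = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
Q = [(x + 2.0, y + 3.0, z + 4.0) for x, y, z in P]  # P shifted by (2, 3, 4)

print(rmsd(P, Q))                              # sqrt(29), about 5.385
print(rmsd(translate_onto_centroid(P, Q), Q))  # approximately 0
```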

Problem statement with RMSD metric: Given two configurations of points in three-dimensional space and a threshold ε, find the largest alignment (a set of matched elements and a transformation T) with RMSD less than ε. (The problem belongs to NP.)

Distance Functions
Two point sets: $A = \{a_i\}$, $i = 1, \dots, n$; $B = \{b_j\}$, $j = 1, \dots, m$
Pairwise correspondence: $(a_{k_1}, b_{t_1}), (a_{k_2}, b_{t_2}), \dots, (a_{k_N}, b_{t_N})$
(1) Exact matching: $\|a_{k_i} - b_{t_i}\| = 0$
(2) RMSD (Root Mean Square Deviation): $\sqrt{\sum_i \|a_{k_i} - b_{t_i}\|^2 / N} < \varepsilon$
(3) Bottleneck: $\max_i \|a_{k_i} - b_{t_i}\|$
Hausdorff distance: $h(A, B) = \max_{a \in A} \min_{b \in B} \|a - b\|$, $H(A, B) = \max(h(A, B), h(B, A))$
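The distance functions above translate directly into code. The sketch below is a straightforward quadratic-time illustration with hypothetical function names; a correspondence is represented as a list of (a, b) point pairs, and `math.dist` requires Python 3.8 or later.

```python
import math


def correspondence_rmsd(pairs):
    """RMSD over a pairwise correspondence [(a, b), ...]."""
    return math.sqrt(sum(math.dist(a, b) ** 2 for a, b in pairs) / len(pairs))


def bottleneck(pairs):
    """Largest distance between any matched pair."""
    return max(math.dist(a, b) for a, b in pairs)


def directed_hausdorff(A, B):
    """h(A, B): the farthest any point of A is from its nearest point in B."""
    return max(min(math.dist(a, b) for b in B) for a in A)


def hausdorff(A, B):
    """H(A, B) = max(h(A, B), h(B, A)); needs no correspondence."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))


A = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
B = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
print(directed_hausdorff(A, B))  # 0.0: every point of A also appears in B
print(hausdorff(A, B))           # 4.0: (5, 0, 0) is 4 away from its nearest point in A

pairs = [((0.0, 0.0, 0.0), (0.0, 0.0, 1.0)), ((1.0, 0.0, 0.0), (1.0, 0.0, 3.0))]
print(bottleneck(pairs))           # 3.0
print(correspondence_rmsd(pairs))  # sqrt(5), about 2.236
```

Note that the directed distance h is not symmetric, which is why H takes the maximum over both directions.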

Docking Problem: Given two molecules, find their correct association:
Receptor + Ligand = Complex, where T is the transformation that brings the two molecules together.

Docking Problem: Receptor + Ligand = ?

How to present a paper in Computer Science

Lecture Preparation
- The lecture should cover a given slot of time (~90 minutes).
- Use PowerPoint slides for the presentation.
- Each slide usually spans 1-2 minutes.
- The slides should not be overloaded.
- Use a mouse or pointer.
- Use colors, pictures, tables and animation, but don’t overdo it.

What to say and how
- Communicate the key ideas during your lecture.
- Don’t get lost in technical details.
- Structure your talk.
- Use a top-down approach.

Lecture Structure
- Introduction: general description of the paper.
- Body: abstract of the current method.
- Technical details.
- Conclusions and discussion.

Introduction
- Most important part of your talk!
- Title + short explanation of the presented topic.
- Lecture outline.
- Problem definition, input and output. Don’t forget to define the problem!
- Problem motivation.
- Introduce terminology of the field.
- Short review of existing approaches (don’t forget to add references!).

Body
- Abstract of the major results presented in the paper.
- Significance of the results.
- Sketch of the method.

Technicalities
- Extended presentation of the method.
- Present key algorithmic ideas clearly and carefully.
- Complexity of the method.
- Experimental results.

Conclusions and Discussion
- Summarize the major contributions of the work.
- You can highlight points based on technical details you couldn’t discuss in the introduction.
- Present related open problems.
- Don’t forget to thank the audience!
- Questions.

Getting to the Audience
- Use repetitions: “Tell them what you're going to tell them. Tell them. Then tell them what you told them.”
- Remind, don’t assume.
- Maintain eye contact.
- Control your voice and motion.

Thanks!!! And good luck with your lectures!