Structural Bioinformatics Seminar Dina Schneidman

1 Structural Bioinformatics Seminar
Dina Schneidman
Email: duhovka@post.tau.ac.il

2 Outline
- Seminar requirements
- Biological introduction
- How to prepare a seminar lecture

3 Seminar Requirements
- No prior knowledge of biology is assumed or required!
- Attend ALL lectures.
- Prepare one of the lectures.

4 Seminar Goals
- Learn how to study a new subject from articles.
- Learn how to present work in computer science.

5 Biological Introduction

6 Schedule
- Introduction to molecular structure.
- Introduction to pattern matching.
- Introduction to protein structure alignment (comparison).
- Protein docking.

7 Small Ligands
- Small organic molecules, composed of tens of atoms.
- Highly flexible: they can have many torsional degrees of freedom.

8 DNA - The Code of Life
- DNA is a polymer.
- The monomer units of DNA are nucleotides: A, T, C, G.
- DNA is normally a double-stranded macromolecule.

9 RNA
- RNA is a polymer too.
- The monomer units of RNA are nucleotides: A, U (instead of T), C, G.
- DNA serves as the template for the synthesis of RNA.
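
A minimal Python sketch of the base pairing and the T -> U substitution (not part of the original slides). It assumes Watson-Crick pairing (A-T, C-G) and the convention that an mRNA has the same sequence as the DNA coding strand, with U in place of T; all function names are illustrative.

```python
# Watson-Crick pairing between the two DNA strands (A-T, C-G).
DNA_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(dna: str) -> str:
    """Return the opposite DNA strand, read in its own 5'->3' direction."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(dna.upper()))


def transcribe(coding_strand: str) -> str:
    """mRNA copy of the coding strand: same letters, with U instead of T."""
    return coding_strand.upper().replace("T", "U")


def transcribe_from_template(template_strand: str) -> str:
    """The RNA is built by pairing with the template strand, so it equals
    the template's reverse complement with T replaced by U."""
    return transcribe(reverse_complement(template_strand))
```

For example, `reverse_complement("ATGGTGCAT")` gives `"ATGCACCAT"`, and transcribing either the coding strand or (via the template) its partner gives `"AUGGUGCAU"`.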

10 Protein
- Protein is a polymer too.
- The monomer units of proteins are the 20 amino acids.
- Each amino acid is encoded by 3 RNA nucleotides (a codon).
Hemoglobin sequence:
VHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHXDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH
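
The codon-to-amino-acid mapping can be sketched as a lookup table. This assumes the standard genetic code (some organelles use variants) and stops at the first stop codon; the table string and the `translate` helper are illustrative, not taken from the slides.

```python
from itertools import product

# Standard genetic code: codons enumerated with bases in the order U, C, A, G
# (first base varies slowest). "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate an mRNA string codon by codon, stopping at the first stop codon.
    Trailing bases that do not form a full codon are ignored."""
    protein = []
    for i in range(0, len(mrna) - len(mrna) % 3, 3):
        aa = CODON_TABLE[mrna[i:i + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)
```

For the made-up input `"AUGGUGCAUCUGUAA"`, `translate` returns `"MVHL"`: four codons, then the stop codon UAA. The table has 64 codons mapping onto the 20-letter amino acid alphabet plus stop.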

11 The Central Dogma
Gene (DNA) --transcription--> mRNA --translation--> Protein --> Symptoms (phenotype)
Cells express different subsets of their genes in different tissues and under different conditions.

12 The Central Dogma
DNA ---> mRNA ---> Protein
- DNA: sequence of nucleic acids over a 4-letter alphabet {A,C,G,T} (base pairs: Guanine-Cytosine, Thymine-Adenine).
- mRNA: sequence of nucleic acids over a 4-letter alphabet {A,C,G,U} (T -> U).
- Protein: sequence of amino acids over a 20-letter alphabet {A,D,...,Y}.

13 Bioinformatics - Computational Genomics
- DNA mapping.
- Protein or DNA sequence comparisons.
- Exploration of huge textual databases.
- In essence, one-dimensional methods and intuition.
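
As an illustration of the one-dimensional methods mentioned above, the simplest sequence comparison is the edit (Levenshtein) distance, computed by dynamic programming. This is a sketch only; practical protein and DNA comparison typically uses substitution matrices and gap penalties (e.g. Needleman-Wunsch or Smith-Waterman alignment), which are not shown here.

```python
def edit_distance(s: str, t: str) -> int:
    """Minimum number of substitutions, insertions and deletions turning s into t."""
    prev = list(range(len(t) + 1))  # distances from "" to each prefix of t
    for i, a in enumerate(s, start=1):
        curr = [i] + [0] * len(t)   # distance from s[:i] to ""
        for j, b in enumerate(t, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete a
                curr[j - 1] + 1,         # insert b
                prev[j - 1] + (a != b),  # substitute (free if the letters match)
            )
        prev = curr
    return prev[-1]
```

Only two rows of the table are kept, so memory is O(len(t)) while time is O(len(s) * len(t)).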

14 Structural Bioinformatics - Structural Genomics
- Elucidation of the 3-D structures of biomolecules.
- Analysis and comparison of biomolecular structures.
- Prediction of biomolecular recognition.
- Handles three-dimensional (3-D) structures.
- Geometric computing (a methodology shared by computational geometry, computer vision, computer graphics, pattern recognition, etc.).

15 Protein Structural Comparison
Apoamicyanin (PDB 1aaj) vs. pseudoazurin (PDB 1pmy)

16 Algorithmic Solution
About 1 sec. (Fischer, Nussinov, Wolfson, ~1990)

17 Introduction to Protein Structure

18 Amino acids and the peptide bond
Cβ: the first side-chain carbon (absent in glycine).
Cα atoms

19 Backbone or Secondary structure display

20 Wire-frame or ribbons display

21 Spacefill model

22 Geometric Representation
3-D curve $\{v_i\}$, $i = 1, \dots, n$
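
One property of the curve representation worth seeing concretely: the matrix of pairwise distances between the vertices $v_i$ does not change under rotation and translation, which makes distance-based descriptions convenient for comparison (note that a mirror image has the same distance matrix too). A minimal sketch with coordinates as plain tuples; the function names are hypothetical.

```python
from math import cos, dist, sin


def distance_matrix(curve):
    """n x n matrix of Euclidean distances between the curve's vertices v_i."""
    return [[dist(u, v) for v in curve] for u in curve]


def rotate_z_and_shift(p, angle, shift):
    """A rigid motion: rotate point p about the z axis, then translate by shift."""
    x, y, z = p
    c, s = cos(angle), sin(angle)
    return (c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])
```

For example, with vertices spaced 3.8 apart (roughly the usual Cα-Cα distance in Å), `distance_matrix` gives the same values before and after `rotate_z_and_shift`, up to floating-point rounding.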

24 Secondary structure

25 Hydrogen bonds. β-strands and sheets

28 The Holy Grail - Protein Folding
- From sequence to structure.
- Relatively primitive computational folding models have been proved NP-hard, even in the 2-D case.

29 Determination of Protein Structures
- X-ray crystallography
- NMR (nuclear magnetic resonance)
- EM (electron microscopy)

30 An NMR result is an ensemble of models
Cystatin (PDB 1a67)

31 The Protein Data Bank (PDB)
- International repository of 3-D molecular data.
- Contains the x-y-z coordinates of all atoms of the molecule, plus additional data.
- http://pdb.tau.ac.il
- http://www.rcsb.org/pdb/
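
The classic PDB text format stores each atom in a fixed-column ATOM (or HETATM) record. A minimal sketch that extracts Cα coordinates, i.e. the 3-D curve of slide 22, from such lines. It assumes the standard column layout, keeps only ATOM records (so calcium ions, also named "CA" but stored as HETATM, are skipped), keeps only the first alternate location, and does not handle the newer mmCIF format. The function name and the file name in the usage line are hypothetical.

```python
def ca_coordinates(pdb_lines, chain=None):
    """Return [(x, y, z), ...] for C-alpha atoms in fixed-column PDB ATOM records.

    1-based columns in the PDB format: atom name 13-16, altLoc 17,
    chain ID 22, x 31-38, y 39-46, z 47-54 (Python slices below are 0-based).
    """
    coords = []
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        # Keep only C-alpha atoms with no alternate location or the first one ("A").
        if line[12:16].strip() != "CA" or line[16] not in (" ", "A"):
            continue
        if chain is not None and line[21] != chain:
            continue
        coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    return coords
```

Usage with a locally downloaded file: `with open("1aaj.pdb") as f: curve = ca_coordinates(f, chain="A")`.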

34 Why bother with structures when we have sequences?
- In evolutionarily related proteins, structure is much better preserved than sequence.
- Structural motifs may predict similar biological function.
- Getting insight into protein folding: recovering the limited (?) number of protein folds.

35 Applications
- Classification of protein databases by structure.
- Search for partial and disconnected structural patterns in large databases.
- Extracting structural information is difficult; we want to extract "new" folds.

36 Applications (continued)
- Speed-up of drug discovery.
- Detection of structural pharmacophores in an ensemble of drugs (a pharmacophore: similar substructures in drugs acting on a given receptor).
- Comparison and detection of drug receptor active sites (structurally similar receptor cavities could bind similar drugs).

37 Object Recognition

38 Model Database

39 Scene

40 Recognition
Lamdan, Schwartz, Wolfson, "Geometric Hashing", 1988.
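
The idea behind geometric hashing can be sketched in a few lines. This is a toy 2-D version, not the method of the cited paper as published: points are complex numbers, each ordered pair of points serves as a basis, and coordinates relative to a basis are invariant to translation, rotation and scaling. Models are hashed once in a preprocessing step; at recognition time, a scene basis is chosen and every other scene point votes for the (model, basis) entries in its hash cell. Protein-structure versions work in 3-D with larger reference frames and add a verification step. All names and the cell size `step` are illustrative assumptions.

```python
from collections import defaultdict
from itertools import permutations


def basis_coords(points, i, j):
    """Express every point except points[i], points[j] in the frame that maps
    points[i] -> 0 and points[j] -> 1 (invariant to translation, rotation, scale)."""
    origin, axis = points[i], points[j] - points[i]
    return [(p - origin) / axis for k, p in enumerate(points) if k not in (i, j)]


def quantize(z, step):
    """Hash key: the grid cell that contains the invariant coordinate z."""
    return (round(z.real / step), round(z.imag / step))


def build_table(models, step=0.1):
    """Preprocessing: for every model and every ordered basis pair (i, j),
    store (model name, i, j) under the key of each remaining point."""
    table = defaultdict(list)
    for name, pts in models.items():
        for i, j in permutations(range(len(pts)), 2):
            for z in basis_coords(pts, i, j):
                table[quantize(z, step)].append((name, i, j))
    return table


def vote(table, scene, i, j, step=0.1):
    """Recognition: use scene points i, j as a basis and let every other
    scene point vote for the (model, basis) entries stored in its cell."""
    votes = defaultdict(int)
    for z in basis_coords(scene, i, j):
        for entry in table.get(quantize(z, step), []):
            votes[entry] += 1
    return votes
```

In practice the vote is repeated for several scene bases, the best candidates are verified (for example by computing the transformation and an RMSD), and neighboring cells are checked to tolerate noise near cell boundaries.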

41 Protein Alignment = Geometric Pattern Discovery

42 Protein Alignment
The superimposition pattern is not known a priori: this is pattern discovery. The recovered matching can be inexact. We are not necessarily looking for the largest superimposition, since other matchings may have biological meaning.

43 Geometric Task
Given two configurations of points in three-dimensional space, find those rotations and translations (T) of one of the point sets which produce "large" superimpositions of corresponding 3-D points.

44 Geometric Task (continued)
Aspects:
- Object representation (points, vectors, segments)
- Object resemblance (distance function)
- Transformation (translations, rotations, scaling)
-> Optimization technique

45 Transformations
- Translation
- Translation and rotation: rigid motion (Euclidean transformation)
- Translation, rotation + scaling
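
The transformation classes above can be sketched as matrix operations, assuming NumPy: a rotation built with Rodrigues' formula, combined with a translation and an optional scale factor, i.e. $p \mapsto sRp + t$. Rigid motions ($s = 1$) preserve all inter-point distances; scaling multiplies them by $s$. The helper names are hypothetical.

```python
import numpy as np


def rotation_matrix(axis, angle):
    """Rotation by `angle` radians about `axis`, via Rodrigues' formula:
    R = I + sin(angle) K + (1 - cos(angle)) K^2, K the cross-product matrix."""
    kx, ky, kz = np.asarray(axis, float) / np.linalg.norm(axis)
    K = np.array([[0.0, -kz, ky], [kz, 0.0, -kx], [-ky, kx, 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1 - np.cos(angle)) * (K @ K)


def transform(points, R=None, t=None, s=1.0):
    """Apply p -> s * R p + t to each row of an (n, 3) array.
    R = None means no rotation, t = None means no translation;
    s = 1 gives a rigid (Euclidean) motion."""
    P = np.asarray(points, float)
    R = np.eye(3) if R is None else R
    t = np.zeros(3) if t is None else np.asarray(t, float)
    return s * P @ R.T + t
```

A rotation matrix is orthogonal with determinant +1, which is what distinguishes it from a reflection.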

46 Inexact Alignment
Simple case: two closely related proteins with the same number of amino acids.
Question: how do we measure the alignment error?

47 Superposition: best least squares (RMSD, Root Mean Square Deviation)
Given two sets of 3-D points $P = \{p_i\}$, $Q = \{q_i\}$, $i = 1, \dots, n$:
$$\mathrm{rmsd}(P, Q) = \sqrt{\frac{1}{n} \sum_{i} \lVert p_i - q_i \rVert^2}$$
Find a 3-D rigid transformation $T^*$ such that:
$$\mathrm{rmsd}(T^*(P), Q) = \min_{T} \sqrt{\frac{1}{n} \sum_{i} \lVert T(p_i) - q_i \rVert^2}$$
A closed-form solution exists for this task. It can be computed in O(n) time.
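
The slide does not name the closed-form method; one well-known choice is the Kabsch algorithm (SVD of a 3x3 cross-covariance matrix), sketched here assuming NumPy and rows of `P` and `Q` already in correspondence. Building the covariance is O(n) and the SVD of a 3x3 matrix is constant time, which matches the O(n) bound. Function names are illustrative.

```python
import numpy as np


def rmsd(P, Q):
    """Root mean square deviation between matched rows of two (n, 3) arrays."""
    diff = np.asarray(P, float) - np.asarray(Q, float)
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def kabsch(P, Q):
    """Rotation R and translation t minimizing sum_i ||R p_i + t - q_i||^2."""
    P, Q = np.asarray(P, float), np.asarray(Q, float)
    p_mean, q_mean = P.mean(axis=0), Q.mean(axis=0)
    # 3x3 cross-covariance of the centered point sets: O(n) to build.
    H = (P - p_mean).T @ (Q - q_mean)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1),
    # not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q_mean - R @ p_mean
    return R, t
```

Usage: `R, t = kabsch(P, Q)`, then `rmsd(P @ R.T + t, Q)` is the minimal RMSD over rigid transformations for this fixed correspondence.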

48 Problem statement with the RMSD metric
Given two configurations of points in three-dimensional space and a threshold ε, find the largest alignment (a set of matched elements together with a transformation T) whose RMSD is less than ε. (The problem belongs to NP.)

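49 Distance Functions
Two point sets: $A = \{a_i\}$, $i = 1, \dots, n$; $B = \{b_j\}$, $j = 1, \dots, m$.
Pairwise correspondence: $(a_{k_1}, b_{t_1}), (a_{k_2}, b_{t_2}), \dots, (a_{k_N}, b_{t_N})$
(1) Exact matching: $\lVert a_{k_i} - b_{t_i} \rVert = 0$
(2) RMSD (Root Mean Square Distance): $\sqrt{\frac{1}{N} \sum_{i} \lVert a_{k_i} - b_{t_i} \rVert^2} < \varepsilon$
(3) Bottleneck: $\max_{i} \lVert a_{k_i} - b_{t_i} \rVert$
Hausdorff distance: $h(A, B) = \max_{a \in A} \min_{b \in B} \lVert a - b \rVert$, $\quad H(A, B) = \max(h(A, B), h(B, A))$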
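
The distance functions above can be sketched directly for points given as coordinate tuples. RMSD and bottleneck assume the correspondence $(a_{k_i}, b_{t_i})$ is already known and is passed as a list of pairs; the Hausdorff distance needs no correspondence and is computed here by brute force in O(nm). Uses `math.dist` (Python 3.8+); the function names are illustrative.

```python
from math import dist, sqrt


def rmsd_matched(pairs):
    """RMSD over a given correspondence [(a, b), ...]."""
    return sqrt(sum(dist(a, b) ** 2 for a, b in pairs) / len(pairs))


def bottleneck_matched(pairs):
    """Largest distance between matched points."""
    return max(dist(a, b) for a, b in pairs)


def directed_hausdorff(A, B):
    """h(A, B): the farthest any point of A is from its nearest point in B."""
    return max(min(dist(a, b) for b in B) for a in A)


def hausdorff(A, B):
    """H(A, B) = max(h(A, B), h(B, A)); symmetric, unlike h."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))
```

For $A = \{(0,0,0), (1,0,0)\}$ and $B = \{(0,0,0), (3,0,0)\}$, $h(A, B) = 1$ but $h(B, A) = 2$, so $H(A, B) = 2$, which shows why the symmetric version takes the maximum of both directions.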

50 Docking Problem
Given two molecules, find their correct association (via a transformation T):
Receptor + Ligand = Complex

51 Docking Problem: + = ?

52 Docking Problem: + = ?

53 How to present a paper in Computer Science

54 Lecture Preparation
- The lecture should fill a given time slot (~90 minutes).
- Use PowerPoint slides for the presentation.
- Each slide usually takes 1-2 minutes.
- The slides should not be overloaded.
- Use a mouse or pointer.
- Use colors, pictures, tables and animation, but don't overdo it.

55 What to Say and How
- Communicate the key ideas during your lecture.
- Don't get lost in technical details.
- Structure your talk.
- Use a top-down approach.

56 Lecture Structure
- Introduction: general description of the paper.
- Body: abstract of the current method.
- Technical details.
- Conclusions and discussion.

57 Introduction
- Most important part of your talk!
- Title + short explanation of the presented topic.
- Lecture outline.
- Problem definition, input and output. Don't forget to define the problem!
- Problem motivation.
- Introduce the terminology of the field.
- Short review of existing approaches (don't forget to add references!).

58 Body
- Abstract of the major results presented in the paper.
- Significance of the results.
- Sketch of the method.

59 Technicalities
- Extended presentation of the method.
- Present key algorithmic ideas clearly and carefully.
- Complexity of the method.
- Experimental results.

60 Conclusions and Discussion
- Summarize the major contributions of the work.
- You can highlight points, based on technical details, that you couldn't discuss in the introduction.
- Present related open problems.
- Don't forget to thank the audience!!!
- Questions.

61 Getting to the Audience
- Use repetition: "Tell them what you're going to tell them. Tell them. Then tell them what you told them."
- Remind, don't assume.
- Maintain eye contact.
- Control your voice and motion.

62 Thanks!!! and Good Luck in your lectures!

