Bioinformatics (3 lectures) Why bother about proteins/prediction What is bioinformatics Protein databases Making use of database information –Predictions.

Presentation transcript:

Bioinformatics (3 lectures)
Why bother about proteins/prediction
What is bioinformatics
Protein databases
Making use of database information
  – Predictions
Protein Design
Thomas Huber
Supercomputer Facility
Australian National University

What is Bioinformatics?
Handling lots of information
  – Concentrate knowledge: public databases
  – Summarise knowledge in principles: knowledge acquisition (data mining)
  – Apply principles: predictions

Why do we care about Protein Structures/Prediction?
Academic curiosity?
  – Understanding how nature works
Drug & ligand design
  – Need protein structure to design molecules which inhibit/excite: cure all sorts of diseases
Protein design
  – Making better proteins: sensor proteins, industrial catalysts (washing powder, synthetic reactions, …)
Urgency of prediction
  – Structures determined so far are insignificant in number compared to all proteins
  – sequencing = fast & cheap
  – structure determination = hard & expensive

Protein Databases
Collection of protein information
  – cunningly organised: cross references, easily accessible
Different information = different databases
  – Literature databases (Medline)
  – Sequence databases (Swissprot)
  – Pattern (finger print) databases (Prints)
  – Structure databases (PDB)
  – Function databases (PFMP)

Prediction of Protein Structure

Sequence Search
Sequences are a major source of biological information
  – access to annotated sequences
  – much more to come from DNA sequencing
What information to look for?
  – Sequence pattern: many protein families have sequence “finger prints”
  – Similar sequences:
      Observation: two proteins with sequence identity >35% adopt the same structure
      Family of sequences: useful for structure prediction

Searching Sequence “Finger Prints”
What are protein “finger prints”?
  – a pattern of conserved residues (often with functional importance)
  – unique (or highly specific) for a protein family
  – e.g. Carboxypeptidases finger print: [LIVM]-x-[GTA]-E-S-Y-[AG]-[GS]
Searching for finger prints
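
A finger print written in this bracket notation can be searched with an ordinary regular expression. The sketch below is one possible way to do it, not the method used in the lecture: it handles only the basic notation (`x`, single residues, `[...]`, `{...}` and repeat counts), and the function names and the test sequence are made up for illustration.

```python
import re

# One pattern element: x, a residue, [allowed], {forbidden}, optional (n) or (n,m).
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a finger print such as [LIVM]-x-[GTA]-E into a regex string."""
    parts = []
    for element in pattern.strip(".").split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":                      # x = any residue
            core = "."
        elif core.startswith("{"):           # {AB} = any residue except A or B
            core = "[^" + core[1:-1] + "]"
        if low and high:                     # x(2,4) = two to four residues
            core += "{" + low + "," + high + "}"
        elif low:                            # x(3) = exactly three residues
            core += "{" + low + "}"
        parts.append(core)
    return "".join(parts)


def find_fingerprint(pattern: str, sequence: str):
    """Return (start index, matched text) for each non-overlapping hit."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in regex.finditer(sequence)]


CARBOXYPEPTIDASE = "[LIVM]-x-[GTA]-E-S-Y-[AG]-[GS]"
# Hypothetical test sequence, not a real protein.
hits = find_fingerprint(CARBOXYPEPTIDASE, "MKTLAGESYAGAA")
```

A finger print search gives a yes/no answer per position, which is why it is fast and why it cannot rank near misses.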

Sequence Alignment
What is a similar sequence?
  – With finger prints: Yes/No
  – Sequence similarity (1 gozillion measures)
      identity: score 1 if residues are the same, score 0 if residues are different
      physico-chemical (e.g. positives, hydrophobicity)
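
The identity measure (score 1 for the same residue, 0 otherwise) can be written down in a few lines. This is a minimal sketch that assumes the two sequences are already aligned and that `-` marks a gap; dividing by the number of gap-free columns is one convention among several, and the function name is made up for illustration.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of gap-free alignment columns with identical residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where neither sequence has a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)   # score 1 if same, 0 if different
    return 100.0 * matches / len(columns)
```

For example, `percent_identity("ACDEFG", "ACDQFG")` counts 5 identical residues in 6 columns.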

Evolutionary Similarity
PAM (Point Accepted Mutation)
  – Align sequences with >85% identity
  – Reconstruct phylogenetic tree
  – Compute mutation probabilities for 1 PAM of evolutionary distance
  – Calculate log odds
  – Extrapolate matrices to desired evolutionary distance, e.g. PAM250 for evolutionarily distant sequences
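
The last two steps (log odds and extrapolation) amount to a matrix power followed by a logarithm. The sketch below uses a toy two-letter alphabet with made-up probabilities, not real PAM data, and assumes the convention that `mutation[i][j]` is the probability of residue i being replaced by residue j and that scores are 10·log10 of the odds.

```python
import math


def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def extrapolate(pam1, distance):
    """Mutation probabilities at `distance` PAM units: PAM1 multiplied by itself."""
    size = len(pam1)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(distance):
        result = mat_mul(result, pam1)
    return result


def log_odds(mutation, frequencies):
    """Score = 10 * log10(P(i -> j) / background frequency of j)."""
    size = len(mutation)
    return [[10.0 * math.log10(mutation[i][j] / frequencies[j])
             for j in range(size)] for i in range(size)]


# Toy two-letter alphabet; the numbers are invented for illustration.
TOY_PAM1 = [[0.99, 0.01],
            [0.01, 0.99]]
TOY_FREQ = [0.5, 0.5]

toy_pam250 = extrapolate(TOY_PAM1, 250)
toy_scores = log_odds(toy_pam250, TOY_FREQ)
```

At large distances the mutation probabilities approach the background frequencies, so the scores shrink towards zero: a PAM250-style matrix rewards identity much less than a PAM1-style matrix does.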

Searching for Similar Sequences
What is the difference to searching for finger prints?
  – Gaps and insertions: nasty complication
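
One standard way to deal with gaps and insertions is dynamic programming. The slide does not name an algorithm, so the following is a swapped-in illustration: a Needleman-Wunsch style global alignment score with a simple linear gap penalty. The scoring values and the function name are assumptions.

```python
def global_alignment_score(a: str, b: str,
                           match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Best global alignment score of a and b with a linear gap penalty."""
    # prev[j] = best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]                                  # a[:i] against an empty prefix
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + substitution,   # align a[i-1] with b[j-1]
                            prev[j] + gap,                # gap in b
                            curr[j - 1] + gap))           # gap in a
        prev = curr
    return prev[-1]
```

With these values, `global_alignment_score("VILSTRIV", "VILTRIV")` is 5: seven matching residues and one gap. A finger print search would simply have reported "no match".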

Finding Distant Homologues
Iterative sequence alignment (Ψ-Blast)

Predicting Secondary Structure
Secondary structure (a reminder)
  – simple (but not sufficient) description of structure
Prediction of secondary structure
  – relation of protein sequence to structure
  – statistically based prediction
  – pattern based prediction

Statistical Based Prediction
Amino acids have preferences for secondary structure
What are the odds?
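
"What are the odds?" can be answered by counting. The sketch below estimates, for each amino acid, the log odds of finding it in a given secondary structure state compared with finding it anywhere. The training data are a made-up toy example, and the function name and the `H`/`C` state letters are assumptions.

```python
import math
from collections import Counter


def log_odds_propensities(sequences, structures, state="H"):
    """ln( P(amino acid | state) / P(amino acid) ) for every amino acid seen in `state`."""
    total = Counter()      # how often each amino acid occurs at all
    in_state = Counter()   # how often it occurs in the chosen state
    for sequence, structure in zip(sequences, structures):
        for amino_acid, label in zip(sequence, structure):
            total[amino_acid] += 1
            if label == state:
                in_state[amino_acid] += 1
    n_total = sum(total.values())
    n_state = sum(in_state.values())
    propensities = {}
    for amino_acid, count in in_state.items():
        if count == 0:
            continue  # log odds undefined without observations
        p_given_state = count / n_state
        p_background = total[amino_acid] / n_total
        propensities[amino_acid] = math.log(p_given_state / p_background)
    return propensities


# Toy data (invented): A is mostly helical, G mostly not.
toy = log_odds_propensities(["AAAGGAGG"], ["HHHHCCCC"], state="H")
```

A positive value means the residue is over-represented in the state, a negative value that it is under-represented.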

Pattern Based Prediction
Do amino acid patterns exist?
  – Yes, but the code is not always obeyed
      The same sequence of 5 residues is sometimes in an α-helix and at other times in a β-strand
      BUT patterns have high preferences
A good predictor: the helical wheel
  – Helices are likely on the outside of proteins
  – i, i+3 and i+4: hydrophobic interface
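
The helical wheel idea can be turned into a number: place each residue on a circle, 100° further round than the previous one (3.6 residues per turn), and check whether the hydrophobic residues point the same way. The sketch below computes such a "hydrophobic moment" under strong simplifications: residues are either hydrophobic or not, and the hydrophobic set, the function name and the peptides are assumptions chosen for illustration.

```python
import math

# Assumption: a crude yes/no hydrophobicity classification.
HYDROPHOBIC = set("AVLIMFWC")


def hydrophobic_moment(sequence: str, angle_deg: float = 100.0) -> float:
    """Length of the summed direction vectors of hydrophobic residues, per residue."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    x = y = 0.0
    for index, amino_acid in enumerate(sequence):
        if amino_acid in HYDROPHOBIC:
            theta = math.radians(index * angle_deg)   # position on the wheel
            x += math.cos(theta)
            y += math.sin(theta)
    return math.hypot(x, y) / len(sequence)


# Hydrophobic residues at i, i+3, i+4, i+7, ...: one face of a helix.
helix_like = hydrophobic_moment("LKKLLKKLLKKL")
# Strictly alternating pattern: spread around the helical wheel.
alternating = hydrophobic_moment("LKLKLKLKLKLK")
```

The alternating peptide scores low on a 100° wheel but high when the angle is set to 180°, the idealised value for a flat strand whose side chains point alternately up and down.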

Prediction with Neural Networks
Not enough statistics for all patterns
  – for 5 residues: 20^5 (3.2*10^6) patterns
How to reduce the number of parameters?
  – Train a neural network to “learn” to predict secondary structure
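
The slide does not describe the network itself. A common input representation, shown here purely as an assumption, is a sliding window in which each residue becomes a block of 20 zeros and ones. A 5-residue window then needs 100 inputs, instead of one statistic for each of the 20^5 possible patterns.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def encode_window(sequence: str, center: int, width: int = 5):
    """One-hot encode the `width` residues around `center` as a flat list of 0/1."""
    half = width // 2
    vector = []
    for position in range(center - half, center + half + 1):
        one_hot = [0] * len(AMINO_ACIDS)
        if 0 <= position < len(sequence):
            # str.index raises ValueError for letters outside the 20 standard residues.
            one_hot[AMINO_ACIDS.index(sequence[position])] = 1
        # Positions beyond the chain ends stay all-zero.
        vector.extend(one_hot)
    return vector


patterns_of_five = len(AMINO_ACIDS) ** 5          # 3_200_000 possible 5-residue patterns
inputs_for_five = len(encode_window("ACDEFGH", 3))  # 100 network inputs
```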

How Accurate are the Predictions?
Secondary structure prediction is not accurate
  – random prediction: ≈33% correct
  – simple preference based predictors: ≈55% correct
  – pattern based predictors: up to ≈65% correct
  – best neural network based predictors using families of homologous sequences: ≈70-73% correct
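
"Percent correct" here is usually a per-residue, three-state measure (helix, strand, coil). A minimal sketch, assuming one-letter state strings of equal length; the function name and the example strings are made up.

```python
def three_state_accuracy(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted state equals the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("prediction and observation must have the same length")
    if not observed:
        raise ValueError("empty input")
    correct = sum(1 for p, o in zip(predicted, observed) if p == o)
    return 100.0 * correct / len(observed)


# Predicting coil everywhere on a chain with equal thirds of H, E and C
# is right one time in three, the baseline quoted for random prediction.
baseline = three_state_accuracy("CCCCCCCCC", "HHHEEECCC")
```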

Prediction of 3D Structure
ab initio prediction
  – much too hard
      number of possible conformations = astronomical
      3 possible rotamers per dihedral angle
      2 dihedral angles per amino acid
      for a protein with 100 residues: 3^200 possibilities
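
The counting argument is easy to check. The sketch below multiplies out the numbers given on the slide (3 rotamers, 2 dihedral angles per residue, 100 residues); the function and variable names are made up.

```python
def conformation_count(residues: int,
                       dihedrals_per_residue: int = 2,
                       rotamers_per_dihedral: int = 3) -> int:
    """Number of rotamer combinations if every dihedral angle is independent."""
    return rotamers_per_dihedral ** (dihedrals_per_residue * residues)


count = conformation_count(100)   # 3 ** 200
digits = len(str(count))          # 96 digits, i.e. of the order of 10^95
```

Even at a billion conformations per second, enumerating a number with 96 digits is out of the question, which is the reason for the more moderate goal of fold recognition.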

Fold recognition
More moderate goal:
  – recognise if sequence matches a protein structure
Is this useful?
  – ≈10^4 protein structures determined
  – <10^3 protein folds

How Fold Recognition Works Finding a match in a structure disco

What is a match?
Calculate happiness of pair
  – similar to energy in molecular modeling: interactions between all pairs of residues
  – captures amino acid preferences BUT not necessarily physics

Scoring Schemes
Plentiful, like sequence similarity matrices
  – log odds (Boltzmann based force fields), c.f. Boltzmann’s law
  – optimised for discrimination
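
A log odds (Boltzmann based) score can be sketched as follows: count how often residue pairs are in contact in known structures, compare with the number expected from composition alone, and take E = -kT ln(observed/expected). The contact counts below are invented, the reference state (random pairing of contact partners) is only one of many possible choices, and all names are hypothetical.

```python
import math
from collections import Counter


def pair_key(a: str, b: str):
    """Order-independent key for a residue pair."""
    return (a, b) if a <= b else (b, a)


def boltzmann_pair_energies(contact_counts, kT: float = 1.0):
    """E(a,b) = -kT * ln(observed contacts / contacts expected from composition)."""
    total = sum(contact_counts.values())
    ends = Counter()                       # how often each residue type is a contact partner
    for (a, b), n in contact_counts.items():
        ends[a] += n
        ends[b] += n
    energies = {}
    for (a, b), n in contact_counts.items():
        if n == 0:
            continue                       # no observations, no estimate
        fraction_a = ends[a] / (2 * total)
        fraction_b = ends[b] / (2 * total)
        expected = total * fraction_a * fraction_b * (1 if a == b else 2)
        energies[pair_key(a, b)] = -kT * math.log(n / expected)
    return energies


def threading_energy(sequence: str, contacts, energies, unknown: float = 0.0) -> float:
    """Sum pair energies over the contacts (i, j) of a candidate structure."""
    return sum(energies.get(pair_key(sequence[i], sequence[j]), unknown)
               for i, j in contacts)


# Invented contact statistics for a two-letter toy alphabet.
TOY_COUNTS = {("L", "L"): 60, ("K", "L"): 20, ("K", "K"): 20}
toy_energies = boltzmann_pair_energies(TOY_COUNTS)
```

Lower energy corresponds to a "happier" pair. As the slides stress, such scores capture amino acid preferences in the database, not necessarily physics.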

How Successful?
Blind test of methods (and people)
  – methods always work better when one knows the answer
      ≈30 proteins to predict
      ≈90 groups
      Best groups: ≈25% (partly) correct
BUT
  – accuracy (probably) not good enough to be useful for X-ray structure determination

Protein Design
The Inverse Problem
  – Is there a better sequence match for a structure?
What is “better”?
  – More stable
  – Better function
Why important?
  – Many industrial applications
      E.g. enzymes in washing powder
        – should be stable at high temperatures
        – work faster at low temperature
        – …

Rational Approaches For More Stable Proteins
Rules of thumb (work nearly always)
  – Restriction of conformational space
      Covalent bonds between close residues
        – e.g. disulfide bonds
      Rigid residues
        – e.g. proline instead of glycine
  – Introducing favourable interactions
      salt bridges
      compensating for helix dipole

Naïve Approach
Use happiness score
  – e.g. score from fold recognition
Change sequence to increase happiness
Why naïve?
Stability = difference between folded and unfolded state
Aim:
  – Increase gap of happiness
  – NOT absolute happiness
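
The difference between absolute happiness and the gap of happiness can be shown with a toy model. In the sketch below every contact between two hydrophobic residues contributes an energy of -1, and the competing (unfolded or misfolded) states are represented by alternative contact lists. The energy function, the hydrophobic set and the contact lists are all assumptions for illustration.

```python
# Assumption: a crude yes/no hydrophobicity classification.
HYDROPHOBIC = set("AVLIMFWC")


def contact_energy(sequence: str, contacts) -> float:
    """Toy energy: -1 for every contact between two hydrophobic residues."""
    return sum(-1.0 for i, j in contacts
               if sequence[i] in HYDROPHOBIC and sequence[j] in HYDROPHOBIC)


def stability_gap(sequence: str, target, alternatives) -> float:
    """Energy of the best competing state minus energy of the target state.

    Positive values mean the target structure is favoured.
    """
    target_energy = contact_energy(sequence, target)
    best_alternative = min(contact_energy(sequence, alt) for alt in alternatives)
    return best_alternative - target_energy


TARGET = [(0, 3)]          # contact present in the desired fold
ALTERNATIVES = [[(1, 2)]]  # contact present in a competing fold

all_hydrophobic = stability_gap("LLLL", TARGET, ALTERNATIVES)  # 0.0: happy everywhere
designed = stability_gap("LKKL", TARGET, ALTERNATIVES)         # 1.0: happy only in the target
```

Making every residue hydrophobic maximises the absolute score of the target but stabilises the competing state just as much, so nothing is gained.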

Pitfalls

Combinatorial Design (Experimental)
Basic Idea
  – Generate large number of sequence variations
  – Select pool for desired property
Peptide libraries
  – systematic synthesis (e.g. all tri-peptides)
  – expensive
  – mix & code

Directed Evolution Techniques
Idea
  – Use random mutagenesis
  – Connect phenotype (protein) and genotype (DNA/RNA)
  – Express phenotype
  – Select for desired property (phenotype)
  – Recover genotype
  – Amplify
Where are genotype and phenotype connected?
  – In viruses (coat protein/virus DNA)
  – At the ribosome
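
The mutate, select, amplify cycle can be imitated in a few lines. This is only an in-silico caricature of the experimental technique: the "desired property" is a made-up fitness function (similarity to a hidden target peptide), and the library size, mutation rate and all names are assumptions.

```python
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
TARGET = "WALKER"  # hypothetical peptide standing in for the desired property


def fitness(sequence: str) -> int:
    """Toy selection criterion: number of positions identical to TARGET."""
    return sum(1 for a, b in zip(sequence, TARGET) if a == b)


def mutate(sequence: str, rate: float, rng: random.Random) -> str:
    """Random mutagenesis: replace each residue with probability `rate`."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else residue
                   for residue in sequence)


def directed_evolution(start: str, rounds: int = 20, library_size: int = 200,
                       keep: int = 10, rate: float = 0.1, seed: int = 0) -> str:
    """Repeat: diversify the pool, select the best variants, amplify them."""
    rng = random.Random(seed)
    pool = [start]
    for _ in range(rounds):
        # Amplify with errors: each library member is a mutated copy of a pool member.
        library = pool + [mutate(rng.choice(pool), rate, rng)
                          for _ in range(library_size)]
        # Select for the desired property and recover the best sequences.
        library.sort(key=fitness, reverse=True)
        pool = library[:keep]
    return pool[0]


best = directed_evolution("AAAAAA")
```

Because the selected pool is carried over into the next library, the best fitness can never decrease from round to round. In the laboratory the sequence is of course not read off directly: it has to stay physically linked to the protein, which is what phage and ribosome display provide.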

Phage Display

Ribosomal Display
Advantage:
  – much bigger library ( copies)
Problems:
  – How to connect the RNA with the ribosome?
  – How to connect the protein to the ribosome?

Summary
  – Protein databases = huge collection of knowledge
  – Bioinformatics = making use of this knowledge
  – Simplest way to extract knowledge = statistically based log odds
  – Structure prediction = interpolation of rules (extrapolation is dangerous)
  – Protein design
      industrially important
      rational design has not yet come of age
      combinatorial design = very powerful
  – accelerated spiral of information (hopefully knowledge)