Identifying Functional Signatures in Proteins - a computational design approach
David Bernick, Rohl group, 16-Mar-2005

Presentation transcript:

The big picture
what is function?
  - hinges
  - substrate/DNA/protein binding/alignment/recognition
  - catalytic sites
what isn't function? (structure)
  - secondary structures
  - fold architecture
  - thermodynamically required elements
nature selects for function (structure is implicit)
computational methods select for structure
can we predict…quickly?

Some terms
pssm - position-specific scoring matrix
  - a [20 x length] model of residue frequencies for every position of a sequence family
homolog - natural sequences evolved from a common parent
morpholog - computationally derived sequence generated from a parent structure
ortholog - common ancestor, derived by speciation (constrained functional divergence)
paralog - common ancestor, same species (unconstrained functional divergence)

pssm from an alignment
[Figure: pssm built from an alignment; columns labeled with the 20 amino acids, ACDEFGHIKLMNPQRSTVWY]
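As a rough illustration of what this slide depicts, the sketch below builds a frequency-based pssm from a toy alignment. It is not the method used in the talk: the function name `build_pssm`, the pseudocount of 1.0, the handling of gaps, and the toy sequences are all assumptions made for the example. The matrix is stored as one row per alignment position with 20 entries each, i.e. the transpose of the [20 x length] layout mentioned earlier.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # same residue order as on the slide


def build_pssm(alignment, pseudocount=1.0):
    """Return one row per alignment column; each row holds 20 residue frequencies.

    Hypothetical helper: the pseudocount and the gap handling are
    illustrative choices, not taken from the presentation.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    pssm = []
    for col in range(length):
        # Count only the 20 standard residues; gaps ('-') and unknowns are skipped.
        counts = Counter(seq[col] for seq in alignment if seq[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        pssm.append([(counts[aa] + pseudocount) / total for aa in AMINO_ACIDS])
    return pssm


# Toy alignment (made up): three sequences, four columns, one gap.
toy_alignment = ["ACDE", "ACDF", "GC-E"]
pssm = build_pssm(toy_alignment)
```

Each row sums to 1, so a row can be read directly as a point in 20-dimensional residue space. Real pipelines usually add sequence weighting and convert frequencies to log-odds scores against a background distribution; both are omitted here.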

structure ensembles
Larson (2003) - Improved homology searches
Pei (2003) - Homology detection and active site searches
Kuhlman (2000) - Structural optimality of natural sequences

Results - SH3 domain
11 structures
62 additional sequences

Results - S100 domain
11 structures
30 additional sequences
Ca++ loop1 not detected
  - backbone coordinated residues
Ca++ loop2 not detected
  - insufficient homolog depth

the protocol
[Flowchart; recoverable labels: Sequence; homolog; Alignment; paralog structures; representative structure; pssmH; pssmM; score; cogs, pfam, reverse blast; blast; geometric; statistical; CE+SCOP; TaylorDoms; flexible design; fixed design]

genome scale
high cost step - producing pssmM
precalculate pssmM for every domain

morpholog pssms genome scale
Data Sources
  - Taylor parsed Domain database
  - CE all-to-all + SCOP
Precompute pssms for every domain
  - ~8000 domains
  - 100 sequences: ~90% diversity
  - 1000 sequences: ~99% diversity
  - ~4-8 wks, 70p cluster for initial set

scoring
compare PSSMh to PSSMm
  - PSSMm contains only structure signal
  - PSSMh contains both function and structure
each position represents a count-normalized position in 20-space (H or M)
R-position - average aa position
RH and RM define 20-space vectors
  - 'function vector'
  - 'structure vector'
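The slide does not give a formula for the comparison, so the following is only a sketch of one way to compare the two matrices position by position. It treats each column as a count-normalized vector and uses the Euclidean distance between the H and M vectors as a divergence score. The distance measure, the function names, and the toy numbers are all assumptions. The toy columns have 4 entries for readability, whereas real columns would have 20.

```python
import math


def normalize(column):
    """Scale a column of non-negative counts so its entries sum to 1."""
    total = sum(column)
    if total <= 0:
        raise ValueError("column must contain at least one positive count")
    return [value / total for value in column]


def position_divergence(col_h, col_m):
    """Euclidean distance between the count-normalized H and M columns.

    Assumption: a large distance suggests the homolog column carries
    signal (possibly functional) that the structure-only column lacks.
    """
    if len(col_h) != len(col_m):
        raise ValueError("columns must have the same number of entries")
    h, m = normalize(col_h), normalize(col_m)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(h, m)))


def score_positions(pssm_h, pssm_m):
    """Return one divergence score per aligned position."""
    if len(pssm_h) != len(pssm_m):
        raise ValueError("matrices must cover the same positions")
    return [position_divergence(h, m) for h, m in zip(pssm_h, pssm_m)]


# Toy data (made up): position 0 agrees, position 1 disagrees completely.
toy_pssm_h = [[2, 2, 0, 0], [4, 0, 0, 0]]
toy_pssm_m = [[1, 1, 0, 0], [0, 4, 0, 0]]
scores = score_positions(toy_pssm_h, toy_pssm_m)
```

Under this assumed measure, positions could be ranked by score and the highest-scoring ones inspected as candidate functional residues. Other choices, such as cosine distance or relative entropy, would fit the same slide text equally well.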

next steps
complete this set of domains - verification
full domain pssmM generation

acknowledgements
Carol Rohl
Kevin Karplus
Craig Lowe
Rohl group
HP