Readings for this week: Gogarten et al., Horizontal gene transfer…; Francke et al., Reconstructing metabolic networks…. Sign up for a meeting next week for proposal feedback/progress checkup.

Inferring protein function by genomic context…

Inferring protein function by homology…

COGs (Clusters of Orthologous Groups; the eukaryotic versions are KOGs). Identified using all-against-all sequence comparisons on a collection of complete genomes. Includes genes with orthologous and paralogous relationships. COGs are grouped into large-scale functional categories.
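COG construction starts from all-against-all comparisons. As a simplified stand-in, the sketch below pairs genes that are reciprocal best hits between genomes and merges those pairs into connected groups. The published COG procedure is more involved (it builds on consistent triangles of genome-specific best hits), and this sketch skips within-genome hits, so it does not capture the paralogs that real COGs include. The function names, genes, genomes, and scores are all hypothetical.

```python
def reciprocal_best_hits(scores, genome_of):
    """Pairs of genes in different genomes that are each other's best hit.

    scores: {(query_gene, subject_gene): score} from an all-against-all search
    genome_of: {gene: genome_id}
    """
    # best[(gene, other_genome)] = (score, subject): gene's top hit in that genome
    best = {}
    for (query, subject), score in scores.items():
        if genome_of[query] == genome_of[subject]:
            continue  # skip within-genome hits; this sketch ignores paralogs
        key = (query, genome_of[subject])
        if key not in best or score > best[key][0]:
            best[key] = (score, subject)

    pairs = set()
    for (query, _), (_, subject) in best.items():
        reverse = best.get((subject, genome_of[query]))
        if reverse is not None and reverse[1] == query:
            pairs.add(tuple(sorted((query, subject))))
    return pairs


def group_pairs(pairs):
    """Merge reciprocal-best-hit pairs into connected groups (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)
    groups = {}
    for gene in list(parent):
        groups.setdefault(find(gene), set()).add(gene)
    return list(groups.values())


# Hypothetical genes from three genomes with made-up bit scores.
genome_of = {"a1": "A", "a2": "A", "b1": "B", "c1": "C"}
scores = {
    ("a1", "b1"): 90, ("b1", "a1"): 88, ("a2", "b1"): 40, ("b1", "a2"): 35,
    ("a1", "c1"): 70, ("c1", "a1"): 72, ("b1", "c1"): 65, ("c1", "b1"): 60,
    ("a2", "c1"): 20, ("c1", "a2"): 15,
}
pairs = reciprocal_best_hits(scores, genome_of)
groups = group_pairs(pairs)
print(sorted(pairs))
print(groups)
```

Here a1, b1, and c1 end up in one group. a2 is left out: its best hits are b1 and c1, but both of those prefer a1, so a2 is not reciprocal to anything.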

Looking at Parts of Proteins. Domains: conserved structural entities with distinctive secondary-structure content and a hydrophobic core. Example: the protein kinase domain. Motifs: a pattern of amino acids that is conserved across many proteins and confers a particular function on the protein. Example: the zinc finger, CX(2-4)C…HX(2-4)H.

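How to identify domains? Pfam (Protein Families database): based on profile hidden Markov models (HMMs), statistical probability models of multiple sequence alignments. Pfam-A families start from manually curated seed alignments. From each seed alignment a profile HMM is built; like a position-specific scoring matrix (PSSM), it scores residues according to their position in the family alignment, and it also models insertions and deletions.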

Position Specific Scoring Matrix (PSSM)
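A minimal sketch of how a PSSM can be derived from an ungapped alignment: each column's residue frequencies, smoothed with a pseudocount, are turned into log-odds scores log2(q/p) against a background frequency p, and a segment is scored by summing its per-column scores. The uniform 1/20 background, the pseudocount of 1, and the example alignment are assumptions for illustration; real tools use empirical background frequencies and more elaborate pseudocount schemes.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0, background=None):
    """Log-odds PSSM from an ungapped alignment (equal-length strings).

    Column i maps each amino acid a to log2(q_ia / p_a), where q_ia is the
    pseudocount-smoothed frequency of a in column i and p_a its background
    frequency (uniform 1/20 unless a background dict is supplied).
    """
    if background is None:
        background = {aa: 1 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must all have the same length")
    pssm = []
    for i in range(length):
        counts = Counter(seq[i] for seq in alignment)
        n = sum(counts[aa] for aa in AMINO_ACIDS)  # ignores non-standard symbols
        total = n + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background[aa])
            for aa in AMINO_ACIDS
        })
    return pssm


def score_segment(pssm, segment):
    """Sum of per-column log-odds scores for a segment as long as the PSSM."""
    return sum(column[aa] for column, aa in zip(pssm, segment))


# Hypothetical ungapped alignment of four family members.
alignment = ["FGELA", "FGEIA", "YGELS", "FGDLA"]
pssm = build_pssm(alignment)
print(round(score_segment(pssm, "FGELA"), 2))  # family-like segment: positive
print(round(score_segment(pssm, "WKPRD"), 2))  # unrelated segment: negative
```

Unlike a single substitution matrix, the same residue gets a different score in each column, which is what "position specific" means here.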

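Pfam (Protein Families database): searching a protein against Pfam returns an E-value with a meaning similar to BLAST E-values: the number of sequences expected to score at least that well against that domain by chance in a search of that size. For small values, the E-value approximates the probability of such a chance hit.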
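For BLAST, the E-value of a local alignment with raw score S is given by the Karlin-Altschul formula E = K·m·n·e^(−λS), with m and n the (effective) query and database lengths, and the probability of at least one chance hit is P = 1 − e^(−E). The sketch below computes both. λ and K depend on the scoring matrix and gap costs, and the values used here are placeholders only; HMMER, which Pfam searches use, computes its E-values differently, but they are read the same way.

```python
import math


def karlin_altschul_evalue(raw_score, m, n, lam, k):
    """Expected number of chance alignments scoring >= raw_score.

    m, n: (effective) query and database lengths in residues.
    lam, k: Karlin-Altschul parameters of the scoring system.
    """
    return k * m * n * math.exp(-lam * raw_score)


def evalue_to_pvalue(evalue):
    """Chance of at least one such hit, treating chance hits as Poisson."""
    return -math.expm1(-evalue)  # 1 - exp(-E), computed accurately for small E


# Placeholder parameters, for illustration only.
LAM, K = 0.267, 0.041
for s in (40, 60, 80):
    e = karlin_altschul_evalue(s, m=300, n=10**8, lam=LAM, k=K)
    print(s, f"E={e:.3g}", f"P={evalue_to_pvalue(e):.3g}")
```

Two points follow directly from the formula: E grows linearly with database size (so the same score is less significant in a bigger database), and P stays close to E only while E is small; for large E, P saturates near 1.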

Other Protein Databases. SMART: uses HMMs; its focus is signaling and regulatory proteins (which tend to be more divergent than enzymes). TIGRFAMs: TIGR-curated alignments used to generate HMMs; one advantage is that family names should be functionally accurate for all proteins they represent. PRINTS: not HMM-based; uses "fingerprints" of conserved motifs. Ecumenical solution: InterPro, a collection of multiple databases under one umbrella.

Still more kinds of BLAST: PSI-BLAST (Position-Specific Iterated BLAST). Use it to find members of a protein family or to build a custom position-specific scoring matrix. It is the most sensitive BLAST program, which makes it useful for finding very distantly related proteins or new members of a protein family. 1st round: a standard BLASTP search; a PSSM is then built from all hits with E-values better than the inclusion threshold. 2nd round: the PSSM is used to score the alignments in this search, and additional hits better than the inclusion threshold are incorporated into an updated PSSM. 3rd and later rounds: as in the second round. The search has converged when no new hits are found. The PSSM can be saved for use in later searches.
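The iteration can be sketched in a toy setting: equal-length, ungapped sequences, a log-odds profile rebuilt each round from the query plus all included hits, and a fixed score cutoff standing in for the E-value inclusion threshold. Two simplifications are deliberate: round 1 scores with a profile built from the query alone rather than with a substitution matrix as BLASTP would, and there is no alignment step. The loop stops when a round's hits equal the previous round's set, mirroring convergence. All sequences, names, and the threshold are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(seqs, pseudocount=0.5):
    """Log2-odds profile (uniform 1/20 background) from equal-length sequences."""
    profile = []
    for i in range(len(seqs[0])):
        counts = Counter(s[i] for s in seqs)
        total = len(seqs) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log2((counts[aa] + pseudocount) / total * len(AMINO_ACIDS))
            for aa in AMINO_ACIDS
        })
    return profile


def score(profile, seq):
    return sum(column[aa] for column, aa in zip(profile, seq))


def iterative_search(query, database, threshold, max_rounds=10):
    """Toy PSI-BLAST-style loop; returns (included_hits, rounds_run, converged)."""
    included = set()
    for round_no in range(1, max_rounds + 1):
        # Rebuild the profile from the query plus every hit included so far.
        profile = build_profile([query] + sorted(included))
        hits = {seq for seq in database if score(profile, seq) >= threshold}
        if hits == included:
            return included, round_no, True  # converged: nothing new this round
        included = hits
    return included, max_rounds, False


# Hypothetical query and database of equal-length, ungapped sequences.
query = "MKVLAGHE"
database = ["MKVLAGWY", "MKVDNPWY", "RSTQRSTQ"]
hits, rounds, converged = iterative_search(query, database, threshold=5.0)
print(sorted(hits), rounds, converged)
```

With these inputs, round 1 includes only MKVLAGWY. MKVDNPWY shares just three positions with the query and stays below the cutoff until round 2, when the profile has absorbed the W and Y of MKVLAGWY; round 3 adds nothing new, so the search converges. This is the effect that lets PSI-BLAST reach distant family members.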

Still more kinds of BLAST: PHI-BLAST (Pattern Hit Initiated BLAST). Finds proteins similar to the query around a given pattern. You must enter both a query sequence containing the pattern AND the pattern to search on. Example patterns: (easy) FGELA; (harder) [LIVMF]-G-E-x-[GAS]-[LIVM]-x(5,11)-R-[STAQ]-A-x-[LIVMA]-x-[STACV]. Matching peptide: FGELALMYNTPRAATIVA.
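The harder pattern is written in PROSITE syntax: elements separated by "-", x for any residue, [..] for any of the listed residues, {..} for none of them, and (n) or (n,m) for a repeat count. A minimal sketch that translates that syntax into a Python regular expression, enough to confirm that the example peptide matches. It covers only the subset listed in the docstring and does not reproduce PHI-BLAST's alignment step; in strict PROSITE syntax the easy pattern would be written F-G-E-L-A.

```python
import re

ELEMENT = re.compile(r"(x|\[[A-Z]+\]|\{[A-Z]+\}|[A-Z])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression.

    Supported subset: x (any residue), [ABC] (any of), {ABC} (none of),
    repeats such as (3) or (5,11), and a leading '<' / trailing '>' anchor.
    """
    pattern = pattern.strip().rstrip(".")
    prefix = suffix = ""
    if pattern.startswith("<"):
        prefix, pattern = "^", pattern[1:]
    if pattern.endswith(">"):
        suffix, pattern = "$", pattern[:-1]
    parts = []
    for element in pattern.split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = match.groups()
        if residue == "x":
            residue = "."
        elif residue.startswith("{"):
            residue = "[^" + residue[1:-1] + "]"
        if low is not None:
            residue += "{" + low + ("," + high if high else "") + "}"
        parts.append(residue)
    return prefix + "".join(parts) + suffix


pattern = "[LIVMF]-G-E-x-[GAS]-[LIVM]-x(5,11)-R-[STAQ]-A-x-[LIVMA]-x-[STACV]"
regex = prosite_to_regex(pattern)
print(regex)  # [LIVMF]GE.[GAS][LIVM].{5,11}R[STAQ]A.[LIVMA].[STACV]
print(re.search(regex, "FGELALMYNTPRAATIVA").group())  # FGELALMYNTPRAATIVA
```

In the matching peptide, the variable gap x(5,11) is filled by the five residues MYNTP.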

Enzyme Nomenclature. EC numbers: a hierarchical classification scheme in which enzymes are named and classified according to the reactions they catalyze. Top-level classes: 1. Oxidoreductases, 2. Transferases, 3. Hydrolases, 4. Lyases, 5. Isomerases, 6. Ligases (a seventh class, 7. Translocases, was added in 2018).
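Because EC numbers are hierarchical, they can be compared level by level: two enzymes may share only a class (first field) or also a subclass, sub-subclass, or serial number. A minimal sketch that parses an EC number into its four levels and lists its ancestors; the function names are hypothetical, unassigned levels written as "-" are treated as None, and the class table includes EC 7 (Translocases), which was added to the scheme in 2018.

```python
EC_CLASSES = {
    "1": "Oxidoreductases",
    "2": "Transferases",
    "3": "Hydrolases",
    "4": "Lyases",
    "5": "Isomerases",
    "6": "Ligases",
    "7": "Translocases",  # added to the EC list in 2018
}


def parse_ec(ec):
    """Split 'EC 2.7.11.1' or '3.4.-.-' into (class name, four levels).

    Unassigned levels written as '-' come back as None.
    """
    text = ec.strip()
    if text.upper().startswith("EC"):
        text = text[2:].strip()
    levels = text.split(".")
    if len(levels) != 4:
        raise ValueError(f"expected four dot-separated fields: {ec!r}")
    levels = [None if level == "-" else level for level in levels]
    if levels[0] not in EC_CLASSES:
        raise ValueError(f"unknown top-level class in {ec!r}")
    return EC_CLASSES[levels[0]], levels


def ec_ancestors(ec):
    """Each assigned level of the hierarchy, from class down to serial number."""
    _, levels = parse_ec(ec)
    ancestors = []
    for depth, level in enumerate(levels, start=1):
        if level is None:
            break
        ancestors.append(".".join(levels[:depth]))
    return ancestors


print(parse_ec("EC 2.7.11.1"))   # ('Transferases', ['2', '7', '11', '1'])
print(ec_ancestors("2.7.11.1"))  # ['2', '2.7', '2.7.11', '2.7.11.1']
print(ec_ancestors("3.4.-.-"))   # ['3', '3.4']
```

Fields are kept as strings rather than integers so that the numbers can be rejoined exactly as written.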

Putting it all together…. KEGG (Kyoto Encyclopedia of Genes and Genomes): a collection of manually drawn metabolic and cellular pathway maps based on the most up-to-date biochemical information. Metabolic maps are its strongest feature: they use EC-numbered enzymes as the key players, so pathways in different genomes can be mapped easily from each genome's predetermined EC content. KEGG also has a growing collection of signaling/cellular-process maps.
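Mapping a genome onto EC-based pathway maps can be sketched as set arithmetic: compare the EC numbers annotated in the genome with the EC numbers that make up each pathway, report coverage, and list the enzymes not found. The pathway below is a hand-picked subset of glycolytic enzymes (hexokinase 2.7.1.1, glucose-6-phosphate isomerase 5.3.1.9, 6-phosphofructokinase 2.7.1.11, fructose-bisphosphate aldolase 4.1.2.13, triose-phosphate isomerase 5.3.1.1), not a KEGG map definition, and the genome annotation is hypothetical.

```python
def pathway_coverage(genome_ecs, pathways):
    """For each pathway: fraction of its EC numbers found in the genome, plus the missing ones."""
    report = {}
    for name, required in pathways.items():
        present = required & genome_ecs
        report[name] = (len(present) / len(required), sorted(required - genome_ecs))
    return report


# Illustrative subset of glycolytic enzymes; not KEGG's actual map definition.
pathways = {
    "glycolysis (partial)": {"2.7.1.1", "5.3.1.9", "2.7.1.11", "4.1.2.13", "5.3.1.1"},
}
# Hypothetical EC annotation of one genome.
genome_ecs = {"2.7.1.1", "5.3.1.9", "4.1.2.13", "1.1.1.1"}

for name, (fraction, missing) in pathway_coverage(genome_ecs, pathways).items():
    print(f"{name}: {fraction:.0%} covered, missing {missing}")
```

A missing EC number can mean the enzyme is truly absent or that the gene is present but was never annotated with that function; homology- and context-based methods like those above are one way to revisit such gaps.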