Hands-on: Reviewing BLAST

Presentation transcript:

Hands-on: Reviewing BLAST
What is the appropriate annotation for UniProt entry U4LTX6?
U4LTX6 UniProt entry
Use the “Advanced” option, select UniProtKB and 1000 hits.
My notes:
Lecturer notes: The first thing one notices in the BLAST results is that many entries are labeled dihydrofolate reductase (DHFR). However, very few of these are from the reviewed section of UniProtKB (UniProtKB/Swiss-Prot). The top-hit reviewed entry, DYR_SCHPO, is considerably longer than the query, and the alignment shows that the query is similar only to the N-terminal portion of DYR_SCHPO. The Pfam graphical view for DYR_SCHPO shows that this N-terminal portion corresponds to serine hydrolase (PF03959) rather than DHFR (PF00186). In fact, PF03959 is the same domain observed within the query entry itself. Therefore, one can conclude that DYR_SCHPO is a fusion protein, and that the annotation from this fusion created the mass of confusion seen in the sequence databases.
Hint: DYR_SCHPO
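The domain comparison in the lecturer notes can be sketched in a few lines of Python: given the region of the subject that the query aligns to, and the subject's domain coordinates, report which domains the alignment actually covers. This is only an illustrative sketch. The coordinates below are hypothetical placeholders, not the real Pfam boundaries for DYR_SCHPO, and the function names and the 50% coverage threshold are assumptions made for this example.

```python
from typing import NamedTuple


class Region(NamedTuple):
    """A named, 1-based inclusive interval on a protein sequence."""
    name: str
    start: int
    end: int


def overlap(a_start, a_end, b_start, b_end):
    """Number of residues shared by two inclusive intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start) + 1)


def domains_covered(aln_start, aln_end, domains, min_fraction=0.5):
    """Names of domains with at least min_fraction of their length inside the alignment."""
    covered = []
    for d in domains:
        length = d.end - d.start + 1
        if overlap(aln_start, aln_end, d.start, d.end) / length >= min_fraction:
            covered.append(d.name)
    return covered


# Hypothetical coordinates, for illustration only.
subject_domains = [
    Region("PF03959 (serine hydrolase)", 10, 220),
    Region("PF00186 (DHFR)", 260, 450),
]

# Hypothetical alignment: the query matches residues 5-230 of the subject.
hit_domains = domains_covered(5, 230, subject_domains)
print(hit_domains)
```

If the alignment covers only the serine hydrolase domain, as in this made-up case, transferring the DHFR name from the subject to the query would not be justified.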

Hands-on: Finding Literature
Starting at the UniProt website, try to find a PUBLICATION describing the FUNCTION of the following: Q1R9Q7
My notes:
Lecturer notes: There are numerous ways to find literature associated with a sequence, even if the entry cites nothing but “Large Scale” papers. These can be recognized in UniProtKB by looking at the information in the “Cited for” line for each reference; note that some papers might not be labeled as such, but could still be non-specific to the entry in question with respect to functional information. For this exercise, try to find a literature citation that addresses the function for the protein given in the entry. The links are to UniProtKB, but consider other databases as well (such as iProClass). It is not necessary for all of you to search all. Just choose one. In five minutes we’ll get your feedback.
Search PubMed with Gene
Other things to try:
  Search PubMed with author
  PDB link
  Use synonyms for gene/protein
  Look at organism-specific database
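The “Cited for” check from the lecturer notes can be treated as a first-pass filter over an entry's references. The sketch below assumes each reference has already been copied by hand into a small dictionary; the record layout, the placeholder titles, and the function name are all hypothetical and do not represent a UniProt data format or API.

```python
def split_references(references):
    """Separate references cited for FUNCTION from those cited only for other scopes."""
    functional, other = [], []
    for ref in references:
        scopes = [scope.upper() for scope in ref["cited_for"]]
        if any("FUNCTION" in scope for scope in scopes):
            functional.append(ref["title"])
        else:
            other.append(ref["title"])
    return functional, other


# Hypothetical records, typed in by hand for illustration only.
example_refs = [
    {"title": "placeholder genome paper",
     "cited_for": ["Nucleotide sequence [large scale genomic DNA]"]},
    {"title": "placeholder characterization paper",
     "cited_for": ["Function", "Catalytic activity"]},
]

functional, other = split_references(example_refs)
print("Read first:", functional)
print("Probably non-specific:", other)
```

As the notes caution, a paper that is not labeled “Large Scale” can still be non-specific, so anything this filter keeps still has to be read.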

Homework Answer
Should the unreviewed (TrEMBL) UniProt entry Q23890, currently named ORF425, be renamed to mitochondrial ribosomal protein L11 (yes or no)?
My notes:
Lecturer notes: The paper in question did indeed provide the sequence of ORF425, but nothing more. Various methods can be used to determine that the functional annotation in the paper refers to a completely different sequence.
Answer: No.
Getting there (a subset of ways):
Hint: Run BLAST against UniProtKB (default parameters). Examine the annotations. Nothing about mrpL11.
Proof: Go to the UniProt website and search for the entry. Click Publications, then click on the PubMed link. Under Related Information -> Protein (RefSeq), click on the one labeled ribosomal protein L11. Grab the accession number and ID-map it to UniProtKB. It maps to a different entry (Q23884), which is the actual mrpL11 in UniProtKB (same result if you use BLAST instead).

Hands-on: Reasons for Rules
Action slide!
1 - Three highly similar activities are represented in this group: thymidine phosphorylase (TP, gene deoA, EC 2.4.2.4), pyrimidine-nucleoside phosphorylase (PyNP, gene pdp, EC 2.4.2.2), and AMP phosphorylase (AMPpase, EC 2.4.2.-).
2 - PyNP is typically the only pyrimidine nucleoside phosphorylase encoded by Gram-positive bacteria, while eukaryotes and proteobacteria encode two: TP, and the unrelated uridine phosphorylase. AMPpase is found in archaea.
3 - Sequence comparison between the active site residues for TP and PyNP reveals only one difference, which has been proposed to partially mediate substrate specificity. In TP, position 111 is a methionine, while the analogous position in PyNP is lysine. It should be noted that the archaeal members of this family differ in a number of respects from either of these characterized activities.
The simplest rule: If Bac/Firmicute, name it “Pyrimidine-nucleoside phosphorylase”, otherwise name it “Thymidine phosphorylase” if length is under 500aa and “Putative thymidine phosphorylase” if length is over 500aa.
How to determine it:
1) Read the description
2) Sort by Protein Name then by Taxon Group
3) Sort by Length then by Taxon Group
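Written out as code, the simplest rule above is a two-step decision on taxon group and length. The sketch below is one possible reading, not the curators' actual rule engine. The function name, the taxon labels other than “Bac/Firmicute”, and the example lengths are hypothetical. The slide does not say what to do at exactly 500 aa; this sketch assumes such a sequence is treated as “Putative”.

```python
def assign_name(taxon_group: str, length: int) -> str:
    """Apply the simplest naming rule from the slide.

    taxon_group: taxon label as shown in the Taxon Group column
    length: protein length in amino acids
    """
    if taxon_group == "Bac/Firmicute":
        return "Pyrimidine-nucleoside phosphorylase"
    if length < 500:
        return "Thymidine phosphorylase"
    # Assumption: the slide leaves exactly 500 aa unspecified;
    # here it falls into the "Putative" branch.
    return "Putative thymidine phosphorylase"


# Hypothetical family members, for illustration only.
examples = [
    ("Bac/Firmicute", 433),
    ("Bac/Proteobacteria", 440),
    ("Archaea", 503),
]
names = [assign_name(taxon, length) for taxon, length in examples]
for (taxon, length), name in zip(examples, names):
    print(f"{taxon:20s} {length:4d} aa -> {name}")
```

Sorting the family by Protein Name, Length, and Taxon Group, as in steps 2 and 3, is how one would check by eye that the existing names follow this pattern.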