Bioinformatics Lecture 3 BCH 550 Arjumand Warsy. Retrieving Protein Sequences.

Several Web sites make it possible to retrieve relevant protein sequences and to find out more about a subject at the molecular level. One of them is ExPASy, a prime Internet site for protein information. The ExPASy server is managed by Prof. Amos Bairoch and is a world-leading resource for protein information.

The UniProt Knowledgebase
UniProtKB/Swiss-Prot: a curated protein sequence database which strives to provide a high level of annotation (such as the description of the function of a protein, its domain structure, post-translational modifications, variants, etc.), a minimal level of redundancy and a high level of integration with other databases.
UniProtKB/TrEMBL: a computer-annotated supplement of Swiss-Prot that contains all the translations of EMBL nucleotide sequence entries not yet integrated in Swiss-Prot.

Using ExPASy
1. Point your browser to the Swiss-Prot database home page.
2. Type dUTPase coli in the Search window, and then click the Search button. A list of three relevant protein sequences appears.
Now, when you click the last link, DUT_ECOLI (P06968), a full page of information about this dUTPase protein of E. coli appears on-screen, as shown in Figures 2-12 and 2-13. (No single browser window can hold such a wealth of information, so we've broken up the Results page into two figures.)
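The same entry can also be fetched without a browser. The sketch below is one possible way to do it and is not part of the original lecture: it assumes the current UniProt REST address (https://rest.uniprot.org/uniprotkb), and the helper names entry_fasta_url and fetch_fasta are made up for illustration.

```python
from urllib.request import urlopen

# Assumption: base address of the UniProt REST service (not given in the slides).
UNIPROT_ENTRY_BASE = "https://rest.uniprot.org/uniprotkb"


def entry_fasta_url(accession: str) -> str:
    """Build the address of one entry in FASTA format (hypothetical helper)."""
    return f"{UNIPROT_ENTRY_BASE}/{accession.strip().upper()}.fasta"


def fetch_fasta(accession: str, timeout: float = 30.0) -> str:
    """Download one entry as FASTA text. Requires a network connection."""
    with urlopen(entry_fasta_url(accession), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling fetch_fasta("P06968") should return the dUTPase record of E. coli as text, provided the service is reachable.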

A typical Swiss-Prot entry is made up of four parts. Here's how that looks for dUTPase:
The top of the entry contains the entry name DUT_ECOLI on the first line and the unique identifier P06968 (called a primary accession number) on the second line.
The top section offers a biochemical description of the protein, including its standard name, its international Enzyme Commission number ("E.C." does not mean "E. coli"), as well as a couple of synonyms.
The middle section offers a whole series of links to various functional classification schemes, including relevant protein domains, 3-D structures, and functional signatures.
The bottom section, the sequence section, provides the actual amino-acid sequence of the protein.
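To tell the two identifiers apart in a program, a small check such as the following may help. It is only a sketch: the regular expression follows the accession-number pattern given in the UniProt help pages (treat it as an assumption here), and the function names are hypothetical.

```python
import re
from typing import Tuple

# Assumption: accession pattern as described in the UniProt help pages.
ACCESSION_RE = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def is_accession(text: str) -> bool:
    """Return True if the text looks like a primary accession number."""
    return ACCESSION_RE.fullmatch(text) is not None


def split_entry_name(entry_name: str) -> Tuple[str, str]:
    """Split an entry name such as DUT_ECOLI into protein and species codes."""
    protein, separator, species = entry_name.partition("_")
    if not (protein and separator and species):
        raise ValueError(f"not an entry name: {entry_name!r}")
    return protein, species
```

With the dUTPase example, is_accession("P06968") is true, is_accession("DUT_ECOLI") is false, and split_entry_name("DUT_ECOLI") gives the pair ("DUT", "ECOLI").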

Sequence of dUTPase from E. coli
>P06968|1-151
MKKIDVKILDPRVGKEFPLPTYATSGSAG
LDLRACLNDAVELAPGDTTLVPTGLAIHI
ADPSLAAMMLPRSGLGHKHGIVLGNLVG
LIDSDYQGQLMISVWNRGQDSFTIQPGE
RIAQMIFVPVVQAEFNLVEDFDATDRGE
GGFGHSGRQ
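A quick sanity check on a retrieved sequence is to compare its length with the range stated in the header (here 1-151). The sketch below does this for the record shown above; the function names are illustrative, and the header layout "accession|start-end" is assumed from this one example only.

```python
from typing import Tuple

DUT_ECOLI_FASTA = """>P06968|1-151
MKKIDVKILDPRVGKEFPLPTYATSGSAG
LDLRACLNDAVELAPGDTTLVPTGLAIHI
ADPSLAAMMLPRSGLGHKHGIVLGNLVG
LIDSDYQGQLMISVWNRGQDSFTIQPGE
RIAQMIFVPVVQAEFNLVEDFDATDRGE
GGFGHSGRQ
"""


def declared_range(header: str) -> Tuple[int, int]:
    """Read the start and end positions from a header like '>P06968|1-151'."""
    span = header.rsplit("|", 1)[1]
    start, end = span.split("-")
    return int(start), int(end)


def sequence_length(fasta_text: str) -> int:
    """Count residues on all lines after the header, ignoring whitespace."""
    lines = fasta_text.strip().splitlines()
    return sum(len("".join(line.split())) for line in lines[1:])


header = DUT_ECOLI_FASTA.splitlines()[0]
start, end = declared_range(header)
# The number of residues should match the declared range.
print(sequence_length(DUT_ECOLI_FASTA), end - start + 1)
```

Both numbers printed are 151, so the sequence is complete with respect to its header.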

The FASTA (and RAW) format
FASTA is the name of a popular sequence alignment and database-scanning program created by W.R. Pearson and D.J. Lipman in 1988 (you can use your brand new PubMed skills to find the original article). The sequences used by FASTA have to obey the following format:
>My_Sequence_Name
ARCGTCRGCKINTANDRGCKINTANDCKINTANDARCGTCR
GCKINTANDRGCKINTAND
The line starting with > (the definition line) contains a unique identifier followed by an optional short definition. The lines that follow it contain the DNA or protein sequence (in one-letter code) until the next > character in the file indicates the beginning of a new sequence. Because FASTA is easy to parse, this format has become hugely popular and is now the default input format for much sequence analysis software, including BLAST and CLUSTALW.
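To illustrate why the format is considered easy to parse, here is a minimal reader written for this note. It is a sketch, not the parser used by any particular program: it keeps only the identifier (the first word of the definition line) and joins the sequence lines of each record.

```python
from typing import Dict


def parse_fasta(text: str) -> Dict[str, str]:
    """Map each identifier to its sequence; the optional definition is dropped."""
    records: Dict[str, str] = {}
    name = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            fields = line[1:].split(None, 1)
            if not fields:
                raise ValueError("definition line without an identifier")
            name = fields[0]
            records[name] = ""
        elif name is None:
            raise ValueError("sequence data found before the first '>' line")
        else:
            records[name] += line
    return records


example = """>My_Sequence_Name
ARCGTCRGCKINTANDRGCKINTANDCKINTANDARCGTCR
GCKINTANDRGCKINTAND
"""
print(list(parse_fasta(example)))
```

The example prints ['My_Sequence_Name']; the two sequence lines are stored as one string under that key.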

Care to be taken when using FASTA
Note that programs using FASTA-formatted sequences as input are sometimes case-sensitive. Here are some pointers:
Always use CAPITAL letters for the one-letter codes.
When using FASTA-formatted sequences on a PC, always use the TEXT option of your preferred word-processing software (that is, skip the formatting and use nothing but ASCII characters).
When displaying these sequences as a word-processing document, use the Courier font for easy alignment.
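These pointers can be enforced in a few lines before a sequence is passed to another program. The function below is a sketch with a made-up name; it is stricter than some tools need (for instance, it rejects the * and - symbols that certain programs accept), which is an assumption made here for simplicity.

```python
def clean_sequence(raw: str) -> str:
    """Remove whitespace, check for plain ASCII letters, return capitals."""
    stripped = "".join(raw.split())
    # Check before upper-casing: some non-ASCII letters change when upper-cased.
    if not stripped.isascii() or not stripped.isalpha():
        raise ValueError("sequence must contain only ASCII letters")
    return stripped.upper()
```

For example, clean_sequence("mkk idv\nkil") returns "MKKIDVKIL", while a string containing digits or accented characters raises a ValueError.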

Retrieving a list of related protein sequences
Many questions in molecular biology require downloading a large collection of similar protein sequences, all related to the same function, rather than just one sequence. These biological questions typically include the detection of conserved functional motifs (segments of sequences that look the same in proteins with the same function), the simultaneous alignment of multiple sequences, the assessment of their variability, or phylogenetic studies (how sequences relate to each other through evolution). Building your own specialized sequence data file on your PC is easy. This enables you to use it with other analysis programs accessible from other bioinformatics sites.
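Once several sequences have been collected, they can be written to a single FASTA file for use with other programs. The following sketch shows one way to do that; the function names and the line width of 60 characters are choices made for this example, not requirements of the format.

```python
from pathlib import Path
from typing import Iterable, Tuple


def format_fasta(records: Iterable[Tuple[str, str]], width: int = 60) -> str:
    """Turn (name, sequence) pairs into FASTA text with wrapped sequence lines."""
    lines = []
    for name, sequence in records:
        lines.append(f">{name}")
        for start in range(0, len(sequence), width):
            lines.append(sequence[start:start + width])
    return "\n".join(lines) + "\n"


def save_fasta(records: Iterable[Tuple[str, str]], path: str) -> None:
    """Write the records to a plain ASCII text file."""
    Path(path).write_text(format_fasta(records), encoding="ascii")
```

A call such as save_fasta([("seq1", "MKKIDV"), ("seq2", "MKKLDV")], "my_proteins.fasta") would create a two-record file; the names and sequences in this call are placeholders.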

1. Point your browser to and click the Advanced Search in the UniProt Knowledgebase link.
2. In the Search line, directly above the Description window, keep the Swiss-Prot box checked but deselect the TrEMBL box. TrEMBL is a database made up of unsupervised computer translations of new DNA sequences; Swiss-Prot includes only entries validated by expert curators. Restricting the search to Swiss-Prot thus ensures, to the best of our knowledge, that all returned proteins are actual dUTPases.
3. Type dUTPase in the Description window. Be sure that you don't type in anything else. In particular, don't put anything in the Organism window.
4. Click the Submit Query button.
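A comparable query can be expressed as a single Web address. The sketch below only builds that address; it assumes the current UniProt REST search service and its query fields reviewed (true for Swiss-Prot entries) and protein_name, none of which appear in the original slides. The function name is hypothetical.

```python
from urllib.parse import urlencode

# Assumption: search address of the UniProt REST service.
SEARCH_BASE = "https://rest.uniprot.org/uniprotkb/search"


def swissprot_search_url(description: str, fmt: str = "fasta") -> str:
    """Build a search address restricted to reviewed (Swiss-Prot) entries."""
    # No organism filter is added, mirroring the empty Organism window.
    query = f"reviewed:true AND protein_name:{description}"
    return f"{SEARCH_BASE}?{urlencode({'query': query, 'format': fmt})}"


print(swissprot_search_url("dUTPase"))
```

Opening the printed address in a browser, or downloading it from a script, should return the matching Swiss-Prot sequences in FASTA format if the service accepts the query as written.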