Swiss-Prot Database --- Xie, H

Slides:



Advertisements
Similar presentations
Introductory to database handling Endre Sebestyén.
Advertisements

Genome Annotation: A Protein-centric Perspective.
NeSC, 25 April 2002 Why Dont Scientists Use Databases? 1 Why Dont Scientists Use Databases? Peter Buneman Division of Informatics University of Edinburgh.
Curated Databases Peter Buneman School of Informatics University of Edinburgh.
UniProt Eric Jain Swiss Institute of Bioinformatics, Geneva W3C Workshop on Semantic Web for Life Sciences, October 2004.
1 Welcome to the Protein Database Tutorial This tutorial will describe how to navigate the section of Gramene that provides collective information on proteins.
Bioinformatics and Chips Bioinformatics is a very integral part of each step in a chip project. Bioinformatics is a very integral part of each step in.
Swiss-Prot Protein Database Daniel Amoruso December 2, 2004 BI 420.
Archives and Information Retrieval
Ion Channels. Cell membrane Voltage-gated Ion Channels voltage-gated because they open and close depending on the electrical potential across the membrane.
IST Computational Biology1 Information Retrieval Biological Databases 2 Pedro Fernandes Instituto Gulbenkian de Ciência, Oeiras PT.
Introduction to Bioinformatics - Tutorial no. 5 MEME – Discovering motifs in sequences MAST – Searching for motifs in databanks TRANSFAC – The Transcription.
Protein Modules An Introduction to Bioinformatics.
Toward Making Online Biological Data Machine Understandable Cui Tao Data Extraction Research Group Department of Computer Science, Brigham Young University,
Tutorial SQL Server and Matlab CIS 526. Build a New Database in SQL server.
BTN323: INTRODUCTION TO BIOLOGICAL DATABASES Day2: Specialized Databases Lecturer: Junaid Gamieldien, PhD
Wellcome Trust Workshop Working with Pathogen Genomes Module 1 Artemis.
Pattern databasesPattern databasesPattern databasesPattern databases Gopalan Vivek.
Automatic methods for functional annotation of sequences Petri Törönen.
Basic Introduction of BLAST Jundi Wang School of Computing CSC691 09/08/2013.
CIS550 Handout 6 Fall CIS 550 Fall 2001 Handout 6 XML.
Semantic Similarity over Gene Ontology for Multi-label Protein Subcellular Localization Shibiao WAN and Man-Wai MAK The Hong Kong Polytechnic University.
Bsubt.embl complete entry in EMBL format (DNA and Features) bsubt.embl.Z bsubt.fasta complete DNA sequence in Fasta format bsubt.fasta.Z bsubt.con construct.
Essential Bioinformatics and Biocomputing Module (Tutorial) Biological Databases Lecturer: Chen Yuzong Jan 2003 TAs: Cao Zhiwei Lee Teckkwong, Bernett.
Eurotrace Hands-On The Eurotrace File System. 2 The Eurotrace file system Under MS ACCESS EUROTRACE generates several different files when you create.
Biological databases Nicky Mulder:
Database 5: protein domain/family. Protein domain/family: some definitions Most proteins have « modular » structures Estimation: ~ 3 domains / protein.
Computer Science 101 Database Concepts. Database Collection of related data Models real world “universe” Reflects changes Specific purposes and audience.
Biological Databases By : Lim Yun Ping E mail :
Fortaleza 31.VII.2006 UniProtKB: Questions and answers UniProtKB/Swiss-Prot: Questions, Answers and a few Tips.
Part I: Identifying sequences with … Speaker : S. Gaj Date
BLOCKS Multiply aligned ungapped segments corresponding to most highly conserved regions of proteins- represented in profile.
1 EMBL Outstation — The European Bioinformatics Institute Automatic and Reliable Functional Annotation of Proteins.
Sequence Search and Analysis SPE 1653 (703)
IT Futures Nov Peter Buneman School of Informatics, University of Edinburgh and Digital Curation Centre Curated Data.
1 PresDB Why current database technology does not support preservation Peter Buneman School of Informatics and Digital Curation Centre University of Edinburgh.
Protein and RNA Families
PROTEIN DATABASES. The ideal sequence database for computational analyses and data-mining: I t must be complete with minimal redundancy It must contain.
Mining Biological Data. Protein Enzymatic ProteinsTransport ProteinsRegulatory Proteins Storage ProteinsHormonal ProteinsReceptor Proteins.
School B&I TCD Bioinformatics Proteins: structure,function,databases,formats.
Motif discovery and Protein Databases Tutorial 5.
Copyright OpenHelix. No use or reproduction without express written consent1.
SRS Introductory Course 5/12/ Temporary and permanent sessions - Simple querying - Browsing indices - Standard and extended query forms - User defined.
SQL Jan 20,2014. DBMS Stores data as records, tables etc. Accepts data and stores that data for later use Uses query languages for searching, sorting,
Copyright OpenHelix. No use or reproduction without express written consent1.
PROTEIN PATTERN DATABASES. PROTEIN SEQUENCES SUPERFAMILY FAMILY DOMAIN MOTIF SITE RESIDUE.
Exploring and Exploiting the Biological Maze Zoé Lacroix Arizona State University.
Advanced SRS Course 12/12/02 -Linking -Subentries -Applications.
©CMBI 2008 Databases Data must be in a certain format for software to recognize Every database can have its own format but some data elements are essential.
Copyright OpenHelix. No use or reproduction without express written consent1 1.
1 EMBL Outstation — The European Bioinformatics Institute Mus musculus - a model organism in SWISS-PROT.
Introduction to Bioinformatics - Tutorial no. 5 MEME – Discovering motifs in sequences MAST – Searching for motifs in databanks TRANSFAC – the Transcription.
Introduction to File Processing with PHP. Review of Course Outcomes 1. Implement file reading and writing programs using PHP. 2. Identify file access.
Catalog of human genes and genetic disorders Online version of the book Mendelian Inheritence in Man maintained by Johns Hopkins University and located.
Protein domains Miguel Andrade Mainz, Germany Faculty of Biology,
Managing, Storing, and Executing DTS Packages
CIS 550 Handout 6 -- XML CIS550 Handout 6.
Protein databases Henrik Nielsen
Bio/Chem-informatics
Biological Databases By: Komal Arora.
Protein Families, Motifs & Domains.
Molecular biology databases
Protein Structure Prediction and Protein Homology modeling
Genome Center of Wisconsin, UW-Madison
Predicting Active Site Residue Annotations in the Pfam Database
PIR: Protein Information Resource
BLAST.
Lesson 3 Bioinformatics Laboratory
Table 1. Occurrence of N-X-S/T motives in tryptic peptides1
SUBMITTED BY: DEEPTI SHARMA BIOLOGICAL DATABASE AND SEQUENCE ANALYSIS.
Presentation transcript:

Swiss-Prot Database --- Xie, H Hongbo Xie 9/22/2018 4:59:40 PM Swiss-Prot Database --- Xie, H

Swiss-Prot Database --- Xie, H Presentation Outline Background Introduction Swiss-Prot database Features Building local Swiss-Prot database Experiments and Results 9/22/2018 4:59:40 PM Swiss-Prot Database --- Xie, H

Swiss-Prot Database --- Xie, H 9/22/2018 4:59:40 PM Swiss-Prot Database --- Xie, H

An Example of protein data Sequence information Structure information Function information Gene information Name in order to search Link to other database Paper reference Others. 9/22/2018 4:59:40 PM Swiss-Prot Database --- Xie, H

Swiss-Prot Database --- Xie, H Metadata ID 11SB_CUCMA STANDARD; PRT; 480 AA. AC P13744; DT 01-JAN-1990 (REL. 13, CREATED) DT 01-JAN-1990 (REL. 13, LAST SEQUENCE UPDATE) DT 01-NOV-1990 (REL. 16, LAST ANNOTATION UPDATE) DE 11S GLOBULIN BETA SUBUNIT PRECURSOR. OS CUCURBITA MAXIMA (PUMPKIN) (WINTER SQUASH). OC EUKARYOTA; PLANTA; EMBRYOPHYTA; ANGIOSPERMAE; DICOTYLEDONEAE; OC VIOLALES; CUCURBITACEAE. RN [1] RP SEQUENCE FROM N.A. RC STRAIN=CV. KUROKAWA AMAKURI NANKIN; RX MEDLINE; 88166744. RA HAYASHI M., MORI H., NISHIMURA M., AKAZAWA T., HARA-NISHIMURA I.; RL EUR. J. BIOCHEM. 172:627-632(1988). RN [2] RP SEQUENCE OF 22-30 AND 297-302. RA OHMIYA M., HARA I., MASTUBARA H.; RL PLANT CELL PHYSIOL. 21:157-167(1980). CC -!- FUNCTION: THIS IS A SEED STORAGE PROTEIN. CC -!- SUBUNIT: HEXAMER; EACH SUBUNIT IS COMPOSED OF AN ACIDIC AND A CC BASIC CHAIN DERIVED FROM A SINGLE PRECURSOR AND LINKED BY A CC DISULFIDE BOND. CC -!- SIMILARITY: TO OTHER 11S SEED STORAGE PROTEINS (GLOBULINS). DR EMBL; M36407; G167492; -. DR PIR; S00366; FWPU1B. DR PROSITE; PS00305; 11S_SEED_STORAGE; 1. KW SEED STORAGE PROTEIN; SIGNAL. FT SIGNAL 1 21 FT CHAIN 22 480 11S GLOBULIN BETA SUBUNIT. FT CHAIN 22 296 GAMMA CHAIN (ACIDIC). FT CHAIN 297 480 DELTA CHAIN (BASIC). FT MOD_RES 22 22 PYRROLIDONE CARBOXYLIC ACID. FT DISULFID 124 303 INTERCHAIN (GAMMA-DELTA) (POTENTIAL). FT CONFLICT 27 27 S -> E (IN REF. 2). FT CONFLICT 30 30 E -> S (IN REF. 2). SQ SEQUENCE 480 AA; 54625 MW; D515DD6E CRC32; MARSSLFTFL CLAVFINGCL SQIEQQSPWE FQGSEVWQQH RYQSPRACRL ENLRAQDPVR RAEAEAIFTE VWDQDNDEFQ CAGVNMIRHT IRPKGLLLPG FSNAPKLIFV AQGFGIRGIA EAFQIDGGLV RKLKGEDDER DRIVQVDEDF EVLLPEKDEE ERSRGRYIES ESESENGLEE TICTLRLKQN IGRSVRADVF NPRGGRISTA NYHTLPILRQ VRLSAERGVL YSNAMVAPHY TVNSHSVMYA TRGNARVQVV DNFGQSVFDG EVREGQVLMI PQNFVVIKRA SDRGFEWIAF KTNDNAITNL LAGRVSQMRM LPLGVLSNMY RISREEAQRL KYGQQEMRVL SPGRSQGRRE // Swissprot -- a curetted database (1 of ~100,000 entries) ... OS CUCURBITA MAXIMA (PUMPKIN) (WINTER SQUASH). OC EUKARYOTA; PLANTA; EMBRYOPHYTA; ANGIOSPERMAE; DICOTYLEDONEAE; OC VIOLALES; CUCURBITACEAE. RN [1] RP SEQUENCE FROM N.A. RC STRAIN=CV. KUROKAWA AMAKURI NANKIN; RX MEDLINE; 88166744. RA HAYASHI M., MORI H., NISHIMURA M., AKAZAWA T., HARA-NISHIMURA I.; RL EUR. J. BIOCHEM. 172:627-632(1988). DT 01-JAN-1990 (REL. 13, CREATED) DT 01-JAN-1990 (REL. 13, LAST SEQUENCE UPDATE) DT 01-NOV-1990 (REL. 16, LAST ANNOTATION UPDATE) Recordof history Sequence Data MARSSLFTFL CLAVFINGCL SQIEQQSPWE FQGSEVWQQH RYQSPRACRL ENLRAQDPVR RAEAEAIFTE VWDQDNDEFQ CAGVNMIRHT IRPKGLLLPG FSNAPKLIFV AQGFGIRGIA IPGCAETYQT DLRRSQSAGS AFKDQHQKIR PFREGDLLVV PAGVSHWMYN RGQSDLVLIV FADTRNVANQ IDPYLRKFYL AGRPEQVERG VEEWERSSRK GSSGEKSGNI FSGFADEFLE ... 9/22/2018 4:59:40 PM Swiss-Prot Database --- Xie, H

Swiss-Prot Database --- Xie, H SwissProt web search ID,AC,etc 143B_HUMAN 9/22/2018 4:59:40 PM Swiss-Prot Database --- Xie, H

Web search can not do everything, we definitely Find the single entry. But how about I try to find all proteins with same ProtoMap Index Web search can not do everything, we definitely Need some improvement. 9/22/2018 4:59:40 PM Swiss-Prot Database --- Xie, H

Swiss-Prot Database --- Xie, H Our Research Goal: Study the whole SwissProt database in order to find out the relationship between the protein sequence, protein structure and function. Tool: MATLAB Methodology : Building Localized Swiss-Prot database in MS SQL Server to accelerate the research 9/22/2018 4:59:40 PM Swiss-Prot Database --- Xie, H

Connection between Matlab and Database Data could be moved between Matlab and Database Import Matlab Database Export 9/22/2018 4:59:40 PM Swiss-Prot Database --- Xie, H

Local SwissProt Database Schema KW table ProtoMap Table ID KW ID ProtM_ID DR table Protein Table Prediction Table ID DR ID NO ID C Pfam Table SEQ V S LENGTH ID Pfam_ID VL2 VL3 VL4 9/22/2018 4:59:40 PM Swiss-Prot Database --- Xie, H

Search within our database Back to the old question, instead of questions like single query, we can try some queries for a summary report. Q: “List the proteins’ IDs with the ProtoMap family index = ‘840’ “ 9/22/2018 4:59:40 PM Swiss-Prot Database --- Xie, H

Type “querybuilder” in command window 9/22/2018 4:59:40 PM Swiss-Prot Database --- Xie, H

select distinct id from protomap where protfam=‘840’ Select variable name we try to collect Select certain table name Select DB name select distinct id from protomap where protfam=‘840’ Matlab varible hold the result 9/22/2018 4:59:40 PM Swiss-Prot Database --- Xie, H

Swiss-Prot Database --- Xie, H Result 9/22/2018 4:59:40 PM Swiss-Prot Database --- Xie, H

Swiss-Prot Database --- Xie, H Feedback/Questions ? 9/22/2018 4:59:40 PM Swiss-Prot Database --- Xie, H

Swiss-Prot Database --- Xie, H THANK YOU ! 9/22/2018 4:59:40 PM Swiss-Prot Database --- Xie, H