
1 Review of Biological Database Utilization

2 Biological Databases
We will discuss:
– Usefulness to the bioinformaticist
– Database types
– Search methods and tools
http://www.sequenceanalysis.com/

3 Importance of the Public Databases
The data provide the basis for sequence-based biology
– Open access is key
Supported by the Human Genome Project, the International Nucleotide Sequence Database Collaboration, and others
The amount of biological data is enormous
– Biologists depend on computers for storing, organizing, searching, manipulating, and retrieving the data

4 Why Search Biological Databases?
You have generated a new sequence
– Is it already in the databank?
– Are there homologous sequences?
Find out about the gene
– Annotation
– Literature

5 Why Search Biological Databases?
Find similar non-coding sequences
– Repetitive elements
– Regulatory regions
Find homologous proteins and protein families
Identify and verify PCR priming sites

6 Biological Databases: Types of Databases
Generalized databases (DNA, proteins and carbohydrates, 3D structures)
Specialized databases (EST, STS, SNP, RNA, genomes, protein families, pathways, microarray data...)

7 Generalized Databases
2 main classes
– DNA (nucleotide): the large databases are GenBank at NCBI (US), EMBL at EBI (Europe, UK), and DDBJ (Japan)
– Protein: SWISS-PROT (high level of annotation) with its computer-annotated supplement TrEMBL, and PIR (Protein Information Resource)

8 Specialized Databases
ESTs (Expressed Sequence Tags)
STSs (Sequence-Tagged Sites)
SNPs (Single Nucleotide Polymorphisms)
Organismal genomic databases: human (GDB), mouse (MGD), yeast (SGD), fly
HTGS (High Throughput Genomic Sequences)
RNA
– tRNAs, rRNAs, small RNAs, and others

9 Specialized Databases
Protein families
– PROSITE, PRINTS, BLOCKS
Pathways: metabolic, regulatory, etc.
– EMP, PathDB, KEGG
Microarray data: expression data
– 4 major: GeneX, ArrayExpress, Stanford, Gene Expression Omnibus (GEO)
To find specialized databases: http://www.agr.kuleuven.ac.be/vakken/i287/bioinformatica.htm#

10 Types of Database
Primary: archival
– experimental data with some annotation (interpretation)
Secondary: curated

11 What is annotation?
Extraction, definition, and interpretation of features on the genome sequence
Derived by integrating computational tools and biological knowledge
– for example, known and predicted genes
Some databases are referred to as “annotated databases”
– this means that they contain sequence, comments, literature references, notes on experiments…

12 Curated Databases
Records are added only after they have been through a curation process
– checked for accuracy and supplemented with additional information (annotation)
– scientific judgments are made as data are cleaned up and merged
Examples of curated databases:
– SWISS-PROT, OMIM, RefSeq, LocusLink

13 SWISS-PROT
SWISS-PROT is a curated protein sequence database that strives to provide a high level of annotation (such as the description of the function of a protein, its domain structure, post-translational modifications, variants, etc.), a minimal level of redundancy, and a high level of integration with other databases.
http://www.expasy.ch/sprot/

14 Organismal Databases
Human, Mouse, Drosophila, C. elegans, Yeast, Livestock, Arabidopsis, Maize, Plasmodium, Other
These databases often serve a specific research community
http://tolweb.org/tree/home.pages/linksdb.html#organismal

15 Multi-Organism Resources
www.ncbi.nlm.nih.gov
www.tigr.org
www.expasy.org

16 Biological Databases: Types of Database Search
Text-based database search (SRS, Entrez)
Sequence-based database search, i.e. sequence similarity search (BLAST, FASTA...)
Motif-based database search (ScanProsite, eMOTIF)
Structure-based database search, i.e. structure similarity search (VAST, DALI...)

17 Database Search Tools
Text-based: querying the annotation
SRS6 at http://srs6.ebi.ac.uk/srs6bin/cgi-bin/wgetz?-page+top
ENTREZ at http://www.ncbi.nlm.nih.gov/Entrez/
DBGET/LinkDB at http://www.genome.ad.jp/dbget-bin/www_bfind?linkdb

18 Sequence-based Searches
Considerations: Should I compare DNA or protein sequences?
– DNA comparisons produce more random matches (codon table: http://www.people.virginia.edu/~rjh9u/codetabl.html)
– Protein “matches” may be more relevant
– DNA databases are larger
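A rough way to see why DNA comparisons give more random matches is to count the exact word matches expected by chance. The sketch below assumes a random database with uniform letter frequencies, which real sequences do not have; the function name, the word length of 11, and the database size are illustrative choices, not values from the slides.

```python
def expected_chance_hits(word_length: int, alphabet_size: int, database_positions: int) -> float:
    """Expected number of exact chance matches of one word in a random database.

    Assumes every letter is equally likely and positions are independent.
    """
    per_position_probability = (1.0 / alphabet_size) ** word_length
    return database_positions * per_position_probability


# Hypothetical database of one billion positions, word length 11.
dna_hits = expected_chance_hits(word_length=11, alphabet_size=4, database_positions=10**9)
protein_hits = expected_chance_hits(word_length=11, alphabet_size=20, database_positions=10**9)

print(f"DNA (4 letters):      {dna_hits:.1f} chance hits")
print(f"Protein (20 letters): {protein_hits:.2e} chance hits")
```

Under these assumptions a word of the same length is far more likely to occur by chance in a 4-letter alphabet than in a 20-letter one, which is one reason protein matches tend to be more informative.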

19 Sequence-based Searches: Sensitivity vs. Selectivity
Sensitivity: the ability to find true positive matches, at the risk of also reporting false positives
Selectivity: the ability to reject false positives
There is a trade-off between the two when choosing an algorithm
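The trade-off can be made concrete with counts of true and false hits. This is a minimal sketch under assumptions: sensitivity is taken as TP / (TP + FN), and selectivity as TP / (TP + FP), which is one common definition (some texts use TN / (TN + FP) instead). The hit counts are made up for illustration.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Fraction of the real homologs that the search reported."""
    return true_positives / (true_positives + false_negatives)


def selectivity(true_positives: int, false_positives: int) -> float:
    """Fraction of the reported hits that are real homologs (assumed definition)."""
    return true_positives / (true_positives + false_positives)


# Hypothetical search against a database holding 100 true homologs.
# A permissive score cutoff finds more of them but lets in more noise.
permissive = {"tp": 90, "fn": 10, "fp": 60}
strict = {"tp": 60, "fn": 40, "fp": 5}

for name, counts in (("permissive", permissive), ("strict", strict)):
    sens = sensitivity(counts["tp"], counts["fn"])
    sel = selectivity(counts["tp"], counts["fp"])
    print(f"{name}: sensitivity={sens:.2f} selectivity={sel:.2f}")
```

In this made-up case the permissive cutoff is more sensitive and less selective, and the strict cutoff is the reverse.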

20 Database Search Tools: Sequence-Based
FASTA (at EBI, UK)
BLAST (Basic Local Alignment Search Tool, at NCBI, USA)
MPsrch (Smith-Waterman algorithm-based search at EBI, UK)
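For readers who have not seen the Smith-Waterman algorithm named above, here is a minimal sketch of its scoring recurrence. It returns only the best local alignment score, uses a simple linear gap penalty, and the match, mismatch, and gap values are illustrative assumptions; it is not how MPsrch is implemented or parameterized.

```python
def smith_waterman_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (linear gap penalty)."""
    previous_row = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current_row = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            current_row[j] = max(
                0,                                  # start a new local alignment
                previous_row[j - 1] + substitution,  # align a[i-1] with b[j-1]
                previous_row[j] + gap,               # gap in b
                current_row[j - 1] + gap,            # gap in a
            )
            best = max(best, current_row[j])
        previous_row = current_row
    return best


print(smith_waterman_score("TTACGTTT", "GGACGTGG"))
```

Because cell scores are never allowed to drop below zero, the algorithm finds the best-matching region (here the shared ACGT) regardless of the unrelated flanking sequence. This exhaustive search is slower than the heuristics used by FASTA and BLAST.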

21 More Sequence-based Tools
BLAST Microbial Genomes at http://www.ncbi.nlm.nih.gov/Microb_blast/unfinishedgenome.html (search finished and unfinished genomic sequences at NCBI)
Genome and proteome FASTA (at EBI, UK) at http://www2.ebi.ac.uk/fasta3/genomes.html

22 More Sequence-based Tools
Protein search in genomes at http://searchlauncher.bcm.tmc.edu/seq-search/protein-search-genomes.html (BLAST and FASTA species-specific protein sequence searches at Baylor College of Medicine, USA)
SectionSearch (FASTA or TFASTA search against predefined sections of sequence databanks at IUBIO, Indiana, USA)
NRL-3D at http://pir.georgetown.edu/pirwww/dbinfo/nrl3d.html (sequence-structure database search at Johns Hopkins University, USA)

23 Tools to Search Special Databases for Sequences with Similar Motifs or Patterns
ProfileScan uses pfscan to find similarities between a query sequence and a profile library
PROSITE is one such database
– an ExPASy database (Expert Protein Analysis System, http://www.expasy.ch/)
– similarities are based on fingerprints or common patterns
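To illustrate what a pattern-based search does, the sketch below converts a PROSITE-style pattern into a Python regular expression and scans a sequence with it. It covers only the basic syntax (x, [..], {..}, repeat counts, and the < and > anchors), the function name and the test sequences are made up, and it is not how ScanProsite or pfscan work internally.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a basic PROSITE-style pattern into a regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        element = element.replace("<", "^").replace(">", "$")  # terminus anchors
        element = element.replace("{", "[^").replace("}", "]")  # {P} = any residue except P
        element = element.replace("(", "{").replace(")", "}")   # x(2,4) = 2 to 4 repeats
        element = element.replace("x", ".")                     # x = any residue
        parts.append(element)
    return "".join(parts)


# N-glycosylation site pattern: N, then not P, then S or T, then not P.
regex = prosite_to_regex("N-{P}-[ST]-{P}")
print(regex)

hit = re.search(regex, "MKNGTAL")   # made-up sequence containing N-G-T-A
miss = re.search(regex, "MKNPTAL")  # proline after N breaks the pattern
print(hit.start() if hit else None, miss)
```

Short patterns like this one match by chance quite often, which is why pattern hits are usually treated as candidates to check rather than as proof of function.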

24 BLOCKS Database
A block is a motif or region of similar structure
– no gaps are introduced
– a block refers to the alignment, not the individual sequences
The BLOCKS database is derived from PROSITE
Searches can be done at the Fred Hutchinson Cancer Center in Seattle

25 3 Major Portals into the Genome Data
UCSC Genome Browser at Univ. of California Santa Cruz
Ensembl at European Bioinformatics Inst (EBI)
– http://www.ensembl.org
Entrez at NCBI
– http://www.ncbi.nlm.nih.gov/Entrez/

26 Entrez Databases
PubMed: the biomedical literature
– the PubMed database contains MEDLINE abstracts as well as links to full-text articles on sites maintained by journal publishers
Nucleotide sequence database (GenBank)
Protein sequence database
Structure: three-dimensional macromolecular structures
Genome: complete genome assemblies
PopSet: population study data sets

27 Entrez Databases
OMIM: Online Mendelian Inheritance in Man
Taxonomy: organisms in GenBank
Books: online books
ProbeSet: Gene Expression Omnibus (GEO)
3D Domains: domains from Entrez Structure

28 Entrez sequence searching
Can find sequences for a given gene or protein
Can download a copy of the sequence
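The slides describe the Entrez web pages. The same two steps, finding records and downloading a sequence, can also be scripted through NCBI's E-utilities, which is an assumption added here and not something the slides cover. The sketch below only builds the request URLs and makes no network call; the search term and the accession placeholder are illustrative.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(database: str, term: str, max_records: int = 20) -> str:
    """URL that searches an Entrez database and returns matching record IDs."""
    query = urlencode({"db": database, "term": term, "retmax": max_records})
    return f"{EUTILS_BASE}/esearch.fcgi?{query}"


def efetch_fasta_url(database: str, record_id: str) -> str:
    """URL that downloads one record as plain-text FASTA."""
    query = urlencode({"db": database, "id": record_id, "rettype": "fasta", "retmode": "text"})
    return f"{EUTILS_BASE}/efetch.fcgi?{query}"


# Step 1: find sequences for a given gene (illustrative query).
search = esearch_url("nucleotide", "BRCA1[Gene Name] AND human[Organism]")
# Step 2: download one sequence; YOUR_ACCESSION is a placeholder, not a real record.
fetch = efetch_fasta_url("nucleotide", "YOUR_ACCESSION")

print(search)
print(fetch)
```

Opening either URL in a browser, or fetching it with any HTTP client, returns the result. NCBI asks automated users to limit their request rate, so check its current usage guidelines before running this in a loop.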

29 NCBI BLAST
NCBI offers several “flavors” of BLAST

30 NCBI BLAST
NCBI offers several “flavors” of BLAST

31 The Take Home Lessons
Search often, search with multiple parameters
Use specialized DBs where possible; use protein sequence if appropriate
There are many tools available
– You must know what tools are relevant
– You must know how to use available tools
Look for sites that have multiple resources
Google is your best friend

