Published by Barbara Harris. Modified over 9 years ago.
1
1 Review of Biological Database Utilization
2
2 Biological Databases
We will discuss:
  –Usefulness to the bioinformaticist
  –Database types
  –Search methods and tools
http://www.sequenceanalysis.com/
3
3 Importance of the Public Databases
The data provide the basis for sequence-based biology
  –Open access is key
Supported by the Human Genome Project, the International Nucleotide Sequence Database Collaboration and others
The amount of biological data is enormous
  –Biologists are dependent on computers for storing, organizing, searching, manipulating, and retrieving the data/information
4
4 Why Search Biological Databases?
You have generated a new sequence
  –Is it already in the databank?
  –Are there homologous sequences?
Find out about the gene
  –Annotation
  –Literature
5
5 Why Search Biological Databases?
Find similar non-coding sequences
  –Repetitive elements
  –Regulatory regions
Find homologous proteins and protein families
Identify and verify PCR priming sites
6
6 Biological Databases: Types of Databases
Generalized databases (DNA, proteins and carbohydrates, 3D structures)
Specialized databases (EST, STS, SNP, RNA, genomes, protein families, pathways, microarray data...)
7
7 Generalized Databases
2 Main Classes
  –DNA (nucleotide): the large databases are GenBank at NCBI (US), EMBL at EBI (Europe - UK), and DDBJ (Japan).
  –Protein: SWISS-PROT/TrEMBL (high level of annotation), PIR (Protein Information Resource).
8
8 Specialized Databases
ESTs (Expressed Sequence Tags)
STSs (Sequence-Tagged Sites)
SNPs (Single Nucleotide Polymorphisms)
Organismal genomic databases: human (GDB), mouse (MGD), yeast (SGD), fly
HTGS (High Throughput Genomic Sequences)
RNA
  –tRNAs, rRNAs, small RNAs and others
9
9 Specialized Databases
Protein families
  –PROSITE, PRINTS, BLOCKS
Pathways: metabolic, regulatory etc.
  –EMP, PathDB, KEGG
Microarray data: expression data
  –4 major: GeneX, ArrayExpress, Stanford, Gene Expression Omnibus (GEO)
To find specialized databases: http://www.agr.kuleuven.ac.be/vakken/i287/bioinformatica.htm#
10
10 Types of Database
Primary: archival
  –experimental data with some annotation (interpretation)
Secondary: curated
11
11 What is annotation?
Extraction, definition and interpretation of features on the genome sequence
Derived by integrating computational tools and biological knowledge
  –for example, known and predicted genes
Some databases are referred to as “annotated databases”
  –this means that they contain sequence, comments, literature references, notes on experiments…
12
12 Curated Databases
Records are added only after they have been through a curation process
  –checked for accuracy and supplemented with additional information (annotation)
  –scientific judgments are made as data are cleaned up and merged
Examples of curated databases:
  –SWISS-PROT, OMIM, RefSeq, LocusLink
13
13 SWISS-PROT
SWISS-PROT is a curated protein sequence database which strives to provide a high level of annotation (such as the description of the function of a protein, its domain structure, post-translational modifications, variants, etc.), a minimal level of redundancy and a high level of integration with other databases.
http://www.expasy.ch/sprot/
14
14 Organismal Databases
Human, Mouse, Drosophila, C. elegans, Yeast, Livestock, Arabidopsis, Maize, Plasmodium, Other
These databases often serve a specific research community
http://tolweb.org/tree/home.pages/linksdb.html#organismal
15
15 Multi-Organism Resources
www.ncbi.nlm.nih.gov
www.tigr.org
www.expasy.org
16
16 Biological Databases: Types of Database Search
Text-based database search (SRS, Entrez)
Sequence-based database search (sequence similarity search) (BLAST, FASTA...)
Motif-based database search (ScanProsite, eMOTIF)
Structure-based database search (structure similarity search) (VAST, DALI...)
17
17 Database Search Tools
Text-based: querying the annotation
SRS6 at http://srs6.ebi.ac.uk/srs6bin/cgi-bin/wgetz?-page+top
ENTREZ at http://www.ncbi.nlm.nih.gov/Entrez/
DBGET/LinkDB at http://www.genome.ad.jp/dbget-bin/www_bfind?linkdb
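A text-based query can also be issued programmatically. The sketch below is not from the slides: it assumes NCBI's E-utilities `esearch` endpoint as the programmatic counterpart of the Entrez web form, and the helper name `build_esearch_url` and the example search term are hypothetical. It only builds the query URL and does not contact the server.

```python
from urllib.parse import urlencode

# Assumed base address of NCBI E-utilities (not mentioned on the slide).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(database, term, max_results=20):
    """Build an esearch URL for a text query against one Entrez database."""
    query = urlencode({"db": database, "term": term, "retmax": max_results})
    return f"{EUTILS_BASE}/esearch.fcgi?{query}"


# Hypothetical example query: human insulin records in the protein database.
url = build_esearch_url("protein", "insulin[Protein Name] AND human[Organism]")
print(url)
```

The field tags in square brackets restrict each word to one part of the annotation, which is what "querying the annotation" amounts to in practice.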
18
18 Sequence-based Searches
Considerations: Should I compare DNA or protein sequences?
More random matches with DNA
http://www.people.virginia.edu/~rjh9u/codetabl.html
Protein “matches” may be more relevant
DNA databases are larger
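One reason protein comparisons can be more informative is codon degeneracy: two coding sequences can differ at many nucleotide positions and still encode the same protein. The sketch below illustrates this with the standard genetic code; the two DNA sequences are made up for the example, and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA sequence in frame 1, ignoring any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Made-up sequences that differ mostly at synonymous positions.
seq_a = "ATGGCTCTTAGC"
seq_b = "ATGGCCTTGTCA"

print(identity(seq_a, seq_b))                        # DNA-level identity
print(identity(translate(seq_a), translate(seq_b)))  # protein-level identity
```

Here the DNA sequences share only half of their positions, while their translations are identical.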
19
19 Sequence-based Searches: Sensitivity vs. Selectivity
Sensitivity: the ability to find true positive matches (a sensitive search may still report false positives)
Selectivity: the ability to reject false positives
There is a trade-off between the two when choosing an algorithm
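The trade-off can be made concrete with a score cutoff. The sketch below assumes sensitivity is measured as TP / (TP + FN) and selectivity as TP / (TP + FP); the slide does not give formulas, so both definitions are assumptions, and the hit scores are invented for illustration.

```python
# Invented search results: (alignment score, is the hit a true homolog?)
HITS = [
    (95, True), (80, True), (70, False), (62, True), (50, False), (45, True),
    (40, False), (35, False), (30, True), (20, False), (10, False),
]


def evaluate(threshold, hits=HITS):
    """Return (sensitivity, selectivity) when reporting hits scoring >= threshold."""
    true_pos = sum(1 for score, real in hits if score >= threshold and real)
    false_pos = sum(1 for score, real in hits if score >= threshold and not real)
    false_neg = sum(1 for score, real in hits if score < threshold and real)
    sensitivity = true_pos / (true_pos + false_neg)
    selectivity = true_pos / (true_pos + false_pos)
    return sensitivity, selectivity


for cutoff in (60, 25):
    sens, sel = evaluate(cutoff)
    print(f"cutoff {cutoff}: sensitivity {sens:.2f}, selectivity {sel:.2f}")
```

Lowering the cutoff recovers every true homolog in this toy data set but lets in more false positives, which is the same kind of compromise made when choosing between a fast heuristic search and a more exhaustive one.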
20
20 Database Search Tools
Sequence-based
FASTA (at EBI, UK)
BLAST (Basic Local Alignment Search Tool, at NCBI, USA)
MPsrch (Smith-Waterman algorithm-based search at EBI, UK)
21
21 More Sequence-based Tools
BLAST Microbial Genomes at http://www.ncbi.nlm.nih.gov/Microb_blast/unfinishedgenome.html (search finished and unfinished genomic sequences at NCBI)
Genome and proteome FASTA (at EBI, UK) at http://www2.ebi.ac.uk/fasta3/genomes.html
22
22 More Sequence-based Tools
Protein search in genomes at http://searchlauncher.bcm.tmc.edu/seq-search/protein-search-genomes.html (BLAST and FASTA species-specific protein sequence searches at Baylor College of Medicine, USA)
SectionSearch (FASTA or TFASTA search against predefined sections of sequence databanks at IUBIO Indiana, USA)
NRL-3D at http://pir.georgetown.edu/pirwww/dbinfo/nrl3d.html (sequence-structure database search at Johns Hopkins University, USA)
23
23 Tools to Search Special Databases for Sequences with Similar Motifs or Patterns
ProfileScan uses pfscan to find similarities between a query sequence and a profile library
PROSITE is one such database
  –an ExPASy database (Expert Protein Analysis System, http://www.expasy.ch/)
  –similarities are based on fingerprints or common patterns
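A pattern-based search can be sketched by translating a PROSITE-style pattern into a regular expression. This is only an illustration of the idea, not how ProfileScan or pfscan work internally; the converter handles a common subset of the pattern syntax (`x`, `[..]`, `{..}`, repeat counts and `<`/`>` anchors), and the function name and test sequence are hypothetical. The example pattern `N-{P}-[ST]-{P}` is the PROSITE N-glycosylation site pattern.

```python
import re

# One pattern element: optional '<', a residue / x / [set] / {excluded set},
# an optional repeat count such as (3) or (2,4), and an optional '>'.
ELEMENT = re.compile(
    r"(<?)([A-Zx]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>?)"
)


def prosite_to_regex(pattern):
    """Convert a PROSITE-style pattern (common subset only) to a Python regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        n_term, core, low, high, c_term = match.groups()
        if core == "x":
            core = "."                       # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # any residue except these
        repeat = ""
        if low:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if n_term else "") + core + repeat + ("$" if c_term else ""))
    return "".join(parts)


regex = prosite_to_regex("N-{P}-[ST]-{P}")
sequence = "MKNGSAANPTG"  # made-up protein fragment
print(regex, [m.start() for m in re.finditer(regex, sequence)])
```

Note that `re.finditer` reports non-overlapping matches only, so a real scanner would need a lookahead or a sliding window to catch overlapping sites.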
24
24 BLOCKS Database
A block is a motif or region of similar structure
  –no gaps are introduced
  –a block refers to the alignment, not the individual sequences
The BLOCKS database is derived from PROSITE
Searches can be done at the Fred Hutchinson Cancer Center in Seattle
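Because a block is an ungapped alignment, every column lines up the same position in each sequence, so per-column residue counts are easy to compute. The sketch below shows that idea on a made-up four-sequence block; the function names are hypothetical and this is not the BLOCKS database's own scoring method.

```python
from collections import Counter


def column_counts(block):
    """Count residues in each column of an ungapped alignment block."""
    if len({len(segment) for segment in block}) != 1:
        raise ValueError("all segments in a block must have the same length")
    if any("-" in segment for segment in block):
        raise ValueError("a block must not contain gaps")
    return [Counter(column) for column in zip(*block)]


def consensus(block):
    """Most frequent residue in each column of the block."""
    return "".join(counts.most_common(1)[0][0] for counts in column_counts(block))


# Made-up block: one aligned segment per sequence, no gaps.
block = ["GKSTL", "GKTTL", "GKSSL", "AKSTL"]
print(consensus(block))
```

The counts describe the alignment as a whole rather than any single sequence, which is the sense in which a block "refers to the alignment".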
25
25 3 Major Portals into the Genome Data
UCSC Genome Browser at Univ. of California Santa Cruz
Ensembl at European Bioinformatics Inst (EBI)
  –http://www.ensembl.org
Entrez at NCBI
  –http://www.ncbi.nlm.nih.gov/Entrez/
26
26 Entrez Databases
PubMed: the biomedical literature
  –the PubMed database contains Medline abstracts as well as links to full-text articles on sites maintained by journal publishers
Nucleotide sequence database (GenBank)
Protein sequence database
Structure: three-dimensional macromolecular structures
Genome: complete genome assemblies
PopSet: population study data sets
27
27 Entrez Databases
OMIM: Online Mendelian Inheritance in Man
Taxonomy: organisms in GenBank
Books: online books
ProbeSet: Gene Expression Omnibus (GEO)
3D Domains: domains from Entrez Structure
28
28 Entrez sequence searching
Can find sequences for a given gene or protein
Can download a copy of the sequence
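Downloading a sequence can be sketched in two steps: build a retrieval URL, then parse the FASTA text that comes back. The example assumes NCBI's E-utilities `efetch` endpoint (not named on the slide); the accession `EXAMPLE1`, the sample record and the function names are placeholders. No network request is made, so the parser runs on an inline sample.

```python
from urllib.parse import urlencode

# Assumed E-utilities endpoint for record retrieval.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(accession, database="nucleotide"):
    """Build a URL that requests one record as plain-text FASTA."""
    params = {"db": database, "id": accession, "rettype": "fasta", "retmode": "text"}
    return f"{EFETCH_URL}?{urlencode(params)}"


def parse_fasta(text):
    """Parse FASTA text into a dict mapping header line to sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {name: "".join(chunks) for name, chunks in records.items()}


# Placeholder record standing in for a downloaded sequence.
sample = ">EXAMPLE1 hypothetical record\nATGGCC\nTTGTCA\n"
print(build_efetch_url("EXAMPLE1"))
print(parse_fasta(sample))
```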
29
29 NCBI BLAST NCBI offers several “flavors” of BLAST
30
30 NCBI BLAST NCBI offers several “flavors” of BLAST
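The main BLAST "flavors" differ in whether the query and the database are nucleotide or protein, and in which side is translated before comparison. The lookup below is a small sketch of that choice; the function name and its arguments are hypothetical, and only the five classic programs are covered.

```python
def choose_blast_program(query_type, database_type, translate_both=False):
    """Pick a classic BLAST program from the query and database sequence types.

    translate_both only matters for nucleotide vs. nucleotide searches, where
    tblastx compares six-frame translations of both query and database.
    """
    programs = {
        ("nucleotide", "nucleotide"): "tblastx" if translate_both else "blastn",
        ("protein", "protein"): "blastp",
        ("nucleotide", "protein"): "blastx",   # translated query vs. protein database
        ("protein", "nucleotide"): "tblastn",  # protein query vs. translated database
    }
    try:
        return programs[(query_type, database_type)]
    except KeyError:
        raise ValueError(
            f"unknown combination: {query_type!r} query, {database_type!r} database"
        ) from None


print(choose_blast_program("nucleotide", "protein"))
```

For a new cDNA of unknown function, for instance, searching a protein database with the translated query is one way to apply the "use protein sequence if appropriate" advice.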
31
31 The Take Home Lessons
Search often, search with multiple parameters
Use specialized DBs where possible, use protein sequence if appropriate
There are many tools available
  –You must know what tools are relevant
  –You must know how to use available tools
Look for sites that have multiple resources
Google is your best friend