Review of Biological Database Utilization

Presentation transcript:

1 Review of Biological Database Utilization

2 Biological Databases We will discuss: Usefulness to the bioinformaticist Database types Search methods and tools

3 Importance of the Public Databases The data provide the basis for sequence-based biology –Open access is key Supported by Human Genome Project, International Nucleotide Sequence Database Collaboration and others The amount of biological data is enormous –Biologists are dependent on computers for storing, organizing, searching, manipulating, and retrieving the data/information

4 Why Search Biological Databases? Generate new sequence –Is it already in bank? –Homologous sequences? Find out about the gene –Annotation –Literature

5 Why Search Biological Databases? Similar non-coding sequences –Repetitive elements –Regulatory regions Homologous proteins; families Identify and verify PCR priming sites

6 Biological Databases Types of Databases Generalized databases (DNA, proteins and carbohydrates, 3D-structures) Specialized databases (EST, STS, SNP, RNA, genomes, protein families, pathways, microarray data...)

7 Generalized Databases 2 Main Classes –DNA (nucleotide) The large databases are: GenBank at NCBI (US), EMBL at EBI (Europe - UK), DDBJ (Japan). –Protein – SWISS-PROT/TrEMBL (high level of annotation), PIR (Protein Information Resource).

8 Specialized Databases ESTs (Expressed Sequence Tags) STSs (Sequence-Tagged Sites) SNPs (Single Nucleotide Polymorphisms) Organismal Genomic databases: Human (GDB), mouse (MGD), yeast (SGD), fly HTGS (High Throughput Genomic Sequences) RNA –tRNAs, rRNAs, small RNAs & others

9 Specialized Databases Protein families –PROSITE, PRINTS, BLOCKS Pathways: metabolic, regulatory etc. –EMP, PathDB, KEGG Microarray data: expression data –4 major: GeneX, ArrayExpress, Stanford, Gene Expression Omnibus (GEO) To find specialized databases:

10 Types of Database Primary: archival –experimental data with some annotation (interpretation) Secondary: curated

11 What is annotation? Extraction, definition and interpretation of features on the genome sequence Derived by integrating computational tools and biological knowledge –for example, known and predicted genes Some databases are referred to as “annotated databases” –means that they contain sequence, comments, literature references, notes on experiments…

12 Curated Databases Records are added only after they have been through a curation process –checked for accuracy, additional information (annotation) –scientific judgments are made as data are cleaned up and merged Examples of curated databases: –SWISS-PROT, OMIM, RefSeq, LocusLink

13 SWISS-PROT SWISS-PROT is a curated protein sequence database which strives to provide a high level of annotation (such as the description of the function of a protein, its domain structure, post-translational modifications, variants, etc.), a minimal level of redundancy and a high level of integration with other databases.

14 Organismal Databases Human Mouse Drosophila C. elegans Yeast Livestock Arabidopsis Maize Plasmodium Other These databases often serve a specific research community

15 Multi-Organism Resources

16 Biological Databases Types of Database Search Text-based database search (SRS, Entrez) Sequence-based database search (sequence similarity search) (BLAST, FASTA...) Motif-based database search (ScanProsite, eMOTIF) Structure-based database search (structure similarity search) (VAST, DALI...)

17 Database Search Tools Text-based: querying the annotation SRS6 at http://srs6.ebi.ac.uk/srs6bin/cgi-bin/wgetz?-page+top ENTREZ DBGET/LinkDB at bin/www_bfind?linkdb

18 Sequence-based Searches Considerations: Should I compare DNA or protein sequences? More random matches with DNA Protein “matches” may be more relevant DNA databases are larger
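The point about random matches can be made concrete with a rough calculation. The sketch below assumes residues are independent and uniformly distributed (real sequences are not), and the database size is a hypothetical round number chosen only for illustration; the function name is likewise made up for this example.

```python
def expected_chance_hits(word_length: int, alphabet_size: int, database_residues: int) -> float:
    """Rough expected number of exact chance matches to one fixed word.

    Assumes independent, uniformly distributed residues and treats every
    database position as a possible start, so this is only an
    order-of-magnitude illustration.
    """
    probability_per_position = (1.0 / alphabet_size) ** word_length
    return database_residues * probability_per_position


# Hypothetical database size, for illustration only.
DATABASE_RESIDUES = 1_000_000_000

# Same word length, 4-letter DNA alphabet versus 20-letter protein alphabet.
dna_hits = expected_chance_hits(11, 4, DATABASE_RESIDUES)
protein_hits = expected_chance_hits(11, 20, DATABASE_RESIDUES)

print(f"DNA word of 11 bases: about {dna_hits:.1f} chance hits")
print(f"Protein word of 11 residues: about {protein_hits:.2e} chance hits")
```

Under these assumptions the smaller DNA alphabet yields far more chance matches for the same word length, which is one reason protein matches tend to be more informative.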

19 Sequence-based Searches Sensitivity vs. Selectivity Sensitivity: the ability to find true positive matches, usually at the cost of also reporting some false positives Selectivity: the ability to reject false positives Trade-off when choosing an algorithm
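A toy calculation may help show the trade-off. In this sketch, sensitivity is taken as the fraction of true homologs that are reported, and selectivity as the fraction of reported hits that are true homologs; that reading of "selectivity" is an assumption, since the slide does not give a formula. The scores and labels are invented for illustration.

```python
def sensitivity_and_selectivity(hits, score_cutoff):
    """Return (sensitivity, selectivity) for hits reported at a score cutoff.

    hits is a list of (score, is_true_homolog) pairs.
    """
    true_pos = sum(1 for score, is_true in hits if score >= score_cutoff and is_true)
    false_pos = sum(1 for score, is_true in hits if score >= score_cutoff and not is_true)
    false_neg = sum(1 for score, is_true in hits if score < score_cutoff and is_true)
    sensitivity = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    selectivity = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 1.0
    return sensitivity, selectivity


# Invented search results: (similarity score, whether the hit is a true homolog).
TOY_HITS = [(95, True), (80, True), (62, False), (55, True),
            (40, False), (33, True), (20, False)]

for cutoff in (30, 70):
    sens, sel = sensitivity_and_selectivity(TOY_HITS, cutoff)
    print(f"cutoff {cutoff}: sensitivity={sens:.2f}, selectivity={sel:.2f}")
```

With the permissive cutoff every true homolog is found but false positives are reported too; with the strict cutoff no false positives remain but half of the true homologs are missed.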

20 Database Search Tools Sequence-Based FASTA (FASTA at EBI, UK) BLAST (Basic local alignment search tool at NCBI, USA) MPsrch (Smith-Waterman algorithm-based search at EBI, UK)

21 More Sequence-based Tools BLAST Microbial Genomes at nishedgenome.html (Search finished and unfinished genomic sequences at NCBI) Genome and proteome FASTA (at EBI, UK)

22 More Sequence-based Tools Protein search in genomes at search/protein-search-genomes.html (BLAST and FASTA species-specific protein sequence searches at Baylor College of Medicine, USA) SectionSearch (FASTA or TFASTA search against predefined sections of sequence databanks at IUBIO Indiana, USA) NRL-3D (Sequence-structure database search at Johns Hopkins University, USA)

23 Tools to Search Special Databases for Sequences with Similar Motifs or Patterns ProfileScan uses pfscan to find similarities between a query sequence and a profile library. PROSITE is one such database, an ExPASy (Expert Protein Analysis System) database; similarities are based on fingerprints or common patterns.
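To illustrate what a pattern-based search does, here is a minimal sketch that translates a PROSITE-style pattern into a regular expression and scans a sequence with it. It covers only the basic syntax (`x`, `[..]`, `{..}`, repeat counts, `<` and `>` anchors) and is not how ScanProsite or pfscan are implemented; the function names and the test sequence are made up for this example. The pattern shown is the N-glycosylation site pattern as it appears in PROSITE.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a basic PROSITE-style pattern into a Python regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = repeat = ""
        if element.startswith("<"):      # pattern anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # pattern anchored at the C-terminus
            suffix, element = "$", element[:-1]
        counted = re.fullmatch(r"(.+?)\((\d+(?:,\d+)?)\)", element)
        if counted:                      # repeats such as x(2) or x(2,4)
            element, repeat = counted.group(1), "{" + counted.group(2) + "}"
        if element == "x":               # any residue
            core = "."
        elif element.startswith("{"):    # any residue except those listed
            core = "[^" + element[1:-1] + "]"
        else:                            # a single residue or an [ABC] set
            core = element
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


def find_sites(pattern: str, sequence: str):
    """Return (1-based start, matched text) for all sites, overlaps included."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


N_GLYCOSYLATION = "N-{P}-[ST]-{P}."
print(prosite_to_regex(N_GLYCOSYLATION))
print(find_sites(N_GLYCOSYLATION, "MKNGSAANPTQNASL"))  # invented test sequence
```

The lookahead wrapper in `find_sites` is a design choice so that overlapping sites are all reported, which a plain `finditer` over the bare pattern would skip.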

24 BLOCKS Database A block is a motif or region of similar structure; no gaps are introduced. A block refers to the alignment, not the individual sequences. The BLOCKS database is derived from PROSITE. Searches can be done at Fred Hutchinson Cancer Center in Seattle.
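Because a block is an ungapped alignment, it can be treated as a stack of equal-length segments and summarized column by column. The sketch below shows that idea with per-column residue counts and a simple majority consensus; the segments are invented, and this is only an illustration of the data structure, not the scoring scheme the BLOCKS server uses.

```python
from collections import Counter


def column_counts(block):
    """Count residues in each column of an ungapped block of segments."""
    if len({len(segment) for segment in block}) != 1:
        raise ValueError("a block is ungapped, so all segments must have equal length")
    return [Counter(column) for column in zip(*block)]


def consensus(block):
    """Most frequent residue in each column, joined into one string."""
    return "".join(counts.most_common(1)[0][0] for counts in column_counts(block))


# Invented segments standing in for one conserved region from four proteins.
TOY_BLOCK = ["GDSGG", "GDSGG", "GDAGG", "GNSGG"]

print(consensus(TOY_BLOCK))
print(dict(column_counts(TOY_BLOCK)[1]))
```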

25 3 Major Portals into the Genome Data UCSC Genome Browser at Univ. of California Santa Cruz Ensembl at European Bioinformatics Institute (EBI) Entrez at NCBI

26 Entrez Databases PubMed: The biomedical literature –The PubMed database contains Medline abstracts as well as links to full text articles on sites maintained by journal publishers Nucleotide sequence database (GenBank) Protein sequence database Structure: three-dimensional macromolecular structures Genome: complete genome assemblies PopSet: population study data sets

27 Entrez Databases OMIM: Online Mendelian Inheritance in Man Taxonomy: organisms in GenBank Books: online books ProbeSet: Gene Expression Omnibus (GEO) 3D Domains: domains from Entrez Structure

28 Entrez sequence searching can find sequences for a given gene or protein can download copy of sequence
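Entrez searches like these can also be scripted. The sketch below only builds request URLs for NCBI's E-utilities (`esearch.fcgi` to search, `efetch.fcgi` to download a record as FASTA) and makes no network call. The slides themselves describe the web interface, so using E-utilities here is an assumption about how one might automate the same steps; the helper names, search term, and accession are illustrative.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(database: str, term: str, max_records: int = 20) -> str:
    """URL that searches an Entrez database and returns matching record IDs."""
    query = urlencode({"db": database, "term": term, "retmax": max_records})
    return f"{EUTILS_BASE}/esearch.fcgi?{query}"


def efetch_fasta_url(database: str, record_id: str) -> str:
    """URL that downloads one record from an Entrez database in FASTA format."""
    query = urlencode({"db": database, "id": record_id,
                       "rettype": "fasta", "retmode": "text"})
    return f"{EUTILS_BASE}/efetch.fcgi?{query}"


# Illustrative query: find sequences for a given gene, then fetch one record.
print(esearch_url("nucleotide", "BRCA1[Gene Name] AND human[Organism]"))
print(efetch_fasta_url("nucleotide", "NM_007294"))  # example accession
```

The resulting URLs can be opened in a browser or passed to any HTTP client; NCBI asks automated users to keep request rates low.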

29 NCBI BLAST NCBI offers several “flavors” of BLAST

30 NCBI BLAST NCBI offers several “flavors” of BLAST

31 The Take Home Lessons Search often, search with multiple parameters Use specialized DBs where possible, use protein sequence if appropriate There are many tools available. You must know what tools are relevant. You must know how to use available tools. Look for sites that have multiple resources Google is your best friend.