Biological Databases and Tools Sandra Sinisi / Kathryn Steiger November 25, 2002.

Introduction
- More than storage
- Qualities of a good database
  - Flexible retrieval
  - Compatibility with analysis software
  - Data cleaning features

The need for electronic access
- The quantity of data has grown
- Data are concentrated in distant locales
- The field is developing quickly, so new information must be related to existing data

Types of Data
- Nucleotide sequences
- Protein sequences
- Protein structure
- Functional data
- Secondary source information

Public Nucleotide Sequence Sites
- EMBL: European Molecular Biology Laboratory nucleotide database, from the European Bioinformatics Institute (EBI, Hinxton, UK)
- TIGR: The Institute for Genomic Research (Rockville, MD)

- DDBJ: DNA Data Bank of Japan (Mishima, Japan)
- NCBI, DDBJ, and EMBL provide separate points of data submission but exchange this information daily, making the same database (in different formats and information systems) available to the community at large.

Protein Sequence Databases
- SwissProt: integrated with other databases
- TrEMBL: translation of nucleotide sequences into protein sequences
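The translation step mentioned for TrEMBL can be sketched in a few lines. This is only an illustration of translating a nucleotide sequence with the standard genetic code, not the pipeline TrEMBL actually runs; the function name `translate`, the `to_stop` parameter, and the example sequence are assumptions made for this sketch.

```python
from itertools import product

# Standard genetic code, written in the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, to_stop=True):
    """Translate a DNA (or RNA) string in reading frame 1.

    Unknown codons (for example, containing N) become 'X'.
    With to_stop=True, translation ends at the first stop codon.
    """
    dna = dna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    # Made-up example sequence: ATG GCC AAG TAA
    print(translate("ATGGCCAAGTAA"))
```

A real annotation pipeline would also have to choose the reading frame and strand and handle alternative genetic codes, which this sketch ignores.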

Protein 3D Structure
- PDB: Protein Data Bank
- BioMagResBank
- Structural Classification of Proteins (SCOP): lmb.cam.ac.uk.scop/

A good starting point … National Center for Biotechnology Information (NCBI)

- PubMed – gateway to biomedical research literature
- Entrez – search engine
- BLAST – the most important tool (sequence similarity search)
- OMIM – Online Mendelian Inheritance in Man database
- Taxonomy – groups all data by taxonomic classification
- Structure – contains the 3D structures of nucleic acids and proteins whose shape has been determined by X-ray crystallography or NMR

Basic Local Alignment Search Tool (BLAST) program
- An important software tool for searching sequence databases
- Can search databases using nucleic acid or protein query sequences
- Allows dynamic searches of the sequence databases to find similar sequences in different organisms
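To make "local alignment" concrete, here is a minimal sketch of the Smith-Waterman algorithm, the exact dynamic-programming method for local alignment. It is not BLAST itself: BLAST uses a much faster heuristic to approximate this kind of search across whole databases. The function name, the scoring values (match 2, mismatch -1, gap -2), and the example sequences are assumptions chosen for illustration.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the scoring matrix; cells never drop below zero (local alignment).
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    # Two made-up sequences that share the core ACGTACG.
    print(smith_waterman("TTTACGTACGTTT", "GGGACGTACGGGG"))
```

The quadratic cost of filling the matrix is the reason database-scale tools rely on heuristics rather than running this computation against every stored sequence.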

Accepted input types
- FASTA format: a sequence in FASTA format begins with a single-line description, followed by lines of sequence data. The description line is distinguished from the sequence data by a greater-than (">") symbol in the first column.
- GenBank format: a "DNA-centered" view of a sequence record.
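The FASTA layout described above is simple enough to parse by hand. The following is a minimal sketch, assuming plain text input; the function name `parse_fasta` and the record shown are made up for illustration and do not correspond to a real database entry.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (description, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new description line closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)  # sequence data may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


if __name__ == "__main__":
    # Hypothetical two-record FASTA input.
    example = """>seq1 hypothetical example record
ACGTACGTAC
GTTAGC
>seq2 second hypothetical record
MKVLA
"""
    for description, sequence in parse_fasta(example):
        print(description, len(sequence))
```

For real work, an established parser such as the one in Biopython is a safer choice because it handles the format's edge cases.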

BLAST query results for the insulin protein from the zebrafish

Blast It!!

Sequence detail

- Annotate reference sequence
  - Genic sequences
  - Repetitive elements
  - CpG islands
- Identify evolutionarily related genomic sequences
- Align genomic sequences
  - Global alignment program
  - Local alignment program
- Identify conserved sequences
  - Percent identity and length thresholds
  - Homologs
  - Orthologs
  - Paralogs
- Visualize conserved sequences
  - Moving average point plot (VISTA)
  - Gap-free segment plot (PipMaker)
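The "percent identity and length thresholds" step can be sketched as a sliding window over a pairwise alignment, in the spirit of a moving-average plot. This is an illustrative sketch only, not the code VISTA or PipMaker run; the function names, the window and step sizes, and the 70% / 10-column thresholds are assumptions.

```python
def percent_identity_windows(aln_a, aln_b, window=10, step=5):
    """Return (start, percent_identity) for each window of an alignment.

    aln_a and aln_b are two rows of the same alignment (equal length,
    '-' for gaps). Gap columns never count as matches.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    results = []
    for start in range(0, len(aln_a) - window + 1, step):
        columns = zip(aln_a[start:start + window], aln_b[start:start + window])
        matches = sum(1 for x, y in columns if x == y and x != "-")
        results.append((start, 100.0 * matches / window))
    return results


def conserved_segments(windows, window, min_identity=70.0, min_length=10):
    """Merge touching or overlapping windows that pass the identity threshold,
    then keep only merged segments of at least min_length columns."""
    segments = []
    for start, identity in windows:
        if identity < min_identity:
            continue
        end = start + window
        if segments and start <= segments[-1][1]:
            segments[-1] = (segments[-1][0], end)  # extend previous segment
        else:
            segments.append((start, end))
    return [(s, e) for s, e in segments if e - s >= min_length]


if __name__ == "__main__":
    # Made-up 20-column alignment.
    a = "ACGTACGTACGTACGTACGT"
    b = "ACGTACGTTCAAAAATACGT"
    wins = percent_identity_windows(a, b, window=5, step=5)
    print(wins)
    print(conserved_segments(wins, window=5))
```

Plotting the window start against the percent identity gives a curve of the moving-average kind, and the merged segments are the candidate conserved sequences.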

VISTA Organization
- VISTA servers: 2+ orthologous sequences, pairwise or multiple
- Genome VISTA: 1+ sequences to compare to the whole human or mouse genome

Query responses

PipMaker and VISTA compared
- Input files: DNA sequences; annotation of the base sequence; base sequence mask file; underlay files (for any sequence); embedded hyperlink file
- Output files: alignments in different formats (nucleotide level); ordered and oriented sequence relative to the first sequence
  - PipMaker: the percent identity plot; dot plot
  - VISTA: VISTA plot; conserved sequences
  - Analysis of exons: splice junctions, predicted coding sequence
- Length: PipMaker ~2 Mb, time limited; VISTA 4 Mb
- Implementation: web server and stand-alone programs; finished and draft sequences
- Underlying alignment: PipMaker local; VISTA global
- Features to be visualized: genes, exons, repeats, CNSs, order and orientation of aligned sequences
  - PipMaker: CpG islands
  - VISTA: gaps in both sequences

Home
- A distributed computing PF project
- Download and install client software
- 1-10 ns of simulation of protein and solvent
- Issues:
  - Networking (HTTP and proxies)
  - Security (corruption of data)
  - Feedback (don't waste cycles)

How can you help? You can help our project by downloading and running our client software. Our algorithms are designed such that for every computer that joins the project, we get a commensurate increase in simulation speed.

Summary
I. Programs for local and global alignments: PipMaker, VISTA, Pattern Hunter, ClustalW, BLAST, LALIGN, SSEARCH, BLAT, SSAHA
II. Databases of Genomic Sequences: NCBI, TIGR, Sanger, EnsEMBL, TAIR, SGD, MGD, Human Genome Browser, NISC, Rat Genome Database, FlyBase, WormBase, ExoFish

III. Resources for Annotated Genomic Sequences: Human Genome Browser, EnsEMBL, NCBI, MGD, FlyBase
Gene Annotation/Prediction Programs: GENSCAN, GenomeScan, Sim4, EST_Genome, FGENESH (http://genomic.sanger.ac.uk/gf.html), GrailEXP, TwinScan, Genie, SGP
IV. Databases for homology searches: NCBI, TIGR, MGD, EnsEMBL, Human Genome Browser, SGD

Conclusion
Ultimately, the only way to familiarize yourself with these resources is to go to the various web sites and start exploring some of the links. Good tutorials are available online.

Acknowledgements
- Inna Dubchak, LBL: Life Sciences Division – Genome Sciences, Dubchak Lab
- Teresa Head-Gordon, UCB: Dept. of Bioengineering