Bioinformatics for biomedicine

Slides:



Advertisements
Similar presentations
Bioinformatics Ayesha M. Khan Spring 2013.
Advertisements

Databases (“knowledge bases”) used in genome analysis
Creating NCBI The late Senator Claude Pepper recognized the importance of computerized information processing methods for the conduct of biomedical research.
Genome databases and webtools for genome analysis Become familiar with microbial genome databases Use some of the tools useful for analyzing genome Visit.
On line (DNA and amino acid) Sequence Information Lecture 7.
1 Welcome to the Protein Database Tutorial This tutorial will describe how to navigate the section of Gramene that provides collective information on proteins.
Basic Genomic Characteristic  AIM: to collect as much general information as possible about your gene: Nucleotide sequence Databases ○ NCBI GenBank ○
The design, construction and use of software tools to generate, store, annotate, access and analyse data and information relating to Molecular Biology.
How to use the web for bioinformatics Molecular Technologies Ethan Strauss X 1171
Bioinformatics for biomedicine Summary and conclusions. Further analysis of a favorite gene Lecture 8, Per Kraulis
Protein databases Morten Nielsen. Background- Nucleotide databases GenBank, National Center for Biotechnology Information.
Archives and Information Retrieval
Protein Databases EBI – European Bioinformatics Institute
Genome Related Biological Databases. Content DNA Sequence databases Protein databases Gene prediction Accession numbers NCBI website Ensembl website.
The Cell, Central Dogma and Human Genome Project.
Biological Databases Chi-Cheng Lin, Ph.D. Associate Professor Department of Computer Science Winona State University – Rochester Center
IST Computational Biology1 Information Retrieval Biological Databases 2 Pedro Fernandes Instituto Gulbenkian de Ciência, Oeiras PT.
The Protein Data Bank (PDB)
Protein databases Henrik Nielsen. Background- Nucleotide databases GenBank, National Center for Biotechnology Information.
Class European Resources Protein Focused. Protein Databases EBI – European Bioinformatics Institute
EBI is an Outstation of the European Molecular Biology Laboratory. UniProt Jennifer McDowall, Ph.D. Senior InterPro Curator Protein Sequence Database:
UniProt - The Universal Protein Resource
ExPASy - Expert Protein Analysis System The bioinformatics resource portal and other resources An Overview.
An Introduction to Bioinformatics Molecular Biology Databases.
Joint EBI-Wellcome Trust Summer School June 2010.
BTN323: INTRODUCTION TO BIOLOGICAL DATABASES Day2: Specialized Databases Lecturer: Junaid Gamieldien, PhD
On line (DNA and amino acid) Sequence Information
Bioinformatics.
The Ensembl Gene set The “Genebuild” 21 April 2008.
Databases in Bioinformatics and Systems Biology Carsten O. Daub Omics Science Center RIKEN, Japan May 2008.
Introduction to databases Tuomas Hätinen. Topics File Formats Databases -Primary structure: UniProt -Tertiary structure: PDB Database integration system.
Information Resources for Bioinformatics 1 MARC: Developing Bioinformatics Programs July, 2008 Alex Ropelewski Hugh Nicholas
© Wiley Publishing All Rights Reserved. Protein and Specialized Sequence Databases.
Secondary Databases Ansuman sahoo Roll: Y Bioinformatics Class Presentation 30 Jan 2013.
NCBI’s Bioinformatics Resources Michele R. Tennant, Ph.D., M.L.I.S. Health Science Center Libraries U.F. Genetics Institute January 2015.
Biological Databases By : Lim Yun Ping E mail :
Doug Raiford Lesson 3.  More and more sequence data is being generated every day  Useless if not made available to other researchers.
1 Review of Biological Database Utilization. 2 Biological Databases We will discuss: Usefulness to the bioinformaticist Database types Search methods.
Part I: Identifying sequences with … Speaker : S. Gaj Date
Biological Databases Biology outside the lab. Why do we need Bioinfomatics? Over the past few decades, major advances in the field of molecular biology,
BLOCKS Multiply aligned ungapped segments corresponding to most highly conserved regions of proteins- represented in profile.
REMINDERS 2 nd Exam on Nov.17 Coverage: Central Dogma of DNA Replication Transcription Translation Cell structure and function Recombinant DNA technology.
Alastair Kerr, Ph.D. WTCCB Bioinformatics Core An introduction to DNA and Protein Sequence Databases.
Protein and RNA Families
Biological databases an introduction By Dr. Erik Bongcam-Rudloff LCB-UU/SLU ILRI 2007 By Dr. Erik Bongcam-Rudloff LCB-UU/SLU ILRI 2007.
Biological databases Exercises. Discovery of distinct sequence databases using ensembl.
EB3233 Bioinformatics Introduction to Bioinformatics.
1 EMBL Outstation — The European Bioinformatics Institute Removing redundancy in SWISS-PROT and TrEMBL.
Copyright OpenHelix. No use or reproduction without express written consent1.
Introduction to the Gene Ontology GO Workshop 3-6 August 2010.
Algorithms for Biological Sequence Analysis Kun-Mao Chao ( 趙坤茂 ) Department of Computer Science and Information Engineering National Taiwan University,
Bioinformatics and Computational Biology
Primary vs. Secondary Databases Primary databases are repositories of “raw” data. These are also referred to as archival databases. -This is one of the.
EBI is an Outstation of the European Molecular Biology Laboratory. UniProtKB Sandra Orchard.
Copyright OpenHelix. No use or reproduction without express written consent1.
Bioinformatics Workshops 1 & 2 1. use of public database/search sites - range of data and access methods - interpretation of search results - understanding.
Central hub for biological data UniProtKB/Swiss-Prot is a central hub for biological data: over 120 databases are cross-referenced (EMBL/DDBJ/GenBank,
Protein sequence databases Petri Törönen Shamelessly copied from material done by Eija Korpelainen This also includes old material from my thesis
Information retrieval and sliding window programs April 5, 2011 Hand in Homework #1. Homework #2 due Tuesday, April 12. Learning objectives- Understand.
 What is MSA (Multiple Sequence Alignment)? What is it good for? How do I use it?  Software and algorithms The programs How they work? Which to use?
1 EMBL Outstation — The European Bioinformatics Institute Large-Scale Characterization of Protein Sequence Data.
Protein databases Henrik Nielsen
Archives and Information Retrieval
Biological Sequence Databases
생물정보학 Bioinformatics.
UniProt: Universal Protein Resource
Mangaldai College, Mangaldai
PIR: Protein Information Resource
Introduction to Databases
SUBMITTED BY: DEEPTI SHARMA BIOLOGICAL DATABASE AND SEQUENCE ANALYSIS.
Presentation transcript:

Bioinformatics for biomedicine Course, 1 credit point Per Kraulis

Goals for this course Introduction to bioinformatics Databases Contents, background Analysis methods Algorithms, properties, background Basic search and analysis Applied to some problems of interest

Practical details http://biomedicum.ut.ee/~kraulis Time: Tuesday 14-16 Room: 1024, Biomedicum Email: per.kraulis@ut.ee Phone: 737 4052

Course design What is bioinformatics? Basic databases and tools Sequence searches: BLAST, Fasta Multiple alignments, phylogenetic trees Protein domains and 3D structure Seminar: Sequence analysis of a favourite gene Gene expression data, methods of analysis Gene and protein annotation, Gene Ontology, pathways Seminar: Further analysis of a favourite gene

Bioinformatics? “Information technology applied to the management and analysis of biological data” Attwood & Parry-Smith 1999 “Collection, archiving, organization and interpretation of biological data” Thornton 2003

Databases (DB) Sets of data Stored on computer Explicit data model What is in the DB? What is not? Well-defined data structure

Analysis methods Searches Comparison Features Keywords or free text Similarity Comparison Sequence-sequence alignment Structures Features Identification of (sites, domains) Prediction of (secondary structure)

Important web sites EBI NCBI www.ebi.ac.uk/ European Bioinformatics Institute Databases: EMBL, UniProt, Ensembl,… NCBI www.ncbi.nlm.nih.gov/ National Center for Biotechnology Information Databases: PubMed, Entrez, OMIM,…

Historical background, 1 “Atlas of Protein Sequence and Structure”, Margaret Dayhoff et al 1965 Printed book with all published sequences New editions into the 1970s Basis for Protein Information Resource (PIR), pir.georgetown.edu/ Since 2003 part of UniProt, www.uniprot.org/

Historical background, 2 SwissProt Amos Bairoch, University of Geneva Swiss Institute of Bioinformatics, http://www.isb-sib.ch/ Data from literature, carefully curated Started in 1986 Since 2003 part of UniProt http://www.uniprot.org/

DB properties Quality Comprehensiveness Redundancy Error rates, types of errors Update policy Comprehensiveness Data sources Redundancy Multiple entries for same biological item?

Consider when choosing a DB Central data type Data entry and quality Primary or derived data Maintainer status Availability

Central data types Nucleotide sequences Protein sequences EMBL, GenBank Protein sequences UniProt (PIR, Swiss-Prot) Genes, genomes Ensembl, EntrezGene 3D structure PDB (RSCB)

Data entry and quality Method of data entry Quality control Scientists deposit data Curators enter data (from literature) Quality control Consistency, redundancy, conflicts Are checks applied? Update policy Regularity Are errors removed?

Primary or derived data? Primary data Experimental data, more or less Sequences, 3D structure, expression data Derived data Obtained by analysis methods Domains, secondary structure Aggregated data Unified from several data sources

DB vs. Interface Confusion: Interface is not same as DB! Interface is the method of access Database (DB) is the data itself Same DB accessed by different interfaces (UniProt from ExPASy or EBI) One interface may be used to access different databases (SRS)

Maintainer status Large, academic, public institute EBI, NCBI Quasi-academic institute TIGR, SIB Research group or scientist Company

Availability Publicly available, no restrictions EMBL, GenBank Available, but with copyright May not be re-used in other DB UniProt Commercial Copyright May be accessible to academics at no charge

DB identifiers Identify a DB item uniquely A primary key for an item Permanent “Accession code” E.g. P01112 Use this for UniProt, EMBL, etc “Entry name” E.g. RASH_HUMAN Warning: may change! P01112 RASH_HUMAN

Accession codes and updates DB items may be merged or split Two sequence entries merged, e.g. they were actually the same protein A sequence entry split, e.g. actually from two different genes Primary accession code The new, recommended code Secondary accession code The old; kept for trackability Version numbers in some DBs

Nucleotide sequence DBs Primary EMBL, www.ebi.ac.uk/embl GenBank, www.ncbi.nlm.nih.gov/GenBank Collaboration and synchronization Data submitted directly from sequencing projects, scientists Large, with subdivisions Redundant, fragments, rather messy… http://www.ebi.ac.uk/cgi-bin/expasyfetch?AF493916 Publicly available, no restrictions

Genome DBs, 1 Nucleotides, but complete genome Usually high quality, careful annotation Ensembl www.ensembl.org Eukaryotes Automatic annotation, links to other DBs http://www.ensembl.org/Homo_sapiens/geneview?gene=ENSG00000174775 Vega http://vega.sanger.ac.uk/ Manually curated, built on top of Ensembl

Genome DBs, 2 UniGene EntrezGene (LocusLink) www.ncbi.nlm.nih.gov/UniGene “An organized view of the transcriptome” EntrezGene (LocusLink) http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=Retrieve&dopt=full_report&list_uids=3265 Species-specific databases SGD, Saccharomyces cerevisiae, www.yeastgenome.org Berkeley Drosophila Genome Project, www.fruitfly.org

Protein sequence DBs UniProt Swiss-Prot TrEMBL www.uniprot.org UniProtKB = UniProt Knowledge Base UniProt = Swiss-Prot + PIR + TrEMBL Swiss-Prot Credible sequences Manual expert annotation TrEMBL Translations of EMBL nucleotide sequences Automatic, basic annotation Eventually integrated into Swiss-Prot

UniProt ExPASY interface (Swiss Inst Bioinfo) EBI interface http://www.expasy.org/uniprot/P01112 EBI interface http://www.ebi.uniprot.org/uniprot-srv/uniProtView.do?proteinId=RASH_HUMAN&pager.offset=0 Same data, different look; your choice

Protein sequence domains Domains, motifs, families Patterns of similar residues in a section of a protein sequence Often a functional and structural unit The presence of a domain: hints on function Pfam Protein sequence domains www.sanger.ac.uk/Software/Pfam/ Hidden Markov Models, software HMMER

Macromolecular 3D structure Protein Data Bank, PDB Oldest computer-based bio-DB (1971) http://www.rcsb.org/pdb/ 3D structures of proteins, oligonucleotides X-ray crystallography and NMR SCOP, Structural Classification http://scop.mrc-lmb.cam.ac.uk/scop/ Hierarchical scheme of classification Similar 3D structures in families

Others, 1 OMIM Online Mendelian Inheritance in Man http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM Human genes and genetic disorders http://www.ncbi.nlm.nih.gov/entrez/dispomim.cgi?id=190020

Others, 2 GeneCards GeneLynx Aggregate database; human Links from gene to other DBs http://www.genecards.org/ GeneLynx Aggregate database; human, rat, mouse http://www.genelynx.org/