Introduction to Bioinformatics Part 1 of 2 Jonathan Pevsner, Ph.D. M.E:440.714 September 8, 2003.

Copyright notice
Many of the images in this PowerPoint presentation are from Bioinformatics and Functional Genomics by Jonathan Pevsner (ISBN ). Copyright © 2003 by John Wiley & Sons, Inc. These images and materials may not be used without permission from the publisher. We welcome instructors to use these PowerPoints for educational purposes, but please acknowledge the source. The book has a homepage at Including hyperlinks to the book chapters.

Teaching assistants: Hugh Cahill, Mayra Garcia, Gek Ming Sia

Who is taking this course?
-- People with very diverse backgrounds in biology
-- People with diverse backgrounds in computer science and biostatistics
-- Most people have a favorite gene, protein, or disease

What are the goals of the course?
-- To provide an introduction to bioinformatics with a focus on the National Center for Biotechnology Information (NCBI) and the European Bioinformatics Institute (EBI)
-- To focus on the analysis of DNA, RNA and proteins
-- To introduce you to the analysis of genomes
-- To combine theory and practice to help you solve research problems

Themes throughout the course
-- Textbooks
-- Web sites
-- Literature references
-- Gene/protein families
-- Computer labs

Themes throughout the course: textbooks
Several textbooks are available on reserve:
-- Baxevanis and Ouellette
-- David Mount
-- Durbin et al.
I have written a textbook that will appear Oct. 1, Bioinformatics and Functional Genomics. The chapters contain content, lab exercises, and quizzes that were developed in this course. We will provide chapters as handouts. Once the book becomes available, we will put copies on reserve. The book is recommended (not required).

Themes throughout the course: web sites
-- The course website is: bioinfo_course.htm
-- The textbook website is:
-- This has 1000 URLs, organized by chapter
-- The site offers a 15% discount on book purchases (although the book is not required)
-- The principal website we will explore is NCBI:

Themes throughout the course: Literature references You are encouraged to read original source articles. Although articles are not required, they will enhance your understanding of the material. You can obtain articles through PubMed and through the WelDoc service at Welch. Some articles will be available on reserve.

Themes throughout the course: gene/protein families
We will use retinol-binding protein 4 (RBP4) as a model gene/protein throughout the course. RBP4 is a member of the lipocalin family. It is a small, abundant carrier protein. We will study it in a variety of contexts, including
-- sequence alignment
-- gene expression
-- protein structure
-- phylogeny
-- homologs in various species
We will also use the Pol protein of HIV-1 as an example.

The HIV-1 pol gene encodes three proteins:
-- Aspartyl protease (PR)
-- Reverse transcriptase (RT)
-- Integrase (IN)

Themes throughout the course: computer labs There is a computer lab each Friday. This is a chance to gain practical experience using a variety of web resources. You can do the lab on your own if you wish. However, during the lab you can get help on problems, and in some cases the computers will have specialized software.

Grading
-- 30% weekly quizzes (open book)
-- 30% final exam (November 13)
-- 40% discovery of a novel gene (by Oct. 9) and phylogenetic tree (by Nov. 13)
-- Extra credit: find a mistake in a database

What is bioinformatics?
-- Interface of biology and computers
-- Analysis of proteins, genes and genomes using computer algorithms and computer databases
-- Genomics is the analysis of genomes. The tools of bioinformatics are used to make sense of the billions of base pairs of DNA that are sequenced by genomics projects.

Top ten challenges for bioinformatics
[1] Precise models of where and when transcription will occur in a genome (initiation and termination)
[2] Precise, predictive models of alternative RNA splicing
[3] Precise models of signal transduction pathways; ability to predict cellular responses to external stimuli
[4] Determining protein:DNA, protein:RNA, protein:protein recognition codes
[5] Accurate ab initio protein structure prediction

Top ten challenges for bioinformatics
[6] Rational design of small molecule inhibitors of proteins
[7] Mechanistic understanding of protein evolution
[8] Mechanistic understanding of speciation
[9] Development of effective gene ontologies: systematic ways to describe gene and protein function
[10] Education: development of bioinformatics curricula
Source: Ewan Birney, Chris Burge, Jim Fickett

Three perspectives on bioinformatics
-- The tree of life
-- The organism
-- The cell

Time of development Body region, physiology, pharmacology, pathology

DNA → RNA → protein → phenotype
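The flow from DNA to RNA to protein can be sketched in a few lines of Python. This is a minimal illustration, not material from the slides: the input sequence is a made-up toy example, and the codon table is deliberately partial (a complete table has 64 codons).

```python
# Minimal sketch of DNA -> RNA -> protein.
# Assumption: the input is the coding (sense) strand, read in frame from position 0.

# Partial codon table: only the codons used in the toy example below.
CODON_TABLE = {
    "AUG": "M",  # methionine (start codon)
    "AAA": "K",  # lysine
    "UGG": "W",  # tryptophan
    "GUG": "V",  # valine
    "UGA": "*",  # stop codon
}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA corresponding to a coding-strand DNA sequence (T -> U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate mRNA codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


dna = "ATGAAATGGGTGTGA"  # toy sequence, chosen only for illustration
mrna = transcribe(dna)
protein = translate(mrna)
print(mrna, protein)
```

The phenotype step has no one-line equivalent, which is part of why the challenges listed earlier in the lecture remain open.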

DNA (genomic DNA databases) → RNA (cDNA, ESTs, UniGene) → protein (protein sequence databases) → phenotype

There are three major public DNA databases: GenBank, EMBL, and DDBJ. The underlying raw DNA sequences are identical.

There are three major public DNA databases
-- GenBank: housed at NCBI (National Center for Biotechnology Information)
-- EMBL: housed at EBI (European Bioinformatics Institute)
-- DDBJ: housed in Japan

>100,000 species are represented in GenBank
all species    128,941
viruses          6,137
bacteria        31,262
archaea          2,100
eukaryota       87,147

The most sequenced organisms in GenBank
-- Homo sapiens (6.9 million entries)
-- Mus musculus (5.0 million)
-- Zea mays (896,000)
-- Rattus norvegicus (819,000)
-- Gallus gallus (567,000)
-- Arabidopsis thaliana (519,000)
-- Danio rerio (492,000)
-- Drosophila melanogaster (350,000)
-- Oryza sativa (221,000)

National Center for Biotechnology Information (NCBI)

PubMed is…
-- National Library of Medicine's search service
-- 11 million citations in MEDLINE
-- links to participating online journals
-- PubMed tutorial (via “Education” on side bar)

Entrez integrates… the scientific literature; DNA and protein sequence databases; 3D protein structure data; population study data sets; assemblies of complete genomes

Entrez is a search and retrieval system that integrates NCBI databases

BLAST is…
-- Basic Local Alignment Search Tool
-- NCBI's sequence similarity search tool
-- supports analysis of DNA and protein databases
-- 80,000 searches per day
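To make "local alignment" concrete, here is a minimal sketch of local alignment scoring by dynamic programming (the Smith-Waterman approach). It is not BLAST itself: BLAST is a faster heuristic for the same kind of problem. The scoring values (match, mismatch, gap) are arbitrary assumptions chosen for illustration.

```python
def local_alignment_score(a: str, b: str,
                          match: int = 2,
                          mismatch: int = -1,
                          gap: int = -2) -> int:
    """Return the best local alignment score between a and b.

    Smith-Waterman with a linear gap penalty. Only scores are tracked
    (no traceback), so two rows of the matrix are enough.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diagonal, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# The shared substring "ACG" is found even though the flanking bases differ.
print(local_alignment_score("TTACGTTT", "GGACGGG"))
```

Because every cell is floored at zero, unrelated flanking sequence does not penalize a good internal match, which is the property that makes local alignment suitable for database searches.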

OMIM is…
-- Online Mendelian Inheritance in Man
-- catalog of human genes and genetic disorders
-- edited by Dr. Victor McKusick, others at JHU

Books is… searchable resource of on-line books

TaxBrowser is…
-- browser for the major divisions of living organisms (archaea, bacteria, eukaryota, viruses)
-- taxonomy information such as genetic codes
-- molecular data on extinct organisms

Structure site includes…
-- Molecular Modeling Database (MMDB)
-- biopolymer structures obtained from the Protein Data Bank (PDB)
-- Cn3D (a 3D-structure viewer)
-- Vector Alignment Search Tool (VAST)

Four questions we can answer at NCBI (and elsewhere):
[1] How can I do a literature search using PubMed?
[2] How can WelchWeb help?
[3] How can I use Entrez to find information about a particular gene or protein? (What is an accession number?)
[4] How can I find information about a particular disease?

Question #1: How can I use PubMed at NCBI to find literature information?

PubMed is the NCBI gateway to MEDLINE. MEDLINE contains bibliographic citations and author abstracts from over 4,000 journals published in the United States and in 70 foreign countries. It has 12 million records dating back to 1966.

MeSH is the acronym for "Medical Subject Headings." MeSH is the list of vocabulary terms used for subject analysis of biomedical literature at NLM. The MeSH vocabulary is used for indexing journal articles for MEDLINE. This controlled vocabulary imposes uniformity and consistency on the indexing of biomedical literature.
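The effect of a controlled vocabulary can be illustrated with a toy lookup table. The mapping below is a small hand-picked assumption for illustration, not the MeSH data itself or how NLM indexers work: different authors' wording is normalized to the same heading, so both articles are retrieved by one subject search.

```python
# Hypothetical, hand-picked mapping from free-text terms to subject headings.
TERM_TO_HEADING = {
    "heart attack": "Myocardial Infarction",
    "myocardial infarct": "Myocardial Infarction",
    "cancer": "Neoplasms",
    "tumor": "Neoplasms",
    "tumour": "Neoplasms",
}


def index_terms(author_terms):
    """Map free-text terms to controlled headings; unknown terms are dropped."""
    headings = {
        TERM_TO_HEADING[term.lower()]
        for term in author_terms
        if term.lower() in TERM_TO_HEADING
    }
    return sorted(headings)


# Two articles that use different wording end up with identical index terms.
article_a = index_terms(["Heart attack", "tumour"])
article_b = index_terms(["myocardial infarct", "Cancer"])
print(article_a, article_b)
```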

PubMed search strategies
-- Try the tutorial (“Education” on the left sidebar)
-- Use Boolean queries (e.g., lipocalin AND disease)
-- Try using “limits”
-- Try “LinkOut” to find external resources
-- Obtain articles on-line via Welch Medical Library (and download pdf files):
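The same Boolean query can also be submitted programmatically. The sketch below only builds a request URL for NCBI's E-utilities ESearch service, which is not mentioned in these slides and is introduced here as an assumption about how one might script a search; the helper name `pubmed_search_url` is hypothetical, and no network request is made.

```python
from urllib.parse import urlencode

# ESearch endpoint of the NCBI E-utilities (not part of the original slides).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_search_url(term: str, retmax: int = 20) -> str:
    """Build an ESearch URL for a PubMed query; retmax caps the IDs returned."""
    params = {"db": "pubmed", "term": term, "retmax": retmax}
    return f"{ESEARCH}?{urlencode(params)}"


url = pubmed_search_url("lipocalin AND disease")
print(url)
```

Opening the resulting URL in a browser returns the matching record identifiers rather than the formatted citation list shown on the PubMed web page.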

Boolean operators (1 = lipocalin, 2 = disease)
-- 1 AND 2: lipocalin AND disease (35 results)
-- 1 OR 2: lipocalin OR disease (1,300,000 results)
-- 1 NOT 2: lipocalin NOT disease (350 results)
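The three Boolean operators correspond to set intersection, union, and difference. A minimal sketch with made-up article IDs (the numbers are hypothetical and far smaller than real PubMed result sets):

```python
# Hypothetical article IDs returned by each single-term search.
lipocalin = {101, 102, 103, 104}
disease = {103, 104, 105, 106, 107}

both = lipocalin & disease            # lipocalin AND disease
either = lipocalin | disease          # lipocalin OR disease
only_lipocalin = lipocalin - disease  # lipocalin NOT disease

print(sorted(both), sorted(either), sorted(only_lipocalin))

# Every lipocalin article is either also a disease article or not,
# so the AND and NOT result counts add up to the lipocalin total.
assert len(both) + len(only_lipocalin) == len(lipocalin)
```

By the same reasoning, the result counts quoted for the real searches (35 for AND, 350 for NOT) would imply roughly 385 citations matching lipocalin alone at the time.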

Question #2: How can I use WelchWeb (from the Welch Medical Library) to do literature (and other) searches? WelchWeb is available at

gateway

PubMed gateway

Library catalog

Remote access to Welch services

Request literature

Browse journals

Browse databases

WelchWeb URLs of interest
-- Basic Sciences Subject Guide
-- RAUL (remote access)
-- WelDoc (Inter Library Loan, and electronic delivery of articles)
-- MyWelch (personal library portal)
-- Welch E-Learning page (online tutorials and hand-outs)
-- Johns Hopkins Author Publishing Tool
-- Browse Welch E-Resources by Subject
-- Liaison Librarian Program (every dept has a liaison librarian)
Thanks to Brian Brown, the Welch Medical Library liaison to the basic sciences

Visit the Basic Sciences Subject Guide for a long list of bioinformatics-related sites...

This lecture continues in part 2 with a discussion of more NCBI resources