Genbank Founded in 1982 at the Los Alamos National Laboratory

Slides:



Advertisements
Similar presentations
Fa07CSE 182 CSE182-L4: Database filtering. Fa07CSE 182 Summary (through lecture 3) A2 is online We considered the basics of sequence alignment –Opt score.
Advertisements

Blast outputoutput. How to measure the similarity between two sequences Q: which one is a better match to the query ? Query: M A T W L Seq_A: M A T P.
1 Genome information GenBank (Entrez nucleotide) Species-specific databases Protein sequence GenBank (Entrez protein) UniProtKB (SwissProt) Protein structure.
MICB 405 Bioinformatics Mini-Lab #1 – NCBI’s Entrez Dr. Joanne Fox We gratefully acknowledge the funding for the development of these.
Creating NCBI The late Senator Claude Pepper recognized the importance of computerized information processing methods for the conduct of biomedical research.
On line (DNA and amino acid) Sequence Information Lecture 7.
Lecture 7 Types of databases.
1 Exercise: BIOINFORMATIC DATABASES and BLAST. 2 Outline  NCBI and Entrez  Pubmed  Google scholar  RefSeq  Swissprot  Fasta format  PDB: Protein.
Heuristic alignment algorithms and cost matrices
Sequence analysis course
Scoring Matrices June 19, 2008 Learning objectives- Understand how scoring matrices are constructed. Workshop-Use different BLOSUM matrices in the Dotter.
It & Health 2009 Summary Thomas Nordahl Petersen.
It & Health 2010 Summary Thomas Nordahl Petersen.
TRANSLATION The process of converting the information stored in mRNA into a protein is called translation mRNA carries information from a gene to a structure.
1-month Practical Course Genome Analysis Lecture 3: Residue exchange matrices Centre for Integrative Bioinformatics VU (IBIVU) Vrije Universiteit Amsterdam.
BLOSUM Information Resources Algorithms in Computational Biology Spring 2006 Created by Itai Sharon.
Bioinformatics Unit 1: Data Bases and Alignments Lecture 3: “Homology” Searches and Sequence Alignments (cont.) The Mechanics of Alignments.
Sequence Alignments Revisited
Alignment IV BLOSUM Matrices. 2 BLOSUM matrices Blocks Substitution Matrix. Scores for each position are obtained frequencies of substitutions in blocks.
Alignment III PAM Matrices. 2 PAM250 scoring matrix.
Bioinformatics Gene Introduction Oct NTUST.
Arabidopsis Gene Project GK-12 April Workshop Karolyn Giang and Dr. Mulligan.
1 BLAST: Basic Local Alignment Search Tool Jonathan M. Urbach Bioinformatics Group Department of Molecular Biology.
On line (DNA and amino acid) Sequence Information
Alignment Statistics and Substitution Matrices BMI/CS 576 Colin Dewey Fall 2010.
MCB 5472 Lecture 2 Feb 3/14 (1) GenBank continued (2) Primer: Genome sequencing and assembly.
An Introduction to Bioinformatics
Substitution Numbers and Scoring Matrices
Sequence Alignment Goal: line up two or more sequences An alignment of two amino acid sequences: …. Seq1: HKIYHLQSKVPTFVRMLAPEGALNIHEKAWNAYPYCRTVITN-EYMKEDFLIKIETWHKP.
Eric C. Rouchka, University of Louisville Sequence Database Searching Eric Rouchka, D.Sc. Bioinformatics Journal Club October.
Founded in 1982 at the Los Alamos National Laboratory Initially managed at Stanford in conjunction with the BIOSCI/Bionet news groups transition.
Comp. Genomics Recitation 3 The statistics of database searching.
Construction of Substitution Matrices
Sequence Alignment Csc 487/687 Computing for bioinformatics.
Predicting protein degradation rates Karen Page. The central dogma DNA RNA protein Transcription Translation The expression of genetic information stored.
Pairwise Sequence Analysis-III
BLAST Slides adapted & edited from a set by Cheryl A. Kerfeld (UC Berkeley/JGI) & Kathleen M. Scott (U South Florida) Kerfeld CA, Scott KM (2011) Using.
Basic Local Alignment Search Tool BLAST Why Use BLAST?
Database search. Overview : 1. FastA : is suitable for protein sequence searching 2. BLAST : is suitable for DNA, RNA, protein sequence searching.
Primary vs. Secondary Databases Primary databases are repositories of “raw” data. These are also referred to as archival databases. -This is one of the.
Construction of Substitution matrices
Step 3: Tools Database Searching
Alignment methods April 17, 2007 Quiz 1—Question on databases Learning objectives- Understand difference between identity, similarity and homology. Understand.
The statistics of pairwise alignment BMI/CS 576 Colin Dewey Fall 2015.
Copyright OpenHelix. No use or reproduction without express written consent1.
GENBANK FILE FORMAT LOCUS –LOCUS NAME Is usually the first letter of the genus and species name, followed by the accession number –SEQUENCE LENGTH Number.
Introduction to Bioinformatics Summary Thomas Nordahl Petersen.
What is sequencing? Video: WlxM (Illumina video) WlxM.
Techniques for Protein Sequence Alignment and Database Searching G P S Raghava Scientist & Head Bioinformatics Centre, Institute of Microbial Technology,
Using BLAST To Teach ‘E-value-tionary’ Concepts Cheryl A. Kerfeld 1, 2 and Kathleen M. Scott 3 1.Department of Energy-Joint Genome Institute, Walnut Creek,
Tutorial 4 Comparing Protein Sequences Intro to Bioinformatics 1.
Substitution Matrices and Alignment Statistics BMI/CS 776 Mark Craven February 2002.
Computer Applications and Bioinformatics
Using BLAST to Identify Species from Proteins
Bioinformatics Overview
Command Line For windows an “ok” ssh program is putty.
"Nothing in biology makes sense except in the light of evolution"
Command Line For windows an “ok” ssh program is putty.
HISTORY OF BIOORGANIC CHEMISTRY
Pipelines for Computational Analysis (Bioinformatics)
Searching the NCBI Databases
Basic Local Alignment Search Tool
"Nothing in biology makes sense except in the light of evolution"
Explore Evolution: Instrument for Analysis
EVIDENCE FOR EVOLUTION
Alignment IV BLOSUM Matrices
BLAST Slides adapted & edited from a set by
How to search NCBI.
BLAST Slides adapted & edited from a set by
Introduction to bioinformatics Lecture 5 Pair-wise sequence alignment
Presentation transcript:

Genbank Founded in 1982 at the Los Alamos National Laboratory Initially managed at Stanford in conjunction with the BIOSCI/Bionet news groups 1989-92 transition to the NCBI on the east coast One precursor was Margaret Dayhoff’s Atlas of Protein Sequence and Structure In 1987 genbank fit onto a few 360 KB floppy disks. Genbank uses a flat file database format (see http://en.wikipedia.org/wiki/Flat_file_database) NCBI does not use a relational databank (as in Oracle, peoplesoft) NCBI stores data in ASN.1 format (http://en.wikipedia.org/wiki/Abstract_Syntax_Notation_One), which allows to hardwire crosslinks to other data bases, and makes retrieval of related information fast. NCBI’s sample record (http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html) contains links to most the fields used in the gbk flatfile. In the genbank records at NCBI the links connect to the features (i.e. the pubmed record, or the encoded protein sequence) --- not easy to work with.

Dr. Margaret Belle (Oakley) Dayhoff March 11, 1925 – February 5, 1983 Among other things, we owe her the first nucleotide and protein data bank, the PAM substitution matrix, and the single letter amino acid code. (Image from wikipedia)

Atlas of Protein Sequences 1972 (cont) The Atlas also contained RNA sequences, and PAM matrix for nucleotides

Atlas of Protein Sequences 1972 (cont) Contained phylogenetic reconstructions that went back in time to far before the Last Unversal Common Ancestor (LUCA) aka the cenancestor of all living cellular organisms alive today. tRNA phylogeny

PAM 250 log (odds) matrix Dayhoff recoding

Selecting Scoring Matrices Choose a matrix appropriate to the suspected degree of sequence identity between the query and its target sequences PAM: empirically derived for close relatives BLOSUM: empirically derived for distant relatives Kerfeld and Scott, PLoS Biology 2011 6 8. Teaching Tools