Introduction to Bioinformatics CPSC 265


What is bioinformatics? The interface of biology and computer science. Analysis of proteins, genes and genomes using computer algorithms and computer databases. Genome informatics: making sense of the billions of base pairs of DNA that are sequenced by genomics projects. Mostly, it’s about protein and DNA sequences.

What do bioinformatics researchers do? Process large data outputs from new technologies. Turn sequence data into whole-genome sequences. Interpret genome sequences in terms of genes and their expression. Find genes that control crop and animal traits, disease, etc. Model evolution in genomes and proteins. Model and predict 3D structures of proteins.

Growth of GenBank (Fig. 2.1, page 17). Figure: base pairs of DNA (billions) and sequences (millions), plotted by year. Updated: >40b base pairs.

Cost of sequencing is falling exponentially

DNA sequence analysis. Sequences could be like those from our experiment last week or a lot bigger, like the whole human genome. Some have chromatogram or “quality” data, some don’t.

DNA makes RNA makes protein. It is hard to sequence RNA and very hard to sequence protein. We can deduce the RNA sequence from DNA (in bacteria, this is as easy as turning Ts to Us; in eukarya, we also need to figure out where the introns are). We can deduce the protein sequence from RNA, using the Universal Genetic Code.
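The "turning Ts to Us" step for the bacterial case can be sketched in a few lines of Python. This is only an illustration: the function name `transcribe` is made up here, and it assumes the input is the coding strand, written 5’ to 3’, with no introns to remove.

```python
def transcribe(dna):
    """Deduce the RNA sequence from a coding-strand DNA sequence.

    Assumes no introns (the bacterial case), so every T simply becomes U.
    """
    return dna.upper().replace("T", "U")


print(transcribe("CATCAATGACAACTCC"))
```

For a eukaryotic gene this would not be enough, because the intron positions would have to be worked out first.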

Conceptual Translation. In a computer, take each set of three RNA letters, and then figure out what amino acid they code for. Professional biologists use the SINGLE LETTER CODE.
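A minimal sketch of conceptual translation in Python, assuming the standard genetic code and single-letter amino acid codes. The names `CODON_TABLE` and `translate` are illustrative, and the use of `*` for a stop codon is a common convention rather than anything stated on the slide.

```python
from itertools import product

# Standard genetic code, laid out in U, C, A, G order for each of the three
# codon positions. Each letter is a single-letter amino acid code; "*" = stop.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(rna):
    """Translate an RNA string three letters at a time.

    Trailing letters that do not fill a whole codon are ignored.
    """
    rna = rna.upper()
    return "".join(
        CODON_TABLE[rna[i:i + 3]] for i in range(0, len(rna) - 2, 3)
    )


print(translate("AUGACAACUCCAAAG"))
```

This version keeps going past a stop codon and marks it with `*`; a real tool would usually end the protein there.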

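DNA potentially encodes six proteins: three reading frames on each strand.
Top strand frames:
5’ CAT CAA
5’ ATC AAC
5’ TCA ACT
5’ CATCAACTACAACTCCAAAGACACCCTTACACATCAACAAACCTACCCAC 3’
3’ GTAGTTGATGTTGAGGTTTCTGTGGGAATGTGTAGTTGTTTGGATGGGTG 5’
Bottom strand frames (read 5’ to 3’, starting from the right-hand end):
5’ GTG GGT
5’ TGG GTA
5’ GGG TAG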

We call these READING FRAMES.
Top strand frames:
5’ CAT CAA
5’ ATC AAT
5’ TCA ATG
5’ CATCAATGACAACTCCAAAGACACCCTTACACATCAACAAACCTACCCAC 3’
3’ GTAGTTACTGTTGAGGTTTCTGTGGGAATGTGTAGTTGTTTGGATGGGTG 5’
Bottom strand frames (read 5’ to 3’, starting from the right-hand end):
5’ GTG GGT
5’ TGG GTA
5’ GGG TAG
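The six reading frames can be listed by a short Python sketch. The helper names `reverse_complement` and `six_frames` are made up for this illustration; the input is assumed to be the top strand written 5’ to 3’, using only the letters A, C, G and T.

```python
def reverse_complement(dna):
    """Return the bottom strand, written 5' to 3'."""
    complement = str.maketrans("ACGT", "TGCA")
    return dna.translate(complement)[::-1]


def six_frames(dna):
    """Split a DNA sequence into codons in all six reading frames.

    Frames 0-2 come from the given strand, frames 3-5 from its reverse
    complement. Incomplete codons at the end of a frame are dropped.
    """
    frames = []
    for strand in (dna, reverse_complement(dna)):
        for offset in range(3):
            frames.append(
                [strand[i:i + 3] for i in range(offset, len(strand) - 2, 3)]
            )
    return frames


top = "CATCAATGACAACTCCAAAGACACCCTTACACATCAACAAACCTACCCAC"
for frame in six_frames(top):
    print("5'", " ".join(frame[:2]))
```

Run on the sequence from the slide, the first two codons of each frame match the six lines shown there.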

All proteins start with M (ATG). TAG, TAA and TGA are all STOP. This can help narrow it down.
Top strand frames:
5’ CAT CAA
5’ ATC AAT
5’ TCA ATG
5’ CATCAATGACAACTCCAAAGACACCCTTACACATCAACAAACCTACCCAC 3’
3’ GTAGTTACTGTTGAGGTTTCTGTGGGAATGTGTAGTTGTTTGGATGGGTG 5’
Bottom strand frames (read 5’ to 3’, starting from the right-hand end):
5’ GTG GGT
5’ TGG GTA
5’ GGG TAG
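One way to use the start and stop rules to narrow down the frames, sketched in Python. The helpers `codons_in_frame` and `first_orf` are hypothetical names, and the approach (take the first ATG in a frame and read until the next stop codon) is a simplification, not a full gene finder.

```python
STOP_CODONS = {"TAG", "TAA", "TGA"}


def codons_in_frame(dna, offset):
    """Split one strand into whole codons, starting at the given offset."""
    return [dna[i:i + 3] for i in range(offset, len(dna) - 2, 3)]


def first_orf(codons):
    """Return the codons from the first ATG up to, not including, a stop.

    Returns an empty list if the frame has no ATG. If no stop codon
    follows, the open reading frame runs to the end of the fragment.
    """
    if "ATG" not in codons:
        return []
    orf = []
    for codon in codons[codons.index("ATG"):]:
        if codon in STOP_CODONS:
            break
        orf.append(codon)
    return orf


top = "CATCAATGACAACTCCAAAGACACCCTTACACATCAACAAACCTACCCAC"
for offset in range(3):
    print(offset, first_orf(codons_in_frame(top, offset)))
```

On the top strand of the slide's sequence, only the third frame contains an ATG, so it is the only top-strand candidate left.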

Once you know the sequence of the protein, you can figure out if it has been studied already. You may even be able to track down a likely structure

There are three major public DNA databases (page 16): GenBank, housed at NCBI (National Center for Biotechnology Information); EMBL, housed at EBI (European Bioinformatics Institute); and DDBJ, housed in Japan.

PubMed is… the National Library of Medicine's search service; 12 million citations in MEDLINE; links to participating online journals; PubMed tutorial (via “Education” on the side bar).

BLAST is… the Basic Local Alignment Search Tool; NCBI's sequence similarity search tool; supports analysis of DNA and protein databases; 80,000 searches per day.

TaxBrowser is… a browser for the major divisions of living organisms (archaea, bacteria, eukaryota, viruses); taxonomy information such as genetic codes; molecular data on extinct organisms.

From the NCBI home page, type “lectin” and hit “Search”

PubMed (page 35). PubMed is the NCBI gateway to MEDLINE. MEDLINE contains bibliographic citations and author abstracts from over 4,600 journals published in the United States and in 70 foreign countries. It has 12 million records dating back to ...

BLAST (page 88). BLAST looks for similarity between your favorite query sequence and other known protein or DNA sequences. Applications include: identifying homologs (orthologs and paralogs); discovering new genes or proteins; discovering variants of genes or proteins; investigating expressed sequence tags (ESTs); exploring protein structure and function.

Four components to a BLAST search (page 88): (1) Obtain the sequence (query). (2) Select the BLAST program. (3) Enter sequence. (4) Choose optional parameters. Then click “BLAST”.

Step 2: Choose the BLAST program: blastn (nucleotide BLAST), blastp (protein BLAST), tblastn (translated BLAST), blastx (translated BLAST), tblastx (translated BLAST).

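DNA potentially encodes six proteins: three reading frames on each strand.
Top strand frames:
5’ CAT CAA
5’ ATC AAC
5’ TCA ACT
5’ CATCAACTACAACTCCAAAGACACCCTTACACATCAACAAACCTACCCAC 3’
3’ GTAGTTGATGTTGAGGTTTCTGTGGGAATGTGTAGTTGTTTGGATGGGTG 5’
Bottom strand frames (read 5’ to 3’, starting from the right-hand end):
5’ GTG GGT
5’ TGG GTA
5’ GGG TAG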

Choose the BLAST program

Program   Input     Database   Comparisons
blastn    DNA       DNA        1
blastp    protein   protein    1
blastx    DNA       protein    6
tblastn   protein   DNA        6
tblastx   DNA       DNA        36
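The table can also be written as a small lookup in Python. This sketch assumes the leading numbers on the slide (1, 1, 6, 6, 36) count how many reading-frame combinations each program compares: a translated DNA sequence contributes six frames, an untranslated sequence contributes one. The names `BLAST_PROGRAMS`, `comparisons` and `programs_for` are illustrative and are not part of any BLAST software.

```python
# program: (input type, database type, input translated?, database translated?)
BLAST_PROGRAMS = {
    "blastn":  ("DNA", "DNA", False, False),
    "blastp":  ("protein", "protein", False, False),
    "blastx":  ("DNA", "protein", True, False),
    "tblastn": ("protein", "DNA", False, True),
    "tblastx": ("DNA", "DNA", True, True),
}


def comparisons(program):
    """Number of reading-frame combinations the program compares."""
    _, _, input_translated, database_translated = BLAST_PROGRAMS[program]
    return (6 if input_translated else 1) * (6 if database_translated else 1)


def programs_for(input_type, database_type):
    """List the programs that accept this input and database type."""
    return [
        name
        for name, (inp, db, _, _) in BLAST_PROGRAMS.items()
        if inp == input_type and db == database_type
    ]


print(programs_for("DNA", "DNA"))
print(comparisons("tblastx"))
```

A DNA query against a DNA database matches two programs: blastn compares the nucleotides directly, while tblastx compares all six translations of the query with all six translations of the database.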

Step 3: Choose the database. nr = non-redundant protein (most general database). You can also search specific organisms, and DNA rather than protein (although ALL DNA is going to take a long time…).

filtering

So now you can: find any sequence in the database; find relevant publications; match DNA to protein sequence; find database matches to DNA or protein; find conserved domains in protein; find the 3D structure of a protein… without doing any experiments!