
Presentation transcript:

BIOINFORMATICS IN BIOCHEMISTRY
Bioinformatics: a field at the interface of molecular biology, computer science, and mathematics.
Bioinformatics focuses on the analysis of molecular sequences (DNA, RNA, and proteins).
The National Institutes of Health (NIH) definition of bioinformatics: “research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, analyze, or visualize such data.”
How is bioinformatics important to biochemistry?
The tools of bioinformatics include algorithms and computer programs for analyzing molecular sequences to reveal the structure and function of macromolecules.
Bioinformatics analysis gives valuable information that can guide experimental work.

AMINO ACID SEQUENCE ALIGNMENT
A way to compare 2 or more sequences.
The sequences are lined up (“aligned”), one above the other, so that each residue of one sequence can be compared to the corresponding residue of the other sequence.
Sometimes one sequence must be “cut,” and a gap introduced, so that it aligns optimally with the other sequence.
An example of a pairwise amino acid sequence alignment (2 sequences):
    sequence_1  1 MLFMCHQRVMKKEAEEKLKAEELRRARAAADIPIIWILGGPGCGKGTQCA 50
                               .|||||..:           ||:::||||.||||||.
    sequence_2  1 -------------MEEKLKKTK-----------IIFVVGGPGSGKGTQCE 26
All the residues that are identical in the two sequences are indicated with the “|” symbol between them; residues that are chemically similar, such as W and F (both have aromatic side chains), are indicated with the “:” or “.” symbol.
Note that a gap (the “-” region in the middle of sequence_2) was introduced into sequence_2 in order to make it align optimally with sequence_1.
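The idea of introducing gaps so that two sequences line up optimally can be sketched with the Needleman-Wunsch dynamic-programming algorithm for global alignment. This is only a minimal illustration and not the method used to produce the alignment on this slide: the function name global_align and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for simplicity, whereas real alignment programs use amino acid substitution matrices and more elaborate gap penalties.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment with a simple linear gap penalty.

    Returns the two gapped sequences and the optimal alignment score.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against nothing costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def pair_score(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the table: each cell is the best of "align the two residues",
    # "gap in b" or "gap in a".
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    top, bottom = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(i, j):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")  # gap introduced into b
            i -= 1
        else:
            top.append("-")  # gap introduced into a
            bottom.append(b[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom)), score[-1][-1]


# Toy example: the shorter sequence needs one gap to line up with the longer one.
top, bottom, score = global_align("GATTACA", "GATACA")
print(top)
print(bottom)
print(score)
```

With these toy settings the shorter sequence receives a single “-” so that its remaining letters line up with matching letters in the longer one, which is the same effect as the gap in sequence_2 above.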

BLAST: Basic Local Alignment Search Tool
A bioinformatics tool that allows users to compare a protein or DNA sequence to databases of other protein or DNA sequences from many organisms.
A web-based version is available free of charge at the National Center for Biotechnology Information (NCBI) website.
The output from a “BLAST search” is a series of sequence alignments.

EXAMPLE OF A BLAST SEARCH
Suppose you have the sequence of a human protein and want to know if there is a homologous protein in the fruit fly Drosophila melanogaster.
The amino acid sequence of the human protein will be the “query” for the BLAST search.
The BLAST algorithm compares the query sequence to all proteins encoded by the Drosophila genome.
The BLAST output will show a list of the Drosophila proteins that have statistically significant sequence similarity to the human query protein. These Drosophila proteins can be referred to as “BLAST hits.”
Below this list of BLAST hits, there will be a series of sequence alignments between the human query protein and each Drosophila protein that is in the list of BLAST hits.
The first alignment will be between the query and the Drosophila protein that is most similar in sequence; the second alignment will be between the query and the Drosophila protein that is the second best match in terms of sequence similarity, and so on.
The next slide shows just one of these alignments from a BLAST search. The last 2 slides explain some of the features of the alignment.

Query = a human protein
Subject (Sbjct) = the Drosophila protein that is most similar to this human protein
Sample from BLAST output (see explanation on next 2 slides):
    >gi| |ref|NP_ | Adenylate kinase-1, [Drosophila melanogaster]
    Length = 229
    Score = 179 bits (453), Expect = 1e-45
    Identities = 96/205 (47%), Positives = 131/205 (64%), Gaps = 15/205 (7%)
    Query: 2   EEKLKKTK-----------IIFVVGGPGSGKGTQCEKIVQKYGYTHLSTGDLLRSEVSSG 50
               EEKLK  +           II+++GGPG GKGTQC KIV+KYG+THLS+GDLLR+EV+SG
    Sbjct: 15  EEKLKAEELRRARAAADIPIIWILGGPGCGKGTQCAKIVEKYGFTHLSSGDLLRNEVASG 74
    Query: 51  SARGKKLSEIMEKGQLVPLETVLDMLRDAMVAKVNTSKGFLIDGYPREVQQGEEFERRIG 110
               S +G++L  +M  G LV  + VL +L DA+     +SKGFLIDGYPR+  QG EFE RI
    Sbjct: 75  SDKGRQLQAVMASGGLVSNDEVLSLLNDAITRAKGSSKGFLIDGYPRQKNQGIEFEARIA 134
    Query: 111 QPTLLLYVDAGPETMTQRLLKRGETSG--RVDDNEETIKKRLETYYKATEPVIAFYEKRG 168
                  L LY +   +TM QR++ R   S   R DDNE+TI+ RL T+ + T  ++  YE +
    Sbjct: 135 PADLALYFECSEDTMVQRIMARAAASAVKRDDDNEKTIRARLLTFKQNTNAILELYEPKT 194
    Query: 169 IVRKVNAEGSVDSVFSQVCTHLDAL 193
                    NAE  VD +F +V   +D +
    Sbjct: 195 LT--INAERDVDDIFLEVVQAIDCV 217

First you will see sequence identification information for the subject (Drosophila) protein in the alignment. This protein is called “Adenylate kinase-1”:
    >gi| |ref|NP_ | Adenylate kinase-1, [Drosophila melanogaster]
Next you will see the total length of the subject protein, 229 amino acid residues:
    Length = 229
Looking at the sequence alignment itself, you will see that it wraps around, taking up 3 ½ “rows.” One “row” is shown at the bottom of this slide.
Residues 2 to 193 of the query protein are aligned with residues 15 to 217 of the Drosophila protein (see the numbers on the left and right sides of the alignment on the previous slide).
The “middle” line of each row (the line between the query and subject lines) is called the “consensus sequence.”
Whenever a residue is identical in the query protein and the subject protein, its one-letter code is shown in this middle line.
Whenever a residue is chemically similar (a conservative substitution) in the query and the subject, the position is marked with a “+” symbol.
If one of the sequences must be “cut” in order to align it with the other, this is indicated with “-” symbols in that sequence. This is referred to as a “gap” in the alignment.
    Query: 2   EEKLKKTK-----------IIFVVGGPGSGKGTQCEKIVQKYGYTHLSTGDLLRSEVSSG 50
               EEKLK  +           II+++GGPG GKGTQC KIV+KYG+THLS+GDLLR+EV+SG
    Sbjct: 15  EEKLKAEELRRARAAADIPIIWILGGPGCGKGTQCAKIVEKYGFTHLSSGDLLRNEVASG 74
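The reading rules above (a letter in the middle line means identical, “+” means similar, “-” means gap) can be applied mechanically to one row of the alignment. The following is a minimal sketch, assuming the three lines of a row have already been separated from their labels and coordinates and that gap positions are written as “-”; the function name summarize_row is hypothetical and is not part of BLAST.

```python
def summarize_row(query, middle, subject):
    """Count identities, positives and gap positions in one alignment row."""
    if len(query) != len(subject):
        raise ValueError("query and subject lines must have the same length")
    # Trailing blanks of the middle line are often lost; pad them back.
    middle = middle.ljust(len(query))
    identities = positives = gaps = 0
    for q, m, s in zip(query, middle, subject):
        if q == "-" or s == "-":
            gaps += 1            # one sequence was "cut" at this position
        elif q == s:
            identities += 1      # same residue in both sequences
            positives += 1       # identical residues also count as positives
        elif m == "+":
            positives += 1       # conservative substitution
    return identities, positives, gaps


# The row shown on this slide, with the 11-position gap written out explicitly.
query = "EEKLKKTK" + "-" * 11 + "IIFVVGGPGSGKGTQCEKIVQKYGYTHLSTGDLLRSEVSSG"
middle = "EEKLK  +" + " " * 11 + "II+++GGPG GKGTQC KIV+KYG+THLS+GDLLR+EV+SG"
subject = "EEKLKAEELRRARAAADIP" + "IIWILGGPGCGKGTQCAKIVEKYGFTHLSSGDLLRNEVASG"

print(summarize_row(query, middle, subject))
```

Adding up these counts over every row of an alignment is how totals such as the “Identities” and “Gaps” figures in the BLAST header can be checked by hand.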

Just above the sequence alignment itself you will see statistical information for the alignment (essentially telling you “how similar” the two sequences are):
    Score = 179 bits (453), Expect = 1e-45
    Identities = 96/205 (47%), Positives = 131/205 (64%), Gaps = 15/205 (7%)
This tells you that of the 205 aligned positions, 96 are identical between the query protein and the subject protein.
Of the 205 aligned positions, 131 are either identical OR similar (have a “+” symbol).
15 of the aligned positions are gaps introduced into one of the sequences (“-” symbols).
The expected value, or E-value (1e-45, that is 1 × 10^-45, in this case; a very small number!), is the number of alignments with a score at least this high that would be expected to occur by chance when searching a database of the size that was searched.
The bottom line: the smaller the expected value, the less likely it is that the match arose by chance, and the more significant the similarity between the two sequences.
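The percentages in the BLAST header follow directly from the counts, and the expected value is tied to the bit score and the size of the search. As a rough sketch: the relation E = (search space) × 2^(-bit score) is the standard one for BLAST bit scores, but the search-space size used below is not given in this presentation. It is back-calculated from the reported, rounded values purely for illustration, and the function names are hypothetical.

```python
import math


def percent(part, total):
    """Percentage rounded to a whole number, as in the BLAST header."""
    return round(100 * part / total)


def expect_value(bit_score, search_space):
    """E-value for a bit score, given the effective search-space size."""
    return search_space * 2 ** (-bit_score)


aligned = 205
print(percent(96, aligned))   # identities
print(percent(131, aligned))  # positives
print(percent(15, aligned))   # gaps

# Assumption: search-space size implied by Score = 179 bits and Expect = 1e-45.
implied_space = 1e-45 * 2 ** 179
print(f"{implied_space:.2e}")

# A database twice as large doubles the number of chance hits expected
# for the same score.
print(expect_value(179, 2 * implied_space))

# One extra bit of score halves the E-value.
print(expect_value(180, implied_space))
```

This shows why the expected value depends on the database that was searched: the same alignment score is less surprising in a larger database.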