Basic Overview of Bioinformatics Tools and Biocomputing Applications I Dr Tan Tin Wee Director Bioinformatics Centre.

Presentation transcript:

Basic Overview of Bioinformatics Tools and Biocomputing Applications I Dr Tan Tin Wee Director Bioinformatics Centre

Software Tools
Data generated by machines: DNA/protein sequencers, automated systems
Data stored in retrievable forms in database systems
[Diagram labels: Biological Data, Automated Machines, Research Labs, Databases, Analytical Tools, New Knowledge]

Common Computational Analyses
Sequence assembly
Simple sequence analysis
– Translation and reverse complement, ORF
– Composition statistics (protein & DNA)
– Molecular mass
– Total charge and pI; local hydropathy
– Simple determination of secondary structures
– Restriction site analysis
– Internal repeat analysis
Detection of active sites, functional residues, characteristic structures, substrates, and processing signals

Common Computational Analyses
Database sequence search
Multiple alignment
2° and 3° structure prediction; transmembrane helix detection
Structure modeling
Docking prediction and design
Hidden Markov model searches

Sequence Assembly
Fragmented data from DNA sequencers
Detection of overlap
Merging of contigs
Assembly into a continuous sequence
[Diagram: assembled sequence drawn 5' to 3']
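The overlap-then-merge idea can be sketched as a greedy procedure. This is a minimal illustration only: it assumes error-free reads all on the same strand, and the names `overlap`, `greedy_assemble`, and `min_len` are invented for this sketch. Real assemblers also handle sequencing errors, reverse strands, and repeats.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len: int = 3):
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps: the contigs stay separate
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for n, r in enumerate(reads) if n not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Three overlapping fragments of one short sequence
contigs = greedy_assemble(["ATCTCGATAC", "GATACTACTA", "CTACTACG"])
```

Here the three fragments collapse into a single contig. With less overlap the function returns several unmerged contigs.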

Sequence Format Interconversion
DNA/protein and other sequence data come in different formats
Annotations
Different programs use different formats
Interconversion utility tools, e.g. READSEQ, TOGCG, TOSTADEN, etc.

Simple Sequence Analysis
1. Linear sequence, e.g. DNA/protein
2. Open a window: n = 1, n = variable, or sliding
3. Calculate based on a list of criteria

Some Simple Sequence Analysis Applications
DNA complementary strand, e.g. COMPLEMENT & REVERSE
– Open window of size 1
– A ---> T
– C ---> G
– T ---> A
– G ---> C
– Slide to next window of 1
– Proceed to end of sequence
– Reverse order of complement
5'...ATCTCGATACTACTACG...3'
     |||||||||||||||||
3'...TAGAGCTATGATGATGC...5'
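The window-of-one procedure maps directly onto a character translation followed by a reversal. A minimal sketch, assuming an unambiguous A/C/G/T alphabet; the function name `reverse_complement` is chosen for this example, and any other characters are passed through unchanged:

```python
# Lookup table for the four bases, upper and lower case
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse so the result reads 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


top_strand = "ATCTCGATACTACTACG"
bottom_strand = reverse_complement(top_strand)  # read 5' to 3'
```

Applying the function twice returns the original strand, which is a quick sanity check.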

Some Simple Sequence Analysis Applications
DNA to protein sequence translation, e.g. TRANSLATE
– Open window of 3 bases
– Look up the codon in the genetic code table
– Assign amino acid residue
– Slide window to next 3 bases
– Proceed till stop codon detected
– Repeat whole procedure for six frames
Example: ATACTACTGAGATCTAGGCTAGTACTGCGTGCG read in Frame 1, Frame 2, Frame 3; the complementary strand gives Frames 4-6
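A sketch of the three-base window in code, assuming the standard genetic code (other codes, such as the mitochondrial one, differ). The names `translate`, `six_frames`, and `to_stop` are illustrative only and are not the interface of the TRANSLATE program.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq: str, to_stop: bool = False) -> str:
    """Translate in the frame starting at position 0; '*' marks a stop."""
    seq = seq.upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # X for ambiguous codons
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


def six_frames(seq: str) -> list:
    """Frames 1-3 on the given strand, frames 4-6 on its reverse complement."""
    seq = seq.upper()
    rev = seq.translate(COMPLEMENT)[::-1]
    return [translate(s[offset:]) for s in (seq, rev) for offset in range(3)]


frames = six_frames("ATACTACTGAGATCTAGGCTAGTACTGCGTGCG")
```

Frame 1 of the example sequence contains no stop codon, so it translates across the full length.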

Some Simple Sequence Analysis Applications
Detect open reading frames, e.g. ORF
– Translate the sequence; report long stretches that run from a start codon to a stop codon
Compositional analysis
– e.g. calculate total A, T, G, C
– e.g. calculate total molecular mass of a protein; analyse percentages of amino acids
– e.g. total charge composition, pI
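ORF detection and base composition can both be sketched in a few lines. The assumptions here are: only the three forward frames are scanned, ATG is the only start codon, and `min_codons` is an arbitrary length cutoff introduced for this example. Reverse-strand ORFs would be found by running the same scan on the reverse complement.

```python
from collections import Counter

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 3) -> list:
    """Return (start, end) slices for ATG...stop stretches in frames 1-3."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))  # end includes the stop codon
                start = None
    return orfs


def base_composition(seq: str) -> dict:
    """Count each base and report the GC fraction."""
    counts = Counter(seq.upper())
    total = sum(counts[b] for b in "ACGT")
    gc = (counts["G"] + counts["C"]) / total if total else 0.0
    return {"A": counts["A"], "C": counts["C"], "G": counts["G"],
            "T": counts["T"], "GC_fraction": gc}


dna = "CCATGAAATTTGGGTAACC"
orfs = find_orfs(dna)            # positions are 0-based, end-exclusive
composition = base_composition(dna)
```

The coordinates can be used directly as Python slices, e.g. `dna[start:end]` gives the ORF including its stop codon.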

Some Simple Sequence Analysis Applications
Simple prediction of secondary structure of a protein sequence
– Decide a window size
– Compute, for each window of amino acids, the statistical potential to form helix, beta sheet, turn, etc. (Chou-Fasman, GOR, etc. algorithms)
– Use a statistical potential chart
– Plot potentials in graphical or pictorial format

Some Simple Sequence Analysis Applications
Restriction mapping, e.g. MAP, MAPPLOT, MAPSORT, PLASMIDMAP, etc.
– Table of restriction enzymes and their recognition sites, e.g. EcoRI (GAATTC), BamHI (GGATCC), AluI (AGCT)
– Take a DNA sequence
– Pattern match against the list of cut sites
– For each match, assign the restriction enzyme
– Calculate distance between cut sites
– Display as a table, graphic, or restriction map, etc.
[Figures: plasmid map; gel]
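The pattern-match-and-measure steps can be sketched as follows. This assumes a linear DNA molecule and a three-enzyme table. The recognition sequences and cut offsets (EcoRI G^AATTC, BamHI G^GATCC, AluI AG^CT) are the standard ones, while the names `find_cuts` and `fragment_sizes` are invented for this example.

```python
# enzyme -> (recognition site, offset of the cut within the site, top strand)
ENZYMES = {
    "EcoRI": ("GAATTC", 1),
    "BamHI": ("GGATCC", 1),
    "AluI": ("AGCT", 2),
}


def find_cuts(seq: str, enzymes: dict = ENZYMES) -> list:
    """Return sorted (cut_position, enzyme) pairs for every site found."""
    seq = seq.upper()
    cuts = []
    for name, (site, offset) in enzymes.items():
        pos = seq.find(site)
        while pos != -1:
            cuts.append((pos + offset, name))
            pos = seq.find(site, pos + 1)  # allow overlapping sites
    return sorted(cuts)


def fragment_sizes(seq: str, cuts: list) -> list:
    """Lengths of the fragments of a linear molecule after a complete digest."""
    bounds = [0] + sorted({pos for pos, _ in cuts}) + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


dna = "AAGAATTCTTTTAGCTGG"
cuts = find_cuts(dna)
sizes = fragment_sizes(dna, cuts)
```

Sorting `sizes` in descending order gives the band order expected on a gel. A circular plasmid would need the first and last fragments joined.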

Some Simple Sequence Analysis Applications
Protein sequence motif pattern matching, e.g. PROSITEMAP, MOTIFS, BLOCKS, etc.
– Table/database of sequence patterns/motifs and their signature sequences, e.g. Arg-Gly-Asp (RGD), or consensus sequences (e.g. PROSITE, BLOCKS db)
– Take a protein sequence
– Pattern match against the list of signature sites
– For each match, assign potential function according to the database
– Display in a table, graphically, or hyperlinked
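Motif scanning amounts to regular-expression matching once a pattern is in regex form. The sketch below converts the core PROSITE pattern syntax (`x`, `[..]`, `{..}`, `(n)`, `<`, `>`) to a Python regex. The two-entry motif table is a hand-made stand-in for a real database, and `prosite_to_regex` and `scan_motifs` are names invented for this example.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a PROSITE-style pattern such as N-{P}-[ST]-{P} to a regex."""
    regex = pattern.rstrip(".").replace("-", "")
    regex = regex.replace("x", ".")                     # any residue
    regex = regex.replace("{", "[^").replace("}", "]")  # forbidden residues
    regex = regex.replace("(", "{").replace(")", "}")   # repeat counts
    return regex.replace("<", "^").replace(">", "$")    # terminus anchors


def scan_motifs(protein: str, motifs: dict) -> list:
    """Return (name, 0-based start, matched text) for every occurrence."""
    hits = []
    for name, pattern in motifs.items():
        regex = prosite_to_regex(pattern)
        # A lookahead lets overlapping occurrences be reported too
        for m in re.finditer("(?=(" + regex + "))", protein):
            hits.append((name, m.start(), m.group(1)))
    return hits


MOTIFS = {
    "RGD cell attachment": "R-G-D",
    "N-glycosylation": "N-{P}-[ST]-{P}",
}
hits = scan_motifs("MKRGDNGSA", MOTIFS)
```

Short patterns like these match by chance very often, so a hit suggests a potential function rather than demonstrating one.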

Some Simple Sequence Analysis Applications
Peptide cleavage maps, e.g. PEPTIDESORT, PEPTIDEMAP
– Table of proteases vs cleavage sites, e.g. trypsin, chymotrypsin, and chemical cleavage agents such as cyanogen bromide
– Pattern match with the entire protein sequence
– Calculate size of peptide fragments
– Sort and map; plot as electrophoretic patterns on a log-linear simulated digest
– Compute partial digest patterns
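A complete digest can be sketched by splitting the sequence at zero-width positions that follow the cleavage rule. The rule table below uses the usual textbook simplifications (trypsin cuts after K or R unless the next residue is P; cyanogen bromide cuts after M). Real specificity is less clean, the function name `digest` is an assumption of this example, and Python 3.7 or later is needed for splitting on empty matches.

```python
import re

# reagent -> regex matching the (zero-width) position where the chain is cut
CLEAVAGE_RULES = {
    "trypsin": r"(?<=[KR])(?!P)",
    "CNBr": r"(?<=M)",
}


def digest(protein: str, reagent: str) -> list:
    """Complete digest: peptides in sequence order."""
    return [p for p in re.split(CLEAVAGE_RULES[reagent], protein) if p]


peptides = digest("MKRPAGKLLR", "trypsin")
# Longest first, roughly the order of decreasing size on a gel
by_size = sorted(peptides, key=len, reverse=True)
```

Partial digests could be enumerated by joining runs of adjacent peptides, which is not shown here.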

Some Simple Sequence Analysis Applications
DOTPLOT: self-comparison
– Take a window size
– Compare against entire length of own sequence
– Report matches above a threshold
– Plot on graph
– Slide window, repeat till end of sequence
– Detection of internal repeats
Pairwise comparison: detection of homology
[Figure: dot plot with Sequence A on the axis]
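The windowed comparison can be sketched as a double loop over window start positions. The function name `dotplot` and its `window` and `threshold` parameters are chosen for this example; matches are counted as identical positions within the window, whereas programs such as DOTPLOT may score with a substitution matrix instead.

```python
def dotplot(a: str, b: str, window: int = 3, threshold: int = 3) -> list:
    """Return (i, j) for every pair of windows with >= threshold identities."""
    dots = []
    for i in range(len(a) - window + 1):
        for j in range(len(b) - window + 1):
            identities = sum(x == y for x, y in
                             zip(a[i:i + window], b[j:j + window]))
            if identities >= threshold:
                dots.append((i, j))
    return dots


# Self-comparison of a sequence containing the repeat ACGT twice
seq = "ACGTACGT"
dots = dotplot(seq, seq, window=4, threshold=4)
```

In a self-comparison the main diagonal always appears. Dots off the diagonal, here at (0, 4) and (4, 0), mark the internal repeat.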

Some Simple Sequence Analysis Applications
RNA secondary structure analysis
Mfold, PlotFold, FoldRNA, Squiggles, Circles, Domes, Mountains, StemLoop
Folding of RNA into stems and loops
Calculation of energy: prediction of stability of structure
Display of structure and alternatives
[Figure: example RNA sequence ...AUCGAAUCUC... drawn folded into a stem and loop]

Database Searching
Text-based database searching: using a text string to match an annotation in a sequence database record, i.e. keyword search
Sequence-based database searching: using a biological sequence and matching the whole or parts of it against the sequence of every database record

Text-Based Database Searching
Examples: Entrez, SRS, DBGET, AceDB (common integrated database systems)
Search concepts
– Boolean search: AND, OR, NOT
– Broadening the search
– Narrowing the search
– Proximity searching, soundex
– Wild card, stemming, e.g. Thala* for thalasemia, thalassemia, thalassemic
Use standard string search algorithms and Boolean operations, vocabulary matches
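Boolean and wild-card searching can be sketched with set operations over matching record IDs. The four records below are invented purely for illustration, and `term_hits` is a hypothetical helper. Systems such as Entrez or SRS use indexed fields and controlled vocabularies rather than a linear scan like this.

```python
from fnmatch import fnmatchcase

# Made-up annotation records: ID -> free text
RECORDS = {
    1: "human period homolog RIGUI",
    2: "Drosophila per gene period",
    3: "human beta thalassemia globin",
    4: "thalassemic mouse model",
}


def term_hits(term: str, records: dict = RECORDS) -> set:
    """IDs of records with at least one word matching term ('*' = wild card)."""
    term = term.lower()
    return {rid for rid, text in records.items()
            if any(fnmatchcase(word, term) for word in text.lower().split())}


both = term_hits("human") & term_hits("period")    # AND narrows the search
either = term_hits("per") | term_hits("period")    # OR broadens the search
without = term_hits("human") - term_hits("thala*") # NOT excludes records
stemmed = term_hits("thala*")                      # wild card / stemming
```

Intersection, union, and difference of the hit sets correspond to AND, OR, and NOT respectively.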

Text-based Database Searching
Example: to find the human homolog of the Drosophila per gene
Procedure
– Go to Entrez on the Web
– All Fields: enter "human" "per"
– Hits returned are irrelevant: broaden the search
– "human" "period": more hits
– Check every one; find the human RIGUI gene
Hit and miss, clever guesswork; free form or controlled vocabulary (MeSH terms)? Use Boolean searches?

Sequence-based Database Searching
Homology search
Global or local sequence alignment
Needleman-Wunsch algorithm
Smith-Waterman algorithm
Lipman-Pearson FASTA
Altschul's BLAST
Take a sequence and compare it pairwise with each sequence in the database

Sequence-based Database Searching
Basic assumptions:
– Sequences of homologous genes/proteins diverge over time, even though structure and/or function change little
– Significant sequence similarity is taken to imply potential structural/functional similarity or common evolutionary origin
– Based on a well-characterised protein, infer the function of an unknown sequence at the gene or protein sequence level

Sequence-based Database Searching
Global alignment forces a complete, end-to-end alignment of the two input sequences
Local alignment looks for local stretches of similarity and tries to align the most similar segments
The algorithms used may be similar, but the output differs; statistics are needed to assess the results

Sequence-based Database Searching
Alignment scoring
Substitution score and substitution matrix: PAM, BLOSUM
Affine gap costs/gap penalty and gap scores
Optimal alignments, dynamic programming: Needleman-Wunsch algorithm, Smith-Waterman algorithm (SSEARCH)
Additional heuristics: FASTA, BLAST
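The dynamic-programming idea behind both optimal methods can be sketched in one score-only function. This is a deliberate simplification: a flat match/mismatch score stands in for a PAM or BLOSUM matrix, and a linear gap penalty stands in for affine gap costs. The function name `align_score` and its parameters are assumptions of this example. With `local=False` it follows the Needleman-Wunsch recurrence; with `local=True` it follows Smith-Waterman.

```python
def align_score(a: str, b: str, match: int = 1, mismatch: int = -1,
                gap: int = -2, local: bool = False) -> int:
    """Optimal alignment score by dynamic programming (no traceback)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment: leading gaps are penalised
        for i in range(1, rows):
            H[i][0] = i * gap
        for j in range(1, cols):
            H[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score = max(H[i - 1][j - 1] + s,   # align a[i-1] with b[j-1]
                        H[i - 1][j] + gap,     # gap in b
                        H[i][j - 1] + gap)     # gap in a
            if local:
                score = max(score, 0)          # local: restart when negative
            H[i][j] = score
            best = max(best, score)
    # Global score is the bottom-right cell; local score is the best cell
    return best if local else H[-1][-1]


global_score = align_score("GATTACA", "GATACA")
local_score = align_score("TTTGATTACATTT", "CCGATTACACC", local=True)
```

The only differences between the two modes are the initialisation of the first row and column, the floor at zero, and where the final score is read from. Recovering the alignment itself would need a traceback through `H`.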