Matrices

A set of elements organized in a table (along rows and columns).

[Image: Wikipedia]


Matrices

Python does not have direct support for matrix manipulation. For Bio/CS 251, matrices are provided through support.py:

    makeMatrix(rows, cols)      # creates a matrix with the given rows and cols
    randomMatrix(rows, cols)    # creates a matrix with the given rows and cols,
                                # with all cells set to random values
    getRows(M)                  # returns the number of rows of the given matrix
    getCols(M)                  # returns the number of cols of the given matrix
    M[r][c] = 5                 # puts 5 in cell (r, c)
    score = M[r][c]             # puts the value of cell (r, c) in score
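The slides do not show support.py itself. The following is a minimal sketch of how helpers with these names could be built on nested lists. The list-of-rows representation, the initial cell value of 0, and the random range of 1 to 4 are all assumptions made for illustration, not the actual Bio/CS 251 implementation.

```python
import random

# Hypothetical stand-ins for the support.py helpers named above.
# Assumption: a matrix is a list of rows, and each row is a list of cells.

def makeMatrix(rows, cols):
    # Build each row separately so that rows do not share storage.
    return [[0 for _ in range(cols)] for _ in range(rows)]

def randomMatrix(rows, cols):
    # Assumption: random integers from 1 to 4; the real range is not stated.
    return [[random.randint(1, 4) for _ in range(cols)] for _ in range(rows)]

def getRows(M):
    return len(M)

def getCols(M):
    # A matrix with no rows has no columns.
    return len(M[0]) if M else 0

M = makeMatrix(3, 5)
M[1][2] = 4   # row 1, column 2; only this one cell changes
```

Building the rows with a comprehension rather than `[[0] * cols] * rows` matters here: the latter repeats a single row object, so assigning to one cell would appear to change that column in every row.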

Matrices

Indexing of rows and columns starts at 0.

         0   1   2   3   4
    0    7
    1            4
    2                    9

    >>> M = makeMatrix(3, 5)    # creates 3x5 matrix
    >>> rows = getRows(M)
    >>> print rows
    3
    >>> cols = getCols(M)
    >>> print cols
    5
    >>> M[0][0] = 7
    >>> M[2][4] = 9
    >>> M[1][2] = 4
    >>> total = M[0][0] + M[2][4] + M[1][2]
    >>> print total
    20

Matrix Processing

Fill all cells of a matrix with the number 9.

To FILL each cell of a given matrix with the value 9:
1. for each row index in the matrix:
2.     for each column index in the matrix:
3.         set cell of current row, col to 9

    def fillMatrix(M):
        # visit every (r, c) cell, row by row
        for r in range(0, getRows(M)):
            for c in range(0, getCols(M)):
                M[r][c] = 9

    >>> D = makeMatrix(3, 5)
    >>> fillMatrix(D)
    >>> print D
    | 9 9 9 9 9 |
    | 9 9 9 9 9 |
    | 9 9 9 9 9 |

Matrix Processing

Add all the values in a matrix.

To ADD all cells of a given matrix:
1. set current total to 0
2. for each row index in the matrix:
3.     for each column index in the matrix:
4.         update total with current cell value
5. return total

    def addElements(M):
        total = 0
        # accumulate every cell, row by row
        for r in range(0, getRows(M)):
            for c in range(0, getCols(M)):
                total = total + M[r][c]
        return total

    >>> D = randomMatrix(3, 5)
    >>> print D
    | 1 4 2 1 1 |
    | 3 2 2 1 4 |
    | 4 1 3 2 1 |
    >>> total = addElements(D)
    >>> print total
    32

Sequence Similarity

- Provides insight about the sequence under investigation: gene-coding regions (DNA), function (proteins)
- Typically assessed via the process of “sequence alignment”
- Standard sequence alignment algorithms:
    - Dot Plots
    - Global Alignment
    - Semiglobal Alignment
    - Local Alignment
- Standard software: BLAST, FASTA (find high-scoring local alignments between a query and a target database)

Dot Plots

The simplest method for identifying similarities between two sequences.

- Uses a 2-dimensional table:
    - one of the sequences labels the rows
    - the other sequence labels the columns
    - place a ● in each cell that has matching (row, column) labels

Example: Dot plot for “GATTACA” and “TACACATTG”

Dot Plots

[Figure: dot plot grid with some cells marked ● and others marked ?]

Dot Plots

[Figure: completed dot plot, annotated with the substrings ACA, ACATT, TACA, TAC, and ATT]

Dot Plots

The simplest method for identifying similarities between two sequences.

- Diagonal lines indicate regions of similarity:
    - SE slope: similarity along the direction of the sequences
    - SW slope: similarity along one sequence in reverse
- Susceptible to noise, especially with DNA: since there are only 4 possible symbols, there will be a lot of “random hits”
- Noise can be addressed using a sliding window:
    - consider fragments of length W in the two sequences
    - place a ● in each cell that is the “origin” of a sliding window whose two fragments match
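The sliding-window filter described above can be sketched as follows. This is an illustrative sketch rather than code from the course: it uses plain nested lists of booleans, and the names windowDotPlot and countDots are assumptions. A cell (r, c) gets a dot only when the W symbols starting at row position r equal the W symbols starting at column position c, so only SE-slope runs survive for W > 1.

```python
def windowDotPlot(seq1, seq2, W):
    # Hypothetical helper: rows follow seq1, columns follow seq2.
    plot = [[False] * len(seq2) for _ in range(len(seq1))]
    for r in range(len(seq1) - W + 1):
        for c in range(len(seq2) - W + 1):
            # The dot sits at the origin (first position) of the window.
            if seq1[r:r + W] == seq2[c:c + W]:
                plot[r][c] = True
    return plot

def countDots(plot):
    # True counts as 1, so summing the cells counts the dots.
    return sum(cell for row in plot for cell in row)

p1 = windowDotPlot("GATTACA", "TACACATTG", 1)
p2 = windowDotPlot("GATTACA", "TACACATTG", 2)
print(countDots(p1))
print(countDots(p2))
```

For the example sequences, moving from W = 1 to W = 2 reduces the plot from 18 dots to 7, which is the noise reduction the slide describes.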

Dot Plots (W = 2)

[Figure: dot plot grid for W = 2 with some cells marked ● and others marked ?]

Dot Plots (W = 2)

[Figure: dot plot with window length W = 2]

Compare with the next slide, which uses W = 1:
- noise has disappeared
- one fewer dot per matching region
- in general, if a region has N matches, #dots = N - (W - 1)
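The dot-count formula can be checked on one region of the example sequences. In this sketch the helper name dotsOnDiagonal and its offset parameter are assumptions. The region used is the run "TACA", which sits at rows 3 to 6 of GATTACA and columns 0 to 3 of TACACATTG, so N = 4 and row minus column is 3 along the whole run.

```python
def dotsOnDiagonal(seq1, seq2, W, offset):
    # Hypothetical helper: count window matches on the diagonal
    # where row - column == offset.
    dots = 0
    for c in range(len(seq2) - W + 1):
        r = c + offset
        if 0 <= r <= len(seq1) - W and seq1[r:r + W] == seq2[c:c + W]:
            dots += 1
    return dots

N = 4  # length of the shared run "TACA"
for W in range(1, N + 1):
    # Each extra unit of window length removes one dot from the run.
    assert dotsOnDiagonal("GATTACA", "TACACATTG", W, 3) == N - (W - 1)
```

The formula applies when W is at most N. A region shorter than the window produces no dots at all rather than a negative count.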

Dot Plots (W = 1)

[Figure: dot plot with window length W = 1]

Compare with the previous slide, which uses W = 2.

Self Alignment (W = 1)

[Figure: dot plot of a sequence against itself]

In self alignment:
- the main diagonal is filled in completely
- the matrix is symmetric about the main diagonal
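Both self-alignment properties can be confirmed with a short check. This sketch assumes a plain nested list of booleans in place of the course's matrix type, and the function name dotPlot is hypothetical.

```python
def dotPlot(seq1, seq2):
    # True marks a cell whose row symbol equals its column symbol.
    return [[a == b for b in seq2] for a in seq1]

S = "GATTACA"
P = dotPlot(S, S)
n = len(S)

# Every symbol matches itself, so each cell (i, i) holds a dot.
diagonalFilled = all(P[i][i] for i in range(n))

# S[r] == S[c] exactly when S[c] == S[r], so cell (r, c) mirrors (c, r).
symmetric = all(P[r][c] == P[c][r] for r in range(n) for c in range(n))

print(diagonalFilled)
print(symmetric)
```

Both checks come out True for any sequence, because symbol equality is reflexive and symmetric.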

Dot Plots

Original paper: Maizel JV and Lenk RP: Enhanced graphic matrix analysis of nucleic acid and protein sequences. Proc Natl Acad Sci USA 78:7665, 1981.

- The paper used a sliding window of odd length centered at the base
- Our examples used a sliding window anchored at the base
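The difference between an anchored and a centered window can be illustrated as follows. This is a simplified sketch of the placement idea only: it requires every symbol in the window to match and is not a reproduction of the published method. The function names are assumptions.

```python
def anchoredDots(seq1, seq2, W):
    # Dot at the first position of each matching window.
    return {(r, c)
            for r in range(len(seq1) - W + 1)
            for c in range(len(seq2) - W + 1)
            if seq1[r:r + W] == seq2[c:c + W]}

def centeredDots(seq1, seq2, W):
    # Dot at the middle position of each matching window; W must be odd.
    if W % 2 == 0:
        raise ValueError("W must be odd for a centered window")
    half = W // 2
    return {(r + half, c + half) for (r, c) in anchoredDots(seq1, seq2, W)}

a = anchoredDots("GATTACA", "TACACATTG", 3)
b = centeredDots("GATTACA", "TACACATTG", 3)
```

In this sketch the same windows match in both cases. Centering only shifts each dot W // 2 cells down and to the right along its diagonal, so the dot marks the middle of the matching fragment instead of its start.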

Dot Plots in Python

Compute the dot plot matrix given two sequences.

To MAKE a DOT PLOT given two sequences:
1. create a matrix with rows and columns equal to the lengths of the first and second sequence, respectively
2. for each row index in the matrix:
3.     for each column index in the matrix:
4.         if the symbol at the row index in the first sequence equals the symbol at the column index in the second sequence:
5.             place a dot at current cell
6. return the matrix

    >>> M = makeDotPlot("GATTACA", "TACACATTG")
    >>> print M
    [Output: the dot plot matrix, with * marking each matching cell]
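The steps above might be implemented as follows. This sketch assumes that a matrix is a plain list of lists, that a dot is stored as "*" and an empty cell as " ", and that showMatrix is a hypothetical display helper. The course's support.py may represent and print matrices differently.

```python
def makeDotPlot(seq1, seq2):
    # Step 1: one row per symbol of seq1, one column per symbol of seq2.
    M = [[" " for _ in seq2] for _ in seq1]
    # Steps 2-5: compare the row symbol with the column symbol in each cell.
    for r in range(len(seq1)):
        for c in range(len(seq2)):
            if seq1[r] == seq2[c]:
                M[r][c] = "*"
    # Step 6
    return M

def showMatrix(M):
    # Hypothetical formatter: one "| ... |" line per row.
    return "\n".join("| " + " ".join(row) + " |" for row in M)

M = makeDotPlot("GATTACA", "TACACATTG")
print(showMatrix(M))
```

With "GATTACA" labeling the rows, the result has 7 rows and 9 columns. The first row (G) holds a single dot in the last column, since G occurs only at the end of "TACACATTG".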