Matrices
A set of elements organized in a table (along rows and columns).
[Figure: example matrix (image from Wikipedia)]
Matrices
Python does not have built-in support for matrix manipulation. For Bio/CS 251, matrices are provided through support.py:

    makeMatrix(rows, cols)     # creates a matrix with the given rows and cols
    randomMatrix(rows, cols)   # creates a matrix with the given rows and cols,
                               # with all cells set to random values
    getRows(M)                 # returns the number of rows of the given matrix
    getCols(M)                 # returns the number of cols of the given matrix
    M[r][c] = 5                # puts 5 in cell (r, c)
    score = M[r][c]            # puts the value of cell (r, c) in score
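The real support.py is not shown here, so the following is only a hypothetical sketch of how helpers with these names could be written, assuming a matrix is a plain list of rows (each row a list of cells), new cells start at 0, and random values fall between assumed bounds of 1 and 9. The extra `matrixToString` helper is also hypothetical; it mimics the `| 9 9 9 9 9 |` layout printed on later slides.

```python
import random

# Hypothetical stand-ins for the support.py helpers (the real file is not shown).
# Assumption: a matrix is a list of rows, and each row is a list of cells.

def makeMatrix(rows, cols):
    # Build a fresh list for every row. Writing [[0] * cols] * rows instead
    # would repeat ONE row object, so setting a cell would change every row.
    return [[0] * cols for _ in range(rows)]

def randomMatrix(rows, cols, low=1, high=9):
    # low and high are assumed (inclusive) bounds on the random values.
    return [[random.randint(low, high) for _ in range(cols)]
            for _ in range(rows)]

def getRows(M):
    return len(M)                      # one inner list per row

def getCols(M):
    return len(M[0]) if M else 0       # every row has the same length

def matrixToString(M):
    # Mimics the "| 9 9 9 9 9 |" layout shown in the slides (assumed format).
    return "\n".join("| " + " ".join(str(x) for x in row) + " |" for row in M)

D = makeMatrix(3, 5)
D[1][2] = 4
print(matrixToString(D))
```

The list comprehension in `makeMatrix` is the important design choice: each row is a separate list, so `D[1][2] = 4` changes only row 1.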
Matrices
Indexing of rows and columns starts at 0. The session below sets three cells of a 3 x 5 matrix:

            col 0  col 1  col 2  col 3  col 4
    row 0     7
    row 1                   4
    row 2                                 9

    >>> M = makeMatrix(3, 5)    # creates 3x5 matrix
    >>> rows = getRows(M)
    >>> print(rows)
    3
    >>> cols = getCols(M)
    >>> print(cols)
    5
    >>> M[0][0] = 7
    >>> M[2][4] = 9
    >>> M[1][2] = 4
    >>> total = M[0][0] + M[2][4] + M[1][2]
    >>> print(total)
    20
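A sketch of the same indexing rules, using plain nested lists in place of makeMatrix (an assumption; support.py may behave differently): with 0-based indexing the last valid row index is rows - 1 and the last valid column index is cols - 1, and an index one past the end raises IndexError.

```python
# Plain nested lists stand in for the support.py matrix (an assumption).
rows, cols = 3, 5
M = [[0] * cols for _ in range(rows)]

M[0][0] = 7                        # first row, first column
M[rows - 1][cols - 1] = 9          # last valid indices: rows - 1 and cols - 1
M[1][2] = 4
total = M[0][0] + M[2][4] + M[1][2]
print(total)                       # 20

try:
    M[rows][0] = 1                 # row index 3 is one past the last row
except IndexError:
    print("row index", rows, "is out of range")
```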
Matrix Processing
Fill all cells of a matrix with the number 9.

To FILL each cell of a given matrix with the value 9:
1. for each row index in the matrix:
   2. for each column index in the matrix:
      3. set the cell at the current row, col to 9

    def fillMatrix(M):
        for r in range(0, getRows(M)):
            for c in range(0, getCols(M)):
                M[r][c] = 9

    >>> D = makeMatrix(3, 5)
    >>> fillMatrix(D)
    >>> print(D)
    | 9 9 9 9 9 |
    | 9 9 9 9 9 |
    | 9 9 9 9 9 |
Matrix Processing
Add all the values in a matrix.

To ADD all cells of a given matrix:
1. set the current total to 0
2. for each row index in the matrix:
   3. for each column index in the matrix:
      4. add the current cell value to the total
5. return the total

    def addElements(M):
        total = 0
        for r in range(0, getRows(M)):
            for c in range(0, getCols(M)):
                total = total + M[r][c]
        return total

    >>> D = randomMatrix(3, 5)
    >>> print(D)
    | 1 4 2 1 1 |
    | 3 2 2 1 4 |
    | 4 1 3 2 1 |
    >>> total = addElements(D)
    >>> print(total)
    32
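If the matrix is a plain list of lists (an assumption; support.py's matrix type is not shown), Python can also walk the rows and values directly, without index arithmetic. Both functions below are hypothetical variants of addElements; they compute the same total as the slide for its example matrix.

```python
# Assumes M is a list of rows, each row a list of numbers.

def addElementsLoop(M):
    # Same nested-loop algorithm as addElements, iterating over values directly.
    total = 0
    for row in M:
        for value in row:
            total = total + value
    return total

def addElementsSum(M):
    # Built-in sum: total each row, then add up the row totals.
    return sum(sum(row) for row in M)

D = [[1, 4, 2, 1, 1],
     [3, 2, 2, 1, 4],
     [4, 1, 3, 2, 1]]
print(addElementsSum(D))   # 32
```

The explicit loop mirrors the pseudocode step by step; the `sum` version is shorter but relies on the list-of-lists representation.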
Sequence Similarity
Provides insight into the sequence under investigation, e.g., gene-coding regions (DNA) or function (proteins).
Typically assessed via the process of "sequence alignment".
Standard sequence alignment algorithms:
   Dot Plots
   Global Alignment
   Semiglobal Alignment
   Local Alignment
Standard software:
   BLAST, FASTA: find high-scoring local alignments between a query and a target database
Dot Plots
The simplest method for identifying similarities between two sequences.
Uses a 2-dimensional table:
   one of the sequences labels the rows
   the other sequence labels the columns
   place a ● in each cell whose row and column labels match
Example: dot plot for "GATTACA" and "TACACATTG"
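To make the rule above concrete, here is a small sketch that prints the example dot plot as text, with the first sequence labeling the rows and the second labeling the columns (our choice of orientation). It uses `*` for a dot and `.` for an empty cell; the function name `dotPlotText` is hypothetical and not part of support.py.

```python
def dotPlotText(rowSeq, colSeq, dot="*", blank="."):
    # Header line: the column labels, shifted right past the row-label column.
    lines = ["  " + " ".join(colSeq)]
    for a in rowSeq:
        # A dot wherever this row's symbol equals the column's symbol.
        cells = [dot if a == b else blank for b in colSeq]
        lines.append(a + " " + " ".join(cells))
    return "\n".join(lines)

print(dotPlotText("GATTACA", "TACACATTG"))
```

For the example, the G row gets a single dot (under the final G of TACACATTG), each A row gets three, and the whole plot has 18 dots.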
Dot Plots
[Figure: dot plot grid for the example, with ● in the cells whose row and column symbols match]
Dot Plots
[Figure: the example dot plot with diagonal runs of dots highlighted and labeled ACA, ACATT, TACA, TAC, and ATT]
Dot Plots
The simplest method for identifying similarities between two sequences.
Diagonal lines indicate regions of similarity:
   SE slope: similarity along the direction of the sequences
   SW slope: similarity along one sequence in reverse
Susceptible to noise, especially with DNA: since there are only 4 possible symbols, there will be many "random hits".
Noise can be reduced using a sliding window:
   consider fragments of length W in the two sequences
   place a ● in each cell that is the "origin" (first position) of a pair of matching windows
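A minimal sketch of the sliding-window rule just described, assuming a dot requires the two length-W fragments to match exactly (an assumption; the slides do not say whether mismatches inside a window are tolerated). The function name `windowDots` is hypothetical. It returns the set of (row, col) cells that receive a dot, with the dot placed at the window's origin.

```python
def windowDots(s1, s2, W):
    # Cells (r, c) where the length-W fragment of s1 starting at r equals the
    # length-W fragment of s2 starting at c. The dot goes at the window origin.
    dots = set()
    for r in range(len(s1) - W + 1):          # every window start in s1
        for c in range(len(s2) - W + 1):      # every window start in s2
            if s1[r:r + W] == s2[c:c + W]:    # exact fragment match (assumed)
                dots.add((r, c))
    return dots

print(len(windowDots("GATTACA", "TACACATTG", 1)))   # 18
print(len(windowDots("GATTACA", "TACACATTG", 2)))   # 7
```

With W = 1 this is the ordinary dot plot; with W = 2 the example pair keeps only 7 of its 18 dots, because isolated single-symbol hits no longer qualify.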
Dot Plots (W = 2)
[Figure: dot plot using a sliding window of length W = 2]
Dot Plots (W = 2)
[Figure: dot plot using a sliding window of length W = 2]
Compare with the next slide (W = 1):
   the noise has disappeared
   each matching region has one fewer dot
   in general, if a region has N matches (N >= W), #dots = N - (W - 1)
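The count #dots = N - (W - 1) can be checked by brute force. The sketch below models a single diagonal run of N matches, bordered by mismatches, as a list of booleans and counts the window origins whose W cells all match. The helper `dotsInRun` is hypothetical; the formula assumes N >= W, and a run shorter than the window yields no dots.

```python
def dotsInRun(N, W):
    # Diagonal cells: a mismatch, then N matches, then another mismatch.
    diagonal = [False] + [True] * N + [False]
    count = 0
    # Slide a window of W cells along the diagonal; its first cell is the origin.
    for k in range(len(diagonal) - W + 1):
        if all(diagonal[k:k + W]):     # every cell in the window matches
            count += 1
    return count

print(dotsInRun(4, 1), dotsInRun(4, 2), dotsInRun(4, 3))   # 4 3 2
```

Each increase of W by one removes one possible window start from the run, which is where the W - 1 comes from.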
Dot Plots (W = 1)
[Figure: dot plot with W = 1]
Compare with the previous slide (W = 2).
Self Alignment (W = 1)
[Figure: dot plot of a sequence against itself]
In self alignment:
   the main diagonal is filled in completely
   the matrix is symmetric about the main diagonal
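Both properties follow from the fact that cell (i, j) compares symbol i with symbol j: equality is symmetric, and every symbol equals itself. A quick check in plain Python; the helper `dotMatrix` is hypothetical and stores True where a dot would go.

```python
def dotMatrix(s1, s2):
    # True where the row symbol equals the column symbol (W = 1).
    return [[a == b for b in s2] for a in s1]

seq = "GATTACA"
M = dotMatrix(seq, seq)
n = len(seq)
# Cell (i, i) compares a symbol with itself, so it always holds a dot.
mainDiagonalFull = all(M[i][i] for i in range(n))
# Cell (i, j) and cell (j, i) compare the same two symbols.
symmetric = all(M[i][j] == M[j][i] for i in range(n) for j in range(n))
print(mainDiagonalFull, symmetric)   # True True
```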
Dot Plots
Original paper: Maizel JV and Lenk RP. Enhanced graphic matrix analysis of nucleic acid and protein sequences. Proc Natl Acad Sci USA 78:7665, 1981.
   The paper used a sliding window of odd length centered at the base.
   Our examples used a sliding window anchored at the base (the window starts at the base).
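The two window conventions mark the same matching fragments, only at different cells. The sketch below assumes exact fragment matching (our simplification, not necessarily the criterion used in the 1981 paper). With an odd window length W and h = W // 2, the centered plot is the anchored plot shifted h cells down and h cells right. Both function names are hypothetical.

```python
def anchoredWindowDots(s1, s2, W):
    # Anchored (as in our examples): the dot goes at the window's first position.
    return {(r, c)
            for r in range(len(s1) - W + 1)
            for c in range(len(s2) - W + 1)
            if s1[r:r + W] == s2[c:c + W]}

def centeredWindowDots(s1, s2, W):
    # Centered: W must be odd; the window extends h = W // 2 positions on
    # each side of the base at (r, c).
    if W % 2 == 0:
        raise ValueError("W must be odd for a centered window")
    h = W // 2
    return {(r, c)
            for r in range(h, len(s1) - h)
            for c in range(h, len(s2) - h)
            if s1[r - h:r + h + 1] == s2[c - h:c + h + 1]}

print(sorted(centeredWindowDots("GATTACA", "TACACATTG", 3)))
```

Anchoring is simpler to index, while centering places each dot on the base in the middle of the matching fragment.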
Dot Plots in Python
Compute the dot plot matrix given two sequences.

To MAKE a DOT PLOT given two sequences:
1. create a matrix whose numbers of rows and columns equal the lengths of the first and second sequence, respectively
2. for each row index in the matrix:
   3. for each column index in the matrix:
      4. if the symbol of the first sequence at the row index equals the symbol of the second sequence at the column index:
         5. place a dot at the current cell
6. return the matrix

    >>> M = makeDotPlot("GATTACA", "TACACATTG")
    >>> print(M)
    |                 * |
    |   *   *   *       |
    | *           * *   |
    | *           * *   |
    |   *   *   *       |
    |     *   *         |
    |   *   *   *       |
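One possible Python version of the steps above, offered as a hedged sketch: the course's own makeDotPlot is not shown, so this one builds a plain list of lists (with `" "` for empty cells and `"*"` for dots) instead of calling support.py's makeMatrix.

```python
def makeDotPlot(seq1, seq2):
    # Step 1: one row per symbol of seq1, one column per symbol of seq2.
    # Empty cells hold " " (an assumption about how blanks are stored).
    M = [[" "] * len(seq2) for _ in range(len(seq1))]
    # Steps 2-3: visit every (row, column) cell.
    for r in range(len(seq1)):
        for c in range(len(seq2)):
            # Steps 4-5: dot where the row symbol matches the column symbol.
            if seq1[r] == seq2[c]:
                M[r][c] = "*"
    # Step 6
    return M

M = makeDotPlot("GATTACA", "TACACATTG")
for row in M:
    print("| " + " ".join(row) + " |")
```

The printing loop reproduces the `| ... |` row layout used in the session above; with support.py, `print(M)` would be expected to handle that formatting.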