Position Weight Matrices for Representing Signals in Sequences Triinu Tasa, Koke 04.02.05.


Definitions
Sequence, string – an ordered arrangement of letters from {'A', 'C', 'G', 'T'}
Pattern – a simplified regular expression over the alphabet {'A', 'C', 'G', 'T', '.'}, where '.' is a wild-card of length 1 (it matches 'A', 'C', 'G' or 'T')
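A minimal sketch of how such a pattern can be matched against a sequence, assuming Python's standard `re` module. The function name `find_matches` is made up for illustration; overlapping occurrences are reported by wrapping the pattern in a lookahead.

```python
import re

ALPHABET = "ACGT"

def find_matches(pattern, sequence):
    """Return the start positions where the pattern occurs in the sequence.

    '.' in the pattern is a wild-card of length 1 over {A, C, G, T}.
    """
    if any(ch not in ALPHABET + "." for ch in pattern):
        raise ValueError("pattern may only contain A, C, G, T and '.'")
    # Restrict the wild-card to the DNA alphabet instead of "any character".
    body = pattern.replace(".", "[ACGT]")
    # A lookahead makes the search report overlapping occurrences too.
    regex = re.compile("(?=(" + body + "))")
    return [m.start() for m in regex.finditer(sequence)]

print(find_matches("G.GATGAG.T", "AAGAGATGAGATCC"))
```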

What is a weight matrix?
Sequences: GATGAG, GATGAT, TGATAT
Described by a single sequence: GATGAT
or by a regular expression: [GT][AG][TA][GT]A[GT]

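What is a weight matrix?
Better: describe the sequences GATGAG, GATGAT, TGATAT by a matrix with one row per letter (A, C, G, T) and one column per position.
Alignment matrix C: C(i, j) is the number of sequences with letter i at position j.
Frequency matrix F: F(i, j) = C(i, j) / N, where N is the number of sequences.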
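The two matrices can be computed directly from the three example sequences. The sketch below assumes plain dictionaries keyed by letter; the function names are illustrative only.

```python
ALPHABET = "ACGT"

def alignment_matrix(seqs):
    """Count matrix C: C[letter][j] = number of sequences with letter at position j."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    counts = {a: [0] * length for a in ALPHABET}
    for s in seqs:
        for j, ch in enumerate(s):
            counts[ch][j] += 1
    return counts

def frequency_matrix(counts, n):
    """Frequency matrix F: each count divided by the number of sequences n."""
    return {a: [c / n for c in row] for a, row in counts.items()}

seqs = ["GATGAG", "GATGAT", "TGATAT"]
C = alignment_matrix(seqs)
F = frequency_matrix(C, len(seqs))
for a in ALPHABET:
    print(a, C[a], [round(f, 2) for f in F[a]])
```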

What is a weight matrix?
Or weight matrix W: W(i, j) = …, where
N – number of sequences used
p_i – a priori probability of letter i
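The formula on this slide is not legible here. One commonly used log-odds definition that depends on exactly these two quantities is W(i, j) = ln((C(i, j) + p_i) / ((N + 1) * p_i)), where C is the alignment (count) matrix and p_i acts as a pseudocount. The sketch below assumes that definition and a uniform prior of 0.25 per letter; both are assumptions, not necessarily what the authors used.

```python
import math

ALPHABET = "ACGT"

def weight_matrix(counts, n, prior=None):
    """Log-odds weights ln((C(i,j) + p_i) / ((N + 1) * p_i)) (assumed formula)."""
    if prior is None:
        prior = {a: 0.25 for a in ALPHABET}  # assumption: uniform background
    length = len(counts[ALPHABET[0]])
    return {
        a: [math.log((counts[a][j] + prior[a]) / ((n + 1) * prior[a]))
            for j in range(length)]
        for a in ALPHABET
    }

# Count matrix of GATGAG, GATGAT, TGATAT (rows A, C, G, T).
counts = {
    "A": [0, 2, 1, 0, 3, 0],
    "C": [0, 0, 0, 0, 0, 0],
    "G": [2, 1, 0, 2, 0, 1],
    "T": [1, 0, 2, 1, 0, 2],
}
W = weight_matrix(counts, n=3)
for a in ALPHABET:
    print(a, [round(w, 2) for w in W[a]])
```

With this definition a letter never seen at a position still gets a finite (negative) weight, because the pseudocount keeps the argument of the logarithm above zero.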

What is a weight matrix?
Importance matrix I: I(i, j) = … * …, with rows A, C, G, T

Applications - Clustering
Pattern clustering
 1. G.GATGAG.T    62/75    1:39/49   2:23/26    R: BP: e
 2. G.GATGAG      89/110   1:45/60   2:44/50    R: BP: e
 3. GATGAG.T      124/148  1:52/70   2:72/78    R: BP: e
 4. TG.AAA.TTT    132/145  1:53/61   2:79/84    R: BP: e
 5. AAAATTTT      200/231  1:63/77   2:137/154  R: BP: e
 6. TGAAAA.TTT    104/114  1:45/53   2:59/61    R: BP: e
 7. AAA.TTTT      343/537  1:79/145  2:264/392  R: BP: e
 8. G.AAA.TTTT    135/156  1:51/62   2:84/94    R: BP: e
 9. TG.GATGAG     49/57    1:30/35   2:19/22    R: BP: e
10. TG.AAA.TTTT   86/91    1:40/43   2:46/48    R: BP:1.1124e

Applications - Clustering
Each pattern is represented by a matrix built from the sequences it matches, e.g. G.GATGAG.T: GAGATGAGAT, GTGATGAGAT, GAGATGAGGT, ... give a matrix with rows A, C, G, T.

Applications - Clustering
Compare matrices with each other using the dynamic programming approach: D(i, j) = …, where
A, B – matrices
i, j – columns
If D(m, n) > threshold => matrices are different
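The recurrence for D is not legible on this slide. As a sketch under stated assumptions, the comparison can be written as an edit-distance style dynamic program over columns: D(i, j) is the cheapest way to align the first i columns of A with the first j columns of B, either pairing two columns (cost: Euclidean distance between them) or skipping a column (cost: a fixed gap penalty). The column distance, the gap penalty of 1.0 and the function names are all assumptions chosen for illustration.

```python
import math

def column_distance(a, b):
    """Euclidean distance between two columns (assumed distance measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def matrix_distance(A, B, gap=1.0):
    """D(m, n) for matrices given as lists of columns (A, C, G, T values)."""
    m, n = len(A), len(B)
    D = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        D[i][0] = i * gap          # skip the first i columns of A
    for j in range(1, n + 1):
        D[0][j] = j * gap          # skip the first j columns of B
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            D[i][j] = min(
                D[i - 1][j - 1] + column_distance(A[i - 1], B[j - 1]),
                D[i - 1][j] + gap,
                D[i][j - 1] + gap,
            )
    return D[m][n]

A = [(0.0, 0.0, 1.0, 0.0), (1.0, 0.0, 0.0, 0.0)]   # columns for "GA"
B = A + [(0.0, 0.0, 0.0, 1.0)]                      # columns for "GAT"
threshold = 0.5                                     # hypothetical value
print(matrix_distance(A, B), matrix_distance(A, B) > threshold)
```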

Applications - Clustering
Resulting clusters (one per column):
G.GATGAG.T    TG.AAA.TTT     AAAATTTT
G.GATGAG      TGAAAA.TTT     AAA.TTTT
GATGAG.T      TG.AAA.TTTT
We want to represent the clusters by logos. We need to align the patterns first, i.e. position the similar parts of the patterns above each other:
G.GATGAG.T
G.GATGAG--
--GATGAG.T
or the logo will look like this: [logo of the unaligned patterns]

Applications – Multiple alignment
Multiple Alignment
Importance matrix I – represents the aligned patterns.
Example: G.GATGAG.T, GATGAG.T, G.GATGAG
1. Insert the first pattern into I ('.' gives 0.25 to each of A, C, G, T).
2. Align the second pattern with I using a dynamic programming approach:

Applications – Multiple alignment
Dynamic programming matrix: columns G . G A T G A G . T (the positions of I), rows G A T G A G . T (the second pattern).
Resulting alignment:
G.GATGAG.T
--GATGAG.T

Applications – Multiple alignment
3. Add the pattern '--GATGAG.T' to I; if necessary, add columns to the matrix.
4. Repeat the procedure for every pattern.
Output:
G.GATGAG.T
G.GATGAG--
--GATGAG.T
Why importance matrix?
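The four steps can be sketched as follows. This is a deliberately simplified stand-in: instead of the dynamic programming alignment used on the slides, it only tries every shift of the new pattern against the matrix (no internal gaps) and keeps the best-scoring one. The matrix here holds plain accumulated letter weights rather than the importance values, and all function names are made up for illustration.

```python
ALPHABET = "ACGT"

def column_of(ch):
    """Weights one pattern character adds to a column ('.' gives 0.25 to each)."""
    if ch == ".":
        return {a: 0.25 for a in ALPHABET}
    return {a: (1.0 if a == ch else 0.0) for a in ALPHABET}

def similarity(col, ch):
    """Dot product of a matrix column with the weights of a pattern character."""
    contrib = column_of(ch)
    return sum(col[a] * contrib[a] for a in ALPHABET)

def best_offset(profile, pattern):
    """Shift of the pattern (relative to column 0) with the highest total score."""
    best = None
    for off in range(-len(pattern) + 1, len(profile)):
        score = sum(similarity(profile[off + k], ch)
                    for k, ch in enumerate(pattern)
                    if 0 <= off + k < len(profile))
        if best is None or score > best[0]:
            best = (score, off)
    return best[1]

def align_patterns(patterns):
    """Insert the first pattern, then place every other pattern at its best shift."""
    profile = [column_of(ch) for ch in patterns[0]]
    offsets = [0]
    for pattern in patterns[1:]:
        off = best_offset(profile, pattern)
        if off < 0:
            # Pattern sticks out on the left: add empty columns in front.
            profile = [dict.fromkeys(ALPHABET, 0.0) for _ in range(-off)] + profile
            offsets = [o - off for o in offsets]
            off = 0
        while len(profile) < off + len(pattern):
            profile.append(dict.fromkeys(ALPHABET, 0.0))  # extend on the right
        for k, ch in enumerate(pattern):
            for a, w in column_of(ch).items():
                profile[off + k][a] += w
        offsets.append(off)
    width = len(profile)
    rows = ["-" * o + p + "-" * (width - o - len(p))
            for o, p in zip(offsets, patterns)]
    return rows, profile

rows, profile = align_patterns(["G.GATGAG.T", "G.GATGAG", "GATGAG.T"])
print("\n".join(rows))
```

A shift-only search cannot open gaps inside a pattern, which the dynamic programming approach can; for the example patterns, whose gaps fall only at the ends, the two give the same layout.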

Applications – Multiple alignment
Example:
Pattern: GATG
So far aligned:
GATGATGTA-
---GATGTGG
We want: w(G, 4) > w(G, 1) > w(G, 9)
Solution – importance matrix

Applications – Weight matrix matching
● Weight Matrix Matching
Purpose: find the sequences that the weight matrix describes best in a given text file
...CATAGGAAATTCCACCTCTTTGGCTTTGCCCAGTCTTCCCTTGAGGATGCCTACGTTC
1. Calculate the score for each position
2. if score > threshold => signal
Problem: finding a good threshold
● Threshold – 99.5% quantile
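A sketch of the matching step, assuming the score of a position is the sum of the weights of the letters in the window starting there. The slide does not say which score distribution the 99.5% quantile is taken over; the helper below simply takes a nearest-rank quantile of whatever list of scores it is given (for example, scores from a long background sequence), which is an assumption. Function names and the toy weights are illustrative.

```python
import math

def score_windows(weights, text):
    """Score of every window: sum of weights[letter][j] over the window."""
    length = len(next(iter(weights.values())))
    return [sum(weights[text[p + j]][j] for j in range(length))
            for p in range(len(text) - length + 1)]

def quantile_threshold(scores, q=0.995):
    """Nearest-rank q-quantile of a list of scores (assumed threshold rule)."""
    ordered = sorted(scores)
    index = max(0, math.ceil(q * len(ordered)) - 1)
    return ordered[index]

def find_signals(weights, text, threshold):
    """Positions whose score exceeds the threshold, with their scores."""
    scores = score_windows(weights, text)
    return [(p, s) for p, s in enumerate(scores) if s > threshold]

# Toy weight matrix of width 2 that favours "AG" (values are made up).
weights = {"A": [1, 0], "C": [0, 0], "G": [0, 1], "T": [0, 0]}
print(find_signals(weights, "AGCAG", threshold=1))
```

Because the comparison is strict, a threshold taken from the same short text can equal its maximum score and then flags nothing; estimating it from a separate, much longer set of scores avoids that.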

Questions?