Published by Rebecca Lite. Modified over 10 years ago.
1
Computational Biology, Part 2 Sequence Motifs Robert F. Murphy Copyright 1996, 1999-2009. All rights reserved.
2
Slides from Chapter 4
Ch04_Motifs_mod.ppt
3
Describing features using frequency matrices
Goal: Describe a sequence feature (or motif) more quantitatively than is possible using consensus sequences
Need to describe how often particular bases are found in particular positions in a sequence feature
4
Describing features using frequency matrices
Definition: For a feature of length m using an alphabet of n characters, a frequency matrix is an n by m matrix in which each element contains the frequency at which a given member of the alphabet is observed at a given position in an aligned set of sequences containing the feature
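The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the course's own code: the function name `frequency_matrix`, the DNA alphabet, and the four aligned sites are all hypothetical and invented for the example.

```python
from collections import Counter

def frequency_matrix(aligned, alphabet="ACGT"):
    """Return {char: [frequency at position 0, ..., frequency at position m-1]}."""
    m = len(aligned[0])
    if any(len(seq) != m for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    matrix = {ch: [0.0] * m for ch in alphabet}
    for i in range(m):
        # Count how often each character appears in column i of the alignment.
        counts = Counter(seq[i] for seq in aligned)
        for ch in alphabet:
            matrix[ch][i] = counts[ch] / len(aligned)
    return matrix

# Hypothetical aligned sites, invented for illustration only.
sites = ["TATAAT", "TATGAT", "TACAAT", "GATACT"]
freqs = frequency_matrix(sites)
```

Each of the n rows of `freqs` holds one character of the alphabet and each of the m columns holds one position, so every column sums to 1.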
5
Frequency matrices (continued)
Three uses of frequency matrices:
  Describe a sequence feature
  Calculate probability of occurrence of feature in a random sequence
  Calculate degree of match between a new sequence and a feature
6
Matlab Demonstration
% read some aligned sequences provided with the bioinformatics toolbox
seqs = fastaread('pf00002.fa');
seqdisp(seqs);
% restrict the profile to columns 4 through 13 of the alignment
startposition = 4;
endposition = 13;
% P is the frequency matrix (profile), S lists the symbol for each row
[P,S] = seqprofile(seqs,'limits',[startposition endposition]);
% print a header of position numbers, then one row of frequencies per symbol
disp([' ' sprintf('%2d ',[1:size(P,2)])]);
for i=1:length(S)
    disp([S(i) ' ' sprintf('%4.3f ',P(i,:))])
end
% draw the sequence logo for the same columns
seqlogo(seqs,'startat',startposition,'endat',endposition,'alphabet','aa');
7
Frequency matrix
8
Logo Example
9
Logos for displaying sequence motifs
http://www.ccrnp.ncifcrf.gov/~toms/sequencelogo.html
Free logo maker at http://weblogo.berkeley.edu/
10
Frequency Matrices, PSSMs, and Profiles
A frequency matrix can be converted to a Position-Specific Scoring Matrix (PSSM) by converting frequencies to scores
PSSMs are also called Position Weight Matrices (PWMs) or Profiles
11
Methods for converting frequency matrices to PSSMs
Using log ratio of observed to expected:
  score(j,i) = log( m(j,i) / f(j) )
  where m(j,i) is the frequency of character j observed at position i and f(j) is the overall frequency of character j (usually in some large set of sequences)
Using amino acid substitution matrix (Dayhoff similarity matrix) [see later]
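A minimal Python sketch of the log-ratio conversion follows. The slide does not state the base of the logarithm, so base 2 is an assumption here, as are the function name `log_odds_pssm` and the toy two-position matrix. Zero frequencies are mapped to negative infinity in this sketch.

```python
import math

def log_odds_pssm(freqs, background):
    """score[ch][i] = log2(freqs[ch][i] / background[ch]); -inf where the frequency is zero."""
    return {
        ch: [math.log2(p / background[ch]) if p > 0 else float("-inf") for p in row]
        for ch, row in freqs.items()
    }

# Hypothetical two-position frequency matrix and uniform background (assumptions).
freqs = {"A": [0.5, 0.25], "C": [0.25, 0.25], "G": [0.25, 0.25], "T": [0.0, 0.25]}
background = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
pssm = log_odds_pssm(freqs, background)
```

A character observed more often than expected gets a positive score, one observed exactly as often as expected gets zero, and one observed less often gets a negative score.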
12
Pseudo-counts
How do we get a score for a position with zero counts for a particular character? Can’t take log(0).
Solution: add a small number to all positions with zero frequency
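One way to sketch this in Python is shown below. The slide only says to add a small number to zero-frequency entries. The value `epsilon=0.01`, the renormalization of each column so it still sums to 1, and the name `add_pseudocounts` are assumptions made for this illustration.

```python
import math

def add_pseudocounts(freqs, epsilon=0.01):
    """Replace zero frequencies with epsilon, then renormalize each column to sum to 1."""
    alphabet = list(freqs)
    m = len(freqs[alphabet[0]])
    adjusted = {ch: [0.0] * m for ch in alphabet}
    for i in range(m):
        # Only entries that are exactly zero receive the small value.
        column = [freqs[ch][i] if freqs[ch][i] > 0 else epsilon for ch in alphabet]
        total = sum(column)
        for ch, value in zip(alphabet, column):
            adjusted[ch][i] = value / total
    return adjusted

# Hypothetical one-position matrix in which G and T were never observed.
freqs = {"A": [0.75], "C": [0.25], "G": [0.0], "T": [0.0]}
smoothed = add_pseudocounts(freqs)
# With no zeros left, every log ratio against a 0.25 background is finite.
scores = {ch: math.log2(smoothed[ch][0] / 0.25) for ch in smoothed}
```

A common alternative is to add a pseudo-count to every observed count before computing frequencies. Both approaches keep the logarithm defined.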
13
Finding occurrences of a sequence feature using a Profile
As with finding occurrences of a consensus sequence, we consider all positions in the target sequence as candidate matches
For each position, we calculate a score by “looking up” the value corresponding to the base at that position
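The lookup described above can be sketched as a sliding window in Python. Summing the looked-up values across the window is the usual choice for log-based scores and is an assumption here, since the slide does not say how per-position values are combined. The function name and the toy three-position PSSM are hypothetical.

```python
def score_windows(sequence, pssm):
    """Return [(start, score)] for every window of motif length in the sequence."""
    m = len(next(iter(pssm.values())))
    results = []
    for start in range(len(sequence) - m + 1):
        window = sequence[start:start + m]
        # Look up the score of each base at its position within the window and add them up.
        score = sum(pssm[ch][i] for i, ch in enumerate(window))
        results.append((start, score))
    return results

# Hypothetical PSSM that favors the motif ACG (scores invented for illustration).
pssm = {
    "A": [2.0, -1.0, -1.0],
    "C": [-1.0, 2.0, -1.0],
    "G": [-1.0, -1.0, 2.0],
    "T": [-1.0, -1.0, -1.0],
}
hits = score_windows("TTACGT", pssm)
```

In this toy example the window starting at index 2 (`ACG`) scores 6.0 and every other window scores -3.0.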
14
Block Diagram for Building a PSSM – Aligned Sequences
Inputs: Set of Aligned Sequence Features; Expected frequencies of each sequence element
Process: PSSM builder
Output: PSSM
15
Block Diagram for Building a PSSM – Unaligned Sequences
Inputs: Set of unaligned sequences; Expected frequencies of each sequence element; Parameters for aligning (i.e., expected length)
Process: PSSM builder
Output: PSSM
16
Block Diagram for Searching with a PSSM
Inputs: PSSM; Set of Sequences to search; Threshold
Process: PSSM search
Outputs: Sequences that match above threshold; Positions and scores of matches
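The search block can be sketched in Python as a function with the same inputs and outputs as the diagram. The name `pssm_search`, the use of a name-to-sequence dictionary, the choice to treat a score at or above the threshold as a match, and the toy data are all assumptions for illustration.

```python
def pssm_search(sequences, pssm, threshold):
    """Return {name: [(position, score), ...]} for sequences with at least one match."""
    m = len(next(iter(pssm.values())))
    matches = {}
    for name, seq in sequences.items():
        found = []
        for start in range(len(seq) - m + 1):
            score = sum(pssm[ch][i] for i, ch in enumerate(seq[start:start + m]))
            if score >= threshold:  # assumption: "above threshold" includes equality
                found.append((start, score))
        if found:
            matches[name] = found
    return matches

# Hypothetical PSSM favoring ACG and hypothetical sequences to search.
pssm = {
    "A": [2.0, -1.0, -1.0],
    "C": [-1.0, 2.0, -1.0],
    "G": [-1.0, -1.0, 2.0],
    "T": [-1.0, -1.0, -1.0],
}
sequences = {"seq1": "TTACGT", "seq2": "TTTTTT", "seq3": "ACGACG"}
result = pssm_search(sequences, pssm, threshold=5.0)
```

The keys of `result` are the sequences that match, and the values are the positions and scores of their matches.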
17
Block Diagram for Searching for sequences related to a family with a PSSM
Stage 1, PSSM builder. Inputs: Set of Aligned Sequence Features; Expected frequencies of each sequence element. Output: PSSM
Stage 2, PSSM search. Inputs: PSSM; Set of Sequences to search; Threshold. Outputs: Sequences that match above threshold; Positions and scores of matches
18
Consensus sequences vs. PSSMs
Should I use a consensus sequence or a frequency matrix to describe my site?
If all allowed characters at a given position are equally "good", use IUB codes to create a consensus sequence
  Example: Restriction enzyme recognition sites
If some allowed characters are "better" than others, use a PSSM
  Example: Promoter sequences
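For the consensus-sequence case, a minimal Python sketch can expand IUB codes into a regular expression and report every match position. The helper name `find_sites` and the test sequence are hypothetical. The consensus `GTYRAC` is used as an example of a restriction-site pattern with two degenerate positions.

```python
import re

# Standard IUB/IUPAC nucleotide codes expanded to regular-expression character classes.
IUB_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def find_sites(sequence, consensus):
    """Return the start index of every (possibly overlapping) consensus match."""
    pattern = "".join(IUB_CODES[code] for code in consensus)
    # A lookahead lets overlapping occurrences be reported as well.
    return [match.start() for match in re.finditer(f"(?={pattern})", sequence)]

# Hypothetical sequence invented for illustration.
positions = find_sites("AAGTCGACTTGTTAACGG", "GTYRAC")
```

Every character allowed at a degenerate position counts equally here, which is exactly the quantitative information a PSSM would keep.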
19
Consensus sequences vs. frequency matrices
Advantages of consensus sequences: smaller description, quicker comparison
Disadvantage: lose quantitative information on preferences at certain locations
20
Reading for next class
Jones/Pevzner Ch 6 through section 6.9 (p. 185)
Read paper by Needleman and Wunsch on web site
(recommended) Durbin et al., pp. 17-32