Inferring function by homology


1 Inferring function by homology
Functionally important aspects of sequences are conserved across evolutionary time. This allows us to find, by homology searching, the genes in one species that are equivalent to those known to be important in other model species.
Logic: if two sequences are similar along their linear alignment, we can infer that their 3-dimensional structures are similar; if the 3-D structures are similar, there is a good chance that the functions are similar.

2 BASIC LOCAL ALIGNMENT SEARCH TOOL (BLAST)
BLAST programs (there are several) compare a query sequence to all the sequences in a database in a pairwise manner.
BLAST breaks the query and database sequences into fragments known as "words" and seeks matches between them.
It looks for query words of length "W" that align to words in the database with a score of at least a threshold value, "T". These word hits act as seeds.
Each seed is then extended in both directions to form a High-Scoring Segment Pair (HSP); HSPs whose score exceeds a second threshold, "S", are reported. The best-scoring segment pair between two sequences is known as the Maximal-Scoring Segment Pair (MSP).
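The seed-and-extend idea can be sketched in a few lines of Python. This is a simplified illustration, not the real BLAST code: it assumes exact word matches only (protein BLAST also accepts similar words that score at least T), a flat match/mismatch score instead of a substitution matrix, and no gaps. The names find_seeds, extend_seed and x_drop are made up for this sketch.

```python
from collections import defaultdict

def find_seeds(query, subject, w):
    """Return (query_pos, subject_pos) for every exact length-w word match."""
    subject_words = defaultdict(list)
    for s in range(len(subject) - w + 1):
        subject_words[subject[s:s + w]].append(s)
    seeds = []
    for q in range(len(query) - w + 1):
        for s in subject_words.get(query[q:q + w], []):
            seeds.append((q, s))
    return seeds

def extend_seed(query, subject, q, s, w, match=1, mismatch=-1, x_drop=2):
    """Extend a seed without gaps in both directions.

    Returns (best_score, query_start, query_end), with query_end exclusive.
    Extension stops once the running score falls x_drop below the best score.
    """
    best = w * match
    q_start, q_end = q, q + w

    # Extend to the right of the seed.
    running, i, j = best, q + w, s + w
    while i < len(query) and j < len(subject):
        running += match if query[i] == subject[j] else mismatch
        i, j = i + 1, j + 1
        if running > best:
            best, q_end = running, i
        elif best - running >= x_drop:
            break

    # Extend to the left of the seed, starting from the best score so far.
    running, i, j = best, q - 1, s - 1
    while i >= 0 and j >= 0:
        running += match if query[i] == subject[j] else mismatch
        if running > best:
            best, q_start = running, i
        elif best - running >= x_drop:
            break
        i, j = i - 1, j - 1
    return best, q_start, q_end

query, subject = "GARFIELDTHECAT", "GARFIELDTHERAT"
seeds = find_seeds(query, subject, w=3)
best_hit = max(extend_seed(query, subject, q, s, w=3) for q, s in seeds)
print(len(seeds), best_hit)
```

With W = 3 the two GARFIELD sequences share nine seed words, and every seed extends to the same full-length segment pair (13 matches, 1 mismatch).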

3 Two-sequence alignment
To align GARFIELDTHECAT with GARFIELDTHERAT is easy:
  GARFIELDTHECAT
  ||||||||||| ||
  GARFIELDTHERAT

4 Gaps
Sometimes you can get a better overall alignment if you insert gaps:
  GARFIELDTHECAT
  ||||||||   |||
  GARFIELDA--CAT
is better (scores higher) than
  GARFIELDTHECAT
  ||||||||
  GARFIELDACAT

5 No gap penalty
But there has to be some sort of gap penalty, otherwise you can align ANY two sequences:
  G-R--E------AT
  | |  |      ||
  GARFIELDTHECAT

6 Affine gap penalty
Could set a single score for each indel.
Usually use an affine penalty (open + extend).
Example: open -10, extend -0.05.
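A small Python sketch shows how an affine penalty changes the score of an alignment that has already been written out with "-" for gaps. The match (+5) and mismatch (-4) values are assumptions chosen for illustration; only the open and extend values come from the slide. The sketch also assumes the convention that the first gap position costs the open penalty and each further position costs the extend penalty; some programs charge open plus extend for the first position instead.

```python
def score_alignment(top, bottom, match=5.0, mismatch=-4.0,
                    gap_open=-10.0, gap_extend=-0.05):
    """Score a pre-aligned pair of strings using an affine gap penalty."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    score = 0.0
    previous_gap = None  # which sequence the previous column's gap was in
    for a, b in zip(top, bottom):
        if a == "-" and b == "-":
            raise ValueError("a column cannot be a gap in both sequences")
        gap_side = "top" if a == "-" else "bottom" if b == "-" else None
        if gap_side is None:
            score += match if a == b else mismatch
        elif gap_side == previous_gap:
            score += gap_extend   # continuing an existing gap
        else:
            score += gap_open     # starting a new gap
        previous_gap = gap_side
    return score

# One two-residue gap: 11 matches, 1 mismatch, 1 open, 1 extend.
sensible = score_alignment("GARFIELDTHECAT", "GARFIELDA--CAT")

# Three gaps scattered to force five matches.
forced = score_alignment("G-R--E------AT", "GARFIELDTHECAT")

# The same forced alignment with gaps costing nothing.
forced_free = score_alignment("G-R--E------AT", "GARFIELDTHECAT",
                              gap_open=0.0, gap_extend=0.0)
print(sensible, forced, forced_free)
```

With free gaps the forced alignment still earns a positive score, which is why any two sequences can be "aligned"; once each gap opening is charged, it drops below zero while the sensible alignment stays well above it.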

7 2+ similar sequences
When doing a similarity search against a database, you are trying to decide which of many sequences is the CLOSEST match to your search sequence.
Which of the following alignment pairs is better?

8 Scoring Alignments
  GARFIELDTHECAT
  ||||   |||||||
  GARFRIEDTHECAT

  GARFIELDTHECAT
  ||| |||  |||||
  GARWIELESHECAT

  GARFIELDTHECAT
  ||  ||||||| ||
  GAVGIELDTHEMAT
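Counting identical positions is the simplest possible score. The short Python check below (count_identities is a name made up for this sketch) shows why that is not enough to rank these three candidates.

```python
def count_identities(a, b):
    """Count positions at which two equal-length sequences are identical."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x == y for x, y in zip(a, b))

query = "GARFIELDTHECAT"
candidates = ["GARFRIEDTHECAT", "GARWIELESHECAT", "GAVGIELDTHEMAT"]
identities = {c: count_identities(query, c) for c in candidates}
for candidate, n in identities.items():
    print(candidate, f"{n}/{len(query)}")
```

All three candidates match the query at 11 of 14 positions, so identity alone cannot separate them. Ranking them needs a score that depends on which residues were substituted.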

9 Willie Taylor’s AA Venn Diagram

10 Substitution matrices
#BLOSUM 90
     A  R  N  D  C  Q  E  G  H  I  L
  A  5 -2 -2 -3 -1 -1 -1  0 -2 -2 -2
  R -2  6 -1 -3 -5  1 -1 -3  0 -4 -3
  N -2 -1  7  1 -4  0 -1 -1  0 -4 -4
  D -3 -3  1  7 -5 -1  1 -2 -2 -5 -5
  C -1 -5 -4 -5  9 -4 -6 -4 -5 -2 -2
  Q -1  1  0 -1 -4  7  2 -3  1 -4 -3
  E -1 -1 -1  1 -6  2  6 -3 -1 -4 -4
  G  0 -3 -1 -2 -4 -3 -3  6 -3 -5 -5
  H -2  0  0 -2 -5  1 -1 -3  8 -4 -4
  I -2 -4 -4 -5 -2 -4 -4 -5 -4  5  1
  L -2 -3 -4 -5 -2 -3 -4 -5 -4  1  5
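As a sketch of how such a matrix is used, the Python below parses the 11-residue excerpt shown on this slide and sums the scores column by column for an ungapped alignment. The test words (LEADER and its variants) are invented for illustration and use only residues present in the excerpt; parse_matrix and matrix_score are hypothetical helper names.

```python
MATRIX_TEXT = """
   A  R  N  D  C  Q  E  G  H  I  L
A  5 -2 -2 -3 -1 -1 -1  0 -2 -2 -2
R -2  6 -1 -3 -5  1 -1 -3  0 -4 -3
N -2 -1  7  1 -4  0 -1 -1  0 -4 -4
D -3 -3  1  7 -5 -1  1 -2 -2 -5 -5
C -1 -5 -4 -5  9 -4 -6 -4 -5 -2 -2
Q -1  1  0 -1 -4  7  2 -3  1 -4 -3
E -1 -1 -1  1 -6  2  6 -3 -1 -4 -4
G  0 -3 -1 -2 -4 -3 -3  6 -3 -5 -5
H -2  0  0 -2 -5  1 -1 -3  8 -4 -4
I -2 -4 -4 -5 -2 -4 -4 -5 -4  5  1
L -2 -3 -4 -5 -2 -3 -4 -5 -4  1  5
"""

def parse_matrix(text):
    """Turn the whitespace-separated table into a nested dict: m[row][col]."""
    lines = text.strip().splitlines()
    columns = lines[0].split()
    matrix = {}
    for line in lines[1:]:
        parts = line.split()
        matrix[parts[0]] = dict(zip(columns, (int(v) for v in parts[1:])))
    return matrix

def matrix_score(a, b, matrix):
    """Sum substitution scores over an ungapped, equal-length alignment."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(matrix[x][y] for x, y in zip(a, b))

blosum_excerpt = parse_matrix(MATRIX_TEXT)
identical = matrix_score("LEADER", "LEADER", blosum_excerpt)
conservative = matrix_score("LEADER", "IEADER", blosum_excerpt)  # L -> I
radical = matrix_score("LEADER", "CEADER", blosum_excerpt)       # L -> C
print(identical, conservative, radical)
```

Both variants differ from LEADER at one position, yet the leucine-to-isoleucine change scores higher than leucine-to-cysteine, because the matrix rewards substitutions between similar residues.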

11 Low Complexity Masking
Some sequences are similar even if they have no recent common ancestor.
Huntington's disease is caused by poly-CAG tracts in the DNA, which result in polyglutamine (Gln, Q) tracts in the protein.
If you do a homology search with QQQQQQQQQQ, you get hits to other proteins that have a lot of glutamines but a totally different function.
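One way to picture masking is to slide a window along the sequence, measure how varied its composition is, and replace low-variety windows with X so they cannot seed hits. The Python sketch below uses Shannon entropy for this. It is an illustrative stand-in, not the filter that BLAST actually runs, and the window size (12) and entropy cutoff (2.2 bits) are arbitrary assumptions.

```python
import math
from collections import Counter

def shannon_entropy(window):
    """Entropy in bits of the residue composition of a window."""
    total = len(window)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(window).values())

def mask_low_complexity(sequence, window=12, min_entropy=2.2):
    """Replace every residue that lies in a low-entropy window with 'X'."""
    masked = list(sequence)
    for start in range(len(sequence) - window + 1):
        if shannon_entropy(sequence[start:start + window]) < min_entropy:
            for i in range(start, start + window):
                masked[i] = "X"
    return "".join(masked)

# Start of huntingtin as shown in the deck (spaces removed).
huntingtin = ("MATLEKLMKAFESLKSFQQQQQQQQQQQQQQQQQQQQQQQ"
              "PPPPPPPPPPPQLPQPPPQA")
masked = mask_low_complexity(huntingtin)
print(masked)
```

The mixed N-terminal residues survive, while the poly-Q and poly-P runs are turned into X and so can no longer produce matches to unrelated glutamine- or proline-rich proteins.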

12 Two-sequence alignment
Huntingtin: MATLEKLMKA FESLKSFQQQ QQQQQQQQQQ QQQQQQQQQQ PPPPPPPPPP PQLPQPPPQA
hits
>MM16_MOUSE MATRIX METALLOPROTEINASE-16
Score = 34.4 bits (78), Expect = 0.18
Identities = 21/65 (32%), Positives = 25/65 (38%), Gaps = 2/65 (3%):
  FQQQQQQQQQQQQQQQQQQQQQQQPPPPPPPPPPPQLPQPPPQ--AQPLLPQPQPPPPPP
  F Q  +    +      Q  Q+   PP   PPP   LP  PP     P  P+    P PP
  FYQYMETDNFKLPNDDLQGIQKIYGPPDKIPPPTRPLPTVPPHRSVPPADPRRHDRPKPP
But not because it is involved in microtubule-mediated transport: the match comes from the low-complexity Q and P runs.

13 E values
An E-value is the number of hits with a score at least as good as the observed one that you would expect to find by chance; it measures how likely a given hit is to be a chance match.
It depends on the size of the query sequence and of the database.
The lower the E-value, the more confidence you can have that a hit is a true homologue (a sequence related by common descent).
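The dependence on query and database size can be made concrete with the standard relationship between a bit score S' and the expected number of chance hits, E = m x n x 2^(-S'), where m is the query length and n the total database length. The Python sketch below uses that formula with made-up lengths. Real BLAST applies corrected "effective" lengths, so its reported values will differ somewhat.

```python
import math

def expect_value(bit_score, query_length, database_length):
    """Expected number of chance hits scoring at least bit_score."""
    return query_length * database_length * 2.0 ** (-bit_score)

def chance_probability(e_value):
    """Probability of seeing at least one such chance hit: 1 - exp(-E)."""
    return -math.expm1(-e_value)

# Hypothetical sizes, chosen only to show the scaling.
small_db = expect_value(30.0, query_length=100, database_length=1_000_000)
large_db = expect_value(30.0, query_length=100, database_length=2_000_000)
better_hit = expect_value(40.0, query_length=100, database_length=1_000_000)
print(small_db, large_db, better_hit, chance_probability(0.18))
```

Doubling the database doubles E for the same score, and every extra bit of score halves it. For small E the probability of at least one chance hit is close to E itself, which is why the two are often used interchangeably.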

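14 Dotplot theory
Another way of comparing 2 sequences.
Task: align ATGATATTCTT and ATTGTTC
    A T G A T A T T C T T
  A . . . . . . . . . . .
  T . . . . . . . . . . .
  T . . . . . . . . . . .
  G . . . . . . . . . . .
  T . . . . . . . . . . .
  T . . . . . . . . . . .
  C . . . . . . . . . . .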

15
    A T G A T A T T C T T
  A . . . . . . . . . . .
  T . + . . + . + . . + .
  T . . . . . . . . . . .
  G . . . . . . . . . . .
  T . . . . . . . . . . .
  T . . . . . . . . . . .
  C . . . . . . . . . . .
Go along the first seq, inserting a + wherever 2 of the 3 bases in a moving window match. The first seq is compared to ATT (the first 3 bases in the vertical sequence).

16
    A T G A T A T T C T T
  A . . . . . . . . . . .
  T . + . . + . + . . + .
  T . + . . . . . + . . .
  G . . . . . . . . . . .
  T . . . . . . . . . . .
  T . . . . . . . . . . .
  C . . . . . . . . . . .
Then go along the first seq again, inserting a + wherever 2 of the 3 bases in a moving window match. This time the first seq is compared to TTG (the next 3 bases in the vertical sequence).

17
    A T G A T A T T C T T
  A . . . . . . . . . . .
  T . + . . + . + . . + .
  T . + . . . . . + . . .
  G . . + . . . . . + . .
  T . . . + . . . . . + .
  T . . . . . . . + . . .
  C . . . . . . . . . . .
Iterate until

18
    A T G A T A T T C T T
  A
  T   +     +   +     +
  T   +           +
  G     +           +
  T       +           +
  T               +
  C
The human eye is particularly good at picking up structure from the pattern of dots. You might see a hint of a duplicated region in the horizontal sequence that is not so clear from the sequence itself.
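The moving-window procedure from the dotplot slides can be sketched in Python as follows. The function name and its defaults are assumptions for this sketch; each + is placed at the centre of the 3-base window, which is how the grids above are laid out.

```python
def dotplot(horizontal, vertical, window=3, min_matches=2):
    """Return the dotplot as a list of strings, one per vertical base.

    A '+' is written at the centre of every pair of windows that agree
    in at least min_matches of their positions.
    """
    rows = [["."] * len(horizontal) for _ in vertical]
    half = window // 2
    for v in range(len(vertical) - window + 1):
        for h in range(len(horizontal) - window + 1):
            matches = sum(horizontal[h + k] == vertical[v + k]
                          for k in range(window))
            if matches >= min_matches:
                rows[v + half][h + half] = "+"
    return ["".join(row) for row in rows]

horizontal, vertical = "ATGATATTCTT", "ATTGTTC"
grid = dotplot(horizontal, vertical)
print("  " + horizontal)
for base, row in zip(vertical, grid):
    print(base + " " + row)
```

The rows for the ATT, TTG and TTC windows come out exactly as drawn on the slides. Applied strictly, the 2-of-3 rule also marks one extra cell in each of the TGT and GTT rows (where TAT meets TGT and ATT meets GTT) that the grids above leave blank.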
