Published by Lewis Dean. Modified over 9 years ago.
2
BBSI Research Simulation
News
Project proposals
  - Monday, June 16
  - Format (see News, Presentations and other dates)
Renaissance fair and other events
Party at Greg’s house
3
BBSI Research Simulation
PSSMs and Search for Repeats in DNA
Application of PSSMs
  Regulatory protein and their binding sites
  Palindromic DNA and its significance
  How to find protein binding sites: Meme
  PSSMs to find beginning of genes
Repeated sequences and location of protein binding sites
  Li et al (2002)
4
Regulatory Protein and their Binding Sites
GTA..(8)..TAC
5’-GTGAGTTAGCTCACNNNNNNNNNNTANNNTNNNNNNNNNNNNNNNNNNNNNNNNNNNNATGNNNNNNNNNNNNNNNN
3’-CACTCAATCGAGTGNNNNNNNNNNATNNNANNNNNNNNNNNNNNNNNNNNNNNNNNNNTACNNNNNNNNNNNNNNNN
[Figure labels: lacZ, Crp, RNA Polymerase, Operator, C]
Presence of CRP sites: regulation by carbon source
Presence of X sites: regulation by Y
5
Regulatory Protein and their Binding Sites
5’-GTGAGTTAGCTCACNNNNNNNNNNTANNNTNNNNNNNNNNNNNNNNNNNNNNNNNNNNATGNNNNNNNNNNNNNNNN
3’-CACTCAATCGAGTGNNNNNNNNNNATNNNANNNNNNNNNNNNNNNNNNNNNNNNNNNNTACNNNNNNNNNNNNNNNN
6
Regulatory Protein and their Binding Sites
Palindromic sequences
5’-TTAATGTGAGTTAGCTCACTCATT-3’
3’-AATTACACTCAATCGAGTGAGTAA-5’
NNNNNNNNNNNNNNNNNNNNNNNNNNNN
recognizes GTGAGTT
15
Regulatory Protein and their Binding Sites
Palindromic sequences
5’-TTAATGTGAGTTAGCTCACTCATT-3’
3’-AATTACACTCAATCGAGTGAGTAA-5’
NNNNNNNNNNNNNNNNNNNNNNNNNNNN
Palindromes: serve as binding sites for dimeric proteins
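A DNA "palindrome" in this sense is a sequence whose reverse complement matches itself, or whose two arms are reverse complements of each other around a short spacer. The following is a minimal Python sketch of how one might test for that on the sequence from the slide. The function names, the arm length of 5 and the maximum spacer of 6 are illustrative assumptions, not part of the original presentation.

```python
# Translation table for complementing uppercase DNA.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindrome(seq):
    """True if the sequence equals its own reverse complement."""
    return seq == reverse_complement(seq)


def inverted_repeats(seq, arm=5, max_spacer=6):
    """Yield (start, arm_sequence, spacer_length) for every arm that is
    followed, after 0..max_spacer bases, by its reverse complement."""
    for i in range(len(seq) - 2 * arm + 1):
        left = seq[i:i + arm]
        target = reverse_complement(left)
        for spacer in range(max_spacer + 1):
            j = i + arm + spacer
            if seq[j:j + arm] == target:
                yield i, left, spacer


site = "TTAATGTGAGTTAGCTCACTCATT"  # top strand from the slide
hits = list(inverted_repeats(site))
```

On the slide's sequence, the arm GTGAG at index 5 is followed four bases later by its reverse complement CTCAC, so `(5, "GTGAG", 4)` is among the hits. The two half-sites are what a dimeric protein could contact, one monomer per arm.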
16
Regulatory Protein and their Binding Sites
Palindromic sequences
5’-TTAATGTGAGTTAGCTCACTCATT-3’
3’-AATTACACTCAATCGAGTGAGTAA-5’
[Figure: the palindrome drawn as a hairpin structure; the stem pairs, from loop to base, are G-C, A-T, G-C, T-A, G-C.]
DNA: cruciform
RNA: stem/loop (tRNA)
17
Regulatory Protein and their Binding Sites
Palindromic sequences
5’-TTAATGTGAGTTAGCTCACTCATT-3’
3’-AATTACACTCAATCGAGTGAGTAA-5’
[Figure: the palindrome drawn both as a hairpin (stem pairs, loop to base: G-C, A-T, G-C, T-A, G-C) and as a linear duplex.]
Function of palindrome: RNA secondary structure? Binding site for dimeric protein? How to tell?
18
[Figure: in the hairpin, the third stem pair G-C is mutated to A-C.]
19
[Figure: the mismatched A and C are drawn apart.]
20
[Figure: the C is replaced by T.]
21
[Figure: the third stem pair is restored as A-T.]
23
[Figure: in the linear duplex, the top strand is mutated to TTAATGTAAGTTAGCTCACTCATT.]
24
[Figure: the bottom strand changes to match: AATTACATTCAATCGAGTGAGTAA.]
26
Regulatory Protein and their Binding Sites
Palindromic sequences
5’-TTAATGTGAGTTAGCTCACTCATT-3’
3’-AATTACACTCAATCGAGTGAGTAA-5’
[Figure: hairpin with stem pairs, loop to base, G-C, A-T, G-C, T-A, G-C; mutated duplex TTAATGTAAGTTAGCTCACTCATT / AATTACATTCAATCGAGTGAGTAA.]
How to tell?
Compensatory mutations: RNA
Uncorrelated mutations: protein
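The distinction can be made concrete: if the palindrome matters as a stem, a mutation on one side of a pair tends to be matched by a change on the other side that restores pairing. The sketch below labels each stem pair of the slide's 14-base site as unchanged, broken or compensated. The function name, the pair list and the two mutant strings are illustrative assumptions built from the mutation shown on the slides (G to A at the third stem position). Only Watson-Crick pairs are considered, so G-U wobble pairing in RNA is ignored.

```python
# Watson-Crick pairs, written with T as in the slide's DNA sequences.
WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def classify_stem(wild, mutant, pairs):
    """Label each stem pair (i, j) as 'unchanged', 'compensated' or 'broken'."""
    labels = {}
    for i, j in pairs:
        changed = wild[i] != mutant[i] or wild[j] != mutant[j]
        still_pairs = (mutant[i], mutant[j]) in WATSON_CRICK
        if not changed:
            labels[(i, j)] = "unchanged"
        elif still_pairs:
            labels[(i, j)] = "compensated"
        else:
            labels[(i, j)] = "broken"
    return labels


wild = "GTGAGTTAGCTCAC"                      # GTGAG + loop TTAG + CTCAC
stem = [(0, 13), (1, 12), (2, 11), (3, 10), (4, 9)]
single = "GTAAGTTAGCTCAC"                    # G -> A at index 2 only
double = "GTAAGTTAGCTTAC"                    # plus C -> T at index 11

single_labels = classify_stem(wild, single, stem)
double_labels = classify_stem(wild, double, stem)
```

Here `single_labels[(2, 11)]` is `"broken"` and `double_labels[(2, 11)]` is `"compensated"`. Across related sequences, many compensated pairs would point toward RNA structure, while changes in one half-site with no matching change in the other would point toward a protein binding site.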
27
Regulatory Protein and their Binding Sites
How to find them?
Count all in certain class (Li et al, 2000)
Guess a pattern and improve (Meme, Gibbs sampler)
Human sequences 5’ to transcriptional start:
  snRNA U1 (pU1-6)    AGGTATATGGAGCTGTGACAGGGCAGAAGTGTGTGAAGTC
  histone H1t         GCCCTACCCTATATAAGGCCCCGAGGCCGCCCGGGTGTTT
  HMG-14              CGGCCGGCGGGGAGGGGGAGCCCGCGGCCGGGGACGCGGG
  TP1                 GCCAAGGCCTTAAATACCCAGACTCCTGCCCCCGGGCCTT
  protamine P1        CCCTGGCATCTATAACAGGCCGCAGAGCTGGCCCCTGACT
  nucleolin           GCAGGCTCAGTCTTTCGCCTCAGTCTCGAGCTCTCGCTGG
  snRNP E             TGCCGCCGCGTGACCTTCACACTTCCGCTTCCGGTTCTTT
  rp S14              GACACGGAAGTGACCCCCGTCGCTCCGCCCTCTCCCACTC
  rp S17              TGGCCTAAGCTTTAACAGGCTTCGCCTGTGCTTCCTGTTT
  ribosomal p. S19    ACCCTACGCCCGACTTGTGCGCCCGGGAAACCCCGTCGTT
  α-tubulin bα1       GGTCTGGGCGTCCCGGCTGGGCCCCGTGTCTGTGCGCACG
  β-tubulin β2        GGGAGGGTATATAAGCGTTGGCGGACGGTCGGTTGTAGCA
  α-actin skel-m.     CCGCGGGCTATATAAAACCTGAGCAGAGGGACAAGCGGCC
  α-cardiac actin     TCAGCGTTCTATAAAGCGGCCCTCCTGGAGCCAGCCACCC
  β-actin             CGCGGCGGCGCCCTATAAAACCCAGCGGCGCGACGCGCCA
28
Regulatory Protein and their Binding Sites
How Meme finds them
Step 1. Arbitrarily choose candidate pattern from a sequence
Step 2. Find best matches to pattern in all sequences
Step 3. Construct position-dependent frequency table based on matches
Step 4. Calculate relative probability of matches from frequency table
Best matches (one per sequence):
  ACAGGGCAGAA
  CCCGGGTGTTT
  CCGGGGACGCG
  CCCCCGGGCCT
  CCGCAGAGCTG
Human sequences 5’ to transcriptional start:
  snRNA U1 (pU1-6)    AGGTATATGGAGCTGTGACAGGGCAGAAGTGTGTGAAGTC
  histone H1t         GCCCTACCCTATATAAGGCCCCGAGGCCGCCCGGGTGTTT
  HMG-14              CGGCCGGCGGGGAGGGGGAGCCCGCGGCCGGGGACGCGGG
  TP1                 GCCAAGGCCTTAAATACCCAGACTCCTGCCCCCGGGCCTT
  protamine P1        CCCTGGCATCTATAACAGGCCGCAGAGCTGGCCCCTGACT
31
Regulatory Protein and their Binding Sites
How Meme finds them
How do pattern finders work?
  snRNA U1 (pU1-6)    AGGTATATGGAGCTGTGACAGGGCAGAAGTGTGTGAAGTC
  histone H1t         GCCCTACCCTATATAAGGCCCCGAGGCCGCCCGGGTGTTT
  HMG-14              CGGCCGGCGGGGAGGGGGAGCCCGCGGCCGGGGACGCGGG
  TP1                 GCCAAGGCCTTAAATACCCAGACTCCTGCCCCCGGGCCTT
  protamine P1        CCCTGGCATCTATAACAGGCCGCAGAGCTGGCCCCTGACT
Step 1. Arbitrarily choose candidate pattern from a sequence
Step 2. Find best matches to pattern in all sequences
Step 3. Construct position-dependent frequency table based on matches
Step 4. Calculate relative probability of matches from frequency table
Step 5. Move around to find local maximum
Step 6. If probability score high, remember pattern and score
Step 7. Repeat Steps 1 - 5
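The seven steps can be sketched as a small greedy search in Python. This follows the slide's simplified description only. It is not Meme's actual implementation, which is based on expectation maximization. The function names, the pseudocount of 1, the uniform background of 0.25, the number of seeds and the fixed random seed are all assumptions made for illustration.

```python
import math
import random

BASES = "ACGT"
SEQS = [  # the five upstream sequences shown on the slide
    "AGGTATATGGAGCTGTGACAGGGCAGAAGTGTGTGAAGTC",
    "GCCCTACCCTATATAAGGCCCCGAGGCCGCCCGGGTGTTT",
    "CGGCCGGCGGGGAGGGGGAGCCCGCGGCCGGGGACGCGGG",
    "GCCAAGGCCTTAAATACCCAGACTCCTGCCCCCGGGCCTT",
    "CCCTGGCATCTATAACAGGCCGCAGAGCTGGCCCCTGACT",
]


def best_match(seq, width, score):
    """Start index of the highest-scoring window of the given width."""
    return max(range(len(seq) - width + 1), key=lambda i: score(seq[i:i + width]))


def frequency_table(sites, pseudocount=1.0):
    """Step 3: position-dependent base frequencies, with pseudocounts."""
    table = []
    for pos in range(len(sites[0])):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        table.append({b: counts[b] / total for b in BASES})
    return table


def log_odds(site, table, background=0.25):
    """Step 4: log probability of a site under the table relative to background."""
    return sum(math.log(table[i][b] / background) for i, b in enumerate(site))


def refine(seqs, seed, max_rounds=20):
    """Steps 2-5: start from a seed and iterate until the matches stop moving."""
    width = len(seed)
    identity = lambda w: sum(a == b for a, b in zip(w, seed))
    starts = [best_match(s, width, identity) for s in seqs]          # Step 2
    for _ in range(max_rounds):                                      # Step 5
        sites = [s[i:i + width] for s, i in zip(seqs, starts)]
        table = frequency_table(sites)
        new_starts = [best_match(s, width, lambda w: log_odds(w, table)) for s in seqs]
        if new_starts == starts:
            break
        starts = new_starts
    sites = [s[i:i + width] for s, i in zip(seqs, starts)]
    table = frequency_table(sites)
    return sum(log_odds(site, table) for site in sites), sites


def find_motif(seqs, width, n_seeds=50, rng=None):
    """Steps 1, 6 and 7: try many random seeds and remember the best result."""
    rng = rng or random.Random(0)
    best = None
    for _ in range(n_seeds):
        source = rng.choice(seqs)                                    # Step 1
        start = rng.randrange(len(source) - width + 1)
        result = refine(seqs, source[start:start + width])
        if best is None or result[0] > best[0]:                      # Step 6
            best = result
    return best


total_score, motif_sites = find_motif(SEQS, 11)
```

`motif_sites` holds one 11-base window per sequence and `total_score` is their summed log-odds. Each refinement only reaches a local maximum, which is why the search is restarted from many seeds.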
32
Regulatory Protein and their Binding Sites
How Meme finds them
You’ve found a gene related to Purple Tongue Syndrome
BlastP: Encoded protein related to cAMP-binding proteins
Are the similarities trivial? Related to cAMP binding?
Does your protein contain cAMP-binding site?
What IS a cAMP-binding site?
Task
1. Determine what is a cAMP-binding site
2. Determine if your protein has one
33
Regulatory Protein and their Binding Sites
How Meme finds them
Strategy
1. Collect sequences of known cAMP-binding proteins
2. Run Meme, a pattern-finding program. Ask it to find any significant motifs
3. Rerun Meme. Demand that every protein has identified motifs
4. Run Pfam over known sequence to check
Do it
34
PSSMs in action
Identification of beginning of gene
Experimentally proven start sites
  aceB  ACTATGGAGCATCTGCACATGAAAACC
  atpI  ACCTCGAAGGGAGCAGGAGTGAAAAAC
  bioB  ACGTTTTGGAGAAGCCCCATGGCTCAC
  glnA  ATCCAGGAGAGTTAAAGTATGTCCGCT
  glnH  TAGAAAAAAGGAAATGCTATGAAGTCT
  lacZ  TTCACACAGGAAACAGCTATGACCATG
  rpsJ  AATTGGAGCTCTGGTCTCATGCAGAAC
  serC  GCAACGTGGTGAGGGGAAATGGCTCAA
  sucA  GATGCTTAAGGGATCACGATGCAGAAC
  trpE  CAAAATTAGAGAATAACAATGCAAACA
unknown
36
PSSMs in action
Identification of beginning of gene
  aceB  ACCACATAACTATGGAGCATCTGCACATGAAAACC
  atpI  ACCTCGAAGGGAGCAG.....GAGTGAAAAAC
  bioB  ACGTTTTGGAGAAGC...CCCATGGCTCAC
  glnA  ATCCAGGAGAGTTA.AAGTATGTCCGCT
  glnH  TAGAAAAAAGGAAATG.....CTATGAAGTCT
  lacZ  TTCACACAGGAAACAG....CTATGACCATG
  rpsJ  AATTGGAGCTCTGGTCTCATGCAGAAC
  serC  GCAACGTGGTGAGGG...GAAATGGCTCAA
  sucA  GATGCTTAAGGGATCA....CGATGCAGAAC
  trpE  CAAAATTAGAGAATA...ACAATGCAAACA
[Figure labels: A, C, G, T (twice)]
37
PSSMs in action
Identification of beginning of gene
  aceB  ACCACATAACTATGGAGCATCT.GCACATGAAAACC
  atpI  ACCTCGAAGGGAGCAG.....GAGTGAAAAAC
  bioB  ACGTTTTGGAGAAGC...CCCATGGCTCAC
  glnA  ATCCAGGAGAGTTA.AAGTATGTCCGCT
  glnH  TAGAAAAAAGGAAATG.....CTATGAAGTCT
  lacZ  TTCACACAGGAAACAG....CTATGACCATG
  rpsJ  AATTGGAGCTCTGGTCTCATGCAGAAC
  serC  GCAACGTGGTGAGGG...GAAATGGCTCAA
  sucA  GATGCTTAAGGGATCA....CGATGCAGAAC
  trpE  CAAAATTAGAGAATA...ACAATGCAAACA
[Figure labels: A, C, G, T (twice)]
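A position-specific scoring matrix built from proven start sites can then be slid along an uncharacterized region to look for a likely gene start. The Python sketch below uses the ten ungapped 27-base sequences from the earlier slide, which share a start codon at indices 18 to 20, rather than the gapped alignment above. The pseudocount of 1, the uniform background of 0.25 and all function names are assumptions made for illustration.

```python
import math

BASES = "ACGT"

# Ungapped sequences from the slide; the start codon sits at indices 18-20.
START_SITES = {
    "aceB": "ACTATGGAGCATCTGCACATGAAAACC",
    "atpI": "ACCTCGAAGGGAGCAGGAGTGAAAAAC",
    "bioB": "ACGTTTTGGAGAAGCCCCATGGCTCAC",
    "glnA": "ATCCAGGAGAGTTAAAGTATGTCCGCT",
    "glnH": "TAGAAAAAAGGAAATGCTATGAAGTCT",
    "lacZ": "TTCACACAGGAAACAGCTATGACCATG",
    "rpsJ": "AATTGGAGCTCTGGTCTCATGCAGAAC",
    "serC": "GCAACGTGGTGAGGGGAAATGGCTCAA",
    "sucA": "GATGCTTAAGGGATCACGATGCAGAAC",
    "trpE": "CAAAATTAGAGAATAACAATGCAAACA",
}


def build_pssm(sites, pseudocount=1.0, background=0.25):
    """Log2-odds score for each base at each position of the aligned sites."""
    pssm = []
    for pos in range(len(sites[0])):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pssm.append({b: math.log2(counts[b] / total / background) for b in BASES})
    return pssm


def score(window, pssm):
    """Sum of per-position scores for a window as wide as the PSSM."""
    return sum(column[base] for column, base in zip(pssm, window))


def scan(seq, pssm):
    """Score every window of the PSSM's width; return (offset, score) pairs."""
    width = len(pssm)
    return [(i, score(seq[i:i + width], pssm)) for i in range(len(seq) - width + 1)]


pssm = build_pssm(list(START_SITES.values()))
consensus = "".join(max(column, key=column.get) for column in pssm)
```

The highest-scoring offset returned by `scan` would be the candidate start. With only ten training sequences the scores are noisy, which is one reason the slides realign the sequences with gaps before building the matrix.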
38
PSSMs in action Algorithm to find binding sites (Li et al)
39
Li et al (2002) Algorithm
Calculation of probability by Poisson equation
Dimer occurred n times. How likely is that?
Frequency of GTGAGTT = $f_1$
Frequency of AACTCAC = $f_2$
How likely is it to find GTGAGTTAACTCAC?
Frequency of joint occurrence = $f_1 \cdot f_2 = f_{12}$
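As a rough illustration of this step, the frequencies of the two half-sites can be estimated by counting windows in a sequence, and their product gives the joint frequency expected if the halves occurred independently. The Python sketch below is only that. The function names, the use of overlapping windows and the toy sequence are assumptions, and it is not presented as the procedure used by Li et al.

```python
def kmer_frequency(genome, kmer):
    """Fraction of windows of len(kmer) in genome that equal kmer."""
    k = len(kmer)
    windows = len(genome) - k + 1
    hits = sum(genome[i:i + k] == kmer for i in range(windows))
    return hits / windows


def expected_dimer_count(genome, left, right, spacer=0):
    """Expected occurrences m = f12 * N, assuming the halves are independent."""
    f1 = kmer_frequency(genome, left)
    f2 = kmer_frequency(genome, right)
    f12 = f1 * f2                      # joint frequency under independence
    n_positions = len(genome) - (len(left) + spacer + len(right)) + 1
    return f12 * n_positions


toy = "GTGAGTTAACTCAC" * 3             # hypothetical sequence, not real data
f1 = kmer_frequency(toy, "GTGAGTT")
f2 = kmer_frequency(toy, "AACTCAC")
m = expected_dimer_count(toy, "GTGAGTT", "AACTCAC")
```

If the dimer is observed far more often than `m`, the two halves are not occurring independently, which is the signal the following slides quantify.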
40
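Li et al (2002) Algorithm
Calculation of probability by Poisson equation
Probability of n occurrences of dimer =
$$ {}_{N}C_{n} \cdot \underbrace{f_{12} \cdot f_{12} \cdot f_{12} \cdots}_{n \text{ times}} \cdot \underbrace{(1-f_{12}) \cdot (1-f_{12}) \cdot (1-f_{12}) \cdots}_{N-n \text{ times}}, \qquad {}_{N}C_{n} = \frac{N!}{n!\,(N-n)!} $$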
41
Li et al (2002) Algorithm
Calculation of probability by Poisson equation
Probability of n occurrences of dimer =
$$ \frac{N!}{n!\,(N-n)!} \cdot (f_{12})^{n} \cdot (1-f_{12})^{N-n} $$
Expected number = $m = f_{12} \cdot N$, so $f_{12} = m/N$
$$ = \frac{N!}{n!\,(N-n)!} \cdot \left(\frac{m}{N}\right)^{n} \cdot \left(1-\frac{m}{N}\right)^{N-n} $$
42
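Li et al (2002) Algorithm
Calculation of probability by Poisson equation
Probability of n occurrences of dimer =
$$ \frac{N!}{n!\,(N-n)!} \cdot \left(\frac{m}{N}\right)^{n} \cdot \left(1-\frac{m}{N}\right)^{N-n} $$
$$ = \frac{N!}{n!\,(N-n)!} \cdot \frac{m^{n} \cdot (1-m/N)^{N}}{N^{n} \cdot (1-m/N)^{n}} $$
$$ \approx \frac{N!}{n!\,(N-n)!} \cdot \frac{m^{n} \cdot (1-m/N)^{N}}{N^{n} \cdot (1)^{n}} $$
$$ \approx \frac{N!}{n!\,(N-n)!} \cdot \frac{m^{n} \cdot e^{-m}}{N^{n} \cdot (1)^{n}} $$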
43
Li et al (2002) Algorithm
Calculation of probability by Poisson equation
Probability of n occurrences of dimer =
$$ \frac{N!}{n!\,(N-n)!} \cdot \frac{m^{n} \cdot e^{-m}}{N^{n} \cdot (1)^{n}} $$
$$ = \frac{N \cdot (N-1) \cdot (N-2) \cdots (N-n+1)}{N^{n} \cdot (1)^{n}} \cdot \frac{m^{n} \cdot e^{-m}}{n!} $$
$$ \approx \frac{m^{n} \cdot e^{-m}}{n!} $$
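The approximation can be checked numerically by comparing the exact binomial probability with the Poisson form when N is much larger than m and n. The Python sketch below does that. The values N = 1,000,000 and f12 = 2e-6 are arbitrary illustrative choices, and the tail function (probability of at least n occurrences) is an added convenience that the slides do not show.

```python
import math


def binomial_pmf(n, N, f):
    """Exact probability of n occurrences in N positions, each with frequency f."""
    return math.comb(N, n) * f ** n * (1 - f) ** (N - n)


def poisson_pmf(n, m):
    """Poisson approximation with expected number m = f * N."""
    return m ** n * math.exp(-m) / math.factorial(n)


def poisson_tail(n, m):
    """Probability of seeing at least n occurrences when m are expected."""
    return 1.0 - sum(poisson_pmf(k, m) for k in range(n))


N = 1_000_000          # illustrative number of positions
f12 = 2e-6             # illustrative joint frequency
m = f12 * N            # expected number of occurrences (2.0)

exact = binomial_pmf(3, N, f12)
approx = poisson_pmf(3, m)
```

For these values the two probabilities agree closely, since both approximations in the derivation (dropping the factor with exponent n, and replacing the falling factorial by a power of N) have relative errors that shrink as N grows. A very small `poisson_tail(n, m)` for the observed count n would suggest the dimer occurs more often than chance alone predicts.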