Introduction to Bioinformatics Tuesday, 19 March

1 Introduction to Bioinformatics Tuesday, 19 March

2 Are genes encoding proteins with all the universal motifs of cytosine methyltransferases commonly found in phages?

3 Define motifs (known proteins)
Are genes encoding proteins with all the universal motifs of cytosine methyltransferases commonly found in phages?
Define motifs (known proteins)
Find motifs (unknown proteins)

4 Motifs – not only for proteins!
Position-specific scoring matrices (PSSMs)

5 Motifs – not only for proteins!
Nature of Regulatory Sites

6 Motifs – not only for proteins!
Nature of Regulatory Sites
[Figure: known sites, sequence filter]

7 Motifs – not only for proteins!
Nature of Regulatory Sites
[Figure: genomic sequence, sequence filter, predicted sites, unknown sites]

8 Nature of Sequence Filters
Hidden Markov model-based methods
Ad hoc methods
Position-specific scoring matrix (PSSM) = position-specific frequency table = weight table
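A minimal Python sketch of a weight table used as a sequence filter: each candidate window is scored by summing the weight of each base at its position, and windows at or above a threshold are reported as predicted sites. The 4-column table, the test sequence, and the threshold are hypothetical values chosen for illustration; they do not come from the slides.

```python
# Hypothetical 4-position weight table (values invented for illustration).
# WEIGHTS[base][i] is the score for seeing `base` at motif position i.
WEIGHTS = {
    "A": [1.0, -1.0, 1.5, -0.5],
    "C": [-1.0, -1.0, -1.0, -1.0],
    "G": [-1.0, -1.0, -1.0, 1.0],
    "T": [-0.5, 2.0, -1.0, -1.0],
}

def score_site(site, weights):
    """Score a candidate site as the sum of its per-position weights."""
    return sum(weights[base][i] for i, base in enumerate(site))

def sequence_filter(sequence, weights, threshold):
    """Report (start, score) for every window scoring at or above threshold."""
    width = len(weights["A"])
    hits = []
    for start in range(len(sequence) - width + 1):
        s = score_site(sequence[start:start + width], weights)
        if s >= threshold:
            hits.append((start, s))
    return hits

print(sequence_filter("CCATAGGCTAAGTT", WEIGHTS, threshold=3.0))  # → [(2, 5.5)]
```

The threshold is the filter's only tuning knob: lowering it recovers more true sites but also passes more false positives.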

9 Some of 106 aligned human promoter sequences (near -26)
Making a PSSM
CCCTATATAAGGC...  histone H1t
CGCTATAAAAACT...  HMG-17
GGGTATATAAGCG...  β-tubulin β2
GGCTATATAAAAC...  α-actin skel-m.
TTCTATAAAGCGG...  α-cardiac actin
CCCTATAAAACCC...  β-actin
GAGTATAAAGCAC...  keratin I 50K
GGTTATAAAAACA...  vimentin
CAGTATAAAAGGG...  α1(I) collagen
CCGTATAAATAGG...  α2(I) collagen
TCCCATATAAGCC...  fibronectin
Consensus: TATAAA
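A minimal sketch of turning the aligned sites above into a PSSM, using only the 11 sequences visible on the slide (not all 106). The slides do not give a scoring formula; this sketch assumes log2-odds against a uniform background (0.25 per base) with a pseudocount of 0.5 per base. The function names `build_pssm` and `consensus` are hypothetical.

```python
import math

BASES = "ACGT"

# The 11 aligned sites shown on the slide (first 13 nt of each).
SITES = [
    "CCCTATATAAGGC", "CGCTATAAAAACT", "GGGTATATAAGCG", "GGCTATATAAAAC",
    "TTCTATAAAGCGG", "CCCTATAAAACCC", "GAGTATAAAGCAC", "GGTTATAAAAACA",
    "CAGTATAAAAGGG", "CCGTATAAATAGG", "TCCCATATAAGCC",
]

def build_pssm(sites, pseudocount=0.5, background=0.25):
    """Count bases per column, add pseudocounts, convert to log2-odds."""
    pssm = []
    for col in range(len(sites[0])):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[col]] += 1
        total = sum(counts.values())
        # Positive score: base is enriched at this column relative to background.
        pssm.append({b: math.log2(counts[b] / total / background) for b in BASES})
    return pssm

def consensus(pssm):
    """Highest-scoring base in each column."""
    return "".join(max(col, key=col.get) for col in pssm)

pssm = build_pssm(SITES)
print(consensus(pssm)[3:9])  # core columns → TATAAA
```

Pseudocounts keep bases never seen in the training set from getting a score of minus infinity, which matters when only a handful of sites is available.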

10 Some of 106 aligned human promoter sequences (near -26)

11 Where to get a training set?
Making a PSSM
Experimentally proven regulatory sites
Orthologs of genes in different organisms:
  not too distant (the binding sites may have diverged)
  not too close (the sites are hidden amid the overall sequence similarity)
Experimentally indicated coregulated genes
Suspected coregulated genes

12 Experimentally proven start sites
Using a PSSM
atpI  ACCTCGAAGGGAGCAGGAGTGAAAAAC
bioB  ACGTTTTGGAGAAGCCCCATGGCTCAC
glnA  ATCCAGGAGAGTTAAAGTATGTCCGCT
glnH  TAGAAAAAAGGAAATGCTATGAAGTCT
lacZ  TTCACACAGGAAACAGCTATGACCATG
rpsJ  AATTGGAGCTCTGGTCTCATGCAGAAC
serC  GCAACGTGGTGAGGGGAAATGGCTCAA
sucA  GATGCTTAAGGGATCACGATGCAGAAC
trpE  CAAAATTAGAGAATAACAATGCAAACA

13 Experimentally proven start sites
Using a PSSM
aceB  ACTATGGAGCATCTGCACATGAAAACC  (unknown start site)
atpI  ACCTCGAAGGGAGCAGGAGTGAAAAAC
bioB  ACGTTTTGGAGAAGCCCCATGGCTCAC
glnA  ATCCAGGAGAGTTAAAGTATGTCCGCT
glnH  TAGAAAAAAGGAAATGCTATGAAGTCT
lacZ  TTCACACAGGAAACAGCTATGACCATG
rpsJ  AATTGGAGCTCTGGTCTCATGCAGAAC
serC  GCAACGTGGTGAGGGGAAATGGCTCAA
sucA  GATGCTTAAGGGATCACGATGCAGAAC
trpE  CAAAATTAGAGAATAACAATGCAAACA

14 Experimentally proven start sites

15 Using a PSSM
atpI  ACCTCGAAGGGAGCAG.....GAGTGAAAAAC
bioB  ACGTTTTGGAGAAGC...CCCATGGCTCAC
glnA  ATCCAGGAGAGTTA.AAGTATGTCCGCT
glnH  TAGAAAAAAGGAAATG.....CTATGAAGTCT
lacZ  TTCACACAGGAAACAG....CTATGACCATG
rpsJ  AATTGGAGCTCTGGTCTCATGCAGAAC
serC  GCAACGTGGTGAGGG...GAAATGGCTCAA
sucA  GATGCTTAAGGGATCA....CGATGCAGAAC
trpE  CAAAATTAGAGAATA...ACAATGCAAACA
[Figure: scoring matrix labeled A, C, G, T]

16 Using a PSSM
aceB  ACCACATAACTATGGAGCATCTGCACATGAAAACC
atpI  ACCTCGAAGGGAGCAG.....GAGTGAAAAAC
bioB  ACGTTTTGGAGAAGC...CCCATGGCTCAC
glnA  ATCCAGGAGAGTTA.AAGTATGTCCGCT
glnH  TAGAAAAAAGGAAATG.....CTATGAAGTCT
lacZ  TTCACACAGGAAACAG....CTATGACCATG
rpsJ  AATTGGAGCTCTGGTCTCATGCAGAAC
serC  GCAACGTGGTGAGGG...GAAATGGCTCAA
sucA  GATGCTTAAGGGATCA....CGATGCAGAAC
trpE  CAAAATTAGAGAATA...ACAATGCAAACA
[Figure: scoring matrix labeled A, C, G, T]

17 Using a PSSM
aceB  ACCACATAACTATGGAGCATCT.GCACATGAAAACC
atpI  ACCTCGAAGGGAGCAG.....GAGTGAAAAAC
bioB  ACGTTTTGGAGAAGC...CCCATGGCTCAC
glnA  ATCCAGGAGAGTTA.AAGTATGTCCGCT
glnH  TAGAAAAAAGGAAATG.....CTATGAAGTCT
lacZ  TTCACACAGGAAACAG....CTATGACCATG
rpsJ  AATTGGAGCTCTGGTCTCATGCAGAAC
serC  GCAACGTGGTGAGGG...GAAATGGCTCAA
sucA  GATGCTTAAGGGATCA....CGATGCAGAAC
trpE  CAAAATTAGAGAATA...ACAATGCAAACA
[Figure: scoring matrix labeled A, C, G, T]
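A sketch of using a PSSM to look for the aceB start site: build a 27-column matrix from the nine experimentally proven start sites (in each training window the start codon sits at position 18, counting from 0), slide it along the longer aceB sequence from slide 16, and keep the best-scoring window. This is a simplification under stated assumptions: windows are ungapped, so it ignores the variable spacing (the dots) shown in slides 15 to 17, and the log2-odds scoring, pseudocount, and function names are hypothetical choices, not the method from the slides.

```python
import math

BASES = "ACGT"
START_OFFSET = 18  # start codon position inside every training window

# Experimentally proven start sites from the slide (27 nt each).
START_SITES = {
    "atpI": "ACCTCGAAGGGAGCAGGAGTGAAAAAC",
    "bioB": "ACGTTTTGGAGAAGCCCCATGGCTCAC",
    "glnA": "ATCCAGGAGAGTTAAAGTATGTCCGCT",
    "glnH": "TAGAAAAAAGGAAATGCTATGAAGTCT",
    "lacZ": "TTCACACAGGAAACAGCTATGACCATG",
    "rpsJ": "AATTGGAGCTCTGGTCTCATGCAGAAC",
    "serC": "GCAACGTGGTGAGGGGAAATGGCTCAA",
    "sucA": "GATGCTTAAGGGATCACGATGCAGAAC",
    "trpE": "CAAAATTAGAGAATAACAATGCAAACA",
}
# Longer aceB sequence (slide 16) whose start site is unknown.
ACEB = "ACCACATAACTATGGAGCATCTGCACATGAAAACC"

def build_pssm(sites, pseudocount=0.5, background=0.25):
    """Log2-odds matrix: one {base: score} dict per aligned column."""
    pssm = []
    for col in range(len(sites[0])):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[col]] += 1
        total = sum(counts.values())
        pssm.append({b: math.log2(counts[b] / total / background) for b in BASES})
    return pssm

def score(window, pssm):
    """Sum of the column scores for the bases in one window."""
    return sum(col[base] for col, base in zip(pssm, window))

def best_window(seq, pssm):
    """Slide the PSSM along seq without gaps; return (best score, offset)."""
    width = len(pssm)
    return max((score(seq[i:i + width], pssm), i) for i in range(len(seq) - width + 1))

pssm = build_pssm(list(START_SITES.values()))
best_score, offset = best_window(ACEB, pssm)
codon_at = offset + START_OFFSET
print(f"offset {offset}: predicted start codon {ACEB[codon_at:codon_at + 3]} "
      f"at position {codon_at} (score {best_score:.2f})")
```

Handling the variable spacer properly would need either gapped scoring or a model with separate parts for the upstream motif and the start codon; the ungapped scan only shows the basic idea.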

18 What to do with no training set?
New pattern discovery (MEME, Gibbs sampler, BioProspector)
Human sequences 5’ to transcriptional start:
snRNA U1 (pU1-6)  AGGTATATGGAGCTGTGACAGGGCAGAAGTGTGTGAAGTC
histone H1t       GCCCTACCCTATATAAGGCCCCGAGGCCGCCCGGGTGTTT
HMG-14            CGGCCGGCGGGGAGGGGGAGCCCGCGGCCGGGGACGCGGG
TP1               GCCAAGGCCTTAAATACCCAGACTCCTGCCCCCGGGCCTT
protamine P1      CCCTGGCATCTATAACAGGCCGCAGAGCTGGCCCCTGACT
nucleolin         GCAGGCTCAGTCTTTCGCCTCAGTCTCGAGCTCTCGCTGG
snRNP E           TGCCGCCGCGTGACCTTCACACTTCCGCTTCCGGTTCTTT
rp S14            GACACGGAAGTGACCCCCGTCGCTCCGCCCTCTCCCACTC
rp S17            TGGCCTAAGCTTTAACAGGCTTCGCCTGTGCTTCCTGTTT
ribosomal p. S19  ACCCTACGCCCGACTTGTGCGCCCGGGAAACCCCGTCGTT
α-tubulin bα1     GGTCTGGGCGTCCCGGCTGGGCCCCGTGTCTGTGCGCACG
β-tubulin β2      GGGAGGGTATATAAGCGTTGGCGGACGGTCGGTTGTAGCA
α-actin skel-m.   CCGCGGGCTATATAAAACCTGAGCAGAGGGACAAGCGGCC
α-cardiac actin   TCAGCGTTCTATAAAGCGGCCCTCCTGGAGCCAGCCACCC
β-actin           CGCGGCGGCGCCCTATAAAACCCAGCGGCGCGACGCGCCA
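With no training set, the motif finder has to discover the pattern itself. The sketch below is a simplified Gibbs site sampler (exactly one site of fixed width per sequence, uniform background, no phase-shift moves), not MEME or BioProspector; the width of 6, the iteration count, the seed, and the function names are arbitrary assumptions for illustration. It runs on the 15 human upstream sequences listed on the slide.

```python
import random

BASES = "ACGT"

# Human sequences 5' to the transcriptional start, as listed on the slide.
SEQS = [
    "AGGTATATGGAGCTGTGACAGGGCAGAAGTGTGTGAAGTC", "GCCCTACCCTATATAAGGCCCCGAGGCCGCCCGGGTGTTT",
    "CGGCCGGCGGGGAGGGGGAGCCCGCGGCCGGGGACGCGGG", "GCCAAGGCCTTAAATACCCAGACTCCTGCCCCCGGGCCTT",
    "CCCTGGCATCTATAACAGGCCGCAGAGCTGGCCCCTGACT", "GCAGGCTCAGTCTTTCGCCTCAGTCTCGAGCTCTCGCTGG",
    "TGCCGCCGCGTGACCTTCACACTTCCGCTTCCGGTTCTTT", "GACACGGAAGTGACCCCCGTCGCTCCGCCCTCTCCCACTC",
    "TGGCCTAAGCTTTAACAGGCTTCGCCTGTGCTTCCTGTTT", "ACCCTACGCCCGACTTGTGCGCCCGGGAAACCCCGTCGTT",
    "GGTCTGGGCGTCCCGGCTGGGCCCCGTGTCTGTGCGCACG", "GGGAGGGTATATAAGCGTTGGCGGACGGTCGGTTGTAGCA",
    "CCGCGGGCTATATAAAACCTGAGCAGAGGGACAAGCGGCC", "TCAGCGTTCTATAAAGCGGCCCTCCTGGAGCCAGCCACCC",
    "CGCGGCGGCGCCCTATAAAACCCAGCGGCGCGACGCGCCA",
]

def profile_without(seqs, positions, w, skip, pseudocount=1.0):
    """Per-column base frequencies of the current sites, leaving out sequence `skip`."""
    prof = [{b: pseudocount for b in BASES} for _ in range(w)]
    for i, (s, p) in enumerate(zip(seqs, positions)):
        if i != skip:
            for j, base in enumerate(s[p:p + w]):
                prof[j][base] += 1
    for col in prof:
        total = sum(col.values())
        for b in BASES:
            col[b] /= total
    return prof

def gibbs_motif(seqs, w, iterations=2000, seed=0):
    """Return one motif start per sequence after `iterations` sampling steps."""
    rng = random.Random(seed)
    positions = [rng.randrange(len(s) - w + 1) for s in seqs]  # random initial sites
    for _ in range(iterations):
        i = rng.randrange(len(seqs))  # sequence whose site is resampled
        prof = profile_without(seqs, positions, w, skip=i)
        s = seqs[i]
        # Weight each window by P(window | profile) / P(window | uniform background).
        weights = []
        for p in range(len(s) - w + 1):
            ratio = 1.0
            for j, base in enumerate(s[p:p + w]):
                ratio *= prof[j][base] / 0.25
            weights.append(ratio)
        positions[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return positions

positions = gibbs_motif(SEQS, w=6)
for s, p in zip(SEQS, positions):
    print(p, s[p:p + 6])
```

Because the sampler is stochastic, different seeds can settle on different motifs; it is usually restarted several times and the best-scoring alignment kept.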

19 Things to do

20 ME

