BBSI Research Simulation News Project proposals - Monday, June 16 - Format (see News, Presentations and other dates) Renaissance fair and other events.

Presentation transcript:

BBSI Research Simulation News Project proposals - Monday, June 16 - Format (see News, Presentations and other dates) Renaissance fair and other events Party at Greg’s house

BBSI Research Simulation
PSSMs and Search for Repeats in DNA
- Application of PSSMs
- Regulatory proteins and their binding sites
- Palindromic DNA and its significance
- How to find protein binding sites: Meme
- PSSMs to find beginning of genes
- Repeated sequences and location of protein binding sites: Li et al (2002)

Regulatory Proteins and their Binding Sites
GTA..(8)..TAC
5’-GTGAGTTAGCTCACNNNNNNNNNNTANNNTNNNNNNNNNNNNNNNNNNNNNNNNNNNNATGNNNNNNNNNNNNNNNN
3’-CACTCAATCGAGTGNNNNNNNNNNATNNNANNNNNNNNNNNNNNNNNNNNNNNNNNNNTACNNNNNNNNNNNNNNNN
[Figure labels: lacZ, Crp, RNA Polymerase, Operator]
Presence of CRP sites: regulation by carbon source
Presence of X sites: regulation by Y

Regulatory Proteins and their Binding Sites: Palindromic sequences
TTAATGTGAGTTAGCTCACTCATT
AATTACACTCAATCGAGTGAGTAA
NNNNNNNNNNNNNNNNNNNNNNNNNNNN recognizes GTGAGTT


Regulatory Proteins and their Binding Sites: Palindromic sequences
TTAATGTGAGTTAGCTCACTCATT
AATTACACTCAATCGAGTGAGTAA
NNNNNNNNNNNNNNNNNNNNNNNNNNNN
Palindromes: serve as binding sites for dimeric proteins
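A DNA palindrome in this sense is a sequence that reads the same as its own reverse complement. The sketch below is a minimal illustration, not part of the original slides: the helper names `reverse_complement` and `palindrome_mismatches` are hypothetical. It checks the half-site GTGAGTT against its partner and counts how far the 14-base site from the slide is from a perfect palindrome.

```python
# Hypothetical helpers for illustration; names are not from the slides.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def palindrome_mismatches(seq: str) -> int:
    """Count positions where seq differs from its own reverse complement.

    A count of 0 means the sequence is a perfect DNA palindrome.
    """
    seq = seq.upper()
    return sum(a != b for a, b in zip(seq, reverse_complement(seq)))


site = "GTGAGTTAGCTCAC"  # 14-base site taken from the slide sequence

print(reverse_complement("GTGAGTT"))   # AACTCAC
print(palindrome_mismatches(site))     # 2
print(palindrome_mismatches("GAATTC")) # 0
```

The two mismatches fall in the central TTAG spacer: the outer half-sites GTGAG and CTCAC are exact reverse complements of each other, which is the kind of symmetry a dimeric protein could exploit.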

Regulatory Proteins and their Binding Sites: Palindromic sequences
5’-TTAATGTGAGTTAGCTCACTCATT-3’
3’-AATTACACTCAATCGAGTGAGTAA-5’
[Figure: the palindrome drawn with each strand folded back and paired with itself, between the flanks TTAAT...TCATT (top strand) and AATTA...AGTAA (bottom strand). Labels: DNA: cruciform; RNA: stem/loop; tRNA.]

Regulatory Proteins and their Binding Sites: Palindromic sequences
5’-TTAATGTGAGTTAGCTCACTCATT-3’
3’-AATTACACTCAATCGAGTGAGTAA-5’
Function of palindrome: RNA secondary structure? Binding site for dimeric protein? How to tell?
[Figure, built up over several animation frames: in the stem/loop drawing, one G·C pair of the stem is mutated to A·C and then to A·T, so pairing is restored. In the duplex, GTGAG in the left half-site is mutated to GTAAG (bottom strand CACTC to CATTC) while the right half-site stays CTCAC.]
How to tell?
- Compensatory mutations (a change on one side of the stem is matched by a change on the other side that keeps the pairing): RNA secondary structure
- Uncorrelated mutations (one half-site changes without a matching change in the other): binding site for protein
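The compensatory-mutation test can be sketched in a few lines. Everything named here is an assumption made for illustration: the stem coordinates in `STEM` are inferred from the inverted repeat GTGAG...CTCAC in the slide sequence (0-based positions), only Watson-Crick pairs are counted (G·U wobble pairs, which RNA also tolerates, are ignored), and DNA letters are used to match the slides.

```python
# Illustrative sketch; STEM coordinates are inferred, not given in the slides.
WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}

# 0-based (left, right) positions pairing GTGAG (5-9) with CTCAC (14-18)
STEM = [(5, 18), (6, 17), (7, 16), (8, 15), (9, 14)]


def stem_intact(seq, stem=STEM):
    """True if every listed pair of positions can still base-pair."""
    return all((seq[i], seq[j]) in WATSON_CRICK for i, j in stem)


def mutate(seq, changes):
    """Return seq with {position: new_base} substitutions applied."""
    bases = list(seq)
    for pos, base in changes.items():
        bases[pos] = base
    return "".join(bases)


wild_type = "TTAATGTGAGTTAGCTCACTCATT"
single = mutate(wild_type, {7: "A"})            # GTGAG -> GTAAG, as on the slide
double = mutate(wild_type, {7: "A", 16: "T"})   # partner changed too: A pairs with T

print(stem_intact(wild_type))  # True
print(stem_intact(single))     # False: stem disrupted
print(stem_intact(double))     # True: compensatory change restores pairing
```

If related sequences mostly look like `double` (paired changes), that pattern suggests selection on RNA structure; if they look like `single` (changes on one side only), the structure is probably not what is being conserved.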

Regulatory Proteins and their Binding Sites: How to find them?
- Count all in certain class (Li et al, 2002)
- Guess a pattern and improve (Meme, Gibbs sampler)

Human sequences 5’ to transcriptional start:
snRNA U1 (pU1-6)  AGGTATATGGAGCTGTGACAGGGCAGAAGTGTGTGAAGTC
histone H1t       GCCCTACCCTATATAAGGCCCCGAGGCCGCCCGGGTGTTT
HMG-14            CGGCCGGCGGGGAGGGGGAGCCCGCGGCCGGGGACGCGGG
TP1               GCCAAGGCCTTAAATACCCAGACTCCTGCCCCCGGGCCTT
protamine P1      CCCTGGCATCTATAACAGGCCGCAGAGCTGGCCCCTGACT
nucleolin         GCAGGCTCAGTCTTTCGCCTCAGTCTCGAGCTCTCGCTGG
snRNP E           TGCCGCCGCGTGACCTTCACACTTCCGCTTCCGGTTCTTT
rp S14            GACACGGAAGTGACCCCCGTCGCTCCGCCCTCTCCCACTC
rp S17            TGGCCTAAGCTTTAACAGGCTTCGCCTGTGCTTCCTGTTT
ribosomal p. S19  ACCCTACGCCCGACTTGTGCGCCCGGGAAACCCCGTCGTT
α-tubulin bα1     GGTCTGGGCGTCCCGGCTGGGCCCCGTGTCTGTGCGCACG
β-tubulin β2      GGGAGGGTATATAAGCGTTGGCGGACGGTCGGTTGTAGCA
α-actin skel-m.   CCGCGGGCTATATAAAACCTGAGCAGAGGGACAAGCGGCC
α-cardiac actin   TCAGCGTTCTATAAAGCGGCCCTCCTGGAGCCAGCCACCC
β-actin           CGCGGCGGCGCCCTATAAAACCCAGCGGCGCGACGCGCCA

Regulatory Proteins and their Binding Sites: How Meme finds them
How do pattern finders work?

Human sequences 5’ to transcriptional start:
snRNA U1 (pU1-6)  AGGTATATGGAGCTGTGACAGGGCAGAAGTGTGTGAAGTC
histone H1t       GCCCTACCCTATATAAGGCCCCGAGGCCGCCCGGGTGTTT
HMG-14            CGGCCGGCGGGGAGGGGGAGCCCGCGGCCGGGGACGCGGG
TP1               GCCAAGGCCTTAAATACCCAGACTCCTGCCCCCGGGCCTT
protamine P1      CCCTGGCATCTATAACAGGCCGCAGAGCTGGCCCCTGACT

Step 1. Arbitrarily choose candidate pattern from a sequence
Step 2. Find best matches to pattern in all sequences
Step 3. Construct position-dependent frequency table based on matches
Step 4. Calculate relative probability of matches from frequency table
Step 5. Move around to find local maximum
Step 6. If probability score high, remember pattern and score
Step 7. Repeat Steps

Example matches to a candidate pattern (one from each sequence above):
ACAGGGCAGAA
CCCGGGTGTTT
CCGGGGACGCG
CCCCCGGGCCT
CCGCAGAGCTG
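The steps above can be sketched as a small greedy search. This is a simplified illustration of the procedure as the slides describe it, not MEME's actual implementation (MEME uses expectation maximization). The function names, the pseudocount of 1, the uniform background of 0.25, and the stopping rule are all assumptions chosen for the example.

```python
import math
from collections import Counter

BASES = "ACGT"


def windows(seq, width):
    """All contiguous windows of the given width in seq."""
    return [seq[i:i + width] for i in range(len(seq) - width + 1)]


def frequency_table(matches, pseudocount=1.0):
    """Step 3: per-position base frequencies, smoothed with a pseudocount."""
    total = len(matches) + len(BASES) * pseudocount
    table = []
    for column in zip(*matches):
        counts = Counter(column)
        table.append({b: (counts[b] + pseudocount) / total for b in BASES})
    return table


def log_odds(window, table, background=0.25):
    """Step 4: log2 ratio of motif-model to uniform-background probability."""
    return sum(math.log2(table[i][b] / background) for i, b in enumerate(window))


def refine(seqs, seed, max_rounds=20):
    """Steps 2-5: pick best matches to the seed, then re-pick until stable."""
    width = len(seed)

    def identity(window):
        return sum(a == b for a, b in zip(window, seed))

    matches = [max(windows(s, width), key=identity) for s in seqs]
    for _ in range(max_rounds):
        table = frequency_table(matches)
        updated = [max(windows(s, width), key=lambda w: log_odds(w, table))
                   for s in seqs]
        if updated == matches:  # local maximum reached
            break
        matches = updated
    table = frequency_table(matches)
    return matches, sum(log_odds(m, table) for m in matches)


seqs = [
    "AGGTATATGGAGCTGTGACAGGGCAGAAGTGTGTGAAGTC",  # snRNA U1 (pU1-6)
    "GCCCTACCCTATATAAGGCCCCGAGGCCGCCCGGGTGTTT",  # histone H1t
    "CGGCCGGCGGGGAGGGGGAGCCCGCGGCCGGGGACGCGGG",  # HMG-14
    "GCCAAGGCCTTAAATACCCAGACTCCTGCCCCCGGGCCTT",  # TP1
    "CCCTGGCATCTATAACAGGCCGCAGAGCTGGCCCCTGACT",  # protamine P1
]
matches, score = refine(seqs, seed="ACAGGGCAGAA")
for m in matches:
    print(m)
print(round(score, 2))
```

Steps 6 and 7 would wrap `refine` in a loop over many seeds and keep the highest-scoring result; a single seed only finds the local maximum nearest to where it started.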

Regulatory Proteins and their Binding Sites: How Meme finds them
You’ve found a gene related to Purple Tongue Syndrome.
BlastP: encoded protein related to cAMP-binding proteins.
- Are the similarities trivial? Related to cAMP binding?
- Does your protein contain a cAMP-binding site?
- What IS a cAMP-binding site?
Task
1. Determine what is a cAMP-binding site
2. Determine if your protein has one

Regulatory Proteins and their Binding Sites: How Meme finds them
Strategy
1. Collect sequences of known cAMP-binding proteins
2. Run Meme, a pattern-finding program. Ask it to find any significant motifs
3. Rerun Meme. Demand that every protein has identified motifs
4. Run Pfam over known sequence to check
Do it

PSSMs in action: Identification of beginning of gene
Experimentally proven start sites unknown
aceB  ACTATGGAGCATCTGCACATGAAAACC
atpI  ACCTCGAAGGGAGCAGGAGTGAAAAAC
bioB  ACGTTTTGGAGAAGCCCCATGGCTCAC
glnA  ATCCAGGAGAGTTAAAGTATGTCCGCT
glnH  TAGAAAAAAGGAAATGCTATGAAGTCT
lacZ  TTCACACAGGAAACAGCTATGACCATG
rpsJ  AATTGGAGCTCTGGTCTCATGCAGAAC
serC  GCAACGTGGTGAGGGGAAATGGCTCAA
sucA  GATGCTTAAGGGATCACGATGCAGAAC
trpE  CAAAATTAGAGAATAACAATGCAAACA

PSSMs in action: Identification of beginning of gene
Sequences with gaps (dots), shown here lined up on the start codon:
aceB  ACCACATAACTATGGAGCATCTGCACATGAAAACC
atpI     ACCTCGAAGGGAGCAG.....GAGTGAAAAAC
bioB       ACGTTTTGGAGAAGC...CCCATGGCTCAC
glnA         ATCCAGGAGAGTTA.AAGTATGTCCGCT
glnH     TAGAAAAAAGGAAATG.....CTATGAAGTCT
lacZ      TTCACACAGGAAACAG....CTATGACCATG
rpsJ          AATTGGAGCTCTGGTCTCATGCAGAAC
serC       GCAACGTGGTGAGGG...GAAATGGCTCAA
sucA      GATGCTTAAGGGATCA....CGATGCAGAAC
trpE       CAAAATTAGAGAATA...ACAATGCAAACA
aceB with an added gap: ACCACATAACTATGGAGCATCT.GCACATGAAAACC
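A position-specific scoring matrix (PSSM) for gene starts can be built directly from the ten ungapped 27-base sequences listed earlier, which all share a start codon at the same offset. The sketch below is one common way to do it under assumptions that are not in the slides: a pseudocount of 1, a uniform background of 0.25, and log2-odds scores. The function names are hypothetical.

```python
import math
from collections import Counter

# Ungapped sequences around the start codon, as listed in the slides
STARTS = {
    "aceB": "ACTATGGAGCATCTGCACATGAAAACC",
    "atpI": "ACCTCGAAGGGAGCAGGAGTGAAAAAC",
    "bioB": "ACGTTTTGGAGAAGCCCCATGGCTCAC",
    "glnA": "ATCCAGGAGAGTTAAAGTATGTCCGCT",
    "glnH": "TAGAAAAAAGGAAATGCTATGAAGTCT",
    "lacZ": "TTCACACAGGAAACAGCTATGACCATG",
    "rpsJ": "AATTGGAGCTCTGGTCTCATGCAGAAC",
    "serC": "GCAACGTGGTGAGGGGAAATGGCTCAA",
    "sucA": "GATGCTTAAGGGATCACGATGCAGAAC",
    "trpE": "CAAAATTAGAGAATAACAATGCAAACA",
}


def count_matrix(seqs):
    """One Counter of base counts per alignment column."""
    return [Counter(column) for column in zip(*seqs)]


def pssm(seqs, pseudocount=1.0, background=0.25):
    """Log2-odds score for each base at each column (assumed scoring scheme)."""
    total = len(seqs) + 4 * pseudocount
    return [
        {b: math.log2((counts[b] + pseudocount) / total / background)
         for b in "ACGT"}
        for counts in count_matrix(seqs)
    ]


def score(window, matrix):
    """Sum the column scores for a window as long as the matrix."""
    return sum(column[base] for column, base in zip(matrix, window))


seqs = list(STARTS.values())
counts = count_matrix(seqs)
matrix = pssm(seqs)

# Columns 18-20 (0-based) hold the start codon: ATG in nine genes, GTG in atpI
print(dict(counts[18]), dict(counts[19]), dict(counts[20]))
print(round(score(STARTS["lacZ"], matrix), 2))
```

To look for an unannotated gene start, one would slide a 27-base window along the genome and report windows whose `score` is high. Handling the variable spacing between the GGAG-like region and the start codon, which the gapped alignment addresses, is beyond this sketch.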

PSSMs in action Algorithm to find binding sites (Li et al)

Li et al (2002) Algorithm: Calculation of probability by Poisson equation

Dimer occurred $n$ times. How likely is that?
Frequency of GTGAGTT = $f_1$
Frequency of AACTCAC = $f_2$
How likely is it to find GTGAGTTAACTCAC?
Frequency of joint occurrence = $f_1 \cdot f_2 = f_{12}$

Probability of $n$ occurrences of dimer:

$$P(n) = \binom{N}{n} \cdot \underbrace{f_{12} \cdot f_{12} \cdots f_{12}}_{n \text{ times}} \cdot \underbrace{(1-f_{12}) \cdot (1-f_{12}) \cdots (1-f_{12})}_{N-n \text{ times}}, \qquad \binom{N}{n} = {}_{N}C_{n} = \frac{N!}{n!\,(N-n)!}$$

$$P(n) = \frac{N!}{n!\,(N-n)!} \cdot (f_{12})^{n} \cdot (1-f_{12})^{N-n}$$

Expected number $= m = f_{12} \cdot N$, so $f_{12} = m/N$:

$$P(n) = \frac{N!}{n!\,(N-n)!} \cdot \left(\frac{m}{N}\right)^{n} \cdot \left(1-\frac{m}{N}\right)^{N-n}$$

Separating the powers of $N$ and of $(1 - m/N)$, then using $(1 - m/N)^{n} \approx 1$ and $(1 - m/N)^{N} \approx e^{-m}$ for large $N$:

$$\begin{aligned}
P(n) &= \frac{N!}{n!\,(N-n)!} \cdot \frac{m^{n} \cdot (1-m/N)^{N}}{N^{n} \cdot (1-m/N)^{n}} \\
&\approx \frac{N!}{n!\,(N-n)!} \cdot \frac{m^{n} \cdot (1-m/N)^{N}}{N^{n} \cdot (1)^{n}} \\
&\approx \frac{N!}{n!\,(N-n)!} \cdot \frac{m^{n} \cdot e^{-m}}{N^{n} \cdot (1)^{n}}
\end{aligned}$$

Expanding the factorials, and using $N (N-1) \cdots (N-n+1) \approx N^{n}$ when $n \ll N$:

$$P(n) \approx \frac{N \cdot (N-1) \cdot (N-2) \cdots (N-n+1)}{n!} \cdot \frac{m^{n} \cdot e^{-m}}{N^{n} \cdot (1)^{n}} \approx \frac{m^{n} \cdot e^{-m}}{n!}$$
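A quick numeric check shows how close the Poisson form is to the exact binomial when N is large and the joint frequency is small. The half-site frequencies `f1` and `f2` and the count `N` below are made-up values for illustration, and reading N as the number of positions examined is an assumption; neither is stated in the slides. `poisson_tail` is an added helper giving the chance of seeing at least n occurrences, which is usually the number of interest when asking whether a dimer is over-represented.

```python
import math


def binomial_pmf(n, N, p):
    """Exact probability of n occurrences in N trials with per-trial probability p."""
    return math.comb(N, n) * p**n * (1 - p) ** (N - n)


def poisson_pmf(n, m):
    """Poisson approximation: m^n * e^(-m) / n!, with m the expected count."""
    return m**n * math.exp(-m) / math.factorial(n)


def poisson_tail(n, m):
    """Probability of n or more occurrences under the Poisson approximation."""
    return 1.0 - sum(poisson_pmf(k, m) for k in range(n))


# Hypothetical values, for illustration only
f1 = 1e-3          # frequency of GTGAGTT
f2 = 2e-3          # frequency of AACTCAC
N = 1_000_000      # number of positions examined (assumed meaning of N)

f12 = f1 * f2      # joint frequency, assuming the half-sites occur independently
m = f12 * N        # expected number of dimers

for n in range(6):
    print(n, binomial_pmf(n, N, f12), poisson_pmf(n, m))

print("P(at least 5 occurrences):", poisson_tail(5, m))
```

With these values the expected count is about 2, so a dimer seen many more times than that gets a small tail probability and becomes a candidate binding site.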