Presentation is loading. Please wait.

Presentation is loading. Please wait.

CS262 Lecture 17, Win07, Batzoglou Gene Regulation and Microarrays.

Similar presentations


Presentation on theme: "CS262 Lecture 17, Win07, Batzoglou Gene Regulation and Microarrays."— Presentation transcript:

1 CS262 Lecture 17, Win07, Batzoglou Gene Regulation and Microarrays

2 CS262 Lecture 17, Win07, Batzoglou Overview A. Gene Expression and Regulation B. Measuring Gene Expression: Microarrays C. Finding Regulatory Motifs

3 CS262 Lecture 17, Win07, Batzoglou Cells respond to environment Cell responds to environment— various external messages

4 CS262 Lecture 17, Win07, Batzoglou Genome is fixed – Cells are dynamic A genome is static  Every cell in our body has a copy of same genome A cell is dynamic  Responds to external conditions  Most cells follow a cell cycle of division Cells differentiate during development Gene expression varies according to:  Cell type  Cell cycle  External conditions  Location slide credits: M. Kellis

5 CS262 Lecture 17, Win07, Batzoglou Where gene regulation takes place Opening of chromatin Transcription Translation Protein stability Protein modifications

6 CS262 Lecture 17, Win07, Batzoglou Transcriptional Regulation Efficient place to regulate: No energy wasted making intermediate products However, slowest response time After a receptor notices a change: 1.Cascade message to nucleus 2.Open chromatin & bind transcription factors 3.Recruit RNA polymerase and transcribe 4.Splice mRNA and send to cytoplasm 5.Translate into protein

7 CS262 Lecture 17, Win07, Batzoglou Transcription Factors Binding to DNA Transcription regulation: Certain transcription factors bind DNA Binding recognizes DNA substrings: Regulatory motifs

8 CS262 Lecture 17, Win07, Batzoglou Promoter and Enhancers Promoter necessary to start transcription Enhancers can affect transcription from afar

9 CS262 Lecture 17, Win07, Batzoglou Regulation of Genes Gene Regulatory Element RNA polymerase (Protein) Transcription Factor (Protein) DNA

10 CS262 Lecture 17, Win07, Batzoglou Regulation of Genes Gene RNA polymerase Transcription Factor (Protein) Regulatory Element DNA

11 CS262 Lecture 17, Win07, Batzoglou Regulation of Genes Gene RNA polymerase Transcription Factor Regulatory Element DNA New protein

12 TTATATTGAATTTTCAAAAATTCTTACTTTTTTTTTGGATGGACGCAAAGAAGTTTAATAATCATATTACATGGCATTACCACCATATA CATATCCATATCTAATCTTACTTATATGTTGTGGAAATGTAAAGAGCCCCATTATCTTAGCCTAAAAAAACCTTCTCTTTGGAACTTTC AGTAATACGCTTAACTGCTCATTGCTATATTGAAGTACGGATTAGAAGCCGCCGAGCGGGCGACAGCCCTCCGACGGAAGACTCTCCTC CGTGCGTCCTCGTCTTCACCGGTCGCGTTCCTGAAACGCAGATGTGCCTCGCGCCGCACTGCTCCGAACAATAAAGATTCTACAATACT AGCTTTTATGGTTATGAAGAGGAAAAATTGGCAGTAACCTGGCCCCACAAACCTTCAAATTAACGAATCAAATTAACAACCATAGGATG ATAATGCGATTAGTTTTTTAGCCTTATTTCTGGGGTAATTAATCAGCGAAGCGATGATTTTTGATCTATTAACAGATATATAAATGGAA AAGCTGCATAACCACTTTAACTAATACTTTCAACATTTTCAGTTTGTATTACTTCTTATTCAAATGTCATAAAAGTATCAACAAAAAAT TGTTAATATACCTCTATACTTTAACGTCAAGGAGAAAAAACTATAATGACTAAATCTCATTCAGAAGAAGTGATTGTACCTGAGTTCAA TTCTAGCGCAAAGGAATTACCAAGACCATTGGCCGAAAAGTGCCCGAGCATAATTAAGAAATTTATAAGCGCTTATGATGCTAAACCGG ATTTTGTTGCTAGATCGCCTGGTAGAGTCAATCTAATTGGTGAACATATTGATTATTGTGACTTCTCGGTTTTACCTTTAGCTATTGAT TTTGATATGCTTTGCGCCGTCAAAGTTTTGAACGATGAGATTTCAAGTCTTAAAGCTATATCAGAGGGCTAAGCATGTGTATTCTGAAT CTTTAAGAGTCTTGAAGGCTGTGAAATTAATGACTACAGCGAGCTTTACTGCCGACGAAGACTTTTTCAAGCAATTTGGTGCCTTGATG AACGAGTCTCAAGCTTCTTGCGATAAACTTTACGAATGTTCTTGTCCAGAGATTGACAAAATTTGTTCCATTGCTTTGTCAAATGGATC ATATGGTTCCCGTTTGACCGGAGCTGGCTGGGGTGGTTGTACTGTTCACTTGGTTCCAGGGGGCCCAAATGGCAACATAGAAAAGGTAA AAGAAGCCCTTGCCAATGAGTTCTACAAGGTCAAGTACCCTAAGATCACTGATGCTGAGCTAGAAAATGCTATCATCGTCTCTAAACCA GCATTGGGCAGCTGTCTATATGAATTAGTCAAGTATACTTCTTTTTTTTACTTTGTTCAGAACAACTTCTCATTTTTTTCTACTCATAA CTTTAGCATCACAAAATACGCAATAATAACGAGTAGTAACACTTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGA TAATGTTTTCAATGTAAGAGATTTCGATTATCCACAAACTTTAAAACACAGGGACAAAATTCTTGATATGCTTTCAACCGCTGCGTTTT GGATACCTATTCTTGACATGATATGACTACCATTTTGTTATTGTACGTGGGGCAGTTGACGTCTTATCATATGTCAAAG...TTGCGAA GTTCTTGGCAAGTTGCCAACTGACGAGATGCAGTAACACTTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAA TGTTTTCAATGTAAGAGATTTCGATTATCCACAAACTTTAAAACACAGGGACAAAATTCTTGATATGCTTTCAACCGCTGCGTTTTGGA TACCTATTCTTGACATGATATGACTACCATTTTGTTATTGTACGTGGGGCAGTTGACGTCTTATCATATGTCAAAGTCATTTGCGAAGT TCTTGGCAAGTTGCCAACTGACGAGATGCAGTTTCCTACGCATAATAAGAATAGGAGGGAATATCAAGCCAGACAATCTATCATTACAT TTAAGCGGCTCTTCAAAAAGATTGAACTCTCGCCAACTTATGGAATCTTCCAATGAGACCTTTGCGCCAAATAATGTGGATTTGGAAAA AGAGTATAAGTCATCTCAGAGTAATATAACTACCGAAGTTTATGAGGCATCGAGCTTTGAAGAAAAAGTAAGCTCAGAAAAACCTCAAT ACAGCTCATTCTGGAAGAAAATCTATTATGAATATGTGGTCGTTGACAAATCAATCTTGGGTGTTTCTATTCTGGATTCATTTATGTAC AACCAGGACTTGAAGCCCGTCGAAAAAGAAAGGCGGGTTTGGTCCTGGTACAATTATTGTTACTTCTGGCTTGCTGAATGTTTCAATAT CAACACTTGGCAAATTGCAGCTACAGGTCTACAACTGGGTCTAAATTGGTGGCAGTGTTGGATAACAATTTGGATTGGGTACGGTTTCG TTGGTGCTTTTGTTGTTTTGGCCTCTAGAGTTGGATCTGCTTATCATTTGTCATTCCCTATATCATCTAGAGCATCATTCGGTATTTTC TTCTCTTTATGGCCCGTTATTAACAGAGTCGTCATGGCCATCGTTTGGTATAGTGTCCAAGCTTATATTGCGGCAACTCCCGTATCATT AATGCTGAAATCTATCTTTGGAAAAGATTTACAATGATTGTACGTGGGGCAGTTGACGTCTTATCATATGTCAAAGTCATTTGCGAAGT TCTTGGCAAGTTGCCAACTGACGAGATGCAGTAACACTTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATG TTTTCAATGTAAGAGATTTCGATTATCCACAAACTTTAAAACACAGGGACAAAATTCTTGATATGCTTTCAACCGCTGCGTTTTGGATA CCTATTCTTGACATGATATGACTACCATTTTGTTATTGTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATG TTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTA AGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGA TTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATA GTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATG CTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACT TAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGAT TGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAAT

13 TTATATTGAATTTTCAAAAATTCTTACTTTTTTTTTGGATGGACGCAAAGAAGTTTAATAATCATATTACATGGCATTACCACCATATA CATATCCATATCTAATCTTACTTATATGTTGTGGAAATGTAAAGAGCCCCATTATCTTAGCCTAAAAAAACCTTCTCTTTGGAACTTTC AGTAATACGCTTAACTGCTCATTGCTATATTGAAGTACGGATTAGAAGCCGCCGAGCGGGCGACAGCCCTCCGACGGAAGACTCTCCTC CGTGCGTCCTCGTCTTCACCGGTCGCGTTCCTGAAACGCAGATGTGCCTCGCGCCGCACTGCTCCGAACAATAAAGATTCTACAATACT AGCTTTTATGGTTATGAAGAGGAAAAATTGGCAGTAACCTGGCCCCACAAACCTTCAAATTAACGAATCAAATTAACAACCATAGGATG ATAATGCGATTAGTTTTTTAGCCTTATTTCTGGGGTAATTAATCAGCGAAGCGATGATTTTTGATCTATTAACAGATATATAAATGGAA AAGCTGCATAACCACTTTAACTAATACTTTCAACATTTTCAGTTTGTATTACTTCTTATTCAAATGTCATAAAAGTATCAACAAAAAAT TGTTAATATACCTCTATACTTTAACGTCAAGGAGAAAAAACTATAATGACTAAATCTCATTCAGAAGAAGTGATTGTACCTGAGTTCAA TTCTAGCGCAAAGGAATTACCAAGACCATTGGCCGAAAAGTGCCCGAGCATAATTAAGAAATTTATAAGCGCTTATGATGCTAAACCGG ATTTTGTTGCTAGATCGCCTGGTAGAGTCAATCTAATTGGTGAACATATTGATTATTGTGACTTCTCGGTTTTACCTTTAGCTATTGAT TTTGATATGCTTTGCGCCGTCAAAGTTTTGAACGATGAGATTTCAAGTCTTAAAGCTATATCAGAGGGCTAAGCATGTGTATTCTGAAT CTTTAAGAGTCTTGAAGGCTGTGAAATTAATGACTACAGCGAGCTTTACTGCCGACGAAGACTTTTTCAAGCAATTTGGTGCCTTGATG AACGAGTCTCAAGCTTCTTGCGATAAACTTTACGAATGTTCTTGTCCAGAGATTGACAAAATTTGTTCCATTGCTTTGTCAAATGGATC ATATGGTTCCCGTTTGACCGGAGCTGGCTGGGGTGGTTGTACTGTTCACTTGGTTCCAGGGGGCCCAAATGGCAACATAGAAAAGGTAA AAGAAGCCCTTGCCAATGAGTTCTACAAGGTCAAGTACCCTAAGATCACTGATGCTGAGCTAGAAAATGCTATCATCGTCTCTAAACCA GCATTGGGCAGCTGTCTATATGAATTAGTCAAGTATACTTCTTTTTTTTACTTTGTTCAGAACAACTTCTCATTTTTTTCTACTCATAA CTTTAGCATCACAAAATACGCAATAATAACGAGTAGTAACACTTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGA TAATGTTTTCAATGTAAGAGATTTCGATTATCCACAAACTTTAAAACACAGGGACAAAATTCTTGATATGCTTTCAACCGCTGCGTTTT GGATACCTATTCTTGACATGATATGACTACCATTTTGTTATTGTACGTGGGGCAGTTGACGTCTTATCATATGTCAAAG...TTGCGAA GTTCTTGGCAAGTTGCCAACTGACGAGATGCAGTAACACTTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAA TGTTTTCAATGTAAGAGATTTCGATTATCCACAAACTTTAAAACACAGGGACAAAATTCTTGATATGCTTTCAACCGCTGCGTTTTGGA TACCTATTCTTGACATGATATGACTACCATTTTGTTATTGTACGTGGGGCAGTTGACGTCTTATCATATGTCAAAGTCATTTGCGAAGT TCTTGGCAAGTTGCCAACTGACGAGATGCAGTTTCCTACGCATAATAAGAATAGGAGGGAATATCAAGCCAGACAATCTATCATTACAT TTAAGCGGCTCTTCAAAAAGATTGAACTCTCGCCAACTTATGGAATCTTCCAATGAGACCTTTGCGCCAAATAATGTGGATTTGGAAAA AGAGTATAAGTCATCTCAGAGTAATATAACTACCGAAGTTTATGAGGCATCGAGCTTTGAAGAAAAAGTAAGCTCAGAAAAACCTCAAT ACAGCTCATTCTGGAAGAAAATCTATTATGAATATGTGGTCGTTGACAAATCAATCTTGGGTGTTTCTATTCTGGATTCATTTATGTAC AACCAGGACTTGAAGCCCGTCGAAAAAGAAAGGCGGGTTTGGTCCTGGTACAATTATTGTTACTTCTGGCTTGCTGAATGTTTCAATAT CAACACTTGGCAAATTGCAGCTACAGGTCTACAACTGGGTCTAAATTGGTGGCAGTGTTGGATAACAATTTGGATTGGGTACGGTTTCG TTGGTGCTTTTGTTGTTTTGGCCTCTAGAGTTGGATCTGCTTATCATTTGTCATTCCCTATATCATCTAGAGCATCATTCGGTATTTTC TTCTCTTTATGGCCCGTTATTAACAGAGTCGTCATGGCCATCGTTTGGTATAGTGTCCAAGCTTATATTGCGGCAACTCCCGTATCATT AATGCTGAAATCTATCTTTGGAAAAGATTTACAATGATTGTACGTGGGGCAGTTGACGTCTTATCATATGTCAAAGTCATTTGCGAAGT TCTTGGCAAGTTGCCAACTGACGAGATGCAGTAACACTTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATG TTTTCAATGTAAGAGATTTCGATTATCCACAAACTTTAAAACACAGGGACAAAATTCTTGATATGCTTTCAACCGCTGCGTTTTGGATA CCTATTCTTGACATGATATGACTACCATTTTGTTATTGTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATG TTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTA AGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGA TTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATA GTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATG CTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACT TAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGAT TGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATTT Promoter motifs 3’ UTR motifsExons Introns

14 CS262 Lecture 17, Win07, Batzoglou Example: A Human heat shock protein TATA box: positioning transcription start TATA, CCAAT: constitutive transcription GRE: glucocorticoid response MRE:metal response HSE:heat shock element TATASP1 CCAAT AP2 HSE AP2CCAAT SP1 promoter of heat shock hsp70 0 --158 GENE

15 CS262 Lecture 17, Win07, Batzoglou The Cell as a Regulatory Network Genes = wires Motifs = gates ABMake DC If C then D If B then NOT D If A and B then D D Make BD If D then B C gene D gene B

16 CS262 Lecture 17, Win07, Batzoglou The Cell as a Regulatory Network (2)

17 CS262 Lecture 17, Win07, Batzoglou DNA Microarrays Measuring gene transcription in a high- throughput fashion

18 CS262 Lecture 17, Win07, Batzoglou What is a microarray

19 CS262 Lecture 17, Win07, Batzoglou What is a microarray Measure the level of mRNA messages in a cell DNA 1 DNA 3 DNA 5DNA 6 DNA 4 DNA 2 cDNA 4 cDNA 6 Hybridize Gene 1 Gene 3 Gene 5Gene 6 Gene 4 Gene 2 Measure RNA 4 RNA 6 RT slide credits: M. Kellis

20 CS262 Lecture 17, Win07, Batzoglou What is a microarray

21 CS262 Lecture 17, Win07, Batzoglou What is a microarray A 2D array of DNA sequences from thousands of genes Each spot has many copies of same gene Measure number of hybridizations per spot Result: Thousands of “experiments” – one per gene – in one go Perform many microarrays for different conditions:  Time during cell cycle  Temperature  Nutrient level

22 CS262 Lecture 17, Win07, Batzoglou Goal of Microarray Experiments Measure level of gene expression across many different conditions:  Expression Matrix M: {genes}  {conditions}: M ij = |gene i | in condition j Group genes into coregulated sets  Observe cells under different conditions  Find genes with similar expression profiles Potentially regulated by same TF slide credits: M. Kellis

23 CS262 Lecture 17, Win07, Batzoglou Clustering vs. Classification Clustering  Idea: Groups of genes that share similar function have similar expression patterns Hierarchical clustering k-means Bayesian approaches Projection techniques Principal Component Analysis Independent Component Analysis Classification  Idea: A cell can be in one of several states (Diseased vs. Healthy, Cancer X vs. Cancer Y vs. Normal)  Can we train an algorithm to use the gene expression patterns to determine which state a cell is in? Support Vector Machines Decision Trees Neural Networks K-Nearest Neighbors

24 CS262 Lecture 17, Win07, Batzoglou Clustering Algorithms b e d f a c h g abdefghc K-means b e d f a c h g c1 c2 c3 abghcdef Hierarchical slide credits: M. Kellis

25 CS262 Lecture 17, Win07, Batzoglou Hierarchical clustering Bottom-up algorithm:  Initialization: each point in a separate cluster At each step:  Choose the pair of closest clusters  Merge The exact behavior of the algorithm depends on how we define the distance CD(X,Y) between clusters X and Y Avoids the problem of specifying the number of clusters b e d f a c h g slide credits: M. Kellis

26 CS262 Lecture 17, Win07, Batzoglou Distance between clusters CD(X,Y)=min x  X, y  Y D(x,y) Single-link method CD(X,Y)=max x  X, y  Y D(x,y) Complete-link method CD(X,Y)=avg x  X, y  Y D(x,y) Average-link method CD(X,Y)=D( avg(X), avg(Y) ) Centroid method e d f h g e d f h g e d f h g e d f h g slide credits: M. Kellis

27 CS262 Lecture 17, Win07, Batzoglou Results of Clustering Gene Expression CLUSTER is simple and easy to use De facto standard for microarray analysis Time: O(N 2 M) N: #genes M: #conditions

28 CS262 Lecture 17, Win07, Batzoglou K-Means Clustering Algorithm Each cluster X i has a center c i Define the clustering cost criterion COST(X 1,…X k ) = ∑ Xi ∑ x  Xi |x – c i | 2 Algorithm tries to find clusters X 1 …X k and centers c 1 …c k that minimize COST K-means algorithm:  Initialize centers  Repeat: Compute best clusters for given centers → Attach each point to the closest center Compute best centers for given clusters → Choose the centroid of points in cluster  Until the COST is “small” b e d f a c h g c1 c2 c3 slide credits: M. Kellis

29 CS262 Lecture 17, Win07, Batzoglou K-Means Algorithm Randomly Initialize Clusters

30 CS262 Lecture 17, Win07, Batzoglou K-Means Algorithm Assign data points to nearest clusters

31 CS262 Lecture 17, Win07, Batzoglou K-Means Algorithm Recalculate Clusters

32 CS262 Lecture 17, Win07, Batzoglou K-Means Algorithm Recalculate Clusters

33 CS262 Lecture 17, Win07, Batzoglou K-Means Algorithm Repeat

34 CS262 Lecture 17, Win07, Batzoglou K-Means Algorithm Repeat

35 CS262 Lecture 17, Win07, Batzoglou K-Means Algorithm Repeat … until convergence Time: O(KNM) per iteration N: #genes M: #conditions

36 CS262 Lecture 17, Win07, Batzoglou Mixture of Gaussians – Probabilistic K-means Data is modeled as mixture of K Gaussians  N(  1,  2 I), …, N(  K,  2 I)  Prior probabilities  1, …,  K Different  i for every Gaussian i, or even different covariance matrices are possible, but learning becomes harder  P(x) = ∑ i P(x | N(  1,  2 I))   I  Use EM to learn parameters

37 CS262 Lecture 17, Win07, Batzoglou Visualizing clustering output slide credits: M. Kellis

38 CS262 Lecture 17, Win07, Batzoglou 4. Analysis of Clustering Data Statistical Significance of Clusters  Gene Ontologyhttp://www.geneontology.org/http://www.geneontology.org/  KEGG http://www.genome.jp/kegg/http://www.genome.jp/kegg/ Regulatory motifs responsible for common expression Regulatory Networks Experimental Verification

39 CS262 Lecture 17, Win07, Batzoglou Evaluating clusters – Hypergeometric Distribution +–N experiments, p labeled +, (N-p) – +Cluster: k elements, m labeled + +P-value of single cluster containing k elements of which at least r are + Prob that a randomly chosen set of k experiments would result in m positive and k-m negative P-value of uniformity in computed cluster slide credits: M. Kellis

40 CS262 Lecture 17, Win07, Batzoglou Finding Regulatory Motifs

41 CS262 Lecture 17, Win07, Batzoglou Regulatory Motif Discovery DNA Group of co-regulated genes Common subsequence Find motifs within groups of corregulated genes slide credits: M. Kellis

42 CS262 Lecture 17, Win07, Batzoglou Characteristics of Regulatory Motifs Tiny Highly Variable ~Constant Size  Because a constant-size transcription factor binds Often repeated Low-complexity-ish

43 CS262 Lecture 17, Win07, Batzoglou Sequence Logos

44 CS262 Lecture 17, Win07, Batzoglou Problem Definition Probabilistic Motif: M ij ; 1  i  W 1  j  4 M ij = Prob[ letter j, pos i ] Find best M, and positions p 1,…, p N in sequences Combinatorial Motif M: m 1 …m W Some of the m i ’s blank Find M that occurs in all s i with  k differences Given a collection of promoter sequences s 1,…, s N of genes with common expression

45 CS262 Lecture 17, Win07, Batzoglou Algorithms Probabilistic 1.Expectation Maximization: MEME 2.Gibbs Sampling: AlignACE, BioProspector Exhaustive CONSENSUS, TEIRESIAS, SP-STAR, MDscan

46 CS262 Lecture 17, Win07, Batzoglou Discrete Approaches to Motif Finding

47 CS262 Lecture 17, Win07, Batzoglou Discrete Formulations Given sequences S = {x 1, …, x n } A motif W is a consensus string w 1 …w K Find motif W * with “best” match to x 1, …, x n Definition of “best”: d(W, x i ) = min hamming dist. between W and a word in x i d(W, S) =  i d(W, x i )


Download ppt "CS262 Lecture 17, Win07, Batzoglou Gene Regulation and Microarrays."

Similar presentations


Ads by Google