Estimation example
Input: Alignment. Model parameters from neutral sequence.
Estimation example 2
HMM version
Different gene conservation patterns
Figure panels: Protein-coding gene; Known non-coding gene: XIST. Figure labels: ch10, chX, RepA.
Estimating π
Find an ML estimator for π using the EM algorithm.
Score:
Decompose Q by “extracting” the stationary distribution:
R: Neutral substitution pattern
π: Site-specific forces
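One common way to write such a decomposition is the GTR-style form Q[i][j] = R[i][j] * pi[j] for i != j, with R symmetric and each diagonal entry chosen so the row sums to zero. Whether the slides use exactly this form is an assumption. The sketch below only illustrates how a neutral pattern R and a site-specific distribution pi could combine into a rate matrix whose stationary distribution is pi; the function names and the example values of R and pi are hypothetical.

```python
def build_rate_matrix(R, pi):
    """Return Q with Q[i][j] = R[i][j] * pi[j] for i != j and rows summing to zero."""
    n = len(pi)
    Q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                Q[i][j] = R[i][j] * pi[j]
        # The diagonal is still 0.0 here, so sum(Q[i]) is the off-diagonal total.
        Q[i][i] = -sum(Q[i])
    return Q


def stationary_residual(Q, pi):
    """Return the row vector pi * Q; it is all zeros when pi is stationary for Q."""
    n = len(pi)
    return [sum(pi[i] * Q[i][j] for i in range(n)) for j in range(n)]


# Hypothetical symmetric neutral pattern over A, C, G, T
# (transitions A<->G and C<->T faster than transversions).
R = [
    [0.0, 1.0, 4.0, 1.0],
    [1.0, 0.0, 1.0, 4.0],
    [4.0, 1.0, 0.0, 1.0],
    [1.0, 4.0, 1.0, 0.0],
]

# Hypothetical site-specific distribution that favours G at this site.
pi = [0.1, 0.1, 0.7, 0.1]

Q = build_rate_matrix(R, pi)
residual = stationary_residual(Q, pi)
```

Because R is symmetric, every entry of `residual` is zero up to rounding, so changing pi changes the equilibrium base composition at a site while R keeps describing the neutral pattern.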
Comparison
Unlikeliness score vs. rate score
Proof of concept
43% vs 16% detection by vs.
Gene and gene regulation
A generalization: Conserved motif discovery
Human  GTACTAAGCTACTGTATGGAGGCT
Mouse  *****GAGC**********ATGC*
Dog    *****AGGT**********CGGC*
Bat    *****AGCT**********AGAC*
Find regions in the alignment whose substitution pattern is explained by the motif.
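As a rough illustration of scanning an alignment for motif-like regions, the sketch below slides a position weight matrix over the alignment from this slide and sums log-odds scores over all species in each window. It is a simplified stand-in, not the method of the slides: it treats the species as independent and ignores the phylogeny and the substitution model. The motif, the uniform background, and all function names are hypothetical, and '*' is assumed to mean "same base as Human".

```python
import math

# Alignment as shown on the slide; '*' is assumed to mean "same base as Human".
ALIGNMENT = {
    "Human": "GTACTAAGCTACTGTATGGAGGCT",
    "Mouse": "*****GAGC**********ATGC*",
    "Dog":   "*****AGGT**********CGGC*",
    "Bat":   "*****AGCT**********AGAC*",
}

# Hypothetical 4-column motif (not from the slides); each column prefers one base.
MOTIF = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]

# Assumed uniform background base frequencies.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def expand(alignment, reference="Human"):
    """Replace every '*' with the reference base at the same column."""
    ref = alignment[reference]
    return {
        name: "".join(r if c == "*" else c for c, r in zip(seq, ref))
        for name, seq in alignment.items()
    }


def window_score(rows, start, motif, background):
    """Sum of log-odds (motif vs. background) over all species and motif columns."""
    score = 0.0
    for offset, column in enumerate(motif):
        for seq in rows.values():
            base = seq[start + offset]
            score += math.log(column[base] / background[base])
    return score


def scan(alignment, motif, background):
    """Score every window of motif width along the alignment."""
    rows = expand(alignment)
    length = len(next(iter(rows.values())))
    return [
        window_score(rows, start, motif, background)
        for start in range(length - len(motif) + 1)
    ]


rows = expand(ALIGNMENT)
scores = scan(ALIGNMENT, MOTIF, BACKGROUND)
best_start = max(range(len(scores)), key=scores.__getitem__)
```

With this illustrative motif, the highest-scoring window starts at 0-based column 19, which is the second variable block of the alignment. A model that accounts for the tree would replace the per-species sum with a phylogenetic likelihood.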
P53 motif instance conservation
Figure labels: P53, MDM2, Novel non-coding gene
M. Huarte, O. Zuk, M. Guttman