Journal club 06/27/08
Phylogenetic footprinting
A technique for identifying transcription factor binding sites (TFBS) within a non-coding DNA region of interest by comparing it to the orthologous sequences in different species (introduced by Tagle et al., 1988).
The function and DNA-binding preferences of transcription factors are well conserved between diverse species.
Non-coding DNA sequences that are essential for regulating gene expression are under differential selective pressure: TFBS change more slowly than other parts of the non-coding genome.
Phylogenetic footprinting
Not all conserved sequences are under selective pressure.
To eliminate false positives, a statistical analysis must show that the reported motifs have a mutation rate meaningfully lower than that of the surrounding nonfunctional sequence.
Mixture of selective pressures
Maintain the function of the protein encoded by the gene (amino acid selective pressure)
Maintain the regulatory role (CRUNCS), e.g. regulatory factor binding sites
Methods
Sequence conservation:
1. Entropy score
2. Parsimony score
Conservation p-value (mixture models)
Posterior distributions of conservation scores
Conditional p-values
Parsimony vs. Entropy Score
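As a rough illustration of the entropy score, here is a minimal sketch that computes the Shannon entropy of a single alignment column. The function name `column_entropy` is hypothetical, and the sketch assumes an ungapped column with every species weighted equally; the scoring actually used in the presented work may differ (for example, by weighting species according to the phylogeny).

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the symbols in one alignment column.

    Assumption: `column` is a non-empty string or list with one symbol
    per species, no gaps, and equal weight for every species.
    """
    counts = Counter(column)
    total = len(column)
    # Sum p * log2(p) over the observed symbols, then negate.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# A perfectly conserved column has entropy 0; the more evenly the
# symbols are spread, the higher the entropy.
conserved = column_entropy("AAAA")
variable = column_entropy("ACGT")
```

Lower entropy means stronger conservation, so the most conserved columns are the ones with scores near zero.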
Fitch’s algorithm
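Fitch's algorithm gives the parsimony score of a column: the minimum number of substitutions needed to explain the observed states on a given tree. The sketch below is a minimal version for a single site, assuming a rooted binary tree written as nested 2-tuples with single-character strings at the leaves. The names `fitch` and `parsimony_score` and the tree encoding are illustrative choices, not taken from the slides.

```python
def fitch(tree):
    """Bottom-up pass of Fitch's algorithm for one alignment column.

    Assumption: `tree` is either a leaf (a one-character string such as
    "A") or a 2-tuple (left_subtree, right_subtree).
    Returns (candidate_state_set, substitution_count) for the subtree.
    """
    if isinstance(tree, str):
        return {tree}, 0
    left, right = tree
    left_set, left_cost = fitch(left)
    right_set, right_cost = fitch(right)
    common = left_set & right_set
    if common:
        # The children agree on at least one state: no extra substitution.
        return common, left_cost + right_cost
    # The children disagree: take the union and count one substitution.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree):
    """Minimum number of substitutions for the column on this tree."""
    return fitch(tree)[1]


# Four species, one column; this hypothetical tree groups them in pairs.
score = parsimony_score((("A", "C"), ("A", "G")))
```

Unlike the entropy score, the parsimony score depends on the tree, so the same column can score differently under different phylogenies.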
Aligned codons, illustrating to what extent the conservation of each column is surprising given the amino acids encoded
L: CUN, UUA, UUG; W: UGG; V: GUN; A: GCN; G: GGN; D: GAU, GAC
How surprising?
1. Conditional model
2. Mixture model
Hydrophobic favor
Glycine favor
50 functional classes
2. Mixture model
Non-coding model: HKY model represents the transition/transversion rate ratio for
How to compute :