Identification of Helix-Turn-Helix (HTH) DNA-Binding Motifs
Changhui Yan
Department of Computer Science, Utah State University
HTH Motifs
Protein sequences with low sequence similarity can fold into a similar HTH structure.
Identifying HTH motifs from sequence is extremely challenging.
7 families containing HTH motifs were taken from the Pfam database.
Positive data set: 2,198 proteins.
Negative data set: 1,518 proteins.
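Combination of Amino Acid Sequence and Predicted Secondary Structure
[Figure: inputs to the two models]
HMM_AA: amino acid sequence
    LQQITHIANQL-GLE----KDVVRVWF
HMM_AA_SS: amino acid sequence and predicted secondary structure
    LQQITHIANQL-GLE----KDVVRVWF
    HHHEEHEEEHMHE----HHEEMMEH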
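One way to picture the combined input is to pair each residue with the secondary-structure state predicted at the same position, giving one composite symbol per column. The slide does not say how HMM_AA_SS actually encodes the two tracks, so the sketch below is an assumption. The function name `combine_aa_ss` and the toy strings are hypothetical.

```python
from typing import List


def combine_aa_ss(aa: str, ss: str, gap: str = "-") -> List[str]:
    """Pair each residue with its predicted secondary-structure state.

    Assumption: `aa` and `ss` are aligned strings of equal length, one
    state letter per residue. Gap columns are kept as a single gap symbol.
    """
    if len(aa) != len(ss):
        raise ValueError("sequence and structure strings must have equal length")
    combined = []
    for residue, state in zip(aa, ss):
        if residue == gap:
            combined.append(gap)              # keep alignment gaps as they are
        else:
            combined.append(residue + state)  # composite symbol, e.g. "LH"
    return combined


# Toy, hypothetical input (not copied from the slide).
symbols = combine_aa_ss("LQQ-GLE", "HHH-EEM")
print(symbols)
```

A composite alphabet like this grows quickly (20 residues times the number of structure states), which is one reason a reduced amino acid alphabet can be attractive.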
Reduced Alphabets
Schemes for reducing the amino acid alphabet, based on the BLOSUM50 matrix of Henikoff and Henikoff (1992) and derived by grouping and averaging the similarity matrix elements as described in the text (Murphy et al. 2000).
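Re-encoding a sequence in a reduced alphabet amounts to replacing each residue with a representative letter for its group. The sketch below uses the grouping commonly quoted for the 10-letter alphabet of Murphy et al. (2000). It is an assumption here and should be checked against the paper. The helper names `build_mapping` and `reduce_sequence` are hypothetical.

```python
# Assumption: grouping commonly quoted for the Murphy et al. (2000)
# 10-letter alphabet; verify against the original paper before use.
MURPHY_10_GROUPS = ["LVIM", "C", "A", "G", "ST", "P", "FYW", "EDNQ", "KR", "H"]


def build_mapping(groups):
    """Map every residue to the first letter of the group that contains it."""
    return {residue: group[0] for group in groups for residue in group}


def reduce_sequence(seq, mapping):
    """Re-encode a sequence; symbols not in the mapping (e.g. gaps) pass through."""
    return "".join(mapping.get(ch, ch) for ch in seq)


mapping = build_mapping(MURPHY_10_GROUPS)
print(reduce_sequence("LQQITHIANQL-GLE", mapping))
```

Swapping in a different list of groups (for example a 15-letter or 8-letter scheme) changes the alphabet without touching the rest of the code.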
Results
Table 1. Cross-Family Evaluations
                        True Positive [1]   False Positive [2]
HMM_AA [3]
HMM_AA_SS
    (20 letters) [3]    227
    (Murphy_15) [3]     474
    (Murphy_10) [3]     470
    (Murphy_8) [3]      431 5
[1] True positive: HTH motifs that are correctly identified as such.
[2] False positive: Non-HTH motifs that are identified as HTH motifs.
[3] The alphabet used to encode amino acid sequences.
Results
Table 2. Comparison with a method based on profile-profile comparisons
Total HTH motifs    FFAS03 and HMM_AA_SS    FFAS03 only    HMM_AA_SS only
563                 135                     24             71

Table 3. Putative HTH motifs in Ureaplasma parvum
Protein                   Location    Annotation from UniProt
sp|Q9PQE5|SCPB_UREPA      176-214     Participates in chromosomal partition during cell division
sp|Q9PQV6|RPOB_UREPA      540-587     DNA-directed RNA polymerase
sp|Q9PR27|SYY_UREPA       340-380     Tyrosyl-tRNA synthetase
sp|Q9PQC2|SYA_UREPA       217-265     Alanyl-tRNA synthetase
sp|Q9PQ74|DPO3A_UREPA     365-400     DNA polymerase III subunit alpha
sp|Q9PQX7|Y166_UREPA      507-553     Hypothetical protein
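The "both", "FFAS03 only" and "HMM_AA_SS only" columns of Table 2 are the kind of counts that fall out of comparing two sets of identified motifs. A minimal sketch follows, assuming each method's hits are available as a set of motif identifiers. The function name and the identifiers are hypothetical and are not data from the slides.

```python
def compare_predictions(hits_a: set, hits_b: set) -> dict:
    """Count motifs found by both methods and by each method alone."""
    return {
        "both": len(hits_a & hits_b),    # identified by both methods
        "a_only": len(hits_a - hits_b),  # identified only by method A
        "b_only": len(hits_b - hits_a),  # identified only by method B
    }


# Hypothetical motif identifiers, for illustration only.
ffas03_hits = {"motif_1", "motif_2", "motif_3"}
hmm_aa_ss_hits = {"motif_2", "motif_3", "motif_4", "motif_5"}
print(compare_predictions(ffas03_hits, hmm_aa_ss_hits))
```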