Promoter Panel Review
Background: Promoter
- In genetics, a promoter is a DNA sequence that enables a gene to be transcribed. It may be very long and may contain multiple elements.
- In geWorkbench, the Promoter Panel is used to discover potential transcription factor binding sites (TFBS), based on known transcription factor binding profiles.
Background Cont’d
Available transcription factor binding profile databases:
- TRANSFAC: most complete, but commercial; about 700 matrices.
- JASPAR: open source. It now has 3 categories:
  - JASPAR CORE: 123 profiles
  - JASPAR PHYLOFACTS: 174 profiles
  - JASPAR FAM: familial profiles based on CORE
- geWorkbench uses 108 matrices from an older version of JASPAR CORE.
Background Cont’d
For sequence AAAGTA: SCORE = 21/21 * 21/21 * 21/21 * 21/21 * 8/21 * 6/21 ≈ 0.109
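The product above can be reproduced from a count matrix. The sketch below rests on an assumption: the matrix shown on the original slide is not preserved in this text, so `COUNTS` is a hypothetical 6-column matrix over 21 sites. Only the six entries used by AAAGTA (21, 21, 21, 21, 8, 6) come from the example; every other count is filler chosen so that each column sums to 21.

```python
from fractions import Fraction

# Hypothetical count matrix (rows: bases; columns: motif positions 1-6).
# Only the entries used by AAAGTA (21, 21, 21, 21, 8, 6) come from the example;
# the remaining counts are filler so that every column sums to 21 sites.
COUNTS = {
    "A": [21, 21, 21, 0, 5, 6],
    "C": [0, 0, 0, 0, 4, 5],
    "G": [0, 0, 0, 21, 4, 5],
    "T": [0, 0, 0, 0, 8, 5],
}
N_SITES = 21


def product_score(seq):
    """Multiply the observed frequency of each base of seq at its position."""
    score = Fraction(1)
    for pos, base in enumerate(seq):
        score *= Fraction(COUNTS[base][pos], N_SITES)
    return score


score = product_score("AAAGTA")
print(score, float(score))  # exact fraction and its decimal value
```

Using exact fractions avoids rounding until the final conversion; the result is 48/441, which is about 0.1088.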
Algorithm
1. Normalize the matrix so that every P(i) > 0, which keeps the logarithm defined. The formula for the score is very simple: score = Σ log P(i).
2. Create a background sequence. There are two ways to create it.
3. Scan the background sequence to set the threshold.
  - For a background sequence of length 1K, you get about 1K - Matrix.length scores, one per window.
  - The threshold is based on the P-value. For example, for a P-value cutoff that keeps the top 5% of scores, the threshold is the lowest score among that top 5%.
4. Scan the input sequence and report hits above the threshold.
5. Report results.
In summary:
- The result is very stringent. Bonferroni correction is used, so the effective P-value is really P-value/1K.
- Best for detecting enrichment of some patterns.
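The steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the geWorkbench implementation: the pseudocount used to keep P(i) > 0, the uniform random background, the count matrix, and all function names are hypothetical choices made for the example.

```python
import math
import random

BASES = "ACGT"


def normalize(counts, pseudocount=1.0):
    """Turn per-position counts into probabilities that are all > 0.

    Adding a pseudocount is an assumed way of keeping P(i) positive.
    """
    probs = []
    for column in counts:
        total = sum(column[b] + pseudocount for b in BASES)
        probs.append({b: (column[b] + pseudocount) / total for b in BASES})
    return probs


def log_score(probs, window):
    """score = sum over positions i of log P(i)."""
    return sum(math.log(col[base]) for col, base in zip(probs, window))


def scan(probs, seq):
    """Score every window of matrix length; returns (start, score) pairs."""
    width = len(probs)
    return [(i, log_score(probs, seq[i:i + width]))
            for i in range(len(seq) - width + 1)]


def threshold_from_background(probs, background, p_value):
    """Lowest score among the top p_value fraction of background scores."""
    scores = sorted((s for _, s in scan(probs, background)), reverse=True)
    k = max(1, int(len(scores) * p_value))
    return scores[k - 1]


def find_hits(probs, seq, threshold):
    """Windows of the input sequence scoring at or above the threshold."""
    return [(i, s) for i, s in scan(probs, seq) if s >= threshold]


# Hypothetical 6-column count matrix over 21 sites (illustrative values only).
counts = [
    {"A": 21, "C": 0, "G": 0, "T": 0},
    {"A": 21, "C": 0, "G": 0, "T": 0},
    {"A": 21, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 21, "T": 0},
    {"A": 5, "C": 4, "G": 4, "T": 8},
    {"A": 6, "C": 5, "G": 5, "T": 5},
]
probs = normalize(counts)

# Assumed background: a uniform random 1K sequence with a fixed seed.
rng = random.Random(0)
background = "".join(rng.choice(BASES) for _ in range(1000))

threshold = threshold_from_background(probs, background, p_value=0.05)
hits = find_hits(probs, "CCAAAGTACC", threshold)
print(threshold, hits)
```

In this sketch, passing `p_value=0.05 / 1000` leaves only one background score above the cutoff on a 1K background, so the threshold becomes the best background score. That is one way to see why a Bonferroni-corrected P-value makes the result so stringent.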
Issues - Programmatic
- The algorithm is not very efficient. For every TF, one scan of the background and of the input sequence is required, and most of the time is spent scanning background sequences.
- Performs all tests on protein sequences.
- The Stop button doesn’t work.
- Different species.
- The 13K background sequence? Different programs use different background sequences.
- Module discovery is not correctly programmed?
- Too stringent for finding hits; good for checking enrichment.
- The “All Sequences” button is missing.
- What can we do after we get the patterns?
- Saving results does not work properly.
Issues - GUI
- The logo is of poor quality. It should provide more information and should be in a separate panel.
- Separate parameters and results.
- The TFBS should be marked with direction, 5’ or 3’.
- Use the updated Sequence Panel.
- No image snapshot function.
Proposed fixes
- Update the JASPAR profiles.
- Provide more information about the matrix.
- Use JASPAR logos for JASPAR CORE; use enoLogos instead of BioJava for user-defined matrices to get high-quality pictures.
- Scan only once.
- Get more information about the 13K sequences.
- Cache the threshold for the 13K background sequence.
- Change the GUI.
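Two of the proposed fixes, scanning only once and caching the threshold, could look roughly like the following. This is a sketch under assumptions: `ThresholdCache`, its (matrix name, P-value) key, the `fake_background_scan` stand-in, and the matrix names are hypothetical and are not taken from geWorkbench.

```python
class ThresholdCache:
    """Memoize background thresholds so each (matrix, P-value) pair is computed once."""

    def __init__(self, compute):
        self._compute = compute      # callable(matrix_name, p_value) -> threshold
        self._store = {}
        self.background_scans = 0    # how many times the expensive scan actually ran

    def get(self, matrix_name, p_value):
        key = (matrix_name, p_value)
        if key not in self._store:
            self.background_scans += 1
            self._store[key] = self._compute(matrix_name, p_value)
        return self._store[key]


def fake_background_scan(matrix_name, p_value):
    # Stand-in for the expensive scan of the 13K background sequences;
    # the returned number is arbitrary and only serves the demonstration.
    return -10.0 * p_value - len(matrix_name)


cache = ThresholdCache(fake_background_scan)
first = cache.get("TF_A", 0.05)
second = cache.get("TF_A", 0.05)   # served from the cache, no second scan
print(first == second, cache.background_scans)
```

Keying on both the matrix and the P-value means a changed P-value triggers a fresh scan, while repeated runs with the same settings reuse the stored threshold.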