New Approaches for Determining Functional siRNA
Liyang Diao
Dr. Stanley Dunn, advisor
Protein Production
DNA → mRNA → protein
- Production of proteins starts with DNA
- DNA is in the nucleus
- mRNA (messenger RNA) is required to finish protein production
RNAi: RNA interference
- Suppresses gene expression
- Affects mRNA
More on RNAi
siRNA: short-interfering RNA
- Typically [...] nucleotides long
- Double-stranded
- Participates in RNAi by guiding degradation of the target mRNA
- Potential for effective gene therapy
Issues
- Some genes are more effectively suppressed than others
- Mechanism is poorly understood
Question
How do we know which siRNA are functional?
Some ideal properties:
- GC content between 30% and 55%
- Low level of secondary structure
- Difference in thermodynamic stability between the 5’ and 3’ ends (A/U content)
- Specific positional nucleotide preferences
- No long GC stretches
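
Two of the properties above (GC content and GC stretches) can be checked directly from the sequence. The following is only a minimal sketch: the function names, the cutoff max_gc_run for what counts as a "long" GC stretch, and the end window size are hypothetical choices, not values given in the slides. The A/U difference between the ends is reported but not judged, since the slides do not give a threshold for it.

```python
def gc_content(seq: str) -> float:
    """Fraction of G or C nucleotides in the sequence."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def longest_gc_run(seq: str) -> int:
    """Length of the longest uninterrupted stretch of G/C nucleotides."""
    longest = current = 0
    for base in seq.upper():
        current = current + 1 if base in "GC" else 0
        longest = max(longest, current)
    return longest


def end_au_difference(seq: str, window: int = 5) -> int:
    """A/U count in the first `window` bases minus the last `window` bases.

    The window size is an assumed value, chosen only for illustration.
    """
    seq = seq.upper()
    head = sum(base in "AU" for base in seq[:window])
    tail = sum(base in "AU" for base in seq[-window:])
    return head - tail


def passes_basic_filters(seq: str, max_gc_run: int = 6) -> bool:
    """Check the 30-55% GC range and the (assumed) GC-stretch cutoff."""
    gc_ok = 0.30 <= gc_content(seq) <= 0.55
    run_ok = longest_gc_run(seq) <= max_gc_run
    return gc_ok and run_ok
```

For example, passes_basic_filters("AUGGCUACGUAGCUAAUCGA") checks a 20-nucleotide strand with 9 G/C bases. Secondary structure and positional preferences are not covered by this sketch.
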
Previous Model
Pancoska’s Eulerian graph model
- First represent a string of siRNA by a directed graph
- Then construct a weighted undirected Eulerian graph (nodes: A, T, G, C)
- Compare graphs for functional and nonfunctional siRNA
- For these two sets of siRNA, compute graph properties that reflect sequence structure
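
One plausible reading of the two graph steps can be sketched as follows. This is an assumption, not the published construction: it takes the four nucleotides as nodes, draws a directed edge for each pair of consecutive nucleotides, and then merges opposite directions into weighted undirected edges. All function names are hypothetical. The last helper checks the even-degree condition that an Eulerian circuit requires.

```python
from collections import Counter


def directed_transitions(seq: str) -> Counter:
    """Count directed edges (a -> b) between consecutive nucleotides."""
    return Counter(zip(seq, seq[1:]))


def undirected_weights(seq: str) -> Counter:
    """Merge a -> b and b -> a into one undirected edge with a weight."""
    weights = Counter()
    for (a, b), count in directed_transitions(seq).items():
        weights[tuple(sorted((a, b)))] += count
    return weights


def node_degrees(weights: Counter) -> Counter:
    """Weighted degree of each node; a self-loop contributes twice."""
    degrees = Counter()
    for (a, b), weight in weights.items():
        degrees[a] += weight
        degrees[b] += weight
    return degrees


def has_even_degrees(seq: str) -> bool:
    """True if every node has even degree (necessary for an Eulerian circuit)."""
    degrees = node_degrees(undirected_weights(seq))
    return all(degree % 2 == 0 for degree in degrees.values())
```

Under this reading, a sequence that starts and ends on the same nucleotide traces a closed walk, so every node ends up with even degree. How the original model handles sequences that do not close is not stated in the slides and is left out here.
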
Issues with Pancoska’s Algorithm
- Uniqueness
- Complex pattern recognition
Other Ideas
- Number of nucleotide mutations
- Levenshtein distance:
  A T T C G T G G A C G G A T T C G T G G A C C G A T T C G T G G A …
  Measures the minimum number of single-nucleotide substitutions, insertions, or deletions required to go from one string to another.
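
The Levenshtein distance can be computed with the standard dynamic-programming recurrence. The sketch below is a generic textbook implementation, not code from this project, and the function name is chosen here for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions, and deletions turning a into b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty prefix of b
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # match or substitute
            ))
        previous = current
    return previous[-1]
```

For instance, levenshtein("ATTCGTGGA", "CGGATTCGTGGA") compares a strand with a copy that has three extra nucleotides at the front. Only two rows of the table are kept at a time, so memory grows with the length of the second string alone.
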
Current/Future Progress
- 4^20 is the total number of possible siRNA strands of length 20.
- How many are potentially functional?
- Combinatorics!
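
As a simplified illustration of the counting idea, the sketch below applies only the 30-55% GC-content criterion from the ideal properties and ignores every other constraint. It chooses which positions hold a G/C nucleotide, then allows two letters at each of the 20 positions (G or C, A or U). The function names are hypothetical.

```python
from math import comb

LENGTH = 20  # strand length used in the slides


def strands_with_gc_count(n: int, length: int = LENGTH) -> int:
    """Number of strands with exactly n G/C nucleotides.

    comb(length, n) places the G/C positions; each position then has
    two possible letters, giving the factor 2 ** length.
    """
    return comb(length, n) * 2 ** length


def count_in_gc_range(low_pct: int = 30, high_pct: int = 55,
                      length: int = LENGTH) -> int:
    """Number of strands whose GC content lies in [low_pct, high_pct] percent."""
    return sum(
        strands_with_gc_count(n, length)
        for n in range(length + 1)
        # integer comparison avoids floating-point edge cases at the bounds
        if low_pct * length <= 100 * n <= high_pct * length
    )


total = 4 ** LENGTH
fraction_in_range = count_in_gc_range() / total
```

Summing strands_with_gc_count(n) over all n from 0 to 20 recovers 4^20, which is a quick consistency check on the counting argument.
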
Math
Let H(n,i,j) be the number of ways to position the A/U and G/C nucleotides along the strand, where
- n is the total number of G or C nucleotides
- i is the total number of A or U nucleotides at the 5’ end
- j is the total number of A or U nucleotides at the 3’ end
Each of the 20 positions then has two possible letters (A or U, G or C). Thus, the total number of potential strings is 2^20 * H(n,i,j).
Quantity desired: