Published by Edmund Robbins. Modified over 9 years ago.
1
Improving Intergenic miRNA Target Genes Prediction
Rikky Wenang Purbojati
2
miRNA
MicroRNA (miRNA) is a class of RNA that is believed to play important roles in gene regulation.
miRNAs are short (21- to 23-nt) RNAs that bind to the 3′ untranslated regions (3′ UTRs) of target genes.
3
miRNA Characteristics
Short (22-25 nt).
miRNA plays a major role in the RNA-Induced Silencing Complex (RISC).
miRNAs control the expression of large numbers of genes by:
    mRNA degradation
    Translational repression
Expression of a miRNA reduces the expression of its target genes.
An intergenic miRNA gene is located outside gene bodies.
4
Basic miRNA problem
Finding the true target genes of a miRNA is not a trivial task.
One approach is to make a computational prediction before validating it in wet-lab experiments.
One basic challenge: given a miRNA sequence, what are its target genes?
5
miRNA sequence target prediction
Several requirements for matching:
    Strong Watson-Crick base pairing of the 5′ seed (nt 2-8)
    Conservation of the miRNA binding site across species
    Local miRNA-mRNA interaction with a favorable balance of minimum free energy
Available tools for target gene prediction: PicTar, TargetScan, miRanda, microT, etc.
The predictions of most tools do not agree with each other, because the tools use different criteria.
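The first requirement, seed pairing, can be illustrated with a minimal sketch. It only looks for perfect Watson-Crick matches to nucleotides 2-8 of the miRNA and ignores conservation and free energy, so it is not how any of the listed tools works. The function names and the example UTR are made up for illustration.

```python
def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence (A<->U, G<->C)."""
    pairs = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(rna.upper()))


def seed_match_positions(mirna: str, utr: str) -> list[int]:
    """Find 0-based UTR positions that pair perfectly with the miRNA seed.

    The seed is taken as miRNA nucleotides 2-8, counted from the 5' end.
    """
    mirna = mirna.upper().replace("T", "U")  # accept DNA-style input
    utr = utr.upper().replace("T", "U")
    seed = mirna[1:8]                        # nucleotides 2..8 (1-based)
    site = reverse_complement(seed)          # what the seed pairs with on the mRNA
    positions = []
    start = utr.find(site)
    while start != -1:
        positions.append(start)
        start = utr.find(site, start + 1)    # allow overlapping sites
    return positions


# Illustrative miRNA sequence and a made-up 3' UTR fragment.
mirna = "UGAGGUAGUAGGUUGUAUAGUU"
utr = "AAACUACCUCAAGGGCUACCUCUU"
print(seed_match_positions(mirna, utr))
```

A real predictor would score each candidate site further; this sketch only shows why a 7-nt seed alone produces many candidate sites in long UTRs.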
6
Problem and Opportunity
Problem:
    Purely computational target gene prediction produces a lot of candidates.
    Most of them are not validated.
    The common assumption is that most of them are false positives.
    Can we shorten the list to include only the strong candidates?
Opportunity:
    Many experimental datasets are publicly available, e.g. cDNA microarray, miRNA microarray, etc.
    Use these datasets to computationally invalidate some of the predicted target genes.
7
Assumptions
miRNA works by silencing target genes, so the expression of a miRNA gene and of its target genes should be anti-correlated.
Intragenic miRNAs are expressed along with their host gene.
    A host gene should be anti-correlated with a target gene.
Intergenic miRNAs do not have a host gene, but their real target genes should be correlated with one another.
    The real target genes should be down-expressed whenever the intergenic miRNA is expressed.
8
How to invalidate a target gene prediction
A target gene prediction can be invalidated by using a set of microarray datasets.
For an intragenic miRNA target gene:
    If a target gene's expression has no correlation with the host gene's expression, we assume that the target gene is not influenced by the host gene.
For an intergenic miRNA target gene:
    If a target gene behaves inconsistently compared to the other target genes, we assume that it might not be affected by the miRNA gene.
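For the intragenic case, the check can be sketched with a plain Pearson correlation between host and target expression across datasets. The slides do not say which correlation measure or cutoff was used, so the `threshold` value and the function names below are assumptions for illustration only.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equally long profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a flat profile counts as "no correlation"
    return cov / sqrt(var_x * var_y)


def is_invalidated(host: list[float], target: list[float],
                   threshold: float = -0.3) -> bool:
    """Invalidate a target that is not clearly anti-correlated with the host.

    threshold is a hypothetical cutoff, not a value from the presentation.
    """
    return pearson(host, target) > threshold


host = [1.0, 2.0, 3.0, 4.0]
print(is_invalidated(host, [4.0, 3.0, 2.0, 1.0]))  # anti-correlated target
print(is_invalidated(host, [1.0, 2.0, 3.0, 4.0]))  # co-expressed target
```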
9
Filtering Intergenic miRNA Target Gene Prediction
Use a combination of 8 prediction tools to produce the initial predictions (union and intersection).
Use a collection of 190 microarray datasets to invalidate some of the predictions.
Use a greedy method to approximate the final subset of high-confidence target genes.
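The first step, combining tool outputs, can be sketched as counting how many tools support each gene. With a minimum support of 1 this gives the union, and with a minimum equal to the number of tools it gives the intersection. The tool names and gene assignments below are placeholders, not real prediction results.

```python
from collections import Counter


def support_counts(predictions: dict[str, set[str]]) -> Counter:
    """Count, for every gene, how many tools predicted it."""
    counts: Counter = Counter()
    for targets in predictions.values():
        counts.update(targets)  # each tool contributes at most 1 per gene
    return counts


def candidates_with_support(predictions: dict[str, set[str]],
                            min_tools: int) -> set[str]:
    """Genes predicted by at least min_tools of the tools."""
    counts = support_counts(predictions)
    return {gene for gene, n in counts.items() if n >= min_tools}


# Placeholder tool outputs (hypothetical, for illustration only).
predictions = {
    "tool_a": {"DHX9", "ASTE1", "MPST"},
    "tool_b": {"DHX9", "MPST"},
    "tool_c": {"DHX9", "PARP11"},
}
print(sorted(candidates_with_support(predictions, 1)))  # union
print(sorted(candidates_with_support(predictions, 3)))  # intersection
```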
10
Consistent Target Genes
We need to establish the meaning of consistent target genes.
In this context, target gene A and target gene B are consistent if:
    For all microarray datasets in which gene A is down-regulated, gene B is also down-regulated.

Gene        M1  M2  M3  M4  …  Mk
DHX9        ↑   ↓   ↑   ↓      ↑
ASTE1       ↓   ↓   ↓   ↓      ↑
C20ORF133   ↓   ↓   ↑   ↓      ↓
PARP11      ↑   ↓   ↓   ↓      ↓
SLC32A1     ↓   ↓   ↓   ↓      ↑
PPAPDC2     ↓   ↓   ↑   ↓      ↓
SCHIP1      ↑   ↓   ↓   ↓      ↑
MPST        ↑   ↓   ↓   ↓      ↓
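The definition can be written down directly as a small check. In this sketch each gene's signature is a string with "D" for down-regulated (↓) and "U" for up-regulated (↑), using only the five columns visible in the table (M1 to M4 and Mk). The encoding and the function name are assumptions for illustration.

```python
def is_consistent(sig_a: str, sig_b: str) -> bool:
    """True if gene B is down wherever gene A is down.

    Signatures use "D" for down-regulated and "U" for up-regulated,
    one character per microarray dataset.
    """
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must cover the same datasets")
    return all(b == "D" for a, b in zip(sig_a, sig_b) if a == "D")


# Signatures transcribed from the table (visible columns only).
signatures = {
    "DHX9": "UDUDU",
    "ASTE1": "DDDDU",
    "C20ORF133": "DDUDD",
    "PARP11": "UDDDD",
    "SLC32A1": "DDDDU",
    "PPAPDC2": "DDUDD",
    "SCHIP1": "UDDDU",
    "MPST": "UDDDD",
}

print(is_consistent(signatures["DHX9"], signatures["ASTE1"]))
print(is_consistent(signatures["ASTE1"], signatures["DHX9"]))
```

Taken literally, the definition is not symmetric: ASTE1 is down in every dataset where DHX9 is down, but not the other way round. Requiring identical signatures, as the equality test in the algorithm slide does, is a stricter, symmetric variant.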
11
Greedy Method
Given a set of target gene predictions and a collection of microarray datasets, we want to find:
    The largest subset of consistent target genes
    The highest number of down-regulated target genes in the subset
12
Reasoning
Why do we want to find the largest subset of consistent target genes?
    Target genes that are consistent across a large number of microarray datasets from different experiments might be affected by a common factor, which may be a microRNA.
    The largest subset gives a high probability of including the true target genes.
Why do we want the highest number of down-regulated target genes in the subset?
    Since miRNA works by down-regulating target genes, it is desirable to find the largest subset of consistently down-regulated target genes.
13
Current Algorithm
Subset <- {}
downexpr_cnt <- 0
for i = 0 to K
    A <- G[i]
    SigA <- signature(A)
    Temp_Subset <- {A}
    down <- countDownExpressedMicroarray(A)
    for j = 0 to K
        B <- G[j]
        SigB <- signature(B)
        if SigA == SigB
            Temp_Subset <- Temp_Subset U {B}
        end if
    end for
    if (length(Temp_Subset) > length(Subset)) && (down > downexpr_cnt)
        Subset <- Temp_Subset
        downexpr_cnt <- down
    end if
end for
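A runnable sketch of the same loop, under stated assumptions: signatures are strings of "D" (down) and "U" (up), one character per dataset; the count of down-expressed microarrays is taken to be the number of "D" entries in the pivot's signature; and the subset stores genes rather than signatures. The signatures are the five visible columns of the example table, so the result is illustrative only.

```python
def greedy_consistent_subset(signatures: dict[str, str]) -> tuple[set[str], int]:
    """Pick the group of genes sharing the pivot's signature that is both
    larger and more down-regulated than the best group found so far."""
    best_subset: set[str] = set()
    best_down = 0
    for pivot, sig_a in signatures.items():
        down = sig_a.count("D")  # datasets in which the pivot is down
        # All genes (including the pivot) with an identical signature.
        temp = {gene for gene, sig_b in signatures.items() if sig_b == sig_a}
        # Same acceptance rule as the pseudocode: both conditions must hold.
        if len(temp) > len(best_subset) and down > best_down:
            best_subset, best_down = temp, down
    return best_subset, best_down


signatures = {
    "DHX9": "UDUDU",
    "ASTE1": "DDDDU",
    "C20ORF133": "DDUDD",
    "PARP11": "UDDDD",
    "SLC32A1": "DDDDU",
    "PPAPDC2": "DDUDD",
    "SCHIP1": "UDDDU",
    "MPST": "UDDDD",
}
subset, down = greedy_consistent_subset(signatures)
print(sorted(subset), down)
```

On this toy input three groups tie (two genes, four down-regulated datasets each), and the strict comparisons keep whichever group is reached first. That makes the result depend on the order in which pivots are visited.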
14
Algorithm Limitations
The result might be biased by the expression signature of the first pivot gene:
    The search might get stuck in a local maximum.
    This can be addressed by prioritizing pivot genes, sorting target genes by down-expression value, or selecting the pivot gene at random.
The subset is an approximation of the high-confidence target genes, but it does not necessarily include all real target genes (because of limitations in the supporting data).
15
Benchmarking
Compare the performance with other prediction tools, based on:
    Number of correct predictions (based on validated target genes)
    Number of predictions
The algorithm will use initial target predictions supported by 2, 3, and 4 prediction tools.
16
Performance Comparison
17
Sensitivity-Specificity Comparison
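For reference, sensitivity and specificity can be computed from sets of gene names as sketched below. The slides do not state how negatives were defined, so treating every non-validated gene in a candidate universe as a negative is an assumption, as are the function name and the toy gene labels.

```python
def sensitivity_specificity(predicted: set[str], validated: set[str],
                            universe: set[str]) -> tuple[float, float]:
    """Sensitivity = TP / (TP + FN), specificity = TN / (TN + FP).

    Assumption: genes in the universe that are not validated targets
    are treated as true negatives for the purpose of this sketch.
    """
    negatives = universe - validated
    tp = len(predicted & validated)
    fn = len(validated - predicted)
    fp = len(predicted & negatives)
    tn = len(negatives - predicted)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


# Toy example with placeholder gene labels g1..g10.
universe = {f"g{i}" for i in range(1, 11)}
validated = {"g1", "g2", "g3", "g4"}
predicted = {"g1", "g2", "g3", "g5"}
sens, spec = sensitivity_specificity(predicted, validated, universe)
print(sens, spec)
```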
18
Conclusion
In general, the approximation method shows better sensitivity than the other prediction tools.
Specificity can be improved by including only target genes that are supported by more than 2 prediction tools.
19
Further Work
Adjusting the scoring function to find the optimum balance between the size of the subset and the number of down-regulated target genes.
Implementing a threshold on target gene signaturing to further improve the specificity.
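One possible shape for such a scoring function is a weighted sum of the two normalized quantities. This is a hypothetical sketch, not the scoring function from the presentation. The `weight` parameter and the normalization by candidate and dataset counts are assumptions.

```python
def subset_score(subset_size: int, down_count: int,
                 n_candidates: int, n_datasets: int,
                 weight: float = 0.5) -> float:
    """Weighted balance between subset size and down-regulation count.

    weight = 1.0 scores by subset size only, weight = 0.0 by the
    down-regulation count only. Both terms are scaled to [0, 1].
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    if n_candidates <= 0 or n_datasets <= 0:
        raise ValueError("counts must be positive")
    size_term = subset_size / n_candidates
    down_term = down_count / n_datasets
    return weight * size_term + (1.0 - weight) * down_term


# Two genes out of eight candidates, down in four of five datasets.
print(subset_score(2, 4, n_candidates=8, n_datasets=5))
```

Replacing the two strict comparisons in the greedy loop with a single comparison of such scores would make the trade-off tunable through one parameter.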