Published by Widyawati Sudirman. Modified over 6 years ago.
Prediction of Regulatory Elements for Non-Model Organisms

Rachita Sharma, Patricia Evans, Virendra Bhavsar
Faculty of Computer Science, University of New Brunswick, Fredericton, NB, Canada E3B 5A3

Introduction

Determination of regulatory networks from available data is one of the major challenges in bioinformatics research. A regulatory network of an organism is represented by a set of genes and their regulatory relationships, which indicate how a gene or a group of genes affects (inhibits or activates) the production of other gene products, as shown in Figure 1.

Some organisms, such as yeast, Arabidopsis thaliana (thale cress, a plant) and the fruit fly, are investigated very thoroughly by biologists as model organisms. We are developing a system to predict the regulatory relationships of a non-model organism (the target genome), about which less is known, using information about the regulatory relationships of a related model organism (the source genome). If the organisms are closely related, their regulatory relationships are likely to be similar. Differences in the regulatory relationships between organisms can be determined by using data from both the model and the non-model organism.

This research started as part of the bioinformatics research component of the Canadian Potato Genome Project.

Figure 1: Example of gene regulatory network (legend: inhibition, activation)

Objectives

- Determine associations between the genes that act as regulatory elements (transcription factors and target genes) in model and non-model organisms
- Predict the regulatory relationships in a non-model organism

Methodology

Find transcription factors of the target genome using the available regulatory element information of the source organism, based on:

- Similar sequences (TF-Seq)
- Same protein domain family (TF-Fam)
- Same protein domain sub-family (TF-SubFam)

Map target genes from the source genome to the target genome, based on finding transcription factor binding site (TFBS) motifs in:

- Nucleotide data of the target genome (BS-Seq)
- Promoter data of the target genome (BS-Prom)
- Target genome sequences similar to the target gene sequences of the source genome (BS-Blast)
- Nucleotide data of the target genome, discarding binding sites located in the predicted regions of nucleosome occupancy (BS-Nuc)

Analysis

This methodology has been implemented for mapping regulatory elements and their regulatory network. The first step, mapping regulatory elements, has been tested with yeast (Saccharomyces cerevisiae) as the source genome and Arabidopsis thaliana as the target genome; the two diverged approximately 1.6 billion years ago.

For any pair of genomes, only some of the transcription factors from one genome can be mapped to the other, since the evolutionary distance between them leads to many false negatives. In addition, the number of confirmed mappings between any two genomes is unknown, as it depends on the definition of a confirmed mapping used in the experiment.

The predicted transcription factors are compared on the basis of:

- how likely a sequence predicted as a transcription factor is to be a transcription factor of the target genome
- how likely the predicted transcription factor is to correspond to the correct type of transcription factor from the source genome

Therefore, the predicted transcription factors are compared to a set of 1922 available transcription factors of the Arabidopsis thaliana genome to determine the actual number of transcription factors predicted.

Results

Transcription factor mapping based on the same protein domain family (TF-Fam) performs better than the other two methods, which are based on sequence similarity (TF-Seq) and on the same protein domain sub-family (TF-SubFam), as shown in Figure 2. Also, the predicted transcription factors are of the correct type, as illustrated in Figure 3, and sequences with similar annotation may be part of the false positives.

Figure 4 shows that target gene mapping by finding TFBS motifs in promoters (BS-Prom) performs better than the other methods. The sequence similarity used in BS-Blast is not useful for mapping target genes, which shows that target genes with similar binding sites do not need to have high sequence similarity. Also, refining the results of BS-Prom with BS-Nuc, which uses the Nucleosomes Position Prediction tool, does not improve performance, which shows the effect of the variable position of the transcription-suppressing nucleosomes.

Figure 2: Number of true positives, false positives, false negatives and true negatives for transcription factors identified using TF-Seq, TF-Fam, and TF-SubFam

Figure 3: Number of hit sequences divided into four types (Confirmed, Similar, Other TF and Not TF) using TF-Seq for BLAST e-value cut-off parameter of 0.1

Figure 4: Number of true positives, false positives, false negatives and true negatives for target genes identified using BS-Seq, BS-Prom, BS-Blast and BS-Nuc

Conclusion

The results of this work show that TF-Fam and BS-Prom are promising methods for predicting regulatory elements for a non-model organism based on a model organism. These regulatory elements can be used further to predict the regulatory network of the non-model organism. Gene expression data will be used to further refine the regulatory network and to understand how the predicted regulatory relationships correspond to the expression levels of the genes in the data.
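The BS-Prom idea from the Methodology (looking for TFBS motifs in promoter sequences) and the true/false positive and negative counts reported in Figures 2 and 4 can be illustrated with a minimal Python sketch. The poster does not say how motifs are represented or matched, so this sketch assumes a motif written as an IUPAC consensus string and matched exactly on both strands. The gene identifiers, promoter sequences, motif and set of "known" targets are all hypothetical toy data, and the function names are illustrative, not the authors' implementation.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def motif_to_regex(motif):
    """Compile an IUPAC consensus motif into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in motif.upper()))


def predict_target_genes(promoters, motif):
    """Return ids of genes whose promoter contains the motif on either strand."""
    pattern = motif_to_regex(motif)
    hits = set()
    for gene_id, seq in promoters.items():
        seq = seq.upper()
        if pattern.search(seq) or pattern.search(reverse_complement(seq)):
            hits.add(gene_id)
    return hits


def confusion_counts(predicted, known, universe):
    """Count TP, FP, FN and TN for a predicted set against a known set."""
    return {
        "TP": len(predicted & known),
        "FP": len(predicted - known),
        "FN": len(known - predicted),
        "TN": len(universe - predicted - known),
    }


# Hypothetical toy promoters; real promoter data would be far longer.
promoters = {
    "geneA": "TTGACACGTGTCAA",  # motif on the forward strand
    "geneB": "AAAATTTTCCCC",    # no motif
    "geneC": "GGCACGCGAA",      # motif on the forward strand
    "geneD": "ATATATATAT",      # no motif
    "geneE": "AACGCGTGTT",      # motif on the reverse strand only
}
known_targets = {"geneA", "geneB", "geneE"}  # hypothetical reference set

predicted = predict_target_genes(promoters, "CACGYG")
counts = confusion_counts(predicted, known_targets, set(promoters))
print(sorted(predicted))
print(counts)
```

On this toy input the prediction contains geneA, geneC and geneE, giving 2 true positives, 1 false positive, 1 false negative and 1 true negative. Exact consensus matching is the simplest choice for a sketch; a position weight matrix with a score threshold would be a common alternative when binding sites are more variable.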