Discovery of Structural and Functional Features in RNA Pseudoknots. Qingfeng Chen and Yi-Ping Phoebe Chen, Senior Member, IEEE. IEEE Transactions on Knowledge and Data Engineering.


1 Discovery of Structural and Functional Features in RNA Pseudoknots
Qingfeng Chen and Yi-Ping Phoebe Chen, Senior Member, IEEE
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 21, NO. 7, JULY 2009
Adviser: Yu-Chiang Li
Speaker: Shao-Hsiang Hung
Date: 2009/12/10

2 Outline
- Introduction
- Material and Methods
- Results
- Conclusion and Discussion

3 I. Introduction

4 I. Introduction (1/6)
- Accurately predicting the functions of biological macromolecules is one of the biggest challenges in functional genomics.
- RNA molecules play a central role in a number of biological functions within cells, from the transfer of genetic information from DNA to protein, to enzymatic catalysis.

5 I. Introduction (2/6)
- To fulfill this range of functions, a simple linear nucleotide string of RNA, made up of uracil, guanine, cytosine, and adenine, forms a variety of complex three-dimensional structures.
- Pseudoknot: an RNA structure characterized by base pairing between a loop formed by an orthodox secondary structure and bases outside that loop.

6 I. Introduction (3/6)

7 I. Introduction (4/6)
- PseudoBase is the only online database containing structural, functional, and sequence data of RNA pseudoknots.
- Unfortunately, the analysis of this valuable data set is underdeveloped:
  - Difficulty in modeling
  - Complexity in computing structural

8 I. Introduction (5/6)
- Association rule mining has been successfully used to discover valuable information in large data sets.
- Limitations with multivalued variables:
  - Categorical multivalued variables (such as color {red, blue, green})
  - Quantitative multivalued variables (such as weight {[40, 50], [50, 75]})
- The relationships are captured by using a conditional probability matrix.

9 I. Introduction (6/6)
- We develop a framework to identify potential top-k covering rule groups in RNA pseudoknots.
- Relationships:
  - Structure-function
  - Structure-category
  - Significant ratios of stems and loops
- Allows users to regulate k and the minsupp threshold and to compare rules in the same group.
- Handles high-dimensional data.
- Enhances the understanding of structure-function relationships.

10 II. Material and Methods

11 II. Material and Methods (1/20)
Pseudoknot data
- S1, S2, L1, L2, and L3: stem 1, stem 2, loop 1, loop 2, and loop 3
- A, G, C, and U: adenine, guanine, cytosine, and uracil
- vr, vt, vf, v3, v5, vo, rr, mr, tm, ri, ap, ar, and ot: viral ribosomal readthrough signals, viral tRNA-like structures, viral ribosomal frameshifting signals, other viral 3'-UTR, other viral 5'-UTR, viral others, rRNA, mRNA, tmRNA, ribozymes, aptamers, artificial molecules, and others
- ss, tc, and fs: self-splicing, translation control, and viral frameshifting

12 II. Material and Methods (2/20)
- Let X and Y be multivalued attribute variables
- x and y be items
- p(X) be the probability of X
- p(Y|X) be the conditional probability of Y given X
- minsupp be the minimum support in the context

13 II. Material and Methods (3/20)
The data here is collected from PseudoBase:
- Organism
- RNA type
- Bracket view of structure
- Classified by two stems and three loops
- Nucleotide sequence
- Size
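
As a rough illustration of how stem and loop sizes could be read off a bracket view, the sketch below assumes a notation in which `(`/`)` mark stem 1, `[`/`]` mark stem 2, and `:` marks unpaired bases, and it assumes a simple H-type layout S1, L1, S2, L2, S1', L3, S2' with no bulges inside the stems. The function name `h_pseudoknot_sizes` and the sample string are hypothetical and are not taken from the paper or from PseudoBase.

```python
import re

# Assumed layout (illustrative only):
# 5' flank, '(' run, L1, '[' run, L2, ')' run, L3, ']' run, 3' flank.
_H_TYPE = re.compile(
    r"^:*(?P<s1>\(+)(?P<l1>:*)(?P<s2>\[+)(?P<l2>:*)"
    r"(?P<s1c>\)+)(?P<l3>:*)(?P<s2c>\]+):*$"
)

def h_pseudoknot_sizes(bracket: str) -> dict:
    """Return the sizes of S1, S2, L1, L2, L3 for a simple H-type bracket string."""
    m = _H_TYPE.match(bracket)
    if m is None:
        raise ValueError("not a simple H-type bracket string")
    # Both strands of a stem must pair base for base in this simplified model.
    if len(m["s1"]) != len(m["s1c"]) or len(m["s2"]) != len(m["s2c"]):
        raise ValueError("unbalanced stems")
    return {
        "S1": len(m["s1"]),
        "S2": len(m["s2"]),
        "L1": len(m["l1"]),
        "L2": len(m["l2"]),
        "L3": len(m["l3"]),
    }

# Hypothetical bracket string, made up for this example.
sizes = h_pseudoknot_sizes("::((((::[[[[[:))))::::]]]]]:")
print(sizes)  # {'S1': 4, 'S2': 5, 'L1': 2, 'L2': 1, 'L3': 4}
```

Real entries may contain bulges or longer flanking regions, so a production parser would need to be more permissive than this pattern.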

14 II. Material and Methods (4/20)
- A data set consisting of 225 H-pseudoknots is obtained.

15 II. Material and Methods (5/20)

16 II. Material and Methods (6/20)
Partition of attributes
- {class, function, stem, loop, base, ratio, length}; the last one is a quantitative attribute.
- Propose a novel partition in conjunction with the properties of pseudoknot data and top-k rule groups.

17 II. Material and Methods (7/20)

18 II. Material and Methods (8/20)
- The domain of a quantitative attribute has to be partitioned into intervals, which requires choosing:
  1) The number of intervals
  2) The size of each interval
- For example, (14, 15] is included in stem 1, stem 2, and loop 1 but not in loop 3.

19 II. Material and Methods (9/20)
Definition 1.
- A quantitative attribute $y$ is divided into a set of base intervals $\{y_1, \ldots, y_n\}$ using the categorical item $x_i$, such that each base interval $y_j$ consists of a single value, for $1 \le j \le n$.
- The partition using $x_i$ is defined as $\{(y_{1i}, \max(y_{2i})]; \ldots; (\max(y_{m-1,i}), \max(y_{mi})]\}$.
- Table 2 presents the distribution of sizes of stem 1 and stem 2 of pseudoknots in PseudoBase.

20 II. Material and Methods (10/20)
Definition 1, for example:
Y1 = {0, (0, 1], (1, 2], (2, 3], (3, 4], (4, 5], (5, 6], (6, 7], (7, 8], (8, 9], (9, 10], (10, 11], (11, 12], (12, 13], (13, 14], (14, 15], (15, 16], (16, 17], (17, 18], (18, 19], (19, 20], (20, 21], (21, 22]}
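
The base partition in this example can be generated mechanically. The sketch below is a minimal illustration, assuming intervals are stored as `(low, high)` pairs that stand for the half-open interval (low, high], with the leading single value 0 stored as `(0, 0)`. The names `base_partition` and `fmt` and the `step` parameter are hypothetical.

```python
from fractions import Fraction

def base_partition(max_value, step=1):
    """Build the base intervals {0, (0, s], (s, 2s], ...} up to max_value.

    Each pair (low, high) stands for the half-open interval (low, high];
    the first entry (0, 0) stands for the single value 0.
    """
    step = Fraction(step)
    intervals = [(Fraction(0), Fraction(0))]
    low = Fraction(0)
    while low < max_value:
        high = low + step
        intervals.append((low, high))
        low = high
    return intervals

def fmt(interval):
    """Render an interval pair in the notation used on the slide."""
    low, high = interval
    return "0" if low == high == 0 else f"({low}, {high}]"

y1 = base_partition(22)
print(len(y1))                   # 23 entries: the value 0 plus 22 unit intervals
print(fmt(y1[1]), fmt(y1[-1]))   # (0, 1] (21, 22]
```

Passing `step=Fraction(1, 2)` would give half-unit intervals, which is one way to represent non-integer attributes.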

21 II. Material and Methods (11/20)
Definition 2.
- Suppose $Y_i = \{y_{1i}, \ldots, y_{mi}\}$ and $Y_{i+1} = \{y_{1,i+1}, \ldots, y_{n,i+1}\}$ are two adjacent partitions. Let $Y = \emptyset$.
- The integration of them is defined as: [formula not captured in the transcript]

22 II. Material and Methods (12/20)
Definition 2, for example:
- Stem 1 as Y1 = {0, (0, 1], (1, 2], (2, 3], (3, 4], (4, 5], (5, 6], (6, 7], (7, 8], (8, 9], (9, 10], (10, 11], (11, 12], (12, 13], (13, 14], (14, 15], (15, 16], (16, 17], (17, 18], (18, 19], (19, 20], (20, 21], (21, 22]}
- Stem 2 as Y2 = {0, (0, 1], (1, 2], (2, 3], (3, 4], (4, 5], (5, 6], (6, 7], (7, 8], (8, 9], (9, 10], (10, 11], (11, 12], (12, 13], (13, 14], (14, 15], ..., (31, 32], (32, 33]}

23 II. Material and Methods (13/20)
The integrated partition of Y1 and Y2:
{0, (0, 1], (1, 2], (2, 3], (3, 4], (4, 5], (5, 6], (6, 7], (7, 8], (8, 9], (9, 10], (10, 11], (11, 12], (12, 13], (13, 14], (14, 15], (15, 16], (16, 17], (17, 18], (18, 19], (19, 22], (22, 33]}

24 II. Material and Methods (14/20)
- In comparison, the values of ratio attributes are positive real numbers rather than integers.
- $|y_i| = 1$ in Definition 3.1 needs to be changed to $|y_i| = 1$ or $|y_i| = 0.5$.
- $|x| = 1$ and $|x_c| = 1$ in Definition 3.2 are changed to $|x| = 1$ and $|x_c| = 1$, or $|x| = 0.5$ and $|x_c| = 0.5$.
- This avoids missing interesting knowledge.

25 II. Material and Methods (15/20)
Generation of rule groups
- Work out the conditional probabilities for X and Y in the probability matrix.
- The conditional probability of $Y = y_i$ given $X = x_i$ is $p(y_i \mid x_i) = p(x_i \mid y_i)\, p(y_i) / p(x_i)$.
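
For readers who want the intermediate step, the identity on this slide follows from the definition of conditional probability. The joint probability $p(x_i, y_i)$ is notation introduced here for the derivation and does not appear on the slide.

```latex
\begin{align}
p(y_i \mid x_i) &= \frac{p(x_i, y_i)}{p(x_i)}, \qquad
p(x_i \mid y_i) = \frac{p(x_i, y_i)}{p(y_i)} \\
\Rightarrow\quad p(y_i \mid x_i) &= \frac{p(x_i \mid y_i)\, p(y_i)}{p(x_i)}
\end{align}
```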

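26 II. Material and Methods (16/20)
For example:
- Take x as stem 1 and y as the size interval (3, 4] of stem 1.
- By Table 2, n = 255, so p(stem1) = 255/255 = 1.
- Also from Table 2, the number of stem 1 instances with four nucleotides, that is, in (3, 4], is 42.
- And so: [formula not captured in the transcript]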

27 II. Material and Methods (17/20)
- Compute the entire set of conditional probabilities of stem 1, namely $[\,p(y_1 \mid \text{stem1})\;\; p(y_2 \mid \text{stem1})\;\; \ldots\;\; p(y_n \mid \text{stem1})\,]$.
- The conditional probabilities for stem 2, loop 1, and loop 3 can be computed in the same way.
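
One row of the conditional probability matrix can be sketched as follows. The sketch assumes, as in the stem 1 example, that every record contains the antecedent item, so that p(y | x) reduces to a relative frequency over the size intervals. The function name `conditional_row` and the counts are hypothetical. They are not the values of Table 2.

```python
def conditional_row(counts):
    """Map each size interval y to p(y | x) = count(x, y) / count(x)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no observations for this antecedent")
    return {interval: c / total for interval, c in counts.items()}

# Hypothetical counts of stem 1 sizes (NOT the values of Table 2).
stem1_counts = {"(2, 3]": 5, "(3, 4]": 3, "(4, 5]": 2}

row = conditional_row(stem1_counts)
print(row)  # {'(2, 3]': 0.5, '(3, 4]': 0.3, '(4, 5]': 0.2}
```

Repeating the call for stem 2, loop 1, and loop 3 and stacking the resulting rows gives a matrix with one row per antecedent item.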

28 II. Material and Methods (18/20)
- Suppose $M_{Y|X}$, corresponding to an association AS, consists of a set of rows $\{r_1, \ldots, r_n\}$.
- Let $A = \{A_1, \ldots, A_m\}$ be the complete set of antecedent items of AS.
- Let $C = \{C_1, \ldots, C_k\}$ be the complete set of consequent items of AS.
- Namely: [formula not captured in the transcript]

29 II. Material and Methods (19/20)
- Definition 3 (Rule group). Let [symbol not captured in the transcript] be a rule group with an antecedent item x and consequent support set C.
- Definition 4. Let [statement not captured in the transcript]

30 II. Material and Methods (20/20)
For example, in Table 2, $k_{\max} = 21$:
- top-1 covering rule group = {stem1 → (2, 3], stem2 → (5, 6]}
- top-2 covering rule group = {stem1 → (2, 3], stem1 → (3, 4], stem2 → (5, 6], stem2 → (4, 5]}
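
The selection in this example can be approximated by ranking each antecedent's consequents by conditional probability and keeping the best k. This is only a sketch: the formal statements of Definitions 3 and 4 are not in the transcript, so applying `minsupp` directly to the conditional probability is an assumption. The function name `top_k_rule_group` and the probabilities are hypothetical. The probabilities were chosen so that the ranking mirrors the slide's example.

```python
def top_k_rule_group(matrix, k, minsupp=0.0):
    """For each antecedent, keep the k consequents with the highest
    conditional probability among those that reach minsupp."""
    rules = []
    for antecedent, row in matrix.items():
        ranked = sorted(row.items(), key=lambda kv: kv[1], reverse=True)
        kept = [(y, p) for y, p in ranked if p >= minsupp][:k]
        rules.extend((antecedent, y, p) for y, p in kept)
    return rules

# Hypothetical probabilities (NOT from Table 2).
matrix = {
    "stem1": {"(2, 3]": 0.40, "(3, 4]": 0.30, "(4, 5]": 0.10},
    "stem2": {"(5, 6]": 0.35, "(4, 5]": 0.25, "(6, 7]": 0.15},
}

for antecedent, interval, p in top_k_rule_group(matrix, k=2):
    print(f"{antecedent} -> {interval}  (p = {p:.2f})")
```

Raising `minsupp` prunes weak rules before the top-k cut, and raising `k` widens each group. This is consistent with the stated goal of letting users regulate both parameters.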

31 III. Results

32 III. Results (1/4)

33 III. Results (2/4)

34 III. Results (3/4)

35 III. Results (4/4)

36 IV. Conclusion and Discussion

37 IV. Conclusion and Discussion (1/2)
- If more rules are considered together, a further understanding of pseudoknots' structure and function can be achieved.
- This paper aims to analyze increasingly available RNA pseudoknot data and identify interesting patterns from PseudoBase.

38 IV. Conclusion and Discussion (2/2)
- The obtained rule groups reveal the structural properties of pseudoknots and imply potential structure-function and structure-class relationships in RNA molecules.
- Moreover, the interpretation of rules demonstrates their biological significance.

