
Generation of patterns from gene expression by assigning confidence to differentially expressed genes Elisabetta Manduchi, Gregory R. Grant, Steven E.McKenzie,


1 Generation of patterns from gene expression by assigning confidence to differentially expressed genes Elisabetta Manduchi, Gregory R. Grant, Steven E. McKenzie, G. Christian Overton, Saul Surrey, Christian J. Stoeckert. Presented by Keith Betts

2 Goal: Provide tools to aid in the analysis of data collected from highly parallel gene expression experiments. Generate descriptive and dependable expression patterns representing the differential expression of genes across cell types.

3 Identify those genes that are ‘most likely’ to be differentially expressed. Transform typical ‘raw’ input into an easily interpretable list of patterns.

4

5 Patterns from Gene Expression

6 What is PaGE? PaGE is free, downloadable Perl software (tested mainly on Unix systems) that can be used as a statistical test for differentially expressed genes between two experimental conditions, given replicated experiments. Available at: http://www.cbil.upenn.edu/PaGE/

7 Methods and Algorithm Input consists of normalized data (the normalization procedure depends on the kind of experiments conducted). The input normalized intensities are subjected to preprocessing steps.

8 Methods and Algorithm Cont. In each gene tag’s expression pattern there will be one symbol for each homotypic group (a set of samples of the same type). For each homotypic group and each gene tag, compute the average intensity of that tag over the samples in the group that have values for that tag. This average represents the intensity of that tag at that group.
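
The averaging step can be illustrated with a minimal sketch. It assumes missing measurements are stored as `None`; the function name, group names, and numbers are hypothetical and are not taken from PaGE itself.

```python
from statistics import mean

def group_intensity(values):
    """Average the replicate intensities of one tag in one homotypic group,
    skipping replicates that have no value (None).
    Returns None when every replicate is missing."""
    present = [v for v in values if v is not None]
    return mean(present) if present else None

# Hypothetical replicate intensities for one gene tag in two groups.
replicates = {"HEL": [120.0, None, 100.0], "CD34": [40.0, 60.0]}
averages = {group: group_intensity(vals) for group, vals in replicates.items()}
```

Each tag then carries one number per homotypic group, which is what the later stages operate on.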

9 Two Stage Approach First: Attach an ordered list of real numbers to each tag. Second: Bin the numbers in this list, resulting in a pattern of integers.

10 First Stage Fix an ordering of the groups in the collection. Attach to each tag the ordered list of real numbers obtained by dividing each of its non-reference group intensities by its reference group intensity (or, when there is no reference group, by dividing each group intensity by the median of the tag's group intensities). This is the list of ratios attached to the tag.
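
A minimal sketch of the first stage for the reference-group case, assuming each tag already has one average intensity per group. The function name, group names, and values are hypothetical.

```python
def ratios_to_reference(intensities, group_order, reference):
    """Return the ordered list of ratios for one tag: each non-reference
    group's average intensity divided by the reference group's average."""
    ref = intensities[reference]
    return [intensities[g] / ref for g in group_order if g != reference]

# Hypothetical group averages for one gene tag, with HEL as the reference.
intensities = {"HEL": 100.0, "CD34": 50.0, "erythroblasts": 400.0}
order = ["CD34", "erythroblasts", "HEL"]
ratios = ratios_to_reference(intensities, order, reference="HEL")
```

The fixed group ordering matters because position in the list is what later identifies which group each pattern symbol refers to.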

11 Second Stage For each non-reference group, partition the range of ratios into disjoint subintervals (bins). Number the bins using consecutive integers $-m, \dots, 0, \dots, n$ (where bin 0 contains ratio 1). Attach the resulting ordered list of integers to each gene tag.

12 Example For group $i$: divide the range into $m_i + n_i + 1$ bins $B_{i,j}$, with $j = -m_i, \dots, n_i$. The list of ratios from the first stage for a certain gene tag is $(r_1, r_2, \dots, r_l)$. Each $r_i$ belongs to exactly one of the bins $B_{i,j_i}$. The expression pattern associated with this tag is then $(j_1, j_2, \dots, j_l)$.

13 Choose level cutoffs Suppose we are taking ratios to a reference homotypic group (group 0) and are focusing on a fixed group (group i). Suppose also that we have replicate experiments for each of the two groups. Concentrate on up-regulation

14 Goal Goal is to achieve a certain degree of confidence in the assertion: ‘this gene is up-regulated at group i as compared to the reference group’

15 Each gene will have a distribution of intensities in a group, whose mean will be called ‘the true mean intensity of the gene at that group’. Denote the random variable giving the intensity of gene $g$ at group $j$ by $X_{g,j}$, and denote its mean and standard deviation by $\mu_{g,j}$ and $\sigma_{g,j}$.

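16 False Positive Rate $\mathrm{Prob}\left( \mathrm{Ave}_{g,i} / \mathrm{Ave}_{g,0} > C_i \mid \mu_{g,i} / \mu_{g,0} < 1 \right)$, where $\mathrm{Ave}_{g,j}$ is the average observed intensity of gene $g$ at group $j$.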

17 Claim that the events $\dfrac{\mathrm{Ave}_{g,i} / \mu_{g,i}}{\mathrm{Ave}_{g,0} / \mu_{g,0}} > C_i$ and $\mu_{g,i} / \mu_{g,0} < 1$ are independent.

18 Seek $C_i$ as small as possible such that: $\mathrm{Prob}\left( \dfrac{\mathrm{Ave}_{g,i} / \mu_{g,i}}{\mathrm{Ave}_{g,0} / \mu_{g,0}} > C_i \right) < s\%$
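
The step from the false positive rate to this condition can be spelled out as follows. This is a sketch that assumes all intensities are positive and that the independence claim holds; $R$ is shorthand introduced here, not the authors' notation.

```latex
R = \frac{\mathrm{Ave}_{g,i}/\mu_{g,i}}{\mathrm{Ave}_{g,0}/\mu_{g,0}},
\qquad
\frac{\mathrm{Ave}_{g,i}}{\mathrm{Ave}_{g,0}} = R \cdot \frac{\mu_{g,i}}{\mu_{g,0}} < R
\quad \text{whenever } \frac{\mu_{g,i}}{\mu_{g,0}} < 1 .

\mathrm{Prob}\!\left( \frac{\mathrm{Ave}_{g,i}}{\mathrm{Ave}_{g,0}} > C_i \;\middle|\; \frac{\mu_{g,i}}{\mu_{g,0}} < 1 \right)
\le \mathrm{Prob}\!\left( R > C_i \;\middle|\; \frac{\mu_{g,i}}{\mu_{g,0}} < 1 \right)
= \mathrm{Prob}\left( R > C_i \right) < s\% .
```

So bounding the unconditional probability for $R$ also bounds the false positive rate, without needing to know the true means.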

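19 Approximate the distribution of $\mathrm{Ave}_{g,j} / \mu_{g,j}$ (for $j = 0, i$) by the values $\dfrac{X_{g,j,k} / \mathrm{Ave}_{g,j} - 1}{\sqrt{t_j - 1}} + 1$, where $X_{g,j,k}$ is the intensity of gene $g$ in replicate $k$ of group $j$ and $t_j$ is the number of replicates in group $j$.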
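
A small sketch of how these values could be computed for one gene in one group. It assumes that t j is the number of replicates in group j and that there are at least two replicates; the function name and numbers are hypothetical.

```python
from math import sqrt
from statistics import mean

def normalized_average_samples(replicates):
    """Values ((x_k / ave) - 1) / sqrt(t - 1) + 1 for one gene in one group.
    Assumes t = len(replicates) >= 2 and a non-zero average."""
    t = len(replicates)
    if t < 2:
        raise ValueError("need at least two replicates")
    ave = mean(replicates)
    return [((x / ave) - 1.0) / sqrt(t - 1) + 1.0 for x in replicates]

# Hypothetical replicate intensities for one gene tag in one group.
samples = normalized_average_samples([90.0, 110.0])
```

Pooling such values over all gene tags gives an empirical stand-in for the distribution of the average divided by the true mean.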

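20 Compute the desired $C_i$ through integration. If $f_j$ ($j = 0, i$) is the density function for $\mathrm{Ave}_{g,j} / \mu_{g,j}$, and $C$ is fixed, then the probability that the ratio of the two quantities exceeds $C$ is evaluated by integrating these densities.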
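
One standard way to write the probability being evaluated is sketched below. It assumes the two normalized averages are independent and positive; $U_j$ is shorthand introduced here, and the exact form used by the authors may differ.

```latex
U_j = \frac{\mathrm{Ave}_{g,j}}{\mu_{g,j}} \quad (j = 0, i),
\qquad
\mathrm{Prob}\!\left( \frac{U_i}{U_0} > C \right)
= \int_{0}^{\infty} f_0(y) \left( \int_{C y}^{\infty} f_i(x) \, dx \right) dy .
```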

21 If this is above the desired false positive rate, then $C$ is raised and the integral is recalculated. Repeat the process until the desired false positive rate is attained.
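
The raise-and-recheck loop can be sketched as below. Note the substitution: instead of integrating density functions, this sketch estimates the probability empirically from all pairs of sampled values, which is a simplification and not necessarily what PaGE does. The function names, starting value, step size, and sample data are all hypothetical.

```python
def exceed_probability(u_i, u_0, c):
    """Empirical estimate of Prob(U_i / U_0 > c) over all pairs of samples."""
    hits = sum(1 for a in u_i for b in u_0 if a / b > c)
    return hits / (len(u_i) * len(u_0))

def find_cutoff(u_i, u_0, s, start=1.0, step=0.01, max_steps=100_000):
    """Raise c from `start` in increments of `step` until the estimated
    probability of exceeding c falls below the target rate s."""
    c = start
    for _ in range(max_steps):
        if exceed_probability(u_i, u_0, c) < s:
            return c
        c += step
    raise ValueError("no cutoff found within max_steps")

# Hypothetical samples of the normalized averages for group i and group 0.
u_i = [0.8, 0.9, 1.0, 1.1, 1.2]
u_0 = [0.8, 0.9, 1.0, 1.1, 1.2]
cutoff = find_cutoff(u_i, u_0, s=0.05)
```

Starting low and stepping upward returns the smallest cutoff on the step grid that meets the target rate, which matches the aim of keeping the cutoff as small as possible.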

22 Down-regulation Proceed in a similar manner. Seek $c_i$ as large as possible such that: $\mathrm{Prob}\left( \dfrac{\mathrm{Ave}_{g,i} / \mu_{g,i}}{\mathrm{Ave}_{g,0} / \mu_{g,0}} < c_i \right) < s\%$

23 Once the $C_i$’s and the $c_i$’s are determined for each non-reference group $i$, if the ratio of the average intensity of a gene tag at group $i$ to the average intensity of the same gene tag at the reference group is between $C_i$ and $C_i^2$, we say that the gene tag is up-regulated one level at this group as compared to the reference group.
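
A minimal sketch of turning a ratio into an up-regulation level. The slide defines only level one (between the cutoff and its square); extending to higher levels by successive powers of the cutoff, and the handling of ratios that fall exactly on a boundary, are assumptions made here for illustration. The function name and numbers are hypothetical.

```python
from math import floor, log

def up_level(ratio, cutoff):
    """Up-regulation level of a ratio: 0 below the cutoff, 1 between
    cutoff and cutoff**2, 2 between cutoff**2 and cutoff**3, and so on
    (levels above 1 are an assumed extension)."""
    if cutoff <= 1.0:
        raise ValueError("up-regulation cutoff must exceed 1")
    if ratio < cutoff:
        return 0
    return floor(log(ratio) / log(cutoff))

# Hypothetical ratios for one group with a cutoff of 1.5.
levels = [up_level(r, 1.5) for r in (1.2, 2.0, 4.0)]
```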

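24 One can now estimate the probability $\mathrm{Prob}(\text{not up} \mid \text{predicted up}) = \mathrm{Prob}(\text{not up}) \cdot \mathrm{Prob}(\text{predicted up} \mid \text{not up}) \,/\, \mathrm{Prob}(\text{predicted up}) \le \mathrm{Prob}(\text{predicted up} \mid \text{not up}) \,/\, \mathrm{Prob}(\text{predicted up})$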
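
A short worked example of this bound. It assumes that the probability of being predicted up is estimated by the fraction of gene tags predicted up, and the numbers are hypothetical.

```python
def not_up_given_predicted_up_bound(false_positive_rate, predicted_up_fraction):
    """Upper bound on Prob(not up | predicted up), using Prob(not up) <= 1.
    The result is capped at 1 because it bounds a probability."""
    if not 0.0 < predicted_up_fraction <= 1.0:
        raise ValueError("predicted_up_fraction must be in (0, 1]")
    return min(1.0, false_positive_rate / predicted_up_fraction)

# Hypothetical numbers: 1% false positive rate, 5% of tags predicted up.
bound = not_up_given_predicted_up_bound(0.01, 0.05)
```

With these numbers the bound is about 0.2, so a prediction of up-regulation would carry roughly 80% confidence at least.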

25 As a consequence of this approach, when we see a level different from 0, we have a certain confidence in the gene tag being up-regulated or down-regulated as compared to the reference group. However, when we see a 0 there is no confidence implied. We can only take 0 to mean that we do not have enough evidence to support a change in level.

26 Results Application to an erythroid development nylon filter dataset

27 Background The erythroid development dataset contains 5 homotypic groups representing an erythroleukemic cell line and normal cells under different conditions. There are replicate data for each of the groups.

28 Background Continued The groups are: 1. CD34 positive cells 2. Human adult erythroblasts 3. Cord erythroblasts 4. HEL cells 5. HEL cells treated with hemin

29 Application Available replicates: two CD34, three adult erythroblasts, two cord blood erythroblasts, three HEL, two HEL + hemin.

30 The value of d is set at 15. Only the moderate to highly abundant mRNA classes are likely to have given hybridization signals above background on the filter array. Set the HEL group as reference.

31 Two approaches PaGE was run once merging the adult and the cord erythroblasts into one group with five replicates. PaGE was run a second time keeping the adult and cord erythroblasts in separate groups.

32 Performance Running time was always under 90 seconds when run on an UltraSPARC IIi CPU at 300 MHz with 128 MB RAM.

33 Adult and Cord Merged Results Total of 18,123 clones. 540 were above the minimum useful value in every group. 5,063 were above the minimum useful value in at least one group.

34 Merged Results Cont. For s% = 1% (false positive rate): 5 levels for CD34 (0 to 4), 10 levels for erythroblasts (-1 to 8), 6 levels for HEL + hemin (-1 to 4).

35 Findings Clones representing the same gene were usually found to have identical or very similar patterns. Clones representing genes whose expression is known in these cells presented patterns compatible with what was expected.

36

37 New Application Ask which genes are differentially expressed between normal and leukemic cells. Ask which genes are induced by hemin to adopt a normal expression pattern.

38 Findings Having more genes available to start with led to more genes identified as differentially expressed but at lower confidence. At similar confidence levels, starting with more genes did not necessarily lead to more genes identified as differentially expressed between normal and HEL cells.

