Generation of patterns from gene expression by assigning confidence to differentially expressed genes. Elisabetta Manduchi, Gregory R. Grant, Steven E. McKenzie, G. Christian Overton, Saul Surrey, Christian J. Stoeckert. Presented by Keith Betts.

Goal: Provide tools to aid in the analysis of data collected from highly parallel gene expression experiments. Generate descriptive and dependable expression patterns representing the differential expression of genes across cell types.

Identify those genes that are ‘most likely’ to be differentially expressed. Transform typical ‘raw’ input into an easily interpretable list of patterns.

Patterns from Gene Expression

What is PaGE? PaGE is free, downloadable Perl software (tested mainly on Unix systems) that can be used as a statistical test for differentially expressed genes between two experimental conditions, given replicated experiments. Available at:

Methods and Algorithm Input consists of normalized data (the normalization procedure depends on the kind of experiments conducted). The input normalized intensities are subjected to preprocessing steps.

Methods and Algorithm Cont. In each gene tag’s expression pattern there will be one symbol for each homotypic group (a set of samples of the same type). For each homotypic group and each gene tag, compute the average intensity of that tag over the samples in the group that have a value for it. This average represents the intensity of that tag at that group.
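The averaging step can be sketched as follows. This is a minimal illustration, not PaGE's own code (PaGE is written in Perl); the function name `group_average` and the use of `None` to mark a missing value are assumptions made for the example.

```python
from statistics import mean


def group_average(intensities):
    """Average the non-missing intensities of one tag within one homotypic group.

    `intensities` holds one value per sample in the group; None marks a
    sample that has no value for this tag (an assumed convention).
    Returns None when no sample in the group has a value.
    """
    present = [x for x in intensities if x is not None]
    return mean(present) if present else None


# Hypothetical tag measured in a group of three samples, one value missing.
print(group_average([120.0, None, 80.0]))
```

Samples without a value are simply left out of the average, which matches the slide's wording that the average is taken over the samples that have values for the tag.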

Two Stage Approach First: Attach an ordered list of real numbers to each tag. Second: Bin the numbers in this list, resulting in a pattern of integers.

First Stage Fix an ordering of the groups in the collection. Attach to each tag the ordered list of real numbers obtained by dividing each of its non-reference group intensities by the median of its group intensities. This is the list of ratios attached to the tag.

Second Stage For each non-reference group, partition the range of ratios into disjoint subintervals (bins). Number the bins using consecutive integers −m, …, 0, …, n (where bin 0 contains ratio 1). Attach the resulting ordered list of integers to each gene tag.

Example For group $i$: divide the range into $m_i + n_i + 1$ bins. The list of ratios from the first stage for a certain gene tag is $(r_1, r_2, \dots, r_l)$. Each $r_i$ belongs to exactly one of the bins $B_{i,j}$; call its index $j_i$. The expression pattern associated with this tag is then $(j_1, j_2, \dots, j_l)$.
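The binning in this example can be sketched in a few lines. The cut points used here are invented for illustration, and the helper names `bin_index` and `expression_pattern` are hypothetical; only the idea of mapping each ratio to the integer label of its bin comes from the slides.

```python
from bisect import bisect_right


def bin_index(ratio, edges, m):
    """Return the integer label of the bin that contains `ratio`.

    `edges` are the sorted interior cut points of the range. They define
    len(edges) + 1 bins, labelled -m, ..., 0, ..., n from left to right,
    so m must be chosen such that the bin holding ratio 1 gets label 0.
    """
    return bisect_right(edges, ratio) - m


def expression_pattern(ratios, edges_per_group, m_per_group):
    """Map the list of ratios of one tag to its pattern of integers."""
    return tuple(
        bin_index(r, edges, m)
        for r, edges, m in zip(ratios, edges_per_group, m_per_group)
    )


# Illustrative cut points: one down level (m = 1) and two up levels (n = 2).
edges = [0.5, 2.0, 4.0]
print(expression_pattern([3.0, 0.3, 1.0], [edges] * 3, [1, 1, 1]))
```

With three cut points there are four bins, which agrees with the count $m_i + n_i + 1$ for $m_i = 1$ and $n_i = 2$.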

Choose level cutoffs Suppose we are taking ratios to a reference homotypic group (group 0) and are focusing on a fixed group (group i). Suppose also that we have replicate experiments for each of the two groups. Concentrate on up-regulation.

Goal Goal is to achieve a certain degree of confidence in the assertion: ‘this gene is up-regulated at group i as compared to the reference group’

Each gene will have a distribution of intensities in a group, whose mean will be called ‘the true mean intensity of the gene at that group’. Denote the random variable giving the intensity of gene $g$ at group $j$ by $X_{g,j}$, and denote its mean and standard deviation by $\mu_{g,j}$ and $\sigma_{g,j}$.

False Positive Rate $\mathrm{Prob}\left( X_{g,i} / X_{g,0} > C_i \;\middle|\; \mu_{g,i} / \mu_{g,0} < 1 \right)$

Claim that the events $\dfrac{\mathrm{Ave}_{g,i} / \mu_{g,i}}{\mathrm{Ave}_{g,0} / \mu_{g,0}} > C_i$ and $\mu_{g,i} / \mu_{g,0} < 1$ are independent.

Seek $C_i$ as small as possible such that: $\mathrm{Prob}\left( \dfrac{\mathrm{Ave}_{g,i} / \mu_{g,i}}{\mathrm{Ave}_{g,0} / \mu_{g,0}} > C_i \right) < s\%$
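Why a bound on this probability controls the false positive rate can be spelled out in one step. The following is a sketch that assumes all average intensities and true means are positive, and that predictions are made from the ratio of group averages (written Ave. g,j on these slides).

```latex
\[
\frac{\mathrm{Ave}_{g,i}}{\mathrm{Ave}_{g,0}}
  = \frac{\mathrm{Ave}_{g,i}/\mu_{g,i}}{\mathrm{Ave}_{g,0}/\mu_{g,0}}
    \cdot \frac{\mu_{g,i}}{\mu_{g,0}}
  < \frac{\mathrm{Ave}_{g,i}/\mu_{g,i}}{\mathrm{Ave}_{g,0}/\mu_{g,0}}
  \qquad \text{whenever } \frac{\mu_{g,i}}{\mu_{g,0}} < 1 .
\]
\[
\mathrm{Prob}\!\left( \frac{\mathrm{Ave}_{g,i}}{\mathrm{Ave}_{g,0}} > C_i
  \;\middle|\; \frac{\mu_{g,i}}{\mu_{g,0}} < 1 \right)
  \le
\mathrm{Prob}\!\left( \frac{\mathrm{Ave}_{g,i}/\mu_{g,i}}{\mathrm{Ave}_{g,0}/\mu_{g,0}} > C_i
  \;\middle|\; \frac{\mu_{g,i}}{\mu_{g,0}} < 1 \right)
  =
\mathrm{Prob}\!\left( \frac{\mathrm{Ave}_{g,i}/\mu_{g,i}}{\mathrm{Ave}_{g,0}/\mu_{g,0}} > C_i \right)
  < s\% .
\]
```

The first line shows that, for a gene that is not up-regulated, the observed ratio of averages can only exceed $C_i$ if the normalized ratio does too. The equality in the second line is where the independence claim is used.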

Approximate $\mathrm{Ave}_{g,j} / \mu_{g,j}$ for $j = 0, i$ by $\dfrac{X_{g,j,k} / \mathrm{Ave}_{g,j} - 1}{\sqrt{t_j - 1}} + 1$

Compute the desired $C_i$ through integration. If $f_j$ ($j = 0, i$) is the density function for $\mathrm{Ave}_{g,j} / \mu_{g,j}$, and $C$ is fixed, then evaluate using

If this is above the desired false positive rate, then C is raised and the integral is recalculated. Repeat the process until the desired false positive rate is attained.
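The raise-and-recheck loop can be sketched as below. Note the substitution: instead of integrating density functions, this sketch estimates the probability by counting pairs of the approximating values $((X_{g,j,k} / \mathrm{Ave}_{g,j}) - 1) / \sqrt{t_j - 1} + 1$, which is a simpler stand-in and not necessarily what PaGE does. All function names, the starting value, the step size, and reading $t_j$ as the number of replicates in group $j$ are assumptions. The rate `s` is given as a fraction (0.01 for 1%).

```python
from math import sqrt


def normalized_values(replicates):
    """Approximating values for Ave/mu from one gene's replicates in one group.

    Assumes at least two positive replicate intensities; t is taken to be
    the number of replicates (an assumption about the slide's t_j).
    """
    t = len(replicates)
    ave = sum(replicates) / t
    return [((x / ave) - 1) / sqrt(t - 1) + 1 for x in replicates]


def exceed_fraction(values_i, values_0, c):
    """Fraction of pairs (y_i, y_0) whose ratio y_i / y_0 exceeds c."""
    hits = sum(1 for yi in values_i for y0 in values_0 if yi / y0 > c)
    return hits / (len(values_i) * len(values_0))


def find_cutoff(values_i, values_0, s, start=1.0, step=0.01, max_c=100.0):
    """Raise c in fixed steps until the estimated rate falls below s."""
    k = 0
    while True:
        c = start + k * step
        if c > max_c:
            raise ValueError("no cutoff found below max_c")
        if exceed_fraction(values_i, values_0, c) < s:
            return c
        k += 1


# Hypothetical pooled values for the focus group and the reference group.
print(find_cutoff([0.9, 1.0, 1.1], [0.9, 1.0, 1.1], s=0.2))
```

How the approximating values are pooled across genes before the search is left to the caller here. A fixed step makes the search easy to follow at the cost of precision; a bisection on `c` would work equally well because the estimated rate never increases as `c` grows.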

Down-regulation Proceed in a similar manner. Seek $c_i$ as large as possible such that: $\mathrm{Prob}\left( \dfrac{\mathrm{Ave}_{g,i} / \mu_{g,i}}{\mathrm{Ave}_{g,0} / \mu_{g,0}} < c_i \right) < s\%$

Once the $C_i$’s and the $c_i$’s are determined for each non-reference group $i$, compare the average intensity of a gene tag at group $i$ with the average intensity of the same gene tag at the reference group. If the ratio of the two is between $C_i$ and $C_i^2$, we say that the gene tag is up-regulated one level at this group as compared to the reference group.
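A small sketch of turning a ratio into an up-regulation level follows. The slide only defines level one (ratio between $C_i$ and $C_i^2$); extending this to level $k$ for ratios between $C_i^k$ and $C_i^{k+1}$, and treating the lower end of each interval as inclusive, are assumptions made for the example, as is the function name `up_level`.

```python
def up_level(ratio, c_up):
    """Number of up-regulation levels for `ratio`, given cutoff c_up > 1.

    Level 0 means the ratio is below c_up; level k means the ratio lies
    in [c_up**k, c_up**(k + 1)) under the assumed extension.
    """
    if c_up <= 1:
        raise ValueError("c_up must be greater than 1")
    level = 0
    threshold = c_up
    while ratio >= threshold:
        level += 1
        threshold *= c_up
    return level


# With an illustrative cutoff of 1.5, the first level covers ratios 1.5 to 2.25.
print(up_level(1.6, 1.5), up_level(2.3, 1.5), up_level(1.2, 1.5))
```

Multiplying the threshold step by step avoids taking logarithms, which keeps the comparison exact for ratios that are not right at a boundary.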

One can now estimate the probability Prob(not up | predicted up) = Prob(not up) × Prob(predicted up | not up) / Prob(predicted up) ≤ Prob(predicted up | not up) / Prob(predicted up)
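As a quick numeric illustration of this bound, the sketch below divides the false positive rate by the fraction of tags predicted up. The function name and the example numbers are hypothetical; the bound itself follows from Bayes' rule and Prob(not up) ≤ 1.

```python
def false_call_bound(false_positive_rate, predicted_up_fraction):
    """Upper bound on Prob(not up | predicted up).

    `false_positive_rate` plays the role of Prob(predicted up | not up);
    `predicted_up_fraction` is an estimate of Prob(predicted up), for
    example the share of gene tags assigned a positive level.
    """
    if not 0 < predicted_up_fraction <= 1:
        raise ValueError("predicted_up_fraction must be in (0, 1]")
    return min(1.0, false_positive_rate / predicted_up_fraction)


# Illustrative numbers: a 1% false positive rate, 5% of tags predicted up.
bound = false_call_bound(0.01, 0.05)
print(bound, 1 - bound)
```

One minus the bound gives a lower bound on the confidence that a tag predicted up is really up. When few tags are predicted up, the bound can reach 1 and carries no information.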

As a consequence of this approach, when we see a level different from 0, we have a certain confidence in the gene tag being up-regulated or down-regulated as compared to the reference group. However, when we see a 0 there is no confidence implied. We can only take 0 to mean that we do not have enough evidence to support a change in level.

Results Application to an erythroid development nylon filter dataset

Background The erythroid development dataset contains 5 homotypic groups representing an erythroleukemic cell line and normal cells under different conditions. There are replicate data for each of the groups.

Background Continued The groups are:
1. CD34-positive cells
2. Human adult erythroblasts
3. Cord erythroblasts
4. HEL cells
5. HEL cells treated with hemin

Application Available replicates: two CD34, three adult erythroblasts, two cord blood erythroblasts, three HEL, two HEL + hemin.

The value of d is set at 15. Only the moderate to highly abundant mRNA classes are likely to have given hybridization signals above background on the filter array. Set the HEL group as reference.

Two approaches PaGE was run once merging the adult and the cord erythroblasts into one group with five replicates. PaGE was run a second time keeping the adult and cord erythroblasts in separate groups.

Performance Running time was always under 90 seconds when run on an UltraSPARC IIi CPU at 300 MHz with 128 MB RAM.

Adult and Cord Merged Results Total of 18,123 clones. 540 were above the minimum useful value in every group. 5,063 were above the minimum useful value in at least one group.

Merged Results Cont. For s% = 1% (false positive rate): 5 levels for CD34 (0 to 4), 10 levels for erythroblasts (−1 to 8), 6 levels for HEL + hemin (−1 to 4).

Findings Clones representing the same gene were usually found to have identical or very similar patterns. Clones representing genes whose expression is known in these cells presented patterns compatible with what was expected.

New Application Ask which genes are differentially expressed between normal and leukemic cells. Ask which genes are induced by hemin to adopt a normal expression pattern.

Findings Having more genes available to start with led to more genes identified as differentially expressed but at lower confidence. At similar confidence levels, starting with more genes did not necessarily lead to more genes identified as differentially expressed between normal and HEL cells.