Getting the story – biological model based on microarray data


Once the differentially expressed genes are identified (sometimes hundreds of them), we need to figure out what it all means.
Since we don't know much about the function of most of the genes, this is not easy.
It is complicated further by the fact that gene function is context-specific: it depends on the tissue, the developmental stage of the organism, and multiple other factors.
"Functional clustering": grouping genes with respect to their known function (ontology).
Establishing the statistical significance of the association between groups of genes identified in the analysis and "functional clusters".

Analyzing Microarray Data
Experimental design: a universal control, with "Not Treated" and "Treated" samples for each of C1, C2, C3 and C4 [design diagram]
Data normalization: reducing technical variability
Statistical analysis (ANOVA): identifying differentially expressed genes, factoring out variability sources
Data mining

Data Integration and Interpretation

Modeling Microarray Data
Mathematical/statistical models
Computer algorithms/software

Regulating Transcription
A transcription factor itself does not need to be transcriptionally regulated.

First step of making a story: statistical significance of a particular "functional cluster"
Suppose we have analyzed a total of N genes, n of which turned out to be differentially expressed or co-expressed (experimentally identified; call them significant).
Suppose that x out of the n significant genes and y out of the N total genes were classified into a specific "functional group".
Q1: Is this functional group significantly correlated with our group of significant genes?
Q2: Are significant genes overrepresented in this functional group compared to their overall frequency among all analyzed genes?
Q3: What is the chance of getting x or more significant genes if we randomly draw y out of N genes "out of a hat", assuming that each gene remaining in the hat has an equal chance of being drawn?
H0: p(significant gene belonging to this category) = y/N
Q3A: What is the p-value for rejecting this null hypothesis?
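The "out of a hat" question in Q3 can be approximated by literally repeating the random draw many times. The following is only a minimal simulation sketch: the function name simulate_draws, the number of trials, and the example values of N, n, y and x are all assumptions made for illustration and do not come from the slides.

```python
import random

def simulate_draws(N, n, y, x, trials=20000, seed=0):
    """Estimate P(at least x significant genes among y genes drawn at random).

    N: total genes analyzed, n: significant genes,
    y: genes in the functional group, x: observed overlap.
    """
    rng = random.Random(seed)
    # Label genes 0..N-1; by convention the first n labels are the significant genes.
    genes = range(N)
    hits = 0
    for _ in range(trials):
        drawn = rng.sample(genes, y)  # draw y genes without replacement
        overlap = sum(1 for g in drawn if g < n)
        if overlap >= x:
            hits += 1
    return hits / trials

# Hypothetical numbers, chosen only for illustration.
estimate = simulate_draws(N=1000, n=100, y=50, x=10)
print(f"estimated chance of >= 10 significant genes: {estimate:.4f}")
```

The estimate is noisy and cannot resolve very small probabilities, which is one reason an exact calculation is preferable for p-values.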

Statistical significance of a particular "functional cluster" (cont.)
[Figure: the N genes drawn as boxes g_1, ..., g_N in two panels, "Observed" and "Removing Functional Classification"; significant genes are red, and genes in the functional group have a blue border.]
Q: By randomly drawing y boxes to color their border blue, what is the chance of drawing x or more red ones?
Outcome (o_1, ..., o_T): a set of y genes selected from the list of N genes.
Event of interest (E): the set of all outcomes for which the number of red boxes among the y boxes drawn is equal to x.
Since the drawing is random, all outcomes are equally probable.

Statistical significance of a particular "functional cluster" (cont.)
Outcome (o_1, ..., o_T): a set of y genes selected from the list of N genes.
Event of interest (E): the set of all outcomes for which the number of red boxes among the y boxes drawn is equal to x.
All we have to do is calculate M and T, where:
T = the number of different sets of y genes that can be drawn from the total of N genes, $T = \binom{N}{y}$. The binomial coefficient comes from the fact that the order in which we pick genes does not matter.
M = the number of different ways to obtain x red boxes (significant genes) when drawing y boxes (genes) out of a total of N boxes (genes), n of which are red (significant), $M = \binom{n}{x}\binom{N-n}{y-x}$. First pick x red boxes; for each such set of x red boxes, pick a set of y-x non-red boxes.
Because all T outcomes are equally probable, P(E) = M/T.
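The counting argument can be checked on a small example. The sketch below computes T as "N choose y" and M as "n choose x" times "N-n choose y-x", then confirms both by listing every possible draw. The function names and the toy values (N=10, n=4, y=3, x=2) are assumptions made for illustration.

```python
from itertools import combinations
from math import comb

def count_outcomes(N, n, y, x):
    """Return (M, T) from the closed-form counting argument."""
    T = comb(N, y)                       # all sets of y genes out of N
    M = comb(n, x) * comb(N - n, y - x)  # x red boxes, then y-x non-red boxes
    return M, T

def brute_force(N, n, y, x):
    """Return (M, T) by enumerating every possible set of y genes."""
    T = 0
    M = 0
    # Genes are labelled 0..N-1; the first n labels are the red (significant) ones.
    for draw in combinations(range(N), y):
        T += 1
        if sum(1 for g in draw if g < n) == x:
            M += 1
    return M, T

M, T = count_outcomes(N=10, n=4, y=3, x=2)
print(M, T, M / T)
print(brute_force(N=10, n=4, y=3, x=2) == (M, T))
```

Enumeration is only feasible for tiny N; the closed form is what makes the calculation practical for thousands of genes.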

Statistical significance of a particular "functional cluster": p-value
Fisher's exact test or the "hypergeometric" test
P-value: the probability of observing x or more significant genes under the null hypothesis
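Under the random-draw model, the p-value is the sum of the probabilities of drawing exactly k significant genes, for every k from x up to the largest possible overlap min(n, y). The following is a minimal sketch using only the Python standard library; the function name hypergeom_pvalue and the example numbers are assumptions made for illustration.

```python
from math import comb

def hypergeom_pvalue(N, n, y, x):
    """P(at least x significant genes among y genes drawn from N, n of them significant)."""
    T = comb(N, y)      # number of equally probable draws
    upper = min(n, y)   # the overlap cannot exceed either group size
    # Count the draws that contain exactly k significant genes, for k = x..upper.
    favourable = sum(comb(n, k) * comb(N - n, y - k) for k in range(x, upper + 1))
    return favourable / T

# Toy example: 10 genes, 4 significant, 3 in the functional group, overlap of 2.
print(hypergeom_pvalue(N=10, n=4, y=3, x=2))

# Hypothetical larger example, for illustration only.
print(hypergeom_pvalue(N=1000, n=100, y=50, x=10))
```

If SciPy is available, scipy.stats.hypergeom.sf(x - 1, N, n, y) should give the same upper-tail probability; that equivalence is stated here as an expectation rather than something shown in the slides.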

381 genes that were differentially expressed after treating a cell line with three different carcinogens: Dex, E2, and irradiation
Samples: Dex_Day1, Dex_Day2, Dex_Day3, E2_Day4, E2_Day7, E2_Day10, Irr_Day1, Irr_Day2, Irr_Day3

Up-regulated genes [figure]

Finding important functional groups for up-regulated genes
Using the "EASE" annotation tool, we obtained the following significant gene ontologies: Up_DexANDNE2ANDirr_381_GO.htm
Homework:
1) Download and install EASE.
2) Select the top 20 most significantly up-regulated genes in our W-C dataset (using the three-way ANOVA analysis) and identify significantly over-represented categories.
3) Repeat the analysis with 30, 40, 50 and 100 up-regulated and down-regulated genes.
4) Prepare questions for the next class regarding problems you run into.
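Repeating the over-representation test at several list sizes, as in homework step 3, can be sketched as follows. This is not EASE itself, and EASE's own scoring may differ from this plain hypergeometric calculation. The gene names, the category membership, and the function names are hypothetical toy data made up for illustration.

```python
from math import comb

def hypergeom_pvalue(N, n, y, x):
    """P(at least x significant genes among y genes drawn from N, n of them significant)."""
    favourable = sum(comb(n, k) * comb(N - n, y - k) for k in range(x, min(n, y) + 1))
    return favourable / comb(N, y)

def enrichment_by_cutoff(ranked_genes, category, cutoffs):
    """For each cutoff n, treat the top n ranked genes as significant and test the category."""
    N = len(ranked_genes)
    y = sum(1 for g in ranked_genes if g in category)  # category size among analyzed genes
    results = {}
    for n in cutoffs:
        x = sum(1 for g in ranked_genes[:n] if g in category)  # overlap with the top n
        results[n] = (x, hypergeom_pvalue(N, n, y, x))
    return results

# Hypothetical data: 200 genes already ranked from most to least up-regulated,
# and one made-up category whose members sit mostly near the top of the ranking.
ranked = [f"gene{i}" for i in range(200)]
category = {f"gene{i}" for i in (0, 1, 2, 3, 5, 8, 13, 21, 150, 199)}

results = enrichment_by_cutoff(ranked, category, [20, 30, 40, 50, 100])
for n, (x, p) in results.items():
    print(f"top {n:3d}: {x} of {len(category)} category genes, p = {p:.3g}")
```

Comparing the p-values across cutoffs shows how sensitive a category's significance is to the choice of list size.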