Microarray Data Analysis: The Bioinformatics Side of the Bench



The anatomy of your data files from MAS 5.0 (Microarray Suite 5.0)
.DAT
.CEL
.EXP
.CHP
.txt files generated from .CHP

Quality Control (QC) of the chip: visual inspection
Look at the .DAT file or the .CHP file image
  – Scratches? Spots?
  – Corners and outside border have a checkerboard appearance (B2 oligo)
      Positive hybridization control
      Used by software to place grid over image
  – Array name is written out in oligos!

Scratch on a chip

Possible chip contamination

Internal controls
B. subtilis genes (added poly-A tails)
  – Assessment of quality of sample preparation
  – Also as hybridization controls
  – Not used in our module

More internal controls
Eukaryotic hybridization controls (bioB, bioC, bioD, cre)
  – Biotin-labeled cRNAs from E. coli and P1 bacteriophage
  – Spiked into the hybridization cocktail
  – Assess hybridization efficiency

And still more internal controls
Actin and GAPDH assess RNA sample/assay quality
  – Compare signal values from the 3’ end to signal values from the 5’ end
  – The 3’/5’ ratio generally should not exceed 3
Percent genes present (%P)
  – Replicate samples should have similar %P values
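The 3’/5’ check above can be sketched in a few lines of Python. This is only an illustration: the function names and the example signal values are hypothetical, and the cutoff of 3 is the rule of thumb from the slide.

```python
def three_to_five_ratio(signal_3p, signal_5p):
    """Ratio of the 3' probe set signal to the 5' probe set signal."""
    return signal_3p / signal_5p


def rna_quality_ok(signal_3p, signal_5p, max_ratio=3.0):
    """True when the 3'/5' ratio does not exceed the rule-of-thumb cutoff."""
    return three_to_five_ratio(signal_3p, signal_5p) <= max_ratio


# Hypothetical GAPDH signals from one chip
print(rna_quality_ok(1200.0, 800.0))   # ratio 1.5
print(rna_quality_ok(4000.0, 1000.0))  # ratio 4.0
```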

MAS 5.0 output files
For each transcript (gene) on the chip:
  – Signal intensity
  – A “present” or “absent” call (presence call)
  – p-value (significance value) for making that call
Each gene is associated with a GenBank accession number (NCBI database)

How are transcripts determined to be present or absent?
Probe pair intensities (PM, perfect match, vs. MM, mismatch)
  – Generate a detection p-value
  – Assign a “Present”, “Absent”, or “Marginal” call for the transcript
Every probe pair in a probe SET has a potential “vote” for the presence call

Discrimination score
Probe pairs “vote” via discrimination score (R)
R is compared to a predetermined threshold: Tau
  – R > Tau = vote for present
  – R < Tau = vote for absent
Voting result is expressed as a p-value
  – Reflects confidence of the expression call

Altering Tau
You can fine-tune Tau yourself within MAS 5.0
Increasing Tau reduces “false positives” but may also reduce the number of TRUE present calls
Our rule: use the default!

Calculation of R
R = (PM - MM) / (PM + MM)
  – (PM - MM): intensity difference of probe pair
  – (PM + MM): overall hybridization intensity
  – R value closer to 1: lower p-value (detection call is more significant); PM >> MM
  – R value close to 0 or negative: higher p-value (detection call is less significant); MM ≥ PM
  – One-sided Wilcoxon signed-rank test is used to determine the detection p-value
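To make the voting idea concrete, here is a minimal Python sketch of a detection call: it computes R for each probe pair, subtracts Tau, and runs an exact one-sided Wilcoxon signed-rank test on the differences. The Tau value (0.015) and the p-value cutoffs (0.04 and 0.06) are assumptions that are not stated in the slides, so check them against your own MAS 5.0 settings. The function names are hypothetical, and details such as saturated probes are ignored.

```python
def discrimination_scores(pm, mm):
    """R = (PM - MM) / (PM + MM) for each probe pair."""
    return [(p - m) / (p + m) for p, m in zip(pm, mm)]


def signed_rank_p_value(diffs):
    """Exact one-sided Wilcoxon signed-rank p-value (alternative: median > 0)."""
    diffs = [d for d in diffs if d != 0]  # zero differences carry no vote
    n = len(diffs)
    if n == 0:
        return 1.0
    # Rank |d| with average ranks for ties; ranks are doubled to stay integers.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks2 = [0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks2[order[k]] = i + j + 2  # twice the average rank
        i = j + 1
    observed = sum(r for r, d in zip(ranks2, diffs) if d > 0)
    # Null distribution: each rank is positive or negative with probability 1/2.
    total = sum(ranks2)
    counts = [0] * (total + 1)
    counts[0] = 1
    for r in ranks2:
        for s in range(total, r - 1, -1):
            counts[s] += counts[s - r]
    return sum(counts[observed:]) / 2 ** n


def detection_call(pm, mm, tau=0.015, alpha1=0.04, alpha2=0.06):
    """Return (p-value, call). tau, alpha1 and alpha2 are assumed defaults."""
    scores = discrimination_scores(pm, mm)
    p = signed_rank_p_value([r - tau for r in scores])
    if p < alpha1:
        return p, "Present"
    if p < alpha2:
        return p, "Marginal"
    return p, "Absent"
```

When every probe pair has PM well above MM, all votes are positive and the p-value is as small as the probe set size allows; when MM matches or exceeds PM, the p-value climbs toward 1 and the call becomes “Absent”.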

Calculating signal
One-Step Tukey Biweight Estimate
  – Yields a robust weighted mean
  – Relatively insensitive to even extreme outliers
Signal intensity value is created
  – Related to the amount of transcript present for that gene
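A rough Python sketch of a one-step Tukey biweight shows why outliers barely matter: values far from the median get a weight of zero. The tuning constant c = 5 and the small epsilon that guards against division by zero are assumptions here, not values given in the slides, and the input values are made up. MAS 5.0 applies additional steps (background handling, log transformation) that this sketch leaves out.

```python
from statistics import median


def tukey_biweight(values, c=5.0, epsilon=1e-4):
    """One-step Tukey biweight: a robust weighted mean of the values."""
    m = median(values)
    # Median absolute deviation measures the typical spread around the median.
    mad = median([abs(v - m) for v in values])
    weights = []
    for v in values:
        u = (v - m) / (c * mad + epsilon)
        # Bisquare weight: near 1 close to the median, 0 for distant values.
        weights.append((1 - u * u) ** 2 if abs(u) < 1 else 0.0)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Hypothetical probe-level values with one extreme outlier
probe_values = [10.0, 10.2, 9.8, 10.1, 9.9, 50.0]
print(tukey_biweight(probe_values))           # stays close to 10
print(sum(probe_values) / len(probe_values))  # plain mean is pulled upward
```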

Thank goodness for software!!!
MAS 5.0 does these calculations for you
  – .CHP file
Basic analysis can be done in MAS 5.0, but it won’t handle replicates
Import MAS 5.0 (.CHP) data into GeneSifter
  – Web-based microarray data analysis software package designed BY biologists FOR biologists

How do we want to analyze this data?
Pairwise analysis is most appropriate
  – Control vs. DMSO
List of genes that are “upregulated” or “downregulated”
Determine fold up or down cutoffs
  – What is significant? 1.5 fold up/down? 2 fold up/down? 10 fold up/down?
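The effect of the cutoff choice is easy to see in a small Python sketch. The signed fold-change convention used here (2.5 means 2.5 fold up, -4 means 4 fold down) is an assumption for illustration, as are the function names and signal values.

```python
def fold_change(control, treated):
    """Signed fold change: positive = up in treated, negative = down."""
    ratio = treated / control
    return ratio if ratio >= 1 else -1 / ratio


def classify(control, treated, cutoff=2.0):
    """Label a gene as up, down, or unchanged for a given fold cutoff."""
    fc = fold_change(control, treated)
    if fc >= cutoff:
        return "up"
    if fc <= -cutoff:
        return "down"
    return "unchanged"


# Hypothetical mean signals (control, DMSO) for three genes
genes = {"geneA": (100.0, 250.0), "geneB": (200.0, 50.0), "geneC": (100.0, 150.0)}
for name, (control, dmso) in genes.items():
    print(name, classify(control, dmso, cutoff=2.0), classify(control, dmso, cutoff=1.5))
```

With a 2 fold cutoff geneC is reported as unchanged, while a 1.5 fold cutoff puts it on the upregulated list, which is why the cutoff is a judgment call.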

Normalization
“Normalizing” data allows comparisons ACROSS different chips
  – Intensity of fluorescent markers might differ from one batch to another
  – Normalization allows us to compare those chips without altering the interpretation of changes in GENE EXPRESSION
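The slides do not say which normalization method is used, so the following Python sketch shows just one simple option as an assumption: scale every chip so that its median signal matches a common target. The chip names and signal values are made up.

```python
from statistics import mean, median


def normalize_chips(chips):
    """Scale each chip so its median signal equals the mean of all chip medians."""
    medians = {name: median(signals) for name, signals in chips.items()}
    target = mean(medians.values())
    return {
        name: [s * target / medians[name] for s in signals]
        for name, signals in chips.items()
    }


# Hypothetical chips: chipB is twice as bright overall as chipA
chips = {"chipA": [100.0, 200.0, 300.0], "chipB": [200.0, 400.0, 600.0]}
print(normalize_chips(chips))
```

Because every signal on a chip is multiplied by the same factor, ratios between genes on that chip are unchanged; only the overall brightness is brought in line with the other chips.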

Statistics
Statistical tests allow us to determine how SIGNIFICANT the data are
t-test statistic
  – Compares the means of two groups while taking into account the variability (standard deviation) within each group
p value (probability value) of ≤ 0.05
  – If there were no REAL change in gene expression, a difference at least this large would arise by chance only 5 times out of 100 or fewer
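As a sketch of what the t statistic measures, the Python below computes it for two groups of replicate signals. The slides do not say which t-test variant is applied, so this uses Welch's unequal-variance form as an assumption; the replicate values are invented. Turning the statistic into a p value requires the t distribution, which a statistics library would provide (for example `scipy.stats.ttest_ind` with `equal_var=False`).

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic and approximate degrees of freedom for two groups."""
    # Squared standard error of each group mean
    se2_a = variance(group_a) / len(group_a)
    se2_b = variance(group_b) / len(group_b)
    t = (mean(group_a) - mean(group_b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(group_a) - 1) + se2_b ** 2 / (len(group_b) - 1)
    )
    return t, df


# Hypothetical log2 signals for one gene: three control and three DMSO replicates
control = [8.1, 8.3, 8.0]
dmso = [9.4, 9.6, 9.2]
t, df = welch_t(dmso, control)
print(t, df)
```

A large difference in means produces a large t only when the replicates within each group agree with each other, which is why replicates matter.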

Present or absent?
Can do analysis on genes that are considered “absent” under all conditions
ONE transcript should be “present” in a pairwise analysis

Thresholds/cutoffs
What is a significant change in gene expression?
  – Some think 2 fold at the lowest
  – Judgment call
  – Can also set an upper limit on expression changes
Remember we are talking about changes in mRNA expression
  – Does that always mean more protein?

The output
Run analysis, get output of a GENE LIST
  – List indicates which genes are up- or downregulated
  – p values for t-test
  – Graphs of signal levels
Absolute numbers are not as important here as the trends you see
  – Now what????

Follow the links
Click on a gene
Find links to other databases
Follow links to discover what the protein does
Now the fun part begins…

Back to Biology
Do the changes you see in gene expression make sense BIOLOGICALLY?
If they don’t make sense, can you hypothesize as to why those genes might be changing?
Leads to many, many more experiments

Validation
Not enough to just do microarrays
Usually “validate” microarray results via some other technique
  – RT-PCR
  – TaqMan
  – Northern analysis
  – Protein level analysis
No technique is perfect…

Why microarrays?
Ask a single question, and get more answers than you dreamed of!
Can assess GLOBAL changes in gene expression under a certain experimental condition
Can discover new pathways and gene regulation; the possibilities are almost endless

Caveat…
There is NO standard way to analyze microarray data
Still figuring out how to get the “best” answers from microarray experiments
Best to combine knowledge of biology, statistics, and computers to get answers

One last note
Microarrays are “cutting edge” technology
You now have experience doing a technique that most Ph.D.s have never done
Looks great on a resume…