Gene Expression Microarrays Microarray Normalization Stat 115 2012.

Presentation transcript:


Outline
Gene expression microarrays
–Differential expression
–Spotted cDNA and oligonucleotide arrays
Microarray normalization methods
–Median scaling, Lowess, and Qnorm
–MA plots
Microarray databases

Central Dogma of Molecular Biology
DNA (replication) -> RNA (transcription) -> protein (translation)
Reverse transcription goes from RNA back to DNA
Protein is folded with function and gives rise to physiology

Imagine a Chef
Restaurant dinner vs. home lunch
Certain recipes are used to make certain dishes

Each Cell Is Like a Chef

Infant skin (glucose, oxygen, amino acid): healthy skin cell state
Adult liver (fat, alcohol, nicotine): disease liver cell state
Certain genes are expressed to make certain proteins

Differential Expression
Understand the transcription level of gene(s) under different conditions
–Cell types (brain vs. liver)
–Developmental (fetal vs. adult)
–Response to stimulus (rich vs. poor media)
–Gene activity (wild type vs. mutant)
–Disease states (healthy vs. diseased)

High Throughput Measures of Gene Expression
Measure gene expression: quasi-estimate of the protein level and cell state
High throughput: measure mRNA level of all the genes in the genome together
Checking what the chef is making in many different situations
Different microarrays:
–Spotted cDNA microarrays
–Oligonucleotide arrays

Microarrays
Grow cells under a certain condition, collect the mRNA population, and label it
The microarray has high-density, sequence-specific probes at a known location for each gene/RNA
The sample is hybridized to the microarray probes by DNA (A-T, G-C) base pairing; non-specific binding is washed away
Measure the sample mRNA level by reading the labeled signal at each probe location

Spotted cDNA Arrays
Pat Brown Lab, Stanford University
Robotic spotting of cDNA (mRNA converted back to DNA, no introns)
Several thousands of probes / array
One long probe per gene

Spotted cDNA Arrays
Competing hybridization
–Control
–Treatment
Detection
–Green: high control
–Red: high treatment
–Yellow: equally high
–Black: equally low

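Why Competing Hybridization?
DNA concentration is not the same across probes, and probes are not spotted evenly
Absolute intensities therefore are not comparable from spot to spot, but the ratio of the two samples at the same spot is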

cDNA Microarray Readout
Result often viewed with Excel or WordPad

Oligonucleotide Arrays
GeneChip® by Affymetrix
Parallel synthesis of oligonucleotide probes (25-mer) on a slide using photolithographic methods
Millions of probes / microarray
Multiple probes per gene
One-color arrays

Affymetrix GeneChip Probes

Labeled Samples Hybridize to DNA Probes on GeneChip

Shining Laser Light Causes Tagged Fragments to Glow

Perfect Match (PM) vs MisMatch (MM)
MM is a control for cross-hybridization

Affymetrix Microarray Image Analysis
Gridding: based on spike-in DNA
Affymetrix GeneChip Operating System (GCOS)
–cel file: X, Y, MEAN, STDV, NPIXELS for each cell
–cdf file: which probe at (X,Y) corresponds to which probe sequence and targeted transcript
–The MM probe is always at (X,Y+1) relative to its PM probe

Array Platform Comparisons
cDNA microarrays:
–Two-color assay, comparative hybridization
–Cheaper ($50-$200 / chip)
–Flexibility of custom-made array: do not need whole sequence
Oligonucleotide GeneChip:
–One-color assay, absolute expression level
–A little more expensive ($ / chip)
–Automated: better quality control, less variability
–Easier to compare results from different experiments
Many more commercial array platforms
–Agilent, ABI, Amgen, NimbleGen…
–Some use long oligo probes: nt

Experimental Design Issues
Replicates: always preferred
Biological replicates: repetition of the experiment prior to extracting mRNA
–Multiple cell conditions & individuals
Technical replicates: repetition of experimental conditions after mRNA extraction
–Include reverse transcription, probe labeling, and hybridization

Normalization
Try to preserve biological variation and minimize experimental variation, so that different experiments can be compared
Considerations: scale, dye bias, location bias, probe bias, …
Assumption: most genes / probes don’t change between two conditions
Normalization can have a larger effect on the analysis than downstream steps (e.g. group comparisons)

Dye Swap in cDNA Microarrays
Cy5 and Cy3 dyes do not label equally
–The measured log2(R/G) is offset from the true log-ratio by a dye bias c, so log2(R/G) - c estimates log2(R_truth/G_truth)
So, ideally, swap the dyes in a replicate experiment (R', G')
Combine by subtracting the normalized log-ratios; when c ≈ c' the bias cancels:
[ (log2(R/G) - c) - (log2(R'/G') - c') ] / 2 ≈ [ log2(R/G) + log2(G'/R') ] / 2 = [ log2(RG'/GR') ] / 2
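To make the cancellation concrete, here is a minimal Python sketch of the dye-swap average for a single spot. The function name and the example intensities (a true 4-fold change with a 2-fold red-channel bias) are hypothetical, chosen only for illustration.

```python
import math

def dye_swap_log_ratio(r, g, r_swap, g_swap):
    """Average log2 ratio for one spot from a dye-swap pair.

    r, g:           red/green intensities, treatment labeled red
    r_swap, g_swap: red/green intensities on the swapped array,
                    where the treatment is labeled green
    """
    m = math.log2(r / g)                 # log2(R/G), carries bias +c
    m_swap = math.log2(r_swap / g_swap)  # log2(R'/G'), carries bias +c'
    return (m - m_swap) / 2              # bias cancels when c is close to c'

# Hypothetical spot: true 4-fold induction, red channel reads 2x too bright.
true_ratio, red_bias, base = 4.0, 2.0, 100.0
r, g = true_ratio * red_bias * base, base            # treatment in red
r_swap, g_swap = red_bias * base, true_ratio * base  # treatment in green
estimate = dye_swap_log_ratio(r, g, r_swap, g_swap)
print(estimate)
```

Each single array alone would report a biased log-ratio (3 and -1 in this example), while the combined value recovers log2 of the true ratio.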

Median Scaling
Linear scaling
–Ensure the different arrays have the same median value and same dynamic range
–X' = (X - c1) * c2
(Figure: scatter plot of array2 vs. array1)
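A small sketch of the linear scaling X' = (X - c1) * c2 in Python follows. The slide does not say how the dynamic range is measured, so this sketch assumes the interquartile range (IQR); the function names are illustrative, not from any package.

```python
from statistics import median, quantiles

def iqr(values):
    """Interquartile range, used here as a stand-in for dynamic range."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1

def scale_to_reference(x, ref):
    """Return X' = (X - c1) * c2 so that X' matches ref in median and IQR."""
    spread_x, spread_ref = iqr(x), iqr(ref)
    if spread_x == 0 or spread_ref == 0:
        raise ValueError("zero spread; cannot rescale")
    c2 = spread_ref / spread_x             # matches the dynamic range
    c1 = median(x) - median(ref) / c2      # matches the median
    return [(v - c1) * c2 for v in x]

array1 = [2.0, 4.0, 6.0, 8.0, 10.0]        # reference array
array2 = [11.0, 12.0, 13.0, 14.0, 15.0]    # array to be rescaled
scaled = scale_to_reference(array2, array1)
print(median(scaled), iqr(scaled))
```

Because the transform is a single straight line, it cannot correct intensity-dependent differences between arrays.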

Loess: LOcally WEighted Scatterplot Smoothing
Fit a smooth curve
–Use robust local linear fits
–Effectively applies different scaling factors at different intensity levels
–Y = f(X)
–Transform X to X' = f(X)
–Y and X' are comparable
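The local linear fit can be sketched in plain Python as below. This is a simplified illustration under stated assumptions: it uses tricube weights over the nearest `span` fraction of points and omits the robustness iterations (down-weighting of outliers) that a full loess performs. The function name and parameters are hypothetical, and the loop is quadratic in the number of points, so it suits only small examples.

```python
import math

def loess_fit(x, y, span=0.5):
    """Fitted values f(x_i) from local weighted linear fits (non-robust)."""
    n = len(x)
    k = min(n, max(2, math.ceil(span * n)))   # neighbours per local fit
    fitted = []
    for x0 in x:
        dist = [abs(xi - x0) for xi in x]
        h = sorted(dist)[k - 1] or 1e-12      # bandwidth: k-th nearest distance
        # Tricube weights: 1 at x0, falling to 0 at distance h and beyond.
        w = [(1 - min(d / h, 1.0) ** 3) ** 3 for d in dist]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))  # local line evaluated at x0
    return fitted

# X = array1 intensities, Y = array2 intensities (made-up values).
array1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
array2 = [1.2, 2.1, 3.3, 4.6, 6.0, 7.5, 9.1, 10.8]
array1_adj = loess_fit(array1, array2)         # X' = f(X), comparable to Y
print([round(v, 2) for v in array1_adj])
```

Since the fitted curve bends with intensity, the adjustment applied at low intensities can differ from the one applied at high intensities.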

Reference for Normalization
Need to pick one reference sample
–“Middle” chip: median of median
–Pooled reference RNA sample
–Selection of baseline chip influences the results
Need to pick a subset of genes to estimate the scaling factor or smooth curve
–Housekeeping genes: present at constant levels
–Invariant rank: if a gene is not differentially expressed, its rank in the two arrays (or colors) should be similar
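The invariant-rank idea can be illustrated with a short Python sketch. It is a one-pass simplification, not the procedure of any particular package: the `max_shift` threshold (allowed rank change as a fraction of the number of genes) is an assumed parameter, and ties are broken by position.

```python
def ranks(values):
    """Rank of each value (0 = smallest); ties broken by position."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order):
        r[i] = rank
    return r

def rank_invariant(a, b, max_shift=0.1):
    """Indices of genes whose rank differs by at most max_shift * n."""
    n = len(a)
    ra, rb = ranks(a), ranks(b)
    return [i for i in range(n) if abs(ra[i] - rb[i]) <= max_shift * n]

# Made-up intensities: genes 2 and 8 change rank sharply between arrays.
array_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
array_b = [1.1, 2.2, 9.5, 4.1, 5.3, 6.0, 7.2, 8.1, 3.0, 10.5]
print(rank_invariant(array_a, array_b))
```

The selected genes would then be the only points used to estimate the scaling factor or to fit the smooth curve.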

Quantile Normalization
Bolstad et al., Bioinformatics 2003
–Currently considered the best normalization method
–Assumes most of the probes/genes don’t change between samples
Calculate the mean for each quantile and reassign each probe the mean of its quantile
No experiment retains its original values, but all experiments end up with exactly the same distribution
(Figure: table of probes by experiments, with a column of quantile means)
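The procedure can be sketched in a few lines of Python. This is an illustrative version, assuming every experiment measures the same number of probes; ties are broken by position rather than averaged, which is a simplification, and the example values are made up.

```python
def quantile_normalize(arrays):
    """Quantile-normalize a list of equal-length intensity lists.

    arrays[j][i] is the intensity of probe i in experiment j.
    """
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all experiments must have the same number of probes")
    sorted_arrays = [sorted(a) for a in arrays]
    # Mean of the k-th smallest value across experiments, for each k.
    quantile_means = [sum(s[k] for s in sorted_arrays) / len(arrays)
                      for k in range(n)]
    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # probe indices by rank
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = quantile_means[rank]  # probe gets the mean of its quantile
        normalized.append(out)
    return normalized

experiments = [[5.0, 2.0, 3.0, 4.0],
               [4.0, 1.0, 4.0, 2.0],
               [3.0, 4.0, 6.0, 8.0]]
for row in quantile_normalize(experiments):
    print([round(v, 2) for v in row])
```

After normalization, sorting any experiment gives the same list of values, which is the "exactly the same distribution" property; only the assignment of values to probes differs between experiments.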

Dilution Series
RNA sample in 5 different concentrations
5 replicates scanned on 5 different scanners
Before and after quantile normalization

Normalization Quality Check
Scatter plot: log2(R) vs log2(G); values should be on the diagonal
MA plot: M = log2(R) - log2(G), A = (log2(R) + log2(G)) / 2; values should scatter around M = 0
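As a quick illustration, the M and A coordinates can be computed as follows in Python. The function name and intensities are hypothetical; plotting is left out so the sketch needs only the standard library.

```python
import math
from statistics import median

def ma_values(red, green):
    """Return (M, A) lists for paired red/green intensities."""
    m = [math.log2(r) - math.log2(g) for r, g in zip(red, green)]
    a = [(math.log2(r) + math.log2(g)) / 2 for r, g in zip(red, green)]
    return m, a

red = [8.0, 16.0, 32.0, 64.0]
green = [2.0, 16.0, 64.0, 64.0]
m, a = ma_values(red, green)
print(m)          # log-ratios, plotted on the vertical axis
print(a)          # average log-intensities, plotted on the horizontal axis
print(median(m))  # near 0 suggests no overall shift between channels
```

Plotting M against A rotates the log2(R) vs log2(G) scatter by 45 degrees, so intensity-dependent bias shows up as a curve away from the horizontal line M = 0.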

Before Normalization
Pairwise MA plots for 5 arrays, probe level (PM)

After Normalization
Pairwise MA plots for 5 arrays, probe level (PM)

Public Microarray Databases
SMD: Stanford Microarray Database, most Stanford and collaborators’ cDNA arrays
GEO: Gene Expression Omnibus, an NCBI repository for gene expression and hybridization data, growing quickly
Oncomine: Cancer Microarray Database
–Published cancer-related microarrays
–Raw data all processed, nice interface

Homework
How many data series are there on GEO with Affymetrix gene expression profiles of
–Human breasts
–Human prostates
–Human brains
–Mouse liver
–Just the numbers
Which series have > 10 samples
–Use the DataSet Browser format

Acknowledgment
Terry Speed, Rafael Irizarry & group; Kevin Coombes & Keith Baggerly; Erick Rouchka; Wing Wong & Cheng Li; Mark Reimers; Erin Conlon; Larry Hunter; Zhijin Wu; Wei Li