Modeling sequence dependence of microarray probe signals Li Zhang Department of Biostatistics and Applied Mathematics MD Anderson Cancer Center.



Wide use of short oligonucleotide microarrays
- Gene expression assay
- Genotyping (SNP detection)
- Comparative genome hybridization
- DNA methylation detection
- Gene structure discovery
- Genome resequencing

Protocol of a microarray experiment

Affymetrix GeneChip® Probe Arrays
- GeneChip probe array: 1.28 cm
- Hybridized probe cell: 24 µm
- Each probe cell or feature contains millions of copies of a specific oligonucleotide probe
- Over 250,000 different probes complementary to genetic information of interest
- Single stranded, fluorescently labeled DNA target
- (Figure: image of hybridized probe array)

Double helix on microarrays
The probe is a 25-mer DNA oligo. Average distance between probes is 80 Å.

  ATCAGCATACGAGAGAATGATGGAT      probe
  |||||||||||||||||||||||||
AAUAGUCGUAUGCUCUCUUACUACCUAGC    cRNA fragment from solution

  ATCAGCATACGACAGAATGATGGAT      same probe with base 13 changed (G to C)
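The pairing on this slide can be checked mechanically. The following is a minimal sketch, not code from the presentation: the function names are hypothetical, and the choice of base 13 for the mismatch follows the sequences shown on the slide.

```python
# Sketch: verify the probe/target pairing and derive the mismatch probe.
# Function names are illustrative assumptions, not from the slides.

DNA_TO_RNA_COMPLEMENT = {"A": "U", "C": "G", "G": "C", "T": "A"}
DNA_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def paired_rna(probe):
    """RNA bases that pair with each probe base, position by position."""
    return "".join(DNA_TO_RNA_COMPLEMENT[b] for b in probe)

def mismatch_probe(probe, position=13):
    """Copy of the probe with the base at `position` (1-based) complemented."""
    i = position - 1
    return probe[:i] + DNA_COMPLEMENT[probe[i]] + probe[i + 1:]

PROBE = "ATCAGCATACGAGAGAATGATGGAT"          # 25-mer from the slide
FRAGMENT = "AAUAGUCGUAUGCUCUCUUACUACCUAGC"   # cRNA fragment from the slide

offset = FRAGMENT.index(paired_rna(PROBE))   # where the duplex starts in the fragment
print(offset, mismatch_probe(PROBE))
```

The fragment contains the full 25-base complement of the probe, flanked by two unpaired bases on each side, and complementing base 13 reproduces the second probe sequence on the slide.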

Technical factors affecting gene expression measurements
- Interaction between base pairs (stacking)
- Interaction with microarray surface
- Interaction with unintended targets (cross hybridization)
- Kinetic process (equilibration & washing)
- Physical properties of RNA sample
  - Degradation (missing 5’ ends)
  - Alternative splicing (missing exons)
  - Secondary structure (RNA hairpins & loops)
  - Biotinylation

Technical factors affecting gene expression measurements, with the models used for each
- Interaction between base pairs (stacking): nearest-neighbor model
- Interaction with microarray surface: position-dependent weights for stacking energies
- Interaction with unintended targets (cross hybridization): PDNN; mean field theory
- Kinetic process (equilibration & washing): Langmuir and Sips models
- Physical properties of RNA sample
  - Degradation (missing 5’ ends)
  - Alternative splicing (missing exons)
  - Secondary structure (RNA hairpins & loops)
  - Biotinylation
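For the kinetic item, the Langmuir and Sips isotherms can be sketched as follows. This is an illustrative sketch in one common parameterization; the function and parameter names are assumptions, and the slides do not state which form or parameter values were used.

```python
# Sketch of the Langmuir and Sips isotherms for probe occupancy at equilibrium.
# `conc` is target concentration, `k` an affinity constant, `a` the Sips
# heterogeneity exponent. All names and values are illustrative assumptions.

def langmuir_coverage(conc, k):
    """Fraction of probe sites occupied: K*c / (1 + K*c)."""
    if conc < 0 or k < 0:
        raise ValueError("conc and k must be non-negative")
    x = k * conc
    return x / (1.0 + x)

def sips_coverage(conc, k, a):
    """Sips generalization: (K*c)**a / (1 + (K*c)**a); a = 1 gives Langmuir."""
    if conc < 0 or k < 0 or a <= 0:
        raise ValueError("conc and k must be non-negative, a must be positive")
    x = (k * conc) ** a
    return x / (1.0 + x)

for c in (0.1, 1.0, 10.0):
    print(c, langmuir_coverage(c, 1.0), sips_coverage(c, 1.0, 0.7))
```

Both curves saturate at high concentration, which is why probe signal stops growing linearly with target abundance.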

Assumption: two types of binding
1. Gene-specific binding: 25 n.t. exact complementary sequences (binding with the intended target).
2. Non-specific binding: many (>5) mismatches or short stretches (binding with unintended targets).
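The two categories can be illustrated by counting mismatches between a probe and an aligned target window. This is a minimal sketch with hypothetical function names. The slide does not say how duplexes with 1 to 5 mismatches are treated, so they are labeled "intermediate" here as an assumption, and the "short stretches" case is not modeled.

```python
# Sketch: classify a probe/target pairing by mismatch count.
# The target window is written so that position k pairs with probe position k.

RNA_PARTNER = {"A": "U", "C": "G", "G": "C", "T": "A"}

def count_mismatches(probe, target_window):
    """Number of positions where the target base is not the Watson-Crick partner."""
    if len(probe) != len(target_window):
        raise ValueError("probe and target window must have equal length")
    return sum(RNA_PARTNER[p] != t for p, t in zip(probe, target_window))

def binding_type(probe, target_window):
    n = count_mismatches(probe, target_window)
    if n == 0:
        return "gene-specific"   # exact 25-nt complement
    if n > 5:
        return "non-specific"    # many mismatches
    return "intermediate"        # assumption: not covered by the slide

probe = "ATCAGCATACGAGAGAATGATGGAT"
print(binding_type(probe, "UAGUCGUAUGCUCUCUUACUACCUA"))
```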

Positional-Dependent Nearest-Neighbor (PDNN) model of molecular interactions
The gene-specific binding energy and the non-specific binding energy are each modeled as a weighted sum of base-pair stacking energies.
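The equations on this slide did not survive as text. As a hedged sketch, the two weighted sums are usually written in the following form for the PDNN model. The symbols ($\omega_k$, $\varepsilon$, $b_k$, and the starred versions for non-specific binding) are notation assumed here for illustration, not recovered from the slide.

```latex
E_{ij} = \sum_{k=1}^{24} \omega_k \,\varepsilon(b_k, b_{k+1}),
\qquad
E^{*}_{ij} = \sum_{k=1}^{24} \omega^{*}_k \,\varepsilon^{*}(b_k, b_{k+1})
```

Here $b_k$ is the base at position $k$ of probe $i$ in probe set $j$, $\varepsilon$ is the stacking energy of a pair of adjacent bases, and $\omega_k$ is a weight that depends on the position along the probe. A 25-mer has 24 adjacent pairs, hence the upper limit of the sums.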

PDNN model of probe signals
Minimization of T gives the energy parameters, B, N*, and N_j.
N* and B are the same on a microarray; N_j is the same in a probe set.
Probe signal: Fitness: Constraints: Software available at:
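The probe signal and fitness formulas are missing from this transcript. The sketch below uses the form commonly given for the PDNN model: a gene-specific term scaled by N_j, a non-specific term scaled by N*, and a background B, with fitness measured as the mean squared difference of log signals. Treat that form as an assumption rather than a reconstruction of the slide. All numeric parameters are placeholders, not fitted values, and the function names are hypothetical.

```python
# Sketch of a PDNN-style probe signal and fitness under stated assumptions.
import math
from itertools import product

N_PAIRS = 24  # a 25-mer has 24 nearest-neighbor pairs

# Placeholder stacking energies (NOT fitted values): GC-rich pairs bind tighter.
DINUCS = ["".join(p) for p in product("ACGT", repeat=2)]
STACK_SPECIFIC = {d: -1.0 - 0.5 * (d.count("G") + d.count("C")) for d in DINUCS}
STACK_NONSPECIFIC = {d: 0.5 * e for d, e in STACK_SPECIFIC.items()}
# Placeholder position weights, smaller at the probe ends.
WEIGHTS = [math.sin(math.pi * (k + 1) / (N_PAIRS + 1)) for k in range(N_PAIRS)]

def binding_energy(probe, stack, weights=WEIGHTS):
    """Weighted sum of stacking energies over adjacent base pairs."""
    if len(probe) != len(weights) + 1:
        raise ValueError("probe length must be one more than the number of weights")
    return sum(w * stack[probe[k:k + 2]] for k, w in enumerate(weights))

def predicted_signal(probe, n_j, n_star, b):
    """Assumed form: N_j/(1+exp(E)) + N*/(1+exp(E*)) + B."""
    e = binding_energy(probe, STACK_SPECIFIC)
    e_star = binding_energy(probe, STACK_NONSPECIFIC)
    return n_j / (1.0 + math.exp(e)) + n_star / (1.0 + math.exp(e_star)) + b

def fitness(observed, predicted):
    """Mean squared difference between log observed and log predicted signals."""
    return sum((math.log(o) - math.log(p)) ** 2
               for o, p in zip(observed, predicted)) / len(observed)

probe = "ATCAGCATACGAGAGAATGATGGAT"
print(predicted_signal(probe, n_j=100.0, n_star=10.0, b=5.0))
```

In a real fit, the stacking energies, weights, B, N*, and each N_j would be adjusted to minimize the fitness over all probes on the array.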

Fitting PDNN model (Figure: ln(signal) plotted against probe index)

Energy parameters in PDNN model (Figure panels: weight factors; stacking energy terms)

Baseline of non-specific binding (Figure axis: non-specific binding energy)

Effects of mismatches
A mismatch disrupts double helix formation. Energetically, it is unfavorable for binding. Its effect depends on the surrounding DNA sequence.

Effect of mismatch at base 13 depends on the nearest neighbors (Figure labels: A, C, G, T)

Sequence dependence of free energy cost of single mismatch in DNA duplexes

Pattern of cross hybridization: MM and PM probes bind to different molecules (Figure: Var(ln PM) versus Var(ln MM))
Data source: Affymetrix HG-U133 spike-in data set. Large variation indicates response to spike-ins. Number of arrays: 42. Number of probes on an array: ~0.5 million.
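The quantity plotted here is the variance of log signal for each probe across arrays. A minimal sketch of that computation follows. The intensity values are made-up placeholders, not data from the spike-in set, and the function name is hypothetical.

```python
# Sketch: per-probe variance of ln(intensity) across arrays.
import math
from statistics import variance

def log_variance_per_probe(intensities):
    """intensities[i][a] is the signal of probe i on array a (must be positive).
    Returns the sample variance of ln(signal) across arrays for each probe."""
    return [variance([math.log(x) for x in row]) for row in intensities]

# Placeholder data: 2 probes x 3 arrays (hypothetical values).
pm = [[100.0, 100.0, 100.0],    # constant probe: no response
      [100.0, 400.0, 1600.0]]   # probe responding to a changing target
print(log_variance_per_probe(pm))
```

Computing this separately for PM and MM probes and comparing the two per probe pair gives the kind of contrast described on the slide.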

Microarray surface effects
DNA and RNA are negatively charged. The glass surface is also charged, which leads to repulsion.

Pattern of cross hybridization: bias towards the 5’ end

Sense and antisense
Upon binding, sense and antisense probes form the same double helix structure. The same interactions should lead to the same binding energy. The observed data contradict this prediction.

Contrast of sense and antisense probe signals
Response: ln(sense probe signal / antisense probe signal)
Model fitted: Ŷ = Nt – 0.05 Na Ng; R² = 0.67; sample size = 875.

Summary
- Binding on array surface: Probe binding free energy can be approximated by a weighted sum of base-pair stacking energies, with the probe ends contributing less.
- Mismatches: Mismatches disrupt hybridization, especially in cross hybridization. The effects of mismatches depend on sequence. The surface also has an effect.
- Surface effects: Cross hybridization is biased towards the 5’ end of the probes. Repulsion by the surface depends on the nucleotides.