Dr Andrew Harrison, Departments of Mathematical Sciences and Biological Sciences, University of Essex: Looking for signals in tens of thousands of GeneChips


Looking for signals in tens of thousands of GeneChips
Dr Andrew Harrison
Departments of Mathematical Sciences and Biological Sciences, University of Essex
There are >10^5 GeneChip experiments in the public domain, which cost ~$10^9 to produce. Extracting further information from this resource will be very cost effective.

Microarray informatics at Essex University
Departments of Mathematical Sciences and Biological Sciences

Faculty (degrees in ...)
  Dr Andrew Harrison: Physics
  Professor Graham Upton: Statistics
  Dr Berthold Lausen: Statistics
  + Dr Hugh Shanahan (Royal Holloway): Physics

PhD students
  Farhat Memon: Computer Science
  Anne Owen: Mathematics
  Fajriyah Rohmatul: Statistics

Alumni
  Dr Jose Arteaga-Salas: Statistics
  Dr Renata Camargo: Computer Science
  Dr Caroline Johnston: Molecular Biology and Bioinformatics
  Dr William Langdon: Computer Science and Physics
  Dr Joanna Rowsell: Mathematics
  Dr Olivia Sanchez-Graillet: Computer Science and Bioinformatics
  Dr Maria Stalteri: Inorganic Chemistry and Bioinformatics
  + 4 former MSc students

Current MSc and UG students
  Aleksandra Iljina: Statistics and Data Analysis
  Lina Hamadeh: Statistics and Data Analysis
  Madalina Ghita: Mathematics

There is a huge multiple-testing problem.
m = log2(Fold Change), a = log2(Average Intensity)
What can be learnt from comparing different experiments?
Probe types: Perfect Match (PM) and Mismatch (MM).
The biggest uncertainty in GeneChip analysis is how to merge all the probe information for one gene (Harrison, Johnston and Orengo, 2007, BMC Bioinformatics, 8:195).
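The m and a quantities above can be sketched for a single probe measured in two experiments. This is a minimal illustration, not the authors' code: the function name `ma_values` and the example intensities are made up, and a is read literally as log2 of the arithmetic mean intensity (a common alternative averages the two log2 intensities instead).

```python
import math


def ma_values(intensity_1, intensity_2):
    """Return (m, a) for one probe measured in two experiments.

    m is log2 of the fold change (experiment 2 over experiment 1).
    a is log2 of the arithmetic mean intensity: an assumed reading
    of "log2(Average Intensity)" on the slide.
    """
    m = math.log2(intensity_2 / intensity_1)
    a = math.log2((intensity_1 + intensity_2) / 2.0)
    return m, a


# Hypothetical intensities: a four-fold increase between experiments.
m, a = ma_values(256.0, 1024.0)
print(f"m = {m:.2f}, a = {a:.2f}")
```

Plotting m against a for every probe-set gives the familiar funnel-shaped cloud; with tens of thousands of points, some large |m| values are expected by chance alone, which is where the multiple-testing problem comes from.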

Some genes are represented by multiple probe-sets (Probe-set A, Probe-set B). If they are measuring the same thing, the signals should be up- and down-regulated together. Is that always true? No.
Stalteri and Harrison, 2007, BMC Bioinformatics, 8:13

Probes map to different exons. Alternative splicing may cause some exons to be upregulated and others to be downregulated.

Genes come in pieces. But exons do not. Multiple probes mapping to the same exon should measure the same thing.

We are studying the correlations in expression across >6,000 GeneChips (HG-U133A), sampling RNA from many tissues and phenotypes.

The correlations in intensities (log2) between probes in probeset _at on the HG-U133A array.
The number in each square is the correlation ×10. Blue = low correlation, yellow = high correlation.
The correlation calculated for PM probes 9 and 11, the data in the earlier scatter plot, is reported as 8 (0.76 multiplied by 10 and rounded).
Figure labels: Average intensity in GEO; Probe order along the gene.
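The "correlation ×10, rounded" grid can be sketched as follows. This is only an illustration under assumptions: `pearson` and `scaled_correlation_grid` are hypothetical helper names, the three probe series are invented, and Python's built-in `round` (which rounds exact halves to the nearest even integer) stands in for whatever rounding the original figure used.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def scaled_correlation_grid(log2_by_probe):
    """Correlation between every pair of probes, times 10 and rounded.

    log2_by_probe holds one list per probe; each list has one log2
    intensity per GeneChip, in the same chip order for every probe.
    """
    n = len(log2_by_probe)
    return [
        [round(10 * pearson(log2_by_probe[i], log2_by_probe[j])) for j in range(n)]
        for i in range(n)
    ]


# Invented log2 intensities for three probes across four chips.
probe_a = [7.0, 8.0, 9.0, 10.0]
probe_b = [7.1, 8.3, 8.9, 10.2]
probe_c = [10.0, 9.0, 8.0, 7.0]  # moves in the opposite direction to probe_a
grid = scaled_correlation_grid([probe_a, probe_b, probe_c])
for row in grid:
    print(row)

# The slide's own example: a correlation of 0.76 is reported as 8.
reported = round(10 * 0.76)
```

Scaling to single integers keeps each square of the heat map readable when a probe-set has eleven or more probes.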

This probeset shows no coherent correlations amongst its probes.

Some probesets clearly have outliers.

Probes 1-11 all map to the same exon. This is a different probe-set mapping to the same exon, and there seems to be one outlier.

The outliers are correlated with each other!

Virtually all of the probes in the group have runs of guanines within their 25 bases.
TCCTGGACTGAGAAAGGGGGTTCCT
GAGACACACTGTACGTGGGGACCAC
GGTAGACTGGGGGTCATTTGCTTCC
There is little sequence similarity between the probes, and they come from probe-sets picking up different biology, yet they are correlated!

Comparing probes with runs of Gs.
Plot axes: Number of contiguous Gs; Mean Correlation.
We are only looking at a small fraction of the entire probe, yet it is dominating the effects across all experiments.
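Grouping probes by their number of contiguous Gs needs only the longest G-run in each 25-base sequence. A minimal sketch, assuming a plain regular-expression scan; `longest_g_run` is a hypothetical name, and the three sequences are the ones quoted on the earlier slide.

```python
import re


def longest_g_run(probe_sequence):
    """Length of the longest run of contiguous Gs in a probe sequence."""
    runs = re.findall(r"G+", probe_sequence.upper())
    return max((len(run) for run in runs), default=0)


# The three probe sequences quoted on the earlier slide.
probes = [
    "TCCTGGACTGAGAAAGGGGGTTCCT",
    "GAGACACACTGTACGTGGGGACCAC",
    "GGTAGACTGGGGGTCATTTGCTTCC",
]
run_lengths = [longest_g_run(p) for p in probes]
print(run_lengths)
```

Binning probes by this run length, then averaging their pairwise correlations within each bin, is one way to arrive at a mean-correlation-versus-run-length plot.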

All probes within a cell have the same sequence, so a run of guanines will result in closely packed DNA with just the right properties to form G-quadruplexes.
Upton et al., BMC Genomics, 9:613
GGGGGGGG
GGGGGGGG
GGGGGGGG
G-quadruplexes

How do we deal with known outliers such as G-quadruplexes? What is the best way to calculate expression in the presence of outliers?

G-stacks bias which genes are reported to be clustered together within published experiments.

Probes containing GCCTCCC will hybridize to the primer spacer sequence that is attached to all aRNA prior to hybridization.
Kerkhoven et al., 2008, PLoS ONE 3(4): e1980

Log(magnitude) of averaged probe values, colour coded by size.
Note the perimeter of bright-dark pairs.
Cell (0,0) contains a probe which does not measure any biology.

Corner correlations (correlations with values in cell (0,0)).
Numbers are correlations times 10 (red: greater than 0.8).
Negative correlations appear as blanks.
Filled circles indicate probes not listed in the CDF file.
Large circles indicate correlations greater than 0.8.
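A corner-correlation map of this kind could be computed along the following lines. This is a sketch under assumptions, not the authors' pipeline: each chip is represented as a small 2D grid of invented intensities rather than a parsed CEL file, and `pearson` and `corner_correlation_map` are hypothetical names. Negative correlations are blanked, as on the slide.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def corner_correlation_map(chips):
    """Correlate every cell with cell (0,0) across a list of chips.

    chips is a list of equally sized 2D grids (lists of rows).
    Returns a grid of strings: the correlation times 10, rounded,
    or an empty string where the correlation is negative.
    """
    n_rows, n_cols = len(chips[0]), len(chips[0][0])
    corner = [chip[0][0] for chip in chips]
    result = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            series = [chip[r][c] for chip in chips]
            rho = pearson(series, corner)
            row.append("" if rho < 0 else str(round(10 * rho)))
        result.append(row)
    return result


# Three invented 2x2 "chips"; real arrays have hundreds of rows and columns.
chips = [
    [[1.0, 2.0], [5.0, 1.0]],
    [[2.0, 4.0], [4.0, 3.0]],
    [[3.0, 6.0], [3.0, 2.0]],
]
corner_map = corner_correlation_map(chips)
for row in corner_map:
    print(row)
```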

Correlations with cell (0,0).
Being in the opposite corner has not reduced the correlations of the interior row and column.

What is in the sheep pens?
Left panel: entries are log(mean(Intensity)).
Right panel: entries are correlation with cell (0,0).
Sheep!

Many thousands of probes are correlated with each other simply because they are adjacent to bright probes. We believe that the focus of the scanner may be responsible – regions adjacent to bright spots will gain the same fraction of light. A comparison of many images at different levels of blurriness will appear to indicate that dark regions adjacent to bright regions are correlated in their intensities.
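The blur argument can be checked with a toy simulation. Everything here is an assumption made for illustration: the intensities, the noise level, the range of blur fractions and the simple additive leak model (a dark cell gains a fixed fraction of its bright neighbour's light) are invented, and are not taken from the scanner analysis itself.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


random.seed(0)

N_IMAGES = 200
BRIGHT_1, BRIGHT_2 = 10000.0, 8000.0  # two unrelated bright probes
DARK = 100.0                          # true signal of every dark probe
NOISE_SD = 20.0

# Each image has its own blur level: the fraction of a bright
# neighbour's light that leaks into the adjacent dark cell.
blur = [random.uniform(0.0, 0.10) for _ in range(N_IMAGES)]


def noise():
    return random.gauss(0.0, NOISE_SD)


dark_beside_bright_1 = [DARK + f * BRIGHT_1 + noise() for f in blur]
dark_beside_bright_2 = [DARK + f * BRIGHT_2 + noise() for f in blur]
dark_isolated = [DARK + noise() for _ in blur]

r_adjacent = pearson(dark_beside_bright_1, dark_beside_bright_2)
r_isolated = pearson(dark_beside_bright_1, dark_isolated)
print(f"two dark cells beside bright cells: r = {r_adjacent:.2f}")
print(f"dark cell beside bright vs isolated: r = {r_isolated:.2f}")
```

In this toy model the two dark cells share no biology, yet they track each other across images because both follow the per-image blur level, while the isolated dark cell does not.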

A CEL file contains information about the ID of the scanner as well as the date on which the image was scanned. How does the impact of blur change over time for each scanner?
Upton and Harrison, 2010, Stat Appl Genet Mol Biol, 9(1), Article 37

How best to transform a DAT image into a CEL file? We are testing whether ideas from astronomy are applicable. We are checking whether the temporal patterns in scanner performance for human and other organisms are related.

Bioinformatix, Genomix, Mathematix, Physix, Statistix and Transcriptomix are needed in order to extract reliable information from Affymetrix GeneChips.
Thank you for your attention.