Expression and Methylation: QC and Pre-Processing

Slides:



Advertisements
Similar presentations
BiGCaT Bioinformatics Hunting strategy of the bigcat.
Advertisements

Randa Stringer Supervisor: Dr. Guillaume Par é A review of quality control and pre- processing measures for the Illumina 450K BeadChip.
SHI Meng. Abstract The genetic basis of gene expression variation has long been studied with the aim to understand the landscape of regulatory variants,
Dahlia Nielsen North Carolina State University Bioinformatics Research Center.
Microarray Normalization
DNA methylation age of human tissues and cell types. Genome Biol (10):R115 PMID:
Getting the numbers comparable
Gene expression analysis summary Where are we now?
DNA Microarray Bioinformatics - #27612 Normalization and Statistical Analysis.
Preprocessing Methods for Two-Color Microarray Data
Dilution/Mixture Study Bill Craven, GeneLogic, Inc. Motivated by a desire for a data set to be used as a baseline to characterize analysis and normalization.
Normalization of 2 color arrays Alex Sánchez. Dept. Estadística Universitat de Barcelona.
Data analytical issues with high-density oligonucleotide arrays A model for gene expression analysis and data quality assessment.
Microarray Analysis Jesse Mecham CS 601R. Microarray Analysis It all comes down to Experimental Design Experimental Design Preprocessing Preprocessing.
Why microarrays in a bioinformatics class? Design of chips Quantitation of signals Integration of the data Extraction of groups of genes with linked expression.
Microarray Gene Expression Data Analysis A.Venkatesh CBBL Functional Genomics Chapter: 07.
DNA Methylation Assays High Throughput Data Analysis BIOS , VCU Winter 2010 Mark Reimers, PhD.
Large-Scale Copy Number Polymorphism in the Human Genome J. Sebat et al. Science, 305:525 Luana Ávila MedG 505 Feb. 24 th /24.
Geuvadis RNAseq analysis at UNIGE Analysis plans
Factors to Consider in Selecting a Genotyping Platform Elizabeth Pugh June 22, 2007.
Probe-Level Data Normalisation: RMA and GC-RMA Sam Robson Images courtesy of Neil Ward, European Application Engineer, Agilent Technologies.
A New Oklahoma Bioinformatics Company. Microarray and Bioinformatics.
Summarization of Oligonucleotide Expression Arrays BIOS Winter 2010.
1 Global expression analysis Monday 10/1: Intro* 1 page Project Overview Due Intro to R lab Wednesday 10/3: Stats & FDR - * read the paper! Monday 10/8:
Other genomic arrays: Methylation, chIP on chip… UBio Training Courses.
Computational Laboratory: aCGH Data Analysis Feb. 4, 2011 Per Chia-Chin Wu.
Innovative Paths to Better Medicines Design Considerations in Molecular Biomarker Discovery Studies Doris Damian and Robert McBurney June 6, 2007.
No reference available
Supplemental Figure 1. False trans association due to probe cross-hybridization and genetic polymorphism at single base extension site. (A) The Infinium.
Statistical Analysis for Expression Experiments Heather Adams BeeSpace Doctoral Forum Thursday May 21, 2009.
Ethnic variation in methylation of birth weight and length Presenter: Zahra Sohani Supervisor: Dr. Sonia Anand.
동물 분자 유전체 연구의 최신 동향 National Institute of Animal Science Animal Genomics & Bioinformatics 정호영
Lecture 11 Epigenetics of Aging Andrea Baccarelli, MD, PhD, MPH Laboratory of Environmental Epigenetics Harvard School of Public Health
071126_EAS56_0057_FC – lanes 1-8 read 2 b a _EAS56_0057_FC – lanes 1-8 read 1 Table S1. Summary tables for a read 1 and b read 2 of a.
High-throughput genomic profiling of tumor-infiltrating leukocytes
Differential Methylation Analysis
David Amar, Tom Hait, and Ron Shamir
Copy-number estimation using Robust Multichip Analysis - Supplementary materials for the aroma.affymetrix lab session Henrik Bengtsson & Terry Speed Dept.
Sensitivity Analysis of the MGMT-STP27 Model and Impact of Genetic and Epigenetic Context to Predict the MGMT Methylation Status in Gliomas and Other.
Differential Gene Expression
Discovery of Multiple Differentially Methylated Regions
Copyright © 2007 Dan Nettleton
Epigenetics and psychiatric illness
876 fetal cord blood DNA samples
Normalization Methods for Two-Color Microarray Data
Sensitivity Analysis of the MGMT-STP27 Model and Impact of Genetic and Epigenetic Context to Predict the MGMT Methylation Status in Gliomas and Other.
Analysing ChIP-Seq Data
ppmi EPIgenetics Andy Singleton and Dena Hernandez
Supplementary Figure 1 | hbz expression is consistent between clones
Analysis of GO annotation at cluster level by Agnieszka S. Juncker
Sensitivity of RNA‐seq.
DNA Chip Data Interpretation Tools: Genmapp & Dragon View
Exploring and Understanding ChIP-Seq data
Visualising and Exploring BS-Seq Data
Getting the numbers comparable
Volume 23, Issue 11, Pages (June 2018)
Volume 20, Issue 4, Pages e6 (April 2017)
Adequacy of Linear Regression Models
Volume 20, Issue 4, Pages e6 (April 2017)
Volume 23, Issue 1, Pages 9-22 (January 2013)
Normalization for cDNA Microarray Data
Adequacy of Linear Regression Models
Single Sample Expression-Anchored Mechanisms Predict Survival in Head and Neck Cancer Yang et al Presented by Yves A. Lussier MD PhD The University.
Adequacy of Linear Regression Models
Density Density ß values ß values
Other genomic arrays: Methylation, chIP on chip…
Differential Expression of RNA-Seq Data
Scatter plots of the posterior probability values belonging to normal and IM gastric tissue categories calculated by (A) FP; (B) HW; and (C) integrated.
Pre-processing AFFY data
Batch variation of formulations from two products by two different genomic-scale techniques. Batch variation of formulations from two products by two different.
Presentation transcript:

Expression and Methylation: QC and Pre-Processing Randa Stringer

Epigenetics in NutriGen Examine epigenetic mediation of prenatal environment Subset of NutriGen cohorts ~500 each from START and CHILD Matched samples (where possible) Integrate methylation and expression changes

Expression Data Ongoing Majority of samples available START = 496 CHILD = 467 QC/pre-processing is underway

Expression QC/Pre-Processing Quality assessment Probe boxplots (signal distributions, unusual samples) Background correction Uses negative controls Normalization Batch effects MDS plots/ComBat Transformation Probe filtering Detection p < 0.01 in > 50% of samples

Methylation Data In analysis stage QC/pre-processing complete* *Pending further advancements in the field Good final sample size for both START = 506 CHILD = 491

Methylation QC/Pre-Processing Sample Quality Compare reported vs. predicted sex Remove samples where proportion of failed probes is > 0.01 Probe Quality Remove probes that failed to be detected in > 5% of samples Remove cross-hybridizing and polymorphic probes Chen et al. 2013

Methylation QC/Pre-Processing Normalization 2 probe types with different distributions Infinium I Probe 2 different probes per CpG Infinium II Probe Single base extension at CpG Maksimovic et al. Genome Biology 2012

Type I Grn Probes Type II Probes

Methylation QC/Pre-Processing Batch effects Adjust for technical variation Corrected by plate Cellular composition Crucial issue in methylation studies Cord blood not well characterized ReFACTor (Rahmani et al., 2016) Reference-free Utilizes PCA

Other Considerations Background correction Bead count (> 3) SNP probe definition MAF > 0.01 Other normalization methods BMIQ vs SWAN Cellular composition adjustment

QC Summary Samples Probes START CHILD Initial 512 511 Sex Check 5 14 Missingness 2 7 Final 506 491 Probes START CHILD Initial > 485 000 Failed 756 634 Polymorphic 70 889 Cross-Reactive 29 233 Final 393 400 393 449

Questions?

Normalization Goal: reduce non-biological variation Equalizes probe intensity and signal distributions across arrays and between colour channels New challenges with DNA methylation vs. gene expression techniques Systematic/technical variation Novel probe design Maksimovic et al. Genome Biology 2012

CpG Content Infinium II ≤ 3 Infinium I ≥ 3 Compressed β value distribution in InfII Solution: scale Infinium II probes to InfI probes Maksimovic et al. Genome Biology 2012

Subset Within-Array Normalization (SWAN) Allows InfI and InfII probes to be normalized together Subset of N InfI and InfII probes chosen based on underlying CpG content Separate methylated and unmethylated channels Mean intensity for each of 3N calculated InfI and II probes adjusted separately by linear interpolation Maksimovic et al. Genome Biology 2012

Beta-MIxture Quantile normalization (BMIQ) Novel normalization method Fit 3-state (U/H/M) to InfI and InfII probes separately Transform InfI U and M probes using the inverse of the cumulative beta distribution estimated from the respective InfII probes For H probes perform dilation transformation to fit the data into the gap Teschendorff et al. Bioinformatics 2012