Factors affecting mRNA expression in a large population study
Peter J. Munson, Ph.D.
Mathematical and Statistical Computing Laboratory, Division of Computational Bioscience, Center for Information Technology, NIH

Systems Biology
– Has been greatly facilitated by the completion of the human genome
– Can only proceed if high-quality, broad, deep datasets are available
– A growing number of such datasets are available in model systems (yeast, mouse, zebrafish)
– Only a limited number of such datasets exist in human, e.g.:
  – GWAS studies (not clear if useful to systems biology)
  – NCI-60, Affymetrix tissue data, Novartis GeneAtlas

– Traditional laboratory research has great depth (many details)
– Population studies have great breadth
– Genomically-informed Systems Biology requires both depth and breadth (many observations on many components)
[Figure: space of "systems-friendly" datasets; axes: Breadth, Depth]

3 billion base pairs; one SNP every 300 bp

– 6 million parts, 1500 aircraft
– Moderately-sized molecular simulation: 1000 atoms, 100 million steps

GWAS studies listed at NCBI dbGaP

Functional Genomics
– We wish to measure not just the identity but also the quantity of ~30,000 transcripts comprising 300,000 exons
– This is now measurable in a single Affymetrix HuEx1.0_st array
– We want this on a very large number of samples

The Broad Connectivity Map measured how the expression of 12,000 genes is affected by ~1,000 compounds, hormones, drugs, and biologics, using standard cell lines.

The Framingham SABRe Project 3 case-control study assesses RNA expression in 222 cases of MI, CABG, PRCD, ABI with 222 age- and sex-matched controls.

When completed, SABRe Project 3 will assay 5,000+ samples from the Framingham population for expression of 300,000 exons and 20,000 genes, accompanied by detailed health histories.

Affymetrix HuEx_1.0_st Array
– 6.5 million probes
– 1.4 million probesets targeting 1.2 million exons, every known or predicted exon in the genome
– Allows for genome-wide screening of expression and alternative splicing events

SABRe CVD Project 3
– Phase 1: Feasibility study. Choose appropriate sample type (whole blood, PBMC fraction, lymphoblastoid cell lines), based on 50 samples of each type. Completed 10/2009
– Phase 2: Case-control study of MI, CABG, PRCD, ABI with age- and sex-matched controls. Completed 7/2010
– Phase 3:
  – ~2,000 Offspring generation samples: 12/2010
  – ~3,000 Gen3 Exam 1 samples: 7/2010

Analytical Challenges
– Quality control
– Detect significant biomarkers
– Account for unmatched covariates
– Account for batch effects

Principal Components Analysis
[Figure: PC1 vs. PC2 scatter plot; legend: control, case]
– No separation of cases and controls in PC1, PC2

Principal Components Analysis
– Samples handled robotically in batches of 96
– Cases/controls balanced within batch
– One batch per week
– Substantial batch effect (as expected)
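The batch check described on this slide can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the study's pipeline: the helper names `pca_scores` and `variance_explained_by_group`, the batch sizes, and the simulated batch shifts are all assumptions made for the example. It computes PC scores by SVD and then asks how much of PC1 is explained by batch versus case/control status.

```python
import numpy as np

def pca_scores(expr, n_components=2):
    """PCA of a samples x genes matrix via SVD.

    Returns the sample scores on the first n_components PCs and the
    fraction of total variance each of those PCs explains.
    """
    centered = expr - expr.mean(axis=0)              # center each gene
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]

def variance_explained_by_group(values, groups):
    """One-way eta squared: between-group SS divided by total SS."""
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    grand = values.mean()
    ss_total = np.sum((values - grand) ** 2)
    ss_between = sum(
        np.sum(groups == g) * (values[groups == g].mean() - grand) ** 2
        for g in np.unique(groups)
    )
    return ss_between / ss_total

# Synthetic data (hypothetical): 4 batches of 24 samples, 200 genes,
# cases and controls balanced within each batch, strong batch shifts.
rng = np.random.default_rng(0)
n_batches, per_batch, n_genes = 4, 24, 200
batch = np.repeat(np.arange(n_batches), per_batch)
is_case = np.tile([0, 1], n_batches * per_batch // 2)
batch_shift = rng.normal(0.0, 2.0, size=(n_batches, n_genes))
expr = rng.normal(0.0, 1.0, size=(batch.size, n_genes)) + batch_shift[batch]

scores, explained = pca_scores(expr, n_components=2)
eta_batch = variance_explained_by_group(scores[:, 0], batch)
eta_case = variance_explained_by_group(scores[:, 0], is_case)
print("PC1/PC2 variance fractions:", explained)
print("PC1 explained by batch:", eta_batch, "by case status:", eta_case)
```

In this simulation PC1 tracks batch and is essentially unrelated to case status, which mirrors the pattern reported on the slides; balancing cases and controls within each batch is what keeps the batch effect from masquerading as a case effect.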

Preliminary Result
– 279 genes are significant at FDR < 50% (paired t-test)
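A per-gene paired test followed by an FDR cut-off could look roughly like the sketch below. The slide does not say which FDR procedure was used; Benjamini-Hochberg is assumed here. The p-value uses a normal approximation to the t distribution, which is only reasonable for large numbers of pairs (such as the 222 matched pairs in this study), and all function names and the simulated data are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, stdev

def paired_t_statistic(case, control):
    """Paired t statistic computed on the within-pair differences."""
    diffs = [a - b for a, b in zip(case, control)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

def two_sided_p_normal(t):
    """Two-sided p-value from a normal approximation (assumes large n)."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues

# Synthetic example (hypothetical): 50 genes, 222 matched pairs,
# the first 5 genes carry a small case-control shift.
random.seed(1)
n_pairs, n_genes = 222, 50
pvals = []
for g in range(n_genes):
    effect = 0.3 if g < 5 else 0.0
    control = [random.gauss(0.0, 1.0) for _ in range(n_pairs)]
    case = [c + random.gauss(effect, 1.0) for c in control]
    pvals.append(two_sided_p_normal(paired_t_statistic(case, control)))

qvals = benjamini_hochberg(pvals)
significant = [g for g, q in enumerate(qvals) if q < 0.5]
print(len(significant), "genes pass q < 0.5")
```

A threshold of q < 0.5 means that up to about half of the selected genes are expected to be false positives, so a list selected this way is best read as a set of candidates for follow-up rather than confirmed biomarkers.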

Other Factors Affecting Expression
– MANOVA of gene expression on covariates using 20 PCs (45% of total variability)
– Sex (primarily due to presence of chrY)
– Batch (need better ways to mitigate this effect!)
– Identify genes affected by smoking, triglyceride level, age and maybe aspirin use
– Can now identify biomarker genes (later exons) for case-ness
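As a rough illustration of testing a covariate against a set of PC scores, the sketch below computes Wilks' lambda for a one-way MANOVA with a single grouping factor. This is an assumption about the general shape of such an analysis, not the model actually fitted in the study (which involved several covariates at once); the function name `wilks_lambda`, the use of 5 rather than 20 components, and the simulated sex effect are all hypothetical.

```python
import numpy as np

def wilks_lambda(scores, groups):
    """Wilks' lambda for a one-way MANOVA.

    scores: samples x k matrix (for example, the top k PC scores).
    groups: one label per sample.
    Returns det(E) / det(E + H), where E is the within-group and H the
    between-group sums-of-squares-and-cross-products matrix. Values near
    0 indicate a strong group effect; values near 1 indicate none.
    """
    scores = np.asarray(scores, dtype=float)
    groups = np.asarray(groups)
    centered_total = scores - scores.mean(axis=0)
    total_sscp = centered_total.T @ centered_total      # equals E + H
    within_sscp = np.zeros_like(total_sscp)
    for g in np.unique(groups):
        block = scores[groups == g]
        centered = block - block.mean(axis=0)
        within_sscp += centered.T @ centered
    # Use log-determinants for numerical stability.
    _, logdet_within = np.linalg.slogdet(within_sscp)
    _, logdet_total = np.linalg.slogdet(total_sscp)
    return float(np.exp(logdet_within - logdet_total))

# Synthetic example (hypothetical): 120 samples, 5 PC-like scores,
# with sex shifting the first component.
rng = np.random.default_rng(42)
n_samples, k = 120, 5
sex = np.repeat([0, 1], n_samples // 2)
pc_scores = rng.normal(0.0, 1.0, size=(n_samples, k))
pc_scores[:, 0] += 3.0 * sex

lam_sex = wilks_lambda(pc_scores, sex)
lam_null = wilks_lambda(pc_scores, rng.permutation(sex))
print("Wilks' lambda for sex:", lam_sex)
print("Wilks' lambda for shuffled labels:", lam_null)
```

Working on a reduced set of PCs rather than on all genes keeps the covariance matrices small and invertible, which is presumably part of the motivation for using 20 PCs on the slide.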

Further Steps
– Account (adjust) for covariates
– Mixed-effect model analysis to better account for batch
– Network analysis (systems level)
– Pathway analysis of candidate biomarkers (bioinformatics)
– Identify biomarkers by "triangulation": combine gene expression with genetic variation (SNPs), proteomic, lipidomic, and metabolomic data on the same individuals
– Goal: better understanding of mechanisms leading to CVD, myocardial infarction and stroke
– Goal: create a high-quality, "systems-friendly" dataset for systems modeling
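The slides propose a mixed-effect model for batch; the sketch below deliberately shows something simpler, per-batch mean centering, as a baseline against which such a model could be compared. It treats batch as a fixed offset per gene rather than a random effect, and it is only sensible when cases and controls are balanced within each batch, as they are in this design. The function name `center_within_batch` and the toy data are assumptions for illustration.

```python
import numpy as np

def center_within_batch(expr, batch):
    """Subtract each batch's per-gene mean from its samples.

    expr: samples x genes matrix; batch: one batch label per sample.
    This is a fixed-effect adjustment, not a mixed-effect model.
    """
    expr = np.asarray(expr, dtype=float)
    batch = np.asarray(batch)
    adjusted = expr.copy()
    for b in np.unique(batch):
        rows = batch == b
        adjusted[rows] -= expr[rows].mean(axis=0)
    return adjusted

# Toy example (hypothetical): two batches of two samples, two genes.
toy_expr = np.array([[1.0, 10.0],
                     [3.0, 14.0],
                     [10.0, 0.0],
                     [14.0, 4.0]])
toy_batch = np.array([0, 0, 1, 1])
toy_adjusted = center_within_batch(toy_expr, toy_batch)
print(toy_adjusted)
```

After centering, every batch has a per-gene mean of zero, so batch-level shifts no longer contribute to the leading principal components; any within-batch case-control difference is left intact.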

Acknowledgements
MSCL
– Jennifer Barb
– Zhen Li
– Antej Nuhanovic
– Roby Joehanes
– Tianxia Wu
– Delong Liu
– James Bailey
NHLBI Microarray Lab
– Nalini Raghavachari
– Richard Wang
– Poching Liu
– Hangxia Qiu
– Kim Woodhouse
– Yanqin Yang
– Mark Gladwin
Framingham Heart Study
– Dan Levy, Dir.
– Paul Courchesne
– Chris O’Donnell, Assoc. Dir.