Data integration across omics landscapes
Bing Zhang, Ph.D.
Department of Biomedical Informatics
Vanderbilt University School of Medicine

Omics data integration
[Figure: DNA, mRNA, protein, elephant]
CNCP

Informatics approaches to integrate genomic and proteomic data
Genomic data + proteomic data → novel biological insights
Genomic data → improved proteomic data analysis

Genome and transcriptome data (TCGA, The Cancer Genome Atlas), data type: technology
- CNV: arrayCGH, SNP array
- LOH: SNP array
- DNA methylation: methylation array
- Exon expression: array, RNA-Seq
- Junction expression: RNA-Seq
- Gene expression: array, RNA-Seq
- Mutations: exome sequencing, RNA-Seq
- Sequence variants: exome sequencing, RNA-Seq

Proteome data (CPTAC, Clinical Proteomic Tumor Analysis Consortium), data type: technology
- Protein expression: MS/MS
- Protein PTM: MS/MS, protein arrays

Using genomic data to improve proteomic data analysis  Project 1. customProDB: generating customized protein databases to enhance protein identification in shotgun proteomics  Project 2. NetWalker: prioritizing candidate gene lists for targeted MRM analysis Integrating genomic and proteomic data to gain novel biological insights  Project 3. miRNA-mediated regulation: understanding post- transcriptional mechanisms regulating human gene expression  Project 4. NetGestalt: viewing and correlating cancer omics data within a biological network context Informatics approaches to integrate genomic and proteomic data CNCP2012 4

customProDB: motivation
Database searches commonly use a reference protein database that contains both expressed and unexpressed proteins but lacks proteins with sample-specific sequence variation.

Customized protein database from RNA-Seq data
- Increased sensitivity
- Reduced ambiguity
- Variant peptides
Wang et al., J Proteome Res, 2012
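The slide does not spell out how the customized database is assembled. As a rough sketch under stated assumptions, the idea can be expressed in two steps: keep a reference protein only if its transcript reaches a hypothetical RPKM cutoff (`min_rpkm`), then append one variant sequence for each single-amino-acid variant in an expressed protein. This is Python for illustration only, not the customProDB R package's API; `Variant`, `build_custom_db`, `to_fasta` and the cutoff value are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical single-amino-acid variant derived from DNA- or RNA-Seq calls."""
    protein_id: str
    position: int  # 1-based residue position in the reference protein
    ref_aa: str
    alt_aa: str


def build_custom_db(reference, expression, variants, min_rpkm=1.0):
    """Return {header: sequence} with expressed proteins plus their variant forms.

    reference:  {protein_id: amino-acid sequence} from a reference database
    expression: {protein_id: RPKM of the encoding transcript} from RNA-Seq
    min_rpkm:   hypothetical expression cutoff (an assumption, not from the slides)
    """
    # Step 1: drop proteins whose transcripts are not detectably expressed.
    db = {pid: seq for pid, seq in reference.items()
          if expression.get(pid, 0.0) >= min_rpkm}

    # Step 2: add one variant sequence per variant in an expressed protein.
    for v in variants:
        seq = db.get(v.protein_id)
        if seq is None:
            continue  # protein was filtered out as unexpressed
        i = v.position - 1
        if seq[i] != v.ref_aa:
            raise ValueError(f"reference residue mismatch for {v}")
        header = f"{v.protein_id}_{v.ref_aa}{v.position}{v.alt_aa}"
        db[header] = seq[:i] + v.alt_aa + seq[i + 1:]
    return db


def to_fasta(db):
    """Serialize the database as FASTA text for a database search engine."""
    return "".join(f">{h}\n{s}\n" for h, s in db.items())
```

For example, with reference proteins `P1` (`MKTAYIAK`, RPKM 5.0) and `P2` (RPKM 0.2) and a T3A variant in `P1`, the result keeps `P1`, drops `P2`, and adds `P1_T3A` with sequence `MKAAYIAK`. The smaller search space reflects the "reduced ambiguity" point, and the added variant entries are what make variant peptides identifiable.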

customProDB: moving forward
- R package
- Compatible with both DNA and RNA sequencing data
- Sample-specific database and consensus database
- Application to the CPTAC project
- Spectral library
Wang et al., manuscript in preparation
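The slide does not define the consensus database. One plausible reading is the set of entries supported by at least some fraction of the sample-specific databases; the sketch below assumes that definition, with a hypothetical `min_fraction` parameter, and assumes a header identifies the same sequence in every sample. It is illustrative Python, not part of the customProDB package.

```python
from collections import Counter


def consensus_db(sample_dbs, min_fraction=0.5):
    """Keep entries present in at least min_fraction of the sample-specific databases.

    sample_dbs:   list of {header: sequence} dicts, one per sample
    min_fraction: hypothetical support threshold (an assumption, not from the slides)
    """
    if not sample_dbs:
        return {}
    support = Counter()
    sequences = {}
    for db in sample_dbs:
        for header, seq in db.items():
            support[header] += 1  # number of samples containing this entry
            sequences[header] = seq
    needed = min_fraction * len(sample_dbs)
    return {h: sequences[h] for h, n in support.items() if n >= needed}
```

Lowering `min_fraction` trades a larger search space for better coverage of sample-specific proteins; a value of 1.0 keeps only entries shared by every sample.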

miRNA regulation: motivation
If a miRNA represses its target, miRNA expression should be inversely correlated with:
- mRNA expression (mRNA decay)
- Protein/mRNA ratio (translational repression)
- Protein expression (combined effect)
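These expected inverse correlations can be made concrete by correlating one miRNA with one target gene across matched samples (the study used colorectal cancer cell lines). The sketch below assumes log-transformed mRNA and protein values, so that protein minus mRNA is the log protein/mRNA ratio, and uses Spearman rank correlation; the choice of correlation measure and the function names are illustrative assumptions, not the authors' documented workflow.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def regulation_profile(mirna, mrna_log, protein_log):
    """Correlate one miRNA with one target; negative values suggest the mechanism."""
    ratio_log = [p - m for p, m in zip(protein_log, mrna_log)]  # log(protein/mRNA)
    return {
        "mRNA decay": spearman(mirna, mrna_log),
        "translational repression": spearman(mirna, ratio_log),
        "combined effect": spearman(mirna, protein_log),
    }
```

Rank correlation is used here because it is robust to the different scales and dynamic ranges of array and mass spectrometry measurements; with only a handful of cell lines, any single correlation is noisy.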

miRNA regulation: data preparation
- 9 colorectal cancer cell lines
- Protein expression data: current study
- mRNA expression data: GSE10843
- miRNA expression data: GSE

miRNA regulation: data analysis workflow
Liu et al., manuscript in preparation

miRNA regulation: mRNA decay or translational repression?
- Early studies suggested a major role for translational repression (Olsen et al., Dev Biol, 1999; Zeng et al., Molecular Cell, 2001).
- Recent large-scale studies suggest a predominant role for mRNA decay (Baek et al., Nature, 2008; Selbach et al., Nature, 2008; Guo et al., Nature, 2010).
- Our study suggested equally important roles for mRNA decay and translational repression:
  - Translational repression was involved in 58% of all predicted miRNA-targeted interactions and played a major role in 30% of them.
  - Most miRNAs exert their effect through both mRNA decay and translational repression.
  - Sequence features known to drive site efficacy in mRNA decay were generally not applicable to translational repression.
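The slide does not state the rule used to call a mechanism "involved" or "major" for an interaction. Purely as a hypothetical illustration, each miRNA-target interaction could be scored with two correlations (miRNA vs. mRNA for decay, miRNA vs. protein/mRNA ratio for translational repression), calling a mechanism involved when its correlation is at or below a cutoff `r_cut` and major when it is the more negative of the two. The cutoff and the rule are assumptions, and the resulting fractions are not the study's numbers.

```python
def classify_interaction(r_decay, r_repression, r_cut=-0.3):
    """Hypothetical call for one miRNA-target interaction from two correlations."""
    decay = r_decay <= r_cut
    repression = r_repression <= r_cut
    major = None
    if decay or repression:
        # The mechanism with the more negative correlation is called "major".
        major = "translational repression" if r_repression < r_decay else "mRNA decay"
    return {"decay": decay, "repression": repression, "major": major}


def repression_fractions(interactions, r_cut=-0.3):
    """Fractions of interactions where repression is involved / plays the major role.

    interactions: list of (r_decay, r_repression) pairs, one per interaction
    """
    calls = [classify_interaction(d, t, r_cut) for d, t in interactions]
    n = len(calls)
    involved = sum(c["repression"] for c in calls) / n
    major = sum(c["major"] == "translational repression" for c in calls) / n
    return involved, major
```

Under such a rule, "involved" and "major" are nested: every interaction where repression is major also has repression involved, which is consistent with the 30% being a subset of the 58% reported on the slide.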

miR-138 prefers translational repression

NetGestalt: motivation
[Figure: DNA (mutation, methylation), mRNA (expression, splicing) and protein (expression, modification) data, together with phenotype, related through a network]

NetGestalt: scalable network representation
Total number of modules (size > 30): 92
- Functional homogeneity: 63 (69%)
- Spatial homogeneity: 55 (60%)
- Dynamic homogeneity: 69 (75%)
- Homogeneity of any type: 82 (89%)
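The percentages are each count divided by the 92 modules. The sketch below recomputes them and shows why "any type" (82) is smaller than the sum of the three individual counts: it is the union, so a module homogeneous in several ways is counted once. The per-module boolean annotation passed to `homogeneity_summary` is a hypothetical input format. Computed this way, 63/92 is about 68.5%, slightly below the 69% shown; the other three match to the nearest percent.

```python
def homogeneity_summary(modules):
    """Count modules by homogeneity type.

    modules: list of dicts with boolean keys "functional", "spatial", "dynamic"
    (a hypothetical per-module annotation). "any" is the union of the three.
    """
    kinds = ("functional", "spatial", "dynamic")
    counts = {k: sum(m[k] for m in modules) for k in kinds}
    counts["any"] = sum(any(m[k] for k in kinds) for m in modules)
    return counts


def percent(count, total):
    """Percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


# Counts reported on the slide for the 92 modules of size > 30.
reported = {"functional": 63, "spatial": 55, "dynamic": 69, "any": 82}
for kind, n in reported.items():
    print(f"{kind}: {n}/92 = {percent(n, 92)}%")
```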

Viewing data as tracks  Heat map (e.g. gene expression data)  Bar chart (e.g. fold changes, p values)  Binary track (e.g. significant genes, GO) Comparing binary tracks  Clickable Venn diagram Enrichment analysis  Network modules  GO terms  Pathways Navigating at different scales  Zoom  Pan  2D graph visualization NetGestalt: viewing and cross-correlating data CNCP Shi et al., manuscript under revision

NetGestalt demo: browsing data sources, viewing data as tracks, comparing tracks, identifying modules, annotating modules, moving across scales

[Screenshot: Luminal B vs. Basal comparison; signed -log(p) tracks for differential proteins (proteomics) and differential genes (microarray), data sources Vandy, PNNL and TCGA, with ruler and network-module tracks]

[Screenshot: the same Luminal B vs. Basal tracks being compared; displayed percentages: 45%, 51%, 4%, 0%]

[Screenshot: Luminal B vs. Basal signed -log(p) tracks (Vandy, PNNL, microarray) with enriched network modules highlighted]
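Enrichment of a binary track (for example, significant genes) within network modules is commonly assessed with a one-sided hypergeometric test. The slides do not say which test NetGestalt uses, so the sketch below is an assumption; it uses exact binomial coefficients from the standard library, and `module_enrichment` and its inputs are hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k): overlap between a module of K genes and n significant genes among N."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def module_enrichment(universe, significant, modules):
    """Return {module: (overlap, p_value)} for each module.

    universe:    all genes represented in the network (the background)
    significant: genes in the binary track
    modules:     {module name: iterable of member genes}
    """
    universe = set(universe)
    sig = set(significant) & universe
    N, n = len(universe), len(sig)
    results = {}
    for name, genes in modules.items():
        members = set(genes) & universe
        k = len(members & sig)
        results[name] = (k, hypergeom_sf(k, N, len(members), n))
    return results
```

Restricting both the track and the modules to the network's gene universe keeps the background consistent; with many modules, the p-values would normally be corrected for multiple testing before calling modules enriched.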

[Screenshot: enriched modules annotated alongside signed -log(p) tracks (Vandy, PNNL) and microarray tracks for Luminal B vs. Basal, with MRM targets, the "DNA damage response" annotation and gene symbols shown]


[Screenshot: Luminal B vs. Basal proteomics and microarray signed -log(p) tracks with enriched modules; "T cell activation" annotation shown]

Using genomic data to improve proteomic data analysis  Project 1. customProDB: generating customized protein databases to enhance protein identification in shotgun proteomics  Project 2. NetWalker: prioritizing candidate gene lists for targeted MRM analysis Integrating genomic and proteomic data to gain novel biological insights  Project 3. miRNA-mediated regulation: understanding post- transcriptional mechanisms regulating human gene expression  Project 4. NetGestalt: viewing and correlating cancer omics data within a biological network context Informatics approaches to integrate genomic and proteomic data CNCP

Acknowledgement
Qi Liu, Jing Wang, Xiaojing Wang, Jing Zhu, Dan Liebler, Rob Slebos, Dave Tabb, Zhiao Shi
Funding: NIGMS R01GM, NCI U24CA, NCI P50CA095103