1 Harvard Medical School Mapping Transcription Mechanisms from Multimodal Genomic Data Hsun-Hsien Chang, Michael McGeachie, and Marco F. Ramoni Children.

Presentation transcript:

1 Harvard Medical School
Mapping Transcription Mechanisms from Multimodal Genomic Data
Hsun-Hsien Chang, Michael McGeachie, and Marco F. Ramoni
Children’s Hospital Informatics Program
Harvard-MIT Division of Health Sciences and Technology
Harvard Medical School
March 10, 2010

2 Information Flow in Multimodal Genomic Data
Genetic Variants
– 100k–1000k SNPs
– 250k copy number variations (CNVs)
– 250k methylation measurements
Transcripts
– 50k mRNA expression levels
– 50k microRNA expression levels
– 1.5M exon expression / splicing measurements
[Diagram label: Information]

3 Expression Quantitative Trait Loci (eQTLs)
The connection from variant to expression is an information channel.
– A DNA locus that modulates the expression level of a gene is an eQTL.
Cis-eQTLs are genetic variants located close to the genes they modulate; trans-eQTLs are located far away from them.
Identifying cis-eQTLs is easier.
– Focusing on cis-eQTLs reduces the search space.
– What about trans-eQTLs?
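The cis/trans distinction above can be sketched in a few lines of Python. The slide does not say how close "close" is, so the 1 Mb window below is an assumption (a common convention, not the authors' stated cutoff), and the names `Locus` and `classify_eqtl` are hypothetical.

```python
from dataclasses import dataclass

# Assumed cutoff: the slides do not define "close"; 1 Mb is only a common convention.
CIS_WINDOW_BP = 1_000_000


@dataclass(frozen=True)
class Locus:
    chrom: str  # chromosome name, e.g. "21"
    pos: int    # base-pair coordinate


def classify_eqtl(snp: Locus, gene: Locus, window: int = CIS_WINDOW_BP) -> str:
    """Return 'cis' if the SNP is on the gene's chromosome and within `window` bp, else 'trans'."""
    if snp.chrom == gene.chrom and abs(snp.pos - gene.pos) <= window:
        return "cis"
    return "trans"
```

Under this definition a SNP on the same chromosome as the gene but far from it still counts as trans, which is why restricting the search to cis candidates shrinks the number of SNP-transcript pairs so sharply.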

4 Clinical Study on Pediatric Leukemia
Cancer is based on genetic modification (variants) and cellular malfunction (gene expression).
Identifying eQTLs helps explain the molecular mechanisms of cancer and provides biological insight.
Clinical study of acute lymphoblastic leukemia (ALL)
– The most common malignancy in children, accounting for nearly one third of all pediatric cancers.
– A few cases are associated with inherited genetic syndromes (e.g., Down syndrome, Bloom syndrome, Fanconi anemia), but the cause remains unknown.
Data
– 29 patients.
– Genotyped 100,000 SNPs (Affymetrix Human Mapping 100K).
– Profiled 50,000 gene expression levels (Affymetrix HG-U133 Plus 2.0).

5 Challenges in Finding eQTLs
Compare the distribution of each variant to the levels of each expression measurement.
– Computational
   Testing all pairs of variants vs. expressions is costly.
   Expression levels are usually discretized (Pensa et al., BioKDD, 2004).
– Multiple testing considerations
Understanding
– Too many associations to test via laboratory science.
   Computational methods of biological discovery are needed.
   Want to summarize the main informational (biological) pathways.
Answer: Use transcriptional information.
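To give a sense of scale for the all-pairs comparison, here is a minimal arithmetic sketch using the dataset sizes quoted in the talk (100,000 SNPs, 50,000 expression measurements). The Bonferroni correction is an assumption chosen for illustration; the slides mention multiple testing but do not name a correction method.

```python
def n_pairwise_tests(n_variants: int, n_transcripts: int) -> int:
    """Number of variant-vs-transcript comparisons in an exhaustive scan."""
    return n_variants * n_transcripts


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level under a Bonferroni correction (assumed method)."""
    return alpha / n_tests


n_tests = n_pairwise_tests(100_000, 50_000)    # 5,000,000,000 pairs
threshold = bonferroni_threshold(0.05, n_tests)  # about 1e-11 per test
```

With only 29 patients, a per-test threshold this strict is very hard to reach, which is one way to read the slide's motivation for summarizing information flow instead of testing every pair in isolation.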

6 Transcriptional Information Channel
[Diagram: X → Transcription Channel → Y]
SNPs are modeled as binomial variables.
Expressions are modeled as log-normal variables.
Mutual information (MI) quantifies information flow: higher MI is achieved by larger σ² and smaller σₖ², i.e., when expression level Y is more likely modulated by SNP X.
Information theory: measures entropy, H(X).
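The MI formula itself is not preserved in the transcript, so the following is only a sketch under stated assumptions: σ² is taken to be the overall variance of the log expression Y, σₖ² the variance of Y within genotype class k of SNP X, and both the marginal and the class-conditional distributions of log expression are treated as Gaussian. Under those assumptions MI is approximately ½ ln σ² − ½ Σₖ pₖ ln σₖ², which matches the slide's statement that MI grows with larger σ² and smaller σₖ². The function name and the variance floor `eps` are hypothetical.

```python
import math
from collections import defaultdict
from statistics import pvariance


def gaussian_mi(genotypes, log_expr, eps=1e-12):
    """Approximate I(X;Y) in nats as 0.5*ln(var(Y)) - 0.5*sum_k p_k*ln(var(Y | X=k)).

    Assumes Gaussian log expression overall and within each genotype class.
    `eps` is an arbitrary floor that keeps the logarithm finite for tiny variances.
    """
    groups = defaultdict(list)
    for g, y in zip(genotypes, log_expr):
        groups[g].append(y)

    n = len(log_expr)
    mi = 0.5 * math.log(max(pvariance(log_expr), eps))
    for ys in groups.values():
        p_k = len(ys) / n  # empirical frequency of genotype class k
        mi -= 0.5 * p_k * math.log(max(pvariance(ys), eps))
    return mi


# Toy data (made up): expression separates cleanly by genotype in the first case only.
expr = [1.0, 1.1, 0.9, 3.0, 3.1, 2.9]
mi_modulated = gaussian_mi([0, 0, 0, 1, 1, 1], expr)
mi_unrelated = gaussian_mi([0, 1, 0, 1, 0, 1], expr)
```

In the toy data, grouping by the first genotype vector leaves very little within-class variance, so `mi_modulated` comes out much larger than `mi_unrelated`.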

7 Transcript Y is modulated by SNP X: [figure not preserved]
Transcript Y is independent of SNP X: [figure not preserved]

8 Transcriptional Information Map
[Figure: nodes X1–X9 (SNPs) and Y1–Y9 (transcripts)]

9 ALL Transcriptional Information Map of Chr21

10 Cluster Genes and SNPs into Networks
[Figure: nodes X1–X9 (SNPs) and Y1–Y9 (transcripts)]

11 Cluster Genes and SNPs into Networks
[Figure: one cluster containing X1, Y1, Y2, X3, X4, Y9, X8]
We can further infer the optimal modulation patterns using Bayesian networks.
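One simple way to group linked SNPs and transcripts is to take the connected components of the SNP-transcript link graph. This is an assumption made for illustration; the slides do not state which clustering rule the authors use, and the function name and the example links are hypothetical.

```python
from collections import defaultdict


def cluster_links(links):
    """Group nodes into connected components given (snp, transcript) links."""
    adjacency = defaultdict(set)
    for snp, transcript in links:
        adjacency[snp].add(transcript)
        adjacency[transcript].add(snp)

    seen = set()
    clusters = []
    for start in adjacency:
        if start in seen:
            continue
        # Depth-first traversal collects everything reachable from `start`.
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(adjacency[node] - seen)
        clusters.append(component)
    return clusters


# Made-up links: X1 and X3 share transcript Y2, so they fall into one cluster.
example = cluster_links([("X1", "Y1"), ("X1", "Y2"), ("X3", "Y2"), ("X5", "Y5")])
```

Each resulting cluster can then be analyzed on its own, which keeps the later network inference step small.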

12 Bayesian Networks
Bayesian networks are directed acyclic graphs:
– Nodes correspond to random variables.
– Directed arcs encode conditional probabilities of the target nodes on the source nodes.
– p(X) depends on (A,B).
– p(Z|X,Y) is independent of (A,B).
[Figure: network over nodes A, B, X, Y, Z]
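The two statements about p(X) and p(Z|X,Y) can be checked numerically on a toy network. The sketch below assumes the graph A → X ← B and X → Z ← Y, which is one reading of the figure consistent with the bullet points; all probability tables are made-up numbers for binary variables.

```python
from itertools import product

# Hypothetical probability tables for binary variables (values 0 and 1).
p_a = {0: 0.7, 1: 0.3}
p_b = {0: 0.6, 1: 0.4}
p_y = {0: 0.5, 1: 0.5}
p_x1 = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.6, (1, 1): 0.9}   # p(X=1 | A, B)
p_z1 = {(0, 0): 0.2, (0, 1): 0.4, (1, 0): 0.7, (1, 1): 0.95}  # p(Z=1 | X, Y)


def bern(p_one, value):
    """Probability of `value` for a binary variable with P(1) = p_one."""
    return p_one if value == 1 else 1.0 - p_one


def joint(a, b, x, y, z):
    """Joint probability from the factorization p(A) p(B) p(Y) p(X|A,B) p(Z|X,Y)."""
    return p_a[a] * p_b[b] * p_y[y] * bern(p_x1[(a, b)], x) * bern(p_z1[(x, y)], z)


def conditional_z1(x, y, a=None, b=None):
    """p(Z=1 | X=x, Y=y), optionally also conditioning on A=a and B=b."""
    numerator = denominator = 0.0
    for aa, bb, zz in product((0, 1), repeat=3):
        if a is not None and aa != a:
            continue
        if b is not None and bb != b:
            continue
        p = joint(aa, bb, x, y, zz)
        denominator += p
        if zz == 1:
            numerator += p
    return numerator / denominator
```

Comparing `conditional_z1(x, y, a, b)` with `conditional_z1(x, y)` for every combination gives the same value, which is the sense in which p(Z|X,Y) is independent of (A,B): once X and Y are known, A and B add nothing about Z.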

13 Infer Bayesian Networks in Individual Clusters
[Figure: cluster containing X1, Y1, Y2, X3, X4, Y9, X8]
Step 1: Use the TIM as the initial network.
Step 2: The Bayesian network infers SNP-SNP connections.

14 A Bayesian Network Inferred from Chr21 TIM

15 Information Theoretic Network Analysis
Find hubs, motifs, guilds, etc.
– Abstract edges
– Global patterns -> local patterns
– Reveal emergent properties
– Information theoretic approach using data compression
Alterovitz G and Ramoni MF, “Discovering biological guilds through topological abstraction,” AMIA Annu Symp Proc, pp. 1-5, 2006.

16 Identified Fundamental Components
Reference: Alterovitz and Ramoni, AMIA Annu Symp Proc, pp. 1-5, 2006.

17 Identification of Cis- and Trans-eQTLs
RIPK4, 21q22.3
– Related to Down syndrome.
– RIPK4 has 5 (trans) SNPs in q11.2 (shown in blue in the figure) affecting its expression.
[Figure label: RIPK4]

18 Identification of Cis- and Trans-eQTLs
CYYR1, 21q21.1
– Recently discovered.
– Encodes a cysteine- and tyrosine-rich protein.
– A recent study found a correlation with neuroendocrine tumors.
– The TIM shows CYYR1 modulated by SNPs across the q arm of chromosome 21.
– DSCAM is related to Down syndrome.
– Does the DSCAM-CYYR1 interaction lead to ALL?
[Figure label: DSCAM]

19 Complete TIM Algorithm
[Flow diagram; step order inferred from the slide sequence]
Inputs: Genetic Variant, Transcript
1. Compute Transcriptional Information
2. Group Linked SNPs and Transcripts (Cluster 1 ... Cluster N)
3. Infer Network in Individual Clusters (Cluster 1 ... Cluster N)
4. Network Topology Analysis and Summary

20 Transcriptional Information Maps
Make large multimodal genetic datasets amenable to transcriptional analysis.
Identify
– Modulation patterns between genetic variants and transcripts.
– Cis- and trans-eQTLs.
Analysis of pediatric ALL helps generate biological hypotheses about its connection to Down syndrome.

21 Questions?
Thanks to Prof. Marco F. Ramoni, Dr. Hsun-Hsien Chang, Dr. Gil Alterovitz, Children’s Hospital Informatics Program, Brigham and Women’s Hospital