A systems biology approach to the identification and analysis of transcriptional regulatory networks in osteocytes Angela K. Dean, Stephen E. Harris, Jianhua Ruan

Overview
- Osteocytes: background & motivation
- Review of the biological central dogma
- Osteocyte gene set derivation
  - Osteocyte purification
  - Microarray experiments
  - Functional annotation analysis
- Sequence analysis of promoter regions
- Construction of the regulatory network
- Partitioning to define cis-regulatory modules
- Results

Background – Cellular functions
- Certain types of cells perform specific biological functions
  - Key genes must be activated for the cell to perform correctly
- Osteocytes play an essential role in regulating bone formation and remodeling
  - We want to identify these key genes and the activators of these genes

Why study osteocyte cells?
- Identifying the key genes (and their activators) involved in the bone-formation process may lead to new targeted therapies
  - For osteoporosis, loss of bone in space travel, extended bed rest, etc.

Molecular Biology Central Dogma

- We want to identify the associations between transcription factors (TFs) and the genes that they regulate in order to build a “transcriptional regulatory network”

Osteocyte cells are hard to isolate
- Embedded within the bone matrix, and lacking molecular and cell surface markers, they are seemingly inaccessible
- How to characterize and isolate these cells?
- Solution: create a “special” mouse that contains an inserted “special” gene that drives fluorescence in osteocytes

Isolating osteocytes
- Osteocytes are known to highly express Dentin matrix protein 1 (DMP1)
  - A transgene in which the DMP1 promoter (activation) region drives GFP was created and inserted into the mouse, producing a transgenic mouse
  - Cells that highly express DMP1 (osteocytes) will also express GFP
- We can now purify osteocytes from other cells using fluorescence-activated cell sorting

Identifying key osteocyte genes using microarray
- Microarray experiments allow us to measure the activity of genes (expression profile)
- We compared the expression profiles of the purified osteocyte cells (+GFP) to non-osteocyte cells (–GFP)
  - Identified the top 269 genes expressed > 3 fold in +GFP as compared to –GFP (FDR-corrected p-value < 0.05)
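The selection step above can be sketched in Python. This is only an illustration under stated assumptions: the slides do not name the statistical test or the FDR procedure, so Benjamini-Hochberg is assumed here, and the function names and the input layout (gene name, mean +GFP expression, mean –GFP expression, raw p-value) are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_enriched_genes(genes, min_fold=3.0, max_fdr=0.05):
    """genes: list of (name, mean_gfp_pos, mean_gfp_neg, raw_pvalue) tuples.

    Keeps genes expressed more than `min_fold` times higher in +GFP cells
    whose adjusted p-value is below `max_fdr` (assumed thresholds: 3 and 0.05,
    as quoted on the slide).
    """
    adjusted = benjamini_hochberg([g[3] for g in genes])
    selected = []
    for (name, pos, neg, _), q in zip(genes, adjusted):
        if neg > 0 and pos / neg > min_fold and q < max_fdr:
            selected.append(name)
    return selected
```

Both filters must pass: a gene with a large fold change but a non-significant adjusted p-value is dropped, and so is a significant gene with a small fold change.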

Identifying functionally-related osteocyte genes
- Each of the 269 genes has one or more GO terms or PIR keywords associated with it
  - Gene Ontology (GO) terms describe biological processes, cellular components and molecular functions
  - A Protein Information Resource (PIR) keyword is an annotation from the PIR database

Functional Annotation Clustering
- For each GO term associated with a gene or group of genes within the 269 set, a p-value is computed using the hypergeometric distribution and adjusted for multiple testing using the Benjamini method
- The enrichment score per cluster is the geometric mean of the individual GO p-values
- The DAVID Bioinformatics Tool was used for the clustering
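The two quantities named above can be written out as a short Python sketch. It follows the slide's description rather than DAVID's actual implementation, which may differ in detail; the function names and argument order are hypothetical, and the multiple-testing adjustment is left out for brevity.

```python
from math import comb, exp, log


def hypergeom_pvalue(k, n, K, N):
    """Upper-tail hypergeometric p-value, P(X >= k).

    N: genes in the background, K: background genes annotated with the term,
    n: genes in the list of interest (e.g. the 269 set),
    k: genes in that list annotated with the term.
    """
    total = comb(N, n)
    upper = min(n, K)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible configurations contribute nothing to the sum.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


def geometric_mean(pvalues):
    """Geometric mean of strictly positive p-values, computed in log space."""
    return exp(sum(log(p) for p in pvalues) / len(pvalues))
```

Computing the mean in log space avoids underflow when many very small p-values are multiplied together.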

Functional annotation clustering results
- As expected, most enriched clusters relate to “extracellular region”, “system development”, etc.
- Cluster 2 relates to bone, and interestingly, Cluster 5 relates to muscle
- We narrowed our 269 gene set to the 98 genes corresponding to bone and muscle

Identifying TF Binding Sites in the 98 gene set
- We searched the 5 kb promoter sequence upstream of the transcription start site (TSS) of each gene for known TF binding motifs from the TRANSFAC database, using the rVista tool
- Filtered the TF motifs to keep only those conserved between the mouse and human genomes
- Conserved motifs increase confidence

Identifying TF Binding Sites in the 98 gene set
- Many of the motifs identified are related to bone & muscle
- 67 of the 98 genes contained over 10 conserved Mef2 binding sites in their promoters
- Bone & muscle genes and their number of conserved Mef2 binding sites

Building the transcriptional regulatory network
- Created a network consisting of the 98 gene set and their conserved and enriched TFs as nodes
- An edge between a gene and a TF represents the statistically significant presence of that TF’s binding site on the promoter of that gene
- TFs were filtered using conservation AND enrichment to produce more reliable edges and reduce noise
- Enrichment of a TF motif is determined by a p-value based on the number of occurrences in the 5 kb upstream of this gene, as compared to the number of occurrences in the 5 kb upstream of the rest of the genes in the genome
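The edge rule above (conserved AND enriched) can be illustrated in Python. The slides do not say which statistical test produces the enrichment p-value, so this sketch assumes a Poisson upper tail against a genome-wide mean count per 5 kb upstream region; that choice, the 0.05 threshold, the function names and the input dictionaries are all hypothetical. No multiple-testing correction is applied here.

```python
from math import exp


def poisson_upper_tail(count, rate):
    """P(X >= count) for X ~ Poisson(rate)."""
    if count <= 0:
        return 1.0
    term = exp(-rate)  # P(X = 0)
    cdf = term
    for i in range(1, count):
        term *= rate / i  # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def build_edges(site_counts, conserved, background_rate, alpha=0.05):
    """Return gene-TF edges that pass both filters.

    site_counts: {(gene, tf): occurrences in the gene's 5 kb upstream region}
    conserved: set of (gene, tf) pairs with a mouse-human conserved site
    background_rate: {tf: mean occurrences per 5 kb upstream region, genome-wide}
    """
    edges = []
    for (gene, tf), count in sorted(site_counts.items()):
        if (gene, tf) not in conserved:
            continue  # conservation filter
        if poisson_upper_tail(count, background_rate[tf]) < alpha:
            edges.append((gene, tf))  # enrichment filter
    return edges
```

Requiring both filters means a motif that is frequent but not conserved, or conserved but no more frequent than background, never becomes an edge.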

Modular structure of the regulatory network
- The final network consisted of 98 genes and 153 conserved and over-represented TFs
- To identify possible combinatorial effects of TF binding sites (TFBS), we partitioned the genes in the network using the Q-Cut algorithm
- Q-Cut is a graph partitioning algorithm for finding dense subnetworks (i.e., communities). It optimizes a statistical score called the modularity, and automatically determines the most appropriate number of communities
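For reference, the modularity score mentioned above can be computed for a given partition as in the following Python sketch. It assumes the standard Newman-Girvan definition for an unweighted, undirected graph, and it only evaluates a partition; the search that Q-Cut performs to maximize the score is not reproduced. The function name and input layout are hypothetical.

```python
def modularity(edges, community):
    """Newman-Girvan modularity Q of a partition.

    edges: list of undirected (u, v) pairs, each edge listed once
    community: {node: community label}

    Q = sum over communities c of  e_c / m - (d_c / (2 m)) ** 2,
    where m is the number of edges, e_c the number of edges inside c,
    and d_c the total degree of the nodes in c.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal = {}
    degree_sum = {}
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )
```

A partition that places every node in a single community scores 0, so a positive score indicates more within-community edges than expected by chance.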

- We reduced noise and created a sparser gene-gene network for better partitioning
- We created this temporary network by assigning a cosine similarity score to each pair of genes according to their shared TFs
- Cosine similarity is a measure of similarity between two vectors (each vector contains 153 slots for the 153 enriched TFs in the 98 gene set)
- Edges between genes represent their similarity score, and this network was converted to a sparse network by connecting each gene to its k nearest neighbors (k=7) and employing a similarity score cutoff of 0.5
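The construction of the sparse gene-gene network can be sketched in Python as follows. Several details are assumptions, since the slides do not specify them: TF vectors are treated as 0/1 flags, an edge is kept if either gene lists the other among its k nearest neighbors, and the cutoff is applied as "greater than or equal to". The function names are hypothetical.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def sparse_gene_network(profiles, k=7, cutoff=0.5):
    """Build a k-nearest-neighbor gene-gene network.

    profiles: {gene: list of 0/1 flags, one slot per TF}
    Returns {(gene_a, gene_b): similarity} with gene_a < gene_b.
    """
    genes = sorted(profiles)
    edges = {}
    for g in genes:
        scored = [
            (cosine_similarity(profiles[g], profiles[h]), h)
            for h in genes
            if h != g
        ]
        # Highest similarity first; ties broken by gene name for determinism.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        for score, h in scored[:k]:
            if score >= cutoff:
                edges[tuple(sorted((g, h)))] = score
    return edges
```

With 98 genes and 153 TFs, each profile would be a 153-slot vector, and the defaults k=7 and cutoff=0.5 match the values quoted on the slide.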

Identifying modules in the initial regulatory network
- Q-Cut was then applied to this gene-gene network, resulting in communities with many common TF binding sites

Interesting clusters
- Cluster below shows a strong community structure between 16 genes and their common TFBS
- Representative of many TFs coordinately regulating a small set of genes

A putative model of a transcriptional network
- A proposed model was built using the network results
- DMP1 & Sost (highly expressed in osteocytes) are shown to be regulated by Mef2 and Myogenin

Putative model used to generate hypotheses
- We now have an ex vivo system for pure osteocytes in a proper microenvironment to conduct experimental validation based on this model
- Here the osteocytes will make appropriate levels of osteocyte-specific genes
- Experiments are currently underway

Conclusions
- We used a systems biology method to construct a putative transcriptional regulatory network model for osteocytes, by integrating
  - Microarray data
  - Functional annotation
  - Comparative genomics
  - Graph-theoretic knowledge
- Many parts of the network can be confirmed by the literature
- Experiments are currently underway to further validate the model