Inferring Models of cis-Regulatory Modules using Information Theory

Slides:



Advertisements
Similar presentations
Periodic clusters. Non periodic clusters That was only the beginning…
Advertisements

Computational discovery of gene modules and regulatory networks Ziv Bar-Joseph et al (2003) Presented By: Dan Baluta.
Mutual Information Mathematical Biology Seminar
Human: 78 tissues (Su et al, 2004) Stastical significance P. falciparum: intra-erythrocytic development cycle Yeast: 78 co-expression clusters From k-mers.
Figure 1: (A) A microarray may contain thousands of ‘spots’. Each spot contains many copies of the same DNA sequence that uniquely represents a gene from.
MOPAC: Motif-finding by Preprocessing and Agglomerative Clustering from Microarrays Thomas R. Ioerger 1 Ganesh Rajagopalan 1 Debby Siegele 2 1 Department.
Introduction to molecular networks Sushmita Roy BMI/CS 576 Nov 6 th, 2014.
Bioinformatics lectures at Rice University Lecture 4: Shannon entropy and mutual information.
Systematic Analysis of Interactome: A New Trend in Bioinformatics KOCSEA Technical Symposium 2010 Young-Rae Cho, Ph.D. Assistant Professor Department of.
Motif finding: Lecture 1 CS 498 CXZ. From DNA to Protein: In words 1.DNA = nucleotide sequence Alphabet size = 4 (A,C,G,T) 2.DNA  mRNA (single stranded)
Using Bayesian Networks to Analyze Expression Data N. Friedman, M. Linial, I. Nachman, D. Hebrew University.
Learning Regulatory Networks that Represent Regulator States and Roles Keith Noto and Mark Craven K. Noto and M. Craven, Learning Regulatory.
Unraveling condition specific gene transcriptional regulatory networks in Saccharomyces cerevisiae Speaker: Chunhui Cai.
CS5263 Bioinformatics Lecture 20 Practical issues in motif finding Final project.
Motifs BCH364C/391L Systems Biology / Bioinformatics – Spring 2015 Edward Marcotte, Univ of Texas at Austin Edward Marcotte/Univ. of Texas/BCH364C-391L/Spring.
Learning Linear Causal Models Oksana Kohutyuk ComS 673 Spring 2005 Department of Computer Science Iowa State University.
Module networks Sushmita Roy BMI/CS 576 Nov 18 th & 20th, 2014.
Data Mining the Yeast Genome Expression and Sequence Data Alvis Brazma European Bioinformatics Institute.
Introduction to biological molecular networks
Flat clustering approaches
Evaluation of gene-expression clustering via mutual information distance measure Ido Priness, Oded Maimon and Irad Ben-Gal BMC Bioinformatics, 2007.
Learning Sequence Motifs Using Expectation Maximization (EM) and Gibbs Sampling BMI/CS 776 Mark Craven
Inference with Gene Expression and Sequence Data BMI/CS 776 Mark Craven April 2002.
Module Networks BMI/CS 576 Mark Craven December 2007.
Finding genes in the genome
Hierarchical clustering approaches for high-throughput data Colin Dewey BMI/CS 576 Fall 2015.
Computational methods for inferring cellular networks II Stat 877 Apr 17 th, 2014 Sushmita Roy.
More on HMMs and Multiple Sequence Alignment BMI/CS 776 Mark Craven March 2002.
Estimating the False Discovery Rate in Genome-wide Studies BMI/CS 576 Colin Dewey Fall 2008.
Inferring Regulatory Networks from Gene Expression Data BMI/CS 776 Mark Craven April 2002.
EQTLs.
Reverse-engineering transcription control networks timothy s
Figure 1. Overview of the Ritornello approach
Learning gene regulatory networks in Arabidopsis thaliana
Biological networks CS 5263 Bioinformatics.
Motifs BCH364C/394P - Systems Biology / Bioinformatics
Figure 2. Ritornello captures the FLD from single-end sequencing data
Learning Sequence Motif Models Using Expectation Maximization (EM)
Advanced Bioinformatics Biostatistics & Medical Informatics 776 Computer Sciences 776 Spring 2018 Anthony Gitter
Mass spectrometry-based proteomics
Interpolated Markov Models for Gene Finding
RNA Secondary Structure Prediction
Eukaryotic Gene Finding
Interpreting noncoding variants
Hierarchical clustering approaches for high-throughput data
Advanced Artificial Intelligence
1 Department of Engineering, 2 Department of Mathematics,
Network topology reflects target gene expression profile and function.
1 Department of Engineering, 2 Department of Mathematics,
(A) Graph depicting interaction between the five regulatory motifs measured by synergy and cooccurrence. (A) Graph depicting interaction between the five.
Sensitivity of RNA‐seq.
Linking Genetic Variation to Important Phenotypes
1 Department of Engineering, 2 Department of Mathematics,
Volume 3, Issue 1, Pages (July 2016)
EXTENDING GENE ANNOTATION WITH GENE EXPRESSION
CSCI2950-C Lecture 13 Network Motifs; Network Integration
Learning Sequence Motif Models Using Gibbs Sampling
Volume 1, Issue 6, Pages (December 2015)
Evaluation of inferred networks
Stochastic Context Free Grammars for RNA Structure Modeling
FUNCTIONAL ANNOTATION OF REGULATORY PATHWAYS
Revealing Global Regulatory Perturbations across Human Cancers
Translation of Genotype to Phenotype by a Hierarchy of Cell Subsystems
Evolutionary Rewiring of Human Regulatory Networks by Waves of Genome Expansion  Davide Marnetto, Federica Mantica, Ivan Molineris, Elena Grassi, Igor.
Mapping Global Histone Acetylation Patterns to Gene Expression
Revealing Global Regulatory Perturbations across Human Cancers
Volume 49, Issue 2, Pages (January 2013)
Characteristics of tissue‐specific co‐expression networks (CNs)‏
Rational Design of Therapeutic siRNAs: Minimizing Off-targeting Potential to Improve the Safety of RNAi Therapy for Huntington's Disease  Ryan L Boudreau,
Motifs BCH339N Systems Biology / Bioinformatics – Spring 2016
Presentation transcript:

Inferring Models of cis-Regulatory Modules using Information Theory BMI/CS 776 www.biostat.wisc.edu/bmi776/ Spring 2018 Anthony Gitter gitter@biostat.wisc.edu These slides, excluding third-party material, are licensed under CC BY-NC 4.0 by Mark Craven, Colin Dewey, and Anthony Gitter

Overview Biological question What is causing differential gene expression? Goal Find regulatory motifs in the DNA sequence Solution FIRE (Finding Informative Regulatory Elements)

Goals for Lecture Key concepts: Entropy Mutual information (MI) Motif logos Using MI to identify cis-regulatory module elements

A Common Type of Question What causes this set of yeast genes to be up-regulated in stress conditions? Genes Experiments / Conditions Figure from Gasch et al., Mol. Biol. Cell, 2000

cis-Regulatory Modules (CRMs) Co-expressed genes are often controlled by specific configurations of binding sites RNAP …accgcgctgaaaaaattttccgatgagtttagaagagtcaccaaaaaattttcatacagcctactggtgttctctgtgtgtgctaccactggctgtcatcatggttgta… …caaaattattcaagaaaaaaagaaatgttacaatgaatgcaaaagatgggcgatgagataaaagcgagagataaaaatttttgagcttaaatgatctggcatgagcagt… …gagctggaaaaaaaaaaaatttcaaaagaaaacgcgatgagcatactaatgctaaaaatttttgaggtataaagtaacgaattggggaaaggccatcaatatgaagtcg… RNAP RNAP

Information Theory Background Problem Create a code to communicate information Example Need to communicate the manufacturer of each bike

Information Theory Background Four types of bikes Possible code Type code Trek 11 Specialized 10 Cervelo 01 Serotta 00 Expected number of bits we have to communicate: 2 bits/bike

Information Theory Background Can we do better? Yes, if the bike types aren’t equiprobable Optimal code uses bits for event with probability 1 2 3 01 001 000 Type, probability # bits code

Information Theory Background 1 2 3 01 001 000 Type, probability # bits code Expected number of bits we have to communicate: 1.75 bits/bike

Entropy Entropy is a measure of uncertainty associated with a random variable Can be interpreted as the expected number of bits required to communicate the value of the variable entropy function for binary variable Image from Wikipedia

How is entropy related to DNA sequences?

Sequence Logos Typically represent a binding site Height of each character c is proportional to P(c)

Sequence Logos Height of logo at a given position determined by decrease in entropy (from maximum possible) # of characters in alphabet decrease in entropy

Mutual Information Mutual information quantifies how much knowing the value of one variable tells about the value of another entropy of M conditioned on C entropy of M

FIRE Elemento et al., Molecular Cell 2007 Finding Informative Regulatory Elements (FIRE) Given a set of sequences grouped into clusters Find motifs, and relationships, that have high mutual information with the clusters Applicable when sequences have continuous values instead of cluster labels

Mutual Information in FIRE We can compute the mutual information between a motif and the clusters as follows m=0, 1 represent absence/presence of motif c ranges over the cluster labels

Finding Motifs in FIRE Motifs are represented by regular expressions; initially each motif is represented by a strict k-mer (e.g. TCCGTAC) Test all k-mers (k=7 by default) to see which have significant mutual information with the cluster label Filter k-mers using a significance test to obtain motif seeds Generalize each motif seed Filter motifs using a significance test

Key Step in Generalizing a Motif in FIRE Randomly pick a position in the motif Generalize in all ways consistent with current value at position Score each by computing mutual information Retain the best generalization A or G in this position TCCGTAC TCC[AG]TAC TCC[GT]TAC TCC[CG]TAC TCC[ACG]TAC TCC[CGT]TAC TCC[AGT]TAC TCC[ACGT]TAC

Generalizing a Motif in FIRE given: k-mer, n best  null repeat n times motif  k-mer repeat motif  GeneralizePosition(motif) // shown on previous slide until convergence (no improvement at any position) if score(motif) > score(best) best  motif return: best

Generalizing a Motif in FIRE: Example Figure from Elemento et al. Molecular Cell 2007

Avoiding Redundant Motifs Different seeds could converge to similar motifs Use mutual information to test whether new motif is unique and contributes new information TCCGTAC TCCCTAC TCC[CG]TAC TCC[CG]TAC previous motif new candidate motif expression clusters

Characterizing Predicted Motifs in FIRE Mutual information is also used to assess various properties of found motifs orientation bias position bias interaction with another motif

Using MI to Determine Orientation Bias C indicates cluster S=1 indicates motif present on transcribed strand S=0 otherwise (not present or not on transcribed strand) C S 1 1 Also compute MI where S=1 indicates motif present on complementary strand 1 1 1 1 1 1

Using MI to Determine Position Bias P ranges over position bins O=0, 1 indicates whether or not the motif is over-represented in a sequence’s cluster P O 1 1 1 1 1 Only sequences containing the motif are considered for this calculation 2 2 1 P 2 1

Using MI to Determine Motif Interactions M1=0, 1 indicates whether or not a sequence has the motif and is in a cluster for which the motif is over-represented; similarly for M2 M1 M2 1 1 1 1 1 1 1 1 1 1

Using MI to Determine Motif Interactions Yeast motif-motif interactions White: positive association Dark red: negative association Blue box: DNA-DNA Green box: DNA-RNA Plus: spatial co-localization

Discussion of FIRE FIRE mutual information used to identify motifs and relationships among them motif search is based on generalizing informative k-mers Consider advantages and disadvantages of k-mers versus PWMs In contrast to many motif-finding approaches, FIRE takes advantage of negative sequences FIRE returns all informative motifs found

Mutual Information for Gene Networks Mutual information and conditional mutual information can also be useful for reconstructing biological networks Build gene-gene network where edges indicate high MI in genes’ expression levels Algorithm for the Reconstruction of Accurate Cellular Networks (ARACNE)

ARACNE Gaussian kernel estimator to estimate mutual information No binning or histograms Data processing inequality Prune indirect edges Margolin et al. BMC Bioinformatics 2006