Periodic clusters. Non periodic clusters That was only the beginning…


Periodic clusters

Non-periodic clusters

That was only the beginning…

The human cell cycle: G1 phase, S phase, G2 phase, M phase

The proliferation cluster genes are cell cycle periodic
[Figure labels: G2/M, G1/S, CHR; Samples; Gene expression; Proportion; All genes vs. Proliferation genes]

The cell cycle motifs are enriched among the periodic genes
[Figure labels: TSS, NFY, E2F, ELK1, CDE, CHR; "Not in the cluster, mutated in cancer"]
Tabach et al., Mol Syst Biol 2005

Potential regulatory motifs in 3’ UTRs
Finding 3’ UTR elements associated with high/low transcript stability (in yeast)
[Figure labels: AAGCTTCCCCTACAAC; Entire genome]

Reverse the inference flow
[Figure labels: Time/tissues; Expression; Clustering; Motif finding; Diagnosing motifs using expression]

Once we reverse the inference order we can:
- Enumerate and score all possible k-mer motifs
- Examine the effect of “mutations” on motifs
- Examine the effect of motif location within the promoter
- Examine the effect of motif combinations and of distances within a combination
- More?

…But the correlation between gene clusters and motifs is imprecise in both directions: there are genes in the cluster without the motif, and many genes with the motif do not respond. If gene control is multifactorial, groups of genes defined by a common motif will not be mutually disjoint, so partitioning the data into disjoint clusters will cause loss of information.

A k-mer enumeration method: score every possible k-mer for an association with expression level
- Ag is the expression level of gene g
- C is a basal expression level (the same for all genes)
- The integer Nμg is the number of occurrences of motif μ in gene g
- M is a set of motifs
- Fμ is the increase/decrease in expression level caused by the presence of motif μ (the same for all genes)
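The definitions above suggest a linear model of the form Ag = C + Σ(μ in M) Fμ·Nμg; the slide text does not show the equation, so that form is an assumption. The sketch below is a simplified illustration, not the authors' implementation: it fits each k-mer on its own (one motif at a time rather than a joint fit over the set M) by ordinary least squares and reports C, Fμ and the fraction of expression variance explained. The function names and the toy data are hypothetical.

```python
from itertools import product


def count_occurrences(kmer, seq):
    """Count occurrences of kmer in seq, overlapping matches included."""
    k = len(kmer)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == kmer)


def fit_single_motif(counts, expression):
    """Least-squares fit of A_g = C + F * N_g for one motif.

    Returns (C, F, r_squared), or None when either the counts or the
    expression values are constant and there is nothing to fit.
    """
    g = len(counts)
    mean_n = sum(counts) / g
    mean_a = sum(expression) / g
    var_n = sum((n - mean_n) ** 2 for n in counts)
    var_a = sum((a - mean_a) ** 2 for a in expression)
    if var_n == 0 or var_a == 0:
        return None
    cov = sum((n - mean_n) * (a - mean_a) for n, a in zip(counts, expression))
    f = cov / var_n
    c = mean_a - f * mean_n
    return c, f, cov * cov / (var_n * var_a)


def score_kmers(promoters, expression, k):
    """Score every possible k-mer; returns {kmer: (C, F, r_squared)}."""
    scores = {}
    for letters in product("ACGT", repeat=k):
        kmer = "".join(letters)
        counts = [count_occurrences(kmer, s) for s in promoters]
        fit = fit_single_motif(counts, expression)
        if fit is not None:
            scores[kmer] = fit
    return scores


# Toy data (assumed): expression equals the number of "ACG" occurrences.
promoters = ["ACGTACGT", "TTTTTTTT", "ACGTTTTT", "GGGGGGGG"]
expression = [2.0, 0.0, 1.0, 0.0]
scores = score_kmers(promoters, expression, 3)
```

Ranking the k-mers by variance explained gives a simple per-motif score; a joint fit over several motifs would need multiple regression instead.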

Motif characterization through Expression Coherence (EC)
[Figure: two panels of expression level vs. time, one with EC score = 0.05 and one with EC score = 0.5; ScanACE (Hughes et al.)]

Expression coherence score, intuition
[Figure: four example gene sets with EC1 = 0, EC2 = 0.66, EC3 = 0.2, EC4 = 0.2; threshold distance, D]
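One way to make the intuition concrete is to compute the score as the fraction of gene pairs whose expression profiles lie closer than the threshold distance D. The sketch below assumes Euclidean distance between profiles and skips any normalization of the profiles; both choices, and the function name, are assumptions made for illustration.

```python
from itertools import combinations
from math import dist


def expression_coherence(profiles, threshold):
    """Fraction of gene pairs whose profiles are closer than threshold.

    profiles: list of equal-length sequences of expression values.
    threshold: the distance D below which two genes count as similar.
    """
    pairs = list(combinations(profiles, 2))
    if not pairs:
        return 0.0  # fewer than two genes: no pairs to compare
    close = sum(1 for p, q in pairs if dist(p, q) < threshold)
    return close / len(pairs)


# Toy example: two similar genes and one outlier give one close pair of three.
ec = expression_coherence([(0.0, 0.0), (0.0, 1.0), (10.0, 10.0)], threshold=2.0)
```

A score near 1 means almost all genes carrying the motif behave alike; a score near 0 means the motif alone says little about expression.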

Interaction of motifs
[Figure: expression level for genes with only M1, only M2, and M1 AND M2; labels: G2, M1, M2]

Synergistic motifs
A combination of two motifs is called ‘synergistic’ if the expression coherence score of the genes that have both motifs is significantly higher than the scores of the genes that have either of the motifs
Example: SFF and Mcm1
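The comparison behind this definition can be sketched as follows. The code computes an expression coherence score for genes with both motifs and for genes with exactly one of them; reading "either of the motifs" as "exactly one" is an assumption, the significance test is left out, and the gene names and profiles are invented toy data.

```python
from itertools import combinations
from math import dist


def expression_coherence(profiles, threshold):
    """Fraction of gene pairs whose profiles are closer than threshold."""
    pairs = list(combinations(profiles, 2))
    if not pairs:
        return 0.0
    return sum(1 for p, q in pairs if dist(p, q) < threshold) / len(pairs)


def motif_pair_coherence(genes, m1, m2, threshold):
    """Return (EC of genes with both motifs, EC with only m1, EC with only m2).

    genes: dict mapping gene name -> (set of motif names, expression profile).
    """
    both = [p for motifs, p in genes.values() if m1 in motifs and m2 in motifs]
    only1 = [p for motifs, p in genes.values() if m1 in motifs and m2 not in motifs]
    only2 = [p for motifs, p in genes.values() if m2 in motifs and m1 not in motifs]
    return tuple(expression_coherence(group, threshold) for group in (both, only1, only2))


# Hypothetical toy data: genes with both motifs share a profile, the rest do not.
genes = {
    "g1": ({"SFF", "Mcm1"}, (1.0, 2.0)),
    "g2": ({"SFF", "Mcm1"}, (1.1, 2.1)),
    "g3": ({"SFF", "Mcm1"}, (0.9, 1.9)),
    "g4": ({"SFF"}, (5.0, 0.0)),
    "g5": ({"SFF"}, (0.0, 5.0)),
    "g6": ({"Mcm1"}, (3.0, 3.0)),
    "g7": ({"Mcm1"}, (-3.0, -3.0)),
}
ec_both, ec_only_sff, ec_only_mcm1 = motif_pair_coherence(genes, "SFF", "Mcm1", 1.0)
```

Deciding whether the difference is significant would require a statistical test on top of these three numbers, for instance by comparing against random gene sets of the same size.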

A global map of combinatorial expression control
- High connectivity
- Hubs
- Alternative partners in various conditions
Pilpel et al., Nature Genetics 2001

Deduced network properties
[Figure labels: G1, G2; Mbp1, Ndt80, Ume6, Fkh1, Swi4; MCM1', MCB, MSE, URS1, SCB, SFF'; Correlation; Expression Coherence; Sufficiency; Necessity; TF-TF interaction (Ho et al. Nature); Hierarchy]

Detect the effect of mutations in a motif

Distance and orientation of motifs affect expression profiles
[Figure label: ATG]

Some typical expression patterns

A Bayesian approach (conditional probability)
Xi could be “1” to denote:
- The presence of motif m
- Its distance from the TSS is < N
- It is on the coding strand
- It neighbors another motif m’
Or “0” otherwise
ei = being expressed in pattern i
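As a minimal illustration of the conditional probability involved, the sketch below estimates P(ei | X) by counting how often each expression pattern occurs among genes sharing the same binary feature vector X. The function name and the toy observations are hypothetical, and a real Bayesian network would also learn which features matter rather than conditioning on all of them.

```python
from collections import Counter, defaultdict


def conditional_pattern_probabilities(genes):
    """Estimate P(pattern | features) from counts.

    genes: iterable of (features, pattern) pairs, where features is a tuple
    of 0/1 values (the X_i) and pattern is the expression pattern label.
    Returns {features: {pattern: probability}}.
    """
    counts = defaultdict(Counter)
    for features, pattern in genes:
        counts[tuple(features)][pattern] += 1
    return {
        x: {e: n / sum(c.values()) for e, n in c.items()}
        for x, c in counts.items()
    }


# Toy data (assumed): X = (has motif m, motif is close to the TSS).
observations = [
    ((1, 1), 4), ((1, 1), 4),
    ((1, 0), 4), ((1, 0), 2), ((1, 0), 2), ((1, 0), 2),
]
probabilities = conditional_pattern_probabilities(observations)
```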

Example: two rRNA processing motifs
- The two motifs work together
- The two motifs’ orientation matters

The procedure
Given that P(N|D) = P(N) * P(D|N) / P(D):
- Search the space of possible networks N for one that maximizes the above probability
- It is impossible to enumerate all possible networks
- Use cross-validation: partition the data into 5 gene sets, learn the rules from all but one set and test on the left-out set, each time leaving out a different set
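The cross-validation step can be sketched as below. This is only the data partitioning: the learning and scoring of the network are not shown, and the function name, the shuffling, and the fixed seed are assumptions made for the example.

```python
import random


def five_fold_splits(genes, n_folds=5, seed=0):
    """Yield (training, held_out) gene lists, one pair per fold.

    The genes are shuffled once, dealt into n_folds sets, and each set is
    held out in turn while the remaining sets form the training data.
    """
    shuffled = list(genes)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::n_folds] for i in range(n_folds)]
    for i, held_out in enumerate(folds):
        training = [g for j, fold in enumerate(folds) if j != i for g in fold]
        yield training, held_out


# Hypothetical gene identifiers, used only to show the shape of the output.
gene_ids = [f"gene{i}" for i in range(10)]
for training, held_out in five_fold_splits(gene_ids):
    # Learn rules on `training`, then evaluate predictions on `held_out`.
    pass
```

Every gene is held out exactly once, so each prediction is tested on data that was not used to learn it.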

For example: what does it take to belong to expression pattern (4)?
- Need to have RRPE and PAC
- If PAC is not within 140 bp of the ATG, but RRPE is within 240 bp, then the probability of pattern 4 is 22%
- If PAC is within 140 bp and RRPE is within 240 bp, then the probability is 100%
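Written as code, the rule from this slide looks like the sketch below. It encodes only the two cases for which the slide gives a number and returns None everywhere else; treating "within" as "less than or equal to" and representing a missing motif by None are assumptions, as is the function name.

```python
def pattern4_probability(rrpe_distance, pac_distance):
    """Probability of expression pattern 4 according to the slide's rule.

    Distances are in bp from the ATG; None means the motif is absent.
    Returns None for cases the slide does not quantify.
    """
    if rrpe_distance is None or pac_distance is None:
        return None  # both motifs are required; no number is given otherwise
    rrpe_near = rrpe_distance <= 240
    pac_near = pac_distance <= 140
    if rrpe_near and pac_near:
        return 1.0
    if rrpe_near and not pac_near:
        return 0.22
    return None  # RRPE farther than 240 bp: not covered by the slide


# Both motifs close to the ATG versus PAC farther away.
p_both_near = pattern4_probability(200, 100)
p_pac_far = pattern4_probability(200, 300)
```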

Inferring various logical conditions (“gates”) on motif combinations

The Bayesian network predicts expression profiles very accurately

Can make useful predictions in worm

The modern synthetic approach

Motif discovery from evolutionary conservation data

Species: S. cerevisiae; S. mikatae, S. kudriavzevii, S. bayanus; S. castellii; S. kluyveri
- The intergenic sequences of S. mikatae, S. kudriavzevii and S. bayanus average 59 to 67% identity to their S. cerevisiae orthologs in global alignments
- S. castellii and S. kluyveri: ~40% identity to S. cerevisiae
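For readers unfamiliar with the identity figures quoted here, the sketch below shows one common way to compute percent identity from a pair of already aligned sequences. Definitions differ in what they use as the denominator; this version counts only columns where neither sequence has a gap, which is an assumption and not necessarily the definition behind the numbers on the slide.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of gap-free alignment columns in which both sequences agree."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


# Toy alignment: five gap-free columns, four of them identical.
identity = percent_identity("ACGT-A", "ACCTTA")
```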

Nucleotide conservation in promoters is highest close to the TSS
[Figure series: TATA-containing genes; All genes]


A set of discovered motifs

NATURE | VOL 434 | 17 MARCH 2005

The data
- Examined intergenic regions of human, mouse, rat and dog
- ~18,000 genes
- “Promoters”: 4 kb centered on the TSS
- 3’ UTRs based on RNA annotations
- 64 Mb and 15 Mb in total, respectively
- Negative control: introns, ~120 Mb
- % of alignable sequence: promoters 51% (44% upstream and 58% downstream of the TSS), 3’ UTRs 73%, introns 34%, entire genome 28%

The phylogenetic trees
Questions:
- How would the addition of species affect the analyses?
- What if the sequences were not only mammalian?

An example: a known binding site of Err-α in the GABPA promoter
Question: What is the “meaning” of the other conserved positions?

Discovery of new motifs: exhaustive enumeration of all 6-mers

Targets of new motifs showed defined expression patterns

Motifs often show clear positional bias – close to TSS

The same methods, applied to 3’ UTRs, reveal strand-specific motifs