Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha.


Genomics and pathology
Genomics provides high-throughput measurements of molecular mechanisms
  – Microarrays, ChIP-on-chip, etc.
Genomics may provide the molecular underpinnings of pathology in a highly comprehensive manner
  – This could revolutionize the diagnosis and management of diseases, including cancer

Prior applications to cancer
Gene expression measurements have been applied to cancer diagnosis
Measure each gene’s expression in several normal tissue samples and in several pathological (diseased) samples
Find the subset of genes differentially expressed between the two sample groups
If such “gene signatures” of particular cancer types are found, they can become the basis of tests for malignancy

We want better …
Genes may be differentially expressed, but not by enough to cross the thresholds used in the analysis
Analyzing the data on a gene-by-gene basis is error-prone because microarray data has inherent noise
Finding the genes involved in one type of cancer is only the first step; it does not reveal the underlying processes

Part 1: Cancer modules

A “module” level view
Many methods use “gene modules” (sets of genes) as the basic building blocks for analysis
Instead of trying to find changes in individual gene expression profiles, look for entire sets of genes with changing expression profiles

The study of Mootha et al.
Showed that expression of “oxidative phosphorylation” genes (a particular set of genes) is reduced in diabetic muscle
The signal is not very strong when looking at individual genes, but highly significant when looking at the “gene module”
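A back-of-the-envelope calculation suggests why a set-level view can be more sensitive than a gene-level one. This is an illustrative sketch under stated assumptions, not the statistic used by Mootha et al.: suppose each of the n genes in the set shifts by the same amount \delta between conditions, and each gene is measured with independent noise of standard deviation \sigma.

```latex
z_{\text{gene}} = \frac{\delta}{\sigma},
\qquad
z_{\text{set}} = \frac{\delta}{\sigma / \sqrt{n}} = \sqrt{n}\,\frac{\delta}{\sigma}
```

Under these assumptions, averaging over a set of 100 genes gives a z-score ten times larger than that of any single gene. Correlated noise among the genes of a set would reduce this gain.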

Figure: expression in disease tissue (diabetes mellitus type 2) compared with normal tissue (normal tolerance to glucose). Grey: all genes. Red: oxidative phosphorylation genes.
Source: Nature Genetics 37, S38 - S45 (2005)

Segal et al.: Methodology
Compile a large collection of cancer-related microarrays
  – Microarrays measuring gene expression in cancer tissues or normal tissue
Compile a large collection of gene sets (modules) from earlier studies
Identify gene sets (modules) induced or repressed in each microarray
Identify modules induced in several arrays, or repressed in several arrays
Check whether these arrays are enriched in some clinical annotation

Identify gene sets (modules) induced or repressed in a microarray
Given the expression value E_{g,m} of each gene g in microarray experiment m
Compute the average expression E_g of gene g over all microarrays
If E_{g,m} is 2-fold greater than E_g, call gene g induced in array m
Categorize each gene as induced or not induced in the array
Source: Nature Genetics 36, (2004)
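The thresholding step above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `call_induced`, the toy data, and the choice of `>=` at the boundary are assumptions, and the values are assumed to be positive expression levels on a linear scale (on a log2 scale, a 2-fold change would instead be a difference of 1).

```python
def call_induced(expr, fold=2.0):
    """Flag, for each gene, the arrays in which it is induced.

    expr maps a gene name to a list of expression values, one per array
    (linear scale, assumed positive). A gene is called induced in an array
    when its value there is at least `fold` times its mean over all arrays.
    """
    calls = {}
    for gene, values in expr.items():
        mean = sum(values) / len(values)
        calls[gene] = [v >= fold * mean for v in values]
    return calls


# Toy data: geneA spikes in the last array, geneB is flat.
expr = {
    "geneA": [1.0, 1.0, 1.0, 9.0],
    "geneB": [2.0, 2.0, 2.0, 2.0],
}
induced = call_induced(expr)
```

Here geneA has mean 3.0, so only the last array (9.0) clears the 2-fold threshold of 6.0, and geneB is never called induced.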

Identify gene sets (modules) induced or repressed in a microarray
|All genes| = N, |Module| = n, |Induced| = m, |Intersection| = k
(Venn diagram labels: All genes, Module, Induced, Intersection)
Hypergeometric test(N,n,m,k): if a set of m genes was chosen at random (sampling without replacement), what is the probability that the intersection would be larger than or equal to k?
Equivalently, sum over i >= k the probability that the intersection would be exactly i. This sum is the “p-value” of the hypergeometric test:
\[ p = \sum_{i \ge k} \frac{\binom{n}{i}\binom{N-n}{m-i}}{\binom{N}{m}} \]
If the “p-value” is very small, then we infer that the intersection is “statistically significant”, i.e., the module is induced in the microarray
Similarly, define a module as repressed in a microarray
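The hypergeometric tail probability described above can be computed directly with exact integer binomials. The sketch below uses only the Python standard library; the function name `hypergeom_pvalue` and the toy numbers are illustrative assumptions.

```python
from math import comb


def hypergeom_pvalue(N, n, m, k):
    """P(intersection >= k) for the hypergeometric test.

    N: total number of genes
    n: number of genes in the module
    m: number of induced genes (drawn at random, without replacement)
    k: observed size of the intersection
    """
    total = comb(N, m)          # number of ways to choose the induced set
    upper = min(n, m)           # the intersection cannot exceed either set
    tail = sum(comb(n, i) * comb(N - n, m - i) for i in range(k, upper + 1))
    return tail / total


# Toy example: 10 genes, a module of 4, 3 induced genes, 2 of them in the module.
p = hypergeom_pvalue(N=10, n=4, m=3, k=2)
```

For the toy example the tail is (36 + 4) / 120, so p is 1/3, which would not be called significant. With k = 0 the function returns 1, as it should.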

Source: Nature Genetics 36, (2004)

Identify modules induced in several arrays, or repressed in several arrays
Source: Nature Genetics 36, (2004)

Segal et al.: Cancer “module maps”
Red(m,c): microarrays in which module m was overexpressed (induced) are enriched in condition c
Green(m,c): microarrays in which module m was underexpressed (repressed) are enriched in condition c
Rows and columns are not in an arbitrary order. They have been “clustered” to display similar rows (or columns) together
Source: Nature Genetics 37, S38 - S45 (2005)
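One entry of such a map can be sketched as another hypergeometric test, this time over arrays rather than genes. The code below is an assumption about how the enrichment could be scored, not a reproduction of the published pipeline; the function name `enrichment_pvalue`, the array names, and the annotation are hypothetical.

```python
from math import comb


def enrichment_pvalue(all_arrays, induced_arrays, annotated_arrays):
    """Hypergeometric p-value that the arrays in which a module is induced
    overlap the arrays carrying a clinical annotation at least as much as
    observed. All three arguments are sets of array names; the last two are
    assumed to be subsets of all_arrays.
    """
    N = len(all_arrays)
    n = len(annotated_arrays)
    m = len(induced_arrays)
    k = len(induced_arrays & annotated_arrays)
    tail = sum(comb(n, i) * comb(N - n, m - i) for i in range(k, min(n, m) + 1))
    return tail / comb(N, m)


# Hypothetical data: 10 arrays, 4 annotated with some condition c,
# and a module induced in 3 arrays, all of which carry the annotation.
all_arrays = {f"array{i}" for i in range(1, 11)}
annotated = {"array1", "array2", "array3", "array4"}
induced_in = {"array1", "array2", "array3"}
p_entry = enrichment_pvalue(all_arrays, induced_in, annotated)
```

In this toy case p_entry is 4/120, about 0.033. A small value would correspond to a red cell for (m, c); running the same test on the arrays where the module is repressed would correspond to a green cell.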

Insights from the cancer module map
Some modules are activated or repressed across many tumor types. Such modules could be related to general tumorigenic processes
Some modules are specifically activated or repressed in certain tumor types or stages of tumor progression

From modules to regulation
A module map shows the transcriptional changes underlying cancer
Transcriptional changes are a result of transcription factors and their binding sites
A deeper understanding of cancer would come from finding out which transcription factors and binding sites led to the transcriptional changes

Part 2: Cis-regulatory elements

Genomics and gene regulation
Such knowledge comes from genomics data
ChIP-chip studies identify which transcription factors bind which DNA sequences
Analysis of DNA sequence, using known binding site motifs, gives us putative binding sites
Cross-species conservation also tells us something about possible locations of binding sites

Cis-regulatory analysis
Identify a set of genes whose promoters contain the same binding sites
  – Such a set of genes is likely to be regulated by the same TF
  – Often called a “regulatory module”
Earlier studies mined microarrays for “co-expressed” genes, then used motif finding algorithms to discover their shared binding sites
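Finding the genes whose promoters contain a given binding site can be sketched as a consensus scan over both strands. This is a simplified illustration: the consensus `TGACGY`, the gene names, and the sequences are made up, and real analyses usually score degenerate sites with position weight matrices rather than exact consensus matches.

```python
import re

# IUPAC nucleotide codes translated to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}


def consensus_to_regex(consensus):
    """Compile an IUPAC consensus string into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in consensus))


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def genes_with_site(promoters, consensus):
    """Return the genes whose promoter matches the consensus on either strand."""
    pattern = consensus_to_regex(consensus)
    return {
        gene
        for gene, seq in promoters.items()
        if pattern.search(seq) or pattern.search(reverse_complement(seq))
    }


# Hypothetical promoters and a hypothetical motif.
promoters = {
    "g1": "TTGACGTCAAT",
    "g2": "AAAAAAAA",
    "g3": "CCTGACGCC",
}
shared = genes_with_site(promoters, "TGACGY")
```

Here g1 and g3 contain a match and g2 does not, so g1 and g3 would be candidates for the same regulatory module.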

Cis-regulatory analysis
Another approach (Segal et al. 2003) tried to solve the problem in an integrated manner
Find a set of genes such that
  – their expression profiles are similar (microarrays)
  – they share the same binding sites (sequence)
Joint learning of a “regulatory module” from two very different types of data: microarray and sequence
  – An important theme in current bioinformatics

Cis-regulatory analysis
The connection between gene expression and cis-regulatory elements (binding sites) was also explored by Beer & Tavazoie
They found rules on combinations and locations of binding sites that would cause a gene to be over- or under-expressed

The binding sites “RRPE” and “PAC” must occur within 240 bp and 140 bp of the gene start, respectively
Genes containing both motifs, following certain rules on location, are tightly co-regulated
Genes containing only one of the motifs, or both in an incorrect positional configuration, have close to random expression
Source: Nature Genetics 37, S38 - S45 (2005)
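A distance rule of this kind can be checked with a short function. The sketch below encodes only the two distance limits quoted above; the rules in the original study may include further constraints, and the function name `satisfies_rule`, the gene names, and the site positions are hypothetical.

```python
def satisfies_rule(site_distances, limits):
    """Check that every motif in `limits` has at least one site close enough.

    site_distances maps a motif name to a list of distances (in bp) between
    its sites and the gene start; limits maps a motif name to the maximum
    allowed distance.
    """
    return all(
        any(d <= limits[motif] for d in site_distances.get(motif, []))
        for motif in limits
    )


# Distance limits as stated on the slide.
limits = {"RRPE": 240, "PAC": 140}

# Hypothetical genes with hypothetical site positions.
genes = {
    "geneA": {"RRPE": [200], "PAC": [100]},   # both motifs within range
    "geneB": {"RRPE": [200], "PAC": [300]},   # PAC too far from the gene start
    "geneC": {"PAC": [50]},                   # RRPE missing
}
passing = {name for name, sites in genes.items() if satisfies_rule(sites, limits)}
```

Only geneA satisfies both limits, so under the slide's description it is the only one of the three expected to follow the tightly co-regulated pattern.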

Eukaryotes
These studies have mostly focused on yeast (which is a eukaryote, but has a small, compact genome)
There has been much less work of this type on the larger, more complex genomes of metazoans (e.g., humans, rodents, fruit flies)
These genomes are not compact, so it may not suffice to look at the sequence right next to a gene
Intergenic regions are long, and cis-regulatory signals may not be close to the gene

One study in humans
HeLa cells are an “immortal” cell line derived from cervical cancer cells in a person who died in
  – Used extensively in studying cancer
The method of Segal et al. (joint learning of regulatory modules from gene expression and sequence data) was applied to these cells

One study in humans
Gene expression data used: microarrays measuring genes during the cell cycle in HeLa cells
Sequence: 1000 bp promoters (upstream) of human genes
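Taking a fixed-length upstream promoter requires attention to strand, since "upstream" of a reverse-strand gene lies at higher genomic coordinates. The sketch below is an illustration only: the function name `upstream_promoter`, the 0-based coordinate convention, and the toy chromosome are assumptions, not details of the study.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def upstream_promoter(chrom_seq, tss, strand, length=1000):
    """Return up to `length` bp immediately upstream of a gene start.

    chrom_seq: chromosome sequence on the forward strand
    tss: 0-based index of the gene start in chrom_seq (assumed convention)
    strand: "+" or "-"
    The result is reported 5' to 3' on the gene's own strand and is shorter
    than `length` if the gene sits near the end of the sequence.
    """
    if strand == "+":
        return chrom_seq[max(0, tss - length):tss]
    if strand == "-":
        end = min(len(chrom_seq), tss + 1 + length)
        return reverse_complement(chrom_seq[tss + 1:end])
    raise ValueError("strand must be '+' or '-'")


# Toy chromosome; a real call would use length=1000.
chrom = "AAACCCGGGTTT"
fwd = upstream_promoter(chrom, tss=6, strand="+", length=3)
rev = upstream_promoter(chrom, tss=5, strand="-", length=3)
```

On the toy chromosome both calls return "CCC": the forward-strand gene takes the three bases before position 6, and the reverse-strand gene takes the three bases after position 5 and reverse-complements them.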

Result of analysis: two motifs were found to be shared by this set of genes
The genes have similar expression profiles
One of the identified motifs (NFAT) is known to be involved in the cell cycle
Source: Nature Genetics 37, S38 - S45 (2005)

Summary
The common theme is to analyze sets of genes, and relate their common expression patterns to cancer types or to the presence of cis-regulatory motifs
Search algorithms may be required to identify some of these features