Microarrays. Regulation of Gene Expression Cells respond to environment Heat Food Supply Responds to environmental conditions Various external messages.

Slides:



Advertisements
Similar presentations
Analysis of Microarray Genomic Data of Breast Cancer Patients Hui Liu, MS candidate Department of statistics Prof. Eric Suess, faculty mentor Department.
Advertisements

BioInformatics (3).
Basic Gene Expression Data Analysis--Clustering
Supervised and unsupervised analysis of gene expression data Bing Zhang Department of Biomedical Informatics Vanderbilt University
Introduction to Bioinformatics
1 MicroArray -- Data Analysis Cecilia Hansen & Dirk Repsilber Bioinformatics - 10p, October 2001.
Microarray technology and analysis of gene expression data Hillevi Lindroos.
Gene Expression Chapter 9.
Introduction to DNA Microarrays Todd Lowe BME 88a March 11, 2003.
Analysis of microarray data. Gene expression database – a conceptual view Samples Genes Gene expression levels Sample annotations Gene annotations Gene.
Gene Regulation and Microarrays. Overview A. Gene Expression and Regulation B. Measuring Gene Expression: Microarrays C. Finding Regulatory Motifs.
Microarray Data Preprocessing and Clustering Analysis
Genome annotation. What we have GATCAATGATGATAGGAATTGAAAGTGTCTTAATTACAATCCCTGTGCAATTATTAATAACTTTTTTGTT CACCTGTTCCCAGAGGAAACCTCAAGCGGATCTAAAGGAGGTATCTCCTCAAAAGCATCCTCTAATGTCA.
Parallel human genome analysis: Microarray-based expression monitoring of 1000 genes Mark Schena, Dari Shalon, Renu Heller, Andrew Chai, Patrick O. Brown,
Identification of regulatory elements. Transcriptional Regulation Strongest regulation happens during transcription Best place to regulate: No energy.
Microarray II. What is a microarray Microarray Experiment RT-PCR LASER DNA “Chip” High glucose Low glucose.
Figure 1: (A) A microarray may contain thousands of ‘spots’. Each spot contains many copies of the same DNA sequence that uniquely represents a gene from.
Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha.
Microarray I. Cells respond to environment Heat Food Supply Responds to environmental conditions Various external messages.
Microarrays. Regulation of Gene Expression Cells respond to environment Heat Food Supply Responds to environmental conditions Various external messages.
Microarrays Technology behind microarrays Data analysis approaches
Clustering (Gene Expression Data) 6.095/ Computational Biology: Genomes, Networks, Evolution LectureOctober 4, 2005.
Microarray Technology Types Normalization Microarray Technology Microarray: –New Technology (first paper: 1995) Allows study of thousands of genes at.
Experimental methods in genome analysis. Genomic sequences are boring GATCAATGATGATAGGAATTGAAAGTGTCTTAATTACAATCCCTGTGCAATTATTAATAACTTTTTTGTT CACCTGTTCCCAGAGGAAACCTCAAGCGGATCTAAAGGAGGTATCTCCTCAAAAGCATCCTCTAATGTCA.
Analysis of microarray data
Microarrays and Gene Expression Analysis. 2 Gene Expression Data Microarray experiments Applications Data analysis Gene Expression Databases.
Cluster Analysis for Gene Expression Data Ka Yee Yeung Center for Expression Arrays Department of Microbiology.
Fuzzy K means.
Inferring the nature of the gene network connectivity Dynamic modeling of gene expression data Neal S. Holter, Amos Maritan, Marek Cieplak, Nina V. Fedoroff,
Introduce to Microarray
Review of important points from the NCBI lectures. –Example slides Review the two types of microarray platforms. –Spotted arrays –Affymetrix Specific examples.
CS262 Lecture 17, Win07, Batzoglou Gene Regulation and Microarrays.
Statistical Analysis of Microarray Data
Genomics I: The Transcriptome RNA Expression Analysis Determining genomewide RNA expression levels.
Gene expression.
and analysis of gene transcription
Analysis of microarray data
CDNA Microarrays Neil Lawrence. Schedule Today: Introduction and Background 18 th AprilIntroduction and Background 25 th AprilcDNA Mircoarrays 2 nd MayNo.
Affymetrix vs. glass slide based arrays
COT 6930 HPC & Bioinformatics Microarray Data Analysis
Microarrays (Gene Chips) Pioneered by Pat Brown in mid 1990’s To monitor thousands of mRNAs simultaneously Comparative Northern blot on thousands of genes.
DNA MICROARRAYS WHAT ARE THEY? BEFORE WE ANSWER THAT FIRST TAKE 1 MIN TO WRITE DOWN WHAT YOU KNOW ABOUT GENE EXPRESSION THEN SHARE YOUR THOUGHTS IN GROUPS.
Introduction to DNA Microarray Technology Steen Knudsen Uma Chandran.
Microarrays.
LSM3241: Bioinformatics and Biocomputing Lecture 8: Gene Expression Profiles and Microarray Data Analysis Prof. Chen Yu Zong Tel:
CS491JH: Data Mining in Bioinformatics Introduction to Microarray Technology Technology Background Data Processing Procedure Characteristics of Data Data.
Dimension reduction : PCA and Clustering Slides by Agnieszka Juncker and Chris Workman modified by Hanne Jarmer.
1 FINAL PROJECT- Key dates –last day to decided on a project * 11-10/1- Presenting a proposed project in small groups A very short presentation (Max.
Microarray Data Analysis (Lecture for CS498-CXZ Algorithms in Bioinformatics) Oct 13, 2005 ChengXiang Zhai Department of Computer Science University of.
Gene expression. The information encoded in a gene is converted into a protein  The genetic information is made available to the cell Phases of gene.
Quantitative analysis of 2D gels Generalities. Applications Mutant / wild type Physiological conditions Tissue specific expression Disease / normal state.
Transcription Central Dogma Sense (codon) vs antisense (non-coding) Types of RNAs Structural genes vs regulatory genes RNA polymerases (I, II, III) Promoters.
Gene Expression Analysis. 2 DNA Microarray First introduced in 1987 A microarray is a tool for analyzing gene expression in genomic scale. The microarray.
Gene Expression and Networks. 2 Microarray Analysis Supervised Methods -Analysis of variance -Discriminate analysis -Support Vector Machine (SVM) Unsupervised.
Lecture 7. Functional Genomics: Gene Expression Profiling using
An Overview of Clustering Methods Michael D. Kane, Ph.D.
Course Work Project Project title “Data Analysis Methods for Microarray Based Gene Expression Analysis” Sushil Kumar Singh (batch ) IBAB, Bangalore.
Microarray Technology. Introduction Introduction –Microarrays are extremely powerful ways to analyze gene expression. –Using a microarray, it is possible.
Overview of Microarray. 2/71 Gene Expression Gene expression Production of mRNA is very much a reflection of the activity level of gene In the past, looking.
Microarray analysis Quantitation of Gene Expression Expression Data to Networks BIO520 BioinformaticsJim Lund Reading: Ch 16.
Computational Biology Clustering Parts taken from Introduction to Data Mining by Tan, Steinbach, Kumar Lecture Slides Week 9.
Analyzing Expression Data: Clustering and Stats Chapter 16.
ANALYSIS OF GENE EXPRESSION DATA. Gene expression data is a high-throughput data type (like DNA and protein sequences) that requires bioinformatic pattern.
Distinguishing active from non active genes: Main principle: DNA hybridization -DNA hybridizes due to base pairing using H-bonds -A/T and C/G and A/U possible.
Statistical Analysis for Expression Experiments Heather Adams BeeSpace Doctoral Forum Thursday May 21, 2009.
CZ5211 Topics in Computational Biology Lecture 4: Clustering Analysis for Microarray Data II Prof. Chen Yu Zong Tel:
Microarray: An Introduction
Gene Expression Analysis
DNA Chip Data Interpretation Tools: Genmapp & Dragon View
EXTENDING GENE ANNOTATION WITH GENE EXPRESSION
Presentation transcript:

Microarrays

Regulation of Gene Expression

Cells respond to environment Heat Food Supply Responds to environmental conditions Various external messages

Where gene regulation takes place Opening of chromatin Transcription Translation Protein stability Protein modifications

Transcriptional Regulation Strongest regulation happens during transcription Best place to regulate: No energy wasted making intermediate products However, slow response time After a receptor notices a change: 1.Cascade message to nucleus 2.Open chromatin & bind transcription factors 3.Recruit RNA polymerase and transcribe 4.Splice mRNA and send to cytoplasm 5.Translate into protein

Transcription Factors Binding to DNA Transcription regulation: Certain transcription factors bind DNA Binding recognizes DNA substrings: Regulatory motifs

RNA Polymerase TBP Promoter and Enhancers Promoter necessary to start transcription Enhancers can affect transcription from afar Enhancer 1 TATA box Gene X DNA binding sites Transcription factors

Example: A Human heat shock protein TATA box: positioning transcription start TATA, CCAAT: constitutive transcription GRE: glucocorticoid response MRE: metal response HSE: heat shock element TATASP1 CCAAT AP2 HSE AP2CCAAT SP1 promoter of heat shock hsp GENE Motifs:

The Cell as a Regulatory Network AB Make D C If C then D If B then NOT D If A and B then D D Make B D If D then B C gene D gene B B Promoter D Promoter B

DNA Microarrays Measuring gene transcription in a high- throughput fashion

Measuring transcription AAAAAAAAA Gene (DNA) Transcript (RNA) RNA polymerase – cellular enzyme AAAAAAAAA TTTTTTTTT Synthetic primer (oligo dT) Reverse transcriptase (RT) – Retroviral enzyme - Flourescence tags Extract RNA Complementary DNA (cDNA) Expression ~ RNA ~ flourescence

What is a microarray

What is a microarray (2) A 2D array of DNA sequences from thousands of genes Each spot has many copies of same gene Allow mRNAs from a sample to hybridize Measure number of hybridizations per spot

How to make a microarray Method 1: Printed Slides (Stanford) –Use PCR to amplify a 1 kb portion of each gene / EST –Apply each sample on glass slide Method 2: DNA Chips (Affymetrix) –Grow oligonucleotides (20bp) on glass –Several words per gene (choose unique words) If we know the gene sequences, Can sample all genes in one experiment!

Microarray Experiment RT-PCR LASER DNA “Chip” High glucose Low glucose

Raw data – images Red (Cy5) dot – overexpressed or up-regulated Green (Cy3) dot – underexpressed or down- regulated Yellow dot –equally expressed Intensity - “absolute” level cDNA plotted microarray

Levels of analysis Level 1: Which genes are induced / repressed? Gives a good understanding of the biology Methods: Factor-2 rule, t-test. Level 2: Which genes are co-regulated? Inference of function. -Clustering algorithms. Level 3: Which genes regulate others? Reconstruction of networks. - Transcriptions factor binding sites.

Experiment: time course Time 0 Genes Sample annotations Gene annotations Intensity (Red) Intensity (Green)

Experiment: time course Time 0.5 Genes Intensity (Red) Intensity (Green) Time 0

Experiment: time course Genes Time (hours)

Gene expression database Genes Gene expression levels Samples Sample annotations Gene annotations Gene expression matrix

Gene expression database Samples Genes Gene expression matrix Timeseries, Conditions A, B, … Mutants in genes a, b … Etc.

Data normalization expression of gen x in experiment i expression of gen x in reference Logarithm of ratio - treats induction and repression of identical magnitude as numerical equal but with opposite sign. red/green - ratio of expression – 2 - 2x overexpressed – x underexpressed log 2 ( red/green ) - “log ratio” – 1 2x overexpressed – -1 2x underexpressed

Analysis of multiple experiments Expression of gene x in m experiments can be represented by an expression vector with m elements Z-transformation: If X ~ N(  ),

Level 1 2-fold rule: Is a gene 2-fold up (or down) regulated? Students t-test: Is the regulation significantly different from background variation? (Needs repeated measurements)

T-test X ~ N(  ), Cannot reject H 0 Reject H 0 The p-value is the probability of drawing the wrong conclusion by rejecting a null hypothesis 

Multiple testing In a microarray experiment, we perform 1 test / gene Prob (correct) = 1 –  c Prob (globally correct) = (1 –  c  n Prob (wrong somewhere) = 1 - (1 –  c  n  e = 1 - (1 –  c  n For small  e :  c   e  n Bonferroni correction for multiple testing of independent events Single comparison Experiment comparison

Multiple testing GenesTreated 1Treated 2Control 1Control 2p-value Gene Gene Gene Gene Gene Gene Gene Gene Gene Gene Gene Gene Gene Gene Gene Gene Gene Gene Gene Gene Significance level

Clustering Hierachical clustering: - Transforms n (genes) * m (experiments) matrix into a diagonal n * n similarity (or distance) matrix Similarity (or distance) measures: Euclidic distance Pearsons correlation coefficent Eisen et al PNAS 95:

Vectors in space: distances Gene 1 Gene 2 Experiment 1 Experiment 3 Experiment 2 d

Distance Measures: Minkowski Metric r r m i ii m m yxyxd yyyy xxxx myx ||),( )( )(      by defined is metric Minkowski The :features have both and objects two Suppose  

Most Common Minkowski Metrics ||max),( ||),( 1 ||),( ii m i m i ii m i ii yxyxd r yxyxd r yxyxd r            )distance sup"(" 3, distance) (Manhattan 2, ) distance (Euclidean 1,

An Example 4 3 x y

Similarity Measures: Correlation Coefficient. and :averages )()( ))(( ),(           m i i m m i i m m i m i ii m i ii yyxx yyxx yyxx yxs

Similarity Measures: Correlation Coefficient Time Gene A Gene B Gene A Time Gene B Expression Level Time Gene A Gene B

Clustering of Genes and Conditions Unsupervised: –Hierarchical clustering –K-means clustering –Self Organizing Maps (SOMs)

Ordered dendrograms Hierachical clustering: Hypothesis: guilt-by-association Common regulation -> common function Eisen98

Hierarchical Clustering Given a set of n items to be clustered, and an n*n distance (or similarity) matrix, the basic process hierarchical clustering is this: 1.Start by assigning each item to its own cluster, so that if you have n items, you now have n clusters, each containing just one item. Let the distances (similarities) between the clusters equal the distances (similarities) between the items they contain. 2.Find the closest (most similar) pair of clusters and merge them into a single cluster, so that now you have one less cluster. 3.Compute distances (similarities) between the new cluster and each of the old clusters. 4.Repeat steps 2 and 3 until all items are clustered into a single cluster of size N.

Merge two clusters by: Single-Link Method / Nearest Neighbor (NN): minimum of pairwise dissimilarities Complete-Link / Furthest Neighbor (FN): maximum of pairwise dissimilarities Unweighted Pair Group Method with Arithmetic Mean (UPGMA): average of pairwise dissimilarities

Single-Link Method Diagonal n*n distance Matrix Euclidean Distance b a cd (1) cd a,b (2) a,b,c d (3) a,b,c,d

Complete-Link Method b a Distance Matrix Euclidean Distance (1) (2) (3) a,b ccd d c,d a,b,c,d

Compare Dendrograms Single-LinkComplete-Link

Serum stimulation of human fibroblasts (24h) Cholesterol biosynthesis Celle cyclus I-E response Signalling/ Angiogenesis Wound healning

Partitioning k-means clustering Self organizing maps (SOMs)

k-means clustering Tavazoie et al Nature Genet. 22:

k-Means Clustering Algorithm 1) Select an initial partition of k clusters 2) Assign each object to the cluster with the closest centre 3) Compute the new centres of the clusters 4) Repeat step 2 and 3 until no object changes cluster

1. centroide

2. centroide 3. centroide 4. centroide 5. centroide 6. centroide k = 6

1. centroide 2. centroide 3. centroide 5. centroide 6. centroide k = 6

1. centroide 2. centroide 3. centroide 4. centroide 5. centroide 6. centroide k = 6

Self organizing maps Tamayo et al PNAS 96:

1. centroide2. centroide3. centroide 4. centroide 5. centroide6. centroide k = (2,3) = 6

k = 6