Gene Expression Analysis
DNA Microarray
  First introduced in 1987.
  A microarray is a tool for analyzing gene expression on a genomic scale.
  The microarray consists of a small membrane or glass slide containing samples of many genes arranged in a regular pattern.
Microarray Applications
  Identify gene function
    Similar expression can imply similar function
  Find tissue- or development-specific genes
    Different expression in different cells/tissues
  Find genes affected by different conditions
    Different expression under different conditions
  Diagnostics
    Differences in gene expression can indicate a disease state
Chips or Microarrays
  Different types of microarray technologies:
  Spotted microarrays: two-channel cDNA microarrays
  DNA chips (Affymetrix, Agilent): one-channel oligonucleotide arrays
Microarray Experiment http://www.bio.davidson.edu/Courses/genomics/chip/chip.html
Experimental Protocol: Two-Channel Arrays
  1. Design an experiment (probe design)
  2. Extract RNA molecules from the cell
  3. Label molecules with fluorescent dye
  4. Pour solution onto the microarray, then wash off excess molecules
  5. Shine laser light onto the array and scan for the presence of fluorescent dye
  6. Analyze the microarray image
Analyzing Microarray Images
  [Figure: original microarray image, with labels "One gene or mRNA" and "One tissue or condition"]
The ratio of expression is indicated by the intensity of the color
  Red = high mRNA abundance in the experiment sample (Cy5)
  Green = high mRNA abundance in the control sample (Cy3)
  Transforming raw data to a ratio of expression: Cy5 and Cy3 intensities -> Cy5/Cy3 -> log2(Cy5/Cy3)
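The transformation from raw Cy5 and Cy3 intensities to a log ratio can be sketched as follows. This is a minimal illustration, assuming the intensities are already background-corrected and positive; the spot names and values are hypothetical.

```python
import math

def log2_ratio(cy5, cy3):
    """Return log2(Cy5 / Cy3) for one spot; both intensities must be positive."""
    if cy5 <= 0 or cy3 <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(cy5 / cy3)

# Hypothetical (Cy5, Cy3) intensities for three spots.
spots = {
    "spot_a": (4000.0, 1000.0),  # more abundant in the experiment sample
    "spot_b": (500.0, 500.0),    # equal abundance
    "spot_c": (250.0, 1000.0),   # more abundant in the control sample
}

ratios = {name: log2_ratio(cy5, cy3) for name, (cy5, cy3) in spots.items()}
for name, value in ratios.items():
    print(name, value)
```

On the log2 scale a four-fold increase (+2) and a four-fold decrease (-2) are symmetric around zero, which is why expression tables are usually given as log ratios rather than raw ratios.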
Expression Data Format
  Rows: genes / mRNAs. Columns: conditions.
  Gene     normal   hot      cold
  uch1     -2.0     0.0      0.924
  gut2     0.398    0.402    -1.329
  fip1     0.225    0.225    -2.151
  msh1     0.676    0.685    -0.564
  vma2     0.41     0.414    -1.285
  meu26    0.353    0.286    -1.503
  git8     0.47     0.47     -1.088
  sec7b    0.39     0.395    -1.358
  apn1     0.681    0.636    -0.555
  wos2     0.902    0.904    -0.149
One-Channel DNA Chips
  Each sequence is represented by a probe set
  1 probe set = N probes (Affymetrix: 16 probes, each a 25-mer)
  The unknown sequence or mixture (target) is labeled with one fluorescent dye
  The target hybridizes to complementary probes only
  The fluorescence intensity is indicative of the expression of the target sequence
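One simple way to turn the N probe intensities of a probe set into a single expression value is to take a robust average. The sketch below uses the median; this is an assumption made for illustration and not the summarization algorithm used by Affymetrix software. The intensities are hypothetical.

```python
from statistics import median

def summarize_probe_set(intensities):
    """Summarize one probe set by the median of its probe intensities."""
    if not intensities:
        raise ValueError("probe set is empty")
    return median(intensities)

# Hypothetical intensities for one probe set of 16 probes.
# One probe (1500) is an outlier, e.g. due to cross-hybridization.
probe_set = [210, 198, 225, 240, 205, 190, 1500, 215,
             220, 208, 199, 230, 212, 207, 218, 222]

signal = summarize_probe_set(probe_set)
print(signal)
```

The median is used here because a single badly behaving probe barely shifts it, whereas it would pull a plain mean upward.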
Affymetrix Chip
Designing probes for microarray experiments
  The probe on a DNA chip is shorter than the target
    Choice of which section to hybridize
  Select a region which is unstructured
    RNA folding, DNA stem-and-loop
  Choose a region which is target-specific
    Avoid cross-hybridization with other DNA
  Avoid regions containing variation
    Minimize presence of mutation sites
Probe Design
  Two main factors to optimize: sensitivity and specificity
  Sensitivity
    Strength of interaction with the target sequence
    Requires knowledge of the target only
  Specificity
    Weakness of interaction with other sequences
    Requires knowledge of the ‘background’
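The trade-off between sensitivity and specificity can be illustrated with a toy ranking of candidate target regions. The sketch below is a simplification under stated assumptions: specificity is approximated by counting near-identical windows in a set of background sequences, and sensitivity by how close the GC fraction is to 0.5. Real probe design uses thermodynamic models; all function names and sequences here are hypothetical.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def near_matches(region, background, max_mismatches):
    """Count background windows within max_mismatches of the region."""
    k = len(region)
    count = 0
    for seq in background:
        for i in range(len(seq) - k + 1):
            if hamming(region, seq[i:i + k]) <= max_mismatches:
                count += 1
    return count

def rank_regions(target, background, k, max_mismatches=1):
    """Rank every k-long region of the target.

    Sort key: fewest background near-matches first (specificity proxy),
    then GC fraction closest to 0.5 (crude sensitivity proxy).
    The probe itself would be the reverse complement of the chosen region.
    """
    candidates = []
    for i in range(len(target) - k + 1):
        region = target[i:i + k]
        hits = near_matches(region, background, max_mismatches)
        candidates.append((hits, abs(gc_fraction(region) - 0.5), i, region))
    return sorted(candidates)

# Hypothetical target and background sequences.
ranked = rank_regions("AAAACGCG", ["AAAATTTT"], k=4, max_mismatches=0)
best = ranked[0]
print("best region:", best[3], "at position", best[2])
```

In this toy case the region AAAA ranks last because it also occurs in the background, which is the cross-hybridization risk that the specificity criterion is meant to avoid.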
Sources of Inaccuracy
  Some sequences bind better than others
    Cross-hybridization, A–T versus G–C
  Scanning of microarray images
    Scratches, smears, cell spillage
  Effects of experimental conditions
    Point in cell cycle, temperature, density
Different types of probes
  cDNA
    Longer probes (~70), more stable reactions
    Readily available (by reverse transcription)
  Specific Oligonucleotides
    20-60 mers
    Allow higher density
    Enable more flexible designs (e.g., differentially measuring splice variants)
Splicing Specific Microarrays
  [Figure: pre-mRNA and mRNA probes, and total transcript level]
Microarray Analysis
  Unsupervised Methods
    - Partition methods: K-means, SOM (Self-Organizing Maps)
    - Hierarchical clustering
  Supervised Methods
    - Analysis of variance
    - Discriminant analysis
    - Support Vector Machine (SVM)
Clustering
  Grouping genes together according to their expression profiles.
  Hierarchical clustering (Michael Eisen, 1998):
    Generate a tree based on similarity (similar to a phylogenetic tree)
    Each gene is a leaf on the tree
    Distances reflect similarity of expression
    Internal nodes represent functional groups
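The tree-building idea can be sketched in a few lines. The example below is an illustration only, assuming average linkage and a distance of 1 minus the Pearson correlation; it is not a reproduction of any published clustering software. It uses three rows (uch1, gut2, fip1) from the expression table shown earlier in these slides.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation between two expression profiles."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)

def cluster(profiles):
    """Agglomerative average-linkage clustering.

    Returns the tree as nested tuples; each gene name is a leaf.
    """
    # Each cluster is (subtree, list of member gene names).
    clusters = [(name, [name]) for name in profiles]

    def dist(c1, c2):
        # Average of (1 - correlation) over all cross-cluster gene pairs.
        pairs = [(a, b) for a in c1[1] for b in c2[1]]
        total = sum(1 - pearson(profiles[a], profiles[b]) for a, b in pairs)
        return total / len(pairs)

    while len(clusters) > 1:
        index_pairs = [(i, j) for i in range(len(clusters))
                       for j in range(i + 1, len(clusters))]
        i, j = min(index_pairs,
                   key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
        merged = ((clusters[i][0], clusters[j][0]),
                  clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]

# Expression under the conditions normal, hot, cold.
profiles = {
    "uch1": [-2.0, 0.0, 0.924],
    "gut2": [0.398, 0.402, -1.329],
    "fip1": [0.225, 0.225, -2.151],
}

tree = cluster(profiles)
print(tree)
```

gut2 and fip1 are joined first because both are nearly unchanged in the hot condition and drop in the cold one, while uch1 shows the opposite trend and joins the tree last.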
Clustering
  Self-Organizing Maps
    Genes are clustered according to similar expression patterns
What can we learn from clusters with similar gene expression?
  Similar expression between genes can mean:
    One gene controls the other in a pathway
    Both genes are controlled by another
    Both genes are required at the same time in the cell cycle
    Both genes have similar function
  Clusters can help identify regulatory motifs
    Search for motifs in upstream promoter regions of all the genes in a cluster
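A very simple form of the motif search is to look for short words (k-mers) that occur in the upstream region of every gene in a cluster. The sketch below assumes exact matches and hypothetical sequences; real motif finders allow degenerate positions and compare against a background model.

```python
def motif_counts(upstream, k):
    """For each k-mer, count how many upstream sequences contain it."""
    counts = {}
    for seq in upstream.values():
        seen = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        for kmer in seen:
            counts[kmer] = counts.get(kmer, 0) + 1
    return counts

def shared_motifs(upstream, k):
    """Return the k-mers present in every upstream sequence, sorted."""
    counts = motif_counts(upstream, k)
    return sorted(kmer for kmer, n in counts.items() if n == len(upstream))

# Hypothetical upstream regions of three genes from one cluster.
upstream = {
    "geneA": "TTGACGTCAAT",
    "geneB": "CCTGACGTCGG",
    "geneC": "ATGACGTCTTA",
}

shared = shared_motifs(upstream, 6)
print(shared)
```

Here the two shared 6-mers overlap and together point to a common site, TGACGTC, in all three promoters.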
EXAMPLE
  [Figure: hnRNP A1 binding sites; panel labels HNRPA1 and SRp40]
How can we use microarrays for diagnostics?
How can microarrays be used as a basis for diagnostics?
  [Figure: expression (+ / -) of Gen1 to Gen5 for patient 1]
Informative Genes
  Genes that are differentially expressed in the two classes.
  Goal: identify (statistically significant) informative genes
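One common way to score how differently a gene is expressed in two classes is a two-sample t statistic. The sketch below uses Welch's form, which is an assumption made for illustration; the slides do not specify a test. The gene names follow the figure labels, and the expression values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic for two groups of expression values."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))

# Hypothetical log ratios: (class "+" patients, class "-" patients).
expression = {
    "Gen1": ([2.0, 2.2, 1.8], [-2.0, -1.8, -2.2]),
    "Gen2": ([0.1, -0.2, 0.3], [0.0, 0.2, -0.1]),
}

scores = {gene: welch_t(plus, minus) for gene, (plus, minus) in expression.items()}

# Rank genes by the absolute value of the statistic, most informative first.
ranked_genes = sorted(scores, key=lambda g: abs(scores[g]), reverse=True)
print(ranked_genes, scores)
```

The statistic alone only ranks genes. Calling a gene statistically significant also requires a p-value and, because thousands of genes are tested at once, a correction for multiple testing.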
How can microarrays be used as a basis for diagnostics?
  [Figure: expression (+ / -) of Gen1 to Gen5 for patients 1 to 5, with the informative genes marked]
Specific Examples: Cancer Research
  Hundreds of genes were found that differentiate between cancer tissues at different stages of the tumor.
  The arrow shows an example of tumor cells that were not detected correctly by histological or other clinical parameters.
  Ramaswamy et al., 2003, Nat Genet 33:49-54
Using SVMs to diagnose tumors based on expression data
  Each dot represents a vector of the expression pattern taken from a microarray experiment, for example the expression pattern of all genes from a cancer patient.
How do SVMs work with expression data?
  In this example, red dots can be primary tumors and blue dots tumors at the metastasis stage.
  The SVM is trained on data that was classified based on histology.
  After training the SVM, we can use it to diagnose the unknown tumor (marked "?").
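The train-then-diagnose workflow can be sketched with a small linear SVM. This is a minimal illustration, assuming a linear kernel trained by batch subgradient descent on the regularized hinge loss; in practice an established SVM library would be used. The two-gene expression vectors, labels, and parameter values are hypothetical.

```python
def train_linear_svm(samples, labels, lam=0.01, lr=0.1, epochs=200):
    """Minimize lam/2 * |w|^2 + mean hinge loss by subgradient descent."""
    n = len(samples)
    n_features = len(samples[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]
        grad_b = 0.0
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # sample is inside the margin or misclassified
                for j in range(n_features):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Return +1 or -1 depending on the side of the separating hyperplane."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Hypothetical expression of two informative genes per tumor sample.
# Label +1 = primary tumor, -1 = metastasis (as classified by histology).
samples = [(2.0, 1.5), (1.8, 2.2), (2.5, 1.9), (1.6, 1.7),
           (-1.9, -2.1), (-2.3, -1.6), (-1.5, -2.0), (-2.0, -2.4)]
labels = [1, 1, 1, 1, -1, -1, -1, -1]

w, b = train_linear_svm(samples, labels)

unknown_tumor = (1.0, 0.8)
diagnosis = predict(w, b, unknown_tumor)
print("primary" if diagnosis == 1 else "metastasis")
```

With real data each vector would hold the expression of many genes, and the classifier should be evaluated on samples that were not used for training.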
Gene Expression Databases and Resources on the Web
  GEO (Gene Expression Omnibus): http://www.ncbi.nlm.nih.gov/geo/
  List of gene expression web resources: http://industry.ebi.ac.uk/~alan/MicroArray/
  Another list with literature references: http://www.gene-chips.com/
  Cancer Genome Anatomy Project: http://cgap.nci.nih.gov/
  Stanford Microarray Database: http://genome-www.stanford.edu/microarray/
If time permits...
Predicting function
  Expression data
  Structure
[Figure: expression heat map (color scale from -2.0 to 2.0) with genes grouped by function: transcription, splicing, export, decay, other RNA processing; sample labels IAI and wt]
Structural Genomics: a large-scale structure determination project designed to cover all representative protein structures
  [Figure: ATP binding domain of protein MJ0577]
  Zarembinski et al., Proc. Natl. Acad. Sci. USA, 95:15189 (1998)
Wanted!
  As a result of the Structural Genomics initiative, many structures of proteins with unknown function will be solved.
  Wanted: automated methods to predict function from the protein structures resulting from the structural genomics project.
Approaches for predicting function from structure
  ConSurf: mapping evolutionary conservation onto the protein structure
  http://consurf.tau.ac.il/
Approaches for predicting function from structure
  PFplus: identifying positive electrostatic patches on the protein structure
  http://pfp.technion.ac.il/
Approaches for predicting function from structure
  SHARP2: predicting protein-protein interaction sites on the protein structure
  http://www.bioinformatics.sussex.ac.uk/SHARP2