Published by Dwain Morton. Modified over 8 years ago.
1
Roy Williams, PhD, Sanford | Burnham Medical Research Institute
2
The Omics Revolution
3
Omics Biology Methods
Expression data, mRNA/microRNA (microarrays, RNA-seq): a tool for studying how large numbers of genes interact with each other and how a cell's regulatory networks control vast batteries of genes simultaneously.
Proteomics: the study of the full set of proteins encoded by a genome.
ChIP-chip, ChIP-seq: used to investigate interactions between proteins and DNA in vivo.
Epigenetics: the study of changes in phenotype (appearance) or gene expression caused by mechanisms other than changes in the underlying DNA sequence, e.g. DNA base methylation.
Metabolomics: the systematic study of the unique chemical fingerprints (metabolite profiles) that specific cellular processes leave behind.
4
Omics Workflows Converge
Sequencing: Acquire Sequence -> Generate Bases (FASTQ) -> Count Reads (Align, Annotate)
Arrays: Hybridize Array -> Scan Array -> Determine Intensity
Mass spectrometry: Acquire Peptide Sequence -> MS Spectral Counts (Align)
All three converge on: Interpret Data (Statistical test, Visualization, Pathway Analysis)
5
Measuring Gene Expression
Idea: measure the amount of mRNA to see which genes are being expressed in (used by) the cell.
Measuring protein would be more direct, but is currently harder.
6
Assumption of microarray/RNA-seq technology
Use mRNA transcript abundance as a measure of expression for the corresponding gene.
Transcript abundance is assumed to be proportional to the degree of gene expression.
7
How to measure RNA abundance
RNA-seq: short reads from SOLiD, Ion Torrent, 454, Solexa. Align, count, then normalize read counts to gene length and to the number of mapped reads (RPKM).
Illumina bead array: highly redundant oligo array (also Affymetrix, NimbleGen).
Spotted 2-colour array: very long cDNA; low redundancy.
SAGE: random Sanger sequencing of a cDNA library.
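The RPKM normalisation mentioned for RNA-seq can be sketched in a few lines. This is a minimal illustration, not the pipeline used in the talk; the function name and the example counts are assumptions chosen only to show the arithmetic (reads per kilobase of transcript per million mapped reads).

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript per Million mapped reads.

    RPKM = reads * 1e9 / (gene length in bp * total mapped reads)
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical numbers: 500 reads on a 2 kb gene in a 10-million-read library.
example = rpkm(read_count=500, gene_length_bp=2000, total_mapped_reads=10_000_000)
print(example)
```

Dividing by gene length keeps long genes from looking more highly expressed simply because they collect more reads, and dividing by library size makes samples sequenced to different depths comparable.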
8
RNA-seq for digital expression
9
The Illumina BeadArray Technology
Highly redundant: ~50 copies of each bead
60-mer oligos
Absolute expression
Each array is deconvoluted using a colour-coding tag system
Human, Mouse, Rat, Custom
10
Affymetrix Technology
Highly redundant (~25 short oligos per gene)
Absolute expression
PM-MM (perfect match / mismatch) oligo system, valuable for detecting cross-hybridisation
Human, Mouse, E. coli, Yeast, …
Affymetrix and Illumina arrays have been systematically compared
11
Spotted Arrays
Low redundancy
cDNA and oligo
Two dyes: Cy5/Cy3
Relative expression
Cost and custom
12
Microarrays in action
[Figure labels: off, on]
13
The Application of Expression Studies
Differential gene expression between two (or more) sample types
Similar gene expression across treatments
Tumour sub-class identification using gene expression profiles
Identification of "marker" genes that characterize different cell types
Identification of genes associated with clinical outcomes (e.g. survival)
14
Experimental Design
Design Experiment
Biological replicates: 3 chips for 2x changes; 5 chips for changes below 2x
Perform Experiment
Standardize conditions
Watch for batch effects
Discard outliers
15
Transcriptome Data Analysis Workflow
Quality Control
Normalize Data
Set up experimental data
Filter for differential expression
Advanced analysis techniques: clustering
Compare results to biology: NextBio, GeneGo, IPA
16
Analysis Software for Microarray and RNA-seq data
GenePattern / IGV (free): powerful, many plug-in packages and pipelines, good video examples/tutorials
GeneSpring GX11 (commercial, 1 public copy)
Partek (commercial)
geWorkbench (free, nice workflows)
R-Bioconductor (free, with guidance)
Cytoscape, GSEA: for pathway visualisation
IPA, NextBio, GeneGo <= Burnham subscriptions!
17
Log Transformed Data
2/2 = 1, log2(1) = 0
4/1 = 4, log2(4) = +2
1/4 = 0.25, log2(0.25) = -2
Transformation is often performed before normalisation.
Skewed raw data -> approximately normal distribution after log transformation.
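The three ratios on this slide can be reproduced directly. The sketch below is only an illustration of the log2 transform; the helper name is an assumption, not something taken from the tools discussed in the talk.

```python
import math


def log2_ratio(sample, reference):
    """Log2 of an expression ratio; both values must be positive."""
    if sample <= 0 or reference <= 0:
        raise ValueError("expression values must be positive before taking logs")
    return math.log2(sample / reference)


# The ratios from the slide: 2/2, 4/1 and 1/4.
ratios = [(2, 2), (4, 1), (1, 4)]
log_ratios = [log2_ratio(s, r) for s, r in ratios]
print(log_ratios)
```

The point of the transform is symmetry: a four-fold increase and a four-fold decrease sit at the same distance from zero, which a raw ratio (4 versus 0.25) does not show.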
18
Boxplot representation of data spread
After QC for low-confidence genes (P<0.99). Note: ~50 replicate beads per array.
[Figure: boxplots of signal intensity (y-axis) by chip number (x-axis), marking the median, the 25% and 75% quartiles, and outliers; one chip is labelled BAD CHIP.]
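The quantities drawn in a boxplot, and one possible way to spot a chip like the "bad chip" in the figure, can be sketched as follows. The flagging rule (a chip median more than `max_shift` away from the median of all chip medians), the threshold, and the intensity values are all assumptions made for illustration; the talk identifies the bad chip by eye.

```python
import statistics


def box_stats(values):
    """Median, quartiles and 1.5*IQR outliers, as drawn in a boxplot."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < low or v > high]
    return {"q1": q1, "median": median, "q3": q3, "outliers": outliers}


def flag_bad_chips(chips, max_shift=1.0):
    """Hypothetical rule: flag chips whose median is far from the others."""
    medians = {name: box_stats(vals)["median"] for name, vals in chips.items()}
    centre = statistics.median(medians.values())
    return [name for name, m in medians.items() if abs(m - centre) > max_shift]


# Made-up log2 intensities for three chips; chip3 is shifted downwards.
chips = {
    "chip1": [7.0, 8.0, 9.0, 10.0, 11.0],
    "chip2": [7.2, 8.1, 9.1, 10.2, 11.1],
    "chip3": [3.0, 4.0, 5.0, 6.0, 7.0],
}
bad = flag_bad_chips(chips)
print(bad)
```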
19
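The effect of quantile normalisation on the 36 filtered data sets
IMPORTANT: use non-linear normalisation
> library(preprocessCore)  # normalize.quantiles() is provided by preprocessCore in current Bioconductor
> Qdata <- normalize.quantiles(Rawdata)  # Rawdata: numeric matrix, one column per array
After normalisation all arrays share the same range.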
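To show what quantile normalisation does to the data, here is a minimal pure-Python sketch. It is not the Bioconductor implementation: ties are broken by position rather than averaged, and the function name and input values are assumptions for illustration.

```python
def quantile_normalize(columns):
    """Quantile-normalise a list of equal-length columns (one per array).

    Each value is replaced by the mean, across arrays, of the values
    that share its rank. Ties are broken by position (a simplification).
    """
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all arrays must have the same number of probes")
    sorted_cols = [sorted(col) for col in columns]
    # Mean intensity at each rank across all arrays.
    rank_means = [sum(col[i] for col in sorted_cols) / len(columns) for i in range(n)]
    normalized = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])  # probe indices by rank
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


# Two made-up arrays with three probes each.
raw = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
qn = quantile_normalize(raw)
print(qn)
```

After the transform every array contains exactly the same set of values, which is why the boxplots line up; only the order of those values within each array still differs.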
20
Data Analysis Examples
Example 1: Illumina arrays with GeneSpring GX11
Example 2: Affymetrix data with a GenePattern module
Steps in both: import, quality control, normalize; detect differentially expressed genes; pathway analysis
21
Illumina Analysis Workflow
GenomeStudio application: process binary .idat files to txt
Check array hybridisation quality
Normalisation here is optional
Direct export: file as "sample probe profile"
Import into GeneSpring GX11
22
GeneSpring GX11 features
Guided workflows
Pathways
GSEA
IPA integration
Ontologies
MySQL
R script API
23
Illumina Advanced Workflow
24
Grouping Sample Replicates
25
QC: Check Replicates Are Similar
26
Scatterplot of replicates
27
Scatterplot of differently treated samples
28
Filter genes on P-value
29
Filter for significantly different genes in a Volcano plot
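A volcano plot places each gene by its log2 fold change (x-axis) and -log10 p-value (y-axis), and the filter keeps the genes that are extreme on both. The sketch below assumes the p-values have already been computed elsewhere; the cutoffs (2-fold, p <= 0.05), gene names and values are illustrative assumptions, not the settings used in GeneSpring.

```python
import math


def volcano_filter(genes, min_abs_log2_fc=1.0, max_p=0.05):
    """Keep genes passing both a fold-change and a p-value cutoff.

    genes: iterable of (name, fold_change, p_value) with fold_change > 0.
    Returns (name, log2 fold change, -log10 p) for the genes that pass.
    """
    selected = []
    for name, fold_change, p_value in genes:
        log2_fc = math.log2(fold_change)      # x-axis of the volcano plot
        neg_log10_p = -math.log10(p_value)    # y-axis of the volcano plot
        if abs(log2_fc) >= min_abs_log2_fc and p_value <= max_p:
            selected.append((name, log2_fc, neg_log10_p))
    return selected


# Hypothetical genes: (name, fold change, p-value).
genes = [
    ("geneA", 4.0, 0.001),   # large change, significant
    ("geneB", 1.1, 0.0001),  # significant but tiny change
    ("geneC", 0.25, 0.2),    # large change but not significant
    ("geneD", 0.5, 0.01),    # 2-fold down, significant
]
hits = volcano_filter(genes)
print([name for name, _, _ in hits])
```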
30
Significant Pathway Determination
31
Which types of genes are enriched in a cluster?
Idea: Compare your cluster of genes with lists of genes with common properties (function, expression, location).
Find how many genes overlap between your cluster and a gene list.
Calculate the probability of obtaining the overlap by chance. This measures whether the enrichment is significant.
This analysis provides an unbiased way of detecting connections between expression and function.
[Figure labels: GeneOntology "Cell cycle", "Our Cell cycle"; values 25, 0, 7, 15000]
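The "probability of obtaining the overlap by chance" is commonly computed with a hypergeometric test, sketched below. The talk does not say which test its tools use, so treat this as one standard choice rather than the method behind the slides. The category size of 300 is a made-up value; the other numbers echo the figure labels, whose exact meaning is not recoverable from the transcript.

```python
from math import comb


def enrichment_p_value(overlap, cluster_size, category_size, genome_size):
    """Upper-tail hypergeometric probability.

    Probability of seeing at least `overlap` category genes when
    `cluster_size` genes are drawn at random from `genome_size` genes,
    of which `category_size` belong to the category.
    """
    max_overlap = min(cluster_size, category_size)
    total = comb(genome_size, cluster_size)
    tail = sum(
        comb(category_size, k) * comb(genome_size - category_size, cluster_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Assumed numbers: 15000 genes in total, 300 annotated "cell cycle",
# a cluster of 25 genes, 7 of which carry the annotation.
p = enrichment_p_value(overlap=7, cluster_size=25, category_size=300, genome_size=15000)
print(p)
```

With these numbers only about 0.5 annotated genes would be expected in the cluster by chance, so an overlap of 7 gives a very small p-value. When many gene lists are tested at once, the p-values also need a multiple-testing correction.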
32
Automatically Send list to IPA for pathway Analysis
33
Significant Pathways sent to Ingenuity Pathway Analysis
34
Completed Analysis: gene lists, data, pathways
35
Affymetrix Workflow: GenePattern
36
Comparative Marker Selection
37
Paste the URLs for Data files
38
Send results to the next module: the Viewer module
39
Outputs a ranked list of genes. The list of marker genes can be filtered and exported.
40
Cluster FAQ: What does the hierarchy of black lines making up the tree mean, and how do I interpret it?
Answer: The vertical black lines joining related genes together represent the correlation distance between those genes. Hierarchical clustering programs first join the two genes with the highest correlation.
Converting gene IDs to gene names: http://llama.mshri.on.ca/funcassociate/
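The first step of hierarchical clustering described in the answer, joining the two most correlated genes, can be sketched as below. Using 1 minus the Pearson correlation as the distance is a common convention assumed here; the gene names and expression profiles are invented for illustration.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (non-zero variance assumed)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def first_merge(profiles):
    """Return the gene pair with the smallest correlation distance (1 - r)."""
    def distance(pair):
        return 1.0 - pearson(profiles[pair[0]], profiles[pair[1]])

    best = min(combinations(profiles, 2), key=distance)
    return best, distance(best)


# Hypothetical expression profiles across four samples.
profiles = {
    "gene1": [1.0, 2.0, 3.0, 4.0],
    "gene2": [2.0, 4.0, 6.0, 8.5],
    "gene3": [4.0, 3.0, 2.0, 1.0],
}
pair, dist = first_merge(profiles)
print(pair, dist)
```

In the tree, the height at which two branches join corresponds to this distance: gene1 and gene2 join almost at zero, while gene3, which is anti-correlated with both, joins much further out.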
41
FAQ: How to find transcriptional regulatory networks from expression data?
Reverse engineering using gene expression data: if a gene is upregulated following increased production of a transcription factor, or downregulated following a knockout of a transcription factor, a regulatory interaction between the two is inferred.
Inferring networks by predicting cis-regulatory elements: known transcription factor binding sites (TFBS) are used to make inferences about regulatory interactions. The set of genes predicted to have a binding site is hypothesized to be regulated by the corresponding transcription factor.
42
The ume6 regulon in yeast
43
FAQ: How to find transcriptional regulatory networks from expression data?
ARACNe (Algorithm for the Reconstruction of Accurate Cellular Networks) (Basso 2005, Margolin 2006a, 2006b).
Known transcription factors are given as hub genes to seed the system.
[Figure: ARACNe output in Cytoscape]
44
Networks: Take Whole Cell Approach
45
FAQ: GeneGo, IPA, NextBio: similarities and differences? How do I get an account?
Similarities
All are commercial, off-site tools for microarray/NGS data
Upload your gene lists to the analysis tool
The tool detects networks/ontologies in your data
They can give different results (!)
They allow you to look for connections between genes and drugs/small molecules/diseases
Focused on human and mouse
46
FAQ: GeneGo, IPA, NextBio: similarities and differences?
Differences
IPA: most user-friendly interface
NextBio: based on experimental analysis and comparisons; public data are reanalysed and made platform independent
GeneGo: more refined classification of diseases/networks; 3-click analysis gives a nice report; 50 PhD curators, more journals, higher data QC
47
Start a new core analysis
48
IPA determines functions
49
Overlay drug and disease data
50
NextBio
Compares your gene lists to the NextBio database
Can reveal unexpected similarities between datasets
Has a very good literature database connected to the results
Contains data from model organisms
51
The NextBio Report Page
52
What else does my gene do?
53
Resources
Many thanks for coming!
Website: http://bsrweb.burnham.org/
We are located in building 10, offices 2405/6.
Feel free to come and ask questions.