Roy Williams PhD Sanford | Burnham Medical Research Institute.



The Omics Revolution

Omics Biology Methods
Expression data, mRNA/microRNA (microarrays, RNA-seq): a tool for studying how large numbers of genes interact with each other and how a cell's regulatory networks control vast batteries of genes simultaneously.
Proteomics: the study of the full set of proteins encoded by a genome.
ChIP-chip, ChIP-seq: used to investigate interactions between proteins and DNA in vivo.
Epigenetics: the study of changes in phenotype (appearance) or gene expression caused by mechanisms other than changes in the underlying DNA sequence, e.g. DNA methylation.
Metabolomics: the systematic study of the unique chemical fingerprints (metabolite profiles) that specific cellular processes leave behind.

Omics Workflows Converge
Sequencing: acquire sequence; generate bases, FASTQ; count reads (align, annotate).
Arrays: hybridize array; scan array; determine intensity.
Mass spectrometry (MS): acquire peptide sequence; spectral counts, align.
All workflows converge on interpreting the data: statistical test, visualization, pathway analysis.

Measuring Gene Expression
Idea: measure the amount of mRNA to see which genes are being expressed in (used by) the cell. Measuring protein would be more direct, but is currently harder.

Assumption of microarray/RNA-seq technology
The abundance of an mRNA transcript is used as a measure of expression of the corresponding gene.
Transcript abundance is assumed to be proportional to the degree of gene expression.

How to measure RNA abundance
RNA-seq: short reads from SOLiD, Ion Torrent, 454, Solexa. Align and count reads, then normalize the counts to gene length and sequencing depth (RPKM).
Illumina bead array: highly redundant oligo array (also Affymetrix, NimbleGen).
Spotted 2-colour array (very long cDNA; low redundancy).
SAGE (random Sanger sequencing of cDNA library).
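The RPKM normalization named on this slide can be sketched in a few lines. This is a minimal illustration, assuming the standard definition (reads per kilobase of transcript per million mapped reads); the function name, gene names, and counts are made up for the example and do not come from the presentation.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million)
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical data: (read count, gene length in bp) per gene.
genes = {"geneA": (500, 2000), "geneB": (500, 500)}
total_reads = 10_000_000

for name, (count, length) in genes.items():
    print(f"{name}: {rpkm(count, length, total_reads):.1f}")
# geneA: 25.0
# geneB: 100.0
```

Both genes have the same raw count, but the shorter gene gets the higher RPKM, which is the point of normalizing to gene length.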

RNA-seq for digital expression

The Illumina BeadArray Technology
Highly redundant: ~50 copies of a bead.
60mer oligos.
Absolute expression.
Each array is deconvoluted using a color coding tag system.
Human, Mouse, Rat, Custom.

Affymetrix Technology
Highly redundant (~25 short oligos per gene).
Absolute expression.
PM-MM oligo system valuable for detecting cross-hybridisation.
Human, Mouse, E. coli, Yeast…
Affy and Illumina arrays have been systematically compared.

Spotted Arrays
Low redundancy.
cDNA and oligo.
Two dyes: Cy5/Cy3.
Relative expression.
Cost and custom.

Microarrays in action
[Figure labels: off, on.]

The Application of Expression Studies
Differential gene expression between two (or more) sample types.
Similar gene expression across treatments.
Tumour sub-class identification using gene expression profiles.
Identification of “marker” genes that characterize different cell types.
Identification of genes associated with clinical outcomes (e.g. survival).

Experimental Design
Design experiment. Biological replicates: 2x, 3 chips; <2x, 5 chips.
Perform experiment. Standardize conditions; batch effects; dump outliers.

Transcriptome Data Analysis Workflow
Quality control.
Normalize data.
Set up experimental data.
Filter for differential expression.
Advanced analysis techniques: clustering.
Compare results to biology: NextBio, GeneGo, IPA.

Analysis Software for Microarray and RNA-seq data
GenePattern / IGV (free): powerful, many plug-in packages and pipelines; good video examples/tutorials.
GeneSpring GX11 (commercial, 1 public copy).
Partek (commercial).
geWorkbench (free, nice workflows).
R/Bioconductor (free, with guidance).
Cytoscape, GSEA: for pathway visualisation.
IPA, NextBio, GeneGo: Burnham subscriptions!

Log Transformed Data
2/2 = 1; log2(1) = 0
4/1 = 4; log2(4) = +2
1/4 = 0.25; log2(0.25) = -2
Transformation is often performed before normalisation.
[Figure labels: skewed raw data; normal distribution.]
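The three worked ratios on this slide can be reproduced directly. A minimal sketch using only the Python standard library; the function name `log2_ratio` is an illustrative choice, not something from the presentation.

```python
import math


def log2_ratio(sample, reference):
    """Log2 of the expression ratio sample/reference."""
    if sample <= 0 or reference <= 0:
        raise ValueError("intensities must be positive to take a logarithm")
    return math.log2(sample / reference)


# The three cases from the slide: no change, 4-fold up, 4-fold down.
for sample, reference in [(2, 2), (4, 1), (1, 4)]:
    print(f"{sample}/{reference} -> log2 ratio {log2_ratio(sample, reference):+.0f}")
# 2/2 -> log2 ratio +0
# 4/1 -> log2 ratio +2
# 1/4 -> log2 ratio -2
```

On the log2 scale a 4-fold increase and a 4-fold decrease are the same distance from zero, which is why the transformation makes up- and down-regulation symmetric.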

Boxplot representation of data spread
After QC for low-confidence genes (P<0.99). Note: ~50 replicate beads per array.
[Figure: boxplot of signal intensity by chip number, marking the median, the 25% and 75% quartiles, outliers, and a bad chip.]
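The quantities marked on the boxplot (median, quartiles, outliers) can be computed per chip as follows. This is a sketch under stated assumptions: the slide does not say how outliers were defined, so the common 1.5 x IQR rule is assumed here, and the signal values are invented for illustration.

```python
import statistics


def boxplot_summary(values, whisker=1.5):
    """Quartiles, median, and outliers for one chip's signal intensities.

    Outliers are values beyond whisker * IQR from the quartiles
    (an assumed convention, not taken from the slides).
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whisker * iqr
    high_fence = q3 + whisker * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {"q1": q1, "median": median, "q3": q3, "outliers": outliers}


# Made-up intensities for a single chip; 100 is an obvious outlier.
chip_signal = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(boxplot_summary(chip_signal))
```

Running the same summary on every chip and comparing medians and quartile ranges is one simple way to spot a chip whose distribution sits apart from the rest.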

The effect of quantile normalisation on the filtered 36 data sets
IMPORTANT: use non-linear normalisation.
> library(affy)
> Qdata <- normalize.quantiles(Rawdata)
[Figure label: all same range.]
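To show what quantile normalisation does to the data, here is a minimal pure-Python sketch of the basic algorithm. It is not the Bioconductor implementation called on the slide: ties are broken by position rather than averaged, and the function name and example values are illustrative assumptions.

```python
def quantile_normalize(arrays):
    """Quantile-normalize a list of equal-length arrays (one list per chip).

    Each value is replaced by the mean, across chips, of the values
    that share its rank. Ties are broken by position (a simplification).
    """
    n_probes = len(arrays[0])
    if any(len(a) != n_probes for a in arrays):
        raise ValueError("all arrays must have the same number of probes")

    # Mean of the sorted values at each rank, across all chips.
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [
        sum(a[rank] for a in sorted_arrays) / len(arrays)
        for rank in range(n_probes)
    ]

    normalized = []
    for a in arrays:
        # Probe indices ordered from lowest to highest intensity.
        order = sorted(range(n_probes), key=lambda i: a[i])
        out = [0.0] * n_probes
        for rank, probe_index in enumerate(order):
            out[probe_index] = rank_means[rank]
        normalized.append(out)
    return normalized


# Two made-up chips with three probes each.
raw = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
print(quantile_normalize(raw))
# [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

After normalisation every chip contains exactly the same set of values, only in a different probe order, which is why all boxplots end up with the same range.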

Data Analysis Examples
1# Illumina arrays with GeneSpring GX11
2# Affymetrix data, with a GenePattern module
Steps: import, quality control, normalize; detect differentially expressed genes; pathway analysis.

Illumina Analysis Workflow
GenomeStudio application: process binary .idat files to txt. Normalisation here is optional.
Check array hybridisation quality.
Direct export of the file as “sample probe profile”.
Import into GeneSpring GX11.

GeneSpring GX11 features
Guided workflows, pathways, GSEA, IPA integration, ontologies, MySQL, R script API.

Illumina Advanced Workflow

Grouping Sample Replicates

QC: Check Replicates Are Similar

Scatterplot of replicates

Scatterplot of differently treated samples

Filter genes on P-value

Filter for significantly different genes in a Volcano plot

Significant Pathway Determination

Which types of genes are enriched in a cluster?
Idea: compare your cluster of genes with lists of genes with common properties (function, expression, location).
Find how many genes overlap between your cluster and a gene list.
Calculate the probability of obtaining the overlap by chance. This measures whether the enrichment is significant.
This analysis provides an unbiased way of detecting connections between expression and function.
[Figure labels: GeneOntology cell cycle; our cell cycle; 15000.]
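The "probability of obtaining the overlap by chance" is commonly computed with a hypergeometric test. The slide does not name the test used, so the sketch below is an assumption; the 15000 total is taken from the figure label and is assumed to be the number of genes measured, and the other counts are invented.

```python
from math import comb


def overlap_p_value(total_genes, list_size, cluster_size, overlap):
    """P(overlap >= observed) when drawing cluster_size genes at random.

    total_genes  -- genes measured on the platform
    list_size    -- genes in the annotated list (e.g. a GO category)
    cluster_size -- genes in your cluster
    overlap      -- genes present in both
    """
    max_overlap = min(list_size, cluster_size)
    # Count the ways to obtain each overlap from 'overlap' up to the maximum.
    favourable = sum(
        comb(list_size, k) * comb(total_genes - list_size, cluster_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return favourable / comb(total_genes, cluster_size)


# Hypothetical numbers: 15000 genes, a 300-gene category,
# a 100-gene cluster, and 10 genes shared between them.
print(overlap_p_value(15000, 300, 100, 10))
```

By chance alone a 100-gene cluster would be expected to share about 2 genes with a 300-gene category (100 x 300 / 15000), so a small p-value for an overlap of 10 indicates enrichment. When many gene lists are tested, the p-values also need multiple-testing correction.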

Automatically Send list to IPA for pathway Analysis

Significant Pathways sent to Ingenuity Pathway Analysis

Completed Analysis: gene lists, data, pathways

Affymetrix Workflow: GenePattern

Comparative Marker Selection

Paste the URLs for Data files

Send results to next module: Viewer module

Outputs ranked list of genes
The list of marker genes can be filtered and exported.

Cluster FAQ: what does the hierarchy of black lines making up the tree mean? How do I interpret them?
Answer: The vertical black lines joining related genes together represent the correlation distance between those genes. Hierarchical clustering programs first join the two genes with the highest correlation.
(Converting gene IDs to gene names.)
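The first step described in the answer, joining the two genes with the highest correlation, can be sketched as below. This assumes Pearson correlation, with correlation distance taken as 1 - r; the slide does not specify either, and the gene names and expression profiles are invented.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient between two expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def closest_pair(profiles):
    """Return the pair of genes with the highest correlation."""
    return max(
        combinations(sorted(profiles), 2),
        key=lambda pair: pearson(profiles[pair[0]], profiles[pair[1]]),
    )


# Made-up expression profiles across four samples.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.5],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}
first_join = closest_pair(profiles)
distance = 1 - pearson(profiles[first_join[0]], profiles[first_join[1]])
print(first_join, f"correlation distance = {distance:.4f}")
```

A full hierarchical clustering repeats this step, treating each joined pair as a new cluster, until everything is connected into a single tree.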

FAQ: How to find transcriptional regulatory networks from expression data?
Reverse engineering using gene expression data: if a gene is upregulated following increased production of a transcription factor, or down-regulated following a knockout of a transcription factor, a regulatory interaction between the two is inferred.
Inferring networks by predicting cis-regulatory elements: known transcription factor binding sites (TFBS) are used to make inferences about regulatory interactions. The set of genes predicted to have a binding site is hypothesized to be regulated by the corresponding transcription factor.
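The first approach, inferring an interaction from how genes respond to a transcription factor perturbation, can be sketched as a simple rule. This is only an illustration: the log2 fold-change threshold of 1.0, the function name, and the gene names are assumptions, and real methods also test for statistical significance.

```python
def infer_targets(tf, perturbation, log2_fold_changes, threshold=1.0):
    """Infer regulatory edges from one transcription factor perturbation.

    perturbation      -- "overexpression" or "knockout"
    log2_fold_changes -- gene -> log2(perturbed / control)
    threshold         -- minimum absolute log2 change (assumed cut-off)
    """
    if perturbation not in ("overexpression", "knockout"):
        raise ValueError("perturbation must be 'overexpression' or 'knockout'")
    edges = []
    for gene, change in log2_fold_changes.items():
        if gene == tf or abs(change) < threshold:
            continue
        # A gene that rises with more TF, or falls without it, looks activated.
        went_up = change > 0
        activated = went_up if perturbation == "overexpression" else not went_up
        edges.append((tf, "activates" if activated else "represses", gene))
    return edges


# Hypothetical response to overexpressing a factor called TF1.
response = {"geneX": 2.3, "geneY": -1.8, "geneZ": 0.2}
print(infer_targets("TF1", "overexpression", response))
# [('TF1', 'activates', 'geneX'), ('TF1', 'represses', 'geneY')]
```

Edges found this way may be indirect, which is one reason to combine them with the binding-site predictions described in the second approach.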

The ume6 regulon in yeast

FAQ: How to find transcriptional regulatory networks from expression data?
ARACNe (Algorithm for the Reconstruction of Accurate Cellular Networks) (Basso 2005; Margolin 2006a, 2006b).
Known transcription factors are given as hub genes to seed the system.
[Figure: ARACNe output in Cytoscape.]

Networks: Take Whole Cell Approach

FAQ: GeneGo, IPA, NextBio: similarities and differences? How do I get an account?
Similarities
All are commercial off-site tools for microarray/NGS data.
Upload your gene lists to the analysis tool.
The tool detects networks/ontologies in your data.
They can give different results (!)
They allow you to look for connections between genes and drugs/small molecules/diseases.
Focused on man and mouse.

FAQ: GeneGo, IPA, NextBio: similarities and differences?
Differences
IPA: most user-friendly interface.
NextBio: based on experimental analysis and comparisons; reanalysis of public data, made platform independent.
GeneGo: more refined classification of diseases/networks; 3-click analysis gives a nice report; 50 PhD curators, more journals, higher data QC.

Start a new core analysis

IPA determines functions

Overlay drug and disease data

NextBio
Compares your gene lists to the NextBio database.
Can reveal unexpected similarities between datasets.
Has a very good literature database connected to the results.
Contains data from model organisms.

The NextBio Report Page

What else does my gene do?

Resources
Many thanks for coming!
Website:
We are located in building 10, Offices 2405/6.
Feel free to come and ask questions.