Global expression analysis


1 Global expression analysis
Monday 10/1: Intro; *1-page Project Overview due; Intro to R lab
Wednesday 10/3: Stats & FDR (*read the paper!)
Monday 10/8: Calling differentially expressed genes with baySeq (*read the paper!); baySeq lab for RNA-seq data
Wednesday 10/10: Clustering analysis
Monday 10/15: Clustering analysis; Clustering lab
Wednesday 10/17: Motif analysis
Monday 10/22: Motif analysis; Motif lab
Wednesday 10/24: ChIP/RIP/Nuc/Ect-Seq

2 Global expression analysis
Goal: To measure the transcript abundance of every gene in your organism at once ... AND make sense of it.
The power is in organizing genomic expression data to find meaningful patterns and groups of genes.

Gasch et al. 2000, 2001

4 What kinds of information can we extract from genomic expression data?
1. Hypothetical functions for uncharacterized genes
   -- genes encoding subunits of multi-subunit protein complexes are often highly coregulated
      example: ribosomal protein genes, proteasome genes in yeast
   -- genes involved in the same cellular processes are often coregulated
2. New roles for characterized genes
3. Better understanding of the experimental conditions
   -- based on expression patterns of characterized genes
4. Implications of gene regulation
   -- WT vs. mutants can identify transcription factor targets
   -- promoter analysis of coregulated genes = upstream elements
   -- gene coregulation with known pathway targets can implicate pathway activity
5. Understanding developmental pathways
6. Defining samples based on expression profiles
   example: comparing tumor samples from patients

5 Technologies for Quantifying & Identifying Nucleic Acids
DNA microarrays:
1. Collect RNA
2. Generate fluorescently labeled cDNA
3. Hybridize to array
4. Detect fluorescence emission with scanning laser
Data: Continuous measurements of relative fluorescence
Deep sequencing:
1. Collect RNA
2. Make strand-specific cDNA library
3. Deep sequence short reads
4. Relate sequences back to genome / transcriptome location (or de novo assembly)
Data: Number of sequencing reads at each base in the genome = discrete ‘counts’

6 Tiled-genome arrays cover the entire genome
Figure labels: ORF, mRNA, array probes

7 Tiled genomic arrays (Nimblegen, Affymetrix, Agilent)
Tiled sequences across each gene / locus
To get relative differences in expression across two samples:
1. Need to normalize signals across arrays
2. Need to compress measurements to a single score for each gene/transcript

8 Tiled genomic arrays (Nimblegen, Affymetrix, Agilent)
‘Robust Multiarray Analysis’ (RMA; Irizarry et al. 2003)
PM = ‘perfect match’ oligo
MM = ‘mismatch’ oligo (central nucleotide is mutated)
1. On Affy: Throw out elements where MM signal > PM signal ... but otherwise ignore MM
2. Local background subtraction from each probe intensity
3. Quantile normalization of arrays to be compared ... sets the distribution of probe intensities to be the same
4. Convert intensity values to log2 scale
5. Use a linear model to fit a given probe set and compute one expression value per gene
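Steps 3 and 4 of the RMA list above (quantile normalization, then log2 conversion) can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the RMA implementation itself: the function names and the toy intensities are made up, and ties are broken by probe order, which a real implementation would handle more carefully.

```python
import math


def quantile_normalize(arrays):
    """Give every array the same distribution of probe intensities.

    arrays: list of equal-length lists of positive intensities, one list
    per array. Ties are broken by probe order (a simplification).
    """
    n_probes = len(arrays[0])
    # Sort each array, then average across arrays at each rank.
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [
        sum(s[rank] for s in sorted_arrays) / len(arrays)
        for rank in range(n_probes)
    ]
    normalized = []
    for a in arrays:
        # order[rank] is the index of the probe that holds that rank.
        order = sorted(range(n_probes), key=lambda i: a[i])
        out = [0.0] * n_probes
        for rank, probe_index in enumerate(order):
            out[probe_index] = rank_means[rank]
        normalized.append(out)
    return normalized


def log2_transform(arrays):
    """Convert every intensity to the log2 scale."""
    return [[math.log2(v) for v in a] for a in arrays]


# Hypothetical probe intensities for two arrays (same probes, same order).
raw = [
    [2.0, 4.0, 8.0, 16.0],
    [32.0, 4.0, 16.0, 8.0],
]
normalized = quantile_normalize(raw)
log_intensities = log2_transform(normalized)
```

After normalization both arrays contain exactly the same set of values; only the assignment of values to probes differs, which is what "sets the distribution of probe intensities to be the same" means.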

9 Deep sequencing for gene expression analysis
mRNA
Old protocol: make ds cDNA
New protocols: 1st-strand cDNA (2nd strand with dUTP)
Sequence
Number of sequencing reads per region ~= number of starting transcripts

10 Number of sequencing reads per region ~= number of starting transcripts
* But sometimes one lane of sequencing works better than others.
Simple normalization: Avg counts within gene length / Total counts in that lane
BUT ... have to account for the length of the gene/transcript:
RPKM: Reads Per Kb per Million mapped reads
$$ \mathrm{RPKM} = \frac{\text{counts per base pair}}{\text{total reads in lane}} \times 10^{3} \times 10^{6} $$
Slide example (values only partly recoverable): 40 x [...] x 10^6
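As a minimal sketch of the RPKM definition on this slide (reads per kilobase of transcript per million mapped reads), the calculation is a single expression. The function name and the example numbers below are illustrative assumptions, not values from the lecture.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript per Million mapped reads.

    Counts per base pair divided by total reads in the lane, rescaled
    by 10^3 (bp to kb) and 10^6 (reads to millions of reads).
    """
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical example: 500 reads on genes of different lengths,
# in a lane with 10 million mapped reads.
long_gene = rpkm(read_count=500, gene_length_bp=2000, total_mapped_reads=10_000_000)
short_gene = rpkm(read_count=500, gene_length_bp=1000, total_mapped_reads=10_000_000)
```

The same raw count gives twice the RPKM for the gene that is half as long, which is the length correction the slide asks for.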

11 Another challenge: mapping reads to the genome/transcriptome
Figure labels: DNA, intron, spliced transcript
Should you restrict yourself to ORF annotations?
Can map reads to the genome or transcriptome sequence, or assemble de novo.

12 Comparing samples via fold-changes:
The ratio of RPKM values across samples reflects differential expression
Usually work in log2 space
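A short sketch of a log2 fold change between two samples follows. The pseudocount is an assumption added here so that a gene with zero reads in one sample does not cause a division by zero or log of zero; the slide does not specify how zeros are handled, and the function name is hypothetical.

```python
import math


def log2_fold_change(value_a, value_b, pseudocount=1.0):
    """log2 of (sample B / sample A) for one gene.

    pseudocount is an illustrative assumption, not part of the lecture.
    """
    return math.log2((value_b + pseudocount) / (value_a + pseudocount))


# Hypothetical RPKM values for three genes in samples A and B.
up = log2_fold_change(1.0, 7.0)       # (7 + 1) / (1 + 1) = 4-fold up
unchanged = log2_fold_change(3.0, 3.0)
down = log2_fold_change(15.0, 3.0)    # (3 + 1) / (15 + 1) = 4-fold down
```

Working in log2 space makes up- and down-regulation symmetric: a 4-fold increase is +2 and a 4-fold decrease is -2.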

13 Now each sample = list of normalized relative transcript values
Figure labels: Array 1, Array 2

14 Assessing replicates: how well do the data agree overall?
linear regression
Where does the noise come from?
-- can be biological variation
-- can be array artifacts
... should define both types of variation ...
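One way to put a number on replicate agreement is an ordinary least-squares fit of one replicate against the other. The sketch below uses only the standard library; the function name and the replicate values are hypothetical, and the slide does not say which regression variant was used in class.

```python
def fit_line(x, y):
    """Least-squares fit y = slope * x + intercept, plus R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical log2 expression values for the same genes in two replicates.
replicate_1 = [1.0, 2.0, 3.0, 4.0]
replicate_2 = [1.1, 1.9, 3.2, 3.8]
slope, intercept, r_squared = fit_line(replicate_1, replicate_2)
```

Well-behaved replicates should give a slope near 1, an intercept near 0, and R^2 close to 1; large departures point to the biological or technical noise sources listed on the slide.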

15 Now you have your data, in the form of relative log2 expression differences.
Now what?

16 Select differentially expressed genes to focus on
Methods of gene selection:
-- arbitrary fold-expression-change cutoff
   example: genes that change >3X in expression between samples
-- statistically significant change in expression
   requires replicates
Figure labels: Gene X expression under condition 1, Gene X expression under condition 2, expression difference

17 Select differentially expressed genes to focus on
Methods of gene selection:
-- arbitrary fold-expression-change cutoff
   example: genes that change >3X in expression between samples
-- statistically significant change in expression
   requires replicates
Figure labels: Gene X expression under condition 1, Gene X expression under condition 2, expression difference

18 Select differentially expressed genes to focus on
Methods of gene selection:
-- arbitrary fold-expression-change cutoff
   example: genes that change >3X in expression between samples
-- statistically significant change in expression
   requires replicates
Use statistics to compare the mean & variation of 2 (or more) populations
Figure label: expression difference
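The arbitrary fold-change cutoff from these slides can be sketched as a simple filter on log2 ratios. Treating the >3X threshold as symmetric (3-fold up or 3-fold down) is an assumption made here, and the gene names and values are hypothetical.

```python
import math


def select_by_fold_change(log2_ratios, fold_cutoff=3.0):
    """Return genes whose expression changes more than fold_cutoff-fold.

    log2_ratios: dict mapping gene name to log2(condition 2 / condition 1).
    The cutoff is applied in both directions (an assumption).
    """
    threshold = math.log2(fold_cutoff)
    return [gene for gene, ratio in log2_ratios.items() if abs(ratio) > threshold]


# Hypothetical log2 expression differences.
log2_ratios = {"geneA": 2.0, "geneB": -0.5, "geneC": -1.8, "geneD": 1.5}
selected = select_by_fold_change(log2_ratios)
```

Here geneD changes about 2.8-fold (log2 ratio 1.5) and narrowly misses the cutoff, which illustrates why the slide calls this kind of threshold arbitrary.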

19 Test if the means of 2 (or more) groups are the same or statistically different
The ‘null hypothesis’ H0 says that the two groups are statistically the same
-- you will either reject or fail to reject (loosely, ‘accept’) the null hypothesis
Choosing the right test:
-- parametric test if your data are normally distributed with equal variance
-- nonparametric test if those assumptions do not hold
Why do the data need to be normally distributed?
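To make "nonparametric" concrete, here is a sketch of an exact permutation test on the difference in means. It is one example of a test that does not assume normality, chosen here for illustration; the slide does not name a specific nonparametric test, and the function name and data are hypothetical. Enumerating every regrouping is only practical for small samples.

```python
from itertools import combinations


def exact_permutation_p_value(group_1, group_2):
    """Two-sided exact permutation p-value for a difference in means.

    Every way of splitting the pooled values into groups of the original
    sizes is enumerated, so no distributional assumption is needed.
    """
    pooled = list(group_1) + list(group_2)
    n1 = len(group_1)
    n2 = len(pooled) - n1
    total = sum(pooled)

    def abs_mean_diff(sum_1):
        return abs(sum_1 / n1 - (total - sum_1) / n2)

    observed = abs_mean_diff(sum(group_1))
    n_extreme = 0
    n_total = 0
    for indices in combinations(range(len(pooled)), n1):
        n_total += 1
        # Small tolerance guards against floating-point round-off.
        if abs_mean_diff(sum(pooled[i] for i in indices)) >= observed - 1e-12:
            n_extreme += 1
    return n_extreme / n_total


# Hypothetical expression values for one gene under two conditions.
p_value = exact_permutation_p_value([1.0, 2.0, 3.0], [7.0, 8.0, 9.0])
```

With three replicates per group there are only 20 possible regroupings, so the smallest achievable two-sided p-value is 2/20 = 0.1. This is one reason replicate number matters so much for nonparametric approaches.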

20 Test if the means of 2 groups are the same or statistically different
The ‘null hypothesis’ H0 says that the two groups are statistically the same
-- you will either reject or fail to reject (loosely, ‘accept’) the null hypothesis
If your two samples are normally distributed with equal variance, use the t-test:
$$ T = \frac{\bar{X}_1 - \bar{X}_2}{\mathrm{SED}} = \frac{\text{difference in the means}}{\text{standard error of the difference in the means}} $$
If |T| > T_c, where T_c is the critical value for the degrees of freedom & confidence level, then reject H0
Notice that if the data aren’t normally distributed, the mean and standard deviation are not meaningful summaries.
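The t statistic on this slide can be computed directly with the standard library. The sketch below uses the pooled (equal-variance) form that the slide's assumptions imply; the function name and sample values are hypothetical, and the critical value 2.776 is the standard two-sided 95% value for 4 degrees of freedom, supplied by hand rather than looked up in code.

```python
import math
from statistics import mean, variance


def pooled_t_statistic(sample_1, sample_2):
    """Two-sample t statistic assuming equal variances.

    Returns (T, degrees_of_freedom), where T = (mean1 - mean2) / SED.
    """
    n1, n2 = len(sample_1), len(sample_2)
    df = n1 + n2 - 2
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * variance(sample_1) + (n2 - 1) * variance(sample_2)) / df
    sed = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(sample_1) - mean(sample_2)) / sed, df


# Hypothetical log2 expression of gene X under two conditions, 3 replicates each.
condition_1 = [5.1, 4.9, 5.0]
condition_2 = [7.0, 7.2, 6.8]
t_stat, df = pooled_t_statistic(condition_1, condition_2)

T_CRITICAL = 2.776  # two-sided, 95% confidence, 4 degrees of freedom
reject_null = abs(t_stat) > T_CRITICAL
```

The absolute value is used because the sign of T only reflects which condition is listed first.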

21 Differential expression on DNA microarrays: Bioconductor package limma (ref)
** See previous years’ limma lab for a walk-through example
1. Load your data
2. Provide a ‘target’ file that says which samples are on which arrays
3. Provide a ‘design’ file (and in some cases a ‘contrast matrix’) to specify which samples you want to compare
4. Limma will look at the entire dataset and model the error in the data, to try to overcome measurement error
5. Limma then does a modified (moderated) t-test to identify genes with significant expression differences across the samples you specified