Transcriptomics History and practice.


Early RNA analysis used Northerns, one gene at a time: take a tissue sample (WT, dwarf, transgenic, other species), extract the target RNA, label a probe for YFG (your favourite gene) and hybridise, quantify RNA levels, then move on to the next gene.

Northerns are too slow for Systems Biology, where we want to assay ALL transcripts simultaneously. This means massive datasets for thousands of genes. Genes, proteins and metabolites link together into biological SYSTEMS.

EXAMPLE: Cytoscape software allows the visualisation of all transcript levels for an organism. Arabidopsis merged network: 19392 nodes and 72715 edges, with proteins (red), metabolites (blue) and genes (green). This one is based on ARRAY data. Arabidopsis transcriptome network (Ma et al. Genome Research 2007)

Mass transcript profiling: Transcriptomics
Historically (pre-2000): sequencing ESTs and ranking representation; differential display (random 5’ primers + fixed polyA primers).
Post-2000: microarrays and RNAseq.

Microarrays
Probe preparation: acquire or generate probes (‘all the genes you want’) and spot them.
Target preparation: extract RNA from your control AND your experimental plant, then label cDNA from sample 1 RNA and from sample 2 RNA.

Microarrays: hybridise and scan. Identify ‘spots’, remove background, produce ‘red/green’ ratios. Link ratio to relative abundance. Link spot to gene. Link genes to each other: networks / systems.
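The background removal and red/green ratio steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the software used for the slides: the function name log2_ratio, its arguments, the red-over-green orientation and the example intensities are all hypothetical.

```python
import math

def log2_ratio(red, red_bg, green, green_bg):
    """Return log2 of the background-subtracted red/green ratio for one spot.

    Returns None when either corrected signal is not positive, because the
    ratio is then undefined and the caller would need to flag the spot.
    """
    red_signal = red - red_bg        # remove local background, red channel
    green_signal = green - green_bg  # remove local background, green channel
    if red_signal <= 0 or green_signal <= 0:
        return None
    return math.log2(red_signal / green_signal)

# Hypothetical spot: raw intensity and local background for each channel.
print(log2_ratio(1100, 100, 350, 100))
```

On a log2 scale a two-fold increase and a two-fold decrease sit at +1 and -1, so up- and down-regulation are symmetric around zero.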

Before processing, we have a LOT of spots (figure labels: ‘landing lights’; xyz normalisation). After processing, we have a LOT of objective data.

Learning outcome: What biological questions can be explored with transcriptomics?

Arrays can separate similar genes
Pretend specialist microarray: only 5 genes (1 2 3 4 5), ALL responding to a hormone. Plus hormone vs control (i.e. known / expected challenge): all ‘on’.
The classic types of array experiments:
1. Normal vs challenge (e.g. pathology, induction)
2. Tissue A vs Tissue B (e.g. muscle vs liver)

Remember: genomes are not tidy. Duplication is common in plant (Arabidopsis), fungal (yeast) and animal (human) genomes. This is a big problem for arrays: cross-hybridisation.

Apart from gross syntenic duplication, gene families (recycling of function) are common, e.g. in Arabidopsis:

Gene family size            Unique   2       3     4      5      >5
Proportion of the genome    35%      12.5%   7%    4.4%   3.6%   37.4%

Conservation at the base-pair level within genes: 37% of genes highly conserved (TBLASTX E < 10^-30); 10% partially conserved (TBLASTX E < 10^-5).

Pioneer arrays were cDNAs: derived from mRNA copied into cDNA by reverse transcriptase and cloned. Clones were selected based on partial sequence primed from vector cloning sites (e.g. SP6, T7, T3). These are commonly called ESTs (Expressed Sequence Tags).

ESTs can be misleading
Gene of interest, with three example ESTs: EST sequence 1, homologous EST sequence 2, dissimilar EST sequence 3. On the slide (spots 1 2 3), the labelled target cross-hybridises.

Genechips have better specificity
Known gene sequences (5’ to 3’), then algorithmic selection, then multiple short probes (25-mers), then hybridisation.

Example single colour target labelling - 3’ IVT
RNA target preparation:
1. RNA isolation is the first step. 1-2 hours
2. The messenger RNA is then reverse transcribed into cDNA (we then go on to make the second strand of cDNA). 4 hours
3. An in vitro transcription reaction using biotinylated nucleotides (Biotin-UTP, Biotin-CTP) is then done to both amplify and label the transcripts. 4-6 hours
4. The biotin-labelled cRNA is then fragmented (heat, Mg2+) to get a more efficient hybridisation (30-100 bases is the goal). 1 hour
5. The fragmented target is then hybridised overnight to a GeneChip expression array. 16 hours
6. When washing and staining of the array is complete it can then be scanned. 1-2 hours
Diagram labels: RNA (AAAA); cDNA; IVT or WT; biotin-labelled transcripts; fragmented cRNA; hybridise (16 hours); wash & stain; scan.

Detection: hybridisation and staining. The biotin-labelled cRNA target is hybridised to the array, followed by antibody detection.

Each probe call is derived from the 75% quantile of the pixel values (sweet spot). All the probes of a probeset (gene) are combined into ONE measure of expression
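As a rough sketch of the two steps on this slide, the Python below takes the 75% quantile of one probe cell's pixels and then collapses a probeset into a single value. The function names and example numbers are hypothetical, and the summary used here (median of log2 probe calls) is an assumption chosen for illustration; the slide does not say how the probes are combined.

```python
import math
import statistics

def probe_call(pixel_values):
    """75% quantile of the pixel intensities in one probe cell.

    method="inclusive" treats the pixels as the whole population; other
    quantile definitions give slightly different values.
    """
    return statistics.quantiles(pixel_values, n=4, method="inclusive")[2]

def probeset_expression(probe_calls):
    """Combine all probe calls of a probeset into ONE expression value.

    ASSUMPTION: median of log2 probe calls, used only as a simple robust
    summary for this sketch.
    """
    return statistics.median(math.log2(call) for call in probe_calls)

# Hypothetical pixel intensities for one probe cell.
print(probe_call([10, 20, 30, 40, 50]))
# Hypothetical probe calls for one probeset.
print(probeset_expression([4, 8, 16]))
```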

Data handling: chips need to be normalised against each other. Each different colour line maps all the intensities of a single chip. They are NOT coincident lines (e.g. yellow and black are outliers). To compare chips, their intensity distributions need to be comparable.

Normalisation
RMA is a very powerful but simple process that works at the probe level. Its quantile normalisation step is shown here for five probes (PA to PE) on three chips.

Raw intensities:
          PA     PB     PC     PD     PE
Chip 1    1      2      4      3      5
Chip 2    7      2      5      3      1
Chip 3    5      3      4      2      9

1. Order by ranks (sort each chip):
          Rank 1  Rank 2  Rank 3  Rank 4  Rank 5
Chip 1    1       2       3       4       5
Chip 2    1       2       3       5       7
Chip 3    2       3       4       5       9

2. Average the intensities at each rank:
          Rank 1  Rank 2  Rank 3  Rank 4  Rank 5
Mean      1.33    2.33    3.33    4.67    7

3. Reorder by probe (each value is replaced by the mean for its rank):
          PA     PB     PC     PD     PE
Chip 1    1.33   2.33   4.67   3.33   7
Chip 2    7      2.33   4.67   3.33   1.33
Chip 3    4.67   2.33   3.33   1.33   7
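The rank-averaging procedure on this slide can be written as a short pure-Python sketch. It is an illustration of quantile normalisation only, not the RMA implementation in Bioconductor: the function name and dictionary layout are assumptions, and ties within a chip are broken by probe order rather than averaged.

```python
def quantile_normalise(chips):
    """Quantile-normalise intensities across chips.

    chips maps a chip name to a list of intensities, with the same probe
    order on every chip. Returns a dict of the same shape.
    """
    n_probes = len(next(iter(chips.values())))
    # Step 1: order each chip by rank.
    sorted_chips = [sorted(values) for values in chips.values()]
    # Step 2: average the intensities at each rank across chips.
    rank_means = [
        sum(chip[rank] for chip in sorted_chips) / len(sorted_chips)
        for rank in range(n_probes)
    ]
    # Step 3: reorder by probe, giving each probe the mean for its rank.
    normalised = {}
    for name, values in chips.items():
        order = sorted(range(n_probes), key=lambda i: values[i])
        out = [0.0] * n_probes
        for rank, probe_index in enumerate(order):
            out[probe_index] = rank_means[rank]
        normalised[name] = out
    return normalised

# Intensities for probes PA..PE, taken from the slide.
raw = {
    "Chip 1": [1, 2, 4, 3, 5],
    "Chip 2": [7, 2, 5, 3, 1],
    "Chip 3": [5, 3, 4, 2, 9],
}
for chip, values in quantile_normalise(raw).items():
    print(chip, [round(v, 2) for v in values])
```

After normalisation every chip holds the same set of values (the rank means), only assigned to different probes, which is what makes the chips directly comparable.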

RMA normalisation makes data more comparable, so we can derive / display differentially expressed genes (volcano plot, trend graph) as candidates for further research.
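A volcano plot places each gene at its log2 fold change (x) against -log10 of its p-value (y), so strongly changed and statistically supported genes sit in the upper corners. The sketch below computes those coordinates and flags candidates; the function name and the cut-offs (absolute log2 fold change of at least 1, p-value of at most 0.05) are illustrative assumptions, not values from the slides.

```python
import math

def volcano_point(log2_fold_change, p_value, min_abs_lfc=1.0, max_p=0.05):
    """Return (x, y, is_candidate) for one gene on a volcano plot.

    The thresholds are hypothetical defaults chosen for this example.
    """
    x = log2_fold_change
    y = -math.log10(p_value)
    is_candidate = abs(x) >= min_abs_lfc and p_value <= max_p
    return x, y, is_candidate

# Hypothetical genes: (name, log2 fold change, p-value).
genes = [("gene1", 2.0, 0.001), ("gene2", 0.2, 0.001), ("gene3", -1.5, 0.5)]
for name, lfc, p in genes:
    print(name, volcano_point(lfc, p))
```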

RNAseq, a complementary solution in 3 ‘simple’ steps. Take an RNA sample, then:
1. Sequence it and align the reads to the genome.
2. Count how many times each transcript appears.
3. Work out the frequency of each transcript.
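Steps 2 and 3 amount to counting and dividing. As a minimal sketch, assume alignment has already produced one transcript name per read; the function name and the read list below are hypothetical, and real pipelines also have to handle unaligned and multi-mapping reads.

```python
from collections import Counter

def transcript_frequencies(read_assignments):
    """Count reads per transcript and convert the counts to frequencies.

    read_assignments holds one transcript name per aligned read.
    Returns (counts, frequencies).
    """
    counts = Counter(read_assignments)        # step 2: count
    total = sum(counts.values())
    frequencies = {name: n / total for name, n in counts.items()}  # step 3
    return counts, frequencies

# Hypothetical alignment result: the transcript each read mapped to.
reads = ["geneA", "geneB", "geneA", "geneC"]
counts, frequencies = transcript_frequencies(reads)
print(counts)
print(frequencies)
```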

RNAseq software: Tuxedo suite (2012)
Bowtie (2009)*: ultrafast short read alignment. Aligns short DNA reads at 25 million 35-bp reads per hour. Named after the Burrows-Wheeler transform algorithm (BWT).
TopHat: alignment of short RNA-Seq reads. Aligns RNA-Seq reads to genomes using Bowtie. Identifies splice junctions between exons.
Cufflinks (includes cuffmerge, cuffcompare, cuffdiff): uses the TopHat alignments to assemble the ‘best’ transcriptome. Estimates relative abundance based on how many reads support each transcript.
CummeRbund: visualisation of RNA-Seq analysis. R package for Cufflinks RNA-Seq output.

FPKM: Fragments Per Kilobase per Million reads
FDR-adjusted p-values (q-values); replication
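Both quantities can be illustrated with a short sketch. FPKM scales a fragment count by transcript length (in kilobases) and by sequencing depth (in millions of mapped fragments). For the q-values, the Benjamini-Hochberg procedure is used here as one common FDR adjustment; that choice, the function names and the example numbers are assumptions for this illustration, not details taken from the slides.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments Per Kilobase of transcript per Million mapped fragments."""
    kilobases = transcript_length_bp / 1_000
    millions = total_mapped_fragments / 1_000_000
    return fragments / (kilobases * millions)

def bh_qvalues(p_values):
    """Benjamini-Hochberg FDR-adjusted p-values, in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    q_values = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so that q-values stay monotonic.
    for rank in range(n, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * n / rank)
        q_values[index] = running_min
    return q_values

# Hypothetical transcript: 500 fragments, 2 kb long, 10 million mapped fragments.
print(fpkm(500, 2_000, 10_000_000))
# Hypothetical p-values for four genes.
print(bh_qvalues([0.01, 0.04, 0.03, 0.005]))
```

Dividing by length matters because a long transcript yields more fragments than a short one at the same abundance; dividing by depth lets samples sequenced to different depths be compared.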

[Figure panels: Mutant; Control]

Once we have candidates, we can discover their function. Figure: GO annotations of genes higher in muscle and GO annotations of genes higher in liver. Graham et al. (2011) Animal

...and use these to allocate the differentially expressed genes to pathways and biological systems.