Bulk RNA-Seq Analysis Using CLC Genomics Workbench
December 11, 2018
Ansuman Chattopadhyay, PhD
Assistant Director, Molecular Biology Information Service
Health Sciences Library System, University of Pittsburgh
ansuman@pitt.edu
Topics
- Brief introduction to RNA-Seq experiments
- Analyze RNA-Seq data: dexamethasone treatment of airway smooth muscle cells (Himes et al., PLoS One 2014)
  - Download sequence reads from EBI-ENA/NCBI SRA
  - Import reads to CLC Genomics Workbench
  - Align reads to the reference genome
  - Estimate expression at the gene level
  - Estimate expression at the transcript isoform level
  - Statistical analysis of differentially expressed genes and transcripts
  - Create heat map, volcano plots, and Venn diagram
Differential Gene Expressions Raw Reads Venn Diagram Volcano Plot
Workshop Page https://hsls.libguides.com/rnaseq
Software @ HSLS MolBio http://hsls.libguides.com/molbio/licensedtools/resources
NGS Software @ HSLS MolBio NGS Analysis Sanger Seq Analysis Human, Mouse and Rat NGS Analysis
RNA-Seq Software @ HSLS MolBio Enrichment Analysis Differentially Expressed Genes CLC Genomics Workbench Ingenuity Pathway Analysis Functions Diseases Pathways RNA-Seq Reads Key Pathway Advisor Upstream Regulators Any Organism Volcano Plot PCA Plot Venn Diagram Heat Map Illumina BaseSpace Correlation Engine Correlated Expression Studies CLC BioMedical Workbench Variant Detection Ingenuity Variant Analysis Human, Mouse and Rat Variant Annotation and Prioritization RNA-Seq Analysis Downstream Analysis
CLCGx 12 Genomics Workbench BioMedical Workbench
Install Plugins
CLCbio Genomics Workbench System Requirements
- Windows Vista, Windows 7, Windows 8, Windows 10, Windows Server 2008, or Windows Server 2012
- Mac OS X 10.7 or later
- Linux: Red Hat 5.0 or later, SUSE 10.2 or later, Fedora 6 or later
- 8 GB RAM required; 16 GB RAM recommended
- 1024 x 768 display required; 1600 x 1200 display recommended
- Intel or AMD CPU required
- Minimum 10 GB free disk space in the tmp directory
CLC Genomics Workbench @pitt Mike Barmada, PhD 1969 - 2016
CLCBio Genomics Workbench Server
- You can connect your CLC Genomics Workbench software to the 8000-core HTC cluster available to University of Pittsburgh researchers through the Center for Research Computing (CRC). https://crc.pitt.edu/
- This allows you to transparently migrate data from your workstation to the cluster and run analyses there. The analyses run independently of your workstation (i.e., you can shut down your machine and your analyses will continue unabated).
Center for Research Computing (CRC) https://crc.pitt.edu/
Request access to CRC
CLC Genomics Workbench
- Ensure you have the most up-to-date version of the CLCbio Genomics Workbench (the software should tell you at startup if a more recent version exists, or you can check the CLCbio website).
- If you have not already done so, request a user account/allocation on the Center for Research Computing (CRC) HTC cluster by filling out the required information at https://crc.pitt.edu/
- If your computer is not connected to the Pitt network (e.g., you are working from home or on a trip), or you are working from a laptop connected to the Pitt wireless system, make sure you set up Pitt VPN so that you can communicate with the CLC Bioserver on the HTC cluster.
- Start the CLC Genomics Workbench.
Connect to CLC Server
Access to CRC-HTC Cluster: CLC Server
If you DO NOT have a CRC-HTC account, use the following for limited access:
- UserID: hslsmolb
- PW: library1#
- Server host: clcbio.crc.pitt.edu
- Server port: 7777
If you have a CRC-HTC account, use your Pitt user name and Pitt password:
- Server host: clcbio.crc.pitt.edu
- Server port: 7777
Pre-analyzed Results
Bulk RNA-seq Study http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0099625
http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE52778
NCBI SRA
NCBI SRA Untreated Vs DEX
Bulk RNA-seq Basic Steps
- Convert RNA to cDNA fragments
- Ligate adaptors
- Generate short sequence reads
- Align reads to the reference genome
Wang Z, Gerstein M, Snyder M. RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet. 2009 Jan;10(1):57–63.
Create Folder in CRC-HTC Cluster
Create Workshop Folder @ HTC-CLC Server
NGS Technologies (NCBI Sequence Read Archive)
Illumina       1,131,359   4,330,403
AB SOLiD          18,495      25,170
Ion Torrent       10,484      63,855
PacBio            11,473      39,097
MinION               286       2,033
Tutorial: Galaxy NGS101 – Overview of NGS Technologies; https://wiki.galaxyproject.org/Learn/GalaxyNGS101#Overview_of_NGS_technologies
Nature Reviews on NGS Technologies http://www.nature.com/nrg/journal/v17/n6/full/nrg.2016.49.html
Illumina Technology https://vimeo.com/121178846 https://wiki.galaxyproject.org/Learn/GalaxyNGS101
STEP 1: Import Reads to CLC
Help : Import Illumina Reads
Contact CLCBio Support Team
FASTQ format http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2847217/
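To make the FASTQ layout concrete, here is a minimal Python sketch of a reader for the common four-line form (header starting with "@", sequence, "+" separator, quality string). It assumes unwrapped records, and the two reads in the example are made up for illustration; it is not how CLC imports reads.

```python
from itertools import islice
from typing import Iterable, Iterator, NamedTuple


class FastqRecord(NamedTuple):
    read_id: str
    sequence: str
    quality: str


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield one record per four lines of unwrapped FASTQ text."""
    it = iter(lines)
    while True:
        chunk = [line.rstrip("\r\n") for line in islice(it, 4)]
        if not chunk:
            return  # clean end of input
        if len(chunk) != 4:
            raise ValueError("truncated FASTQ record")
        header, sequence, separator, quality = chunk
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


# Two made-up reads, for illustration only.
example = "@read1\nGATTACA\n+\nIIIIHHG\n@read2\nACGT\n+\n!!!!\n"
records = list(parse_fastq(example.splitlines()))
```

Reading lazily in groups of four keeps memory use flat, which matters because raw RNA-Seq files are large.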
Results By CLC: Imported Illumina Reads
CLC_Server_Data > achattopadhyay > AnsumanC > workshop_RNA_Seq_May2016 > Reads
CLC SRA Download
EBI ENA http://www.ebi.ac.uk/ena/data/search?query=SRP033351
EBI-ENA
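For scripted downloads, the FASTQ locations for a study can also be listed programmatically. The Python sketch below builds a file report URL for the study accession SRP033351 and parses a tab-separated reply. The endpoint, the `read_run` result type, and the `run_accession` and `fastq_ftp` field names are assumptions based on the ENA Portal API file report service and should be checked against the current ENA documentation. The sample reply uses placeholder values, not real run data.

```python
import csv
import io
from typing import Dict, List, Sequence
from urllib.parse import urlencode

# Assumed endpoint of the ENA Portal API file report service.
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def filereport_url(accession: str,
                   fields: Sequence[str] = ("run_accession", "fastq_ftp")) -> str:
    """Build the query URL; no network request is made here."""
    query = urlencode({
        "accession": accession,
        "result": "read_run",
        "fields": ",".join(fields),
        "format": "tsv",
    })
    return f"{ENA_FILEREPORT}?{query}"


def parse_filereport(tsv_text: str) -> Dict[str, List[str]]:
    """Map each run accession to its list of FASTQ paths."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {
        row["run_accession"]: [p for p in row["fastq_ftp"].split(";") if p]
        for row in reader
    }


url = filereport_url("SRP033351")

# Placeholder reply (hypothetical run ID and host), for illustration only.
sample_reply = (
    "run_accession\tfastq_ftp\n"
    "RUN0001\thost.example/RUN0001_1.fastq.gz;host.example/RUN0001_2.fastq.gz\n"
)
files_by_run = parse_filereport(sample_reply)
```

Paired-end runs are expected to list two files separated by a semicolon, which is why each run maps to a list.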
STEP 1: Import Reads to CLC; Download from NCBI SRA
FASTQC Project http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
Phred Score (source: Wikipedia)
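A Phred score Q relates to the base-calling error probability P by Q = -10 log10(P), so Q20 means a 1 in 100 chance of a wrong base and Q30 means 1 in 1000. The short Python sketch below converts in both directions and decodes a quality string, assuming the Phred+33 ASCII encoding used by Sanger-style and recent Illumina FASTQ files.

```python
import math
from typing import List


def phred_to_error_prob(q: float) -> float:
    """Error probability for a Phred score: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Phred score for an error probability: Q = -10 * log10(P)."""
    return -10 * math.log10(p)


def decode_phred33(quality: str) -> List[int]:
    """Decode a FASTQ quality string, assuming an ASCII offset of 33."""
    return [ord(char) - 33 for char in quality]


scores = decode_phred33("I5!")
```

Older Illumina pipelines used an offset of 64, so the offset should be confirmed before interpreting the scores.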
Step 2: Create Seq QC Report
Results By CLC: Read QC Report
RNA Seq Questionnaire
- What is the scientific objective of the RNA-Seq experiment?
- How many classes will be compared?
- Is only coding RNA (mRNA) expected to be detected, or also long non-coding RNA and miRNA?
- Did all the samples pass RNA quality checks before sequencing?
- Are there biological replicates? If so, how many?
- What type of sequencing platform was used to sequence the reads? (Illumina, Ion Torrent, SOLiD)
- Where was the sequencing performed? (Facility name and contact info)
- When was the sequencing performed? (Year/date)
- Which RNA extraction method was used in the experiment? (Total RNA / poly A / rRNA depletion; method and kit name and, if possible, a link to the protocol)
- Is the protocol strand specific? (Unstranded / forward / reverse; kit name and, if possible, a link to the protocol)
- Is the data single end or paired end?
- What is the expected read length?
- Do the reads contain adapters? If so, what type of adapters? (Adapter sequence, if available, or a link; this information is usually available from the facility)
- What are the experimental conditions for the differential expression analysis?
- Which organism and reference genome are to be used for the analysis?
Read Seq Trimming
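As a rough illustration of what read trimming does, the Python sketch below removes an exact-match adapter and then clips low-quality bases from the 3' end. This is a deliberately simple stand-in, not the algorithm CLC uses. The adapter string is the commonly cited start of the Illumina adapter and is used here only as an example, and the Q20 cutoff and Phred+33 offset are assumptions.

```python
from typing import Tuple

# Example adapter prefix (assumption; confirm the real adapter with the facility).
EXAMPLE_ADAPTER = "AGATCGGAAGAGC"


def trim_adapter(seq: str, qual: str, adapter: str) -> Tuple[str, str]:
    """Cut the read at the first exact occurrence of the adapter."""
    index = seq.find(adapter)
    if index == -1:
        return seq, qual
    return seq[:index], qual[:index]


def trim_low_quality_3prime(seq: str, qual: str,
                            min_q: int = 20, offset: int = 33) -> Tuple[str, str]:
    """Drop trailing bases whose Phred score is below min_q."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


# Made-up read: 8 insert bases, the adapter, then 2 extra bases.
read_seq = "ACGTACGT" + EXAMPLE_ADAPTER + "TT"
read_qual = "IIIIII##" + "I" * 15

seq, qual = trim_adapter(read_seq, read_qual, EXAMPLE_ADAPTER)
seq, qual = trim_low_quality_3prime(seq, qual)
```

Real trimmers also tolerate mismatches and partial adapters at the read end, which this sketch does not attempt.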
STEP 3: Create Metadata Table
Step 4: Import Metadata
STEP 5: Read Mapping
Read Mapping (source: Wikipedia)
Read Mapping (source: Ozsolak et al., Nature Reviews Genetics)
RNA-Seq vs. Microarrays
RNA-Seq:
- Covers a wider dynamic range
- Allows discovery of novel transcripts
- Is able to detect SNPs
- Is more costly ($300-$1000/sample) than microarray ($100-$200/sample)
- Generates datasets 30-40 times larger than microarray; uncompressed RNA-Seq raw files: >5 GB
Riki Kawaguchi's Blog: https://bioinfomagician.wordpress.com/about/
Zhao S, Fung-Leung WP, Bittner A, Ngo K, Liu X. Comparison of RNA-Seq and microarray in transcriptome profiling of activated T cells. PLoS ONE. 2014 Jan 16;9(1):e78644.
Must Read http://rnaseq.uoregon.edu/ Cresko Lab, University of Oregon
Best Practices
RNA-seq Analysis Pipeline
Popular Software
STEP 5: Read Mapping
Reference Genome
- GENCODE: http://www.gencodegenes.org/releases/current.html
- Ensembl: http://useast.ensembl.org/info/data/ftp/index.html?redirect=no
- Genome Reference Consortium (NCBI): http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/
STEP 5: Read Mapping
Expression Values
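Two widely used expression values are RPKM and TPM. The Python sketch below shows how each is derived from raw read counts and gene lengths, assuming these are the measures the slide refers to. The counts and lengths are invented for illustration, and the result is not guaranteed to match CLC's output digit for digit.

```python
from typing import Dict


def rpkm(counts: Dict[str, float], lengths_bp: Dict[str, float]) -> Dict[str, float]:
    """Reads per kilobase of transcript per million mapped reads."""
    total_reads = sum(counts.values())
    return {
        gene: count * 1e9 / (total_reads * lengths_bp[gene])
        for gene, count in counts.items()
    }


def tpm(counts: Dict[str, float], lengths_bp: Dict[str, float]) -> Dict[str, float]:
    """Transcripts per million: length-normalize first, then scale to 1e6."""
    rate = {gene: count / lengths_bp[gene] for gene, count in counts.items()}
    rate_sum = sum(rate.values())
    return {gene: value * 1e6 / rate_sum for gene, value in rate.items()}


# Invented counts and lengths, for illustration only.
example_counts = {"geneA": 100, "geneB": 300, "geneC": 600}
example_lengths = {"geneA": 1000, "geneB": 3000, "geneC": 2000}

rpkm_values = rpkm(example_counts, example_lengths)
tpm_values = tpm(example_counts, example_lengths)
```

TPM values always sum to one million within a sample, which makes proportions easier to compare across samples than RPKM.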
STEP 5: Reads Mapping
Normalization Methods
STEP 5: Read Mapping (click on Role)
Results By CLC: Reads Mapping
STEP 5: Read Mapping; Fusion Tracks
STEP 5: Read Mapping; Gene Expression Track
Step 6: Create a PCA Plot
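A PCA plot places each sample at its coordinates along the directions of greatest variance in the expression matrix. The Python sketch below computes the first two principal component scores with plain power iteration so that it has no dependencies. It is meant only to show the idea and makes no claim about how CLC computes its PCA. The small matrix of log-scale values (two untreated samples, then two treated samples) is invented.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def center_columns(matrix: Matrix) -> Matrix:
    """Subtract each gene's mean across samples (rows are samples)."""
    means = [sum(column) / len(matrix) for column in zip(*matrix)]
    return [[value - mean for value, mean in zip(row, means)] for row in matrix]


def top_component(matrix: Matrix, iterations: int = 200) -> List[float]:
    """Leading eigenvector of X^T X, found by power iteration."""
    n_features = len(matrix[0])
    vec = [float(j + 1) for j in range(n_features)]
    norm = math.sqrt(dot(vec, vec))
    vec = [x / norm for x in vec]
    for _ in range(iterations):
        scores = [dot(row, vec) for row in matrix]
        new = [sum(s * row[j] for s, row in zip(scores, matrix))
               for j in range(n_features)]
        norm = math.sqrt(dot(new, new))
        if norm == 0:
            break  # no variance left to explain
        vec = [x / norm for x in new]
    return vec


def pca_scores(matrix: Matrix, n_components: int = 2) -> List[Tuple[float, ...]]:
    """Return one tuple of component scores per sample."""
    residual = center_columns(matrix)
    per_component = []
    for _ in range(n_components):
        vec = top_component(residual)
        scores = [dot(row, vec) for row in residual]
        per_component.append(scores)
        # Deflate: remove the variance already captured by this component.
        residual = [[x - s * v for x, v in zip(row, vec)]
                    for row, s in zip(residual, scores)]
    return list(zip(*per_component))


# Invented log-expression values: rows are samples, columns are genes.
example_matrix = [
    [1.0, 5.0, 3.0],  # untreated 1
    [1.2, 5.1, 3.0],  # untreated 2
    [4.0, 5.0, 1.0],  # treated 1
    [4.1, 4.9, 1.2],  # treated 2
]
sample_scores = pca_scores(example_matrix)
```

In this toy matrix the first component separates untreated from treated samples; the sign of a component is arbitrary, so only the relative positions of samples are meaningful.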
Step 7: Differential Expression
Step 7: Differential Expression; Dex vs Unt
GraphPad Statistics Guide : https://www.graphpad.com/guides/prism/7/statistics/index.htm
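Because thousands of genes are tested at once, raw p-values are usually adjusted for multiple testing. The Python sketch below implements the Benjamini-Hochberg procedure, on the assumption that the FDR-adjusted p-value reported for each gene follows this method. The four p-values are made up.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value
    # never exceeds the adjusted value of a larger p-value.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Made-up p-values, for illustration only.
example_p = [0.01, 0.04, 0.03, 0.005]
example_fdr = benjamini_hochberg(example_p)
```

The adjusted values keep the ranking of the raw p-values while capping each one at 1.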
Step 7: Differential Expression; Dex vs Unt Volcano Plot
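A volcano plot puts log2 fold change on the x-axis and -log10 of the p-value on the y-axis, so strongly changed and statistically significant genes sit in the upper left and upper right corners. The Python sketch below computes the coordinates and labels each gene. The cutoffs (absolute log2 fold change of at least 1 and FDR of at most 0.05) are illustrative assumptions, not values taken from the workshop.

```python
import math
from typing import Tuple


def volcano_point(fold_change: float, p_value: float) -> Tuple[float, float]:
    """Return (log2 fold change, -log10 p-value) for one gene."""
    if fold_change <= 0 or not 0 < p_value <= 1:
        raise ValueError("fold_change must be > 0 and p_value in (0, 1]")
    return math.log2(fold_change), -math.log10(p_value)


def classify(log2_fc: float, fdr: float,
             min_abs_log2_fc: float = 1.0, max_fdr: float = 0.05) -> str:
    """Label a gene as 'up', 'down', or 'ns' (not significant)."""
    if fdr > max_fdr or abs(log2_fc) < min_abs_log2_fc:
        return "ns"
    return "up" if log2_fc > 0 else "down"


# A made-up gene with a 4-fold increase and p = 0.001.
x, y = volcano_point(4.0, 0.001)
label = classify(x, fdr=0.01)
```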
Step 8: Create a Heat Map
Step 9: Create a Venn Diagram
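The regions of a two-set Venn diagram are plain set operations on lists of differentially expressed genes. The Python sketch below shows this with two hypothetical comparisons. The gene lists are invented, including the placement of CRISPLD2 in both.

```python
from typing import Dict, Iterable, Set


def venn_regions(first: Iterable[str], second: Iterable[str]) -> Dict[str, Set[str]]:
    """Split two gene lists into the three regions of a Venn diagram."""
    first_set, second_set = set(first), set(second)
    return {
        "first_only": first_set - second_set,
        "shared": first_set & second_set,
        "second_only": second_set - first_set,
    }


# Hypothetical gene lists from two comparisons, for illustration only.
comparison_a = {"CRISPLD2", "GENE_A", "GENE_B"}
comparison_b = {"CRISPLD2", "GENE_B", "GENE_C"}
regions = venn_regions(comparison_a, comparison_b)
```

The same idea extends to three or more comparisons by intersecting additional sets.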
Step 10: Create a Track (track for CRISPLD2)
Downstream Analysis of DEGs: annotate differentially expressed genes (DEGs) from an RNA-seq experiment using curated public data from GEO
NextBio Research
Export Data from CLC
Find Correlated Gene Expression Studies from GEO
Ingenuity IPA Analysis
Thanks To
- HSLS: Carrie Iwema, David Leung, Michael Sweezer
- CLCBio: Shawn Prince
- Center for Simulation and Modeling: Kim F Wong, Mu Fangping