Bulk RNA-Seq Analysis Using CLC Genomics Workbench


Bulk RNA-Seq Analysis Using CLC Genomics Workbench
December 11, 2018
Ansuman Chattopadhyay, PhD
Asst. Director, Molecular Biology Information Service
Health Sciences Library System, University of Pittsburgh
ansuman@pitt.edu

Topics
- Brief introduction to RNA-Seq experiments
- Analyze RNA-Seq data: dexamethasone (DEX) treatment of airway smooth muscle cells (Himes et al., PLoS One 2014)
- Download sequence reads from EBI-ENA/NCBI SRA
- Import reads into CLC Genomics Workbench
- Align reads to a reference genome
- Estimate expression at the gene level
- Estimate expression at the transcript isoform level
- Statistical analysis of differentially expressed genes and transcripts
- Create heat maps, volcano plots, and Venn diagrams

Differential Gene Expression (figure: Raw Reads, Venn Diagram, Volcano Plot)

Workshop Page https://hsls.libguides.com/rnaseq

Software @ HSLS MolBio http://hsls.libguides.com/molbio/licensedtools/resources

NGS Software @ HSLS MolBio: NGS Analysis; Sanger Seq Analysis; Human, Mouse and Rat NGS Analysis

RNA-Seq Software @ HSLS MolBio
RNA-Seq Analysis: RNA-Seq Reads (any organism); CLC Genomics Workbench: Volcano Plot, PCA Plot, Venn Diagram, Heat Map; Differentially Expressed Genes
Downstream Analysis: Enrichment Analysis; Ingenuity Pathway Analysis (Functions, Diseases, Pathways); Key Pathway Advisor (Upstream Regulators); Illumina BaseSpace Correlation Engine (Correlated Expression Studies)
Variant Detection: CLC BioMedical Workbench; Ingenuity Variant Analysis (Human, Mouse and Rat): Variant Annotation and Prioritization

CLCGx 12: Genomics Workbench, BioMedical Workbench

Install Plugins

CLC Genomics Workbench System Requirements
- Operating system: Windows Vista, Windows 7, Windows 8, Windows 10, Windows Server 2008, or Windows Server 2012; Mac OS X 10.7 or later; Linux: Red Hat 5.0 or later, SUSE 10.2 or later, Fedora 6 or later
- Memory: 8 GB RAM required, 16 GB RAM recommended
- Display: 1024 x 768 required, 1600 x 1200 recommended
- CPU: Intel or AMD required
- Disk: at least 10 GB free disk space in the tmp directory

CLC Genomics Workbench @pitt Mike Barmada, PhD 1969 - 2016

CLC bio Genomics Workbench Server: You can connect your CLC Genomics Workbench to the 8000-core HTC cluster available to University of Pittsburgh researchers through the Center for Research Computing (CRC), https://crc.pitt.edu/. This lets you transparently move data from your workstation to the cluster and run analyses there. The analyses run independently of your workstation (i.e., you can shut down your machine and your analyses will continue unabated).

Center for Research Computing (CRC) https://crc.pitt.edu/

Request access to CRC

CLC Genomics Workbench
1. Make sure you have the most up-to-date version of CLC Genomics Workbench (the software should tell you at startup if a newer version is available, or you can check the CLC bio website).
2. If you have not already done so, request a user account/allocation for the HTC cluster from the Center for Research Computing (CRC) by filling out the required information at https://crc.pitt.edu/
3. If your computer is not connected to the Pitt network (e.g., you are working from home or traveling), or you are working from a laptop connected to the Pitt wireless network, set up the Pitt VPN so that you can communicate with the CLC server on the HTC cluster.
4. Start CLC Genomics Workbench.

Connect to CLC Server

Access to CRC-HTC Cluster: CLC Server
If you DO NOT HAVE a CRC-HTC account, use the following limited-access login:
  User ID: hslsmolb
  Password: library1#
  Server host: clcbio.crc.pitt.edu
  Server port: 7777
If you HAVE a CRC-HTC account, use your Pitt user name and Pitt password:
  Server host: clcbio.crc.pitt.edu
  Server port: 7777

Pre-analyzed Results

Bulk RNA-seq Study http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0099625

http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE52778

NCBI SRA

NCBI SRA

NCBI SRA: Untreated vs. DEX

Bulk RNA-seq Basic Steps: RNA is converted to cDNA fragments, adaptors are ligated, the fragments are sequenced as short reads, and the reads are aligned to a reference genome (Wang Z, Gerstein M, Snyder M. RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet. 2009 Jan;10(1):57–63).

Create Folder in CRC-HTC Cluster

Create Workshop Folder @ HTC-CLC Server

NGS Technologies (NCBI Sequence Read Archive)
Illumina: 1,131,359 / 4,330,403
AB SOLiD: 18,495 / 25,170
Ion Torrent: 10,484 / 63,855
PacBio: 11,473 / 39,097
MinION: 286 / 2,033
Tutorial: Galaxy NGS101, Overview of NGS Technologies: https://wiki.galaxyproject.org/Learn/GalaxyNGS101#Overview_of_NGS_technologies

Nature Reviews on NGS Technologies http://www.nature.com/nrg/journal/v17/n6/full/nrg.2016.49.html

Illumina Technology https://vimeo.com/121178846 https://wiki.galaxyproject.org/Learn/GalaxyNGS101

STEP 1: Import Reads to CLC

STEP 1: Import Reads to CLC

Help: Import Illumina Reads

Contact CLCBio Support Team

FASTQ format http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2847217/
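A FASTQ file stores each read as four lines: an `@` header carrying the read ID, the base calls, a `+` separator line, and one quality character per base. To make that layout concrete, here is a minimal Python parser sketch. It assumes unwrapped four-line records (the format also permits sequences and qualities wrapped over several lines, which this sketch does not handle), and the names `FastqRecord` and `read_fastq` are hypothetical.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str  # read identifier (header line without the leading '@')
    seq: str   # base calls
    qual: str  # one quality character per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream, assuming unwrapped 4-line records."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:].rstrip("\r\n"), seq, qual)


# Hypothetical two-read example (the second '+' line repeats the ID, which is allowed).
example = "@read1\nACGTN\n+\nIIII#\n@read2\nGGCA\n+read2\n5555\n"
records = list(read_fastq(io.StringIO(example)))
print(records[0].name, records[0].seq)  # read1 ACGTN
```

Compressed downloads can be read the same way by passing a handle from `gzip.open(path, "rt")`.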

Results By CLC: Imported Illumina Reads (CLC_Server_Data > achattopadhyay > AnsumanC > workshop_RNA_Seq_May2016 > Reads)

Results By CLC: Imported Illumina Reads

CLC SRA Download

EBI ENA http://www.ebi.ac.uk/ena/data/search?query=SRP033351

EBI-ENA
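The same reads can be located outside the browser. The ENA Portal API offers a per-run file report for a study accession such as SRP033351, including the FTP locations of the FASTQ files. The Python sketch below builds such a request and parses the tab-separated reply. The endpoint URL, the `result=read_run` value, and the field names `run_accession` and `fastq_ftp` follow ENA's documented API as understood here and should be checked against the current ENA documentation. The helper names are hypothetical, and the rows in the example are invented.

```python
import csv
import io
from urllib.parse import urlencode

# Assumed ENA Portal API endpoint; verify against ENA's current documentation.
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def filereport_url(accession: str) -> str:
    """Build a file-report URL listing runs and FASTQ locations for an accession."""
    params = {
        "accession": accession,
        "result": "read_run",
        "fields": "run_accession,fastq_ftp",
        "format": "tsv",
    }
    return f"{ENA_FILEREPORT}?{urlencode(params)}"


def parse_fastq_links(tsv_text: str) -> dict[str, list[str]]:
    """Map each run accession to its FASTQ URLs (multiple paths are ';'-separated)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    links = {}
    for row in reader:
        paths = (row.get("fastq_ftp") or "").split(";")
        links[row["run_accession"]] = [f"ftp://{p}" for p in paths if p]
    return links


print(filereport_url("SRP033351"))
# Invented reply with one paired-end run, for illustration only.
demo = "run_accession\tfastq_ftp\nRUN_X\thost/dir/RUN_X_1.fastq.gz;host/dir/RUN_X_2.fastq.gz\n"
print(parse_fastq_links(demo))
```

To fetch the real report, pass the URL to `urllib.request.urlopen(...)` and decode the response before calling `parse_fastq_links`. Paired-end runs typically list two FASTQ paths each.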

STEP 1: Import Reads to CLC; Download from NCBI SRA

FASTQC Project http://www.bioinformatics.babraham.ac.uk/projects/fastqc/

Phred Quality Score (Wikipedia)
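The Phred quality score relates a base's quality Q to its estimated error probability P by Q = -10 log10(P). Under this relation Q20 means a 1 in 100 chance that the base call is wrong, and Q30 means a 1 in 1,000 chance. In Sanger-style FASTQ, which current Illumina pipelines (Illumina 1.8+) use, each score is stored as the ASCII character with code Q + 33. The following sketch shows these conversions. The helper names are hypothetical, and the Phred+33 offset is an assumption you should confirm for older data.

```python
import math


def decode_qualities(qual: str, offset: int = 33) -> list[int]:
    """Convert a FASTQ quality string to Phred scores (Phred+33 by default)."""
    return [ord(ch) - offset for ch in qual]


def phred_to_error_prob(q: float) -> float:
    """P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Q = -10 * log10(P)."""
    return -10 * math.log10(p)


def mean_error_rate(qual: str, offset: int = 33) -> float:
    """Expected fraction of wrong base calls in a read (mean of per-base P)."""
    probs = [phred_to_error_prob(q) for q in decode_qualities(qual, offset)]
    return sum(probs) / len(probs)


print(decode_qualities("I#5"))  # [40, 2, 20]
```

The sketch averages error probabilities rather than Q values because Q is logarithmic. A single Q2 base in an otherwise Q40 read dominates the expected error count, and a plain average of Q values would hide that.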

STEP 2: Create Sequence QC Report

Results By CLC: Read QC Report

RNA Seq Questionnaire
- What is the scientific objective of the RNA-Seq experiment?
- How many classes will be compared?
- Are only coding RNAs (mRNA) expected to be detected, or also long non-coding RNAs or miRNAs?
- Did all the samples pass RNA quality checks before sequencing?
- Are there biological replicates? If so, how many?
- Which sequencing platform was used to sequence the reads (Illumina, Ion Torrent, SOLiD)?
- Where was the sequencing performed? (facility name and contact info)
- When was the sequencing performed? (year/date)
- Which RNA extraction method was used in the experiment? (total RNA, poly(A) selection, or rRNA depletion; kit name and, if possible, a link to the protocol)
- Is the protocol strand-specific? (unstranded, forward, or reverse; kit name and, if possible, a link to the protocol)
- Are the data single-end or paired-end?
- What is the expected read length?
- Do the reads contain adapters? If so, what type? (adapter sequence if available, or a link; the facility can usually provide this)
- Which experimental conditions will be compared in the differential expression analysis?
- Which organism and reference genome should be used for the analysis?

Read Sequence Trimming
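Trimming usually covers two separate tasks: removing adapter sequence that the sequencer read through into, and removing low-quality bases from the 3' end. The Python sketch below shows the idea with the simplest possible rules: exact adapter matching and trimming of a trailing low-quality run. It is not the algorithm CLC's trimming tool uses, and the default thresholds (`min_q=20`, `min_len=25`) are arbitrary illustrative choices. `AGATCGGAAGAGC` is the common Illumina TruSeq adapter prefix, used here only as an example; check the adapter with your sequencing facility.

```python
def trim_low_quality_tail(seq: str, qual: str, min_q: int = 20, offset: int = 33):
    """Remove bases from the 3' end while their Phred score is below min_q."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


def clip_adapter(seq: str, qual: str, adapter: str, min_overlap: int = 3):
    """Cut the read at an exact adapter match, or at a partial adapter on the 3' end."""
    idx = seq.find(adapter)
    if idx != -1:
        return seq[:idx], qual[:idx]
    # Longest read suffix that equals a prefix of the adapter (adapter read-through).
    for k in range(min(len(adapter), len(seq)) - 1, max(min_overlap, 1) - 1, -1):
        if seq[-k:] == adapter[:k]:
            return seq[:-k], qual[:-k]
    return seq, qual


def trim_read(seq, qual, adapter, min_q=20, min_len=25):
    """Clip adapter, then trim the low-quality tail; drop reads that become too short."""
    seq, qual = clip_adapter(seq, qual, adapter)
    seq, qual = trim_low_quality_tail(seq, qual, min_q)
    return (seq, qual) if len(seq) >= min_len else None


ADAPTER = "AGATCGGAAGAGC"  # example adapter prefix (assumption; confirm for your kit)
print(clip_adapter("ACGTAGATCG", "I" * 10, ADAPTER))  # ('ACGT', 'IIII')
```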

STEP 3: Create Metadata Table

STEP 4: Import Metadata

STEP 4: Import Metadata

STEP 4: Import Metadata
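The metadata table links each imported sample to its experimental factors (here, treatment). One column has to match the sample names in the Workbench; check which spreadsheet formats your Workbench version accepts. As a sketch of preparing and sanity-checking such a table outside CLC, the Python below writes the table as CSV text and flags duplicate sample names and conditions without replicates. The column names (`Name`, `Treatment`, `Donor`), the sample names, and the donor labels are hypothetical placeholders, not values from the Himes et al. data set.

```python
import csv
import io
from collections import Counter

# Hypothetical sample sheet: (sample name, treatment, donor).
# Replace the names with the names of your imported read elements.
SAMPLES = [
    ("sample_01", "Untreated", "donor_A"),
    ("sample_02", "DEX", "donor_A"),
    ("sample_03", "Untreated", "donor_B"),
    ("sample_04", "DEX", "donor_B"),
]


def check_design(rows, min_replicates=2):
    """Return replicate counts per treatment; raise on obvious design problems."""
    names = [name for name, _, _ in rows]
    duplicates = sorted(n for n, k in Counter(names).items() if k > 1)
    if duplicates:
        raise ValueError(f"duplicate sample names: {duplicates}")
    counts = Counter(treatment for _, treatment, _ in rows)
    too_few = sorted(t for t, k in counts.items() if k < min_replicates)
    if too_few:
        raise ValueError(f"conditions with fewer than {min_replicates} replicates: {too_few}")
    return dict(counts)


def metadata_csv(rows, header=("Name", "Treatment", "Donor")) -> str:
    """Render the sample sheet as CSV text, ready to save to a file."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


print(check_design(SAMPLES))  # {'Untreated': 2, 'DEX': 2}
print(metadata_csv(SAMPLES))
```

The donor column records which samples come from the same cells. If the statistics tool can control for a second factor, that information supports a paired design later.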

STEP 5: Read Mapping

Read Mapping (Wikipedia)

Read Mapping: Ozsolak et al., Nature Reviews Genetics

RNA-Seq vs. Microarrays
- RNA-Seq covers a wider dynamic range
- RNA-Seq allows discovery of novel transcripts
- RNA-Seq can detect SNPs
- RNA-Seq is more costly ($300-$1000/sample) than microarrays ($100-$200/sample)
- RNA-Seq generates datasets 30-40 times larger than microarrays (uncompressed RNA-Seq raw files: >5 GB)
Sources: Riki Kawaguchi's Blog: https://bioinfomagician.wordpress.com/about/; Zhao S, Fung-Leung WP, Bittner A, Ngo K, Liu X. Comparison of RNA-Seq and microarray in transcriptome profiling of activated T cells. PLoS ONE. 2014 Jan 16;9(1):e78644.

Must Read http://rnaseq.uoregon.edu/ Cresko Lab, University of Oregon

Best Practices

RNA-seq Analysis Pipeline

Popular Software

STEP 5: Read Mapping

STEP 5: Read Mapping

STEP 5: Read Mapping

Reference Genome http://www.gencodegenes.org/releases/current.html http://useast.ensembl.org/info/data/ftp/index.html?redirect=no http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/

STEP 5: Read Mapping

STEP 5: Read Mapping

STEP 5: Read Mapping

STEP 5: Read Mapping

Expression Values

STEP 5: Read Mapping

Normalization Methods
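RPKM and TPM are two commonly reported within-sample expression values. In the textbook definitions, RPKM_g = 10^9 * C_g / (L_g * N), where C_g is the number of reads assigned to gene g, L_g is the gene's length in bases, and N is the total number of mapped reads. TPM_g = 10^6 * (C_g / L_g) / sum_i (C_i / L_i). The Python sketch below implements these definitions. CLC's exact choices, such as which reads count toward N or how length is defined for a gene with several isoforms, may differ, so treat it as illustrative.

```python
def rpkm(counts, lengths_bp):
    """Reads per kilobase of feature per million mapped reads."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no mapped reads")
    return [c * 1e9 / (length * total) for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then scale to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    denom = sum(rates)
    if denom == 0:
        raise ValueError("no mapped reads")
    return [r / denom * 1e6 for r in rates]


# Hypothetical toy sample: three genes with read counts and lengths in bases.
counts = [100, 200, 700]
lengths = [1000, 2000, 3500]
print(rpkm(counts, lengths))  # approximately [100000, 100000, 200000]
print(tpm(counts, lengths))   # approximately [250000, 250000, 500000]
```

TPM values sum to one million in every sample, so they read as proportions. Neither measure corrects for composition differences between samples. Count-based differential expression tools therefore apply their own between-sample normalization (for example TMM in edgeR) instead of testing on RPKM or TPM directly.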

STEP 5: Read Mapping (click on Role)

STEP 5: Read Mapping

Results By CLC: Read Mapping

STEP 5: Read Mapping: Fusion Tracks

STEP 5: Read Mapping: Fusion Tracks

STEP 5: Read Mapping: Gene Expression Track

STEP 6: Create a PCA Plot

STEP 6: Create a PCA Plot
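A PCA plot projects every sample onto the few directions of greatest variation across genes. Replicates of the same condition should cluster together, and outlier samples stand out. The NumPy sketch below shows the usual recipe: log-transform with a pseudocount, center each gene, take a singular value decomposition, and use the leading components as sample coordinates. This is the generic technique, not necessarily CLC's exact preprocessing. The function name and the toy matrix are hypothetical.

```python
import numpy as np


def pca_samples(expr, n_components=2, pseudocount=1.0):
    """PCA of samples from a genes x samples matrix of normalized expression.

    Returns (scores, explained): scores is samples x n_components, and explained
    is the fraction of total variance captured by each returned component.
    """
    x = np.log2(np.asarray(expr, dtype=float) + pseudocount)  # log-transform
    x = x.T                      # rows = samples, columns = genes
    x = x - x.mean(axis=0)       # center each gene across samples
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]  # sample coordinates on the PCs
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]


# Hypothetical toy data: 3 genes x 4 samples (two untreated, then two treated).
expr = [
    [10, 12, 100, 110],
    [50, 55, 5, 6],
    [20, 20, 20, 20],
]
scores, explained = pca_samples(expr)
print(scores.round(2))
print(explained.round(3))
```

The sign of each component is arbitrary (SVD may flip it), so only the relative positions of samples are meaningful. In this toy matrix PC1 separates the two untreated samples from the two treated ones.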

STEP 7: Differential Expression

STEP 7: Differential Expression

STEP 7: Differential Expression: DEX vs. Untreated

GraphPad Statistics Guide : https://www.graphpad.com/guides/prism/7/statistics/index.htm
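A differential expression table reports, for each gene, the size of the change and its significance after correcting for the thousands of tests run at once. The Python sketch below illustrates two of those quantities: a log2 fold change between group means and the Benjamini-Hochberg false discovery rate (FDR) adjustment of p-values. It is not CLC's statistical model, which fits a count-based model to each gene. The pseudocount that avoids division by zero is an arbitrary choice, and the function names are hypothetical.

```python
import math


def log2_fold_change(mean_treated: float, mean_control: float, pseudocount: float = 1.0) -> float:
    """log2 of the ratio of group means, with a pseudocount to avoid division by zero."""
    return math.log2((mean_treated + pseudocount) / (mean_control + pseudocount))


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values (FDR q-values) in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonic adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(log2_fold_change(7.0, 1.0))                  # 2.0
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.2]))
```

With an FDR cutoff of 0.05, only the first gene in this toy list would be called significant, even though three raw p-values are below 0.05. This difference is why tables of differentially expressed genes are usually filtered on the corrected values.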

STEP 7: Differential Expression: DEX vs. Untreated, Volcano Plot
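A volcano plot places each gene at x = log2 fold change and y = -log10(adjusted p-value). Genes that change strongly and significantly end up in the upper-left and upper-right corners. The Python sketch below computes the plot coordinates and labels each gene as up, down, or not significant. The thresholds (|log2 FC| >= 1, adjusted p < 0.05) are common conventions rather than CLC defaults, and the gene names and values are hypothetical.

```python
import math


def volcano_points(log2fc, padj, fc_cutoff=1.0, alpha=0.05):
    """Return (gene, x, y, status) tuples for a volcano plot.

    x = log2 fold change, y = -log10(adjusted p-value); status is "up", "down",
    or "ns" (not significant) under the given thresholds.
    """
    points = []
    for gene, x in log2fc.items():
        p = padj[gene]
        y = -math.log10(max(p, 1e-300))  # guard against p == 0
        if p < alpha and x >= fc_cutoff:
            status = "up"
        elif p < alpha and x <= -fc_cutoff:
            status = "down"
        else:
            status = "ns"
        points.append((gene, x, y, status))
    return points


# Hypothetical genes and values.
log2fc = {"gene_a": 2.0, "gene_b": -1.5, "gene_c": 0.2, "gene_d": 3.0}
padj = {"gene_a": 0.001, "gene_b": 0.01, "gene_c": 0.5, "gene_d": 0.2}
for point in volcano_points(log2fc, padj):
    print(point)
```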

STEP 8: Create a Heat Map

STEP 8: Create a Heat Map

STEP 8: Create a Heat Map

STEP 8: Create a Heat Map
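Expression heat maps usually color each gene by how far a sample lies from that gene's own average, not by raw values. Without this scaling, a few highly expressed genes would dominate the color range. The Python sketch below applies the common row z-score scaling to values that are already on a log scale. CLC's heat map tool also clusters genes and samples, which this sketch does not cover, and the function name and example matrix are hypothetical.

```python
import statistics


def row_zscores(matrix):
    """Scale each row (gene) to mean 0 and sample standard deviation 1.

    Rows with no variation (or a single value) are mapped to all zeros.
    Input values are assumed to be log-scale expression (e.g. log2(x + 1)).
    """
    scaled = []
    for row in matrix:
        mu = statistics.fmean(row)
        sd = statistics.stdev(row) if len(row) > 1 else 0.0
        scaled.append([0.0 if sd == 0 else (v - mu) / sd for v in row])
    return scaled


# Hypothetical log2 expression: 2 genes x 3 samples.
print(row_zscores([[1, 2, 3], [5, 5, 5]]))  # [[-1.0, 0.0, 1.0], [0.0, 0.0, 0.0]]
```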

STEP 9: Create a Venn Diagram

STEP 9: Create a Venn Diagram

STEP 9: Create a Venn Diagram
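A Venn diagram of differentially expressed gene lists counts genes by the exact combination of comparisons in which they are significant. The Python sketch below computes every region directly from the gene sets and works for any number of lists. The comparison labels and gene names are hypothetical.

```python
def venn_regions(sets: dict[str, set[str]]) -> dict[frozenset[str], set[str]]:
    """Map each combination of set labels to the genes found in exactly those sets."""
    regions: dict[frozenset[str], set[str]] = {}
    for gene in set().union(*sets.values()):
        key = frozenset(label for label, members in sets.items() if gene in members)
        regions.setdefault(key, set()).add(gene)
    return regions


# Hypothetical DE gene lists from three comparisons.
de_lists = {
    "A": {"g1", "g2", "g3"},
    "B": {"g2", "g3", "g4"},
    "C": {"g3", "g5"},
}
for labels, genes in sorted(venn_regions(de_lists).items(), key=lambda kv: sorted(kv[0])):
    print(sorted(labels), sorted(genes))
```

Computing the exclusive regions this way, rather than pairwise intersections, keeps each gene in exactly one region. The region sizes therefore add up to the total number of distinct genes, as the counts in a Venn diagram do.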

Create a Track

STEP 10: Create a Track (Track for CRISPLD2)

STEP 10: Create a Track (Track for CRISPLD2)

STEP 10: Create a Track

Normalization Methods

Downstream Analysis: annotate differentially expressed genes (DEGs) from an RNA-seq experiment using curated public data from GEO

NextBio Research

Export Data from CLC

Find Correlated Gene Expression Studies from GEO

Find Correlated Gene Expression Studies from GEO

Ingenuity IPA Analysis


Thanks to:
HSLS: Carrie Iwema, David Leung, Michael Sweezer
CLC bio: Shawn Prince
Center for Simulation and Modeling: Kim F Wong, Mu Fangping