Quality Control & Nascent Sequencing

Slides:

Advertisements

Similar presentations

IMGS 2012 Bioinformatics Workshop: RNA Seq using Galaxy

Advertisements

High Throughput Sequencing

Before we start: Align sequence reads to the reference genome

NGS Analysis Using Galaxy

Expression Analysis of RNA-seq Data

Bioinformatics and OMICs Group Meeting REFERENCE GUIDED RNA SEQUENCING.

Experimental validation. Integration of transcriptome and genome sequencing uncovers functional variation in human populations Tuuli Lappalainen et al.

Introductory RNA-seq Transcriptome Profiling. Before we start: Align sequence reads to the reference genome The most time-consuming part of the analysis.

IPlant Collaborative Discovery Environment RNA-seq Basic Analysis Log in with your iPlant ID; three orange icons.

Introductory RNA-seq Transcriptome Profiling. Before we start: Align sequence reads to the reference genome The most time-consuming part of the analysis.

RNA-Seq Transcriptome Profiling. Before we start: Align sequence reads to the reference genome The most time-consuming part of the analysis is doing the.

Genome-wide association study between DSE polymorphism and Poly-A usage in Human population Hiren Karathia Sridhar Hannenhalli.

Trinity College Dublin, The University of Dublin GE3M25: Data Analysis, Class 4 Karsten Hokamp, PhD Genetics TCD, 07/12/2015

IGV tools. Pipeline Download genome from Ensembl bacteria database Export the mapping reads file (SAM) Map reads to genome by CLC Using the mapping.

Objectives Genome-wide investigation – to estimate alternate Poly-Adenylation (APA) usage on 3’UTR – to identify polymorphism of Downstream Sequence Elements.

Chip – Seq Peak Calling in Galaxy Lisa Stubbs Lisa Stubbs | Chip-Seq Peak Calling in Galaxy1.

HOMER – a one stop shop for ChIP-Seq analysis

Introduction to Exome Analysis in Galaxy Carol Bult, Ph.D. Professor Deputy Director, JAX Cancer Center Short Course Bioinformatics Workshops 2014 Disclaimer…I.

Short Read Workshop Day 1 - Experimental Design Example 1: How to log in to vieques.

IGV Demo Slides:/g/funcgen/trainings/visualization/Demos/IGV_demo.ppt Galaxy Dev: 0.

RNA Seq Analysis Aaron Odell June 17 th Mapping Strategy A few questions you’ll want to ask about your data… - What organism is the data from? -

Introductory RNA-seq Transcriptome Profiling of the hy5 mutation in Arabidopsis thaliana.

Canadian Bioinformatics Workshops

Short Read Workshop Day 1 - Experimental Design Example 1: How to log in to vieques.

Short Read Workshop Day 1 - Experimental Design Video 1- Why to short read sequence (or not)

Canadian Bioinformatics Workshops

Konstantin Okonechnikov Qualimap v2: advanced quality control of

Introductory RNA-seq Transcriptome Profiling

Using command line tools to process sequencing data

NGS File formats Raw data from various vendors => various formats

Day 5 Mapping and Visualization

Short Read Sequencing Analysis Workshop

Placental Bioinformatics

What is a Hidden Markov Model?

Cancer Genomics Core Lab

WS9: RNA-Seq Analysis with Galaxy (non-model organism )

RNA Sequencing Day 7 Wooohoooo!

Variant Calling Workshop

Short Read Sequencing Analysis Workshop

RNA-Seq analysis in R (Bioconductor)

Chip – Seq Peak Calling in Galaxy

S1 Supporting information Bioinformatic workflow and quality of the metrics Number of slides: 10.

Introductory RNA-Seq Transcriptome Profiling

GE3M25: Data Analysis, Class 4

ChIP-Seq Analysis – Using CLCGenomics Workbench

Figure 1. Effect of acute TNF treatment on transcription in human SGBS adipocytes as assessed by RNA-seq and RNAPII ChIP-seq. Following 10 days in vitro.

Volume 5, Issue 3, Pages (November 2013)

by Leighton J. Core, Joshua J. Waterfall, and John T. Lis

TSS Annotation Workflow

Regulation of Gene Expression by Eukaryotes

Welcome to the LMS Quick Manager Guide.

Two-sided p-values (1.4) and Theory-based approaches (1.5)

Stat 217 – Day 28 Review Stat 217.

ChIP-Seq Data Processing and QC

Agenda 3/16 Eukaryotic Control Introduction and Reading

EXTENDING GENE ANNOTATION WITH GENE EXPRESSION

The transcription process is similar to replication.

Adrien Le Thomas, Georgi K. Marinov, Alexei A. Aravin Cell Reports

Maximize read usage through mapping strategies

Welcome to the LMS Quick Manager Guide.

Garbage In, Garbage Out: Quality control on sequence data

Regulatory Genomics Lab

Additional file 2: RNA-Seq data analysis pipeline

BF528 - Sequence Analysis Fundamentals

Introduction to RNA-Seq & Transcriptome Analysis

Chip – Seq Peak Calling in Galaxy

by Leighton J. Core, Joshua J. Waterfall, and John T. Lis

Short Read Sequencing Analysis Workshop

RNA-Seq Data Analysis UND Genomics Core.

Presentation transcript:

Quality Control & Nascent Sequencing 2019 Short Read Sequencing Analysis Workshop Day 7 Quality Control & Nascent Sequencing

Today’s plan Quality Control (QC) Pileup, RSeQC, MultiQC Nascent Analysis FStitch, Tfit, & DAStk

Before we begin.... Run the following script in the EC2 instance: $ sh /scratch/Workshop/SR2019/7_nascent/scripts/install_python.sh

Part 1: Quality Control

Data quality affects the type and quality of information that can be obtained from a sequencing experiment What do we notice about the two samples below?

Preseq : Measures data complexity Complexity = # of unique reads / total reads sequenced

Pileup : Assesses coverage Coverage = # of base pairs with at least 1 read mapping (not depth)

Two Important Points 1. Coverage and complexity are correlative but are NOT the same 2. Increased coverage/complexity does not guarantee better data

Running FastQC We've already done this... so this should be a good warm-up! Copy the script /scratch/Workshop/SR2019/7_nascent/scripts/fastqc.sbatch to your /scratch/Users directory Run FastQC on the fastq files in the RNA-seq and GRO-seq directories RNA-seq directory: /scratch/Workshop/SR2019/RNA-seq/fastqs GRO-seq directory: /scratch/Workshop/SR2019/GRO-seq/fastqs

Running Pileup Copy the script /scratch/Workshop/SR2019/7_nascent/scripts/pileup.sbatch to your /scratch/Users directory Edit the pileup.sbatch script and run it on one RNA-seq BAM file and one GRO-seq BAM file RNA-seq directory: /scratch/Workshop/SR2019/RNA-seq/mapped/bams GRO-seq directory: /scratch/Workshop/SR2019/GRO-seq/mapped/bams 3. Look at the two *coverage.stats.txt files – what do you notice?

Running RSeQC Read Distribution Copy the script /scratch/Workshop/SR2019/7_nascent/scripts/read_dist.sbatch to your /scratch/Users directory Edit the read_dist.sbatch script and run it on one RNA-seq BAM file and one GRO-seq BAM file RNA-seq directory: /scratch/Workshop/SR2019/RNA-seq/mapped/bams GRO-seq directory: /scratch/Workshop/SR2019/GRO-seq/mapped/bams 3. Look at the two *read_dist.txt files – what do you notice?

But... what if I have a lot of samples? It would be nice to be able to view the stats on all of your samples at once.... Fortunately, there's a tool for that: MultiQC And I already posted the link for it on Day 4: http://dowell.colorado.edu/ShortRead/sr2019.html#/day4

Running MultiQC If you kept the scripts formatted the way I had them, you should now have a /scratch/Users/USERNAME/qc directory In scratch, enter the following lines into your terminal: $ export PATH=~/.local/bin:$PATH $ multiqc /scratch/Users/USERNAME/qc -o /scratch/Users/USERNAME/qc/multiqc NOTE: Because there aren't many files, this is a super quick, low- memory. If it were a larger job, we would run this in the queue

Look at MultiQC html output file in X2Go

Part 2: Generating Annotations/Regions of Interest

Steady-state (RNA-seq) Nascent (GRO-seq, PRO-seq) Alternative gene splicing/isoform analysis Differential expression of steady- state Post-transcriptional modifications Exon/intron boundaries Transcriptional regulatory elements (TREs) RNA-polymerase activity Initiation Pausing Elongation Termination Immediate changes upon permutation

Steady-state vs. Nascent

Information Obtained from Nascent Sequencing Transcription Regulatory Elements (TREs) (orange) RNAP Pausing Index (red) 3'-end run-on (blue)

Need a way to quickly annotate transcriptional regions of interest Gene annotations are based on mRNA... Need a way to quickly annotate transcriptional regions of interest

FStitch | Train Generates model parameters given a set of hand-annotated training regions 1. Look at training file format: $ less /scratch/Workshop/SR2019/7_nascent/chr1_hct116.bed 2. Log onto X2Go and load IGV: $ sh /opt/igv/2.3.75/igv.sh 3. Then, in IGV, import our training file: Regions --> Region Navigator … 4. Choose 3 more "OFF" regions and 3 more "ON" region Regions --> Export Regions … 5. Edit and run 7_fstitch_train.sbatch script

Training File Output

FStitch | Segment Given the model parameters, segment the genome into active ("ON") and inactive ("OFF") regions of transcription 1. Edit and run 7_fstitch_segment.sbatch script 2. When the job is complete, look at segment data in IGV

FStitch | Bidir Using FStitch segment output and a gene annotation file, we can now parse bidirectionals (active regions of RNAP loading) 1. Edit and run 7_fstitch_bidir.sbatch script 2. When the job is complete, look at bidirectional annotations in IGV

Applications of FStitch Output Differential transcription analysis of genes/bidirectionals

Applications of FStitch Output Pre-filter to Tfit (a tool that models RNAP behavior)

Tfit Model Cartoon

Motif Displacement Analysis Generate a "metagene" using the annotations Calculate the ratio of the number of motifs within 300bp from the center of each annotation to those found within 3000bp Determine whether the change in MD score is statistically significant

Homework Run QC on ChIP and ATAC and compare this data to RNA-seq and GRO-seq Use the scripts provided to calculate MD scores using Tfit annotations If you are struggling, keep in mind we will get more practice with this on Day 9/10

The End Questions?? Don’t forget the homework. Help session in JSCBB A108 from 1-3pm Watch videos for Day 8