1
RNAseq Applications in Genome Studies
Alexander Kanapin, PhD Wellcome Trust Centre for Human Genetics, University of Oxford
2
RNAseq Protocols
Next generation sequencing protocol
- cDNA, not RNA, sequencing
- Types of libraries available:
  - Total RNA sequencing
  - polyA+ RNA sequencing
  - Small RNA sequencing
- Special protocols:
  - DSN treatment
  - Ribosomal depletion
3
Genome Study Applications
- Transcriptome analysis:
  - identifying new transcribed regions
  - expression profiling
- Resequencing to find genetic polymorphisms:
  - SNPs, micro-indels
  - CNVs
4
cDNA Synthesis
5
Sequencing details
Standard sequencing:
- polyA/total RNA
- Size selection
- Primers and adapters
- Single- and paired-end sequencing
Strand-specific sequencing:
- Beta version
- Sequencing only + or – strand
- Mostly paired-end
6
Arrays vs RNAseq (1)
- Correlation of fold change between arrays and RNAseq is similar to the correlation between array platforms (0.73)
- Technical replicates are almost identical, so there is no need to run them
- Extra analysis: prediction of alternative splicing, SNPs
- Low- and high-expressed genes do not match between the two platforms
7
Array vs RNAseq (2)
8
A bit of statistics
- Short reads distribution:
  - Poisson
  - Negative binomial
  - Normal
- Expression values normalization:
  - FPKM
  - Normalized reads number
  - VST (variance-stabilizing transformation)
- Differential expression analysis:
  - Replicates vs non-replicates
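To illustrate why the choice between a Poisson and a negative binomial model matters, here is a minimal sketch. It assumes the common negative binomial parameterisation var = mu + alpha * mu^2, where alpha = 0 reduces to the Poisson case (var = mu). The function name `nb_dispersion` and the replicate counts are hypothetical, and the method-of-moments estimate is only one simple way to get alpha, not the estimator used by any particular package.

```python
from statistics import mean, variance


def nb_dispersion(counts):
    """Method-of-moments dispersion for one gene across replicates.

    Assumes var = mu + alpha * mu**2, so alpha = (var - mu) / mu**2.
    A result of 0.0 means the counts are no more variable than Poisson.
    """
    mu = mean(counts)
    if mu <= 0:
        raise ValueError("mean count must be positive")
    var = variance(counts)  # sample variance, needs at least two replicates
    return max(0.0, (var - mu) / (mu * mu))


# Hypothetical read counts for one gene in five biological replicates.
replicate_counts = [90, 100, 110, 140, 60]
alpha = nb_dispersion(replicate_counts)
print(f"mean={mean(replicate_counts)}, dispersion={alpha:.3f}")
```

A dispersion well above zero suggests that a Poisson model would understate the variability between replicates.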
9
Analysis Dataflow
- Illumina Pipeline (FASTQ)
- Alignment (BAM)
- FASTX Toolkit (FASTQ/FASTA)
- Expression profiles/RNA abundance
- Splice variants
- SNP analysis
10
Software
- Short reads aligners: Stampy, BWA, Novoalign, Bowtie, …
- Data preprocessing (reads statistics, adapter clipping, formats conversion, read counters): Fastx toolkit, Htseq, MISO, samtools
- Expression studies: Cufflinks package, RSEQtools, R packages (DESeq, edgeR, baySeq, DEGseq, Genominator)
- Alternative splicing: Cufflinks, Augustus
- Commercial software: Partek, CLCBio
11
FASTQ: Sequence Data
“FASTA with Qualities”
@HWI-EAS225:3:1:2:854#0/1
GGGGGGAAGTCGGCAAAATAGATCCGTAACTTCGGG
+HWI-EAS225:3:1:2:854#0/1
…
@HWI-EAS225:3:1:2:1595#0/1
GGGAAGATCTCAAAAACAGAAGTAAAACATCGAACG
+HWI-EAS225:3:1:2:1595#0/1
a`abbbababbbabbbbbbabb`aaababab\aa_`
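A FASTQ record has four lines: an `@` header, the sequence, a `+` separator line, and one quality character per base. The sketch below parses such records and decodes the quality characters. It assumes the Phred+64 encoding used by Illumina pipelines of that period, which is an assumption and not something the slide states. The function name `parse_fastq` is hypothetical, and the example uses the one complete record shown above.

```python
from typing import Iterator, List, Tuple


def parse_fastq(lines: List[str], offset: int = 64) -> Iterator[Tuple[str, str, List[int]]]:
    """Yield (read_id, sequence, qualities) from four-line FASTQ records.

    offset is the ASCII offset of the quality encoding
    (64 is assumed here; Sanger-style files use 33).
    """
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ input must contain a multiple of four lines")
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = (s.rstrip("\n") for s in lines[i:i + 4])
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch at line {i + 1}")
        yield header[1:], seq, [ord(c) - offset for c in qual]


# The complete record from the slide (raw string keeps the backslash literal).
example = [
    "@HWI-EAS225:3:1:2:1595#0/1",
    "GGGAAGATCTCAAAAACAGAAGTAAAACATCGAACG",
    "+HWI-EAS225:3:1:2:1595#0/1",
    r"a`abbbababbbabbbbbbabb`aaababab\aa_`",
]

for read_id, seq, quals in parse_fastq(example):
    print(read_id, len(seq), min(quals), max(quals))
```

For real files, an established parser (for example one from a bioinformatics library) would be a safer choice than hand-rolled code.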
12
SAM(BAM): Alignment Data
Fields: Read ID, Bitwise flag, Chr, Pos, MapQ, CIGAR, Mate ref, Mate pos, Insert size, Sequence, Scores, Extra tags
Example: S35_42763_4 X 255 18M * CACACGATTCTCAAAGGT IIIIIIIIIIIIIIIIII XA:i:0
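The field list above maps directly onto the eleven mandatory tab-separated columns of a SAM line, followed by optional tags. The sketch below splits one alignment line and decodes two bits of the bitwise flag. The function name `parse_sam_line` is hypothetical, and the flag, position, mate position and insert size in the example are invented values, because those numbers did not survive in the slide text.

```python
from typing import Any, Dict

SAM_FIELDS = ["qname", "flag", "rname", "pos", "mapq", "cigar",
              "rnext", "pnext", "tlen", "seq", "qual"]
INT_FIELDS = {"flag", "pos", "mapq", "pnext", "tlen"}


def parse_sam_line(line: str) -> Dict[str, Any]:
    """Parse one SAM alignment line (not a header line) into a dict."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < len(SAM_FIELDS):
        raise ValueError("a SAM alignment line needs at least 11 columns")
    rec: Dict[str, Any] = {}
    for name, value in zip(SAM_FIELDS, cols):
        rec[name] = int(value) if name in INT_FIELDS else value
    rec["tags"] = cols[len(SAM_FIELDS):]          # optional TAG:TYPE:VALUE fields
    rec["is_unmapped"] = bool(rec["flag"] & 0x4)  # bit 0x4: read is unmapped
    rec["is_reverse"] = bool(rec["flag"] & 0x10)  # bit 0x10: reverse strand
    return rec


# Read ID, chromosome, MapQ, CIGAR, sequence, scores and tag come from the slide;
# flag=16, pos=1000, pnext=0 and tlen=0 are hypothetical placeholders.
line = "\t".join([
    "S35_42763_4", "16", "X", "1000", "255", "18M", "*", "0", "0",
    "CACACGATTCTCAAAGGT", "IIIIIIIIIIIIIIIIII", "XA:i:0",
])
record = parse_sam_line(line)
print(record["rname"], record["pos"], record["is_reverse"], record["tags"])
```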
13
FPKM (RPKM): Expression Values
Fragments (Reads) Per Kilobase of exon model per Million mapped fragments (reads)
Mortazavi A et al. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods. 2008.
RPKM = 10^9 × C / (N × L)
C = the number of reads mapped onto the gene's exons
N = total number of reads in the experiment
L = the sum of the exons in base pairs
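As a worked example of these definitions, the sketch below computes RPKM as 10^9 × C / (N × L), which is the standard form of the measure from the cited paper. The function name `rpkm` and the input numbers are hypothetical.

```python
def rpkm(c: int, n: int, l: int) -> float:
    """Reads per kilobase of exon model per million mapped reads.

    c: reads mapped onto the gene's exons
    n: total number of reads in the experiment
    l: summed exon length in base pairs
    """
    if n <= 0 or l <= 0:
        raise ValueError("n and l must be positive")
    # 10^9 = 10^3 (per kilobase) * 10^6 (per million reads)
    return 1e9 * c / (n * l)


# Hypothetical gene: 500 reads on 2,000 bp of exons in a 10-million-read run.
print(rpkm(c=500, n=10_000_000, l=2_000))
```

For paired-end data the same arithmetic applies with fragments counted instead of reads, which gives FPKM.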
14
Cufflinks package
http://cufflinks.cbcb.umd.edu/
- Cufflinks:
  - Expression values calculation
  - Transcripts de novo assembly
- Cuffcompare:
  - Transcripts comparison (de novo/genome annotation)
- Cuffdiff:
  - Differential expression analysis
15
Cufflinks (Expression analysis)
Output columns: gene_id, bundle_id, chr, left, right, FPKM, FPKM_conf_lo, FPKM_conf_hi, status
Rows: one Ensembl gene (ENSG) per line, each with status OK
16
Cuffdiff (differential expression)
- Pairwise or time series comparison
- Normal distribution of read counts
- Fisher’s test
Output columns: test_id, gene, locus, sample_1, sample_2, status, value_1, value_2, ln(fold_change), test_stat, p_value, significant
test_id  gene    locus   sample_1  sample_2  status  significant
ENSG     TSPAN6  chrX:   q1        q2        NOTEST  no
ENSG     TNMD    chrX:   q1        q2        NOTEST  no
ENSG     DPM1    chr20:  q1        q2        NOTEST  no
ENSG     SCYL3   chr1:   q1        q2        OK      yes
17
Cufflinks: Alternative splicing
Output columns: trans_id, bundle_id, chr, left, right, FPKM, FMI, frac, FPKM_conf_lo, FPKM_conf_hi, coverage, length, effective_length, status
Rows: one Ensembl transcript (ENST) per line, each with status OK
18
R/Bioconductor Packages
Based on raw read counts per gene/transcript/genome feature (miRNA)
Differential expression analysis:
- DESeq: negative binomial distribution
- baySeq (views/release/bioc/html/baySeq.html): Bayesian approach; choice of Poisson and negative binomial distribution
- edgeR
- DEGSeq
- Genominator
- …
19
DESeq: Variance estimation
SCV (squared coefficient of variation): the ratio of the variance at base level to the square of the base mean
Solid line: biological replicates noise
Dotted line: full variance scaled by size factors
Shot noise: dotted minus solid
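The two quantities named here can be sketched in a few lines. The code below computes per-sample size factors with the median-of-ratios approach that DESeq uses (each sample's counts divided by the per-gene geometric mean, then the median taken over genes), and the squared coefficient of variation as variance divided by squared mean. It is a simplified illustration and not the DESeq implementation; the function names and the count matrix are hypothetical.

```python
import math
from statistics import mean, median, variance


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of genes, each a list of raw counts (one per sample).
    Genes with a zero count in any sample are skipped.
    """
    usable = [gene for gene in counts if all(c > 0 for c in gene)]
    if not usable:
        raise ValueError("no gene has non-zero counts in every sample")
    geo_means = [math.exp(mean([math.log(c) for c in gene])) for gene in usable]
    n_samples = len(counts[0])
    return [median([gene[j] / g for gene, g in zip(usable, geo_means)])
            for j in range(n_samples)]


def scv(values):
    """Squared coefficient of variation: variance / mean**2."""
    m = mean(values)
    return variance(values) / (m * m)


# Hypothetical counts: three genes, two samples; sample 2 is sequenced twice as deep.
counts = [[10, 20], [100, 200], [50, 100]]
factors = size_factors(counts)
normalized = [[c / f for c, f in zip(gene, factors)] for gene in counts]
print(factors)
print([scv(gene) for gene in normalized])
```

In this toy matrix the only difference between samples is depth, so the SCV of every gene after normalization is essentially zero.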
20
DESeq: Differential Expression
Output columns: id, B cells expression, IFG expression, log2FoldChange, pValue
Rows: one Ensembl gene (ENSG) per line; the surviving p-value exponents range from e-05 to e-133
21
Visualization: Genome Viewers
Web based:
- Gbrowse
- UCSC Genome Browser
Standalone:
- Integrative Genomics Viewer (IGV)
22
IGV: Differential Expression Visualization
23
An Introduction to ChIP-Sequencing analysis
Linda Hughes
24
What is ChIP-Seq?
Chromatin Immunoprecipitation (ChIP) Sequencing
- ChIP: a technique for precipitating a protein antigen out of solution using an antibody that specifically binds to the protein.
- Sequencing: a technique to determine the order of nucleotide bases in a molecule of DNA.
- Used in combination to study the interactions between protein and DNA.
25
ChIP-Seq Applications
Enables the accurate profiling of:
- Transcription factor binding sites
- Polymerases
- Histone modification sites
- DNA methylation
26
ChIP-Seq: The Basics
27
ChIP-Seq Analysis Pipeline
- Sequencing
- Base Calling
- Read quality assessment
- 30-50 bp Sequences
- Genome Alignment
- Peak Calling
- Enriched Regions
- Visualisation with genome browser
- Differential peaks
- Motif Discovery
- Combine with gene expression
28
ChIP-Seq: Genome Alignment
Several aligners available:
- BWA
- NovoAlign
- Bowtie
Currently the sequencing analysis pipeline uses Stampy as the default aligner for all sequencing.
All aligner output, containing information about the mapping location and quality of the reads, is written in SAM format.
29
ChIP-Seq Peak Calling
- The main function of peak finding programs is to predict protein binding sites.
- First, the programs must identify clusters (or peaks) of sequence tags.
- The programs must then determine the number of sequence tags (peak height) that constitutes “significant” enrichment likely to represent a protein binding site.
30
ChIP-Seq: Peak Calling
Several ChIP-seq peak calling tools available:
- MACS
- PICS
- PeakSeq
- Cisgenome
- F-Seq
31
ChIP-Seq: Identification of Peaks
There are several methods to identify peaks, but they mainly fall into 2 categories:
- Tag density
- Directional scoring
In the tag density method, the program searches for large clusters of overlapping sequence tags within a fixed-width sliding window across the genome.
In directional scoring methods, the bimodal pattern in the strand-specific tag densities is used to identify protein binding sites.
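The tag density idea can be sketched as a fixed-width window sliding along one chromosome and reporting windows that hold at least a minimum number of tags. This is a toy illustration and not the algorithm of any specific peak caller; the function name, the window width, the step and the threshold are all hypothetical choices.

```python
from bisect import bisect_left


def tag_density_windows(tag_positions, window=200, step=50, min_tags=5):
    """Return (start, end, n_tags) for windows [start, end) with >= min_tags tags.

    tag_positions: 5' positions of aligned tags on a single chromosome.
    """
    positions = sorted(tag_positions)
    if not positions:
        return []
    hits = []
    first = (positions[0] // step) * step  # align the first window to the step grid
    for start in range(first, positions[-1] + 1, step):
        lo = bisect_left(positions, start)            # first tag at or after start
        hi = bisect_left(positions, start + window)   # first tag at or after end
        n_tags = hi - lo
        if n_tags >= min_tags:
            hits.append((start, start + window, n_tags))
    return hits


# Hypothetical tags: a cluster of five near position 100 and one isolated tag.
tags = [100, 110, 120, 130, 140, 5000]
print(tag_density_windows(tags))
```

A real caller would also merge overlapping windows and compare the counts against a control, as the following slides describe.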
33
ChIP-Seq: Determination of peak significance
To account for the background signal, many methods incorporate sequence data from a control dataset. This is usually generated from fixed chromatin or DNA immunoprecipitated with a nonspecific antibody.
The control data are used to:
- Calculate the false discovery rate
- Account for the background signal in ChIP sequence tags
- Assess the significance of predicted ChIP-seq peaks
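One simple way to turn a control dataset into a false discovery rate is to call peaks on the control with the same threshold and divide the number of control peaks by the number of ChIP peaks. The sketch below assumes that approach; individual tools differ in how they compute FDR, and the function name and the scores are hypothetical.

```python
def empirical_fdr(chip_scores, control_scores, threshold):
    """Estimate FDR as (control peaks >= threshold) / (ChIP peaks >= threshold)."""
    n_chip = sum(1 for s in chip_scores if s >= threshold)
    n_control = sum(1 for s in control_scores if s >= threshold)
    if n_chip == 0:
        return 0.0  # nothing called, so nothing can be a false discovery
    return min(1.0, n_control / n_chip)


# Hypothetical peak scores (for example tag counts per candidate region).
chip = [50, 40, 30, 20, 10]
control = [12, 8, 25]
print(empirical_fdr(chip, control, threshold=20))
```

The estimate assumes the control and ChIP samples have comparable sequencing depth; otherwise the control scores would need scaling first.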
34
ChIP-Seq: Determination of peak significance
More statistically sophisticated models have been developed to describe the distribution of control sequence tags across the genome. The fitted model is used as a parameter to assess the significance of ChIP tag peaks:
- t-distribution
- Poisson model
- Hidden Markov model
These models are primarily used to assign each peak a significance metric such as a P-value, FDR or posterior probability.
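For the Poisson model, a peak's P-value is the probability of seeing at least the observed number of tags when the expected number comes from the control. The sketch below sums the upper tail of the Poisson distribution directly, which avoids the precision loss of computing 1 minus the CDF for very small P-values. The function name and the example numbers are hypothetical, and real tools estimate the expected count in more elaborate ways.

```python
import math


def poisson_upper_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), summed term by term from k upwards."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    if k <= 0:
        return 1.0
    # P(X = k), computed in log space to avoid overflow in lam**k and k!
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total = 0.0
    i = k
    while term > total * 1e-17:  # stop once further terms no longer change the sum
        total += term
        i += 1
        term *= lam / i          # P(X = i) from P(X = i - 1)
    return min(1.0, total)


# Hypothetical peak: 20 ChIP tags where the control suggests 2 tags on average.
print(poisson_upper_tail(20, 2.0))
```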
35
ChIP-Seq: Output
Output columns: chr, start, end, length, summit, tags, *log10(pvalue), fold_enrichment, FDR(%)
Rows: one enriched region per line, identified by chromosome (chr)
36
ChIP-Seq: Output
A list of enriched locations. Can be used:
- In combination with RNA-Seq, to determine the biological function of transcription factors
- To identify genes co-regulated by a common transcription factor
- To identify common transcription factor binding motifs
37
ChIP-Seq: Need help?
http://seqanswers.com/
Good for:
- Publications
- Answering FAQ
- Troubleshooting
- Contacting the programs' authors