Published by Francis Matthews. Modified over 6 years ago.
1
ChIP-Seq Analysis Using CLC Genomics Workbench
Nov 16, 2017. Ansuman Chattopadhyay, PhD, Health Sciences Library System, University of Pittsburgh
2
Topics: Transcription Factor ChIP-Seq, Histone ChIP-Seq, ATAC-Seq
4
Transcription Factor and Histone ChIP-Seq
5
ATAC-Seq Study
6
Graphical User Interface based software
Galaxy: http://galaxy.crc.pitt.edu:8080/
CLC Genomics Workbench
7
HSLS MolBio
8
NGS Software @ HSLS MolBio
NGS Analysis; Sanger Seq Analysis; Human, Mouse and Rat NGS Analysis
9
CLCbio Genomics Workbench
System Requirements
Windows Vista, Windows 7, Windows 8, Windows 10, Windows Server 2008, or Windows Server 2012
Mac OS X 10.7 or later
Linux: Red Hat 5.0 or later, SUSE 10.2 or later, Fedora 6 or later
8 GB RAM required, 16 GB RAM recommended
1024 x 768 display required, 1600 x 1200 display recommended
Intel or AMD CPU required
Minimum 10 GB free disk space in the tmp directory
10
CLC Plugins to Install
CLC Workbench Client Plugin
Histone ChIP-Seq
Advanced Peak Shape Tools Plugin (Beta)
Download available at the top right corner
11
Integrating with the CLCbio Genomics Server @ CRC
12
You need Secure Remote Access via Pulse to run CLCGx from off campus locations / Pitt Wireless
13
CLC files at the CRC HTC Cluster
Reference Sequences Look for Folders organized by PI’s name
14
Create Folders at CRC-HTC
15
Create Folder in SaM-HTC Cluster
16
Create Workshop Folder @ FRANK
17
ChIP-Seq Workflow
18
Dataset
19
GEO Dataset
20
Download FASTQ Reads MyoD_Undiff_ChIP-Seq
21
Download FASTQ Reads MyoD_Undiff_ChIP-Seq
22
ENA: Download FASTQ Reads MyoD_Undiff_ChIP-Seq
23
Import: FASTQ Reads MyoD_Undiff_ChIP-Seq
24
Import: FASTQ Reads MyoD_Undiff_ChIP-Seq (single)
25
GEO Dataset – ATAC-Seq
26
STEP 1: Import Reads to CLC (Paired End)
27
STEP 1: Import Reads to CLC (Paired End)
28
FASTQ format
29
FASTQ Reads
30
FASTQC Project
31
Step 2: Create a Seq QC Report
32
Trim Reads – Adapter Seq etc.
33
Create Adapter List
34
Create Adapter List
35
Create FASTQC Report
36
FASTQC Report
37
Read Mapping to Ref Genome
38
Read Mapping to Ref Genome
39
Read Mapping to Ref Genome
40
Read Mapping to Ref Genome
41
Read Mapping to Ref Genome
42
Read Mapping around GM20652: Result from MyoD1 ChIP-Seq
43
Peak Calling
Strino et al., BMC Bioinformatics, June 2016
44
Peak Calling
Strino et al., BMC Bioinformatics, June 2016
Landt et al., Genome Research, 2012
45
Peak Calling
Strino et al., BMC Bioinformatics, June 2016
46
Discovering Obvious Peaks
The CLC shape-based peak caller finds peaks by building a Gaussian filter based on the mean and variance of the fragment length distribution, which are inferred from the cross-correlation profile.
Strino et al., BMC Bioinformatics, June 2016
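The idea of reading a fragment length off a cross-correlation profile can be illustrated with a minimal sketch. This is not CLC's implementation: it is the generic strand cross-correlation approach, run on made-up per-base read-start counts, and it only recovers the dominant fragment length (the lag with the highest correlation), not the variance. The function names and the toy data are assumptions chosen for illustration.

```python
from statistics import mean


def strand_cross_correlation(forward, reverse, max_shift):
    """Pearson correlation between forward-strand counts and reverse-strand
    counts shifted towards the forward strand by 0..max_shift bases."""
    n = len(forward)
    profile = []
    for shift in range(max_shift + 1):
        x = forward[: n - shift]
        y = reverse[shift:]
        mx, my = mean(x), mean(y)
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        var_x = sum((a - mx) ** 2 for a in x)
        var_y = sum((b - my) ** 2 for b in y)
        # Guard against flat signals, where the correlation is undefined.
        profile.append(cov / (var_x * var_y) ** 0.5 if var_x and var_y else 0.0)
    return profile


def estimate_fragment_length(profile):
    """Return the shift with the highest correlation."""
    return max(range(len(profile)), key=profile.__getitem__)


# Toy data (hypothetical): two binding sites, fragments of 150 bp.
forward = [0] * 500
reverse = [0] * 500
forward[100], forward[300] = 5, 3
reverse[250], reverse[450] = 5, 3

profile = strand_cross_correlation(forward, reverse, max_shift=300)
print(estimate_fragment_length(profile))
```

On real data the profile is a smooth curve rather than a single spike, and its spread around the maximum is what would inform a variance estimate.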
47
Peak Shape Score
Score = genomic coverage * filter, where * is the cross-correlation operator. The score indicates how likely a genomic position is to be the centre of a peak.
The Peak Shape Score is standardised and follows a standard normal distribution, so a p-value for each genomic position can be calculated as p-value = Φ(−Peak Shape Score of the peak centre), where Φ is the standard normal cumulative distribution function.
Strino et al., BMC Bioinformatics, June 2016
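The two formulas above can be tried out with a short sketch. It assumes the score passed to the p-value function is already standardised, and the coverage and filter values are invented for illustration. Positions where the filter hangs over the end of the sequence are treated as zero coverage, which is a simplification chosen here, not something stated in the slides.

```python
import math


def cross_correlate(coverage, filt):
    """Slide the filter along the coverage and sum the element-wise products.
    The filter is centred on each position; out-of-range positions count as 0."""
    half = len(filt) // 2
    scores = []
    for i in range(len(coverage)):
        total = 0.0
        for j, weight in enumerate(filt):
            k = i + j - half
            if 0 <= k < len(coverage):
                total += coverage[k] * weight
        scores.append(total)
    return scores


def peak_shape_p_value(score):
    """p-value = Phi(-score), with Phi the standard normal CDF.
    Uses the identity Phi(-s) = 0.5 * erfc(s / sqrt(2))."""
    return 0.5 * math.erfc(score / math.sqrt(2))


# Hypothetical coverage and filter, for illustration only.
raw_scores = cross_correlate([0, 0, 1, 2, 1, 0, 0], [1, 2, 1])
print(raw_scores)

# A standardised score of 0 gives p = 0.5; larger scores give smaller p-values.
for s in (0.0, 1.96, 4.0):
    print(s, peak_shape_p_value(s))
```

Because Φ(−s) decreases as s grows, a high Peak Shape Score corresponds to a small p-value.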
48
Peak Shape Filter
Once the positive and negative regions have been identified, the CLC shape-based peak caller learns a filter that matches the average peak shape, which is called the Peak Shape Filter.
Strino et al., BMC Bioinformatics, June 2016
49
Peak Shape Filter
Strino et al., BMC Bioinformatics, June 2016
50
Peak Detection
Peaks are called by first identifying the genomic positions whose p-value passes the specified threshold (that is, whose p-value is below it, which corresponds to a Peak Shape Score above the matching cut-off) and which do not have any higher score in a window around them. The size of this window is determined by the filter as the longest distance between two positive values in the filter. These maxima define the centre of the peak, while the peak boundaries are identified by expanding from the centre both left and right until either the score becomes 0 or the peak touches a window boundary.
Strino et al., BMC Bioinformatics, June 2016
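The detection steps described above can be sketched as follows. This is a simplified illustration under stated assumptions, not CLC's code: scores are taken to be standardised already, the window is applied as that many positions on each side of the candidate centre, ties between equal scores are resolved in favour of the leftmost position, and all function names are hypothetical.

```python
import math


def window_from_filter(filt):
    """Longest distance between two positive values in the filter."""
    positive = [i for i, v in enumerate(filt) if v > 0]
    return positive[-1] - positive[0] if positive else 0


def call_peaks(scores, max_p_value, window):
    """Return (left, centre, right) index triples for each called peak."""
    n = len(scores)
    peaks = []
    for i, s in enumerate(scores):
        # p-value = Phi(-score); keep only positions that pass the threshold.
        if 0.5 * math.erfc(s / math.sqrt(2)) >= max_p_value:
            continue
        lo, hi = max(0, i - window), min(n - 1, i + window)
        # The centre must be the maximum of its window (leftmost wins ties).
        if any(scores[k] > s or (scores[k] == s and k < i)
               for k in range(lo, hi + 1)):
            continue
        # Expand left and right until the score drops to 0 or the window ends.
        left = i
        while left > lo and scores[left - 1] > 0:
            left -= 1
        right = i
        while right < hi and scores[right + 1] > 0:
            right += 1
        peaks.append((left, i, right))
    return peaks


# Hypothetical standardised scores with two bumps.
scores = [0, 1, 3, 5, 3, 1, 0, 0, 2, 4, 2, 0]
print(call_peaks(scores, max_p_value=0.01, window=3))
```

In this toy run the first bump is centred at index 3 and the second at index 9; positions with a score of 3 also pass the p-value threshold but are discarded because a higher score lies inside their window.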
51
Call Peaks using Peak Shape information
52
Call Peaks using Peak Shape information
53
Call Peaks using Peak Shape information
54
Peak Calls Result
55
Peak Calls Result
56
Annotate Peaks with nearby genes
57
Annotate Peaks with nearby genes
58
5Prime and 3Prime Gene Distance
59
ChIP-Seq Result
60
Compare Datasets
61
Compare Datasets
62
Compare Datasets
63
Compare Datasets
64
Commonly Used Open-Source Tool
65
Comparison of CLC Results with MACS2.0
66
Histone ChIP-Seq
Li et al., Cell
67
Histone ChIP-Seq
68
Histone Modifications
Li et al., Cell
69
Running Histone ChIP-Seq
Classify Regions of variable length by Peak Shape
70
Running Histone ChIP-Seq
71
Running Histone ChIP-Seq
72
Running Histone ChIP-Seq
73
Histone ChIP-Seq Result
74
Histone ChIP-Seq Result
Classified Gene Regions in the genome
75
H3K4me3 – Diff: Result from the Transcription Factor ChIP-Seq tool
76
ATAC-Seq
77
ATAC-Seq Data Analysis
78
Comparison of DNase-Seq Results
79
HSLS-MBIS and Genomics Analysis Core
GAC; Ansuman Chattopadhyay, PhD; Uma Chandran, PhD, MSIS; Sri Chaparala; Carrie Iwema, PhD, MLS
80
Thanks To
HSLS: Sri Chaparala, Carrie Iwema, David Leung, Michael Sweezer
CLCBio: Shawn Prince
Center for Research Computing: Mu Fangping