1
GE3M25: Data Analysis, Class 4
TCD, 30/11/2017 Karsten Hokamp, PhD Genetics
2
Python 6 Functions, Regex
NGS 1 Introduction; GE3M11 Exam; Week 10; Week 11; Python 1 Introduction; Python 2 Strings and Files; NGS 2 QC, Trimming; Week 12; Python 3 File I/O, Branching; Python 4 Modules, Lists, Sets; NGS 3 Mapping; Week 13; Python 5 Dictionaries; Python 6 Functions, Regex; NGS 4 Peak Calling; Week 14
3
ChIP-Seq project report
NGS 5 Gene Lists, Tuning; NGS 6 / Python 7 Pipelines; NGS 7 / Python 8 Revision; Week 15; Python Exam; Week 16; ChIP-Seq project report: January 2018
4
Marks for GE3M25: Python exam 50% (2/3 data handling, 1/3 statistics); ChIP-Seq report 50%
5
Python exam
Date: Mon, 11th Dec, 11am – 12.45
Venue: Mac Lab
Structure: 10 multiple-choice questions (20 points); 4 programming tasks: 2 short ones (30 points each) and 2 more involved ones (50 points each)
Submission: multiple-choice test (1-sheet print-out); 1 – 2 Python scripts with execution output (file upload)
6
Python exam
Material: anything from the course website, the official Python documentation, Python books
Content: material covered during classes
Notes: Add comments! Include a copy of the output (Terminal/IDLE). Include your student ID in the script and in the file name. Submit frequently: only the last version counts. Even scripts that don't work can receive points.
7
Class 4: Project overview Visualisation Peak detection Motif detection
8
ChIP-Seq: Different sets of genes are expressed under different conditions. Expression is regulated through transcription factors that bind to promoters. This binding can be captured by ChIP (chromatin immunoprecipitation), and the enriched regions are revealed through NGS.
10
Class 1: ChIP-seq data analysis in a nutshell
11
ChIP-Seq Analysis Goal
12
Recap – From Reads to Peaks (Visualisation)
Reference (FASTA format) → bowtie2-build → index files (*.bt2); NGS data (FastQ format) + index files → bowtie2 → mapped reads (SAM format) → samtools sorting/indexing (*.bam, *.bai) → IGV
13
Recap – From Reads to Peaks (Visualisation)
Reference (FASTA format) → bowtie2-build → index files (*.bt2); NGS data (FastQ format) + index files → bowtie2 → mapped reads (SAM format) → samtools sorting/indexing (*.bam, *.bai) → BigWig file → IGV
14
Recap – From Reads to Peaks (Calling)
Reference (FASTA format) → bowtie2-build → index files (*.bt2); NGS data (FastQ format) + index files → bowtie2 → mapped reads (SAM format) → samtools sorting/indexing (*.bam, *.bai) → GEM → peak list, motifs
15
Project Data http://bioinf.gen.tcd.ie/GE3M25/project
Antimicrob. Agents Chemother. (2014)
16
Project Data Three strains: Wild type TAP-Pdr1 Pdr1-k.o.
17
Project Data Three strains, two antibodies Wild type TAP-Pdr1
Pdr1-k.o. Pdr1 antibody TAP antibody
18
Project Data Paul et al. Figure 2A
19
Project Data Potential consensus for the C. glabrata PDR1 binding site
Paul et al. Figure 2B
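Since this class also covers regular expressions, a consensus like the one in Figure 2B can be searched for in Python. A minimal sketch, assuming the motif is written in IUPAC code; the motif TCCRYGGA below is a hypothetical placeholder, not the consensus from the paper, and all function names are illustrative:

```python
import re

# Hypothetical example motif in IUPAC code, for illustration only;
# it is NOT the consensus shown in Paul et al. Figure 2B.
EXAMPLE_MOTIF = "TCCRYGGA"

# IUPAC nucleotide codes translated into regex character classes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}

# Complement table for IUPAC codes (R<->Y, K<->M; S, W and N map to themselves)
COMPLEMENT = str.maketrans("ACGTRYSWKMN", "TGCAYRSWMKN")


def iupac_to_regex(motif):
    """Turn an IUPAC motif such as 'TCCRYGGA' into a regex pattern string."""
    return "".join(IUPAC[base] for base in motif.upper())


def reverse_complement(motif):
    """Reverse complement of a DNA/IUPAC string."""
    return motif.upper().translate(COMPLEMENT)[::-1]


def find_sites(sequence, motif):
    """Return sorted (start, strand, site) hits of motif on both strands.

    start is 0-based on the forward strand; site is the forward-strand text.
    """
    seq = sequence.upper()
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        # a lookahead lets overlapping sites be reported as well
        regex = re.compile("(?=(" + iupac_to_regex(pattern) + "))")
        for match in regex.finditer(seq):
            hits.append((match.start(), strand, match.group(1)))
    return sorted(hits)
```

Because the placeholder motif is its own reverse complement, every site it finds is reported once per strand.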
20
GE3M25 Project Previous steps:
Download FastQ data set (ChIP-Seq of a TF in yeast) ✔
Quality assessment with FastQC ✔
Read mapping (Bowtie2) ✔
Generate indexed and sorted BAM file ✔
Visualisation in IGV ✔
Store BAM and index files ✔
21
GE3M25 Project Data Download: Start here: bioinf.gen.tcd.ie/GE3M25
22
GE3M25 Project Data Download: NGS page: bioinf.gen.tcd.ie/GE3M25/ngs
23
GE3M25 Project Data Download: bioinf.gen.tcd.ie/GE3M25/ngs/data
Main data files (Fastq format)
24
GE3M25 Project Data Download: bioinf.gen.tcd.ie/GE3M25/ngs/data/fastq
Control data files and ChIP data files: download the files that carry your student ID.
25
Preparations – new tools folder
1. Rename the previous directory (in Terminal): mv tools tools.prev
If you see "mv: rename tools to tools.prev: No such file or directory", then there was no tools directory, and that's OK!
26
GE3M25 Project Data Download: bioinf.gen.tcd.ie/GE3M25/ngs/data
additional files in tools.zip
27
Preparations Tools Rename previous directory (in Terminal)
Download 'tools.zip' from the webpage. Unpack the archive (if not done by the browser): unzip tools.zip
If you see "unzip: cannot find or open tools.zip, tools.zip.zip", then it was already unpacked during download.
28
Preparations Tools Rename previous directory (in Terminal)
Download 'tools.zip' from the webpage. Unpack the archive (if not done by the browser). Check the content of the folder: ls -lh tools
29
Preparations
30
Preparations Download tools.zip (class 4) again if this is missing!
31
Data Processing Indexing Mapping Compressing Sorting Visualisation
32
Data Processing Indexing Mapping Compressing Sorting BigWig generation
Visualisation Peak/Motif detection
33
Data Processing Indexing Mapping Compressing Sorting BigWig generation
Visualisation Peak/Motif detection can be combined
34
Data Processing Indexing Mapping | Compressing | Sorting
BigWig generation Visualisation Peak/Motif detection
35
GE3M25 Project – Read Mapping
Build an index of the genome. Syntax: bowtie2-build fasta_file index_name, e.g. tools/bowtie2-build ASM254v2.fa C_glabrata
The index name (here C_glabrata) is used again in the mapping step!
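bowtie2-build writes several files that share the index name. A minimal sketch for checking that the index is complete before mapping, assuming the standard small-index suffixes (.1.bt2 to .4.bt2, .rev.1.bt2, .rev.2.bt2) and that the index was built in the current directory; missing_index_files is a hypothetical helper name:

```python
from pathlib import Path

# Files bowtie2-build writes for a small genome such as C. glabrata
# (very large genomes get a .bt2l suffix instead).
BT2_SUFFIXES = [".1.bt2", ".2.bt2", ".3.bt2", ".4.bt2", ".rev.1.bt2", ".rev.2.bt2"]


def missing_index_files(index_name, directory="."):
    """List the expected bowtie2 index files that do not exist in directory."""
    folder = Path(directory)
    return [index_name + suffix for suffix in BT2_SUFFIXES
            if not (folder / (index_name + suffix)).exists()]


if __name__ == "__main__":
    missing = missing_index_files("C_glabrata")
    if missing:
        print("Index incomplete, missing:", ", ".join(missing))
    else:
        print("All bowtie2 index files for C_glabrata are present.")
```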
36
GE3M25 Project – Read Mapping
Bowtie2 mapping (-p 4 uses 4 threads):
1. Single-end data: tools/bowtie2 -U _exp_1_fastq.bz2 -x C_glabrata -p 4 > exp1.sam
2. Paired-end data: tools/bowtie2 -1 file1 -2 file2 -x C_glabrata -p 4 > exp.sam
e.g.: tools/bowtie2 -1 _exp_1_fastq.bz2 -2 _exp_2_fastq.bz2 -x C_glabrata -p 4 > exp.sam
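bowtie2 already prints an alignment summary in the Terminal. As a cross-check, mapped and unmapped reads can be counted from the FLAG column of the SAM file (bit 0x4 = unmapped). A minimal sketch; count_sam_reads is a hypothetical helper name:

```python
def count_sam_reads(lines):
    """Count mapped and unmapped reads in SAM-formatted text lines.

    Secondary (0x100) and supplementary (0x800) alignments are skipped so that
    each read is counted once; for paired-end data each mate counts separately.
    """
    mapped = unmapped = 0
    for line in lines:
        if line.startswith("@") or not line.strip():
            continue  # header lines start with '@'; ignore blank lines
        flag = int(line.split("\t")[1])  # FLAG is the 2nd tab-separated column
        if flag & 0x100 or flag & 0x800:
            continue  # extra alignment of a read that is already counted
        if flag & 0x4:  # 0x4 = segment unmapped
            unmapped += 1
        else:
            mapped += 1
    return mapped, unmapped
```

Usage, with the file from the example above: with open("exp.sam") as sam: mapped, unmapped = count_sam_reads(sam)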
37
GE3M25 Project – Sorting and Indexing
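1. Convert SAM to BAM format: tools/samtools view -b exp.sam > exp.bam
2. Sort, using 4 threads for speed-up: tools/samtools sort -@ 4 exp.bam > exp_sorted.bam
3. Index the sorted file (creates exp_sorted.bam.bai): tools/samtools index exp_sorted.bam
exp.sam and exp.bam are intermediates; exp_sorted.bam is the results file.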
38
Data Processing: the output from the left of each pipe (|) is used as input on the right.
Mapping | Compressing | Sorting:
tools/bowtie2 -1 file1 -2 file2 -x index | tools/samtools view -b - | tools/samtools sort - > out.bam
Everything goes on one line; input file names are replaced with '-' (read from the pipe); the final output is redirected into a file.
39
Data Processing Indexing Mapping | Compressing | Sorting, e.g.: tools/bowtie2 -x C_glabrata -p 4 -1 _exp_1_fastq.bz2 -2 _exp_2_fastq.bz2 | tools/samtools view -b - | tools/samtools sort - > exp.sorted.bam (make the output name descriptive)
40
Data Processing Indexing Mapping | Compressing | Sorting
BigWig generation Visualisation Peak/Motif detection
41
Data Processing: The bigWig format is useful for dense, continuous data that will be displayed in the Genome Browser as a graph. The BigWig generation step takes a file that lists the BAM files as input.
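BigWig tracks typically show read depth along the genome. A minimal sketch of how such values can be computed from read positions, assuming 0-based, half-open (start, end) read intervals on a single chromosome; it produces bedGraph-style runs of equal depth and does not write the binary bigWig file itself (function names are hypothetical):

```python
def coverage(intervals, chrom_length):
    """Per-base read depth from 0-based, half-open (start, end) intervals."""
    diff = [0] * (chrom_length + 1)
    for start, end in intervals:
        diff[start] += 1  # a read starts covering at 'start'
        diff[end] -= 1    # ... and stops covering at 'end'
    depth, running = [], 0
    for pos in range(chrom_length):
        running += diff[pos]
        depth.append(running)
    return depth


def to_runs(chrom, depth):
    """Collapse neighbouring equal depths into (chrom, start, end, value) runs."""
    runs, start = [], 0
    for pos in range(1, len(depth) + 1):
        if pos == len(depth) or depth[pos] != depth[start]:
            runs.append((chrom, start, pos, depth[start]))
            start = pos
    return runs
```

The difference array keeps the work proportional to the number of reads plus the chromosome length, instead of adding 1 to every covered base of every read.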
42
GE3M25 Project
43
Kill stuck IGV via Activity Monitor
44
GE3M25 Project New file with .bw ending:
Load .bam and .bw files into IGV
45
BigWig track visible across whole genome!
46
GE3M25 Project Data formats: Fastq SAM BAM BAM index BigWig
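FastQ stores each read in four lines: a header starting with @, the sequence, a line starting with +, and the quality string. A minimal reader, assuming Phred+33 quality encoding (Sanger/Illumina 1.8+); read_fastq and mean_phred are hypothetical names:

```python
def read_fastq(lines):
    """Yield (name, sequence, quality) tuples from an iterable of FastQ lines."""
    it = iter(lines)
    for header in it:
        header = header.rstrip()
        if not header:
            continue  # tolerate blank lines, e.g. at the end of the file
        seq = next(it, "").rstrip()
        plus = next(it, "").rstrip()
        qual = next(it, "").rstrip()
        if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError("Malformed FastQ record: " + header)
        yield header[1:], seq, qual


def mean_phred(qual, offset=33):
    """Mean Phred quality of one read (assumes Phred+33 encoding)."""
    return sum(ord(c) - offset for c in qual) / len(qual)
```

For the compressed project files, the lines can come from Python's bz2 module, e.g. with bz2.open("reads.fastq.bz2", "rt") as fq: for name, seq, qual in read_fastq(fq): ... (reads.fastq.bz2 is a hypothetical file name).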
47
GE3M25 Project Peak calling with GEM. Required input parameters:
BAM file; FASTA file with the reference sequence; file with chromosome size(s); genome size; read distribution; output directory
48
GE3M25 Project Peak calling with GEM:
java -jar tools/gem/gem.jar --expt exp.sorted.bam --f BAM --genome . --g chrom.sizes.txt --s --d tools/gem/Read_Distribution_default.txt --out peaks
--expt: BAM file; --genome: directory with FASTA file(s); --g: file with chromosome size(s); --s: genome size; --d: read distribution; --out: output directory
49
GE3M25 Project Download these two files
50
GE3M25 Project Running Gem:
51
GE3M25 Project Output produced by GEM:
52
GE3M25 Project Check out top peaks: head peaks/peaks_GPS_events.txt
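head shows the first lines of the events file. To rank events by another column, a minimal sketch, assuming the file is tab-separated with a header row; the column name passed in must be taken from the header that head prints (none is assumed here), and top_events is a hypothetical helper:

```python
import csv


def top_events(lines, column, n=10):
    """Return the n rows with the highest numeric value in `column`.

    Assumes tab-separated text whose first line is a header row.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    # keep only rows that actually have a value in the chosen column
    rows = [row for row in reader if (row.get(column) or "").strip()]
    rows.sort(key=lambda row: float(row[column]), reverse=True)
    return rows[:n]
```

Usage: with open("peaks/peaks_GPS_events.txt") as events: best = top_events(events, column_name), where column_name is one of the header fields.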
53
GE3M25 Project Peak calling with GEM
54
GE3M25 Project Peak calling with GEM
Add parameters to initiate motif finding: --k_min 6 --k_max 13
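--k_min and --k_max set the range of k-mer lengths used in motif finding. To illustrate what that range means, a sketch that counts every k-mer of length 6 to 13 in a set of peak sequences; this is only a counting step, not GEM's motif-discovery algorithm, and count_kmers is a hypothetical name:

```python
from collections import Counter


def count_kmers(sequences, k_min=6, k_max=13):
    """Count every k-mer with k_min <= k <= k_max across a set of sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for k in range(k_min, k_max + 1):
            for i in range(len(seq) - k + 1):
                kmer = seq[i:i + k]
                if "N" not in kmer:  # skip k-mers containing ambiguous bases
                    counts[kmer] += 1
    return counts
```

count_kmers(peak_sequences).most_common(10) then lists the most frequent k-mers; without comparing against background sequences, frequent k-mers are not necessarily enriched ones.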
55
GE3M25 Project Output produced by GEM: open peaks/peaks_result.htm
56
GE3M25 Project Peak calling with GEM. Add a control file to remove noise:
--ctrl ctrl.sorted.bam
Check how the detected peaks/motifs differ!
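Why a control helps: regions that also collect many reads in the control (e.g. through sequencing or mapping biases) should not count as peaks. A simplified illustration of library-size-normalised fold enrichment for one region; GEM uses its own statistical model, so this is not its exact calculation, and fold_enrichment and the pseudocount are assumptions:

```python
def fold_enrichment(chip_reads, ctrl_reads, chip_total, ctrl_total, pseudocount=1.0):
    """ChIP/control read ratio in one region after scaling to equal library sizes."""
    scale = chip_total / ctrl_total   # bring the control to the ChIP sequencing depth
    expected = ctrl_reads * scale     # reads expected in the region from background
    # the pseudocount avoids division by zero when the control has no reads
    return (chip_reads + pseudocount) / (expected + pseudocount)
```

A region with 50 ChIP reads and 10 control reads, where the control library is twice as large, gives (50 + 1) / (5 + 1) = 8.5.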
57
GE3M25 Project Calculate chromosome sizes
tools/samtools idxstats exp_sorted.bam | cut -f 1,2 > chrom.sizes
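The same table can be produced in Python. A minimal sketch that parses samtools idxstats output (columns: sequence name, length, mapped reads, unmapped reads) and, unlike cut -f 1,2, drops the final '*' line for unplaced unmapped reads; the summed length is the total reference length, an upper bound for a mappable genome size. Function names are hypothetical:

```python
def chrom_sizes_from_idxstats(lines):
    """Parse `samtools idxstats` output into [(name, length)], like `cut -f 1,2`."""
    sizes = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2 or fields[0] == "*":
            continue  # skip blank lines and the '*' line (length 0)
        sizes.append((fields[0], int(fields[1])))
    return sizes


def genome_size(sizes):
    """Total length of all reference sequences."""
    return sum(length for _, length in sizes)


def write_chrom_sizes(sizes, path):
    """Write one 'name<TAB>length' line per sequence."""
    with open(path, "w") as out:
        for name, length in sizes:
            out.write(f"{name}\t{length}\n")
```

The idxstats lines can be captured with subprocess.run(["tools/samtools", "idxstats", "exp_sorted.bam"], capture_output=True, text=True).stdout.splitlines().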
58
GE3M25 Project Storage of results files
Upload .bam, .bam.bai, .bw etc. through bioinf.gen.tcd.ie/GE3M25/project
59
Don't forget to log out!