mRNA protein DNA Activation Repression Translation Localization Stability Pol II 3’UTR Transcriptional and post-transcriptional regulation of gene expression.

Methods to read out regulatory functions

Where does each transcription factor bind in the genome, in each cell type, at a given time? Near which genes? What is the cis-regulatory code of each factor? Does it require any co-factors? [Diagram: DNA, activation, repression]

ChIP-seq Genome Analyzer II (Solexa) Transcription factor of interest Antibody

Control: input DNA Genome Analyzer II (Solexa)

ACCAATAACCGAGGCTCATGCTAAGGCGTTAGCCACAGATGGAAGTCCGACGGCTTGATCCAGAATGGTGTGTGGATTGCCTTGGAACTGA TTAGTGAATTC TGGTTATTGGCTCCGAGTACGATTCCGCAATCGGTGTCTACCTTCAGGCTGCCGAACTAGGTCTTACCACACACCTAACGGAACCTTGACTA ATCACTTAAG Average length ~ 250bp

ACCAATAACCGAGGCTCATGCTAAGGCGTTAGCCACAGATGGAAGTCCGACGGCTTGATCCAGAATGGTGTGTGGATTGCCTTGGAACTGA TTAGTGAATTC TGGTTATTGGCTCCGAGTACGATTCCGCAATCGGTGTCTACCTTCAGGCTGCCGAACTAGGTCTTACCACACACCTAACGGAACCTTGACTA ATCACTTAAG Average length ~ 250bp 25-40bp

BCL6 ChIP-seq. Lymphoma cell line (OCI-Ly1); Solexa/Illumina; 6 lanes for ChIP, 1 for input DNA, 1 for QC; 36 nt long sequences; 32 million reads; aligned/mapped to hg18 with Eland. Melnick lab at WCMC.

Read mapping with Eland. Reference human genome (hg18): AAAAATTCTCCCAAAACAAAAAAATACGCGTATTCTCCCAAAACAATATCTTACAAGATGTAAATATACCCAAGATG. Solexa read: AAAATACGCGTATTCTCCCAAAACAATATC (exact match).

Read mapping with Eland. Reference human genome (hg18): AAAAATTCTCCCAAAACAAAAAAATACGCGTATTCTCCCAAAACAATATCTTACAAGATGTAAATATACCCAAGATG. Solexa read: AAAATACGCCTATTCTCCCAAAACAATATC (1 mismatch).

Read mapping with Eland. Reference human genome (hg18): AAAAATTCTCCCAAAACAAAAAAATACGCGTATTCTCCCAAAACAATATCTTACAAGATGTAAATATACCCAAGATG. Solexa read: AAAATACGCCTATTCTCCCATAACAATATC (2 mismatches).

Reads can map to multiple locations/chromosomes Solexa Read 1 Solexa Read 2 Reference Human Genome (hg18)

Reads map to one strand or the other Solexa Read 1 Solexa Read 2 hg18
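
The mapping ideas on the last few slides (up to two mismatches, several possible locations, either strand) can be illustrated with a brute-force scan. This is only a minimal sketch and not how Eland works internally; the function names `reverse_complement`, `count_mismatches` and `map_read`, and the toy reference sequence, are made up for illustration.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq))


def count_mismatches(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def map_read(reference, read, max_mismatches=2):
    """Scan both strands and report (start, mismatches, strand) for every hit.

    Assumption: substitutions only, no insertions or deletions.
    """
    hits = []
    for strand, query in (("+", read), ("-", reverse_complement(read))):
        for start in range(len(reference) - len(query) + 1):
            window = reference[start:start + len(query)]
            mismatches = count_mismatches(window, query)
            if mismatches <= max_mismatches:
                hits.append((start, mismatches, strand))
    return hits


# Toy reference, not taken from hg18.
reference = "ACGTACGTTTGACCA"
print(map_read(reference, "TTGACC", max_mismatches=0))  # forward-strand hit
print(map_read(reference, "TTGTCC", max_mismatches=1))  # one substitution
print(map_read(reference, "GGTCAA", max_mismatches=0))  # reverse-strand hit
```

A read with more than one hit in the returned list corresponds to the multi-mapping case on the slide above.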

Example Eland output (three slides of raw records; representative records shown, one per line). Fields: read ID, 36-nt read sequence, match code (U0, U1, U2, R0, R1, R2 or NM), number of matches with 0, 1 and 2 mismatches, chromosome file, strand (F or R), and mismatch descriptors:
>HWI-EAS83_30UCEAAXX:1:2:1324:4 GCACATGTCATACTCTTTCTAGCTCTCTTATTTTTC U0 1 0 0 chr8.fa R DD
>HWI-EAS83_30UCEAAXX:1:2:899:1015 AAATTAATGTAAAAAATAGGATACTGAATTGTGATA U1 0 1 0 chr10.fa F DD 30G
>HWI-EAS83_30UCEAAXX:1:2:826:1245 GTCAGAAAAATCCTTTTTATTATATAAACAATACAT U2 0 0 1 chr5.fa F DD 15G 20G
>HWI-EAS83_30UCEAAXX:1:2:884:1090 GAAAGTACATCAAATACATATTATATACTTTACATA R2 0 0 2
>HWI-EAS83_30UCEAAXX:1:2:996:1003 ATTTTGACTTTATTATTTTTTCTTCAATGTTTTTAA NM 0 0 0

Number of reads per Eland type (count and % for each): U0, U1, U2, R0, R1, R2, NM, QC
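
A tally like the one on this slide could be produced by counting the match code of each Eland record. The sketch below assumes whitespace-separated records with the match code in the third column; that column layout and the function names `tally_match_codes` and `percentages` are assumptions made for illustration, not part of Eland or ChIPseeqer.

```python
from collections import Counter


def tally_match_codes(lines):
    """Count records per match code (assumed to be the third field)."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or truncated records
        counts[fields[2]] += 1
    return counts


def percentages(counts):
    """Convert counts to percentages of all tallied records."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {code: 100.0 * n / total for code, n in counts.items()}


# Made-up records in the assumed layout.
records = [
    ">read1 ACGT U0 1 0 0 chr8.fa R DD",
    ">read2 ACGT U2 0 0 1 chr5.fa F DD 15G 20G",
    ">read3 ACGT NM 0 0 0",
    ">read4 ACGT U0 1 0 0 chr1.fa F DD",
]
counts = tally_match_codes(records)
for code, pct in sorted(percentages(counts).items()):
    print(code, counts[code], f"{pct:.1f}%")
```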

Peak detection: calculate the read count at each position (bp) in the genome; determine whether the read count is greater than expected.

Peak detection: we need to correct for input DNA reads (control), which are (1) non-uniformly distributed (they form peaks too) and (2) present in vastly different numbers between ChIP and input.

Peak detection using ChIPseeqer

[Figure: read count and expected read count along the genome.] Expected read count = total number of reads * extended fragment length / chr length

Is the observed read count at a given genomic position greater than expected? The Poisson distribution: x = observed read count, λ = expected read count. [Plot: frequency versus read count]

Is the observed read count at a given genomic position greater than expected? Example: x = 10 reads (observed), λ = 0.5 reads (expected). The Poisson distribution gives P(X>=10) = 1.7 x 10^-10; log10 P(X>=10) = -9.77; -log10 P(X>=10) = 9.77
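
The Poisson calculation on this slide can be reproduced with the standard library. This is a minimal sketch; the function names `expected_read_count` and `poisson_upper_tail` are hypothetical and are not ChIPseeqer functions. The tail is summed term by term rather than computed as 1 minus the CDF, so very small probabilities are not lost to rounding.

```python
import math


def expected_read_count(total_reads, fragment_length, chromosome_length):
    """Expected reads covering one position if reads were spread uniformly."""
    return total_reads * fragment_length / chromosome_length


def poisson_upper_tail(x, lam):
    """P(X >= x) for X ~ Poisson(lam), summing tail terms directly."""
    if x <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    # First tail term, computed in log space to avoid overflow in x!.
    term = math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1))
    total = 0.0
    k = x
    while term > 0.0 and term > total * 1e-17:
        total += term
        k += 1
        term *= lam / k  # P(X = k) from P(X = k - 1)
    return min(total, 1.0)


p = poisson_upper_tail(10, 0.5)  # the slide's example: x = 10, lambda = 0.5
print(p, -math.log10(p))
```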

[Figure tracks: read count, expected read count, -Log(p).] Expected read count = total number of reads * extended frag len / chr len

[Figure tracks: read count, expected read count, input reads, -Log(p).] Expected read count = total number of reads * extended frag len / chr len

[Figure, along genome positions (bp): ChIP track with read count, expected read count and -Log(P_c); INPUT track with read count, expected read count and -Log(P_i); combined track Log(P_c) - Log(P_i) with a threshold.]

Normalized peak score (at each bp): R = -log10 [ P(X_ChIP) / P(X_input) ]. Will detect peaks with high read counts in ChIP and low read counts in input. Also works when there is no input DNA!
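
The score could be computed per position roughly as follows. This is a sketch under stated assumptions: the ratio is taken as P(X_ChIP) / P(X_input), which is the orientation that gives high scores for ChIP-enriched positions; when no input is available, P(X_input) is set to 1; and probabilities are floored at 1e-300 to keep the logarithm finite. The function names are hypothetical and this is not ChIPseeqer's code.

```python
import math


def poisson_upper_tail(x, lam):
    """P(X >= x) for X ~ Poisson(lam), summing tail terms directly."""
    if x <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    term = math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1))
    total = 0.0
    k = x
    while term > 0.0 and term > total * 1e-17:
        total += term
        k += 1
        term *= lam / k
    return min(total, 1.0)


def normalized_peak_score(chip_count, chip_expected,
                          input_count=None, input_expected=None):
    """R = -log10(P_chip / P_input); P_input = 1 when there is no input."""
    floor = 1e-300  # assumption: avoids log10(0) for extreme counts
    p_chip = max(poisson_upper_tail(chip_count, chip_expected), floor)
    if input_count is None:
        p_input = 1.0
    else:
        p_input = max(poisson_upper_tail(input_count, input_expected), floor)
    return math.log10(p_input) - math.log10(p_chip)


print(normalized_peak_score(10, 0.5))           # no input DNA
print(normalized_peak_score(10, 0.5, 10, 0.5))  # input equally enriched
```

With this definition, a position that is equally enriched in ChIP and input scores 0, so input "peaks" do not pass a threshold such as R >= 15.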

Non-mappable fraction of the genome, listed per chromosome as non-unique positions / chromosome length (surviving values: 12% for the first chromosome listed, chrM 4628/…, 74% for chrY). We enumerated all 30-mers, counted the number of occurrences of each, and calculated the non-unique fraction of the genome.
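
The enumeration described here can be sketched on a small sequence. The function name `non_unique_fraction` is made up, and the sketch ignores the reverse strand and keeps every k-mer in memory, which would not scale to a whole genome without a more compact data structure.

```python
from collections import Counter


def non_unique_fraction(sequence, k=30):
    """Fraction of start positions whose k-mer occurs more than once."""
    n_positions = len(sequence) - k + 1
    if n_positions <= 0:
        return 0.0
    counts = Counter(sequence[i:i + k] for i in range(n_positions))
    # Every occurrence of a repeated k-mer is a non-unique position.
    repeated_positions = sum(n for n in counts.values() if n > 1)
    return repeated_positions / n_positions


# Toy example with k = 4: ACGT occurs twice among the 7 possible 4-mers.
print(non_unique_fraction("ACGTACGTTT", k=4))
```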

Peak detection: determine all genomic regions with R >= 15; merge peaks separated by less than 100 bp; output all peaks with length >= 100 bp. Processes 23M reads in under 7 minutes.
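
The three steps on this slide (threshold, merge, length filter) can be sketched over a list of per-bp scores. The function name `call_peaks` and the half-open (start, end) interval convention are choices made for this illustration, not ChIPseeqer's implementation.

```python
def call_peaks(scores, threshold=15.0, max_gap=100, min_length=100):
    """Return (start, end) half-open intervals of peaks along one chromosome."""
    # Step 1: contiguous runs of positions with score >= threshold.
    regions = []
    start = None
    for pos, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = pos
        elif start is not None:
            regions.append([start, pos])
            start = None
    if start is not None:
        regions.append([start, len(scores)])

    # Step 2: merge runs separated by fewer than max_gap bp.
    merged = []
    for region in regions:
        if merged and region[0] - merged[-1][1] < max_gap:
            merged[-1][1] = region[1]
        else:
            merged.append(region)

    # Step 3: keep only peaks that are at least min_length bp long.
    return [(s, e) for s, e in merged if e - s >= min_length]


# Two nearby runs merge into one 150-bp peak; an isolated 40-bp run is dropped.
scores = [0] * 50 + [20] * 60 + [0] * 30 + [20] * 60 + [0] * 300 + [20] * 40 + [0] * 10
print(call_peaks(scores))
```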

[Browser view: ChIP reads, input reads, detected peaks.] BCL6: 18,814 peaks; 80% are within 20 kb of a known gene.

Where does each transcription factor bind in the genome, in each cell type, at a given time? Near which genes? What is the cis-regulatory code of each factor? Does it require any co-factors? [Diagram: DNA, activation, repression]

Regulatory Sequence Discovery using FIRE

Discovering regulatory sequences associated with peak regions. Each sequence is labeled by whether it is a true TF binding peak (Yes: target regions; No: random regions) and by whether the motif is present or absent in it. The correlation between motif presence and true TF peak status is quantified using the mutual information.
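
The mutual information between the two yes/no variables can be computed from the four counts of the present/absent by target/random table. This is a generic sketch of the definition in bits (log base 2); the function name `mutual_information_2x2` is hypothetical, and FIRE's own estimator and units may differ.

```python
import math


def mutual_information_2x2(present_target, absent_target,
                           present_random, absent_random):
    """Mutual information (bits) between motif presence and peak status."""
    table = [[present_target, absent_target],
             [present_random, absent_random]]
    total = sum(sum(row) for row in table)
    if total == 0:
        return 0.0
    row_p = [sum(row) / total for row in table]
    col_p = [(table[0][j] + table[1][j]) / total for j in range(2)]
    mi = 0.0
    for i in range(2):
        for j in range(2):
            joint = table[i][j] / total
            if joint > 0:  # empty cells contribute nothing
                mi += joint * math.log2(joint / (row_p[i] * col_p[j]))
    return mi


print(mutual_information_2x2(50, 50, 50, 50))   # motif unrelated to peaks
print(mutual_information_2x2(100, 0, 0, 100))   # motif only in target regions
print(mutual_information_2x2(70, 30, 20, 80))   # partial association
```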

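Motif Search Algorithm: every k-mer is scored by mutual information (MI) and the k-mers are sorted from highly informative to not informative. k-mers shown: CTCATCG, TCATCGC, AAAATTT, GATGAGC, AAAAATT, ATGAGCT, TTGCCAC, TGCCACC, ATCTCAT, ACGCGCG, CGACGCG, TACGCTA, ACCCCCT, CCACGGC, TTCAAAA, AGACGCG, CGAGAGC, CTTATTA. MI values shown for the top of the list: MI=0.081, MI=0.045, MI=0.040, ...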

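Optimizing k-mers into more informative degenerate motifs. Starting from ATCCGTACA, candidate replacements for one position (here G: A/G, T/G, C/G, A/C/G, A/T/G, C/G/T) are evaluated against target regions (true TF binding peak: yes) and random regions (no): which character increases the mutual information by the largest amount? Result: ATCC[C/G]TACA.

Optimizing k-mers into more informative degenerate motifs. The search continues from ATCC[C/G]TACA, now testing candidate replacements at another position (here C: A/C, T/C, C/G, A/C/G, A/T/C, C/G/T) against target regions (true TF binding peak: yes) and random regions (no).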
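
The greedy optimization on these two slides can be sketched as: try every 2- or 3-letter replacement that still contains the current character(s), keep the single change that raises the mutual information most, and repeat until nothing improves. This is a simplified illustration, not FIRE's algorithm; all function names and the toy sequences are made up, and motif presence is tested on the forward strand only.

```python
import math
import re
from itertools import combinations


def mutual_information(xs, ys):
    """Mutual information (bits) between two equal-length boolean lists."""
    n = len(xs)
    mi = 0.0
    for a in (False, True):
        for b in (False, True):
            joint = sum(1 for x, y in zip(xs, ys) if x == a and y == b) / n
            px = sum(1 for x in xs if x == a) / n
            py = sum(1 for y in ys if y == b) / n
            if joint > 0:
                mi += joint * math.log2(joint / (px * py))
    return mi


def motif_to_regex(motif):
    """['A', 'CG', 'T'] -> 'A[CG]T'."""
    return "".join(p if len(p) == 1 else "[" + p + "]" for p in motif)


def motif_mi(motif, sequences, is_target):
    pattern = re.compile(motif_to_regex(motif))
    present = [pattern.search(s) is not None for s in sequences]
    return mutual_information(present, is_target)


def optimize_motif(kmer, sequences, is_target):
    """Greedily widen one position at a time while the MI strictly increases."""
    motif = list(kmer)
    best_mi = motif_mi(motif, sequences, is_target)
    while True:
        best_trial = None
        for i, current in enumerate(motif):
            for size in range(len(current) + 1, 4):
                for combo in combinations("ACGT", size):
                    if not set(current) <= set(combo):
                        continue  # must keep the characters already allowed
                    trial = motif[:i] + ["".join(combo)] + motif[i + 1:]
                    mi = motif_mi(trial, sequences, is_target)
                    if mi > best_mi + 1e-12:
                        best_mi, best_trial = mi, trial
        if best_trial is None:
            return motif_to_regex(motif), best_mi
        motif = best_trial


targets = ["TTATCCGTACAGG", "GGATCCCTACATT", "CCATCCGTACAAA", "AAATCCCTACACC"]
randoms = ["TTTTTTTTTTTTT", "GGGGGGGGGGGGG", "ACACACACACACA", "CAGTCAGTCAGTC"]
labels = [True] * len(targets) + [False] * len(randoms)
print(optimize_motif("ATCCGTACA", targets + randoms, labels))
```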

[Figure: as the motif changes, its mutual information, conservation with S. bayanus, and similarity to the ChIP-chip RAP1 motif.]

Highly informative k-mers, sorted by MI (MI=0.081, MI=0.045, ...): CTCATCG, TCATCGC, AAAATTT, GCTCATC, AAAAATT, ATGAGCT, TTGCCAC, TGCCACC, ATCTCAT. Optimize the next k-mer? Only optimize a k-mer if the conditional mutual information I(k-mer; expression | motif) is large enough, for all motifs optimized so far. Conditional mutual information: I(X;Y|Z)

Motif co-occurrence analysis (enrichment, depletion) of discovered motifs. FIRE automatically compares discovered motifs to known motifs in TRANSFAC and JASPAR.

ChIPseeqer: an integrated framework for ChIP-seq data analysis. ChIPseeqer (peak detection); ChIPseeqer2Track (for Genome Browser); ChIPseeqer2FIRE (+ motif analysis); ChIPseeqer2iPAGE (+ pathway analysis); ChIPseeqer2cons (conservation analysis)

Installing and setting up programs
Install ChIPseeqer and FIRE:
Execute the following commands:
export FIREDIR=/Applications/FIRE-1.1
export PATH=$PATH:$FIREDIR
export CHIPSEEQERDIR=/Applications/ChIPseeqer-1.0
export PATH=$PATH:$CHIPSEEQERDIR:$CHIPSEEQERDIR/SCRIPTS
chmod +x $CHIPSEEQERDIR/ChIP*
chmod +x $CHIPSEEQERDIR/SCRIPTS/*.pl

Peak Detection
Input file: CTCF.bed
cd ~/Desktop/elemento
Or download from: seq/
U0 reads in BED format (check by typing wc -l CTCF.bed; view by typing more CTCF.bed, and q to exit)
No input DNA for this experiment

Peak Detection, Step 1: split the big read file into one file per chromosome
split_bed_or_mit_files.pl CTCF.bed
Expected output:
Opening CTCF.bed
Current directory = .
Creating ./reads.chr1
…
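
For readers curious what this step does, here is a rough Python equivalent of splitting a BED file into one file per chromosome. It is only a sketch of the idea and not the actual split_bed_or_mit_files.pl script; the function name `split_bed_by_chromosome` is made up, and only the reads.chrN output naming is taken from the expected output above.

```python
import os


def split_bed_by_chromosome(bed_path, out_dir="."):
    """Write each BED line to out_dir/reads.<chrom>; return the chromosomes seen."""
    handles = {}
    try:
        with open(bed_path) as bed:
            for line in bed:
                # Skip blank lines and BED header lines.
                if not line.strip() or line.startswith(("track", "browser", "#")):
                    continue
                chrom = line.split()[0]  # first BED column is the chromosome
                if chrom not in handles:
                    out_path = os.path.join(out_dir, "reads." + chrom)
                    handles[chrom] = open(out_path, "w")
                handles[chrom].write(line)
    finally:
        for handle in handles.values():
            handle.close()
    return sorted(handles)
```

Calling `split_bed_by_chromosome("CTCF.bed")` in the working directory would, under these assumptions, create files named reads.chr1, reads.chr2, and so on.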

Peak Detection, Step 2: detect peaks
ChIPseeqer --chipdir=. --t=15 --fraglen=250 --format=bed -outfile=CTCF_peaks_t15.txt
Expected output:
Processing reads in chrY... done.
Processing reads in chrX... done.
Processing reads in chr9... done.
Processing reads in chr8... done.
Step 3: count how many peaks were found
wc -l CTCF_peaks_t15.txt

Making a Genome Browser track
Command lines:
cd JuliaChild
wc -l CTCF_peaks_t15.txt
ChIPseeqer2track --targets=CTCF_peaks_t15.txt --trackname="CTCF peaks"
Expected output:
CTCF_peaks_t15.txt.wgl.gz created.
To check that the file was created:
ls

Making a Genome Browser track

Making FIRE input files
Command line (type the instructions below as one single line):
ChIPseeqer2FIRE --targets=CTCF_peaks_t15.txt --genome=wg.fa --suffix=CTCF_peaks_t15_FIRE
wg.fa is also available from: (decompress with gunzip wg.fa.gz)
Expected output:
Extracting sequences... Done.
Extracting randomly selected sequences... Done.
CTCF_peaks_t15_FIRE.txt and CTCF_peaks_t15_FIRE.seq have been generated.
…

FIRE analysis
Command line (type the instructions below as one single line):
fire.pl --expfile=CTCF_peaks_t15_FIRE.txt --fastafile_dna=CTCF_peaks_t15_FIRE.seq --nodups=1 --minr=2 --species=human --dorna=0 --dodnarna=0
Expected output:
Extracting sequences... Done.
Extracting randomly selected sequences... Done.
CTCF_peaks_t15_FIRE.txt and CTCF_peaks_t15_FIRE.seq have been generated.
…

FIRE main output file (peak sequences versus randomly selected sequences):
open CTCF_peaks_t15_FIRE.txt_FIRE/DNA/CTCF_peaks_t15_FIRE.txt.summary.pdf