Transcriptional and post-transcriptional regulation of gene expression




1 Transcriptional and post-transcriptional regulation of gene expression
[Figure: DNA, mRNA, protein; Pol II; activation, repression; 3’UTR; translation, localization, stability]

2 Where does each transcription factor bind in the genome, in each cell type, at a given time? Near which genes? What is the cis-regulatory code of each factor? Do they require any co-factors?
[Figure: DNA; activation, repression]

3 ChIP-seq
[Figure: transcription factor of interest, antibody, Genome Analyzer II (Solexa)]

4 Control: input DNA
[Figure: Genome Analyzer II (Solexa)]

5 [Figure: a double-stranded ChIP DNA fragment]
Average fragment length ~250 bp

6 [Figure: the same double-stranded fragment]
Average fragment length ~250 bp; sequenced reads 25-40 bp


8 BCL6 ChIP-seq
- Lymphoma cell line (OCI-Ly1)
- Solexa/Illumina: 6 lanes for ChIP, 1 for input DNA, 1 for QC
- 36-nt sequences, 32 million reads
- Aligned/mapped to hg18 with Eland
- Melnick lab at WCMC

9 Read mapping with Eland
Reference human genome (hg18): AAAAATTCTCCCAAAACAAAAAAATACGCGTATTCTCCCAAAACAATATCTTACAAGATGTAAATATACCCAAGATG
Solexa read: AAAATACGCGTATTCTCCCAAAACAATATC (exact match)

10 Read mapping with Eland
Reference human genome (hg18): AAAAATTCTCCCAAAACAAAAAAATACGCGTATTCTCCCAAAACAATATCTTACAAGATGTAAATATACCCAAGATG
Solexa read: AAAATACGCCTATTCTCCCAAAACAATATC (one mismatch)

11 Read mapping with Eland
Reference human genome (hg18): AAAAATTCTCCCAAAACAAAAAAATACGCGTATTCTCCCAAAACAATATCTTACAAGATGTAAATATACCCAAGATG
Solexa read: AAAATACGCCTATTCTCCCATAACAATATC (two mismatches)
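Slides 9-11 show one read placed on hg18 with zero, one and two mismatches. The brute-force scan below only sketches that idea: it slides the read along the forward strand, counts mismatches (Hamming distance) at every offset and keeps placements with at most two. A real aligner such as Eland avoids scanning the whole genome for every read; `hamming` and `map_read` are hypothetical names, and the reference is just the short stretch of hg18 from the slides.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def map_read(read: str, reference: str, max_mismatches: int = 2) -> list[tuple[int, int]]:
    """(0-based offset, mismatches) for every forward-strand placement of
    `read` on `reference` with at most `max_mismatches` mismatches."""
    hits = []
    for pos in range(len(reference) - len(read) + 1):
        mm = hamming(read, reference[pos:pos + len(read)])
        if mm <= max_mismatches:
            hits.append((pos, mm))
    return hits


# The stretch of hg18 and the three reads shown on slides 9-11.
reference = ("AAAAATTCTCCCAAAACAAAAAAATACGCGTATTCTCCCAAAACAATATC"
             "TTACAAGATGTAAATATACCCAAGATG")
reads = {
    "slide 9": "AAAATACGCGTATTCTCCCAAAACAATATC",
    "slide 10": "AAAATACGCCTATTCTCCCAAAACAATATC",
    "slide 11": "AAAATACGCCTATTCTCCCATAACAATATC",
}
for slide, read in reads.items():
    print(slide, map_read(read, reference))
```

All three reads land at the same offset, with 0, 1 and 2 mismatches respectively.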

12 Reads can map to multiple locations/chromosomes
[Figure: Solexa read 1, Solexa read 2, reference human genome (hg18)]

13 Reads map to one strand or the other
[Figure: Solexa read 1, Solexa read 2, hg18]
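Because a read can come from either strand and can occur in several places, a mapper has to search the reverse complement as well and report every placement. The sketch below does this for exact matches only: it reports forward-strand offsets with an F/R strand flag and calls a read unique only when it has exactly one placement. The function names are illustrative and not part of Eland or ChIPseeqer.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def find_all_hits(read: str, reference: str) -> list[tuple[int, str]]:
    """All exact placements of `read` on both strands of `reference`.

    Returns sorted (0-based forward-strand offset, strand) pairs: "F" if the
    read matches as given, "R" if its reverse complement matches. A palindromic
    read (equal to its reverse complement) is reported on both strands at the
    same offset; a real mapper would collapse such hits.
    """
    hits = []
    for strand, query in (("F", read), ("R", reverse_complement(read))):
        start = reference.find(query)
        while start != -1:  # restarting one base later also catches overlapping hits
            hits.append((start, strand))
            start = reference.find(query, start + 1)
    return sorted(hits)


def classify(hits: list[tuple[int, str]]) -> str:
    """'NM' (no match), 'unique' (one placement) or 'repeat' (several)."""
    if not hits:
        return "NM"
    return "unique" if len(hits) == 1 else "repeat"
```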

14 Example Eland output (excerpt), one record per read: read ID, 36-nt sequence, match code, number of matches and, for unique hits, chromosome file, position, strand and mismatch positions
>HWI-EAS83_30UCEAAXX:1:2:915:1011AGGTCACAAAACAAGTCCTAACAAATTTAAGAGTATU011362chr8.fa59699745RDD
>HWI-EAS83_30UCEAAXX:1:2:898:874GATAAATTTTTTCCTACAATCTTAAATTATTACACAU1010chr3.fa95643444RDD10C
>HWI-EAS83_30UCEAAXX:1:2:826:1245GTCAGAAAAATCCTTTTTATTATATAAACAATACATU2001chr5.fa121195098FDD15G20G
>HWI-EAS83_30UCEAAXX:1:2:1517:330GTGAGTTTCTTAATCCTGAGTTCTAATTTTATTTCAR029255255
>HWI-EAS83_30UCEAAXX:1:2:1489:318GTATGTATCATATATATTTATGTATCATATATATTTR1032
>HWI-EAS83_30UCEAAXX:1:2:884:1090GAAAGTACATCAAATACATATTATATACTTTACATAR2002
>HWI-EAS83_30UCEAAXX:1:2:996:1003ATTTTGACTTTATTATTTTTTCTTCAATGTTTTTAANM000
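The records above are printed without their field separators. A parser for Eland output could look like the sketch below, which assumes the standard tab-delimited eland_result.txt column order (read name, sequence, match code, counts of exact/1-error/2-error matches, then genome file, position, strand, N interpretation and substitution errors for unique hits). That layout is an assumption taken from Illumina's pipeline documentation rather than from these slides, and `ElandHit`/`parse_eland_line` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ElandHit:
    name: str
    sequence: str
    code: str                        # U0/U1/U2, R0/R1/R2, NM, QC, ...
    n_exact: Optional[int] = None
    n_one_error: Optional[int] = None
    n_two_error: Optional[int] = None
    chrom: Optional[str] = None      # e.g. "chr8" (".fa" suffix stripped)
    position: Optional[int] = None
    strand: Optional[str] = None     # "F" or "R"
    substitutions: list[str] = field(default_factory=list)  # e.g. ["15G", "20G"]


def parse_eland_line(line: str) -> ElandHit:
    cols = line.rstrip("\n").split("\t")
    hit = ElandHit(name=cols[0].lstrip(">"), sequence=cols[1], code=cols[2])
    if len(cols) >= 6:
        hit.n_exact, hit.n_one_error, hit.n_two_error = (int(c) for c in cols[3:6])
    if len(cols) >= 9:               # location fields are only present for unique hits
        hit.chrom = cols[6].removesuffix(".fa")
        hit.position = int(cols[7])
        hit.strand = cols[8]
        hit.substitutions = cols[10:]  # cols[9] is the N-interpretation field
    return hit
```

Filtering the parsed hits on `hit.code == "U0"` keeps only unique exact matches, the kind of reads supplied in CTCF.bed for the hands-on part.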

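15 Number of reads per Eland type (U = unique match, R = multiple matches, digit = number of mismatches; NM = no match; QC = failed quality control)
Type   Reads        Fraction
U0     21,019,702   65%
U1      3,280,059   10%
U2      1,007,173    3%
R0      3,661,054   11%
R1        815,275    2%
R2        406,002    1%
NM      2,050,499    6%
QC        306,352    1%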
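The fractions in the table can be rechecked from the counts. This sketch assumes the eight categories cover every read; their total, about 32.5 million, is consistent with the 32 million reads on slide 8.

```python
counts = {
    "U0": 21_019_702, "U1": 3_280_059, "U2": 1_007_173,
    "R0": 3_661_054, "R1": 815_275, "R2": 406_002,
    "NM": 2_050_499, "QC": 306_352,
}
total = sum(counts.values())
for code, n in counts.items():
    print(f"{code}  {n:>12,}  {100 * n / total:5.1f}%")

# Reads with a single best placement (U0 + U1 + U2).
unique = counts["U0"] + counts["U1"] + counts["U2"]
print(f"total {total:,}; unique {100 * unique / total:.1f}%")
```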


19 Peak detection
- Calculate the read count at each position (bp) in the genome
- Determine whether the read count is greater than expected
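Each 36-nt read marks one end of a ~250-bp fragment (the tutorial passes --fraglen=250), so the per-base read count is usually computed over reads extended to the fragment length. A minimal sketch, assuming reads are BED-style (start, end, strand) tuples with 0-based, end-exclusive coordinates and are extended toward their 3' end (this extension rule is our assumption about how ChIPseeqer counts): a difference array makes each read O(1), followed by one prefix-sum pass.

```python
def read_count_per_bp(reads, chrom_len: int, fraglen: int = 250) -> list[int]:
    """Number of extended reads covering each base of one chromosome.

    '+' reads are extended rightward from their start, '-' reads leftward
    from their end, each to `fraglen` bp (clipped to the chromosome).
    """
    diff = [0] * (chrom_len + 1)
    for start, end, strand in reads:
        if strand == "+":
            lo, hi = start, start + fraglen
        else:
            lo, hi = end - fraglen, end
        lo, hi = max(lo, 0), min(hi, chrom_len)
        if lo < hi:
            diff[lo] += 1   # coverage starts here
            diff[hi] -= 1   # and stops just before here
    counts, running = [], 0
    for d in diff[:-1]:
        running += d
        counts.append(running)
    return counts
```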

20 Peak detection
We need to correct for input DNA reads (control):
- they are non-uniformly distributed (they form peaks too)
- there are vastly different numbers of reads between ChIP and input

21 Peak detection using ChIPseeqer

22 Expected read count = total number of reads × extended fragment length / chromosome length
[Figure: read count along the genome compared with the expected read count]
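Plugging numbers into the formula: the read count below is hypothetical, chr1's hg18 length is taken from the mappability table on slide 29, and we assume "total number of reads" means reads mapped to that same chromosome.

```python
def expected_read_count(n_reads: int, fraglen: int, chrom_len: int) -> float:
    """λ = total number of reads × extended fragment length / chromosome length."""
    return n_reads * fraglen / chrom_len


# Hypothetical: 2,000,000 reads on chr1 (247,249,719 bp in hg18), extended to 250 bp.
lam = expected_read_count(2_000_000, 250, 247_249_719)
print(f"expected read count per bp: {lam:.3f}")
```

Each base is then covered by about two extended reads on average, which sets the λ used in the Poisson test.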

23 Is the observed read count at a given genomic position greater than expected?
x = observed read count, λ = expected read count
The Poisson distribution: P(X = x) = λ^x e^(-λ) / x!
[Figure: frequency vs. read count]

24 Is the observed read count at a given genomic position greater than expected?
x = 10 reads (observed), λ = 0.5 reads (expected)
The Poisson distribution:
P(X >= 10) = 1.7 × 10^-10
log10 P(X >= 10) = -9.77
-log10 P(X >= 10) = 9.77
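The tail probability above can be reproduced with the standard library. This sketch sums the Poisson terms from x upward in log space rather than computing 1 - CDF, which loses precision for tails near 10^-10 and underflows for large counts; the function name and the 500-term cutoff are our own choices.

```python
import math


def neg_log10_poisson_tail(x: int, lam: float, n_terms: int = 500) -> float:
    """-log10 P(X >= x) for X ~ Poisson(lam), lam > 0.

    Sums P(X = k) for k = x .. x + n_terms - 1 with log-sum-exp. Truncating
    after n_terms terms is an approximation that is negligible when lam is
    small compared with x + n_terms.
    """
    if x <= 0:
        return 0.0  # P(X >= 0) = 1
    log_terms = [k * math.log(lam) - lam - math.lgamma(k + 1)
                 for k in range(x, x + n_terms)]
    m = max(log_terms)
    ln_tail = m + math.log(math.fsum(math.exp(t - m) for t in log_terms))
    return -ln_tail / math.log(10)


score = neg_log10_poisson_tail(10, 0.5)
print(f"P(X>=10) = {10 ** -score:.2e}, -log10 P = {score:.2f}")
```

For x = 10 and λ = 0.5 this gives 1.7 × 10^-10 and 9.77, matching the slide.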

25 [Figure: read count, expected read count and -log(p) along the genome]
Expected read count = total number of reads × extended fragment length / chromosome length

26 [Figure: read count, expected read count, input reads and -log(p) along the genome]
Expected read count = total number of reads × extended fragment length / chromosome length

27 [Figure: ChIP and INPUT tracks along genome positions (bp), each with read count, expected read count and -log(P) (-log(P_c) for ChIP, -log(P_i) for input); the difference -log(P_c) - (-log(P_i)) is compared to a threshold]

28 Normalized peak score (at each bp): R = -log10 [ P(X_ChIP) / P(X_input) ]
This detects peaks with high read counts in ChIP and low read counts in input. It also works when there is no input DNA!
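A sketch of the normalized score R, computing the same Poisson tail for ChIP and input. Two assumptions: P(X) is the upper-tail probability P(X >= x) at that base, and with no input DNA the input term is simply dropped, so R = -log10 P(X_ChIP). ChIP and input expected counts would each come from the expected-read-count formula with their own read totals.

```python
import math


def neg_log10_poisson_tail(x: int, lam: float, n_terms: int = 500) -> float:
    """-log10 P(X >= x), X ~ Poisson(lam), via log-sum-exp over k = x .. x+n_terms-1."""
    if x <= 0:
        return 0.0
    logs = [k * math.log(lam) - lam - math.lgamma(k + 1) for k in range(x, x + n_terms)]
    m = max(logs)
    return -(m + math.log(math.fsum(math.exp(v - m) for v in logs))) / math.log(10)


def normalized_score(x_chip, lam_chip, x_input=None, lam_input=None) -> float:
    """R = -log10(P(X_ChIP) / P(X_input)) = -log10 P(X_ChIP) + log10 P(X_input)."""
    r = neg_log10_poisson_tail(x_chip, lam_chip)
    if x_input is not None and lam_input is not None:
        r -= neg_log10_poisson_tail(x_input, lam_input)
    return r


print(normalized_score(10, 0.5))          # no input DNA
print(normalized_score(10, 0.5, 3, 0.5))  # hypothetical: 3 input reads, same λ
```

With the slide 24 numbers, R is 9.77 without input; hypothetically adding 3 input reads at the same λ lowers it to about 7.9, below the R >= 15 threshold used for peak calling.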

29 Non-mappable fraction of the genome
Chromosome  Non-unique bp / length     Fraction
chr18        9,369,067 / 76,117,153    0.123 (≈12%)
chr2        33,849,240 / 242,951,149   0.139
chr3        27,854,877 / 199,501,827   0.140
chr4        27,090,014 / 191,273,063   0.142
chr6        24,330,283 / 170,899,992   0.142
chr8        20,932,821 / 146,274,826   0.143
chr5        26,029,902 / 180,857,866   0.144
chr12       19,382,853 / 132,349,534   0.146
chr11       20,039,443 / 134,452,384   0.149
chr20       10,017,788 / 62,435,964    0.160
chr7        26,182,588 / 158,821,424   0.165
chr10       22,968,951 / 135,374,737   0.170
chr17       14,496,284 / 78,774,742    0.184
chrX        31,269,270 / 154,913,754   0.202
chr1        55,186,693 / 247,249,719   0.223
chr13       28,668,063 / 114,142,980   0.251
chr16       23,552,340 / 88,827,254    0.265
chr14       29,689,825 / 106,368,585   0.279
chrM             4,628 / 16,571        0.279
chr9        43,125,838 / 140,273,252   0.307
chr19       20,251,255 / 63,811,651    0.317
chr15       31,877,970 / 100,338,915   0.318
chr21       16,867,677 / 46,944,323    0.359
chr22       21,176,578 / 49,691,432    0.426
chrY        43,209,644 / 57,772,954    0.748 (≈75%)
We enumerated all 30-mers, counted the occurrences of each, and calculated the non-unique fraction of each chromosome.
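A toy-scale version of the calculation: count every k-mer and mark a base as non-mappable when the k-mer starting there occurs more than once. The simplifications are our assumptions, not necessarily what was done for the table: forward strand only, k-mers containing N count as non-unique, and all k-mers are held in memory, which is not feasible for real 30-mers of whole chromosomes.

```python
from collections import Counter


def non_unique_fraction(seq: str, k: int = 30) -> float:
    """Fraction of a chromosome's bases whose k-mer occurs more than once."""
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(kmers)
    non_unique = sum(1 for km in kmers if counts[km] > 1 or "N" in km)
    return non_unique / len(seq)  # denominator: chromosome length, as in the table
```

For example, in ACGTACGTTT with k = 4 the k-mer ACGT occurs twice, so 2 of 10 bases are non-unique.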

30 Peak detection
- Determine all genomic regions with R >= 15
- Merge peaks separated by less than 100 bp
- Output all peaks with length >= 100 bp
Processes 23M reads in < 7 min
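The three steps map onto one pass over the per-base scores. This sketch assumes `scores` holds R values for one chromosome, with 0-based positions and half-open (start, end) peaks; whether ChIPseeqer measures gaps and lengths exactly this way is an assumption.

```python
def call_peaks(scores, threshold=15.0, min_gap=100, min_len=100):
    """Regions with R >= threshold, merged across gaps < min_gap, kept if >= min_len bp."""
    regions, start = [], None
    for i, r in enumerate(scores):
        if r >= threshold and start is None:
            start = i                      # entering a region above threshold
        elif r < threshold and start is not None:
            regions.append((start, i))     # leaving it
            start = None
    if start is not None:
        regions.append((start, len(scores)))

    merged = []
    for s, e in regions:
        if merged and s - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], e)  # close enough: extend previous peak
        else:
            merged.append((s, e))
    return [(s, e) for s, e in merged if e - s >= min_len]
```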

31 [Figure: ChIP reads, input reads and detected peaks]
BCL6: 18,814 peaks; 80% are within 20 kb of a known gene
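The 80% figure is the fraction of peaks whose distance to the nearest gene is under 20 kb. A brute-force sketch, assuming peaks and gene bodies are half-open (start, end) intervals grouped by chromosome, for example from a gene annotation such as RefSeq (the slide does not say which annotation was used):

```python
def distance_to_gene(peak, gene) -> int:
    """Gap in bp between two half-open intervals; 0 if they overlap."""
    (ps, pe), (gs, ge) = peak, gene
    return max(gs - pe, ps - ge, 0)


def fraction_near_genes(peaks, genes, max_dist: int = 20_000) -> float:
    """peaks, genes: dict chrom -> list of (start, end)."""
    n_total = n_near = 0
    for chrom, chrom_peaks in peaks.items():
        chrom_genes = genes.get(chrom, [])
        for peak in chrom_peaks:
            n_total += 1
            if any(distance_to_gene(peak, g) < max_dist for g in chrom_genes):
                n_near += 1
    return n_near / n_total if n_total else 0.0
```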

32 Where does each transcription factor bind in the genome, in each cell type, at a given time? Near which genes? What is the cis-regulatory code of each factor? Do they require any co-factors?
[Figure: DNA; activation, repression]

33 Regulatory Sequence Discovery using FIRE

34 Discovering regulatory sequences associated with peak regions
[Figure: target regions (true TF binding peak: yes) vs. random regions (no), with the joint frequencies of motif presence (absent/present) and true TF peak (no/yes)]
Motif correlation is quantified using the mutual information.
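Written out, the mutual information between motif presence M and the peak label T is the standard definition below, followed by a worked example for an illustrative joint distribution with 0.40 on the diagonal and 0.10 off it (the exact table on the slide is not recoverable):

```latex
I(M;T) = \sum_{m \in \{\text{absent},\,\text{present}\}} \sum_{t \in \{\text{no},\,\text{yes}\}}
         p(m,t)\,\log_2 \frac{p(m,t)}{p(m)\,p(t)}

% Illustrative joint distribution (rows: motif absent/present; columns: peak no/yes)
p(m,t) = \begin{pmatrix} 0.40 & 0.10 \\ 0.10 & 0.40 \end{pmatrix}, \qquad p(m) = p(t) = 0.5

I(M;T) = 2(0.40)\log_2\frac{0.40}{0.25} + 2(0.10)\log_2\frac{0.10}{0.25}
       \approx 0.542 - 0.264 = 0.278 \text{ bits}
```

If motif presence were independent of the label (all four cells equal to 0.25), every log term would vanish and I(M;T) = 0.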

35 Motif Search Algorithm
k-mer     MI
CTCATCG   0.0618   (highly informative)
TCATCGC   0.0485
AAAATTT   0.0438
GATGAGC   0.0434
AAAAATT   0.0383
ATGAGCT   0.0334
TTGCCAC   0.0322
TGCCACC   0.0298
ATCTCAT   0.0265
...
ACGCGCG   0.0018
CGACGCG   0.0012
TACGCTA   0.0011
ACCCCCT   0.0010
CCACGGC   0.0009
TTCAAAA   0.0005
AGACGCG   0.0004
CGAGAGC   0.0003
CTTATTA   0.0002   (not informative)
[Figure: optimized motifs with MI = 0.081, 0.045 and 0.040]
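The ranking can be sketched by scoring every k-mer with the MI between its presence in a sequence and whether that sequence is a peak or a random region. Assumptions: k-mers are matched on the given strand only, labels are binary (1 = peak, 0 = random), and MI is in bits. FIRE's own implementation differs in details, so treat `rank_kmers` as illustrative.

```python
import math
from collections import Counter
from itertools import product


def mutual_information(xs, ys) -> float:
    """MI in bits between two equal-length lists of discrete labels."""
    n = len(xs)
    pxy, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    # p(x,y) * log2(p(x,y) / (p(x) p(y))) written with raw counts
    return sum(c / n * math.log2(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())


def rank_kmers(sequences, labels, k: int = 7, top: int = 10):
    """(MI, k-mer) pairs for the `top` most informative k-mers."""
    scored = []
    for kmer in ("".join(p) for p in product("ACGT", repeat=k)):
        present = [int(kmer in seq) for seq in sequences]
        scored.append((mutual_information(present, labels), kmer))
    scored.sort(reverse=True)
    return scored[:top]
```

Enumerating all 4^7 = 16,384 seven-mers is cheap; the cost grows with the number and length of sequences.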

36 Optimizing k-mers into more informative degenerate motifs
ATCCGTACA → ATCC[C/G]TACA: which character increases the mutual information by the largest amount? Candidates at the G position: A/G, T/G, C/G, A/C/G, A/T/G, C/G/T
[Figure: target regions (true TF binding peaks) vs. random regions]

37 Optimizing k-mers into more informative degenerate motifs
ATCC[C/G]TACA; candidates at a C position: A/C, T/C, C/G, A/C/G, A/T/C, C/G/T ...
[Figure: target regions (true TF binding peaks) vs. random regions]
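A greedy sketch of this optimization: represent the motif as a set of allowed bases per position, try every 2- and 3-base option at each position (the six candidates listed for a G or C position), keep the single change that raises MI the most, and repeat until nothing improves. The helper names, the regex-based matching and the toy data in the usage note are assumptions for illustration.

```python
import math
import re
from collections import Counter
from itertools import combinations


def mutual_information(xs, ys) -> float:
    """MI in bits between two equal-length lists of discrete labels."""
    n = len(xs)
    pxy, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum(c / n * math.log2(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())


def motif_mi(motif, sequences, labels) -> float:
    """MI between motif presence and labels; motif = list of allowed-base sets."""
    pattern = re.compile("".join("[" + "".join(sorted(s)) + "]" for s in motif))
    present = [int(pattern.search(seq) is not None) for seq in sequences]
    return mutual_information(present, labels)


def candidate_sets(allowed: frozenset):
    """Every strict superset of `allowed` that still excludes at least one base."""
    others = [b for b in "ACGT" if b not in allowed]
    for r in range(1, len(others)):
        for extra in combinations(others, r):
            yield allowed | frozenset(extra)


def optimize_motif(kmer: str, sequences, labels, max_rounds: int = 10):
    motif = [frozenset(b) for b in kmer]
    best = motif_mi(motif, sequences, labels)
    for _ in range(max_rounds):
        best_motif = None
        for i, allowed in enumerate(motif):
            for cand in candidate_sets(allowed):
                trial = motif[:i] + [cand] + motif[i + 1:]
                mi = motif_mi(trial, sequences, labels)
                if mi > best + 1e-12:  # better than every change tried so far
                    best, best_motif = mi, trial
        if best_motif is None:
            break  # no single change increases MI
        motif = best_motif
    return motif, best


def motif_to_str(motif) -> str:
    return "".join(next(iter(s)) if len(s) == 1 else "[" + "/".join(sorted(s)) + "]"
                   for s in motif)
```

On toy data where peak sequences contain either ATCCGTACA or ATCCCTACA and random sequences contain neither, starting from ATCCGTACA yields ATCC[C/G]TACA, as on the slide.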

38 [Figure: discovered motifs with their mutual information, conservation with S. bayanus, and similarity to the ChIP-chip RAP1 motif]

39 Highly informative k-mers
k-mer     MI
CTCATCG   0.0618
TCATCGC   0.0485
AAAATTT   0.0438
GCTCATC   0.0434
AAAAATT   0.0383
ATGAGCT   0.0334
TTGCCAC   0.0322
TGCCACC   0.0298
ATCTCAT   0.0265
...
Motifs optimized so far: MI = 0.081, MI = 0.045
Optimize the next k-mer? Only if I(k-mer; expression | motif) is large enough for all motifs optimized so far (conditional mutual information I(X;Y|Z)).
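For reference, with X the candidate k-mer's presence, Y the expression label (here peak vs. random) and Z the presence of an already-optimized motif, the conditional mutual information is the standard quantity below. It is close to 0 when the k-mer adds no information about Y beyond what the motif already provides, which is why such k-mers are not optimized.

```latex
I(X;Y \mid Z) = \sum_{x,y,z} p(x,y,z)\,\log_2 \frac{p(z)\,p(x,y,z)}{p(x,z)\,p(y,z)}
              = H(X \mid Z) - H(X \mid Y, Z)
```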

40 Motif co-occurrence analysis
[Figure: discovered motifs, with enrichment/depletion]
FIRE automatically compares discovered motifs to known motifs in TRANSFAC and JASPAR.

41 ChIPseeqer: an integrated framework for ChIP-seq data analysis
- ChIPseeqer (peak detection)
- ChIPseeqer2Track (for Genome Browser)
- ChIPseeqer2FIRE (+ motif analysis)
- ChIPseeqer2iPAGE (+ pathway analysis)
- ChIPseeqer2cons (conservation analysis)

42 Installing and setting up programs
Install ChIPseeqer and FIRE:
http://physiology.med.cornell.edu/faculty/elemento/lab/chipseq.shtml
http://tavazoielab.princeton.edu/FIRE/
Execute the following commands:
    export FIREDIR=/Applications/FIRE-1.1
    export PATH=$PATH:$FIREDIR
    export CHIPSEEQERDIR=/Applications/ChIPseeqer-1.0
    export PATH=$PATH:$CHIPSEEQERDIR:$CHIPSEEQERDIR/SCRIPTS
    chmod +x $CHIPSEEQERDIR/ChIP*
    chmod +x $CHIPSEEQERDIR/SCRIPTS/*.pl

43 Peak Detection
Input file: CTCF.bed
    cd ~/Desktop/elemento
Or download it from: http://physiology.med.cornell.edu/faculty/elemento/lab/files/chipseq/
- 2947043 U0 reads in BED format (check by typing wc -l CTCF.bed; view them by typing more CTCF.bed, and press q to exit)
- No input DNA for this experiment

44 Peak Detection
Step 1: Split the big read file into one file per chromosome:
    split_bed_or_mit_files.pl CTCF.bed
Expected output:
    Opening CTCF.bed
    Current directory = .
    Creating ./reads.chr1
    …
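What step 1 does can be sketched in a few lines. This is a hypothetical re-implementation, not the source of split_bed_or_mit_files.pl: it groups BED records on their first (chromosome) column, skips track/browser/comment lines, and writes each group to reads.<chrom>, mirroring the file names in the expected output.

```python
import os
from collections import defaultdict


def group_bed_lines(lines) -> dict[str, list[str]]:
    """Group BED records by chromosome (column 1), keeping each line intact."""
    groups = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue  # skip blank lines and header/comment lines
        if not line.endswith("\n"):
            line += "\n"  # the last line of a file may lack a newline
        groups[line.split()[0]].append(line)
    return dict(groups)


def split_bed_by_chromosome(bed_path: str, out_dir: str = ".") -> dict[str, int]:
    """Write one reads.<chrom> file per chromosome; return reads per chromosome."""
    with open(bed_path) as bed:
        groups = group_bed_lines(bed)
    for chrom, chrom_lines in groups.items():
        with open(os.path.join(out_dir, f"reads.{chrom}"), "w") as out:
            out.writelines(chrom_lines)
    return {chrom: len(chrom_lines) for chrom, chrom_lines in groups.items()}
```

Calling `split_bed_by_chromosome("CTCF.bed")` from the tutorial directory would produce reads.chr1, reads.chr2, and so on. It keeps all lines in memory, which is acceptable for ~3 million reads but not for much larger files.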

45 Peak Detection
Step 2: Detect peaks:
    ChIPseeqer --chipdir=. --t=15 --fraglen=250 --format=bed --outfile=CTCF_peaks_t15.txt
Expected output:
    Processing reads in chrY... done.
    Processing reads in chrX... done.
    Processing reads in chr9... done.
    Processing reads in chr8... done.
Step 3: Count how many peaks were found:
    wc -l CTCF_peaks_t15.txt

46 Making a Genome Browser track
Command lines:
    cd JuliaChild
    wc -l CTCF_peaks_t15.txt
    ChIPseeqer2track --targets=CTCF_peaks_t15.txt --trackname="CTCF peaks"
Expected output:
    CTCF_peaks_t15.txt.wgl.gz created.
To check that the file was created:
    ls

47 Making a Genome Browser track http://genome.ucsc.edu/cgi-bin/hgGateway

48 Making FIRE input files
Command line (type the instructions below as a single line):
    ChIPseeqer2FIRE --targets=CTCF_peaks_t15.txt --genome=wg.fa --suffix=CTCF_peaks_t15_FIRE
wg.fa is also available from http://physiology.med.cornell.edu/faculty/elemento/lab/files/chipseq/ (decompress with gunzip wg.fa.gz).
Expected output:
    Extracting sequences... Done.
    Extracting randomly selected sequences... Done.
    CTCF_peaks_t15_FIRE.txt and CTCF_peaks_t15_FIRE.seq have been generated.
    …

49 FIRE analysis
Command line (type the instructions below as a single line):
    fire.pl --expfile=CTCF_peaks_t15_FIRE.txt --fastafile_dna=CTCF_peaks_t15_FIRE.seq --nodups=1 --minr=2 --species=human --dorna=0 --dodnarna=0
Expected output:
    Extracting sequences... Done.
    Extracting randomly selected sequences... Done.
    CTCF_peaks_t15_FIRE.txt and CTCF_peaks_t15_FIRE.seq have been generated.
    …

50 FIRE main output file
[Figure: peak sequences vs. randomly selected sequences]
    open CTCF_peaks_t15_FIRE.txt_FIRE/DNA/CTCF_peaks_t15_FIRE.txt.summary.pdf


