S1 Supporting information Bioinformatic workflow and quality of the metrics Number of slides: 10
RNA-seq: BIOINFORMATIC PIPELINE
http://galaxy.univ-perp.fr

cDNA library construction: PolyA, single strand, 50 nucleotides
Sequencing: HiSeq2000 (Illumina), SBS technique

QUALITY CHECK
- FastQ groomer (v1.0.4*): Fastqsanger format check
- FASTX-Toolkit (v1.0.0*): (1) quality statistics, (2) quality score boxplot, (3) nucleotide distribution chart

MAPPED READS SAMPLING
- Tophat (v2.0.9) with the aligner Bowtie (v2.1.0.0): mapping on S. mansoni genome version 5.2 (BAM file)
- BAM-to-SAM (v0.1.18): conversion of the BAM file to SAM
- Filter SAM (v1.0.0*): deletion of unmapped reads
- Pick Random (v1.0.0*): standardization of read number to analyse

DE NOVO TRANSCRIPTOME ASSEMBLY
- Cufflinks (v2.1.1): exon-intron structures
- Cuffmerge (v1.0.0): merger of the 21 exon-intron structures
- New S. mansoni transcriptome, used as reference

IDENTIFICATION OF SEX-BIASED GENES
- Htseq (v0.6.1p1): quantification of read abundance
- DESeq (v1.12.1): assessment of differences in gene expression
- Result: sex-biased genes per developmental stage

FUNCTIONAL ANALYSES
- Blast2GO (v2.6.4): male and female specific biological pathways through S. mansoni lifecycle
- Blastx (v2.2.30) / AmiGO (v1.8) / GeneDB: de novo annotation of the 100 best sex-biased genes per stage
- Cluster 3.0 / JavaTreeView (v1.1.6r4): construction of functional clusters; cluster analysis of candidate gene expression variation through S. mansoni lifecycle

STRUCTURAL ANALYSES
- IGV (v2.3.16), Blat (v34), Cuffcompare (v2.2.1): exon/intron structure, genome v5.2 vs de novo transcriptome

* Galaxy tool version
S3 – 2/10
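The two read-selection steps named in the pipeline (deletion of unmapped reads, then standardization of the read number) can be illustrated with a minimal Python sketch. It is not the Galaxy tools' code: the function names, the toy records and the fixed seed are assumptions made for illustration. The only format fact relied on is that bit 0x4 of the SAM FLAG field marks an unmapped read.

```python
import random

UNMAPPED_FLAG = 0x4  # SAM FLAG bit meaning "segment unmapped"


def is_mapped(sam_record: str) -> bool:
    """Return True when the FLAG field (2nd column) lacks the unmapped bit."""
    flag = int(sam_record.split("\t")[1])
    return (flag & UNMAPPED_FLAG) == 0


def filter_and_sample(sam_lines, n_reads, seed=0):
    """Drop unmapped records, then draw n_reads records without replacement.

    Hypothetical helper: header lines (starting with '@') are returned apart.
    """
    header = [line for line in sam_lines if line.startswith("@")]
    mapped = [
        line for line in sam_lines
        if not line.startswith("@") and is_mapped(line)
    ]
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    sampled = rng.sample(mapped, min(n_reads, len(mapped)))
    return header, sampled


def toy_record(name: str, flag: int) -> str:
    """Build a toy 11-column SAM record (illustrative values only)."""
    return "\t".join(
        [name, str(flag), "chr1", "100", "50", "50M", "*", "0", "0",
         "A" * 50, "I" * 50]
    )


# 12 toy reads: every third one is flagged as unmapped (FLAG = 4).
toy_sam = ["@HD\tVN:1.0\tSO:unsorted"] + [
    toy_record(f"read{i}", 0 if i % 3 else 4) for i in range(12)
]
header, sampled = filter_and_sample(toy_sam, n_reads=5)
print(len(header), len(sampled))
```

Sampling the same number of mapped reads from every library is one way to make read counts comparable between samples before assembly and quantification.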
RNA-seq: QUALITY OF THE METRICS
RNA-seq: REPLICATE CLUSTERING
[Figure: replicate clustering per developmental stage. Panels: Cercariae, Schistosomula s#1, Schistosomula s#2, Schistosomula s#3, Adult worms. Samples in each panel: ♀1, ♀2, ♂1, ♂2. Colour scale: 0% to 100% identity.]
♀1 & ♀2: female duplicates; ♂1 & ♂2: male duplicates
DESeq package (v1.12.1)
S3 – 4/10
RNA-seq: HEATMAPS (100 best P-values per stage)
[Figure: one heatmap per developmental stage. Panels: Cercariae, Schistosomula s#1, Schistosomula s#2, Schistosomula s#3, Adult worms. Samples in each panel: ♂1, ♂2, ♀1, ♀2.]
DESeq package (v1.12.1)
S3 – 5/10
Quality analysis of the de novo transcriptome
Description of the type of matches between the Cufflinks transcripts (XLOC) and the reference transcripts (Smp_ID v5.2)

| Match categories | Number of XLOC | Proportion of XLOC | Category | Proportion of categories |
|---|---|---|---|---|
| Complete match of intron chain | 6642 | 19.11% | Smp overlap | 27.86% |
| Contained | 113 | 0.33% | Smp overlap | |
| Potentially novel isoform (fragment): at least one splice junction is shared with a reference transcript | 2417 | 6.95% | Smp overlap | |
| Single exon transfrag overlapping a reference exon and at least 10 bp of a reference intron, indicating a possible pre-mRNA fragment | 272 | 0.78% | Smp overlap | |
| Generic exonic overlap with a reference transcript | 239 | 0.69% | Smp overlap | |
| A transfrag falling entirely within a reference intron | 6613 | 19.03% | Intronic transcript | 19.03% |
| Possible polymerase run-on fragment (within 2 kb of a reference transcript) | 1466 | 4.22% | Others | 7.72% |
| Exonic overlap with reference on the opposite strand | 1193 | 3.43% | Others | |
| An intron of the transfrag overlaps a reference intron on the opposite strand (likely due to read mapping errors) | 24 | 0.07% | Others | |
| Unknown, intergenic transcript | 15776 | 45.39% | Intergenic | 45.39% |
| Repeat. Currently determined by looking at the soft-masked reference sequence and applied to transcripts where at least 50% of the bases are lower case | | 0.00% | | |
| (.tracking file only, indicates multiple classifications) | - | | | |

Cuffcompare (Cufflinks v2.2.1)
*Source of the S. mansoni genome reference: ftp://ftp.sanger.ac.uk/pub/pathogens/Schistosoma/mansoni/Latest_assembly_annotation_others/add_utrs.gff
S3 – 6/10
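As a consistency check, the proportions in the Cuffcompare table can be recomputed from the XLOC counts. The sketch below is only an illustration: the dictionary keys are shortened labels chosen here, and the grouping into the four categories follows the table as reconstructed from the slide.

```python
# XLOC counts per match type, copied from the table (labels shortened here).
counts = {
    "complete match of intron chain": 6642,
    "contained": 113,
    "potentially novel isoform": 2417,
    "single exon transfrag (possible pre-mRNA)": 272,
    "generic exonic overlap": 239,
    "within a reference intron": 6613,
    "possible polymerase run-on": 1466,
    "exonic overlap, opposite strand": 1193,
    "intron overlap, opposite strand": 24,
    "unknown, intergenic": 15776,
}

# Grouping of match types into the four categories shown on the slide.
groups = {
    "Smp overlap": [
        "complete match of intron chain",
        "contained",
        "potentially novel isoform",
        "single exon transfrag (possible pre-mRNA)",
        "generic exonic overlap",
    ],
    "Intronic transcript": ["within a reference intron"],
    "Others": [
        "possible polymerase run-on",
        "exonic overlap, opposite strand",
        "intron overlap, opposite strand",
    ],
    "Intergenic": ["unknown, intergenic"],
}

total = sum(counts.values())

# Percentage of XLOC per match type, rounded to two decimals as in the table.
per_type = {name: round(100 * n / total, 2) for name, n in counts.items()}

# Percentage of XLOC per category, computed from summed counts.
per_group = {
    group: round(100 * sum(counts[name] for name in names) / total, 2)
    for group, names in groups.items()
}

print(total)
for group, pct in per_group.items():
    print(f"{group}: {pct}%")
```

Summing counts before dividing avoids accumulating rounding error from the per-type percentages.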
ChIP-seq: QUALITY OF THE METRICS
Column groups: Male adults, Female adults, Male cercariae, Female cercariae
Samples: Unbound_1, Unbound_2, H3K27Me3_1, H3K27Me3_2

Values are listed in the column order of the slide (columns 1 to 16).

| Column | Raw data | Aligned (=1) | Aligned (=1) % | Aligned (>1) | Aligned (>1) % | Total mapping % |
|---|---|---|---|---|---|---|
| 1 | 33 119 430 | 17 685 004 | 53.40% | 14 018 376 | 42.33% | 95.72% |
| 2 | 29 687 218 | 15 049 346 | 50.69% | 13 624 187 | 45.89% | 96.59% |
| 3 | 39 508 805 | 22 241 047 | 56.29% | 15 493 271 | 39.21% | 95.51% |
| 4 | 28 879 342 | 15 355 606 | 53.17% | 12 343 496 | 42.74% | 95.91% |
| 5 | 39 423 831 | 18 556 375 | 47.07% | 18 924 945 | | 95.07% |
| 6 | 25 378 692 | 12 576 032 | 49.55% | 11 676 103 | 46.01% | 95.56% |
| 7 | 36 815 594 | 17 723 235 | 48.14% | 16 812 583 | 45.67% | 93.81% |
| 8 | 45 320 761 | 23 034 830 | 50.83% | 19 600 397 | 43.25% | 94.07% |
| 9 | 44 606 481 | 24 152 929 | 54.15% | 18 935 354 | 42.45% | 96.60% |
| 10 | 44 157 403 | 22 002 385 | 49.83% | 20 516 084 | 46.46% | 96.29% |
| 11 | 34 670 844 | 20 714 025 | 59.74% | 12 542 556 | 36.18% | 95.92% |
| 12 | 46 214 556 | 26 201 405 | 56.70% | 18 287 941 | 39.57% | 96.27% |
| 13 | 38 753 631 | 16 732 388 | 43.18% | 20 258 268 | 52.27% | 95.45% |
| 14 | 40 949 706 | 19 657 069 | 48.00% | 19 593 421 | 47.85% | 95.85% |
| 15 | 51 819 705 | 25 418 099 | 49.05% | 24 058 455 | 46.43% | 95.48% |
| 16 | 44 155 986 | 23 382 122 | 52.95% | 18 592 334 | 42.11% | 95.06% |

Rows with incomplete values on the slide (column positions not recoverable):
Groomed QC passed: 28 879 342; 45 320 761; 46 214 556
Used for peak calling: 15 000 000; 15 000 000; 12 576 032
Number of peaks: N.A.; 8 363; 6 947; 14 302; 5 697; 7 116; 11 382; 5 044; 4 719
S3 – 7/10
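The mapping percentages in the ChIP-seq table appear to be the aligned read counts divided by the raw read count. The short sketch below checks this for the first sample column only. The function name is hypothetical, and the denominator (raw reads rather than groomed reads) is an assumption that happens to reproduce the values shown.

```python
def mapping_percentages(raw_reads: int, aligned_once: int, aligned_multi: int):
    """Return (unique %, multiple %, total %) relative to the raw read count.

    Assumption: percentages are taken over raw reads, rounded to 2 decimals.
    """
    unique_pct = round(100 * aligned_once / raw_reads, 2)
    multi_pct = round(100 * aligned_multi / raw_reads, 2)
    # The total is computed from summed counts, not from the rounded parts,
    # so it can differ from unique_pct + multi_pct by 0.01.
    total_pct = round(100 * (aligned_once + aligned_multi) / raw_reads, 2)
    return unique_pct, multi_pct, total_pct


# First sample column of the table: raw data, Aligned (=1), Aligned (>1).
unique_pct, multi_pct, total_pct = mapping_percentages(
    raw_reads=33_119_430,
    aligned_once=17_685_004,
    aligned_multi=14_018_376,
)
print(unique_pct, multi_pct, total_pct)
```

For this column the rounded parts add up to 95.73 while the total computed from counts is 95.72, which matches the value given in the table.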
ChIP-seq: REPLICATE CONSISTENCY OF EPICHIP ANALYSIS
[Figure: two panels of H3K27Me3 enrichment (y-axis) against position on the gene in bases (x-axis), relative to the TSS. Cercariae panel: Male cercariae, replicates 1 and 2; Female cercariae, replicates 1, 2 and 3. Adults panel: Male adults, replicates 1 and 2; Female adults, replicates 1, 2 and 3.]
TSS = Transcription Start Site
S3 – 8/10
ChIP-seq: Male and Female H3K27Me3 enrichments, depending on the developmental stages
[Figure: panel A: Males, panel B: Females; each panel compares Cercariae and Adults. x-axis: position on the gene (bases), from -1000 to +5000, with 0 = TSS. y-axis ticks: 0.5 to 1.]
TSS = Transcription Start Site
S3 – 9/10
Statistical test of EpiChIP profile differences

| Comparison | Extreme differences | Z value | p |
|---|---|---|---|
| Between sexes: Cercariae | 76.50% | 41.91 | <0.001 |
| Between sexes: Adults | 10.70% | 5.862 | <0.001 |
| Between stages: Males | 56.30% | 30.811 | <0.001 |
| Between stages: Females | 25.60% | 13.997 | <0.001 |

All pairwise comparisons were significant (Kolmogorov-Smirnov two-sample tests; p<0.001). The extreme differences given by the Kolmogorov-Smirnov two-sample tests show that:
(i) The difference between the adult male and adult female distributions is low (10.7% of maximum difference) compared with the cercarial stage (76.5% of maximum difference).
(ii) The change in chromatin structure from cercariae to adults is about twice as large in males as in females (56.3% of maximum difference for males vs 25.6% for females).
S3 – 10/10
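The "extreme difference" reported by a two-sample Kolmogorov-Smirnov test is the largest vertical gap between the two empirical cumulative distribution functions. The sketch below shows that computation on toy data. It is not the software used for the slides: the function name and the data are illustrative, and the scaling Z = D * sqrt(n*m/(n+m)) is one common convention that is assumed here, not confirmed by the source.

```python
import math
from bisect import bisect_right


def ks_two_sample(x, y):
    """Return (D, Z) for two samples.

    D is the maximum absolute difference between the empirical CDFs.
    Z = D * sqrt(n*m / (n+m)) is an assumed scaling convention.
    """
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    d = 0.0
    # The ECDF gap can only change at observed values, so check each one.
    for value in xs + ys:
        cdf_x = bisect_right(xs, value) / n  # share of x values <= value
        cdf_y = bisect_right(ys, value) / m  # share of y values <= value
        d = max(d, abs(cdf_x - cdf_y))
    z = d * math.sqrt(n * m / (n + m))
    return d, z


# Toy data only; these are not enrichment profiles from the study.
d, z = ks_two_sample([1, 2, 3, 4], [3, 4, 5, 6])
print(d, round(z, 4))
```

A value of D close to 0 means the two distributions nearly coincide, which is how the low adult male vs female difference (10.7%) reads against the cercarial one (76.5%).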