Long way to solve short ncRNA data analysis problems – evaluation of small RNA-Seq datasets from non-model organisms in Galaxy
Jochen Bick 21.09.2018
Sus scrofa embryo samples
36 pig embryos collected at day 10 of pregnancy; sows fed with different doses of estradiol (n=6 per group, 6 groups in total)
Illumina TruSeq Small RNA libraries
Sequenced on Illumina HiSeq 4000 at FGCZ
126 bp single-end reads
Sample table (columns: # of embryos, sex, E2 (µg)): 6 ♀ 10 1000 ♂
Exogenous substances affecting: endogenous hormonal systems, reproduction, health
Workflow overview in Galaxy
Quality control with FastQC
Trimming, adapter clipping
Count number of sequences
Filter and sequence mapping
Filtering
Clip adapter: Universal Illumina Adapter; min length of 16 bp; only keep sequences with clipped adapter
Trimmomatic: MINLEN: 16; LEADING: 3; no quality trimming
FastQC: after each step for quality checks
Note: Why do we remove the first 3 again?
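The clipping rule on this slide (keep only reads in which the adapter was found, and only if at least 16 bp remain) can be sketched in Python. This is a minimal illustration, not the Galaxy "Clip adapter" tool: it only handles exact adapter matches, and `clip_adapter`, `EXAMPLE_ADAPTER`, and the example reads are hypothetical names and values chosen for the sketch.

```python
from typing import Optional

MIN_LENGTH = 16  # minimum insert length after clipping, as stated on the slide


def clip_adapter(read: str, adapter: str, min_length: int = MIN_LENGTH) -> Optional[str]:
    """Return the insert before the first exact adapter match, or None if discarded.

    Assumption: exact matching only; real clipping tools also tolerate
    mismatches and partial adapters at the read end.
    """
    position = read.find(adapter)
    if position == -1:
        return None  # adapter not found: read is dropped
    insert = read[:position]
    if len(insert) < min_length:
        return None  # insert shorter than the minimum length: read is dropped
    return insert


# Hypothetical adapter and reads, for illustration only.
EXAMPLE_ADAPTER = "TGGAATTCTCGG"
reads = [
    "TAGCTTATCAGACTGATGTTGAC" + EXAMPLE_ADAPTER + "AAAA",  # adapter found, long enough
    "ACGT" + EXAMPLE_ADAPTER,                              # adapter found, insert too short
    "TAGCTTATCAGACTGATGTTGACTTTT",                         # no adapter
]
clipped = (clip_adapter(read, EXAMPLE_ADAPTER) for read in reads)
kept = [insert for insert in clipped if insert is not None]
```

Of the three example reads, only the first survives both conditions.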
How to generate the count table
More converting steps
Steps: Collapse, FASTA-to-Tabular, Convert + Cut

Collapse (FASTA, header = rank-count):
>1-234764
CGCGACCTCAGATCAGA
>2-44042
CGCGACCTCAGATCAGAC
>3-31455
ACGCGACCTCAGATCAGA
>4-29208
ACTCAAACTGTGGGGGCACTTT
>5-27371
TAGCTTATCAGACTGATGTTGAC
>6-26520
TAAGTGCTTCCATGTTTTAGTAG
>7-22345
ACGCGACCTCAGATCAGACG

FASTA-to-Tabular:
1         2
1-234764  CGCGACCTCAGATCAGA
2-44042   CGCGACCTCAGATCAGAC
3-31455   ACGCGACCTCAGATCAGA
4-29208   ACTCAAACTGTGGGGGCACTTT
5-27371   TAGCTTATCAGACTGATGTTGAC
6-26520   TAAGTGCTTCCATGTTTTAGTAG
7-22345   ACGCGACCTCAGATCAGACG

Convert + Cut:
1       2
234764  CGCGACCTCAGATCAGA
44042   CGCGACCTCAGATCAGAC
31455   ACGCGACCTCAGATCAGA
29208   ACTCAAACTGTGGGGGCACTTT
27371   TAGCTTATCAGACTGATGTTGAC
26520   TAAGTGCTTCCATGTTTTAGTAG
22345   ACGCGACCTCAGATCAGACG
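The three conversion steps can be mimicked in a few lines of Python to show what each one does to the data. This is a sketch under assumptions, not the Galaxy tools themselves: the function names are hypothetical, and the toy reads are invented so that the counts are easy to check by hand.

```python
from collections import Counter


def collapse_reads(reads):
    """Collapse identical reads into (count, sequence) pairs, most abundant first."""
    counts = Counter(reads)
    return [(count, seq) for seq, count in counts.most_common()]


def to_collapsed_fasta(table):
    """Write collapsed reads as FASTA with 'rank-count' headers, as on the slide."""
    lines = []
    for rank, (count, seq) in enumerate(table, start=1):
        lines.append(f">{rank}-{count}")
        lines.append(seq)
    return "\n".join(lines)


def fasta_to_count_table(fasta_text):
    """FASTA-to-Tabular plus Convert + Cut: keep only the count and the sequence."""
    lines = fasta_text.splitlines()
    rows = []
    for header, seq in zip(lines[0::2], lines[1::2]):
        _rank, count = header.lstrip(">").split("-")
        rows.append((int(count), seq))
    return rows


# Toy input: three distinct sequences with 3, 2 and 1 copies.
toy_reads = (
    ["CGCGACCTCAGATCAGA"] * 3
    + ["CGCGACCTCAGATCAGAC"] * 2
    + ["ACGCGACCTCAGATCAGA"]
)
collapsed = collapse_reads(toy_reads)
fasta = to_collapsed_fasta(collapsed)
count_table = fasta_to_count_table(fasta)
```

The round trip through the FASTA representation returns the same (count, sequence) pairs that the collapse step produced, which is the two-column table used for joining libraries.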
Join datasets by identifier column

Library 1
sequence      counts
Sequence_id1  32
Sequence_id3  6
Sequence_id4  1
Sequence_id5  7

Library 2
sequence      counts
Sequence_id1  8
Sequence_id4  5

Library 3
sequence      counts
Sequence_id1  9
Sequence_id2  3
Sequence_id3  2

Results in joined table (- = no entry in that library)
sequence      counts lib1  counts lib2  counts lib3
Sequence_id1  32           8            9
Sequence_id2  -            -            3
Sequence_id3  6            -            2
Sequence_id4  1            5            -
Sequence_id5  7            -            -
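The join shown on this slide is an outer join on the sequence identifier. A minimal Python sketch using the slide's example values follows; `join_count_tables` is a hypothetical helper, and filling missing entries with 0 is an assumption made here so the table can be used for counting (the slide leaves those cells empty).

```python
def join_count_tables(libraries):
    """Outer-join per-library count tables on the sequence identifier.

    libraries: dict mapping library name -> dict of sequence id -> count.
    Returns (header, rows); sequences missing from a library get 0 (assumption).
    """
    names = list(libraries)
    all_ids = sorted(set().union(*(table.keys() for table in libraries.values())))
    header = ["sequence"] + [f"counts {name}" for name in names]
    rows = [
        [seq_id] + [libraries[name].get(seq_id, 0) for name in names]
        for seq_id in all_ids
    ]
    return header, rows


# Example values taken from the slide.
libraries = {
    "lib1": {"Sequence_id1": 32, "Sequence_id3": 6, "Sequence_id4": 1, "Sequence_id5": 7},
    "lib2": {"Sequence_id1": 8, "Sequence_id4": 5},
    "lib3": {"Sequence_id1": 9, "Sequence_id2": 3, "Sequence_id3": 2},
}
header, rows = join_count_tables(libraries)
for row in [header] + rows:
    print("\t".join(str(cell) for cell in row))
```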
Join datasets by identifier column: results
Count table statistics
Library size differences
Note: I got no good results using Velvet and Oases, so I first focused on Trinity.
Filter count table by CPM cutoff
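A CPM (counts per million) filter normalizes each count by its library size before applying a threshold, so that libraries of different depth are treated comparably. The sketch below shows one common form of such a filter. The cutoff value, the minimum number of libraries, and the function names are assumptions for illustration; the slide does not state which values were used.

```python
def counts_per_million(rows):
    """Convert raw counts to CPM per library: count / library size * 1e6."""
    n_libraries = len(rows[0][1])
    library_sizes = [sum(counts[i] for _, counts in rows) for i in range(n_libraries)]
    return [
        (seq, [1e6 * c / size if size else 0.0 for c, size in zip(counts, library_sizes)])
        for seq, counts in rows
    ]


def filter_by_cpm(rows, cutoff, min_libraries):
    """Keep sequences whose CPM reaches `cutoff` in at least `min_libraries` libraries."""
    keep = {
        seq
        for seq, cpms in counts_per_million(rows)
        if sum(value >= cutoff for value in cpms) >= min_libraries
    }
    return [(seq, counts) for seq, counts in rows if seq in keep]


# Toy count table: (sequence id, [count in library 1, count in library 2]).
toy_rows = [
    ("seq_a", [900, 450]),
    ("seq_b", [99, 549]),
    ("seq_c", [1, 1]),
]
# Hypothetical thresholds, chosen only to make the example easy to follow.
filtered = filter_by_cpm(toy_rows, cutoff=5000, min_libraries=2)
```

In the toy table both libraries contain 1,000 reads, so `seq_c` has a CPM of 1,000 in each library and is removed, while the other two sequences pass.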
Filtering and mapping
mature miRNA
precursor miRNA
tRNAs - Ssc
piRNA cluster - Ssc
all transcripts - Ssc
ncRNAs - Hsa
Annotation sources used for BLAST
Mature and precursor miRNA
Non-coding RNA (tRNAs, small RNAs)
Non-coding RNA (tRNAs, small RNAs), piRNA cluster
mRNAs
Processing: translate RNA to DNA, join FASTA files into groups, remove duplicates, create BLAST databases
Precursor microRNAs with a hairpin/stem-loop structure produce one or two mature microRNAs.
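Two of the preparation steps, translating RNA to DNA and removing duplicate sequences, are simple enough to sketch in Python. The record names and sequences below are invented for illustration, `rna_to_dna` and `build_reference` are hypothetical helpers, and keeping the first name seen for each duplicated sequence is an assumption about how duplicates are resolved. Building the BLAST database itself is not covered here.

```python
def rna_to_dna(sequence):
    """Translate an RNA sequence to its DNA alphabet (U -> T)."""
    return sequence.upper().replace("U", "T")


def build_reference(records):
    """Convert (name, RNA sequence) records to DNA and drop duplicate sequences.

    Assumption: when two records share a sequence, the first name is kept.
    """
    seen = set()
    unique = []
    for name, sequence in records:
        dna = rna_to_dna(sequence)
        if dna in seen:
            continue
        seen.add(dna)
        unique.append((name, dna))
    return unique


def to_fasta(records):
    """Format (name, sequence) records as FASTA text."""
    return "\n".join(f">{name}\n{sequence}" for name, sequence in records)


# Invented records joined from two hypothetical FASTA sources.
records = [
    ("entry_a", "UAGCUUAUCAGACUGAUGUUGA"),
    ("entry_b", "uagcuuaucagacugauguuga"),  # same sequence, different case
    ("entry_c", "UAAGUGCUUCCAUGUUUUAGUAG"),
]
reference = build_reference(records)
print(to_fasta(reference))
```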
Analysis of sequence counts
Huge variation in the percentage of read counts for miRNAs. Does it originate from the RNA isolation procedure?
[Figure: % miRNA and # sequences]
Analysis of sequence counts
[Figure: # sequences]
Sequence statistics
Unique sequences (percentages shown on the slide: 97.2%, 1.8%, 1.1%)
Raw reads: ~24,000,000
Filtering: ~68,000
Annotation: ~26,000, used for DEG analysis
Only miRNA: ~1,300
Thank you for your attention
Acknowledgements
Veronika Flöter
Supervisors:
Stefan Bauersachs
Susanne Ulbrich
Mark Robinson plus group