Long way to solve short ncRNA data analysis problems – evaluation of small RNA-Seq datasets from non-model organisms in Galaxy
Jochen Bick

1 Long way to solve short ncRNA data analysis problems – evaluation of small RNA-Seq datasets from non-model organisms in Galaxy
Jochen Bick

2 Sus scrofa embryo samples
36 pig embryos collected at day 10 of pregnancy; sows fed with different doses of estradiol (n=6 per group, 6 groups in total)
Illumina TruSeq Small RNA libraries
Sequenced on Illumina HiSeq 4000 at FGCZ, 126 bp single-end reads
Table columns: # of embryos, sex, E2 (µg); surviving values: 6, 10, 1000
Exogenous substances affecting: endogenous hormonal systems, reproduction, health

3 Workflow overview in Galaxy
Filter and sequence mapping
Count number of sequences
Quality control with FastQC
Trimming, adapter clipping

4 Filtering
Clip adapter
Trimmomatic: FastQC Universal Illumina Adapter
Min length of 16 bp
Only keep sequences with clipped adapter
Trimmomatic: MINLEN:16, LEADING:3
No quality trimming
FastQC after each step for quality checks
Why are the first 3 removed again?
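The two filtering criteria on this slide (keep only reads in which the adapter was found and clipped, then require at least 16 bp) can be sketched in a few lines of Python. This is an illustration only, not the Trimmomatic implementation: it uses exact string matching, whereas real adapter clippers also tolerate mismatches and partial adapters at the read end. The function names and the adapter sequence in the usage lines are hypothetical.

```python
from typing import Iterable, Iterator, Optional

MIN_LENGTH = 16  # minimum read length from the slide (MINLEN:16)


def clip_adapter(read: str, adapter: str) -> Optional[str]:
    """Return the read up to the first exact adapter match, or None if absent."""
    position = read.find(adapter)
    if position == -1:
        return None  # adapter not found
    return read[:position]


def filter_reads(reads: Iterable[str], adapter: str,
                 min_length: int = MIN_LENGTH) -> Iterator[str]:
    """Yield clipped reads that contained the adapter and are long enough."""
    for read in reads:
        clipped = clip_adapter(read, adapter)
        if clipped is not None and len(clipped) >= min_length:
            yield clipped


# Hypothetical adapter and reads, for illustration only.
ADAPTER = "TTTTGGGG"
example_reads = [
    "ACGTACGTACGTACGTAC" + ADAPTER + "AA",  # 18 bp insert, adapter present: kept
    "ACGT" + ADAPTER,                        # 4 bp insert: too short
    "ACGTACGTACGTACGTACGT",                  # no adapter: discarded
]
kept = list(filter_reads(example_reads, ADAPTER))
```

Requiring a clipped adapter makes sense for small RNA libraries because the inserts are much shorter than the 126 bp reads, so a read without any adapter is unlikely to be a complete small RNA.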

5 How to generate the count table

6 More converting steps
Collapse
> CGCGACCTCAGATCAGA
> CGCGACCTCAGATCAGAC
> ACGCGACCTCAGATCAGA
> ACTCAAACTGTGGGGGCACTTT
> TAGCTTATCAGACTGATGTTGAC
> TAAGTGCTTCCATGTTTTAGTAG
> ACGCGACCTCAGATCAGACG
FASTA-to-Tabular (columns 1, 2)
CGCGACCTCAGATCAGA
CGCGACCTCAGATCAGAC
ACGCGACCTCAGATCAGA
ACTCAAACTGTGGGGGCACTTT
TAGCTTATCAGACTGATGTTGAC
TAAGTGCTTCCATGTTTTAGTAG
ACGCGACCTCAGATCAGACG
Convert + Cut (columns 1, 2)
CGCGACCTCAGATCAGA          44042
CGCGACCTCAGATCAGAC         31455
ACGCGACCTCAGATCAGA         29208
ACTCAAACTGTGGGGGCACTTT     27371
TAGCTTATCAGACTGATGTTGAC    26520
TAAGTGCTTCCATGTTTTAGTAG    22345
ACGCGACCTCAGATCAGACG
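The net effect of the Collapse, FASTA-to-Tabular and Convert + Cut steps is a two-column table of unique sequences and their read counts. A minimal Python sketch of that result, assuming the reads are already available as plain strings (the function names are hypothetical and this is not the Galaxy tool code):

```python
from collections import Counter
from typing import Iterable, List, Tuple


def collapse_reads(reads: Iterable[str]) -> List[Tuple[str, int]]:
    """Collapse identical reads into (sequence, count) pairs."""
    counts = Counter(reads)
    # Most abundant sequences first; ties are broken alphabetically.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def to_tabular(collapsed: Iterable[Tuple[str, int]]) -> str:
    """Render the pairs as tab-separated 'sequence<TAB>count' lines."""
    return "\n".join(f"{sequence}\t{count}" for sequence, count in collapsed)


# Toy input, for illustration only.
reads = ["AAA", "CCC", "AAA", "GGG", "CCC", "AAA"]
collapsed = collapse_reads(reads)
table_text = to_tabular(collapsed)
```

Using the sequence itself as the identifier, rather than a per-library running number, is what allows the tables of different libraries to be joined later.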

8 Join datasets by identifier column
Library 1
sequence        counts
Sequence_id1    32
Sequence_id3    6
Sequence_id4    1
Sequence_id5    7

Library 2
sequence        counts
Sequence_id1    8
Sequence_id4    5

Library 3
sequence        counts
Sequence_id1    9
Sequence_id2    3
Sequence_id3    2

Results in joined table (- = sequence not present in that library)
sequence        counts lib1    counts lib2    counts lib3
Sequence_id1    32             8              9
Sequence_id2    -              -              3
Sequence_id3    6              -              2
Sequence_id4    1              5              -
Sequence_id5    7              -              -
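The join shown on this slide is an outer join on the sequence identifier: every identifier seen in any library gets one row. A small Python sketch using the slide's own example values follows. The function name is hypothetical, and filling missing entries with 0 is an assumption made here for convenience in later count-based steps; the slide leaves those cells empty.

```python
from typing import Dict, List


def join_count_tables(libraries: List[Dict[str, int]],
                      fill: int = 0) -> Dict[str, List[int]]:
    """Outer-join per-library count tables on the sequence identifier."""
    # Union of all identifiers across libraries, in sorted order.
    sequence_ids = sorted(set().union(*libraries))
    return {
        seq_id: [library.get(seq_id, fill) for library in libraries]
        for seq_id in sequence_ids
    }


# Example values taken from the slide.
lib1 = {"Sequence_id1": 32, "Sequence_id3": 6, "Sequence_id4": 1, "Sequence_id5": 7}
lib2 = {"Sequence_id1": 8, "Sequence_id4": 5}
lib3 = {"Sequence_id1": 9, "Sequence_id2": 3, "Sequence_id3": 2}

joined = join_count_tables([lib1, lib2, lib3])
```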

9 Join datasets by identifier column results

10 Count table statistics

11 Library size differences
I got no good results using Velvet and Oases, so I first focused on Trinity.

12 Library size differences

13 Library size differences

14 Filter count table by CPM cutoff
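Because library sizes differ, a CPM (counts per million) cutoff compares sequences on a per-library normalized scale: CPM = count / library size × 1,000,000. The sketch below shows one common form of such a filter. The cutoff value and the minimum number of libraries are not given on the slide, so both are left as parameters, and the function names and toy table are hypothetical.

```python
from typing import Dict, List


def counts_per_million(table: Dict[str, List[int]]) -> Dict[str, List[float]]:
    """Scale each library (column) so that its counts sum to one million."""
    library_sizes = [sum(column) for column in zip(*table.values())]
    return {
        sequence: [
            count * 1e6 / size if size else 0.0
            for count, size in zip(counts, library_sizes)
        ]
        for sequence, counts in table.items()
    }


def filter_by_cpm(table: Dict[str, List[int]], cutoff: float,
                  min_libraries: int) -> Dict[str, List[int]]:
    """Keep sequences whose CPM reaches the cutoff in at least min_libraries."""
    cpm = counts_per_million(table)
    return {
        sequence: counts
        for sequence, counts in table.items()
        if sum(value >= cutoff for value in cpm[sequence]) >= min_libraries
    }


# Toy table with two libraries of size 1000 each, for illustration only.
toy_table = {"a": [900, 800], "b": [99, 199], "c": [1, 1]}
toy_cpm = counts_per_million(toy_table)
```

The filter returns the raw counts of the retained sequences rather than their CPM values, since downstream differential expression tools usually expect raw counts.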

15 Filtering and mapping
mature miRNA
precursor miRNA
tRNAs - Ssc
piRNA cluster - Ssc
all transcripts - Ssc
ncRNAs - Hsa

16 Annotation sources used for BLAST
Mature and precursor miRNA
Non-coding RNA (tRNAs, small RNAs)
Non-coding RNA (tRNAs, small RNAs), piRNA cluster
mRNAs
Preparation: convert RNA to DNA (U to T), join FASTA files into groups, remove duplicates, create BLAST databases
Precursor microRNAs with a hairpin/stem-loop structure produce one or two mature microRNAs
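The preparation steps before building the BLAST databases (RNA to DNA alphabet, joining FASTA files into groups, removing duplicates) can be illustrated with a short Python sketch. It assumes the FASTA records have already been parsed into (name, sequence) pairs; the function names and record names are hypothetical, and creating the database itself is left to the BLAST tooling.

```python
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, str]  # (name, sequence)


def rna_to_dna(sequence: str) -> str:
    """Convert an RNA sequence to the DNA alphabet (U becomes T)."""
    return sequence.upper().replace("U", "T")


def merge_unique(groups: Iterable[Iterable[Record]]) -> List[Record]:
    """Join several record groups, keeping the first record per unique sequence."""
    first_name_by_sequence: Dict[str, str] = {}
    for records in groups:
        for name, sequence in records:
            dna = rna_to_dna(sequence)
            if dna not in first_name_by_sequence:
                first_name_by_sequence[dna] = name
    return [(name, dna) for dna, name in first_name_by_sequence.items()]


def to_fasta(records: Iterable[Record]) -> str:
    """Render records as FASTA text, one sequence line per record."""
    return "".join(f">{name}\n{sequence}\n" for name, sequence in records)


# Hypothetical records, for illustration only.
source_a = [("mir-a", "UAGCUUAUCAGACUGAUGUUGAC")]
source_b = [("duplicate-of-a", "TAGCTTATCAGACTGATGTTGAC"), ("mir-b", "ACUG")]
merged = merge_unique([source_a, source_b])
fasta_text = to_fasta(merged)
```

Converting to the DNA alphabet before deduplication matters here: the same molecule written with U in one source and T in another would otherwise survive as two entries.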

17 Analysis of sequence counts
Huge variation in the percentage of read counts for miRNAs
Originating from the RNA isolation procedure?
[Plot axes: % miRNA, # sequences]

18 Analysis of sequence counts
[Plot axis: # sequences]

19 Sequence statistics
Stages: Raw reads, Unique sequences, Filtering, Annotation
Counts: ~24,000,000; ~68,000; ~26,000 (used for DEG analysis); ~1,300 (only miRNA)
Percentages: 97.2%, 1.8%, 1.1%

20 Thank you for your attention
Acknowledgements
Veronika Flöter
Supervisors: Stefan Bauersachs, Susanne Ulbrich
Mark Robinson plus group

