Published by Jasmin Daniel. Modified over 9 years ago.
1
Pyrosequencing for Metagenomics: accessing and organizing raw data
Giuseppe D’Auria
FISABIO, Valencia
Norwich, 08-12 September 2014
2
Practice workflow
We will start from a single SFF (Standard Flowgram Format) file containing a metagenome and a metatranscriptome experiment, labelled with two MIDs (Multiplex Identifiers).
1) Organize data and folders: extract fasta and quality files belonging to each dataset
2) Recruitment protocol by MUMmer
3) Assembly protocol via MIRA
4) Searching for rRNAs
5) Cluster 16S rRNA
6) Annotate 16S rRNA
7) Search for tRNA
3
Practice workflow: Organize data and folders. Extract fasta and quality files belonging to each dataset.
4
Extract fasta and quality files belonging to each dataset (Exercise 2)
Extracting MIDs: SFF → sff_extract → FASTA file + FASTA quality file → bin_fasta_on_mid_primers.pl (identify MIDs and separate FASTA and FASTA quality files) → FASTA-Mid1/QUALITY-Mid1, FASTA-Mid2/QUALITY-Mid2, FASTA-MidX/QUALITY-MidX
1) Use sff_extract to extract the sequences from the SFF file. The -c parameter removes adaptor sequences so that the MIDs can be identified.
2) Use bin_fasta_on_mid_primers.pl to separate the reads by MID.
http://sourceforge.net/projects/mira-assembler/files/MIRA/
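The MID binning idea can be pictured with a minimal Python sketch. It is not the code of bin_fasta_on_mid_primers.pl: it assumes an exact match of the MID at the very start of each already-clipped read, and the MID names and sequences in the example are hypothetical toy values. Real demultiplexers may tolerate sequencing errors in the MID, which this sketch does not.

```python
from collections import defaultdict


def bin_reads_by_mid(reads, mids):
    """Assign each read to the MID found at its 5' end (exact match only).

    reads: dict read_id -> sequence, already clipped of the 454 adaptor/key
           (the role of sff_extract -c in the workflow above).
    mids:  dict mid_name -> MID sequence (hypothetical values in the example).
    Returns dict bin_name -> {read_id: sequence with the MID removed};
    reads starting with no known MID go to the "unassigned" bin unchanged.
    """
    bins = defaultdict(dict)
    for read_id, seq in reads.items():
        for name, mid in mids.items():
            if seq.upper().startswith(mid.upper()):
                bins[name][read_id] = seq[len(mid):]  # trim the MID off the read
                break
        else:  # the loop finished without a break: no MID matched
            bins["unassigned"][read_id] = seq
    return dict(bins)
```

For example, bin_reads_by_mid({"r1": "ACGAGTGCGTGATTACA"}, {"midA": "ACGAGTGCGT"}) puts r1 in the midA bin as GATTACA. One output bin per MID corresponds to the FASTA-Mid1, FASTA-Mid2, ... files in the diagram.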
5
Open the terminal
Extract fasta and quality files belonging to each dataset
Output files: out_midi_CCAACC = Metagenome; out_midi_CGCCAT = Metatranscriptome
embo@embo-VirtualBox:~$ # Go to data folder
embo@embo-VirtualBox:~$ cd data
embo@embo-VirtualBox:~/data$ # Create project2 folder
embo@embo-VirtualBox:~/data$ mkdir project2
embo@embo-VirtualBox:~/data$ # Go to project2 folder
embo@embo-VirtualBox:~/data$ cd project2
embo@embo-VirtualBox:~/data/project2$ # Link SFF file
embo@embo-VirtualBox:~/data/project2$ ln -s ~/data/Sequences/dataset2.sff ~/data/project2/dataset2.sff
embo@embo-VirtualBox:~/data/project2$ # Extract FASTA and QUALITY files from the SFF file
embo@embo-VirtualBox:~/data/project2$ sff_extract -c -A dataset2.sff
embo@embo-VirtualBox:~/data/project2$ # Sort reads by MIDs
embo@embo-VirtualBox:~/data/project2$ bin_fasta_on_mid_primers.pl -r dataset2.fasta -q dataset2.fasta.qual -m ../Sequences/mids.fas -b out
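sff_extract writes sequences and quality values to two parallel files (dataset2.fasta and dataset2.fasta.qual), and bin_fasta_on_mid_primers.pl takes both. The Python sketch below is only an illustration, not one of the course scripts, and its function names are hypothetical. It shows how the two files relate: each QUAL record carries one whitespace-separated integer per base of the FASTA record with the same ID.

```python
def read_fasta_like(lines):
    """Yield (record_id, body_lines) from FASTA-style text (works for .qual too)."""
    record_id, body = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if record_id is not None:
                yield record_id, body
            # The ID is the first word of the header; the rest is a description.
            record_id, body = line[1:].split()[0], []
        else:
            body.append(line)
    if record_id is not None:
        yield record_id, body


def pair_fasta_qual(fasta_lines, qual_lines):
    """Return {read_id: (sequence, [quality values])}, checking one value per base."""
    seqs = {rid: "".join(body) for rid, body in read_fasta_like(fasta_lines)}
    quals = {
        rid: [int(value) for chunk in body for value in chunk.split()]
        for rid, body in read_fasta_like(qual_lines)
    }
    paired = {}
    for rid, seq in seqs.items():
        qual = quals.get(rid)
        if qual is None or len(qual) != len(seq):
            raise ValueError(f"missing or mismatched quality values for {rid}")
        paired[rid] = (seq, qual)
    return paired


def mean_quality(qual):
    """Average Phred quality of one read (0.0 for an empty read)."""
    return sum(qual) / len(qual) if qual else 0.0
```

A typical call opens both files side by side: with open("dataset2.fasta") as f, open("dataset2.fasta.qual") as q: pairs = pair_fasta_qual(f, q).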
6
Open the terminal
Extract fasta and quality files belonging to each dataset
embo@embo-VirtualBox:~/data/project2$ # Create Metagenome folder
embo@embo-VirtualBox:~/data/project2$ mkdir metage
embo@embo-VirtualBox:~/data/project2$ # Create Metatranscriptome folder
embo@embo-VirtualBox:~/data/project2$ mkdir metatra
embo@embo-VirtualBox:~/data/project2$ # Move project files into folders
embo@embo-VirtualBox:~/data/project2$ mv out_midi_CCAACC.fasta* metage/
embo@embo-VirtualBox:~/data/project2$ mv out_midi_CGCCAT.fasta* metatra/
embo@embo-VirtualBox:~/data/project2$ # Go to Metagenome folder
embo@embo-VirtualBox:~/data/project2$ cd metage
embo@embo-VirtualBox:~/data/project2/metage$ # Take a look at the folder
embo@embo-VirtualBox:~/data/project2/metage$ ls -ltr
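The same folder layout can also be produced from Python. This sketch only mirrors the mkdir and mv commands above; the out_midi_ prefix and the MID-to-folder mapping come from the slides, while the function name is hypothetical.

```python
import shutil
from pathlib import Path


def organize_by_mid(project_dir, mid_to_folder, prefix="out_midi_"):
    """Move <prefix><MID>.fasta* files into one sub-folder per dataset.

    Equivalent in spirit to:
        mkdir metage metatra
        mv out_midi_CCAACC.fasta* metage/
        mv out_midi_CGCCAT.fasta* metatra/
    Returns the list of destination paths.
    """
    project = Path(project_dir)
    moved = []
    for mid, folder in mid_to_folder.items():
        target = project / folder
        target.mkdir(exist_ok=True)  # like mkdir, but no error if it already exists
        # The pattern matches both the .fasta file and its .fasta.qual companion.
        for path in sorted(project.glob(f"{prefix}{mid}.fasta*")):
            destination = target / path.name
            shutil.move(str(path), str(destination))
            moved.append(destination)
    return moved
```

Run from ~/data/project2, organize_by_mid(".", {"CCAACC": "metage", "CGCCAT": "metatra"}) reproduces the layout built by the commands above.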
7
Practice workflow: Recruitment protocol by MUMmer
8
Open the terminal
Mapping and recruitment graph
embo@embo-VirtualBox:~/data/project2/metage$ # Link file to a simpler name
embo@embo-VirtualBox:~/data/project2/metage$ ln -s out_midi_CCAACC.fasta metage.fas
embo@embo-VirtualBox:~/data/project2/metage$ # Map reads onto the reference genome
embo@embo-VirtualBox:~/data/project2/metage$ # Obtain mapping coordinates
embo@embo-VirtualBox:~/data/project2/metage$ nucmer --prefix=recruit ../../References/reference.fasta metage.fas --coords
embo@embo-VirtualBox:~/data/project2/metage$ # Obtain mapping image (PostScript)
embo@embo-VirtualBox:~/data/project2/metage$ mummerplot recruit.delta -R ../../References/reference.fasta -Q metage.fas --coverage --postscript -p recruit
embo@embo-VirtualBox:~/data/project2/metage$ # Visualize mapping
embo@embo-VirtualBox:~/data/project2/metage$ evince recruit.ps &
MUMmer: http://mummer.sourceforge.net/
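The coverage plot summarizes how the recruited reads pile up along the reference. The idea can be sketched in Python with a difference array over 1-based, inclusive alignment intervals on the reference, such as the start/end positions reported in the coords output. This is an illustrative assumption about how one might compute depth, not what mummerplot does internally, and it assumes a single reference sequence.

```python
def reference_coverage(ref_length, intervals):
    """Per-base depth along a reference of length ref_length.

    intervals: iterable of (start, end) pairs, 1-based and inclusive, on the
    reference; reversed pairs (start > end) are swapped, and positions outside
    1..ref_length are clipped.
    Returns (depth, covered_fraction): depth[i] is the number of alignments
    covering reference position i + 1.
    """
    diff = [0] * (ref_length + 1)
    for start, end in intervals:
        lo, hi = sorted((start, end))
        lo, hi = max(lo, 1), min(hi, ref_length)
        if lo > hi:  # interval lies entirely outside the reference
            continue
        diff[lo - 1] += 1  # coverage starts at position lo
        diff[hi] -= 1      # and stops after position hi
    depth, running = [], 0
    for i in range(ref_length):
        running += diff[i]
        depth.append(running)
    covered = sum(1 for d in depth if d > 0) / ref_length if ref_length else 0.0
    return depth, covered
```

The difference array keeps the cost linear in reference length plus number of alignments, which matters when thousands of 454 reads are recruited onto a bacterial genome.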
9
Practice workflow: Assembly protocol via MIRA
10
embo@embo-VirtualBox:~/data/project2/metage$ # Link metagenome files for assembly
embo@embo-VirtualBox:~/data/project2/metage$ ln -s out_midi_CCAACC.fasta metage_in.454.fasta
embo@embo-VirtualBox:~/data/project2/metage$ ln -s out_midi_CCAACC.fasta.qual metage_in.454.fasta.qual
embo@embo-VirtualBox:~/data/project2/metage$ ln -s ../dataset2.xml metage_traceinfo_in.454.xml
embo@embo-VirtualBox:~/data/project2/metage$ # Start de novo assembly
embo@embo-VirtualBox:~/data/project2/metage$ mira --project=metage --job=denovo,genome,draft,454 454_SETTINGS -LR:ft=fasta
embo@embo-VirtualBox:~/data/project2/metage$ # Go to results folder
embo@embo-VirtualBox:~/data/project2/metage$ cd metage_assembly
embo@embo-VirtualBox:~/data/project2/metage/metage_assembly$ cd metage_d_results
embo@embo-VirtualBox:~/data/project2/metage/metage_assembly/metage_d_results$ # Take a look at the results
embo@embo-VirtualBox:~/data/project2/metage/metage_assembly/metage_d_results$ tablet metage_out.ace &
MIRA: http://chevreux.org/projects_mira.html, http://sourceforge.net/apps/mediawiki/mira-assembler
Assembly viewer (Tablet): http://bioinf.scri.ac.uk/tablet/
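The ln -s commands above exist because MIRA, in the version used here, finds its inputs by name: for --project=metage it reads metage_in.454.fasta, metage_in.454.fasta.qual and metage_traceinfo_in.454.xml, and writes results under metage_assembly/metage_d_results. The small Python helper below spells out that naming pattern exactly as it appears on this slide and creates the links. Its function names and dictionary keys are hypothetical.

```python
from pathlib import Path


def mira_454_layout(project):
    """File names used on this slide for a MIRA run with --project=<project>."""
    results = Path(f"{project}_assembly") / f"{project}_d_results"
    return {
        "reads": f"{project}_in.454.fasta",
        "quality": f"{project}_in.454.fasta.qual",
        "traceinfo": f"{project}_traceinfo_in.454.xml",
        "results_dir": str(results),
        "ace": str(results / f"{project}_out.ace"),
    }


def link_mira_inputs(workdir, project, fasta, qual, xml):
    """Create the three input symlinks (same as the ln -s lines above).

    fasta, qual and xml are link targets stored as given, so relative targets
    are resolved from workdir, exactly as with ln -s. Existing links are kept.
    """
    layout = mira_454_layout(project)
    work = Path(workdir)
    for key, target in (("reads", fasta), ("quality", qual), ("traceinfo", xml)):
        link = work / layout[key]
        if not link.is_symlink() and not link.exists():
            link.symlink_to(target)
    return layout
```

Calling link_mira_inputs(".", "metage", "out_midi_CCAACC.fasta", "out_midi_CCAACC.fasta.qual", "../dataset2.xml") inside the metage folder creates the same three links as the commands above.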
11
Practice workflow: Searching for rRNAs
12
embo@embo-VirtualBox:~/data/project2/metage/metage_assembly/metage_d_results$ cd ../../../
embo@embo-VirtualBox:~/data/project2$ cd metatra
embo@embo-VirtualBox:~/data/project2/metatra$ # Link needed files
embo@embo-VirtualBox:~/data/project2/metatra$ ln -s out_midi_CGCCAT.fasta metatra.fas
embo@embo-VirtualBox:~/data/project2/metatra$ # Search for 16S sequences
embo@embo-VirtualBox:~/data/project2/metatra$ rna_hmm3.py -i metatra.fas -m ssu -o metatra_16S -L ../../References/hmm3
embo@embo-VirtualBox:~/data/project2/metatra$ # Extract 16S sequences listed in the 16S table
embo@embo-VirtualBox:~/data/project2/metatra$ extract_sequences_by_list.pl -f metatra.fas -t metatra_16S -c 0 -o -d 1
rna_hmm3.py: http://weizhong-lab.ucsd.edu/meta_rna/
extract_sequences_by_list.pl: one of my Perl scripts
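What extract_sequences_by_list.pl does can be approximated with a short Python sketch. The slides do not document its -c, -o and -d options. This hypothetical stand-in assumes that the read ID sits in one tab-separated column of the hit table, and it splits the reads into "listed" and "remaining" sets. That split is itself an assumption, suggested by the 16S.list.fasta and 16S.remain.fasta file names used in the next steps.

```python
def split_by_id_list(records, table_lines, id_column=0, header_lines=0, sep="\t"):
    """Split FASTA records into those whose ID appears in the table and the rest.

    records:      dict read_id -> sequence (e.g. parsed from metatra.fas)
    table_lines:  lines of a hit table such as metatra_16S
    id_column:    0-based column holding the read ID (assumed)
    header_lines: number of leading table lines to skip (assumed)
    Returns (listed, remaining) as two dicts.
    """
    wanted = set()
    for i, line in enumerate(table_lines):
        if i < header_lines or not line.strip():
            continue
        fields = line.rstrip("\n").split(sep)
        if len(fields) > id_column:
            wanted.add(fields[id_column].strip())
    listed, remaining = {}, {}
    for read_id, seq in records.items():
        # Each read goes to exactly one of the two outputs.
        (listed if read_id in wanted else remaining)[read_id] = seq
    return listed, remaining
```

Using a set for the wanted IDs keeps each lookup constant-time, so the split stays fast even when the hit table lists many thousands of reads.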
13
Practice workflow: Cluster 16S rRNA
14
Clustering
embo@embo-VirtualBox:~/data/project2/metatra$ # Filtering out chimeras (this step is commented out)
embo@embo-VirtualBox:~/data/project2/metatra$ #ChimeraSlayer.pl --query_FASTA 16S.list.fasta
embo@embo-VirtualBox:~/data/project2/metatra$ # Clustering 16S sequences
embo@embo-VirtualBox:~/data/project2/metatra$ cdhit -i 16S.list.fasta -o 16Sc90s90 -c 0.9 -s 0.9 -bak 1
embo@embo-VirtualBox:~/data/project2/metatra$ cd-hit_translate.pl 16Sc90s90.bak.clstr > 16S.tab
cd-hit_translate.pl: another of my Perl scripts
CD-HIT: http://weizhong-lab.ucsd.edu/cd-hit/
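CD-HIT writes its clusters to a .clstr file in which each ">Cluster N" line is followed by its members, with the representative sequence marked by "*". The Python sketch below parses that regular .clstr file (here 16Sc90s90.clstr) into representative and member lists. cd-hit_translate.pl reads the .bak.clstr file instead, and its output format is not shown. Treat this as a hypothetical illustration of the cluster structure, not a replacement for that script.

```python
import re

# Matches the ">seq_id..." part of a member line, e.g.
# "1\t1480nt, >read2... at +/97.50%" or "0\t1500nt, >read1... *"
MEMBER = re.compile(r">(.+?)\.\.\.\s*(\*|at\s+(.*))?$")


def parse_clstr(lines):
    """Return {cluster_number: {"representative": id, "members": [ids]}}."""
    clusters = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            current = int(line.split()[1])
            clusters[current] = {"representative": None, "members": []}
            continue
        match = MEMBER.search(line)
        if match is None or current is None:
            continue  # skip anything that is not a member line of a cluster
        seq_id = match.group(1)
        clusters[current]["members"].append(seq_id)
        if match.group(2) == "*":  # "*" marks the cluster representative
            clusters[current]["representative"] = seq_id
    return clusters
```

Cluster sizes then follow directly, e.g. {n: len(c["members"]) for n, c in parse_clstr(open("16Sc90s90.clstr")).items()}.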
15
Practice workflow: Annotate 16S rRNA
16
Annotate 16S rRNA
embo@embo-VirtualBox:~/data/project2/metatra$ # 16S taxonomic assignment with the RDP Classifier
embo@embo-VirtualBox:~/data/project2/metatra$ java -jar ~/Software/rdp_classifier_2.2/rdp_classifier-2.2.jar -q 16S.remain.fasta -o 16S_rdp -f fixrank
RDP: http://rdp.cme.msu.edu/index.jsp
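To summarize the classifier output, one might count reads per taxon at a single rank, keeping only assignments above a confidence cutoff. The Python sketch below assumes a tab-separated fixrank line in which each taxon name is followed by its rank and its confidence value. Check this against your own 16S_rdp file, since the exact column layout is an assumption here, as is the 0.8 cutoff. The function names are hypothetical.

```python
from collections import Counter

RANKS = ("domain", "phylum", "class", "order", "family", "genus")


def parse_fixrank_line(line):
    """Return (read_id, {rank: (taxon, confidence)}) for one tab-separated line.

    Assumed layout: read_id first, then repeated name, rank, confidence fields
    (empty fields in between are simply never matched as ranks).
    """
    fields = line.rstrip("\n").split("\t")
    assignments = {}
    # Start at 2 so the read ID in column 0 is never taken as a taxon name.
    for j in range(2, len(fields) - 1):
        if fields[j] in RANKS:
            assignments[fields[j]] = (fields[j - 1], float(fields[j + 1]))
    return fields[0], assignments


def tally_rank(lines, rank="phylum", min_conf=0.8):
    """Count reads per taxon at one rank; low-confidence calls become 'unclassified'."""
    counts = Counter()
    for line in lines:
        if not line.strip():
            continue
        _, assignments = parse_fixrank_line(line)
        taxon, conf = assignments.get(rank, (None, 0.0))
        counts[taxon if taxon is not None and conf >= min_conf else "unclassified"] += 1
    return counts
```

For instance, with open("16S_rdp") as f: print(tally_rank(f, rank="genus").most_common(10)) would list the ten most frequent genera.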
17
Practice workflow: Search for tRNA
18
Searching for tRNAs
embo@embo-VirtualBox:~/data/project2/metatra$ # Searching for tRNAs
embo@embo-VirtualBox:~/data/project2/metatra$ tRNAscan-SE -B 16S.remain.fasta > tRNAs.tab
embo@embo-VirtualBox:~/data/project2/metatra$ # Extract tRNA sequences listed in the tRNA table
embo@embo-VirtualBox:~/data/project2/metatra$ extract_sequences_by_list.pl -f 16S.remain.fasta -t tRNAs.tab -c 0 -o tRNAs -d 1
tRNAscan-SE: http://lowelab.ucsc.edu/tRNAscan-SE/
extract_sequences_by_list.pl: another of my Perl scripts
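tRNAs.tab is tRNAscan-SE's tabular output. It has a few header lines, then one line per predicted tRNA giving the sequence name, tRNA number, begin and end coordinates, tRNA type and anticodon, followed by further columns. The Python sketch below assumes that column order and skips every line whose second field is not an integer, which drops the header lines. It then counts predictions per tRNA type. The function names are hypothetical.

```python
from collections import Counter


def parse_trnascan(lines):
    """Return one dict per predicted tRNA from tRNAscan-SE tabular output.

    Assumed columns: name, tRNA #, begin, end, type, anticodon, ...
    Header and separator lines are skipped because their second field is
    not an integer.
    """
    hits = []
    for line in lines:
        fields = line.split()
        if len(fields) < 6 or not fields[1].isdigit():
            continue
        hits.append({
            "seq": fields[0],
            "begin": int(fields[2]),
            "end": int(fields[3]),  # begin > end indicates a hit on the minus strand
            "type": fields[4],
            "anticodon": fields[5],
        })
    return hits


def count_isotypes(hits):
    """Number of predicted tRNAs per type (e.g. Ala, Gly)."""
    return Counter(hit["type"] for hit in hits)
```

The set of read IDs with at least one hit, {h["seq"] for h in parse_trnascan(open("tRNAs.tab"))}, corresponds to the reads the extraction step pulls out.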
19
Running out of physical limits
20
For INTREPID and BRAVE people: Perl, http://www.perl.org/
21
Perl is a scripting language widely used for system administration and programming on the World Wide Web. It originated in the UNIX community and has a strong UNIX slant, but usage on Windows has grown rapidly. ActivePerl is a quality-assured binary distribution of Perl for popular UNIX platforms and Windows. perl (small 'p') is the program used to interpret the Perl language.
23
For INTREPID and BRAVE people II: R, http://www.r-project.org/
R is a free software environment for statistical computing and graphics. It compiles and runs on a wide variety of UNIX platforms, Windows and MacOS.
24
Bioconductor: http://www.bioconductor.org/
Thank you again for your attention.