
Pathogen Informatics 26th Nov 2013 Pathogen Sequencing Informatics Jacqui Keane Pathogen Informatics Wellcome Trust Sanger Institute Hinxton, Cambridge,


1 Pathogen Informatics 26th Nov 2013
Pathogen Sequencing Informatics
Jacqui Keane
Pathogen Informatics, Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK
jm15@sanger.ac.uk

2 Pathogen Informatics Team
▸ Support the pathogen variation programme at Sanger
▸ Researchers and visiting scientists
▸ Approx. 140 people
▸ Ad-hoc bioinformatics support and training
▸ Applications and systems to support research activities
▸ Artemis, ACT
▸ Automated pipelines for NGS data processing and analysis

3 Sequence Analysis Pipelines
▸ High throughput analysis pipeline
▸ Fully automated
▸ Standard and stable tools
▸ Standard data formats (FASTA, BAM, VCF)
▸ Established parameter sets
▸ Large amounts of data
▸ Large numbers of samples
▸ Target is large scale experiments
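As a small illustration of why standard formats matter for automation, here is a minimal sketch of reading FASTA records in plain Python. The function name `read_fasta` and the example contig names are hypothetical and are not taken from the Sanger pipelines; a production pipeline would more likely rely on an established parsing library.

```python
from typing import Dict, Iterable, List


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {record name: sequence} mapping.

    Hypothetical helper for illustration only. The record name is the first
    whitespace-delimited token after '>'; sequence lines are concatenated.
    """
    parts: Dict[str, List[str]] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:].split()
            name = header[0] if header else ""
            parts[name] = []
        elif name is not None:
            parts[name].append(line)
    return {record: "".join(chunks) for record, chunks in parts.items()}


# Example input: two made-up contigs, the first wrapped over two lines.
example = """>contig1 example description
ACGT
AC
>contig2
GG
"""

records = read_fasta(example.splitlines())
lengths = {record: len(seq) for record, seq in records.items()}
print(lengths)  # {'contig1': 6, 'contig2': 2}
```

Because every stage reads and writes the same simple formats, a step like this can be reused unchanged across many samples.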

4 Cumulative TBp Sequenced (Virus) 7

5 Cumulative Samples Sequenced (Virus) 4.7K

6 Sequence Analysis Pipelines
Columns: Pipeline, 2010, 2011, 2012, 2013
Tracking/Import/QC ✔
Mapping ✔
RNA-Seq Analysis ✔
TraDIS Analysis ✔✔
SNP/Indel Calling ✔
Automated Assembly ✔
Sequence Typing ✔
External Tracking ✔
Virus Assembly ✔
Single Cell Assembly ✔
PacBio Assembly ✔
Prokaryote Annotation ✔
Eukaryote Annotation (InterPro) ✔
Pan-genome Construction ✔

7 Assembly Pipeline

8 Assembly Pipeline

9 Assembly Pipeline
▸ Generic pipeline management system
▸ Optimal use of computing cluster resources (memory, queues)
▸ Automatic retry of failed jobs
▸ Error reporting
▸ Automatically assembles data as it comes off the sequencers
▸ Automatically manages storage requirements
▸ Tools available for researchers to easily find data
▸ Pipeline stages are available as standalone tools
▸ All software freely available
▸ Perl CPAN
▸ GitHub: https://github.com/sanger-pathogens
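The slide does not say how the pipeline manager retries failed jobs, so the following is only a sketch of one common approach: rerun a failed job with a larger memory request and keep the error messages for reporting. All names here (`JobFailed`, `RetryReport`, `run_with_retry`, `demo_job`) and the doubling policy are assumptions made for illustration, not the Sanger implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List


class JobFailed(Exception):
    """Hypothetical error raised by a job that did not complete."""


@dataclass
class RetryReport:
    """Outcome of a job after all attempts, kept for error reporting."""
    succeeded: bool
    attempts: int
    errors: List[str] = field(default_factory=list)


def run_with_retry(job: Callable[[int], None], memory_mb: int = 1000,
                   max_attempts: int = 3, memory_factor: int = 2) -> RetryReport:
    """Run `job`, retrying on failure with a larger memory request each time.

    Assumed policy: multiply the memory request by `memory_factor` after
    every failure and give up after `max_attempts` attempts.
    """
    report = RetryReport(succeeded=False, attempts=0)
    for _ in range(max_attempts):
        report.attempts += 1
        try:
            job(memory_mb)
            report.succeeded = True
            return report
        except JobFailed as exc:
            report.errors.append(
                f"attempt {report.attempts} ({memory_mb} MB): {exc}")
            memory_mb *= memory_factor
    return report


def demo_job(memory_mb: int) -> None:
    """Stand-in for an assembly job that needs at least 4000 MB."""
    if memory_mb < 4000:
        raise JobFailed("out of memory")


report = run_with_retry(demo_job)
print(report.succeeded, report.attempts)  # True 3
for message in report.errors:
    print(message)
```

In this sketch the first two attempts (1000 MB and 2000 MB) fail and the third (4000 MB) succeeds; the collected messages are what an error report could be built from.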

10 Assembly Pipeline
▸ Assembled 2.6K samples across 6 viruses

Virus               Number of Samples
Betacoronavirus     169
Human herpesvirus   83
HIV                 302
Influenza           1453
Norovirus           237
RSV                 385
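The per-virus counts on this slide can be checked against the "2.6K samples" headline with a short calculation. The dictionary below simply copies the figures from the slide; the variable names are illustrative.

```python
# Sample counts per virus, copied from the slide.
samples = {
    "Betacoronavirus": 169,
    "Human herpesvirus": 83,
    "HIV": 302,
    "Influenza": 1453,
    "Norovirus": 237,
    "RSV": 385,
}

total = sum(samples.values())
largest = max(samples, key=samples.get)

print(total)    # 2629, i.e. roughly 2.6K
print(largest)  # Influenza
```

The six counts sum to 2629, which matches the rounded figure of 2.6K, with Influenza contributing more than half of the assembled samples.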

11 Future Work
▸ Evaluate and improve assembly pipeline
▸ Human herpesvirus, HIV, Influenza (PB1)
▸ Evaluate and add a stage to order and orientate assembly
▸ Existing tools
▸ Auto-detect appropriate reference
▸ Automatic annotation pipeline
▸ Adapt pipelines to run externally

