Pathogen Informatics, 26th Nov 2013
Pathogen Sequencing Informatics
Jacqui Keane, Pathogen Informatics
Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK

Pathogen Informatics Team
▸ Support the pathogen variation programme at Sanger
▸ Researchers and visiting scientists
▸ Approx. 140 people
▸ Ad-hoc bioinformatics support and training
▸ Applications and systems to support research activities
▸ Artemis, ACT
▸ Automated pipelines for NGS data processing and analysis

Sequence Analysis Pipelines
▸ High-throughput analysis pipeline
▸ Fully automated
▸ Standard and stable tools
▸ Standard data formats (FASTA, BAM, VCF)
▸ Established parameter sets
▸ Large amounts of data
▸ Large numbers of samples
▸ Target is large-scale experiments
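
As an illustration of why standard data formats keep pipeline stages interchangeable, here is a minimal sketch of reading FASTA, the simplest of the three formats listed above. The function name `parse_fasta` and the example records are hypothetical and are not taken from the Sanger pipelines; production code would normally use an established library rather than a hand-written parser.

```python
from typing import Dict, Iterable, List


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return {record name: sequence} from FASTA-formatted lines.

    A header line starts with '>'; the record name is the first
    whitespace-separated token after it. Sequence lines that follow
    are concatenated until the next header.
    """
    parts: Dict[str, List[str]] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            tokens = line[1:].split()
            name = tokens[0] if tokens else ""
            parts[name] = []
        elif name is not None:
            parts[name].append(line)
    return {n: "".join(chunks) for n, chunks in parts.items()}


# Hypothetical two-record example (not real data).
example = [">contig1 sample=demo", "ACGT", "GGCC", "", ">contig2", "TTAA"]
records = parse_fasta(example)
```

Because every stage reads and writes the same formats, a stage can be swapped or run standalone without changing its neighbours.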

Cumulative TBp Sequenced (Virus): 7

Cumulative Samples Sequenced (Virus): 4.7K

Sequence Analysis Pipelines

Pipeline
Tracking/Import/QC               ✔
Mapping                          ✔
RNA-Seq Analysis                 ✔
TraDIS Analysis                  ✔✔
SNP/Indel Calling                ✔
Automated Assembly               ✔
Sequence Typing                  ✔
External Tracking                ✔
Virus Assembly                   ✔
Single Cell Assembly             ✔
PacBio Assembly                  ✔
Prokaryote Annotation            ✔
Eukaryote Annotation (InterPro)  ✔
Pan-genome Construction          ✔

Assembly Pipeline
▸ Generic pipeline management system
▸ Optimal use of computing cluster resources (memory, queues)
▸ Automatic retry of failed jobs
▸ Error reporting
▸ Automatically assembles data as it comes off the sequencers
▸ Automatically manages storage requirements
▸ Tools available for researchers to easily find data
▸ Pipeline stages are available as standalone tools
▸ All software freely available: Perl CPAN, GitHub
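
The "automatic retry of failed jobs" and "error reporting" points above can be sketched roughly as follows. This is a minimal illustration, not the pipeline's actual implementation: the names `run_with_retry`, `JobFailed` and `demo_assembly_job`, the memory-doubling policy, and the 3000 MB threshold are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List


class JobFailed(Exception):
    """Raised by a job that cannot finish with the resources it was given."""


@dataclass
class RetryResult:
    succeeded: bool
    attempts: int
    memory_mb: int  # memory limit used on the final attempt
    errors: List[str] = field(default_factory=list)


def run_with_retry(job: Callable[[int], None],
                   memory_mb: int = 1000,
                   max_attempts: int = 3,
                   memory_factor: int = 2) -> RetryResult:
    """Run job(memory_mb); on failure, retry with a larger memory limit.

    Every failure is recorded so it can be reported afterwards.
    """
    errors: List[str] = []
    for attempt in range(1, max_attempts + 1):
        try:
            job(memory_mb)
            return RetryResult(True, attempt, memory_mb, errors)
        except JobFailed as exc:
            errors.append(f"attempt {attempt} ({memory_mb} MB): {exc}")
            if attempt < max_attempts:
                memory_mb *= memory_factor  # assumed escalation policy
    return RetryResult(False, max_attempts, memory_mb, errors)


def demo_assembly_job(memory_mb: int) -> None:
    """Hypothetical job that needs at least 3000 MB to succeed."""
    if memory_mb < 3000:
        raise JobFailed("out of memory")


result = run_with_retry(demo_assembly_job)
for line in result.errors:
    print(line)
```

Escalating the memory request only after a failure is one way to keep the initial resource requests small; a real cluster scheduler would also need queue selection, which is left out here.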

Assembly Pipeline
▸ Assembled 2.6K samples across 6 viruses

Virus                Number of Samples
Betacoronavirus      169
Human herpesvirus    83
HIV                  302
Influenza            1453
Norovirus            237
RSV                  385
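
As a quick arithmetic check, the per-virus counts on this slide can be summed to confirm the headline figure of 2.6K samples. The variable and function names below are chosen for the example only.

```python
# Per-virus sample counts as listed on the slide.
samples_by_virus = {
    "Betacoronavirus": 169,
    "Human herpesvirus": 83,
    "HIV": 302,
    "Influenza": 1453,
    "Norovirus": 237,
    "RSV": 385,
}

total_samples = sum(samples_by_virus.values())


def share(virus: str) -> float:
    """Fraction of all assembled samples that belong to one virus."""
    return samples_by_virus[virus] / total_samples


print(f"{len(samples_by_virus)} viruses, {total_samples} samples")
print(f"Influenza share: {share('Influenza'):.1%}")
```

The total is 2629, which rounds to the 2.6K quoted, and Influenza accounts for a little over half of the assembled samples.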

Future Work
▸ Evaluate and improve assembly pipeline
▸ Human herpesvirus, HIV, Influenza (PB1)
▸ Evaluate and add a stage to order and orientate assembly
▸ Existing tools
▸ Auto-detect appropriate reference
▸ Automatic annotation pipeline
▸ Adapt pipelines to run externally