Pathogen Sequencing Informatics, Jacqui Keane, Pathogen Informatics, Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK (26th Nov 2013)

Presentation transcript:

Pathogen Informatics, 26th Nov 2013
Pathogen Sequencing Informatics
Jacqui Keane
Pathogen Informatics, Wellcome Trust Sanger Institute
Hinxton, Cambridge, UK

Pathogen Informatics Team
▸ Support the pathogen variation programme at Sanger
▸ Researchers and visiting scientists
▸ Approx. 140 people
▸ Ad-hoc bioinformatics support and training
▸ Applications and systems to support research activities
▸ Artemis, ACT
▸ Automated pipelines for NGS data processing and analysis

Sequence Analysis Pipelines
▸ High throughput analysis pipeline
▸ Fully automated
▸ Standard and stable tools
▸ Standard data formats (FASTA, BAM, VCF)
▸ Established parameter sets
▸ Large amounts of data
▸ Large numbers of samples
▸ Target is large scale experiments

Cumulative TBp Sequenced (Virus): 7

Cumulative Samples Sequenced (Virus): 4.7K

Sequence Analysis Pipelines

Pipeline
Tracking/Import/QC               ✔
Mapping                          ✔
RNA-Seq Analysis                 ✔
TraDIS Analysis                  ✔✔
SNP/Indel Calling                ✔
Automated Assembly               ✔
Sequence Typing                  ✔
External Tracking                ✔
Virus Assembly                   ✔
Single Cell Assembly             ✔
PacBio Assembly                  ✔
Prokaryote Annotation            ✔
Eukaryote Annotation (InterPro)  ✔
Pan-genome Construction          ✔

Assembly Pipeline

Assembly Pipeline
▸ Generic pipeline management system
▸ Optimal use of computing cluster resources (memory, queues)
▸ Automatic retry of failed jobs
▸ Error reporting
▸ Automatically assembles data as it comes off the sequencers
▸ Automatically manages storage requirements
▸ Tools available for researchers to easily find data
▸ Pipeline stages are available as standalone tools
▸ All software freely available
▸ Perl CPAN
▸ GitHub :
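
The "automatic retry of failed jobs" and "error reporting" points can be illustrated with a short sketch. This is not the Sanger pipeline code (which the slide says is distributed as Perl); it is a hypothetical Python illustration. The names `run_with_retry` and `example_assembly_job`, the memory figures, and the policy of doubling the memory request on each retry are all assumptions made for the example.

```python
from typing import Callable, List, Tuple


def run_with_retry(
    job: Callable[[int], str],
    memory_mb: int = 1000,
    max_attempts: int = 3,
    memory_factor: int = 2,
) -> Tuple[str, List[str]]:
    """Run a job, retrying on failure with a larger memory request.

    Returns the job result and the list of error messages collected
    from failed attempts (the "error report").
    """
    errors: List[str] = []
    for attempt in range(1, max_attempts + 1):
        try:
            return job(memory_mb), errors
        except Exception as exc:  # record the failure, then retry
            errors.append(f"attempt {attempt} ({memory_mb} MB): {exc}")
            memory_mb *= memory_factor  # assumed policy: ask for more memory
    raise RuntimeError(
        f"job failed after {max_attempts} attempts:\n" + "\n".join(errors)
    )


def example_assembly_job(memory_mb: int) -> str:
    """Hypothetical job that only succeeds with at least 3000 MB."""
    if memory_mb < 3000:
        raise MemoryError("needs more memory")
    return "assembly finished"


result, errors = run_with_retry(example_assembly_job)
print(result)
for line in errors:
    print(line)
```

Here the hypothetical job fails at 1000 MB and 2000 MB and succeeds at 4000 MB, so the caller gets the result together with a record of the two failed attempts. A real cluster scheduler would submit to a queue rather than call a function directly.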

Assembly Pipeline
▸ Assembled 2.6K samples across 6 viruses

Virus               Number of Samples
Betacoronavirus     169
Human herpesvirus   83
HIV                 302
Influenza           1453
Norovirus           237
RSV                 385
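
As a quick check of the "2.6K samples" headline, the per-virus counts from the slide can be summed directly. The variable names below are illustrative only; the numbers are taken from the slide.

```python
# Per-virus sample counts as listed on the slide.
samples_per_virus = {
    "Betacoronavirus": 169,
    "Human herpesvirus": 83,
    "HIV": 302,
    "Influenza": 1453,
    "Norovirus": 237,
    "RSV": 385,
}

total = sum(samples_per_virus.values())
largest = max(samples_per_virus, key=samples_per_virus.get)
share = samples_per_virus[largest] / total

print(f"total samples: {total}")
print(f"largest group: {largest} ({share:.0%} of all assembled samples)")
```

The counts add up to 2629, which matches the rounded 2.6K figure, and Influenza accounts for just over half of the assembled samples.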

Future Work
▸ Evaluate and improve assembly pipeline
▸ Human herpesvirus, HIV, Influenza (PB1)
▸ Evaluate and add a stage to order and orientate assembly
▸ Existing tools
▸ Auto-detect appropriate reference
▸ Automatic annotation pipeline
▸ Adapt pipelines to run externally