Published by Audra Barnett. Modified over 9 years ago.
1
Robust Software Tools for Variant Identification and Functional Assessment (Boston College & University of Michigan) Gabor Marth, Goncalo Abecasis, PIs
2
Informatics challenges for genomic analysis
Tool building
Facilitating analysis
Widening accessibility
3
Intentions of the RFA
4
Our approach
Complete toolbox including variant interpretation
Full pipelines for start-to-finish analysis
Easily accessible and well documented methods
Cloud deployment (in addition to single machine/local compute cluster)
Open development model
5
Progress in first 6 months
Starting with two sets of tools and pipelines, geared toward high-quality local analysis and battle-tested on 1000GP data and in medical sequencing projects
The two groups follow a “divide and conquer” strategy to put critical pieces in place for making our algorithms available to the wider genomics community
Boston College
– A universal tool/pipeline launcher application
– Infrastructure for dissemination
– Cloud access via Galaxy
University of Michigan
– Integration of variant annotation/impact assessment
– Pipeline/workflow control infrastructure
– Adaptation for Amazon Cloud Services
6
FUNCTIONALITY & TOOLS
7
Scope
8
Include latest versions
Tools constantly evolving (as they must to remain relevant)
Our community toolbox to be updated with new tools as they become available

ref: TATAGAGAGAGAGAGAGAGCGAGAGAGAGAGAGAGAGGGAGAGACGGAGTT
alt: TATAGAGAGAGAGAGAGCGAGAGAGAGAGAGAGAGAGGGAGAGACGGAGTT

ref: TATAGAGAGAGAGAGAGAGC--GAGAGAGAGAGAGAGAGGGAGAGACGGAGTT
alt: TATAGAGAGAGAGAGAG--CGAGAGAGAGAGAGAGAGAGGGAGAGACGGAGTT

New algorithms for complex variant detection (FreeBayes)
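The gapped ref/alt pair on this slide can be read as a deletion and an insertion sitting on either side of a shared base. As a minimal sketch, the Python below lists the runs of columns where two aligned sequences differ. The sequences are a shortened toy version of the slide's example (an assumption made for readability), and `differing_blocks` is a hypothetical helper, not part of FreeBayes.

```python
# Shortened toy alignment modeled on the slide's example; '-' marks a gap.
REF_ALN = "TAGAGC--GAGGT"
ALT_ALN = "TAG--CGAGAGGT"


def differing_blocks(ref_aln, alt_aln):
    """Return (start_column, ref_chunk, alt_chunk) for each run of differing columns."""
    if len(ref_aln) != len(alt_aln):
        raise ValueError("aligned sequences must have equal length")
    blocks = []
    start = None
    for i, (r, a) in enumerate(zip(ref_aln, alt_aln)):
        if r != a:
            if start is None:
                start = i  # a new run of differing columns begins here
        elif start is not None:
            blocks.append((start, ref_aln[start:i], alt_aln[start:i]))
            start = None
    if start is not None:
        # the alignment ended while still inside a differing run
        blocks.append((start, ref_aln[start:], alt_aln[start:]))
    return blocks


def ungap(aligned):
    """Strip gap characters to recover the plain sequence."""
    return aligned.replace("-", "")
```

On the toy pair, `differing_blocks(REF_ALN, ALT_ALN)` reports two blocks: `AG` against a gap (a deletion) and a gap against `GA` (an insertion), separated by the matching `C`. Both ungapped sequences have the same length, which is why a caller that only looks for simple indels can miss this kind of event.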
9
Include tools when ready for prime time

MEI type  Sample    RetroSeq            Tangram             Tea
                    Total  Sensitivity  Total  Sensitivity  Total  Sensitivity
ALU       NA12891   719    89%          1192   98%          1127   92%
          NA12892   687    86%          1185   98%          1078   92%
          NA12878   793    82%          1326   99%          1038   89%
L1        NA12891   52     78%          190    81%          286    81%

The BC mobile element insertion caller performs best in its class
10
EPACTS variant interpretation tools (Efficient and Parallelizable Association Container Toolbox)
Genetic analysis tool based on VCF
o Fast and parallelizable access to large VCF files
o Built-in widely used single variant and burden tests
o R/C++ interface for extending to newer tests
o Binary & quantitative phenotypes with covariates
o Useful visualization tools of association results
Automated visualization
11
PIPELINES & WORKFLOW
12
The UM pipeline
BAM (one per sample) -> samtools -> Genotype Likelihood
Genotype Likelihood -> glfMultiples -> Unfiltered VCF
Unfiltered VCF -> vcfCooker -> Hard-filtered VCF
Hard-filtered VCF -> SVM -> Filtered VCF
Filtered VCF -> Beagle/Thunder (optional LD-aware step) -> Filtered/Phased VCF
Filtered/Phased VCF -> EPACTS
13
UMAKE workflow system
Makefile-based approach
– The Make utility is very good at representing dependencies
– Picks up where it left off after a failure
Flexible deployment
– Local machine
– Local cluster (Mosix)
– Amazon Web Services Elastic Compute Cloud (EC2)
Default options
– User configurable
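The "pick up where left off" behaviour of a Makefile-based workflow can be illustrated with a small Python sketch. This is not UMAKE's code: `run_pipeline`, `make_writer`, and the file names used with them are hypothetical, and the sketch only checks whether an output file exists, whereas Make also compares file timestamps.

```python
import os


def make_writer(name):
    """Return a toy action that creates the file `name` in the working directory."""
    def action(workdir):
        with open(os.path.join(workdir, name), "w") as handle:
            handle.write(name + "\n")
    return action


def run_pipeline(steps, workdir):
    """Run steps in order, skipping any step whose output file already exists.

    `steps` is a list of (output_name, input_names, action) tuples, ordered so
    that every input is produced by an earlier step. Returns the outputs that
    were actually built during this call.
    """
    executed = []
    for output, inputs, action in steps:
        target = os.path.join(workdir, output)
        if os.path.exists(target):
            continue  # built by an earlier run, so resume past it
        missing = [name for name in inputs
                   if not os.path.exists(os.path.join(workdir, name))]
        if missing:
            raise FileNotFoundError(f"{output} is missing inputs: {missing}")
        action(workdir)  # an exception here stops the run; earlier outputs remain
        executed.append(output)
    return executed
```

If a step raises, the outputs of the earlier steps stay on disk, so calling `run_pipeline` again with the same working directory rebuilds only what is still missing.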
14
Application of UMAKE to large-scale projects

Project   Depth / Region  N       #SNPs  %dbSNP (129)  Known Ts/Tv  Novel Ts/Tv
1000G     4x Genome       1,092   34.5M  24.4          2.14         2.16
1000G     >40x Exome      822     598K   22.1          2.96         2.80
GoT2D     4x Genome       ~2,800  26.7M  25.5          2.16         2.19
ESP       >80x Exome      ~6,900  1.92M  8.6           2.94         2.83
Sardinia  3x Genome       2,120   17.6M  38.4          2.15         2.22
Bipolar   10x Genome

Computational cost is ~1 week / 1000 samples in a 5 node mini-cluster
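The Ts/Tv columns above report the transition/transversion ratio, a standard quality indicator for SNP call sets. A transition swaps a purine for a purine (A<->G) or a pyrimidine for a pyrimidine (C<->T); every other substitution is a transversion. The Python sketch below shows the calculation on a list of (ref, alt) pairs; the function names are illustrative and are not taken from UMAKE.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """Return True for A<->G or C<->T substitutions, False for transversions."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    # Same chemical class on both sides means a transition.
    return (ref in PURINES) == (alt in PURINES)


def ts_tv_ratio(snps):
    """Compute transitions divided by transversions for (ref, alt) pairs."""
    snps = list(snps)
    transitions = sum(1 for ref, alt in snps if is_transition(ref, alt))
    transversions = len(snps) - transitions
    if transversions == 0:
        raise ValueError("no transversions: ratio is undefined")
    return transitions / transversions
```

For example, `ts_tv_ratio([("A", "G"), ("C", "T"), ("G", "A"), ("A", "C")])` counts three transitions and one transversion.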
15
ACCESSIBILITY
16
The Boston College tool hub http://gkno.me (genome)
17
Simplified installation & use
Unified launcher application (gkno)
– single tools (e.g. Mosaik)
– tool “macros” (e.g. map)
– pipelines (e.g. exome variant calling)
Download and installation
– All tools pulled in a single step from github
– All tools installed
– All tools tested
19
Easily configurable pipeline system
Part of our new unified launcher system (gkno)
Pipeline types (e.g. mapping, variant calling) and instances (exome, whole-genome)
User-configurable: tools can be swapped in and out, parameters configured via config files
20
Support
Documentation
Tutorials / Blog
User forum
Bug reports
21
DEPLOYMENT / CLOUD
22
Software deployment
All software is ready for running locally on a single machine
UMAKE adds cluster support
Cloud deployment
– Simple Michigan pipelines ported to Amazon
– Porting of all project software is under way
23
Cloud-based analysis – Galaxy
24
OPEN & COLLABORATIVE DEVELOPMENT MODEL
25
Integration
Our workflows leverage 3rd-party tools for specific functionality
All our tools are open-source, available on github (many clones, community contributed code)
Ensemble approach (multiple tools for critical tasks)
26
Ensemble approach
Multiple tools usually benefit analysis

                                 Ts/Tv
Called in     # SNPs   %dbSNP    Novel  Known  Total
Union         907,170  22.09     2.22   2.30   2.24
2 of 5        766,608  25.33     2.38   2.33   2.37
3 of 5        696,358  27.05     2.44   2.36   2.42
4 of 5        601,132  29.62     2.49   2.40   2.46
Intersection  520,083  32.20     2.53   2.42   2.49
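The rows of this table can be read as a voting scheme over five call sets: the union keeps a variant called by any tool, the intersection keeps only variants called by all of them, and "k of 5" sits in between. The sketch below assumes "k of 5" means "called by at least k tools", which matches the steadily shrinking SNP counts but is not stated on the slide. `called_in_at_least` is a hypothetical helper, and variants are represented as plain (chrom, pos, ref, alt) tuples.

```python
from collections import Counter


def called_in_at_least(callsets, k):
    """Return the variants present in at least `k` of the given call sets.

    k == 1 gives the union; k == len(callsets) gives the intersection.
    """
    if not 1 <= k <= len(callsets):
        raise ValueError("k must be between 1 and the number of call sets")
    votes = Counter()
    for calls in callsets:
        votes.update(set(calls))  # each caller votes at most once per variant
    return {variant for variant, count in votes.items() if count >= k}
```

Raising `k` trades sensitivity for specificity, in line with the table: fewer SNPs survive, while %dbSNP and Ts/Tv rise.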
27
Ensemble approach
Our pipelines will use multiple aligners (BWA, Mosaik) and variant callers (FreeBayes, glfMultiples), developed by BC/UM
28
In progress
Expanding pipelines to integrate all tools
Michigan tools -> gkno
BC tools -> Michigan cloud ready pipelines
Large data set analysis on the cloud
Integrate variant interpretation tools
Integrate SV tools as they become more robust
Integrate consensus analysis (SVM and MLP approaches to callset aggregation)
Minimal, functional pipeline -> Galaxy
29
Team
Boston College: Alistair Ward, Derek Barnett, Chase Miller, Wan-Ping Lee, Erik Garrison, Gabor Marth
University of Michigan: Mary-Kate Trost, Tom Blackwell, Hyun-Min Kang, Youna Hu, Adrian Tan, Xiaowei Zhan, Dajiang Liu, Goncalo Abecasis