Published by Kristen Selwyn. Modified over 9 years ago.
1
Managing Next Generation Sequencing and Multiplexed Genotyping Data Using Open Source LabKey Server
Adam Rauch, adam@labkey.com
© 2010 LabKey Software, www.labkey.com
2
LabKey Software Company Overview
LabKey Software is a consulting company
  - Spun off from the McIntosh Lab (part owned by FHCRC)
  - Professional software engineers from Amazon, Microsoft, BEA, etc.
Work in partnership with scientists
  - For-profit fee-for-service contracts
  - Non-profit grant sub-awards
      - Co-investigators with a shared research agenda
  - All development approved by and relevant to FHCRC
Development & support around LabKey Server
  - Extending the base LabKey Server platform
  - Creating customized lab-specific solutions
  - Hosting LabKey Server
  - Support
3
What Is LabKey Server?
An open-source, web-based platform for organizing, analyzing & sharing scientific data
  - Data integration and analysis for assays: proteomics, flow cytometry, plate-based assays, etc.
  - Study data management: combines demographic, clinical, assay & specimen data
LabKey Server powers many deployments…
  - CPAS: FHCRC proteomics repository
  - Atlas Science Portal: SCHARP’s HIV vaccine studies
  - AdaptiveTCR: Customer analytics for ImmunoSEQ NGS
  - UW (Katze, Heinecke, et al.), USC, Markey, Harvard, IDRI, TGen, Wisconsin Primate EHR, UC Denver, etc.
4
Dave O’Connor Lab, University of Wisconsin
Academic research lab
Focus: understanding SIV using nonhuman primate models & applying NHP methods to human HIV disease research
5
O’Connor Lab SIV/HIV Research
[Figure: Host Immune Genetics. Source: modified from Yewdell et al., Nature Reviews Immunology 2003]
[Figure: Virus Genetics. Source: Korber et al., British Medical Bulletin 2001]
6
Importance of MHC Class I
Host Immune Genetics
  - MHC class I molecules dictate immunity to disease
  - High degree of polymorphism within the MHC class I peptide-binding domain
  - Specific MHC alleles associated with superior control of HIV infection
Source: modified from Yewdell et al., Nature Reviews Immunology 2003
7
Importance of Viral Variability
Virus Genetics
  - HIV has fast replication cycle, high mutation rate
  - Evolution of the virus causes escape from immune responses
  - Specific mutations are associated with resistance to antiretroviral drug therapy
Source: Korber et al., British Medical Bulletin 2001
8
Sequencing in the O’Connor Lab
2005 – 2009: Sanger sequencing, “prohibitively expensive” for most experiments
2009: Roche/454 GS FLX at UIUC
2010: Roche/454 GS Junior in lab
Roche/454 GS Junior
  - Long-read instrument, critical for genotyping
  - Identical to GS FLX, but 1/8 throughput & lower cost
  - ~100,000 reads per run (~1¢ per read), average ~560 bp read length
  - 115 runs this year
MID tagging
  - Allows pooling multiple samples (30-100) into a single run
Galaxy server
  - Open-source sequence analysis tool (Giardine et al., Genome Res 2005)
  - Lab has built custom workflow to match sequences to known MHC alleles
  - Uses BLAT, transitioning to AGILE (Northwestern alignment tool)
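The idea behind MID tagging can be illustrated with a minimal Python sketch: each pooled read starts with a short multiplex identifier, and the reads are sorted back into per-sample groups by that prefix. The tag sequences, the tag length, the sample names, and the `demultiplex` function are all made up for illustration; they are not the Roche MID set or the lab's actual tooling, and real pipelines usually tolerate sequencing errors in the tag, which this sketch does not.

```python
from collections import defaultdict

# Hypothetical 10-base tags and sample names, for illustration only.
MID_TO_SAMPLE = {
    "AAAACCCCGG": "animal_A",
    "TTTTGGGGCC": "animal_B",
}
MID_LENGTH = 10


def demultiplex(reads):
    """Group reads by their leading MID tag.

    Returns (reads per sample with the tag removed, reads with an unknown tag).
    Only exact tag matches are accepted.
    """
    by_sample = defaultdict(list)
    unassigned = []
    for read in reads:
        tag, insert = read[:MID_LENGTH], read[MID_LENGTH:]
        sample = MID_TO_SAMPLE.get(tag)
        if sample is None:
            unassigned.append(read)
        else:
            by_sample[sample].append(insert)
    return dict(by_sample), unassigned


# A tiny pooled "run": three tagged reads and one read with an unknown tag.
pooled = [
    "AAAACCCCGG" + "GGCTCTCACTCC",
    "TTTTGGGGCC" + "GGCTCCCACTCC",
    "AAAACCCCGG" + "TTGACCGAGACC",
    "NNNNNNNNNN" + "GGCTCTCACTCC",
]
by_sample, unassigned = demultiplex(pooled)
print(by_sample)
print(unassigned)
```

Keeping the unassigned reads rather than discarding them makes it possible to report how much of a run could not be attributed to any sample.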
9
Roche/454 MHC Workflow
  - Total RNA isolation and cDNA synthesis – RNA isolation ~4 hrs; cDNA synthesis ~2 hrs
  - Primary PCR amplification – plus SPRI purification, quantification, pooling ~3 hrs
  - emPCR – set-up ~1 hr, run ~5.5 hrs
  - Breaking and enrichment – ~3 hrs
  - Roche/454 GS Junior run – set-up ~1.5 hrs; run time ~10 hrs
  - Data processing and analysis – run processing ~2 hrs; analysis time varies
www.454.com
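Adding up the approximate step times listed on this slide gives a rough end-to-end figure. The short Python sketch below does that arithmetic; the step labels are paraphrased from the slide, the variable names are illustrative, and the open-ended analysis time is left out because the slide gives no number for it.

```python
# Approximate step durations in hours, copied from the workflow slide.
# "Analysis time varies" is excluded because no figure is given.
STEPS = [
    ("RNA isolation", 4.0),
    ("cDNA synthesis", 2.0),
    ("Primary PCR, SPRI purification, quantification, pooling", 3.0),
    ("emPCR set-up", 1.0),
    ("emPCR run", 5.5),
    ("Breaking and enrichment", 3.0),
    ("GS Junior set-up", 1.5),
    ("GS Junior run", 10.0),
    ("Run processing", 2.0),
]

total_hours = sum(hours for _, hours in STEPS)
longest_step = max(STEPS, key=lambda step: step[1])

print(f"Total before analysis: ~{total_hours:.1f} hours")
print(f"Longest single step: {longest_step[0]} (~{longest_step[1]:.1f} hours)")
```

On these figures the workflow takes roughly 32 hours from RNA isolation to processed run, with the instrument run itself the longest single step.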
10
PROBLEM: DATA MANAGEMENT!
There is a real disconnect between the ability to collect next-generation sequence data (easy) and the ability to analyze it meaningfully (hard)
Dave O’Connor
11
Problem: Data Management
As volume has increased, the lab has found it difficult to manage all of its sequencing data & metadata:
  - Run metadata
  - Run metrics
  - Sequencing reads and quality scores
  - Sample information and multiplex identifiers (MIDs)
  - Reference sequences for genotyping experiments
  - Genotyping matches
O’Connor asked LabKey to build a system that can:
  - Store sequencing and genotyping data in a single database that links all the tables, allowing arbitrary queries and reports
  - Provide tools for analysis, querying, visualization and export
  - Automate data workflows for efficiency & consistency
  - Eventually, link sequencing results to their primate EHR system
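To make "a single database that links all the tables" concrete, here is a minimal Python sketch using the standard-library `sqlite3` module. Every table name, column name, and data value is hypothetical and chosen only to mirror the data types listed above; this is not the LabKey Server schema, which the transcript does not show.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical tables mirroring the data types listed on the slide.
conn.executescript("""
CREATE TABLE runs (run_id INTEGER PRIMARY KEY, instrument TEXT);
CREATE TABLE samples (sample_id INTEGER PRIMARY KEY, animal TEXT, mid TEXT);
CREATE TABLE reads (
    read_id INTEGER PRIMARY KEY,
    run_id INTEGER REFERENCES runs(run_id),
    sample_id INTEGER REFERENCES samples(sample_id),
    sequence TEXT,
    quality TEXT
);
CREATE TABLE reference_sequences (
    ref_id INTEGER PRIMARY KEY, allele_name TEXT, sequence TEXT
);
CREATE TABLE matches (
    read_id INTEGER REFERENCES reads(read_id),
    ref_id INTEGER REFERENCES reference_sequences(ref_id)
);
""")

# Made-up rows, just enough to exercise the joins.
conn.execute("INSERT INTO runs VALUES (1, 'GS Junior')")
conn.executemany("INSERT INTO samples VALUES (?, ?, ?)",
                 [(1, "animal_A", "MID1"), (2, "animal_B", "MID2")])
conn.executemany("INSERT INTO reads VALUES (?, ?, ?, ?, ?)",
                 [(1, 1, 1, "ACGT", "IIII"),
                  (2, 1, 1, "ACGA", "IIII"),
                  (3, 1, 2, "ACGT", "IIII")])
conn.executemany("INSERT INTO reference_sequences VALUES (?, ?, ?)",
                 [(1, "allele_X", "ACGT"), (2, "allele_Y", "ACGA")])
conn.executemany("INSERT INTO matches VALUES (?, ?)",
                 [(1, 1), (2, 2), (3, 1)])

# One example of an ad hoc query across the linked tables:
# how many reads from each animal matched each reference allele?
rows = conn.execute("""
    SELECT s.animal, r.allele_name, COUNT(*) AS n_reads
    FROM matches m
    JOIN reads rd ON rd.read_id = m.read_id
    JOIN samples s ON s.sample_id = rd.sample_id
    JOIN reference_sequences r ON r.ref_id = m.ref_id
    GROUP BY s.animal, r.allele_name
    ORDER BY s.animal, r.allele_name
""").fetchall()

for row in rows:
    print(row)
```

Because reads carry both a run and a sample key, the same joins can be regrouped by run, by MID, or by allele without changing how the data is stored.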
12
LabKey Sequencing System
[Diagram. Labels: Reads; Quality Scores; Metrics; Sample Information; Reference Sequences; Sequencing and Genotyping Database; Galaxy Genotyping Workflow; External Tools; Analysis; Reporting; Export; Visualization]
13
Database Schema
14
Demo
15
Possible Future Directions
  - Respond to O’Connor lab’s near-term needs
  - Genomics-specific analytics
  - Additional export formats
  - Tighter integration with Galaxy
  - Support for amplicon-designated reads
  - Match combining
  - Simplify configuration and operation
  - Integrate with Wisconsin primate EHR
  - Better integration with R / Bioconductor
  - Visualization
  - Other sequencing platforms: Illumina, PacBio…
16
Acknowledgements
O’Connor Laboratory: David O’Connor, Simon Lank, Julie Karl, Benjamin Bimber
LabKey Software: Mark Igra, Brian Connolly, Elizabeth Nelson, Josh Eckels, Matthew Bellew, et al.
17
Questions?