Published by Octavia Golden. Modified over 9 years ago.
1
myCoGe: Comparing our genomes
2
Background and Introduction
Decreases in sequencing costs, coupled with increases in speed, have paved the way for “Personal Genomics”.
Companies now providing sequencing include:
  23andMe ($99)
  AncestryDNA ($99)
  CompleteGenomics ($5000)
  Counsyl ($1000)
  Ubiome ($89-$400)
  Genelex
  …and more!
3
Background and Introduction
This huge set of data holds a lot of promise for researchers.
600k of 23andMe’s 800k customers have consented to the use of their data for research.
Multiple sources now provide a means for individuals to share their genetics and health histories with researchers, e.g. Personal Genome Project, OpenHuman.
Unfortunately, data from different sources cannot be directly compared.
4
What is myCoGe?
Goal of myCoGe Data Integration Pipeline
  Provide a mechanism for automated retrieval of publicly available genomic experiment datasets for import into CoGe.
  Provide the necessary tools for converting raw experiment files to formats accepted by CoGe.
  Provide tools for converting experiments to use the same reference genome.
Ultimate Goal of myCoGe
  Provide a powerful framework of tools and datasets for analyzing how variation affects function in human genomes.
  Provide a useful toolbox for individuals to investigate their own personal genetic data.
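As a rough illustration of the "converting raw experiment files" goal, here is a minimal sketch of reading a consumer genotype file into uniform records. It assumes the tab-separated layout used by 23andMe raw-data exports (comment lines starting with #, then rsid, chromosome, position, genotype). The names Genotype and parse_raw_genotypes and the sample rsIDs are made up for this example; the slides do not show the pipeline's own converter or name the format CoGe expects.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class Genotype(NamedTuple):
    """One genotyped SNP (hypothetical record type for this sketch)."""
    rsid: str
    chrom: str
    pos: int
    genotype: str


def parse_raw_genotypes(handle: TextIO) -> Iterator[Genotype]:
    """Yield one record per data line, skipping comments and blank lines.

    Assumes four tab-separated columns: rsid, chromosome, position, genotype.
    """
    for line in handle:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 4:
            continue  # malformed line; a real pipeline would probably log it
        rsid, chrom, pos, genotype = fields
        yield Genotype(rsid, chrom, int(pos), genotype)


# Made-up sample data in the assumed layout.
sample = io.StringIO(
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs0000001\t1\t1000\tAA\n"
    "rs0000002\tX\t2000\tCT\n"
)
records = list(parse_raw_genotypes(sample))
```

Once every source is reduced to records like these, writing them out in whatever format the loader accepts becomes a separate, source-independent step.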
5
myCoGe Data Integration Conceptual Pipeline
Review, Download, Convert, Load, Identify
6
Operational File Structure
7
myCoGe Data Integration Full Pipeline
8
Fun Facts
Speed Benchmarks
  Slowest process: loading the 20 GB reference SNP file, ~4 min
  Converting 900,000 SNPs from the reference file: 5-30 seconds
Lines of Code
  Initiate: 123 lines
  myCoGe: 692 lines
  Finalize: 11 lines
  Execute_myCoGe: 3 lines
  SNPScraper: 59 lines
  Total: 888 lines
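The benchmark pattern above (a slow one-time load, then fast conversion of many SNPs) is what one would expect from an index-then-lookup design. The sketch below shows that general pattern only; it is not confirmed to be how the pipeline works. The reference file layout (tab-separated rsid, chromosome, position) and the names build_reference_index and remap are assumptions for illustration, and a real 20 GB file may need an on-disk index rather than an in-memory dict.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Position = Tuple[str, int]  # (chromosome, position) on the target reference


def build_reference_index(lines: Iterable[str]) -> Dict[str, Position]:
    """One-time pass over the reference SNP file: map rsID -> position."""
    index: Dict[str, Position] = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        rsid, chrom, pos = line.rstrip("\r\n").split("\t")[:3]
        index[rsid] = (chrom, int(pos))
    return index


def remap(rsids: Iterable[str], index: Dict[str, Position]) -> List[Optional[Position]]:
    """Look up each rsID; None marks SNPs absent from the reference."""
    return [index.get(rsid) for rsid in rsids]


# Made-up reference lines in the assumed layout.
reference_lines = ["#rsid\tchrom\tpos\n", "rs1\t1\t100\n", "rs2\t2\t250\n"]
index = build_reference_index(reference_lines)
remapped = remap(["rs1", "rs9", "rs2"], index)
```

With the index built once, each SNP conversion is a single dictionary lookup, so converting hundreds of thousands of SNPs costs far less than the initial load.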
9
Initial Execution
The complete pipeline was executed Friday, May 1st.
The initial query of PGP returned 579 potential experiments and their associated metadata.
Complications
  PGP servers were slow and largely unresponsive.
    Over the weekend, just under 100 experiments could be downloaded.
    Of these, 79 yielded good results.
  CoGe API Load Experiment is not functional.
    The loading code is complete, but CoGe returns an authentication error.
  Reference genome chromosome names are NCBI IDs instead of numbers.
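The chromosome-naming mismatch noted above could be bridged with a small translation step. The sketch below assumes the "NCBI IDs" are RefSeq chromosome accessions, where NC_000001 to NC_000022 are the autosomes, NC_000023 is X, NC_000024 is Y, and NC_012920 is the mitochondrial genome, with an optional version suffix such as ".10". The function name accession_to_chromosome is hypothetical, and the slides do not say which identifiers the reference genome actually used.

```python
import re
from typing import Optional

# "NC_" followed by six digits and an optional ".version" suffix.
_ACCESSION = re.compile(r"^NC_(\d{6})(?:\.\d+)?$")


def accession_to_chromosome(name: str) -> Optional[str]:
    """Translate a human RefSeq chromosome accession to a plain name.

    Returns None for anything that is not a recognized accession, so the
    caller can decide whether to skip or report the record.
    """
    match = _ACCESSION.match(name)
    if match is None:
        return None
    number = int(match.group(1))
    if 1 <= number <= 22:
        return str(number)
    if number == 23:
        return "X"
    if number == 24:
        return "Y"
    if number == 12920:
        return "MT"
    return None


names = [accession_to_chromosome(a)
         for a in ("NC_000001.10", "NC_000023.10", "NC_012920.1", "chr1")]
```

Ignoring the version suffix keeps the mapping independent of the assembly release, at the cost of not checking that all experiments really use the same assembly.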
10
Future Directions
myCoGe Data-Integration Pipeline
  Functional CoGe API loading.
  Increased stability in the face of poor connections.
  Expanded file types.
  Expanded experiment sources.
  Automated execution.
myCoGe
  Web-based personal data integration
  Integrated comparison tools
  Gene model annotations
  Functional and expression experiments
  Full-genome sequencing
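One common way to get "increased stability in the face of poor connections" is to retry failed downloads with exponentially growing pauses. The sketch below is a generic version of that idea, not the pipeline's code: retry_with_backoff, its parameters, and the simulated flaky download are all hypothetical. The sleep function is injectable so the example runs without real waiting.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    action: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call action(); on OSError wait base_delay * 2**attempt and retry.

    urllib.error.URLError and socket timeouts are subclasses of OSError,
    so network failures raised by urllib would be caught here.
    """
    for attempt in range(attempts):
        try:
            return action()
        except OSError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")


# Simulated download that fails twice before succeeding.
calls = {"n": 0}
delays: List[float] = []


def flaky_download() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise OSError("simulated dropped connection")
    return "experiment data"


result = retry_with_backoff(flaky_download, sleep=delays.append)
```

In a real run, action would wrap the actual download call, and the default time.sleep would provide the pauses.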