
1 myCoGe: Comparing our genomes

2 Background and Introduction
- Decreases in sequencing costs, coupled with increases in speed, have paved the way for “Personal Genomics”.
- Companies now providing sequencing include:
  - 23andMe ($99)
  - AncestryDNA ($99)
  - CompleteGenomics ($5000)
  - Counsyl ($1000)
  - Ubiome ($89-$400)
  - Genelex
  - …and more!

3 Background and Introduction
- This huge set of data holds a lot of promise for researchers.
  - 600k of 23andMe’s 800k customers have consented to the use of their data for research.
- Multiple sources now provide means for individuals to share their genetics and health histories with researchers.
  - e.g. Personal Genome Project, Open Humans
- Unfortunately, data from different sources cannot be directly compared.

4 What is myCoGe?
Goal of myCoGe Data Integration Pipeline
- Provide a mechanism for automated retrieval of publicly available genomic experiment datasets for import into CoGe.
- Provide the necessary tools for converting raw experiment files to formats accepted by CoGe.
- Provide tools for converting experiments to use the same reference genome.
Ultimate Goal of myCoGe
- Provide a powerful framework of tools and datasets to allow for analyses of how variation affects function in human genomes.
- Provide a useful toolbox for individuals to investigate their own personal genetic data.
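As one illustration of the "converting raw experiment files" goal, the sketch below parses a genotype export into uniform records. It assumes the tab-separated layout used by 23andMe raw-data downloads (comment lines starting with `#`, then rsID, chromosome, position, genotype). The `SnpCall` type, the `parse_genotype_text` function, and the sample rsIDs are hypothetical and are not taken from myCoGe or CoGe.

```python
from typing import Iterator, NamedTuple


class SnpCall(NamedTuple):
    """One genotype call; a hypothetical record type for illustration."""
    rsid: str
    chrom: str
    pos: int
    genotype: str


def parse_genotype_text(text: str) -> Iterator[SnpCall]:
    """Yield one SnpCall per data line, skipping comments and bad rows."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # header/comment or blank line
        fields = line.split("\t")
        if len(fields) != 4 or not fields[2].isdigit():
            continue  # malformed row: skip rather than guess
        rsid, chrom, pos, genotype = fields
        yield SnpCall(rsid, chrom, int(pos), genotype)


# Made-up sample data in the assumed layout.
SAMPLE = (
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs0000001\t1\t1000\tAA\n"
    "rs0000002\tX\t2000\tCT\n"
)
calls = list(parse_genotype_text(SAMPLE))
```

Records in this shape could then be written out in whichever format the target loader accepts.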

5 myCoGe Data Integration Conceptual Pipeline
Diagram labels: Review, Download, Convert, Load, Identify

6 Operational File Structure

7 myCoGe Data Integration Full Pipeline

8 Fun Facts
Lines of Code
- Initiate: 123 lines
- myCoGe: 692 lines
- Finalize: 11 lines
- Execute_myCoGe: 3 lines
- SNPScraper: 59 lines
- Total: 888 lines
Speed Benchmarks
- Slowest process: loading the 20 GB reference SNP file, ~4 min
- Converting 900,000 SNPs from the reference file: 5-30 seconds
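The benchmark pattern (one slow load, then fast conversion of many SNPs) is what you would expect from reading the reference once and then doing per-SNP lookups. The slides do not say how the pipeline does this; the sketch below shows one possible approach using an in-memory dictionary keyed by rsID. The column layout (rsID, chromosome, position) and the function names are assumptions, and a 20 GB file may need a more compact index than a plain dict.

```python
import io
from typing import Dict, Iterable, List, TextIO, Tuple

Index = Dict[str, Tuple[str, int]]


def build_reference_index(handle: TextIO) -> Index:
    """Read the reference once, mapping rsID -> (chromosome, position)."""
    index: Index = {}
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        rsid, chrom, pos = line.rstrip("\n").split("\t")[:3]
        index[rsid] = (chrom, int(pos))
    return index


def remap(rsids: Iterable[str], index: Index) -> Tuple[Index, List[str]]:
    """Look up each rsID; return found coordinates and the missing IDs."""
    found: Index = {}
    missing: List[str] = []
    for rsid in rsids:
        if rsid in index:
            found[rsid] = index[rsid]
        else:
            missing.append(rsid)
    return found, missing


# Made-up reference rows in the assumed layout.
reference = io.StringIO("rs0000001\t1\t1500\nrs0000002\tX\t2500\n")
index = build_reference_index(reference)
found, missing = remap(["rs0000001", "rs0000003"], index)
```

Each lookup is a constant-time dictionary access, so the per-experiment cost is small once the index has been built.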

9 Initial Execution
- The complete pipeline was executed Friday, May 1st.
- The initial query of PGP obtained 579 potential experiments and associated metadata.
Complications
- PGP servers were slow and largely unresponsive.
  - Through the weekend, just under 100 experiments could be downloaded.
  - Of these, 79 yielded good results.
- CoGe API Load Experiment is not functional.
  - Code for loading is complete, but CoGe returns an authentication error.
- Reference genome chromosome names are NCBI IDs instead of numbers.
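The chromosome-naming mismatch can be handled with a small translation step. The sketch below assumes the NCBI IDs are RefSeq accessions for the human chromosomes (NC_000001 through NC_000022 for chromosomes 1-22, NC_000023 for X, NC_000024 for Y, NC_012920 for the mitochondrial genome), with an optional version suffix such as `.10`. The slides do not state which IDs the reference actually used, and the function name is hypothetical.

```python
import re
from typing import Optional

# "NC_" + six digits, optionally followed by ".<version>".
_ACCESSION = re.compile(r"^NC_(\d{6})(?:\.\d+)?$")


def accession_to_chromosome(name: str) -> Optional[str]:
    """Translate a human RefSeq chromosome accession to a short name.

    Returns None for anything that is not a recognised accession,
    so callers can decide how to treat unplaced or unknown sequences.
    """
    match = _ACCESSION.match(name)
    if match is None:
        return None
    number = int(match.group(1))
    if 1 <= number <= 22:
        return str(number)
    if number == 23:
        return "X"
    if number == 24:
        return "Y"
    if number == 12920:
        return "MT"
    return None
```

Ignoring the version suffix lets the same mapping serve different assembly releases, since only the version number changes between them.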

10 Future Directions
myCoGe Data-Integration Pipeline
- Functional CoGe API loading
- Increased stability in the face of poor connections
- Expanded file types
- Expanded experiment sources
- Automated execution
myCoGe
- Web-based personal data integration
- Integrated comparison tools
- Gene model annotations
- Functional and expression experiments
- Full-genome sequencing
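For the "increased stability in the face of poor connections" item, one common technique is to retry failed downloads with exponential backoff. The sketch below is a generic illustration, not the pipeline's code: `with_retries` and its parameters are hypothetical, and it retries on `OSError`, the base class of Python's connection and timeout errors and of `urllib.error.URLError`.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    action: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call action(); on OSError wait base_delay * 2**n and try again.

    The final failure is re-raised so the caller can log and skip
    the experiment instead of aborting the whole run.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return action()
        except OSError:
            if attempt == attempts - 1:
                raise  # out of retries
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

In use, `action` would wrap a single download, for example `lambda: urllib.request.urlopen(url, timeout=30).read()`. The `sleep` parameter is injectable so the delay schedule can be checked without actually waiting.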

