
myCoGe: Comparing our genomes

Background and Introduction
• Decreases in sequencing costs, coupled with increases in speed, have paved the way for “Personal Genomics”.
• Companies now providing sequencing include:
    • 23andMe ($99)
    • AncestryDNA ($99)
    • CompleteGenomics ($5000)
    • Counsyl ($1000)
    • Ubiome ($89-$400)
    • Genelex
    • …and more!

Background and Introduction
• This huge set of data holds a lot of promise for researchers.
    • 600k of 23andMe’s 800k customers have consented to the use of their data for research.
• Multiple sources now provide means for individuals to share their genetics and health histories with researchers.
    • e.g. Personal Genome Project, OpenHuman
• Unfortunately, data from different sources cannot be directly compared.

What is myCoGe?
Ultimate Goal of myCoGe
• Provide a powerful framework of tools and datasets to allow for analyses of how variation affects function in human genomes.
• Provide a useful toolbox for individuals to investigate their own personal genetic data.
Goal of myCoGe Data Integration Pipeline
• Provide a mechanism for automated retrieval of publicly available genomic experiment datasets for import into CoGe.
• Provide the necessary tools for converting raw experiment files to formats accepted by CoGe.
• Provide tools for converting experiments to use the same reference genome.
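As an illustration of the "converting raw experiment files" goal, here is a minimal Python sketch of reading a consumer genotype file into uniform records. It assumes the tab-separated layout used by 23andMe raw-data downloads (rsid, chromosome, position, genotype, with "#" comment lines). The names SnpCall and parse_raw_genotypes and the sample rsIDs are invented for this example; this is not the pipeline's actual code, and the target format CoGe expects is not shown.

```python
import csv
import io
from typing import Iterator, NamedTuple


class SnpCall(NamedTuple):
    """One genotype call (hypothetical record type for this sketch)."""
    rsid: str
    chromosome: str
    position: int
    genotype: str


def parse_raw_genotypes(handle) -> Iterator[SnpCall]:
    """Yield one SnpCall per data row, skipping '#' comments and blank lines."""
    data_lines = (
        line for line in handle
        if line.strip() and not line.startswith("#")
    )
    for rsid, chromosome, position, genotype in csv.reader(data_lines, delimiter="\t"):
        yield SnpCall(rsid, chromosome, int(position), genotype)


# Invented sample data in the assumed layout.
sample = io.StringIO(
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs0000001\t1\t1000\tAA\n"
    "rs0000002\tX\t2500\tCT\n"
)
calls = list(parse_raw_genotypes(sample))
```

Once every source is parsed into the same record type, later steps (coordinate conversion, export) only need to handle one shape of data.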

myCoGe Data Integration Conceptual Pipeline
Diagram stages: Review, Download, Convert, Load, Identify

Operational File Structure

myCoGe Data Integration Full Pipeline

Fun Facts
Lines of Code
• Initiate: 123 lines
• myCoGe: 692 lines
• Finalize: 11 lines
• Execute_myCoGe: 3 lines
• SNPScraper: 59 lines
• Total: 888 lines
Speed Benchmarks
• Slowest process: loading the 20 GB reference SNP file (~4 min)
• Converting 900,000 SNPs from the reference file: 5-30 seconds
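The speed benchmarks are consistent with a design in which the large reference file is streamed once and each SNP is then converted by a dictionary lookup on its rsID. The sketch below shows that idea in Python. It is an assumption about the approach, not the authors' implementation: the reference layout (tab-separated rsid, chromosome, position), the function names, and the sample values are all hypothetical.

```python
import io
from typing import Dict, Iterable, List, Set, Tuple

Position = Tuple[str, int]  # (chromosome, position) on the target reference


def load_reference_positions(handle, wanted: Set[str]) -> Dict[str, Position]:
    """Stream the reference file once, keeping only rsIDs we actually need."""
    positions: Dict[str, Position] = {}
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue
        rsid, chromosome, position = line.rstrip("\n").split("\t")[:3]
        if rsid in wanted:
            positions[rsid] = (chromosome, int(position))
    return positions


def convert_calls(
    calls: Iterable[Tuple[str, str]], positions: Dict[str, Position]
) -> Tuple[List[Tuple[str, str, int, str]], List[str]]:
    """Re-coordinate (rsid, genotype) calls; report rsIDs absent from the reference."""
    converted, unmatched = [], []
    for rsid, genotype in calls:
        if rsid in positions:
            chromosome, position = positions[rsid]
            converted.append((rsid, chromosome, position, genotype))
        else:
            unmatched.append(rsid)
    return converted, unmatched


# Invented sample data in the assumed layout.
reference = io.StringIO("rs0000001\t1\t1100\nrs0000003\t2\t900\n")
experiment = [("rs0000001", "AA"), ("rs0000002", "CT")]
lookup = load_reference_positions(reference, {rsid for rsid, _ in experiment})
converted, unmatched = convert_calls(experiment, lookup)
```

Filtering on the wanted rsIDs while streaming keeps memory proportional to the experiment rather than to the 20 GB reference file.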

Initial Execution
• The complete pipeline was executed on Friday, May 1st.
• The initial query of PGP returned 579 potential experiments and associated metadata.
Complications
• PGP servers were slow and largely unresponsive.
    • Over the weekend, just under 100 experiments could be downloaded.
    • Of these, 79 yielded good results.
• CoGe API Load Experiment is not functional.
    • The loading code is complete, but CoGe returns an authentication error.
• Reference genome chromosome names are NCBI IDs instead of numbers.
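The chromosome-naming mismatch can be handled with a small translation step. The Python sketch below assumes the "NCBI IDs" are RefSeq accessions of the form NC_000001.11, which is how NCBI names human chromosomes (NC_000001 to NC_000022 for the autosomes, NC_000023 for X, NC_000024 for Y, NC_012920 for the mitochondrial genome). The slides do not say which ID type the reference genome uses, so treat this as one possible fix; the function name is hypothetical.

```python
import re

# RefSeq chromosome accession: NC_, six digits, a dot, then a version number.
_ACCESSION = re.compile(r"^NC_(\d{6})\.\d+$")
_SPECIAL = {23: "X", 24: "Y", 12920: "MT"}


def accession_to_chromosome(accession: str) -> str:
    """Map a human RefSeq chromosome accession to a conventional name."""
    match = _ACCESSION.match(accession)
    if match is None:
        raise ValueError(f"not a RefSeq chromosome accession: {accession!r}")
    number = int(match.group(1))
    if 1 <= number <= 22:
        return str(number)
    if number in _SPECIAL:
        return _SPECIAL[number]
    raise ValueError(f"no human chromosome for accession: {accession!r}")


names = [accession_to_chromosome(a) for a in ("NC_000001.11", "NC_000023.11", "NC_012920.1")]
```

Ignoring the version suffix lets the same mapping work across assembly releases, since only the version number changes between them.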

Future Directions
myCoGe Data-Integration Pipeline
• Functional CoGe API loading
• Increased stability in the face of poor connections
• Expanded file types
• Expanded experiment sources
• Automated execution
myCoGe
• Web-based personal data integration
• Integrated comparison tools
• Gene model annotations
• Functional and expression experiments
• Full-genome sequencing
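For the "increased stability in the face of poor connections" item, one common technique is to retry failed downloads with exponential backoff. The Python sketch below is a generic illustration of that technique, not a description of planned myCoGe code. The function retry_with_backoff and the simulated flaky_download are hypothetical, and the sleep function is injectable so the behavior can be checked without waiting.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation(); on OSError wait base_delay * 2**n and try again."""
    for attempt in range(attempts):
        try:
            return operation()
        except OSError:  # covers ConnectionError, timeouts, urllib's URLError
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")


# Simulated download that fails twice before succeeding.
failures = iter([ConnectionError("timed out"), ConnectionError("reset")])


def flaky_download() -> str:
    error = next(failures, None)
    if error is not None:
        raise error
    return "experiment-bytes"


delays: List[float] = []
result = retry_with_backoff(flaky_download, sleep=delays.append)
```

Doubling the delay after each failure avoids hammering a server that is already slow to respond, which matches the situation described for the PGP servers.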
