August 20, 2007 BDGP modENCODE Data Production
BDGP Data Production
  Project Goals
    21,000 RACE experiments
    6,000 cDNAs from directed screening and full insert sequencing
    3,000 RT-PCR experiments and insert sequencing
  Data Tracking Requirements
    Identification of genomic regions for interrogation
    Tracking and association of experiments
    Analysis of experimental data
    Submission of results to GenBank and DCC
Data Resources
  The identification of experiments is based on existing resources:
    Affymetrix microarray data
    BDGP EST/cDNA clones
Embryonic RNA Expression on Genome Tiling Arrays
  Manak et al. (2006) Nat. Genet. 38(10):
BDGP EST and cDNA Projects
  Data Resources
    295,379 BDGP EST end sequences
    109,398 Exelixis EST end sequences
    15,015 BDGP clone full length sequences
  Project Resources
    Production tracking and analysis in an integrated database LIMS
BDGP Production Tracking
  Existing production tracking through an internal web-based LIMS
Production Data Workflow
  Benchwork Registration Gel Processing Clone Data Processing
BDGP Data Analysis
BDGP 5’ RACE
  Identification of 2,074 5’ RACE primers from the set of CGs from Ohler et al.
  96 selected for experiments
Directed cDNA Screening using iPCR
  The congo exon screen is a model for the 5’ RACE, directed cDNA, and RT-PCR screening
  congo: 41,564 protein coding exons from comparative analysis from Manolis Kellis
    434 exons did not overlap Rel 4.3 annotations or existing EST/cDNA data
    267 (61.5%) completed full insert sequencing
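The selection step on this slide (keeping only predicted exons that do not overlap existing annotations or EST/cDNA data) can be illustrated with a minimal Python sketch. This is not the BDGP pipeline: the `Interval` type, the function names, the half-open coordinate convention, and the toy coordinates are all assumptions made for illustration. Only the counts 267 and 434 come from the slide.

```python
from collections import defaultdict
from typing import NamedTuple


class Interval(NamedTuple):
    """Hypothetical genomic interval; half-open [start, end) is an assumption."""
    chrom: str
    start: int
    end: int


def overlaps(a: Interval, b: Interval) -> bool:
    # Two half-open intervals overlap when each starts before the other ends.
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def novel_exons(exons, known):
    """Return the exons that overlap none of the known features."""
    by_chrom = defaultdict(list)
    for feature in known:
        by_chrom[feature.chrom].append(feature)
    return [
        exon for exon in exons
        if not any(overlaps(exon, f) for f in by_chrom.get(exon.chrom, []))
    ]


def completion_pct(completed: int, total: int) -> float:
    """Percentage of screened exons that completed, rounded to one decimal."""
    return round(100 * completed / total, 1)


# Toy data (invented coordinates, not real annotations).
exons = [Interval("2L", 100, 200), Interval("2L", 500, 600), Interval("X", 50, 80)]
known = [Interval("2L", 150, 300), Interval("X", 80, 120)]

print(novel_exons(exons, known))   # exons with no overlap in the toy data
print(completion_pct(267, 434))    # counts taken from the slide
```

The linear scan per chromosome is adequate for a few thousand features. For genome-scale sets, a sorted list or an interval tree would likely be a better fit.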
cDNA Clone Capture using iPCR
  Identification of Exon
  Primer Design and Experiment Registration
  PCR Plate Production
  Cloning, end and internal sequencing
  Assembly and Analysis of screen data
  Full insert sequencing of positive matches
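The six steps on this slide form a linear pipeline, and a tracking system needs to record which step each experiment has reached. The sketch below is a hypothetical illustration in Python, not the BDGP LIMS: the `Stage` and `Experiment` names and the exon identifier are invented, and the model ignores the fact that only positive matches proceed to full insert sequencing.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Pipeline stages, named after the steps on the slide (names are assumptions)."""
    EXON_IDENTIFIED = 1
    PRIMERS_REGISTERED = 2
    PCR_PLATE_PRODUCED = 3
    CLONED_AND_END_SEQUENCED = 4
    ASSEMBLED_AND_ANALYZED = 5
    FULL_INSERT_SEQUENCED = 6


class Experiment:
    """Tracks one exon through the pipeline and keeps its stage history."""

    def __init__(self, exon_id: str):
        self.exon_id = exon_id
        self.stage = Stage.EXON_IDENTIFIED
        self.history = [self.stage]

    def advance(self) -> Stage:
        # Stages must be visited in order; the last stage is terminal.
        if self.stage == Stage.FULL_INSERT_SEQUENCED:
            raise ValueError(f"{self.exon_id} has already completed the pipeline")
        self.stage = Stage(self.stage + 1)
        self.history.append(self.stage)
        return self.stage


# Usage: walk a hypothetical exon through every stage.
exp = Experiment("exon_0001")
while exp.stage != Stage.FULL_INSERT_SEQUENCED:
    exp.advance()
print([s.name for s in exp.history])
```

Using an ordered enumeration keeps the stage order explicit in one place, so a report such as "how many experiments have reached assembly" reduces to a comparison on `stage`.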
Computationally predicted conserved exons validated by cDNA screening and sequencing
  I. Gene modifications
  II. Identification of New Genes
  Predictions - Kellis
BDGP Data Production
  The remaining work on the LIMS and data production system:
    Completion of migration from EST/cDNA project to new code
    Identification and prioritization of experiments
    Integration of microarray data
    Specification of success
    Definition of data transfer to DCC
BDGP Data Production
cDNA Sequencing Corrects Gene Models