ENCODE Data Coordination at UCSC Kate Rosenbloom ENCODE DCC Technical Project Manager UCSC Genome Bioinformatics Group September 2010 Genome Browser SAB.

Presentation transcript:

ENCODE Data Coordination at UCSC Kate Rosenbloom ENCODE DCC Technical Project Manager UCSC Genome Bioinformatics Group September 2010 Genome Browser SAB Review

ENCODE in a nutshell: Y3Q3 2 species (human, mouse) 3 years of production phase (2 years left) 17 grants, 27 labs 20 experiment types 27 browser tracks 140 cell & tissue types 1559 datasets

Topics: 1.DCC role in ENCODE 2.Progress since last SAB review 3.Challenges 4.Browser impact

DCC role in ENCODE Define data formats and submission process Load, display, curate, review & release data Collect metadata and documentation Provide public website, viz, tools User outreach & support Support consortium communications (wiki) Support analysis group

Data submission website: submissions as of September 2010

Lifecycle of a data submission: Lab uploads sub.tar.gz to submission website (status: uploaded); pipeline validates and loads into database (status: loaded; if data fails validation or loading: validate failed, load failed); wrangler configures browser track (status: displayed, on test browser); lab approves track on test browser (status: approved); Q/A reviews and releases (status: reviewing, then released to public browser)
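The status names on this slide (uploaded, loaded, validate failed, load failed, displayed, approved, reviewing, released) can be read as a small state machine. The following is a minimal sketch of that reading, not the DCC pipeline's actual code: the `Status` enum, the `TRANSITIONS` table, and the `advance` and `run_lifecycle` helpers are hypothetical names, and the assumption that a failed submission goes back to "uploaded" after the lab re-uploads is not stated on the slide.

```python
from enum import Enum


class Status(Enum):
    """Submission statuses named on the lifecycle slide."""
    UPLOADED = "uploaded"
    VALIDATE_FAILED = "validate failed"
    LOAD_FAILED = "load failed"
    LOADED = "loaded"
    DISPLAYED = "displayed"
    APPROVED = "approved"
    REVIEWING = "reviewing"
    RELEASED = "released"


# Allowed moves between statuses (hypothetical table inferred from the slide).
TRANSITIONS = {
    Status.UPLOADED: {Status.LOADED, Status.VALIDATE_FAILED, Status.LOAD_FAILED},
    # Assumption: after a failure the lab fixes the data and uploads again.
    Status.VALIDATE_FAILED: {Status.UPLOADED},
    Status.LOAD_FAILED: {Status.UPLOADED},
    Status.LOADED: {Status.DISPLAYED},      # wrangler configures the track
    Status.DISPLAYED: {Status.APPROVED},    # lab approves on the test browser
    Status.APPROVED: {Status.REVIEWING},    # Q/A picks up the track
    Status.REVIEWING: {Status.RELEASED},    # Q/A releases to the public browser
    Status.RELEASED: set(),
}


def advance(current: Status, new: Status) -> Status:
    """Return the new status, or raise ValueError if the move is not allowed."""
    if new not in TRANSITIONS[current]:
        raise ValueError(f"cannot go from {current.value!r} to {new.value!r}")
    return new


def run_lifecycle(steps):
    """Apply a sequence of statuses starting from 'uploaded'."""
    status = Status.UPLOADED
    for step in steps:
        status = advance(status, step)
    return status


if __name__ == "__main__":
    happy_path = [Status.LOADED, Status.DISPLAYED, Status.APPROVED,
                  Status.REVIEWING, Status.RELEASED]
    print(run_lifecycle(happy_path).value)
```

Keeping the allowed moves in one table makes it easy to see that a track cannot reach the public browser without passing through lab approval and Q/A review.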

ENCODE track quality review Data format checks Description and metadata complete & correct Configurability Display at different zoom levels and visibilities Performance Does the data make biological sense ? Usability

Released Data: 27 tracks in hg18, representing 860 experiments as of September 2010

Progress since last SAB review (Feb ‘09) Features planned for Year 2: High-resolution wiggle (bigWig): DONE; RNAseq display enhancements: DONE; NCBI accessioning of seq data: IN PROGRESS; Track search tool: IN REVIEW. Plus: Integrated regulatory track; Hg19/GRCh37 migration; BAM support (spec, validation, display, c-tracks); Mid-course review, 4 data freezes, 2 analysis workshops, DCC site review; Mouse ENCODE

Integrated regulatory tracks UCSC-developed integrative ENCODE track – shows enrichment of histone modifications suggestive of enhancer and promoter activity, DNase clusters indicating open chromatin, regions of transcription factor binding, and transcription levels, derived from ENCODE data collected in multiple cell lines.

ENCODE data at NCBI GEO: Caltech RNA-seq

Mouse ENCODE experiment matrix 4 grants funded by the ARRA, 3 are now submitting data more cell types more factors

Simple search looks at: Track names and labeling Track description Metadata terms (specifically ENCODE controlled vocabulary) Finding data in the browser: Simple free-text search

Advanced search allows selection by defined metadata terms. (Currently only for ENCODE tracks) This search finds histone modification H3K4me3 as seen in H1-hESC cells. Finding data in the browser: By metadata terms
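The two search modes described on these slides can be illustrated with a toy example. This is only a sketch under stated assumptions, not the Genome Browser's track search implementation: the track records, the metadata keys `antibody` and `cell`, and the functions `simple_search` and `search_by_metadata` are all hypothetical.

```python
# Hypothetical track records; names, descriptions, and metadata keys are
# invented for illustration and do not come from the browser database.
TRACKS = [
    {
        "name": "exampleHistoneH1",
        "description": "Histone modification signal in embryonic stem cells",
        "metadata": {"antibody": "H3K4me3", "cell": "H1-hESC"},
    },
    {
        "name": "exampleHistoneK562",
        "description": "Histone modification signal in K562 cells",
        "metadata": {"antibody": "H3K4me3", "cell": "K562"},
    },
    {
        "name": "exampleRnaSeqH1",
        "description": "RNA-seq read coverage",
        "metadata": {"dataType": "RnaSeq", "cell": "H1-hESC"},
    },
]


def simple_search(tracks, text):
    """Free-text search over track name, description, and metadata values."""
    needle = text.lower()
    hits = []
    for track in tracks:
        haystack = [track["name"], track["description"]]
        haystack.extend(track["metadata"].values())
        if any(needle in field.lower() for field in haystack):
            hits.append(track["name"])
    return hits


def search_by_metadata(tracks, **terms):
    """Advanced search: keep tracks whose metadata matches every given term."""
    return [
        track["name"]
        for track in tracks
        if all(track["metadata"].get(key) == value for key, value in terms.items())
    ]


if __name__ == "__main__":
    print(simple_search(TRACKS, "h3k4me3"))
    print(search_by_metadata(TRACKS, antibody="H3K4me3", cell="H1-hESC"))
```

In this toy data set the free-text query matches both H3K4me3 tracks, while the metadata query narrows the result to the single H1-hESC track, mirroring the difference between the simple and advanced searches.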

The result of the search on the previous slide is a single track of histone modification H3K4me3 as seen in H1-hESC cells. Clicking ‘View in Browser’ will display this data. Results from track search

Challenges Number of labs, difficulty of some Metadata expansion, special handling beyond normal browser data Multiple customers: NHGRI, analysis group, labs, user community Production vs. research Mission expansion: GEO/SRA, standards, ARRA, year 5 Reporting overhead Engineering staff -> hire ‘wranglers’ Funding delays

Impact on browser Expanded data – mostly useful, some not so much Pushes development of viz, tools, formats for large datasets Competes for staff and mgmt resources

People at the DCC PI: Jim Kent Technical project manager: Kate Rosenbloom Engineering / Wrangling: Tim Dreszer, Venkat Malladi, Brian Raney, Cricket Sloan, Melissa Cline Outreach, usability: Melissa, OpenHelix (contractor) Submissions website: Galt Barber GEO tools: Krishna Roskin Quality assurance: Katrina Learned, Vanessa Swing Browser management: Donna Karolchik, Bob Kuhn, Ann Zweig

Reporting: Monthly Quarterly Annual

Additional slides

Plans Track search tool ENCODE tutorial Portal upgrade Complete GEO submissions Analysis tracks ARRA grants (proteogenomics, epitope-tag) Release Mouse data

Browser features developed for ENCODE High resolution wiggle (bigWig) HTS formats (BAM and bigBed) BIG custom tracks View-based tracks Data selection matrix Metadata links Coming soon: Track Search

Initial tracks of mouse data (test browser)

GEO Submission Pipeline

Portal

ENCODE Outreach Publication: NAR 2010 Database issue (2011 update in press) Presentations: CSHL Statistical Analysis course June 2010, Stanford Computational Systems Bioinformatics, Aug 2010 Posters: CSHL Biology of Genomes May 2009, CSB 2010

OpenHelix ENCODE tutorial