ENCODE Data Coordination at UCSC Kate Rosenbloom ENCODE DCC Technical Project Manager UCSC Genome Bioinformatics Group September 2010 Genome Browser SAB.

Presentation transcript:

ENCODE Data Coordination at UCSC Kate Rosenbloom ENCODE DCC Technical Project Manager UCSC Genome Bioinformatics Group September 2010 Genome Browser SAB Review

ENCODE in a nutshell (Y3Q3):
- 2 species (human, mouse)
- 3 years of production phase (2 years left)
- 17 grants, 27 labs
- 20 experiment types
- 27 browser tracks
- 140 cell & tissue types
- 1559 datasets

Topics:
1. DCC role in ENCODE
2. Progress since last SAB review
3. Challenges
4. Browser impact

DCC role in ENCODE:
- Define data formats and submission process
- Load, display, curate, review & release data
- Collect metadata and documentation
- Provide public website, viz, tools
- User outreach & support
- Support consortium communications (wiki)
- Support analysis group

Data submission website: submissions as of September 2010

Lifecycle of a data submission (submission status in parentheses):
1. Lab uploads sub.tar.gz to submission website (uploaded)
2. Pipeline validates and loads into database (loaded); if data fails validation or loading (validate failed, load failed)
3. Wrangler configures browser track (displayed, on test browser)
4. Lab approves track on test browser (approved)
5. Q/A reviews and releases (reviewing, then released to public browser)
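The lifecycle on this slide can be read as a small state machine. The sketch below is only an illustration of that reading: the status names come from the slide, but the `Status` enum, the `TRANSITIONS` table, the `advance` helper, and the rule that a failed submission returns to `uploaded` are assumptions, not the DCC pipeline's actual code.

```python
from enum import Enum


class Status(Enum):
    """Submission statuses named on the slide."""
    UPLOADED = "uploaded"
    VALIDATE_FAILED = "validate failed"
    LOAD_FAILED = "load failed"
    LOADED = "loaded"
    DISPLAYED = "displayed"
    APPROVED = "approved"
    REVIEWING = "reviewing"
    RELEASED = "released"


# Allowed moves, inferred from the slide. Assumption: after a failure the
# lab re-uploads, which puts the submission back in the "uploaded" state.
TRANSITIONS = {
    Status.UPLOADED: {Status.LOADED, Status.VALIDATE_FAILED, Status.LOAD_FAILED},
    Status.VALIDATE_FAILED: {Status.UPLOADED},
    Status.LOAD_FAILED: {Status.UPLOADED},
    Status.LOADED: {Status.DISPLAYED},      # wrangler configures the track
    Status.DISPLAYED: {Status.APPROVED},    # lab approves on test browser
    Status.APPROVED: {Status.REVIEWING},    # Q/A picks it up
    Status.REVIEWING: {Status.RELEASED},    # Q/A releases to public browser
    Status.RELEASED: set(),
}


def advance(current: Status, new: Status) -> Status:
    """Return the new status, or raise if the move is not in TRANSITIONS."""
    if new not in TRANSITIONS[current]:
        raise ValueError(f"cannot go from {current.value!r} to {new.value!r}")
    return new


# Happy path: from upload through public release.
status = Status.UPLOADED
for step in (Status.LOADED, Status.DISPLAYED, Status.APPROVED,
             Status.REVIEWING, Status.RELEASED):
    status = advance(status, step)
print(status.value)  # prints "released"
```

Keeping the transitions in one table makes it easy to see that nothing reaches the public browser without passing lab approval and Q/A review.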

ENCODE track quality review:
- Data format checks
- Description and metadata complete & correct
- Configurability
- Display at different zoom levels and visibilities
- Performance
- Does the data make biological sense?
- Usability

Released Data: 27 tracks in hg18, representing 860 experiments as of September 2010

Progress since last SAB review (Feb ‘09)
Features planned for Year 2:
- High-resolution wiggle (bigWig): DONE
- RNAseq display enhancements: DONE
- NCBI accessioning of seq data: IN PROGRESS
- Track search tool: IN REVIEW
Plus:
- Integrated regulatory track
- Hg19/GRCh37 migration
- BAM support (spec, validation, display, c-tracks)
- Mid-course review, 4 data freezes, 2 analysis workshops, DCC site review
- Mouse ENCODE

Integrated regulatory tracks
UCSC-developed integrative ENCODE track. It shows enrichment of histone modifications suggestive of enhancer and promoter activity, DNase clusters indicating open chromatin, regions of transcription factor binding, and transcription levels, derived from ENCODE data collected in multiple cell lines.

ENCODE data at NCBI GEO: Caltech RNA-seq

Mouse ENCODE experiment matrix:
- 4 grants funded by the ARRA, 3 are now submitting data
- more cell types
- more factors

Finding data in the browser: Simple free-text search
Simple search looks at:
- Track names and labeling
- Track description
- Metadata terms (specifically ENCODE controlled vocabulary)
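A minimal sketch of what such a free-text search could look like follows. It assumes each track is a small record with a name, a label, a description, and a metadata dictionary; the `Track` class, the `simple_search` function, and the sample tracks are all hypothetical and are not the Genome Browser's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Track:
    """Hypothetical track record holding the fields the simple search covers."""
    name: str
    label: str
    description: str
    metadata: dict = field(default_factory=dict)


def simple_search(tracks, query):
    """Return tracks whose name, label, description, or metadata values
    contain every whitespace-separated term of the query (case-insensitive)."""
    terms = query.lower().split()
    hits = []
    for track in tracks:
        haystack = " ".join(
            [track.name, track.label, track.description, *track.metadata.values()]
        ).lower()
        if all(term in haystack for term in terms):
            hits.append(track)
    return hits


# Made-up example tracks, for illustration only.
TRACKS = [
    Track("exampleHistone", "Example Histone Mods",
          "ChIP-seq of histone modifications",
          {"cell": "H1-hESC", "antibody": "H3K4me3"}),
    Track("exampleRnaSeq", "Example RNA-seq",
          "RNA-seq signal",
          {"cell": "GM12878", "dataType": "RnaSeq"}),
]

for hit in simple_search(TRACKS, "h3k4me3"):
    print(hit.name)  # prints "exampleHistone"
```

Matching on substrings keeps the sketch short; a real search would more likely use an index and token-level matching.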

Finding data in the browser: By metadata terms
Advanced search allows selection by defined metadata terms (currently only for ENCODE tracks). This search finds histone modification H3K4me3 as seen in H1-hESC cells.
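Selection by metadata terms amounts to exact matching on term/value pairs rather than free text. The sketch below illustrates the idea with the slide's example (H3K4me3 in H1-hESC); the term names `cell` and `antibody`, the `metadata_search` function, and the sample records are assumptions made for illustration.

```python
def metadata_search(tracks, **criteria):
    """Return tracks whose metadata matches every term=value pair given
    (case-insensitive comparison on the value)."""
    return [
        track for track in tracks
        if all(
            track["metadata"].get(term, "").lower() == value.lower()
            for term, value in criteria.items()
        )
    ]


# Made-up track records, for illustration only.
TRACKS = [
    {"name": "exampleHistoneH1",
     "metadata": {"cell": "H1-hESC", "antibody": "H3K4me3"}},
    {"name": "exampleHistoneGm",
     "metadata": {"cell": "GM12878", "antibody": "H3K4me3"}},
    {"name": "exampleTfbsH1",
     "metadata": {"cell": "H1-hESC", "antibody": "CTCF"}},
]

hits = metadata_search(TRACKS, antibody="H3K4me3", cell="H1-hESC")
print([track["name"] for track in hits])  # prints ['exampleHistoneH1']
```

Because every criterion must match, adding terms narrows the result, which is how the slide's search ends up with a single track.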

Results from track search
The result of the search on the previous slide is a single track of histone modification H3K4me3 as seen in H1-hESC cells. Clicking ‘View in Browser’ will display this data.

Challenges:
- Number of labs, difficulty of some
- Metadata expansion, special handling beyond normal browser data
- Multiple customers: NHGRI, analysis group, labs, user community
- Production vs. research
- Mission expansion: GEO/SRA, standards, ARRA, year 5
- Reporting overhead
- Engineering staff -> hire ‘wranglers’
- Funding delays

DCC site visit recommendations:
1. Data accessibility: Track search, Feature supertracks, Tutorial
2. Data usability
3. Data quality: Post standards on website, Flag non-conforming data
4. Long-term repository: Deposit data to GEO
5. Metadata user review
6. Use cases: Session gallery on website
7. Reproducibility in publications
8. Web site: Data snapshot on website, Improve labeling
9. Analysis data sets: Integrated regulatory track, Imports from AWG
10. Metrics for success
Blue items are DCC-specific

Impact on browser:
- Expanded data – mostly useful, some not so much
- Pushes development of viz, tools, formats for large datasets
- Competes for staff and mgmt resources

People at the DCC:
- PI: Jim Kent
- Technical project manager: Kate Rosenbloom
- Engineering / Wrangling: Tim Dreszer, Venkat Malladi, Brian Raney, Cricket Sloan, Melissa Cline
- Outreach, usability: Melissa, OpenHelix (contractor)
- Submissions website: Galt Barber
- GEO tools: Krishna Roskin
- Quality assurance: Katrina Learned, Vanessa Swing
- Browser management: Donna Karolchik, Bob Kuhn, Ann Zweig

Reporting: Monthly, Quarterly, Annual

Additional slides

Plans:
- ENCODE tutorial
- Portal upgrade
- Complete GEO submissions
- Analysis tracks
- ARRA grants (proteogenomics, epitope-tag)
- Release Mouse data

Browser features developed for ENCODE:
- High resolution wiggle (bigWig)
- HTS formats (BAM and bigBed)
- BIG custom tracks
- View-based tracks
- Data selection matrix
- Metadata links
- Coming soon: Track Search

Initial tracks of mouse data (test browser)

GEO Submission Pipeline

Portal

ENCODE Outreach:
- Publication: NAR 2010 Database issue (2011 update in press)
- Presentations: CSHL Statistical Analysis course, June 2010; Stanford Computational Systems Bioinformatics, Aug 2010
- Posters: CSHL Biology of Genomes, May 2009; CSB 2010

OpenHelix ENCODE tutorial

Key items from site visit recommendations:
- Make a track search tool to make it easy to find all data on one cell line or one transcription factor
- Organize data by biochemical entities rather than by lab
- Put effort into high-level documentation on website
  – “Sessions gallery” to show use cases
  – Page that gives an overview of what data is available in ENCODE, including cells, antibodies, and assays
  – Put up data summaries, figures, and presentations generated by the AWG onto site

Some user comments from DCC survey:
- Linking annotation across all cell types and linking all annotations across one cell type would be quite nice. As it is now, it takes a fair bit of manual manipulation to do this.
- Great job, awesome resource. Thanks to all!
- Need more cell types and conditions. Do data from non-ENCODE consortium groups get incorporated?
- Amazingly there isn't a useful Encode summary, let alone a detailed description of the project and results. There's a nice do-loop with links between UCSC & NHGRI that don't lead anywhere. Is there a publication or link that I'm missing somewhere that informs, educates & is a users' guide? Great project, just difficult to sift through in it's current form.
- Encode only covers 1% of the human genome. Not sufficient coverage