FaceBase Hub Updates of Year 4


FaceBase Hub Updates of Year 4 Carl Kesselman

Overview of the past year
- Growth in data collections and usage statistics
- Major enhancements of the FaceBase database and service
- Self-serve data curation tools released; tutorials conducted with spoke projects
- Established a common Bioinformatics Pipeline
- Released Data Browser enhancements, including a new filtering interface
- Updated TrackHub and introduced JBrowse, with a new plugin to access FB datasets directly
- Conducted a usability survey of the FaceBase site and Data Browser

Data & Usage Statistics

A very diverse and growing collection of research data

 Total Datasets   | Categories          | Entries
------------------+---------------------+---------
 Datasets         | --                  |     777
 Experiments      | 12 assay/exp. types |     498
 Biosamples info. | 4 species           |   1,508
 Data Files       | 7 data types        |   4,300

Data Files breakdown:
- seq: 1,676
- imaging: 1,666
- other: 958
= 6 TB
FB1 = ~2 TB

File counts by type:

 mesh | processed | file | tracks | array | sequencing | imaging
------+-----------+------+--------+-------+------------+---------
  356 |       645 |  913 |    523 |    45 |        498 |    1320

Dataset Curation Status
FB2 datasets submitted: 142 (73 in Year 4)
Datasets submitted with self-curation tools in Yr 4: 66

Species Curated Released Pending
Mouse 61 15
Human 8 22 1
Zebrafish 35
Total: 102 37

- Curated: FB2 dataset that has been fully curated to the current FB2 model
- Released: FB2 dataset that appears in our catalog with minimal metadata information and is in the process of being curated
- Pending: FB2 dataset that appears in our catalog but some data and/or metadata needs to be completed

Data Download Statistics
User activity within the Data Browser for the past year:
- 1,053 data file downloads
- 21,992 thumbnails*
Usage of our Track Hub for the UCSC Genome Browser:
- 186,819 track downloads**
* Excluding generic placeholder thumbnails
** The Genome Browser reads byte ranges covering only the part of the file the user is actually looking at

(Beyond Downloads) These are the numbers of times dataset files have been downloaded, but there’s more to measuring the impact of this site. If you’re researching a phenotype and you’ve already seen a good JPG on the web page, do you really need to download the source file? So here are usage stats for images as well as the Track Hub, etc. Note that the number of track file accesses is quite an accurate measure of usage, because it captures only what the user actually looked at.

Web traffic statistics for www.facebase.org

 Analytics            | 5/1/2016 - 4/26/2017 | 5/3/2017 - 5/3/2018
----------------------+----------------------+---------------------
 Pageviews            |               42,373 |              61,179
 Avg Session Duration |                 6:07 |                3:39
 “Users”              |                7,905 |              14,764
 Sessions             |               12,227 |              21,810
 Returning visitors   |               60.53% |               14.8%

Y4: 35% of pageviews were on data browser pages
Evidence of continued growth in use year over year
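The year-over-year growth claim can be checked directly from the reported figures. The sketch below computes the percentage change for the three count metrics on this slide; the helper name `pct_change` and the dictionary layout are illustrative only, and the two reporting windows are not exactly the same length, so the results are approximate.

```python
def pct_change(before: float, after: float) -> float:
    """Percentage change from `before` to `after`."""
    return (after - before) / before * 100.0


# (earlier period, later period) as reported on the slide
traffic = {
    "Pageviews": (42_373, 61_179),
    "Users": (7_905, 14_764),
    "Sessions": (12_227, 21_810),
}

growth = {name: pct_change(a, b) for name, (a, b) in traffic.items()}

for name, value in growth.items():
    print(f"{name}: {value:+.1f}%")
```

All three metrics grow, by roughly 44% (pageviews), 87% (users) and 78% (sessions), while average session duration and the share of returning visitors both fall.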

Data Service Enhancements

Challenges of representing diverse research data
Experiment types (partial listing!)
- Genomics: microarray, RNA-seq, ChIP-seq, scRNA-seq, enhancer, whole exome sequencing
- Imaging: microCT, microMR, optical projection tomography, laser capture microdissection, confocal fluorescent microscopy
- Morphology and other: morphometric analysis, facial norms, clinical measures/syndromes
Organisms
- Cross-species: Homo sapiens, Danio rerio, Mus musculus, Pan trog.
Data types (partial listing!)
- Raw data: stranded, paired, single-ended reads; array (cel)
- Processed data: aligned reads, quantification data
- Images: 3D volumes, 2D thumbnails, surface mesh/models
This is only a sampling of the complexity of representing FaceBase research data.

Requirements and tradeoffs in the re-design
- Find cross-cutting data
- Enough detail for re-use
- Simple enough for data entry
- Interoperate with external data standards
- Support automated pipelines and more complex views of data
Key is to find a good tradeoff:
- Few details/simple: easy/trivial to enter, but unable to reuse/automate (w/out human interpretation)
- Highly detailed/complex: supports reuse/automation, but difficult to enter

Significantly improved data representation
- Findability/Accessibility: unified representation of diverse data (-seq, array, imaging, enhancer, morph., …)
- Interoperability: controlled vocabularies (support for concept hierarchies, synonyms, relationships)
- Reusability: machine-readable encodings of information about experimental results
(Diagram: Dataset, Experiment, Biosample, Replicate, Data)
Vocabularies include terms from: Uberon, OCDM (coming soon), NCBI Taxon, HGNC, MGI, OBI, …
Data types include: Raw Seq, Processed, Tracks; Array; Volumes, Surface Mesh, Images; Supplementary (all other)
Critical for supporting the Bioinformatics Pipeline and streamlined curation
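One way to picture the entities named on this slide (Dataset, Experiment, Biosample, Replicate, Data) is as nested records. The sketch below is illustrative only: it assumes the entities nest in the order listed, and every class and field name is hypothetical rather than taken from the actual FaceBase schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataFile:
    filename: str
    data_type: str  # e.g. "Raw Seq", "Tracks", "Surface Mesh"


@dataclass
class Replicate:
    number: int
    files: List[DataFile] = field(default_factory=list)


@dataclass
class Biosample:
    species: str  # would be a controlled-vocabulary term, e.g. from NCBI Taxon
    anatomy: str  # would be a controlled-vocabulary term, e.g. from Uberon
    replicates: List[Replicate] = field(default_factory=list)


@dataclass
class Experiment:
    experiment_type: str  # e.g. "RNA-seq", "microCT"
    biosamples: List[Biosample] = field(default_factory=list)


@dataclass
class Dataset:
    title: str
    experiments: List[Experiment] = field(default_factory=list)


def files_of_type(dataset: Dataset, data_type: str) -> List[DataFile]:
    """Walk the hierarchy and collect every file of the given data type."""
    return [
        f
        for exp in dataset.experiments
        for sample in exp.biosamples
        for rep in sample.replicates
        for f in rep.files
        if f.data_type == data_type
    ]


# Made-up example record, not real FaceBase data.
example = Dataset(
    title="Example dataset",
    experiments=[
        Experiment(
            experiment_type="RNA-seq",
            biosamples=[
                Biosample(
                    species="Mus musculus",
                    anatomy="mandible",
                    replicates=[
                        Replicate(1, [DataFile("r1.fastq.gz", "Raw Seq"),
                                      DataFile("r1.bw", "Tracks")]),
                        Replicate(2, [DataFile("r2.fastq.gz", "Raw Seq")]),
                    ],
                )
            ],
        )
    ],
)
```

Because every file hangs off a typed path of records, a query such as `files_of_type(example, "Raw Seq")` needs no human interpretation of free-text descriptions, which is the kind of machine readability the slide asks for.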

Bioinformatics Pipeline

Developing the new Bioinformatics Pipeline
- Rationale: ensure that sequencing data from different spokes can be compared.
- Solution: establish a common sequencing pipeline (based on ENCODE) and operate it on a cloud-based genome informatics service (DNAnexus).
- Process: Visel’s lab in Berkeley administers the routing of sequencing data from FaceBase to DNAnexus and back.
- To date: preliminary testing is successful. Finalizing appropriate measures for safeguarding human data. Rollout is expected in 3Q 2018.

Data Browser Enhancements

New features in the Data Browser
- Redesigned search and filtering interface
- Image navigation integrated into 3D surface model viewers
- JBrowse gene browser enhancements
- Improved page layout for dataset details
- Performance improvements

Redesigned search and filtering interface
Filter the data through familiar, shopping-cart-like categories and lists whose values can be selected and de-selected to find the specific type of data you’re looking for.
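The select/de-select behaviour described here is commonly called faceted filtering. The following is a minimal sketch of that idea, not the Data Browser's actual implementation; the function name `apply_facets`, the record fields, and the sample records are all hypothetical.

```python
from typing import Dict, List, Set

Record = Dict[str, str]


def apply_facets(records: List[Record], selected: Dict[str, Set[str]]) -> List[Record]:
    """Keep records that match every facet with at least one selected value.

    Values within one facet are OR-ed together; facets are AND-ed.
    A facet with nothing selected places no restriction on the results.
    """
    return [
        record
        for record in records
        if all(
            record.get(facet) in values
            for facet, values in selected.items()
            if values
        )
    ]


# Made-up records for illustration only.
records = [
    {"id": "a", "species": "Mus musculus", "experiment_type": "RNA-seq"},
    {"id": "b", "species": "Mus musculus", "experiment_type": "microCT"},
    {"id": "c", "species": "Danio rerio", "experiment_type": "RNA-seq"},
]

mouse_rnaseq = apply_facets(
    records,
    {"species": {"Mus musculus"}, "experiment_type": {"RNA-seq"}},
)
```

De-selecting a value simply removes it from the corresponding set, so clearing every selection returns the full list again.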

Image Navigation via surface model viewer
- Builds on the surface model viewer we introduced last year.
- Connects anatomical regions to the database.
- Clicking an image of an anatomical region pulls up the list of all datasets with data related to that region.
- Available on ALL FaceBase dataset pages
DEMO IMAGE (shown): https://dev.facebase.org/~mei/mesh-viewer/view.html?model=https://dev.facebase.org/data/mesh/JI296CCMB/3model.json

JBrowse dynamic plugin for FaceBase
- The FaceBase integrated Genome Browser is now available for all relevant datasets and is updated dynamically.
- Updates to the data are immediately displayed.
- Year 5: integrating even further with the database to create customized comparison views across datasets.

The main thing here is that we created a JBrowse FaceBase plugin. Rather than relying on config files, all tracks are pulled dynamically from the database, so changes are reflected immediately. We can continue to enhance the viewer with information in the database, as well as create customized comparison views across data in FaceBase.
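The difference between a static config file and tracks pulled from the database can be sketched as follows. This is not the FaceBase plugin itself (which runs inside JBrowse); it is a Python illustration of the idea, using an in-memory SQLite table as a stand-in for the database. The table layout, the function names, and the example URL are assumptions, and the output keys are only loosely modelled on a JBrowse-style track list.

```python
import sqlite3


def build_track_list(conn: sqlite3.Connection, dataset_id: str) -> dict:
    """Build a track configuration from whatever rows exist right now."""
    rows = conn.execute(
        "SELECT label, url, track_type FROM track "
        "WHERE dataset_id = ? ORDER BY label",
        (dataset_id,),
    ).fetchall()
    return {
        "tracks": [
            {"label": label, "urlTemplate": url, "type": track_type}
            for label, url, track_type in rows
        ]
    }


def demo_connection() -> sqlite3.Connection:
    """Create a throwaway database with one hypothetical track row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE track (dataset_id TEXT, label TEXT, url TEXT, track_type TEXT)"
    )
    conn.execute(
        "INSERT INTO track VALUES (?, ?, ?, ?)",
        ("ds1", "sample_a", "https://example.org/sample_a.bw", "wiggle"),
    )
    return conn
```

Since the configuration is rebuilt on each request, a track row added to the table shows up the next time `build_track_list` is called, with no config file to regenerate.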

Data Submission

Self-serve Data Curation Tools
Rolled out over last summer
- Added “constraints” to ensure data integrity; automatic accession numbering of datasets
- Online data entry forms and file upload fields
- Improved filtering of related data to streamline data entry
- Desktop tools to upload bulk data files from Windows and Mac
- Automatically link data files into dataset details; auto-linking of thumbnails, meshes, etc.
- Command line client for remote servers
User training
- Individual tutorials with spokes
- Curation wiki: https://github.com/informatics-isi-edu/facebase-curation/wiki
Self-curated datasets are rolling out: 66 datasets to date
Self-serve curation is key to scaling FaceBase to support new data submitters
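To illustrate what database “constraints” and automatic accession numbering buy a self-serve submitter, here is a small sketch using SQLite. It is a stand-in, not the FaceBase database: the table layout, the `DS` accession prefix, and the function names are all hypothetical.

```python
import sqlite3


def open_catalog() -> sqlite3.Connection:
    """Create an in-memory catalog with integrity constraints enabled."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
    conn.executescript(
        """
        CREATE TABLE dataset (
            id    INTEGER PRIMARY KEY AUTOINCREMENT,
            title TEXT NOT NULL CHECK (length(title) > 0)
        );
        CREATE TABLE experiment (
            id              INTEGER PRIMARY KEY AUTOINCREMENT,
            dataset_id      INTEGER NOT NULL REFERENCES dataset(id),
            experiment_type TEXT NOT NULL
        );
        """
    )
    return conn


def add_dataset(conn: sqlite3.Connection, title: str) -> str:
    """Insert a dataset and return its automatically assigned accession."""
    cur = conn.execute("INSERT INTO dataset (title) VALUES (?)", (title,))
    return f"DS{cur.lastrowid:06d}"  # hypothetical accession format


def add_experiment(conn: sqlite3.Connection, dataset_id: int, experiment_type: str) -> int:
    """Insert an experiment; fails if the referenced dataset does not exist."""
    cur = conn.execute(
        "INSERT INTO experiment (dataset_id, experiment_type) VALUES (?, ?)",
        (dataset_id, experiment_type),
    )
    return cur.lastrowid
```

With constraints in place, an experiment that points at a non-existent dataset is rejected with `sqlite3.IntegrityError` at entry time instead of being discovered later by a curator.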

Online Data Curation Forms & File Upload
Online Metadata Entry Forms
- Add, edit and delete any metadata entries at any time yourself: datasets, experiments, biosamples, replicates, etc.
Online File Upload Forms
- All (approved) data types supported
- 3D models w/ config. (color, opacity, etc.)
- Tracks (instantly available in browser)
- Processed data, Raw seq files, etc.

Desktop & Command-Line Data Upload Tools
Desktop Client
- Graphical client for Windows & Mac users
Command-line Client
- CLI for uploading directly from a computer cluster or other remote server
Common data file layout supported by desktop and command-line tools

Demonstration
- www.facebase.org
- https://www.facebase.org/~schuler/anav.html
- https://www.facebase.org/image-nav-demo.html

For Year 5...
- Bioinformatics Pipeline: coordinate curation of data and operation of pipeline, full automation
- Vocabulary enhancements: integration with MONDO and other vocabs.; improved semantic search
- Anatomical/visual search/navigation
- Image visualization and display: 3D mesh, imaging results across datasets, control vs mutant
- Bulk download capability
- Dashboards, email notifications and quality control metrics
- JBrowse integration and enhancements, i.e., cross-dataset browsing of genomic data
- FAIR Identifiers and Resolver
- Historical information tracking
- Ongoing usability improvements (new in-depth external user interviews for usability)

Q & A
Let us know your questions, comments, feedback at: help@facebase.org