1 Benjamin Perry, Venkata Kambhampaty, Kyle Brumsted, Lars Vilhuber, William Block Crowdsourcing DDI Development: New Features from the CED 2 AR Project.

Slides:



Advertisements
Similar presentations
Duke Enterprise CMS CGS Meeting 5/7/2004 Cheryl Crupi Senior Manager, Duke OIT Office of Web Services.
Advertisements

DIGIDOC A web based tool to Manage Documents. System Overview DigiDoc is a web-based customizable, integrated solution for Business Process Management.
Copyright Hub Software Engineering Ltd 2010All rights reserved Hub Document Manager Product Overview.
Unveiling ProjectWise V8 XM Edition. ProjectWise V8 XM Edition An integrated system of collaboration servers that enable your AEC project teams, your.
Multi-Mode Survey Management An Approach to Addressing its Challenges
Transformations at GPO: An Update on the Government Printing Office's Future Digital System George Barnum Coalition for Networked Information December.
Introduction KWizCom Business Card Founded in 2005 Headquartered in Toronto Global provider of add-ons and services customers worldwide Business.
T Sponsors Sameer Chabungbam Principal Program Manager, Microsoft Connector API Apps BizTalk Summit 2015 – London ExCeL London | April 13th & 14th.
EdShare: next steps for sharing in learning & teaching Dr. Jessie Hey University of Southampton, Education Development Network Group Thursday 8 May, 2008.
Jiten Bhagat University of myExperiment A Social VRE for Research Objects JISC Roadshow | February.
Tools and Services for the Long Term Preservation and Access of Digital Archives Joseph JaJa, Mike Smorul, and Sangchul Song Institute for Advanced Computer.
Collaborative Information Systems for Student Projects Chapter Extension 2.
© 2012, Cognizant Technology Solutions Monalisa Sen STC India Chapter, 2012 Social Documentation-The Era of Interaction.
Libraries and Institutional Content Management Systems
Definitions Collaboration – working together on team projects and sharing information, often through ad-hoc processes, to accomplish project goals. Document.
Data-PASS Shared Catalog Micah Altman & Jonathan Crabtree 1 Micah Altman Harvard University Archival Director, Henry A. Murray Research Archive Associate.
Shop 9000 Product Review 2007 Visual User Group Nov 21 st 2007.
Digital Asset Management for All? Visualising a Flexible DAMS Solution for Small and Medium Scale Institutions Paul Bevan Llyfrgell Genedlaethol Cymru.
STORY TITLE 1 1 1H ‘08 Lotus Quickr 8.1 Lotus Quickr Lotus Quickr 8.2 Lotus Quickr Lotus Quickr Entry 2. UI enhancements 3. System administration.
Git for Version Control These slides are heavily based on slides created by Ruth Anderson for CSE 390a. Thanks, Ruth! images taken from
Sharepoint Makes daily tasks more efficient and improves internal as well as external collaboration Not just cost savings, but adds business value.
Web 2.0 for Government Knowledge Management Everyone benefits by sharing knowledge March 24, 2010 Emerging Technologies Work Group Rich Zaziski, CEO FYI.
ORCID Technical Report May 18, Development Approach 2 Alpha Completed Spring 2010 Self-claim oriented Limited light integration with a few participant.
Back to content Final Presentation Mr. Phay Sok Thea, class “2B”, group 3, Networking Topic: Mail Client “Outlook Express” *At the end of the presentation.
© 2013 National Ecological Observatory Network, Inc. ALL RIGHTS RESERVED. THE NEON APPROACH TO DATA INGEST, CURATION, AND SHARING Christine Laney (Data.
Cube Enterprise Database Solution presented to MTF GIS Committee presented by Minhua Wang Citilabs, Inc. November 20, 2008.
Building Search Portals With SP2013 Search. 2 SharePoint 2013 Search  Introduction  Changes in the Architecture  Result Sources  Query Rules/Result.
1 Introductory Notes on the Git Source Control Management Ric Holt, 8 Oct 2009.
LinkWare LinkWare is a web-enabled, open platform for generation and distribution of electronic technical documentation and e–catalogues. The LinkWare.
Indo-US Workshop, June23-25, 2003 Building Digital Libraries for Communities using Kepler Framework M. Zubair Old Dominion University.
WCM Platform Improvements ECM and Enterprise Metadata Advanced Routing and Document Sets In Place Records Management.
Version control Using Git Version control, using Git1.
© 2008 IBM Corporation ® IBM Cognos Business Viewpoint Miguel Garcia - Solutions Architect.
ITEC 370 Lecture 16 Implementation. Review Questions? Design document on F, feedback tomorrow Midterm on F Implementation –Management (MMM) –Team roles.
…using Git/Tortoise Git
Information Systems and Network Engineering Laboratory II DR. KEN COSH WEEK 1.
PLoS ONE Application Journal Publishing System (JPS) First application built on Topaz application framework Web 2.0 –Uses a template engine to display.
Collaborative Modeling Best Practices for Distributed Teams Ben Constable Chief Operations Officer Sparx Systems CIM Users Group Meeting,
Introducing HingX now with Capacity Development Network.
0 SharePoint Search 2013 Rafael de la Cruz SharePoint Developer Seneca Resources twitter.com/delacruz_rafael
10/24/09CK The Open Ontology Repository Initiative: Requirements and Research Challenges Ken Baclawski Todd Schneider.
1 GIT NOUN \’GIT\ A DISTRIBUTED REVISION CONTROL AND SOURCE CODE MANAGEMENT (SCM) SYSTEM WITH AN EMPHASIS ON SPEED. INITIALLY DESIGNED AND DEVELOPED BY.
Virtual techdays INDIA │ august 2010 ENTERPRISE CONTENT MANAGEMENT WITH SHAREPOINT 2010 Naresh K Satapathy │ Solution Specialist, Microsoft Corporation.
Building a Topic Map Repository Xia Lin Drexel University Philadelphia, PA Jian Qin Syracuse University Syracuse, NY * Presented at Knowledge Technologies.
Any data..! Any where..! Any time..! Linking Process and Content in a Distributed Spatial Production System Pierre Lafond HydraSpace Solutions Inc
ArcGIS Editor for OpenStreetMap: Contributing Data Christine White.
Module 9 User Profiles and Social Networking. Module Overview Configuring User Profiles Implementing SharePoint 2010 Social Networking Features.
The library is open Digital Assets Management & Institutional Repository Russian-IUG November 2015 Tomsk, Russia Nabil Saadallah Manager Business.
An Early Prototype of the Comprehensive Extensible Data Documentation and Access Repository (CED 2 AR) William C. Block and Jeremy Williams, 1 John Abowd.
Improving User Access to Metadata for Public and Restricted Use US Federal Statistical Files William C. Block Jeremy Williams Lars Vilhuber Carl Lagoze.
Introduction to Git Yonglei Tao GVSU. Version Control Systems  Also known as Source Code Management systems  Increase your productivity by allowing.
1 Crowdsourcing Metadata – Challenges and Outlook Lars Vilhuber, William Block (Cornell University) Washington, 10 May 2016.
Information Systems and Network Engineering Laboratory I DR. KEN COSH WEEK 1.
Virtual Lab Overview 5/21/2015 xxxxxxxxxx NWS/MDL/CIRA.
Git A distributed version control system Powerpoint credited to University of PA And modified by Pepper 28-Jun-16.
Can <oXygen/> XML Web Author work with my documents?
Integrating ArcSight with Enterprise Ticketing Systems
GeoNetwork OpenSource: Geographic data sharing for everyone
Information Systems and Network Engineering Laboratory II
DSpace standard Data model and DSpace-CRIS
An Overview of Data-PASS Shared Catalog
Version control, using Git
Joseph JaJa, Mike Smorul, and Sangchul Song
What’s New in Colectica 5.3 Part 1
Getting Started with Git and Bitbucket
Question Banks, Reusability, and DDI 3.2 (Use Parameters)
Agro Hackathon Hack 5: Agro Portal and VEST Registry
Managing Private and Public Views of DDI Metadata Repositories
Final Review 27th March Final Review 27th March 2019.
Presentation transcript:

1 Benjamin Perry, Venkata Kambhampaty, Kyle Brumsted, Lars Vilhuber, William Block Crowdsourcing DDI Development: New Features from the CED 2 AR Project

Part of the NSF Census Research Network (NCRN) (Grant # ) Lightweight, DDI driven web application Enables search, browsing and editing across codebooks Provides an open API for developers Live example at demo.ncrn.cornell.edudemo.ncrn.cornell.edu What is CED 2 AR? 2

Emphasis on collaborative editing (small set of users) – Online editor – Versioned and tracked metadata through Git – Tied into external authentication frameworks EDDI 2014 “Collaborative editing…” 3

Support crowdsourced DDI curation through CED 2 AR – Accommodating more users – Allow for application specific customization – Create incentives and guidance for users – Abstract technical barriers Now 4

Initial metadata (DDI) has been created and ingested into a CED²AR instance Metadata may be – Incomplete (valid DDI but empty or non-informative fields) – Lacking user feedback (on value or constraints of variables) Assumption: – Archivist is not the only specialist on a particular dataset – Users collectively have information that is not initially included in metadata Starting point here 5

1.User searches through CED 2 AR or external search engine 2.User discovers data relevant to their query 3.User can choose to contribute structured or unstructured documentation for datasets – No DDI knowledge required – user documents on fields, without needing to know how that fits into a particular metadata structure – May involve creating links (provenance) to other datasets User Workflow 6

1.Search engine optimization enhancements to DDI 2.Exposing community contributions Retaining Users 1.Flexible authentication 2.Easy to use editor 3.Metadata scoring 4.Tracking and identifying community contributions Attracting Users 7

8 Search Engine Optimization Expanding the interoperability of DDI

Support OpenID and OAuth2 – Currently using Google with OAuth2 – Developing connectors to work with additional providers CED 2 AR handles identity management Authentication 9

10 Editing Automatic validation, and editor for rich content

11 Editing Allows for ASCII Math

12 Editing Growing support for additional DDI fields, exposed or not

13 Metadata Scoring Exposing sparse documentation

14 User Contributions

Uses Git, a distributed version control system Every aspect of the system is configurable – Scheduled tasks check for changes – Once changes exceed threshold, they are pushed – Pending changes are pushed after a time limit or on demand Versioning 15

Architecture 16 Master Branch (Official version) User Contributed Branch Codebook User gets copy of DDI to edit

Architecture 17 Master Branch (Official version) User Contributed Branch Codebook 1.0 Codebook 1.0 rev 1 Codebook 1.0 rev 2 Codebook 1.0 rev N Codebook 1.0 … 1. User gets copy of DDI to edit 2. Each edit is versioned

Architecture 18 Master Branch (Official version) User Contributed Branch Codebook 1.0 Codebook 1.0 rev 1 Codebook 1.0 rev 2 Codebook 1.0 rev N Codebook 1.0 Codebook 1.1 … 1. User gets copy of DDI to edit 3. Data provider merges user’s edits back into official DDI 2. Each edit is versioned

Architecture

Web Application Server Local Repository 20 Database Remote Repository

Architecture Server 21 Remote Repository CED 2 AR Instance

Architecture 22 Remote Repository CED 2 AR Instance

Architecture 23 Remote Repository CED 2 AR Instance (Official)

Our implementation uses Bitbucket Commit messages describe changes Users linked by address Commit hashes are stored on CED 2 AR Remote synchronization is optional Remote Location 24

Remote Location 25

26 Tracking Changes

Continued Work: Improving merge control 27 Master Branch (Official version) User Contributed Branch Codebook 1.0 Codebook 1.0 rev 1 Codebook 1.0 rev 2 Codebook 1.0 rev N Codebook 1.0 Codebook 1.1 … 1. User gets copy of DDI to edit 3. Data provider merges user’s edits back into official DDI 2. Each edit is versioned

28 Workflow as described assumes metadata curator merges information Within the limits of a 24-hour day: what’s the likelihood that that process scales? Alternate: “wiki” methodology Continued Work: The uncontrolled merge

Architecture (alternate) Master Branch (Official version) Codebook 1.0 Wiki Branch (Community version)

Architecture (alternate) Master Branch (Official version) Codebook 1.0 Wiki Branch (Community version) 1 User Branches 2 Users pull from wiki branch into any instance of CED 2 AR

Architecture (alternate) Master Branch (Official version) Codebook 1.0 Wiki Branch (Community version) 1 User Branches 2 Codebook rev 1 Users push back to branch manually

Architecture (alternate) Master Branch (Official version) Codebook 1.0 Wiki Branch (Community version) 1 User Branches 2 3 Codebook rev 1 New users work off most recent revision by default

Architecture (alternate) Master Branch (Official version) Codebook 1.0 Wiki Branch (Community version) 1 User Branches 2 3 Codebook rev 1 Codebook rev X …

Architecture (alternate) Master Branch (Official version) Codebook 1.0 Wiki Branch (Community version) 1 User Branches 2 3 Codebook rev 1 Codebook rev X …

Architecture (alternate) Master Branch (Official version) Codebook 1.0 Wiki Branch (Community version) 1 User Branches 2 3 Codebook rev 1 Codebook rev X … User is responsible for merging

Architecture (alternate) 36 Master Branch (Official version) Codebook 1.0 Codebook rev X Wiki Branch (Crowdsource version) CED²AR User Interface exposes both versions (with attribution)

Merging crowd-sourced content back into official documentation Continued Work: Improving merge control 37

Thank you! Questions? ncrn.cornell.edu github.com/ncrncornell 38

39 Extra slides

Tagging variables with a controlled vocabulary and a folksonomy Continued Work: Facilitating Editing 40

Ingest Workflow 41