Presentation is loading. Please wait.

Presentation is loading. Please wait.

1 Benjamin Perry, Venkata Kambhampaty, Kyle Brumsted, Lars Vilhuber, William Block Crowdsourcing DDI Development: New Features from the CED 2 AR Project.

Similar presentations


Presentation on theme: "1 Benjamin Perry, Venkata Kambhampaty, Kyle Brumsted, Lars Vilhuber, William Block Crowdsourcing DDI Development: New Features from the CED 2 AR Project."— Presentation transcript:

1 1 Benjamin Perry, Venkata Kambhampaty, Kyle Brumsted, Lars Vilhuber, William Block Crowdsourcing DDI Development: New Features from the CED 2 AR Project

2 Part of the NSF Census Research Network (NCRN) (Grant #1131848) Lightweight, DDI driven web application Enables search, browsing and editing across codebooks Provides an open API for developers Live example at demo.ncrn.cornell.edudemo.ncrn.cornell.edu What is CED 2 AR? 2

3 Emphasis on collaborative editing (small set of users) – Online editor – Versioned and tracked metadata through Git – Tied into external authentication frameworks EDDI 2014 “Collaborative editing…” 3

4 Support crowdsourced DDI curation through CED 2 AR – Accommodating more users – Allow for application specific customization – Create incentives and guidance for users – Abstract technical barriers Now 4

5 Initial metadata (DDI) has been created and ingested into a CED²AR instance Metadata may be – Incomplete (valid DDI but empty or non-informative fields) – Lacking user feedback (on value or constraints of variables) Assumption: – Archivist is not the only specialist on a particular dataset – Users collectively have information that is not initially included in metadata Starting point here 5

6 1.User searches through CED 2 AR or external search engine 2.User discovers data relevant to their query 3.User can choose to contribute structured or unstructured documentation for datasets – No DDI knowledge required – user documents on fields, without needing to know how that fits into a particular metadata structure – May involve creating links (provenance) to other datasets User Workflow 6

7 1.Search engine optimization enhancements to DDI 2.Exposing community contributions Retaining Users 1.Flexible authentication 2.Easy to use editor 3.Metadata scoring 4.Tracking and identifying community contributions Attracting Users 7

8 8 Search Engine Optimization Expanding the interoperability of DDI

9 Support OpenID and OAuth2 – Currently using Google with OAuth2 – Developing connectors to work with additional providers CED 2 AR handles identity management Authentication 9

10 10 Editing Automatic validation, and editor for rich content

11 11 Editing Allows for ASCII Math

12 12 Editing Growing support for additional DDI fields, exposed or not

13 13 Metadata Scoring Exposing sparse documentation

14 14 User Contributions

15 Uses Git, a distributed version control system Every aspect of the system is configurable – Scheduled tasks check for changes – Once changes exceed threshold, they are pushed – Pending changes are pushed after a time limit or on demand Versioning 15

16 Architecture 16 Master Branch (Official version) User Contributed Branch Codebook 1.0 1. User gets copy of DDI to edit

17 Architecture 17 Master Branch (Official version) User Contributed Branch Codebook 1.0 Codebook 1.0 rev 1 Codebook 1.0 rev 2 Codebook 1.0 rev N Codebook 1.0 … 1. User gets copy of DDI to edit 2. Each edit is versioned

18 Architecture 18 Master Branch (Official version) User Contributed Branch Codebook 1.0 Codebook 1.0 rev 1 Codebook 1.0 rev 2 Codebook 1.0 rev N Codebook 1.0 Codebook 1.1 … 1. User gets copy of DDI to edit 3. Data provider merges user’s edits back into official DDI 2. Each edit is versioned

19 Architecture

20 Web Application Server Local Repository 20 Database Remote Repository

21 Architecture Server 21 Remote Repository CED 2 AR Instance

22 Architecture 22 Remote Repository CED 2 AR Instance

23 Architecture 23 Remote Repository CED 2 AR Instance (Official)

24 Our implementation uses Bitbucket Commit messages describe changes Users linked by email address Commit hashes are stored on CED 2 AR Remote synchronization is optional Remote Location 24

25 Remote Location 25

26 26 Tracking Changes

27 Continued Work: Improving merge control 27 Master Branch (Official version) User Contributed Branch Codebook 1.0 Codebook 1.0 rev 1 Codebook 1.0 rev 2 Codebook 1.0 rev N Codebook 1.0 Codebook 1.1 … 1. User gets copy of DDI to edit 3. Data provider merges user’s edits back into official DDI 2. Each edit is versioned

28 28 Workflow as described assumes metadata curator merges information Within the limits of a 24-hour day: what’s the likelihood that that process scales? Alternate: “wiki” methodology Continued Work: The uncontrolled merge

29 Architecture (alternate) Master Branch (Official version) Codebook 1.0 Wiki Branch (Community version)

30 Architecture (alternate) Master Branch (Official version) Codebook 1.0 Wiki Branch (Community version) 1 User Branches 2 Users pull from wiki branch into any instance of CED 2 AR

31 Architecture (alternate) Master Branch (Official version) Codebook 1.0 Wiki Branch (Community version) 1 User Branches 2 Codebook rev 1 Users push back to branch manually

32 Architecture (alternate) Master Branch (Official version) Codebook 1.0 Wiki Branch (Community version) 1 User Branches 2 3 Codebook rev 1 New users work off most recent revision by default

33 Architecture (alternate) Master Branch (Official version) Codebook 1.0 Wiki Branch (Community version) 1 User Branches 2 3 Codebook rev 1 Codebook rev X …

34 Architecture (alternate) Master Branch (Official version) Codebook 1.0 Wiki Branch (Community version) 1 User Branches 2 3 Codebook rev 1 Codebook rev X …

35 Architecture (alternate) Master Branch (Official version) Codebook 1.0 Wiki Branch (Community version) 1 User Branches 2 3 Codebook rev 1 Codebook rev X … User is responsible for merging

36 Architecture (alternate) 36 Master Branch (Official version) Codebook 1.0 Codebook rev X Wiki Branch (Crowdsource version) CED²AR User Interface exposes both versions (with attribution)

37 Merging crowd-sourced content back into official documentation Continued Work: Improving merge control 37

38 Thank you! Questions? ced2ar-devs-l@cornell.edu ncrn.cornell.edu github.com/ncrncornell 38

39 39 Extra slides

40 Tagging variables with a controlled vocabulary and a folksonomy Continued Work: Facilitating Editing 40

41 Ingest Workflow 41


Download ppt "1 Benjamin Perry, Venkata Kambhampaty, Kyle Brumsted, Lars Vilhuber, William Block Crowdsourcing DDI Development: New Features from the CED 2 AR Project."

Similar presentations


Ads by Google