Mock-ups for Discussing the CMS Administrator Interface
CMS: COMET Management System
  Regular login for users with an account
  Request an account
    Ask for info
    Authorize (by email)
      Send email to the appointed super-admin
      Super-admin can click OK or notOK
    OR Authorize (by PIN)
    On authorization:
      Add user
      Send email to requestor (if authorized by email)
    If not authorized, send email to requestor
  Account request form:
    UserName:
    Full Name:
    Email:
    Password:
    Confirm Password:
    PIN:
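The account-request rules can be captured in a small decision function. The sketch below is illustrative only. `AccountRequest`, `passwords_match`, `authorize_by_pin`, and `decide` are hypothetical names, and it assumes that the administrators issue a single PIN to check against, because the mock-up does not say where the PIN comes from.

```python
import hmac
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AccountRequest:
    """Fields from the account request form (hypothetical attribute names)."""
    username: str
    full_name: str
    email: str
    password: str
    confirm_password: str
    pin: Optional[str] = None  # left empty when the requestor asks for email authorization


def passwords_match(request: AccountRequest) -> bool:
    """The form's Password and Confirm Password entries must agree."""
    return request.password == request.confirm_password


def authorize_by_pin(request: AccountRequest, expected_pin: str) -> bool:
    """Check the entered PIN in constant time (assumes one admin-issued PIN)."""
    if not request.pin:
        return False
    return hmac.compare_digest(request.pin.encode(), expected_pin.encode())


def decide(request: AccountRequest, authorized: bool, by_email: bool) -> List[str]:
    """List the actions the CMS takes once a request is authorized or refused."""
    if not authorized:
        return [f"email {request.email}: request not approved"]
    actions = [f"add user {request.username}"]
    if by_email:
        # Per the slide, only email-authorized requestors get an approval email.
        actions.append(f"email {request.email}: account approved")
    return actions
```

On the email path, `authorized` would come from the super-admin's OK/notOK click. On the PIN path, it would come from `authorize_by_pin`.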
CMS: COMET Management System
  Click a menu item to perform that task.
  Clicking “Configure Extraction Tools” displays the tools again; click a tool to configure it.
  Menu:
    Import Book
    Configure Extraction Tools
      FROntIER (as is: all setup work done outside CMS, but testing done inside CMS)
      GreenDDA
      GreenFIE
      ListReader
      OntoES (OntoES + snippet facilities for creating hand-generated rules)
      OntoSoar
    Initiate Ensemble Run
    Manage Jobs and Batches
    Edit User Account (only for super-admins)
    Exit
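The menu, including its one role restriction, can be modelled as plain data. The following is a minimal sketch. `MENU`, `SUBMENUS`, `RESTRICTED`, and `visible_menu` are hypothetical names. It assumes that every entry except Edit User Account is shown to all logged-in users, since the slide restricts only that one entry, and it uses the user_type code "S" (super-administrator) from the schema slide.

```python
from typing import Dict, List, Set

# Main-menu entries, in the order shown on the slide.
MENU: List[str] = [
    "Import Book",
    "Configure Extraction Tools",
    "Initiate Ensemble Run",
    "Manage Jobs and Batches",
    "Edit User Account",
    "Exit",
]

# Clicking an entry listed here redisplays its sub-items for a second click.
SUBMENUS: Dict[str, List[str]] = {
    "Configure Extraction Tools": [
        "FROntIER", "GreenDDA", "GreenFIE", "ListReader", "OntoES", "OntoSoar",
    ],
}

# Entries limited to certain user types; "S" is the super-administrator code.
RESTRICTED: Dict[str, Set[str]] = {"Edit User Account": {"S"}}


def visible_menu(user_type: str) -> List[str]:
    """Menu entries shown to a user of the given type (unrestricted entries shown to all)."""
    return [item for item in MENU
            if item not in RESTRICTED or user_type in RESTRICTED[item]]
```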
Database Schema
  user (id, username, email, password, name, type)
  job (id, name, book, max_page)
  batch (id, pages, status, job_id, user_id)
  form (id, form_name, version)
  eventlog (id, user_id, batch_id, session_id, log-info, time_written)
  status_code (status, next_status, description)
  user_type (type, description)

  user_type codes:
    P: patron
    A: adjudicator-patron
    M: administrator
    S: super-administrator

  status_code codes:
    N: not assigned or not submitted
    S: submitted (initial patron submission)
    D: done (sent downstream to Gedcomx)
    Q: quality recheck by patron
    R: resubmitted (patron resubmission)
    A: adjudicator re-check
    C: checked by adjudicator
    T: testing (under control of CMS admin)
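The tables can be tried out directly in SQLite. This is a sketch only. The column types, keys, and the helper `create_db` are assumptions, because the slide lists column names only. Note that the letters S and A mean different things in user_type and status_code, which is why the two code tables are kept separate. `user` and `log-info` are quoted so that the slide's names can be kept as they are.

```python
import sqlite3

# Table layouts from the slide; column types and keys are assumptions.
SCHEMA = """
CREATE TABLE user_type (type TEXT PRIMARY KEY, description TEXT);
CREATE TABLE status_code (status TEXT, next_status TEXT, description TEXT);
CREATE TABLE "user" (
    id INTEGER PRIMARY KEY, username TEXT UNIQUE, email TEXT,
    password TEXT, name TEXT, type TEXT REFERENCES user_type(type));
CREATE TABLE job (id INTEGER PRIMARY KEY, name TEXT, book TEXT, max_page INTEGER);
CREATE TABLE batch (
    id INTEGER PRIMARY KEY, pages TEXT, status TEXT,
    job_id INTEGER REFERENCES job(id), user_id INTEGER REFERENCES "user"(id));
CREATE TABLE form (id INTEGER PRIMARY KEY, form_name TEXT, version TEXT);
CREATE TABLE eventlog (
    id INTEGER PRIMARY KEY, user_id INTEGER REFERENCES "user"(id),
    batch_id INTEGER REFERENCES batch(id), session_id TEXT,
    "log-info" TEXT, time_written TEXT);
"""

USER_TYPES = [("P", "patron"), ("A", "adjudicator-patron"),
              ("M", "administrator"), ("S", "super-administrator")]

# next_status is left NULL: the slide does not list the allowed transitions.
STATUSES = [
    ("N", "not assigned or not submitted"),
    ("S", "submitted (initial patron submission)"),
    ("D", "done (sent downstream to Gedcomx)"),
    ("Q", "quality recheck by patron"),
    ("R", "resubmitted (patron resubmission)"),
    ("A", "adjudicator re-check"),
    ("C", "checked by adjudicator"),
    ("T", "testing (under control of CMS admin)"),
]


def create_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create the tables and load the two code tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO user_type VALUES (?, ?)", USER_TYPES)
    conn.executemany(
        "INSERT INTO status_code (status, description) VALUES (?, ?)", STATUSES)
    conn.commit()
    return conn
```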
Workflow Pipeline
  PCF = Person, Couple, Family directories
  PRF = Precision, Recall, F-score

  1.pages
  2.tools <toolName> (PCF: extracted)
    2.1.tool-ontology-extracted
    2.2.ontology-extracted
    2.3.extracted-text-cleaned
  3.json-from-osmx (PCF: osmx & json)
    3.1.ontology-merged
    3.2.value-cleaned
    3.3.date-value-parsed
    3.4.constraint-checked
    3.5.violation-corrected
  4.json-working (PCF: json)
  5.json-final
  6.osmx-merged (PCF: osmx & json; ground-truth PRF-report)
    6.1.ontology-merged
    6.2.value-cleaned
    6.3.date-value-parsed
    6.4.constraint-checked
    6.5.authority-checked
  7.osmx-enhanced
    7.1.target-ontology-generated
    7.2.value-standardized
    7.3.information-inferred
  8.gedcomx
    8.1.gedcomx-generated
    8.2.reports-generated

  Other diagram labels (placement not recoverable from the extracted text): CMS: setup & test; COMET bypass; batch-status labels "Q, A, T", "D, C", "Q, A, T", "N, T", "S, R, C, T".
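One way to materialize the pipeline for a book is to generate its directory tree from a table of stages. This sketch makes several assumptions. It assumes that each numbered sub-step is a subdirectory of its stage, that 2.tools holds one subtree per <toolName>, and that the stages annotated "PCF" get Person/Couple/Family subdirectories (the Person/... report paths on the Generate Ground Truth slide suggest this layout). `plan_layout` and `create_layout` are hypothetical names.

```python
from pathlib import Path
from typing import Dict, List

PCF = ["Person", "Couple", "Family"]

# Stage directories and their numbered sub-steps, as listed on the slide.
STAGES: Dict[str, List[str]] = {
    "1.pages": [],
    "2.tools": ["2.1.tool-ontology-extracted", "2.2.ontology-extracted",
                "2.3.extracted-text-cleaned"],
    "3.json-from-osmx": ["3.1.ontology-merged", "3.2.value-cleaned",
                         "3.3.date-value-parsed", "3.4.constraint-checked",
                         "3.5.violation-corrected"],
    "4.json-working": [],
    "5.json-final": [],
    "6.osmx-merged": ["6.1.ontology-merged", "6.2.value-cleaned",
                      "6.3.date-value-parsed", "6.4.constraint-checked",
                      "6.5.authority-checked"],
    "7.osmx-enhanced": ["7.1.target-ontology-generated", "7.2.value-standardized",
                        "7.3.information-inferred"],
    "8.gedcomx": ["8.1.gedcomx-generated", "8.2.reports-generated"],
}

# Stages annotated with "PCF" on the slide.
PCF_STAGES = {"2.tools", "3.json-from-osmx", "4.json-working", "6.osmx-merged"}


def plan_layout(book_root: Path, tools: List[str]) -> List[Path]:
    """List the directories for one book; 2.tools gets one subtree per tool."""
    paths: List[Path] = []
    for stage, steps in STAGES.items():
        if stage == "2.tools":
            bases = [book_root / stage / tool for tool in tools]
        else:
            bases = [book_root / stage]
        for base in bases:
            paths.append(base)
            paths.extend(base / step for step in steps)
            if stage in PCF_STAGES:
                paths.extend(base / group for group in PCF)
    return paths


def create_layout(book_root: Path, tools: List[str]) -> None:
    """Create every planned directory (intermediate directories included)."""
    for path in plan_layout(book_root, tools):
        path.mkdir(parents=True, exist_ok=True)
```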
Import Book
  Browse file of available books
    Includes browse pages
    Future organization: books may be categorized, ordered, …
  Select book from file for import
  Add meta.xml text
    shortTitle
    biblio
    url (on dithers): auto-assigned
    Filename (in file of available books): auto-assigned
  Auto-assign id (six digits + shortTitle)
  Initialize book authority files (NameAuthority.txt, PlaceAuthority.txt)
  Initiate PDF-indexer
    Split pages
    Generate 1.pages files for each page: .html, .pdf, .png, .txt, .xml
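The auto-assigned id and the per-page files can be illustrated with a few helpers. These are hypothetical (`book_id`, `page_files`, `init_authority_files`), and the sketch rests on several assumptions. The id is taken to be the six digits followed directly by shortTitle with no separator. Pages are assumed to be numbered from 1 with three-digit names, matching the 030/031 examples on the test slides. The authority files are assumed to sit in the book's top directory.

```python
from pathlib import Path
from typing import List

PAGE_EXTENSIONS = [".html", ".pdf", ".png", ".txt", ".xml"]
AUTHORITY_FILES = ["NameAuthority.txt", "PlaceAuthority.txt"]


def book_id(number: int, short_title: str) -> str:
    """Six-digit number followed directly by the shortTitle (no separator assumed)."""
    if not 0 <= number <= 999_999:
        raise ValueError("book number must fit in six digits")
    return f"{number:06d}{short_title}"


def page_files(book_dir: Path, page_count: int) -> List[Path]:
    """One file per extension for every page in 1.pages (three-digit names assumed)."""
    return [book_dir / "1.pages" / f"{page:03d}{ext}"
            for page in range(1, page_count + 1)
            for ext in PAGE_EXTENSIONS]


def init_authority_files(book_dir: Path) -> None:
    """Create empty authority files, leaving any existing content untouched."""
    book_dir.mkdir(parents=True, exist_ok=True)
    for name in AUTHORITY_FILES:
        (book_dir / name).touch(exist_ok=True)
```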
For all tools: Test tool on selected page(s)
  User-selected list or range of test pages
  Generate filled-in forms (in quality-check mode)
  Allow filled-in form(s) to be ground-truthed
  Produce PRF reports for ground-truthed form(s)
  Test tool in ensemble with other tools
  Results of a test run
    For each page/form, the data and form json
    Regular COMET display of first page and filled-in form
    Ground-truth reports
  Differences in COMET display of auxiliary information
    No “Annotation Actions” or click-for-instructions link
    Batch sequence follows the list or range of test pages
    “Generate Ground Truth” instead of “Submit Batch”
    A list of links to any and all ground-truth reports
  Test panel:
    Test <list, e.g. 031,032,099> or <range, e.g. 030-031>
    Include Tools: [ ] FROntIER, [ ] GreenDDA, [ ] GreenFIE, [ ] ListReader, [ ] OntoES++, [ ] OntoSoar
    The box for the tool being worked on is always checked and cannot be unchecked.
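The test panel takes either a page list or a page range, and it always includes the tool being worked on. The sketch below assumes three-digit page names (as in the examples) and does not accept a mix of lists and ranges, which the slide does not show. `parse_pages` and `tools_to_run` are hypothetical names.

```python
from typing import Iterable, List

TOOLS = ["FROntIER", "GreenDDA", "GreenFIE", "ListReader", "OntoES++", "OntoSoar"]


def parse_pages(spec: str) -> List[str]:
    """Expand a list ("031,032,099") or a range ("030-031") into page names."""
    spec = spec.strip()
    if "-" in spec and "," not in spec:
        first, last = (int(part) for part in spec.split("-", 1))
        if first > last:
            raise ValueError(f"empty page range: {spec}")
        return [f"{page:03d}" for page in range(first, last + 1)]
    return [f"{int(part):03d}" for part in spec.split(",") if part.strip()]


def tools_to_run(current_tool: str, checked: Iterable[str]) -> List[str]:
    """The tool being worked on is always included, whatever the check boxes say."""
    selected = set(checked) | {current_tool}
    return [tool for tool in TOOLS if tool in selected]
```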
For all tools: Generate Ground Truth (report links for range 030-031)
  range.030-031.PRFreport-soft.txt
  Person/range.030-031.PRFreport-soft.txt
  Person/030.PRFreport-soft.txt
  Person/031.PRFreport-soft.txt
  Couple/range.030-031.PRFreport-soft.txt
  Couple/030.PRFreport-soft.txt
  Couple/031.PRFreport-soft.txt
  Family/range.030-031.PRFreport-soft.txt
  Family/030.PRFreport-soft.txt
  Family/031.PRFreport-soft.txt
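Each report holds precision, recall, and F-score, where P = TP/(TP+FP), R = TP/(TP+FN), and F = 2PR/(P+R). The file names follow the pattern in the list of report links. In the sketch below, the slide does not define "soft", so it appears only as a `kind` parameter. The naming for a page list (as opposed to a range) is not shown, so the label is passed in as given. `prf` and `report_names` are hypothetical names.

```python
from typing import List, Tuple

PCF = ["Person", "Couple", "Family"]


def prf(true_pos: int, false_pos: int, false_neg: int) -> Tuple[float, float, float]:
    """Precision, recall, and F-score (harmonic mean of precision and recall)."""
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return precision, recall, f_score


def report_names(label: str, pages: List[str], kind: str = "soft") -> List[str]:
    """Report file names in the order shown on the slide."""
    names = [f"{label}.PRFreport-{kind}.txt"]
    for group in PCF:
        names.append(f"{group}/{label}.PRFreport-{kind}.txt")
        names.extend(f"{group}/{page}.PRFreport-{kind}.txt" for page in pages)
    return names
```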
FROntIER
  Use the FROntIER interface, as is, to create an extraction ontology
  Use the CMS testing interface to test the created extraction ontology
    Run test (as can be done for all tools)
    Set ground truth (as possible for all tools)
  Future: add the CMS regex-checker tool
GreenDDA
  Select training set
  Train the ML tool on the training set
  Select book
  Apply the trained ML tool to the book
GreenFIE
  Regular GreenFIE interface in experimental mode
    Ability to run other tools to initialize the GreenFIE interface
    Ability to selectively save/retract generated rules
  Changes to COMET interface
    Add GreenFIE “Generate Rule” button
    Replace everything below the form with a “Save” button that saves all rules, plus:
      Initialize-from-tools request
      List of generated rules with check boxes
    Make the page number a type-in box
  [Mock-up: GreenFIE screens with Regex fields]
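The check-box list of generated rules amounts to a pending list plus a saved list. This is a minimal sketch. It assumes that Save keeps the checked rules and retracts (discards) the unchecked ones, and that a rule is a field label plus a regex. `GeneratedRule` and `RuleList` are hypothetical names.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class GeneratedRule:
    label: str            # form field the rule fills (hypothetical)
    pattern: str          # regex produced by "Generate Rule"
    checked: bool = True  # state of the rule's check box


@dataclass
class RuleList:
    pending: List[GeneratedRule] = field(default_factory=list)
    saved: List[GeneratedRule] = field(default_factory=list)

    def add(self, rule: GeneratedRule) -> None:
        re.compile(rule.pattern)  # reject malformed regexes early (raises re.error)
        self.pending.append(rule)

    def toggle(self, index: int) -> None:
        self.pending[index].checked = not self.pending[index].checked

    def save(self) -> int:
        """The Save button keeps checked rules and retracts unchecked ones."""
        kept = [rule for rule in self.pending if rule.checked]
        self.saved.extend(kept)
        self.pending.clear()
        return len(kept)
```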
ListReader
  Select book and run ListReader (future: also set text abstraction parameters)
  One record from all three forms on the left-hand side
  Highlights on the right-hand side as generated by ListReader
  Control buttons
    Next (tell ListReader to save the current labeling and ask for the next)
    Stop (stop the labeling cycle)
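The Next/Stop buttons drive a simple loop over the labelings that ListReader proposes. In this sketch, the proposals and the button clicks are stand-ins supplied by the caller. It assumes that Stop discards the labeling currently on screen, which the slide does not state. `labeling_cycle` is a hypothetical name.

```python
from typing import Callable, Iterable, List


def labeling_cycle(proposals: Iterable[dict],
                   get_action: Callable[[dict], str]) -> List[dict]:
    """Save each proposed labeling on "next"; end the cycle on "stop"."""
    saved: List[dict] = []
    for labeling in proposals:
        action = get_action(labeling)  # which control button the user pressed
        if action == "stop":
            break  # assumption: the labeling on screen is not saved
        if action != "next":
            raise ValueError(f"unknown action: {action}")
        saved.append(labeling)
    return saved
```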
OntoES
  Generate extraction rules
    FROntIER-like rule-creation facilities
    Re-organized for better UX
    Built-in regex checker
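A built-in regex checker could show a rule author which sample strings a pattern accepts, and what a set of hand-generated rules extracts from a snippet. The sketch below treats a rule as a label mapped to a regex, which is an assumption about OntoES rules. `check_regex`, `apply_rules`, and `RuleHit` are hypothetical names, and the code uses only Python's standard `re` module.

```python
import re
from typing import Dict, List, NamedTuple


class RuleHit(NamedTuple):
    label: str
    text: str
    start: int
    end: int


def check_regex(pattern: str, samples: List[str]) -> List[bool]:
    """Report which sample strings the pattern matches in full."""
    compiled = re.compile(pattern)  # raises re.error on a malformed pattern
    return [compiled.fullmatch(sample) is not None for sample in samples]


def apply_rules(rules: Dict[str, str], text: str) -> List[RuleHit]:
    """Apply label -> regex rules to a text snippet; hits sorted by position."""
    hits = [RuleHit(label, m.group(0), m.start(), m.end())
            for label, pattern in rules.items()
            for m in re.finditer(pattern, text)]
    return sorted(hits, key=lambda hit: hit.start)
```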
OntoSoar
Initiate Ensemble Run
  Choose which extraction tools to use
  Select type and initiate run
    with COMET
    without COMET
  Display status (of ensemble run)
    Progress of each page
    Error notifications
  Browse results
    without COMET: 8.gedcomx files
    with COMET: 3. files in COMET
  Pause run
  Resume run
  Selectively retract pages/batches
  Specify resumption parameters
  Kill run and clean up directories
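The run controls form a small state machine with per-page progress. This is a sketch under assumptions. It assumes that a retracted page is re-queued when the run resumes, and it leaves out both directory clean-up on kill and any other resumption parameters. `EnsembleRun` and `RunState` are hypothetical names.

```python
from enum import Enum
from typing import Dict, Iterable, List


class RunState(Enum):
    RUNNING = "running"
    PAUSED = "paused"
    FINISHED = "finished"
    KILLED = "killed"


class EnsembleRun:
    """Tracks one ensemble run: per-page progress, errors, pause/resume/kill."""

    def __init__(self, tools: List[str], pages: List[str], with_comet: bool):
        self.tools = tools
        self.with_comet = with_comet
        self.state = RunState.RUNNING
        self.progress: Dict[str, str] = {page: "pending" for page in pages}
        self.errors: List[str] = []

    def _require(self, state: RunState) -> None:
        if self.state is not state:
            raise RuntimeError(f"run is {self.state.value}")

    def complete_page(self, page: str, error: str = "") -> None:
        self._require(RunState.RUNNING)
        self.progress[page] = "error" if error else "done"
        if error:
            self.errors.append(f"{page}: {error}")  # surfaced as an error notification
        if "pending" not in self.progress.values():
            self.state = RunState.FINISHED

    def percent_done(self) -> float:
        finished = sum(status != "pending" for status in self.progress.values())
        return 100.0 * finished / len(self.progress) if self.progress else 0.0

    def pause(self) -> None:
        self._require(RunState.RUNNING)
        self.state = RunState.PAUSED

    def resume(self, retract: Iterable[str] = ()) -> None:
        """Resume, first re-queuing any retracted pages so they are processed again."""
        self._require(RunState.PAUSED)
        for page in retract:
            self.progress[page] = "pending"
        self.state = RunState.RUNNING

    def kill(self) -> None:
        # Directory clean-up is left out of this sketch.
        self.state = RunState.KILLED
```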
Manage Jobs and Batches
  Display progress of each job
    Percent complete
    Status of each batch: Unassigned/InProgress/Done
  Status of InProgress batches
    User assigned
    Page/form status
    Eventlog for batch (upon request)
  Retract batch from user
  Assign batch to user
  Reassign batch to user
  User status
    Batches completed
    Batches assigned
    Eventlog report for user (upon request)
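Job progress and batch assignment reduce to a few operations over batch records. The sketch below uses only the three display states from the slide (Unassigned/InProgress/Done), not the finer status_code values. It assumes that percent complete counts whole batches and that "Batches assigned" means batches currently in progress. `Job`, `Batch`, and their methods are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Batch:
    id: int
    pages: List[str]
    status: str = "Unassigned"  # Unassigned / InProgress / Done
    user: Optional[str] = None


@dataclass
class Job:
    name: str
    batches: Dict[int, Batch] = field(default_factory=dict)

    def percent_complete(self) -> float:
        if not self.batches:
            return 0.0
        done = sum(b.status == "Done" for b in self.batches.values())
        return 100.0 * done / len(self.batches)

    def assign(self, batch_id: int, user: str) -> None:
        """Assign, or reassign, a batch that is not yet done."""
        batch = self.batches[batch_id]
        if batch.status == "Done":
            raise ValueError(f"batch {batch_id} is already done")
        batch.user, batch.status = user, "InProgress"

    def retract(self, batch_id: int) -> None:
        batch = self.batches[batch_id]
        batch.user, batch.status = None, "Unassigned"

    def complete(self, batch_id: int) -> None:
        self.batches[batch_id].status = "Done"

    def user_status(self, user: str) -> Dict[str, int]:
        mine = [b for b in self.batches.values() if b.user == user]
        return {"assigned": sum(b.status == "InProgress" for b in mine),
                "completed": sum(b.status == "Done" for b in mine)}
```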
Edit User Account
  Find user
    Select user: id, username, email, name (search)
  Set any/all:
    Password
    Username
    Email
    Full name
  Change privileges
  Disable account
Edit User Account (user selected)
  User: id username email name
  Change password
  Change username
  Change email
  Change full name
  Change privileges
    ( ) patron
    ( ) adjudicator-patron
    ( ) administrator
    ( ) super-administrator
  ( ) Disable account
  [done]
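Finding a user and applying "any/all" changes can be sketched as a search followed by a partial update. In this sketch, `User`, `find_users`, and `update_user` are hypothetical names, and the privilege letters are the user_type codes from the schema slide. Password changes are left out, because they depend on how the CMS stores passwords, which the mock-ups do not specify.

```python
from dataclasses import dataclass, replace
from typing import List

PRIVILEGES = {"P": "patron", "A": "adjudicator-patron",
              "M": "administrator", "S": "super-administrator"}
EDITABLE = {"username", "email", "name", "type", "disabled"}


@dataclass(frozen=True)
class User:
    id: int
    username: str
    email: str
    name: str
    type: str = "P"
    disabled: bool = False


def find_users(users: List[User], query: str) -> List[User]:
    """Match an exact id, or a case-insensitive substring of username, email, or name."""
    q = query.strip().lower()
    return [u for u in users
            if q == str(u.id)
            or q in u.username.lower() or q in u.email.lower() or q in u.name.lower()]


def update_user(user: User, **changes) -> User:
    """Apply any/all requested changes; fields not mentioned keep their values."""
    unknown = set(changes) - EDITABLE
    if unknown:
        raise ValueError(f"not editable here: {sorted(unknown)}")
    if "type" in changes and changes["type"] not in PRIVILEGES:
        raise ValueError(f"unknown privilege: {changes['type']}")
    return replace(user, **changes)
```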