Published by Caitlin Porter. Modified over 6 years ago.
1
Mock-ups for Discussing the CMS Administrator Interface
2
Database Schema
user (id, username, email, password, name, type)
job (id, name, book, max_page)
batch (id, pages, status, job_id, user_id)
form (id, form_name, version)
eventlog (id, user_id, batch_id, session_id, log-info, time_written)
status_code (status, next_status, description)
user_type (type, description)
User types:
  P: patron
  A: adjudicator-patron
  M: administrator
  S: super-administrator
Status codes:
  N: not assigned or not submitted
  S: submitted (initial patron submission)
  D: done (sent downstream to Gedcomx)
  Q: quality recheck for patron
  R: resubmitted (patron resubmission)
  A: adjudicator re-check
  C: checked by adjudicator
  T: testing (under control of CMS admin)
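The schema above could be set up roughly as follows with Python's built-in sqlite3 module. This is only a sketch: the slide gives table and column names, so every column type, the key constraints, and the spelling `log_info` (the slide writes `log-info`) are assumptions.

```python
import sqlite3

# Table and column names follow the slide; types and keys are assumptions.
SCHEMA = """
CREATE TABLE user_type (type TEXT PRIMARY KEY, description TEXT);
CREATE TABLE status_code (status TEXT, next_status TEXT, description TEXT);
CREATE TABLE "user" (id INTEGER PRIMARY KEY, username TEXT, email TEXT,
                     password TEXT, name TEXT,
                     type TEXT REFERENCES user_type(type));
CREATE TABLE job (id INTEGER PRIMARY KEY, name TEXT, book TEXT, max_page INTEGER);
CREATE TABLE batch (id INTEGER PRIMARY KEY, pages TEXT, status TEXT,
                    job_id INTEGER REFERENCES job(id),
                    user_id INTEGER REFERENCES "user"(id));
CREATE TABLE form (id INTEGER PRIMARY KEY, form_name TEXT, version TEXT);
CREATE TABLE eventlog (id INTEGER PRIMARY KEY, user_id INTEGER, batch_id INTEGER,
                       session_id TEXT, log_info TEXT, time_written TEXT);
"""

# Code lists copied from the slide.
USER_TYPES = {
    "P": "patron",
    "A": "adjudicator-patron",
    "M": "administrator",
    "S": "super-administrator",
}
STATUS_CODES = {
    "N": "not assigned or not submitted",
    "S": "submitted (initial patron submission)",
    "D": "done (sent downstream to Gedcomx)",
    "Q": "quality recheck for patron",
    "R": "resubmitted (patron resubmission)",
    "A": "adjudicator re-check",
    "C": "checked by adjudicator",
    "T": "testing (under control of CMS admin)",
}


def create_database(path=":memory:"):
    """Create the tables and load the two code lists; return the connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO user_type VALUES (?, ?)", USER_TYPES.items())
    # next_status is left NULL: the slide does not list the allowed transitions.
    conn.executemany(
        "INSERT INTO status_code (status, description) VALUES (?, ?)",
        STATUS_CODES.items(),
    )
    conn.commit()
    return conn
```

The status_code table is left without a primary key because one status may plausibly have several next_status rows; that too is a guess about the intended design.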
3
Workflow Pipeline
PCF = Person, Couple, Family directories
PRF = Precision, Recall, F-score
Black Line: standard
Green Line: tool testing
Red Line: COMET only
Pipeline directories (diagram labels in brackets, kept in slide order):
1.pages
2.tools <toolName>
  [PCF: extracted]
  2.1.tool-ontology-extracted
  2.2.ontology-extracted
  2.3.extracted-text-cleaned
3.json-from-osmx
  3.1.ontology-merged
  3.2.value-cleaned
  3.3.date-value-parsed
  3.4.bad-relationships-fixed
  [PCF: osmx & json]
4.json-working
  [PCF: json]
5.json-final
  [CMS: setup & test]
6.osmx-checked
  6.1.ontology-merged
  6.2.value-cleaned
  6.3.date-value-parsed
  6.4.constraint-checked
  6.5.authority-checked
  [PCF: osmx & json]
  [ground-truth PRF-report]
7.osmx-enhanced
  7.1.target-ontology-generated
  7.2.value-standardized
  7.3.information-inferred
8.gedcomx
  8.1.results-generated
  8.2.FS-imports-generated
9.FS-ingest
Other diagram labels: Q, A, T; D, C, T-Finish; Q, A, T; T; COMET bypass; N, T; S, R, C, T
4
Workflow System
Workflow State Net
Triggering Events (Admin Initiated)
  Import Book
  Configure Extraction Tools
  Initiate Pipeline Run
  Manage Pipeline Run
  Manage Jobs and Batches
  Edit User Accounts
Triggering Conditions (System States)
  Can-Do Conditions
  Error/Warning-Report Conditions
Should the “green line” (the development system) be an entirely separate workflow?
  Different database
  Different workflow processing code
  Different state control for work in process
  …
Basics (notes):
State Net description of workflow system (OSM-L)
  when <in state s> @ <event>
  if <condition>
  then <do something, e.g. run phase 3> enter <next state>
  exception <exception> enter <next state>
  end;
Multi-threaded (a thread dies when its task is complete)
Thread-firing
  Tasks fired by triggers
  Firing based on a task priority queue (if we need to control the number of threads)
Event triggering: interrupt driven as initiated by an admin user
Condition triggering: checked by continuous polling
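The when/if/then/enter/exception rule form in the notes above can be sketched in Python as a small rule table. This is not OSM-L and not the CMS implementation; the class names, the state and event names, and the `run_phase_3` stand-in are all hypothetical, and threading, the task priority queue, and condition polling are left out.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    state: str                          # when <in state s>
    event: str                          # @ <event>
    condition: Callable[[dict], bool]   # if <condition>
    action: Callable[[dict], None]      # then <do something>
    next_state: str                     # enter <next state>
    exception_state: str                # exception <exception> enter <next state>


class StateNet:
    """Holds the current state and applies the first matching rule per event."""

    def __init__(self, initial_state: str, rules: list):
        self.state = initial_state
        self.rules = rules

    def fire(self, event: str, context: dict) -> bool:
        """Return True if a rule fired for this event, False otherwise."""
        for rule in self.rules:
            if (rule.state == self.state and rule.event == event
                    and rule.condition(context)):
                try:
                    rule.action(context)
                except Exception:
                    self.state = rule.exception_state
                else:
                    self.state = rule.next_state
                return True
        return False


def run_phase_3(context: dict) -> None:
    """Stand-in for real pipeline work; it only records that phase 3 ran."""
    context["phases_run"].append(3)


# Hypothetical rule: the state and event names are invented for illustration.
EXAMPLE_RULE = Rule(
    state="tools-configured",
    event="initiate-pipeline-run",
    condition=lambda ctx: bool(ctx.get("book")),
    action=run_phase_3,
    next_state="phase-3-done",
    exception_state="error-reported",
)
```

An admin-initiated event would then be delivered as `StateNet("tools-configured", [EXAMPLE_RULE]).fire("initiate-pipeline-run", context)`. Keeping the exception target on the rule itself mirrors the `exception ... enter <next state>` clause.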
5
User Name: Full Name: Password: Confirm Password: PIN:
6
Superseded by Steve’s working interface
Banner frozen for admins (banner for P & A?)
Banner: COMET Management System | Select Activity | Switch User Type: Patron, Adjudicator, Manager | Logout
Workspace
State maintained on “Switch User Type” and on “Logout”
“Select Activity” opens a menu of actions (see “Admin Pull-down Menu”)
7
Admin Menu
Hamburger icon? Click reveals menu?
  Import Book
  Configure Extraction Tools
  Initiate Pipeline Run
  Manage Pipeline Run
  Manage Jobs and Batches
  Edit User Account
8
Import Book
Preparation for import (outside of CMS, given pdf of book)
  Name pdf file with short title (camel case with no spaces)
  Split into pages and put them in a folder (file names: <short title>.<pg#>.pdf)
Upload pdf of book, pages folder, and biblio text
Create directory structure and add initial content
  Initialize 1.pages
  Add leading zeros to pdf page files
  Run PDFindexer
  Set permissions 664
  Change own & grp to www-data
Form: Book pdf: Browse | Pages directory: Browse | Bibliographic citation: | Upload
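The "add leading zeros" and "set permissions 664" steps above might look like the following sketch. It assumes page files are named `<short title>.<pg#>.pdf` as on the slide and pads to three digits (an assumption, based on page numbers such as 099 elsewhere in the deck). The function names are hypothetical, and the ownership change to www-data is left out because it normally needs elevated privileges.

```python
import os
from pathlib import Path


def padded_name(filename: str, width: int = 3) -> str:
    """Turn '<short title>.<pg#>.pdf' into '<short title>.<zero-padded pg#>.pdf'.

    Raises ValueError if the name does not have the expected three parts
    or the page part is not a number.
    """
    title, page, ext = filename.rsplit(".", 2)
    return f"{title}.{int(page):0{width}d}.{ext}"


def prepare_pages(pages_dir: str, width: int = 3) -> list:
    """Rename every page pdf with leading zeros and set its mode to 664.

    Returns the resulting file names in sorted order.
    """
    new_names = []
    for path in sorted(Path(pages_dir).glob("*.pdf")):
        target = path.with_name(padded_name(path.name, width))
        if target != path:
            path.rename(target)
        os.chmod(target, 0o664)
        new_names.append(target.name)
    return sorted(new_names)
```

With padded names, a plain alphabetical listing of the pages folder matches page order, which is presumably why the step is in the import checklist.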
9
Configure Extraction Tools
Select Book
Select Tool
Form: Book: [v] | Tool: [v] | Go
10
FROntIER
Use FROntIER interface, as is, to create an extraction ontology
Upload either the predefined single ontology or the three PCF ontologies
  Future?: upload a user-defined ontology along with a mapping Java jar file
Use CMS testing interface to test the created extraction ontology
  Run test (as can be done for all tools)
  Set ground truth (as possible for all tools)
Future?: embed Workbench
Form: Predefined FROntIER ontology: Browse | Directory of Person, Couple, and Family ontologies: Browse | Upload | Test
Upload clears all current ontologies and installs whatever is uploaded.
A test run uses either the predefined ontology or the PCF ontologies or both depending on what has been uploaded.
Test displays the “Test Extraction Tool Ensemble” page with the current book selected for Book, the most recent page or page range filled in for Test Page(s), FROntIER as the only tool checked, and Quick chosen as the Test Type.
11
GreenDDA
Select book
Create training set
Train ML tool to training set
Apply trained ML tool to book
12
GreenQQ
Book already selected, but can now be browsed
Set parameters
  Book
  Page Range over which GreenQQ operates
  Header and Footer lines to skip
Form: Book: <selected book> | FirstPage: | LastPage: | SkipTopLines: | SkipBottomLines:
13
GreenQQ James, 15 Dec ELINE Run Save
14
GreenQQ 1523 Name . 1753 Brown, William, in Kilbarchan, and Sarah >
Make Dismiss 48 Name Feb Brune, William Jeane, > Make Dismiss 19 Name Oct Napier and William, born 8 Feb > Make Dismiss 18 Name Robert, in Hilhead James (daughter), 8 June > Make Dismiss
15
GreenQQ Run Save SLINE James (daughter), 8
16
GreenFIE
Regular GreenFIE interface embedded in CMS test mode
Selectively save/retract generated rules
Changes to COMET interface (or should we just use the TWK annotator? Note that we’ll also need a COMET interface for GreenDDA and GreenGN)
  Add GreenFIE “Regex” button
  Replace button controls with a mechanism to save/retract generated rules and to initiate a “Quick” test run.
  Make page number a type-in box
17
ListReader
Select book and run ListReader (future: also set text abstraction parameters)
One record from all three forms on the lhs
Highlights on the rhs as generated by ListReader
Control buttons
  Next (to tell ListReader to save the current labeling and ask for the next)
  Stop (to stop the labeling cycle)
18
OntoES Rule
19
Deryle to specify OntoSoar
20
Test Extraction Tool Ensemble
Select Book
Specify test page(s)
Include tools in ensemble test
Choose run type: Quick or Full
Form:
  Book: [v]
  Test Page(s):
  Tools Included in Test Run: [ ] FROntIER [ ] GreenDDA [ ] GreenFIE [ ] GreenQQ [ ] ListReader [ ] OntoES [ ] OntoSoar
  Test Type: o Quick o Full
  Run
Book selection default: the book chosen for configuration, if any. Other choices include books for which tools have been configured but are not yet either fully processed or in the process of being processed.
Test Page(s): <page, e.g. 099> or <page range, e.g >. Test Page(s) default is the most recent choice of test pages for the book.
Will we want a page list? We will want to do random pages, and we can do them one at a time with what we currently have. If we want to do a batch of random pages at once, we’ll need to pass along the list and add code at every step to loop through the list. Until we feel like we really need this, we’ll stick with page ranges.
Can just choose one tool. Not selecting any is an error.
Quick Test Type: extract, merge, split to forms, and display in COMET. (Allow for edit and creation of ground truth.)
Full Test Type: extract, merge, fix OCR & bad relationships, check constraints & authorities and add warnings, split to forms, display in COMET (allow edits and creation of ground truth), produce PRF reports, standardize, infer, generate GedcomX and import files.
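The form rules above (a page or a page range, at least one tool, Quick or Full) could be checked with something like the sketch below. The slide's page-range example is cut off, so the `first-last` hyphen syntax here is purely an assumption, as are the function names and the shape of the returned dictionary.

```python
# Tool and test-type names are taken from the slide.
TOOLS = ("FROntIER", "GreenDDA", "GreenFIE", "GreenQQ",
         "ListReader", "OntoES", "OntoSoar")
TEST_TYPES = ("Quick", "Full")


def parse_test_pages(text: str) -> tuple:
    """Return (first, last) for a single page like '099' or an assumed
    'first-last' range. The range separator is not given on the slide."""
    first, sep, last = (part.strip() for part in text.partition("-"))
    if not first.isdigit() or (sep and not last.isdigit()):
        raise ValueError(f"not a page or page range: {text!r}")
    lo = int(first)
    hi = int(last) if sep else lo
    if lo > hi:
        raise ValueError(f"page range runs backwards: {text!r}")
    return lo, hi


def validate_test_run(book: str, pages: str, tools: list, test_type: str) -> dict:
    """Check one test-run request; raise ValueError on the first problem."""
    if not book:
        raise ValueError("no book selected")
    if not tools:
        # The slide states that selecting no tool is an error.
        raise ValueError("select at least one tool")
    unknown = [tool for tool in tools if tool not in TOOLS]
    if unknown:
        raise ValueError(f"unknown tools: {unknown}")
    if test_type not in TEST_TYPES:
        raise ValueError(f"test type must be one of {TEST_TYPES}")
    first, last = parse_test_pages(pages)
    return {"book": book, "first_page": first, "last_page": last,
            "tools": list(tools), "test_type": test_type}
```

Returning a (first, last) pair even for a single page keeps later steps range-based, in line with the decision to stick with page ranges rather than page lists.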
21
Test Run Results
batch status = “T”
Run pipeline with status T
Results of a test run (for both Quick and Full)
  COMET for each page/form
  Editable to create/replace ground truth
Additional results for Full test run
  COMET with warning icons and messages
  Ground-truth reports
  FS-import products
Generate Ground Truth
Clickable list of available reports:
  range PRFreport-soft.txt
  Person/range PRFreport-soft.txt
  Person/030.PRFreport-soft.txt
  Person/031.PRFreport-soft.txt
  Couple/range PRFreport-soft.txt
  Couple/030.PRFreport-soft.txt
  Couple/031.PRFreport-soft.txt
  Family/range PRFreport-soft.txt
  Family/030.PRFreport-soft.txt
  Family/031.PRFreport-soft.txt
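For reference, the precision, recall, and F-score figures that a PRF report carries can be computed as in the sketch below. It uses the standard definitions with exact set matching; the "soft" matching criterion implied by the report file names is not described in the slides and is not modeled. The function name and the placeholder items are hypothetical.

```python
def prf(extracted: set, ground_truth: set) -> tuple:
    """Return (precision, recall, f_score) for extracted items versus ground truth.

    precision = correct / extracted
    recall    = correct / ground truth
    f_score   = harmonic mean of precision and recall
    Empty denominators yield 0.0 rather than raising.
    """
    correct = len(extracted & ground_truth)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(ground_truth) if ground_truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Placeholder items, not real extraction results.
example = prf({"a", "b", "x"}, {"a", "b", "c", "d"})
```

In this placeholder case two of three extracted items are correct and two of four ground-truth items are found, so precision is 2/3, recall is 1/2, and the F-score is 4/7.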
22
Manage Jobs and Batches
Steve to specify Manage Jobs and Batches
23
Initiate Pipeline Run
Pops up on “Initiate Pipeline Run” in admin menu
On “Run”
  Confirm initiation
  Remind user to click on “Manage Pipeline Run” – banner will be in place
Form:
  Book: Select from options
  Page Range: –
  Tools: FROntIER, GreenDDA, GreenFIE, ListReader, OntoES, OntoSoar
  Run Type: with COMET / without COMET
  Run
24
Manage Pipeline Run
Display progress for book
  Show progress for each page
  Report failures
  Allow admin to kill and restart tasks
When running through COMET:
  Display progress of each job
    Percent complete
    Status of each batch: Unassigned/InProgress/Done
  Status of InProgress batches
    User assigned
    Page/form status
    Eventlog for batch (upon request)
    Retract batch from user
    Assign batch to user
    Reassign batch to user
  User status
    Batches completed
    Batches assigned
    Eventlog report for user (upon request)
View results
  with COMET – view filled records by page & form
  with and without COMET – view Gedcom X reports
Submit completed work to FS
25
Manage Pipeline Run (post “Initiate Pipeline Run”)
Display progress for book
  Show progress for each page
  Report failures
  Kill and restart tasks
Check completed runs
  Select page/form for COMET view
  Select page/report for Gedcom X view
Release results (depending on the state) for:
  Patron batch selection
  Ingest into FamilySearch
    If “with COMET” directly to tree
    Otherwise, to LLS
Progress display:
  Extracting data
    FROntIER
    OntoES
  Cleaning extracted text
  Merging extracted data
  Cleaning data values
  Parsing dates
  Fixing bad relationships
  Consolidating data
  Standardizing data
  Inferring additional data
  Generating conclusions
  Nr. pages tried: 100 completed: 100
Notes on the progress display:
  List only selected tools
  “with COMET”: replace the last four with Generating PCF json
  Post COMET, add the last four back
Controls: Stop Run | Resume Run | Select from options | View Results | Release Results
Not active until run is done
26
View Results (with COMET)
Display page & form
  Click on form for page
  Display filled in records in COMET
  Any edits are retained?
Release for patron batch selection
TheElyAncestry
Page  Form
571   Person  Couple  Family
572   Person  Couple  Family
573   Person  Couple  Family
574   Person  Couple  Family
575   Person  Couple  Family
Release Results
27
View Results (without COMET)
Display page & form
  Click on form for page
  Display filled in records in COMET
  Any edits are retained?
Release for patron batch selection
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<gedcomx xmlns="
<person id="osmx112">
<gender type="
<attribution>
<changeMessage>inferred based on gender designator (such as "daughter" or "Mrs."), or probabilistically based on first given name or on first given name of spouse</changeMessage>
</attribution>
</gender>
<name>
<nameForm>
<fullText>Gerard Lathrop</fullText>
<field>
<value type="
<source description="#TheElyAncestry.573">
<qualifier name="
<qualifier name="
</source>
<text>Gerard Lathrop</text>
</value>
<value type="
TheElyAncestry
Page: 571 572 573
Release Results
28
View Results (without COMET)
Display page & form
  Click on form for page
  Display filled in records in COMET
  Any edits are retained?
Release for patron batch selection
TheElyAncestry
Page  Person
+ 571
+ 572
- 573
    Mary Eliza Warner
    Samuel Selden Warner
Release Results
Full page here?
29
View Results (without COMET)
Display page & form
  Click on form for page
  Display filled in records in COMET
  Any edits are retained?
Release for patron batch selection
*********************************
Person osmx226: Mary Eliza Warner
Name:
  Conclusion Name: Mary Eliza Warner
  Original Document Text: Mary Eliza Warner
  Interpreted Document Text: Mary Eliza Warner
Gender: Unknown
Facts:
  BirthDate:
    Conclusion: 1826
    Original Document Text: 1826
    Interpreted Document Text: 1826
  BirthPlace:
Marriage Relationships:
ParentOf Relationships
ChildOf Relationships:
TheElyAncestry
Page  Person
+ 571
+ 572
- 573
    Mary Eliza Warner
    Samuel Selden Warner
Release Results
30
Activity Title Name of Book and link to overall PDF Zoom < Previous Page v Next > Name list Page with highlights Info box (empty for slide 19) URL of page for direct access to unmarked PDF Toolbar Toolbar
31
Edit User Account
Find user
Set any/all:
  Password
  Username
  Full name
Change privileges
Disable account
Form: Select user: id | username | email | name | search
32
Edit User Account
Find user
Set any/all:
  Password
  Username
  Full name
Change privileges
Disable account
Form:
  User: id | username | email | name
  Change password
  Change username
  Change full name
  Change privileges: o Patron o Adjudicator-patron o Administrator o Super-administrator o Disable account
  done