Proposal for Synergistic Name Extraction from Historical Text Documents.

Presentation transcript:

Proposal for Synergistic Name Extraction from Historical Text Documents

Synergistic Name Extraction

System operation
– Shift burden to the system to the extent possible
– Smoothly ranges from fully manual to fully automatic
– Full control over level of automation

Expectations
– Immediate improvement (click to fill-in form ready for tech transfer)
    Accuracy can be as good as manual extraction (can guarantee)
    Time to extract reduced (likely significantly; potential reduction to 0)
– Can use while still being studied and initially developed

Green Interaction
– Improves with use
– Learns from observation and being corrected
– Can be bootstrapped from scratch (needed for new language)

Synergy: The interaction of two or more agents or forces so that their combined effect is greater than the sum of their individual effects.

Levels of Automation

Manual
– Type in all info of interest, both stated and inferred
– Is the church’s current extraction system (except using DEG’s form-based interface)

Manual minimal
– Click-only form fill-in of stated information
– Reasoner provides inferred information
– Manual check: all info of interest is displayed, with the option to correct it through manual editing

Synergistic
– Initial automatic form fill-in of info, both stated and inferred
– Manual check and edit

Automatic with auditing
– Automatic form fill-in for a batch of names
– Auditor samples and checks accuracy
    If accuracy deemed sufficient, accept
    If insufficient, redo synergistically

Automatic without auditing
– Fully automatic extraction and linking to the FamilySearch tree
– Patrons notified when viewing information extracted automatically
    Patrons can view source and extraction & inference results
    Patrons can, indeed have the responsibility to, make corrections
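
The accept-or-redo rule of the "automatic with auditing" level can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `ExtractedRecord` and `audit_batch`, the sample size, and the accuracy threshold are all hypothetical and do not come from the slides, which do not say how large a sample is or what accuracy counts as sufficient.

```python
import random
from dataclasses import dataclass


@dataclass
class ExtractedRecord:
    """Hypothetical stand-in for one automatically filled-in form."""
    record_id: int
    is_correct: bool  # in practice, this judgment comes from the human auditor


def audit_batch(records, sample_size, min_accuracy, seed=0):
    """Sample a batch, estimate its accuracy, and decide accept vs. redo.

    sample_size and min_accuracy are assumed parameters chosen by the project.
    """
    if not records:
        raise ValueError("cannot audit an empty batch")
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    sample = rng.sample(records, min(sample_size, len(records)))
    accuracy = sum(r.is_correct for r in sample) / len(sample)
    decision = "accept" if accuracy >= min_accuracy else "redo synergistically"
    return accuracy, decision


if __name__ == "__main__":
    batch = [ExtractedRecord(i, is_correct=True) for i in range(100)]
    print(audit_batch(batch, sample_size=10, min_accuracy=0.95))
```

A rejected batch is not discarded in this scheme: it drops back one level, to synergistic extraction with manual check and edit.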

Obituaries Project
Demo with the synergistic system in mind

Demo
– 1998 system demo
– Korean & French demo
– FROntIER demo
    Includes extraction and inference
    Caution: knowledge engineering not complete
– Synergistic (mock-up of editing)
– Manual click-only demo
– NewsBank ??
    Can we strike a deal now based on click-only?
    Should we first run a pilot experiment to provide numbers for decision making?
    – Fully manual to establish baseline
    – Manual-minimal to see if we’re enough better than the baseline to commit now
    – Do knowledge engineering for FROntIER obit extraction to see if we can either:
        » Develop and use automated synergistic system
        » Accept the accuracy levels to go fully automatic

Scanned Book Project

Alter envisioned experiment (slightly)
– New objective: investigate accuracy & cost wrt levels of automation
– Only a slight change; should be acceptable to all

Missionary task changes
– Single form with fields for all info of interest
– Two modes of operation (with different groups of missionaries)
    Type all
    Click-only (& let FROntIER get inferred info)

Thesis experiment changes
– No change to the experiment we’re planning
– Additional follow-on experimental work
    Using the OntologyEditor, display and check results (as directed by low precision & recall)
    Edit by hand and rerun to get improved precision & recall results

What we can learn by the slightly altered experiment
– Type-in missionaries establish baseline
– Click-only missionaries yield estimated results for manual-minimal level
– Knowledge-engineering experiment:
    Estimates the cost and expertise of knowledge engineering (as originally planned)
    Establishes the potential for the fully automatic level (as originally planned)
    Roughly estimates accuracy & cost for the synergistic level (i.e., with added fix)
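
The precision and recall figures mentioned for the follow-on work can be computed as in the sketch below. It assumes extraction results and the hand-checked baseline can each be reduced to a set of comparable items (here, name strings); the function name `precision_recall` and the sample names are made up for illustration and are not taken from the experiment.

```python
def precision_recall(extracted, gold):
    """Return (precision, recall) of extracted items against a gold standard.

    precision = correct extractions / all extractions
    recall    = correct extractions / all gold-standard items
    """
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    # Made-up example data: names the system found vs. names a person found.
    system_names = {"Mary Ely", "John Ely", "Gerard Lathrop", "New York"}
    manual_names = {"Mary Ely", "John Ely", "Abigail Huntington"}
    precision, recall = precision_recall(system_names, manual_names)
    print(f"precision={precision:.2f} recall={recall:.2f}")
```

Low precision points to spurious extractions to remove, and low recall points to items the extraction rules miss, which is the kind of signal the slide describes using to direct hand edits before a rerun.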