Stephen W. Liddle, Deryle W. Lonsdale, and Scott N. Woodfield

Presentation transcript:

(Semi)automatic Extraction of Genealogical Information from Scanned & OCRed Historical Documents. Elder David W. Embley, Stephen W. Liddle, Deryle W. Lonsdale, and Scott N. Woodfield

Overview: Big Picture; Diagram Details & Demo; Current Status and Expectations

Fe6: 1. Prepare 2. Extract 3. Merge&Split 4. Check&Correct 5. Generate 6. Convert. Diagram labels: FROntIER, ListReader, OntoSoar, GreenFIE, COMET

Fe6: 1. Prepare 2. Extract 3. Merge&Split 4. Check&Correct 5. Generate 6. Convert. Diagram labels: FROntIER, ListReader, OntoSoar, GreenFIE

1. Prepare

2. Extract

3. Merge & Split: Person, Couple, Family

4. Check & Correct

5. Generate

6. Convert
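
The six Fe6 stages listed above run in a fixed order, which can be sketched as a simple sequential pipeline. This is only an illustrative sketch: the stage names come from the slides, but `make_stage`, `run_pipeline`, the `batch` dictionary, and the placeholder stage bodies are hypothetical and say nothing about how the actual system is implemented.

```python
from typing import Callable, Dict, List, Tuple

# Stage names as given on the slides; everything else here is hypothetical.
STAGES: List[str] = [
    "Prepare",
    "Extract",
    "Merge & Split",
    "Check & Correct",
    "Generate",
    "Convert",
]

Batch = Dict[str, object]
Stage = Tuple[str, Callable[[Batch], Batch]]


def make_stage(name: str) -> Stage:
    """Build a placeholder stage that only records that it ran."""
    def run(batch: Batch) -> Batch:
        history = list(batch.get("history", []))
        history.append(name)
        return {**batch, "history": history}
    return name, run


def run_pipeline(batch: Batch, stages: List[Stage]) -> Batch:
    """Apply each stage in order, passing the batch along."""
    for _name, step in stages:
        batch = step(batch)
    return batch


pipeline = [make_stage(name) for name in STAGES]
result = run_pipeline({"book": "example-book"}, pipeline)
print(result["history"])
```

In a real system each placeholder would be replaced by the corresponding tool or check; the sketch only shows the ordering and the hand-off of a batch from one stage to the next.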

Results

Precision, Recall, F-Measure Results
FROntIER (relationships)
Person: 0.86 0.66 0.75
Couple: 1.00 0.40 0.57
ParentsWithChildren: 0.89
FROntIER (PCF views): 0.94 0.83 0.88 0.90 0.95 0.78
OntoSoar: 0.67 0.30 0.43 0.44 0.62
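
As a quick check on how the three figures relate, the sketch below recomputes the F-measure for the Person and Couple values from their precision and recall. It assumes the slide reports the balanced F-measure (the harmonic mean of precision and recall); the slide does not state the formula, and the function name `f_measure` is ours.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as reported for FROntIER (relationships).
rows = {"Person": (0.86, 0.66), "Couple": (1.00, 0.40)}

scores = {name: round(f_measure(p, r), 2) for name, (p, r) in rows.items()}
print(scores)  # {'Person': 0.75, 'Couple': 0.57}
```

Both results agree with the third number reported for each of these two rows, which is consistent with the assumed formula.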

Fe6: 1. Prepare 2. Extract 3. Merge&Split 4. Check&Correct 5. Generate 6. Convert. Diagram labels: Administrative and Batch-Processing Management System; Automated Check (Fix & Warn); Name, Date, Place Standardization; FROntIER, ListReader, OntoSoar, GreenFIE; “Sanity” Check; Feedback Loop; COMET

Fe6: 1. Prepare 2. Extract 3. Merge&Split 4. Check&Correct 5. Generate 6. Convert. Diagram labels: Administrative and Batch-Processing Management System; Non-English Languages; Automated Check (Fix & Warn); Name, Date, Place Standardization; FROntIER, ListReader, OntoSoar, GreenFIE; “Sanity” Check; Extraction Tools: Layout Machine Learning; Feedback Loop; COMET; Bootstrapping, Ever-learning, Feedback Loop

Summary: (Semi)automatic Extraction; Green, Ever-Learning System (improves with use). Status: Extraction Tools (tech-transfer of academic prototypes); Ensemble Prototype (pipeline runs and is being enhanced); Management System (underway; minimally usable). BYU Data Extraction Research Group, www.deg.byu.edu