Automatically Identifying Record Patterns from the Extracted Data Fields of Genealogical Microfilm Kenneth Tubbs David W. Embley.

Presentation transcript:


Problem
- Searching through microfilm by hand is tedious.
- Extraction by hand requires large amounts of time and manpower.

Algorithm
Input: XML Input File (Preprocessed Microfilm Image), Genealogical Ontology
Method: Identify Structure, Match Attributes, Check Constraints, Evaluate Candidates
Output: Record Patterns

External Preprocessing
Input Features
1. Coordinates of each zone.
2. Printed text of each zone.
3. Whether or not each zone is empty.
XML Input File
<zone rectangle="66,55,223,11" printed_text="NAME and Surname of each Person" empty="0" />
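A minimal sketch of reading such an input file, assuming the zones sit under a single wrapper element (the `<page>` root, the second sample zone, and the reading of `empty="1"` as "empty" are assumptions; only the first zone comes from the slide):

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Zone:
    rect: tuple          # the four integers of the rectangle attribute, in source order
    printed_text: str    # printed text recognized in the zone
    empty: bool          # assumed: empty="1" marks a zone with no content


def parse_zones(xml_text):
    """Collect every <zone> element found in the XML text."""
    root = ET.fromstring(xml_text)
    zones = []
    for el in root.iter("zone"):
        rect = tuple(int(v) for v in el.get("rectangle").split(","))
        zones.append(Zone(rect, el.get("printed_text", ""), el.get("empty") == "1"))
    return zones


# Hypothetical wrapper element and second zone, for illustration only.
SAMPLE = """<page>
  <zone rectangle="66,55,223,11" printed_text="NAME and Surname of each Person" empty="0" />
  <zone rectangle="66,70,223,11" printed_text="" empty="1" />
</page>"""

zones = parse_zones(SAMPLE)
```

Each zone then carries the three input features listed on the slide: coordinates, printed text, and the empty flag.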

Identify Structure
- Identify Table Primitives
- Evaluate Primitives
- Factor Table Primitives

Identify Table Primitives
Row: [label:value+] right, height
Column: [label:value+] down, width
(Figure label: Name)
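One possible reading of the two primitives, sketched under assumptions: a row primitive is a label followed by one or more values to its right that share its height, and a column primitive is a label followed by values below it that share its width. The `Zone` fields, the (x, y, width, height) layout, and the exact-equality tests are all assumptions; real scans would need a tolerance.

```python
from typing import NamedTuple


class Zone(NamedTuple):
    x: int
    y: int
    w: int
    h: int
    text: str = ""


def row_primitive(label, zones):
    """Values to the right of the label, on the same line, with the same height."""
    values = [z for z in zones if z.y == label.y and z.h == label.h and z.x > label.x]
    return sorted(values, key=lambda z: z.x)


def column_primitive(label, zones):
    """Values below the label, in the same column, with the same width."""
    values = [z for z in zones if z.x == label.x and z.w == label.w and z.y > label.y]
    return sorted(values, key=lambda z: z.y)


# Hypothetical layout: a "Name" label with two cells below and one cell to the right.
label = Zone(0, 0, 50, 10, "Name")
zones = [label, Zone(0, 10, 50, 10), Zone(0, 20, 50, 10), Zone(50, 0, 30, 10)]

col = column_primitive(label, zones)
row = row_primitive(label, zones)
```

A primitive matches the `[label:value+]` pattern only when the returned list is non-empty.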

Evaluate Primitives
Primitive Confidence Level = (formula not preserved in the transcript)
Confidence (Label i, Value j) = (formula not preserved in the transcript)

Factor Table Primitives
Given primitives A B C D E F, candidate factorings include:
[A B C D E F], or
[A] [B C D E F], or
[E] [A B C D F], or
others.
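The examples above can be enumerated with a short sketch. It covers only the unfactored case and the cases where a single primitive is factored out; the "others" on the slide are not modeled, and the function name is hypothetical.

```python
def single_factorings(labels):
    """Return (factored_out, remaining) pairs: no factoring, then one label at a time."""
    labels = list(labels)
    result = [([], labels)]  # [A B C D E F]: nothing factored out
    for i, lab in enumerate(labels):
        result.append(([lab], labels[:i] + labels[i + 1:]))
    return result


factorings = single_factorings("ABCDEF")
```

For six primitives this yields seven candidates, including `[A] [B C D E F]` and `[E] [A B C D F]`.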

Factor Table Primitives
An expert user assigns probabilities to types of factorings.
Example:
[column:column+] left, .90
[row:column+] below, .85

Match Attributes
Identify possible mappings from the microfilm table to the genealogical ontology.

Identify Possible Mappings
Mapping types:
1. Identical Matches
2. Synonym Matches
3. Composite Matches
Printed Text    Genealogical Ontology
Name            Name
Sex             Gender
Female Age      Female, Age
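A rough sketch of how the three mapping types might be told apart. The `ONTOLOGY` term set and the `SYNONYMS` table are hypothetical stand-ins; the slides do not say how synonyms are stored or how composite labels are split.

```python
# Hypothetical ontology terms and synonym table, for illustration only.
ONTOLOGY = {"name", "gender", "age", "female"}
SYNONYMS = {"sex": "gender"}


def classify(printed_text):
    """Return (mapping_type, ontology_terms) for a printed column label."""
    key = printed_text.strip().lower()
    if key in ONTOLOGY:
        return ("identical", [key])
    if key in SYNONYMS:
        return ("synonym", [SYNONYMS[key]])
    # Composite: every word of the label maps to its own ontology term.
    parts = [SYNONYMS.get(word, word) for word in key.split()]
    if len(parts) > 1 and all(p in ONTOLOGY for p in parts):
        return ("composite", parts)
    return ("none", [])
```

With these tables, `classify("Sex")` is a synonym match for `gender`, and `classify("Female Age")` is a composite match for `female` and `age`.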

Evaluate Mapping
Edit distance between words
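The slide names edit distance but not the variant or the scoring. The sketch below uses the standard Levenshtein distance and, as an assumption, normalizes it by the longer word's length to get a score between 0 and 1.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute or match
        prev = cur
    return prev[-1]


def similarity(a, b):
    """Assumed normalization: 1.0 for identical words (ignoring case), lower otherwise."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest
```

A printed label such as `NAME` would then score 1.0 against the ontology attribute `Name`, while misspelled or abbreviated labels score lower.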

Check Constraints
The algorithm evaluates the factoring of each record against a genealogical ontology.

Identify Records
Label     Number of Values
Address   3
Name      14
Age       13
Gender    14
Table (Address, Name) = 14 / 3 = 4.67
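The observed ratio is simple arithmetic over the value counts. A short sketch, with a hypothetical helper name:

```python
# Value counts per label, as listed on the slide.
counts = {"Address": 3, "Name": 14, "Age": 13, "Gender": 14}


def table_ratio(counts, parent, child):
    """Observed number of child values per parent value in the table."""
    return counts[child] / counts[parent]


ratio = table_ratio(counts, "Address", "Name")  # 14 / 3
```

Rounded to two places, `ratio` is the 4.67 shown for Table (Address, Name).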

Genealogical Ontology
- The genealogical ontology is created by an expert user.
- The cardinalities are assigned to the ontology by recording the cardinalities of a corpus of microfilm.

Genealogical Ontology
Ontology (Address, Name) = 1 * 4.3 * 1.1 = 4.73
(Diagram: Family, Address, Person, Name, Age, Gender; relationship cardinalities are annotated.)

Evaluate Factoring
Ontology (Address, Name) = 1.0 * 4.3 * 1.1 = 4.73
Table (Address, Name) = 14 / 3 = 4.67
Distance Classifier
Distance_From_Ontology = 1 / (4.73 - 4.67)^2 ≈ 277.8
Distance_From_No_Factoring = 1 / (1 - 4.67)^2 ≈ 0.0742
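The two scores can be recomputed with a small sketch that uses the slide's rounded values 4.73 and 4.67. Both scores are inverse squared distances, so a larger score means a closer match; picking the larger one as the decision rule is an assumption, as are the function and variable names.

```python
def inverse_square_distance(expected, observed):
    """1 / (expected - observed)^2; infinite when the two values coincide."""
    diff = expected - observed
    if diff == 0:
        return float("inf")
    return 1.0 / diff ** 2


observed = 4.67                # Table (Address, Name), rounded as on the slide
ontology_expected = 4.73       # Ontology (Address, Name) = 1.0 * 4.3 * 1.1
no_factoring_expected = 1.0    # one Name per Address if nothing is factored

from_ontology = inverse_square_distance(ontology_expected, observed)
from_no_factoring = inverse_square_distance(no_factoring_expected, observed)

# Assumed decision rule: accept the factoring when the table sits closer
# to the ontology's expected cardinality than to the unfactored case.
accept_factoring = from_ontology > from_no_factoring
```

Here the observed ratio is far closer to the ontology's expectation than to 1, so the factoring of Address over Name would be accepted.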

Evaluate Candidates
- For every combination of primitives, attribute mappings, and factorings, compute the product of their confidences.
- Select the most confident combination.
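A minimal sketch of this selection step, assuming each candidate is a (name, confidence) pair and that the three choices can be combined freely. The sample confidences for primitives and mappings are hypothetical; the two factoring probabilities reuse the expert-assigned example values.

```python
from itertools import product
from math import prod


def best_combination(primitives, mappings, factorings):
    """Score every combination by the product of its confidences and keep the best."""
    best = max(product(primitives, mappings, factorings),
               key=lambda combo: prod(conf for _, conf in combo))
    return [name for name, _ in best], prod(conf for _, conf in best)


# Hypothetical candidates, for illustration only.
primitives = [("row", 0.60), ("column", 0.90)]
mappings = [("Name -> Name", 0.95), ("Name -> Gender", 0.20)]
factorings = [("[column:column+] left", 0.90), ("[row:column+] below", 0.85)]

names, score = best_combination(primitives, mappings, factorings)
```

Exhaustive enumeration grows with the product of the three list sizes, which is workable for the handful of candidates a single table produces.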

Evaluate Candidates
(Figure sequence: candidate combinations laid out under Primitive 1, Primitive 2, and Primitive 3 with an Attribute Confidence column; entries marked F.)


Output
Record Patterns
- Attributes of each record.
- Geometry of each record.
Attribute mappings from the table to the ontology.

Microfilm Queries
- A web form provides the interface to query the microfilm database.
- Individuals can enter keywords (such as first and last name), and the system locates the appropriate records among the indexed documents.

Web Query
Example form input: Eyre, John

Query Results Click an image to select a result document.

Query Results Relevant region of the document is displayed.
