1
TARTAR Information Extraction
Transforming Arbitrary Tables into F-Logic Frames with TARTAR
Aleksander Pivk, York Sure, Philipp Cimiano, Matjaz Gams, Vladislav Rajkovic, Rudi Studer
Presented by Stephen Lynn
2
Free-form text: linguistic/NLP approaches. Tabular structures: a table comprehension task (HTML, Excel, PDF, text, etc.) and a semantic interpretation task. More effort?
3
TARTAR Architecture
4
Semantic Representation: Frame Logic (F-Logic); model-theoretic semantics; complete resolution-based proof theory; expressive power of logic; availability of efficient reasoning tools.
5
F-Logic Frame
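The frame shown on this slide is not preserved in the transcript. As a rough illustration only, the sketch below renders a frame signature in a common F-Logic style, `Class[method(Params) => Type]`. The class names `Method` and `Frame`, the `Tour` example, and its attributes are all hypothetical, and the concrete syntax may differ between F-Logic systems.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Method:
    """One signature entry of a frame (hypothetical helper type)."""
    name: str
    range_type: str
    params: Tuple[str, ...] = ()

    def render(self) -> str:
        # Parameterised methods are written as name(P1, P2) => Type.
        args = f"({', '.join(self.params)})" if self.params else ""
        return f"{self.name}{args} => {self.range_type}"


@dataclass
class Frame:
    """A class name plus its method signatures (hypothetical helper type)."""
    class_name: str
    methods: List[Method] = field(default_factory=list)

    def render(self) -> str:
        body = "; ".join(m.render() for m in self.methods)
        return f"{self.class_name}[{body}]"


# Invented example data, not taken from the presentation.
example = Frame(
    "Tour",
    [
        Method("Code", "STRING"),
        Method("Price", "NUMBER", ("PersonType", "RoomType")),
    ],
)
print(example.render())
# Tour[Code => STRING; Price(PersonType, RoomType) => NUMBER]
```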
7
Table Comprehension. Dimension: a grouping of cells representing similar entities.
8
Table Comprehension. Stub: a dimension whose headers are used to index elements in the body.
9
Table Comprehension. Box head: the column headers (often nested).
10
Table Comprehension. Body: the data values.
11
Table Classes: 1D, 2D, complex.
12
Methodology
13
Cleaning & Canonicalization: clean the DOM tree (CyberNeko HTML Parser); expand rowspan/colspan cells.
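Rowspan/colspan expansion can be sketched as copying each spanning cell into every grid position it covers, so the table becomes a regular matrix. This is a minimal illustration, not the presented system's code: the `(text, rowspan, colspan)` tuple layout and the function name `expand_spans` are assumptions standing in for cells read from a cleaned DOM tree.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical cell layout: (text, rowspan, colspan).
Cell = Tuple[str, int, int]


def expand_spans(rows: List[List[Cell]]) -> List[List[Optional[str]]]:
    """Return a rectangular grid with spanning cells duplicated."""
    grid: Dict[Tuple[int, int], str] = {}
    for r, row in enumerate(rows):
        c = 0
        for text, rowspan, colspan in row:
            # Skip positions already filled by a span from an earlier row.
            while (r, c) in grid:
                c += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = text
            c += colspan
    if not grid:
        return []
    n_rows = max(r for r, _ in grid) + 1
    n_cols = max(c for _, c in grid) + 1
    # Positions no cell covers are left as None.
    return [[grid.get((r, c)) for c in range(n_cols)] for r in range(n_rows)]


# Invented example: "Tour" spans two rows, "Price" spans two columns.
table = [
    [("Tour", 2, 1), ("Price", 1, 2)],
    [("Single", 1, 1), ("Double", 1, 1)],
    [("A1", 1, 1), ("100", 1, 1), ("150", 1, 1)],
]
expanded = expand_spans(table)
```

For the invented table above, the first two rows become `Tour, Price, Price` and `Tour, Single, Double`, so every row has the same number of cells.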
14
Structure Detection: token type hierarchy; assign functional types and probabilities.
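The slide does not list the actual token types, so the following is only a sketch of the general idea: assign each cell token a type from a small hierarchy and measure how far apart two types are. The hierarchy, the regular expressions, and the distance measure are all assumptions made for illustration; functional types and probabilities are not modeled here.

```python
import re

# Hypothetical, simplified hierarchy: each type maps to its parent.
PARENT = {
    "integer": "numeric",
    "float": "numeric",
    "numeric": "token",
    "date": "token",
    "alpha": "token",
    "token": None,
}

# Checked in order; the first full match wins.
PATTERNS = [
    ("date", re.compile(r"\d{4}-\d{2}-\d{2}")),
    ("integer", re.compile(r"[+-]?\d+")),
    ("float", re.compile(r"[+-]?\d*\.\d+")),
    ("alpha", re.compile(r"[A-Za-z]+")),
]


def token_type(token: str) -> str:
    """Most specific type whose pattern matches the whole token."""
    for name, pattern in PATTERNS:
        if pattern.fullmatch(token):
            return name
    return "token"  # fall back to the root of the hierarchy


def ancestors(type_name: str) -> list:
    """The type itself followed by its parents up to the root."""
    chain = []
    while type_name is not None:
        chain.append(type_name)
        type_name = PARENT[type_name]
    return chain


def type_distance(a: str, b: str) -> int:
    """Steps from a and b up to their closest common ancestor."""
    up_a, up_b = ancestors(a), ancestors(b)
    common = next(t for t in up_a if t in up_b)
    return up_a.index(common) + up_b.index(common)
```

With this hierarchy, `type_distance("integer", "float")` is 2 (both meet at `numeric`), while `type_distance("integer", "alpha")` is 3 (they meet only at the root).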
15
Structure Detection: detect the logical table orientation.
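One plausible way to approach orientation detection, offered here as an assumption rather than the presented algorithm, is to check whether cells are more alike along columns or along rows. In the sketch below, "vertical" is a hypothetical label meaning that the values of one attribute run down a column; `coarse_type`, `mean_mismatch`, and `detect_orientation` are illustrative names.

```python
from typing import List


def coarse_type(cell: str) -> str:
    """Very rough typing: plain numbers versus everything else."""
    stripped = cell.replace(".", "", 1).replace(",", "")
    return "num" if stripped.isdigit() else "text"


def mean_mismatch(lines: List[List[str]]) -> float:
    """Fraction of adjacent cell pairs whose coarse types differ."""
    pairs = [(a, b) for line in lines for a, b in zip(line, line[1:])]
    if not pairs:
        return 0.0
    return sum(coarse_type(a) != coarse_type(b) for a, b in pairs) / len(pairs)


def detect_orientation(grid: List[List[str]], skip: int = 1) -> str:
    # Ignore the first `skip` rows and columns, assumed to hold headers.
    body = [row[skip:] for row in grid[skip:]]
    columns = [list(col) for col in zip(*body)]
    row_score = mean_mismatch(body)      # mismatch when reading along rows
    col_score = mean_mismatch(columns)   # mismatch when reading down columns
    # Homogeneous columns suggest attribute values run down the table.
    return "vertical" if col_score <= row_score else "horizontal"


# Invented example tables.
by_column = [
    ["Name", "City", "Price"],
    ["A", "Rome", "100"],
    ["B", "Oslo", "250"],
]
by_row = [list(col) for col in zip(*by_column)]  # the same table transposed
```

Here `detect_orientation(by_column)` yields `"vertical"` and `detect_orientation(by_row)` yields `"horizontal"`.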
16
Structure Detection: discover and level regions; logical units.
17
FTM Building: the Functional Table Model (FTM) arranges regions into a tree whose leaf nodes are data.
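A tree whose leaves are data values can be sketched with a small node type. The class `FTMNode`, the helper `data_paths`, and the example labels are hypothetical; the sketch only shows how each data value can be reached through the chain of headers above it.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class FTMNode:
    """Hypothetical tree node: inner nodes are headers, leaves are data."""
    label: str
    children: List["FTMNode"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def data_paths(node: FTMNode, prefix: Tuple[str, ...] = ()) -> list:
    """List (header path, data value) pairs for every leaf under node."""
    if node.is_leaf():
        return [(prefix, node.label)]
    result = []
    for child in node.children:
        result.extend(data_paths(child, prefix + (node.label,)))
    return result


# Invented example: a price attribute split by room type.
root = FTMNode("Tour", [
    FTMNode("Price", [
        FTMNode("Single", [FTMNode("100")]),
        FTMNode("Double", [FTMNode("150")]),
    ]),
])

for path, value in data_paths(root):
    print(" / ".join(path), "->", value)
# Tour / Price / Single -> 100
# Tour / Price / Double -> 150
```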
18
Semantic Enriching of FTM: labeling with WordNet and GoogleSets; map the FTM to a frame.
19
Evaluation
Crawl, extract, and filter web tables: 135 tables, 85.4% success rate, mostly problems with complex tables.
Compare auto-generated frames with human-generated frames: 14 people transformed 3 tables each, 21 tables in total (each done twice).
Syntactic/semantic correctness (strict and soft).
20
Results: inter-annotator agreement; system-annotator agreement.
21
Benefits: fully automated knowledge formalization; arbitrary tables; independent of domain knowledge; independent of document type; explicit semantics of generated frames; query answering over heterogeneous tables.