Cross-Language Hybrid Keyword and Semantic Search. David W. Embley, Stephen W. Liddle, Deryle W. Lonsdale, Joseph S. Park, Andrew Zitzelberger, Brigham Young University

Cross-Language Hybrid Keyword and Semantic Search
David W. Embley, Stephen W. Liddle, Deryle W. Lonsdale, Joseph S. Park, Andrew Zitzelberger (Brigham Young University, USA) & Byung-Joo Shin (Kyungnam University, Korea)
BYU Data Extraction Research Group

Cross-Language Information Retrieval
Diagram: a query (Q) and a collection (C), each in English or Korean (한국어).

Cross-Language Information Retrieval
Query (English): Hondas in ‘excellent condition’ under 12 grand
Result: Make = Honda, Cost = $9,853

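Cross-Language Information Retrieval
Query (Korean): 1,340 만원 미만의 좋은 상태인 혼다 (“Hondas in good condition under 13,400,000 won”)
Result: 자동차 (Car), 제조사 (Make) = 혼다 (Honda), 가격 (Price) = 1,284 만원 (12,840,000 won)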

Cross-Language Information Retrieval
Query (English): Hondas in ‘excellent condition’ under 12 grand
Translated into Korean: 12 달러 이하 '좋은 조건' 에 혼다
Translated back into English: “12 dollars less than good condition with honda”

Cross-Language Information Retrieval
Query (English) over a French (français) collection: Address of funeral home for Oscar Urbain
Result: Funérarium Coton-Hanon, rue de Quaregnon, 38 Flénu

ML-HyKSS
Diagram: a query (Q) and a collection (C), each in English or Korean (한국어).
Components: free-form query; semi-structured/unstructured repository; advanced query form; linguistically grounded extraction ontology; keyword and semantic indexer; translator; cross-language mappings; query interpreter

Linguistically Grounded Extraction Ontology
Price
  internal representation: Double
  external representations: \$[1-9]\d{0,2},?\d{3} | \d?\d [Gg]rand |...
  context keywords: price|asking|obo|neg(\.|otiable)|
  units: dollars|[Kk]...
  canonicalization method: toUSDollars
  comparison methods:
    LessThan(p1: Price, p2: Price) returns (Boolean)
      external representation: (less than | < | under |...)\s*{p2} |
  output method: toUSDollarsFormat
  ...
end
Make
  ...
  external representation: CarMake.lexicon
  ...
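The Price data frame reads almost like a small class. As a minimal Python sketch, assuming a plain module of functions rather than the authors' data-frame language, its recognizers, canonicalization, LessThan comparison, and output method might look like this. The function names (`to_us_dollars`, `parse_less_than`, and so on) are hypothetical adaptations of the slide's method names, and the elided `...` alternatives are simply left out.

```python
from __future__ import annotations

import re
from typing import Optional

# Hypothetical Python rendering of the slide's Price data frame.
# Patterns are copied from the slide; the elided "..." alternatives are omitted.
PRICE_PATTERNS = [
    re.compile(r"\$[1-9]\d{0,2},?\d{3}"),  # e.g. "$11,995"
    re.compile(r"\d?\d [Gg]rand"),          # e.g. "12 grand"
]
CONTEXT_KEYWORDS = re.compile(r"price|asking|obo|neg(\.|otiable)", re.IGNORECASE)
# LessThan external representation: (less than | < | under) followed by a Price ({p2}).
P2 = "|".join(p.pattern for p in PRICE_PATTERNS)
LESS_THAN = re.compile(r"(?:less than|<|under)\s*(" + P2 + ")")


def to_us_dollars(text: str) -> float:
    """Canonicalize a recognized price string to a float (the slide's Double) in USD."""
    grand = re.fullmatch(r"(\d?\d) [Gg]rand", text.strip())
    if grand:
        return float(grand.group(1)) * 1000
    return float(text.replace("$", "").replace(",", ""))


def extract_prices(text: str) -> list[float]:
    """Return canonical values for every Price external representation found."""
    return [to_us_dollars(m.group(0)) for p in PRICE_PATTERNS for m in p.finditer(text)]


def has_price_context(text: str) -> bool:
    """True if the text contains a Price context keyword such as 'asking'."""
    return CONTEXT_KEYWORDS.search(text) is not None


def less_than(p1: float, p2: float) -> bool:
    """LessThan(p1: Price, p2: Price) returns (Boolean)."""
    return p1 < p2


def parse_less_than(query: str) -> Optional[float]:
    """Recognize a LessThan phrase in a query and return its canonical p2, if any."""
    m = LESS_THAN.search(query)
    return to_us_dollars(m.group(1)) if m else None


def to_us_dollars_format(value: float) -> str:
    """Output method: format a canonical value as US dollars."""
    return f"${value:,.0f}"
```

For example, `extract_prices("Asking only $11,995.")` yields `[11995.0]`, and `parse_less_than("... under 12 grand")` yields `12000.0`, so the ad satisfies the query's LessThan constraint.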

Semantic & Keyword Indexing [ML-HyKSS architecture diagram]

Semantic Indexing
‘97 CHEVY Cavalier, Red, 5 spd, only 7,000 miles on her. Previous owner heart broken! Asking only $11,995. #1415 JERRY SEINER MIDVALE, or
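Semantic indexing of this ad can be sketched as running each concept's recognizers over the text and recording canonical values, while the words also go into an ordinary keyword index. This Python illustration assumes hypothetical lexicons (`MAKES`, `MODELS`, `COLORS`), a hypothetical stop-word list, a hypothetical canonicalization of "CHEVY" to "Chevrolet", and a hypothetical rule that two-digit years from 50 to 99 mean 19xx; only the price pattern comes from the Price data frame.

```python
import re

# The car ad from the slide (truncated there as well).
AD = ("‘97 CHEVY Cavalier, Red, 5 spd, only 7,000 miles on her. Previous owner "
      "heart broken! Asking only $11,995. #1415 JERRY SEINER MIDVALE, or")

# Hypothetical lexicons; a real extraction ontology would be much larger.
MAKES = {"chevy": "Chevrolet", "chevrolet": "Chevrolet", "honda": "Honda"}
MODELS = {"cavalier": "Cavalier", "civic": "Civic"}
COLORS = {"red": "Red", "blue": "Blue", "white": "White"}
STOP_WORDS = {"a", "an", "the", "in", "on", "her", "or", "only"}  # hypothetical

YEAR = re.compile(r"[‘’'](\d{2})\b")                                # e.g. ‘97
MILEAGE = re.compile(r"(\d{1,3}(?:,\d{3})+|\d+)\s*miles", re.IGNORECASE)
PRICE = re.compile(r"\$[1-9]\d{0,2},?\d{3}")                       # from the Price data frame


def semantic_index(ad: str) -> dict:
    """Map values recognized in the ad to (hypothetical) ontology concepts."""
    index = {}
    if m := YEAR.search(ad):
        yy = int(m.group(1))
        index["Year"] = 1900 + yy if yy >= 50 else 2000 + yy  # assumed century rule
    if m := MILEAGE.search(ad):
        index["Mileage"] = int(m.group(1).replace(",", ""))
    if m := PRICE.search(ad):
        index["Price"] = float(m.group(0).lstrip("$").replace(",", ""))
    for token in re.findall(r"[A-Za-z]+", ad):
        word = token.lower()
        for concept, lexicon in (("Make", MAKES), ("Model", MODELS), ("Color", COLORS)):
            if word in lexicon:
                index.setdefault(concept, lexicon[word])  # keep the first match
    return index


def keyword_index(ad: str) -> set:
    """Index every word as a plain keyword (lowercased, stop words dropped)."""
    return {token.lower() for token in re.findall(r"[A-Za-z]+", ad)} - STOP_WORDS
```

On this ad the semantic index comes out as Year 1997, Make Chevrolet, Model Cavalier, Color Red, Mileage 7000, and Price 11995.0, while words such as "heart" and "broken" remain searchable only as keywords.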

Hybrid Semantic & Keyword Queries [ML-HyKSS architecture diagram]

Free-Form Query Interpretation
Hondas in ‘excellent condition’ under 12 grand
Interpreted constraints: Make = Honda; Price < $12,000
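Once the query is interpreted as constraints, semantic matching reduces to checking each indexed record. A minimal Python sketch, assuming a hypothetical make lexicon that lists plural surface forms and handling only the "under N grand" form of LessThan:

```python
import operator
import re

QUERY = "Hondas in ‘excellent condition’ under 12 grand"

# Hypothetical lexicon: surface forms (including plurals) -> canonical Make value.
MAKE_LEXICON = {"honda": "Honda", "hondas": "Honda", "chevy": "Chevrolet"}
# Only the "under N grand" form of the LessThan external representation is handled here.
LESS_THAN_PRICE = re.compile(r"(?:less than|<|under)\s*(\d?\d) [Gg]rand")
OPS = {"=": operator.eq, "<": operator.lt}


def interpret(query: str) -> list:
    """Turn a free-form query into (concept, operator, value) constraints."""
    constraints = []
    for token in re.findall(r"[A-Za-z]+", query):
        make = MAKE_LEXICON.get(token.lower())
        if make is not None:
            constraints.append(("Make", "=", make))
    m = LESS_THAN_PRICE.search(query)
    if m:
        constraints.append(("Price", "<", int(m.group(1)) * 1000))  # "grand" = $1,000
    return constraints


def satisfies(record: dict, constraints: list) -> bool:
    """A record matches only if it has every constrained concept and each test holds."""
    return all(concept in record and OPS[op](record[concept], value)
               for concept, op, value in constraints)


records = [
    {"Make": "Honda", "Price": 9853},       # the result shown on the earlier slide
    {"Make": "Honda", "Price": 12500},      # illustrative
    {"Make": "Chevrolet", "Price": 11995},  # the Cavalier ad
]
matches = [r for r in records if satisfies(r, interpret(QUERY))]
```

Only the $9,853 Honda survives; the quoted phrase ‘excellent condition’ is left for the keyword side of the hybrid search.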

Keywords in Queries
Hondas in ‘excellent condition’ under 12 grand
- Remove inequality semantic phrases
- Keep equality semantic words and phrases (singular forms as semantics, literals as keywords)
- Remove stop words
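The three keyword rules can be sketched as successive filters over the query string. This Python version is an illustration under stated assumptions: the inequality pattern covers only the "under N grand" form, quoted text is kept whole as one phrase keyword, and the stop-word list is hypothetical.

```python
import re

# Hypothetical resources for this sketch.
STOP_WORDS = {"a", "an", "the", "in", "with", "for", "of"}
INEQUALITY = re.compile(r"(?:less than|<|under)\s*\d?\d [Gg]rand")
QUOTED = re.compile("[‘“']([^’”']+)[’”']")


def query_keywords(query: str) -> list:
    """Apply the slide's three rules to extract keywords from a free-form query."""
    # 1. Remove inequality semantic phrases ("under 12 grand" is a constraint, not keywords).
    text = INEQUALITY.sub(" ", query)
    # Quoted text is kept whole as a phrase keyword.
    phrases = QUOTED.findall(text)
    text = QUOTED.sub(" ", text)
    # 2. Keep equality semantic words as literals ("Hondas", not its singular "Honda"),
    # 3. and drop stop words.
    words = [w for w in re.findall(r"[A-Za-z]+", text) if w.lower() not in STOP_WORDS]
    return phrases + words
```

For the example query this yields `["excellent condition", "Hondas"]`; the singular "Honda" feeds the equality constraint instead. A real tokenizer would also need to tell apostrophes inside words apart from quotation marks.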

Query Processing
Hondas in ‘excellent condition’ under 12 grand (demo)

Advanced Form Queries

Conceptual-Level Translation [ML-HyKSS architecture diagram]

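Structural Equivalence
자동차 (Car): 색상 (Color), 주행거리 (Mileage), 제조사 (Make), 모델 (Model), 등급 (Trim), 액세서리 (Accessory), 변속기 (Transmission), 차대 (Chassis), 모델등급 (Model Trim), 엔진 (Engine), 특징 (Feature), 연식 (Year), 가격 (Price)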

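Structural Equivalence (with Accident)
자동차 (Car): 색상 (Color), 주행거리 (Mileage), 제조사 (Make), 모델 (Model), 등급 (Trim), 액세서리 (Accessory), 변속기 (Transmission), 차대 (Chassis), 모델등급 (Model Trim), 엔진 (Engine), 특징 (Feature), 연식 (Year), 가격 (Price), 사고유무 (Accident)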
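Structural equivalence suggests translating at the concept level: each language labels the same language-agnostic concepts, so a record is translated by swapping labels through a shared hub, in the spirit of the star-shaped mapping configuration listed under Construction Issues. A minimal sketch, assuming hypothetical concept IDs and a dictionary-based hub; the Korean labels come from the slide.

```python
# Hypothetical hub: language-agnostic concept IDs, each with one label per language.
# Korean labels are taken from the slide; the IDs themselves are illustrative.
CONCEPT_LABELS = {
    "Car":      {"en": "Car",      "ko": "자동차"},
    "Make":     {"en": "Make",     "ko": "제조사"},
    "Model":    {"en": "Model",    "ko": "모델"},
    "Year":     {"en": "Year",     "ko": "연식"},
    "Price":    {"en": "Price",    "ko": "가격"},
    "Color":    {"en": "Color",    "ko": "색상"},
    "Mileage":  {"en": "Mileage",  "ko": "주행거리"},
    "Accident": {"en": "Accident", "ko": "사고유무"},
}


def concept_of(label: str, lang: str) -> str:
    """Look up the language-agnostic concept behind a language-specific label."""
    for concept, labels in CONCEPT_LABELS.items():
        if labels.get(lang) == label:
            return concept
    raise KeyError(f"no {lang} concept labeled {label!r}")


def translate_record(record: dict, src: str, dst: str) -> dict:
    """Relabel a record's concepts from language src to language dst via the hub.

    Values are left untouched; lexicon, currency, and formulaic translations
    would handle them separately.
    """
    return {CONCEPT_LABELS[concept_of(k, src)][dst]: v for k, v in record.items()}
```

Because every language attaches to the hub, adding a language means adding one label per concept rather than one mapping per pair of languages.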

Translations
Lexicon translations, formulaic translations, currency translations, transliterations, keyword translations, commentary translations

Lexicon Translations
혼다 ↔ Honda; blue

Formulaic Translations
Karfreitag, 2012 (German) ↔ Good Friday, 2012 ↔ Vendredi saint (French) ↔ April 6, 2012 ↔ 2012년 4월 6일 (Korean) ↔ /6/12
foot ↔ meter
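Formulaic translations can be computed rather than looked up. As a sketch, assuming the holiday in question is Good Friday (the slide pairs Karfreitag and Vendredi saint with April 6, 2012), Python can derive the date with the anonymous Gregorian Easter algorithm (Meeus/Jones/Butcher), render it in English and Korean formats, and convert feet to meters with the exact factor 0.3048. All function names are hypothetical.

```python
from datetime import date, timedelta

MONTHS_EN = ["January", "February", "March", "April", "May", "June", "July",
             "August", "September", "October", "November", "December"]
# Holiday names taken from the slide.
GOOD_FRIDAY_NAMES = {"de": "Karfreitag", "en": "Good Friday", "fr": "Vendredi saint"}
METERS_PER_FOOT = 0.3048  # exact by international definition


def easter_sunday(year: int) -> date:
    """Gregorian Easter Sunday via the anonymous (Meeus/Jones/Butcher) algorithm."""
    a = year % 19
    b, c = divmod(year, 100)
    d, e = divmod(b, 4)
    f = (b + 8) // 25
    g = (b - f + 1) // 3
    h = (19 * a + b - d - g + 15) % 30
    i, k = divmod(c, 4)
    el = (32 + 2 * e + 2 * i - h - k) % 7
    m = (a + 11 * h + 22 * el) // 451
    month, day = divmod(h + el - 7 * m + 114, 31)
    return date(year, month, day + 1)


def good_friday(year: int) -> date:
    """Good Friday falls two days before Easter Sunday."""
    return easter_sunday(year) - timedelta(days=2)


def format_en(d: date) -> str:
    return f"{MONTHS_EN[d.month - 1]} {d.day}, {d.year}"


def format_ko(d: date) -> str:
    return f"{d.year}년 {d.month}월 {d.day}일"


def feet_to_meters(feet: float) -> float:
    return feet * METERS_PER_FOOT
```

With these, `good_friday(2012)` is `date(2012, 4, 6)`, which formats as "April 6, 2012" and "2012년 4월 6일". Formatting by hand avoids depending on the host's locale settings.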

Currency Translations
under 12 grand → < $12,000 → < 13,361,331 원 (won)
1,100 만원 (11,000,000 won) → $9,880
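The currency figures imply an exchange rate of 13,361,331 / 12,000, about 1,113.44 won per dollar. A minimal sketch, assuming that fixed rate (a deployed system would presumably use a current one) and hypothetical function names:

```python
# Rate implied by the slide's own figures: 13,361,331 won for $12,000 (about 1,113.44
# won per dollar). This fixed rate is an assumption; real rates change daily.
WON_PER_USD = 13_361_331 / 12_000
WON_PER_MAN = 10_000  # 1 만원 (man-won) = 10,000 won


def grand_to_usd(grand: float) -> float:
    """'12 grand' -> 12,000 US dollars."""
    return grand * 1000


def usd_to_krw(usd: float) -> int:
    return round(usd * WON_PER_USD)


def krw_to_usd(krw: float) -> float:
    return krw / WON_PER_USD


def man_won_to_krw(man: float) -> int:
    """Korean prices are often quoted in 만원 units of 10,000 won."""
    return round(man * WON_PER_MAN)


# Rewriting the query constraint "under 12 grand" for a Korean collection:
constraint = f"< {usd_to_krw(grand_to_usd(12)):,} 원"
```

At this rate 1,100 만원 comes to about $9,879.26, which rounds to the $9,880 shown on the slide.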

Transliterations
Hangul/Latin-language transliterator: 신병주 → Byungjoo Shin
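Hangul transliteration can start from the fact that each precomposed Hangul syllable (U+AC00 to U+D7A3) encodes its initial, medial, and final jamo arithmetically. The sketch below applies a simplified Revised Romanization syllable by syllable and ignores sound changes across syllable boundaries; it is not the transliterator on the slide. It yields "Byeongju Sin" rather than the conventional spelling "Byungjoo Shin", which suggests that personal names would need a lexicon of preferred spellings (an assumption). Function names are hypothetical.

```python
from __future__ import annotations

# Simplified Revised Romanization tables (syllable-final consonants use their
# representative sounds; sound changes across syllables are ignored).
INITIALS = ["g", "kk", "n", "d", "tt", "r", "m", "b", "pp", "s", "ss", "",
            "j", "jj", "ch", "k", "t", "p", "h"]
MEDIALS = ["a", "ae", "ya", "yae", "eo", "e", "yeo", "ye", "o", "wa", "wae",
           "oe", "yo", "u", "wo", "we", "wi", "yu", "eu", "ui", "i"]
FINALS = ["", "k", "k", "k", "n", "n", "n", "t", "l", "k", "m", "l", "l", "l",
          "p", "l", "m", "p", "p", "t", "t", "ng", "t", "t", "k", "t", "p", "t"]
HANGUL_FIRST, HANGUL_LAST = 0xAC00, 0xD7A3


def romanize_syllables(text: str) -> list[str]:
    """Decompose each precomposed Hangul syllable into jamo indices and romanize it."""
    out = []
    for ch in text:
        code = ord(ch)
        if HANGUL_FIRST <= code <= HANGUL_LAST:
            initial, rest = divmod(code - HANGUL_FIRST, len(MEDIALS) * len(FINALS))
            medial, final = divmod(rest, len(FINALS))
            out.append(INITIALS[initial] + MEDIALS[medial] + FINALS[final])
        else:
            out.append(ch)  # pass non-Hangul characters through unchanged
    return out


def romanize(text: str) -> str:
    return "".join(romanize_syllables(text))


def romanize_korean_name(name: str) -> str:
    """Assumes a one-syllable surname first; returns Western order 'Given Surname'."""
    syllables = romanize_syllables(name)
    surname, given = syllables[0], "".join(syllables[1:])
    return f"{given.capitalize()} {surname.capitalize()}"
```

For instance, `romanize("혼다")` gives "honda", matching the lexicon entry on the Lexicon Translations slide, while `romanize_korean_name("신병주")` gives "Byeongju Sin".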

Keyword Translations
‘excellent condition’ → 좋은 상태인 (“in good condition”)
인 as “is in”, not “inn” (i.e., “hotel”)

Commentary Translations
In Korean age reckoning, a newborn child is one year old, and everyone's age increases by one at the new year. Thus a child born on December 31 turns two the next day. This way of reckoning age is not used officially (or legally), but it is widely accepted in everyday life in Korea.

Construction Issues
- Semi-automatic construction of extraction ontologies
  - Information extraction research
  - Query ontologies vs. extraction ontologies (WordNet)
- Semi-automatic construction of cross-language mappings
  - Language-agnostic, star-shaped mapping configuration
- Pay-as-you-go construction

Evaluation: Semantic Indexer [ML-HyKSS architecture diagram]

Evaluation: Semantic Indexer
Car ads             Make   Model  Year   Price  Color  Mileage
French  Recall      87%    76%    96%    89%    82%    98%
        Precision   65%    67%    90%    95%    47%    92%
Korean  Recall      99%    99%    100%   100%   100%   95%
        Precision   99%    99%    100%   100%   100%   95%
Obituaries (columns include Name, Title, Death Date, Funeral Date, Time, and Place, Mortuary Name, and Relative Name & Relation)
French  Recall      76%  42%  80%  69%   43%   38%   N/A
        Precision   99%  63%  88%  70%   30%   83%   N/A
Korean  Recall      N/A  97%  97%  50%   50%   100%  99%  97%
        Precision   N/A  97%  97%  100%  100%  67%   94%  94%
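The average F-measures quoted in the Conclusions can be checked against this table. Assuming each average is the unweighted mean of the per-cell F-measures over both languages, with N/A cells skipped, the sketch below reproduces 90% for car ads and 75% for obituaries. It pairs recall and precision cell by cell within each row, so it does not depend on how the obituary columns line up with their headings.

```python
from __future__ import annotations

from statistics import mean
from typing import Optional, Sequence


def f_measure(recall: float, precision: float) -> float:
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if recall + precision == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def f_measures(recall: Sequence[Optional[float]], precision: Sequence[Optional[float]]) -> list[float]:
    """Pair recall and precision cell by cell, skipping N/A (None) cells."""
    return [f_measure(r, p) for r, p in zip(recall, precision) if r is not None and p is not None]


NA = None
# Percentages from the slide, written as fractions: (recall row, precision row).
CAR_ADS = {
    "French": ([.87, .76, .96, .89, .82, .98], [.65, .67, .90, .95, .47, .92]),
    "Korean": ([.99, .99, 1.00, 1.00, 1.00, .95], [.99, .99, 1.00, 1.00, 1.00, .95]),
}
OBITUARIES = {
    "French": ([.76, .42, .80, .69, .43, .38, NA], [.99, .63, .88, .70, .30, .83, NA]),
    "Korean": ([NA, .97, .97, .50, .50, 1.00, .99, .97], [NA, .97, .97, 1.00, 1.00, .67, .94, .94]),
}


def average_f(table: dict) -> float:
    """Mean F-measure over every non-N/A cell of every language row."""
    return mean(f for recall, precision in table.values() for f in f_measures(recall, precision))
```

`average_f(CAR_ADS)` comes to about 0.898 and `average_f(OBITUARIES)` to about 0.751, consistent with the 90% and 75% reported.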

Evaluation: Interpreter/Translator [ML-HyKSS architecture diagram]

Evaluation: Interpreter/Translator
Car-ad queries      Recall               Precision
French to English   77%   86%   100%     81%   90%   74%
Korean to English   98%   100%  100%     93%   99%   52%
Grouped as: within-language query interpretation (cross-language, necessarily correct); cross-language translation
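The query-interpretation averages can be checked the same way. The slide does not label the three sub-columns under Recall and Precision; this sketch assumes they are, in order, referenced concepts, semantic constraints, and keywords. That order is an inference, chosen because it is the one under which the per-category mean F-measures over the two language pairs match the Conclusions' 87%, 94%, and 77%.

```python
from statistics import mean


def f_measure(recall: float, precision: float) -> float:
    """Balanced F-measure: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall) if recall + precision else 0.0


# (recall sub-columns, precision sub-columns) per row, from the slide.
ROWS = {
    "French to English": ([.77, .86, 1.00], [.81, .90, .74]),
    "Korean to English": ([.98, 1.00, 1.00], [.93, .99, .52]),
}
# Assumed sub-column order (not labeled on the slide).
CATEGORIES = ["referenced concepts", "semantic constraints", "keywords"]


def average_f_by_category(rows: dict) -> dict:
    """Mean F-measure per category across the language pairs."""
    per_row = [[f_measure(r, p) for r, p in zip(recall, precision)]
               for recall, precision in rows.values()]
    return {cat: mean(column) for cat, column in zip(CATEGORIES, zip(*per_row))}
```

The results are about 0.872, 0.937, and 0.767; the low keyword figure comes mainly from Korean-to-English keyword precision (52%).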

Conclusions
- Cross-language query translation at the conceptual level rather than at the language level
- Prototype implementation
- Semantic indexing (average F-measures): 90% for semi-structured car ads; 75% for unstructured obituaries
- Query interpretation (average F-measures): 94% for identifying semantic constraints; 87% for identifying referenced concepts; 77% for identifying keywords