Towards the Self-Annotating Web Philipp Cimiano, Siegfried Handschuh, Steffen Staab Presenter: Hieu K Le (most of slides come from Philipp Cimiano) CS598CXZ.

Towards the Self-Annotating Web Philipp Cimiano, Siegfried Handschuh, Steffen Staab Presenter: Hieu K Le (most of slides come from Philipp Cimiano) CS598CXZ - Spring UIUC

Outline  Introduction  The Process of PANKOW  Pattern-based categorization  Evaluation  Integration to CREAM  Related work  Conclusion

The annotation problem in 4 cartoons

The annotation problem from a scientific point of view

The annotation problem in practice

The vicious cycle

Annotating: A Noun → A Concept?

Annotating
To annotate terms in a web page:
– Manual definition
– Learning of extraction rules
Both require a lot of labor

A small Quiz
What is “Laksa”?
A. A dish
B. A city
C. A temple
D. A mountain
The answer is:

From Google: “Laksa”

From Google
– “cities such as Laksa”: 0 hits
– “dishes such as Laksa”: 10 hits
– “mountains such as Laksa”: 0 hits
– “temples such as Laksa”: 0 hits
Google knows more than all of you together!
Example of using syntactic information + statistics to derive semantic information

Self-annotating
PANKOW (Pattern-based Annotation through Knowledge On the Web)
– Unsupervised
– Pattern-based
– Works within a fixed ontology
– Draws on information from the whole Web

The Self-Annotating Web
– There is a huge amount of implicit knowledge in the Web
– Make use of this implicit knowledge together with statistical information to propose formal annotations and overcome the vicious cycle: semantics ≈ syntax + statistics?
– Annotation by maximal statistical evidence

Outline Introduction  The Process of PANKOW  Pattern-based categorization  Evaluation  Integration to CREAM  Related work  Conclusion

PANKOW Process

Outline Introduction The Process of PANKOW  Pattern-based categorization  Evaluation  Integration to CREAM  Related work  Conclusion

Patterns
– HEARST1: CONCEPTs such as INSTANCE
– HEARST2: such CONCEPTs as INSTANCE
– HEARST3: CONCEPTs, (especially/including) INSTANCE
– HEARST4: INSTANCE (and/or) other CONCEPTs
Examples:
– dishes such as Laksa
– such dishes as Laksa
– dishes, especially Laksa
– dishes, including Laksa
– Laksa and other dishes
– Laksa or other dishes

Patterns (Cont'd)
– DEFINITE1: the INSTANCE CONCEPT
– DEFINITE2: the CONCEPT INSTANCE
– APPOSITION: INSTANCE, a CONCEPT
– COPULA: INSTANCE is a CONCEPT
Examples:
– the Laksa dish
– the dish Laksa
– Laksa, a dish
– Laksa is a dish
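The pattern slides can be read as string templates that are filled with an instance and a concept. The following is a minimal sketch of that idea, not the authors' implementation: the names `PATTERNS`, `pluralize` and `instantiate` are hypothetical, the placeholder positions are inferred from the example phrases on the slides, and the pluralization rule is a deliberately naive assumption.

```python
# Hypothetical sketch: instantiate the slide's patterns for one
# (instance, concept) pair. Template slots: {i} = instance,
# {c} = concept (singular), {cs} = concept (plural).
PATTERNS = {
    "HEARST1": ["{cs} such as {i}"],
    "HEARST2": ["such {cs} as {i}"],
    "HEARST3": ["{cs}, especially {i}", "{cs}, including {i}"],
    "HEARST4": ["{i} and other {cs}", "{i} or other {cs}"],
    "DEFINITE1": ["the {i} {c}"],
    "DEFINITE2": ["the {c} {i}"],
    "APPOSITION": ["{i}, a {c}"],
    "COPULA": ["{i} is a {c}"],
}


def pluralize(noun):
    """Naive English plural; an assumption, not a real morphology component."""
    if noun.endswith(("s", "sh", "ch", "x", "z")):
        return noun + "es"
    if noun.endswith("y") and noun[-2:-1] not in "aeiou":
        return noun[:-1] + "ies"
    return noun + "s"


def instantiate(instance, concept):
    """Return (pattern_name, phrase) pairs for every template variant."""
    slots = {"i": instance, "c": concept, "cs": pluralize(concept)}
    return [
        (name, template.format(**slots))
        for name, templates in PATTERNS.items()
        for template in templates
    ]


if __name__ == "__main__":
    for name, phrase in instantiate("Laksa", "dish"):
        print(f"{name:10s} {phrase}")
```

For ("Laksa", "dish") this yields the ten example phrases listed on the two pattern slides, since HEARST3 and HEARST4 each expand to two variants.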

Asking Google (more formally)
– Instance i ∈ I, concept c ∈ C, pattern p ∈ {Hearst1, ..., Copula}
– count(i, c, p) returns the number of Google hits of the instantiated pattern
– E.g. count(Laksa, dish) := count(Laksa, dish, def1) + ...
– Restrict to the best ones beyond a threshold
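The counting step can be sketched as follows: sum the hit counts of all instantiated patterns per concept, then keep the best concept only if its total exceeds a threshold. This is an illustration under stated assumptions, not the PANKOW code: `FAKE_HITS` stands in for a search engine and holds only the four counts shown on the "From Google" slide, only the HEARST1 pattern is instantiated, and `categorize`, `phrases_for` and `hit_count` are hypothetical names.

```python
# Hypothetical sketch of count(i, c) = sum over patterns p of count(i, c, p),
# followed by picking the concept with maximal evidence above a threshold.

# Stand-in for a search engine: hit counts taken from the slide's example.
FAKE_HITS = {
    "cities such as Laksa": 0,
    "dishes such as Laksa": 10,
    "mountains such as Laksa": 0,
    "temples such as Laksa": 0,
}

# Plural forms are hard-coded here to keep the sketch short (assumption).
PLURALS = {"city": "cities", "dish": "dishes",
           "mountain": "mountains", "temple": "temples"}


def phrases_for(instance, concept):
    """Only HEARST1 is instantiated in this sketch."""
    return [f"{PLURALS[concept]} such as {instance}"]


def hit_count(phrase):
    """Look up the number of hits; unknown phrases count as zero."""
    return FAKE_HITS.get(phrase, 0)


def categorize(instance, concepts, threshold=0):
    """Return (best_concept, total_hits), or (None, 0) below the threshold."""
    totals = {
        c: sum(hit_count(p) for p in phrases_for(instance, c))
        for c in concepts
    }
    best = max(totals, key=totals.get)
    if totals[best] > threshold:
        return best, totals[best]
    return None, 0


if __name__ == "__main__":
    print(categorize("Laksa", ["city", "dish", "mountain", "temple"]))
```

With the slide's counts, "Laksa" is assigned to "dish"; raising `threshold` to 10 or more would leave it unannotated, which is the precision/recall trade-off the threshold controls.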

Outline Introduction The Process of PANKOW Pattern-based categorization  Evaluation  Integration to CREAM  Related work  Conclusion

Evaluation Scenario
– Corpus: 45 texts from
– Ontology: tourism ontology from the GETESS project
– #concepts: original – 1043; pruned – 682
Manual annotation by two subjects:
– A: 436 instance/concept assignments
– B: 392 instance/concept assignments
– Overlap: 277 instances (Gold Standard)
– A and B used 59 different concepts
– Categorical (Kappa) agreement on 277 instances: 63.5%

Examples (instance → concept)
Atlantic → city; Bahamas → island; USA → country; Connecticut → state; Caribbean → sea; Mediterranean → sea; Canada → country; Guatemala → city; Africa → region; Australia → country; France → country; Germany → country; Easter → island; St Lawrence → river; Commonwealth → state; New Zealand → island; Adriatic → sea; Netherlands → country; St John → church; Belgium → country; San Juan → island; Mayotte → island; EU → country; UNESCO → organization; Austria → group; Greece → island; Malawi → lake; Israel → country; Perth → street; Luxembourg → city; Nigeria → state; St Croix → river; Nakuru → lake; Kenya → country; Benin → city; Cape Town → city 13768

Results: F = 28.24%, R/Acc = 24.90%
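The slide reports F and recall but not precision. Assuming F here is the balanced F1 measure (an assumption; the slide does not say which F-measure is used), the precision implied by the two reported numbers can be recovered by inverting F1 = 2PR / (P + R). The function name `implied_precision` is hypothetical.

```python
# Sketch: recover precision from F1 and recall, assuming balanced F1.
# F1 = 2PR / (P + R)  =>  P = F1 * R / (2R - F1)

def implied_precision(f1, recall):
    """Invert the balanced F-measure for precision."""
    denominator = 2 * recall - f1
    if denominator <= 0:
        raise ValueError("no valid precision for this F1/recall pair")
    return f1 * recall / denominator


if __name__ == "__main__":
    p = implied_precision(0.2824, 0.2490)
    print(f"implied precision: {p:.1%}")  # prints "implied precision: 32.6%"
```

Under that assumption the reported figures correspond to a precision of roughly one in three, which fits the later summary of decent precision and low recall.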

Comparison
System          #     Preprocessing / Cost         Accuracy
[MUC-7]         3     Various (?)                  >> 90%
[Fleischman02]  8     N-gram extraction ($)        70.4%
PANKOW          59    none                         24.9%
[Hahn98]-TH     196   syn. & sem. analysis ($$$)   21%
[Hahn98]-CB     196   syn. & sem. analysis ($$$)   26%
[Hahn98]-CB     196   syn. & sem. analysis ($$$)   31%
[Alfonseca02]   1200  syn. analysis ($$)           17.39% (strict)

Outline Introduction The Process of PANKOW Pattern-based categorization Evaluation  Integration to CREAM  Related work  Conclusion

CREAM/OntoMat [architecture diagram; components: Document Management, Annotation Environment, Annotated Web Pages, Web Pages, Domain Ontologies, WWW, PANKOW, Annotation Tool GUI, plugin, Annotation Inference Server, Annotation by Markup, Ontology Guidance & Fact Browser, Document Editor / Viewer; edge labels: annotate, crawl, query, extract, load]

PANKOW & CREAM/OntoMat

Results (Interactive Mode): F = 51.65%, R/Acc = 49.46%

Outline Introduction The Process of PANKOW Pattern-based categorization Evaluation Integration to CREAM  Related work  Conclusion

Current State-of-the-art
Large-scale IE
– only disambiguation
Standard IE (MUC)
– need of handcrafted rules
ML-based IE
– need of hand-annotated training corpus
– does not scale to large numbers of concepts
– rule induction takes time
KnowItAll (Etzioni et al., WWW'04)
– shallow (pattern-matching-based) approach

Outline Introduction The Process of PANKOW Pattern-based categorization Evaluation Integration to CREAM Related work  Conclusion

Conclusion
Summary:
– new paradigm to overcome the annotation problem
– unsupervised instance categorization
– first step towards the self-annotating Web
– difficult task: open domain, many categories
– decent precision, low recall
– very good results for interactive mode
– currently inefficient (590 Google queries/instance)
Challenges:
– contextual disambiguation
– annotating relations (currently restricted to instances)
– scalability (e.g. only choose reasonable queries to Google)
– accurate recognition of Named Entities (currently POS-tagger)
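The figure of 590 queries per instance is consistent with one query per (concept, pattern variant) pair: 59 concepts times 10 instantiated phrases (eight named patterns, with HEARST3 and HEARST4 contributing two variants each). The slides do not state this breakdown explicitly, so the sketch below is an inferred reading, and `query_budget` is a hypothetical helper.

```python
# Sketch: estimate the number of search queries needed, assuming one
# query per (instance, concept, pattern variant) triple.

def query_budget(n_instances, n_concepts, n_pattern_variants):
    """Total queries if every combination is sent to the search engine."""
    return n_instances * n_concepts * n_pattern_variants


if __name__ == "__main__":
    # 59 concepts (evaluation slide), 10 phrase variants (pattern slides).
    per_instance = query_budget(1, 59, 10)
    print(per_instance)                  # 590
    # A page with 20 candidate instances would need 20 times as many.
    print(query_budget(20, 59, 10))      # 11800
```

The cost grows linearly in the number of concepts, which is why the scalability challenge above suggests sending only reasonable queries.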

Outline Introduction The Process of PANKOW Pattern-based categorization Evaluation Integration to CREAM Related work Conclusion

Thanks to… Philipp Cimiano karlsruhe.de) for karlsruhe.de The audience for listening