Ontology-Based Information Extraction: Current Approaches.



Internet today: data explosion
– 45 GB of data per person worldwide
– 988,000,000,000,000,000,000 bytes (988 exabytes) available on the net in 2010
– 60% yearly growth
– 1,800,000,000,000,000,000,000 bytes (1,800 exabytes) will be available by the end of 2011 (according to IDC statistics)

Internet today (2): Web 2.0 and user-generated content
– By the end of 2013, 155 million users (in the US alone) will be using information created by others.
– 115 million users in the US will actively create content on the Web.
– The growth in data sharing is currently 15 times larger than the growth in data downloading (data from 2008).

Search
"...Search today is still kind of a hunt, where you get all these links, and as we teach software to understand the documents, really read them in the sense a human does, you will get answers more directly..." - Bill Gates

Google search engine
Query: "Which Nobel prize winners were born before Albert Einstein?"
Google: 24,600,000 results, including:
- Albert Einstein – Biography
- Albert Einstein - Wikipedia, the free encyclopedia
- Jewish Nobel Prize Winners in Physics
- Nobel Prize Winners Hate School (Learn in Freedom!)
- HHF Factpaper: Jewish Nobel Prize Winners; Part II: Physics
Why? Because Google queries are keyword-based and do not capture the semantic connections between words.

Solution to imprecise data
The best solution would be to present everything available online in a semantic way (the idea of Web 3.0, Tim Berners-Lee). In practice: build semantics-aware information extraction systems.

Information Extraction
– Reduces the information in a document by transforming it into a machine-readable structure
– Tightly connected with NLP
– IE systems do not try to understand the input data
– Analyzes only the portions of documents that contain relevant information

New view on IE
More and more people see IE not only as a process of retrieving disconnected text tokens, but as a way of obtaining meaningful semantic data.

How do we get there: OBIE
An Ontology-Based Information Extraction (OBIE) system is a system that processes unstructured or semi-structured natural language text through a mechanism guided by ontologies to extract certain types of information, and presents the output using ontologies.

Yago vs. Google
Query: "Which Nobel prize winners were born before Albert Einstein?"
Yago: 1 result*
- Johannes_Stark (15 April 1874 – 21 June 1957) was a German physicist, and Physics Nobel Prize laureate who was closely involved with the Deutsche Physik movement under the Nazi regime.
* Note that, compared to Google, Yago has a very limited knowledge base.

Usability of OBIE
– Automating natural language processing
– Creating semantic content for Web 3.0
– Improving the quality of ontologies

Typical OBIE Architecture

Exa

Preprocessor
The preprocessor consists of input-specific modules that transform text into a form the extraction module can process. Preprocessing consists mainly of stripping whitespace, HTML tags, and unreadable characters.
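
The preprocessing step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the preprocessor of any particular OBIE system: the function name `preprocess` and the helper class `_TextOnly` are made up for this example, "unreadable characters" is interpreted as Unicode control and format characters, and the input is assumed to be HTML.

```python
import re
import unicodedata
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Hypothetical helper: keeps text nodes, drops tags, scripts and styles."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def preprocess(raw: str) -> str:
    """Strip HTML tags, unreadable characters and redundant whitespace."""
    parser = _TextOnly()
    parser.feed(raw)
    parser.close()
    # Joining with a space keeps block elements apart; it can also split
    # a word that is broken up by inline tags, which this sketch accepts.
    text = " ".join(parser.parts)
    # Assumption: "unreadable" means Unicode category C* (control, format...).
    text = "".join(
        ch for ch in text
        if ch.isspace() or not unicodedata.category(ch).startswith("C")
    )
    # Collapse every run of whitespace into a single space.
    return re.sub(r"\s+", " ", text).strip()


page = "<p>Johannes   Stark was a <b>German</b>\x07 physicist.</p>\n<script>var x = 1;</script>"
clean = preprocess(page)
```

Using the standard library's `html.parser` instead of a regular expression over tags is a design choice: it also handles character references and lets the sketch skip script and style content.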

Extraction module
The extraction module is where the actual IE takes place: the input data is analyzed, turned into tokens that the ontology can interpret, and finally bound together with semantic relationships. The data produced by the extraction module must be transformed into a specific description logic language (currently usually OWL) before it can be saved in the knowledge base.
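
As a rough illustration of the last step, extracted facts can be written out as subject-predicate-object statements. The sketch below serializes them as N-Triples, an RDF syntax; it is not a full OWL serialization, and the namespace `http://example.org/obie#` and the property names `hasProfession` and `bornOn` are hypothetical. The birth date is the one given for Johannes Stark elsewhere in this presentation.

```python
BASE = "http://example.org/obie#"  # hypothetical namespace, not from the slides


def iri(local_name: str) -> str:
    """Wrap a local name into an absolute IRI in N-Triples syntax."""
    return "<%s%s>" % (BASE, local_name)


def literal(value: str) -> str:
    """Render a plain string literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"%s"' % escaped


def to_ntriples(facts):
    """facts: iterable of (subject, predicate, object, object_is_literal)."""
    lines = []
    for subject, predicate, obj, is_literal in facts:
        rendered = literal(obj) if is_literal else iri(obj)
        lines.append("%s %s %s ." % (iri(subject), iri(predicate), rendered))
    return "\n".join(lines)


# Hypothetical output of an extraction module for one sentence.
facts = [
    ("Johannes_Stark", "hasProfession", "Physicist", False),
    ("Johannes_Stark", "bornOn", "1874-04-15", True),
]
document = to_ntriples(facts)
```

A real system would more likely rely on an RDF or OWL library and on the classes and properties defined in its own ontology.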

Rule Learning-Based Extraction Methods (RLBEM)
– Dictionary-Based Method (DBM): before IE begins, a dictionary of patterns is created; this dictionary is later used to extract the needed information from new, untagged text.
– Rule-Based Method (RBM): uses rules instead of dictionaries for IE.

DBM example
Assume that we want to find information about terrorist attacks. In this case one can use a concept that consists of the triggering word "bombed" together with the linguistic pattern passive-verb. When DBM finds a sentence like "The NY metro was bombed tonight", the concept is activated (the sentence contains the word "bombed"), then the linguistic pattern is matched against the sentence, and the subject (in this case the NY metro) is extracted as the target of the terrorist attack.
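
A minimal sketch of this dictionary-based lookup, assuming a deliberately crude approximation: the linguistic pattern is emulated with a regular expression for "subject + auxiliary + bombed" rather than with a real syntactic parse. The names `Concept`, `TARGET_OF_BOMBING` and `extract`, and the example sentences, are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Concept:
    name: str            # what the extracted text stands for
    trigger: str         # word that activates the concept
    pattern: re.Pattern  # stand-in for the linguistic pattern


# Hypothetical dictionary entry: "<subject> was/were ... bombed".
TARGET_OF_BOMBING = Concept(
    name="target-of-bombing",
    trigger="bombed",
    pattern=re.compile(
        r"^(?P<subject>.+?)\s+(?:was|were|is|are|has been|have been)\s+bombed\b",
        re.IGNORECASE,
    ),
)


def extract(sentence, dictionary):
    """Return (concept name, extracted subject) for every matching concept."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    results = []
    for concept in dictionary:
        if concept.trigger not in words:
            continue  # triggering word absent: concept is not activated
        match = concept.pattern.search(sentence)
        if match:
            results.append((concept.name, match.group("subject")))
    return results


hits = extract("The NY metro was bombed tonight", [TARGET_OF_BOMBING])
misses = extract("We are going to bomb NY metro tonight", [TARGET_OF_BOMBING])
```

Here `hits` contains the pair `("target-of-bombing", "The NY metro")`, while `misses` is empty because the second sentence does not contain the triggering word "bombed".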

Classification-Based Extraction Method (CBEM)
The basic idea behind CBEM is to treat the IE problem as a classification problem. Currently the most popular approach to the classification problem is to use Support Vector Machines (SVMs), which are supervised learning models (they are not artificial neural networks and are not unsupervised).

Classification sample
After proper training, when given the text "Professor Marian Makuch will give a speech about dark matter", the SVM-based CBEM system should mark the "Professor" token as the beginning of the speaker label and "Makuch" as its end.
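
The boundary-classification idea can be sketched without external libraries. Note the substitution: the code below trains two simple perceptrons (one for "begin of speaker", one for "end of speaker") as a dependency-free stand-in for the SVM, so it illustrates the setup, not SVM training itself. The feature set, the `TITLES` list and the two training sentences are hypothetical.

```python
TITLES = {"professor", "dr", "mr", "ms"}  # hypothetical gazetteer of titles


def features(tokens, i):
    """Binary features describing token i and its immediate neighbours."""
    def is_cap(j):
        return 0 <= j < len(tokens) and tokens[j][0].isupper()

    feats = {"bias"}
    if tokens[i].lower() in TITLES:
        feats.add("title")
    if is_cap(i):
        feats.add("cap")
    if is_cap(i - 1):
        feats.add("prev_cap")
    if is_cap(i + 1):
        feats.add("next_cap")
    return feats


def train(examples, max_epochs=1000):
    """Perceptron training; examples are (feature set, bool label) pairs."""
    weights = {}
    for _ in range(max_epochs):
        mistakes = 0
        for feats, label in examples:
            if predict(weights, feats) != label:
                step = 1 if label else -1
                for f in feats:
                    weights[f] = weights.get(f, 0) + step
                mistakes += 1
        if mistakes == 0:  # every training example is classified correctly
            break
    return weights


def predict(weights, feats):
    return sum(weights.get(f, 0) for f in feats) > 0


# Hypothetical training data: (sentence, begin index, end index of speaker).
TRAIN = [
    ("Dr Anna Nowak will talk about graph theory", 0, 2),
    ("Today Professor Jan Kowalski presents new results", 1, 3),
]


def fit(train_data):
    begin_examples, end_examples = [], []
    for sentence, begin, end in train_data:
        tokens = sentence.split()
        for i in range(len(tokens)):
            feats = features(tokens, i)
            begin_examples.append((feats, i == begin))
            end_examples.append((feats, i == end))
    return train(begin_examples), train(end_examples)


def label(tokens, models):
    """Tag each token as B (begin), E (end) or O (outside)."""
    begin_w, end_w = models
    tags = []
    for i in range(len(tokens)):
        feats = features(tokens, i)
        if predict(begin_w, feats):
            tags.append("B")
        elif predict(end_w, feats):
            tags.append("E")
        else:
            tags.append("O")
    return tags


models = fit(TRAIN)
tokens = "Professor Marian Makuch will give a speech about dark matter".split()
tags = label(tokens, models)
```

With these features, "Professor" is tagged `B` and "Makuch" is tagged `E`, with every other token tagged `O`. A realistic system would use far richer features and a much larger annotated corpus.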

OBIE Ontology