Learning Semantic Information Extraction Rules from News The Dutch-Belgian Database Day 2013 (DBDBD 2013) Frederik Hogenboom Erasmus.

Slides:



Advertisements
Similar presentations
1 OOA-HR Workshop, 11 October 2006 Semantic Metadata Extraction using GATE Diana Maynard Natural Language Processing Group University of Sheffield, UK.
Advertisements

RCQ-ACS: RDF Chain Query Optimization Using an Ant Colony System WI 2012 Alexander Hogenboom Erasmus University Rotterdam Ewout Niewenhuijse.
Polarity Analysis of Texts using Discourse Structure CIKM 2011 Bas Heerschop Erasmus University Rotterdam Frank Goossen Erasmus.
Semantic News Recommendation Using WordNet and Bing Similarities 28th Symposium On Applied Computing 2013 (SAC 2013) March 21, 2013 Michel Capelle
1 Relational Learning of Pattern-Match Rules for Information Extraction Presentation by Tim Chartrand of A paper bypaper Mary Elaine Califf and Raymond.
A Linguistic Approach for Semantic Web Service Discovery International Symposium on Management Intelligent Systems 2012 (IS-MiS 2012) July 13, 2012 Jordy.
Hermes: News Personalization Using Semantic Web Technologies
Exploiting Discourse Structure for Sentiment Analysis of Text OR 2013 Alexander Hogenboom In collaboration with Flavius Frasincar, Uzay Kaymak, and Franciska.
Connecting Customer Relationship Management Systems to Social Networks 7th International Conference on Knowledge Management, Services, and Cloud Computing.
Determining Negation Scope and Strength in Sentiment Analysis SMC 2011 Paul van Iterson Erasmus School of Economics Erasmus University Rotterdam
Exploiting Emoticons in Sentiment Analysis SAC 2013 Daniella Bal Erasmus University Rotterdam Flavius Frasincar Erasmus University.
Erasmus University Rotterdam Frederik HogenboomEconometric Institute School of Economics Flavius Frasincar.
IR & Metadata. Metadata Didn’t we already talk about this? We discussed what metadata is and its types –Data about data –Descriptive metadata is external.
April 22, Text Mining: Finding Nuggets in Mountains of Textual Data Jochen Doerre, Peter Gerstl, Roland Seiffert IBM Germany, August 1999 Presenter:
Semantic Web and Web Mining: Networking with Industry and Academia İsmail Hakkı Toroslu IST EVENT 2006.
RCQ-GA: RDF Chain Query Optimization using Genetic Algorithms BNAIC 2009 Alexander Hogenboom, Viorel Milea, Flavius Frasincar, and Uzay Kaymak Erasmus.
Sentiment Lexicon Creation from Lexical Resources BIS 2011 Bas Heerschop Erasmus School of Economics Erasmus University Rotterdam
March 17, 2008SAC WT Hermes: a Semantic Web-Based News Decision Support System* Flavius Frasincar Erasmus University Rotterdam.
Automatically Annotating Web Pages Using Google Rich Snippets 11th Dutch-Belgian Information Retrieval Workshop (DIR 2011) February 4, 2011 Frederik Hogenboom.
Optimizing RDF Chain Queries using Genetic Algorithms DBDBD 2010 Alexander Hogenboom, Viorel Milea, Flavius Frasincar, and Uzay Kaymak Erasmus University.
Detecting Economic Events Using a Semantics-Based Pipeline 22nd International Conference on Database and Expert Systems Applications (DEXA 2011) September.
An Overview of Event Extraction from Text Workhop on Detection, Representation, and Exploitation of Events in the Semantic Web (DeRiVE'11) October 23,
News Personalization using the CF-IDF Semantic Recommender International Conference on Web Intelligence, Mining, and Semantics (WIMS 2011) May 25, 2011.
A Survey of Approaches on Mining the Structure from Unstructured Data Dutch-Belgian Database Day 2009 (DBDBD 2009) 1 Nov. 30, 2009 Frederik Hogenboom
Analyzing Sentiment in a Large Set of Web Data while Accounting for Negation AWIC 2011 Bas Heerschop Erasmus School of Economics Erasmus University Rotterdam.
Text Mining: Finding Nuggets in Mountains of Textual Data Jochen Dijrre, Peter Gerstl, Roland Seiffert Presented by Drew DeHaas.
Ontology-based Information Extraction for Business Intelligence
Semantic Web Technologies Lecture # 2 Faculty of Computer Science, IBA.
Word Sense Disambiguation for Automatic Taxonomy Construction from Text-Based Web Corpora 12th International Conference on Web Information System Engineering.
Sentiment Analysis with a Multilingual Pipeline 12th International Conference on Web Information System Engineering (WISE 2011) October 13, 2011 Daniëlla.
Erasmus University Rotterdam Introduction Nowadays, emerging news on economic events such as acquisitions has a substantial impact on the financial markets.
Erasmus University Rotterdam Introduction With the vast amount of information available on the Web, there is an increasing need to structure Web data in.
A News-Based Approach for Computing Historical Value-at-Risk International Symposium on Management Intelligent Systems 2012 (IS-MiS 2012) Frederik Hogenboom.
Špindlerův Mlýn, Czech Republic, SOFSEM Semantically-aided Data-aware Service Workflow Composition Ondrej Habala, Marek Paralič,
Empirical Methods in Information Extraction Claire Cardie Appeared in AI Magazine, 18:4, Summarized by Seong-Bae Park.
RuleML-2007, Orlando, Florida1 Towards Knowledge Extraction from Weblogs and Rule-based Semantic Querying Xi Bai, Jigui Sun, Haiyan Che, Jin.
Survey of Semantic Annotation Platforms
Knowledge representation
Chapter 1 Introduction to Data Mining
Ontology Updating Driven by Events Dutch-Belgian Database Day 2012 (DBDBD 2012) November 21, 2012 Frederik Hogenboom Jordy Sangers.
Intelligent Database Systems Lab Presenter : YAN-SHOU SIE Authors : JEROEN DE KNIJFF, FLAVIUS FRASINCAR, FREDERIK HOGENBOOM DKE Data & Knowledge.
Scott Duvall, Brett South, Stéphane Meystre A Hands-on Introduction to Natural Language Processing in Healthcare Annotation as a Central Task for Development.
PAUL ALEXANDRU CHIRITA STEFANIA COSTACHE SIEGFRIED HANDSCHUH WOLFGANG NEJDL 1* L3S RESEARCH CENTER 2* NATIONAL UNIVERSITY OF IRELAND PROCEEDINGS OF THE.
1 Technologies for (semi-) automatic metadata creation Diana Maynard.
1 A Semantic Web-Based Approach for Personalizing News Flavius Frasincar Erasmus University Rotterdam * Joint work with Kim Schouten,
Combining terminology resources and statistical methods for entity recognition: an evaluation Angus Roberts, Robert Gaizauskas, Mark Hepple, Yikun Guo.
Efficiently Computed Lexical Chains As an Intermediate Representation for Automatic Text Summarization H.G. Silber and K.F. McCoy University of Delaware.
Ihr Logo Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization Turban, Aronson, and Liang.
BAA - Big Mechanism using SIRA Technology Chuck Rehberg CTO at Trigent Software and Chief Scientist at Semantic Insights™
*Erasmus University Rotterdam P.O. Box 1738, NL-3000 DR Rotterdam, the Netherlands † Teezir BV Wilhelminapark 46, NL-3581 NL, Utrecht, the Netherlands.
BioRAT: Extracting Biological Information from Full-length Papers David P.A. Corney, Bernard F. Buxton, William B. Langdon and David T. Jones Bioinformatics.
Semantics-Based News Recommendation with SF-IDF+ International Conference on Web Intelligence, Mining, and Semantics (WIMS 2013) June 13, 2013 Marnix Moerland.
Erasmus University Rotterdam Introduction Content-based news recommendation is traditionally performed using the cosine similarity and TF-IDF weighting.
Facilitating Document Annotation using Content and Querying Value.
Towards Cross-Language Sentiment Analysis through Universal Star Ratings KMO 2012 Malissa Bal Erasmus University Rotterdam Flavius.
Trustworthy Semantic Webs Dr. Bhavani Thuraisingham The University of Texas at Dallas Lecture #4 Vision for Semantic Web.
Of 33 lecture 1: introduction. of 33 the semantic web vision today’s web (1) web content – for human consumption (no structural information) people search.
Lexico-semantic Patterns for Information Extraction from Text The International Conference on Operations Research 2013 (OR 2013) Frederik Hogenboom
Intelligent Database Systems Lab Presenter: CHANG, SHIH-JIE Authors: Kevin Meijer, Flavius Frasincar, Frederik Hogenboom 2014.DSS. A semantic approach.
Topic Maps introduction Peter-Paul Kruijsen CTO, Morpheus software ISOC seminar, april 5 th 2005.
Information Retrieval
Semantic web Bootstrapping & Annotation Hassan Sayyadi Semantic web research laboratory Computer department Sharif university of.
Semantics-Based News Recommendation International Conference on Web Intelligence, Mining, and Semantics (WIMS 2012) June 14, 2012 Michel Capelle
©2012 Paula Matuszek CSC 9010: Information Extraction Overview Dr. Paula Matuszek (610) Spring, 2012.
Kenneth Baclawski et. al. PSB /11/7 Sa-Im Shin
Erasmus University Rotterdam
Lecture #11: Ontology Engineering Dr. Bhavani Thuraisingham
Social Knowledge Mining
Extracting Semantic Concept Relations
Extracting Why Text Segment from Web Based on Grammar-gram
Presentation transcript:

Learning Semantic Information Extraction Rules from News The Dutch-Belgian Database Day 2013 (DBDBD 2013) Frederik Hogenboom Erasmus University Rotterdam PO Box 1738, NL-3000 DR Rotterdam, the Netherlands In collaboration with: Flavius Frasincar and Wouter IJntema

Introduction (1) Increasing amount of (digital) data Problem: utilizing extracted information in decision making processes becomes increasingly urgent and difficult: –Too much data for manual extraction –Yet most data is initially unstructured –Data often contains natural language Solution: automatically process and interpret information, yet automation is a non-trivial task The Dutch-Belgian Database Day 2013 (DBDBD 2013)

Introduction (2) Information Extraction (IE) –Multiple sources: News messages Blogs Papers … –Text Mining (TM): Natural Language Processing (NLP) Statistics … –Specific type of information that can be extracted: events The Dutch-Belgian Database Day 2013 (DBDBD 2013)

Events (1) The Dutch-Belgian Database Day 2013 (DBDBD 2013)

Events (1) The Dutch-Belgian Database Day 2013 (DBDBD 2013)

Events (1) The Dutch-Belgian Database Day 2013 (DBDBD 2013)

Events (2) Event: –Complex combination of relations linked to a set of empirical observations from texts –Can be defined as: e.g., Event extraction could be beneficial to IE systems: –Personalized news –Risk analysis –Monitoring –Decision making support The Dutch-Belgian Database Day 2013 (DBDBD 2013)

Events (3) Common event domains: –Medical –Finance –Politics –Environment The Dutch-Belgian Database Day 2013 (DBDBD 2013)

Event Extraction In analogy with the classic distinction within the field of modeling, we distinguish 3 main approaches: –Data-driven event extraction: Statistics Machine learning Linear algebra … –Expert knowledge-driven event extraction: Representation & exploitation of expert knowledge Patterns –Hybrid event extraction: Combine knowledge and data-driven methods Our focus: expert knowledge-driven event extraction through the usage of pattern languages The Dutch-Belgian Database Day 2013 (DBDBD 2013)

Existing Approaches Various pattern-languages for: –News processing frameworks (e.g., PlanetOnto) –General purpose frameworks (e.g., CAFETIERE, KIM, etc.) Language types: –Lexico-syntactic –Lexico-semantic However: –Limited syntax –Weak semantics –Cumbersome in use –Extract entities, but not events The Dutch-Belgian Database Day 2013 (DBDBD 2013)

Semantics Semantic Web: –Collection of technologies that express content meta-data –Offers means to help machines understand human-created data on the Web Ontologies: –Can be used to store domain-specific knowledge in the form of concepts (classes + instances) –Also contain inter-concept relations The Dutch-Belgian Database Day 2013 (DBDBD 2013)

Pattern Language (1) Basic syntax: –LHS :- RHS –LHS: subject, predicate, object (optional) –RHS: pattern in which subject and object are assigned: Literals (text strings) Lexical categories (nouns, prepositions, verbs, etc.) Orthographic categories (capitalization) Labels (assigning subject and object) Logical operators (and, or, not) Repetition (≥0, ≥1, 0-1, {min,max}) Wildcards (skip ≥0 or exactly 1 word) Ontological concepts The Dutch-Belgian Database Day 2013 (DBDBD 2013)

Pattern Language (2) The Dutch-Belgian Database Day 2013 (DBDBD 2013) Example:

Rule Creation Groups of rules extract specific events Creating such groups is cumbersome, error-prone and time-consuming If the language is implemented using tree structures, a genetic programming approach can be employed for learning rules automatically The Dutch-Belgian Database Day 2013 (DBDBD 2013)

Rule Learning The Dutch-Belgian Database Day 2013 (DBDBD 2013)

Implementation The Hermes News Portal (HNP) is a stand-alone Java-based news personalization tool We have implemented the Hermes Information Extraction Engine (HIEE) within the HNP Pipeline-architecture is based on GATE components The Dutch-Belgian Database Day 2013 (DBDBD 2013)

Evaluation (1) We compare the performance of rule learning versus manually creating rules: –Using a data set on economic events (500 news messages): CEO Profit President Product Loss Revenue Shares Partner Competitor Subsidiary –By allowing for 5 hours of construction time per rule group (including reading, thinking, writing, …) –Based on the Precision, Recall, and F 1 -measure The Dutch-Belgian Database Day 2013 (DBDBD 2013)

Evaluation (2) The Dutch-Belgian Database Day 2013 (DBDBD 2013) Automatic Learning Manual Creation NamePrecisionRecallF1F1 PrecisionRecallF1F1 Δ%Δ% Competitor % Loss % Partner % Subsidiary % CEO % President % Product % Profit % Sales % ShareValue % Total %

Conclusions We presented HIEL, a lexico-semantic rule language for event extraction Rule creation is cumbersome, and hence a genetic programming-based learning approach is proposed Lexico-semantic rule learning performs better than the manual alternative in terms of precision, recall, and F 1 Future work: –Evaluate approach for existing lexico-semantic languages –Evaluate on other domains –Link events to trading algorithms instead of news personalization The Dutch-Belgian Database Day 2013 (DBDBD 2013)

Questions The Dutch-Belgian Database Day 2013 (DBDBD 2013)