Lexico-semantic Patterns for Information Extraction from Text The International Conference on Operations Research 2013 (OR 2013) Frederik Hogenboom

Presentation transcript:

Lexico-semantic Patterns for Information Extraction from Text
The International Conference on Operations Research 2013 (OR 2013)
Frederik Hogenboom
Erasmus University Rotterdam
PO Box 1738, NL-3000 DR Rotterdam, the Netherlands
In collaboration with: Flavius Frasincar, Uzay Kaymak, and Franciska de Jong

Introduction (1)
Increasing amount of (digital) data
Problem: utilizing extracted information in decision making processes becomes increasingly urgent and difficult:
  – Too much data for manual extraction
  – Yet most data is initially unstructured
  – Data often contains natural language
Solution: automatically process and interpret information, yet automation is a non-trivial task

Introduction (2)
Information Extraction (IE)
  – Multiple sources:
      News messages
      Blogs
      Papers
      …
  – Text Mining (TM):
      Natural Language Processing (NLP)
      Statistics
      …
  – Specific type of information that can be extracted: events

Events (1)

Events (2)
Event:
  – Complex combination of relations linked to a set of empirical observations from texts
  – Can be defined as: e.g.,
Event extraction could be beneficial to IE systems:
  – Personalized news
  – Risk analysis
  – Monitoring
  – Decision making support

Events (3)
Common event domains:
  – Medical
  – Finance
  – Politics
  – Environment

Event Extraction
In analogy with the classic distinction within the field of modeling, we distinguish 3 main approaches:
  – Data-driven event extraction:
      Statistics
      Machine learning
      Linear algebra
      …
  – Expert knowledge-driven event extraction:
      Representation & exploitation of expert knowledge
      Patterns
  – Hybrid event extraction:
      Combine knowledge and data-driven methods
Our focus: expert knowledge-driven event extraction through the usage of pattern languages

Existing Approaches
Various pattern languages for:
  – News processing frameworks (e.g., PlanetOnto)
  – General purpose frameworks (e.g., CAFETIERE, KIM, etc.)
Language types:
  – Lexico-syntactic
  – Lexico-semantic
However:
  – Limited syntax
  – Weak semantics
  – Cumbersome in use
  – Extract entities, but not events

Semantics
Semantic Web:
  – Collection of technologies that express content meta-data
  – Offers means to help machines understand human-created data on the Web
Ontologies:
  – Can be used to store domain-specific knowledge in the form of concepts (classes + instances)
  – Also contain inter-concept relations

Pattern Language (1)
Basic syntax:
  – LHS :- RHS
  – LHS: subject, predicate, object (optional)
  – RHS: pattern in which subject and object are assigned:
      Literals (text strings)
      Lexical categories (nouns, prepositions, verbs, etc.)
      Orthographic categories (capitalization)
      Labels (assigning subject and object)
      Logical operators (and, or, not)
      Repetition (≥0, ≥1, 0-1, {min,max})
      Wildcards (skip ≥0 or exactly 1 word)
      Ontological concepts
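To make the `LHS :- RHS` split concrete, here is a minimal Python sketch that separates a rule string into its subject, predicate, optional object, and unparsed right-hand side. The `Rule` class and `parse_rule` function are hypothetical names introduced for illustration only; they are not part of the pattern language or of its implementation, and the right-hand side is deliberately kept as an opaque string.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    """Illustrative container for one rule of the form 'LHS :- RHS'."""
    subject: str
    predicate: str
    obj: Optional[str]  # the object is optional according to the slide
    rhs: str            # right-hand side pattern, left unparsed here


def parse_rule(text: str) -> Rule:
    """Split a rule at the first ':-' and read the LHS as a 2- or 3-tuple."""
    lhs, separator, rhs = text.partition(":-")
    if not separator:
        raise ValueError("a rule must contain ':-'")
    parts = [part.strip() for part in lhs.strip().strip("()").split(",")]
    if len(parts) not in (2, 3):
        raise ValueError("LHS must be (subject, predicate) or (subject, predicate, object)")
    obj = parts[2] if len(parts) == 3 else None
    return Rule(parts[0], parts[1], obj, rhs.strip())


rule = parse_rule(
    "($sub, kb:provokes, $obj) :- $sub:=[kb:Country] kb:toProvoke $obj:=[kb:Country]"
)
print(rule.subject, rule.predicate, rule.obj)
```

Keeping the right-hand side as a string keeps the sketch short; a real engine would compile it into a matcher over annotated tokens.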

Pattern Language (2)
Provocation example (lexico-syntactic):
($sub, kb:provokes, $obj) :-
  $sub:=(((JJ | NNS | NNP | NNPS | NN) & (upperInitial | allCaps | mixedCaps))(‘.’ NNP ‘.’?)?)+
  (!‘.’ & !‘(’ & !‘)’ & !‘-’){0,3}
  (‘angers’ | ‘angered’ | ‘accuses’ | ‘accused’ | ‘insult’ | ‘insulted’ | ‘provokes’ | ‘provoked’ | ‘threatens’ | ‘threatened’)
  (!‘.’ & !‘(’ & !‘)’ & !‘-’){0,3}
  $obj:=(((JJ | NNS | NNP | NNPS | NN) & (upperInitial | allCaps | mixedCaps))(‘.’ NNP ‘.’?)?)+
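A rough Python approximation of what the lexico-syntactic rule does, assuming the input is already tokenized and tagged with Penn Treebank part-of-speech tags. All function names are hypothetical. The sketch simplifies the rule in three labeled ways: the orthographic categories are approximated by "contains an uppercase letter", the optional `(‘.’ NNP ‘.’?)?` abbreviation group is ignored, and the name reader is greedy without backtracking.

```python
from typing import Iterator, List, Optional, Tuple

Token = Tuple[str, str]  # (word, Penn Treebank POS tag), assumed pre-tagged

NAME_TAGS = {"JJ", "NNS", "NNP", "NNPS", "NN"}
BLOCKED = {".", "(", ")", "-"}  # tokens the {0,3} gap may not cross
TRIGGERS = {"angers", "angered", "accuses", "accused", "insult", "insulted",
            "provokes", "provoked", "threatens", "threatened"}


def is_name_token(token: Token) -> bool:
    """Approximates (JJ|NNS|NNP|NNPS|NN) & (upperInitial|allCaps|mixedCaps)."""
    word, tag = token
    return tag in NAME_TAGS and any(ch.isupper() for ch in word)


def read_name(tokens: List[Token], start: int) -> Optional[Tuple[str, int]]:
    """Greedily consume one or more name tokens; return (text, next index)."""
    end = start
    while end < len(tokens) and is_name_token(tokens[end]):
        end += 1
    if end == start:
        return None
    return " ".join(word for word, _ in tokens[start:end]), end


def positions_after_gap(tokens: List[Token], start: int, max_gap: int = 3) -> Iterator[int]:
    """Yield indices reachable by skipping 0..max_gap non-blocked tokens."""
    for gap in range(max_gap + 1):
        if start + gap > len(tokens):
            return
        if gap > 0 and tokens[start + gap - 1][0] in BLOCKED:
            return
        yield start + gap


def match_provocation(tokens: List[Token]) -> Optional[Tuple[str, str, str]]:
    """Return the first (subject, 'kb:provokes', object) triple, if any."""
    for start in range(len(tokens)):
        subject = read_name(tokens, start)
        if subject is None:
            continue
        subject_text, after_subject = subject
        for verb_at in positions_after_gap(tokens, after_subject):
            if verb_at >= len(tokens) or tokens[verb_at][0] not in TRIGGERS:
                continue
            for obj_at in positions_after_gap(tokens, verb_at + 1):
                obj = read_name(tokens, obj_at)
                if obj is not None:
                    return subject_text, "kb:provokes", obj[0]
    return None


sentence = [("North", "NNP"), ("Korea", "NNP"), ("has", "VBZ"), ("again", "RB"),
            ("threatened", "VBN"), ("the", "DT"), ("United", "NNP"),
            ("States", "NNPS"), (".", ".")]
print(match_provocation(sentence))
```

The sketch shows why such rules are laborious to write: every surface form of the trigger verb and every admissible tag for the participants has to be listed explicitly.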

Pattern Language (3)
Provocation example (lexico-semantic):
($sub, kb:provokes, $obj) :-
  $sub:=([kb:Country] | [kb:Continent] | [kb:Union])
  (!‘.’ & !‘(’ & !‘)’ & !‘-’){0,3}
  (kb:toAnger | kb:toAccuse | kb:toInsult | kb:toProvoke | kb:toThreaten)
  (!‘.’ & !‘(’ & !‘)’ & !‘-’){0,3}
  $obj:=([kb:Country] | [kb:Continent] | [kb:Union])
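The lexico-semantic variant can be sketched in Python by looking words up in a knowledge base instead of listing tags and verb forms. The two dictionaries below are a hypothetical toy stand-in for the `kb:` ontology (its real contents are not given in the slides), participants are assumed to be single tokens, and the function names are illustrative.

```python
from typing import Iterator, List, Optional, Tuple

ACTOR_CLASSES = {"kb:Country", "kb:Continent", "kb:Union"}
TRIGGER_CONCEPTS = {"kb:toAnger", "kb:toAccuse", "kb:toInsult",
                    "kb:toProvoke", "kb:toThreaten"}
BLOCKED = {".", "(", ")", "-"}  # tokens the {0,3} gap may not cross

# Hypothetical toy knowledge base: instances with their class, and
# lexical representations of the trigger concepts.
INSTANCE_OF = {"Russia": "kb:Country", "Ukraine": "kb:Country",
               "Europe": "kb:Continent", "EU": "kb:Union"}
LEXICALIZATIONS = {"angers": "kb:toAnger", "angered": "kb:toAnger",
                   "accuses": "kb:toAccuse", "accused": "kb:toAccuse",
                   "threatens": "kb:toThreaten", "threatened": "kb:toThreaten"}


def concept_of(word: str) -> Optional[str]:
    """Return the class or concept a word maps to in the toy knowledge base."""
    return INSTANCE_OF.get(word) or LEXICALIZATIONS.get(word)


def reachable(words: List[str], start: int, max_gap: int = 3) -> Iterator[int]:
    """Yield indices reachable by skipping 0..max_gap non-blocked words."""
    for gap in range(max_gap + 1):
        if start + gap > len(words):
            return
        if gap > 0 and words[start + gap - 1] in BLOCKED:
            return
        yield start + gap


def match_provocation(words: List[str]) -> Optional[Tuple[str, str, str]]:
    """Return the first (subject, 'kb:provokes', object) triple, if any."""
    for sub_at, subject in enumerate(words):
        if concept_of(subject) not in ACTOR_CLASSES:
            continue
        for verb_at in reachable(words, sub_at + 1):
            if verb_at >= len(words) or concept_of(words[verb_at]) not in TRIGGER_CONCEPTS:
                continue
            for obj_at in reachable(words, verb_at + 1):
                if obj_at < len(words) and concept_of(words[obj_at]) in ACTOR_CLASSES:
                    return subject, "kb:provokes", words[obj_at]
    return None


print(match_provocation("The EU has accused Russia of meddling .".split()))
```

Adding a new verb form or a new country here means editing the knowledge base, not the rule, which is the practical difference between the two rule styles.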

Implementation (1)
The Hermes News Portal (HNP) is a stand-alone Java-based news personalization tool
We have implemented the Hermes Information Extraction Engine (HIEE) within the HNP
Pipeline architecture is based on GATE components

Implementation (2)

Implementation (3)

Implementation (4)

Evaluation (1)
We compare the performance of lexico-syntactic rules with lexico-semantic rules on two data sets:
  – Economic events (500 news messages):
      CEO, Profit, President, Product, Loss, Revenue, Shares, Partner, Competitor, Subsidiary
  – Political events (100 news messages):
      Election, Resignation, Provocation, Visit, Investment, Help, Sanction, Riots, Join, Collaboration
Performance is evaluated based on rule creation times and precision, recall, and F1-measure.
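For reference, precision, recall, and F1 over extracted events can be computed as in the following Python sketch. It assumes events are compared as exact (subject, predicate, object) triples against a manually annotated gold set; the slides do not state the matching criterion, so that choice and the function name are assumptions, and the example triples are invented for illustration.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def precision_recall_f1(extracted: Set[Triple], gold: Set[Triple]) -> Tuple[float, float, float]:
    """Score extracted triples against gold triples using exact matching."""
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Invented example data, not taken from the evaluation in the slides.
gold = {("A", "kb:provokes", "B"), ("C", "kb:provokes", "D"), ("E", "kb:provokes", "F")}
extracted = {("A", "kb:provokes", "B"), ("C", "kb:provokes", "D"),
             ("G", "kb:provokes", "H"), ("I", "kb:provokes", "J")}
print(precision_recall_f1(extracted, gold))
```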

Evaluation (2)
Creation times for F1 ≥ 0.5: 90% improvement
[Table; values not preserved in the transcript. Columns: Name, Lex-Syn, Lex-Sem. Economic rows: CEO, Product, Shares, Competitor, Profit, Loss, Partner, Subsidiary, President, Revenue, Overall. Political rows: Election, Visit, Sanction, Join, Resignation, Investment, Riots, Collaboration, Provocation, Help, Overall.]

Evaluation (3)
Scores after maximum time (finance)
[Table; values not preserved in the transcript. Columns: Name, then Precision, Recall, F1 for Lex-Syn and for Lex-Sem. Rows: CEO, Product, Shares, Competitor, Profit, Loss, Partner, Subsidiary, President, Revenue, Overall.]

Evaluation (4)
Scores after maximum time (politics)
[Table; values not preserved in the transcript. Columns: Name, then Precision, Recall, F1 for Lex-Syn and for Lex-Sem. Rows: Election, Visit, Sanction, Join, Resignation, Investment, Riots, Collaboration, Provocation, Help, Overall.]

Conclusions
Compared to lexico-syntactic alternatives, our lexico-semantic patterns perform better in terms of precision, recall, F1, and creation times
Also, our rules are more expressive and easier to use than their lexico-syntactic alternatives
Future work:
  – Evaluate against existing lexico-semantic languages
  – Evaluate on a big data set
  – Link events to trading algorithms instead of news personalization

Questions