Detecting Economic Events Using a Semantics-Based Pipeline 22nd International Conference on Database and Expert Systems Applications (DEXA 2011) September.


Detecting Economic Events Using a Semantics-Based Pipeline
22nd International Conference on Database and Expert Systems Applications (DEXA 2011)
September 2, 2011
Alexander Hogenboom, Frederik Hogenboom, Flavius Frasincar, Uzay Kaymak, Otto van der Meer, Kim Schouten
Erasmus University Rotterdam
PO Box 1738, NL-3000 DR Rotterdam, the Netherlands

Introduction (1)
News greatly impacts financial markets
Some of many recent examples: [slide images not included in the transcript]

Introduction (2)
It is important to automatically and accurately identify economic events in news items in a timely manner
This involves processing large amounts of heterogeneous sources of unstructured data
Domain-specific information captured in domain semantics facilitates detection of relevant concepts

Introduction (3)
SPEED: a Semantics-based Pipeline for Economic Event Detection
Our approach:
  – Extracts financial events from emerging news (RSS feeds)
  – Annotates news messages with meta-data
  – Aims for fast processing in order to enable real-time use

SPEED: Framework
[Figure: SPEED framework diagram, not included in the transcript]

SPEED: Implementation (1)
Java-based pipeline using the General Architecture for Text Engineering (GATE)
GATE components used:
  – English Tokenizer
  – Sentence Splitter
  – Part-Of-Speech Tagger
  – Morphological Analyzer
Adaptations and additions required:
  – Word Sense Disambiguation
  – Ontology-based components
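The pipeline idea on this slide can be pictured as a chain of stages, each adding annotations to a shared document. The following is only a rough Python sketch of that idea, not the authors' Java/GATE code: the stage functions, the `run_pipeline` helper, the dictionary-based document, and the sample sentence are all hypothetical, and the tokenizer and sentence splitter are deliberately naive stand-ins for the GATE components.

```python
import re
from typing import Any, Callable, Dict, List

Document = Dict[str, Any]
Stage = Callable[[Document], Document]


def tokenize(doc: Document) -> Document:
    # Naive stand-in for a tokenizer: words and single punctuation marks.
    doc["tokens"] = re.findall(r"\w+|[^\w\s]", doc["text"])
    return doc


def split_sentences(doc: Document) -> Document:
    # Naive stand-in for a sentence splitter: close a sentence at . ! or ?
    sentences: List[List[str]] = []
    current: List[str] = []
    for token in doc["tokens"]:
        current.append(token)
        if token in {".", "!", "?"}:
            sentences.append(current)
            current = []
    if current:
        sentences.append(current)
    doc["sentences"] = sentences
    return doc


def run_pipeline(text: str, stages: List[Stage]) -> Document:
    # Each stage reads the annotations added so far and adds its own.
    doc: Document = {"text": text}
    for stage in stages:
        doc = stage(doc)
    return doc


doc = run_pipeline("AcmeCorp names a new CEO. Shares rise.", [tokenize, split_sentences])
```

Further stages (part-of-speech tagging, morphological analysis, word sense disambiguation, ontology-based look-up) would be appended to the stage list in the same way.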

SPEED: Implementation (2)
Ontology Gazetteer:
  – GATE uses an inefficient list of ontology concepts
  – We employ a look-up tree based on hash maps
Word Group Look-Up:
  – Tree-based approach using WordNet
Word Sense Disambiguator:
  – Adaptation of the Structural Semantic Interconnections (SSI) algorithm
Event Phrase Gazetteer:
  – Matches event concepts
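To illustrate why a look-up tree based on hash maps can beat scanning a flat list of concepts, here is a minimal Python sketch of one possible design. It is an assumption about the general technique, not the authors' implementation: the `LookupTree` class, its methods, and the example phrases and concept names are all hypothetical. Each node holds a hash map from the next word to a child node, so matching a multi-word phrase costs one hash look-up per token.

```python
from typing import Dict, List, Optional, Tuple


class LookupTree:
    """Word-level tree; every node maps the next word to a child node."""

    def __init__(self) -> None:
        self.children: Dict[str, "LookupTree"] = {}
        self.concept: Optional[str] = None  # set when a phrase ends at this node

    def add(self, phrase: str, concept: str) -> None:
        node = self
        for word in phrase.lower().split():
            node = node.children.setdefault(word, LookupTree())
        node.concept = concept

    def annotate(self, tokens: List[str]) -> List[Tuple[int, int, str]]:
        """Return (start, end, concept) spans, preferring the longest match."""
        spans: List[Tuple[int, int, str]] = []
        i = 0
        while i < len(tokens):
            node: Optional[LookupTree] = self
            best: Optional[Tuple[int, int, str]] = None
            for j in range(i, len(tokens)):
                node = node.children.get(tokens[j].lower())
                if node is None:
                    break
                if node.concept is not None:
                    best = (i, j + 1, node.concept)
            if best is not None:
                spans.append(best)
                i = best[1]  # continue after the matched phrase
            else:
                i += 1
        return spans


# Hypothetical lexical representations of ontology concepts.
tree = LookupTree()
tree.add("chief executive", "CEO")
tree.add("chief executive officer", "CEO")
tree.add("share price", "ShareValue")

tokens = "The chief executive officer said the share price rose".split()
spans = tree.annotate(tokens)
```

With a flat list, every concept would have to be compared against every token position; here the work per position is bounded by the length of the longest phrase in the tree.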

SPEED: Implementation (3)
Event Pattern Recognition:
  – Based on GATE Rule Transducer, utilizing JAPE patterns
  – Additionally operates on event concepts
Ontology Instantiator:
  – Retrieves event annotations in text
  – Creates event individuals in ontology
  – Updates affected concepts
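The two steps on this slide, matching event patterns over annotations and then instantiating the matches in the ontology, could look roughly like the Python sketch below. This is not JAPE and not the authors' code: the subject/event/object pattern, the dictionary-based "ontology", the function names, and the company names are made up purely for illustration.

```python
from typing import Any, Dict, List

Annotation = Dict[str, str]


def recognise_events(annotations: List[Annotation]) -> List[Dict[str, str]]:
    # Hypothetical pattern: Company, EventPhrase, Company in direct succession.
    events = []
    for left, mid, right in zip(annotations, annotations[1:], annotations[2:]):
        types = (left["type"], mid["type"], right["type"])
        if types == ("Company", "EventPhrase", "Company"):
            events.append(
                {"event": mid["concept"], "subject": left["text"], "object": right["text"]}
            )
    return events


def instantiate(events: List[Dict[str, str]], ontology: Dict[str, Any]) -> Dict[str, Any]:
    for event in events:
        # Create an event individual with a simple sequential identifier.
        event_id = f"{event['event']}_{len(ontology['events']) + 1}"
        ontology["events"][event_id] = event
        # Update the concepts affected by the event (here: both companies).
        for role in ("subject", "object"):
            ontology["companies"].setdefault(event[role], []).append(event_id)
    return ontology


# Made-up annotations, as earlier pipeline stages might produce them.
annotations = [
    {"text": "AcmeCorp", "type": "Company"},
    {"text": "competes with", "type": "EventPhrase", "concept": "Competitor"},
    {"text": "Globex", "type": "Company"},
]
events = recognise_events(annotations)
ontology = instantiate(events, {"events": {}, "companies": {}})
```

A real ontology would be stored in a dedicated model (for example OWL individuals and properties) rather than nested dictionaries; the dictionaries only make the retrieve, create, and update steps visible.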

Evaluation (1)
Word Sense Disambiguator:
  – Evaluated on SemCor
  – Original SSI: precision 53%, recall 31%
  – Adapted SSI: precision 59%, recall 59%
Entire framework:
  – Evaluated on 200 news messages from Yahoo! Business & Technology feeds, annotated by three domain experts (with inter-annotator agreement (IAA) of 66% or higher) for 10 events regarding:
      CEOs (60)
      Presidents (22)
      Products (136)
      Competitors (50)
      Partners (23)
      Subsidiaries (46)
      Share values (45)
      Revenues (22)
      Profits (33)
      Losses (27)
  – Event instances: precision 86%, recall 81%
  – Fully decorated events: precision 62%, recall 53%
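For readers who want a single figure per row, precision and recall can be combined into the F1 score (their harmonic mean). The slides do not report F1, so the values computed below are derived from the reported percentages and are not results stated by the authors. The function names and the example counts in the usage line are hypothetical.

```python
from typing import Dict, Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    # precision = correct detections / all detections
    # recall    = correct detections / all annotated items
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def f1(precision: float, recall: float) -> float:
    # Harmonic mean of precision and recall.
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# (precision, recall) pairs as reported on the slide.
reported: Dict[str, Tuple[float, float]] = {
    "Original SSI": (0.53, 0.31),
    "Adapted SSI": (0.59, 0.59),
    "Event instances": (0.86, 0.81),
    "Fully decorated events": (0.62, 0.53),
}

f1_scores = {name: f1(p, r) for name, (p, r) in reported.items()}

for name, score in f1_scores.items():
    print(f"{name}: F1 = {score:.2f}")

# Hypothetical counts, only to show how precision and recall arise.
example_precision, example_recall = precision_recall(tp=8, fp=2, fn=2)
```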

Evaluation (2)
Latency:
  – Total pipeline: 632 milliseconds per document
  – Linguistic and syntactic analysis: 30%
  – Word Sense Disambiguation: 60%
  – Remaining tasks: 10%
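The latency figures can be turned into approximate per-stage times with a little arithmetic. The sketch below assumes that the percentages are shares of the 632 ms total and that documents are processed one at a time; both are assumptions, and the variable names are illustrative only.

```python
TOTAL_MS = 632  # total pipeline latency per document, as reported

# Reported shares, assumed here to be fractions of the total latency.
SHARES = {
    "Linguistic and syntactic analysis": 0.30,
    "Word Sense Disambiguation": 0.60,
    "Remaining tasks": 0.10,
}

per_stage_ms = {name: TOTAL_MS * share for name, share in SHARES.items()}

# Sequential throughput implied by the total latency.
docs_per_second = 1000 / TOTAL_MS

for name, ms in per_stage_ms.items():
    print(f"{name}: about {ms:.1f} ms per document")
print(f"Throughput: about {docs_per_second:.2f} documents per second")
```

Under these assumptions, word sense disambiguation accounts for most of the time per document, which makes it the first candidate for any further speed-up.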

Conclusions
SPEED framework:
  – Components are semantically enabled
  – Pipeline outputs are ontology instances
  – Adapted SSI algorithm
Evaluation underlines fast and accurate performance
Future work:
  – Applications in algorithmic trading
  – Linking sentiment to discovered events (e.g., trends, moods, opinions)

Questions