An Overview of Event Extraction from Text Workshop on Detection, Representation, and Exploitation of Events in the Semantic Web (DeRiVE'11) October 23,

Presentation transcript:

An Overview of Event Extraction from Text
Workshop on Detection, Representation, and Exploitation of Events in the Semantic Web (DeRiVE'11)
October 23, 2011
Frederik Hogenboom, Flavius Frasincar, Uzay Kaymak, Franciska de Jong
Erasmus University Rotterdam, PO Box 1738, NL-3000 DR Rotterdam, the Netherlands

Introduction (1)
Increasing amount of (digital) data
Utilizing extracted information in decision making processes becomes increasingly urgent and difficult:
– Too much data for manual extraction
– Yet most data is initially unstructured
– Data often contains natural language
– Automation is a non-trivial task

Introduction (2)
Information Extraction (IE)
– Multiple sources: news messages, blogs, papers, …
– Text Mining (TM): information learning from pre-processed text: Natural Language Processing (NLP), statistics, …
– Specific type of information that can be extracted: events

Events (1)

Events (2)
Event:
– Complex combination of relations linked to a set of empirical observations from texts
– Can be defined as: e.g.,
Event extraction could be beneficial to IE systems:
– Personalized news
– Risk analysis
– Monitoring
– Decision making support

Events (3)
Common event domains:
– Medical
– Finance
– Politics
– Environment
Which Text Mining techniques are appropriate for event extraction?

Aims
Provide general guidelines on selecting the proper text mining techniques for specific event extraction tasks, taking into account the user and their context
Focus:
– Event extraction from text
– No space/time event dimensions
Criteria (each rated high / medium / low):
– Required amount of data
– Required amount of domain knowledge
– Required amount of user expertise
– Interpretability of results

Event Extraction
In analogy with the classic distinction within the field of modeling, we distinguish 3 main approaches:
– Data-driven event extraction: statistics, machine learning, linear algebra, …
– Expert knowledge-driven event extraction: representation & exploitation of expert knowledge, patterns
– Hybrid event extraction: combine knowledge-driven and data-driven methods

Data-Driven Event Extr. (1)
Facts:
– Commonly used
– Rely solely on quantitative methods to discover relations
– Require large text corpora for developing models that approximate linguistic phenomena
– Methods:
   – Statistical reasoning: word frequencies, ranking (TF-IDF), n-grams, clustering
   – Probabilistic modeling
   – Information theory
   – Linear algebra
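
As an illustration of the statistical methods listed above, the following is a minimal sketch of TF-IDF ranking in plain Python. It is not taken from the presentation: the function name `tf_idf`, the whitespace tokenizer, the unsmoothed `log(N / df)` weighting, and the three toy sentences are all assumptions chosen for brevity.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document (hypothetical helper)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({
            # Relative term frequency times inverse document frequency.
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights

# Toy corpus (made up for illustration).
docs = [
    "google acquires youtube",
    "microsoft acquires skype",
    "markets react to earnings",
]
scores = tf_idf(docs)
```

Terms that occur in many documents (here "acquires") receive a lower weight than terms specific to one document (here "google"), which is why such rankings need a sizeable corpus before the weights become informative.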

Data-Driven Event Extr. (2)
Examples:
Approach | Method | Events | Data / Know. / Exp. / Int. (values as extracted)
Okamoto et al. (2009) | Hierarchical clustering | Local | Med, Low
Liu et al. (2008) | Graphs, clustering | News | High, Low
Tanev et al. (2008) | Clustering | Violence & disaster news | Med, Low
Lei et al. (2005) | Support Vector Machines | News | High, Low
Considerations:
– Meaning is not dealt with explicitly
– Large amount of data required
+ No linguistic resources are required
+ No expert (domain) knowledge is needed

Knowledge-Driven Event Extr. (1)
Facts:
– Often based on manually created / discovered patterns that express rules representing expert knowledge
– Based on linguistic, lexicographic, and human knowledge
– Lexico-syntactic patterns (frequent) vs. lexico-semantic patterns (less frequent)
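
To make the idea of a manually written lexico-syntactic pattern concrete, here is a small sketch using a regular expression. The pattern, the trigger verbs, the group names, and the example sentences are all hypothetical and not from the presentation; real systems typically match over part-of-speech tags or parse trees rather than raw strings.

```python
import re

# Hypothetical pattern: <capitalized name> <trigger verb> <capitalized name>.
NAME = r"[A-Z][A-Za-z0-9&-]*(?: [A-Z][A-Za-z0-9&-]*)*"
ACQUISITION = re.compile(
    rf"(?P<buyer>{NAME}) "
    r"(?P<trigger>acquires|acquired|buys|bought) "
    rf"(?P<target>{NAME})"
)

def extract_acquisitions(text):
    """Return one dict (buyer, trigger, target) per pattern match."""
    return [m.groupdict() for m in ACQUISITION.finditer(text)]

events = extract_acquisitions(
    "Google acquired YouTube in 2006. Microsoft buys Skype for $8.5 billion."
)
```

Each extracted event can be traced back to the exact rule and trigger word that produced it, which illustrates why pattern-based results tend to be easy to interpret, and also why every new phrasing needs a new or extended pattern.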

Knowledge-Driven Event Extr. (2)
Examples:
Approach | Method | Events | Data / Know. / Exp. / Int. (values as extracted)
Nishihara et al. (2009) | Lexico-Syntactic | Personal experiences | Low, Med, High, Med
Aone et al. (2000) | Lexico-Syntactic | General | Low, High, Med
Yakushiji et al. (2001) | Lexico-Syntactic | Biomedical | Low, Med, High, Med
Hung et al. (2010) | Lexico-Syntactic | Commonsense knowledge | Low, Med, High, Med
Xu et al. (2006) | Lexico-Syntactic | Prize award | Low, Med, High
Li et al. (2002) | Lexico-Semantic | Financial | Low, High, Med
Cohen et al. (2009) | Lexico-Semantic | Biomedical | Med, High
Vargas-Vera et al. (2004) | Lexico-Semantic | KMi news | Low, High

Knowledge-Driven Event Extr. (3)
Considerations:
– Lexical knowledge and/or prior domain knowledge required
– Definition and maintenance of patterns is more difficult (consistency and costs)
+ Less training data required than for data-driven approaches
+ Powerful expressions with lexical, syntactical, and semantic elements make results easily interpretable and traceable
+ Patterns are useful when one needs to extract very specific information

Hybrid Event Extr. (1)
Facts:
– It is difficult to stay within the boundaries of a single event extraction approach
– Usually, an approach can be considered as mainly data-driven or mainly knowledge-driven
– However, an increasing number of researchers combine both approaches equally
– Most systems are knowledge-driven, aided by data-driven methods:
   – Solve the lack of expert knowledge
   – Apply bootstrapping
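
The bootstrapping idea mentioned above can be sketched as follows: a knowledge-driven seed pattern finds entity pairs, and a data-driven count over the corpus then proposes new trigger words that link the same pairs. This is only an illustrative sketch under simplifying assumptions; `bootstrap_triggers`, the single-word name pattern, the `min_count` threshold, and the toy sentences are hypothetical and do not describe any of the systems cited in the presentation.

```python
import re
from collections import Counter

NAME = r"[A-Z][A-Za-z0-9&-]*"  # assumption: entities are single capitalized tokens

def bootstrap_triggers(sentences, seed_triggers, min_count=2):
    """Expand a seed trigger list with words seen between known entity pairs."""
    alternation = "|".join(re.escape(t) for t in seed_triggers)
    seed_pattern = re.compile(rf"({NAME}) (?:{alternation}) ({NAME})")
    # Step 1 (knowledge-driven): entity pairs matched by the seed pattern.
    pairs = {m.groups() for s in sentences for m in seed_pattern.finditer(s)}
    # Step 2 (data-driven): count other single words linking the same pairs.
    candidates = Counter()
    for left, right in pairs:
        link = re.compile(rf"{re.escape(left)} (\w+) {re.escape(right)}")
        for s in sentences:
            for m in link.finditer(s):
                if m.group(1) not in seed_triggers:
                    candidates[m.group(1)] += 1
    learned = {word for word, count in candidates.items() if count >= min_count}
    return sorted(set(seed_triggers) | learned)

# Toy corpus (made up for illustration).
sentences = [
    "Google acquired YouTube",
    "Google purchased YouTube",
    "Microsoft acquired Skype",
    "Microsoft purchased Skype",
    "Google sued YouTube",
]
triggers = bootstrap_triggers(sentences, ["acquired"])
```

With this toy corpus, "purchased" links two known pairs and is added, while "sued" occurs only once and stays below the threshold. The threshold is the data-driven part of the trade-off: with little data, few new triggers pass it, and a low threshold admits noise.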

Hybrid Event Extr. (2)
Examples:
Approach | Method | Events | Data / Know. / Exp. / Int. (values as extracted)
Jungermann et al. (2008) | Lexico-Syntactic, graphs | German parliament | Med, High, Med
Piskorski et al. (2007) | Lexico-Semantic, clustering | Violent news | High, Med
Chun et al. (2004) | Lexico-Syntactic, co-occurrences | Biomedical | Med
Lee et al. (2003) | Ontology-based POS tagging | Chinese news | N/A, Med, Low
Considerations:
– Large amount of data required
– Increased complexity requires expertise
+ Less domain knowledge needed
+ Interpretability of results

Discussion
Data requirements:
– Data-driven: > 10,000 documents
– Knowledge-driven: 100 – 1,000 documents
– Hybrid methods: < 10,000 documents
Interpretability:
– Data-driven: low
– Knowledge-driven: high (especially lexico-semantic patterns)
– Hybrid: medium
Domain knowledge & expertise:
– Data-driven approaches require less than knowledge-driven and hybrid methods

Conclusions
Knowledge-driven approaches:
– For casual users (e.g., students)
– Interactive, query-driven approach
– Domain knowledge and expertise should be readily available
– Patterns close to natural language
– Few statistical details and little model fine-tuning
Data-driven & hybrid approaches:
– For advanced users (e.g., researchers)
– Fewer restrictions imposed by, for example, grammars

Questions