Word Sense Disambiguation for Automatic Taxonomy Construction from Text-Based Web Corpora 12th International Conference on Web Information System Engineering.

Presentation transcript:

Word Sense Disambiguation for Automatic Taxonomy Construction from Text-Based Web Corpora
12th International Conference on Web Information System Engineering (WISE 2011)
October 14, 2011
Jeroen de Knijff, Kevin Meijer, Flavius Frasincar, Frederik Hogenboom
Erasmus University Rotterdam
PO Box 1738, NL-3000 DR Rotterdam, the Netherlands

Introduction (1)
An increasing number of documents are stored digitally on the Web
Documents can be structured through taxonomies
Many documents are unstructured, which drives the need for taxonomy construction

Introduction (2)
Taxonomy construction:
– Manually:
    More accurate
    Main method
– Automatic:
    Less knowledge needed
    Less time consuming
Taxonomy construction enables interoperability between Web sites, tools, etc., due to the knowledge aggregation into shared taxonomies

Introduction (3)
What’s new?

Introduction (4)
Taxonomy construction is a mature and widely researched topic
Little literature exists on applying Word Sense Disambiguation (WSD), even though WSD improves the results of commonly used techniques such as clustering
Hence, we propose the Automatic Taxonomy Construction from Text (ATCT) framework, which implements WSD

ATCT: Framework (1)

ATCT: Framework (2)
Term extraction:
– Part-of-Speech (POS) tagging
– All nouns are extracted
Term filtering:
– Based on domain pertinence and lexical cohesion
– Most relevant terms are subsequently selected through a score, based on domain pertinence, domain consensus, and structural relevance
Importance of term: term freq. corpus
Importance of term: appearance (position) in document
Relevance w.r.t. target domain: term freq. domain corpus / term freq. contrastive corpus
Cohesion among words in compound nouns: (# words × term freq. corpus × log(term freq.)) / word freq. corpus
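The two ratio-based filtering measures named on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the framework's actual Java code: the function names are hypothetical, a term that never occurs in the contrastive corpus is treated as having frequency 1 to avoid division by zero, and the "word freq. corpus" denominator is assumed to be the summed corpus frequency of the words in the compound noun.

```python
import math


def domain_pertinence(tf_domain: int, tf_contrastive: int) -> float:
    """Term frequency in the domain corpus divided by the frequency in the contrastive corpus."""
    # Assumption: an unseen term counts as frequency 1 in the contrastive corpus.
    return tf_domain / max(tf_contrastive, 1)


def lexical_cohesion(n_words: int, term_freq: int, word_freq_sum: int) -> float:
    """(# words x term freq. x log(term freq.)) / word freq., as listed on the slide."""
    # Assumption: word_freq_sum is the total corpus frequency of the term's words.
    if term_freq <= 0 or word_freq_sum <= 0:
        return 0.0
    return n_words * term_freq * math.log(term_freq) / word_freq_sum


if __name__ == "__main__":
    # Hypothetical counts for a two-word term such as "interest rate".
    print(domain_pertinence(tf_domain=30, tf_contrastive=10))
    print(lexical_cohesion(n_words=2, term_freq=10, word_freq_sum=40))
```

A term that is frequent in the domain corpus but rare in the contrastive corpus gets a high pertinence value, and a compound whose words mostly occur together gets a high cohesion value.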

ATCT: Framework (3)
Word Sense Disambiguation:
– Optional step
– Synsets are retrieved from a semantic lexicon
– Structural Semantic Interconnections (SSI)
– Utilizes a similarity measure proposed by Jiang and Conrath (1997)
– Terms with similar senses are removed
– Term counts are aggregated per concept
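The Jiang and Conrath measure is usually stated as a distance based on information content (IC): dist(c1, c2) = IC(c1) + IC(c2) - 2 × IC(lcs(c1, c2)), where IC(c) = -log p(c) and lcs is the lowest common subsumer of the two concepts. The sketch below computes it on a toy hierarchy. The concept names, counts, and helper functions are hypothetical and only illustrate the formula; how SSI combines this measure with other evidence is not shown in the slides.

```python
import math

# Hypothetical toy hierarchy: child -> parent ("entity" is the root).
PARENT = {
    "organization": "entity",
    "river_bank": "entity",
    "bank_institution": "organization",
    "company": "organization",
}

# Hypothetical corpus counts; a concept's count includes its descendants.
COUNT = {
    "entity": 100,
    "organization": 40,
    "river_bank": 5,
    "bank_institution": 10,
    "company": 20,
}


def ancestors(concept: str) -> list:
    """Return the concept followed by its ancestors up to the root."""
    path = [concept]
    while concept in PARENT:
        concept = PARENT[concept]
        path.append(concept)
    return path


def information_content(concept: str) -> float:
    """IC(c) = -log p(c), with p(c) estimated from the toy counts."""
    return -math.log(COUNT[concept] / COUNT["entity"])


def lowest_common_subsumer(a: str, b: str) -> str:
    """Return the deepest concept that is an ancestor of both a and b."""
    ancestors_b = set(ancestors(b))
    for candidate in ancestors(a):
        if candidate in ancestors_b:
            return candidate
    raise ValueError("concepts share no ancestor")


def jiang_conrath_distance(a: str, b: str) -> float:
    """dist(a, b) = IC(a) + IC(b) - 2 * IC(lcs(a, b))."""
    lcs = lowest_common_subsumer(a, b)
    return information_content(a) + information_content(b) - 2 * information_content(lcs)


if __name__ == "__main__":
    print(jiang_conrath_distance("bank_institution", "company"))
    print(jiang_conrath_distance("bank_institution", "river_bank"))
```

In this toy data the financial sense of "bank" lies closer to "company" than the river sense does, which is the kind of signal a disambiguation step can use to pick a sense.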

ATCT: Framework (4)
Concept hierarchy creation:
– Based on the subsumption algorithm, which determines potential parents (subsumers) of concepts:
    x potentially subsumes y, if:
    1) x appears in at least the proportion t of all documents in which y appears
    2) y appears in less than the proportion t of all documents in which x appears
– Additionally takes into account ancestor positions:
    Weighting scheme based on the number of layers between terms x and y
    Close parents get assigned more weight
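The two subsumption conditions translate directly into a check on document sets. The sketch below is a minimal illustration: the function name, the representation of documents as sets of IDs, and the default threshold t = 0.8 are assumptions, and the ancestor-position weighting mentioned on the slide is left out because its details are not given here.

```python
def potentially_subsumes(docs_x: set, docs_y: set, t: float = 0.8) -> bool:
    """Return True if concept x is a potential parent (subsumer) of concept y.

    docs_x and docs_y hold the IDs of the documents in which x and y appear.
    The default threshold t is a hypothetical value, not taken from the slides.
    """
    if not docs_x or not docs_y:
        return False
    shared = len(docs_x & docs_y)
    x_given_y = shared / len(docs_y)  # proportion of y's documents that contain x
    y_given_x = shared / len(docs_x)  # proportion of x's documents that contain y
    return x_given_y >= t and y_given_x < t


if __name__ == "__main__":
    # Hypothetical document IDs: "finance" is broad, "mortgage" is narrow.
    finance_docs = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
    mortgage_docs = {1, 2, 3, 4}
    print(potentially_subsumes(finance_docs, mortgage_docs))
    print(potentially_subsumes(mortgage_docs, finance_docs))
```

The asymmetry of the two conditions is what gives the hierarchy a direction: the broader concept covers nearly all documents of the narrower one, but not the other way around.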

ATCT: Framework (5)
Concept hierarchy creation (cont’d):
– Evaluating taxonomy concepts is not trivial:
    [Figure: reference taxonomy compared with generated taxonomy]

ATCT: Framework (6)
Concept hierarchy creation (cont’d):
– Look at senses through taxonomy concept disambiguation:
    Similar to term WSD from text, but now surrounding concepts are used instead of surrounding words
    Terms with a single sense in the lexicon are disambiguated
    Other terms are disambiguated using their surrounding terms:
    – Concept neighborhood of 2 (up/down)
    – Root node is disregarded
    Lexicon senses are compared
    In case no sense is available (e.g., compound nouns):
    – Lexical matching
    – Descendant / ancestor comparison
    Graph distances are calculated
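One way to read "concept neighborhood of 2 (up/down)" is to collect the ancestors and descendants of a concept within two levels and skip the root. The sketch below follows that reading. It is an assumption about the slide's meaning, with a hypothetical child-to-parent dictionary as the taxonomy representation; siblings are not included.

```python
def concept_neighborhood(concept: str, parent: dict, depth: int = 2) -> set:
    """Collect ancestors and descendants of a concept within `depth` levels.

    `parent` maps each child concept to its parent. A concept that has no
    entry in `parent` is treated as the root and is disregarded.
    """
    # Build the reverse mapping parent -> children.
    children = {}
    for child, par in parent.items():
        children.setdefault(par, []).append(child)

    neighbors = set()

    # Walk up at most `depth` levels, skipping the root.
    node = concept
    for _ in range(depth):
        node = parent.get(node)
        if node is None:
            break
        if node in parent:  # the node has a parent itself, so it is not the root
            neighbors.add(node)

    # Walk down at most `depth` levels.
    frontier = [concept]
    for _ in range(depth):
        frontier = [c for n in frontier for c in children.get(n, [])]
        neighbors.update(frontier)

    return neighbors


if __name__ == "__main__":
    # Hypothetical taxonomy chain: root > economics > finance > banking > loans > mortgages
    taxonomy = {
        "economics": "root",
        "finance": "economics",
        "banking": "finance",
        "loans": "banking",
        "mortgages": "loans",
    }
    print(sorted(concept_neighborhood("banking", taxonomy)))
    print(sorted(concept_neighborhood("finance", taxonomy)))
```

The surrounding concepts returned here would play the role that surrounding words play in term WSD from text.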

ATCT: Implementation
Java-based pipeline
Noun parsing with the Stanford parser
RDF implementation using Jena
Domain taxonomies are expressed in SKOS

Evaluation (1)
Data:
– Economics & management:
    25,000 abstracts from RePub & RePEc
    2,000 distinct concepts
    Golden taxonomy using STW Thesaurus annotations
– Medicine & health:
    10,000 abstracts from RePub
    1,000 distinct concepts
    Golden taxonomy using MeSH annotations
Measures:
– Precision
– Recall
– F-measure
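The three measures can be illustrated on sets of items, for example concepts or parent-child relations taken from the generated and the golden taxonomy. The sketch below uses the standard definitions, with the F-measure as the harmonic mean of precision and recall. Which items the authors actually compare is not stated on the slide, so the set-based comparison and the example concepts are assumptions.

```python
def precision_recall_f_measure(generated: set, reference: set) -> tuple:
    """Compute precision, recall, and F-measure of a generated set against a reference set."""
    true_positives = len(generated & reference)
    precision = true_positives / len(generated) if generated else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


if __name__ == "__main__":
    # Hypothetical concept sets.
    generated = {"finance", "banking", "loans", "weather"}
    reference = {"finance", "banking", "insurance"}
    print(precision_recall_f_measure(generated, reference))
```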

Evaluation (2)
Domain  Taxonomy     Precision  Recall  F-Measure
E&M     Without WSD
E&M     With WSD
M&H     Without WSD
M&H     With WSD

Conclusions
ATCT framework:
– Extracts potential taxonomy terms from large corpora
– Filters relevant terms
– Performs WSD to remove redundant terms
– Creates a taxonomy using a subsumption method
Evaluation shows performance improvement when using WSD (up to 12.12%)
Future work:
– Benchmark against other taxonomy creation methods (hierarchical clustering, classification, etc.)
– Explore other domains (law, chemistry, physics, history, etc.)

Questions