Presentation transcript:

Acquiring Comparative Commonsense Knowledge from the Web
Niket Tandon, Gerard de Melo, Gerhard Weikum
28th AAAI Conference on Artificial Intelligence, 2014
Presented by Aishwarya Singh (aish2503@seas.upenn.edu), 03/18/2019

Problem & Motivation Genuine machine intelligence requires drawing commonsense inferences from large amounts of data. This was the first attempt to harvest large amounts of disambiguated comparative knowledge from the Web. Steel is sharper than wood. Done by extracting triplets from the Web corpora of the form (Noun, Adjective, Noun): (steel, sharper, wood). These triplets are sense-disambiguated using WordNet: (steel-noun-2, sharper-adj-2, wood-noun-1) Genuine machine intelligence requires drawing commonsense inferences from large amounts of data. This was the first attempt to harvest large amounts of disambiguated comparative knowledge from the Web. Say we have the example, steel is sharper than wood -- extracting triplets of the form (Noun, Adj, Noun). This isn’t enough as it’s still ambiguous which meaning of sharp we’re referring to. -- disambiguation is done using WordNet senses where each of these numbers is the sense id.

Problem & Motivation Not just restricted to WordNet nouns- Ad-hoc concepts Disambiguated adjective-noun and noun-noun combinations young lions, vegetable dumpling Covers a wider range of commonsense phenomena: “lions can be young”, “dumplings can be made of vegetables” They try to cover a wider range of commonsense phenomena by using ad hoc concepts. These are disambiguated adjective-noun and noun-noun combinations that do not occur in wordnet such as young lions and vegetable dumplings. This tells us that lions can be young and that dumplings can be made of vegetables.

Contents
- Previous approaches
- Contributions of this work
- Methodology: Open Information Extraction; Disambiguation (Local Model, Joint Model); Triple Organization
- Experiments: Baselines and Results
- Conclusions
- Shortcomings and Future Work
I'll speak about the methodology in the first half and then move on to the evaluation.

Previous approaches (SoTA)
- Knowledge Harvesting: YAGO (Hoffart et al. 2011) and OpenIE/TextRunner (Banko et al. 2007) cover only a small number of relations.
- This calls for more domain-specific knowledge: Temporal (Kozareva et al. 2011), Psychology (Rashkin et al. 2018), Spatial (Collell et al. 2018). We have already seen extraction of psychological and temporal knowledge, and will see a paper on spatial knowledge next week.
- Comparative Knowledge: up until this paper, large-scale data had not been used to mine comparative knowledge. Jindal and Liu (2006) identified potential comparative sentences and their elements using sequence mining techniques; Cao et al. (2010) extracted comparative observations using manually defined patterns.

Contributions of this work
Constructed a comparative knowledge base by:
- Using Open Information Extraction over a Web corpus to obtain observed textual facts.
- Developing a joint optimization model based on Integer Linear Programming (ILP) to disambiguate and semantically organize the extracted data with respect to a lexical knowledge base, WordNet.

METHODOLOGY

Open Information Extraction
All triples matching the template (Noun Phrase) + (Comparative Phrase) + (Noun Phrase) are extracted from the corpus.
- Noun phrases: WordNet nouns (water, dark chocolate), adjective-noun pairs (cold water), and noun-noun pairs (football manager).
- Comparative phrases: adjectives; the extraction phase also accounts for negation and modifiers.
- Implemented on Hadoop MapReduce (distributed hardware clusters) owing to the large size of the Web corpora.
- Output: a quintuple (Noun-1, Relation, Noun-2, Frequency, Direction), e.g. (car, bigger, dog, 10, >). A toy sketch of this template matching follows below.
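
A minimal, hypothetical sketch of this extraction template (not the authors' Hadoop/MapReduce pipeline, which runs over POS-tagged Web text and also records negation and modifiers; the regex, sentences, and function name below are illustrative assumptions):

```python
import re
from collections import Counter

# Toy surface pattern for "X is/are <adj>-er / more <adj> than Y".
PATTERN = re.compile(
    r"\b(?:a |an |the )?([a-z]+(?: [a-z]+)?) (?:is|are) (?:much |far )?"
    r"([a-z]+er|more [a-z]+) than (?:a |an |the )?([a-z]+(?: [a-z]+)?)"
)

def extract_quintuples(sentences):
    """Return (noun1, relation, noun2, frequency, direction) tuples."""
    counts = Counter()
    for s in sentences:
        for n1, rel, n2 in PATTERN.findall(s.lower()):
            counts[(n1, rel, n2)] += 1  # no lemmatization in this toy
    # Direction '>' marks that noun1 ranks above noun2 on the adjective scale.
    return [(n1, rel, n2, f, ">") for (n1, rel, n2), f in counts.items()]

sentences = [
    "A car is faster than a bike.",
    "The car is faster than the bike.",
    "Steel is sharper than wood.",
]
print(extract_quintuples(sentences))
# [('car', 'faster', 'bike', 2, '>'), ('steel', 'sharper', 'wood', 1, '>')]
```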

Disambiguation
The original extractions retain ambiguity: "richer than" could refer to money or to calories. The goal of this phase is to map each original triple (strings) to its relevant grounded triple (disambiguated WordNet sense IDs). This is achieved through two methods:
- A local model, which assumes that triples are independent of each other.
- A joint optimization model, which leverages the fact that triples can be related and depend on each other.

Local Model
The most likely disambiguation is chosen on the basis of:
- Highest internal coherence: pick word senses that are mutually similar within the grounded triple.
- Most frequent senses: WordNet orders senses by frequency, so popular senses get priority.

$\mathrm{score}(n_1, a, n_2) = \tau_{NN}(n_1, n_2) + \tau_{NA}(n_1, a) + \tau_{NA}(n_2, a) + \phi(n_1^*, n_1) + \phi(n_2^*, n_2) + \phi(a^*, a)$

Here $\tau$ is the internal coherence (taxonomic relatedness $\tau_{NN}$ for the noun pair, gloss overlap $\tau_{NA}$ for the noun-adjective pairs) and $\phi$ is the sense prior, $\phi = \frac{1}{1 + \text{sense rank}}$.

Worked example:
$\mathrm{score}(\text{steel-2}, \text{sharper-2}, \text{wood-1}) = \tau_{NN}(\text{steel-2}, \text{wood-1}) + \tau_{NA}(\text{steel-2}, \text{sharper-2}) + \tau_{NA}(\text{wood-1}, \text{sharper-2}) + \frac{1}{1+2} + \frac{1}{1+1} + \frac{1}{1+2}$

Internal coherence for two nouns assumes that if two words occur in the same context, their appropriate senses will have minimum taxonomic distance between them; e.g., the correct disambiguation steel-2 and wood-1 will have the shortest path. Internal coherence between a noun and an adjective is given by the overlap between their WordNet glosses; e.g., the gloss of "key" could say "metal device that can be inserted into an appropriate lock", so we know that key, insert, and lock can occur in the same context. A toy implementation of this scoring is sketched below.
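
A toy implementation of the local scoring, assuming NLTK's WordNet interface; the similarity and gloss-overlap measures below are simplified stand-ins for the paper's taxonomic relatedness and gloss overlap, and comparative forms are assumed to be mapped to their base adjective ("sharper" -> "sharp") beforehand:

```python
from itertools import product
from nltk.corpus import wordnet as wn

def prior(rank):
    # WordNet lists senses by frequency; rank 1 is the most frequent sense.
    return 1.0 / (1 + rank)

def tau_nn(s1, s2):
    # Taxonomic relatedness between two noun senses (shortest-path based).
    return s1.path_similarity(s2) or 0.0

def tau_na(noun_sense, adj_sense):
    # Crude gloss overlap between a noun sense and an adjective sense.
    g1 = set(noun_sense.definition().split())
    g2 = set(adj_sense.definition().split())
    return len(g1 & g2) / max(len(g1 | g2), 1)

def best_grounding(noun1, adj, noun2):
    """Score every sense combination and keep the highest-scoring one."""
    best, best_score = None, float("-inf")
    for (i, s1), (j, sa), (k, s2) in product(
            enumerate(wn.synsets(noun1, pos=wn.NOUN), 1),
            enumerate(wn.synsets(adj, pos=wn.ADJ), 1),
            enumerate(wn.synsets(noun2, pos=wn.NOUN), 1)):
        score = (tau_nn(s1, s2) + tau_na(s1, sa) + tau_na(s2, sa)
                 + prior(i) + prior(j) + prior(k))
        if score > best_score:
            best, best_score = (s1, sa, s2), score
    return best, best_score

grounding, score = best_grounding("steel", "sharp", "wood")
print([s.name() for s in grounding], round(score, 3))
```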

Local Model Shortcomings
- Two or more groundings can have the same score, leaving no clear winner.
- The model does not consider dependencies between triples: the disambiguation of (car, faster, bike) is highly correlated with that of related triples such as (bike, slower, automobile).
Solution: the Joint Model. Similarly grounded triples collectively aid in disambiguation; the model encourages not only high internal coherence within each triple but also high coherence across all chosen grounded triples.

Joint Model
Each candidate grounding can be accepted or rejected: the binary variable $x_{ij} = 1$ iff grounding $j$ is accepted for triple $i$. The objective combines:
- The local model score of each accepted grounding (internal coherence plus sense priors).
- A cross-grounding coherence term: $sim_{ij,kl}$ is computed like internal coherence but between two groundings, and the binary variable $B_{ij,kl} = 1$ iff both groundings are accepted simultaneously, so accepted groundings are encouraged to cohere with each other.
- Terms that prefer fewer distinct adjective/noun senses across the whole graph, further increasing dependency between groundings.
Subject to the constraints:
- At most one grounding is accepted per triple ($\sum_j x_{ij} \le 1$).
- $B_{ij,kl} = 1$ only if $x_{ij} = x_{kl} = 1$; when coherence is highest, both are accepted together.
- At least one word sense per word is chosen; if a grounding is accepted, its senses are marked as accepted.
- Semantically equivalent groundings are tied together by accepting them jointly.
The objective is maximized subject to these constraints, and the groundings with $x_{ij} = 1$ are selected. The result is a set of disambiguated triples that are internally coherent and also coherent with the groundings selected for related observations. A toy ILP along these lines is sketched below.
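
A toy version of such a joint ILP, using PuLP; the triples, groundings, and scores are made-up placeholders (in the paper they come from the local model and the cross-grounding coherence), and the "at least one sense per word" and "fewer senses overall" terms are omitted for brevity:

```python
import pulp

# Candidate groundings: (triple, grounding) -> local-model score.
local = {("t1", "g1"): 0.4, ("t1", "g2"): 0.4,   # a tie the local model cannot break
         ("t2", "g1"): 0.5, ("t2", "g2"): 0.2}
# Cross-triple coherence sim_ij,kl between pairs of groundings.
sim = {(("t1", "g1"), ("t2", "g1")): 0.9,
       (("t1", "g2"), ("t2", "g1")): 0.1}

prob = pulp.LpProblem("joint_disambiguation", pulp.LpMaximize)
x = {ij: pulp.LpVariable(f"x_{ij[0]}_{ij[1]}", cat="Binary") for ij in local}
b = {p: pulp.LpVariable(f"b_{i}", cat="Binary") for i, p in enumerate(sim)}

# Objective: local scores of accepted groundings plus coherence of pairs
# of groundings that are accepted simultaneously.
prob += (pulp.lpSum(local[ij] * x[ij] for ij in local)
         + pulp.lpSum(sim[p] * b[p] for p in sim))

# Accept at most one grounding per triple.
for t in {"t1", "t2"}:
    prob += pulp.lpSum(x[ij] for ij in local if ij[0] == t) <= 1
# B_ij,kl may be 1 only if both of its groundings are accepted.
for (ij, kl) in sim:
    prob += b[(ij, kl)] <= x[ij]
    prob += b[(ij, kl)] <= x[kl]

prob.solve(pulp.PULP_CBC_CMD(msg=0))
print([ij for ij in local if x[ij].value() == 1])
# [('t1', 'g1'), ('t2', 'g1')]: the 0.9 coherence term breaks the local tie.
```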

Triple Organization
All semantically equivalent groundings are grouped together into csynsets (analogous to WordNet synsets): (car, faster, bike), (bicycle, slower, automobile), and (car, speedier, cycle) all encode "cars are generally faster than bikes". Equivalent triples are determined using:
- Synonymy: (car, faster, bike), (car, speedier, cycle)
- Antonymy: (car, faster, bike), (bicycle, slower, automobile)
- Negation: negated adjectives are treated like antonyms.
A small grouping sketch follows below.
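
A small sketch of the grouping test, assuming NLTK's WordNet and base-form adjectives; real csynset construction would also need negation handling and transitive merging of groups:

```python
from nltk.corpus import wordnet as wn

def adj_lemmas(word):
    # All lemma names across the adjective's WordNet synsets.
    return {l.name() for s in wn.synsets(word, pos=wn.ADJ) for l in s.lemmas()}

def adj_antonyms(word):
    # All antonym lemma names across the adjective's WordNet synsets.
    return {a.name() for s in wn.synsets(word, pos=wn.ADJ)
            for l in s.lemmas() for a in l.antonyms()}

def same_csynset(t1, t2):
    (n1, a1, n2), (m1, a2, m2) = t1, t2
    if (n1, n2) == (m1, m2) and a2 in adj_lemmas(a1):
        return True   # synonymy: same noun order, synonymous adjectives
    if (n1, n2) == (m2, m1) and a2 in adj_antonyms(a1):
        return True   # antonymy: flipped noun order, antonymous adjectives
    return False

print(same_csynset(("car", "fast", "bike"), ("bike", "slow", "car")))  # True
```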

Experiments
The extraction system was run on two Web corpora to compile triples from each:
- ClueWeb09 (504 million web pages): 488,055 triples
- ClueWeb12 (773 million web pages): 781,216 triples
For evaluation, 100 observations were sampled for each of three sampling sets:
- WN: both noun phrases of the triple appear in WordNet.
- Ad-hoc: neither noun phrase occurs in WordNet.
- WN/ad-hoc: one noun phrase occurs in WordNet, the other does not.
Each sampled triple was manually disambiguated by annotators selecting the best WordNet sense.

Baselines and Results
Baselines considered for evaluation:
- Most-frequent-sense heuristic (MFS): the standard baseline for disambiguation tasks.
- Local model: incorporates intra-grounding coherence but ignores coherence across triples.

Accuracy scores with 95% Wilson confidence intervals (all sampling sets; the interval computation is sketched after this slide):

Approach      Accuracy (all sets)
MFS           0.43 ± 0.05
Local Model   0.47 ± 0.09
Joint Model   0.83 ± 0.04

The local model outperforms the MFS baseline by only a small margin. Failure case: (tiger, fast, auto) is grounded as (tiger-1, fast-1, auto-1), where tiger-1 is the "fierce person" sense rather than the animal. The joint ILP model is the clear winner: it takes into account additional information about other related triples. E.g., for (pork, more tender, beef), the model can infer that the ambiguous adjective "tender", with 8 senses in WordNet, is used not in its first sense but in its fifth.
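
For reference, this is the standard Wilson score interval formula used for the reported accuracies; the n = 300 below is an assumption, pooling the 100 triples from each of the three sampling sets:

```python
import math

def wilson_interval(successes, n, z=1.96):
    """95% Wilson score interval for a binomial proportion (z = 1.96)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return center - half, center + half

lo, hi = wilson_interval(successes=249, n=300)  # accuracy ~0.83
print(f"[{lo:.2f}, {hi:.2f}]")  # roughly [0.78, 0.87], i.e. 0.83 +/- ~0.04
```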

Conclusions
- Utilized open information extraction methods to obtain large amounts of disambiguated comparisons from the Web.
- The resulting knowledge base provides a million disambiguated comparative assertions.
- The approach is scalable, requires no labeled data, and could work with other corpora and lexical resources.

Shortcomings and Future Work
- The evaluation of the disambiguated triples was weak: only 100 triples were used for each of the 3 sample sets, even though about a million triples were harvested.
- Limitation of Web extraction and of the model: the most trivial commonsense knowledge is rarely stated explicitly in Web text. One is unlikely to find a sentence saying "a car is bigger than an orange", and the model does not extend to zero-shot settings for never-before-seen instances, so it will not be able to produce the relation (car, bigger, orange).
Future work:
- Extend the model to work in zero-shot settings.
- Supplement with methods that mine comparative commonsense from numerical data.
- Computational advertisement: querying against an adjective to retrieve noun arguments and create metaphors.