How Much is 131 Million Dollars

Presentation transcript:

How Much is 131 Million Dollars? Putting Numbers in Perspective with Compositional Descriptions
Arun Tejasvi Chaganty & Percy Liang, 2016
Nivedha Sivakumar (nivedha@seas.upenn.edu) 4/1/19

Problem & Motivation
Cristiano Ronaldo was acquired by Madrid for $131 million.
How large is $131 million?

Problem & Motivation
- Difficult to comprehend the scale of large/small absolute numeric mentions
- Relative comparisons improve comprehension significantly
- Numeric mention in a sentence: $131 million
- Perspective generated: about the cost to employ everyone in Texas over a lunch period

Problem & Motivation
- Difficult to comprehend the scale of large/small absolute numeric mentions
- Relative comparisons improve comprehension significantly
- Numeric mention: 601,000 ounces of platinum
- Perspective: about 4 times the weight of an elephant

Problem & Motivation
- Difficult to comprehend the scale of large/small absolute numeric mentions
- Relative comparisons improve comprehension significantly
- Numeric mention: 60 million guns
- Perspective: about twice the gun ownership of the population of Texas

Contents:
- Previous approaches
- Contributions of this work
- Process flow
- Dataset construction
- Formula selection
- Description generation
- Results & Analysis
- Conclusions & Shortcomings
- Future work

Previous approaches
- Manually generated perspectives: Barrio et al., 2016: Improving the comprehension of numbers in the news
- Present facts as is from knowledge base: G. Chiacchieri, 2013: www.dictionaryofnumbers.com
- Seq2seq based description generation: Liu et al., 2015: Table-to-text generation by structure-aware Seq2seq learning; Hosseini et al., 2015: Learning to solve arithmetic word problems with verb categorization

Contributions of this work
Approach
- Generate perspectives from numeric mentions in sentences by composing facts from a knowledge base (K)
Proposed system
- Formula selection: given a numeric mention and K, select a formula over K with the same value and unit as the mention
- Description generation: given a formula, generate natural language descriptions
Performance analysis
- Formula selection: 15.2% F1 score improvement
- Description generation: 12.5 BLEU point improvement
Example
- Given: Cristiano Ronaldo was acquired by Madrid for $131 million
- Formula: 131 million ~ 1.0 x cost of employee x population of Texas x time taken for lunch
- Perspective: about the cost to employ everyone in Texas over a lunch period
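
The Ronaldo example can be read as plain arithmetic over knowledge-base tuples: multiply the values, add the unit exponents, and compare the result with the mention. The following is a minimal sketch of that idea, not the authors' implementation. The `Fact` class, the helper names, and all numeric values (an hourly cost of 10 USD per person, 27 million people, a half-hour lunch) are assumptions chosen only to make the arithmetic visible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """A knowledge-base tuple t = (t.value, t.unit, t.description)."""
    value: float
    unit: tuple  # sorted (base_unit, exponent) pairs, e.g. (("usd", 1),)
    description: str


def make_unit(**exponents):
    """Build a canonical unit from base-unit exponents, dropping zero exponents."""
    return tuple(sorted((k, v) for k, v in exponents.items() if v != 0))


def multiply_units(a, b):
    """Multiply two units by adding the exponents of their base units."""
    total = dict(a)
    for base, exp in b:
        total[base] = total.get(base, 0) + exp
    return make_unit(**total)


def compose(facts):
    """Multiply facts together and return the (value, unit) of the formula."""
    value, unit = 1.0, ()
    for fact in facts:
        value *= fact.value
        unit = multiply_units(unit, fact.unit)
    return value, unit


# Illustrative values only; they are NOT taken from the paper's knowledge base.
cost_of_employee = Fact(10.0, make_unit(usd=1, person=-1, hour=-1), "cost of an employee")
population_of_texas = Fact(27_000_000, make_unit(person=1), "population of Texas")
lunch = Fact(0.5, make_unit(hour=1), "time taken for lunch")

formula_value, formula_unit = compose([cost_of_employee, population_of_texas, lunch])
mention_value = 131_000_000  # the "$131 million" mention
multiplier = mention_value / formula_value  # close to 1 under these assumed values
```

Under these assumed values the person and hour units cancel, leaving a quantity in USD that can be compared directly with the mention.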

Process Flow
- Sentence: The Billings-based Stillwater Mining produced 601,000 ounces of platinum
- Formula: 4 x weight of an elephant
- Perspective: That's about 4 times the weight of an elephant

Dataset Construction
- Collect knowledge base: 142 tuples with 9 fundamental units, where t = (t.value, t.unit, t.description), from US Bureau of Statistics and Wikipedia
- Collect numeric mentions: 2,041 normalized numeric mentions from the newswire section of the LDC catalog
- Generate formulae: with the knowledge base as a graph, traverse all paths that yield the desired final unit
  - Input: knowledge base and final unit (money, time, etc.)
  - Each vertex is a 'unit' and includes all tuples with this unit
  - Formula generation is exhaustive!
- Collect formulae descriptions: train a language generation system
- Collect formula preference data: rated through a simple majority
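
The exhaustive formula generation step can be sketched as follows. This is a simplified stand-in, assuming brute-force enumeration of tuple combinations rather than the graph traversal named on the slide; both keep only combinations whose units multiply out to the desired final unit. The mini knowledge base `KB`, its values, and the `max_len` cap are hypothetical.

```python
from itertools import combinations

# Hypothetical mini knowledge base: (value, unit, description).
# A unit maps base unit -> exponent; values are illustrative, not from the paper.
KB = [
    (10.0, {"usd": 1, "person": -1, "hour": -1}, "cost of an employee"),
    (27_000_000, {"person": 1}, "population of Texas"),
    (0.5, {"hour": 1}, "time taken for lunch"),
    (168.0, {"hour": 1}, "a week"),
    (5000.0, {"kg": 1}, "weight of an elephant"),
]


def combine_units(units):
    """Multiply units by summing exponents; cancelled base units are dropped."""
    total = {}
    for unit in units:
        for base, exp in unit.items():
            total[base] = total.get(base, 0) + exp
    return {base: exp for base, exp in total.items() if exp != 0}


def generate_formulae(kb, target_unit, max_len=3):
    """Enumerate every combination of up to max_len tuples with the target unit."""
    formulae = []
    for n in range(1, max_len + 1):
        for combo in combinations(kb, n):
            if combine_units(unit for _, unit, _ in combo) == target_unit:
                value = 1.0
                for v, _, _ in combo:
                    value *= v
                formulae.append((value, [desc for _, _, desc in combo]))
    return formulae


money_formulae = generate_formulae(KB, {"usd": 1})
weight_formulae = generate_formulae(KB, {"kg": 1})
```

With this toy knowledge base, only two combinations produce a monetary unit: the cost of an employee times the population of Texas times either the lunch period or a week.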

Formula Selection & Ranking Criteria
Given a numeric mention and knowledge base K, select a formula over K with the same value and unit as the mention.
- Proximity: perspective should be within the same order of magnitude as the mention; pick formulae in the range [1/100, 100]
- Familiarity: perspective should be composed of concepts familiar to the reader, e.g. population of Texas vs. Angola
- Compatibility: some tuple combinations are more natural than others, e.g. median income x a month vs. weight of a person x population of Texas
- Similarity: perspective should be relevant to the context, e.g. NASA's budget of $17 billion = 0.1% of US budget vs. amount of money to feed LA for a year
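
The proximity criterion lends itself to a small worked example. The sketch below assumes that the range [1/100, 100] applies to the multiplier, i.e. the ratio of the mention's value to the formula's value; the function names and the candidate values are hypothetical.

```python
import math


def multiplier_for(mention_value, formula_value):
    """Ratio that scales the formula to the mention (assumed definition)."""
    return mention_value / formula_value


def within_proximity(mention_value, formula_value, low=1 / 100, high=100):
    """Keep a formula only if its multiplier lies in [low, high]."""
    return low <= multiplier_for(mention_value, formula_value) <= high


def proximity_feature(mention_value, formula_value):
    """Distance from a multiplier of 1, in orders of magnitude."""
    return abs(math.log10(multiplier_for(mention_value, formula_value)))


mention = 131_000_000  # "$131 million"
# Illustrative candidate formula values, not taken from the paper.
candidates = {
    "employ everyone in Texas over a lunch period": 135_000_000.0,
    "employ everyone in Texas for a week": 45_360_000_000.0,
}
kept = [name for name, value in candidates.items() if within_proximity(mention, value)]
```

Here the week-long formula is dropped because the mention would be less than 1/100 of it, while the lunch-period formula has a multiplier close to 1.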

Perspective Generation
Given the selected formula and multiplier, generate natural language descriptions with a seq2seq RNN.
- Baseline: combine the tuples in the formula with neutral prepositions, e.g. "1/5th of the cost of an employee for the population of Texas for time taken for lunch"
- Seq2seq RNN: RNN with an attention-based copying mechanism to generate perspectives (Jia and Liang, 2016)
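
The baseline above amounts to string templating. A minimal sketch, assuming the multiplier is joined with "of" and the tuple descriptions with "for", which reproduces the slide's example; the function name and signature are hypothetical.

```python
def baseline_description(multiplier_text, descriptions):
    """Join a multiplier and tuple descriptions with neutral prepositions."""
    if not descriptions:
        raise ValueError("a formula needs at least one tuple description")
    return f"{multiplier_text} of " + " for ".join(descriptions)


example = baseline_description(
    "1/5th",
    ["the cost of an employee", "the population of Texas", "time taken for lunch"],
)
```

The output is grammatical only by accident, which is the motivation for replacing the template with a learned generator.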

Results and Analysis: Formula Selection
- Measure of usefulness of generated formulae: logistic regression classifier using the four ranking criteria discussed
- Training data: 4 ranking features, with perspective ranks as labels
- Key takeaways:
  1. Best: familiarity + compatibility
  2. Similarity does not affect performance relative to the proximity baseline, due to many unfamiliar formulae in the dataset

Results and Analysis: Perspective Generation
Use of BLEU score to determine overlap between input formula and generated perspective
- Split perspective dataset into non-overlapping sets
- Compare BLEU scores of baseline and trained RNN output
- Key takeaway: seq-to-seq RNN gives more 'natural' rephrasings
- Example outputs (formula : generated perspective):
  - 7 x the cost of an employee x a week : 7 times the cost of employing one person for one week
  - 6 x weight of a person x population of California : six times the weight of the people who is worth
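
To make the BLEU comparison concrete, here is a simplified sketch of a BLEU-style score: clipped n-gram precision up to bigrams, combined by geometric mean with a brevity penalty. It is not the official BLEU implementation (single reference, no smoothing, whitespace tokenization), and the baseline-style candidate string is a hypothetical example rather than actual system output.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, reference, n):
    """Fraction of candidate n-grams found in the reference, with clipped counts."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return clipped / total


def simple_bleu(candidate, reference, max_n=2):
    """Geometric mean of 1..max_n gram precisions times a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    precisions = [modified_precision(cand, ref, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    log_mean = sum(math.log(p) for p in precisions) / max_n
    brevity = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return brevity * math.exp(log_mean)


reference = "7 times the cost of employing one person for one week"
rnn_style = "7 times the cost of employing one person for one week"
baseline_style = "7 of the cost of an employee for a week"  # hypothetical output

rnn_score = simple_bleu(rnn_style, reference)
baseline_score = simple_bleu(baseline_style, reference)
```

An exact match scores 1.0, while the template-style string scores lower because it shares fewer bigrams with the reference. The score measures surface overlap only, not fluency.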

Conclusions & Shortcomings
Conclusions
- Proposes a system to generate natural language descriptions from sentences with numeric mentions
- Collection of knowledge base, formulae, descriptions and corresponding preferences available for future research
Shortcomings
- Small and biased dataset
- Use of BLEU score as an evaluation measure
- Facts in knowledge base: coarse granularity
- Evaluation conclusions not substantiated
- Upon re-implementation, many rephrasings not 'natural' as claimed

Future Work
- Larger knowledge base: Freebase (Bollacker et al., 2008), OpenIE (Fader et al., 2011)
- Semantic compatibility on larger units of text: better results than word vectors (Bowman et al., 2015)