HowToKB: Mining HowTo Knowledge from Online Communities

Slides:



Advertisements
Similar presentations
CILC2011 A framework for structured knowledge extraction and representation from natural language via deep sentence analysis Stefania Costantini Niva Florio.
Advertisements

Knowledge Base Completion via Search-Based Question Answering
A Robust Approach to Aligning Heterogeneous Lexical Resources Mohammad Taher Pilehvar Roberto Navigli MultiJEDI ERC
Learning to Cluster Web Search Results SIGIR 04. ABSTRACT Organizing Web search results into clusters facilitates users quick browsing through search.
GENERATING AUTOMATIC SEMANTIC ANNOTATIONS FOR RESEARCH DATASETS AYUSH SINGHAL AND JAIDEEP SRIVASTAVA CS DEPT., UNIVERSITY OF MINNESOTA, MN, USA.
Steven Schoonover.  What is VerbNet?  Levin Classification  In-depth look at VerbNet  Evolution of VerbNet  What is FrameNet?  Applications.
April 22, Text Mining: Finding Nuggets in Mountains of Textual Data Jochen Doerre, Peter Gerstl, Roland Seiffert IBM Germany, August 1999 Presenter:
Context-Aware Query Classification Huanhuan Cao 1, Derek Hao Hu 2, Dou Shen 3, Daxin Jiang 4, Jian-Tao Sun 4, Enhong Chen 1 and Qiang Yang 2 1 University.
1 Noun Homograph Disambiguation Using Local Context in Large Text Corpora Marti A. Hearst Presented by: Heng Ji Mar. 29, 2004.
Memoplex Browser: Searching and Browsing in Semantic Networks CPSC 533C - Project Update Yoel Lanir.
Enhance legal retrieval applications with an automatically induced knowledge base Ka Kan Lo.
MediaEval Workshop 2011 Pisa, Italy 1-2 September 2011.
Thien Anh Dinh1, Tomi Silander1, Bolan Su1, Tianxia Gong
Learning Narrative Schemas Nate Chambers, Dan Jurafsky Stanford University IBM Watson Research Center Visit.
Exploiting Wikipedia as External Knowledge for Document Clustering Sakyasingha Dasgupta, Pradeep Ghosh Data Mining and Exploration-Presentation School.
Reyyan Yeniterzi Weakly-Supervised Discovery of Named Entities Using Web Search Queries Marius Pasca Google CIKM 2007.
Date : 2014/09/18 Author : Niket Tandon, Gerard de Melo, Fabian Suchanek, Gerhard Weikum Source : WSDM’14 Advisor : Jia-ling Koh Speaker : Shao-Chun Peng.
A Survey for Interspeech Xavier Anguera Information Retrieval-based Dynamic TimeWarping.
80 million tiny images: a large dataset for non-parametric object and scene recognition CS 4763 Multimedia Systems Spring 2008.
Wei Feng , Jiawei Han, Jianyong Wang , Charu Aggarwal , Jianbin Huang
Modelling Human Thematic Fit Judgments IGK Colloquium 3/2/2005 Ulrike Padó.
Wikipedia as Sense Inventory to Improve Diversity in Web Search Results Celina SantamariaJulio GonzaloJavier Artiles nlp.uned.es UNED,c/Juan del Rosal,
1 Intelligente Analyse- und Informationssysteme Frank Reichartz, Hannes Korte & Gerhard Paass Fraunhofer IAIS, Sankt Augustin, Germany Dependency Tree.
Authors: Marius Pasca and Benjamin Van Durme Presented by Bonan Min Weakly-Supervised Acquisition of Open- Domain Classes and Class Attributes from Web.
Evgeniy Gabrilovich and Shaul Markovitch
1 Masters Thesis Presentation By Debotosh Dey AUTOMATIC CONSTRUCTION OF HASHTAGS HIERARCHIES UNIVERSITAT ROVIRA I VIRGILI Tarragona, June 2015 Supervised.
University of Nebraska–Lincoln Know how. Know now.
Relational Duality: Unsupervised Extraction of Semantic Relations between Entities on the Web Danushka Bollegala Yutaka Matsuo Mitsuru Ishizuka International.
Acquisition of Categorized Named Entities for Web Search Marius Pasca Google Inc. from Conference on Information and Knowledge Management (CIKM) ’04.
2016/3/11 Exploiting Internal and External Semantics for the Clustering of Short Texts Using World Knowledge Xia Hu, Nan Sun, Chao Zhang, Tat-Seng Chu.
Overview of Statistical NLP IR Group Meeting March 7, 2006.
Sentiment Analysis Using Common- Sense and Context Information Basant Agarwal 1,2, Namita Mittal 2, Pooja Bansal 2, and Sonal Garg 2 1 Department of Computer.
Question Classification Ling573 NLP Systems and Applications April 25, 2013.
Ensembling Diverse Approaches to Question Answering
Experience Report: System Log Analysis for Anomaly Detection
Recent developments in object detection
Cross-lingual Dataless Classification for Many Languages
A. M. R. R. Bandara & L. Ranathunga
What Convnets Make for Image Captioning?
Concept Grounding to Multiple Knowledge Bases via Indirect Supervision
Exploiting Wikipedia as External Knowledge for Document Clustering
Optimizing Parallel Algorithms for All Pairs Similarity Search
Machine Learning overview Chapter 18, 21
Cross-lingual Dataless Classification for Many Languages
Semantic Parsing for Question Answering
INAGO Project Automatic Knowledge Base Generation from Text for Interactive Question Answering.
Kenneth Baclawski et. al. PSB /11/7 Sa-Im Shin
Information Retrieval and Web Search
Compositional Human Pose Regression
Information Retrieval and Web Search
Machine Learning Ali Ghodsi Department of Statistics
Vector-Space (Distributional) Lexical Semantics
A weight-incorporated similarity-based clustering ensemble method based on swarm intelligence Yue Ming NJIT#:
CMPT 733, SPRING 2016 Jiannan Wang
Commonsense Knowledge Acquisition and Applications
Semantic Network & Knowledge Graph
Enhanced Dependency Jiajie Yu Wentao Ding.
Knowlywood: Mining Activity Knowledge from Hollywood Narratives
Acquiring Comparative Commonsense Knowledge from the Web
Enriching Structured Knowledge with Open Information
Introduction Task: extracting relational facts from text
Word embeddings based mapping
CSE 635 Multimedia Information Retrieval
Word embeddings based mapping
Ying Dai Faculty of software and information science,
Ying Dai Faculty of software and information science,
Enriching Taxonomies With Functional Domain Knowledge
ProBase: common Sense Concept KB and Short Text Understanding
Ying Dai Faculty of software and information science,
Point Set Representation for Object Detection and Beyond
Presentation transcript:

HowToKB: Mining HowTo Knowledge from Online Communities Cuong Chu, Niket Tandon, Gerhard Weikum MPI Saarbruecken Allen Institute for AI MPI Saarbruecken task frame : How to paint a wall

HowToKB: Mining HowTo Knowledge from Online Communities Cuong Chu, Niket Tandon, Gerhard Weikum MPI Saarbruecken Allen Institute for AI MPI Saarbruecken task frame : How to paint a wall Attributes Edges

Related work on HowTo knowledge acquisition Input Representation

Related work on HowTo knowledge acquisition Generic Input - Tasks are not semantic frames Yang et. al SIGIR’15 Representation Syntactic structures OpenIE ConceptNet Reduced expressivity Semantic expressivity PropBank Domain specific

Related work on HowTo knowledge acquisition Generic Input Yang et. al SIGIR’15 Representation Syntactic structures OpenIE ConceptNet Reduced expressivity Semantic expressivity PropBank VerbNet FrameNet Knowlywood Domain specific

Related work on HowTo knowledge acquisition Generic Input Yang et. al SIGIR’15 HowToKB Representation Syntactic structures OpenIE Schank’75 Fillmore’76 ConceptNet Minsky’74 Reduced expressivity Semantic expressivity PropBank VerbNet FrameNet Knowlywood Domain specific

Related work on HowTo knowledge acquisition Generic Input Yang et. al SIGIR’15 HowToKB Representation Syntactic structures OpenIE Schank’75 Fillmore’76 ConceptNet Minsky’74 Reduced expressivity Semantic expressivity PropBank VerbNet FrameNet - No phrases/tasks - manually populated Knowlywood Domain specific Message: HowToKB’s knowledge representation is different.

Related work on HowTo knowledge acquisition Task Model

Related work on HowTo knowledge acquisition Schema based Task Semantic Frame parsing Model Semantic Role Labeling Supervised Unsupervised OpenIE Syntactic structures Schema free extraction

Related work on HowTo knowledge acquisition Schema based Task Semantic Frame parsing HowToKB Model Semantic Role Labeling - mapped to WordNet, closed sense repository Knowlywood Supervised Unsupervised OpenIE Syntactic structures Schema free extraction Message: our task is different.

WikiHow: our input dataset

WikiHow: our input dataset Task Sub task Previous task Sub task Next task … Participating objects

WikiHow: our input dataset Task Sub task Images, videos Previous task Sub task Next task … Participating objects Message: WikiHow data is very rich, and can be exploited.

System overview WikiHow Frame construction Frame organization HowToKB

Novel knowledge representation System overview WikiHow Frame construction Frame organization HowToKB Stage 1a: convert unstructured articles to structured task frame Stage 1b: sequencing task frames Novel knowledge representation

Novel hierarchical organization with distributional senses System overview WikiHow Frame construction Frame organization HowToKB Stage 2: organize the sequenced task frames. Novel hierarchical organization with distributional senses

KB construction: extraction OpenIE naturally suits task frame construction; easy mapping to task attributes attribute OpenIE mapping location time Participating agent subject Participating object subject/object

KB construction: extraction OpenIE naturally suits task frame construction; easy mapping to task attributes Attribute type-checking increases precision from 75% to 97% attribute OpenIE mapping type-checking head ∈ WordNet location WN-noun time WN-time Participating agent subject WN-living Participating object subject/object WN-nonliving Message: Type checking helps to postprocess OpenIE results.

KB construction: extraction 1.2 million task frames OpenIE naturally suits task frame construction; easy mapping to task attributes Attribute type-checking increases precision from 75% to 97% attribute OpenIE mapping type-checking WN: WordNet location Noun phrase time WN-time Participating agent subject WN-living Participating object subject/object WN-nonliving Message: 1.2M task frames are isolated from each other.

Why KB organization? task paint wall participating object brush, paint, .. sub-task clean the surface, dip the roller.. task paint ceiling participating object paint, roller, .. sub-task clean the surface, dip the roller.. Message: KB organization is essential for: a) better redundancy: aggregated frames: paint wall, paint ceiling

Why KB organization? Task use keyboard Category Iphone, Android Mac, Windows Music listening Music appreciation Visuals Message: KB organization is essential for: a) better redundancy: aggregated frames: paint wall, paint ceiling b) disambiguation of tasks: use keyboard– piano? or, computer?

Approach to KB organization use keyboard press keystrokes For the 1.2 million frames, the number of clusters is unknown. Hierarchical clustering is natural, but expensive

Approach to KB organization use keyboard press keystrokes For the 1.2 million frames, the number of clusters is unknown. Hierarchical clustering is natural, but expensive Expected organization use keyboard use keyboard, press keystrokes use keyboard, press keystrokes

Approach to KB organization use keyboard press keystrokes For the 1.2 million frames, the number of clusters is unknown. Hierarchical clustering is natural, but expensive We propose a two stage-clustering, Stage 1: coarse-grained clustering Stage 2: fine-grained clustering use keyboard use keyboard, press keystrokes use keyboard, press keystrokes

Preparing for clustering: Multi-dimensional similarity model Attribute Task frame f1 Task frame f2 Task title paint a wall color the room ceiling Location house, wall, … bedroom, ceiling,… … ... Category home & garden house decoration 𝑓 𝑤2𝑣 𝑡 1 𝑣𝑒𝑟𝑏 , 𝑡 2 𝑣𝑒𝑟𝑏 ∗ 𝑓 𝑤2𝑣 (𝑡 1 𝑛𝑜𝑢𝑛 , 𝑡 2 𝑛𝑜𝑢𝑛 )

Preparing for clustering: Multi-dimensional similarity model Attribute Task frame f1 Task frame f2 Task title paint a wall color the room ceiling Location house, wall, … bedroom, ceiling,… … ... Category home & garden house decoration 𝑓 𝑤2𝑣 𝑡 1 𝑣𝑒𝑟𝑏 , 𝑡 2 𝑣𝑒𝑟𝑏 ∗ 𝑓 𝑤2𝑣 (𝑡 1 𝑛𝑜𝑢𝑛 , 𝑡 2 𝑛𝑜𝑢𝑛 ) Finally, logistic regression over the attributes, 𝑠𝑖𝑚 𝑓 1 , 𝑓 2 = 𝑠𝑖𝑔𝑚𝑜𝑖𝑑 ( 𝑤 0 + 𝑖=1 𝐴 𝑤 𝑖 𝑓 𝑖 𝑎 1 , 𝑎 2 )

Preparing for clustering: Multi-dimensional similarity model Attribute Task frame f1 Task frame f2 Task title paint a wall color the room ceiling Location house, wall, … bedroom, ceiling,… … ... Category home & garden house decoration 𝑓 𝑤2𝑣 𝑡 1 𝑣𝑒𝑟𝑏 , 𝑡 2 𝑣𝑒𝑟𝑏 ∗ 𝑓 𝑤2𝑣 (𝑡 1 𝑛𝑜𝑢𝑛 , 𝑡 2 𝑛𝑜𝑢𝑛 ) Message: Our task frame pairs are dissimilar with an empirical confidence of 99.9% if a combination of their categorical and lexical similarity is less than a threshold Finally, logistic regression over the attributes, 𝑠𝑖𝑚 𝑓 1 , 𝑓 2 = 𝑠𝑖𝑔𝑚𝑜𝑖𝑑 ( 𝑤 0 + 𝑖=1 𝐴 𝑤 𝑖 𝑓 𝑖 𝑎 1 , 𝑎 2 )

Coarse-grained clustering 1.2 million task frames use keyboard use mac keyboard press keystrokes Efficient Hash Based grouping use keyboard press keystrokes Lexical grouping 375K groups

Coarse-grained clustering 1.2 million task frames use keyboard use mac keyboard press keystrokes Efficient Hash Based grouping use keyboard press keystrokes Lexical grouping 375K groups Fewer pairs, efficient top-k similarity Distributional grouping 200K groups use keyboard, press keystrokes Message: Pruning helps to efficiently reduce the search space.

Fine-grained clustering 1.2 million task frames use keyboard use mac keyboard press keystrokes … Lexical grouping 375K groups use keyboard press keystrokes use keyboard, press keystrokes Distributional grouping 200K groups Allows fast, parallel hierarchical clustering Final clusters

Recap of system architecture WikiHow Frame construction Frame organization HowToKB

Resulting HowToKB 0.5 million grouped task frames, Avg. per frame: 12 attributes values, 2 images Precision > 85% Wilson confidence intervals

Message: HowToKB maintains high precision at large-scale. Resulting HowToKB 0.5 million grouped task frames, Avg. per frame: 12 attributes values, 2 images Precision > 85% As ground truth, turkers fill “very likely” attribute values for 150 frames Example: In some context such as decorate the house, the most likely location when we paint a wall is ____ Wilson confidence intervals Message: HowToKB maintains high precision at large-scale.

Usecase: finding YouTube videos for a HowTo task Task query YouTube video make caramel corn Gourmet Caramel Popcorn “Thanks Monique”

Usecase: finding YouTube videos for a HowTo task Task query Expansion using frames (attributes, edges) YouTube video make caramel corn brown sugar ... popcorn ... syrup teaspoon.. ... bake soda ... vanilla .. Gourmet Caramel Popcorn “Thanks Monique”

Usecase: finding YouTube videos for a HowTo task Task query Expansion using frames (attributes, edges) YouTube video HowToKB based expansion beats the strong baselines (Word2Vec, WordNet) 50% of HowToKB’s context is unique, going beyond distributional context. For hard ambiguous queries, HowToKB has 10% precision; baselines achieve < 1% make caramel corn brown sugar ... popcorn ... syrup teaspoon.. ... bake soda ... vanilla .. Gourmet Caramel Popcorn “Thanks Monique” Message: HowToKB provides rich context, beyond relatedness.

Conclusion WikiHow provides a very rich starting point for extraction of task frames Knowledge organization is performed by our fast, clustering method Resulting HowToKB is the first KB on HowTo tasks, and is publicly available. http://www.mpi-inf.mpg.de/yago-naga/webchild/HowToKB Task paint a wall Category painting and other finishes, ceilings, interior walls Participating object wall, ceiling, stair, paint, latex paint for base coat, masking tape, cotton rags Parent task touch up paint, paint ceiling Sub task move furniture, choose color, sand the wall, fill any hole Message: HowToKB, with its rich structure, fills an important knowledge gap