Published by Janel Gordon. Modified over 6 years ago.
2
HowToKB: Mining HowTo Knowledge from Online Communities
Cuong Chu (MPI Saarbruecken), Niket Tandon (Allen Institute for AI), Gerhard Weikum (MPI Saarbruecken)
Task frame: How to paint a wall [diagram labels: Attributes, Edges]
7
Related work on HowTo knowledge acquisition
[Diagram: related work placed along two axes, Input and Representation]
Input axis: Generic, Domain specific
Representation axis: Syntactic structures, Reduced expressivity, Semantic expressivity
Work shown: Yang et al. SIGIR’15, HowToKB, OpenIE, ConceptNet, Schank’75, Fillmore’76, Minsky’74, PropBank, VerbNet, FrameNet, Knowlywood
Annotations: tasks are not semantic frames; no phrases/tasks; manually populated
Message: HowToKB’s knowledge representation is different.
10
Related work on HowTo knowledge acquisition
[Diagram: related work placed along two axes, Task and Model]
Task axis: Schema based, Schema free extraction
Model axis: Supervised, Unsupervised
Work shown: Semantic frame parsing, HowToKB, Semantic role labeling, Knowlywood, OpenIE, Syntactic structures
Annotation: mapped to WordNet, closed sense repository
Message: our task is different.
13
WikiHow: our input dataset
[Annotated WikiHow article; labels: Task; Sub task; Images, videos; Previous task; Next task; …; Participating objects]
Message: WikiHow data is very rich and can be exploited.
15
System overview
WikiHow → Frame construction → Frame organization → HowToKB
Frame construction (novel knowledge representation):
Stage 1a: convert unstructured articles to structured task frames
Stage 1b: sequencing task frames
16
System overview
WikiHow → Frame construction → Frame organization → HowToKB
Frame organization (novel hierarchical organization with distributional senses):
Stage 2: organize the sequenced task frames.
18
KB construction: extraction
OpenIE naturally suits task frame construction; easy mapping to task attributes.
Attribute type-checking increases precision from 75% to 97%.
attribute              OpenIE mapping    type-checking (head ∈ WordNet)
location               (none given)      WN-noun
time                   (none given)      WN-time
participating agent    subject           WN-living
participating object   subject/object    WN-nonliving
Message: Type checking helps to postprocess OpenIE results.
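The type-checking step above could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `TOY_LEXICON` is a hand-written stand-in for WordNet lookups, the last-token head finder is a simplifying assumption, and all function names are hypothetical.

```python
# Toy stand-in for WordNet: maps a head noun to coarse semantic classes.
# Assumption: a real system would derive these classes from WordNet.
TOY_LEXICON = {
    "kitchen": {"noun"},
    "morning": {"noun", "time"},
    "painter": {"noun", "living"},
    "brush": {"noun", "nonliving"},
    "wall": {"noun", "nonliving"},
}

# Required class per task-frame attribute, following the slide's table.
REQUIRED_CLASS = {
    "location": "noun",
    "time": "time",
    "participating agent": "living",
    "participating object": "nonliving",
}


def head_word(phrase):
    """Crude head finder (assumption): take the last token of the phrase."""
    tokens = phrase.lower().split()
    return tokens[-1] if tokens else ""


def passes_type_check(attribute, phrase):
    """Keep a candidate value only if its head word has the required class."""
    required = REQUIRED_CLASS[attribute]
    return required in TOY_LEXICON.get(head_word(phrase), set())


def filter_candidates(candidates):
    """candidates: list of (attribute, phrase) pairs taken from OpenIE output."""
    return [(a, p) for a, p in candidates if passes_type_check(a, p)]
```

For example, a candidate ("participating agent", "the wall") would be dropped here, because the head "wall" is not marked as living in the toy lexicon.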
19
KB construction: extraction
1.2 million task frames
OpenIE naturally suits task frame construction; easy mapping to task attributes.
Attribute type-checking increases precision from 75% to 97%.
attribute              OpenIE mapping    type-checking (WN: WordNet)
location               (none given)      Noun phrase
time                   (none given)      WN-time
participating agent    subject           WN-living
participating object   subject/object    WN-nonliving
Message: 1.2M task frames are isolated from each other.
20
Why KB organization?
Task: paint wall
Participating object: brush, paint, ..
Sub-task: clean the surface, dip the roller..
Task: paint ceiling
Participating object: paint, roller, ..
Sub-task: clean the surface, dip the roller..
Message: KB organization is essential for:
a) better redundancy: aggregated frames: paint wall, paint ceiling
21
Why KB organization?
Task: use keyboard
Category: iPhone, Android | Mac, Windows | Music listening | Music appreciation
[Visuals shown on slide]
Message: KB organization is essential for:
a) better redundancy: aggregated frames: paint wall, paint ceiling
b) disambiguation of tasks: use keyboard: piano, or computer?
23
Approach to KB organization
Example frames: use keyboard, press keystrokes
For the 1.2 million frames, the number of clusters is unknown.
Hierarchical clustering is natural, but expensive.
Expected organization: [diagram nodes: use keyboard; use keyboard, press keystrokes; use keyboard, press keystrokes]
24
Approach to KB organization
Example frames: use keyboard, press keystrokes
For the 1.2 million frames, the number of clusters is unknown.
Hierarchical clustering is natural, but expensive.
We propose a two-stage clustering:
Stage 1: coarse-grained clustering
Stage 2: fine-grained clustering
[Diagram nodes: use keyboard; use keyboard, press keystrokes; use keyboard, press keystrokes]
27
Preparing for clustering: Multi-dimensional similarity model
Attribute     Task frame f1     Task frame f2
Task title    paint a wall      color the room ceiling
Location      house, wall, …    bedroom, ceiling, …
…             ...               ...
Category      home & garden     house decoration
Task title similarity: $f_{w2v}(t_1^{verb}, t_2^{verb}) \cdot f_{w2v}(t_1^{noun}, t_2^{noun})$
Finally, logistic regression over the attributes: $sim(f_1, f_2) = \mathrm{sigmoid}\left(w_0 + \sum_{i=1}^{A} w_i \, f_i(a_1, a_2)\right)$
Message: Our task frame pairs are dissimilar with an empirical confidence of 99.9% if a combination of their categorical and lexical similarity is less than a threshold.
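The similarity model above can be illustrated with a small sketch. It assumes that the word2vec-based function is a cosine similarity between word vectors, which the slide does not state; the two-dimensional `TOY_EMBEDDINGS`, the weights, and the function names are made up for illustration and are not the trained model.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def title_similarity(verb1, noun1, verb2, noun2, embeddings):
    """Product of verb-verb and noun-noun similarities of two task titles."""
    verb_sim = cosine(embeddings[verb1], embeddings[verb2])
    noun_sim = cosine(embeddings[noun1], embeddings[noun2])
    return verb_sim * noun_sim


def frame_similarity(attribute_sims, weights, bias):
    """sigmoid(w_0 + sum_i w_i * f_i), with f_i the per-attribute similarities."""
    z = bias + sum(w * s for w, s in zip(weights, attribute_sims))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical toy vectors; real word2vec vectors have hundreds of dimensions.
TOY_EMBEDDINGS = {
    "paint": [1.0, 0.0],
    "color": [0.8, 0.6],
    "wall": [0.0, 1.0],
    "ceiling": [0.6, 0.8],
}
```

A pair such as "paint a wall" and "color the room ceiling" would be scored by calling `title_similarity("paint", "wall", "color", "ceiling", TOY_EMBEDDINGS)` and passing the result, together with the other attribute similarities, to `frame_similarity`.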
29
Coarse-grained clustering
1.2 million task frames (use keyboard, use mac keyboard, press keystrokes)
Lexical grouping (efficient hash-based grouping): 375K groups (use keyboard; press keystrokes)
Distributional grouping (fewer pairs, efficient top-k similarity): 200K groups (use keyboard, press keystrokes)
Message: Pruning helps to efficiently reduce the search space.
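One way to picture the hash-based lexical grouping is the sketch below. The grouping key (first and last non-stop-word token of the title) is an assumption chosen so that "use keyboard" and "use mac keyboard" fall into the same bucket; the slides do not specify the actual key, and the names here are hypothetical.

```python
from collections import defaultdict

# Small illustrative stop-word list (assumption).
STOP_WORDS = {"a", "an", "the", "your"}


def lexical_key(title):
    """Hashable key for a task title: (first token, last token) after
    lowercasing and dropping stop words."""
    tokens = [t for t in title.lower().split() if t not in STOP_WORDS]
    if not tokens:
        return ("", "")
    return (tokens[0], tokens[-1])


def group_by_key(titles):
    """Single pass over all titles; each title lands in one bucket,
    so no pairwise comparison is needed at this stage."""
    groups = defaultdict(list)
    for title in titles:
        groups[lexical_key(title)].append(title)
    return dict(groups)
```

Because the grouping is a dictionary lookup per title, it runs in time linear in the number of frames, which is what makes it usable as a pruning step before any pairwise similarity is computed.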
30
Fine-grained clustering
1.2 million task frames (use keyboard, use mac keyboard, press keystrokes, …)
Lexical grouping: 375K groups (use keyboard; press keystrokes)
Distributional grouping: 200K groups (use keyboard, press keystrokes)
Allows fast, parallel hierarchical clustering
Final clusters
31
Recap of system architecture
WikiHow → Frame construction → Frame organization → HowToKB
33
Resulting HowToKB
0.5 million grouped task frames
Avg. per frame: 12 attribute values, 2 images
Precision > 85% (Wilson confidence intervals)
As ground truth, turkers fill in “very likely” attribute values for 150 frames.
Example: In some context such as decorate the house, the most likely location when we paint a wall is ____
Message: HowToKB maintains high precision at large scale.
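For readers unfamiliar with Wilson confidence intervals, the standard Wilson score interval for a proportion can be computed as in the sketch below. The sample counts used in the usage note are hypothetical and are not the paper's evaluation data; `wilson_interval` is an illustrative name.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion.

    successes: number of samples judged correct
    n:         number of samples judged (must be positive)
    z:         normal quantile, 1.96 for a 95% interval
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2.0 * n)) / denom
    half_width = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return center - half_width, center + half_width
```

For instance, `wilson_interval(135, 150)` gives the interval for a hypothetical sample in which 135 of 150 judged values were correct. Unlike the plain normal approximation, the interval stays within [0, 1] even for proportions near 0 or 1.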
36
Use case: finding YouTube videos for a HowTo task
Task query: make caramel corn
Expansion using frames (attributes, edges): brown sugar ... popcorn ... syrup teaspoon bake soda ... vanilla ..
YouTube video: Gourmet Caramel Popcorn “Thanks Monique”
HowToKB-based expansion beats the strong baselines (Word2Vec, WordNet).
50% of HowToKB’s context is unique, going beyond distributional context.
For hard ambiguous queries, HowToKB has 10% precision; baselines achieve < 1%.
Message: HowToKB provides rich context, beyond relatedness.
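The query expansion idea can be sketched as below. This is only an illustration under stated assumptions: the frame is a plain dictionary, the attribute names and their order are hypothetical, and the actual system's term selection and ranking are not described on the slide.

```python
def expand_query(task, frame, max_terms=5):
    """Append attribute values from a task frame to the task query.

    frame: dict mapping an attribute name to a list of string values
           (hypothetical structure for this sketch).
    """
    terms = []
    for attribute in ("participating object", "sub task"):
        for value in frame.get(attribute, []):
            # Skip duplicates and the task title itself.
            if value != task and value not in terms:
                terms.append(value)
    return " ".join([task] + terms[:max_terms])


# Hypothetical frame built from the expansion terms shown on the slide.
EXAMPLE_FRAME = {
    "participating object": ["brown sugar", "popcorn", "syrup", "vanilla"],
}
```

Calling `expand_query("make caramel corn", EXAMPLE_FRAME, max_terms=3)` would produce a query string that adds the first three object terms to the original task.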
37
Conclusion
WikiHow provides a very rich starting point for extraction of task frames.
Knowledge organization is performed by our fast clustering method.
The resulting HowToKB is the first KB on HowTo tasks, and is publicly available.
Example frame:
Task: paint a wall
Category: painting and other finishes, ceilings, interior walls
Participating object: wall, ceiling, stair, paint, latex paint for base coat, masking tape, cotton rags
Parent task: touch up paint, paint ceiling
Sub task: move furniture, choose color, sand the wall, fill any hole
Message: HowToKB, with its rich structure, fills an important knowledge gap.