Commonsense Knowledge Acquisition and Applications

Towards Commonsense Enriched Machines
Niket Tandon
Ph.D. Supervisor: Gerhard Weikum
Max Planck Institute for Informatics

Humans understand commonsense of the environment
Scene: climbing a rock (an adventurous activity)
- Rock: property brown, hard
- Hand, leg: part of Person
- Climber: is a Person

Human-machine knowledge gap
Machines see objects: 1 rock, 2 hands, 2 legs, 1 person.
Humans also understand:
- Commonsense of objects: rock has property brown, hard
- Commonsense of relationships: hand, leg part of Person; climber is a Person
- Commonsense of interactions: climbing a rock scene is an adventurous activity

How will machines become smarter if we fill this knowledge gap?
- Smarter robots: "Get me a coffee" (where?)
- Smarter vision: better classifiers, e.g. monitor or TV, given mouse and keyboard?
- Smarter IR: adventurous activities

Encyclopedic Knowledge
Can we fill the human-machine knowledge gap using existing encyclopedic KBs like Freebase?
- Encyclopedic knowledge: facts about instances and their events.
  Facts about instances: A. Honnold, married, Lisa Honnold
  Their events: A. Honnold, married on,
- Commonsense knowledge: facts about classes and activities.

Encyclopedic Knowledge vs. Commonsense Knowledge

Encyclopedic KB (EKB): facts about instances
1. EKB acquisition: unimodal
2. EKB curation: textual verification
3. EKB completion: negative training assumptions hold.
   If $(e_i, r_k, e_j)$ holds, then $(e_i, r_k, e_{j'})$ with $e_{j'} \neq e_j$ is a negative example.
   Example: given (A. Honnold, bornIn, US), the triple (A. Honnold, bornIn, UK) is negative.
   EKBs have several functional relations, hence the assumption holds.

Commonsense KB (CKB): facts about classes
1. CKB acquisition: multimodal
2. CKB curation: textual + visual
3. CKB completion: negative training assumptions fail.
   Example: climber, at location, {mountain, university}; both objects can hold at once.
   Classes generalize properties of instances.
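The negative training assumption above can be illustrated with a minimal sketch. The function name `corrupt_objects` and the relation spelling `atLocation` are hypothetical, chosen only for this illustration; the two examples are the ones on the slide.

```python
from typing import Iterable

Triple = tuple[str, str, str]


def corrupt_objects(fact: Triple, candidates: Iterable[str]) -> list[Triple]:
    """Build negatives by swapping the object of a known fact.

    This is only sound if the relation is functional (one object per subject).
    """
    subj, rel, obj = fact
    return [(subj, rel, other) for other in candidates if other != obj]


# Encyclopedic case: bornIn is functional, so the corrupted triple is a true negative.
ekb_negatives = corrupt_objects(("A. Honnold", "bornIn", "US"), ["US", "UK"])

# Commonsense case: a climber can be at a mountain and at a university,
# so the same procedure labels a true fact as negative.
ckb_true = {
    ("climber", "atLocation", "mountain"),
    ("climber", "atLocation", "university"),
}
ckb_negatives = corrupt_objects(
    ("climber", "atLocation", "mountain"), ["mountain", "university"]
)
false_negatives = [t for t in ckb_negatives if t in ckb_true]

print(ekb_negatives)
print(false_negatives)
```

The second list is non-empty, which is the failure mode the slide points at: for class-level facts, "a different object" does not imply "a wrong object".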

Commonsense knowledge acquisition is different and harder
- Scarce and implicit: humans hardly express the obvious
- Multimodal: spread across multiple modalities
- Reporting bias: the unusual is reported more than the usual
- Contextual: culture specific, location specific

KBs possessing commonsense knowledge
- Cyc. Supervision: manually curated. Pros: accuracy. Cons: cost, coverage.
- ConceptNet. Supervision: semi-automated. Pros: coverage. Cons: less organized.
- Tandon et al., AAAI'11. Supervision: bootstrapped using ConceptNet. Cons: noise.
- Desiderata. Supervision: minimal. Pros: organized, high accuracy (> 80%), high coverage (> 10M).
Need: automatically constructed, semantically organized commonsense KB.

Need: robust techniques to automatically construct semantically organized Commonsense KB

Three research questions: investigate robust techniques to acquire
RQ 1. Commonsense of objects in the environment: fine-grained, semantically refined properties.

RQ 2. Commonsense of relationships between objects: part-whole relation, comparative relation, …

RQ 3. Commonsense of interactions between objects: activities and their semantic attributes.


Research question 1
RQ 1. Commonsense of objects in the environment: fine-grained, semantically refined properties.
Previous work:
- lumps these properties together
- does not distinguish the meanings of the words
- has low coverage

Input: a large text corpus containing, e.g., "summit is crisp".
Output: triples $\langle w1_n^s,\ r,\ w2_a^s \rangle$, e.g. $\langle \text{summit}_n^2,\ \text{hasTemperature},\ \text{crisp}_a^3 \rangle$, where
- $w1_n^s$ is a disambiguated noun (sense 1, 2, 3, …),
- $r \in R$ is a fine-grained relation: hasAppearance, hasSound, hasTaste, hasTemperature, evokesEmotion, …
- $w2_a^s$ is a disambiguated adjective (sense 1, 2, 3, …).

Our approach
1. Extract generic hasProperty triples over the input, using the patterns
   <noun> verb [adv] <adj>
   <adj> <noun>
   e.g. "summit is crisp" yields (summit, crisp); other extractions: (mountain, cold), (chili, hot).
2. Disambiguate args and classify triple.
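A minimal sketch of the two extraction patterns, assuming the input is already POS-tagged with coarse tags (`NOUN`, `VERB`, `ADV`, `ADJ`). The tag set, the function name `extract_has_property`, and the hand-tagged sentences are assumptions for illustration; the slides do not say which tagger or pattern matcher was used.

```python
Tagged = list[tuple[str, str]]  # (token, coarse POS tag)


def extract_has_property(tagged: Tagged) -> list[tuple[str, str]]:
    """Return (noun, adjective) pairs matched by the two surface patterns."""
    pairs = []
    n = len(tagged)
    for i, (token, tag) in enumerate(tagged):
        # Pattern 1: <noun> verb [adv] <adj>
        if tag == "NOUN" and i + 2 < n and tagged[i + 1][1] == "VERB":
            j = i + 2
            if tagged[j][1] == "ADV" and j + 1 < n:
                j += 1  # skip the optional adverb
            if tagged[j][1] == "ADJ":
                pairs.append((token, tagged[j][0]))
        # Pattern 2: <adj> <noun>
        if tag == "ADJ" and i + 1 < n and tagged[i + 1][1] == "NOUN":
            pairs.append((tagged[i + 1][0], token))
    return pairs


# Hand-tagged toy inputs (hypothetical tagging).
s1 = [("summit", "NOUN"), ("is", "VERB"), ("crisp", "ADJ")]
s2 = [("the", "DET"), ("mountain", "NOUN"), ("was", "VERB"),
      ("very", "ADV"), ("cold", "ADJ")]
s3 = [("hot", "ADJ"), ("chili", "NOUN")]

for sentence in (s1, s2, s3):
    print(extract_has_property(sentence))
```

The output pairs are still surface forms: neither the noun nor the adjective is disambiguated, and the relation is only the generic hasProperty.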

1. Extract generic hasProperty triples over the input.
2. Disambiguate args and classify triple. This step typically requires training data.

Extract generic hasProperty triples over the input: $\langle w1_n,\ w2_a \rangle$, e.g. (summit, crisp).

Suppose $r = \text{hasTemperature}$. Three inference tasks:
- Range inference, $\langle *,\ r,\ w2_a^s \rangle$: $\text{crisp}_a^3$, $\text{hot}_a^1$, $\text{cold}_a^1$, $\text{icy}_a^2$, …
- Domain inference, $\langle w1_n^s,\ r,\ * \rangle$: $\text{beach}_n^3$, $\text{summit}_n^2$, $\text{metal}_n^1$, $\text{metal}_n^2$, …
- Assertion inference (disambiguate args and classify triple), $\langle w1_n^s,\ r,\ w2_a^s \rangle$: $\langle \text{summit}_n^2,\ \text{crisp}_a^3 \rangle$, $\langle \text{beach}_n^1,\ \text{hot}_a^1 \rangle$, …

This is a new transductive setting: previous transductive settings would only have relationships between triples. By having graphs for parts of triples, we can generalize by first going to the abstract level (domain and range) in order to prune the otherwise hopelessly large graph.
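The pruning effect of inferring domain and range first can be sketched as follows. The domain and range sets reuse the sense-tagged examples from the slide (written here as `summit-n2`, `crisp-a3`); the sense inventories for "summit" and "crisp" and the function name `candidate_assertions` are hypothetical.

```python
from itertools import product


def candidate_assertions(noun, adj, noun_senses, adj_senses, domain, range_):
    """Keep only sense pairs whose noun sense is in domain(r)
    and whose adjective sense is in range(r)."""
    return [
        (n, a)
        for n, a in product(noun_senses[noun], adj_senses[adj])
        if n in domain and a in range_
    ]


# domain(hasTemperature) and range(hasTemperature), as listed on the slide.
domain = {"beach-n3", "summit-n2", "metal-n1", "metal-n2"}
range_ = {"crisp-a3", "hot-a1", "cold-a1", "icy-a2"}

# Hypothetical sense inventories for the surface pair (summit, crisp).
noun_senses = {"summit": ["summit-n1", "summit-n2", "summit-n3"]}
adj_senses = {"crisp": ["crisp-a1", "crisp-a2", "crisp-a3"]}

survivors = candidate_assertions(
    "summit", "crisp", noun_senses, adj_senses, domain, range_
)
print(survivors)
```

Out of nine possible sense pairs only one survives, so the assertion graph built afterwards is far smaller than one built over all sense combinations.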

$\text{domain}(r)$, $\text{range}(r)$, $\text{assertion}(r)$ inference
Input: noisy, surface form candidates for $r$.
Steps: graph construction, then graph inference.

An instance of the problem: $\text{range}(r)$
Table (columns: summit, mountain, dancer)
- cold: 20, 50, 3
- hot: 30, 40, 10
- crisp: 15, 1
Only Hirst similarity generalizes to both nouns and adjectives.

- $\text{crisp}_a^1$: clearly defined
- $\text{crisp}_a^3$: cold and invigorating temperature
- $\text{cold}_a^1$: low or inadequate temperature

sense #1: 1/2, sense #2: 1/3, sense #3: 1/4

Label propagation for graph inference, given few seeds
- Label per node = in / not in range of hasTemperature
- Examples: (summit, crisp), (mountain, cold), (salsa, hot)
- Similar nodes should get similar labels
- But: limited training data


Label Propagation: Loss function (Talukdar et al. 2009)
The loss has three terms:
- Seed label loss
- Similar node, different label loss (nodes U, V)
- Label prior loss (high-degree nodes are noise)
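The three loss terms can be made concrete with a simplified sketch. This is not the algorithm of Talukdar et al.; it is a plain quadratic objective with one weight per term (`mu_seed`, `mu_smooth`, `mu_prior`, all hypothetical), minimized by repeatedly setting each node to the value that zeroes its own gradient. In particular, the prior here is a single constant rather than something that depends on node degree. The graph, its edge weights, and the node `clear-a1` are made up for illustration.

```python
def propagate(edges, seeds, prior=0.0, mu_seed=1.0, mu_smooth=1.0,
              mu_prior=0.1, iters=100):
    """Simplified label propagation.

    edges: {(u, v): weight} for an undirected similarity graph.
    seeds: {node: label}, 1.0 = in range(r), 0.0 = not in range(r).
    Returns a score in [0, 1] per node.
    """
    neighbors = {}
    for (u, v), w in edges.items():
        neighbors.setdefault(u, {})[v] = w
        neighbors.setdefault(v, {})[u] = w

    scores = {node: seeds.get(node, prior) for node in neighbors}
    for _ in range(iters):
        updated = {}
        for node, nbrs in neighbors.items():
            # Label prior loss pulls towards `prior`;
            # smoothness loss pulls towards the neighbours' scores.
            num = mu_prior * prior + mu_smooth * sum(
                w * scores[other] for other, w in nbrs.items())
            den = mu_prior + mu_smooth * sum(nbrs.values())
            if node in seeds:
                # Seed label loss pulls seed nodes towards their given label.
                num += mu_seed * seeds[node]
                den += mu_seed
            updated[node] = num / den
        scores = updated
    return scores


# Toy graph over adjective senses for r = hasTemperature (weights are made up).
edges = {
    ("cold-a1", "crisp-a3"): 0.8,
    ("cold-a1", "icy-a2"): 0.9,
    ("crisp-a1", "clear-a1"): 0.7,
    ("crisp-a1", "crisp-a3"): 0.1,
}
seeds = {"cold-a1": 1.0, "clear-a1": 0.0}

scores = propagate(edges, seeds)
for node in sorted(scores):
    print(node, round(scores[node], 3))
```

With these toy weights the temperature sense `crisp-a3` ends up with a clearly higher score than `crisp-a1` ("clearly defined"), even though neither was a seed.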


WebChild: Model recap
Input: noisy, surface form candidates for $r$.
Steps: graph construction, then graph inference.
Output: clean, disambiguated triples in $r$.

Resulting KB
WebChild: large (~5 million), semantically organized, accurate (0.82 sampled precision).
- Domain (hasShape): mountain-n1, leaf-n1, ...
- Range (hasShape): triangular-a1, tapered-a1, ...
- Assertions (hasShape): (lens-n1, spherical-a2), (palace-n2, domed-a1), ...

Summary of property commonsense
WebChild: first commonsense KB with fine-grained relations and disambiguated arguments; 4.6 million assertions, including domain and range, for 19 relations.
Take-away message: transductive methods help overcome the sparsity of commonsense in text.
Say it: People usually say commonsense knowledge cannot be found in text. This paper shows that with graph-based methods you can still uncover it, and even infer fine-grained, disambiguated knowledge. In general, this is useful for deeper text understanding and for AI-complete tasks. Give the sweet house translation example.

Research question 3
RQ 3. Commonsense of interactions between objects: activities and their semantic attributes.
Previous work:
- largely discusses events, but activities only at small scale
- does not organize the attributes of the activities
- does not distinguish the meanings of the attribute values

An Activity frame
{Climb up a mountain, Hike up a hill}
- Participants: climber, boy, rope
- Location: camp, forest, sea shore
- Time: day, holiday
- Visuals

Semantic organization of Activity frames
{Climb up a mountain, Hike up a hill}
- Participants: climber, boy, rope
- Location: camp, forest, sea shore
- Time: day, holiday
- Visuals
- Parent activity: Go up an elevation ..
- Previous activity: Get to village ..
- Next activity: Reach the top ..
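One way to picture such a frame as a data structure is sketched below. The class `ActivityFrame` and its field names are hypothetical and are not taken from the slides or from the released KB; visuals are left out.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ActivityFrame:
    """A set of equivalent activity phrases plus semantic attributes and links."""
    phrases: list[str]
    participants: list[str] = field(default_factory=list)
    locations: list[str] = field(default_factory=list)
    times: list[str] = field(default_factory=list)
    parent: Optional["ActivityFrame"] = None     # more general activity
    previous: Optional["ActivityFrame"] = None   # typically happens before
    following: Optional["ActivityFrame"] = None  # typically happens after


climb = ActivityFrame(
    phrases=["climb up a mountain", "hike up a hill"],
    participants=["climber", "boy", "rope"],
    locations=["camp", "forest", "sea shore"],
    times=["day", "holiday"],
    parent=ActivityFrame(phrases=["go up an elevation"]),
    previous=ActivityFrame(phrases=["get to village"]),
    following=ActivityFrame(phrases=["reach the top"]),
)

print(climb.parent.phrases, climb.previous.phrases, climb.following.phrases)
```

Storing the links as references to other frames turns the collection into a graph, which is the form the later graph construction slides work with.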


Contain events but not activity knowledge.
May contain activities, but no visuals, and with varying granularity of scene boundaries and transitions.
Hollywood narratives are good.

Semantic parsing of scripts, then graph construction
Input: text in a scene taken from a semi-structured movie script, e.g. "He began to shoot a video on the summit".
Output: disambiguated semantic roles, e.g.
- the man: agent
- began to shoot: action
- a video: patient
- summit: location
SRL systems are computationally expensive and domain specific.

State-of-the-art WSD customized for phrases
- the man: man.1, man.2
- began to shoot: shoot.1, shoot.4
- a video: video.1

Can we use two different information sources to perform SRL given no training data?
Source 1: state-of-the-art WSD customized for phrases
- the man: man.1, man.2
- began to shoot: shoot.1, shoot.4
- a video: video.1
Source 2: VerbNet, which contains curated semantic roles for verbs, with selectional restrictions
- shoot.vn.1 (NP VP NP): agent animate, patient animate
- shoot.vn.3 (NP VP NP): agent animate, patient inanimate

State-of-the-art WSD customized for phrases, jointly leveraged with syntactic and semantic role semantics from VerbNet
- the man: man.1, man.2
- began to shoot: shoot.1, shoot.4
- a video: video.1
- shoot.vn.1 (NP VP NP): agent animate, patient animate
- shoot.vn.3 (NP VP NP): agent animate, patient inanimate
Example of a consistent reading: man.1, shoot.4 with an inanimate patient, and video.1, which is a Thing / inanimate in the WordNet class hierarchy.
Components of the joint model:
- WordNet-VerbNet linkage and WordNet class hierarchy
- Binary decision variable
- WSD prior, WN prior
- Sense, VN syntactic match score
- Sense, VN semantic match score

Joint WSD and SRL
Objective terms: WSD prior, WN prior, word-VN match score, selectional restriction score.
Decision variables: $x_{ij}$ = binary decision variable for word $i$ mapped to WN sense $j$.
Constraints:
- one VN sense per verb
- WN, VN sense consistency
- …
- selectional restriction constraints
- binary decisions
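A toy version of the joint decision for the running example is sketched below. It replaces the solver with brute-force enumeration and treats the selectional restrictions as hard constraints, which is a simplification of what the slide describes. The prior values, the WordNet-to-VerbNet linkage for `shoot.1`, and the animacy of `man.2` are hypothetical; only the candidate senses and the VerbNet role restrictions come from the slides.

```python
from itertools import product

# Candidate WordNet senses per phrase, as on the slides.
SENSES = {
    "the man": ["man.1", "man.2"],
    "began to shoot": ["shoot.1", "shoot.4"],
    "a video": ["video.1"],
}
# Hypothetical WSD priors; note that shoot.1 is preferred a priori.
WSD_PRIOR = {"man.1": 0.6, "man.2": 0.4,
             "shoot.1": 0.6, "shoot.4": 0.4, "video.1": 1.0}
# Hypothetical WordNet-to-VerbNet linkage (one VN sense per verb sense).
WN_TO_VN = {"shoot.1": "shoot.vn.1", "shoot.4": "shoot.vn.3"}
# Selectional restrictions of the two VerbNet senses, as on the slides.
VN_ROLES = {
    "shoot.vn.1": {"agent": "animate", "patient": "animate"},
    "shoot.vn.3": {"agent": "animate", "patient": "inanimate"},
}
# Animacy from the WordNet class hierarchy (man.2 is an assumption).
ANIMACY = {"man.1": "animate", "man.2": "animate", "video.1": "inanimate"}


def best_parse():
    """Pick one sense per phrase, maximizing the summed priors subject to
    WN/VN consistency and the selectional restrictions."""
    best, best_score = None, float("-inf")
    for agent, action, patient in product(
            SENSES["the man"], SENSES["began to shoot"], SENSES["a video"]):
        frame = WN_TO_VN[action]  # WN, VN sense consistency
        if ANIMACY[agent] != VN_ROLES[frame]["agent"]:
            continue
        if ANIMACY[patient] != VN_ROLES[frame]["patient"]:
            continue
        score = WSD_PRIOR[agent] + WSD_PRIOR[action] + WSD_PRIOR[patient]
        if score > best_score:
            best_score = score
            best = {"agent": agent, "action": action,
                    "patient": patient, "frame": frame}
    return best


parse = best_parse()
print(parse)
```

Although `shoot.1` has the higher prior in this toy setup, its VerbNet sense requires an animate patient, so the only consistent reading uses `shoot.4` with `video.1` as an inanimate patient.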

Semantic parsing of scripts: output of joint WSD and SRL
- Agent: man.1
- Action: shoot.4
- Patient: video.1

Semantic parsing of scripts, then graph construction
Climb up a mountain
- Participants: climber, rope
- Location: summit, forest
- Time: day

Construct a graph of activity frames with three edge types:
- TypeOf: T(a, b)
- Similar: S(a, b)
- Previous: P(a, b)
Example frames:
- Go up an elevation ..
- Climb up a mountain. Participants: climber, rope. Location: summit, forest. Time: day.
- Hike up a hill. Participants: climber. Location: sea shore. Time: holiday.
- Reach top ..

+ Similarity: S(climb up a mountain, hike up a hill)
Activity similarity and attribute similarity:
- Climb up a mountain. Participants: climber, rope. Location: forest. Time: day.
- Hike up a hill. Participants: climber. Location: woods. Time: holiday.

+ TypeOf: T(climb up a mountain, go up an elevation)
Activity hypernymy and attribute hypernymy:
- Climb up a mountain. Participants: climber, rope. Location: forest. Time: day.
- Go up an elevation. Participants: Person. Location: Exterior. Time: day.

Previous: P(reach the top, climb up a mountain)
Scene: "… Carrie and Big start out early to head to the village. They climb up the beautiful mountain which felt as if they were in a different world. After several hours they eventually reach the top. …"
- Allow gaps between activities within one scene.
- PMI-style counting to suppress generic activities.
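A minimal sketch of PMI-style counting over ordered activity pairs, with gaps allowed inside a scene. The exact counting scheme and normalization used for Knowlywood are not given in the slides, so the function `previous_pmi`, its normalization, and the toy scenes are assumptions. Pairs are keyed as (earlier activity, later activity).

```python
import math
from collections import Counter
from itertools import combinations


def previous_pmi(scenes):
    """PMI for ordered activity pairs that occur in the same scene.

    scenes: list of scenes, each a list of activities in narrative order.
    """
    pair_counts, activity_counts = Counter(), Counter()
    for scene in scenes:
        activity_counts.update(scene)
        # combinations() keeps input order, so (a, b) means a came before b;
        # non-adjacent pairs are included, which allows gaps.
        pair_counts.update(combinations(scene, 2))
    total_pairs = sum(pair_counts.values())
    total_activities = sum(activity_counts.values())
    return {
        (a, b): math.log(
            (count / total_pairs)
            / ((activity_counts[a] / total_activities)
               * (activity_counts[b] / total_activities)))
        for (a, b), count in pair_counts.items()
    }


# Toy scenes; "walk" plays the role of a generic activity.
scenes = [
    ["head to the village", "climb up a mountain", "reach the top"],
    ["walk", "climb up a mountain", "reach the top"],
    ["walk", "eat dinner"],
    ["walk", "drive a car"],
]

pmi = previous_pmi(scenes)
print(round(pmi[("climb up a mountain", "reach the top")], 3))
print(round(pmi[("walk", "reach the top")], 3))
```

Because "walk" occurs in many scenes, its pairing with "reach the top" scores lower than the pairing of "climb up a mountain" with "reach the top", which is the suppression effect the slide refers to.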

Semantic parsing of scripts, then graph construction: the resulting graph
- parent: Go up an elevation .. is the parent of Climb up a mountain
- similar: Climb up a mountain and Hike up a hill
- temporal: Climb up a mountain and Reach top ..
Frames:
- Climb up a mountain. Participants: climber, rope. Location: summit, forest. Time: day.
- Hike up a hill. Participants: climber. Location: sea shore. Time: holiday.


Resulting KB: Knowlywood
Statistics:
- Scenes: 1,708,782
- Activity synsets: 505,788
- Accuracy: 0.85 ± 0.01
- Images from scenes: 30,000

Summary of activity commonsense
Knowlywood: first organized commonsense activity KB with activity attributes and disambiguated values, containing nearly a million activities with visuals.
Take-away message: jointly leveraging different annotated resources helps overcome the sparsity of training data.

The overall KB: WebChild KB
> 3M concepts, > 18M triples, > 1000 relations

Conclusions and take-home messages
Knowledge to make machines smarter can be acquired with robust techniques that jointly leverage global information.
- RQ1, Properties (WSDM'14): range, domain, assertions of fine-grained relations
- RQ2, Comparatives, part-whole (AAAI'14, AAAI'16): fine-grained comparative, part-whole relations
- RQ3, Activities (WWW'15, CIKM'15): activity frames with semantic attributes
- WebChild KB applications (CVPR'15, ACL'15, ISWC'16, ..)
Messages per community:
- ML + NLP: limited training data can be overcome by jointly leveraging multiple cues
- Computer Vision: commonsense helps computer vision; vision helps commonsense acquisition
- AI: semantically organized knowledge is a step towards filling the human-machine gap
Thank you!
