Knowlywood: Mining Activity Knowledge from Hollywood Narratives

Presentation transcript:

Knowlywood: Mining Activity Knowledge from Hollywood Narratives
Niket Tandon (MPI Informatics, Saarbruecken)
Gerard de Melo (IIIS, Tsinghua Univ)
Abir De (IIT Kharagpur)
Gerhard Weikum (MPI Informatics, Saarbruecken)

Legs, person, shoe, mountain, rope ..
Rock climbing (activity classes)
Going up a mountain/hill (activity groupings)
Going up an elevation (activity hierarchy)
Daytime, outdoor activity (additional information)
What happens next? (temporal guidance)

Activity: {Climb up a mountain, Hike up a hill}
Parent activity: Go up an elevation ..
Previous activity: Get to village ..
Next activity: Drink water ..
Participants: climber, boy, rope
Location: camp, forest, sea shore
Time: daylight, holiday
Visuals
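A frame like the one on this slide can be held in a small record type. The following is only an illustrative sketch: the class name `ActivityFrame` and its field names are assumptions, not the authors' schema, and the previous/next assignment follows one reading of the slide layout.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ActivityFrame:
    """Hypothetical container for one activity synset and its attributes."""
    synset: List[str]                      # paraphrases grouped into one activity
    parent: Optional[str] = None           # more general activity
    previous: Optional[str] = None         # activity that typically comes before
    next: Optional[str] = None             # activity that typically comes after
    participants: List[str] = field(default_factory=list)
    locations: List[str] = field(default_factory=list)
    times: List[str] = field(default_factory=list)
    visuals: List[str] = field(default_factory=list)  # e.g. image identifiers


# Values copied from the slide; the slide lists no concrete visuals.
climb = ActivityFrame(
    synset=["climb up a mountain", "hike up a hill"],
    parent="go up an elevation",
    previous="get to village",
    next="drink water",
    participants=["climber", "boy", "rope"],
    locations=["camp", "forest", "sea shore"],
    times=["daylight", "holiday"],
)
```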

Activity commonsense: Related work
Event mining
Encyclopedic KBs: factual (e.g. bornOn); entity oriented (e.g. Person); many KBs (e.g. Freebase)
Commonsense KB
Cyc: manual; limited size; no focus on activities
ConceptNet: crowdsourced; limited size; no semantic activity frames
WebChild: no focus on activities
This talk: semantic activity CSK KB construction

Activity commonsense is hard:
People hardly express the obvious: implicit and scarce
Spread across multiple modalities: text, image, videos
Non-factual: hence noisy

Contain events but not activity knowledge.
May contain activities, but with varying granularity and no visuals. No clear scene boundaries.
Hollywood narratives are easily available and meet the desiderata.
Align via subtitles, using approximate dialogue similarity.
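The alignment idea can be sketched as fuzzy matching between script dialogue and timed subtitle text. This is a minimal sketch under stated assumptions: the slide does not say which similarity measure is used, so `difflib.SequenceMatcher` stands in for it, and the function name, the tuple layout, and the 0.6 threshold are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Approximate string similarity in [0, 1] (stand-in measure, assumed)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def align_script_to_subtitles(script_lines, subtitles, threshold=0.6):
    """Attach subtitle timestamps to script dialogue lines.

    script_lines: list of dialogue strings taken from a script.
    subtitles: list of (start_sec, end_sec, text) tuples.
    Returns (dialogue, start_sec, end_sec) for lines with a close enough match.
    """
    if not subtitles:
        return []
    alignments = []
    for line in script_lines:
        best = max(subtitles, key=lambda sub: similarity(line, sub[2]))
        if similarity(line, best[2]) >= threshold:
            alignments.append((line, best[0], best[1]))
    return alignments


subs = [(10.0, 12.5, "I'll climb that mountain tomorrow"),
        (30.0, 31.0, "Pass me the water.")]
aligned = align_script_to_subtitles(["I will climb that mountain tomorrow."], subs)
```

Once dialogue lines carry timestamps, the scene descriptions between them can be tied to the corresponding stretch of video.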


State-of-the-art WSD customized for phrases
Syntactic and semantic role semantics from VerbNet
Example: the man began to shoot a video (NP VP NP)
the man: man.1, man.2
began to shoot: shoot.1, shoot.4
a video: video.1
shoot.vn.1: agent.animate, patient.animate
shoot.vn.3: agent.animate, patient.inanimate
Selected: shoot.4, patient.inanimate, video.1
Output frame: Agent: man.1; Action: shoot.4; Patient: video.1

xij = binary decision var. for word i, mapped to WN sense j
Objective terms: IMS prior; WN prior; word-VN match score; selectional restriction score
Constraints: one VN sense per verb; WN-VN sense consistency; selectional restriction constraints; binary decisions
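For the "shoot a video" example, the joint decision can be illustrated by enumerating all sense assignments and keeping the best feasible one. This is a brute-force stand-in for an ILP solver, not the authors' formulation: all scores, the animacy table, and the WordNet-to-VerbNet mapping below are made-up assumptions, and the separate objective terms are collapsed into a single prior per sense.

```python
from itertools import product

# Hypothetical candidate WordNet senses with made-up prior scores.
wn_candidates = {
    "man": {"man.1": 0.6, "man.2": 0.4},
    "shoot": {"shoot.1": 0.55, "shoot.4": 0.45},
    "video": {"video.1": 1.0},
}
# Assumed animacy of the noun senses.
animacy = {"man.1": "animate", "man.2": "animate", "video.1": "inanimate"}
# Assumed VerbNet senses: consistent WordNet verb sense plus role restrictions.
vn_senses = {
    "shoot.vn.1": {"wn": "shoot.1", "agent": "animate", "patient": "animate"},
    "shoot.vn.3": {"wn": "shoot.4", "agent": "animate", "patient": "inanimate"},
}


def best_frame():
    """Pick one sense per word and one VN sense, maximizing the summed priors."""
    best, best_score = None, float("-inf")
    for agent, verb, patient, vn in product(
        wn_candidates["man"], wn_candidates["shoot"],
        wn_candidates["video"], vn_senses,
    ):
        spec = vn_senses[vn]
        if spec["wn"] != verb:                       # WN-VN sense consistency
            continue
        if animacy[agent] != spec["agent"]:          # selectional restriction
            continue
        if animacy[patient] != spec["patient"]:      # selectional restriction
            continue
        score = (wn_candidates["man"][agent]
                 + wn_candidates["shoot"][verb]
                 + wn_candidates["video"][patient])
        if score > best_score:
            best_score = score
            best = {"agent": agent, "action": verb, "patient": patient, "vn": vn}
    return best


frame = best_frame()
```

Even though shoot.1 has the higher prior here, the inanimate patient rules out shoot.vn.1, so the constraints force shoot.4, matching the output frame on the slide.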

Example frames:
Go up an elevation ..
Climb up a mountain: participants climber, rope; location camp, forest; time daylight
Hike up a hill: participants climber; location sea shore; time holiday
Drink water ..
Similarity: .. + attribute overlap
Hypernymy: WordNet hypernymy of vi, vj and oi, oj + attribute hypernymy
Temporal: Generalized Sequence Pattern mining over statistics with gaps: #(asynset1 precedes asynset2) / (#(asynset1) #(asynset2))
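The temporal statistic can be sketched with plain counting over per-movie activity sequences. This is a simplification, not Generalized Sequence Pattern mining itself: it only counts ordered pairs within a gap window, and the function name and the `max_gap` parameter are assumptions.

```python
from collections import Counter


def precedence_scores(sequences, max_gap=1):
    """Score (a, b) as #(a precedes b) / (#(a) * #(b)).

    sequences: list of activity-synset lists, each in narrative order.
    max_gap: how many intervening activities are tolerated between a and b.
    """
    single = Counter()
    pairs = Counter()
    for seq in sequences:
        single.update(seq)
        for i, a in enumerate(seq):
            # Successors of a that lie within the allowed gap.
            for b in seq[i + 1: i + 2 + max_gap]:
                pairs[(a, b)] += 1
    return {(a, b): n / (single[a] * single[b]) for (a, b), n in pairs.items()}


sequences = [
    ["get to village", "climb up a mountain", "drink water"],
    ["climb up a mountain", "drink water"],
]
scores = precedence_scores(sequences, max_gap=1)
```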

Probabilistic soft logic - refining Typeof (T), Similar (S) and Prev (P) edges
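Probabilistic soft logic relaxes logical rules to truth values in [0, 1] using the Lukasiewicz operators and penalizes each rule by its distance to satisfaction. The sketch below shows that computation for one rule. The rule itself (transitivity of Typeof) and the edge weights are illustrative assumptions; the slide does not list the rules actually used.

```python
def luk_and(a, b):
    """Lukasiewicz conjunction of two soft truth values."""
    return max(0.0, a + b - 1.0)


def distance_to_satisfaction(body, head):
    """How far the rule body -> head is from being satisfied (0 means satisfied)."""
    return max(0.0, body - head)


# Hypothetical rule: Typeof(a, b) AND Typeof(b, c) -> Typeof(a, c)
typeof_ab, typeof_bc, typeof_ac = 0.9, 0.8, 0.3   # made-up edge confidences
penalty = distance_to_satisfaction(luk_and(typeof_ab, typeof_bc), typeof_ac)
```

Inference then adjusts the T, S and P edge values to minimize the weighted sum of such penalties over all grounded rules.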

Tie the activity synsets
Break cycles
Resultant: DAG
Example frames:
Go up an elevation ..
Climb up a mountain: participating agent climber, rope; location camp, forest; time daylight
Hike up a hill: participating agent climber; location sea shore; time holiday
Drink water ..
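One simple way to obtain a DAG from weighted edges is to add edges in order of decreasing confidence and skip any edge that would close a cycle. The slide does not say how cycles are broken, so this greedy strategy, the function name, and the example weights are assumptions.

```python
def build_dag(weighted_edges):
    """Keep the strongest edges that do not create a cycle.

    weighted_edges: iterable of (source, target, weight) tuples.
    Returns the kept edges, strongest first.
    """
    graph = {}

    def reachable(src, dst):
        # Depth-first search over the edges kept so far.
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(graph.get(node, ()))
        return False

    kept = []
    for u, v, w in sorted(weighted_edges, key=lambda e: -e[2]):
        # Adding u -> v closes a cycle exactly when u is already reachable from v.
        if u == v or reachable(v, u):
            continue
        graph.setdefault(u, []).append(v)
        kept.append((u, v, w))
    return kept


edges = [("a", "b", 0.9), ("b", "c", 0.8), ("c", "a", 0.2)]
dag_edges = build_dag(edges)
```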

Recap Defined a new problem of automatic acquisition of semantically refined frames. Proposed a joint method that needs no labeled data.

Evaluation
Knowlywood statistics:
Scenes: 1,708,782
Activity synsets: 505,788
Accuracy: 0.85 ± 0.01
URL: bit.ly/knowlywood
#Scenes is the aggregated count over movie scripts, TV serials, sitcoms, novels, and kitchen data.
Evaluation: accuracy assessed manually on a sample of the activity frames.

Evaluation: Baselines
No direct competitor provides activity frames.
KB baseline: our (rule-based) semantic frame structure imposed on the crowdsourced commonsense KB ConceptNet.
Methodology baseline: a rule-based frame detector, run over our data and other data, built on the open IE system ReVerb.

KB Baseline
Normalized domain: concept1 ~ verb [article] noun
Example: You open your wallet hasNextSubEvent take out money
Organized and canonicalized the relations as follows:
ConceptNet 5's relations -> we map it to
IsA, InheritsFrom -> type
Causes, ReceivesAction, RelatedTo, CapableOf, UsedFor -> agent
HasPrerequisite, HasFirst/LastSubevent, HasSubevent, MotivatedByGoal -> prev/next
SimilarTo, Synonym -> similarTo
AtLocation, LocationOfAction, LocatedNear -> location
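The relation mapping on this slide amounts to a lookup table. The sketch below transcribes it; the dictionary and function names are hypothetical, and "HasFirst/LastSubevent" is expanded into the two relation names it abbreviates.

```python
# ConceptNet 5 relation -> canonical frame relation, transcribed from the slide.
CONCEPTNET_TO_FRAME = {
    "IsA": "type",
    "InheritsFrom": "type",
    "Causes": "agent",
    "ReceivesAction": "agent",
    "RelatedTo": "agent",
    "CapableOf": "agent",
    "UsedFor": "agent",
    "HasPrerequisite": "prev/next",
    "HasFirstSubevent": "prev/next",
    "HasLastSubevent": "prev/next",
    "HasSubevent": "prev/next",
    "MotivatedByGoal": "prev/next",
    "SimilarTo": "similarTo",
    "Synonym": "similarTo",
    "AtLocation": "location",
    "LocationOfAction": "location",
    "LocatedNear": "location",
}


def canonicalize(relation):
    """Return the frame relation for a ConceptNet relation, or None if unmapped."""
    return CONCEPTNET_TO_FRAME.get(relation)
```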

Methodology Baseline
ReVerb, an open IE tool, extracts SVO triples from text.
S and O are only surface forms. V is not categorized into a relation.
We use a Bayesian classifier to estimate the label of V.
The estimates come from MovieClips.com, which provides 30K manually tagged popular movie scenes, e.g. action: singing, prop: violin, setting: theater.

# activities
Knowlywood: ~1 M (high accuracy & high coverage)
ConceptNet based: ~5 K (high accuracy & low coverage)
ReVerb based: ~0.3 M (low accuracy & high coverage)
ReVerb ClueWeb: ~0.8 M

Visual alignments
~30,000 images from movies and, additionally, >1 million images via Flickr tag matching.
Flickr tags: riding, road, bicycle ..
Match verb-noun pairs from Knowlywood, such as "ride bicycle".
Score: Flickr activity vector (road) DOT Knowlywood vector (man, road)
Knowlywood frame: ride a bicycle; participant: man, boy; location: road
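The tag matching can be sketched as follows: an image is a candidate for an activity if its tags contain the activity's verb and noun, and the remaining tags are compared with the frame's attributes via a dot product. This is one reading of the slide, not a confirmed method; the function name is hypothetical and the tags are assumed to be lemmatized already ("riding" reduced to "ride").

```python
from collections import Counter


def visual_match_score(tags, verb, noun, frame_attributes):
    """Dot product between an image's context tags and a frame's attributes.

    Returns 0 when the tags do not contain the verb-noun pair at all.
    """
    tags = set(tags)
    if verb not in tags or noun not in tags:
        return 0
    context = Counter(tags - {verb, noun})     # image-side vector, e.g. {road}
    attributes = Counter(frame_attributes)     # frame-side vector, e.g. {man, boy, road}
    return sum(context[t] * attributes[t] for t in context)


score = visual_match_score(
    tags=["ride", "road", "bicycle"],
    verb="ride",
    noun="bicycle",
    frame_attributes=["man", "boy", "road"],   # participants plus location
)
```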

External use case 1: Semantic indexing
Given: participant, location and time
Predict: the activity
Ground truth: MovieClips.com's manually specified activity tag.
Metric: at least one hit in the top 10 predictions.
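The "at least one hit in the top 10" criterion can be computed as below. The function names and the toy predictions are assumptions made for illustration only.

```python
def hit_at_k(ranked_predictions, gold, k=10):
    """True if the gold activity tag appears among the first k predictions."""
    return gold in ranked_predictions[:k]


def hit_rate(examples, k=10):
    """Fraction of (ranked_predictions, gold) examples with a hit in the top k."""
    if not examples:
        return 0.0
    return sum(hit_at_k(preds, gold, k) for preds, gold in examples) / len(examples)


# Made-up predictions for two scenes.
examples = [
    (["singing", "dancing", "playing violin"], "singing"),
    (["driving", "running"], "climbing"),
]
rate = hit_rate(examples, k=10)
```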

External use case 2: Movie scene search
Method: a generative model in which a query holistically matches a scene if the scene's participants and activity fit the query well.

Conclusion
Remains embedded in information: implicit
Assumed to be known, hence not easy to find: scarce
Spread across multiple modalities
Unusual reported more than usual: reporting bias
Culture specific, location specific, true/false: modality
Thank you! Browse at bit.ly/webchild