A Compositional and Interpretable Semantic Space Alona Fyshe, Leila Wehbe, Partha Talukdar, Brian Murphy, and Tom Mitchell Carnegie Mellon University

amfyshe@gmail.com

VSMs and Composition
[Figure: the words pear, lettuce, orange, apple, and carrots plotted in a vector space]

How to Make a VSM
[Diagram: Corpus → Count → Statistics (many cols) → Dim. Reduction → VSM (few cols)]

VSMs and Composition
[Figure: pear, lettuce, orange, apple, carrots, and the phrase “seedless orange” plotted in a vector space]

f(adjective, noun) = estimate
– Inputs: stats for “seedless” (adjective) and stats for “orange” (noun)
– The estimate is compared with the observed stats for “seedless orange”

Previous Work
What is “f”?
– (Mitchell & Lapata, 2010; Baroni and Zamparelli, 2010; Blacoe and Lapata, 2012; Socher et al., 2012; Dinu et al., 2013; Hermann & Blunsom, 2013)
Which VSMs are best for composition?
– (Turney, 2012, 2013; Fyshe et al., 2013; Baroni et al., 2014)

Our Contributions
Can we learn a VSM that
– is aware of the composition function?
– is interpretable?
[Figure: example of an interpretable dimension, “Is edible”]

How to make a VSM
Corpus
– 16 billion words
– 50 million documents
Count dependency arcs in sentences
– MALT dependency parser
Positive Pointwise Mutual Information (PPMI)
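The PPMI step can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `ppmi`, the context labels such as `amod:seedless`, and the toy counts are all assumptions made up for the example.

```python
import math
from collections import Counter


def ppmi(pair_counts):
    """Map {(word, context): count} to {(word, context): PPMI value}.

    Counts are assumed to be strictly positive.
    """
    total = sum(pair_counts.values())
    word_totals = Counter()
    context_totals = Counter()
    for (word, context), n in pair_counts.items():
        word_totals[word] += n
        context_totals[context] += n

    scores = {}
    for (word, context), n in pair_counts.items():
        # PMI = log( P(word, context) / (P(word) * P(context)) )
        pmi = math.log(n * total / (word_totals[word] * context_totals[context]))
        # "Positive" PMI clips negative associations to zero.
        scores[(word, context)] = max(pmi, 0.0)
    return scores


# Toy dependency-arc counts (hypothetical values, for illustration only).
counts = {
    ("orange", "amod:seedless"): 8,
    ("orange", "dobj:eat"): 2,
    ("carrot", "amod:seedless"): 2,
    ("carrot", "dobj:eat"): 8,
}
scores = ppmi(counts)
```

Pairs that co-occur less often than chance (here, “orange” with `dobj:eat`) get a score of zero, which keeps the statistics matrix non-negative and sparse.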

Matrix Factorization in VSMs
X ≈ A D
[Figure: X holds the corpus stats (c) for each word; A is the VSM, with one row per word]

Interpretability
[Figure: matrix A, with words as rows and latent dims as columns]

Interpretability
SVD (Fyshe 2013)
– well, long, if, year, watch
– plan, engine, e, rock, very
– get, no, features, music, via
Word2vec (pretrained on Google News)
– pleasantries, draft_picks, chairman_Harley_Hotchkiss, windstorm, Vermont_Yankee
– Programme_Producers_AMPTPP, ###/mt, Al_Mehwar, NCWS, Whereas
– Ubiquitous_Sensor_Networks, KTO, discussing, Hibernia_Terra_Nova, NASDAQ_ENWV

Non-Negative Sparse Embeddings (Murphy 2012)
X ≈ A D
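For reference, the NNSE factorization is usually written as a sparse, non-negative dictionary learning problem. The form below is a sketch from memory of that formulation, not a transcription of the slide: the symbols w (number of words), ℓ (number of latent dimensions), and λ (sparsity weight) are assumptions, and constant factors differ between write-ups.

```latex
\operatorname*{arg\,min}_{A,\,D} \; \sum_{i=1}^{w}
  \left\lVert X_{i,:} - A_{i,:} D \right\rVert_2^2
  + \lambda \left\lVert A_{i,:} \right\rVert_1
\quad \text{subject to} \quad
  D_{k,:} D_{k,:}^{\top} \le 1 \;\; (1 \le k \le \ell),
  \qquad A_{i,j} \ge 0
```

The L1 penalty makes each row of A sparse, and the non-negativity constraint means a word can only have a dimension "switched on" or not, which is what makes the top-scoring words of a dimension readable.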

Interpretability
SVD
– well, long, if, year, watch
– plan, engine, e, rock, very
– get, no, features, music, via
NNSE
– inhibitor, inhibitors, antagonists, receptors, inhibition
– bristol, thames, southampton, brighton, poole
– delhi, india, bombay, chennai, madras

A Composition-aware VSM

Modeling Composition
Rows of X are words
– Can also be phrases
[Figure: X and A, with rows grouped into adjectives, nouns, and phrases]

Modeling Composition
Additional constraint for composition
[Figure: matrix A with adjective, noun, and phrase rows; phrase row p is tied to word rows w1 and w2, p = [w1 w2]]

Weighted Addition
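Weighted addition estimates a phrase vector as a weighted sum of its adjective and noun vectors. A minimal sketch follows; the function name, the toy vectors, and the default weights of 0.5 are assumptions for illustration and are not taken from the slides.

```python
def weighted_addition(adjective, noun, alpha=0.5, beta=0.5):
    """Estimate a phrase vector as alpha * adjective + beta * noun.

    alpha and beta are illustrative defaults, not values from the talk.
    """
    if len(adjective) != len(noun):
        raise ValueError("adjective and noun vectors must have the same length")
    return [alpha * a + beta * n for a, n in zip(adjective, noun)]


# Toy 3-dimensional vectors (hypothetical values).
seedless = [0.0, 2.0, 1.0]
orange = [4.0, 0.0, 1.0]
estimate = weighted_addition(seedless, orange)
```

With these inputs the estimate is the midpoint of the two word vectors; unequal weights would let one of the two words dominate the phrase.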

Modeling Composition

Modeling Composition
Reformulate loss with square matrix B
[Figure: the product BA; each phrase row of B has entries in the adj. col. (α), the noun col. (β), and the phrase col.]
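One way to read the reformulation, assuming weighted addition is the composition function: the squared gap between each phrase row of A and the weighted sum of its word rows can be collected into a single Frobenius norm. The signs and the weight symbol λ_c below are assumptions reconstructed from that reading, not copied from the slide.

```latex
\sum_{p = (i,j)} \left\lVert A_{p,:} - \left( \alpha A_{i,:} + \beta A_{j,:} \right) \right\rVert_2^2
  = \left\lVert B A \right\rVert_F^2,
\qquad
B_{p,p} = 1, \quad B_{p,i} = -\alpha, \quad B_{p,j} = -\beta
```

Here p ranges over phrases built from adjective i and noun j, all other entries of B are zero, and the rows of B for single words are all zero, which keeps B square. Under this reading the composition-aware loss is the sparse, non-negative factorization loss plus a term of the form λ_c ‖BA‖²_F.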

Modeling Composition

Optimization
Online Dictionary Learning Algorithm (Mairal 2010)
– Solve for D with gradient descent
– Solve for A with ADMM (Alternating Direction Method of Multipliers)

Testing Composition
Models compared: W. add, W. NNSE, CNNSE
[Figure: three panels, each showing word rows w1 and w2 and phrase row p in an SVD matrix or in A]

Phrase Estimation
Predict phrase vector
Sort test phrases by distance to estimate
With the correct phrase at position r among N test phrases:
– Rank (r/N*100)
– Reciprocal rank (1/r)
– Percent Perfect (δ(r==1))
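The three scores can be sketched as below. The slide only says "distance", so the Euclidean distance used here is an assumption, as are the function names and the toy vectors.

```python
import math


def euclidean(u, v):
    """Euclidean distance (an assumed choice; the slide only says 'distance')."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def rank_of_target(estimate, test_phrases, target):
    """1-based position r of `target` when test phrases are sorted by distance to `estimate`."""
    ordered = sorted(test_phrases, key=lambda name: euclidean(estimate, test_phrases[name]))
    return ordered.index(target) + 1


def phrase_metrics(r, n):
    """Per-phrase scores from the slide: rank percentile, reciprocal rank, perfect hit."""
    return {
        "rank_pct": r / n * 100,
        "reciprocal_rank": 1 / r,
        "perfect": 1 if r == 1 else 0,
    }


# Toy observed phrase vectors (hypothetical values).
test_phrases = {
    "seedless orange": [2.0, 1.0],
    "digital computers": [9.0, 9.0],
    "early age": [0.0, 5.0],
    "northern region": [5.0, 0.0],
}
estimate = [2.5, 1.5]  # e.g. the output of a composition function

r_best = rank_of_target(estimate, test_phrases, "seedless orange")
r_other = rank_of_target(estimate, test_phrases, "early age")
metrics = phrase_metrics(r_other, len(test_phrases))
```

Averaging `perfect` over all test phrases and multiplying by 100 gives the Percent Perfect figure; the other two scores are averaged directly.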

Phrase Estimation
Chance: rank 50, reciprocal rank ~0.05, percent perfect 1%

Interpretable Dimensions

Interpretability

Testing Interpretability
Models compared: SVD, NNSE, CNNSE
[Figure: three panels, each showing word rows w1 and w2 and phrase row p in an SVD matrix or in A]

Interpretability
Select the word that does not belong:
crunchy, gooey, fluffy, crispy, colt, creamy

Interpretability

Phrase Representations
[Figure: for a phrase row in A, take its top scoring dimension, then the top scoring words/phrases in that dimension]

Phrase Representations
Choose the list of words/phrases most associated with the target phrase “digital computers”:
– aesthetic, American music, architectural style
– cellphones, laptops, monitors
– both
– neither

Phrase Representation

Testing Phrase Similarity (Mitchell & Lapata 2010)
108 adjective-noun phrase pairs
Human judgments of similarity [1…7]
E.g.
– important part : significant role (very similar)
– northern region : early age (not similar)

Correlation of Distances
[Figure: distances from behavioral data compared with distances from Model A and Model B]
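Comparing a model with the behavioral data amounts to correlating two lists of per-pair scores. The sketch below uses Spearman rank correlation, which is an assumed choice since the slide does not name the statistic; the function names and all numbers are made up for illustration, and the model scores are assumed to be precomputed (for example, cosine similarity between the two phrase vectors of each pair).

```python
import math


def average_ranks(values):
    """1-based ranks of `values`, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average of the 1-based positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores for four phrase pairs.
human = [6.0, 2.0, 4.0, 1.0]    # behavioral similarity ratings
model_a = [0.9, 0.3, 0.6, 0.1]  # orders the pairs exactly as humans do
model_b = [0.2, 0.8, 0.5, 0.9]  # orders the pairs in reverse

rho_a = spearman(human, model_a)
rho_b = spearman(human, model_b)
```

A rank-based statistic only asks whether the model orders the phrase pairs the way people do, so the 1 to 7 rating scale and the model's similarity scale never need to be matched.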

Testing Phrase Similarity

Interpretability

Better than Correlation: Interpretability
http://www.cs.cmu.edu/~afyshe/thesis/cnnse_mitchell_lapata_all.html
(behavioral similarity score 6.33/7)

Better than Correlation: Interpretability
http://www.cs.cmu.edu/~afyshe/thesis/cnnse_mitchell_lapata_all.html
(behavioral similarity score 5.61/7)

Summary
Composition awareness improves VSMs
– Closer to behavioral measure of phrase similarity
– Better phrase representations
Interpretable dimensions
– Helps to debug composition failures

Thanks!
www.cs.cmu.edu/~fmri/papers/naacl2015/
amfyshe@gmail.com
