A Compositional and Interpretable Semantic Space Alona Fyshe, Leila Wehbe, Partha Talukdar, Brian Murphy, and Tom Mitchell Carnegie Mellon University

1 A Compositional and Interpretable Semantic Space. Alona Fyshe, Leila Wehbe, Partha Talukdar, Brian Murphy, and Tom Mitchell. Carnegie Mellon University

2 VSMs and Composition (figure labels: pear, lettuce, orange, apple, carrots)

3 How to Make a VSM: Corpus → Count → Statistics (many cols) → Dim. Reduction → VSM (few cols)

4 VSMs and Composition (figure labels: pear, lettuce, orange, apple, carrots, seedless orange)

5 f(adjective, noun) = estimate, to be compared with observed
– adjective: stats for “seedless”
– noun: stats for “orange”
– observed: stats for “seedless orange”

6 Previous Work
– What is “f”? (Mitchell & Lapata, 2010; Baroni & Zamparelli, 2010; Blacoe & Lapata, 2012; Socher et al., 2012; Dinu et al., 2013; Hermann & Blunsom, 2013)
– Which VSMs are best for composition? (Turney, 2012, 2013; Fyshe et al., 2013; Baroni et al., 2014)

7 Our Contributions
Can we learn a VSM that
– is aware of the composition function?
– is interpretable?
(figure labels: F, “Is edible”)

8 How to make a VSM
Corpus
– 16 billion words
– 50 million documents
Count dependency arcs in sentences (MALT dependency parser)
Positive Pointwise Mutual Information (PPMI)
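The PPMI weighting named on this slide can be illustrated with a small sketch. It assumes a words-by-contexts count matrix as input; the function name `ppmi` and the toy counts are made up for illustration and are not from the talk.

```python
import math

def ppmi(counts):
    """Positive PMI for a words-by-contexts count matrix (list of lists)."""
    total = sum(sum(row) for row in counts)
    row_sums = [sum(row) for row in counts]
    col_sums = [sum(col) for col in zip(*counts)]
    result = []
    for i, row in enumerate(counts):
        out_row = []
        for j, n in enumerate(row):
            if n == 0:
                # PMI is -infinity for unseen pairs; the positive part is 0.
                out_row.append(0.0)
                continue
            # PMI = log( p(w, c) / (p(w) * p(c)) ) = log( n * total / (row * col) )
            pmi = math.log(n * total / (row_sums[i] * col_sums[j]))
            out_row.append(max(pmi, 0.0))
        result.append(out_row)
    return result

# Toy counts: rows are words, columns are dependency contexts (made-up numbers).
toy_counts = [[8, 0, 2],
              [1, 6, 3],
              [0, 5, 5]]
X = ppmi(toy_counts)
```

Negative associations are clipped to zero, so a cell is non-zero only when a word and a context co-occur more often than their marginal frequencies would predict.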

9 Matrix Factorization in VSMs: X ≈ A D, where X is Words × Corpus Stats (c) and A is the VSM

10 Interpretability: A is Words × Latent Dims

11 Interpretability
SVD (Fyshe 2013)
– well, long, if, year, watch
– plan, engine, e, rock, very
– get, no, features, music, via
Word2vec (pretrained on Google News)
– pleasantries, draft_picks, chairman_Harley_Hotchkiss, windstorm, Vermont_Yankee
– Programme_Producers_AMPTPP, ###/mt, Al_Mehwar, NCWS, Whereas
– Ubiquitous_Sensor_Networks, KTO, discussing, Hibernia_Terra_Nova, NASDAQ_ENWV

12 Non-Negative Sparse Embeddings (Murphy 2012): X ≈ A D
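The transcript keeps only X ≈ A D for this slide. As a hedged sketch, the objective usually given for NNSE in Murphy et al. (2012) is a squared reconstruction error plus an L1 penalty on A, with A constrained to be non-negative and each row of D to have norm at most 1. The function names, the penalty weight `lam`, and the toy matrices below are assumptions for illustration only.

```python
def nnse_objective(X, A, D, lam):
    """Squared reconstruction error of X ≈ A D plus an L1 penalty on A."""
    loss = 0.0
    for x_row, a_row in zip(X, A):
        for j, x in enumerate(x_row):
            recon = sum(a * D[k][j] for k, a in enumerate(a_row))
            loss += (x - recon) ** 2
        loss += lam * sum(abs(a) for a in a_row)
    return loss

def satisfies_constraints(A, D, tol=1e-9):
    """Check that A is non-negative and every row of D has norm at most 1."""
    nonneg = all(a >= 0 for row in A for a in row)
    bounded = all(sum(d * d for d in row) <= 1 + tol for row in D)
    return nonneg and bounded

# Toy example (made-up numbers): X is reconstructed exactly, so only the
# sparsity penalty contributes to the objective.
X = [[1.0, 0.0], [0.0, 2.0]]
D = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 0.0], [0.0, 2.0]]
value = nnse_objective(X, A, D, lam=0.1)
```

The L1 penalty and the non-negativity constraint together push each word to load on few dimensions, which is what makes the top-scoring words of a dimension readable.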

13 Interpretability
SVD
– well, long, if, year, watch
– plan, engine, e, rock, very
– get, no, features, music, via
NNSE
– inhibitor, inhibitors, antagonists, receptors, inhibition
– bristol, thames, southampton, brighton, poole
– delhi, india, bombay, chennai, madras

14 A Composition-aware VSM

15 Modeling Composition
Rows of X are words
– Can also be phrases
(figure: X and A, with row groups labelled Adjectives, Nouns, Phrases)

16 Modeling Composition
Additional constraint for composition: p = [w1 w2]
(figure: A with row groups Adjectives, Nouns, Phrases; rows w1, w2, and p marked)

17 Weighted Addition
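The slide's equation is not in the transcript. Weighted addition, as described by Mitchell & Lapata (2010), estimates a phrase vector as a weighted sum of its word vectors, p ≈ α·w1 + β·w2. A minimal sketch follows; the weights α = β = 0.5 and the toy vectors are placeholders, not values from the talk.

```python
def weighted_addition(adj_vec, noun_vec, alpha=0.5, beta=0.5):
    """Estimate a phrase vector as alpha * adjective + beta * noun."""
    return [alpha * a + beta * n for a, n in zip(adj_vec, noun_vec)]

def squared_error(estimate, observed):
    """Squared Euclidean distance between estimated and observed vectors."""
    return sum((e - o) ** 2 for e, o in zip(estimate, observed))

# Toy vectors (made-up numbers) standing in for "seedless", "orange",
# and the observed phrase "seedless orange".
seedless = [1.0, 0.0, 2.0]
orange = [3.0, 4.0, 0.0]
observed_phrase = [2.0, 2.5, 1.0]

estimate = weighted_addition(seedless, orange)
error = squared_error(estimate, observed_phrase)
```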

18 Modeling Composition

19 Modeling Composition
Reformulate loss with square matrix B
(figure labels: A, B, α, β, adj. col., noun col., phrase col.)
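One way to read this reformulation, offered as an assumption since the slide's figure is lost: build a square matrix B whose row for phrase p has 1 in the phrase column, −α in the adjective column, and −β in the noun column. Then row p of BA is the weighted-addition residual, and the composition penalty is the squared Frobenius norm of BA. The sign convention, the all-zero rows for single words, and all names below are hypothetical.

```python
def composition_matrix(n_rows, phrases, alpha, beta):
    """Build square B so that row p of (B A) equals A[p] - alpha*A[adj] - beta*A[noun].

    phrases: list of (phrase_row, adjective_row, noun_row) index triples.
    Rows of B for single words are left as zeros (an assumption).
    """
    B = [[0.0] * n_rows for _ in range(n_rows)]
    for p, adj, noun in phrases:
        B[p][p] = 1.0
        B[p][adj] -= alpha
        B[p][noun] -= beta
    return B

def matmul(B, A):
    """Plain matrix product for lists of lists."""
    return [[sum(B[i][k] * A[k][j] for k in range(len(A)))
             for j in range(len(A[0]))] for i in range(len(B))]

def composition_loss(B, A):
    """Squared Frobenius norm of B A."""
    return sum(v * v for row in matmul(B, A) for v in row)

# Toy A (made-up numbers): row 0 is an adjective, row 1 a noun, row 2 their phrase.
A_exact = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
A_off = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.5]]
B = composition_matrix(3, [(2, 0, 1)], alpha=0.5, beta=0.5)

loss_exact = composition_loss(B, A_exact)
loss_off = composition_loss(B, A_off)
```

Writing the penalty as a single matrix product keeps it quadratic in A, so it can be added to a reconstruction loss without changing the kind of solver needed.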

20 Modeling Composition

21 Optimization
Online Dictionary Learning Algorithm (Mairal 2010)
– Solve for D with gradient descent
– Solve for A with ADMM (Alternating Direction Method of Multipliers)
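To make the ADMM step concrete, here is a minimal sketch for one row of A with D held fixed. It solves the simplified problem min ½‖x − aD‖² + λ‖a‖₁ subject to a ≥ 0; the ½ factor, the single-row scope, and the omission of the composition term are simplifications, and all names and parameter values are assumptions rather than the authors' implementation.

```python
def solve(M, b):
    """Solve M y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b_i] for row, b_i in zip(M, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    y = [0.0] * n
    for r in reversed(range(n)):
        s = sum(aug[r][c] * y[c] for c in range(r + 1, n))
        y[r] = (aug[r][n] - s) / aug[r][r]
    return y

def admm_nonneg_lasso(x, D, lam, rho=1.0, n_iter=200):
    """ADMM for min 0.5*||x - a D||^2 + lam*||a||_1 subject to a >= 0."""
    k, c = len(D), len(x)
    # (D D^T + rho I) is fixed across iterations.
    gram = [[sum(D[i][j] * D[m][j] for j in range(c)) + (rho if i == m else 0.0)
             for m in range(k)] for i in range(k)]
    Dx = [sum(D[i][j] * x[j] for j in range(c)) for i in range(k)]
    z = [0.0] * k  # constrained copy of a (sparse, non-negative)
    u = [0.0] * k  # scaled dual variable
    for _ in range(n_iter):
        # a-update: ridge-like least squares.
        a = solve(gram, [Dx[i] + rho * (z[i] - u[i]) for i in range(k)])
        # z-update: soft-threshold, then project onto the non-negative orthant.
        z = [max(0.0, a[i] + u[i] - lam / rho) for i in range(k)]
        # dual update.
        u = [u[i] + a[i] - z[i] for i in range(k)]
    return z

# Toy check (made-up numbers): with D = identity the solution is max(0, x - lam).
row = admm_nonneg_lasso([1.0, -0.5], [[1.0, 0.0], [0.0, 1.0]], lam=0.2)
```

The split into a smooth least-squares update and a cheap thresholding update is what makes ADMM convenient when a sparsity penalty and a non-negativity constraint are combined.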

22 Testing Composition (figure: three models compared, with labels W. add, SVD, NNSE, CNNSE; each panel shows rows w1, w2, and p)

23 Phrase Estimation
Predict phrase vector
Sort test phrases by distance to estimate; r is the position of the true phrase among the N sorted test phrases
– Rank (r/N*100)
– Reciprocal rank (1/r)
– Percent Perfect (δ(r==1))
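The three measures on this slide can be sketched as follows. The transcript does not say which distance is used, so cosine distance is an assumption here, as are the function names and the toy vectors.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity (assumed distance; not stated in the transcript)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm

def rank_metrics(estimate, candidates, true_index):
    """Rank the true phrase among all candidates by distance to the estimate."""
    dists = [cosine_distance(estimate, c) for c in candidates]
    order = sorted(range(len(candidates)), key=lambda i: dists[i])
    r = order.index(true_index) + 1  # 1-based position of the true phrase
    n = len(candidates)
    return {"rank": r / n * 100, "reciprocal_rank": 1 / r, "perfect": r == 1}

# Toy test set (made-up vectors): four candidate phrase vectors.
candidates = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [-1.0, 0.0]]
estimate = [1.0, 0.1]

hit = rank_metrics(estimate, candidates, true_index=1)
miss = rank_metrics(estimate, candidates, true_index=0)
```

Percent Perfect is then the mean of the `perfect` flag over all test phrases, times 100. Lower is better for Rank, higher is better for the other two.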

24 Phrase Estimation
Chance: Rank 50, Reciprocal rank ~0.05, Percent Perfect 1%

25 Interpretable Dimensions

26 Interpretability

27 Testing Interpretability (figure: three models compared, with labels SVD, NNSE, CNNSE; each panel shows rows w1, w2, and p)

28 Interpretability
Select the word that does not belong: crunchy, gooey, fluffy, crispy, colt, creamy
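A word intrusion item like this one could be generated roughly as below. This is a sketch under stated assumptions: the top five words of a dimension are shown together with one intruder drawn from the bottom half of that dimension. The selection rule, the function name, and the weights are hypothetical; the transcript does not describe how the items were built.

```python
import random

def build_intrusion_item(scores, n_top=5, seed=0):
    """Return (shuffled word list, intruder) for one dimension.

    scores: dict mapping word -> weight of that word in the dimension.
    """
    rng = random.Random(seed)
    ranked = sorted(scores, key=scores.get, reverse=True)
    top = ranked[:n_top]
    bottom_half = ranked[len(ranked) // 2:]
    intruder = rng.choice([w for w in bottom_half if w not in top])
    shown = top + [intruder]
    rng.shuffle(shown)
    return shown, intruder

# Made-up weights for one hypothetical "food texture" dimension.
toy_scores = {
    "crunchy": 0.9, "gooey": 0.8, "fluffy": 0.7, "crispy": 0.6, "creamy": 0.5,
    "colt": 0.0, "engine": 0.0, "delhi": 0.0, "year": 0.0, "horse": 0.0,
}
shown, intruder = build_intrusion_item(toy_scores)
```

If raters reliably pick out the intruder, the dimension's top words form a coherent group, which is the sense of interpretability being tested.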

29 Interpretability

30 Phrase Representations (figure labels: A, phrase, top scoring dimension, top scoring words/phrases)

31 Phrase Representations
Choose the list of words/phrases most associated with the target phrase “digital computers”
– aesthetic, American music, architectural style
– cellphones, laptops, monitors
– both
– neither

32 Phrase Representation

33 Testing Phrase Similarity (Mitchell & Lapata 2010)
108 adjective-noun phrase pairs
Human judgments of similarity [1…7]
E.g.
– Important part : significant role (very similar)
– Northern region : early age (not similar)

34 Correlation of Distances (figure labels: Behavioral Data, Model A, Model B)
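Comparing models against behavioral data can be sketched as a correlation between human similarity ratings and each model's similarity scores over the same phrase pairs. Pearson correlation is used here as an assumption; the transcript does not say which coefficient the talk used, and a rank correlation would be a reasonable alternative. All numbers below are made up.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Hypothetical data: human ratings on the 1-7 scale for four phrase pairs,
# and the similarity each model assigns to the same pairs.
human = [6.3, 1.2, 4.0, 2.5]
model_a = [0.9, 0.1, 0.6, 0.3]
model_b = [0.2, 0.8, 0.5, 0.4]

corr_a = pearson(human, model_a)
corr_b = pearson(human, model_b)
```

The model whose scores correlate more strongly with the human ratings is the one judged closer to the behavioral data.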

35 Testing Phrase Similarity

36 Interpretability

37 Better than Correlation: Interpretability (behav sim score 6.33/7)

38 Better than Correlation: Interpretability (behav sim score 5.61/7)

39 Summary
Composition awareness improves VSMs
– Closer to behavioral measure of phrase similarity
– Better phrase representations
Interpretable dimensions
– Helps to debug composition failures

40 Thanks!