1 Latent Semantic Mapping: Dimensionality Reduction via Globally Optimal Continuous Parameter Modeling
Jerome R. Bellegarda
2 Outline
–Introduction
–LSM
–Applications
–Conclusions
3 Introduction
LSA in IR:
–Words of queries and documents
–Recall and precision
Assumption: there is some underlying latent semantic structure in the data
–The latent structure is conveyed by correlation patterns
–Documents: bag-of-words model
LSA improves separability among different topics
4 Introduction
5 Success of LSA:
–Word clustering
–Document clustering
–Language modeling
–Automated call routing
–Semantic inference for spoken interface control
These solutions all leverage LSA's ability to expose global relationships in context and meaning
6 Introduction
Three unique factors underlie LSA:
–The mapping of discrete entities
–The dimensionality reduction
–The intrinsically global outlook
The terminology is changed to latent semantic mapping (LSM) to convey the increased reliance on these general properties
7 Latent Semantic Mapping
LSA defines a mapping between two discrete sets and a continuous vector space:
–M: an inventory of M individual units, such as words
–N: a collection of N meaningful compositions of units, such as documents
–L: a continuous vector space
–r_i: a unit in M
–c_j: a composition in N
8 Feature Extraction
Construction of a matrix W of co-occurrences between units and compositions
The cell of W (formula reconstructed below):
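A hedged reconstruction of the cell weighting, following the standard LSM formulation (c_{ij} is the count of unit r_i in composition c_j, n_j the total number of units in c_j, and ε_i the normalized entropy of r_i defined on the next slide):

    w_{ij} = (1 - \epsilon_i) \, c_{ij} / n_j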
9 Feature Extraction
The normalized entropy of r_i (formula reconstructed below):
An entropy value close to 0 means that the unit is present only in a few specific compositions
The global weight is therefore a measure of the indexing power of the unit r_i
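The usual definition of this normalized entropy, stated here as an assumption from the standard LSM formulation (t_i = \sum_j c_{ij} is the total count of r_i across the corpus):

    \epsilon_i = -\frac{1}{\log N} \sum_{j=1}^{N} \frac{c_{ij}}{t_i} \log \frac{c_{ij}}{t_i}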
10 Singular Value Decomposition
The M x N unit-composition matrix W defines two vector representations for the units and the compositions:
–r_i: a row vector of dimension N
–c_j: a column vector of dimension M
This direct representation is impractical:
–M, N can be extremely large
–The vectors r_i, c_j are typically sparse
–The two spaces are distinct from each other
11 Singular Value Decomposition
Employ the rank-R SVD of W (reconstructed below):
–U: M x R left singular matrix with row vectors u_i
–S: R x R diagonal matrix of singular values
–V: N x R right singular matrix with row vectors v_j
–U, V are column-orthonormal: U^T U = V^T V = I_R
–R < min(M, N)
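For completeness, the decomposition the slide refers to:

    W \approx \hat{W} = U S V^T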
12 Singular Value Decomposition
13 Singular Value Decomposition
The rank-R decomposition captures the major structural associations in W and ignores higher-order effects
The closeness of vectors in L supports three kinds of comparison:
–Unit-unit comparison
–Composition-composition comparison
–Unit-composition comparison
14 Closeness Measure
–W W^T characterizes the co-occurrences between units
–W^T W characterizes the co-occurrences between compositions
–r_i, r_j are close if the units have a similar pattern of occurrence across the compositions
–c_i, c_j are close if the compositions have a similar pattern of occurrence across the units
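With the rank-R approximation, these co-occurrence matrices take the standard form (stated here for completeness):

    \hat{W} \hat{W}^T = U S^2 U^T, \qquad \hat{W}^T \hat{W} = V S^2 V^T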
15 Closeness Measure
Unit-unit comparisons:
–Cosine measure between the scaled vectors u_i S and u_j S (reconstructed below)
–The associated distance lies in [0, π]
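A reconstruction of the cosine measure, following the standard LSM formulation:

    K(r_i, r_j) = \cos(u_i S, u_j S) = \frac{u_i S^2 u_j^T}{\| u_i S \| \, \| u_j S \|}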
16 Unit-Unit Comparisons
17 Closeness Measure
Composition-composition comparisons:
–Cosine measure between the scaled vectors v_i S and v_j S (reconstructed below)
–The associated distance lies in [0, π]
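Analogously, the composition-composition measure in the standard formulation:

    K(c_i, c_j) = \cos(v_i S, v_j S) = \frac{v_i S^2 v_j^T}{\| v_i S \| \, \| v_j S \|}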
18 Closeness Measure
Unit-composition comparisons:
–Cosine measure between u_i S^{1/2} and v_j S^{1/2} (reconstructed below)
–The associated distance lies in [0, π]
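And the mixed unit-composition measure, again following the standard formulation:

    K(r_i, c_j) = \cos(u_i S^{1/2}, v_j S^{1/2}) = \frac{u_i S v_j^T}{\| u_i S^{1/2} \| \, \| v_j S^{1/2} \|}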
19 LSM Framework Extension
Observe a new composition c̃_p, with p > N; the tilde reflects the fact that the composition was not part of the original N
c̃_p, a column vector of dimension M, can be thought of as an additional column of the matrix W
U and S are assumed not to change
20 LSM Framework Extension
–c̃_p: the pseudo-composition
–ṽ_p: the pseudo-composition vector, its representation in L (see the reconstruction below)
If the addition of c̃_p causes the major structural associations in W to shift in some substantial manner, the original singular vectors will become inadequate.
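The fold-in projection implied here, in the standard LSM notation:

    \tilde{v}_p = \tilde{c}_p^T U S^{-1}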
21 LSM Framework Extension
In that case it would be necessary to re-compute the SVD to find a proper representation for the new composition c̃_p
22 Salient Characteristics of LSM
–A single vector embedding for both units and compositions in the same continuous vector space L
–A relatively low dimensionality, which makes operations such as clustering meaningful and practical
–An underlying structure reflecting globally meaningful relationships, with natural similarity metrics to measure the distance between units, between compositions, or between units and compositions in L
23 Applications
–Semantic classification
–Multi-span language modeling
–Junk e-mail filtering
–Pronunciation modeling
–TTS unit selection
24 Semantic Classification
Semantic classification refers to determining which one of several predefined topics a given document is most closely aligned with
The centroid of each topic cluster can be viewed as the semantic representation of that topic in the LSM space
–Called a semantic anchor
A newly observed word sequence is classified by computing the distance between its document vector and each semantic anchor, and picking the minimum (see the sketch below)
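A minimal sketch of this anchor-based classifier, assuming a precomputed (weighted) unit-composition matrix W with per-composition topic labels; the function names and the use of scipy's svds are illustrative choices, not prescribed by the slides:

    import numpy as np
    from scipy.sparse.linalg import svds

    def train_lsm(W, labels, rank=100):
        # Rank-R SVD of the M x N matrix W (float array or sparse): W ~ U S V^T.
        U, s, Vt = svds(W, k=rank)
        comps = Vt.T * s                 # rows are v_j S, the composition vectors in L
        # One semantic anchor (centroid) per topic label; labels is a length-N array.
        anchors = np.vstack([comps[labels == t].mean(axis=0)
                             for t in np.unique(labels)])
        return U, s, anchors

    def classify(counts, U, s, anchors):
        # counts: length-M weighted count vector of the new word sequence.
        v_p = counts @ U / s             # fold-in: v_p = c_p^T U S^(-1)
        v = v_p * s                      # compare in the same v S space as the anchors
        sims = anchors @ v / (np.linalg.norm(anchors, axis=1)
                              * np.linalg.norm(v) + 1e-12)
        return int(np.argmax(sims))      # max cosine = min angular distance to an anchor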
25 Semantic Classification
Domain knowledge is automatically encapsulated in the LSM space in a data-driven fashion
For desktop interface control:
–Semantic inference
26 Semantic Inference
27 Multi-Span Language Modeling
In a standard n-gram, the history is the string of the n-1 preceding words w_{q-n+1} ... w_{q-1}
In LSM language modeling, the history is the current document up to word w_{q-1}
Pseudo-document d̃_{q-1}:
–Continually updated as q increases
28 Multi-Span Language Modeling
An integrated n-gram + LSM formulation for the overall language model probability (reconstructed below):
–Different syntactic constructs can be used to carry the same meaning (content words)
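The combined history pairs the local n-gram context with the pseudo-document, i.e. (notation assumed from the standard multi-span formulation):

    P(w_q \mid H_{q-1}) = P(w_q \mid w_{q-1} \cdots w_{q-n+1}, \tilde{d}_{q-1})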
29 Multi-Span Language Modeling
Assume that the probability of the document history given the current word is not affected by the immediate context preceding it
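Under that assumption the overall probability factors as follows (a reconstruction of the standard multi-span result; V denotes the vocabulary):

    P(w_q \mid H_{q-1}) = \frac{P(w_q \mid w_{q-1} \cdots w_{q-n+1}) \, P(\tilde{d}_{q-1} \mid w_q)}{\sum_{w_i \in V} P(w_i \mid w_{q-1} \cdots w_{q-n+1}) \, P(\tilde{d}_{q-1} \mid w_i)}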
30 Multi-Span Language Modeling
31 Junk E-mail Filtering
Junk e-mail filtering can be viewed as a degenerate case of semantic classification with two categories:
–Legitimate
–Junk
M: an inventory of words and symbols appearing in the messages
N: a collection of e-mail messages labeled with the two categories
Two semantic anchors, one per category
32 Pronunciation Modeling
Also called grapheme-to-phoneme conversion (GPC)
Orthographic anchors:
–One for each in-vocabulary word
Orthographic neighborhood:
–The in-vocabulary words with high closeness to a given out-of-vocabulary word (see the sketch below)
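A rough sketch of retrieving an orthographic neighborhood, assuming the LSM space is built over letter n-gram units and in-vocabulary words as compositions; the trigram featurization and helper names are illustrative assumptions, not the paper's exact recipe:

    import numpy as np

    def letter_ngrams(word, n=3):
        # Character n-grams of the padded orthography (an assumed featurization).
        padded = f"#{word}#"
        return [padded[i:i + n] for i in range(len(padded) - n + 1)]

    def orthographic_neighborhood(oov_word, vocab, U, s, V, ngram_index, k=5):
        # U, s, V: rank-R SVD factors of the (letter n-gram) x (in-vocabulary word) matrix W.
        counts = np.zeros(U.shape[0])
        for g in letter_ngrams(oov_word):
            if g in ngram_index:
                counts[ngram_index[g]] += 1
        v_p = counts @ U / s                 # fold the OOV word in: v_p = c_p^T U S^(-1)
        anchors = V * s                      # orthographic anchors v_j S of known words
        target = v_p * s
        sims = anchors @ target / (np.linalg.norm(anchors, axis=1)
                                   * np.linalg.norm(target) + 1e-12)
        return [vocab[i] for i in np.argsort(-sims)[:k]]   # k closest in-vocabulary words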
33 Pronunciation Modeling
34 Conclusions
Descriptive power:
–Forgoing local constraints is not acceptable in some situations
Domain sensitivity:
–Results depend on the quality of the training data
–Polysemy remains a difficulty
Updating the LSM space:
–Re-computing the SVD on the fly is not practical
The success of LSM rests on its three salient characteristics