Published by Allen Robinson. Modified over 9 years ago.
1
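Calculating cosine for two vectors
Given two vectors $\vec{x} = (x_1, x_2)$ and $\vec{y} = (y_1, y_2)$ in the plane, let $\theta_1$ and $\theta_2$ be the angles they make with the horizontal axis, so that the angle between them is $\theta = \theta_1 - \theta_2$.
[Figure: $\vec{x}$ and $\vec{y}$ drawn from the origin, with components $x_1, x_2, y_1, y_2$ and angles $\theta_1, \theta_2$ marked.]
Trigonometric formulas:
[1] $\cos(-\alpha) = \cos\alpha$
[2] $\cos(\alpha - \beta) = \cos\alpha\cos\beta + \sin\alpha\sin\beta$
Pythagorean theorem: $|\vec{x}| = \sqrt{x_1^2 + x_2^2}$, $|\vec{y}| = \sqrt{y_1^2 + y_2^2}$
By using formula [2], we can write:
$$\cos(\theta_1 - \theta_2) = \cos\theta_1\cos\theta_2 + \sin\theta_1\sin\theta_2$$
Since $\theta = \theta_1 - \theta_2$, and using [1] (so $\cos(\theta_2 - \theta_1) = \cos(\theta_1 - \theta_2)$ and the order of subtraction does not matter):
$$\cos\theta = \cos\theta_1\cos\theta_2 + \sin\theta_1\sin\theta_2$$
By using the sine and cosine definitions, $\cos\theta_1 = x_1/|\vec{x}|$, $\sin\theta_1 = x_2/|\vec{x}|$, $\cos\theta_2 = y_1/|\vec{y}|$, $\sin\theta_2 = y_2/|\vec{y}|$, and the Pythagorean theorem:
$$\cos\theta = \frac{x_1 y_1 + x_2 y_2}{|\vec{x}|\,|\vec{y}|} = \frac{x_1 y_1 + x_2 y_2}{\sqrt{x_1^2 + x_2^2}\,\sqrt{y_1^2 + y_2^2}}$$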
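The derivation can be checked numerically. This Python sketch computes $\cos\theta$ two ways for a pair of 2-D vectors: from the angles $\theta_1$ and $\theta_2$ (obtained with `math.atan2`) and from the coordinates using the final closed form. The function names are illustrative, not from the slides.

```python
import math

def cos_from_angles(x, y):
    # Angle of each vector measured from the horizontal axis (theta_1, theta_2).
    theta1 = math.atan2(x[1], x[0])
    theta2 = math.atan2(y[1], y[0])
    # cos(theta) with theta = theta_1 - theta_2.
    return math.cos(theta1 - theta2)

def cos_from_coordinates(x, y):
    # (x1*y1 + x2*y2) / (|x| * |y|); math.hypot(a, b) is sqrt(a^2 + b^2).
    dot = x[0] * y[0] + x[1] * y[1]
    return dot / (math.hypot(x[0], x[1]) * math.hypot(y[0], y[1]))

x = (3.0, 4.0)
y = (4.0, 3.0)
print(round(cos_from_angles(x, y), 6))       # 0.96
print(round(cos_from_coordinates(x, y), 6))  # 0.96
```

Both routes agree, which is the point of the derivation: the angle never has to be computed explicitly.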
2
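Cosine Similarity for documents
By representing documents with term weights (e.g., tf-idf) as coordinates, each document $d_j$ becomes a vector over the $t$ index terms, where $w_{i,j}$ is the weight of term $i$ in document $d_j$:
$$\vec{d_j} = (w_{1,j}, w_{2,j}, \ldots, w_{t,j})$$
If we represent any two given documents $d_j$ and $d_k$ as vectors, then their similarity is the cosine of the angle between them:
$$\mathrm{sim}(d_j, d_k) = \frac{\vec{d_j} \cdot \vec{d_k}}{|\vec{d_j}|\,|\vec{d_k}|} = \frac{\sum_{i=1}^{t} w_{i,j}\, w_{i,k}}{\sqrt{\sum_{i=1}^{t} w_{i,j}^2}\,\sqrt{\sum_{i=1}^{t} w_{i,k}^2}}$$
We can regard a query $q$ as a document $d_q$ and use the same formula:
$$\mathrm{sim}(d_j, q) = \frac{\sum_{i=1}^{t} w_{i,j}\, w_{i,q}}{\sqrt{\sum_{i=1}^{t} w_{i,j}^2}\,\sqrt{\sum_{i=1}^{t} w_{i,q}^2}}$$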
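A minimal Python sketch of the document formula, assuming each document (and the query) is stored as a dictionary mapping terms to weights, with absent terms counting as weight 0. The weights below are made-up numbers, not tf-idf values from any real collection, and returning 0 for an all-zero vector is a convention chosen for this sketch.

```python
import math

def cosine_sim(d_j, d_k):
    """sim(d_j, d_k) = sum_i w_ij * w_ik / (|d_j| * |d_k|).

    d_j, d_k: dicts mapping term -> weight; missing terms have weight 0.
    """
    # Only terms present in both vectors contribute to the dot product.
    dot = sum(w * d_k[term] for term, w in d_j.items() if term in d_k)
    norm_j = math.sqrt(sum(w * w for w in d_j.values()))
    norm_k = math.sqrt(sum(w * w for w in d_k.values()))
    if norm_j == 0 or norm_k == 0:
        return 0.0  # sketch's convention: an empty vector matches nothing
    return dot / (norm_j * norm_k)

# Hypothetical weights for two documents and a query d_q.
d1 = {"cat": 2.0, "dog": 1.0}
d2 = {"cat": 1.0, "mouse": 3.0}
q = {"dog": 1.0}  # the query is scored exactly like a document

print(round(cosine_sim(d1, d2), 3))  # 0.283  (= 2 / sqrt(50))
print(round(cosine_sim(d1, q), 3))   # 0.447  (= 1 / sqrt(5))
```

Storing only non-zero weights keeps the vectors sparse, which matters in practice because a document contains only a small fraction of the vocabulary.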
3
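Normalization
Because term weights are non-negative, the cosine similarity of two documents always falls in [0, 1] (for arbitrary real-valued vectors the range would be [−1, 1]), so it is a normalized metric. Dividing by the vector lengths removes the dependence of the score on document (vector) length: a document does not score higher merely because its term weights are larger.
We can rewrite the similarity formula as the dot product of length-normalized vectors:
$$\mathrm{sim}(d_j, d_k) = \frac{\vec{d_j}}{|\vec{d_j}|} \cdot \frac{\vec{d_k}}{|\vec{d_k}|} = \sum_{i=1}^{t} \frac{w_{i,j}}{\sqrt{\sum_{l=1}^{t} w_{l,j}^2}} \cdot \frac{w_{i,k}}{\sqrt{\sum_{l=1}^{t} w_{l,k}^2}}$$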
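To illustrate the rewritten formula, this Python sketch length-normalizes each vector once and then scores a pair with a plain dot product. It also shows that doubling every weight of a document (same term proportions, twice the length) leaves the score unchanged. Representing vectors as lists of weights in a fixed term order is an assumption of the sketch.

```python
import math

def normalize(v):
    # Divide each weight by the Euclidean length of v, giving a unit vector.
    length = math.sqrt(sum(w * w for w in v))
    return [w / length for w in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

d = [3.0, 1.0, 0.0]
d_doubled = [6.0, 2.0, 0.0]  # same term proportions, twice the length
e = [1.0, 1.0, 1.0]

# Both lines print the same value, 4 / sqrt(30), rounded: 0.730297
print(round(dot(normalize(d), normalize(e)), 6))
print(round(dot(normalize(d_doubled), normalize(e)), 6))
```

In a search system the normalized weights can be computed once at indexing time, so scoring a query reduces to summing products of stored weights.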
4
Calculation Example
Given tf weights for three books:

Term        PaP    SaS    WH
affection   115    58     20
jealous     10     7      11
gossip      2      0      6

Calculate normalized weights:

Term        PaP    SaS    WH
affection   0.996  0.993  0.847
jealous     0.087  0.120  0.466
gossip      0.017  0      0.254

Fill in: 0.466 = ___________________
Calculate similarity between documents:
sim(PaP, SaS) = 0.996 * 0.993 + 0.087 * 0.120 + 0.017 * 0 = 0.999
Find: sim(PaP, WH) = _________________
If q = ‘jealous gossip’, which document is the best match?
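The steps of the example can be reproduced in Python. This sketch stores the tf table as one vector per book (term order: affection, jealous, gossip), prints the normalized weights, and computes sim(PaP, SaS); the same `sim` function can be called on other pairs to check the exercise answers. The helper names are illustrative.

```python
import math

# tf weights from the table; term order in every vector: affection, jealous, gossip
TF = {
    "PaP": [115, 10, 2],
    "SaS": [58, 7, 0],
    "WH": [20, 11, 6],
}

def normalize(v):
    # Divide each tf weight by the vector's Euclidean length.
    length = math.sqrt(sum(w * w for w in v))
    return [w / length for w in v]

def sim(a, b):
    # Dot product of the two length-normalized vectors, i.e. their cosine.
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))

for book, tf in TF.items():
    print(book, [round(w, 3) for w in normalize(tf)])

print(round(sim(TF["PaP"], TF["SaS"]), 3))  # 0.999
```

Computed from unrounded weights the score is about 0.9993, so rounding the normalized table to three decimals, as on the slide, does not change the result at this precision.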