15-826: Multimedia Databases and Data Mining

15-826: Multimedia Databases and Data Mining
Lecture #17: Text - part IV (LSI)
C. Faloutsos

Must-read Material
- Foltz, P. W. and S. T. Dumais (Dec. 1992). "Personalized Information Delivery: An Analysis of Information Filtering Methods." Comm. of ACM (CACM) 35(12): 51-60.
15-826 Copyright: C. Faloutsos (2016)

Outline
Goal: ‘Find similar / interesting things’
- Intro to DB
- Indexing - similarity search
- Data Mining

Indexing - Detailed outline
- primary key indexing
- secondary key / multi-key indexing
- spatial access methods
- fractals
- text
- SVD: a powerful tool
- multimedia
- ...

Text - Detailed outline
- problem
- full text scanning
- inversion
- signature files
- clustering
- information filtering and LSI

LSI - Detailed outline
- LSI
  - problem definition
  - main idea
  - experiments

Information Filtering + LSI [Foltz+,’92]
Goal:
- users specify interests (= keywords)
- system alerts them, on suitable news-documents
Major contribution: LSI = Latent Semantic Indexing
- latent (‘hidden’) concepts

Information Filtering + LSI
Main idea:
- map each document into some ‘concepts’
- map each term into some ‘concepts’
‘Concept’: roughly, a set of terms with weights, e.g. “data” (0.8), “system” (0.5), “retrieval” (0.6) -> DBMS_concept
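
The main idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the DBMS_concept weights are the ones on the slide, while `medical_concept`, its weights, the sample document, and the helper name `to_concept_space` are hypothetical. In real LSI the weights come out of the SVD rather than being written by hand.

```python
# A 'concept' is a set of terms with weights.
# DBMS_concept uses the weights from the slide; medical_concept is a
# hypothetical second concept added only to make the example two-dimensional.
concepts = {
    "DBMS_concept": {"data": 0.8, "system": 0.5, "retrieval": 0.6},
    "medical_concept": {"lung": 0.7, "ear": 0.7},
}


def to_concept_space(term_counts):
    """Map a document (term -> count) to one score per concept (a dot product)."""
    return {
        name: sum(weight * term_counts.get(term, 0) for term, weight in weights.items())
        for name, weights in concepts.items()
    }


# Map a (hypothetical) document into concepts.
doc = {"data": 1, "retrieval": 1}
doc_vector = to_concept_space(doc)

# Map a term into concepts: read off its weight in every concept.
term_vector = {name: weights.get("system", 0.0) for name, weights in concepts.items()}

print(doc_vector)
print(term_vector)
```

Both documents and terms end up as short vectors over the same set of concepts, which is what makes them comparable.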

Information Filtering + LSI
[Figure: term-document matrix (BEFORE)]

Information Filtering + LSI
[Figure: concept-document matrix and...]

Information Filtering + LSI
[Figure: ... and concept-term matrix]

Information Filtering + LSI
Q: How to search, e.g., for ‘system’?

Information Filtering + LSI
A: find the corresponding concept(s), and then the corresponding documents

Information Filtering + LSI
Thus it works like an (automatically constructed) thesaurus: we may retrieve documents that DON’T have the term ‘system’ but contain almost everything else (‘data’, ‘retrieval’)
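
The thesaurus effect can be illustrated with a small sketch: look up the concept(s) of the query term, then score each document by how strongly it expresses those concepts. All data here is hypothetical toy data (the document names, the `medical` concept, and the `search` helper are assumptions); only the three DBMS weights come from the slides.

```python
# Hypothetical concept-term weights (term -> concept -> weight).
term_to_concepts = {
    "data": {"DBMS": 0.8},
    "system": {"DBMS": 0.5},
    "retrieval": {"DBMS": 0.6},
    "lung": {"medical": 0.7},
    "ear": {"medical": 0.7},
}

# Hypothetical documents, each reduced to its set of terms.
doc_terms = {
    "doc1": {"data", "system", "retrieval"},
    "doc2": {"data", "retrieval"},  # note: no 'system'
    "doc3": {"lung", "ear"},
}


def doc_concepts(terms):
    """Sum the concept weights of all terms in a document."""
    scores = {}
    for term in terms:
        for concept, weight in term_to_concepts.get(term, {}).items():
            scores[concept] = scores.get(concept, 0.0) + weight
    return scores


def search(term):
    """Find the concept(s) of the term, then the documents for those concepts."""
    query_concepts = term_to_concepts.get(term, {})
    hits = []
    for doc, terms in doc_terms.items():
        strengths = doc_concepts(terms)
        score = sum(w * strengths.get(c, 0.0) for c, w in query_concepts.items())
        if score > 0:
            hits.append((doc, score))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


results = search("system")
print(results)
```

Here `doc2` is returned for the query ‘system’ even though it never contains that term, because it shares the same concept through ‘data’ and ‘retrieval’.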

LSI - Experiments
- 150 Tech Memos (TM) / month
- 34 users submitted ‘profiles’ (6-66 words per profile)
- 100-300 concepts

LSI - Experiments
- four methods, cross-product of:
  - vector-space or LSI, for similarity scoring
  - keywords or document-sample, for profile specification
- measured: precision/recall
[Figure: the cross-product as a diagram; labels: (‘data’, ‘retrieval’…), (concept1, concept2…), X, data mining …]
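
As a reminder of what is being measured, here is a minimal sketch of precision and recall for a single user profile, using the standard set-based definitions. The memo identifiers and the function name `precision_recall` are hypothetical; the slides do not say how the study computed its curves.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query or profile.

    precision = fraction of retrieved items that are relevant
    recall    = fraction of relevant items that were retrieved
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: the system alerts a user about four memos,
# two of which the user actually considers relevant (out of three relevant).
p, r = precision_recall(["tm1", "tm2", "tm3", "tm4"], ["tm1", "tm2", "tm9"])
print(p, r)
```

Alerting the user about more memos tends to raise recall and lower precision, which is why results are usually reported as a curve rather than a single number.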

LSI - Experiments
Q: Who wins?
[Figure: precision vs. recall plot; marked points: (0.25,0.65), (0.50,0.45), (0.75,0.30)]

LSI - Experiments
LSI, with document-based profiles, was better
[Figure: precision vs. recall plot; marked points: (0.25,0.65), (0.50,0.45), (0.75,0.30); diagram labels: (‘data’, …), (concept1, …), X, data mining …]

LSI - Discussion - Conclusions
- Great idea,
  - to derive ‘concepts’ from documents
  - to build a ‘statistical thesaurus’ automatically
  - to reduce dimensionality
- Often leads to better precision/recall
- but:
  - Needs ‘training’ set of documents
  - ‘concept’ vectors are not sparse anymore
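
To make "derive ‘concepts’ from documents" concrete, the sketch below extracts one concept from a toy term-document matrix. It is not the full SVD used by LSI: as a simplification it uses plain power iteration, which approximates only the strongest concept (the top right singular vector). The matrix, the term list, and the function names are hypothetical.

```python
import math

# Hypothetical toy term-document counts (rows = documents, columns = terms).
terms = ["data", "system", "retrieval", "lung", "ear"]
counts = [
    [1, 1, 1, 0, 0],
    [2, 2, 2, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 0, 0, 2, 1],
    [0, 0, 0, 1, 1],
]


def normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def top_concept(matrix, iterations=100):
    """Power iteration on A^T A: approximates the top right singular vector of A."""
    n_terms = len(matrix[0])
    v = normalize([1.0] * n_terms)
    for _ in range(iterations):
        # doc_scores = A v: strength of the current concept in each document
        doc_scores = [sum(a * x for a, x in zip(row, v)) for row in matrix]
        # v = A^T doc_scores, renormalized: updated term weights of the concept
        v = normalize([
            sum(row[j] * s for row, s in zip(matrix, doc_scores))
            for j in range(n_terms)
        ])
    return v


concept = top_concept(counts)
weights = dict(zip(terms, concept))
doc_strengths = [sum(a * x for a, x in zip(row, concept)) for row in counts]

print(weights)
print(doc_strengths)
```

On this toy data the derived concept puts non-zero weight on ‘data’, ‘system’ and ‘retrieval’ together, with no hand-built thesaurus. It also shows the drawback noted above: the third document has a zero count for ‘system’, yet its concept-space coordinate is non-zero, so the sparsity of the original counts is lost.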

LSI - Discussion - Conclusions
Observations
- Bellcore (-> Telcordia) has a patent
- used for multi-lingual retrieval
How exactly does SVD work? (Details, next)