Vector Space Information Retrieval Using Concept Projection Presented by Zhiguo Li


1 Vector Space Information Retrieval Using Concept Projection Presented by Zhiguo Li zhli@paul.rutgers.edu

2 Text Documents: About 156,000 periodicals were in print worldwide in 1999, with approximately 12,000 added each year (Ulrich’s International Periodical Directory). The Library of Congress maintains a collection of 17 million books and receives new books at the rate of 7,000 per working day.

3 Why Dimensional Reduction? The term-by-document matrix is high-dimensional and sparse: it requires a large amount of computing resources, and it makes the underlying concepts difficult to capture.

4 Desired Properties of Dimensional Reduction: (1) Preserve distances between vectors (orthogonal projection matrix). (2) Capture underlying concepts and bring together semantically related documents.

5 Dimensional Reduction: Latent Semantic Indexing (LSI) uses the SVD, which is computationally expensive. Random projection is one way around the cost of LSI, but it cannot capture underlying semantics the way LSI does. What about using both random projection and LSI? It turns out that the combination does not have the desired properties of LSI. This paper: random projection of concept vectors, called “concept projection”. It is much faster, with retrieval efficiency comparable to LSI.

6 Concept Projection: Concept: concept vectors are obtained with spherical K-means. Projection: a randomly chosen orthogonal projection matrix is used, so distances between vectors are approximately preserved.
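The projection step on this slide can be illustrated with a minimal sketch. It assumes the random orthogonal projection is built by applying Gram-Schmidt to Gaussian random vectors, which is one common construction and not necessarily the one used in the paper. The names `random_orthonormal_rows` and `project`, and the rescaling by sqrt(d/k), are choices made for this illustration only.

```python
import math
import random


def random_orthonormal_rows(k, d, seed=0):
    """Return k mutually orthonormal vectors in R^d (requires k <= d).

    Assumption for this sketch: Gaussian samples are orthogonalised
    with modified Gram-Schmidt.
    """
    if k > d:
        raise ValueError("cannot have more than d orthonormal vectors in R^d")
    rng = random.Random(seed)
    rows = []
    while len(rows) < k:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # Remove the components of v along the rows found so far.
        for q in rows:
            c = sum(a * b for a, b in zip(v, q))
            v = [a - c * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-12:  # skip the (very unlikely) degenerate sample
            rows.append([a / norm for a in v])
    return rows


def project(rows, x):
    """Project x onto the span of `rows` and rescale by sqrt(d/k).

    The rescaling compensates, on average, for the length lost by
    dropping d - k directions, so distances are roughly preserved.
    """
    d, k = len(x), len(rows)
    scale = math.sqrt(d / k)
    return [scale * sum(a * b for a, b in zip(q, x)) for q in rows]
```

Calling `project(random_orthonormal_rows(k, d), x)` maps a d-dimensional vector to k dimensions. Because the rows are orthonormal, the unscaled projection never lengthens a vector; the approximate distance preservation holds only in expectation and improves as k grows.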

7 Spherical K-means algorithm
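The algorithm listing for this slide did not survive in the transcript. As a stand-in, here is a minimal sketch of the standard spherical K-means formulation: documents are normalised to unit length, each document is assigned to the concept vector with the largest cosine similarity, and each concept vector is the normalised sum of its members. The function names, the random initialisation from the documents, and the fixed iteration count are assumptions of this sketch, not details taken from the presentation.

```python
import math
import random


def normalize(v):
    """Scale v to unit Euclidean length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v] if norm > 0 else list(v)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def assign(docs, concepts):
    """Label each unit-length document with its most similar concept vector."""
    return [max(range(len(concepts)), key=lambda j: dot(d, concepts[j]))
            for d in docs]


def spherical_kmeans(docs, k, iters=20, seed=0):
    """Cluster documents by cosine similarity; return (concepts, labels)."""
    docs = [normalize(d) for d in docs]
    rng = random.Random(seed)
    # Assumption: initial concept vectors are k randomly chosen documents.
    concepts = [list(d) for d in rng.sample(docs, k)]
    for _ in range(iters):
        labels = assign(docs, concepts)
        for j in range(k):
            members = [d for d, lab in zip(docs, labels) if lab == j]
            if members:  # keep the old concept vector if a cluster empties
                summed = [sum(col) for col in zip(*members)]
                concepts[j] = normalize(summed)
    return concepts, assign(docs, concepts)
```

For unit vectors the dot product equals the cosine, so the assignment step is a nearest-concept search under cosine similarity. The returned concept vectors are what a subsequent projection step would operate on.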

8

9 Conclusion: The method combines random projection with concept vectors for dimensional reduction, giving faster retrieval and results comparable to LSI.

10 Questions: The experiment in this paper uses only 1033 documents, which is too small a data set. The results are good when there are at least 500 clusters. With 1033 documents, that means roughly every 2 document vectors form a cluster (1033 / 500 ≈ 2.07 on average)!

