Published by Bridget Copeland. Modified over 9 years ago.
1
THE ABSTRACT OBJECT RELATIONSHIP BROWSER (absORB)
COS 333 Project Demo
Thursday, May 7th, 2009
Laura Bai ’10, Natasha Indik ’10, Ryan Bayer ’09, Tsheko Mutungu ’09
2
Concept-based Information Retrieval
We lack user-friendly interfaces for concept-based search.
3
Concept-based Information Retrieval
Topic-based information retrieval is attractive:
- It uses term co-occurrence patterns to decompose electronic corpora into sets of topics.
- Topics reveal the multiple meanings of a given term.
- In some cases, topics can handle synonymy.
- You don’t need to know exactly what you’re looking for: browsing a general set of topics is easier than browsing a set of documents.
4
Interface Requirements
A good interface should:
a) Capture the multi-dimensionality of the relationships in the data: terms-topics, documents-topics, documents-documents, and topics-topics.
b) Convey differences in relevance among the topics and documents that match a query.
c) Be navigable and intuitive for general users.
5
Example Topic Model Interface
100-topic model of the journal Science (Blei & Lafferty).
7
Capturing Relationships
This interface is browsable, but it isn’t searchable, and the front page doesn’t directly show which topics are related to one another.
8
Our Solution: Represent Topic Space as a City
The city is made up of super-category districts of similar topics, and topics of similar documents. This gives the user:
a) A coherent overview of the topic space.
b) Relevance reporting: the view snaps directly to the districts that contain the highest-scoring topics, and topics are color-coded accordingly.
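The relevance reporting described on this slide could be sketched roughly as follows. This is only an illustration: the function name `rank_districts`, the dictionary inputs, and the red-intensity color scale are assumptions, not details from the absORB implementation.

```python
def rank_districts(topic_scores, district_of):
    """Pick the district to snap to and a color for each matching topic.

    topic_scores: dict mapping topic -> non-negative relevance score (hypothetical input)
    district_of:  dict mapping topic -> district id (hypothetical input)
    """
    # A district is as relevant as its highest-scoring topic.
    best_per_district = {}
    for topic, score in topic_scores.items():
        district = district_of[topic]
        best_per_district[district] = max(best_per_district.get(district, 0.0), score)
    best_district = max(best_per_district, key=best_per_district.get)

    # Color-code topics: the top-scoring topic is full red, others are darker.
    top = max(topic_scores.values()) or 1.0  # avoid dividing by zero
    colors = {}
    for topic, score in topic_scores.items():
        intensity = int(round(255 * score / top))
        colors[topic] = "#{:02x}0000".format(intensity)
    return best_district, colors


scores = {"t1": 0.2, "t2": 0.8, "t3": 0.4}
districts = {"t1": "A", "t2": "B", "t3": "A"}
best, colors = rank_districts(scores, districts)
print(best, colors["t2"])
```

Scoring a district by its single best topic is one possible choice; summing or averaging topic scores per district would favor districts with many moderately relevant topics instead.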
12
Who can use our solution?
Anyone with a university MySQL test database and a flat file of documents for which they want to build a search engine or interface.
The first step is to register for an account.
The next step is to upload some data.
This prompts our system to schedule your data for topic modeling using the LDA algorithm from Blei et al.
15
Extracting Super-Category Districts
The output from the LDA algorithm is:
- a lexicon of terms from the document set
- a file of term-topic relationship scores
- a file of document-topic relationship scores
- a file of document-document relationship scores
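As an illustration of how the term-topic score file might be turned into per-topic term vectors, here is a minimal sketch. The slides do not give the file layout, so the tab-separated `term, topic id, score` format and the name `load_topic_vectors` are purely hypothetical.

```python
from collections import defaultdict


def load_topic_vectors(lines):
    """Build {topic_id: {term: score}} from term-topic score lines.

    Assumes (hypothetically) one "term<TAB>topic_id<TAB>score" record per line.
    """
    vectors = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        term, topic, score = line.split("\t")
        vectors[int(topic)][term] = float(score)
    return dict(vectors)


sample = [
    "gene\t0\t0.9",
    "dna\t0\t0.7",
    "star\t1\t0.8",
]
print(load_topic_vectors(sample))
```

Sparse dictionaries keep only the terms that actually score for a topic, which suits a large lexicon where most term-topic pairs are absent.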
16
Extracting Super-Category Districts
To determine topic super-categories, we wrote a C++ program that:
- parses the term-topic relationship score file to generate topic-specific term vectors
- uses a centroid clustering heuristic to decompose the associated topic-term matrix into districts
Our solution uses a cluster-centroid similarity scoring function that integrates similarity (significance of correlation) with balance (similarity of size) to determine which pairs of clusters to merge. In practice, it runs quickly and produces coherent districts that do not vary much in size.
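The merge heuristic above can be sketched in a few lines of Python. This is not the team's C++ program: cosine similarity stands in for the correlation-significance measure, the balance term is assumed to be the ratio of the smaller to the larger cluster size, and all function names are illustrative.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def centroid(members, vectors):
    """Average the term vectors of the topics in one cluster."""
    total = {}
    for m in members:
        for term, w in vectors[m].items():
            total[term] = total.get(term, 0.0) + w
    return {t: w / len(members) for t, w in total.items()}


def merge_score(a, b, vectors):
    """Similarity of centroids, scaled down when cluster sizes differ."""
    similarity = cosine(centroid(a, vectors), centroid(b, vectors))
    balance = min(len(a), len(b)) / max(len(a), len(b))  # assumed balance term
    return similarity * balance


def cluster_districts(vectors, n_districts):
    """Greedily merge the best-scoring pair until n_districts clusters remain."""
    clusters = [[topic] for topic in sorted(vectors)]
    while len(clusters) > n_districts:
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: merge_score(clusters[p[0]], clusters[p[1]], vectors),
        )
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    return clusters


topics = {
    0: {"gene": 1.0, "dna": 0.8},
    1: {"gene": 0.9, "dna": 1.0},
    2: {"star": 1.0, "galaxy": 0.7},
    3: {"star": 0.8, "galaxy": 1.0},
}
print(cluster_districts(topics, 2))
```

Multiplying similarity by balance means two clusters of very different sizes need a much stronger similarity to be merged, which is one way to keep district sizes from diverging.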
17
Extracting Super-Category Districts
The districting program outputs:
- a file containing district-topic pairs
- district-term relationship scores
- a file containing per-district topic interactions; this is an input file for the city map generator
The limitation of this solution is that districts are completely disjoint: they have no overlap in topic members. The upside is that we can easily relate districts to one another.
18
Generating District Maps
The district-specific topic interactions file is passed to a Python script that builds a force-directed graph of the district and then imposes a rectangular grid structure on it to simulate city blocks.
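A rough sketch of the two stages described here, a force-directed layout followed by snapping to a rectangular grid, might look like the following. It is not the project's script: the force model, the parameters, the ring-by-ring search for a free cell, and the names `force_layout` and `snap_to_grid` are all assumptions made for illustration.

```python
import math


def force_layout(nodes, weights, iterations=200, step=0.05, max_move=0.1):
    """Simple force-directed layout: all pairs repel, interacting pairs attract.

    weights: dict mapping (topic_a, topic_b) -> interaction strength (hypothetical input)
    """
    n = len(nodes)
    # Deterministic start: spread the nodes evenly around a unit circle.
    pos = {v: [math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n)]
           for i, v in enumerate(nodes)}
    for _ in range(iterations):
        force = {v: [0.0, 0.0] for v in nodes}
        for a in nodes:
            for b in nodes:
                if a == b:
                    continue
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-9
                w = weights.get((a, b), weights.get((b, a), 0.0))
                # Positive magnitude pushes a away from b; negative pulls it closer.
                magnitude = 1.0 / dist ** 2 - w * dist
                force[a][0] += magnitude * dx / dist
                force[a][1] += magnitude * dy / dist
        for v in nodes:
            fx, fy = force[v]
            length = math.hypot(fx, fy)
            if length > 0:
                move = min(length * step, max_move)  # cap each step for stability
                pos[v][0] += fx / length * move
                pos[v][1] += fy / length * move
    return pos


def snap_to_grid(pos, cell=1.0):
    """Assign each node to a distinct grid cell near its layout position."""
    taken = set()
    grid = {}
    for v in sorted(pos):
        gx, gy = round(pos[v][0] / cell), round(pos[v][1] / cell)
        radius = 0
        while v not in grid:
            # Free cells on the square ring at this distance from the home cell.
            ring = [(gx + dx, gy + dy)
                    for dx in range(-radius, radius + 1)
                    for dy in range(-radius, radius + 1)
                    if max(abs(dx), abs(dy)) == radius
                    and (gx + dx, gy + dy) not in taken]
            if ring:
                best = min(ring, key=lambda c: (c[0] * cell - pos[v][0]) ** 2
                                               + (c[1] * cell - pos[v][1]) ** 2)
                taken.add(best)
                grid[v] = best
            radius += 1
    return grid


layout = force_layout(["a", "b", "c", "d"], {("a", "b"): 1.0, ("c", "d"): 1.0})
print(snap_to_grid(layout, cell=0.5))
```

Each topic ends up in its own cell, so the cells can be drawn as city blocks; a smaller `cell` value preserves more of the layout's relative distances at the cost of a sparser map.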