THE ABSTRACT OBJECT RELATIONSHIP BROWSER (absORB)
COS 333 Project Demo, Thursday, May 7th, 2009
Laura Bai ’10, Natasha Indik ’10, Ryan Bayer ’09, Tsheko Mutungu ’09

Concept-based Information Retrieval
We lack user-friendly interfaces for concept-based search.

Concept-based Information Retrieval
Topic-based information retrieval is attractive:
- It uses term co-occurrence patterns to decompose electronic corpora into sets of topics.
- Topics reveal multiple meanings of given terms.
- In some cases, topics can handle synonymy.
- You don’t need to know exactly what you’re looking for (it is easier to browse through a general set of topics than a set of documents).

Interface Requirements
A good interface should:
a) Capture the multi-dimensionality of the relationships in the data:
   - terms ↔ topics
   - documents ↔ topics
   - documents ↔ documents
   - topics ↔ topics
b) Convey differences in relevance among the set of topics/documents that match a query.
c) Be navigable and intuitive for general users.

Example Topic Model Interface
100-topic model of the journal Science (Blei & Lafferty).

Capturing Relationships
This interface is browsable, but it isn’t searchable, and the front page doesn’t directly show us which topics are related to one another.

Our Solution: Represent Topic Space as City
We represent the topic space as a city, with super-category districts of similar topics and topics of similar documents. This gives the user:
a) A coherent overview of the topic space.
b) Relevance reporting, by snapping directly to the districts that contain the highest-scoring topics and color-coding topics accordingly.
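
The relevance-reporting idea can be illustrated with a minimal sketch. The slides do not show how scores or colors are actually computed, so everything here is an assumption made for illustration: the function names `rank_districts` and `color_for`, the three color bands, and the toy scores.

```python
def rank_districts(topic_scores, district_of):
    """Order district ids by the best query score among their topics."""
    best = {}
    for topic, score in topic_scores.items():
        district = district_of[topic]
        best[district] = max(best.get(district, float("-inf")), score)
    return sorted(best, key=lambda d: best[d], reverse=True)


def color_for(score, max_score):
    """Map a score to one of three hypothetical color bands."""
    if max_score <= 0:
        return "grey"
    ratio = score / max_score
    if ratio >= 0.66:
        return "red"
    if ratio >= 0.33:
        return "orange"
    return "yellow"


# Toy query result: per-topic relevance scores and topic -> district membership.
topic_scores = {"t1": 0.9, "t2": 0.2, "t3": 0.5}
district_of = {"t1": "A", "t2": "B", "t3": "B"}

order = rank_districts(topic_scores, district_of)  # district to snap to comes first
top = max(topic_scores.values())
colors = {t: color_for(s, top) for t, s in topic_scores.items()}
```

An interface built this way would open on `order[0]` and paint each topic with its entry in `colors`.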

Who can use our solution?
Anyone with a university MySQL test database and some flat file of documents for which they want to build a search engine or interface.
The first step is to register for an account. The next step is to upload some data. This prompts our system to schedule your data for topic modeling using the LDA algorithm from Blei et al.

Extracting Super-Category Districts
The output from the LDA algorithm is:
- a lexicon of terms from the document set
- a file showing term ↔ topic relationship scores
- a file showing document ↔ topic relationship scores
- a file showing document ↔ document relationship scores

Extracting Super-Category Districts
To determine topic super-categories, we wrote a C++ program that:
- parses the term ↔ topic relationship score file to generate topic-specific term vectors
- uses a centroid clustering heuristic to decompose the associated topic-term matrix into districts
Our particular solution uses a cluster-centroid scoring function that integrates similarity (significance of correlation) with balance (similarity of size) to determine which pairs of clusters to merge. In practice, it runs quickly and produces coherent districts that do not vary much in size.
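
The merge heuristic described above might look roughly like the following sketch. It is written in Python rather than the project's C++, and the slides do not give the actual scoring function, so the specifics are assumptions: cosine similarity between centroids stands in for "significance of correlation", the ratio of cluster sizes stands in for "balance", and the two are simply multiplied. The names `merge_score` and `cluster_topics` and the toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(vectors):
    """Component-wise mean of a list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def merge_score(c1, c2, vectors):
    """Assumed score: centroid similarity times size balance (both in [0, 1])."""
    sim = cosine(centroid([vectors[i] for i in c1]),
                 centroid([vectors[i] for i in c2]))
    balance = min(len(c1), len(c2)) / max(len(c1), len(c2))
    return sim * balance


def cluster_topics(vectors, num_districts):
    """Greedily merge the best-scoring pair until num_districts remain."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > num_districts:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                score = merge_score(clusters[a], clusters[b], vectors)
                if best is None or score > best[0]:
                    best = (score, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Toy topic-term matrix: one row per topic, one column per term.
topic_vectors = [
    [1.0, 0.9, 0.0, 0.0],
    [0.9, 1.0, 0.1, 0.0],
    [0.0, 0.1, 1.0, 0.8],
    [0.0, 0.0, 0.9, 1.0],
]
districts = cluster_topics(topic_vectors, 2)
```

Multiplying by the balance term penalizes merging a large cluster with a small one, which is one simple way to keep district sizes from diverging.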

Extracting Super-Category Districts
The districting program outputs:
- a file containing district ↔ topic pairs
- district ↔ term relationship scores
- a file containing per-district topic interactions; this is an input file for the city map generator
The limitation of this solution is that districts are completely disjoint: they have no overlap in topic members. The upside is that we can easily relate districts to one another.

Generating district maps
The district-specific topic interactions file is passed to a Python script that builds a force-directed graph of the district and then imposes a rectangular grid structure on it to simulate city blocks.
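
The two stages named above can be sketched in plain Python. The project's actual script is not shown in the slides, so this is only an illustration under stated assumptions: the layout is a small Fruchterman-Reingold-style spring model, and the grid step simply sorts topics into rows and columns by position. The names `force_layout` and `snap_to_grid`, the parameters, and the toy interaction weights are all hypothetical.

```python
import math
import random


def force_layout(nodes, weights, iterations=200, seed=0):
    """Spring layout: every pair repels, pairs with interaction weight attract."""
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}
    k = 1.0 / math.sqrt(len(nodes))  # ideal spacing between nodes
    for step in range(iterations):
        temp = 0.1 * (1 - step / iterations)  # cap on movement, cooling over time
        disp = {n: [0.0, 0.0] for n in nodes}
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-9
                w = weights.get((a, b), weights.get((b, a), 0.0))
                # Positive force pushes a and b apart; negative pulls them together.
                force = k * k / dist - w * dist * dist / k
                fx, fy = dx / dist * force, dy / dist * force
                disp[a][0] += fx
                disp[a][1] += fy
                disp[b][0] -= fx
                disp[b][1] -= fy
        for n in nodes:
            length = math.hypot(disp[n][0], disp[n][1]) or 1e-9
            scale = min(length, temp) / length
            pos[n][0] += disp[n][0] * scale
            pos[n][1] += disp[n][1] * scale
    return pos


def snap_to_grid(pos, cols):
    """Give each node a distinct (row, col) cell, keeping rough spatial order."""
    rows = math.ceil(len(pos) / cols)
    by_y = sorted(pos, key=lambda n: pos[n][1])
    cells = {}
    for r in range(rows):
        row_nodes = sorted(by_y[r * cols:(r + 1) * cols], key=lambda n: pos[n][0])
        for c, n in enumerate(row_nodes):
            cells[n] = (r, c)
    return cells


# Toy district: four topics with pairwise interaction weights.
nodes = ["t1", "t2", "t3", "t4"]
weights = {("t1", "t2"): 1.0, ("t3", "t4"): 1.0, ("t2", "t3"): 0.2}

pos = force_layout(nodes, weights)
cells = snap_to_grid(pos, cols=2)  # each topic becomes one "city block"
```

Sorting into rows and then columns guarantees that no two topics share a block, at the cost of distorting distances from the force-directed layout.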