Semantic (web) activity at Elsevier
Marc Krellenstein
VP, Search and Discovery
Elsevier
October 27, 2004
Thesaurus use at Elsevier
Elsevier traditionally uses proprietary and standard thesauri for:
– Indexing (tagging) articles, books and other materials
– Browsing thesaurus-indexed content
– Expanding searches against specialized content
  Overall, a net benefit, but not huge
– Limiting a search by category
– Clustering documents by category
  Better than limiting search up front…data-driven
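Thesaurus-based search expansion, as listed above, can be sketched in a few lines. This is only an illustration, not Elsevier's implementation: the `THESAURUS` entries, the `expand_query` function and its `include_narrower` parameter are all hypothetical, and the toy hierarchy is assumed to be acyclic.

```python
# Hypothetical toy thesaurus: preferred term -> synonyms and narrower terms.
THESAURUS = {
    "neoplasms": {"synonyms": ["tumors", "cancer"], "narrower": ["leukemia"]},
    "leukemia": {"synonyms": ["leukaemia"], "narrower": []},
}


def expand_query(term, thesaurus, include_narrower=True):
    """Return the term plus its synonyms and, optionally, its narrower terms."""
    key = term.lower()
    entry = thesaurus.get(key)
    if entry is None:
        # Term is not in the thesaurus: search for it unchanged.
        return [key]
    expanded = [key] + list(entry["synonyms"])
    if include_narrower:
        # Assumes the hierarchy has no cycles (true for this toy data).
        for narrower in entry["narrower"]:
            for candidate in expand_query(narrower, thesaurus):
                if candidate not in expanded:
                    expanded.append(candidate)
    return expanded


print(expand_query("Neoplasms", THESAURUS))
```

Expansion of this kind mainly raises recall; whether precision suffers depends on how broad the added terms are, which fits the slide's "net benefit, but not huge" assessment.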
Thesaurus use at Elsevier
Elsevier does not currently use thesauri for concept searching
– Lack of demonstrated superiority to date over current best practice full text search
Thesaurus use at Elsevier
New thesaurus requirements and uses:
– Integrated search of proprietary, public and/or local user content using multiple thesauri
– Integrating chemical structure info with text documents
– Integrating databases with diverse schemas
– Supporting text mining
– Other uses requested by our customers (e.g., extensibility for local content)
– Improved thesaurus navigation
– Improved search results
Approaches for new thesaurus uses
Creating RDF-based intermediary ontology to map diverse thesauri
– Support multiple relationships
– Extensible by customers
– Improved performance, scalability
Experimenting with search options
– Improving precision as well as recall
Experimenting with visualization techniques (e.g., DOPE browser)
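One way to picture an intermediary ontology is as a hub: each thesaurus term is mapped once to a shared concept, so N vocabularies need N sets of mappings rather than a mapping for every pair. The sketch below uses plain subject-predicate-object tuples in the spirit of RDF. The slides do not describe the actual schema, so the prefixes (`thesA:`, `thesB:`, `local:`, `hub:`), the `mapsTo` predicate and the `equivalents` function are assumptions made for illustration.

```python
# Triples (subject, predicate, object); every identifier here is hypothetical.
TRIPLES = [
    ("thesA:Neoplasms", "mapsTo", "hub:Cancer"),
    ("thesB:Tumours", "mapsTo", "hub:Cancer"),
    ("local:OncologyNotes", "mapsTo", "hub:Cancer"),  # customer-added term
    ("thesA:Aspirin", "mapsTo", "hub:AcetylsalicylicAcid"),
    ("thesB:ASA", "mapsTo", "hub:AcetylsalicylicAcid"),
]


def equivalents(term, triples):
    """Return terms from other vocabularies that share a hub concept with `term`."""
    hubs = {o for s, p, o in triples if s == term and p == "mapsTo"}
    return sorted(
        s for s, p, o in triples
        if p == "mapsTo" and o in hubs and s != term
    )


print(equivalents("thesA:Neoplasms", TRIPLES))
```

Adding a customer's local vocabulary then means appending triples that point at existing hub concepts, without touching the other thesauri.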
Text mining at Elsevier
Elsevier considers text mining a now-capable technology that will be essential for managing information overload and providing new insights
Actively investigating uses and developing applications
Can provide both substantive and ‘meta-research’ insights
– Trends over time, distribution by author or institution, etc.
View RDF as the eventual storage medium for extracted facts
– Performance, maintainability, inferencing
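A 'meta-research' question such as a trend over time can be answered directly from extracted facts once they are stored as triples. The following is a minimal sketch under stated assumptions: the document identifiers, the `mentions` and `publishedIn` predicates, the sample data and the `mentions_per_year` function are all invented for illustration and do not come from any Elsevier system.

```python
from collections import Counter

# Hypothetical extracted facts stored as (subject, predicate, object) triples.
FACTS = [
    ("doc1", "mentions", "HIV"),
    ("doc1", "publishedIn", 2001),
    ("doc2", "mentions", "HIV"),
    ("doc2", "publishedIn", 2002),
    ("doc3", "mentions", "HIV"),
    ("doc3", "publishedIn", 2002),
    ("doc4", "mentions", "leukemia"),
    ("doc4", "publishedIn", 2002),
]


def mentions_per_year(triples, topic):
    """Count documents mentioning `topic`, grouped by publication year."""
    year_of = {s: o for s, p, o in triples if p == "publishedIn"}
    counts = Counter(
        year_of[s]
        for s, p, o in triples
        if p == "mentions" and o == topic and s in year_of
    )
    return dict(sorted(counts.items()))


print(mentions_per_year(FACTS, "HIV"))
```

The same join, keyed on an author or institution predicate instead of the year, would give the distribution by author or institution.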
To organisms?
Author teams in HIV research?
Indirect links from leukemia to Alzheimer’s via enzymes
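An indirect link of this kind amounts to a two-hop connection: two diseases that are not linked directly but are each associated with the same enzyme. A minimal sketch follows; the enzyme names are placeholders rather than real findings, and `LINKS` and `shared_intermediates` are hypothetical names.

```python
# Hypothetical disease -> associated enzymes, as might be extracted by text mining.
# "enzyme X/Y/Z" are placeholders, not real biological results.
LINKS = {
    "leukemia": {"enzyme X", "enzyme Y"},
    "Alzheimer's": {"enzyme Y", "enzyme Z"},
}


def shared_intermediates(a, b, links):
    """Return intermediates associated with both `a` and `b` (a two-hop link)."""
    return sorted(links.get(a, set()) & links.get(b, set()))


print(shared_intermediates("leukemia", "Alzheimer's", LINKS))
```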
Red – Product
Pink – Reactant
Green – Reagent
Brown – Solvent
…