Personalized Recommendation of Related Content Based on Automatic Metadata Extraction
Andreas Nauerz (1), Fedor Bakalov (2), Birgitta König-Ries (2), Martin Welsch (1)
(1) IBM Deutschland Research & Development GmbH, Germany
(2) Friedrich Schiller University of Jena, Germany
Outline
- Motivation and aims
- Basic recommender system
- Architectural extensions
- Domain model
- Task model
- User model
- Personalization model
- Service registry
- Calais service integration
- Conclusion
Motivation
- To make the right decisions, users need access to additional sources of background information and related content.
- The required additional information might be stored in various places, e.g. wikis, financial databases, company directories, etc.
- To access these different pieces of information, the user has to open new browser windows and navigate to the appropriate resources.
[Figure labels: Company Profile, Stock Quotes, Experts]
Aims
- Automatically augment portal documents with recommendations of background information and related content
[Figure labels: Stock Quotes, Company X, Company Y, Stock Market, Technology, Company Profile, Experts]
Basic Recommender System Portal Layer – aggregation of portlets using filter chains
Basic Recommender System Analysis Layer – named entity extraction using UIMA framework
Basic Recommender System Semantic Tagging Layer – wrapping the extracted entities into semantic tags
Basic Recommender System Recommendation Layer – generating a list of references to the similarly annotated information pieces
Basic Recommender System Service Integration Layer – mapping the tagged entities to the corresponding external service
Basic Recommender System Presentation Layer – invocation of the selected external services
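The layers above can be sketched end to end as follows. This is only a minimal illustration, not the system's implementation: a toy dictionary lookup stands in for the UIMA analysis engines, and the tag format, the `GAZETTEER` and `SERVICES` tables, and all function names are assumptions made for this example.

```python
import re
from html import escape

# Hypothetical gazetteer standing in for the UIMA analysis engines.
GAZETTEER = {"Company X": "Company", "Company Y": "Company"}

# Hypothetical hardcoded binding of entity types to external services.
SERVICES = {"Company": ["CompanyProfile", "StockQuotes"]}


def extract_entities(text):
    """Analysis layer: return sorted (start, end, surface, type) tuples."""
    found = []
    for name, etype in GAZETTEER.items():
        for m in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            found.append((m.start(), m.end(), name, etype))
    return sorted(found)


def tag_entities(text):
    """Semantic tagging layer: wrap each entity in a span carrying its type."""
    out, pos = [], 0
    for start, end, name, etype in extract_entities(text):
        out.append(escape(text[pos:start]))
        out.append('<span class="entity" data-type="%s">%s</span>'
                   % (etype, escape(name)))
        pos = end
    out.append(escape(text[pos:]))
    return "".join(out)


def services_for(text):
    """Service integration layer: map each entity to its bound services."""
    return {name: SERVICES.get(etype, [])
            for _, _, name, etype in extract_entities(text)}


tagged = tag_entities("Company X buys Company Y")
bindings = services_for("Company X buys Company Y")
```

The fixed `SERVICES` table is deliberate: it mirrors the kind of hardcoded type-to-source binding that a more flexible mapping would replace.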
Limitations
- Large number of irrelevant recommendations
- Hardcoded binding of information types to sources of related content
- Huge amount of work required to develop analysis engines
Architectural Extensions
- Generation of user-specific recommendations
- Mechanism for flexible mapping of information types to information sources
- Harnessing external unstructured information analysis engines
Extended Architecture
Finance Domain Model
- Defines general and finance-related concepts
- Reuses concepts from LSDIS Finance Ontology and XBRL Ontology
- Grounded in the Proton Upper Level Module
- Defines fine-grained categorization of industry sectors (partially based on the Yahoo Taxonomy)
- Represented as an OWL ontology
Task Model
- Defines information-gathering actions that users might want to take on the portal
- Two types of actions:
  - generic actions – can be used across different domains, e.g. GetEncyclopediaArticle
  - domain-specific actions – applicable only in a specific domain, e.g. GetStockQuotes
- Actions are represented as ontological concepts and described by their input and output parameters
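A minimal sketch of how such actions might be described in code, assuming a plain data class instead of the ontological representation. The parameter names (`Concept`, `Article`, `Company`, `StockQuote`) and the `applicable_actions` helper are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Action:
    name: str
    inputs: Tuple[str, ...]        # input parameter types (assumed names)
    outputs: Tuple[str, ...]       # output parameter types (assumed names)
    domain: Optional[str] = None   # None marks a generic action


ACTIONS = [
    Action("GetEncyclopediaArticle", ("Concept",), ("Article",)),
    Action("GetStockQuotes", ("Company",), ("StockQuote",), domain="Finance"),
]


def applicable_actions(actions, domain):
    """Generic actions apply everywhere; domain-specific ones only in their domain."""
    return [a.name for a in actions if a.domain is None or a.domain == domain]


finance_actions = applicable_actions(ACTIONS, "Finance")
other_actions = applicable_actions(ACTIONS, "Medicine")
```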
User Model
- Reflects various user features
- Static part: date of birth, gender, mother tongue
- Dynamic part: interests, expertise
- Represented as an overlay model
[Figure labels: Domain Model Overlay User Model]
Representation of User Interests and Expertise
- Numerator – number of occurrences of concept i for user j
- Denominator – total number of occurrences of all concepts registered for user j
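Read as a ratio, the numerator and denominator above give, for concept i and user j, the value n(i, j) / sum over k of n(k, j), where n(i, j) is the occurrence count. The symbol n and the function below are assumptions introduced for illustration. A small sketch of that computation over a user's log of encountered concepts:

```python
from collections import Counter


def concept_ratios(concept_occurrences):
    """Share of each concept among all concept occurrences logged for one user."""
    counts = Counter(concept_occurrences)
    total = sum(counts.values())
    if total == 0:
        return {}  # no concepts registered for this user yet
    return {concept: n / total for concept, n in counts.items()}


# Hypothetical log of concepts encountered by one user.
ratios = concept_ratios(["Banking", "Banking", "Insurance", "Banking"])
```

Because the values for one user sum to 1, they describe the relative weight of each concept for that user, not an absolute level.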
Personalization Model
- Specifies personalization rules that govern what content is provided to the user
- A personalization rule is represented in ECA (event-condition-action) form: on (event) if (condition) then (actions)
- The event denotes a situation in which the user encounters a certain concept in the text
- The condition is a combination of user features and context descriptors
- The actions define the information-gathering actions that should be delivered to the user if the event occurs
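The on/if/then form can be sketched as a small rule evaluator. The two rules below are invented purely for illustration and are not taken from the system. The `Rule` class, the `fire` function, and the shape of the user dictionary are likewise assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    event_concept: str                 # on: the concept encountered in the text
    condition: Callable[[Dict], bool]  # if: predicate over user features
    actions: List[str]                 # then: actions to deliver


# Hypothetical rules, for illustration only.
RULES = [
    Rule("Company", lambda user: "Finance" in user["interests"], ["GetStockQuotes"]),
    Rule("Company", lambda user: True, ["GetEncyclopediaArticle"]),
]


def fire(rules, concept, user):
    """Collect the actions of every rule whose event and condition both match."""
    result = []
    for rule in rules:
        if rule.event_concept == concept and rule.condition(user):
            result.extend(a for a in rule.actions if a not in result)
    return result


interested = fire(RULES, "Company", {"interests": {"Finance"}})
uninterested = fire(RULES, "Company", {"interests": set()})
```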
Multidimensional Representation of the Personalization Model
[Figure axes: User Interests, Document Concepts, User Expertise]
Intersection of Dimensions
[Figure: Document Concepts = Bank, User Interests = Banking: interested, User Expertise = Banking: novice; actions at the intersection: GetEncyclopediaArticle, GetCompanyWebsite, GetNews]
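One way to picture the intersection is as a table indexed by the three dimensions. The sketch below stores the single cell shown on this slide. The dictionary layout and the `actions_for` function are assumptions, not the system's actual data structure.

```python
# Key: (document concept, user interest, user expertise) -> actions in that cell.
MODEL = {
    ("Bank", ("Banking", "interested"), ("Banking", "novice")): [
        "GetEncyclopediaArticle",
        "GetCompanyWebsite",
        "GetNews",
    ],
}


def actions_for(concept, interests, expertise):
    """Return the actions of every cell matching the concept and the user's features."""
    actions = []
    for (cell_concept, interest, level), cell_actions in MODEL.items():
        if cell_concept == concept and interest in interests and level in expertise:
            actions.extend(a for a in cell_actions if a not in actions)
    return actions


novice = actions_for("Bank", {("Banking", "interested")}, {("Banking", "novice")})
expert = actions_for("Bank", {("Banking", "interested")}, {("Banking", "expert")})
```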
Service Registry
- Central database for storing information about internal and external services
- The registry maps each action from the Task Model to the service that performs it, e.g. GetEncyclopediaArticle -> Wikipedia
- Each service is provided with a WSDL description
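A minimal in-memory sketch of such a registry, assuming a simple action-to-services mapping. The class names are hypothetical, and the WSDL URL is a placeholder, not a real endpoint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    name: str
    wsdl_url: str  # location of the service's WSDL description (placeholder here)


class ServiceRegistry:
    """Maps Task Model action names to the services that perform them."""

    def __init__(self):
        self._by_action = {}

    def register(self, action, service):
        self._by_action.setdefault(action, []).append(service)

    def resolve(self, action):
        """Return the services registered for an action (empty list if none)."""
        return list(self._by_action.get(action, []))


registry = ServiceRegistry()
registry.register(
    "GetEncyclopediaArticle",
    Service("Wikipedia", "http://example.org/wikipedia.wsdl"),  # placeholder URL
)
resolved = registry.resolve("GetEncyclopediaArticle")
```

Keeping the mapping in a registry instead of in code means a service can be swapped or added without touching the rules that refer to the action.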
Calais Service
- Ingests unstructured text and returns a semantically annotated document in RDF format
- Supports extraction of business entities, events, and facts
- Entities (total: 38): Currency, Industry term, Organization, Person, …
- Events and facts (total: 38): Acquisition, Alliance, Bankruptcy, Merger, …
Conclusion
- Augmenting portal documents with automatically generated recommendations of background information and related content
- Extension of our previous recommender system:
  - User-specific recommendations
  - Flexible mapping of information types to services
  - Leveraging external analysis engines for tagging
- The extensions are currently being incorporated into the existing recommender system, which is prototypically implemented in IBM’s WebSphere Portal
Questions & Answers