Semantic Memory Knowledge Acquisition Through Active Dialogues
Włodzisław Duch, Julian Szymański
Representing knowledge as relations between concepts and keywords is a relatively simple model of language. Nevertheless, it makes it possible to implement interesting linguistic competences that have not been demonstrated by more sophisticated knowledge models, such as the frames used in CYC. One of the presented competences is a twenty questions game based on a semantic memory built on this relational model of knowledge representation. Another is the ability of the implemented system to talk about the knowledge it possesses. Interaction with the human user is organized as an active dialogue, which shows how an artificial system uses predefined sentence templates to acquire new knowledge. We present dialogue scenarios for mining knowledge and discuss the data they bring into the semantic memory structures.
Psycholinguistic models of the Semantic Memory
Endel Tulving, "Episodic and Semantic Memory"
● Semantic memory refers to the memory of meanings and understandings.
● It stores concept-based, generic, context-free knowledge.
● It is a permanent container for general knowledge (facts, ideas, words, etc.).
Semantic network: Collins & Loftus, 1975
Hierarchical model: Collins & Quillian, 1969
Semantic knowledge representation
wCRK: weight, Concept, Relation, Keyword
Example for the concept Cobra:
–is_a: animal, beast, being, brute, creature, entity, fauna, object, organism, reptile, serpent, snake, vertebrate
–has: belly, body part, cell, chest, costa, …
CDV – Concept Description Vector; the CDVs together form the Semantic Matrix
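The step from wCRK entries to Concept Description Vectors can be sketched as follows. This is a minimal illustration, not the authors' implementation: the entries, the weight values, and the function name `build_semantic_matrix` are all hypothetical, and a weight of 0.0 is assumed to mean "nothing known".

```python
# Hypothetical wCRK entries: (weight, concept, relation, keyword).
# The weights are illustrative and are not taken from the slides.
WCRK = [
    (1.0, "cobra", "is_a", "snake"),
    (1.0, "cobra", "is_a", "reptile"),
    (1.0, "cobra", "has", "belly"),
    (1.0, "giraffe", "is_a", "mammal"),
    (1.0, "giraffe", "has", "long neck"),
    (-1.0, "giraffe", "has", "horn"),
]

def build_semantic_matrix(entries):
    """Return (concepts, features, matrix).

    matrix[i] is the CDV of concepts[i]; matrix[i][j] is the weight linking
    that concept to features[j], a (relation, keyword) pair.
    Missing entries default to 0.0 (assumed to mean "unknown").
    """
    concepts = sorted({c for _, c, _, _ in entries})
    features = sorted({(r, k) for _, _, r, k in entries})
    row = {c: i for i, c in enumerate(concepts)}
    col = {f: j for j, f in enumerate(features)}
    matrix = [[0.0] * len(features) for _ in concepts]
    for w, c, r, k in entries:
        matrix[row[c]][col[(r, k)]] = w
    return concepts, features, matrix

concepts, features, matrix = build_semantic_matrix(WCRK)
cdv_cobra = matrix[concepts.index("cobra")]
cdv_giraffe = matrix[concepts.index("giraffe")]
```

Each row of `matrix` plays the role of one CDV, and the full list of rows plays the role of the semantic matrix.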
Idea for semantic data acquisition
Play 20 questions with the Avatar!
● Think of an animal. The system tries to guess it by asking no more than 20 questions, each of which should be answered only with Yes or No.
● The answers given narrow the subspace of the most probable objects.
● The system learns from the games: it obtains new knowledge from interaction with human users.
Example game:
Is it vertebrate? Y
Is it mammal? Y
Does it have hoof? Y
Is it equine? N
Is it bovine? N
Does it have horn? N
Does it have long neck? Y
I guess it is giraffe.
Algorithm for 20 questions game
● p(keyword = v_i) is the fraction of concepts for which the keyword has value v_i.
● The subspace of candidate concepts O(A) is selected according to: O(A) = {i : d = |CDV_i - ANSW| is minimal}, where CDV_i is the vector for the i-th concept and ANSW is the partial vector of answers retrieved so far.
● We can deal with user mistakes by also accepting concepts with d greater than the minimum.
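A minimal sketch of one round of the game, under stated assumptions: the slide defines p(keyword = v_i) but the selection formula itself is not available here, so the sketch assumes the next keyword is the one with the highest entropy over the current candidates. The answer encoding (1 = yes, -1 = no), the L1 distance, the `slack` parameter, and the toy data are likewise hypothetical.

```python
import math

# Hypothetical toy semantic matrix: concept -> CDV over KEYWORDS.
# Encoding assumed here: 1 = yes, -1 = no.
KEYWORDS = ["vertebrate", "mammal", "hoof", "long neck"]
CDV = {
    "giraffe": [1, 1, 1, 1],
    "horse":   [1, 1, 1, -1],
    "cobra":   [1, -1, -1, -1],
    "bee":     [-1, -1, -1, -1],
}

def keyword_entropy(candidates, j):
    """Entropy of keyword j over the candidates, where p(keyword = v) is
    the fraction of candidate concepts for which the keyword has value v."""
    counts = {}
    for c in candidates:
        v = CDV[c][j]
        counts[v] = counts.get(v, 0) + 1
    n = len(candidates)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())

def next_question(candidates, asked):
    """ASSUMPTION: ask about the not-yet-asked keyword with maximal entropy."""
    open_js = [j for j in range(len(KEYWORDS)) if j not in asked]
    return max(open_js, key=lambda j: keyword_entropy(candidates, j))

def distance(cdv, answers):
    """d = |CDV_i - ANSW| over the answered keywords only (L1 norm assumed)."""
    return sum(abs(cdv[j] - a) for j, a in answers.items())

def candidate_subspace(answers, slack=0):
    """O(A): concepts whose distance to the answers is minimal.
    slack > 0 also keeps near-minimal concepts, to tolerate user mistakes."""
    d = {c: distance(v, answers) for c, v in CDV.items()}
    d_min = min(d.values())
    return sorted(c for c in d if d[c] <= d_min + slack)

first = next_question(list(CDV), asked=set())      # index into KEYWORDS
after_mammal = candidate_subspace({1: 1})          # user answered "mammal? Y"
second = next_question(after_mammal, asked={1})
final = candidate_subspace({1: 1, 3: 1})           # plus "long neck? Y"
```

Keeping the distance restricted to answered keywords lets the partial answer vector grow one question at a time, and a wrong answer only raises a concept's distance instead of eliminating it outright.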
Automatic data acquisition
Basic semantic data obtained by aggregating machine-readable dictionaries: WordNet, ConceptNet, SUMO Ontology
–Used relations for semantic category: animal
–Semantic space truncated using word popularity rank:
IC – information content: the number of appearances of a particular word in WordNet descriptions
GR – GoogleRank: the number of web pages returned by the Google search engine for a given word
BNC – word statistics taken from the British National Corpus
● Initial semantic space reduced to 94 objects and 72 features
Human interaction knowledge acquisition
Data obtained from machine-readable dictionaries:
–Not complete
–Not common sense
–Sometimes specialised concepts
–Some errors
Knowledge correction in the semantic space is defined in terms of:
W_0 – initial weight, initial knowledge (from dictionaries)
ANS – answer given by user
N – number of answers
β – parameter indicating the importance of initial knowledge
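The correction formula itself is not available in this text, so the following is only an illustrative sketch built from the listed symbols. It assumes a β-weighted average of the dictionary weight and the user answers, with answers encoded as +1 (yes) and -1 (no); both the formula and the name `corrected_weight` are assumptions, not the authors' definition.

```python
def corrected_weight(w0, answers, beta):
    """Blend the initial dictionary weight w0 with user answers.

    ASSUMPTION: a beta-weighted average,
        W = (beta * w0 + sum(ANS)) / (beta + N),
    which is an illustrative form and not necessarily the slide's formula.
    answers: user answers encoded as +1 (yes) or -1 (no); N = len(answers).
    beta: larger values make the initial knowledge harder to override.
    """
    n = len(answers)
    if beta + n <= 0:
        raise ValueError("beta + N must be positive")
    return (beta * w0 + sum(answers)) / (beta + n)

# With no answers, the dictionary weight is returned unchanged.
unchanged = corrected_weight(1.0, [], beta=2.0)
# Four users contradict the dictionary; the weight moves towards "no".
contradicted = corrected_weight(1.0, [-1, -1, -1, -1], beta=2.0)
```

Under this assumed form, β acts like a count of virtual answers that agree with the dictionary, so a weight is only overturned once enough real answers accumulate.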
Active Dialogues
Dialogues with the user for obtaining new knowledge:
● When the system fails to guess the object:
I give up. Tell me, what did you think of?
The concepts used in the game correct the semantic space.
● When two concepts have the same CDV:
Tell me what is characteristic for ?
The new keywords for the specified concepts are stored in the semantic memory.
● When the system needs more knowledge about some concept:
I don't have any particular knowledge about. Tell me more about.
The system obtains new keywords for the given concept.
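The three dialogue triggers above can be sketched as a simple selection routine. This is a hedged illustration: it assumes the open slot in each sentence template is filled with the concept name, and the function names, the `min_known` threshold, and the toy data are hypothetical.

```python
def duplicate_cdv_groups(cdv):
    """Group concepts that share an identical CDV and so cannot be told apart."""
    groups = {}
    for concept, vector in cdv.items():
        groups.setdefault(tuple(vector), []).append(concept)
    return [sorted(g) for g in groups.values() if len(g) > 1]

def sparse_concepts(cdv, min_known=2):
    """Concepts with fewer than min_known non-zero entries.
    The threshold is an assumption made for this sketch."""
    return sorted(
        c for c, v in cdv.items() if sum(1 for x in v if x != 0) < min_known
    )

def pick_prompts(cdv, guess_failed):
    """Return the template sentences the system would ask next.
    ASSUMPTION: the template slot is filled with the concept name."""
    prompts = []
    if guess_failed:
        prompts.append("I give up. Tell me, what did you think of?")
    for group in duplicate_cdv_groups(cdv):
        for c in group:
            prompts.append(f"Tell me what is characteristic for {c}?")
    for c in sparse_concepts(cdv):
        prompts.append(
            f"I don't have any particular knowledge about {c}. "
            f"Tell me more about {c}."
        )
    return prompts

# Hypothetical toy data: 1 = yes, -1 = no, 0 = unknown.
TOY_CDV = {
    "horse":   [1, 1, 1],
    "zebra":   [1, 1, 1],
    "cobra":   [1, -1, -1],
    "axolotl": [1, 0, 0],
}
prompts = pick_prompts(TOY_CDV, guess_failed=True)
```

Here horse and zebra trigger the "characteristic" question because their vectors are identical, while axolotl triggers the "tell me more" question because only one of its features is known.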