Concept Description Vectors and the 20 Questions Game
Włodzisław Duch, Tomasz Sarnatowicz, Julian Szymański
Semantic Memory
Permanent container for general knowledge
Hierarchical model (Collins & Quillian, 1969)
Semantic network (Collins & Loftus, 1975)
Semantic Space
All the concepts and keywords together create a semantic space, which can be written down as a semantic matrix.
Concept Description Vectors
CDV: a vector of properties describing a single concept.
Most elements are 0, so the vector is sparse.
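A sparse CDV can be sketched as a mapping that stores only the non-zero entries. The keyword names and the +1 / -1 value encoding below are assumptions made for illustration; the slides do not specify how values are coded.

```python
# Hypothetical keyword list; the real keyword set is not given in the slides.
KEYWORDS = ["is_animal", "has_wings", "can_fly", "has_fur", "is_vehicle", "has_wheels"]

def make_cdv(properties):
    """Build a sparse CDV: keep only non-zero entries (assumed coding: +1 yes, -1 no)."""
    unknown = set(properties) - set(KEYWORDS)
    if unknown:
        raise ValueError(f"unknown keywords: {sorted(unknown)}")
    return {k: v for k, v in properties.items() if v != 0}

def to_dense(cdv):
    """Expand a sparse CDV into a full vector, filling missing keywords with 0."""
    return [cdv.get(k, 0) for k in KEYWORDS]

sparrow = make_cdv({"is_animal": 1, "has_wings": 1, "can_fly": 1, "is_vehicle": -1})
print(to_dense(sparrow))  # [1, 1, 1, 0, -1, 0]
```

With thousands of keywords and only a handful of known properties per concept, the dictionary form stays small while the dense form is mostly zeros.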
Data Sources I
Machine-readable dictionaries and ontologies: WordNet, ConceptNet, SUMO/MILO ontology
Data Sources II
Retrieving data from dictionaries
On-line sources: Merriam-Webster, WordNet (glosses), MSN Encarta
Approach: word morphing, phrase extraction (with a POS tagger), statistical analysis
Data access
Binary dictionary search: 20 yes/no questions can distinguish 2^20 = 1,048,576 entries.
Binary search is not acceptable in complex semantic applications.
Instead: narrowing the concept space by subsequent queries.
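The arithmetic behind the 2^20 figure can be checked with a short sketch. The function names are hypothetical and only illustrate the counting argument for ideal halving questions.

```python
def reachable_entries(n_questions):
    """Entries distinguishable when each yes/no question halves the remaining set."""
    return 2 ** n_questions

def questions_needed(n_entries):
    """Smallest number of ideal yes/no questions that singles out one of n_entries."""
    if n_entries < 1:
        raise ValueError("n_entries must be positive")
    # For n >= 1, (n - 1).bit_length() equals ceil(log2(n)) in exact integer arithmetic.
    return (n_entries - 1).bit_length()

print(reachable_entries(20))        # 1048576
print(questions_needed(1_048_576))  # 20
```

This bound assumes every question splits the remaining entries exactly in half, which holds for a sorted dictionary but rarely for questions about the meaning of a concept.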
20 Questions Game Algorithm
p(keyword = v_i) is the fraction of concepts for which the keyword has value v_i.
Candidate concepts O(A) are selected according to:
O(A) = { i : |CDV_i - A| is minimal }
where CDV_i is the vector for concept i and A is the partial vector of answers retrieved so far.
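A minimal sketch of one way to play the game with these definitions follows. Several parts are assumptions not stated in the slides: the toy knowledge base, the +1 / -1 answer coding, the L1 distance restricted to answered keywords, and the choice of the next keyword by maximal entropy of p(keyword = v_i) over the current candidates.

```python
import math
from collections import Counter

# Hypothetical toy knowledge base: concept -> sparse CDV (missing keyword = 0).
CDVS = {
    "sparrow": {"is_animal": 1, "can_fly": 1, "has_fur": -1},
    "bat": {"is_animal": 1, "can_fly": 1, "has_fur": 1},
    "dog": {"is_animal": 1, "can_fly": -1, "has_fur": 1},
    "plane": {"is_animal": -1, "can_fly": 1, "has_fur": -1},
}
KEYWORDS = ["is_animal", "can_fly", "has_fur"]

def distance(cdv, answers):
    """|CDV_i - A| over answered keywords only (L1 norm, an assumption)."""
    return sum(abs(cdv.get(k, 0) - v) for k, v in answers.items())

def candidates(answers):
    """O(A): concepts whose CDV is at minimal distance from the partial answers A."""
    dists = {name: distance(cdv, answers) for name, cdv in CDVS.items()}
    best = min(dists.values())
    return sorted(name for name, d in dists.items() if d == best)

def keyword_entropy(keyword, names):
    """Entropy of p(keyword = v_i), with p taken as a fraction of the given concepts."""
    counts = Counter(CDVS[n].get(keyword, 0) for n in names)
    total = len(names)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def next_keyword(answers):
    """Pick the unasked keyword that splits the current candidates most evenly."""
    unasked = [k for k in KEYWORDS if k not in answers]
    if not unasked:
        return None
    names = candidates(answers)
    return max(unasked, key=lambda k: keyword_entropy(k, names))

def play(oracle, max_questions=20):
    """Ask up to max_questions; oracle(keyword) returns the player's answer."""
    answers = {}
    for _ in range(max_questions):
        if len(candidates(answers)) == 1:
            break
        keyword = next_keyword(answers)
        if keyword is None:
            break
        answers[keyword] = oracle(keyword)
    return candidates(answers)

print(play(lambda k: CDVS["bat"].get(k, 0)))  # ['bat']
```

Because candidates are chosen by minimal distance, not by exact match, a single wrong or uncertain answer increases a concept's distance without eliminating it outright.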
Word puzzles
The 20Q game reversed
Concept: known
Keywords: the ones that would lead to the concept
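The reversed game can be sketched as picking, for a known concept, keywords that separate it from all other concepts. The greedy selection rule and the toy data below are assumptions for illustration; the slides do not say how the keywords are chosen.

```python
# Hypothetical toy knowledge base: concept -> sparse CDV (missing keyword = 0).
CDVS = {
    "sparrow": {"is_animal": 1, "can_fly": 1, "has_fur": -1},
    "bat": {"is_animal": 1, "can_fly": 1, "has_fur": 1},
    "dog": {"is_animal": 1, "can_fly": -1, "has_fur": 1},
    "plane": {"is_animal": -1, "can_fly": 1, "has_fur": -1},
}

def describing_keywords(target, cdvs=CDVS):
    """Greedily pick keywords of `target` until no other concept shares them all.

    Returns (clues, rivals): the chosen (keyword, value) pairs and any concepts
    that the clues still cannot tell apart from the target.
    """
    rivals = set(cdvs) - {target}
    remaining = dict(cdvs[target])
    clues = []
    while rivals and remaining:
        # Prefer the keyword whose value rules out the most remaining rivals;
        # sorting the keys makes tie-breaking deterministic.
        best = max(
            sorted(remaining),
            key=lambda k: sum(cdvs[r].get(k, 0) != remaining[k] for r in rivals),
        )
        value = remaining.pop(best)
        eliminated = {r for r in rivals if cdvs[r].get(best, 0) != value}
        if not eliminated:
            break  # no remaining keyword separates the target any further
        clues.append((best, value))
        rivals -= eliminated
    return clues, sorted(rivals)

print(describing_keywords("bat"))  # ([('has_fur', 1), ('can_fly', 1)], [])
```

An empty second element means the clues identify the concept uniquely within this toy knowledge base.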