
1 Types of memory Neurocognitive approach to NLP: at least 4 types of memories. Long term (LTM): recognition, semantic, episodic + working memory. Input (text, speech) is pre-processed using the recognition memory model to correct spelling errors, expand acronyms etc. For dialogue/text understanding episodic memory models are needed. Working memory: an active subset of semantic/episodic memory. All three LTM types are mutually coupled, providing context for recognition. Semantic memory is a permanent storage of conceptual data. "Permanent": data is collected throughout the whole lifetime of the system; old information is overridden/corrected by newer input. "Conceptual": contains semantic relations between words and uses them to create concept definitions.

2 Semantic Memory Models Endel Tulving, "Episodic and Semantic Memory", 1972. Semantic memory refers to the memory of meanings and understandings. It stores concept-based, generic, context-free knowledge; a permanent container for general knowledge (facts, ideas, words etc). Semantic network model (Collins & Loftus, 1975); hierarchical model (Collins & Quillian, 1969).

3 Semantic memory Hierarchical model of semantic memory (Collins and Quillian, 1969), followed by most ontologies. Connectionist spreading activation model (Collins and Loftus, 1975), with mostly lateral connections. Our implementation is based on the connectionist model; it uses a relational database and an object access layer API. The database stores three types of data: concepts, or objects being described; keywords (features of concepts extracted from data sources); relations between them. The IS-A relation is used to build the ontology tree, serving for activation spreading, i.e. feature inheritance down the ontology tree. Types of relations (like "x IS y", or "x CAN DO y" etc.) may be defined when input data is read from dictionaries and ontologies.
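The slides do not show the object access layer itself, so the following is a minimal sketch, with illustrative names, of how IS-A links can propagate features down the ontology tree, attenuating certainty at each step (the 10% decrease mentioned on slide 10):

```python
# Sketch of IS-A feature inheritance down the ontology tree (illustrative names).
class Concept:
    def __init__(self, name, features=None, parent=None):
        self.name = name
        self.features = dict(features or {})  # keyword -> certainty weight
        self.parent = parent                  # IS-A link up the ontology tree

    def inherited_features(self, decay=0.9):
        """Own features plus ancestor features, attenuated at each IS-A step."""
        merged = dict(self.features)
        node, factor = self.parent, decay
        while node is not None:
            for kw, w in node.features.items():
                merged.setdefault(kw, w * factor)  # closer definitions take precedence
            node, factor = node.parent, factor * decay
        return merged

animal = Concept("animal", {"cell": 1.0, "body part": 1.0})
reptile = Concept("reptile", {"scales": 1.0}, parent=animal)
cobra = Concept("cobra", {"venom": 1.0}, parent=reptile)
print(cobra.inherited_features())  # venom (own); scales, cell, body part (inherited)
```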

4 SM & neural distances Activations of groups of neurons presented in activation space define similarity relations in geometrical model (McClleland, McNaughton, O’Reilly, Why there are complementary learning systems, 1994).

5 Similarity between concepts Left: MDS on vectors from a neural network. Right: MDS on data from psychological experiments with perceived similarity between animals. Vector and probabilistic models are approximations to this process: S_ij ~ ⟨φ(w_i, Cont) | φ(w_j, Cont)⟩, the inner product of the activations evoked by the two words in a given context.
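Maps like these are straightforward to reproduce from any similarity matrix; a minimal sketch assuming scikit-learn, with a made-up 3x3 matrix standing in for the measured similarities:

```python
import numpy as np
from sklearn.manifold import MDS

# S_ij: similarity matrix (e.g. inner products of activation vectors); MDS needs dissimilarities.
S = np.array([[1.0, 0.8, 0.2],
              [0.8, 1.0, 0.3],
              [0.2, 0.3, 1.0]])
D = 1.0 - S  # simple similarity-to-dissimilarity conversion
coords = MDS(n_components=2, dissimilarity="precomputed", random_state=0).fit_transform(D)
print(coords)  # 2D coordinates preserving pairwise distances as well as possible
```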

6 Creating SM The API serves as a data access layer providing logical operations between raw data and higher application layers. Data stored in the database is mapped into application objects and the API allows for retrieving specific concepts/keywords. Two major types of data sources for semantic memory: 1. machine-readable structured dictionaries directly convertible into semantic memory data structures; 2. blocks of text, definitions of concepts from dictionaries/encyclopedias. Three machine-readable data sources are used: the Suggested Upper Merged Ontology (SUMO) and the MId-Level Ontology (MILO), over 20,000 terms and 60,000 axioms; the WordNet lexicon, more than 200,000 word-sense pairs; ConceptNet, a concise knowledgebase with 200,000 assertions.

7 Creating SM – free text WordNet hypernymic (a kind of ...) IS-A relation + hyponym and meronym relations between synsets (converted into concept/concept relations), combined with ConceptNet relations such as: CapableOf, PropertyOf, PartOf, MadeOf... Relations are added only if present in both WordNet and ConceptNet. Free-text data: Merriam-Webster, WordNet and Tiscali. Whole word definitions are stored in SM, linked to concepts. A set of the most characteristic words is extracted from the definitions of a given concept: for each concept definition one set of words per source dictionary is used, words are replaced with synset words, and the subset common to all 3 sources is mapped back to synsets – these are most likely related to the initial concept. They are stored as a separate relation type. Articles and prepositions are removed using a manually created stop-word list. Phrases are extracted using the Apple Pie Parser; concept-phrase relations are compared with concept-keyword relations, and only phrases that matched keywords are used.
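A sketch of this extraction step follows; STOP_WORDS and to_synset are placeholder stand-ins for the manually created stop-word list and the WordNet synset mapping:

```python
# Characteristic words for a concept: words common to all source definitions,
# after stop-word removal and synset normalization.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "to", "is", "are"}  # illustrative subset

def characteristic_words(definitions, to_synset):
    """definitions: {source_name: definition_text}; to_synset: word -> synset id."""
    word_sets = []
    for text in definitions.values():
        words = {w.lower().strip(".,;") for w in text.split()}
        words -= STOP_WORDS
        word_sets.append({to_synset.get(w, w) for w in words})  # replace with synset words
    return set.intersection(*word_sets)  # subset common to all sources
```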

8 Semantic knowledge representation vwCRK: certainty – truth – Concept Relation Keyword. Similar to RDF in the semantic web. Example, Cobra:
is_a: animal, beast, being, brute, creature, fauna, organism, reptile, serpent, snake, vertebrate
has: belly, body part, cell, chest, costa
Simplest representation for massive evaluation/association: CDV – Concept Description Vectors, forming a Semantic Matrix.

9 Concept Description Vectors Drastic simplification: for some applications SM is used in a more efficient way with a vector-based knowledge representation. Merging all types of relations into the most general one, "x IS RELATED TO y", defines a vector (semantic) space. {Concept, relations} => Concept Description Vector, CDV. A binary vector shows which properties are related to, or make sense for, a given concept (not the same as a context vector). Semantic memory => CDV matrix, very sparse, allowing easy storage of large amounts of semantic data. Search engines: {keywords} => concept descriptions (Web pages). CDVs enable efficient implementation of reversed queries: find unique subsets of properties for a given concept or a class of concepts (= a concept higher in the ontology). What are the unique features of a sparrow? Proteoglycan? Neutrino?
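A minimal sketch of such a reversed query over binary CDVs (toy data, not the real semantic matrix):

```python
# Reversed query: which features single a concept out within its ontology class?
def unique_features(cdv, concept, within):
    """cdv: {concept: set of features}; within: concepts of the same class."""
    others = set().union(*(cdv[c] for c in within if c != concept))
    return cdv[concept] - others

cdv = {
    "sparrow": {"wings", "feathers", "small", "brown"},
    "eagle":   {"wings", "feathers", "large", "predator"},
    "ostrich": {"wings", "feathers", "large", "flightless"},
}
print(unique_features(cdv, "sparrow", cdv))  # {'small', 'brown'}
```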

10 Relations IS_A: specific objects inherit features from more general objects. Features inherited with weight w from superior relations; certainty v decreased by 10% and corrected during interaction with the user. Similar: defines objects which share features with each other; new knowledge is acquired from similar objects by swapping unknown features with given certainty factors. Excludes: exchanges some unknown features, but reverses the sign of the w weights. Entail: analogous to logical implication; one feature automatically entails a few more features (connected via the entail relation). An atom of knowledge contains the strength and direction of relations between concepts and keywords, coming from 3 components: directly entered into the knowledge base; deduced using predefined relation types from stored information; obtained during the system's interaction with the human user.
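How the four relation types act on an atom of knowledge can be summarized in a small sketch; the IS_A attenuation follows the 10% rule above, while the factors for similar/excludes are illustrative guesses, since the slide only states that features are swapped "with given certainty factors":

```python
# Sketch: effect of relation types on a (weight w, certainty v) pair.
def apply_relation(kind, w, v):
    if kind == "IS_A":       # feature inherited from a superior concept
        return w, v * 0.9    # certainty decreased by 10%
    if kind == "similar":    # unknown feature swapped in from a similar object
        return w, v * 0.5    # illustrative certainty factor
    if kind == "excludes":   # like similar, but the sign of w is reversed
        return -w, v * 0.5
    if kind == "entail":     # entailed features carry over directly
        return w, v
    raise ValueError(kind)
```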

11 Word games Word games were popular long before computers... They have been very useful for developing analytical thinking skills. Until recently there were very few computer word games. The 20 questions game may be the next great AI challenge; it is more realistic than the unrestricted Turing test. Chess is too simple – computers compute fast, so they win. A machine that guesses what I am thinking of must be intelligent... Finding good questions requires knowledge and creativity. The game allows testing of semantic and episodic memory models under realistic conditions. Other applications: identification of objects from their descriptions, refining queries for Internet search engines, etc. Large-scale semantic memory with millions of concepts is needed: ontologies, dictionaries (WordNet), encyclopedias, MindNet (Microsoft), collaborative projects, e.g. ConceptNet (MIT)... whatever is available. Still not enough... an example of the 20 questions game.

12 20Q The goal of the 20 questions game is to guess a concept that the opponent has in mind by asking appropriate questions. www.20q.net has a version that is now implemented in some toys! Based on a concepts x questions table T(C,Q) = usefulness of Q for C. T(C,Q) values are learned, increasing after successful games and decreasing after lost games. Guess: distance-based. SM does not assume fixed questions. Use of CDV admits only the simplest forms, "Is it related to X?" or "Can it be associated with X?", where X is a concept stored in the SM. The system needs only to select a concept, not to build the whole question. Once the keyword has been selected it is possible to use the full power of semantic memory to analyze the type of relations and ask more sophisticated questions. How is the concept selected?
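The actual guess is distance-based and driven by the learned T(C,Q) values; as a simpler illustration of keyword selection (not the learned criterion), one can pick the keyword that splits the remaining candidates most evenly, so each Yes/No answer removes close to half of them:

```python
# Sketch: greedy keyword selection for "Is it related to X?" questions.
def select_keyword(candidates, cdv):
    """candidates: concepts still consistent with answers; cdv: {concept: set of keywords}."""
    keywords = set().union(*(cdv[c] for c in candidates))
    def imbalance(kw):
        yes = sum(1 for c in candidates if kw in cdv[c])
        return abs(yes - len(candidates) / 2)  # 0 = perfect half/half split
    return min(keywords, key=imbalance)

def filter_candidates(candidates, cdv, kw, answer_yes):
    """Keep only concepts consistent with the Yes/No answer about keyword kw."""
    return {c for c in candidates if (kw in cdv[c]) == answer_yes}
```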

13 20q for semantic data acquisition Play 20 questions with an Avatar! http://diodor.eti.pg.gda.pl Think of an animal – the system tries to guess it, asking no more than 20 questions that should be answered only with Yes or No. The given answers narrow the subspace of the most probable objects. The system learns from the games – it obtains new knowledge from interaction with the human users. Example game: Is it a vertebrate? Y. Is it a mammal? Y. Does it have hooves? Y. Is it equine? N. Is it bovine? N. Does it have horns? N. Does it have a long neck? Y. I guess it is a giraffe.

14 20 Q web

15 Active Dialogues Dialogues with the user for obtaining new knowledge/features: When the system fails to guess the object – "I give up. Tell me what you were thinking of." The concept used in the game corrects the semantic space. When two concepts have the same CDV – "Tell me what is characteristic for <concept>?" The new keywords for the specified concept are stored in semantic memory. When the system needs more knowledge about a concept – "I don't have any particular knowledge about <concept>. Tell me more about it." The system obtains new keywords for the given concept.

16 Experiments in the animal domain WordNet, ConceptNet, the SUMO/MILO ontologies + the MindNet project as knowledge sources; a relation is added to SM only if it appears in at least 2 sources. Basic space: 172 objects, 475 features, 5031 relations. CDV density = number of features per concept. Initial CDV density = 29; adding IS_A relations = 41; adding similar, entails, excludes = 46. Quality Q = N_S/N = #searches with success / #all searches. Error E = 1 − Q = 1 − N_S/N. For 10 concepts selected with #features close to the average: Q ~ 0.8; after 5 repetitions E ~ 18%, so some learning is needed.

17 Learning from games Select O randomly with preference for a larger number of features, p ~ exp(-N(O)/N), where N(O) = #features in O and N = total number of features. Learning procedure: the CDV(O) representation of the chosen concept O is inspected and, if necessary, corrected; CDV(O) is removed from the memory; the system then tries to re-learn the concept O by playing the 20 questions game. Average results for 5 test objects as a function of the number of games are shown [graph: average growth of the number of features with the number of games played]. Randomization of questions helps to find different features in each game. Average number of games needed to learn the selected concepts: N_f = 2.7. After the first successful game, once a particular concept has been correctly recognized, it was always found properly. After 4 games only a few new features are added.
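A sketch of the concept selection step; note that the formula as written, p ~ exp(-N(O)/N), actually favours concepts with fewer features, so the sign in the exponent would need flipping to prefer feature-rich concepts as the slide intends:

```python
import math
import random

# Sketch: sample a test concept O with probability p(O) ~ exp(-N(O)/N).
def select_concept(feature_counts):
    """feature_counts: {concept O: N(O), its number of features}."""
    total = sum(feature_counts.values())  # N = total number of features
    concepts = list(feature_counts)
    weights = [math.exp(-feature_counts[o] / total) for o in concepts]
    return random.choices(concepts, weights=weights, k=1)[0]
```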

18 Medical applications: goals & questions Can we capture an expert's intuition, evaluating document similarity and finding its category? How to include a priori knowledge in document categorization – especially important for rare diseases. Provide unambiguous annotation of all concepts. Acronym/abbreviation expansion and disambiguation. How to make inferences from the information in the text, assigning values to concepts (true, possible, unlikely, false). How to deal with negative knowledge (not found, not consistent with ...). Automatic creation of medical billing codes from text. Semantic search support, better specification of queries. Question/answer systems. Integration of text analysis with molecular medicine. Provide support for billing, knowledge discovery, and dialog systems.

19 Example of a clinical discharge summary Jane is a 13yo WF who presented with CF bronchopneumonia. She has noticed increasing cough, greenish sputum production, and fatigue since prior to 12/8/03. She had 2 febrile episodes, but denied any nausea, vomiting, diarrhea, or change in appetite. Upon admission she had no history of diabetic or liver complications. Her FEV1 was 73% on 12/8 and she was treated with 2 z-paks, and on 12/29 FEV1 was 72%, at which time she was started on Cipro. She noted no clinical improvement and was admitted for a 2 week IV treatment of Tobramycin and Meropenem.

20 Unified Medical Language System (UMLS) semantic types
"Virus" causes "Disease or Syndrome" – a semantic relation.
Other relations: "interacts with", "contains", "consists of", "result of", "related to", ...
Other types: "Body Location or Region", "Injury or Poisoning", "Diagnostic Procedure", ...

21 UMLS – Example (keyword: "virus")
Metathesaurus: Concept: Virus, CUI: C0042776, Semantic Type: Virus. Definition (1 of 3): Group of minute infectious agents characterized by a lack of independent metabolism and by the ability to replicate only within living host cells; have capsid, may have DNA or RNA (not both). (CRISP Thesaurus) Synonyms: Virus, Vira, Viridae.
Semantic Network: "Virus" causes "Disease or Syndrome"

22 Summary discharge test data

Disease name | No. of records | Avg. record size [bytes] | Reference data size [bytes]
Asthma | 865 | 1282 | 36720
UTI | 586 | 1375 | 9906
Gastroenteritis | 493 | 1420 | 32416
Otitis media | 177 | 1597 | 35348
Cerebral palsy | 283 | 1790 | 7958
Cystic fibrosis | 41 | 1816 | 27024
JRA | 298 | 1587 | 13430
Anemia | 544 | 2849 | 14282
Epilepsy | 638 | 1598 | 19418
Pneumonia | 609 | 1451 | 23583

JRA – Juvenile Rheumatoid Arthritis; UTI – Urinary tract infection.

23 Data processing/preparation Reference texts → MMTx → UMLS concepts (feature prototypes). Filtering: focus on 26 semantic types; features are UMLS concept IDs. Clinical documents → MMTx → filtering using the existing feature space → final data (UMLS concepts). MMTx discovers UMLS concepts in text.
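The filtering step reduces to keeping concepts whose semantic type is among the 26 selected ones; a sketch, assuming MMTx output has already been reduced to (CUI, semantic type) pairs, with illustrative type names:

```python
# Keep only UMLS concepts belonging to the selected semantic types.
SELECTED_TYPES = {"Disease or Syndrome", "Diagnostic Procedure", "Body Location or Region"}

def filter_concepts(concepts):
    """concepts: iterable of (cui, semantic_type) pairs discovered by MMTx."""
    return [cui for cui, stype in concepts if stype in SELECTED_TYPES]
```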

24 Semantic types used Values indicate the actual numbers of concepts found in: I – clinical texts, II – reference texts.

25 Data statistics General: 10 classes, 4534 vectors, 807 features (out of 1097 found in reference texts). Baseline: majority class 19.1% (asthma); content-based 34.6% (frequency of the class name in text). Remarks: very sparse vectors; feature values represent term frequency (tf), i.e. the number of occurrences of a particular concept in the text.
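Building such term-frequency vectors is a one-liner; a minimal sketch:

```python
from collections import Counter

# Feature values = tf of UMLS concepts in a document; stored sparsely.
def tf_vector(concept_ids):
    """concept_ids: list of concept CUIs discovered in one document."""
    return Counter(concept_ids)  # sparse mapping CUI -> occurrence count
```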

26 Model of similarity I Try to capture some intuitions combining evidence while scanning the text:
1. The initial distance between document D and the reference vector R_k should be proportional to d_0k = ||D − R_k|| · (1/p(C_k) − 1), so rarer classes start farther away.
2. If a term i appears in R_k with frequency R_ik > 0 but does not appear in D, the distance d(D, R_k) should increase by δ_ik = a_1 R_ik.
3. If a term i does not appear in R_k but has non-zero frequency D_i, the distance d(D, R_k) should increase by δ_ik = a_2 D_i.
4. If a term i appears with frequency R_ik > D_i > 0 in both vectors, the distance d(D, R_k) should decrease by δ_ik = −a_3 D_i.
5. If a term i appears with frequency 0 < R_ik ≤ D_i in both vectors, the distance d(D, R_k) should decrease by δ_ik = −a_4 R_ik.
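The five rules translate directly into a scan over the union of terms; a minimal sketch, with a1...a4 as the adaptive parameters:

```python
# Sketch of similarity model I: adjust d(D, R_k) term by term.
def distance(D, R_k, d0, a1, a2, a3, a4):
    """D, R_k: term -> frequency dicts; d0: initial distance for class k."""
    d = d0
    for i in set(D) | set(R_k):
        Di, Rik = D.get(i, 0.0), R_k.get(i, 0.0)
        if Rik > 0 and Di == 0:
            d += a1 * Rik   # reference expects term i, document lacks it
        elif Rik == 0 and Di > 0:
            d += a2 * Di    # document has a term the reference lacks
        elif Rik > Di:
            d -= a3 * Di    # shared term, reference more frequent
        else:               # 0 < Rik <= Di
            d -= a4 * Rik   # shared term, document at least as frequent
    return d
```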

27 Model of Similarity II Given the document D, a reference vector R_k and the probability p(i|C_k), the probability that the class of D is C_k should be proportional to the resulting similarity [formula shown on slide], where δ_ik depends on the adaptive parameters a_1, ..., a_4, which may be specific for each class, subject to the constraints given on the slide. A linear programming technique can be used to estimate the a_i by maximizing the similarity between documents and reference vectors [objective shown on slide], where k indicates the correct class.

28 Results 10-fold crossvalidation accuracies in % for different feature weightings (M0: tf frequencies; M1: binary data; the SVM column gives the optimal C in parentheses):

Weighting | MLP (300 neur.) | SVM (optimal C) | SSV dec. tree | 10 Ref. vectors
M0 | 39.5 | 66.0 (1.0) | 59.3 | 71.6
M1 | 40.6 | 56.5 (0.1) | 60.4 | -
M2 | 31.0 | 60.7 (0.1) | 60.9 | 71.4
M3 | 39.5 | 63.2 (0.1) | 60.5 | 71.3
M4 | 39.5 | 72.3 (0.01) | 59.8 | 70.7
M5 | 42.3 | 71.0 (0.01) | 60.0 | 70.1

kNN: 48.9, 50.2, 51.0, 51.4, 49.5 (five values recovered from the source; their column alignment is not fully recoverable).

29 Enhancing representations A priori knowledge in the form of reference prototypes is not sufficient. Experts reading the text activate their semantic memory and add a lot of knowledge that is not explicitly present in the text. Semantic memory is difficult to create: co-occurrence statistics do not capture structural relations of real objects and features. A better approximation (not as good as SM): use ontologies, adding parent concepts to those discovered in the text. Example: IBD => [C0021390] Inflammatory Bowel Diseases =>
-> [C0341268] Disorder of small intestine
-> [C0012242] Digestive System Disorders
-> [C1290888] Inflammatory disorder of digestive tract
-> [C1334233] Intestinal Precancerous Condition
-> [C0851956] Gastrointestinal inflammatory disorders NEC
-> [C1285331] Inflammation of specific body organs
-> [C0021831] Intestinal Diseases
-> [C0178283] [X]Non-infective enteritis and colitis
[C0025677] Methotrexate (Pharmacologic Substance) =>
-> [C0003191] Antirheumatic Agents
-> [C1534649] Analgesic/antipyretic/antirheumatic

30 Enhancing representations Experts reading the text activate their semantic memory and add a lot of knowledge that is not explicitly present in the text. Co-occurrence statistics do not capture structural relations of real objects and features; systematic knowledge is needed. An approximation (not as good as SM): use ontologies, adding related concepts (using parent & other relations) to those discovered in the text. Example: IBD => [C0021390] Inflammatory Bowel Diseases =>
-> [C0341268] Disorder of small intestine
-> [C0012242] Digestive System Disorders
-> [C1290888] Inflammatory disorder of digestive tract
-> [C1334233] Intestinal Precancerous Condition
-> [C0851956] Gastrointestinal inflammatory disorders NEC
-> [C1285331] Inflammation of specific body organs
-> [C0021831] Intestinal Diseases
[C0025677] Methotrexate (Pharmacologic Substance) =>
-> [C0003191] Antirheumatic Agents
-> [C1534649] Analgesic/antipyretic/antirheumatic
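A sketch of one enhancement step; PARENTS holds illustrative relations taken from the IBD example above, not a real UMLS extract:

```python
# Enhance a document's concept set with related (parent) concepts; one call per step.
PARENTS = {
    "C0021390": ["C0341268", "C0012242", "C1290888", "C0021831"],  # Inflammatory Bowel Diseases
    "C0025677": ["C0003191", "C1534649"],                          # Methotrexate
}

def enhance(concepts, relations=PARENTS):
    enhanced = set(concepts)
    for cui in concepts:
        enhanced.update(relations.get(cui, []))
    return enhanced

step1 = enhance({"C0021390", "C0025677"})
step2 = enhance(step1)  # two steps can create feedback loops A <-> B between concepts
```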

31 Enhancing representations MDS for original data; MDS on medical discharge summaries after two enhancement steps.

32 Clusterization on enhanced data MDS mapping of 4534 documents divided into 10 classes, using cosine distances. 1. Direct, binarized vectors. 2. Enhanced by all semantic types, one step (parents only). 3. Enhanced by selected semantic types, one step. 4. Enhanced by selected semantic types, two steps.

33 Clusterization on enhanced data MDS mapping of 4534 documents divided into 10 classes, using cosine distances. 1. Initial representation, 807 features. 2. Enhanced by 26 selected semantic types, two steps; 2237 concepts with CC > 0.02 for at least one class. Two steps create feedback loops A ↔ B between concepts. Structure appears... is it interesting to experts? Are these specific subtypes (clinotypes)?

34 IV INTERNATIONAL INTERDISCIPLINARY CONFERENCE: BODY, PERCEPTION AND AWARENESS – Motor and multimodal perspectives. The conference will be held in Torun, Poland, on September 7-9, 2009. Details about the conference, its program, key topics, invited experts, accommodation, and other info at http://kognitywistyka.net/~bpa/

