Knowledge Representation for Natural Language Understanding Chengqing ZONG Institute of Automation, Chinese Academy of Sciences cqzong@nlpr.ia.ac.cn
Outline: CASIA and NLPR; Introduction; Some Linguistic Knowledge Bases; Approaches to NLU; Proposal NLPR, CAS-IA 2019/4/15
CASIA: Institute of Automation (IA), Chinese Academy of Sciences (CAS). Founded in 1956.
Personnel: faculty members: 320, including 38 full-time professors; post-doc research fellows: 30; students (Ph.D. and MSc): 600; visiting researchers: 40+.
NLPR: National Laboratory of Pattern Recognition. Staff: 29; Ph.D. candidates: 140; MSc students: 120; post-docs: 7.
NLPR organization: Directors; Academic Committee; Management Committee; General Office; Visual Information Processing Group; Biometric Information Processing Group; Pattern Recognition and its Cognitive Mechanisms Group; Speech and Language Technology Group. (June 20, 2003)
1. Introduction Natural language understanding is a typical knowledge-processing task. [Figure: text or speech → processor, drawing on a knowledge base (K.B.) → text or speech]
1. Introduction Different tasks and different approaches require different knowledge representations. e.g., for document summarization or information extraction, knowledge for discourse analysis and topic understanding is necessary. [Figure labels: Title, Time]
1. Introduction For machine translation (MT), knowledge for analyzing and translating sentences is necessary. e.g., I saw a man with a telescope. [Figure: partial parse trees with NP, Det, NN and PP nodes ……] Rule-based MT: I saw [a man with a telescope]. / I [saw a man] with a telescope. Statistical MT: 我用望远镜看见一个男孩。 ("I saw a boy using a telescope.") / 我看见一个带望远镜的男孩。 ("I saw a boy who had a telescope.")
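To make the attachment ambiguity concrete, here is a minimal sketch of a CKY-style chart parser that counts parse trees instead of building them. The grammar and lexicon (BINARY_RULES, LEXICON) and the function count_parses are hypothetical, written only for this one sentence; they are not the grammar of any particular rule-based MT system.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form (binary rules only).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),  # PP attaches to the verb phrase: I [saw a man] with a telescope
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),  # PP attaches to the noun phrase: I saw [a man with a telescope]
    ("PP", "P", "NP"),
]

# Hypothetical lexicon mapping each word to its category.
LEXICON = {
    "I": ["NP"],
    "saw": ["V"],
    "a": ["Det"],
    "man": ["N"],
    "with": ["P"],
    "telescope": ["N"],
}


def count_parses(words, start="S"):
    """Count distinct parse trees using a CKY chart that stores counts per category."""
    n = len(words)
    # chart[i][j][cat] = number of ways category cat can span words[i:j]
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for cat in LEXICON.get(word, []):
            chart[i][i + 1][cat] += 1
    # Fill longer spans from shorter ones, trying every split point k.
    for length in range(2, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for parent, left, right in BINARY_RULES:
                    left_count = chart[i][k].get(left, 0)
                    right_count = chart[k][j].get(right, 0)
                    if left_count and right_count:
                        chart[i][j][parent] += left_count * right_count
    return chart[0][n].get(start, 0)


print(count_parses("I saw a man with a telescope".split()))  # 2
```

Both attachments of "with a telescope" survive purely syntactic analysis, which is why choosing between the two bracketings (or the two Chinese translations) needs further knowledge, such as semantics or context.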
1. Introduction Questions: What about the current linguistic K. B.s? Is the algorithm designed according to the K. B., or is the representation designed for the algorithm?
2. Some Linguistic K. B. 2.1 WordNet (http://wordnet.princeton.edu) Three basic preconditions: the separability hypothesis, the patterning hypothesis, and the comprehensiveness hypothesis. The synset (a set of synonyms) is the basic building block. Relationships: synonymy / antonymy / hypernymy / hyponymy / meronymy / entailment.
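The synset-plus-relations design can be sketched as a small graph. This is a toy model under stated assumptions, not WordNet's actual data format or the API of any WordNet library: the Synset class, the helpers link_hypernym and hypernym_path, and the example synsets are all hypothetical, and the names only imitate WordNet's "lemma.pos.nn" style.

```python
from dataclasses import dataclass, field


# eq=False keeps identity-based equality, so cyclic links (hypernym/hyponym) never recurse.
@dataclass(eq=False)
class Synset:
    """A set of synonymous lemmas that together express one concept (toy model)."""
    name: str
    lemmas: list[str]
    # Relation name (e.g. "hypernym", "part_meronym") -> related synsets.
    relations: dict[str, list["Synset"]] = field(default_factory=dict)

    def add(self, relation: str, other: "Synset") -> None:
        self.relations.setdefault(relation, []).append(other)

    def related(self, relation: str) -> list["Synset"]:
        return self.relations.get(relation, [])


def link_hypernym(child: Synset, parent: Synset) -> None:
    """Record hypernymy and its inverse, hyponymy, in one step."""
    child.add("hypernym", parent)
    parent.add("hyponym", child)


def hypernym_path(synset: Synset) -> list[str]:
    """Follow the first hypernym link upward until a root synset is reached."""
    path = [synset.name]
    current = synset
    while current.related("hypernym"):
        current = current.related("hypernym")[0]
        path.append(current.name)
    return path


# Hypothetical miniature network.
animal = Synset("animal.n.01", ["animal", "beast"])
mammal = Synset("mammal.n.01", ["mammal"])
dog = Synset("dog.n.01", ["dog", "domestic dog"])
link_hypernym(mammal, animal)
link_hypernym(dog, mammal)

car = Synset("car.n.01", ["car", "automobile"])
wheel = Synset("wheel.n.01", ["wheel"])
car.add("part_meronym", wheel)

print(hypernym_path(dog))  # ['dog.n.01', 'mammal.n.01', 'animal.n.01']
print([s.name for s in car.related("part_meronym")])  # ['wheel.n.01']
```

Storing each relation as a typed edge between synsets, not between words, is what lets one concept carry several lemmas while the relations stay attached to the concept.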
2. Some Linguistic K. B. 2.2 HowNet (http://www.keenage.com) Knowledge, specifically the form of knowledge that is computer-operable, is a system encompassing the varied relations among concepts as well as those among the attributes of concepts. As one acquires more concepts, or rather, captures more relations among concepts along with the links between the attributes attached to the concepts, one simply becomes more knowledgeable. When creating a knowledge base, a common-sense knowledge base constituting a knowledge system should be constructed first. This database should describe general concepts and map out the relations among them.
2. Some Linguistic K. B. HowNet defines a set of concepts and the relationships among them.
2. Some Linguistic K. B. 2.3 UPenn TreeBank http://www.cis.upenn.edu/~treebank/home.html [Figure: a Penn Chinese Treebank-style parse tree over the words 他/PN, 还/AD, 提出/VV, 一/CD, 系列/M, 具体/JJ, 措施/NN, 策略/NN, 和/CC, 要点/NN, 。/PU, with phrase nodes IP, NP-SBJ, ADVP, VP, NP-OBJ, QP, CLP and NP]
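Treebank annotations like the one in the figure are usually stored as bracketed strings. The sketch below reads such a string into nested tuples and recovers the tagged words and phrase labels. The TREE string is a simplified, hand-built bracketing that reuses some words and tags from the slide; its structure is an assumption for illustration, not the treebank's own annotation. The reader is also simplified: it expects every bracket to start with a label, so the unlabeled outer bracket found in real treebank files would need an extra step.

```python
import re

# Illustrative bracketing only (assumed structure, partial sentence).
TREE = ("(IP (NP-SBJ (PN 他)) (VP (ADVP (AD 还)) (VP (VV 提出) "
        "(NP-OBJ (QP (CD 一) (CLP (M 系列))) (NP (JJ 具体) (NN 措施))))) (PU 。))")


def tokenize(text):
    """Split a bracketed string into '(', ')' and label/word tokens."""
    return re.findall(r"\(|\)|[^\s()]+", text)


def parse(tokens):
    """Build nested (label, children) tuples; words are plain strings."""
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}")
        label = tokens[pos + 1]
        pos += 2
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume the closing ')'
        return (label, children)

    return node()


def is_preterminal(tree):
    """A preterminal (POS tag) node dominates exactly one word."""
    _, children = tree
    return len(children) == 1 and isinstance(children[0], str)


def tagged_words(tree):
    """Return (word, POS tag) pairs in left-to-right order."""
    label, children = tree
    if is_preterminal(tree):
        return [(children[0], label)]
    pairs = []
    for child in children:
        pairs.extend(tagged_words(child))
    return pairs


def phrase_labels(tree):
    """Return phrase-level (non-preterminal) labels in pre-order."""
    if is_preterminal(tree):
        return []
    label, children = tree
    labels = [label]
    for child in children:
        labels.extend(phrase_labels(child))
    return labels


tree = parse(tokenize(TREE))
print(tagged_words(tree))
print(phrase_labels(tree))
```

The same nested structure serves both lexical-level use (word/tag pairs) and sentence-level use (phrase structure), which reflects the different levels at which these K. B.s are built.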
2. Some Linguistic K. B. 2.4 FrameNet and Others FrameNet (frame semantics): http://framenet.icsi.berkeley.edu PropBank, NomBank: http://nlp.cs.nyu.edu/meyers/NomBank.html
2. Some Linguistic K. B. Summary: All the representations mentioned above are human-made and human-defined; Different K. B.s are built at different levels and with different granularity, such as at the lexical level by tagging lexicons, or at the sentence level by annotating syntactic structure, and so on;
2. Some Linguistic K. B. Generally, each K. B. is developed for general purposes, and each expresses a single kind of linguistic knowledge; However, are these representations sufficient, or even complete, for a natural language processing system?
3. Approaches to NLU Three methods: Rationalistic; Empirical
3. Approaches to NLU Take MT as an example. [Figure: MT pyramid between SL (source language) and TL (target language), with analysis levels Word, Phrase, Chunk, Syntactic-Tree, Semantic-Tree, Logical-Form and, at the top, Inter-lingual] Translation models: Word-to-Word; Phrase-to-Phrase; Chunk-to-Chunk; Chunk-to-String; Tree-to-Tree (learned, syntactic or semantic); Tree-to-String; Logical-Form-to-Logical-Form. Modeling choice: p(t|s) vs. p(s|t)×p(t)
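The contrast "p(t|s) vs. p(s|t)×p(t)" can be read as the standard noisy-channel argument, assuming s is a fixed source-language sentence and t ranges over candidate target-language translations. Bayes' rule rewrites the direct model, and because p(s) does not depend on t it drops out of the maximization:

```latex
\hat{t} = \arg\max_{t} p(t \mid s)
        = \arg\max_{t} \frac{p(s \mid t)\, p(t)}{p(s)}
        = \arg\max_{t} p(s \mid t)\, p(t)
```

In the decomposed form, p(s|t) acts as a translation model and p(t) as a target-language model; modeling p(t|s) directly is the alternative it is set against.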
3. Approaches to NLU [Figure: performance over the years for rule-based, dictionary + machine learning, and corpus-based approaches] More data is better data.
3. Approaches to NLU Many hard nuts still remain to be cracked: word sense disambiguation; syntactic disambiguation; semantic analysis and translation; automatic evaluation of translation; … …
3. Approaches to NLU Increasing Number of Chinese Webpages The number of Chinese webpages is increasing exponentially. The highest accuracy of Chinese information retrieval (webpage search) in 2006 was only about 36.7% (from the 863 Program report). [Figure: increasing number of Chinese webpages; data from the China Internet Network Information Center]
3. Approaches to NLU What is the problem?
3. Approaches to NLU “One should build the rocket, instead of climbing the tree, if he wants to reach the moon.” (Martin Kay) Is current NLU research building the rocket or climbing the tree? Is it currently taking the right way to build the rocket?
3. Approaches to NLU How does the human brain work when it translates a sentence? [Figure: input (speech, text) → output, with components labeled Perception, Semantic, Affective Computing, Vision and K. B., and the labels Dynamic and Static]
3. Approaches to NLU - A person can infer an unknown word sense, sentence structure, etc. from common sense (limited knowledge), but a system cannot; - A person can dynamically and synthetically use multiple knowledge sources (lexical / syntactic / semantic / pragmatic) to process a specific language phenomenon, and can easily determine which knowledge is necessary and which is not, but a system usually cannot;
3. Approaches to NLU - A person can easily acquire new knowledge and update their memory, but this is usually difficult for a system. - On the other hand, a computer can memorize a large number of words and phrases and compute very fast, whereas a person cannot. Currently, NLU models mainly exploit computing power, but rarely simulate human cognitive processes.
4. Proposal For a specific NLU task, such as word sense disambiguation, syntactic parsing, or translation, we need to model the cognitive process of the human brain and, based on these models, build a task-oriented knowledge base.
4. Proposal e.g., for speech-to-speech (S2S) translation in a specific domain, the following aspects are addressed: investigate the effects of rhythm, tone, and accent; model translation in combination with a language model, a speech model, a common-sense model, etc.; build a knowledge base describing language, semantics, speech, emotion, and domain-related common sense, all oriented to S2S translation and driven by the needs of the translation model.
Thanks! 谢谢!