1 Multilingual and Crosslingual Information System
Wen-Hsiang Lu (盧文祥)
Department of Computer Science and Information Engineering, National Cheng Kung University
2014/02/17
2 Contact Information
Room: 4261, Monday 09:10 - 12:00
Instructor: Prof. Wen-Hsiang Lu (盧文祥)
–Office: 4216
–Office hours: Monday 12:10 - 2:10 PM
–Phone: 62545
–Web page: http://myweb.ncku.edu.tw/~whlu/mis.htm
–Email: whlu@mail.ncku.edu.tw
–Teaching assistant: 王廷軒, Email: playif@gmail.com
3 Course Grading
Class participation/presentation: 30%
Tests: 25%
Project: 25%
Homework: 20%
4 Source Textbooks
Christopher D. Manning and Hinrich Schütze, Foundations of Statistical Natural Language Processing, The MIT Press, 1999. (全華科技圖書: 02-23717725)
Daniel Jurafsky and James H. Martin, Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, Prentice Hall, 2000.
James Allen, Natural Language Understanding, Benjamin/Cummings Publishing Co., 1995.
Gregory Grefenstette, Cross-Language Information Retrieval, Kluwer, 1998.
Jean Véronis, Parallel Text Processing: Alignment and Use of Translation Corpora, Kluwer, 2000.
5 Other Useful Sources (1)
Reference Books
–Charniak, E. Statistical Language Learning.
–Cover, T. M., Thomas, J. A. Elements of Information Theory.
–Jelinek, F. Statistical Methods for Speech Recognition.
Major Conferences
–ACL (Association for Computational Linguistics)
–COLING (International Conference on Computational Linguistics)
–HLT (Human Language Technology Conference)
–IJCNLP (International Joint Conference on Natural Language Processing)
Journals
–Computational Linguistics
–Natural Language Engineering
–TALIP (ACM Transactions on Asian Language Information Processing)
–TSLP (ACM Transactions on Speech and Language Processing)
6 Other Useful Sources (2)
Resource URLs
–http://www.aclclp.org.tw/res_other_c.php (中華民國計算語言學學會, the Association for Computational Linguistics and Chinese Language Processing)
–http://nlp.stanford.edu/software/index.shtml (Stanford NLP Group)
–http://www.phontron.com/nlptools.php (Graham Neubig)
Tools/Software
–Online Dictionary
  WordNet: http://wordnet.princeton.edu/
  HowNet: http://www.keenage.com/html/c_index.html
  The Academia Sinica Bilingual Ontological Wordnet (BOW): http://bow.sinica.edu.tw/
7 CKIP (中研院詞庫小組, the Chinese Knowledge and Information Processing group at Academia Sinica)
Parser: http://140.109.19.112/main.exe?id=6833
POS (part-of-speech) tagger: http://ckipsvr.iis.sinica.edu.tw/
8 Eric Brill's POS Tagger
Website: http://cst.dk/online/pos_tagger/uk/
Example output: This/DT is/VBZ a/DT book/NN ./.
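The word/TAG notation shown on this slide is easy to post-process. The following is a minimal sketch, assuming whitespace-separated tokens of the form word/TAG; the function name parse_tagged is hypothetical and is not part of Brill's tagger.

```python
def parse_tagged(text):
    """Split whitespace-separated 'word/TAG' tokens into (word, tag) pairs."""
    pairs = []
    for token in text.split():
        # rpartition splits on the LAST slash, so the punctuation
        # token './.' and words containing '/' are still handled.
        word, _, tag = token.rpartition("/")
        pairs.append((word, tag))
    return pairs


tagged = "This/DT is/VBZ a/DT book/NN ./."
pairs = parse_tagged(tagged)
for word, tag in pairs:
    print(f"{word}\t{tag}")
```

Splitting on the last slash rather than the first is a design choice that keeps the sentence-final period token intact.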
9 Stanford Parser
Website
–http://nlp.stanford.edu/software/lex-parser.shtml
Tools
–Online version
  Stanford Parser version 1.5.1
  English & Chinese
  http://josie.stanford.edu:8080/parser/
10 Stanford Parser
11 [Homework 1]
Use the CKIP POS (part-of-speech) tagger, Eric Brill's POS tagger, and the Stanford Parser to tag and parse at least three sentences.
12 Course Topics
Probability and Information Theory
–basics: definitions, formulas, examples
Language Modeling
–n-gram models, parameter estimation
–smoothing (EM algorithm)
Some Linguistics
–phonology, morphology, syntax, semantics, discourse
Words and the Lexicon
–word classes, mutual information, lexicography
13 Course Topics (cont.)
Hidden Markov Models
–background, algorithms, parameter estimation
Tagging: methods, algorithms, evaluation
–tag sets, HMM tagging, transformation-based, feature-based
Grammars and Parsing: data, algorithms
–statistical parsing: algorithms, parameterization, evaluation
14 Course Topics (cont.)
Applications
–Machine Translation (MT)
–Automatic Speech Recognition (ASR)
–Information Retrieval (IR)
–Cross-Language Information Retrieval (CLIR)
–Question Answering (QA)
–Cross-Language Question Answering (CLQA)
–Summarization
–Information Extraction
–…
15 Course Introduction
Lecture 1: Introduction
Lecture 2: Mathematical Foundations
Lecture 3: Linguistics Essentials
Lecture 4: Corpus-based Work
Lecture 5: Collocations
Lecture 6: Statistical Inference: n-gram Models over Sparse Data
Lecture 7: Word Sense Disambiguation
Lecture 8: Statistical Alignment and Machine Translation
Lecture 9: Markov Models
Lecture 10: Term Translation Extraction & Cross-Language Information Retrieval
Lecture 11: Statistical/Probabilistic Models for Word Alignment & CLIR
Lecture 12: Part-of-Speech Tagging
Lecture 13: Probabilistic Context Free Grammars
Lecture 14: Question Answering
16 The Ultimate Research Goal in Natural Language Processing (NLP)
To develop an automated language understanding system
Why is this important?
–Language is easy for everyone to use
–Natural human interface for a variety of applications (e.g., database access, on-line tutor, robot control)
–Language seems fundamental for developing an intelligent system
Examples: iPhone Siri, IBM's DeepQA project
17 Natural Language is VERY Useful
19 OCR Problems
21 Aspects of Computational Linguistics
Description of the Language:
–universals, cross-linguistic research
Implementation of Computer Model:
–algorithms and data structures
–formal models to represent knowledge
–model of the reasoning process
Psycho-Linguistic Aspect:
–humans are an existence proof of the computability of language comprehension
–psychological research can be used to justify a computer model
–obtain human processing parameters
22 NLP Issues
Why is NLP difficult?
–Many “words”, many “phenomena”, many “rules”
  OED (Oxford English Dictionary): 400k words; Finnish lexicon (of forms): ~$2 \times 10^7$
  sentences, clauses, phrases, constituents, coordination, negation, imperatives/questions, inflections, parts of speech, pronunciation, topic/focus, and much more!
–Irregularity (exceptions, exceptions to the exceptions, ...)
  potato → potatoes (tomato, hero, ...); photo → photos; and even both: mango → mangos or mangoes
  Adjective/noun order: new book, electrical engineering, general regulations, flower garden, garden flower
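The irregularity point can be illustrated with a toy pluralizer. This is a minimal sketch, assuming a default "add s" rule plus two hand-written exception lists built only from the slide's examples; the names O_TAKES_ES, O_TAKES_EITHER, and plural_forms are hypothetical, and real English morphology needs far more rules and exceptions.

```python
# Hypothetical exception lists, taken from the examples on the slide.
O_TAKES_ES = {"potato", "tomato", "hero"}   # exceptions to the default rule
O_TAKES_EITHER = {"mango"}                  # exceptions to the exceptions


def plural_forms(noun):
    """Return the set of acceptable plural spellings for a noun (toy rule set)."""
    if noun in O_TAKES_EITHER:
        return {noun + "s", noun + "es"}
    if noun in O_TAKES_ES:
        return {noun + "es"}
    # Default rule: just add "s" (e.g. photo -> photos).
    return {noun + "s"}


for noun in ["potato", "photo", "mango"]:
    print(noun, "->", sorted(plural_forms(noun)))
```

Every new exception has to be listed by hand, which is the scaling problem the slide is pointing at.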
23 Difficulties in NLP (cont.)
–Ambiguity
  books: NOUN or VERB?
  –you need many books vs. she books her flights online
  Thank you for not smoking, drinking, eating or playing radios without earphones. (MTA bus)
  –Thank you for not eating without earphones??
  –Thank you for drinking?? …
  Fred’s hat was blown off by the wind. He tried to catch it.
  –... catch the wind or ... catch the hat?
24 Rules or Statistics?
Preferences:
–context clues: she books → books is a verb
  rule: if an ambiguous word (verb/nonverb) is preceded by a matching personal pronoun → the word is a verb
–pronoun reference:
  she/he/it often refers to the most recent noun or pronoun (but there are certainly exceptions)
–selectional restrictions:
  catching hat is better than catching wind (but not always)
–semantics:
  We thank people for doing helpful things or not doing annoying things
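The context-clue rule above can be sketched in a few lines. This is a toy illustration, assuming that "matching" means subject-verb agreement; the two lookup tables and the function resolve are hypothetical and cover only the slide's example words.

```python
# Hypothetical lookup tables for illustration only.
# Agreement class of each personal pronoun.
PRONOUN_AGREEMENT = {
    "she": "3sg", "he": "3sg", "it": "3sg",
    "i": "non3sg", "you": "non3sg", "we": "non3sg", "they": "non3sg",
}
# Verb/nonverb ambiguous words, with the agreement class of the verb reading.
VERB_READING = {"books": "3sg", "book": "non3sg"}


def resolve(tokens, i):
    """Return 'VERB' if the rule fires for tokens[i]; otherwise None (no decision)."""
    word = tokens[i].lower()
    if word not in VERB_READING or i == 0:
        return None
    previous = tokens[i - 1].lower()
    # The rule fires only when the preceding pronoun agrees with the verb form.
    if PRONOUN_AGREEMENT.get(previous) == VERB_READING[word]:
        return "VERB"
    return None


print(resolve("she books her flights online".split(), 1))
print(resolve("you need many books".split(), 3))
```

When the rule does not fire, the sketch returns None instead of guessing NOUN, leaving the decision to other rules or to statistics.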
25 Solutions
Don’t guess if you know:
–morphology (inflections)
–lexicons (word information)
–unambiguous names
–perhaps some (really) fixed phrases
–syntactic rules?
Use statistics (based on real-world data) for preferences (only?)
–No doubt: but this is an important question!
26 Types of Linguistic Knowledge
Acoustic/Phonetic Knowledge: How words are related to their sounds. (transliteration)
–Ericsson → 易利信
Morphological Knowledge: How words are constructed out of basic meaning units.
–un + friend + ly → unfriendly
–love + past tense → loved
–object + oriented → object-oriented
27 More Types of Linguistic Knowledge
Lexical Knowledge (or Dictionary): This should include information on parts of speech, features (e.g., number, case), typical usage, and word meaning.
Syntactic Knowledge: How words are put together to make legal sentences (or constituents of sentences).
28 More Types of Linguistic Knowledge
Semantic Knowledge: Word meanings, how words combine into sentence meaning, semantic roles
–e.g., Fred tossed the ball.
29 More Types of Linguistic Knowledge
Pragmatic Knowledge: How context affects the interpretation of a sentence.
Examples:
–Louise loves him.
  [Context 1:] Who loves Fred?
  [Context 2:] Louise has a cat.
–What time is it?
  [Context 1:] Fred is fidgeting (坐立不安) and staring at his watch.
  [Context 2:] Louise has no watch.
30 More Types of Linguistic Knowledge
World Knowledge: How other people's minds work, what a listener knows or believes, the etiquette (成規) of language.
Examples:
–Will you pass the salt?
–I read an article about the war in the paper.
–Fred saw the bird with his binoculars.
–Tim was invited to Tom's birthday party. He went to the store to buy him a present.
31 Multilingualism Issues in Web Age
Language barrier
–There are about 6,700 languages listed in the Ethnologue (http://www.ethnologue.com/)
Information overload
–Scaling up of language resources
  Webpages
  News
  Weblogs
  Microblogs
32 Multilingual Understanding??
35 Real World Situation
Use a statistical model based on REAL WORLD DATA and care about the best sentence only
Imagine:
–Each sentence $W = \{w_1, w_2, \ldots, w_n\}$ gets a probability $P(W|X)$ in a context $X$
–For every possible context $X$, sort all the imaginable sentences $W$ according to $P(W|X)$
–Ideal situation:
[Figure: $P(W)$ over sentences sorted from $W_{best}$ to $W_{worst}$; the best sentence is the most probable in context $X$]
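The idea of sorting candidate sentences by probability can be sketched with a toy bigram model. This is a minimal sketch under stated assumptions: the three-sentence corpus is made up, add-one smoothing stands in for proper parameter estimation, and the context X is ignored, so the score is P(W) and not P(W|X). All names are hypothetical.

```python
import math
from collections import Counter

# Made-up training corpus for illustration.
corpus = [
    "she books her flights online",
    "you need many books",
    "she needs many books",
]

history_counts = Counter()   # how often each word occurs as a bigram history
bigram_counts = Counter()    # how often each (previous, current) pair occurs
for line in corpus:
    words = ["<s>"] + line.split() + ["</s>"]
    history_counts.update(words[:-1])
    bigram_counts.update(zip(words, words[1:]))

# Vocabulary of possible next words (includes the end-of-sentence marker).
vocab = {w for line in corpus for w in line.split()} | {"</s>"}


def log_prob(sentence):
    """Log probability of a sentence under an add-one smoothed bigram model."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, cur in zip(words, words[1:]):
        numerator = bigram_counts[(prev, cur)] + 1
        denominator = history_counts[prev] + len(vocab)
        total += math.log(numerator / denominator)
    return total


candidates = [
    "online flights her books she",
    "she flights books online her",
    "she books her flights online",
]
# Sort from W_best to W_worst.
ranked = sorted(candidates, key=log_prob, reverse=True)
for sentence in ranked:
    print(f"{log_prob(sentence):8.3f}  {sentence}")
```

Log probabilities are summed instead of multiplying raw probabilities, which avoids numerical underflow on longer sentences.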
36 Real World Situation
Unable to specify a set of grammatical sentences using fixed “categorical” rules (disregarding the “grammaticality” issue)
[Figure: $P(W)$ over sentences sorted from $W_{best}$ to $W_{worst}$; the best sentence is the most probable in context $X$]