
1 FSNLP Introduction
Artificial Intelligence Lab, 황명진, 2006. 01. 03

2 The beginning
Four parts of linguistic science:
- The cognitive side of how humans acquire, produce, and understand language
- Understanding the relationship between linguistic utterances and the world
- Understanding the linguistic structures by which language communicates
- The rules used to structure linguistic expressions
Edward Sapir: "All grammars leak."
- It is impossible to provide an exact and complete characterization of a language

3 Rationalist and Empiricist Approaches to Language (1)
Rationalist (1960-1985)
- Believed that a significant part of knowledge is fixed in the mind in advance, by genetic inheritance
- Noam Chomsky
- Held that the workings of the human brain can be replicated
- Argument from the poverty of the stimulus
Empiricist (1920-1960, 1985-present)
- The mind starts with only a few general cognitive abilities, seen not as absolute, fixed knowledge but as one stage in the development of knowledge
- General operations such as pattern recognition, association, and generalization are innate

4 Rationalist and Empiricist Approaches to Language (2)
Corpus
- A surrogate for situating language in the real world
- Advocated by empiricists
- Used for discovering a language's structure
- Used for finding a good grammatical description
Difference between the rationalist and empiricist approaches
- Rationalist: describe the language module of the human mind (the I-language)
- Empiricist: describe the texts (E-language) as they actually occur

5 Scientific Content
Questions that linguistics should answer. Two basic questions:
- The first covers all aspects of the structure of language
- The second deals with semantics, pragmatics, and discourse
Traditional approach
- Competence grammar
- Grammaticality as a categorical, binary choice
- Forcing this binary choice raises problems of conventionality
- Work within a framework
- Categorical perception

6 Non-categorical phenomena in language
Over time, the words and syntax of a language change; words change their meaning and their part of speech.
Blending of parts of speech: near
- The same word is used as several parts of speech
Language change: kind of and sort of
- We are kind of hungry
- He sort of understood what was going on

7 Language and cognition as probabilistic phenomena
- We live in a world filled with uncertainty and incomplete information
- Cognitive processes are best formalized as probabilistic processes
- A major part of statistical NLP is deriving good probability estimates for unseen events
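As a minimal sketch of one classical way to give unseen events nonzero probability, here is add-one (Laplace) smoothing for unigrams; the function name, toy text, and vocabulary size are illustrative assumptions, not from the slides, and later chapters of FSNLP treat smoothing in depth.

```python
from collections import Counter

def laplace_unigram_probs(tokens, vocab_size):
    """Add-one (Laplace) smoothed unigram probabilities.

    Unseen words receive a small nonzero probability instead of zero:
        P(w) = (count(w) + 1) / (N + V)
    where N is the number of tokens and V the assumed vocabulary size.
    """
    counts = Counter(tokens)
    n = len(tokens)

    def prob(word):
        return (counts[word] + 1) / (n + vocab_size)

    return prob

# Toy usage: "language" never occurs, yet it still gets probability mass.
p = laplace_unigram_probs("the cat sat on the mat".split(), vocab_size=10_000)
print(p("the"), p("language"))
```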

8 The Ambiguity of Language: Why NLP Is Difficult (1)
An NLP system needs to determine something of the structure of text, normally at least enough to answer "Who did what to whom?"
Example: a sentence with 3 syntactic analyses
- Our company is training workers.

9 The Ambiguity of Language: Why NLP Is Difficult (2)
Making disambiguation decisions
- Word sense, word category, syntactic structure, and semantic scope
Traditional approach
- Selectional restrictions; disallow metaphorical extensions
- Disambiguation strategies that rely on manual rule creation and hand tuning produce a knowledge acquisition bottleneck
Statistical NLP approach
- Solve these problems by automatically learning lexical and structural preferences from corpora
- Recognize that there is a lot of information in the relationships between words

10 Dirty Hands: Lexical resources
- Lexical resources: machine-readable texts, dictionaries, thesauri, and tools
- Brown corpus, Penn Treebank
- Parallel texts
- Balanced corpora
- WordNet (node = synset)
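Since the slide notes that a WordNet node is a synset, here is a small sketch of browsing synsets through NLTK's WordNet interface; it assumes NLTK is installed and the wordnet corpus has been downloaded, and the query word "bank" is only an example.

```python
# Assumes: pip install nltk  and  nltk.download('wordnet') have been run.
from nltk.corpus import wordnet as wn

for synset in wn.synsets("bank"):
    # Each WordNet node is a synset: a set of synonymous lemmas sharing
    # one sense, accompanied by a gloss describing that sense.
    print(synset.name(),
          [lemma.name() for lemma in synset.lemmas()],
          synset.definition())
```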

11 Dirty Hands: Word counts (1)
Text is represented as a list of words. Some questions:
- What are the most common words?
- How many words are there in the text?
  - How many word tokens are there? (71,370)
  - How many word types appear in the text? (8,018)
- Statistical NLP is difficult: it is hard to predict much about the behavior of words
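A minimal sketch of how such token and type counts can be computed; the lowercase-and-whitespace tokenization and the file name tom_sawyer.txt are illustrative assumptions, and real counts depend on tokenization decisions about punctuation, hyphens, and case.

```python
from collections import Counter

with open("tom_sawyer.txt", encoding="utf-8") as f:
    tokens = f.read().lower().split()   # crude tokenization: lowercase + whitespace

counts = Counter(tokens)
print("word tokens:", len(tokens))      # every running word counts
print("word types:", len(counts))       # each distinct word counts once
print("most common:", counts.most_common(10))
```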

12 Dirty Hands: Word counts (2)
- Table 1.1: Common words
- Table 1.2: Frequency of frequencies of word types

13 Zipf's laws (1)
The Principle of Least Effort
- People will act so as to minimize their probable average rate of work
Zipf's law
- A roughly accurate characterization of certain empirical facts
- f: frequency of a word
- r: its position in the frequency list (rank)
- f is inversely proportional to r: f ∝ 1/r, i.e., there is a constant k such that f · r = k

14 Zipf's laws (2)
Empirical evaluation of Zipf's law
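A rough way to run such an empirical check on any plain-text file: for words at several ranks, see whether the product f · r stays in the same order of magnitude; the file name and the particular ranks sampled are illustrative choices.

```python
from collections import Counter

with open("tom_sawyer.txt", encoding="utf-8") as f:
    counts = Counter(f.read().lower().split())

ranked = counts.most_common()            # sorted by frequency, highest first
for r in (1, 10, 100, 1000):
    if r <= len(ranked):
        word, freq = ranked[r - 1]
        # Under Zipf's law, f * r should be roughly constant across ranks.
        print(f"rank {r:>5}  {word:<12} f={freq:>6}  f*r={freq * r}")
```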

15 Zipf's laws (3)
Mandelbrot
- Zipf's law is very bad at reflecting the details of the rank-frequency relationship
- On a log-log plot, Zipf's law gives a straight line with slope -1
- Mandelbrot proposed a more general relationship between rank and frequency: f = P(r + ρ)^(-B)
- On a log-log plot this descends as a straight line with slope -B (for large r)
- With B = 1 and ρ = 0, it reduces to Zipf's law
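As a hedged sketch (not a procedure from the text), the parameters B and ρ could be estimated by fitting a straight line to log f against log(r + ρ) for a few candidate ρ values; the candidate values, file name, and least-squares approach below are all illustrative assumptions.

```python
import numpy as np
from collections import Counter

with open("tom_sawyer.txt", encoding="utf-8") as f:
    freqs = np.array(sorted(Counter(f.read().lower().split()).values(),
                            reverse=True), dtype=float)
ranks = np.arange(1, len(freqs) + 1, dtype=float)

best = None
for rho in (0.0, 1.0, 2.7, 5.0, 10.0):
    # log f = log P - B * log(r + rho): fit a line, slope gives -B.
    slope, intercept = np.polyfit(np.log(ranks + rho), np.log(freqs), 1)
    resid = np.mean((np.log(freqs) - (slope * np.log(ranks + rho) + intercept)) ** 2)
    if best is None or resid < best[0]:
        best = (resid, rho, -slope, np.exp(intercept))

_, rho, B, P = best
print(f"rho={rho}, B={B:.2f}, P={P:.1f}")
```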

16 Zipf's laws (4)
Zipf's law and Mandelbrot's formula

17 Other laws
- The number of meanings of a word is correlated with its frequency
- One can measure the number of lines or pages between successive occurrences of a word in a text; the frequency F of intervals of size I follows F ∝ I^(-p)
  - F: frequency of intervals
  - I: interval size
  - p: between about 1 and 1.3
- Most of the time, content words occur near another occurrence of the same word
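A minimal sketch of measuring those intervals: record the token positions of a word and tally the gaps between successive occurrences; measuring gaps in tokens rather than lines or pages, and the choice of word and file, are illustrative assumptions.

```python
from collections import Counter

def interval_counts(tokens, word):
    positions = [i for i, t in enumerate(tokens) if t == word]
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    return Counter(gaps)   # frequency F of each interval size I

with open("tom_sawyer.txt", encoding="utf-8") as f:
    tokens = f.read().lower().split()

# Small gaps dominating would reflect content words clustering together.
print(interval_counts(tokens, "river").most_common(10))
```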

18 Collocations (1)
A collocation is any turn of phrase or accepted usage where somehow the whole is perceived to have an existence beyond the sum of its parts.
Includes
- Compounds, phrasal verbs, and other stock phrases
Any expression that people repeat because they have heard others using it is a candidate for a collocation.
Example approach (sketched in code below)
- First idea: count bigrams
- Next step: filter them
Continued in chapter 5
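A small sketch of those two steps: count adjacent word pairs (bigrams), then filter by a part-of-speech pattern so that pairs of function words such as "of the" drop out. The tagging call assumes NLTK with its default POS tagger data installed, and the particular tag patterns kept are an illustrative choice; chapter 5 treats collocation finding properly.

```python
from collections import Counter
import nltk  # assumes the 'averaged_perceptron_tagger' data has been downloaded

def bigram_collocations(tokens, top_n=10):
    tagged = nltk.pos_tag(tokens)                      # [(word, tag), ...]
    pairs = list(zip(tagged, tagged[1:]))              # adjacent tagged pairs

    # Part-of-speech filter: keep adjective-noun and noun-noun pairs,
    # so high-frequency function-word bigrams are discarded.
    allowed = {("JJ", "NN"), ("NN", "NN"), ("JJ", "NNS"), ("NN", "NNS")}
    counts = Counter(
        (w1, w2) for (w1, t1), (w2, t2) in pairs if (t1, t2) in allowed
    )
    return counts.most_common(top_n)

# Usage: bigram_collocations("the New York Stock Exchange opened".lower().split())
```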

19 Collocations (2)
- Table 1.4: Commonest bigram collocations
- Table 1.5: Frequent bigrams after filtering

20 Concordances
- A concordance shows the syntactic frames in which verbs appear
- Useful for guiding statistical parsers
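Concordances are commonly displayed as Key Word In Context (KWIC) listings; below is a minimal sketch that prints each occurrence of a word with a fixed window of context on either side, which makes the frames around a verb easy to scan. The window size, file name, and query word are illustrative assumptions.

```python
def kwic(tokens, word, window=5):
    for i, tok in enumerate(tokens):
        if tok == word:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            print(f"{left:>40}  [{tok}]  {right}")

with open("tom_sawyer.txt", encoding="utf-8") as f:
    kwic(f.read().lower().split(), "showed")
```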

