Artificial Intelligence Lab, Hwang Myung-jin, 2006. 01. 03. FSNLP Introduction.


2 The beginning
Four parts of linguistic science:
–The cognitive side of how humans acquire, produce, and understand language
–Understanding the relationship between linguistic utterances and the world
–Understanding the linguistic structures by which language communicates
–The rules used to structure linguistic expressions
Edward Sapir: "All grammars leak."
–It is impossible to give a precise and complete characterization of a language.

3 Rationalist and Empiricist Approaches to Language (1)
Rationalist (1960-1985)
–Believed that a significant part of knowledge is innate: genetically pre-stored in the mind
–Noam Chomsky
–The human brain can be replicated
–Poverty of the stimulus argument
Empiricist (1920-1960, 1985-present)
–The mind has only a few cognitive abilities; these are seen not as absolute, but as one stage in the development of knowledge
–We are born with general operations such as pattern recognition, association, and generalization

4 Rationalist and Empiricist Approaches to Language (2)
Corpus
–A surrogate for situating language in the real world
–Advocated by empiricists
–Used for discovering a language's structure
–and for finding a good grammatical description
The difference between the rationalist and empiricist approaches:
–Rationalist: describe the language module of the human mind (the I-language)
–Empiricist: describe texts (the E-language) as they actually occur

5 Scientific Content
Questions that linguistics should answer: two basic questions
–The first covers all aspects of the structure of language
–The second deals with semantics, pragmatics, and discourse
Traditional approach
–Competence grammar
–Grammaticality as a categorical, binary choice
–Forcing such binary judgments causes problems; conventionality issues arise
–Work within a framework
–Categorical perception

6 Non-categorical phenomena in language
Over time, the words and syntax of a language change; words change their meaning and their part of speech
Blending of parts of speech: near
–The same word is used as several parts of speech
Language change: kind of and sort of
–We are kind of hungry
–He sort of understood what was going on

7 Language and cognition as probabilistic phenomena
We live in a world filled with uncertainty and incomplete information
Cognitive processes are best formalized as probabilistic processes
A major part of statistical NLP is deriving good probability estimates for unseen events
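One standard way to give unseen events a nonzero probability is add-one (Laplace) smoothing. A minimal sketch; the toy text and the assumed vocabulary size are illustrative, not from the book:

```python
from collections import Counter

def laplace_prob(word, counts, vocab_size):
    """Add-one (Laplace) smoothed unigram probability.

    Every count is incremented by one, so unseen words get a small
    nonzero probability instead of zero.
    """
    total = sum(counts.values())
    return (counts[word] + 1) / (total + vocab_size)

counts = Counter("the cat sat on the mat".split())
V = 10  # assumed vocabulary size, including words never seen in the text
p_seen = laplace_prob("the", counts, V)    # (2 + 1) / (6 + 10) = 3/16
p_unseen = laplace_prob("dog", counts, V)  # (0 + 1) / (6 + 10) = 1/16
```

Smoothing trades a little probability mass away from observed events to cover the unseen ones; more refined estimators are a recurring theme in later chapters.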

8 The Ambiguity of Language: Why NLP Is Difficult (1)
An NLP system needs to determine something of the structure of a text, normally at least enough to answer "Who did what to whom?"
Example: a sentence with three syntactic analyses
–Our company is training workers.

9 The Ambiguity of Language: Why NLP Is Difficult (2)
Making disambiguation decisions
–about word sense, word category, syntactic structure, and semantic scope
Traditional approach
–Selectional restrictions; disallow metaphorical extensions
–Disambiguation strategies that rely on manual rule creation and hand-tuning produce a knowledge acquisition bottleneck
Statistical NLP approach
–Solves these problems by automatically learning lexical and structural preferences from corpora
–Recognizes that there is a lot of information in the relationships between words

10 Dirty Hands: Lexical resources
Lexical resources: machine-readable texts, dictionaries, thesauri, and tools
–Brown corpus, Penn Treebank
–Parallel texts
–Balanced corpora
–WordNet (node = synset)

11 Dirty Hands: Word counts (1)
A text is represented as a list of words
Some questions:
–What are the most common words?
–How many words are there in the text?
–How many word tokens are there? (71,370)
–How many word types appear in the text? (8,018)
Statistical NLP is difficult: it is hard to predict much about the behavior of words
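The token/type distinction above can be sketched in a few lines. The tokenizer and the toy sentence here are illustrative assumptions, not the book's:

```python
import re

def word_counts(text):
    """Tokenize crudely on letters/apostrophes and count tokens and types.

    Tokens are running words; types are distinct words.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    return len(tokens), len(set(tokens))

text = "The cat sat on the mat. The mat was flat."
n_tokens, n_types = word_counts(text)
# n_tokens == 10; n_types == 7 (the, cat, sat, on, mat, was, flat)
```

Real tokenization (hyphens, clitics, numbers) is messier; chapter 4 of the book treats it in detail.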

12 Dirty Hands: Word counts (2)
Table 1.1: Common words
Table 1.2: Frequency of frequencies of word types

13 Zipf’s laws (1)
The Principle of Least Effort
–People will act so as to minimize their probable average rate of work
Zipf’s law
–A roughly accurate characterization of certain empirical facts
–f: frequency of a word
–r: its position in the frequency-sorted list (its rank)
–f ∝ 1/r, i.e. there is a constant k such that f · r = k
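The claim that f · r stays roughly constant is easy to check on a rank-frequency list. A small sketch with a toy corpus constructed to follow the law exactly (real text only approximates it):

```python
from collections import Counter

def rank_frequency(tokens):
    """Rank words by frequency (rank 1 = most frequent) and return
    (rank, frequency) pairs."""
    freqs = sorted(Counter(tokens).values(), reverse=True)
    return list(enumerate(freqs, start=1))

# A toy corpus whose frequencies follow Zipf's law exactly:
tokens = ("the " * 60 + "of " * 30 + "and " * 20 + "to " * 15).split()
products = [r * f for r, f in rank_frequency(tokens)]
# products == [60, 60, 60, 60]: f * r is the same constant at every rank
```

On real corpora the products drift, especially at the highest and lowest ranks, which motivates Mandelbrot's refinement on the next slides.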

14 Zipf’s laws (2)
Empirical evaluation of Zipf’s law

15 Zipf’s laws (3)
Mandelbrot
–Zipf’s law predicts a straight line with slope -1 on a log-log plot, but it reflects the details very badly
–Mandelbrot proposed a more general relationship between rank and frequency: f = P(r + ρ)^(-B)
–On a log-log plot this is a line descending with slope -B
–With B = 1 and ρ = 0, it equals Zipf’s law
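Mandelbrot's formula can be written down directly; the parameter values below are illustrative assumptions, not fitted to any corpus:

```python
def mandelbrot_freq(rank, P, rho, B):
    """Mandelbrot's rank-frequency formula: f = P * (rank + rho) ** (-B).

    Since log f = log P - B * log(rank + rho), on a log-log plot the
    curve descends with slope -B.  With rho = 0 and B = 1 it reduces
    to Zipf's law, f = P / rank.
    """
    return P * (rank + rho) ** (-B)

# Zipf as a special case: rho = 0, B = 1 gives f * r = P at every rank.
zipf_products = [mandelbrot_freq(r, P=100.0, rho=0.0, B=1.0) * r
                 for r in range(1, 6)]
# each product is (approximately) 100.0
```

The extra parameters ρ and B let the curve bend at the highest ranks, which is exactly where plain Zipf fits real data worst.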

16 Zipf’s laws (4)
Zipf’s law and Mandelbrot’s formula

17 Other laws
The number of meanings of a word is correlated with its frequency
One can measure the number of lines or pages between successive occurrences of a word in a text
–F: frequency of intervals
–I: interval size
–F ∝ I^(-p), where p is between about 1 and 1.3
–Most of the time, content words occur near another occurrence of the same word

18 Collocation (1)
A collocation is any turn of phrase or accepted usage where somehow the whole is perceived to have an existence beyond the sum of its parts
Includes
–Compounds, phrasal verbs, and other stock phrases
Any expression that people repeat because they have heard others using it is a candidate for a collocation
Example
–First idea: count bigrams
–Next step: filter the bigrams
Continued in chapter 5
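The count-then-filter idea can be sketched as follows. The book's filter is based on part-of-speech patterns; the stopword filter and toy sentence here are simplified stand-ins:

```python
from collections import Counter

def top_bigrams(tokens, stopwords, n=5):
    """Count adjacent word pairs, then drop pairs containing a stopword.

    Raw bigram counts are dominated by function-word pairs such as
    "of the"; filtering keeps the contentful candidates.
    """
    pairs = Counter(zip(tokens, tokens[1:]))
    kept = [(p, c) for p, c in pairs.most_common() if not set(p) & stopwords]
    return kept[:n]

tokens = "the new york times reported that new york rents rose".split()
best = top_bigrams(tokens, stopwords={"the", "that"})
# best[0] == (("new", "york"), 2)
```

Frequency alone over-generates; chapter 5 of the book replaces raw counts with statistical tests of association.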

19 Collocation (2)
Table 1.4: Commonest bigram collocations
Table 1.5: Frequent bigrams after filtering

20 Concordances
Key Word In Context (KWIC): showing each occurrence of a word with its surrounding context
–Reveals the syntactic frames in which verbs appear
–Useful for guiding statistical parsers
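A concordance is simple to produce over tokenized text. A minimal sketch; the context width and example sentence are illustrative assumptions:

```python
def concordance(tokens, target, width=2):
    """Return each occurrence of `target` with `width` words of context
    on either side (a minimal Key Word In Context display)."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{target}] {right}")
    return lines

tokens = "she decided to show the results to show support".split()
lines = concordance(tokens, "show")
# lines == ["decided to [show] the results", "results to [show] support"]
```

Lining up occurrences this way makes the verb's different syntactic frames (here, NP object vs. bare object) easy to scan by eye.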