CSA2050 Introduction to Computational Linguistics Lecture 1 What is Computational Linguistics?

Feb MRCSA Lecture I: What Is CL?
Lecture 1
Course Information
What is CL?
Linguistics
CS
Course Contents

Course Information
Web
Lecturers
Book (nominally): Jurafsky & Martin, Speech and Language Processing, Prentice Hall 2000, ISBN

CL: Two Main Disciplines
COMP SCI
LINGUISTICS

Computers and Language
Computational Linguistics: emphasis on mechanised linguistic theories; grew out of early Machine Translation efforts.
Natural Language Processing: computational models of language analysis, interpretation, and generation; syntax/semantics interface.
Language Engineering: emphasis on large-scale performance; example: Google.
Speech Technology

Linguistics
Phonetics: the study of speech sounds
Phonology: the study of sound systems
Morphology: the study of word structure
Syntax: the study of sentence structure
Semantics: the study of meaning
Pragmatics: the study of language use

History of Grammar
Until 50 years ago, most linguistic work concerned sound systems (phonology), word structure (morphology), and the historical relationships among languages.
Writings on grammar go back at least 3000 years.
Until 200 years ago, almost all of this writing was prescriptive.
The scientific study of sentence grammar is comparatively recent.
[source: Sag & Wasow]

Grammar: the rules of a language
Prescriptive Grammar: subjective. Rules for and against certain uses. Proscribes forms that are in current use, e.g. “don’t end a sentence with a preposition”.
Descriptive Grammar: objective. Rules characterizing what people actually say. The goal is to characterize all and only the sentences that belong to the language.

Noam Chomsky
Noam Chomsky’s work in the 1950s radically changed linguistics, making syntax central.
Chomsky has been the dominant figure in linguistics ever since.
Chomsky invented the generative approach to grammar.

Generative Grammar: What Follows?
Grammars should be formulated precisely and explicitly.
Grammar is a theory of linguistic knowledge.
Mathematical definition of a grammar as a generative device.
Grammar should generate exactly the strings of the language.
[source: Sag & Wasow]
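
Generative Power of a Grammar
(G = the set of strings generated by the grammar, L = the language)
Undergeneration: G generates only, but not all, strings of L.
Overgeneration: G generates all, but not only, strings of L.
Goal: G generates all and only the strings of L.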
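
The three cases can be checked mechanically once G and L are treated as sets of strings. The following is a minimal sketch under that assumption; the toy language, the three candidate sets, and the function name classify are hypothetical and chosen only for illustration.

```python
# Hypothetical toy language L and three candidate grammars, each represented
# simply by the set of strings it generates.
L = {"John kicks Bill", "Bill kicks John"}

G_under = {"John kicks Bill"}                                       # only, but not all
G_over = {"John kicks Bill", "Bill kicks John", "kicks John Bill"}  # all, but not only
G_exact = {"John kicks Bill", "Bill kicks John"}                    # all and only


def classify(G, L):
    """Compare the generated set G with the target language L."""
    if G == L:
        return "all and only"
    if G < L:   # proper subset: G misses some strings of L
        return "undergeneration"
    if G > L:   # proper superset: G adds strings outside L
        return "overgeneration"
    return "neither"  # G both misses strings of L and adds strings outside it


for name, G in [("G_under", G_under), ("G_over", G_over), ("G_exact", G_exact)]:
    print(name, "->", classify(G, L))
```

Real languages are usually infinite, so such a comparison can only ever be made on samples; the sketch is meant to fix the terminology, not to suggest a practical test.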

Theories of Sentence and Word Structure: Rewrite Rules
Rewrite rules can be used to specify the sentences of a language.
Rules have the form LHS → RHS.
LHS may be a sequence of symbols.
RHS may be a sequence of symbols or words.
The lexicon specifies words and their categories.

A Simple Grammar/Lexicon
grammar:
S → NP VP
NP → N
VP → V NP
lexicon:
V → kicks
N → John
N → Bill
structure assigned to "John kicks Bill":
[S [NP [N John]] [VP [V kicks] [NP [N Bill]]]]
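
One possible way to represent this grammar and lexicon in a program is as two dictionaries, from which every sentence of the language can be enumerated. This is a sketch, not a prescribed representation: the names GRAMMAR, LEXICON, and expand are assumptions, and the enumeration only terminates because this particular grammar has no recursive rules.

```python
from itertools import product

# Rewrite rules: each left-hand side maps to a list of right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["N"]],
    "VP": [["V", "NP"]],
}
# Lexicon: each category maps to the words that belong to it.
LEXICON = {
    "V": ["kicks"],
    "N": ["John", "Bill"],
}


def expand(symbol):
    """Return every word sequence (as a tuple) derivable from symbol."""
    if symbol in LEXICON:
        return [(word,) for word in LEXICON[symbol]]
    results = []
    for rhs in GRAMMAR[symbol]:
        # Expand each right-hand-side symbol, then combine the alternatives.
        for parts in product(*(expand(s) for s in rhs)):
            results.append(tuple(w for part in parts for w in part))
    return results


sentences = [" ".join(words) for words in expand("S")]
print(sentences)
# ['John kicks John', 'John kicks Bill', 'Bill kicks John', 'Bill kicks Bill']
```

The output shows that the grammar also admits "John kicks John", which the rules allow even though the slide only illustrates "John kicks Bill".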

Grammar + Lexicon
Defines a language = a (possibly infinite) set of sentences. But the grammar itself is finite.
Assigns structures that are generally "closer" to meaning than the sentence itself.
Grammar/Lexicon = linguistic knowledge?
Learnability: a grammar is a concrete entity that can be acquired.
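
How a finite grammar can define an infinite set of sentences is easiest to see with a recursive rule. The sketch below assumes a hypothetical extension of the toy grammar in which a VP may contain a whole sentence ("says that S"); the rule set RULES and the functions generate and count_sentences are illustrative names, and words are written directly into the rules to keep the example short.

```python
# Hypothetical extension of the toy grammar: the second VP rule re-introduces S,
# so a finite rule set describes an unbounded set of sentences.
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["John"], ["Bill"]],
    "VP": [["kicks", "NP"], ["says", "that", "S"]],
}


def generate(symbols, depth):
    """Yield word lists derivable from symbols with derivation trees of height <= depth."""
    if not symbols:
        yield []
        return
    first, rest = symbols[0], symbols[1:]
    if first not in RULES:  # a word: keep it and continue with the rest
        for tail in generate(rest, depth):
            yield [first] + tail
        return
    if depth == 0:  # height budget exhausted on this branch
        return
    for rhs in RULES[first]:
        for head in generate(rhs, depth - 1):
            for tail in generate(rest, depth):
                yield head + tail


def count_sentences(depth):
    """Number of sentences whose derivation tree has height at most depth."""
    return sum(1 for _ in generate(["S"], depth))


for d in (3, 5, 7):
    print(d, count_sentences(d))
```

Raising the depth bound keeps producing new sentences such as "John says that Bill kicks John", although the three rules never change.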

Formal v. Natural Languages
Formal Languages
Numbers
Logic: ∀x man(x) → mortal(x)
C: if (i > 10) exit(0);
Natural Languages
English: John saw the dog
German: Johann hat den Hund gesehen
Maltese: Gianni ra kelb

Points of Similarity
A language is considered to be a (possibly infinite) set of sentences.
Sentences are sequences of words.
Formation rules determine which sequences are valid sentences.
Sentences have a definite structure.
Sentence structure is related to meaning.

Points of Difference
Formal Languages: the grammar defines the language; restricted application; non-ambiguous.
Natural Languages: the language defines the grammar; universal application; highly ambiguous.

Ambiguity
Lexical Ambiguity: the sheep is in the pen
Syntactic Ambiguity: small animals and children laugh
Semantic Ambiguity: every girl loves a sailor
Pragmatic Ambiguity: can you pass the salt?
The management of ambiguity is central to the success of CL.
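
Syntactic ambiguity can be made concrete by counting the structures a grammar assigns to a sentence. The sketch below uses a small hypothetical grammar (the rules, the lexicon, and the function parses are assumptions made for this example, not part of the lecture) and enumerates every bracketing of the syntactic example above.

```python
from functools import lru_cache

# Hypothetical rules: an NP may be modified by an adjective or be a coordination.
RULES = [
    ("S", ("NP", "VP")),
    ("NP", ("Adj", "NP")),
    ("NP", ("NP", "Conj", "NP")),
    ("NP", ("N",)),
    ("VP", ("V",)),
]
LEXICON = {"small": "Adj", "animals": "N", "and": "Conj", "children": "N", "laugh": "V"}


def parses(words):
    """Return all bracketed structures of words as a sentence (category S)."""
    words = tuple(words)

    @lru_cache(maxsize=None)
    def trees(cat, i, j):
        """All bracketings of words[i:j] as category cat."""
        found = []
        if j - i == 1 and LEXICON.get(words[i]) == cat:
            found.append(f"[{cat} {words[i]}]")
        for lhs, rhs in RULES:
            if lhs == cat:
                for kids in sequences(rhs, i, j):
                    found.append(f"[{cat} " + " ".join(kids) + "]")
        return tuple(found)

    def sequences(rhs, i, j):
        """All ways to divide words[i:j] among the categories in rhs."""
        if len(rhs) == 1:
            return [(t,) for t in trees(rhs[0], i, j)]
        out = []
        # The first category covers words[i:k]; leave one word per remaining category.
        for k in range(i + 1, j - len(rhs) + 2):
            for first in trees(rhs[0], i, k):
                for rest in sequences(rhs[1:], k, j):
                    out.append((first,) + rest)
        return out

    return trees("S", 0, len(words))


for tree in parses("small animals and children laugh".split()):
    print(tree)
```

Under these assumed rules the sentence receives two structures: one in which "small" modifies the whole coordination "animals and children", and one in which it modifies only "animals".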

Computer Science
The study of basic concepts: algorithm, program, information, data.
The application of these concepts to practical tasks.
Implementation of information processing models from other fields.

Unimplemented theories can be dangerous
Representational details omitted.
Computer memory requirements omitted.
Nature of individual steps may be unclear.
Difficult to test.
Potentially unimplementable.

Psychological Memory Model

Algorithms and Linguistics
Does linguistic theory make sense without implementing the concepts?
Linguistic theory provides linguistic knowledge in the form of grammar rules and theories about grammar rules.
Putting knowledge to some use involves processing issues: parsing and generation.

Computational Linguistics – Issues
How are a grammar and a lexicon represented?
How is the structure of a given sentence actually discovered?
How can we actually generate a sentence to express a particular meaning?
How can linguistic theory be made concrete enough to test algorithmically?
Can an artificial system learn a language with limited exposure to grammatical sentences?

Computational Linguistics Twin Goals
Scientific Goal: contribute to linguistics by adding a computational dimension.
Technological Goal: develop the basis for machinery capable of handling human language that can support “language engineering”.

Applications of Computational Linguistics
Machine Translation
Information Retrieval/Extraction
Document Classification
Question Answering
Style and Spell Checking
Integrated Multimodal Tasks

Course Contents
1 (MR) Overview
2 (RF) Chomsky Hierarchy
3 (MR) Examples
4 (RF) Grammatical Categories
5, 6 (MR) Tagging
7 (RF) Morphology
8, 9, 10 (MR) Comp Morphology
11 (RF) Syntax
12, 13, 14 (MR) Grammar Formalism

Computational Linguistics – Tools & Resources
Grammar Formalisms, e.g. Definite Clause Grammars
Parsing Algorithms: sentence → structure
Generation Algorithms: structure → sentence
Statistical Methods
Linguistic Corpora
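
The two directions, sentence → structure and structure → sentence, can be sketched for the toy "John kicks Bill" grammar. This is a minimal illustration under strong assumptions: the parser is a simple top-down procedure that commits to the first rule that matches (enough for this unambiguous grammar, not a general parsing algorithm), and "generation" here only linearises an existing structure rather than starting from a meaning. All function names are hypothetical.

```python
GRAMMAR = {"S": [["NP", "VP"]], "NP": [["N"]], "VP": [["V", "NP"]]}
LEXICON = {"kicks": "V", "John": "N", "Bill": "N"}


def parse(cat, words, pos=0):
    """Try to read category cat starting at words[pos].

    Returns (tree, next_pos) on success or None on failure. Trees are nested
    tuples such as ("N", "John").
    """
    if cat not in GRAMMAR:  # lexical category: consume one matching word
        if pos < len(words) and LEXICON.get(words[pos]) == cat:
            return (cat, words[pos]), pos + 1
        return None
    for rhs in GRAMMAR[cat]:
        children, cur = [], pos
        for sym in rhs:
            result = parse(sym, words, cur)
            if result is None:
                break
            child, cur = result
            children.append(child)
        else:  # every symbol of this right-hand side was found
            return (cat, *children), cur
    return None


def parse_sentence(sentence):
    """Parsing: sentence -> structure (None if the sentence is not in the language)."""
    words = sentence.split()
    result = parse("S", words)
    if result is not None and result[1] == len(words):
        return result[0]
    return None


def generate(tree):
    """Generation (here just linearisation): structure -> sentence."""
    if isinstance(tree[1], str):  # leaf: (category, word)
        return tree[1]
    return " ".join(generate(child) for child in tree[1:])


tree = parse_sentence("John kicks Bill")
print(tree)
print(generate(tree))
```

Running the parser and then the generator returns the original sentence, which illustrates why the two kinds of algorithm are often described as inverses operating over the same grammar.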