Published by Sertse Abebe. Modified over 5 years ago.
1
Natural Language Processing (NLP)
Chapter One: Introduction to Natural Language Processing (NLP)
2
Instructor
Name: Sertse Abebe Ayalew
Email: sertsea@bdu.edu.et or sertse26@gmail.com
3
Evaluation
– Assignment: 15%
– Article Review: 15%
– Exam: 70%
4
Course objectives
To enable students to understand the issues involved in creating computer programs that can interpret, generate, and learn from instructions and messages written in natural languages.
Among the issues that will be discussed are:
– morphological processing
– syntactic processing
– semantic interpretation
The primary emphasis of this course is on text-based language processing. Speech processing will also be discussed.
2/20/2007, Husni Al-Muhtaseb
5
Tentative Weekly Schedule
W#  Topic
1   Introduction
2   Regular Language: Regular Expressions & Automata
3   Morphology and Text Processing
4   Spelling Correction and N-Gram Models
5   Parts of Speech Tagging
6   Parsing
7   Speech Recognition
8   Word Vectors and Semantics
6
Text Book
Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, by Daniel Jurafsky and James H. Martin, Prentice-Hall, 2017.
7
Introduction
What is natural language?
– Language that is used for everyday communication.
– Languages such as Amharic, Oromiffa, Tigrigna, English, Hindi, or Portuguese.
Phonemes (in speech) and letters of the alphabet (in writing) are the smallest structural units.
A collection of phonemes (letters) creates words:
– An arbitrary sequence of letters forms a string.
– Not every string forms a word.
– Only a legal string forms a word.
8
A collection of words creates a vocabulary:
– A vocabulary consists of the set of words that are legal within a language.
– A text (or speech) is composed of a sequence of words from a vocabulary.
– A language is the set of all possible legal texts (or speech).
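The distinction between strings, words, and texts can be made concrete with a minimal sketch. The alphabet, the toy vocabulary, and the function names below are assumptions chosen for illustration; they are not part of the course material. Note that the last check only confirms that every token is in the vocabulary, which is necessary but not sufficient for a text to be legal in the language (grammar is ignored here).

```python
# Toy alphabet and vocabulary (hypothetical values, for illustration only).
ALPHABET = set("abcdefghijklmnopqrstuvwxyz")
VOCABULARY = {"the", "dog", "bit", "boy"}


def is_string(s):
    """Any non-empty sequence of symbols from the alphabet is a string."""
    return len(s) > 0 and all(ch in ALPHABET for ch in s)


def is_word(s):
    """Only strings that appear in the vocabulary count as words."""
    return is_string(s) and s in VOCABULARY


def is_text_over_vocabulary(text):
    """True if every whitespace-separated token of the text is a word."""
    tokens = text.lower().split()
    return len(tokens) > 0 and all(is_word(t) for t in tokens)


print(is_string("xqz"), is_word("xqz"), is_word("dog"))
print(is_text_over_vocabulary("The dog bit the boy"))
```

Here "xqz" is a string but not a word, while "dog" is both.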
9
Levels of Language
Phonemes / morphemes: the basic units of speech and of words. A phoneme is the smallest unit of sound that distinguishes one word from another; a morpheme is the smallest meaningful unit of a word.
Possible studies:
– Phonological analysis: the study of language that deals with the analysis and synthesis of phonemes is called phonology.
– Morphological analysis: the study of language that deals with the analysis and synthesis of morphemes is called morphology.
– Semantic analysis: What is the literal meaning of words?
10
Phrases (sentences): legal units of language formed from one or more words.
– Syntactic analysis: What grammatical rules and kinds of phrases are we dealing with? Which words modify one another? What is the proper ordering of words?
– Semantics: What is the literal meaning of phrases, sentences, and paragraphs?
– Pragmatics: What should you conclude from the fact that I said something? How should you react?
11
Communications
– Acoustic sounds
– Alphabets: {A-Z, a-z, 0-9, @, #, $, %, ^, & ….}
– Vocabulary: A, Abide, Book, Buzz, is, the, Xylophone, zero
– Syntax and grammar
– Semantics (meaning): synonym, antonym
– Social contexts: social uses of all components
12
Cont’
Purpose of natural language
The goal in the production and comprehension of natural language is communication.
– Communication for the speaker:
  Intention: Decide what to communicate and when. This may require planning and reasoning about agents’ goals and beliefs.
  Generation: Translate the information to be communicated (the “language of thought”) into a string of words in the desired natural language.
  Synthesis: Output the string in the desired modality, text or speech.
13
Cont’
– Communication for the reader or hearer:
  Perception: Map the sensing modality (looking or hearing) to a string of words.
  Analysis: Determine the information content of the string or speech. Determine the meaning of the information in its literal or ...
  Incorporation: Decide whether or not to believe the content of the string and add it to the permanent knowledge system.
14
Automated Natural Language Processes
1. Automatic Natural Language Understanding: taking some spoken or typed sentence and working out what it means.
2. Automatic Natural Language Generation: taking some formal representation of what you want to say and working out a way to express it in a natural (human) language (e.g., English).
NLP - Prof. Carolina Ruiz
15
Cont’
Automated natural language processing usually involves encoding the rules and conventions of a particular language, both to understand and to generate components of that language.
16
Morphology and Phonology
Study of words:
– Their internal structure
– How are they formed?
Morphology tries to formulate rules:
– washing → wash + -ing
– bat → bats
– rat → rats
– write → writer
– browse → browser
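The word pairs on this slide can be turned into a small rule-based analyzer. The following is only a sketch under stated assumptions: the suffix list, the stem list, and the function name `analyze` are hypothetical, and the single spelling rule (restoring a dropped final "e") is just enough to cover the slide's examples. Real morphological analyzers need many more rules.

```python
# Hypothetical suffix rules and stem list covering the slide's examples.
SUFFIX_RULES = ["ing", "er", "s"]
STEMS = {"wash", "bat", "rat", "write", "browse"}


def analyze(word, stems=STEMS):
    """Split a word into (stem, suffix) using simple suffix stripping.

    Returns (word, "") for a bare stem and None if no rule applies.
    """
    if word in stems:
        return (word, "")
    for suffix in SUFFIX_RULES:
        if word.endswith(suffix):
            base = word[: -len(suffix)]
            # Try the bare base, then the base with a restored final "e"
            # (write + -er is spelled "writer", not "writeer").
            for candidate in (base, base + "e"):
                if candidate in stems:
                    return (candidate, "-" + suffix)
    return None


for w in ["washing", "bats", "rats", "writer", "browser"]:
    print(w, "->", analyze(w))
```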
17
Syntax, Semantics and Pragmatics
Every language has its own syntactic, semantic, and pragmatic interpretation.
– Syntax: concerns the proper ordering of words and its effect on meaning.
  The dog bit the boy.
  The boy bit the dog.
– Semantics: concerns the (literal) meaning of words, phrases, and sentences.
18
Cont’
  “plant” as a photosynthetic organism
  “plant” as a manufacturing facility
  “plant” as the act of sowing
– Pragmatics: concerns the overall communicative and social context and its effect on interpretation. It requires detailed discourse analysis of the communication, and usually deals with:
  The social use of language.
  The study of how language is used to accomplish goals, and the influence of context on meaning.
  Understanding the aspects of a language that depend on situation and world knowledge.
19
Cont’
– Discourse: generally deals with linguistic units larger than a single statement.
20
Disambiguation is a part of the NLP process
Natural language is highly ambiguous and must be disambiguated.
Ambiguity is ubiquitous:
– Phonological ambiguity: ambiguity arises in speech recognition.
  “youth in Asia” vs. “euthanasia”
  “recognize speech” vs. “wreck a nice beach”
– Syntactic ambiguity: I saw a man with a glass.
– Semantic ambiguity
21
Cont’
Ambiguity is explosive:
– Ambiguities compound to generate enormous numbers of possible interpretations.
– In English, a sentence ending in n prepositional phrases has over 2^n syntactic interpretations (what about Amharic?).
– “I saw the man with the telescope”: 2 parses
– “I saw the man on the hill with the telescope”: 5 parses
– “I saw the man on the hill in Texas with the telescope”: 14 parses
– “I saw the man on the hill in Texas with the telescope at noon”: 42 parses
– “I saw the man on the hill in Texas with the telescope at noon on Monday”: 132 parses
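The parse counts listed above (2, 5, 14, 42, 132) coincide with consecutive Catalan numbers, which makes the growth easy to reproduce. The sketch below assumes that a sentence ending in n prepositional phrases has C(n+1) parses, where C(k) is the k-th Catalan number; this matches the five counts on the slide, and the function names are chosen here for illustration.

```python
from math import comb


def catalan(k):
    """k-th Catalan number: C(k) = binom(2k, k) / (k + 1)."""
    return comb(2 * k, k) // (k + 1)


def pp_attachment_parses(n):
    """Parse count for a sentence ending in n prepositional phrases,
    assuming the counts follow the Catalan sequence as on the slide."""
    return catalan(n + 1)


for n in range(1, 6):
    print(n, "prepositional phrase(s):", pp_attachment_parses(n), "parses")
```

Running the loop prints the same five counts as the slide, and larger n shows how quickly the number of interpretations grows.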
22
Cont’
Humor and ambiguity: many jokes rely on the ambiguity of language. E.g.
– Policeman to little boy: “We are looking for a thief with a bicycle.”
  Little boy: “Wouldn’t you be better using your eyes?”
23
Natural Language Processing Objectives
1. Understand languages: taking some spoken or typed sentence and working out what it means.
2. Generate languages: taking some formal representation of what you want to say and working out a way to express it in a natural (human) language (e.g., English).
24
Cont’
Natural Language vs. Artificial Language
– “plant” as a photosynthetic organism
– “plant” as a manufacturing facility
– “plant” as the act of sowing
25
Cont’
– Ambiguity is the primary difference between natural and computer languages.
– Formal programming languages are designed to be unambiguous, i.e., they can be defined by a grammar that produces a unique parse for each sentence in the language.
26
Cont’
What is NLP?
– The term Natural Language Processing (NLP) encompasses a broad set of techniques for the automated generation, manipulation, and analysis of natural (human) languages.
– NLP focuses on developing systems that allow computers to communicate with people using everyday language.
– It evolved from a research agenda that asks how computational methods can aid the understanding of human language.
27
Basic Terminology in NLP
– Token: linguistic units such as words, punctuation marks, numbers, or alphanumeric strings are known as tokens.
– Sentence: an ordered sequence of tokens.
– Corpus: a body of text, usually containing a large number of sentences.
– Part-of-speech (POS) tag: a word belongs to a lexical category such as noun, verb, adjective, or article within a given language structure. A POS tag is a symbol representing such a lexical category: NN (noun), VB (verb), JJ (adjective), AT (article).
28
Cont’
– Parse tree: a tree defined over a given sentence that represents the syntactic structure of the sentence as defined by a formal grammar.
29
Terminology in NLP Tasks
Tokenization: the process of splitting a sentence into its constituent tokens.
– For segmented languages such as English, the existence of whitespace makes tokenization easier. However, for languages such as Chinese (written without spaces between words) and Arabic (where clitics attach to neighboring words), the task is harder since token boundaries are not explicitly marked.
– Sub-processes:
  Section splitting: splitting a text into sections
  Sentence splitting: splitting a text into sentences
  Word splitting: splitting a text into words
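For a whitespace-segmented language such as English, sentence splitting and word splitting can be sketched with two regular expressions. This is a deliberately naive illustration, not a production tokenizer: the function names are chosen here, abbreviations such as "e.g." would be split wrongly, and contractions such as "Wouldn't" are broken at the apostrophe.

```python
import re


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Split a sentence into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence)


text = "The dog bit the boy. The boy bit the dog."
for sentence in split_sentences(text):
    print(tokenize(sentence))
```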
30
Cont’
POS tagging: given a sentence and a set of POS tags, a common language processing task is to automatically assign a POS tag to each word in the sentence. For example, given the sentence “The ball is red”, the output of a POS tagger would be The/AT ball/NN is/VB red/JJ.
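The simplest possible tagger looks each word up in a hand-made lexicon. The sketch below reproduces the slide's example output; the lexicon contents, the default tag for unknown words, and the function names are assumptions for illustration. Practical taggers are trained on annotated corpora and use context to resolve words that can take more than one tag.

```python
# Hypothetical word-to-tag lexicon using the tags from the slide.
LEXICON = {"the": "AT", "ball": "NN", "is": "VB", "red": "JJ"}


def tag(tokens, default="NN"):
    """Assign each token its lexicon tag; unknown words get the default."""
    return [(tok, LEXICON.get(tok.lower(), default)) for tok in tokens]


def format_tagged(tagged):
    """Render (word, tag) pairs in the word/TAG notation."""
    return " ".join(f"{word}/{pos}" for word, pos in tagged)


print(format_tagged(tag("The ball is red".split())))
# The/AT ball/NN is/VB red/JJ
```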
31
Cont’
Morphological analysis: morphology is concerned with the discovery and analysis of the internal structure of words. Words are built from morphemes (stems and affixes); morphemes are the smallest linguistic units possessing meaning.
32
Cont’
Parsing: in the parsing task, a parser constructs the parse tree of a given sentence, i.e., it builds the syntactic tree of the sentence. Parsers may be built from hand-written grammar rules, from complex statistical models, or learned from labeled data through supervised learning.
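A rule-based parser can be sketched as a small recursive-descent procedure over a toy grammar. Everything below is an assumption made for illustration: the three grammar rules, the four-word lexicon, and the nested-tuple tree representation. The parser takes the first rule that matches and does not backtrack, which is enough for this grammar but not for ambiguous ones.

```python
# Hypothetical toy grammar and lexicon.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"]],
}
LEXICON = {"the": "Det", "dog": "N", "boy": "N", "bit": "V"}


def parse(symbol, tokens, pos):
    """Parse `symbol` starting at tokens[pos].

    Returns (tree, next_position) on success and None on failure.
    """
    if symbol not in GRAMMAR:  # lexical category such as Det, N, V
        if pos < len(tokens) and LEXICON.get(tokens[pos]) == symbol:
            return (symbol, tokens[pos]), pos + 1
        return None
    for rhs in GRAMMAR[symbol]:
        children, cur = [], pos
        for child in rhs:
            result = parse(child, tokens, cur)
            if result is None:
                break
            subtree, cur = result
            children.append(subtree)
        else:  # every child of this rule was parsed
            return (symbol, *children), cur
    return None


def parse_sentence(sentence):
    """Return the parse tree of the sentence, or None if it does not parse."""
    tokens = sentence.lower().split()
    result = parse("S", tokens, 0)
    if result is None or result[1] != len(tokens):
        return None
    return result[0]


print(parse_sentence("The dog bit the boy"))
```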
33
Cont’
Named-entity recognition: identifying pre-defined entity types in a sentence.
Word sense disambiguation: figuring out the exact meaning of a word or entity.
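Word sense disambiguation can be illustrated with a simple word-overlap heuristic, in the spirit of the Lesk algorithm: pick the sense whose signature words share the most words with the context. The two senses of "plant" come from the earlier slides, but the signature word sets and the function name are hypothetical and far smaller than anything a real system would use.

```python
# Hypothetical signature words for two senses of "plant".
SENSES = {
    "organism": {"photosynthetic", "organism", "leaf", "grow", "water", "green"},
    "factory": {"manufacturing", "facility", "factory", "workers", "production"},
}


def disambiguate(context_sentence):
    """Return the sense with the largest word overlap, or None if no overlap."""
    words = set(context_sentence.lower().split())
    scores = {sense: len(words & signature) for sense, signature in SENSES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(disambiguate("the plant needs water to grow"))
print(disambiguate("workers at the plant stopped production"))
```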
34
Cont’
Semantic role labeling: extracting subject-predicate-object triples from a sentence.
35
Possible tasks involved in NLP
36
Linguistic Knowledge Needed for NLP
– Phonetics and phonology: the study of linguistic sounds and their relations to words.
– Morphology: the study of the internal structures of words and how they can be modified; parsing complex words into their components.
37
Cont’
– Syntax: the study of the structural relationships between words in a sentence.
– Semantics: the study of the meaning of words, and how these combine to form the meanings of sentences.
  Synonymy: fall & autumn
  Hypernymy & hyponymy (is a): animal & dog
  Meronymy (part of): finger & hand
  Homonymy: fall (verb & season)
  Antonymy: big & small
38
Cont’
– Pragmatics: the social use of language. The study of how language is used to accomplish goals, and the influence of context on meaning. Understanding the aspects of a language that depend on situation and world knowledge.
– Discourse: the study of linguistic units larger than a single statement.
39
Challenges
Normalization: different words or sentences can express the same meaning. One approach is to prepare look-up tables. Examples:
– Season of the year: “Fall” and “Autumn”
– Book delivery time: “When will my book arrive?” and “When will I receive my book?”
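The look-up table idea can be sketched directly with dictionaries. The table contents use the slide's own examples; the canonical labels (such as "book_delivery_time") and the function names are assumptions introduced here. A look-up table only covers the variants someone thought to list, which is why normalization remains a challenge.

```python
# Hypothetical look-up tables built from the slide's examples.
WORD_TABLE = {"autumn": "fall"}
QUESTION_TABLE = {
    "when will my book arrive?": "book_delivery_time",
    "when will i receive my book?": "book_delivery_time",
}


def normalize_word(word):
    """Map a word to its canonical form; unknown words are just lowercased."""
    lowered = word.lower()
    return WORD_TABLE.get(lowered, lowered)


def normalize_question(question):
    """Map a question to a canonical label, or None if it is not listed."""
    key = " ".join(question.lower().split())  # lowercase, collapse whitespace
    return QUESTION_TABLE.get(key)


print(normalize_word("Autumn"), normalize_word("Fall"))
print(normalize_question("When will I receive my book?"))
```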
40
Coming up with better disambiguation processes and techniques:
– Phonetic and phonological disambiguation
– Syntactic disambiguation
– Semantic disambiguation
– Discourse analysis
41
Applications of Natural Language Processing
Spell and grammar checking and correction:
– Checking spelling and grammar
– Suggesting alternatives for the errors
Word prediction and suggestion (auto fill):
– Predicting the next word that is most likely to be typed by the user
Information retrieval:
– Finding information relevant to the user’s query
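Word prediction is often introduced with a bigram model: count which word follows which in a corpus, then suggest the most frequent follower. The sketch below assumes a tiny two-sentence corpus made up for illustration and uses raw counts without smoothing; the function names are chosen here.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count, for every word, how often each other word follows it."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical toy corpus.
corpus = ["the dog bit the boy", "the dog saw the dog"]
model = train_bigrams(corpus)
print(predict_next(model, "the"))
```

In this corpus "the" is followed by "dog" three times and by "boy" once, so "dog" is suggested.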
42
Cont’
– Text categorization: assigning one (or more) predefined categories to a text.
– Text summarization: generating a short summary from one or more documents, sometimes based on a given query.
– Question answering: answering questions with a short answer.
– Information extraction: extracting important concepts from texts and assigning them to slots in a certain template.
43
Cont’
– Machine translation: translating a text from one language to another.
– Sentiment analysis: identifying sentiments and opinions stated in a text.
– Optical character recognition: recognizing printed or handwritten texts and converting them to computer-readable texts.
– Speech recognition: recognizing spoken language and transforming it into text.
44
Cont’
– Speech synthesis: producing spoken language from a text.
– Spoken dialog systems: running a dialog between the user and the system.