Natural Language Processing. According to research at an Elingsh uinervtisy, it deosn’t mttaer in what oredr the ltteers in a wrod are, the olny iprmoetnt.

1 Natural Language Processing

2 According to research at an Elingsh uinervtisy, it deosn’t mttaer in what oredr the ltteers in a wrod are, the olny iprmoetnt tihng is that the frist and lsat ltteer are in the rghit pclae. The rset can be a toatl mses and you can sitll raed it wouthit a porbelm. Tihs is bcuseae we do not raed ervey lteter by islelf but the wrod as a wlohe. Ceehiro. That’ll sutff the splelchekcer. (Excerpt from Sideswipe, NZ Herald, 22/09/03)
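The scrambling described on this slide can be sketched in a few lines of Python. This is only an illustration of the idea (keep the first and last letter, shuffle the rest); the function names `scramble_word` and `scramble_text` are made up for this example, and punctuation attached to a word is simply treated as part of it.

```python
import random


def scramble_word(word, rng):
    """Shuffle the interior letters of a word, keeping the first and last in place."""
    if len(word) <= 3:
        # Words this short have at most one interior letter, so nothing can move.
        return word
    middle = list(word[1:-1])
    rng.shuffle(middle)
    return word[0] + "".join(middle) + word[-1]


def scramble_text(text, seed=0):
    """Scramble every whitespace-separated word; the seed makes runs repeatable."""
    rng = random.Random(seed)
    return " ".join(scramble_word(word, rng) for word in text.split())


if __name__ == "__main__":
    print(scramble_text("the first and last letter are in the right place"))
```

The output differs with the seed, but each word always keeps its length, its first and last character, and the same set of letters.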

3 1970s Example: SHRDLU
Person: PICK UP A BIG RED BLOCK.
Computer: OK. (does it)
Person: GRASP THE PYRAMID.
Computer: I DON'T UNDERSTAND WHICH PYRAMID YOU MEAN.
Person: FIND A BLOCK WHICH IS TALLER THAN THE ONE YOU ARE HOLDING AND PUT IT INTO THE BOX.
Computer: BY "IT", I ASSUME YOU MEAN THE BLOCK WHICH IS TALLER THAN THE ONE I AM HOLDING.
Computer: OK. (does it)
Person: WHAT DOES THE BOX CONTAIN?
Computer: THE BLUE PYRAMID AND THE BLUE BLOCK.
Person: WHAT IS THE PYRAMID SUPPORTED BY?
Computer: THE BOX.
Person: HOW MANY BLOCKS ARE NOT IN THE BOX?
Computer: FOUR OF THEM.
Person: IS AT LEAST ONE OF THEM NARROWER THAN THE ONE WHICH I TOLD YOU TO PICK UP?
Computer: YES, THE RED CUBE.
Terry Winograd. 1971. MIT Ph.D. Thesis.

4 Pomegranade

5 Natural language processing (NLP), also referred to as Human Language Technology (HLT) or Natural Language Engineering (NLE), is considered a sub-field of artificial intelligence and has significant overlap with the field of computational linguistics. It is concerned with the interactions between computers and human (natural) languages. Natural language generation systems convert information from computer databases into readable human language. Natural language understanding systems convert human language into representations that are easier for computer programs to manipulate. The term natural language is used to distinguish human languages (e.g. English, Persian, Swedish) from formal or computer languages (e.g. C++, Prolog). NLP encompasses both text and speech, but work on speech processing has evolved into a separate field.

6 Where does it fit in the CS taxonomy?
Computers: Artificial Intelligence, Algorithms, Databases, Networking
Artificial Intelligence: Robotics, Search, Natural Language Processing
Natural Language Processing: Information Retrieval, Machine Translation, Language Analysis
Language Analysis: Semantics, Parsing
… …

7 Applications
Yahoo, Google, Microsoft: Information Retrieval
Monster.com, HotJobs.com (job finders): Information Extraction & Information Retrieval
Systran powers Babelfish, Google: Machine Translation
Ask Jeeves: Question Answering
Myspace, Facebook, Blogspot: Processing of User-Generated Content
Tools for “business intelligence”
All “Big Companies” have (several) strong NLP research labs: IBM, Microsoft, AT&T, Xerox, Sun, etc.
Academia: research in a university environment

8 What is NLP? Combination of computational linguistics, artificial intelligence & cognitive science. Concentrates on interpreting text using a combination of lexical, syntactic, semantic and real world knowledge. Applications include intelligent translators, speech recognition software, information management tools and other types of communication software.

9 Grammar The grammar of a language is a description of the structure of that language. Grammars provide a scheme for specifying the structure of sentences and rules for combining words into correct phrases and clauses.

10 English Grammar English word order follows the Subject-Verb-Object (SVO) linguistic typology. The subject of a verb is the “doer” of the verb, and the object is the “doee”. Example: The cat (Subject) is drinking (Verb) the milk (Object).

11 Syntax Syntax is the study of the rules, or patterns, that govern the way the words in a sentence come together. Syntax deals with how words are categorised into “parts of speech” (nouns, adjectives, verbs, etc.) and how they are combined into phrases and clauses, which in turn combine into sentences.

12 Syntactic Analysis Syntactic analysis involves organising the phrases of a sentence into a hierarchical structure, allowing the study of its constituents. For example the sentence “the big cat is drinking milk” can be broken up into the following constituents:

13 Syntactic Analysis The big cat is drinking milk
[Noun Phrase [Determiner The] [Adjective Phrase big] [Noun cat]]
[Verb Phrase [Auxiliary is] [Verb drinking] [Noun Phrase milk]]
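One possible way to hold such a hierarchy in a program is as nested tuples, where the first element of each tuple is the constituent label. The sketch below is an illustration only: the root label "Sentence" and the helper names `leaves` and `outline` are assumptions added for this example, not something shown on the slide.

```python
# Each node is (label, child, child, ...); a plain string is a word (a leaf).
TREE = (
    "Sentence",  # assumed root label; the slide shows only the two phrases
    ("Noun Phrase",
        ("Determiner", "The"),
        ("Adjective Phrase", "big"),
        ("Noun", "cat")),
    ("Verb Phrase",
        ("Auxiliary", "is"),
        ("Verb", "drinking"),
        ("Noun Phrase", "milk")),
)


def leaves(node):
    """Return the words of the tree in left-to-right order."""
    if isinstance(node, str):
        return [node]
    _label, *children = node
    words = []
    for child in children:
        words.extend(leaves(child))
    return words


def outline(node, depth=0):
    """Return the tree as indented lines, one label or word per line."""
    if isinstance(node, str):
        return ["  " * depth + node]
    label, *children = node
    lines = ["  " * depth + label]
    for child in children:
        lines.extend(outline(child, depth + 1))
    return lines


if __name__ == "__main__":
    print(" ".join(leaves(TREE)))
    print("\n".join(outline(TREE)))
```

Reading the leaves back in order recovers the original sentence, which is a quick sanity check that the tree covers every word exactly once.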

14 A Grammar for a very small fragment of English (Implementation: Prolog)
    sentence --> noun_phrase, verb_phrase.
    noun_phrase --> determiner, noun.
    noun_phrase --> proper_noun.
    determiner --> [the].
    determiner --> [a].
    proper_noun --> [pedro].
    noun --> [man].
    noun --> [apple].
    verb_phrase --> verb, noun_phrase.
    verb_phrase --> verb.
    verb --> [eats].
    verb --> [sings].

15
    ?- phrase(sentence, [the, man, eats]).
    yes
    ?- phrase(sentence, [the, man, eats, the, apple]).
    yes
    ?- phrase(sentence, [the, apple, eats, a, man]).
    yes
    ?- phrase(sentence, [pedro, sings, the, pedro]).
    no
    ?- phrase(sentence, [eats, apple, man]).
    no
    ?- phrase(sentence, L).

16
    L = [the, man, eats, the, man] ;
    L = [the, man, eats, the, apple] ;
    L = [the, man, eats, a, man] ;
    L = [the, man, eats, a, apple] ;
    L = [the, man, eats, pedro] ;
    L = [the, man, sings, the, man] ;
    L = [the, man, sings, the, apple] ;
    L = [the, man, sings, a, man] ;
    L = [the, man, sings, a, apple] ;
    L = [the, man, sings, pedro] ;
    L = [the, man, eats] ;
    L = [the, man, sings] ;
    L = [the, apple, eats, the, man] ;
    L = [the, apple, eats, the, apple] ;
    L = [the, apple, eats, a, man] ;
    L = [the, apple, eats, a, apple] ;
    L = [the, apple, eats, pedro] ;
    L = [the, apple, sings, the, man] ;
    L = [the, apple, sings, the, apple] ;
    L = [the, apple, sings, a, man] ;
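For readers without a Prolog system, the grammar from slides 14 to 16 can be approximated in Python. The sketch below is not how Prolog implements definite clause grammars; it is a simple top-down recogniser and generator over the same rules, and the names `GRAMMAR`, `parse`, `recognises`, `generate` and `expand` are chosen for this example.

```python
# The same rules as the Prolog grammar: each symbol maps to its alternative bodies.
# Any name that is not a key of GRAMMAR is treated as a terminal word.
GRAMMAR = {
    "sentence": [["noun_phrase", "verb_phrase"]],
    "noun_phrase": [["determiner", "noun"], ["proper_noun"]],
    "determiner": [["the"], ["a"]],
    "proper_noun": [["pedro"]],
    "noun": [["man"], ["apple"]],
    "verb_phrase": [["verb", "noun_phrase"], ["verb"]],
    "verb": [["eats"], ["sings"]],
}


def parse(symbol, words, start):
    """Yield every position at which `symbol` can end if it begins at `start`."""
    if symbol not in GRAMMAR:
        if start < len(words) and words[start] == symbol:
            yield start + 1
        return
    for body in GRAMMAR[symbol]:
        positions = [start]
        for part in body:
            positions = [end for pos in positions for end in parse(part, words, pos)]
        yield from positions


def recognises(words):
    """True if the whole word list is a sentence of the grammar."""
    return len(words) in parse("sentence", words, 0)


def generate(symbol):
    """Yield every word list derivable from `symbol`, trying rules in written order."""
    if symbol not in GRAMMAR:
        yield [symbol]
        return
    for body in GRAMMAR[symbol]:
        yield from expand(body)


def expand(body):
    """Yield every word list derivable from a sequence of symbols."""
    if not body:
        yield []
        return
    for head in generate(body[0]):
        for tail in expand(body[1:]):
            yield head + tail


if __name__ == "__main__":
    print(recognises(["the", "man", "eats", "the", "apple"]))
    for sentence in generate("sentence"):
        print(sentence)
```

Because the rules are tried in the order they are written, the generator lists sentences in the same order as the Prolog answers on slide 16. The grammar is finite (5 noun phrases times 12 verb phrases), so the enumeration stops after 60 sentences, including semantically odd ones such as "the apple eats a man".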

17 Issues in Syntax “the dog ate my homework”: who did what?
Identify the part of speech (POS)
– dog = noun; ate = verb; homework = noun
– English POS tagging
Identify collocations: mother in law, hot dog
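The two steps on this slide can be sketched as a toy dictionary lookup. Real POS taggers are statistical and handle ambiguity; this example does neither. The lexicon, the tag names, the collocation list and the function names `merge_collocations` and `tag` are all assumptions made for illustration.

```python
# Hypothetical mini-lexicon: just enough words for the slide's examples.
LEXICON = {
    "the": "DET",
    "my": "DET",
    "dog": "NOUN",
    "ate": "VERB",
    "homework": "NOUN",
    "hot dog": "NOUN",
    "mother in law": "NOUN",
}

# Multi-word units that should be treated as a single token.
COLLOCATIONS = [("mother", "in", "law"), ("hot", "dog")]


def merge_collocations(words):
    """Join known collocations into single tokens, scanning left to right."""
    merged = []
    i = 0
    while i < len(words):
        for phrase in COLLOCATIONS:
            if tuple(words[i:i + len(phrase)]) == phrase:
                merged.append(" ".join(phrase))
                i += len(phrase)
                break
        else:
            # No collocation starts here, so keep the single word.
            merged.append(words[i])
            i += 1
    return merged


def tag(sentence):
    """Return (token, tag) pairs; words missing from the lexicon get 'UNKNOWN'."""
    tokens = merge_collocations(sentence.lower().split())
    return [(token, LEXICON.get(token, "UNKNOWN")) for token in tokens]


if __name__ == "__main__":
    print(tag("the dog ate my homework"))
    print(tag("my mother in law ate the hot dog"))
```

Merging collocations before tagging matters: without it, "hot dog" would be tagged word by word and the food reading would be lost.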

18 Chomsky’s Grammars Chomsky introduced transformational grammars (also called transformational generative grammars or generative grammars). He introduced the idea of “deep structures” which provide a syntactic base of language and consist of:

19 Chomsky’s Grammars
– a series of phrase-structure (rewrite) rules
– a series of (possibly universal) rules that generates the underlying phrase-structure of a sentence
– a series of transformations that act upon the phrase-structure, producing more complex sentences
– a series of morphophonemic rules controlling pronunciation.

20 Chomsky’s Lexicon The lexicon, which can be thought of as a dictionary of the language in a particular form, lists all of the vocabulary words in the language and associates them with their syntactic, semantic and phonological information. This information is represented in terms of “features”.

21 Chomsky’s Feature Terms For example, the entry for “cat” might have the following syntactic features: Cat: [+ Noun], [+ Count], [+ Common], [+ Animate] These features are used to fill “slots” in a set of phrase markers. For example, a phrase marker requiring an animate noun ([+ Animate]) would find “cat” eligible for lexical substitution into that slot, as it fulfils the requirement of being an animate noun.
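The slot-filling check described here amounts to comparing a word's features against the features a slot requires. A minimal sketch follows; the entry for "cat" comes from the slide, while the entry for "milk", the boolean encoding of +/- features and the function name `fits_slot` are assumptions for illustration.

```python
# Features are stored as name -> True (+) or False (-).
LEXICON = {
    "cat": {"Noun": True, "Count": True, "Common": True, "Animate": True},
    # Hypothetical second entry, added only to show a word that fails the check.
    "milk": {"Noun": True, "Count": False, "Common": True, "Animate": False},
}


def fits_slot(word, required):
    """True if the word's lexicon entry matches every feature the slot requires."""
    features = LEXICON.get(word, {})
    return all(features.get(name) == value for name, value in required.items())


if __name__ == "__main__":
    animate_noun = {"Noun": True, "Animate": True}
    print(fits_slot("cat", animate_noun))
    print(fits_slot("milk", animate_noun))
```

A slot only constrains the features it names, so "cat" fits an animate-noun slot regardless of its [+ Count] and [+ Common] values.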

22 Syntax vs Semantics One of the most controversial topics in the development of transformational grammar is the relationship between syntax and semantics. There is a considerable degree of interdependence between the two, and the problem is how to formalise this relationship.

23 Phrase Structure Grammars Phrase-structure rules are used to describe a given language's syntax by attempting to break language down into its constituent parts (also known as syntactic categories) namely phrasal categories and lexical categories (parts of speech). There are many kinds of phrase-structure rules, which themselves can be combined to generate additional phrase-structure rules.

24 Phrase Structure Grammars In particular, phrase-structure rules must account for the following characteristics: 1. All languages combine nouns (N) and verbs (V) to express ideas about the universe. 2. All languages have rules determining how these are combined into meaningful units.

25 Phrase Structure Grammars 3. All languages have recursion, i.e. at least one rule that can be repeated ad infinitum: – An example of this is the English use of "and", which can link any series of two or more nouns or two or more verbs: "His and hers and theirs and Mary's and John's ..." etc. "He ran and jumped and played and skipped and danced and ..." etc.

26 Phrase Structure Grammar – This would be described in Transformational Grammar as: A noun phrase (NP) consists of a N or NP, the word ‘and’, and another N or NP. A verb phrase (VP) consists of a V or VP, the word ‘and’, and another V or VP.
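The recursive rule for noun phrases can be written almost word for word as a recursive function. This is a sketch under simple assumptions: the `NOUNS` set is a made-up stand-in for the N category (using the possessive forms from the slide's own example), and `is_noun_phrase` is a name chosen for this illustration.

```python
# Hypothetical stand-in for the category N, taken from the slide's example phrase.
NOUNS = {"his", "hers", "theirs", "mary's", "john's"}


def is_noun_phrase(tokens):
    """NP -> N | NP 'and' NP, checked by trying every 'and' as the split point."""
    if len(tokens) == 1:
        return tokens[0] in NOUNS
    return any(
        token == "and"
        and is_noun_phrase(tokens[:i])
        and is_noun_phrase(tokens[i + 1:])
        for i, token in enumerate(tokens)
    )


if __name__ == "__main__":
    print(is_noun_phrase("his and hers and theirs".split()))
    print(is_noun_phrase("his and".split()))
```

Because the rule refers to itself, the same three-line definition accepts a coordination of any length, which is the point of the recursion property on slide 25. The verb phrase rule would follow the same pattern with a set of verbs.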

27 Phrase Structure Tree
[Sentence [Noun Phrase [Determiner A] [Noun monkey]] [Verb Phrase [Verb climbs] [Noun Phrase [Determiner the] [Noun trees]]]]

28 Problems with Traditional Grammars They are grammar-based, whereas natural language isn’t strictly grammar-based. Most don’t take into account language variations and dialects. Humans have a built-in natural language processor that can handle things machine natural language processors cannot.

29 Yoda “When 900 years old you reach, look as good you will not.” “With you the force is.” “A brave man your Father was.” Yoda (typically) uses the Object-Subject-Verb (OSV) linguistic typology, which is characteristic of some of the languages of Brazil.
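The difference between word-order types can be shown by rearranging the same three constituents. The sketch below reuses the cat and milk sentence from slide 10; the dictionary keys and the function name `reorder` are assumptions for this example, and real Yoda-style output would need a parser to find the constituents first.

```python
def reorder(constituents, order):
    """Join the subject (S), verb (V) and object (O) in the given order, e.g. 'OSV'."""
    return " ".join(constituents[role] for role in order)


if __name__ == "__main__":
    clause = {"S": "the cat", "V": "is drinking", "O": "the milk"}
    print(reorder(clause, "SVO"))  # ordinary English order
    print(reorder(clause, "OSV"))  # Yoda-style order
```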

30 Inherent Complexity To understand a sentence you must do more than combine the dictionary meanings of its constituents. A large amount of human knowledge is assumed, and communication takes place between complex agents in complex environments.

31 Statistical approach Statistical Machine Translation
