Introduction to Computational Linguistics


1 Introduction to Computational Linguistics
Misty Azara

2 Agenda
Introduction to Computational Linguistics (CL)
Common CL applications
Using CL in theoretical linguistics (computational modeling)

3 What is Computational Linguistics?
CL is interdisciplinary: Linguistics, Computer Science, Mathematics, Electrical Engineering, Psychology, Speech and Hearing Science

4 What is Computational Linguistics?
Computational Linguistics covers many areas
Essentially, CL is any task, model, algorithm, etc. that attempts to place any type of language processing (syntax, phonology, morphology, etc.) in a computational setting

5 Core Areas of CL
Machine Translation
Speech Recognition
Text-to-Speech
Natural Language Generation
Human-Computer Dialogs
Information Retrieval
Computational Modeling

6 Machine Translation
Using computers to automate some or all of translating from one language to another
Rough translation: useful for translating web pages, and as the first stage of a complete translation process
Human post-editor: speeds up the translation process
Sublanguage: uses a restricted vocabulary, such as weather forecasting, so that the raw MT output is adequate without post-editing

7 Three general models or tasks:
Tasks for which a rough translation is adequate (e.g. translating web documents)
Tasks where a human post-editor can be used to improve the output [example of task 2?]
Tasks limited to a small sublanguage

8 Machine Translation (cont.)
Linguistic knowledge is extremely useful in this area of CL
MT benefits from knowledge of language typology and language-specific linguistic information, e.g. word order, syntactic structure, morphology, etc.

9 Speech Recognition
Taking spoken language as input and outputting the corresponding text

10 Architecture
SR takes the source speech and produces “guesses” as to which words could correspond to the source via some type of acoustic model
The word with the highest probability is selected as the optimal candidate
Linguistics: feature extraction from the acoustic signal; phoneme mapping and selection; modification for gender and dialectal variation
Why is SR so difficult? The lack of invariance problem.
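
The selection step described on this slide can be sketched in a few lines. This is only an illustration: the candidate words and their probabilities are invented, and `best_candidate` is a hypothetical helper, not part of any real recognizer. A real system would obtain the scores from its acoustic model.

```python
# Hypothetical candidate words for one stretch of audio, with made-up
# probabilities standing in for acoustic-model scores.
candidates = {"recognize": 0.62, "wreck a nice": 0.27, "reckon eyes": 0.11}

def best_candidate(scores):
    """Return the candidate whose probability is highest."""
    return max(scores, key=scores.get)

print(best_candidate(candidates))
```

The difficulty the slide points to lies in producing good scores in the first place, not in this final comparison.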

11 Why use SR? Allow for hands-free human-computer interaction

12 Text-to-Speech
Taking text as input and outputting the corresponding spoken language

13 Three types of TTS
Articulatory- models the physiological characteristics of the vocal tract
Concatenative- uses pre-recorded segments to construct the utterance(s)
Pavarobotti Speechify (Speechworks), Scansoft Eloquent

14 Three types of TTS (cont.)
Parametric/Formant- models the formant transitions of speech [baj]
Formant synthesizers typically synthesize the lowest 6 formants

15 Why is TTS so difficult?
Spelling: through, rough
Homonyms: PERmit (n) vs. perMIT (v)
Prosody: pitch, duration of segments, phrasing of segments, intonational tune, emotion
“I am so angry at you. I have never been more enraged in my life!!”
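
The PERmit/perMIT case shows why spelling alone is not enough: a TTS system also needs the part of speech to pick a stress pattern. Below is a minimal sketch, assuming the part of speech is already known. The lexicon, the tags "n" and "v", and the `pronounce` function are all hypothetical.

```python
# Toy pronunciation lexicon: (spelling, part of speech) -> stressed form.
# The entries and the tag names "n"/"v" are illustrative assumptions.
LEXICON = {
    ("permit", "n"): "PERmit",
    ("permit", "v"): "perMIT",
}

def pronounce(word, pos):
    """Look up the stressed form; fall back to the plain spelling."""
    return LEXICON.get((word.lower(), pos), word)

print(pronounce("permit", "n"))
print(pronounce("permit", "v"))
```

Deciding the part of speech from context is a separate problem that this sketch leaves out.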

16 Why use TTS? Allows for text to be read automatically
Extremely useful for the visually impaired

17 Natural Language Generation
Constructing linguistic outputs from non-linguistic inputs

18 Natural Language Generation
Maps meaning to text
Nature of the input varies greatly from one application to another (e.g. documenting the structure of a computer program)
The job of the NLG system is to extract the necessary information to drive the generation process

19 NLG systems have to make choices:
Content selection- the system must choose the appropriate content for input, basing its decision on a pre-specified communicative goal
Lexical selection- the system must choose the lexical item most appropriate for expressing a concept

20 Sentence Structure
Aggregation- the system must apportion the content into phrase, clause, and sentence-sized chunks
Referential expression- the system must determine how to refer to the objects under discussion (not a trivial task)

21 Discourse structure- many NLG systems have to deal with multi-sentence discourses, which must have a coherent structure

22 Sample NLG output
To save a file
1. Choose save from the file menu
2. Choose the appropriate folder
3. Type the file name
4. Click the save button
The system will save the document.
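
One way to picture how text like this sample could come from a non-linguistic input is a small template-based generator. This is a sketch under assumptions, not the system behind the slide: the plan format (verb, object, optional modifier) and both function names are invented for illustration, and the hard choices discussed earlier (content selection, lexical selection, referring expressions) are already fixed in the input.

```python
def realize_step(verb, obj, modifier=""):
    """Turn one (verb, object, modifier) triple into an imperative clause."""
    words = [verb.capitalize(), obj]
    if modifier:
        words.append(modifier)
    return " ".join(words)

def generate_instructions(goal, steps, outcome):
    """Render a goal, ordered steps, and an outcome as numbered instructions."""
    lines = [f"To {goal}"]
    for number, step in enumerate(steps, start=1):
        lines.append(f"{number}. {realize_step(*step)}")
    lines.append(outcome)
    return "\n".join(lines)

# Hypothetical structured plan for the save-file task.
SAVE_FILE_PLAN = [
    ("choose", "save", "from the file menu"),
    ("choose", "the appropriate folder"),
    ("type", "the file name"),
    ("click", "the save button"),
]

text = generate_instructions(
    "save a file", SAVE_FILE_PLAN, "The system will save the document."
)
print(text)
```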

23 Human-Computer Dialogs
Uses a mix of SR, TTS, and pre-recorded prompts to achieve some goal

24 Human-Computer Dialogs
Uses speech recognition, or a combination of SR and touch tone, as input to the system
The system processes the spoken information and outputs appropriate TTS or pre-recorded prompts

25 Dialog systems have specific tasks, which limit the domain of conversation
This makes the SR problem much easier, as the potential responses become very constrained
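
A rough sketch of why a constrained domain helps: if only a handful of responses are possible at a given prompt, a noisy recognition result can be snapped to the closest allowed one. The response list, the `constrain` helper, and the use of string similarity (Python's standard `difflib`) in place of acoustic similarity are all assumptions made for illustration.

```python
from difflib import get_close_matches

# Hypothetical set of responses the banking prompt accepts.
ALLOWED_RESPONSES = ["checking", "savings"]

def constrain(hypothesis, allowed=ALLOWED_RESPONSES):
    """Map a noisy recognizer hypothesis to the closest allowed response.

    Returns None when nothing in the allowed set is similar enough.
    """
    matches = get_close_matches(hypothesis.lower(), allowed, n=1, cutoff=0.6)
    return matches[0] if matches else None

print(constrain("chucking"))
print(constrain("pizza"))
```

With an open vocabulary, a misrecognized word could be almost anything; with two allowed answers, most errors still land nearest the right one.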

26 Sample dialog system for banking
Sys: Would you like information for checking or savings?
User: Checking, please.
Sys: Your current balance is $2, Would you like another transaction?
User: Yes, has check #2431 cleared?

27 Linguistic knowledge in dialog systems
Discourse structure- ensuring natural flowing discourse interaction
Building appropriate vocabularies/lexicons for the tasks
Ensuring prosodic consistencies (i.e. questions sound like questions and spliced prompts sound continuous)

28 Why use human-computer systems?
Automate simple tasks- no need for a teller to be on the other end of the line!
Allow access to system information from anywhere, via the telephone

29 Information Retrieval
Storage, analysis, and retrieval of text documents

30 Information Retrieval
Most current IR systems are based on some interpretation of compositional semantics
IR is the core of web-based searching, e.g. Google, Altavista, etc.

31 Information Retrieval Architecture
User inputs a word or string of words
System processes the words and retrieves documents corresponding to the request
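
The two steps on this slide can be illustrated with a tiny inverted index, one common way to organize retrieval. The document collection, the function names, and the rule that a document must contain every query word are assumptions for this sketch. Real search engines add ranking and much more.

```python
from collections import defaultdict

# A tiny hypothetical document collection.
DOCUMENTS = {
    1: "machine translation of weather reports",
    2: "speech recognition for banking dialogs",
    3: "translation and speech technology",
}

def build_index(documents):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def retrieve(index, query):
    """Return ids of documents containing every word in the query."""
    word_sets = [index.get(word, set()) for word in query.lower().split()]
    if not word_sets:
        return set()
    return set.intersection(*word_sets)

INDEX = build_index(DOCUMENTS)
print(retrieve(INDEX, "speech translation"))
```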

32 “Bag of Words”
The dominant approach to IR systems is to ignore syntactic information and process the meaning of individual words only
Thus, “I see what I eat” and “I eat what I see” would mean exactly the same thing to the system!
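
The slide's example is easy to check directly. In this minimal sketch, a sentence is reduced to word counts with Python's standard `collections.Counter`. The `bag_of_words` helper and its whitespace tokenization are simplifying assumptions.

```python
from collections import Counter

def bag_of_words(sentence):
    """Count each word, discarding word order entirely."""
    return Counter(sentence.lower().split())

a = bag_of_words("I see what I eat")
b = bag_of_words("I eat what I see")
print(a == b)  # True: both contain i x2, see, what, eat
```

Because only the counts survive, any reordering of the same words yields an identical representation.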

33 Linguistic Knowledge in IR
Semantics: compositional, lexical
Syntax (depending on the model used)

34 Computational Modeling
Computational approaches to problem solving, modeling, and development of theories

35 How can we use computational modeling?
Test our theories of language change, synchronic or diachronic
Develop working models of language evolution
Model speech perception, production, and processing
Almost any theoretical model can have a computational counterpart

36 Why Use Computational Modeling?
Forces explicitness: no black boxes or behind-the-scenes “magic”
Allows for modeling that would otherwise be impossible
Allows for modeling that would otherwise be unethical

37 Conclusions
CL applications utilize linguistic knowledge from all of the major subfields of theoretical linguistics
Computational modeling can aid linguists’ theories of language processing and structure

