Machine Translation
- Machine translation is one of the earliest uses of AI
- Two approaches:
  - Traditional approach using grammars, rewrite rules, and lexicons
    - May be shallow or deep translation
    - May translate directly between two languages, or from one language into an interlingua and then into the second language
  - Statistical machine translation
Shallow Translation
- Transfer model: keep a database of translation rules or examples; when a rule matches, translate directly
- Could operate on the lexical, syntactic, or semantic level
- Example: technical manuals (Siemens)
- Doesn't deal with context
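A minimal sketch of a lexical-level transfer model, assuming a tiny hand-written rule table. The English-to-Spanish rules, `RULES`, and `transfer_translate` are hypothetical names made up for illustration; a real system would hold far more rules and could match on syntactic or semantic structure instead of raw words.

```python
# Hypothetical toy rule base: source word sequence -> target word sequence.
RULES = {
    ("open", "the", "door"): ["abra", "la", "puerta"],
    ("the", "door"): ["la", "puerta"],
    ("open",): ["abra"],
}
MAX_RULE_LEN = max(len(src) for src in RULES)


def transfer_translate(words):
    """Translate by greedy longest-match lookup in the rule table."""
    out = []
    i = 0
    while i < len(words):
        # Try the longest rule that could start at position i first.
        for n in range(min(MAX_RULE_LEN, len(words) - i), 0, -1):
            key = tuple(words[i:i + n])
            if key in RULES:
                out.extend(RULES[key])
                i += n
                break
        else:
            # No rule matches: pass the word through unchanged.
            out.append(words[i])
            i += 1
    return out
```

Because each rule fires on a local match only, the sketch also shows the limitation named on the slide: nothing outside the matched words can influence the translation.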
Deep Translation
- One method uses a language-independent representation of information: an interlingua
- Three problems:
  - Knowledge representation
  - Parsing into that representation
  - Generation from that representation
- Alternate approach is to translate directly from one language to another
Statistical Machine Translation
- Translation model is learned from a bilingual corpus
- One system:
  - Break the original sentence into phrases
  - Choose a corresponding phrase in the target language for each
  - Choose a permutation of the phrases
  - Select the most probable translation
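One way to picture "learned from a bilingual corpus" is a relative-frequency estimate of P(f | e) over phrase pairs. The sketch below assumes the corpus has already been aligned at the phrase level, which is the hard part and is not shown; `ALIGNED_PAIRS`, `estimate_phrase_table`, and `best_phrase` are hypothetical names, and the data is invented.

```python
from collections import Counter, defaultdict

# Hypothetical phrase-aligned corpus: (source phrase e, target phrase f).
ALIGNED_PAIRS = [
    ("the house", "la maison"),
    ("the house", "la maison"),
    ("the house", "la demeure"),
    ("is small", "est petite"),
]


def estimate_phrase_table(pairs):
    """Relative-frequency estimate: P(f | e) = count(e, f) / count(e)."""
    pair_counts = Counter(pairs)
    source_counts = Counter(e for e, _ in pairs)
    table = defaultdict(dict)
    for (e, f), c in pair_counts.items():
        table[e][f] = c / source_counts[e]
    return dict(table)


def best_phrase(table, e):
    """Most probable target phrase for source phrase e."""
    return max(table[e], key=table[e].get)


PHRASE_TABLE = estimate_phrase_table(ALIGNED_PAIRS)
```

Here `best_phrase(PHRASE_TABLE, "the house")` picks "la maison", which was seen in two of the three aligned pairs.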
Efficiency
- Instead of examining all n! permutations, use the concept of distortion
- Distortion d_i is the number of words that phrase f_i has moved with respect to f_{i-1}: positive if moved to the right, negative if moved to the left, zero if f_i immediately follows f_{i-1}
- Find a probability distribution for d
- Assume each distortion is independent of the others
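The distortion definition can be made concrete as d_i = start(f_i) - end(f_{i-1}) - 1, with word positions counted in the target sentence and phrases listed in source order. The sketch below uses that formula; the geometric-style distribution P(d) proportional to alpha^|d| and the value of `ALPHA` are assumptions chosen only so that the numbers can be computed, not a distribution taken from the slides.

```python
import math

ALPHA = 0.5  # hypothetical decay parameter for the distortion distribution


def distortions(spans):
    """spans[i] = (start, end): 1-based inclusive target positions of phrase
    f_i, listed in source-phrase order. Returns the list of d_i."""
    result = []
    prev_end = 0  # treat the position before the sentence as the end of f_0
    for start, end in spans:
        result.append(start - prev_end - 1)
        prev_end = end
    return result


def distortion_prob(d, alpha=ALPHA):
    """P(d) = (1 - alpha) / (1 + alpha) * alpha**|d|, which sums to 1 over all integers d."""
    return (1 - alpha) / (1 + alpha) * alpha ** abs(d)


def log_prob_of_ordering(spans, alpha=ALPHA):
    """Independence assumption: the log-probabilities of the d_i simply add."""
    return sum(math.log(distortion_prob(d, alpha)) for d in distortions(spans))


# Five phrases; the second and third swap places in the target sentence.
REORDERED = [(1, 4), (6, 6), (5, 5), (7, 8), (9, 11)]
MONOTONE = [(1, 4), (5, 5), (6, 6), (7, 8), (9, 11)]
```

`distortions(REORDERED)` gives `[0, 1, -2, 1, 0]`: the second phrase moved one word right, the third moved two words left, and the fourth is again one word to the right of where it would follow its predecessor.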
Efficiency (cont'd)
- The search is still exponential in the number of phrases
- Use beam search with a heuristic that estimates probability to find a nearly most probable translation
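A rough sketch of beam search over phrase orderings and phrase choices. Everything here is an assumption for illustration: the toy `PHRASE_TABLE`, the toy target-side `BIGRAM` model, the floor probability, and the simplification that each source phrase counts as one position, so distortion is measured as the jump between consecutively translated source phrases. The heuristic is an optimistic estimate: the best translation probability of every phrase not yet covered.

```python
import math

# Hypothetical toy models; all numbers are invented.
PHRASE_TABLE = {  # P(target phrase | source phrase)
    "smelly": {"malodorant": 0.8, "puant": 0.2},
    "wumpus": {"wumpus": 1.0},
    "sleeps": {"dort": 0.9, "sommeille": 0.1},
}
BIGRAM = {  # toy target language model P(next | previous)
    ("<s>", "wumpus"): 0.6, ("wumpus", "malodorant"): 0.5,
    ("malodorant", "dort"): 0.5, ("<s>", "malodorant"): 0.1,
    ("malodorant", "wumpus"): 0.1, ("wumpus", "dort"): 0.3,
}
LM_FLOOR = 0.01  # probability assigned to unseen bigrams
ALPHA = 0.5      # distortion decay


def log_distortion(d):
    return math.log((1 - ALPHA) / (1 + ALPHA)) + abs(d) * math.log(ALPHA)


def beam_translate(source, beam_width=3):
    """Return (log_prob, covered, last_index, output) of the best hypothesis found."""
    # Optimistic future estimate: best translation log-prob per source phrase.
    best = [max(math.log(p) for p in PHRASE_TABLE[e].values()) for e in source]

    def priority(hyp):
        logp, covered = hyp[0], hyp[1]
        return logp + sum(best[j] for j in range(len(source)) if j not in covered)

    beam = [(0.0, frozenset(), -1, ())]
    for _ in source:  # each round translates one more source phrase
        candidates = []
        for logp, covered, last, out in beam:
            prev_word = out[-1] if out else "<s>"
            for i, e in enumerate(source):
                if i in covered:
                    continue
                for f, p in PHRASE_TABLE[e].items():
                    score = (logp + math.log(p)
                             + log_distortion(i - last - 1)
                             + math.log(BIGRAM.get((prev_word, f), LM_FLOOR)))
                    candidates.append((score, covered | {i}, i, out + (f,)))
        # Prune: keep only the beam_width most promising partial translations.
        beam = sorted(candidates, key=priority, reverse=True)[:beam_width]
    return max(beam, key=lambda hyp: hyp[0])
```

With these made-up numbers, `beam_translate(["smelly", "wumpus", "sleeps"])` ends with the output `("wumpus", "malodorant", "dort")`: the language model rewards putting the noun first strongly enough to pay for the distortion. Because hypotheses are pruned at every step, the result is only nearly most probable in general; a narrow beam can discard the prefix of the true best translation.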
Speech Recognition
- Challenges:
  - Little segmentation, unlike written text
  - Coarticulation: the sound at the end of one word runs into the sound at the beginning of the next
  - Homophones
- Solutions:
  - Find an acoustic model: P(sound_{1:t} | word_{1:t})
  - Find a language model: P(word_{1:t})
  - Use an HMM and the Viterbi algorithm
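To show how the two models combine, here is a small Viterbi sketch over a word-level HMM, with the acoustic model as emission probabilities and a bigram language model as transition probabilities. The states, sound labels, and all probabilities are invented; a real recognizer works with phone-level states and continuous acoustic features.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most probable state sequence for the observations, and its probability."""
    # best[t][s] = (probability of the best path ending in s at time t, predecessor)
    best = [{s: (start_p.get(s, 0.0) * emit_p[s].get(observations[0], 0.0), None)
             for s in states}]
    for obs in observations[1:]:
        prev = best[-1]
        column = {}
        for s in states:
            p, arg = max(((prev[r][0] * trans_p[r].get(s, 0.0), r) for r in states),
                         key=lambda pair: pair[0])
            column[s] = (p * emit_p[s].get(obs, 0.0), arg)
        best.append(column)
    # Trace back from the most probable final state.
    state = max(states, key=lambda s: best[-1][s][0])
    prob = best[-1][state][0]
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return list(reversed(path)), prob


# Hypothetical toy models: "to" and "two" are homophones, both heard as "tu".
STATES = ["to", "two", "dogs"]
START_P = {"to": 0.5, "two": 0.4, "dogs": 0.1}               # language model, first word
TRANS_P = {"to": {"dogs": 0.05}, "two": {"dogs": 0.6}, "dogs": {}}  # bigram language model
EMIT_P = {"to": {"tu": 1.0}, "two": {"tu": 1.0}, "dogs": {"dogz": 1.0}}  # acoustic model
```

`viterbi(["tu", "dogz"], STATES, START_P, TRANS_P, EMIT_P)` returns the path `["two", "dogs"]`. The acoustic model cannot separate the homophones, so the language model decides.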
Acoustical Processing
- Sample the analog signal: the sampling rate and quantization factor matter
- Divide into frames
- Extract features from each frame:
  - Use the Fourier transform to measure acoustic energy at about a dozen frequencies
  - Compute the mel frequency cepstral coefficient (MFCC) for each frequency
  - Yields thirteen features
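The framing and Fourier steps can be sketched with the standard library alone. This is deliberately incomplete: it uses a naive DFT, applies no window function, and stops before the mel scaling and cepstral step, so it does not produce MFCCs. The sampling rate, frame length, hop size, and test tone are assumed values chosen so that the arithmetic is easy to check.

```python
import cmath
import math

SAMPLE_RATE = 8000   # samples per second (assumed)
FRAME_LEN = 80       # 10 ms frames, so DFT bins are 100 Hz apart
HOP = 40             # frames overlap by half
N_BINS = 12          # "about a dozen frequencies"


def split_frames(samples, frame_len=FRAME_LEN, hop=HOP):
    """Cut the sample sequence into overlapping fixed-length frames."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def band_energies(frame, n_bins=N_BINS):
    """Energy at the first n_bins nonzero DFT frequencies of one frame."""
    n = len(frame)
    energies = []
    for k in range(1, n_bins + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        energies.append(abs(coeff) ** 2 / n)
    return energies


# A pure 500 Hz tone, 50 ms long, standing in for the sampled analog signal.
SIGNAL = [math.sin(2 * math.pi * 500 * t / SAMPLE_RATE) for t in range(400)]
FEATURES = [band_energies(frame) for frame in split_frames(SIGNAL)]
```

Each entry of `FEATURES` is a twelve-number vector for one frame. For the 500 Hz tone nearly all energy lands in the fifth bin (index 4), since the bins are 100 Hz apart.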
Processing (cont'd)
- Each phone has an onset, a middle, and an end
- The phone models are strung together to form a pronunciation model for each word
- Word models can account for pronunciation variation, both dialect ("tomato" vs. "tomahto") and coarticulation
- The language model can be an n-gram model learned from a corpus of text
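As an illustration of an n-gram language model learned from text, the sketch below trains a bigram model (n = 2) by counting. The three-sentence corpus and the names `CORPUS` and `train_bigram` are hypothetical, and add-one smoothing is an assumption made so that unseen word pairs keep a nonzero probability.

```python
from collections import Counter

# Hypothetical toy training corpus.
CORPUS = ["I scream", "ice cream is cold", "I like ice cream"]


def train_bigram(sentences):
    """Return (prob, vocab), where prob(prev, word) estimates P(word | prev)
    with add-one smoothing."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens[:-1])              # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))   # adjacent word pairs
    # Words that can appear as a successor, including the end marker.
    vocab = {w for s in sentences for w in s.split()} | {"</s>"}

    def prob(prev, word):
        return (bigrams[(prev, word)] + 1) / (unigrams[prev] + len(vocab))

    return prob, vocab


BIGRAM_PROB, VOCAB = train_bigram(CORPUS)
```

After training, `BIGRAM_PROB("ice", "cream")` is 0.3 and `BIGRAM_PROB("ice", "scream")` is 0.1, so a recognizer using this model would prefer "ice cream" when the acoustics are ambiguous.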