Structured prediction (notes under the slides written by: Balogh Tamás Péter, 13/04/2016)
Structured prediction The sample is not IID anymore. Supervised learning where an instance is a structure. The structure can be a sequence, tree, graph, …
Applications Speech and natural language processing Image processing Clinical diagnostics
Sequence labeling A sequence is the simplest structure. E.g.: assign to each of the frames a label describing the state of the movement
slide copyright of Nicolas Nicolov
Hidden Markov Models (HMM)
Hidden Markov Models Discrete Markov process: there are N states S_1, …, S_N, and the system (nature) is in one of the states at every point in time; q_t = S_i denotes that the system is in state S_i at time point t
Hidden Markov Models The current state of the system depends exclusively on the previous state. First-order Markov model: P(q_{t+1} = S_j | q_t = S_i, q_{t-1} = S_k, …) = P(q_{t+1} = S_j | q_t = S_i)
Transition probabilities The transitions among states are stationary, i.e. they do not depend on the time: a_ij = P(q_{t+1} = S_j | q_t = S_i). Sequence-initial probabilities: π_i = P(q_1 = S_i)
Emission probabilities The states q_t are not observable (hidden). Let us assume we have access to observable variables of the system: we can observe a single discrete random variable with M possible values v_1, …, v_M. Emission probabilities: b_j(m) = P(o_t = v_m | q_t = S_j)
Hidden Markov Models
HMM example Stock-exchange price forecast. Hidden states S = {positive, negative, neutral} mood; observations O = {increasing, decreasing} price
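The stock-exchange example can be written down as a concrete parameter set λ = (A, B, π); the numbers below are made up purely for illustration and are not from the slides:

```python
import numpy as np

# Hidden states: market mood; observations: price movement.
states = ["positive", "negative", "neutral"]
observations = ["increasing", "decreasing"]

pi = np.array([0.5, 0.2, 0.3])   # sequence-initial probabilities pi_i
A = np.array([[0.6, 0.1, 0.3],   # transition probabilities a_ij = P(S_j | S_i)
              [0.2, 0.5, 0.3],
              [0.3, 0.3, 0.4]])
B = np.array([[0.7, 0.3],        # emission probabilities b_i(m) = P(v_m | S_i)
              [0.2, 0.8],
              [0.5, 0.5]])

# Every row of A and B, and pi itself, must be a probability distribution.
assert np.allclose(A.sum(axis=1), 1) and np.allclose(B.sum(axis=1), 1)
assert np.isclose(pi.sum(), 1)
```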
Tasks at HMMs 1. Evaluation: λ is known; what is the likelihood of observing O, i.e. P(O | λ)? 2. Decoding: λ is known; what is the most probable hidden state sequence Q for an observation sequence O, i.e. argmax_Q P(Q | O, λ)?
Evaluation (task 1) Given λ and O, P(O | λ) = ?
Evaluation (task 1) Naive enumeration over all state sequences costs O(N^T · T). Forward(-backward) algorithm: forward variables α_t(i) = P(o_1 … o_t, q_t = S_i | λ); a recursive procedure with initialisation α_1(i) = π_i b_i(o_1)
Forward algorithm Recursion: α_{t+1}(j) = [Σ_i α_t(i) a_ij] b_j(o_{t+1}); termination: P(O | λ) = Σ_i α_T(i). Time complexity: O(N²T)
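A minimal NumPy sketch of the forward algorithm, assuming observations are encoded as integer indices into the columns of the emission matrix B:

```python
import numpy as np

def forward(pi, A, B, obs):
    """Forward algorithm: returns P(O | lambda) in O(N^2 T) time.
    pi: (N,) initial probs; A: (N, N) transitions; B: (N, M) emissions;
    obs: sequence of observation indices."""
    alpha = pi * B[:, obs[0]]            # initialisation: alpha_1(i)
    for o in obs[1:]:
        # recursion: alpha_{t+1}(j) = (sum_i alpha_t(i) a_ij) * b_j(o_{t+1})
        alpha = (alpha @ A) * B[:, o]
    return alpha.sum()                   # termination: sum_i alpha_T(i)
```

As a sanity check, the probabilities of all possible observation sequences of a fixed length must sum to 1.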
Most probable sequence (decoding) Given λ and O, argmax_Q P(Q | λ, O) = ? Viterbi algorithm (dynamic programming): δ_t(i) denotes the probability of the most probable state sequence over 1..t that ends in q_t = S_i
Viterbi algorithm
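The Viterbi recursion can be sketched in a few lines of NumPy; the helper below keeps backpointers ψ so the best state sequence can be recovered by backtracking:

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """Viterbi decoder: most probable hidden state sequence for obs."""
    N, T = len(pi), len(obs)
    delta = pi * B[:, obs[0]]            # delta_1(i)
    psi = np.zeros((T, N), dtype=int)    # backpointers
    for t in range(1, T):
        # scores[i, j] = delta_{t-1}(i) * a_ij
        scores = delta[:, None] * A
        psi[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) * B[:, obs[t]]
    # backtrack from the best final state
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1]
```

For long sequences an implementation would work in log space to avoid underflow; that is omitted here for brevity.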
Hidden Markov Models
Discriminative sequence labeling
Discriminative sequence labeling Generative models (such as the HMM) model P(D|c); discriminative models model P(c|D) directly
Discriminative sequence labeling An arbitrary feature set can be used
Decoder in discriminative sequence labeling
Viterbi for the decoder Initialisation:
Maximum Entropy Markov Model (MEMM) MEMM is a discriminative sequence labeler. A single maximum entropy (multinomial logistic regression) classifier is learnt: P(y_t | y_{t-1}, x)
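The local MEMM step can be sketched as a softmax over feature scores; `memm_local_probs` and its per-label feature representation are illustrative names, not from the slides:

```python
import numpy as np

def memm_local_probs(w, feat_vecs):
    """One local MEMM decision (sketch): maxent P(y_t | y_{t-1}, x) as a
    softmax over w . f(y) for each candidate label y.
    w: learnt weight vector; feat_vecs: one feature vector per label."""
    scores = np.array([w @ f for f in feat_vecs])
    e = np.exp(scores - scores.max())    # subtract max for numerical stability
    return e / e.sum()
```

At decoding time these local distributions are combined over the whole sequence with Viterbi, exactly as on the previous slides.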
Conditional Random Fields
CRF training Gradient descent-based techniques; the gradient of the log-likelihood is the difference between the empirical and the expected (model) feature counts…
Structured perceptron Online learning. Decode with the current parameters; update if the predicted and the expected (gold) structures are not equal; update by the difference of the two aggregated feature vectors
Structured perceptron The Viterbi decoder is the same! Training (parameter update): w ← w + Φ(x, y_gold) − Φ(x, y_pred)
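One training step of the structured perceptron can be sketched as follows; `feats` (the aggregated feature function Φ) and `decode` (a Viterbi decoder under the current weights) are assumed to be supplied by the caller:

```python
import numpy as np

def perceptron_update(w, feats, x, y_gold, decode):
    """One structured perceptron step (sketch).
    feats(x, y): aggregated feature vector Phi(x, y) of a full structure;
    decode(w, x): argmax_y w . Phi(x, y), e.g. via Viterbi."""
    y_pred = decode(w, x)
    if y_pred != y_gold:
        # update by the difference of the two aggregated feature vectors
        w = w + feats(x, y_gold) - feats(x, y_pred)
    return w
```

With a separable toy problem a few passes of this update suffice to make the decoder reproduce the gold structure.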
Over the sequences...
Tree prediction - PCFG (Probabilistic Context-Free Grammar)
Tree prediction – CYK algorithm
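A compact sketch of probabilistic CYK for a PCFG in Chomsky normal form; the dictionary-based grammar encoding below is an illustrative choice, and the sketch returns only the best derivation probability (backpointers for tree recovery are omitted):

```python
from collections import defaultdict

def cyk(words, lexicon, grammar, start="S"):
    """Probabilistic CYK (sketch).
    lexicon: {(A, word): prob} for unary rules A -> word;
    grammar: {(A, B, C): prob} for binary rules A -> B C.
    Returns the max probability of deriving `words` from `start` (0 if none)."""
    n = len(words)
    best = defaultdict(float)  # best[(i, j, A)] = max prob of A over words[i:j]
    for i, w in enumerate(words):
        for (A, word), p in lexicon.items():
            if word == w:
                best[(i, i + 1, A)] = max(best[(i, i + 1, A)], p)
    for span in range(2, n + 1):           # widen spans bottom-up
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):      # split point
                for (A, B, C), p in grammar.items():
                    cand = p * best[(i, k, B)] * best[(k, j, C)]
                    if cand > best[(i, j, A)]:
                        best[(i, j, A)] = cand
    return best[(0, n, start)]
```

The triple loop over spans, split points, and rules gives the classic O(n³ · |G|) time complexity.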
Summary Structured prediction tasks. Hidden Markov Models, e.g. for sequence labeling. Discriminative sequence labelers (MEMM, CRF, structured perceptron)