Learning and Recognizing Activities in Streams of Video
Dinesh Govindaraju

Motivation
Activity recognition from video enables higher-level functionality, for example:
- Determining who is presenting an agenda item
- Gauging attendee interest levels

Motivation
Recognition should be automatic and should not require models to be generated by hand, since hand-built models are:
- Impractical when there are many activities
- Less versatile, as they may be constrained to particular aspects of the problem

Problem Definition
1. Video data
2. Observations: movement deltas extracted via face tracking
3. Hand-label training segments
4. Learn the underlying models from the training segments
5. Carry out activity recognition
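The slides state only that observations are movement deltas obtained via face tracking; the later mention of observation matrices suggests a discrete observation alphabet. The sketch below is one hypothetical way to get from tracked face positions to discrete symbols: it differences successive face-centre positions and bins each delta into a "still" symbol or one of eight motion directions. The function names, the 1-pixel stillness threshold, and the direction binning are all assumptions, not the author's scheme.

```python
import math


def movement_deltas(positions):
    """Frame-to-frame (dx, dy) displacement of the tracked face centre."""
    return [(x1 - x0, y1 - y0)
            for (x0, y0), (x1, y1) in zip(positions, positions[1:])]


def quantize(delta, still_threshold=1.0, n_directions=8):
    """Map a (dx, dy) delta to a discrete observation symbol (hypothetical scheme).

    0 means the face barely moved; otherwise the direction of motion is binned
    into n_directions equal angular sectors centred on 0, 45, 90, ... degrees,
    giving symbols 1..n_directions.
    """
    dx, dy = delta
    if math.hypot(dx, dy) < still_threshold:
        return 0
    angle = math.atan2(dy, dx) % (2 * math.pi)
    sector = int(angle / (2 * math.pi / n_directions) + 0.5) % n_directions
    return 1 + sector
```

For example, face centres (100, 50), (100, 50), (105, 50), (105, 55) give deltas (0, 0), (5, 0), (0, 5) and symbols 0, 1, 3.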

Approach - Learning
- Assume the underlying models can be approximated by HMMs
- Use Baum-Welch to learn the best model from the training segments
- Need to find the observation space and the number of states
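The learning step can be sketched as a minimal Baum-Welch loop for a discrete HMM trained on several segments at once, using the scaled forward-backward recursions described in Rabiner's tutorial. The parameter names (pi, A, B), the fixed iteration count, and the random initialisation are illustrative assumptions; the slides do not give these details.

```python
import math
import random


def random_stochastic(n, rng):
    """A random probability vector of length n (all entries strictly positive)."""
    row = [rng.random() + 1e-3 for _ in range(n)]
    total = sum(row)
    return [v / total for v in row]


def forward_backward(seq, pi, A, B):
    """Scaled forward-backward pass; returns (alpha_hat, beta_hat, scale)."""
    N, T = len(pi), len(seq)
    alpha, scale = [], []
    for t in range(T):
        if t == 0:
            a = [pi[i] * B[i][seq[0]] for i in range(N)]
        else:
            a = [sum(alpha[-1][i] * A[i][j] for i in range(N)) * B[j][seq[t]]
                 for j in range(N)]
        c = sum(a)
        scale.append(c)
        alpha.append([x / c for x in a])
    beta = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[i][j] * B[j][seq[t + 1]] * beta[t + 1][j] for j in range(N))
                   / scale[t + 1] for i in range(N)]
    return alpha, beta, scale


def baum_welch(seqs, n_states, n_symbols, n_iter=30, seed=0):
    """Fit a discrete HMM (pi, A, B) to several training sequences.

    Returns the model and its total training log-likelihood.
    """
    rng = random.Random(seed)
    pi = random_stochastic(n_states, rng)
    A = [random_stochastic(n_states, rng) for _ in range(n_states)]
    B = [random_stochastic(n_symbols, rng) for _ in range(n_states)]
    S = range(n_states)
    for _ in range(n_iter):
        pi_num = [0.0] * n_states
        A_num = [[0.0] * n_states for _ in S]
        A_den = [0.0] * n_states
        B_num = [[0.0] * n_symbols for _ in S]
        B_den = [0.0] * n_states
        for seq in seqs:  # E-step: accumulate expected counts
            alpha, beta, scale = forward_backward(seq, pi, A, B)
            for t, o in enumerate(seq):
                for i in S:
                    gamma = alpha[t][i] * beta[t][i]  # P(state i at t | seq)
                    if t == 0:
                        pi_num[i] += gamma
                    B_num[i][o] += gamma
                    B_den[i] += gamma
                    if t + 1 < len(seq):
                        A_den[i] += gamma
                        for j in S:  # xi_t(i, j)
                            A_num[i][j] += (alpha[t][i] * A[i][j] * B[j][seq[t + 1]]
                                            * beta[t + 1][j] / scale[t + 1])
        # M-step: normalise expected counts (keep a row if it got no mass)
        pi = [p / len(seqs) for p in pi_num]
        A = [[x / A_den[i] for x in A_num[i]] if A_den[i] > 0 else A[i] for i in S]
        B = [[x / B_den[i] for x in B_num[i]] if B_den[i] > 0 else B[i] for i in S]
    log_lik = sum(math.log(c) for s in seqs for c in forward_backward(s, pi, A, B)[2])
    return (pi, A, B), log_lik
```

Because Baum-Welch is an EM procedure, the training log-likelihood should never decrease from one iteration to the next, which is a useful sanity check on any implementation.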

Approach - Learning
[Figure: HMMs]

Approach - Learning
To find the observation space:
- Run through all training segments and add each observation encountered
- When a new observation appears during recognition, augment the learned observation matrices
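The observation-space step and the augmentation of the learned observation (emission) matrices could look like the sketch below. How the author assigned probability to an unseen observation is not stated; giving it a small hypothetical mass `epsilon` in every state and rescaling the existing entries is an assumption made here so that each row still sums to one.

```python
def observation_space(training_segments):
    """Index every distinct observation seen in any training segment."""
    symbols = sorted({o for segment in training_segments for o in segment})
    return {symbol: column for column, symbol in enumerate(symbols)}


def augment_emissions(B, index, new_symbol, epsilon=1e-4):
    """Add a column to emission matrix B (one row per state) for an unseen symbol.

    Hypothetical smoothing: every state gets probability epsilon for the new
    symbol and its existing entries are scaled by (1 - epsilon), so each row
    still sums to 1. Returns (new_B, new_index); known symbols are a no-op.
    """
    if new_symbol in index:
        return B, index
    new_index = dict(index)
    new_index[new_symbol] = len(index)
    new_B = [[p * (1 - epsilon) for p in row] + [epsilon] for row in B]
    return new_B, new_index
```

Returning new objects rather than mutating keeps the trained model intact, so the same learned matrices can be reused across different test sequences.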

Approach - Learning
To find the number of states, Q (for each activity):
- Set the upper bound to the length of the longest training segment
- Iterate over values of Q and, for each, generate the most likely model using Baum-Welch

Approach - Learning
To find the number of states, Q (for each activity), continued:
- Choose the best Q by N-fold cross-validation, using discriminative power as the criterion
- With the best Q, run Baum-Welch from several sets of randomly initialized parameters to obtain λ_a, the model for activity a
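The Q-selection procedure on the two slides above might be sketched as follows. The slides do not define "discriminative power", so this substitutes a hypothetical score: how much higher the candidate model's per-frame log-likelihood is on held-out segments of its own activity than on segments of other activities. Training and scoring are passed in as callables (for instance a Baum-Welch trainer and a forward-algorithm scorer); `train` and `loglik` are placeholder names, not an API from the source.

```python
import statistics


def select_num_states(own_segments, other_segments, train, loglik, n_folds=4):
    """Choose Q for one activity by N-fold cross-validation.

    train(segments, q, seed) -> model and loglik(model, seq) -> float are
    supplied by the caller. Hypothetical "discriminative power": mean per-frame
    log-likelihood of held-out segments of this activity minus that of
    other activities' segments.
    """
    q_max = max(len(s) for s in own_segments)  # upper bound from the slides
    folds = [own_segments[k::n_folds] for k in range(n_folds)]
    best_q, best_score = None, float("-inf")
    for q in range(1, q_max + 1):
        scores = []
        for k, held_out in enumerate(folds):
            if not held_out:
                continue
            train_set = [s for j, fold in enumerate(folds) if j != k for s in fold]
            model = train(train_set, q, seed=k)
            own = statistics.mean(loglik(model, s) / len(s) for s in held_out)
            other = statistics.mean(loglik(model, s) / len(s) for s in other_segments)
            scores.append(own - other)
        score = statistics.mean(scores)
        if score > best_score:  # strict ">" keeps the smaller Q on ties
            best_q, best_score = q, score
    return best_q


def best_of_restarts(segments, q, train, loglik, n_restarts=10):
    """Train with q states from several random seeds; keep the model with the
    highest total training log-likelihood (used here as the final lambda_a)."""
    models = [train(segments, q, seed=r) for r in range(n_restarts)]
    return max(models, key=lambda m: sum(loglik(m, s) for s in segments))
```

Keeping the restart with the highest training log-likelihood is a common convention and is assumed here; the slides say only that several random initializations were used.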

Approach - Recognition
- Define a window width, w
- Starting from the beginning of the sequence, consider windows of w consecutive observations in turn: O_t, ..., O_{t+w-1} for t = 1, ..., L - w + 1 (where L is the length of the entire sequence)
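The window enumeration amounts to a simple generator. This sketch uses 0-based indices, so window starts run from 0 to L - w rather than 1 to L - w + 1; the name `sliding_windows` is hypothetical.

```python
def sliding_windows(observations, w):
    """Yield (start, middle, window) for each run of w consecutive observations.

    Uses 0-based starts t = 0, ..., L - w, where L = len(observations); with an
    odd w the middle frame t + w // 2 is the window's exact centre.
    """
    for t in range(len(observations) - w + 1):
        yield t, t + w // 2, observations[t:t + w]
```

A sequence shorter than w yields no windows at all, so very short streams produce no labels.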

Approach - Recognition
- Calculate the likelihood of each window segment under each activity model, P(O_t, ..., O_{t+w-1} | λ_a)
- Reference: L. R. Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition," Proceedings of the IEEE, 1989
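Computing a window's likelihood under an activity model is the evaluation problem covered in Rabiner's tutorial. A compact sketch using the scaled forward algorithm follows; it accumulates the log of each scale factor so long windows do not underflow. The parameter layout (pi, A, B) is an assumption about how the learned models are stored.

```python
import math


def log_likelihood(seq, pi, A, B):
    """log P(O | lambda) for a discrete HMM via the scaled forward algorithm.

    pi[i]: initial state probabilities; A[i][j]: transition probabilities;
    B[i][k]: probability of observing symbol k in state i.
    """
    n = len(pi)
    log_p = 0.0
    alpha = [pi[i] * B[i][seq[0]] for i in range(n)]
    for t in range(len(seq)):
        if t > 0:  # induction: alpha_t(j) = sum_i alpha_{t-1}(i) a_ij * b_j(o_t)
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][seq[t]]
                     for j in range(n)]
        c = sum(alpha)  # scale factor; the product of all c_t equals P(O | lambda)
        if c == 0.0:
            return float("-inf")  # sequence impossible under this model
        log_p += math.log(c)
        alpha = [a / c for a in alpha]
    return log_p
```

Applied to one window and a hypothetical dict `models` mapping activity names to (pi, A, B), the scores would be `{a: log_likelihood(window, *models[a]) for a in models}`.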

Approach - Recognition
- Label the middle frame of each window with the activity whose model gives the highest likelihood
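Labeling the middle frame could be sketched as below, assuming the window log-likelihoods have already been computed for every activity. The activity names in the test are hypothetical, since the transcript does not list the observed activities.

```python
def label_frames(window_logliks, L, w):
    """Give each window's middle frame the activity with the highest likelihood.

    window_logliks[t] maps activity name -> log P(window t | lambda_a) for the
    window starting at frame t (0-based). Frames within w // 2 of either end
    are never the middle of a window and stay None.
    """
    labels = [None] * L
    for t, scores in enumerate(window_logliks):
        labels[t + w // 2] = max(scores, key=scores.get)
    return labels
```

For example, with w = 3 and L = 5, the three windows label frames 1 to 3, and frames 0 and 4 stay unlabelled.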

Evaluation and Results
[Figure: activities being observed]

Evaluation and Results
- Observation stream obtained from an 87-second image sequence (1296 individual frames)
- [Figure: example frames after face detection]

Evaluation and Results
- The observation sequence was first hand-labeled
- Segments showing the same activity were extracted
- 4 training segments were used to learn each activity

Evaluation and Results

- Once the underlying models were learned, likelihoods were calculated using a sliding window
- A value of 21 was used for the window width, w, as this was the average length of the training segments
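The choice of w, and what it means in time, can be checked with a few lines of arithmetic. Only the 1296 frames, 87 seconds, and w = 21 come from the slides; the training-segment lengths in the test are made up. At about 14.9 frames per second, a 21-frame window covers roughly 1.4 seconds of video.

```python
def window_width(training_segments):
    """Window width w: the average training-segment length, rounded.

    Note that Python's round() sends exact halves to the nearest even integer.
    """
    lengths = [len(s) for s in training_segments]
    return round(sum(lengths) / len(lengths))


# Frame rate of the evaluation sequence: 1296 frames over 87 seconds.
fps = 1296 / 87
window_seconds = 21 / fps  # duration covered by a 21-frame window
```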

Evaluation and Results

- Carry out recognition by using the likelihoods to assign activities to the frames
- Compare against the hand-assigned labels
- Accuracy of approximately 76%
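Accuracy here is the fraction of frames whose algorithm label matches the hand label. The sketch assumes unlabelled edge frames (those never at the middle of a window) are excluded from the count; the slides do not say how they were handled.

```python
def frame_accuracy(predicted, hand_labels):
    """Fraction of labelled frames where the algorithm agrees with the hand label.

    Frames the algorithm left as None (never the middle of a window) are
    skipped; that exclusion is an assumption.
    """
    pairs = [(p, h) for p, h in zip(predicted, hand_labels) if p is not None]
    return sum(p == h for p, h in pairs) / len(pairs)
```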

Evaluation and Results
[Figure: algorithm-assigned labels, each marked as either different from or the same as the hand label]

Evaluation and Results
[Figure: hand-assigned labels, each marked as either different from or the same as the algorithm label]

Future Work
- Learn the underlying model that generates the sequence of activities themselves
- Standardize the lengths of the training segments using Dynamic Time Warping and use that length as the window width
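Dynamic Time Warping could be used to standardize segment lengths roughly as follows: align a segment to a chosen reference segment, then average the segment values mapped onto each reference frame so the output has the reference's length. The choice of reference, the absolute-difference cost, and the averaging step are assumptions; the slides only name DTW as future work.

```python
def dtw_path(a, b, dist=lambda x, y: abs(x - y)):
    """Classic DTW between sequences a and b; returns (total cost, warping path).

    The path is a list of (i, j) index pairs from (0, 0) to (len(a)-1, len(b)-1).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = dist(a[i - 1], b[j - 1]) + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    # Backtrack along the cheapest predecessor (diagonal preferred on ties).
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = {(i - 1, j - 1): D[i - 1][j - 1], (i - 1, j): D[i - 1][j], (i, j - 1): D[i][j - 1]}
        i, j = min(steps, key=steps.get)
    return D[n][m], path[::-1]


def warp_to_reference(segment, reference):
    """Resample segment to len(reference) by averaging the values DTW maps
    onto each reference frame (every reference frame appears on the path)."""
    _, path = dtw_path(segment, reference)
    buckets = [[] for _ in reference]
    for i, j in path:
        buckets[j].append(segment[i])
    return [sum(b) / len(b) for b in buckets]
```

For example, warping [1, 1, 2, 3, 3] onto the reference [1, 2, 3] maps the repeated values onto single reference frames and yields [1.0, 2.0, 3.0].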

The End
Questions?
