Variants, improvements etc. of activity recognition with wearable accelerometers
Mitja Luštrek
Jožef Stefan Institute, Department of Intelligent Systems, Slovenia
Tutorial at the University of Bremen, November 2012

Outline
– Different sensor types
– Dynamic signal segmentation
– Low- and high-pass filter
– Feature selection
– Smoothing with Hidden Markov Models

Magnetometer
– Measures the strength of the (Earth's) magnetic field
– MEMS magnetometers are often of the magnetoresistive type: they use strips of a nickel-iron alloy whose electrical resistance varies with changes in the magnetic field
– Tri-axial
– Can measure azimuth (like a compass) and inclination

Gyroscope
– Measures angular velocity
– MEMS gyroscopes use vibrating objects, which also tend to preserve the direction of vibration
[Figure: gyroscope, magnetometer and accelerometer]

The whole inertial package
– Accelerometer (x, y) + magnetometer (x, z): orientation accurate long-term, subject to disturbances short-term
– Gyroscope: orientation accurate short-term, drifts long-term
– Accelerometer: location by double integration, drifts

Kalman filter
– Statistically optimally estimates the state of a system
– Combines the measurement of the current state with the extrapolation from the previous state
– Combines various sensors / quantities
– Takes noise into account

Basic Kalman equations
x_k = A x_{k–1} + B u_{k–1} + w_{k–1}
z_k = H x_k + v_k
– x_k, x_{k–1} ... current, previous state
– u_{k–1} ... input in the previous state
– w_{k–1} ... process noise (in the previous state)
– A ... relation previous state – current state
– B ... relation previous input – current state

Basic Kalman equations
x_k = A x_{k–1} + B u_{k–1} + w_{k–1}
z_k = H x_k + v_k
– z_k ... current measurement
– v_k ... measurement noise
– H ... relation state – measurement

Combining accelerometer and gyroscope
x_k = A x_{k–1} + B u_{k–1} + w_{k–1}
– ϕ ... orientation (one direction only!)
– ω_drift ... gyroscope drift
– dt ... time between states k and k–1
– ω ... angular velocity

Combining accelerometer and gyroscope
z_k = H x_k + v_k
– ϕ_acc ... orientation according to the accelerometer

Noise
– Process noise: w ~ N(0, Q)
– Measurement noise: v ~ N(0, R)

Kalman computations
Predict:
– x_k = A x_{k–1} + B u_{k–1}
– P_k = A P_{k–1} A^T + Q   (P ... error covariance, initialized to 0)
Correct:
– K_k = P_k H^T (H P_k H^T + R)^(–1)   (K ... Kalman gain)
– x_k = x_k + K_k (z_k – H x_k)
– P_k = (I – K_k H) P_k
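
Below is a minimal sketch in Python/NumPy of the accelerometer and gyroscope fusion described above. It assumes the common two-dimensional state x = [ϕ, ω_drift], input u = ω (the gyroscope reading) and measurement z = ϕ_acc (orientation computed from the accelerometer); the matrices A, B, H and the noise covariances Q, R are illustrative values, not taken from the tutorial.

import numpy as np

def fuse_orientation(phi_acc, omega_gyro, dt, q=1e-4, r=1e-2):
    """Kalman filter fusing accelerometer orientation and gyroscope rate.

    State x = [phi, omega_drift]; input u = measured angular velocity;
    measurement z = orientation computed from the accelerometer.
    All matrix and noise values are illustrative assumptions.
    """
    A = np.array([[1.0, -dt],    # phi_k = phi_{k-1} + (omega - omega_drift) * dt
                  [0.0, 1.0]])   # the drift is assumed to change slowly
    B = np.array([[dt], [0.0]])
    H = np.array([[1.0, 0.0]])   # only phi is measured (by the accelerometer)
    Q = q * np.eye(2)            # process noise covariance
    R = np.array([[r]])          # measurement noise covariance

    x = np.zeros((2, 1))         # initial state
    P = np.zeros((2, 2))         # error covariance, initialized to 0
    estimates = []
    for z, u in zip(phi_acc, omega_gyro):
        # Predict
        x = A @ x + B * u
        P = A @ P @ A.T + Q
        # Correct
        K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
        x = x + K @ (np.array([[z]]) - H @ x)
        P = (np.eye(2) - K @ H) @ P
        estimates.append(x[0, 0])
    return estimates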

UWB location sensors
– Tags that transmit ultra-wideband radio pulses
– Sensors that detect the time and angle of arrival of these pulses
– Ubisense real-time location system:
  – Declared accuracy 15 cm
  – ~ EUR

Applications to activity recognition etc.
– All these sensors generate a data stream that can be treated the same way as accelerometer streams
– Combining multiple sensors can improve orientation sensing
– Location sensors are good for providing context (fall detection in the bed / on the floor)

Outline
– Different sensor types
– Dynamic signal segmentation
– Low- and high-pass filter
– Feature selection
– Smoothing with Hidden Markov Models

The problem
– In activity recognition, the data is usually segmented into intervals and an activity is assigned to each interval
– Intervals usually have equal lengths
– One interval can contain multiple activities or different parts of an activity

The solution
– Align intervals with activities
– Set interval boundaries when a significant change between data samples occurs
– Improve classification accuracy

The algorithm
– Find an interval S of data samples with monotonically decreasing values
– If |max(S) – min(S)| > threshold, then set a boundary between two intervals (a significant change occurred)
– Only decreasing values are considered because acceleration is usually followed by deceleration

Threshold
– Based on the previous N samples
– Dynamically adapts to the data
– threshold = (avg_max – avg_min) · C
– N = 100, C = 0.4
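
A minimal sketch of this dynamic segmentation in Python/NumPy. It assumes the threshold is C times the spread of the values in the previous N samples, which simplifies the avg_max/avg_min computation above; the function name and the exact bookkeeping are illustrative, not the authors' implementation.

import numpy as np

def dynamic_segment(signal, n=100, c=0.4):
    """Return the indices at which a new segment starts.

    A boundary is placed when a monotonically decreasing run drops by more
    than a threshold that adapts to the previous n samples.
    """
    signal = np.asarray(signal, dtype=float)
    boundaries = [0]
    run_start = 0                     # start of the current decreasing run
    for i in range(1, len(signal)):
        window = signal[max(0, i - n):i]
        threshold = c * (window.max() - window.min())   # simplified adaptive threshold
        if signal[i] < signal[i - 1]:
            if signal[run_start] - signal[i] > threshold:
                boundaries.append(i)  # significant drop: start a new segment
                run_start = i
        else:
            run_start = i             # the decreasing run has ended
    return boundaries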

Experimental results
Activity recognition of (static) activities and of transitions between activities, comparing three segmentation methods:

                              Non-overlapping    Overlapping       Dynamic signal
                              sliding window     sliding window    segmentation
Activities                    94.8 %             95.3 %            97.5 %
Activities and transitions    89.0 %             89.6 %            92.9 %

Outline
– Different sensor types
– Dynamic signal segmentation
– Low- and high-pass filter
– Feature selection
– Smoothing with Hidden Markov Models

Accelerometer signal elements
– Gravity, which signifies orientation (rotation)
  – Changes with a low frequency when the accelerometer is worn on a person
– Acceleration due to translation
  – Changes with a medium frequency
– Noise
  – Changes with a high frequency

Features depend on different elements
– Gravity / low frequency:
  – The orientation of the accelerometer
  – ...
– Translation / medium frequency:
  – The variance of the acceleration
  – The sum of absolute differences between consecutive lengths of the acceleration vector
  – ...

Remove unnecessary elements
– Gravity / low frequency:
  – Low-pass filter
  – Attenuates higher-frequency signal elements
– Translation / medium frequency:
  – Band-pass filter
  – Attenuates low-frequency (gravity) and high-frequency (noise) signal elements

Low-pass filter
– Completely removing higher frequencies is impossible without an infinite delay
– Many practical filters exist: Butterworth, Chebyshev, ...
– Simple implementation: x_lp(i) = α x(i) + (1 – α) x_lp(i – 1)
– Decreasing α increases the inertia of the filter

Band-pass filter
– Combines a low-pass and a high-pass filter
– Simple high-pass filter: subtracts the low-pass signal from the raw signal
  x_lp(i) = β x(i) + (1 – β) x_lp(i – 1)
  x_hp(i) = x(i) – x_lp(i)
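
A minimal sketch of these exponential filters in Python. The band-pass output is formed here by high-pass filtering first (to remove gravity) and then low-pass filtering (to remove noise), which is one simple way of combining the two; the parameter values in the usage comment are illustrative assumptions.

def low_pass(x, alpha):
    """Exponential low-pass filter: y(i) = alpha * x(i) + (1 - alpha) * y(i - 1)."""
    y = [x[0]]
    for sample in x[1:]:
        y.append(alpha * sample + (1 - alpha) * y[-1])
    return y

def high_pass(x, beta):
    """Simple high-pass filter: the raw signal minus its low-pass component."""
    lp = low_pass(x, beta)
    return [xi - li for xi, li in zip(x, lp)]

def band_pass(x, beta_high, alpha_low):
    """Band-pass filter: remove gravity with a high-pass, then noise with a low-pass."""
    return low_pass(high_pass(x, beta_high), alpha_low)

# Example (assumed parameter values):
# gravity  = low_pass(acceleration, alpha=0.05)
# movement = band_pass(acceleration, beta_high=0.05, alpha_low=0.5)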

Outline
– Different sensor types
– Dynamic signal segmentation
– Low- and high-pass filter
– Feature selection
– Smoothing with Hidden Markov Models

Feature selection
– Usually not difficult to come up with many features, however:
  – They may slow down learning
  – Some may be redundant
  – Some may be noisy
  – Some machine learning algorithms cope badly with many features (overfitting)
– So, select just the features that are really useful!

Selection by information gain
– Information gain assigns a score to each feature
– Rank the features by these scores
– Select the top n features
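
A minimal sketch of such a ranking in Python with scikit-learn, using its mutual information estimate, which is a closely related score rather than information gain computed on discretized features; X, y and feature_names are assumed placeholders.

import numpy as np
from sklearn.feature_selection import mutual_info_classif

def rank_features(X, y, feature_names, top_n=10):
    scores = mutual_info_classif(X, y)      # one score per feature
    order = np.argsort(scores)[::-1]        # highest score first
    return [(feature_names[i], scores[i]) for i in order[:top_n]]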

Selection by Random Forest
– Random Forest selects N instances randomly with replacement out of the total N instances
– Roughly 1/3 of the instances are not selected (out-of-bag)
– Classify these and compute the accuracy ac_before
– Randomly permute each feature f and compute the accuracy ac_after(f)
– Assign the score ac_before – ac_after(f) to feature f
– Rank by these scores, select the top n features
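
A minimal sketch with scikit-learn; note that permutation_importance below measures the accuracy drop on a held-out set rather than on the out-of-bag instances described above, so it is a close variant rather than the exact procedure; the dataset and parameter values are assumed.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

def rank_by_forest(X, y, feature_names, top_n=10):
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
    result = permutation_importance(forest, X_test, y_test, n_repeats=10, random_state=0)
    order = np.argsort(result.importances_mean)[::-1]   # biggest accuracy drop first
    return [(feature_names[i], result.importances_mean[i]) for i in order[:top_n]]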

ReliefF
– N ... number of instances
– M ... number of features
– C ... number of classes
– NN[1...C, 1...K] ... K nearest neighbors of an instance, for each class
– dist(i, j; k) ... distance between instances i and j according to feature k
– Imp[1...M] ... importance of the features

For i = 1 to M              // Features
    Imp[i] = 0
For i = 1 to N              // Instances
    NN = nearest K neighbors of instance i in each class
    For j = 1 to M          // Features
        For k = 1 to C      // Classes
            If instance i belongs to class k then
                For l = 1 to K      // Neighbors (near hits)
                    Imp[j] -= dist(i, NN[k, l]; j)
            else
                For l = 1 to K      // Neighbors (near misses)
                    Imp[j] += dist(i, NN[k, l]; j)

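The same procedure as a minimal, runnable Python/NumPy sketch. Distances are plain absolute differences per feature, neighbors are found with Euclidean distance, and the scores are not normalized; this is an illustrative simplification, not the reference ReliefF implementation.

import numpy as np

def relieff(X, y, k=10):
    """Simplified ReliefF: per-feature importance from near hits and misses."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, m = X.shape
    imp = np.zeros(m)
    for i in range(n):
        for c in np.unique(y):
            # instances of class c, excluding instance i itself
            idx = np.where(y == c)[0]
            idx = idx[idx != i]
            if len(idx) == 0:
                continue
            # k nearest neighbors of instance i within class c (Euclidean distance)
            d = np.linalg.norm(X[idx] - X[i], axis=1)
            nn = idx[np.argsort(d)[:k]]
            # per-feature distance to those neighbors
            diff = np.abs(X[nn] - X[i]).sum(axis=0)
            if c == y[i]:
                imp -= diff   # near hits should be close: distance lowers importance
            else:
                imp += diff   # near misses should be far: distance raises importance
    return imp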

What should n be?
– Find out experimentally
– Use one of the methods to rank the features
– Start removing features from the bottom
– Perform classification after removing each feature
– Stop when the classification gets worse

Experimental results
[Figure: accuracy as features are removed, with the best cutoff marked]

Do not touch test data until testing
– Golden rule of machine learning: never test on training data
– Hence cross-validation:
  – Divide the data into n "folds"
  – Train on (n – 1) folds, test on the remaining fold
  – Repeat n times
– Also: never test on data used for feature selection (it is a bit like training)
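
One way to respect this rule in practice is to perform the feature selection inside each cross-validation fold, for example with a scikit-learn pipeline; the selector, classifier and k below are illustrative choices, not those used in the tutorial.

from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# The selector is fitted on the training folds only, never on the test fold.
pipeline = Pipeline([
    ("select", SelectKBest(score_func=mutual_info_classif, k=20)),
    ("classify", RandomForestClassifier(n_estimators=100, random_state=0)),
])
# scores = cross_val_score(pipeline, X, y, cv=10)   # X, y are assumed placeholders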

Outline
– Different sensor types
– Dynamic signal segmentation
– Low- and high-pass filter
– Feature selection
– Smoothing with Hidden Markov Models

Hidden Markov Model
[Figure: three states connected by transition probabilities a_ij, each emitting three observations with emission/observation/output probabilities b_ij]

Hidden Markov Model (HMM)
– States are hidden and are typically the item of interest (= true activities)
– The next state depends on the previous one (Markov property)
– Observations are visible (= recognized activities)

Learning the model
– Baum-Welch algorithm
– Possible states and observations known
– Input: sequence of observations Y = {y_1, y_2, ..., y_T}
– Output: transition matrix A, emission matrix B

Learning the model
– Initialize A and B to something (e.g., a uniform distribution)
– α_j(t) ... probability that the model is in state x_j at time t, having generated Y up to t
  – α_j(0) initialized to something for all j (sometimes the starting state is known)
  – α_j(t) = Σ_i α_i(t – 1) a_ij b_{j,y(t)}
  – Can compute α_j(t) recursively for all j, t

Learning the model
– Initialize A and B to something (e.g., a uniform distribution)
– β_i(t) ... probability that the model is in state x_i at time t, and will generate Y from t on
  – β_i(T) initialized to something for all i (sometimes the end state is known)
  – β_i(t) = Σ_j a_ij b_{j,y(t+1)} β_j(t + 1)
  – Can compute β_i(t) recursively for all i, t
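
A minimal sketch of the α and β recursions in Python/NumPy for a model with transition matrix A and emission matrix B, assuming a uniform initial state distribution; here α at time t already includes the emission at time t, and the scaling that real implementations need for long sequences is omitted.

import numpy as np

def forward_backward(A, B, observations):
    """Compute alpha[t, j] and beta[t, i] for an observation sequence.

    A[i, j] : probability of a transition from state i to state j
    B[j, k] : probability of emitting observation k in state j
    """
    A, B = np.asarray(A), np.asarray(B)
    n_states = A.shape[0]
    T = len(observations)

    alpha = np.zeros((T, n_states))
    alpha[0] = (1.0 / n_states) * B[:, observations[0]]   # assumed uniform start
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, observations[t]]

    beta = np.zeros((T, n_states))
    beta[T - 1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, observations[t + 1]] * beta[t + 1])
    return alpha, beta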

Learning the model
– Probability of a transition from x_i(t – 1) to x_j(t), given that the current model generated Y:
  ξ_ij(t) = α_i(t – 1) a_ij b_{j,y(t)} β_j(t) / P(Y)
– Iteratively re-estimate for all i, j, k:
  – a_ij ... expected number of transitions from x_i to x_j, divided by the expected number of transitions out of x_i
  – b_jk ... expected number of times the model is in x_j and emits observation k, divided by the expected number of times it is in x_j

Using the model
– Viterbi algorithm
– Complete model known
– Input: sequence of observations Y = {y_1, y_2, ..., y_T}
– Output: sequence of states X = {x_1, x_2, ..., x_T}

Using the model
X = {}
For t = 1 to T
    For all j compute α_j(t)    // In state x_j at time t, having generated Y up to t
    j_max = argmax_j α_j(t)
    Append x_j_max to X
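
The pseudocode above greedily picks the most likely state at each time step; the full Viterbi algorithm also keeps back-pointers so that the globally most likely state sequence is returned. A minimal Viterbi sketch in Python/NumPy, with an assumed uniform initial state distribution:

import numpy as np

def viterbi(A, B, observations):
    """Most likely state sequence for the given observation sequence.

    A[i, j] : probability of a transition from state i to state j
    B[j, k] : probability of observation k in state j
    """
    A, B = np.asarray(A), np.asarray(B)
    n_states = A.shape[0]
    T = len(observations)

    delta = np.zeros((T, n_states))           # best path probability ending in each state
    psi = np.zeros((T, n_states), dtype=int)  # back-pointers
    delta[0] = (1.0 / n_states) * B[:, observations[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A    # scores[i, j]: come from i, move to j
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, observations[t]]

    states = np.zeros(T, dtype=int)           # backtrack from the most likely final state
    states[T - 1] = delta[T - 1].argmax()
    for t in range(T - 2, -1, -1):
        states[t] = psi[t + 1][states[t + 1]]
    return states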

HMM in activity recognition
– Take into account the probabilities of transitioning from one activity to another (A)
– Take into account the probabilities of different errors of activity recognition (B)
– Reduce the number of spurious transitions
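
As a hypothetical usage example, suppose there are three activities and the classifier's outputs are the observations: A holds how often the true activities follow each other and B holds how often each true activity is recognized as each label. The matrices and the recognized sequence below are made-up illustrative numbers, and viterbi() is the sketch from the previous slide.

import numpy as np

# Transition probabilities between true activities (rows sum to 1) - illustrative
A = np.array([[0.90, 0.05, 0.05],
              [0.05, 0.90, 0.05],
              [0.05, 0.05, 0.90]])
# Probability that true activity i is recognized as label j - illustrative
B = np.array([[0.80, 0.15, 0.05],
              [0.10, 0.85, 0.05],
              [0.05, 0.10, 0.85]])
recognized = [0, 0, 1, 0, 0, 2, 2, 2]   # classifier output with one spurious "1"
smoothed = viterbi(A, B, recognized)
print(smoothed)                         # the isolated "1" is likely smoothed away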

Experimental results

Other applications of HMM
– Recognition of complex activities from simple ones
– Speech recognition
– Part-of-speech tagging, named entity recognition
– Sequence analysis in bioinformatics
– ...