Model-based Classification in Food Authenticity Studies
D. Toher (1, 2), G. Downey (1) and T.B. Murphy (2)
Presented by: Deirdre Toher
1 Ashtown Food Research Centre, Teagasc (formerly The National Food Centre), Dublin 15
2 Dept of Statistics, School of Computer Science and Statistics, Trinity College Dublin, Dublin 2
Outline
Food authenticity
Spectroscopic data
Current mathematical methods
Proposed alternative
 – Dimension reduction
 – Model-based clustering
 – Updating
Example: near-infrared data with results
Food Authenticity – what and why?
Detecting when foods are not what they are claimed to be
Tampering/adulteration, mislabelling
Economic fraud worth millions of US dollars globally
Promote quality products
Build consumer trust
Food Authenticity – how?
Near-infrared spectroscopy
 – Non-invasive
 – Relatively inexpensive
Multivariate mathematics
 – Partial Least Squares Regression
 – Factorial Discriminant Analysis
 – Model-based Clustering
Other methods available (sp..)
Spectroscopic Data
Near-infrared transflectance spectroscopy
 – High-dimensional data
 – Range nm, reading every 2 nm
 – 700 values for each sample
Current Mathematical Methods
Discriminant Partial Least Squares Regression
Factorial Discriminant Analysis
Problem?
 – Limited to “two-group” classification problems
 – No quantification of certainty
Proposed Alternative
Model-based clustering
 – Extension of discriminant analysis
 – Allows clusters to vary in shape and size
 – Gives the probability of a sample belonging to each cluster/group
 – Can classify problems with more than two groups
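The probabilities mentioned above come from modelling each group as a Gaussian component of a mixture. A sketch of the standard formulation, where the notation is ours rather than the talk's: $\tau_g$ is the proportion of group $g$, $\phi(\cdot;\boldsymbol{\mu}_g,\boldsymbol{\Sigma}_g)$ the multivariate normal density with mean $\boldsymbol{\mu}_g$ and covariance $\boldsymbol{\Sigma}_g$, and $G$ the number of groups. The posterior probability on the right is the quantity that lets the method state how certain each classification is.

```latex
f(\mathbf{x}) = \sum_{g=1}^{G} \tau_g \, \phi(\mathbf{x}; \boldsymbol{\mu}_g, \boldsymbol{\Sigma}_g),
\qquad
\Pr(\text{group } g \mid \mathbf{x})
  = \frac{\tau_g \, \phi(\mathbf{x}; \boldsymbol{\mu}_g, \boldsymbol{\Sigma}_g)}
         {\sum_{h=1}^{G} \tau_h \, \phi(\mathbf{x}; \boldsymbol{\mu}_h, \boldsymbol{\Sigma}_h)}
```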
Possible Cluster Shapes
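The figure on this slide is not recoverable, but cluster shapes in model-based clustering are usually described through the eigen-decomposition popularised by Banfield and Raftery and used in the R package mclust: Σ_g = λ_g D_g A_g D_gᵀ, where λ_g sets the volume, the diagonal matrix A_g (with determinant 1) sets the shape, and the orthogonal matrix D_g sets the orientation. The sketch below is a hypothetical 2-D illustration of that decomposition; `make_covariance` is our own helper, not part of any library.

```python
import numpy as np


def make_covariance(volume, shape, angle):
    """Build a 2-D covariance Sigma = volume * D @ A @ D.T (hypothetical helper).

    volume: lambda, a scalar controlling cluster size
    shape:  diagonal of A, rescaled here so that det(A) == 1
    angle:  rotation angle in radians defining the orientation matrix D
    """
    a = np.asarray(shape, dtype=float)
    a = a / np.prod(a) ** (1.0 / a.size)   # enforce det(A) = 1
    c, s = np.cos(angle), np.sin(angle)
    d = np.array([[c, -s], [s, c]])        # orthogonal rotation matrix
    return volume * d @ np.diag(a) @ d.T


spherical = make_covariance(1.0, [1, 1], 0.0)        # circular cluster
elongated = make_covariance(1.0, [4, 1], 0.0)        # axis-aligned ellipse, same volume
rotated = make_covariance(2.0, [4, 1], np.pi / 4)    # larger, tilted ellipse
```

Constraining λ, A or D to be equal across groups (or to the identity) yields the family of more parsimonious models; letting all three vary gives the most flexible, and most parameter-hungry, case.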
The Dimensionality Problem
Model-based clustering requires dimension reduction
 – for efficient computation
 – to prevent singular covariance matrices (700 variables exceed the number of samples in any group)
Use wavelet analysis with thresholding
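A minimal sketch of wavelet-based dimension reduction, assuming an orthonormal Haar transform and a simple data-driven hard threshold that keeps the `n_keep` coefficient positions with the largest average energy across spectra. The slides do not say which wavelet family or thresholding rule was used, so both choices, and the function names, are illustrative assumptions. Spectra are padded by edge replication to the next power of two (700 becomes 1024) because this Haar implementation halves the length at every level.

```python
import numpy as np


def haar_dwt_rows(X):
    """Full orthonormal Haar decomposition of each row (row length must be 2^J)."""
    approx = np.asarray(X, dtype=float)
    parts = []
    while approx.shape[1] > 1:
        even, odd = approx[:, 0::2], approx[:, 1::2]
        parts.append((even - odd) / np.sqrt(2))  # detail coefficients at this level
        approx = (even + odd) / np.sqrt(2)       # smooth part passed to the next level
    parts.append(approx)                          # final scaling coefficient
    return np.hstack(parts[::-1])                 # coarse-to-fine ordering


def wavelet_reduce(X, n_keep):
    """Pad, transform and keep the n_keep highest-energy coefficient positions.

    Using the same positions for every spectrum gives a common reduced feature set.
    """
    n, p = X.shape
    target = 1 << (p - 1).bit_length()            # next power of two >= p
    Xp = np.pad(X, ((0, 0), (0, target - p)), mode="edge")
    W = haar_dwt_rows(Xp)
    score = np.mean(W ** 2, axis=0)               # average energy per coefficient
    keep = np.sort(np.argsort(score)[::-1][:n_keep])
    return W[:, keep], keep
```

For example, `wavelet_reduce(spectra, 30)` on a hypothetical `(n, 700)` array returns an `(n, 30)` matrix plus the retained coefficient indices, which must be reused unchanged when transforming test spectra.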
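EM Algorithm & Updating
EM algorithm
 – E-step: computes the expected value of the complete-data log-likelihood (here, the probability of each sample belonging to each group) given current parameter estimates
 – M-step: maximises this expected value to update the parameter estimates
 – commonly used in statistics when data are missing (here, the unknown group labels)
Updating
 – uses the previous estimates of the labels as the starting point for the next iteration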
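A minimal sketch of EM with updating, assuming Gaussian groups with unrestricted covariance matrices (the slides do not state which covariance model was used). Training samples keep their known labels throughout; at each E-step only the test-set membership probabilities are re-estimated, and each M-step pools both sets. The function names, the fixed iteration count and the small ridge added to each covariance for numerical stability are our assumptions, not details from the talk.

```python
import numpy as np


def log_gauss(X, mean, cov):
    """Log multivariate normal density evaluated at each row of X."""
    d = X.shape[1]
    diff = X - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)


def m_step(X, Z, ridge=1e-6):
    """Weighted MLEs of proportions, means and covariances (Z: n x G weights)."""
    nk = Z.sum(axis=0)
    pi = nk / nk.sum()
    means = (Z.T @ X) / nk[:, None]
    covs = []
    for g in range(Z.shape[1]):
        diff = X - means[g]
        cov = (Z[:, g, None] * diff).T @ diff / nk[g]
        covs.append(cov + ridge * np.eye(X.shape[1]))  # ridge keeps cov invertible
    return pi, means, np.array(covs)


def e_step(X, pi, means, covs):
    """Posterior probabilities of group membership for each row of X."""
    logp = np.column_stack([np.log(pi[g]) + log_gauss(X, means[g], covs[g])
                            for g in range(len(pi))])
    logp -= logp.max(axis=1, keepdims=True)  # stabilise before exponentiating
    post = np.exp(logp)
    return post / post.sum(axis=1, keepdims=True)


def classify_with_updating(X_train, y_train, X_test, n_groups, n_iter=50):
    """Start from training-only estimates, then iterate EM over train + test."""
    Z_train = np.eye(n_groups)[y_train]        # known labels as fixed one-hot weights
    params = m_step(X_train, Z_train)          # estimates from labelled data only
    Z_test = e_step(X_test, *params)           # initial test-set probabilities
    X_all = np.vstack([X_train, X_test])
    for _ in range(n_iter):
        params = m_step(X_all, np.vstack([Z_train, Z_test]))  # M-step: pooled data
        Z_test = e_step(X_test, *params)                        # E-step: test rows only
    return Z_test
```

The returned matrix gives, for each test sample, the estimated probability of belonging to each group; a hard classification is its row-wise `argmax`. A production version would stop on a log-likelihood tolerance rather than a fixed number of iterations.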
Example: Honey Adulteration
Irish honey extended with
 – fructose:glucose mixtures
 – fully inverted beet syrup
 – high fructose corn syrup
Total of 478 spectra:
 – 157 pure and 321 adulterated
    – 225 with fructose:glucose mixtures
    – 56 with fully inverted beet syrup
    – 40 with high fructose corn syrup
Classification Achieved
Classification rates on test-set data for the “pure or adulterated” question, with the correct proportions of each type of adulterant in the training set.

Training / Test   EM              EM & Updating
50% / 50%         94.72% (1.12)   94.43% (1.10)
25% / 75%         93.22% (1.08)   93.05% (1.03)
10% / 90%         90.82% (1.76)   92.22% (1.11)
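Keeping “the correct proportions of each type of adulterant in the training set” amounts to stratified random sampling: each group is split separately at the same training fraction. A minimal sketch, where `stratified_split` is a hypothetical helper and per-group counts are rounded with Python's `round` (which rounds halves to even), so shares match the full data set only up to rounding.

```python
import numpy as np


def stratified_split(labels, train_frac, seed=None):
    """Return (train_idx, test_idx) with each label's training share ~= train_frac."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train = []
    for lab in np.unique(labels):
        idx = np.flatnonzero(labels == lab)     # positions of this group
        rng.shuffle(idx)
        n_train = int(round(train_frac * idx.size))
        train.extend(idx[:n_train])
    train = np.sort(np.array(train, dtype=int))
    test = np.setdiff1d(np.arange(labels.size), train)
    return train, test
```

With labels built from the honey counts above (157 pure, 225 fructose:glucose, 56 beet syrup, 40 corn syrup), `stratified_split(labels, 0.1)` draws roughly 10% of every group, which matters most for the small corn-syrup group. Stratifying on pure / adulterated instead gives the training sets described on the next slide.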
Classification Achieved
Classification rates on test-set data for the “pure or adulterated” question, with the correct proportions of pure / adulterated samples in the training set.

Training / Test   EM              EM & Updating
50% / 50%         94.38% (1.16)   94.11% (0.89)
25% / 75%         93.50% (1.08)   93.03% (1.02)
10% / 90%         90.54% (1.80)   92.05% (1.09)
Classification Achieved
Classification rates on test-set data using 50% training and 50% test data, with the correct proportion of pure / adulterated samples in the training set, for the “type of adulteration” question.

Question                 EM              EM & Updating
Pure or adulterated?     91.09% (1.40)   90.64% (1.36)
Type of adulteration     86.23% (1.20)   84.12% (1.67)
Classification Achieved
Classification rates on test-set data using 50% training and 50% test data, with the correct proportions of each type of adulterant in the training set, for the “type of adulteration” question.

Question                 EM              EM & Updating
Pure or adulterated?     89.41% (1.76)   88.61% (1.82)
Type of adulteration     85.70% (1.96)   83.57% (2.23)
Probability v. Accurate Classification
[Figure: probability of group membership, by colour (black = pure, red = adulterated)]
Conclusions
The EM algorithm gives a method of predicting group membership
Updating procedures are effective with small training sets
Quantifies the certainty of each classification
Allows the cost of misclassification to be easily incorporated into modelling
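Because the model returns membership probabilities rather than bare labels, misclassification costs can be applied after fitting by choosing, for each sample, the group with the lowest expected cost. A minimal sketch of that standard Bayes decision rule; the helper name and the cost values in the usage note are hypothetical, not taken from the study.

```python
import numpy as np


def min_cost_labels(post, cost):
    """Pick the prediction with the lowest expected cost for each sample.

    post: n x G matrix of posterior group-membership probabilities
    cost: G x G matrix, cost[i, j] = cost of predicting group j when the truth is i
    """
    expected = post @ cost          # n x G: expected cost of each possible prediction
    return expected.argmin(axis=1)
```

For instance, with groups 0 = pure and 1 = adulterated, `cost = np.array([[0, 1], [5, 0]])` makes passing adulterated honey as pure five times as costly as the reverse, so a sample with probabilities `[0.7, 0.3]` is flagged as adulterated even though “pure” is more probable. With a zero-one cost matrix the rule reduces to picking the most probable group.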
Questions?
Funded by:
 – Teagasc under the Walsh Fellowship Scheme
 – Irish Department of Agriculture & Food (FIRM programme)
 – Science Foundation Ireland Basic Research Grant scheme (Grant 04/BR/M0057)