
Model-based Classification in Food Authenticity Studies D. Toher 1,2, G. Downey 1 and T.B. Murphy 2 Presented by: Deirdre Toher 1 Ashtown Food Research Centre, Teagasc, (formerly The National Food Centre), Dublin 15 2 Dept of Statistics, School of Computer Science and Statistics, Trinity College Dublin, Dublin 2

Outline
Food authenticity
Spectroscopic data
Current mathematical methods
Proposed alternative
–Dimension reduction
–Model-based clustering
–Updating
Example: near-infrared data with results

Food Authenticity – what and why?
Detecting when foods are not what they are claimed to be
–Tampering/adulteration, mislabelling
Economic fraud worth millions of US dollars globally
Promote quality products
Build consumer trust

Food Authenticity – how?
Near infrared spectroscopy
–Non-invasive
–Relatively inexpensive
Multivariate mathematics
–Partial Least Squares Regression
–Factorial Discriminant Analysis
–Model-based Clustering
Other methods are available

Spectroscopic Data
Near infrared transflectance spectroscopy
–High-dimensional data
–Readings taken every 2 nm across the wavelength range
–700 values for each sample

Current Mathematical Methods
Discriminant Partial Least Squares Regression
Factorial Discriminant Analysis
Problems?
–Limited to "two-group" classification problems
–No quantification of certainty

Proposed Alternative
Model-based clustering
–Extension of discriminant analysis
–Allows clusters to vary in shape and size
–Gives the probability of a sample belonging to each cluster/group
–Can handle situations with more than two groupings
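The key advantage listed above, a per-sample posterior probability over any number of groups, can be illustrated with scikit-learn's GaussianMixture. This is a minimal sketch on toy data, not the authors' code; the group layout and component count are assumptions:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
# Three toy groups with different shapes and sizes.
X = np.vstack([
    rng.normal([0, 0], [0.3, 0.3], size=(60, 2)),   # tight, spherical
    rng.normal([4, 0], [1.5, 0.4], size=(40, 2)),   # elongated
    rng.normal([2, 4], [0.8, 0.8], size=(50, 2)),   # larger, spherical
])

# Full covariance matrices let each cluster take its own shape and size.
gmm = GaussianMixture(n_components=3, covariance_type="full",
                      random_state=0).fit(X)

# Soft assignment: one probability per cluster for every sample.
probs = gmm.predict_proba(X)         # shape (150, 3); each row sums to 1
labels = gmm.predict(X)              # hard labels, if needed
```

Unlike the two-group regression methods, nothing here changes when the number of groups grows: only `n_components` does.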

Possible Cluster Shapes
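The cluster shapes on this slide come from the eigendecomposition of each group's covariance matrix, Sigma_k = lambda_k D_k A_k D_k^T, where lambda controls volume, D orientation, and A shape; this is the parameterisation of Banfield and Raftery commonly used in model-based clustering. A small numpy sketch (the specific volumes, shapes, and angles are illustrative):

```python
import numpy as np

def make_covariance(volume, shape_diag, angle):
    """Build a 2-D covariance Sigma = volume * D @ A @ D.T,
    where A = diag(shape_diag) fixes the shape and D rotates it."""
    A = np.diag(shape_diag)
    c, s = np.cos(angle), np.sin(angle)
    D = np.array([[c, -s], [s, c]])   # rotation (orientation) matrix
    return volume * D @ A @ D.T

# Same volume (det = 1 for both), very different shapes:
spherical = make_covariance(1.0, [1.0, 1.0], 0.0)         # equal axes
elongated = make_covariance(1.0, [4.0, 0.25], np.pi / 6)  # stretched, rotated
```

Constraining lambda, D, or A to be shared across clusters, or freeing them, generates the family of models that lets clusters vary (or not) in volume, orientation, and shape.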

The Dimensionality Problem
Model-based clustering requires dimension reduction
–for efficient computation
–to prevent singular covariance matrices
Use wavelet analysis with thresholding
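The idea can be sketched with a one-level Haar wavelet transform and soft thresholding in plain numpy. The slides do not specify the wavelet family, decomposition depth, or threshold, so all three are assumptions here:

```python
import numpy as np

def haar_threshold(signal, threshold):
    """One-level Haar wavelet transform with soft thresholding.
    Zeroing small detail coefficients shrinks the effective
    dimension of the spectrum while keeping its broad features."""
    x = np.asarray(signal, dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)   # smooth part (length n/2)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2)   # local fluctuations
    # Soft thresholding: shrink towards zero; small coefficients vanish.
    detail = np.sign(detail) * np.maximum(np.abs(detail) - threshold, 0.0)
    return approx, detail

# A smooth toy "spectrum" of 700 values with a little noise.
rng = np.random.default_rng(2)
spectrum = np.sin(np.linspace(0, 6, 700)) + 0.01 * rng.normal(size=700)
approx, detail = haar_threshold(spectrum, threshold=0.05)
kept = np.count_nonzero(detail)     # far fewer than 700 values survive
```

The surviving coefficients then replace the raw 700-value spectrum as input to the clustering model, which keeps the covariance matrices well-conditioned.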

EM Algorithm & Updating
EM algorithm
–E-step: computes the expected value of the complete-data log-likelihood
–M-step: maximises that expected value
–commonly used in statistics for incomplete or missing data
Updating
–uses previous estimates of the group labels as a starting point for iteration
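The two steps can be written out for the simplest case, a two-component, one-dimensional Gaussian mixture, in a few lines of numpy. The starting values and iteration count are illustrative, and the authors' actual model is multivariate:

```python
import numpy as np

rng = np.random.default_rng(3)
# Toy data: two well-separated Gaussian groups.
x = np.concatenate([rng.normal(0.0, 1.0, 200), rng.normal(5.0, 1.0, 200)])

# Initial guesses for mixing weight, means, and variances.
w, mu, var = 0.5, np.array([-1.0, 1.0]), np.array([1.0, 1.0])

for _ in range(50):
    # E-step: the expected complete-data log-likelihood reduces to each
    # point's "responsibility" r = P(component 2 | x).  The 1/sqrt(2*pi)
    # constant cancels in the ratio, so it is omitted.
    d0 = np.exp(-(x - mu[0]) ** 2 / (2 * var[0])) / np.sqrt(var[0])
    d1 = np.exp(-(x - mu[1]) ** 2 / (2 * var[1])) / np.sqrt(var[1])
    r = (w * d1) / ((1 - w) * d0 + w * d1)
    # M-step: maximise the expected log-likelihood in closed form
    # (weighted means and variances).
    w = r.mean()
    mu = np.array([np.sum((1 - r) * x) / np.sum(1 - r),
                   np.sum(r * x) / np.sum(r)])
    var = np.array([np.sum((1 - r) * (x - mu[0]) ** 2) / np.sum(1 - r),
                    np.sum(r * (x - mu[1]) ** 2) / np.sum(r)])
```

"Updating" simply replaces the arbitrary starting values of `w`, `mu`, and `var` with estimates carried over from a previously fitted model, so the iteration starts close to a good solution.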

Example: Honey Adulteration
Irish honey extended with
–fructose:glucose mixtures
–fully inverted beet syrup
–high fructose corn syrup
Total of 478 spectra: 157 pure and 321 adulterated
–225 with fructose:glucose mixtures
–56 with fully inverted beet syrup
–40 with high fructose corn syrup

Classification Achieved
Classification rates on test-set data, with the correct proportions of each type of adulterant in the training set, for the "pure or adulterated?" question.

Training / Test   EM              EM & Updating
50% / 50%         94.72% (1.12)   94.43% (1.10)
25% / 75%         93.22% (1.08)   93.05% (1.03)
10% / 90%         90.82% (1.76)   92.22% (1.11)

Classification Achieved
Classification rates on test-set data, with the correct proportion of pure/adulterated samples in the training set, for the "pure or adulterated?" question.

Training / Test   EM              EM & Updating
50% / 50%         94.38% (1.16)   94.11% (0.89)
25% / 75%         93.50% (1.08)   93.03% (1.02)
10% / 90%         90.54% (1.80)   92.05% (1.09)

Classification Achieved
Classification rates on test-set data using 50% training / 50% test data, with the correct proportion of pure/adulterated samples in the training set, by question asked.

Question                 EM              EM & Updating
Pure or adulterated?     91.09% (1.40)   90.64% (1.36)
Type of adulteration     86.23% (1.20)   84.12% (1.67)

Classification Achieved
Classification rates on test-set data using 50% training / 50% test data, with the correct proportions of each type of adulterant in the training set, by question asked.

Question                 EM              EM & Updating
Pure or adulterated?     89.41% (1.76)   88.61% (1.82)
Type of adulteration     85.70% (1.96)   83.57% (2.23)

Probability vs Accurate Classification
Probability of group membership, shown by colour (black: pure; red: adulterated)

Conclusions
The EM algorithm gives a method of predicting group membership
Updating procedures are effective with small training sets
Certainty of classification is quantified
The cost of misclassification can easily be incorporated into the modelling
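The last point amounts to assigning each sample to the class that minimises its expected cost under the model's posterior probabilities, rather than simply to the most probable class. A small numpy sketch, with posterior values and a cost matrix chosen purely for illustration:

```python
import numpy as np

# Posterior probabilities for 3 samples over (pure, adulterated),
# as produced by a fitted mixture model (values illustrative).
probs = np.array([[0.90, 0.10],
                  [0.55, 0.45],
                  [0.20, 0.80]])

# cost[i, j] = cost of declaring class j when the truth is class i.
# Here, passing off an adulterated sample as pure is 5x worse
# than flagging a pure sample.
cost = np.array([[0.0, 1.0],
                 [5.0, 0.0]])

expected_cost = probs @ cost            # shape (3, 2)
decision = expected_cost.argmin(axis=1) # -> [0, 1, 1]
```

Note the middle sample: it is more likely pure (0.55) yet is flagged as adulterated, because missing adulteration is the expensive error. This is only possible because the model outputs probabilities rather than hard labels.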

Questions?
Funded by:
–Teagasc, under the Walsh Fellowship Scheme
–Irish Department of Agriculture & Food (FIRM programme)
–Science Foundation Ireland Basic Research Grant scheme (Grant 04/BR/M0057)