A NEW USE OF TARGET FACTOR ANALYSIS (TFA) John H. Kalivas, Kevin Higgins, Department of Chemistry, Idaho State University, Pocatello, Idaho 83209 USA

Presentation transcript:

A NEW USE OF TARGET FACTOR ANALYSIS (TFA)
John H. Kalivas, Kevin Higgins
Department of Chemistry, Idaho State University, Pocatello, Idaho USA
Erik Andries
Department of Mathematics, Central New Mexico Community College, Albuquerque, New Mexico USA

Classification Situation
Numerous classification approaches exist
–KNN, LDA, MD, ANN, SVM, …
As the number of classes in a problem increases, classification can become more difficult
Target factor analysis (TFA) and net analyte signal (NAS)
–TFA and NAS both compute analogous angles between a test sample vector and the respective spaces spanned by the library classes
–Useful for binary or multiclass situations

Requirements
X_i = m × n library information matrix for the ith class
–m = number of samples
–n = number of measurements (wavelengths for spectra, or other physical or chemical variables)
–Samples making up a library class must span the sources of variance in that class (instrument profile, temperature effects, measurement process, and others)
y = n × 1 test sample measurement vector

Orthogonal Projection Spatial Angle (OPSA)
Identical to TFA and NAS
–Uses the same orthogonal projection
[Figure: diagram showing the test sample vector y]
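
The slide's figure did not survive in the transcript, so the following is only a minimal sketch of how such an angle could be computed. It assumes the angle is taken between y and its orthogonal projection onto the space spanned by the first d right singular vectors of an uncentered class matrix X. The function name `opsa_angle` and the toy data are hypothetical, not taken from the presentation.

```python
import numpy as np

def opsa_angle(X, y, d):
    """Angle in degrees between y and the span of the first d
    right singular vectors of X (assumed definition, see lead-in)."""
    # Rows of Vt are eigenvectors in measurement space (length n).
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    V = Vt[:d].T                       # n x d orthonormal basis
    y_hat = V @ (V.T @ y)              # orthogonal projection of y
    cos_theta = np.linalg.norm(y_hat) / np.linalg.norm(y)
    # Clip guards against round-off pushing the ratio above 1.
    return np.degrees(np.arccos(np.clip(cos_theta, 0.0, 1.0)))

# Toy library class: 2 samples, 3 measurements, spanning the first two axes.
X_demo = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])
print(opsa_angle(X_demo, np.array([1.0, 1.0, 0.0]), d=2))  # inside the class space, close to 0
print(opsa_angle(X_demo, np.array([1.0, 0.0, 1.0]), d=2))  # half in, half out, close to 45
```

A small angle means the test sample is well described by the class space; an angle near 90 degrees means it is nearly orthogonal to it.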

Process
No data preprocessing
Perform SVD of each library class
Retain d eigenvectors (class-wise), where 1 ≤ d ≤ k and k = rank(X) ≤ min(m, n)
Compute OPSA, MD, and KNN for the test sample relative to each library class
–Use leave-one-out cross-validation (LOOCV)
The library class with the smallest angle or MD is assigned as the test sample classification
KNN classification trends are evaluated
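
The steps above can be sketched roughly as follows for the OPSA case only. This is an illustration under assumptions, not the authors' code: the angle definition, the choice to use the same d for every class, the helper names `opsa_angle` and `loocv_opsa`, and the synthetic two-class data are all hypothetical.

```python
import numpy as np

def opsa_angle(X, y, d):
    # Angle (degrees) between y and the span of the first d right
    # singular vectors of the uncentered class matrix X.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    V = Vt[:d].T
    cos_theta = np.linalg.norm(V @ (V.T @ y)) / np.linalg.norm(y)
    return np.degrees(np.arccos(np.clip(cos_theta, 0.0, 1.0)))

def loocv_opsa(library, d):
    """library maps class label -> (m_i x n) array.
    Returns a list of (true_label, predicted_label) pairs."""
    results = []
    for true_label, X_true in library.items():
        for i in range(X_true.shape[0]):
            y = X_true[i]
            angles = {}
            for label, X in library.items():
                # Leave the test sample out of its own class library only.
                X_lib = np.delete(X, i, axis=0) if label == true_label else X
                angles[label] = opsa_angle(X_lib, y, d)
            # Assign the class with the smallest angle.
            results.append((true_label, min(angles, key=angles.get)))
    return results

# Synthetic data (made up for illustration): two classes living in
# different 2-D subspaces of a 5-variable measurement space, plus noise.
rng = np.random.default_rng(0)
basis_a = np.eye(5)[[0, 1]]
basis_b = np.eye(5)[[2, 3]]
library = {
    "A": rng.uniform(0.5, 1.5, (6, 2)) @ basis_a + 0.01 * rng.normal(size=(6, 5)),
    "B": rng.uniform(0.5, 1.5, (6, 2)) @ basis_b + 0.01 * rng.normal(size=(6, 5)),
}
pairs = loocv_opsa(library, d=2)
accuracy = sum(t == p for t, p in pairs) / len(pairs)
print(accuracy)
```

In practice d would be chosen per class, for example from the rank estimate discussed on the next slides.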

Assessment
Accuracy = (TP + TN)/(TP + TN + FP + FN)
–TP = true positives
–TN = true negatives
–FP = false positives
–FN = false negatives
Receiver operating characteristic (ROC)
–True positive rate = sensitivity = TP/(TP + FN)
–False positive rate = 1 – specificity = 1 – TN/(TN + FP)
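
A short sketch of these formulas in code may help when checking results by hand. The function name `classification_metrics` and the example counts are hypothetical and are not results from the presentation.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, specificity and false positive rate
    from the four confusion-matrix counts of one class."""
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "false_positive_rate": 1 - specificity,
    }

# Made-up counts for one class treated as "positive" against all others.
metrics = classification_metrics(tp=8, tn=85, fp=5, fn=2)
print(metrics)
```

For a multiclass problem, the counts would be tallied once per class in a one-versus-rest fashion, giving one point per class on the ROC plot.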

Determining Eigenvectors
Numerous approaches exist to determine the minimum number of eigenvectors needed to span X
Determination of rank by augmentation (DRAUG)
–Malinowski ER. J. Chemom. 2011; 25:
–Distinguishes primary eigenvectors (chemical, instrumental, etc.) from secondary eigenvectors (experimental error), independent of the distribution of the experimental uncertainties

Plastic Data
Six classes (six of seven commercial plastic types, types 1-6)
–Allen V, Kalivas JH, Rodriguez RG. Applied Spec. 1999; 53:
Raman spectroscopy (850 – 1800 cm⁻¹, 1093 wavenumbers per spectrum)
–Type 1 = polyethylene terephthalate (PET); 30 samples
–Type 2 = high-density polyethylene (HDPE); 29 samples
–Type 3 = polyvinyl chloride (PVC); 13 samples
–Type 4 = low-density polyethylene (LDPE); 22 samples
–Type 5 = polypropylene (PP); 23 samples
–Type 6 = polystyrene (PS); 29 samples

Plastic Score and Scree Plots
[Figure panels: Score Plot (Types 1–6); Scree Plot]
Unique clusters are not formed
Most of the spectral variance is captured with the first eigenvector

Plastic Classification Results
[Figure panels: Total Accuracy Across All Classes for OPSA and MD; ROC Plot; KNN Specificity, Sensitivity, Accuracy]
Numbers indicate the number of eigenvectors
a Values in parentheses are the DRAUG eigenvector number rounded to the nearest whole number

Archeological Data
Four classes (four archeological sources of obsidian)
–Kowalski BR, Schatzki TF, Stross FH. Anal. Chem. 1972; 44:
Trace metal concentrations from X-ray fluorescence spectroscopy (Fe, Ti, Ba, Ca, K, Mn, Rb, Sr, Y, and Zr)
–Source 1 = 10 samples
–Source 2 = 9 samples
–Source 3 = 23 samples
–Source 4 = 21 samples

Archeological Classification Results
[Figure panels: Score Plot and Scree Plot (Sources 1–4); Total Accuracy Across All Classes for OPSA and MD; KNN Specificity, Sensitivity, Accuracy]
a Values in parentheses are the DRAUG eigenvector number rounded to the nearest whole number

Gasoil Data
Three classes (three commercial sources of gasoil)
–Wentzell P, Andrews D, Walsh J, Cooley J, Spencer P. Can. J. Chem. 1999; 77:
Ultraviolet spectroscopy (200 – 400 nm, 572 wavelengths per spectrum)
–Source 1 = 59 samples
–Source 2 = 25 samples
–Source 3 = 30 samples

Gasoil Classification Results
[Figure panels: Score Plot and Scree Plot (Sources 1–3); Total Accuracy Across All Classes for OPSA and MD; KNN Specificity, Sensitivity, Accuracy]
a Values in parentheses are the DRAUG eigenvector number rounded to the nearest whole number

Extra Virgin Olive Oil (EVOO) Data
Six classes (six adulterant oils)
–Poulli KI, Mousdis GA, Georgiou CA. Food Chem. 2007; 105:
Synchronous fluorescence spectroscopy (250 – 400 nm at Δ20 nm, 151 wavelengths per spectrum)
–Adulterant 1 = corn
–Adulterant 2 = olive-pomace
–Adulterant 3 = soybean
–Adulterant 4 = sunflower
–Adulterant 5 = rapeseed
–Adulterant 6 = walnut
31 samples each, at 0.5 to 95 % adulterant

EVOO Classification Results
[Figure panels: Score Plot and Scree Plot (Corn, Olive-pomace, Rapeseed, Soybean, Sunflower, Walnut); Total Accuracy Across All Classes for OPSA and MD; KNN Specificity, Sensitivity, Accuracy]

EVOO Concentrations
[Figure panels: Score Plot (Corn, Olive-pomace, Rapeseed, Soybean, Sunflower, Walnut); Concentration Coded Score Plot (% Sunflower)]
a Values in parentheses are the DRAUG eigenvector number rounded to the nearest whole number

Summary
The TFA or NAS angular measure, OPSA, outperforms MD and KNN over a variety of data sets
–If y is normalized to unit length, the same results are obtained with TFA
Classes need not form obvious clusters in score plots
The number of eigenvectors (basis vectors) needed to characterize each library class must be determined
Samples making up a library class need to span the variances making up that library class
–Instrument profile
–Temperature effects
–Others
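
One possible reading of the normalization remark can be checked numerically. The sketch below assumes that the TFA measure is the length of the projection residual y − ŷ, which the transcript does not confirm. Under that assumption, a unit-length y has a residual equal to the sine of the angle, so ranking classes by residual or by angle gives the same order. The basis `V`, the vector `y`, and the function names are hypothetical.

```python
import numpy as np

def projection_residual(V, y):
    # Length of the part of y that lies outside the span of V's columns.
    return np.linalg.norm(y - V @ (V.T @ y))

def angle_rad(V, y):
    # Angle between y and its orthogonal projection onto span(V).
    cos_theta = np.linalg.norm(V @ (V.T @ y)) / np.linalg.norm(y)
    return np.arccos(np.clip(cos_theta, 0.0, 1.0))

V = np.eye(4)[:, :2]                    # made-up orthonormal class basis
y = np.array([3.0, 0.0, 4.0, 0.0])      # made-up test sample
y_unit = y / np.linalg.norm(y)

print(projection_residual(V, y))        # depends on the scale of y
print(projection_residual(V, y_unit))   # equals sin(angle) for unit-length y
print(np.sin(angle_rad(V, y)))
print(angle_rad(V, y), angle_rad(V, 10 * y))  # the angle ignores the scale of y
```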