Source Apportionment of Particulate Matter Using Principal Component Analysis and Positive Matrix Factorisation
Md Firoz Khan, Mohd Talib Latif, Norhaniza Amil
School of Environmental and Natural Resource Sciences, Universiti Kebangsaan Malaysia, Bangi, Malaysia
Research Centre for Tropical Climate Change System, Institute of Climate Change, Universiti Kebangsaan Malaysia, Bangi, Malaysia
Institute for Environment and Development (Lestari), Universiti Kebangsaan Malaysia, Bangi, Malaysia
Atmospheric Chemistry and Air Pollution Research Group
Determination of Air Pollution Sources
Figure: schematic representation of the different methods for source identification (EU 2014)
Receptor Model
Principal Component Analysis
Principal component analysis (PCA) is a way of identifying patterns in data and expressing the data so as to highlight their similarities and differences. It emphasizes variation and brings out strong patterns in a dataset, which makes the data easier to explore and visualize.
Toy Example
Pretend we are studying the motion of the physicist's ideal spring. This system consists of a ball of mass m attached to a massless, frictionless spring. When the ball is released a small distance from equilibrium, it oscillates back and forth along a single axis, so the underlying dynamics can be expressed as a function of a single variable x.
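The toy example can be mimicked numerically. The sketch below assumes three hypothetical cameras that each record a scaled, noisy copy of the ball's position x; PCA, computed here via the singular value decomposition of the mean-centred data with NumPy, should then find a single component that carries almost all of the variance. The function name `pca_svd` and all numbers are illustrative assumptions, not part of the original material.

```python
import numpy as np


def pca_svd(data):
    """PCA of an (n_samples, n_variables) array via SVD of the mean-centred data.

    Returns the principal directions (one per row) and the fraction of the
    total variance explained by each component, largest first.
    """
    centred = data - data.mean(axis=0)
    # numpy returns singular values in descending order
    _, s, vt = np.linalg.svd(centred, full_matrices=False)
    variance = s**2 / (len(data) - 1)
    return vt, variance / variance.sum()


# Hypothetical set-up: three cameras record the ball's position from
# different angles, so each recording is a scaled copy of x plus noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 500)
x = np.cos(np.pi * t)                  # the single underlying variable
views = np.array([1.0, 0.6, -0.8])     # how strongly each camera "sees" x
recordings = np.outer(x, views) + 0.05 * rng.standard_normal((t.size, 3))

directions, explained = pca_svd(recordings)
print(np.round(explained, 3))  # the first component carries almost all variance
```

Three measured variables collapse onto one dominant direction, which is the redundancy PCA is designed to expose.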
Figure: schematic linking source compositions, receptor models (e.g., PMF, CMB, PCA) and the monitor
Source Apportionment Models
Widely used models:
- PCA/absolute principal component scores (APCS): a simplified model. A weighted APCS variant deals with the "zero score" problem but lacks a non-negativity requirement.
- EPA's Chemical Mass Balance (CMB)
- Positive Matrix Factorization (PMF): a complicated but robust model with lower uncertainty; it avoids producing zero factor scores because it requires component loadings and scores to be non-negative, and it is capable of identifying sources without any prior knowledge of them.
Other available models:
- Unmix
- Artificial neural networks (source-receptor modelling)
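PCA/APCS, PMF and MLRA are all based on the following formula:
$$x_{ij} = \sum_{k=1}^{K} g_{ik} f_{kj} + e_{ij}$$
where $x_{ij}$ is the (normalized) data for sample $i$ and species $j$, $g_{ik}$ is the contribution of source $k$ to sample $i$, $f_{kj}$ is the profile of source $k$ (its amount of species $j$), $e_{ij}$ is the measurement error, and $K$ is the number of sources.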
Figure: the data matrix expressed as the product of the source contribution matrix and the source profile matrix
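A minimal NumPy sketch of the receptor-model formula in matrix form, using small hypothetical matrices `G` (source contributions) and `F` (source profiles) invented for illustration: the matrix product reproduces the element-wise sum over sources, and the measurement error is whatever remains between measured and modelled data.

```python
import numpy as np

# Hypothetical toy matrices: 4 samples (i), 3 species (j), 2 sources (k).
G = np.array([[2.0, 1.0],        # source contributions g_ik (samples x sources)
              [1.0, 3.0],
              [0.5, 0.5],
              [3.0, 0.0]])
F = np.array([[0.6, 0.3, 0.1],   # source profiles f_kj (sources x species)
              [0.1, 0.2, 0.7]])


def receptor_model(G, F):
    """Modelled data matrix: x_ij = sum_k g_ik * f_kj, i.e. X = G F."""
    return G @ F


X_hat = receptor_model(G, F)

# The same double sum written out explicitly, element by element.
X_loop = np.zeros((G.shape[0], F.shape[1]))
for i in range(G.shape[0]):
    for j in range(F.shape[1]):
        X_loop[i, j] = sum(G[i, k] * F[k, j] for k in range(G.shape[1]))

# Given measured data X, the residual (measurement error) is E = X - X_hat.
print(X_hat)
```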
Preparation of Database
Common problems:
- Systematic bias from analysis by different labs or different methods
- Data below the method detection limit (MDL)
- Coelution (non-target analytes that elute at the same time as a target analyte)
- Data-entry errors and outliers
- Noisy data
- Missing data (exclude a variable if more than 50% of its values are missing)
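The ">50% missing" rule can be applied column by column. This is a sketch assuming missing values are stored as NaN in a samples-by-species NumPy array; `screen_variables` and the example data are hypothetical. A variable missing in exactly 50% of samples is kept, since the rule says "more than 50%".

```python
import numpy as np


def screen_variables(data, names, max_missing=0.5):
    """Drop columns whose fraction of missing (NaN) values exceeds max_missing."""
    missing_fraction = np.isnan(data).mean(axis=0)
    keep = missing_fraction <= max_missing
    kept_names = [name for name, k in zip(names, keep) if k]
    return data[:, keep], kept_names, missing_fraction


# Hypothetical data: 4 samples x 3 species; "Cd" is missing in 3 of 4 samples.
nan = np.nan
data = np.array([[1.2, 0.3, nan],
                 [0.8, nan, nan],
                 [1.5, 0.4, 0.02],
                 [1.1, 0.2, nan]])
names = ["SO4", "Zn", "Cd"]
kept, kept_names, frac = screen_variables(data, names)
print(kept_names)  # ['SO4', 'Zn']
```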
Preparation of Database (continued)
- Replace data below the MDL with MDL/2
- Replace missing data with the average of neighbouring data points, or simply with the mean concentration of the variable
- Normalize the data, i.e., convert it into unitless values with a zero (centred) mean
- Ensure an adequate number of data points relative to the number of variables
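A possible implementation of the first two replacement rules, assuming the data are a NumPy array with NaN for missing values and one MDL per species (`fill_below_mdl_and_missing`, the MDLs and the data are hypothetical). It uses the simpler of the two missing-data options (the variable mean) and computes that mean after the MDL/2 substitution, which is a design choice rather than something the slide specifies.

```python
import numpy as np


def fill_below_mdl_and_missing(data, mdl):
    """Replace values below the MDL with MDL/2, then fill missing (NaN)
    values with the mean of the remaining values in the same column."""
    data = np.array(data, dtype=float)   # work on a copy
    mdl = np.asarray(mdl, dtype=float)
    below = data < mdl                   # NaN compares False, so missing values are untouched here
    data = np.where(below, mdl / 2, data)
    col_mean = np.nanmean(data, axis=0)
    missing = np.isnan(data)
    # Column index of every missing cell selects that column's mean.
    data[missing] = np.take(col_mean, np.nonzero(missing)[1])
    return data


# Hypothetical example: 3 samples x 2 species with MDLs of 0.1 and 0.05.
raw = [[0.02, 0.2],
       [0.3, np.nan],
       [0.4, 0.1]]
filled = fill_below_mdl_and_missing(raw, mdl=[0.1, 0.05])
print(filled)
```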
Step 1: Get Data
Figure: example data set, annotated with the suitable data (N), the dependent (y) and independent (x) variables, and the missing values
Adequate Number of Data Points
- The number of data points must be greater than the number of variables
- The number of data points should be 5 times the number of variables
- N ≥ 100 samples (P. K. Hopke)
- N > 30 + (p + 3)/2, where p is the number of variables (Henry et al. 1984)
- N = 50 (source unknown)
- N = 30 (the "magic number"!)
Our suggestion: run a suitability test (KMO and Bartlett's test)!
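The rules of thumb above can be checked quickly for a given data set. This sketch encodes the Henry et al. (1984) criterion as N > 30 + (p + 3)/2, with p the number of variables; the function name and the example sizes are hypothetical. It does not replace the KMO and Bartlett's suitability test.

```python
def sample_size_checks(n_samples, n_variables):
    """Evaluate common sample-size rules of thumb for N samples and p variables."""
    n, p = n_samples, n_variables
    return {
        "N > p": n > p,
        "N >= 5p": n >= 5 * p,
        "N >= 100 (Hopke)": n >= 100,
        "N > 30 + (p + 3)/2 (Henry et al. 1984)": n > 30 + (p + 3) / 2,
        "N >= 50": n >= 50,
        "N >= 30": n >= 30,
    }


# Hypothetical data set: 60 samples of 20 species.
for rule, ok in sample_size_checks(n_samples=60, n_variables=20).items():
    print(f"{rule:40s} {'pass' if ok else 'fail'}")
```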
Optimization
- Retain factors with an eigenvalue > 1
- Each retained factor should explain about 10% or more of the variance
- Factor profiles must be interpretable
- At least one variable should respond significantly to each factor
- Exclude a variable if it does not respond to any factor
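The first two optimization criteria can be screened automatically from the eigenvalues of the correlation matrix, which is what PCA on normalized data decomposes. A sketch under that assumption with NumPy; `kaiser_selection` and the synthetic two-source data set are illustrative, and the interpretability criteria still need human judgement.

```python
import numpy as np


def kaiser_selection(data, min_variance_pct=10.0):
    """Eigenvalues of the correlation matrix (descending), % variance explained,
    and a mask of components with eigenvalue > 1 and variance >= min_variance_pct."""
    corr = np.corrcoef(data, rowvar=False)
    eigvals = np.linalg.eigvalsh(corr)[::-1]        # eigvalsh is ascending; reverse it
    variance_pct = 100 * eigvals / eigvals.sum()    # the sum equals the number of variables
    keep = (eigvals > 1) & (variance_pct >= min_variance_pct)
    return eigvals, variance_pct, keep


# Hypothetical data: 200 samples of 6 species driven by 2 sources plus noise.
rng = np.random.default_rng(1)
sources = rng.standard_normal((200, 2))
mixing = np.array([[1, 1, 1, 0, 0, 0],
                   [0, 0, 0, 1, 1, 1]], dtype=float)
data = sources @ mixing + 0.3 * rng.standard_normal((200, 6))

eigvals, variance_pct, keep = kaiser_selection(data)
print(np.round(eigvals, 2), np.round(variance_pct, 1), keep)
```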
Step 2: Normalize the Data in Excel
$$z = \frac{X - \text{Average}}{\text{Stdev}}$$
Use "$" (an absolute row reference) for the Average and Standard Deviation cells so that they stay fixed when the formula is pasted down the column, e.g. =(H3-H$632)/H$633
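The Excel formula corresponds to a column-wise z-score. The sketch below assumes that row 632 holds each column's AVERAGE and row 633 its STDEV (the sample standard deviation, hence `ddof=1`); the function name `normalize` is hypothetical.

```python
import numpy as np


def normalize(data):
    """Column-wise (X - average) / stdev, like =(H3-H$632)/H$633 in Excel."""
    data = np.asarray(data, dtype=float)
    mean = data.mean(axis=0)
    std = data.std(axis=0, ddof=1)   # ddof=1: sample standard deviation, as Excel's STDEV
    return (data - mean) / std


z = normalize([[1, 10],
               [2, 20],
               [3, 30]])
print(z)  # each column becomes -1, 0, 1
```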
Upload Data into SPSS
Step 3: Suitability of the Data: KMO and Bartlett's Test
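SPSS reports both statistics directly; for readers who want to check them elsewhere, here is a sketch of the standard formulas using NumPy and SciPy. Bartlett's test of sphericity uses $\chi^2 = -\left[(n - 1) - \frac{2p + 5}{6}\right]\ln|R|$ with $p(p - 1)/2$ degrees of freedom, and the overall KMO compares squared correlations with squared partial correlations. Function names and the synthetic data are hypothetical; commonly quoted rules of thumb (KMO above about 0.5, Bartlett p < 0.05) are general guidelines, not thresholds given in the original slides.

```python
import numpy as np
from scipy import stats


def bartlett_sphericity(data):
    """Bartlett's test that the correlation matrix is an identity matrix."""
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)
    _, logdet = np.linalg.slogdet(corr)             # ln|R|, numerically stable
    chi2 = -(n - 1 - (2 * p + 5) / 6) * logdet
    dof = p * (p - 1) / 2
    return chi2, dof, stats.chi2.sf(chi2, dof)


def kmo(data):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    corr = np.corrcoef(data, rowvar=False)
    inv = np.linalg.inv(corr)
    d = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(d, d)                 # partial correlations
    off = ~np.eye(corr.shape[0], dtype=bool)        # off-diagonal elements only
    r2 = np.sum(corr[off] ** 2)
    a2 = np.sum(partial[off] ** 2)
    return r2 / (r2 + a2)


# Hypothetical data: 200 samples of 6 species driven by 2 sources plus noise.
rng = np.random.default_rng(1)
mixing = np.array([[1, 1, 1, 0, 0, 0],
                   [0, 0, 0, 1, 1, 1]], dtype=float)
data = rng.standard_normal((200, 2)) @ mixing + 0.3 * rng.standard_normal((200, 6))

chi2, dof, p_value = bartlett_sphericity(data)
print(f"KMO = {kmo(data):.3f}; Bartlett chi2 = {chi2:.1f}, df = {dof:.0f}, p = {p_value:.2e}")
```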
KMO and Bartlett's Test
Step 4: Run PCA
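Outside SPSS, the PCA step can be sketched with NumPy as an eigen-decomposition of the correlation matrix. This assumes unrotated components (the slides do not state whether a rotation such as varimax is applied) and returns loadings plus unit-variance component scores, analogous to the factor scores copied out in Step 5; `pca_loadings_scores` and the data are hypothetical.

```python
import numpy as np


def pca_loadings_scores(data, n_components):
    """Unrotated PCA on the correlation matrix of data (samples x variables).

    Returns loadings (variables x components), i.e. the correlation of each
    variable with each component, and unit-variance component scores
    (samples x components).
    """
    data = np.asarray(data, dtype=float)
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(z, rowvar=False))
    order = np.argsort(eigvals)[::-1][:n_components]   # largest eigenvalues first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs * np.sqrt(eigvals)
    scores = z @ eigvecs / np.sqrt(eigvals)
    return loadings, scores


# Hypothetical data: 200 samples of 6 species driven by 2 sources plus noise.
rng = np.random.default_rng(1)
sources = rng.standard_normal((200, 2))
mixing = np.array([[1, 1, 1, 0, 0, 0],
                   [0, 0, 0, 1, 1, 1]], dtype=float)
data = sources @ mixing + 0.3 * rng.standard_normal((200, 6))
loadings, scores = pca_loadings_scores(data, n_components=2)
print(np.round(loadings, 2))
```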
Run PCA
PCA Results
Step 5: Copy the Factor Scores from Step 4 and Paste Them into an Excel Sheet
Step 6: Prepare a New Raw Data Set by Adding a Zero Sample (All Concentrations Set to Zero) as the Last Row
Step 7: Run PCA for the Second Time
Step 8: Copy the Factor Scores from Step 7 and Paste Them into an Excel Sheet
Step 9: Subtract the Factor Score of the Zero Sample from the Factor Score of Each Sample in Step 8
The revised factor scores are referred to here as APCS.
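Steps 6 to 9 can be condensed into one function. This sketch follows the slides literally by appending an all-zero sample to the raw data and re-running PCA on the augmented set (unrotated, as an assumption), then subtracting the zero sample's scores. A sample whose concentrations are all zero therefore gets an APCS of exactly zero, which is the point of the procedure. `apcs` and the example data are hypothetical.

```python
import numpy as np


def apcs(raw, n_components):
    """Absolute principal component scores following Steps 6 to 9.

    raw: (n_samples, n_species) concentrations. Returns (n_samples, n_components).
    """
    raw = np.asarray(raw, dtype=float)
    # Step 6: append a zero sample (all concentrations = 0) as the last row
    augmented = np.vstack([raw, np.zeros(raw.shape[1])])
    # Step 7: normalize the augmented data and run PCA again (unrotated)
    z = (augmented - augmented.mean(axis=0)) / augmented.std(axis=0, ddof=1)
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(z, rowvar=False))
    order = np.argsort(eigvals)[::-1][:n_components]
    # Step 8: unit-variance factor scores for every row, zero sample included
    scores = z @ eigvecs[:, order] / np.sqrt(eigvals[order])
    # Step 9: subtract the zero sample's scores from each real sample's scores
    return scores[:-1] - scores[-1]


# Hypothetical data: 50 samples of 4 species; sample 10 is set to all zeros.
rng = np.random.default_rng(7)
raw = rng.random((50, 4)) + 0.5
raw[10] = 0.0
result = apcs(raw, n_components=2)
print(result.shape)
```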
Step 10: Run Multiple Linear Regression (MLR) with PM2.5 Mass as the Dependent Variable and Each of the APCS as an Independent Variable
Step 11: Convert the APCS into Factor Mass by Multiplying Each APCS by Its Respective Regression Coefficient
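The MLR step and the conversion to factor mass can be sketched with an ordinary least-squares fit in NumPy. The sketch assumes an intercept is included in the regression (often read as the mass not explained by the resolved sources); `source_contributions` and the synthetic data are hypothetical.

```python
import numpy as np


def source_contributions(pm_mass, apcs_matrix):
    """Regress PM mass on the APCS, then convert each APCS to mass.

    Returns the intercept, the regression coefficients b_k, and the
    contribution b_k * APCS_ik of each source k to each sample i.
    """
    pm_mass = np.asarray(pm_mass, dtype=float)
    apcs_matrix = np.asarray(apcs_matrix, dtype=float)
    design = np.column_stack([np.ones(len(pm_mass)), apcs_matrix])  # intercept + APCS
    coef, *_ = np.linalg.lstsq(design, pm_mass, rcond=None)
    intercept, b = coef[0], coef[1:]
    contributions = apcs_matrix * b   # broadcasting multiplies column k by b_k
    return intercept, b, contributions


# Hypothetical example: 100 samples, 2 sources, PM mass built from known coefficients.
rng = np.random.default_rng(5)
apcs_example = rng.random((100, 2)) * 3
pm = 5.0 + 2.0 * apcs_example[:, 0] + 3.0 * apcs_example[:, 1]
intercept, b, contrib = source_contributions(pm, apcs_example)
print(np.round(b, 3), round(intercept, 3))
print("mean contribution per source:", np.round(contrib.mean(axis=0), 3))
```

With real data the fit is not exact, so the regression's R² and the sign of each coefficient are worth checking before interpreting the contributions.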
Demonstration of US EPA PMF 5.0
Upload input files
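PMF 5.0 takes a concentration file and a matching uncertainty file as input. One common way to build the uncertainty file is the equation-based approach described in the EPA PMF 5.0 user guide: 5/6 × MDL when the concentration is at or below the MDL, otherwise the square root of (error fraction × concentration)² + (0.5 × MDL)². A NumPy sketch follows; the function name and the error fraction of 0.1 are hypothetical choices.

```python
import numpy as np


def pmf_uncertainty(conc, mdl, error_fraction):
    """Equation-based uncertainty:
    5/6 * MDL if conc <= MDL, else sqrt((EF * conc)^2 + (0.5 * MDL)^2)."""
    conc = np.asarray(conc, dtype=float)
    mdl = np.asarray(mdl, dtype=float)
    above = np.sqrt((error_fraction * conc) ** 2 + (0.5 * mdl) ** 2)
    return np.where(conc <= mdl, 5.0 / 6.0 * mdl, above)


# Hypothetical species with MDL = 0.1 and a 10% error fraction.
unc = pmf_uncertainty([0.05, 1.0], mdl=0.1, error_fraction=0.1)
print(np.round(unc, 4))
```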
Execution of the PMF model
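Behind the interface, PMF minimizes the objective $Q = \sum_i \sum_j \left(\frac{x_{ij} - \sum_k g_{ik} f_{kj}}{u_{ij}}\right)^2$, with $u_{ij}$ the uncertainties, subject to non-negative G and F. EPA PMF 5.0 solves this with the Multilinear Engine (ME-2); the sketch below instead uses weighted multiplicative updates (a weighted non-negative matrix factorization), a swapped-in technique chosen only because it is short and keeps G and F non-negative. It is a teaching aid under the assumption of non-negative data, not a substitute for the EPA software; all names and data are hypothetical.

```python
import numpy as np


def pmf_q(X, U, G, F):
    """PMF objective: Q = sum_ij ((x_ij - sum_k g_ik f_kj) / u_ij)^2."""
    return float(np.sum(((X - G @ F) / U) ** 2))


def weighted_nmf(X, U, n_factors, n_iter=500, seed=0, eps=1e-12):
    """Non-negative G (samples x factors) and F (factors x species) that
    reduce Q via weighted multiplicative updates.

    Teaching sketch only: NOT the ME-2 solver used by EPA PMF 5.0, and it
    assumes X contains no negative values.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_species = X.shape
    W = 1.0 / U**2                                   # weights from uncertainties
    G = rng.random((n_samples, n_factors)) + 0.1     # positive starting values
    F = rng.random((n_factors, n_species)) + 0.1
    for _ in range(n_iter):
        # Multiplicative updates keep entries non-negative and should not increase Q.
        G *= ((W * X) @ F.T) / ((W * (G @ F)) @ F.T + eps)
        F *= (G.T @ (W * X)) / (G.T @ (W * (G @ F)) + eps)
    return G, F


# Hypothetical data: 30 samples of 5 species generated by 2 sources.
rng = np.random.default_rng(3)
X = rng.random((30, 2)) @ rng.random((2, 5))
U = 0.1 * X + 0.01                                  # hypothetical uncertainties
G, F = weighted_nmf(X, U, n_factors=2)
print(round(pmf_q(X, U, G, F), 3))
```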
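Responses of PMF 5.0
Figure: factor profiles with the species groups {Mg, Zn, Cu, Ni, Ca²⁺}, {Pb, NH₄⁺, K⁺, As, Cd, Zn, Ni, V}, {NO₃⁻}, {As, Ba, Sr, Se} and {Na⁺, Cl⁻, SO₄²⁻}, and a scatter plot of PMF versus HVS results with a fitted line (slope = 0.91, R² = 0.88, p < 0.01)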
Acknowledgement
School of Environmental and Natural Resource Sciences, Universiti Kebangsaan Malaysia, Bangi, Malaysia
Research Centre for Tropical Climate Change System, Institute of Climate Change, Universiti Kebangsaan Malaysia, Bangi, Malaysia
Atmospheric Chemistry and Air Pollution Research Group