Md Firoz Khan, Mohd Talib Latif, Norhaniza Amil

Presentation transcript:

Source Apportionment of Particulate Matter Using Principal Component Analysis and Positive Matrix Factorisation. Md Firoz Khan, Mohd Talib Latif, Norhaniza Amil. School of Environmental and Natural Resource Sciences, Universiti Kebangsaan Malaysia, Bangi, Malaysia; Research Centre for Tropical Climate Change System, Institute of Climate Change, Universiti Kebangsaan Malaysia, Bangi, Malaysia; Institute for Environment and Development (Lestari), Universiti Kebangsaan Malaysia, Bangi, Malaysia. Atmospheric Chemistry and Air Pollution Research Group

Determination of Air Pollution Sources: schematic representation of the different methods for source identification (EU 2014).

Receptor Model

Principal Component Analysis: Principal component analysis (PCA) is a way of identifying patterns in data and of expressing the data so as to highlight their similarities and differences. It emphasizes variation and brings out strong patterns in a dataset, and it is often used to make data easier to explore and visualize.

Toy Example: Pretend we are studying the motion of the physicist’s ideal spring. This system consists of a ball of mass m attached to a massless, frictionless spring. The ball oscillates along a single axis; in other words, the underlying dynamics can be expressed as a function of a single variable x.
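
The toy example can be sketched numerically. The snippet below is a minimal illustration, not part of the original slides: it assumes three hypothetical sensors that each record a scaled, noisy copy of the single variable x, and checks how much of the total variance the first principal component carries. The sensor gains, noise level and function name are all assumptions.

```python
import numpy as np

def pca_explained_variance(data):
    """Return the fraction of total variance carried by each principal component."""
    centered = data - data.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigenvalues = np.linalg.eigvalsh(cov)[::-1]  # eigvalsh is ascending, so reverse
    return eigenvalues / eigenvalues.sum()

rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 500)
x = np.cos(2.0 * np.pi * 0.5 * t)  # true one-dimensional motion of the ball

# Hypothetical set-up: three sensors, each a scaled copy of x plus small noise.
gains = np.array([1.0, 0.6, -0.8])
observations = np.outer(x, gains) + 0.05 * rng.standard_normal((t.size, 3))

explained = pca_explained_variance(observations)
print(explained.round(3))
```

Although three columns are recorded, almost all of the variance falls on one component, which is the sense in which PCA recovers the single underlying variable.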

[Figure: source compositions, receptor models (e.g., PMF, CMB, PCA), monitor]

Source Apportionment Models. Widely used and other available models: PCA/absolute principal component scores (APCS); EPA’s Chemical Mass Balance (CMB); Unmix; Positive Matrix Factorization (PMF); artificial neural networks for source-receptor modelling. PCA/APCS is a simplified model. Weighted APCS deals with the “zero score” but lacks a non-negativity requirement. PMF is a complicated but robust model: it has lower uncertainty, does not produce zero factor scores, and requires component loadings and scores to be non-negative. Capable of identifying sources without any prior knowledge of the sources.

PCA/APCS, PMF and MLRA all address the same formula, whose terms are the normalized data, the source contribution, the source profile and the measurement error.
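
The formula itself is an image on the slide and is missing from this transcript. A common way of writing the receptor model that matches the four labels is sketched below; the symbols and index conventions are assumptions, not taken from the slide.

```latex
x_{ij} = \sum_{k=1}^{p} g_{ik} \, f_{kj} + e_{ij}
% x_{ij}: (normalized) concentration of species j in sample i
% g_{ik}: contribution of source k to sample i
% f_{kj}: profile of source k, i.e. its loading for species j
% e_{ij}: measurement error (residual)
% p:      number of sources (factors)
```

In matrix form this reads X = GF + E, with X the data matrix, G the source contributions and F the source profiles.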

[Figure: the data matrix and its relation to the source contribution and profile matrices]

Preparation of Database. Common problems: systematic bias from analysis by different labs or different methods; data below the method detection limit (MDL); coelution (non-target analytes that elute at the same time as a target analyte); data entry errors and outliers that need to be identified; noisy data; missing data. Exclude a variable if more than 50% of its values are missing.

Preparation of Database (continued). Replace data below the MDL with MDL/2. Replace missing data with the average of nearby data, or simply with the average concentration of the variable. Normalize the data, i.e. convert them to unitless values with a zero (centred) mean. Ensure an adequate number of data points and variables.
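
The substitution rules above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: missing values are marked with NaN, gaps are filled with the variable mean (the simpler of the two options mentioned), and the function name, argument names and sample numbers are hypothetical.

```python
import numpy as np

def prepare_database(conc, mdl, max_missing_fraction=0.5):
    """Clean a samples-by-species concentration matrix.

    conc: 2-D float array in which np.nan marks missing values.
    mdl:  1-D array of method detection limits, one per species (column).
    Returns the cleaned matrix and a boolean mask of the columns that were kept.
    """
    conc = np.asarray(conc, dtype=float)
    mdl = np.asarray(mdl, dtype=float)

    # Exclude variables with more than 50% missing values.
    missing_fraction = np.isnan(conc).mean(axis=0)
    keep = missing_fraction <= max_missing_fraction
    conc, mdl = conc[:, keep].copy(), mdl[keep]

    # Replace values below the MDL with MDL/2.
    # Comparisons with NaN are False, so missing values are left alone here.
    with np.errstate(invalid="ignore"):
        below = conc < mdl
    conc[below] = np.broadcast_to(mdl / 2.0, conc.shape)[below]

    # Replace the remaining missing values with the mean of that variable.
    col_mean = np.nanmean(conc, axis=0)
    gaps = np.isnan(conc)
    conc[gaps] = np.broadcast_to(col_mean, conc.shape)[gaps]
    return conc, keep

# Hypothetical data: 4 samples, 3 species; the third species is mostly missing.
raw = np.array([
    [1.0, 0.02, np.nan],
    [2.0, np.nan, np.nan],
    [3.0, 0.50, np.nan],
    [np.nan, 0.30, 4.0],
])
mdl = [0.1, 0.05, 0.2]
cleaned, kept = prepare_database(raw, mdl)
print(kept)
print(cleaned)
```

Note that the mean used to fill gaps is computed after the MDL/2 substitution; whether that order is appropriate depends on the data set.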

Step 1: Get Data. Suitable data (N); dependent (y) and independent (x) variables; missing values.

Adequate number of data points. The number of data points must exceed the number of variables. The number of data points should be 5 times the number of variables. N ≥ 100 samples (P.K. Hopke). N > 30 + (p + 3)/2, where p is the number of variables (Henry et al. 1984). N = 50 (source unknown). N = 30 (magic number!). Our suggestion: run a suitability test (KMO and Bartlett’s test).

Optimization of Factors. Eigenvalue > 1. Variance explained (%) of about 10 or more. Interpretable factor profiles. At least one variable should respond significantly to each factor. Exclude a variable if it does not respond to any factor.
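
The two numerical screens (eigenvalue above 1 and roughly 10% or more of the variance) can be checked as in the sketch below. It is a minimal illustration on synthetic data with two built-in sources; the function name, thresholds as keyword arguments and the data are assumptions. Interpretability of the profiles still has to be judged by hand.

```python
import numpy as np

def select_factors(data, min_eigenvalue=1.0, min_variance_pct=10.0):
    """Return eigenvalues, % variance, and a mask of components passing both screens."""
    corr = np.corrcoef(data, rowvar=False)
    eigenvalues = np.linalg.eigvalsh(corr)[::-1]  # descending order
    variance_pct = 100.0 * eigenvalues / eigenvalues.sum()
    retained = (eigenvalues > min_eigenvalue) & (variance_pct >= min_variance_pct)
    return eigenvalues, variance_pct, retained

# Synthetic example: 200 samples, 6 species driven by 2 hypothetical sources.
rng = np.random.default_rng(1)
sources = rng.standard_normal((200, 2))
loadings = np.array([[1, 1, 1, 0, 0, 0],
                     [0, 0, 0, 1, 1, 1]], dtype=float)
data = sources @ loadings + 0.3 * rng.standard_normal((200, 6))

eigenvalues, variance_pct, retained = select_factors(data)
print(eigenvalues.round(2))
print(variance_pct.round(1))
print(int(retained.sum()), "components retained")
```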

Step 1: Get Data. Suitable data (N); missing values.

Step 2: Normalize the Data in Excel. Normalized value = (X − Average) / Stdev. Use “$” to fix the cells that hold the average and the standard deviation, then paste the formula down the column, e.g. =SUM(H3-H$632)/H$633
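
For readers working outside Excel, the same normalization can be sketched in NumPy. The example assumes the spreadsheet's standard deviation is the sample standard deviation (divisor n − 1, as Excel's STDEV uses); the function name and the numbers are hypothetical.

```python
import numpy as np

def zscore(conc):
    """Column-wise (X - average) / stdev for a samples-by-species matrix."""
    mean = conc.mean(axis=0)
    std = conc.std(axis=0, ddof=1)  # sample standard deviation (n - 1)
    return (conc - mean) / std

# Hypothetical concentrations: 5 samples, 2 species.
conc = np.array([
    [12.0, 0.8],
    [15.0, 1.1],
    [ 9.0, 0.5],
    [20.0, 1.9],
    [14.0, 1.2],
])
z = zscore(conc)
print(z.round(3))
```

Each column of the result is unitless with zero mean and unit standard deviation.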

Upload data into SPSS

Step 3: Suitability of the Data: KMO and Bartlett’s test
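
The slides run both tests in SPSS. As a rough cross-check, the standard textbook formulas can be sketched in NumPy as below; this is not SPSS's implementation, and the function names and synthetic data are assumptions. Bartlett's statistic is compared against a chi-square distribution with p(p − 1)/2 degrees of freedom, and KMO compares ordinary correlations with partial correlations.

```python
import numpy as np

def bartlett_sphericity(data):
    """Bartlett's test of sphericity: chi-square statistic and degrees of freedom."""
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)
    _, logdet = np.linalg.slogdet(corr)  # log of the determinant of R
    chi_square = -(n - 1 - (2 * p + 5) / 6.0) * logdet
    dof = p * (p - 1) // 2
    return chi_square, dof

def kmo(data):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy (between 0 and 1)."""
    corr = np.corrcoef(data, rowvar=False)
    inv = np.linalg.inv(corr)
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale  # partial correlations between variable pairs
    off_diagonal = ~np.eye(corr.shape[0], dtype=bool)
    r2 = (corr[off_diagonal] ** 2).sum()
    a2 = (partial[off_diagonal] ** 2).sum()
    return r2 / (r2 + a2)

# Synthetic example: 200 samples, 4 species sharing one hypothetical common source.
rng = np.random.default_rng(7)
common = rng.standard_normal(200)
data = common[:, None] + 0.5 * rng.standard_normal((200, 4))

chi_square, dof = bartlett_sphericity(data)
kmo_value = kmo(data)
print(round(chi_square, 1), dof, round(kmo_value, 3))
```

A p-value for Bartlett's statistic can be obtained from a chi-square survival function (for example `scipy.stats.chi2.sf`). A KMO value above about 0.5 is a commonly quoted rule of thumb for adequacy; the slides do not state a threshold.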

Step 4: Run PCA

PCA Results

Step 5: Copy and paste the factor scores from Step 4 into an Excel sheet.

Step 6: Prepare a new raw data set by adding a zero sample as the last row.

Step 7: Run PCA for the second time.

Step 8: Copy and paste the factor scores from Step 7 into an Excel sheet.

Step 9: Subtract the factor score of the zero sample from the factor score of each sample in Step 8. The revised factor scores are recognized here as the APCS.

Step 10: Run multiple linear regression (MLR) using PM2.5 mass as the dependent variable and each of the APCS as an independent variable.

Step 11: Convert the APCS into factor mass by multiplying each APCS by its respective regression coefficient.
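
The whole PCA/APCS/MLR sequence can be condensed into one NumPy sketch. It is an illustration under several assumptions and does not reproduce the SPSS and Excel workflow exactly: the PCA is unrotated (SPSS runs often apply a rotation), and the zero sample is scored with the coefficients of the first PCA instead of re-running PCA with the extra row. Function and variable names and the synthetic data are hypothetical.

```python
import numpy as np

def apcs_mlr(conc, mass, n_factors):
    """Return APCS, regression coefficients, and per-sample factor mass."""
    mean = conc.mean(axis=0)
    std = conc.std(axis=0, ddof=1)
    z = (conc - mean) / std  # normalized data

    # PCA on the correlation matrix; keep the largest n_factors components.
    corr = np.corrcoef(z, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(corr)
    order = np.argsort(eigenvalues)[::-1][:n_factors]
    # Scoring coefficients scaled so that each factor score has unit variance.
    coeffs = eigenvectors[:, order] / np.sqrt(eigenvalues[order])

    scores = z @ coeffs                # factor scores of the real samples
    z_zero = (0.0 - mean) / std        # artificial sample with zero concentrations
    zero_scores = z_zero @ coeffs      # factor scores of the zero sample
    apcs = scores - zero_scores        # absolute principal component scores

    # Regress the measured mass on the APCS (with an intercept).
    design = np.column_stack([np.ones(len(mass)), apcs])
    beta, *_ = np.linalg.lstsq(design, mass, rcond=None)

    # Convert APCS to factor mass with the regression coefficients.
    contributions = apcs * beta[1:]
    return apcs, beta, contributions

# Synthetic example: 120 samples, 5 species, 2 hypothetical sources.
rng = np.random.default_rng(42)
g_true = rng.uniform(0.5, 5.0, (120, 2))
f_true = np.array([[5.0, 3.0, 0.5, 0.2, 1.0],
                   [0.3, 0.5, 4.0, 6.0, 1.0]])
conc = g_true @ f_true + 0.05 * rng.standard_normal((120, 5))
mass = g_true @ np.array([12.0, 8.0]) + 0.5 * rng.standard_normal(120)

apcs, beta, contributions = apcs_mlr(conc, mass, n_factors=2)
predicted = beta[0] + contributions.sum(axis=1)
print(contributions.mean(axis=0).round(2), round(beta[0], 2))
```

The intercept `beta[0]` is the part of the mass that none of the factors explains; the columns of `contributions` are the factor masses per sample.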


Demonstration of US EPA PMF 5.0

Upload input files

Execution of the PMF model
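
To give a feel for what a PMF run optimizes, the sketch below minimizes the uncertainty-weighted residual sum Q = Σ((x − GF)/u)² with both matrices kept non-negative. It uses weighted multiplicative updates, a simple technique swapped in for illustration; it is not the solver used by EPA PMF 5.0, and the function name, uncertainty model and data are assumptions.

```python
import numpy as np

def weighted_nmf(x, u, n_factors, n_iter=500, seed=0):
    """Non-negative factorization x ~ g @ f minimizing Q = sum(((x - g @ f) / u) ** 2)."""
    rng = np.random.default_rng(seed)
    n, m = x.shape
    g = rng.uniform(0.1, 1.0, (n, n_factors))  # source contributions
    f = rng.uniform(0.1, 1.0, (n_factors, m))  # source profiles
    w = 1.0 / u**2                             # weights from the uncertainties
    eps = 1e-12                                # guards against division by zero
    for _ in range(n_iter):
        g *= ((w * x) @ f.T) / ((w * (g @ f)) @ f.T + eps)
        f *= (g.T @ (w * x)) / (g.T @ (w * (g @ f)) + eps)
    q = (((x - g @ f) / u) ** 2).sum()
    return g, f, q

# Synthetic example: 60 samples, 5 species, 2 hypothetical sources.
rng = np.random.default_rng(3)
g_true = rng.uniform(0.5, 5.0, (60, 2))
f_true = np.array([[5.0, 3.0, 0.5, 0.2, 1.0],
                   [0.3, 0.5, 4.0, 6.0, 1.0]])
x = g_true @ f_true
u = 0.1 * x + 0.01  # assumed uncertainty: 10% of the value plus a small floor

g, f, q = weighted_nmf(x, u, n_factors=2)
print(round(q, 4))
```

Because the updates only multiply by non-negative ratios, contributions and profiles stay non-negative throughout, which mirrors the non-negativity requirement of PMF.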

Responses of PMF 5.0. Species groups of the resolved factors: {Mg, Zn, Cu, Ni, Ca2+}; {Pb, NH4+, K+, As, Cd, Zn, Ni, V}; {NO3-}; {As, Ba, Sr, Se}; {Na+, Cl-, SO42-}. [Figure: PMF versus HVS with fit line; slope = 0.91, R2 = 0.88, p < 0.01]
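
A slope and R2 of this kind come from a straight-line fit of modelled against measured mass. The sketch below shows one way to compute them; the function name is hypothetical and the numbers are made up for illustration, not the data behind the slide.

```python
import numpy as np

def fit_line(measured, modelled):
    """Least-squares line modelled = slope * measured + intercept, plus R squared."""
    slope, intercept = np.polyfit(measured, modelled, 1)
    r = np.corrcoef(measured, modelled)[0, 1]
    return slope, intercept, r**2

# Made-up example values (not from the presentation).
measured = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
modelled = 0.9 * measured + 1.0

slope, intercept, r_squared = fit_line(measured, modelled)
print(round(slope, 2), round(intercept, 2), round(r_squared, 2))
```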

Acknowledgement: School of Environmental and Natural Resource Sciences, Universiti Kebangsaan Malaysia, Bangi, Malaysia; Research Centre for Tropical Climate Change System, Institute of Climate Change, Universiti Kebangsaan Malaysia, Bangi, Malaysia; Atmospheric Chemistry and Air Pollution Research Group