Air Quality Data Analysis Using Open Source Tools

Presentation transcript:

Air Quality Data Analysis Using Open Source Tools Daniel Garver and Ryan Brown US EPA Region 4

The Problem
- Do you ever want to do more than is possible through a standard AQS report, and have it automated?
- Do you wish you had additional software but don't have the resources? Software like SAS, SigmaPlot, Matlab, etc. is useful for air quality data analysis. For many agencies, however, these tools are cost prohibitive.
- Do you have routine reports or analyses that you currently generate manually and that are resource/time intensive?
Python tools

Outline
- About Python
- SQL databases and Python: SQLAlchemy
- Region 4 examples:
  - Ozone site correlations
  - Air monitoring geodatabase
  - Automated wind roses and pollution roses
- Other potential applications

What is Python?

“Python is a programming language that lets you work more quickly and integrate your systems more effectively. You can learn to use Python and see almost immediate gains in productivity and lower maintenance costs.” - Python.org
- Extremely portable and compatible: the same code runs on Windows, Linux/Unix, and Mac OS X, and works with existing code in C, C++, Fortran, Java, R, and HTML
- Open source license: free to use (unlimited number of users and licenses) and immediately available for many diverse applications
- Strong user base that continually develops and supports Python: free add-ons are available for database integration, graphing, math/statistics, and web development (22,846 add-ons currently available and growing)
- Support is also community-based: no traditional tech support, but rather a large online support community

Same functionality as widely available, costly commercial software
Free, open source add-ons similar to:
- Matlab
- SAS
- PDF writing software
- Graphing software
- HTML web interface software
- Database management/migration software

Using Python With Air Quality Data
Air quality data is almost always stored in an SQL relational database. Examples:
- AQS, AQS Data Mart (Oracle)
- AIRNow (MySQL)
- Proprietary air monitoring data management systems such as E-DAS or AirVision (usually MS SQL Server or Oracle)
- State or local agency databases (various platforms)
The first step is to access the data using Python, with SQLAlchemy: a Python add-on

What is SQLAlchemy?

Python and SQLAlchemy
- SQLAlchemy is a Python add-on that interacts with relational databases
- The same Python code can communicate with many types of SQL database (e.g. Oracle, MySQL, SQL Server, Access)
- Python code with SQLAlchemy is often easier to read and write than raw SQL
- Your data can then be used for other purposes in Python
- SQLAlchemy is used by organizations such as Yelp, reddit, Mozilla (Firefox), Urban Airship, SurveyMonkey, and Lolapps
- SQLAlchemy can automatically map your existing database with the SqlSoup extension

SQLAlchemy: An Example
This Python code:
    query = session.query(dm.dim_monitor).join(dm.dim_facility)
generates this SQL for the AQS Data Mart:
    SELECT aqsmart.dim_monitor.dim_monitor_key AS aqsmart_dim_monitor_dim__1,
           aqsmart.dim_monitor.dim_facility_key AS aqsmart_dim_monitor_dim__2,
           aqsmart.dim_monitor.dim_substance_key AS aqsmart_dim_monitor_dim__3,
           aqsmart.dim_monitor.mo_id AS aqsmart_dim_monitor_mo_id,
           aqsmart.dim_monitor.poc AS aqsmart_dim_monitor_poc,
           aqsmart.dim_monitor.measurement_scale AS aqsmart_dim_monitor_meas_4,
           aqsmart.dim_monitor.measurement_scale_definition AS aqsmart_dim_monitor_meas_5,
           aqsmart.dim_monitor.project_type_code AS aqsmart_dim_monitor_proj_6,
           aqsmart.dim_monitor.project_type AS aqsmart_dim_monitor_proj_7,
           aqsmart.dim_monitor.dominant_source AS aqsmart_dim_monitor_domi_8,
           aqsmart.dim_monitor.probe_location AS aqsmart_dim_monitor_prob_9,
           aqsmart.dim_monitor.probe_height AS aqsmart_dim_monitor_prob_a,
           aqsmart.dim_monitor.probe_horiz_distance AS aqsmart_dim_monitor_prob_b,
           aqsmart.dim_monitor.probe_vert_distance AS aqsmart_dim_monitor_prob_c,
           aqsmart.dim_monitor.sample_residence_time AS aqsmart_dim_monitor_samp_d,
           aqsmart.dim_monitor.unrestr_air_flow_ind AS aqsmart_dim_monitor_unre_e,
           aqsmart.dim_monitor.surrogate_ind AS aqsmart_dim_monitor_surr_f,
           aqsmart.dim_monitor.collaborating_programs AS aqsmart_dim_monitor_coll_10,
           aqsmart.dim_monitor.last_sampling_date AS aqsmart_dim_monitor_last_11,
           aqsmart.dim_monitor.etl_last_load_process AS aqsmart_dim_monitor_etl__12,
           aqsmart.dim_monitor.etl_last_load_date AS aqsmart_dim_monitor_etl__13
    FROM aqsmart.dim_monitor
    JOIN aqsmart.dim_facility
      ON aqsmart.dim_facility.dim_facility_key = aqsmart.dim_monitor.dim_facility_key
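A smaller version of the same idea can be tried without access to the AQS Data Mart. The sketch below is not the presenters' code. It assumes SQLAlchemy 1.4 or later and an in-memory SQLite database, and it defines two cut-down, hypothetical tables that only borrow the dim_monitor and dim_facility names from the slide; the facility_name column and the sample rows are invented for illustration.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()


class Facility(Base):
    # Hypothetical, cut-down stand-in for the Data Mart facility table.
    __tablename__ = "dim_facility"
    dim_facility_key = Column(Integer, primary_key=True)
    facility_name = Column(String)  # invented column, for illustration only


class Monitor(Base):
    # Hypothetical, cut-down stand-in for the Data Mart monitor table.
    __tablename__ = "dim_monitor"
    dim_monitor_key = Column(Integer, primary_key=True)
    dim_facility_key = Column(
        Integer, ForeignKey("dim_facility.dim_facility_key")
    )
    poc = Column(Integer)


# In-memory SQLite database so the example needs no external server.
engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

# Made-up rows so the query has something to return.
session.add(Facility(dim_facility_key=1, facility_name="Example site"))
session.add(Monitor(dim_monitor_key=10, dim_facility_key=1, poc=1))
session.commit()

# Same pattern as the slide: SQLAlchemy works out the JOIN condition
# from the foreign key between the two tables.
query = session.query(Monitor).join(Facility)
sql_text = str(query)  # the SQL that SQLAlchemy will send
rows = query.all()     # the matching Monitor objects

print(sql_text)
print([(m.dim_monitor_key, m.poc) for m in rows])
```

Pointing the same classes at another database would mainly mean changing the URL passed to create_engine, which is the portability the slides describe.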

Once you have access to your air quality data in Python, you can:
- Perform statistical analysis
- Create graphs
- Export data for other applications (e.g. Excel)
- Display your data in a web interface
- Create PDF reports
- Load data into another database (e.g. for GIS)
- Perform geoprocessing
- Perform automated analysis for data validation or QA/QC

An AQS Example: Ozone Site Comparison Regressions (with Graphs!) Can be run in minutes! Continually useful! Graphs!

Ozone Site Comparison Regressions
- Compares one site with all other sites in the CBSA based on the daily maximum 8-hr average
- Generates nonparametric Theil-Sen regression plots of each site pair
- Calculates correlation statistics: Spearman's r, Pearson's r, etc.
- Potential uses: evaluating network design (identifying redundant sites); modeling concentrations in the event of missing data
- Outputs results in a PDF and a text file table
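The statistics named on this slide can be sketched in a few lines of standard-library Python. This is an illustration, not the presenters' script: the function names and the ozone values are hypothetical, and the intercept uses one common convention (the median of y - slope * x). Libraries such as SciPy offer ready-made equivalents (scipy.stats.theilslopes, spearmanr, pearsonr), which would also supply the p-values omitted here.

```python
from itertools import combinations
from statistics import mean, median


def theil_sen(x, y):
    """Return (slope, intercept) of a Theil-Sen line through paired data."""
    # Slope: median of the slopes between every pair of points.
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[j] != x[i]  # skip pairs that would give a vertical line
    ]
    slope = median(slopes)
    # Intercept convention assumed here: median of y - slope * x.
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return slope, intercept


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def average_ranks(values):
    """Rank values from 1, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_r(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical daily maximum 8-hr ozone averages (ppm) on the same days.
site_a = [0.041, 0.055, 0.062, 0.048, 0.070, 0.066]
site_b = [0.039, 0.050, 0.060, 0.047, 0.064, 0.063]

slope, intercept = theil_sen(site_a, site_b)
print(f"Theil-Sen: site_b = {slope:.3f} * site_a + {intercept:.4f}")
print(f"Pearson r  = {pearson_r(site_a, site_b):.3f}")
print(f"Spearman r = {spearman_r(site_a, site_b):.3f}")
```

Because the Theil-Sen slope is a median of pairwise slopes, a few outlying days move it far less than they would move a least-squares fit, which is presumably why a nonparametric regression was chosen for site comparison.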

Ozone Site Comparison Matrix
Site A (all rows): 47-093-1020-44201-1
Site B               n     Spearman r   Spearman p-val  Pearson r    Pearson p-val  Theil slope  Theil int
47-093-0021-44201-1  1444  0.951914952  (blank)         0.943372755  (blank)        0.94118      0.00185
47-063-0003-44201-1  292   0.941248585  9.46E-139       0.933572848  2.90E-131      0.91892      7.00E-05
47-105-0108-44201-1  93    0.912599594  4.10E-37        0.900784971  1.00E-34       0.75         0.01025
47-089-0002-44201-1  1417  0.877047948  (blank)         0.733866562  6.64E-240      1            -0.001
47-105-0109-44201-1  1327  0.832461933  (blank)         0.704831416  7.99E-200      0.88889      0.00411
47-001-0101-44201-1  1420  0.825591772  (blank)         0.693255698  5.94E-204      0.92453      0.00421
47-009-0101-44201-1  1482  0.747143127  7.83E-265       0.641101197  2.45E-172      0.91667      -0.00271
47-155-0101-44201-1  1484  0.726548204  8.57E-244       0.619745174  3.62E-158      1.03571      -0.00966
47-155-0102-44201-1  1102  0.655271896  3.59E-136       0.474699528  5.18E-63       (blank)      -0.00524
47-009-0102-44201-1  1425  0.637935642  1.17E-163       0.535401114  1.65E-106      0.85714      0.00771

Regression Plots: Selected Southeast Sites

Example: An Air Monitoring Geodatabase
- Custom AQS Data Mart queries
- New database design is created in a geodatabase
- Data is loaded into the geodatabase
- Python scripting in GIS can automatically create custom layers and feature classes

Example: Air Pollution Wind Roses
- A customized AQS Data Mart query pulls:
  - Concentration data over a certain value (e.g. SO2 > 75 ppb)
  - Corresponding wind speed and direction for those hours with high concentration
- Another Python add-on is used to generate a wind rose or pollution rose of the hours with high concentration
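The data preparation behind a pollution rose can be sketched with the standard library alone. The example below is hypothetical and does not reproduce the presenters' query or plotting add-on: it assumes the hourly records have already been retrieved as (concentration, wind speed, wind direction) tuples, keeps the hours above a threshold, and counts them by 16-point compass sector. The function names and sample values are invented.

```python
from collections import Counter

# 16 compass sectors, each 22.5 degrees wide and centred on its bearing.
SECTORS = [
    "N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
    "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW",
]


def sector_for(direction_deg):
    """Map a wind direction in degrees to its 16-point compass sector."""
    # Shift by half a sector so that N covers 348.75 to 11.25 degrees.
    index = int((direction_deg % 360 + 11.25) // 22.5) % 16
    return SECTORS[index]


def pollution_rose_counts(hours, threshold_ppb):
    """Count hours above the threshold in each wind sector.

    hours: iterable of (concentration_ppb, wind_speed, wind_direction_deg).
    """
    return Counter(
        sector_for(direction)
        for concentration, _speed, direction in hours
        if concentration > threshold_ppb
    )


# Made-up hourly SO2 records: (ppb, wind speed in m/s, direction in degrees).
hourly = [
    (82.0, 3.1, 350.0),
    (91.5, 2.4, 5.0),
    (60.2, 4.0, 180.0),
    (120.3, 5.2, 225.0),
    (40.0, 1.0, 90.0),
]

counts = pollution_rose_counts(hourly, threshold_ppb=75)
for name in SECTORS:
    if counts[name]:
        print(f"{name:>3}: {counts[name]} hour(s)")
```

The resulting sector counts are the numbers a plotting add-on would draw as the petals of the rose; binning by wind speed class as well would give the stacked bands of a conventional wind rose.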

Other possibilities for air quality data
- Object wrappers/framework for retrieving data from specific databases (e.g. AQS Data Mart)
  - Could design complex queries with less coding
  - Speed up development time for specific analysis, graphing, web development functionality
- Python-powered web interface
  - Data retrieval, analysis/graphics, and web connectivity can be coded all together

Summary
- Agencies rely on air quality data analysis for planning and decision making
- State, local, and federal budgets are being cut
- Data analysis automation can provide real value:
  - Time savings
  - Better, more reliable analysis
- Python is a free tool that can help achieve this goal:
  - Easy to learn and read
  - Ideal for scientists and engineers (part-time programmers)

Additional Resources
Daniel Garver: garver.daniel@epa.gov, 404-562-9839
Ryan Brown: brown.ryan@epa.gov, 404-562-9147
Python: www.python.org
SQLAlchemy: www.sqlalchemy.org
Codecademy Python course (free): www.codecademy.com/tracks/python
We are happy to share our code/knowledge and collaborate with anyone else interested.

Questions?