2018 CMAS Users Forum: CMAS Data Warehouse and Modeling Platform

Presentation transcript:

2018 CMAS Users Forum: CMAS Data Warehouse and Modeling Platform
Facilitator: B.H. Baek
Panelists: Alison Eyth (U.S. EPA), Barron Henderson (U.S. EPA), Weining Zhao (TCEQ), Talat Odman (Georgia Tech University), Matthew Alvarado (AER)
Good morning, everyone. Thanks for joining our 2018 CMAS Users Forum discussion. From time to time, we hold this users forum with CMAS participants to discuss topics that matter to the air quality community. A few years back, we held a users forum on the concept of sharing data, models, and other tools within the air quality community. Since then we have made some progress on that subject, and we would like to share some of the work we have started with U.S. EPA and open the floor to hear what you want. To start the conversation, we have invited five panelists from various fields. First, we have an emissions inventory modeling expert, Ms. Alison Eyth from the Inventory and Analysis Group at U.S. EPA. Next is Dr. Barron Henderson from the air quality modeling group at U.S. EPA, a modeling expert in fields ranging from air quality to climate. We also have Weining Zhao from the Texas Commission on Environmental Quality, who has been working on developing an air quality modeling web tool; Professor Talat Odman from Georgia Tech, who represents researchers in academia; and finally Dr. Alvarado, who recently worked on air quality modeling on cloud computing resources.

Motivations
The m3users community would benefit from access to new, open datasets
Modeling inputs and outputs
Share their own models, datasets, and tools
Better/easier discoverability
With more download options
Citations for models, datasets, and tools
How can we make modeling setup and runs easy, affordable, and portable?
Easier installation and setup of modeling systems
Low-cost option for air quality modeling system
Global support
17th Annual CMAS Conference October 24, 2018

Panelist Presentations
Barron Henderson: AQ Modeling Group, OAQPS, U.S. EPA
Alison Eyth: Inventory and Analysis Group, OAQPS, U.S. EPA
Weining Zhao: Texas Commission on Environmental Quality (TCEQ)
Matthew Alvarado: Atmospheric and Environmental Research (AER)
Talat Odman: Professor, Georgia Tech University

CMAS Cloud Data Warehouse
Product                      Storage (GB)
CMAQ                         876
  12US1 (2002-2014)          68 per year
SMOKE
  2014fd_2015fd_2016fd       300
  NEIshare                   506
  Onroad (MOVES)             83
  Surrogates                 25
MCIP
  soas_2013_cb6              875
  WRFv3.8_12US_2015          4,354
  WRFv3.8_4NE2_2015          565
IC/BC
  2014_12US1                 0.660
  cb6r3_ae6_aq               359
Total                        8,588

CMAS Data Warehouse: Dataverse

Dataverse Features:
Metadata support
Versioning capability
Discoverability and searchability
Published data and models citation
DOI: Digital Object Identifier
Data file types
Data provenance
Any types of datasets: NCF, JPG, Rdata, various tabular formats, geospatial, compressed files, etc.
Cloud storage + computing
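
The discoverability and DOI citation features listed above can be sketched with a short Python example. It assumes the warehouse is reachable through the standard Dataverse Search API (`/api/search` with the `q`, `type`, and `per_page` parameters); the host name, the sample response, and the DOI below are placeholders for illustration, not values from the presentation.

```python
from urllib.parse import urlencode


def build_search_url(base_url, query, item_type="dataset", per_page=10):
    """Build a Dataverse Search API URL (no network call is made here)."""
    params = {"q": query, "type": item_type, "per_page": per_page}
    return f"{base_url.rstrip('/')}/api/search?{urlencode(params)}"


def extract_citable(response_json):
    """Return (name, persistent identifier) pairs from a search response."""
    items = response_json.get("data", {}).get("items", [])
    return [(item.get("name"), item.get("global_id")) for item in items]


# Hypothetical host and response, trimmed to the fields used above.
EXAMPLE_HOST = "https://dataverse.example.org"
EXAMPLE_RESPONSE = {
    "status": "OK",
    "data": {
        "items": [
            {"name": "Example CMAQ inputs", "global_id": "doi:10.5072/FK2/EXAMPLE"}
        ]
    },
}

if __name__ == "__main__":
    print(build_search_url(EXAMPLE_HOST, "CMAQ"))
    print(extract_citable(EXAMPLE_RESPONSE))
```

The pair returned for each dataset is what a user would need to cite it: a human-readable name plus the DOI that Dataverse assigns on publication.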

CMAS Cloud Modeling Platform
Cloud computing platforms: Google Cloud Platform and Amazon Web Services
Cloud ecosystem: direct access to the cloud data storage
Cloud-based modeling system instances
Individual or multiple modeling system instances
CMAQ, SMOKE, WRF, EMF, AMET, Spatial Allocator, Speciation Tool
Instances including input datasets and run scripts

Benefits of CMAS Cloud Modeling Platform
Easy to scale up (more computing power and storage)
Facilitates collaboration (sharing identical instances)
No hardware and system maintenance and updates
No full-time IT support staff (no overhead)
On-demand usage (low modeling costs)
Data safety and automatic backups
Worldwide zone support for the fastest connectivity to the cloud platforms

Use Case: MARAMA EMF-SMOKE-MOVES in AWS
Amazon Elastic Block Store (EBS), LTS (Long-Term Storage), NFS (Network File System)

Use Case: Cost Estimates using AWS
On-demand instance
Shut down during the night and the weekend
Cost estimates: [Total: average $300/month]
EMF on AWS instance m3.xlarge ($0.266/hr): 226 hours ≃ $71/month
2 TB shared EBS HDDs: $0.10 per GB-month ≃ $200 per month
SMOKE modeling runs on m3.xlarge instance ($0.266/hr)
One month of regional SMOKE run: 22 hours of run time = $6/month
MOVES modeling runs on m4.xlarge instance ($0.252 per hour)
One reference county MOVES 2014a: 80 hours ≃ $20 per county
304 reference counties * $20 ≃ over $6,000 for CONUS modeling domain
EBS (Elastic Block Store)
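
The arithmetic behind these estimates can be reproduced with a short Python sketch. The rates, hours, and county count are copied from the slide; the variable names are illustrative, and treating 2 TB as 2,000 GB is an assumption.

```python
# Inputs copied from the slide; names are illustrative.
M3_XLARGE_RATE = 0.266         # $/hour, on-demand
M4_XLARGE_RATE = 0.252         # $/hour, on-demand
EMF_HOURS_PER_MONTH = 226      # instance shut down at night and on weekends
SMOKE_HOURS_PER_MONTH = 22     # one month of regional SMOKE
EBS_GB = 2000                  # 2 TB taken as 2,000 GB (assumption)
EBS_RATE = 0.10                # $ per GB-month
MOVES_HOURS_PER_COUNTY = 80    # one reference county, MOVES 2014a
REFERENCE_COUNTIES = 304


def monthly_costs():
    """Recurring monthly costs: EMF instance, shared EBS, SMOKE runs."""
    emf = M3_XLARGE_RATE * EMF_HOURS_PER_MONTH
    ebs = EBS_RATE * EBS_GB
    smoke = M3_XLARGE_RATE * SMOKE_HOURS_PER_MONTH
    return {"emf": emf, "ebs": ebs, "smoke": smoke, "total": emf + ebs + smoke}


def moves_conus_cost():
    """One-off MOVES cost: per reference county and for all counties."""
    per_county = M4_XLARGE_RATE * MOVES_HOURS_PER_COUNTY
    return per_county, per_county * REFERENCE_COUNTIES


if __name__ == "__main__":
    for item, dollars in monthly_costs().items():
        print(f"{item}: ${dollars:,.2f}")
    per_county, conus = moves_conus_cost()
    print(f"MOVES: ${per_county:,.2f} per county, ${conus:,.2f} for CONUS")
```

With these inputs the EBS, SMOKE, and MOVES lines match the slide (about $200, $6, $20 per county, and a little over $6,100 for 304 counties). The EMF line comes to about $60 for 226 hours, whereas $71 would correspond to roughly 267 hours at the same rate, so one of those two slide figures may be a typo. Either way the monthly total lands in the $265 to $280 range, consistent with the rounded $300/month average.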

PollEv.com/cmas
What is your organization sector?
What are the biggest challenges in your current workflow?
What models/tools do you use in your current workflow?
What pre-built images on cloud computing would you want to use?

Data Sharing from a Few Perspectives
Barron H. Henderson
Private consultant (2002-2006, and here and there)
Academic at UNC and UF (2006-2016)
Government scientist at EPA (2016-present)

Competing Generator and User Interests
Generating data takes your time and is risky
If the process is simple and quick, will the research community accept the methods?
Will the regulatory community accept the results?
Data users benefit by collecting data
Storing data is cheap (for many organizations)
Time lost hurts: profit, publications, other?
So the tendency is to request data from providers/generators… even if you might not need it
Zac Adelman told me he called folks that do this “data collectors”
Perhaps there is a role for “data collectors”…

Ideal World
Data generators want to share
Data collectors act as intermediate repositories
Data users only get what they need from data collectors
This is really the Amazon Web Services model (and others)
Data generators may generate on AWS and store data
Agreements can be reached to host data “long-term”, where AWS acts as the data collector
Data users can independently connect to the data store

My AWS Experience
CPU time on AWS is reasonably cheap
Data storage on AWS is reasonably cheap
Data download is expensive
Incentive: generate wherever, store in the cloud, download only what is necessary and/or use in the cloud
Dr. Alvarado has good news on these fronts
Reality check
Most users do not need to look at every 4-D DENS variable in MCIP
Most users do not need to look at every 4-D XO2 variable in CONC
Most don’t want WRF… etc.
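
The "download only what is necessary" incentive can be made concrete with a rough size estimate. The sketch below compares one full 4-D variable with a surface-layer subset of it. The grid dimensions (25 time steps, 35 layers, 299 rows, 459 columns) and the 4-byte value size are illustrative assumptions, not figures from the talk.

```python
from math import prod

BYTES_PER_VALUE = 4  # assumes 32-bit floats


def variable_bytes(shape):
    """Uncompressed size in bytes of one variable with the given shape."""
    return prod(shape) * BYTES_PER_VALUE


def subset_fraction(subset_shape, full_shape):
    """Fraction of the full variable that the subset represents."""
    return variable_bytes(subset_shape) / variable_bytes(full_shape)


# Hypothetical daily file dimensions: (time steps, layers, rows, columns).
FULL_4D = (25, 35, 299, 459)
SURFACE_ONLY = (25, 1, 299, 459)

if __name__ == "__main__":
    full_mb = variable_bytes(FULL_4D) / 1e6
    share = subset_fraction(SURFACE_ONLY, FULL_4D)
    print(f"full variable: {full_mb:.0f} MB per day")
    print(f"surface layer only: {share:.1%} of that")
```

Under these assumptions, a user who only needs surface values would transfer about one thirty-fifth of a single variable, which is the kind of saving that makes subsetting in the cloud attractive when download is the expensive step.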

Cloud Computing Limitations
By itself, cloud computing does not solve our organization problems.
Cataloguing and discovery are a separate, equally important issue.
Can we partner with library sciences?
Geographic Information Systems have already made a lot of progress.

Appendix My initial thoughts on questions we will be asked.

1. What kind of data can be shared? Anything can be shared on cloud computing, because whole disks can be almost instantly shared

2. What are the most common constraints with sharing data?
Data ownership and credits
Proprietary value
Fear of unconstructive criticism

3. How can the data be shared?
Data discoverability and data citation (DOI)
Not sure I get this one
Availability of documentation that describes the data, how the data were generated, and any limitations/caveats about the data;
ability of data users to interact with the data generators to ask questions about the data; and
responsibility, if any, of the data generator to notify the data users when errors are discovered in the data and/or when updates are made.

4. How can we leverage cloud computing to facilitate emissions, air quality modeling, and data analysis?
Great question!
Analysis systems are notoriously hard to install.
Clonable nodes with analysis systems seem like a great place to start.
Clonable data stores mean easy application of analysis systems to other data stores.
Instead of getting data providers to put data in special formats and upload to special systems, analysis systems need to be more nimble.
Metadata conventions are important.
Analysis system conventions are key too.

5. What is your experience with the cost (financial and personnel) of using cloud computing versus purchasing in-house hardware?
Benefits of in-house is of budget applications
Buy once
Use til it dies
Increasingly against the “rules”
Benefits of cloud computing
Management costs included in price
Package management mostly included in price
Backup availability included in price
Amazon likely has lower overhead per node than smaller providers

6. What benefits or advances do you foresee from data sharing and cloud computing? Connecting systems to data and data to systems (see question 4) Are you willing to throw more resources at a problem? Now you could.

Data Sharing and Cloud Computing for Emissions and Air Quality Modeling Alison Eyth U.S. EPA Office of Air Quality Planning & Standards Emission Inventory and Analysis Group October 24, 2018

Data Sharing Requests
EPA receives dozens of requests for emissions and air quality model input and output data each year
Requestors are from state/local governments, academia, consultants, non-profits, regional organizations, etc.
Requests are for SMOKE, WRF, CMAQ, and CAMx input and output files for regulatory and non-regulatory cases
Most SMOKE input files are small enough to make available on EPA’s web/FTP site, but reassembly for use can be tricky
https://www.epa.gov/air-emissions-modeling
Many other files are too large for the FTP area and are transferred via physical hard drives
For popular data, sometimes we transfer drives to regional organizations who send the drive around to the others
Servicing of data requests requires significant staff time
Q: Who has asked for data from EPA?

Future Directions in Data Sharing
Today’s network speeds make it easier to transfer larger data sets
Would like to get away from shipping drives around
Ideal solutions allow us to post data once, where everyone who needs the data can retrieve them
OAQPS is working with UNC to post full distributions of emissions modeling inputs and selected outputs
https://drive.google.com/drive/folders/1caRJVHx_SzY0sSD6DL-TE7rgAoSDqrt-
Helps reduce issues with reassembly compared to the split-up versions on our FTP site
Also posting some sector-specific SMOKE outputs
More work on this is needed (2016 platform beta and v1.0)

Use of Cloud Computing
Cloud computing is useful for easily decomposed problems
Use MOVES to compute winter and summer onroad emission factors (EFs) for each of 300 representative counties
Use independent nodes/instances to run each of the 600 county-months and download EFs after runs complete
Cloud computing could make emissions and AQ modeling accessible to more organizations
Provide working compiled versions of models (yay!)
Make available key input data sets with working scripts
Streamline applications to new grids/scenarios
Reduces need for hardware and system administration
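
The MOVES decomposition described above (300 representative counties times two seasons, each run independent of the others) can be sketched in Python. The sketch uses a local thread pool as a stand-in for separate cloud instances; `run_moves_task`, the county labels, and the output file names are hypothetical placeholders, and no real MOVES run is launched.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

SEASONS = ("winter", "summer")


def build_tasks(county_ids):
    """One independent task per (county, season) pair."""
    return list(product(county_ids, SEASONS))


def run_moves_task(task):
    """Placeholder for launching one MOVES run on its own node or instance."""
    county, season = task
    # A real implementation would start an instance, run MOVES, and
    # return the location of the emission factor output for download.
    return f"{county}_{season}_EF.csv"


def run_all(tasks, max_workers=8):
    """Run tasks concurrently; results come back in task order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_moves_task, tasks))


if __name__ == "__main__":
    counties = [f"county{i:03d}" for i in range(1, 301)]
    tasks = build_tasks(counties)
    outputs = run_all(tasks)
    print(len(tasks), "tasks,", len(outputs), "emission factor files")
```

Because no task depends on another, wall-clock time shrinks roughly in proportion to the number of instances used, while the total instance-hours billed stay about the same.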

EPA’s 2019 International Emission Inventory Conference
Biennial conference that connects offices across EPA, regional/state/local/tribal staff, researchers, consultants, and students who work on various aspects of emissions inventory development
2019 theme: “Collaborative Partnerships to Advance Science and Policy”
July 29-August 2, 2019 in Dallas, Texas
Training on Monday, July 29
Tuesday-Friday: plenary session, technical sessions, lightning talks
More info coming soon to: https://www.epa.gov/air-emissions-inventories/international-emission-inventory-conference

Air Quality Modeling Data on TCEQ Website https://www.tceq.texas.gov/airquality/airmod/data Weining Zhao, Texas Commission on Environmental Quality, October 24, 2018

Texas Ozone SIP Modeling Data (2012 Episodes)

TCEQ Modeling Data FTP Site
2.5 TB raw EI data
CAMx input and output data
No WRF data

Interactive Time-Series Evaluation Tool

Ozone Design Value Visualization Tool
Ozone Transport SIP Modeling

Cloud Computing: An Academic Perspective Talat Odman and Kevin Kelly 17th Annual CMAS Conference Chapel Hill, North Carolina October 24, 2018

Challenge - Moving the Data
Too much data!
Copying TBs over the Internet
Variable speeds
Sometimes lacks reliability
Wait hours to find out results

Cloud HPC On-Demand Capacity Public Cloud HPC Computation Environment Data Store End Users

Cloud Computing vs On-Premise
Optimize costs for spiky compute patterns
On-demand for unexpected compute needs
Increase capacity without commitments
Control user access and billing
Provide users a familiar HPC platform
Manage hybrid compute by splitting workloads between on-premise and cloud

AQcast System for CMAQ on Amazon Cloud (also have CAMx and WRF-Chem variants)
Features
Automatically performs all pre-processing steps, including weather modeling
Flexible and simple interface
Benefits
High number of simultaneous runs (ensembles, sensitivity studies)
Reduces labor cost of modeling studies

NASA Applied Science Project: Use AQcast to improve monthly-mean NH3 emissions

“Let me explain. No, there is too much. Let me sum up.” Inigo Montoya, The Princess Bride
Sharing data is a great idea!
Cuts down on unnecessary duplication
Facilitates review, reproducibility, and sensitivity studies
A CMAS Data Warehouse is a great idea!
Providing access to complete input/output data from past projects would be a great benefit to the community
Releasing CMAQ and other models as virtual machines or containers is a great idea!
All libraries already included
No more compiler/OS errors wasting users’ time

“Let me explain. No, there is too much. Let me sum up.” Inigo Montoya, The Princess Bride
Cloud computing has effectively no limits, so you need to set your own.
Controlling costs requires planning ahead.
Sharing data is great, but moving data is expensive
Data and computation should stay in the same cloud
Need methods to review runs without downloading data
CMAS Data Warehouse should include complete projects
Multiple project types with links to ALL of the input data, model containers/VMs, run scripts, and output
Different run types (e.g., HDDM, Adjoint) in separate containers?
Need to consider Warehouse cost and permission issues
Is everyone going to have read/write/execute permission?
Who pays for storage? Downloads?
Negotiating discounts will likely require picking a cloud provider

Questions? Email Matt Alvarado (malvarad@aer.com)