Uniting Free Government Data using PowerPivot Greg Beaumont, PM / BI Architect GNet Group
Welcome Greg R. Beaumont, MBA, PMP GNet Group University of Minnesota / Carlson School of Management – MBA St. Mary’s University of Minnesota – Pre-Med / Biology GNet Group www.gnetgroup.com
Welcome This is my first SQL Saturday presentation… Thank you for attending! Follow Me: Twitter: @GRBeaumont GNet Group Blog: www.gnetgroup.wordpress.com
“Information is a source of learning. But unless it is organized, processed, and available to the right people in a format for decision making, it is a burden, not a benefit.” William Pollard (1911-1989)
Why Government Data and PowerPivot? During projects and when looking for good demo data over the last several years, I have found interesting and useful data. Data ranges in scope from weather, to economic, to geospatial, to healthcare, and much more. Data is found in flat files, via FTP, via GUIs, via Excel files, etc. PowerPivot is a quick and effective tool to pull in data from different sources and tie it all together. It seemed like a fun topic!
Government Data State Agency Databases: http://wikis.ala.org/godort/index.php/State_Agency_Databases Federal Data – Data.gov: Now has an open API for developers http://explore.data.gov/ - United States International Data: http://Data.gov.uk/ - United Kingdom http://Data.gov.au/ - Australia http://Data.gov.sg/ - Singapore
Other Agencies and Organizations http://wonder.cdc.gov/ http://deli.dnr.state.mn.us/ http://www.ncdc.noaa.gov/oa/ncdc.html http://www.fws.gov/gis/data/national/index.html http://www.itl.nist.gov/fipspubs/ http://www.nysl.nysed.gov/ils/topics/databases.htm http://data.worldbank.org/country/saudi-arabia http://seer.cancer.gov/ http://www.kazusa.or.jp/codon/codon.html TOO MANY TO LIST!!!!!
Google Public Data Explorer http://www.google.com/publicdata/directory
Microsoft – Open Government Data Initiative http://www.microsoft.com/industry/government/opengovdata/default.aspx
Bing Health Maps
Google Flu Trends
Common Keys Date State Zip Code Geospatial FIPS ICD9 / ICD10
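Keys such as Zip Code and FIPS often arrive in different shapes (numbers in one file, text in another, leading zeros dropped), so they usually need to be normalized before two sources will relate. A minimal sketch in Python, assuming hypothetical column names and invented row values; it is an illustration of the idea, not the approach used in the demo:

```python
def normalize_zip(value):
    """Return a 5-character ZIP string, restoring leading zeros and dropping any +4 suffix."""
    return str(value).strip().split("-")[0].zfill(5)


def county_fips(state, county):
    """Build a 5-character county FIPS key: 2-digit state code + 3-digit county code."""
    return f"{int(state):02d}{int(county):03d}"


# Hypothetical rows; column names and values are invented for illustration only.
weather = [
    {"state_fips": 27, "county_fips": 53, "avg_temp_f": 45.1},
    {"state_fips": "27", "county_fips": "123", "avg_temp_f": 44.8},
]
mortality = [
    {"fips": "27053", "deaths": 10},
    {"fips": "27123", "deaths": 4},
]


def join_on_fips(left, right):
    """Inner-join the two row lists on the normalized county FIPS key."""
    by_key = {row["fips"]: row for row in right}
    joined = []
    for row in left:
        key = county_fips(row["state_fips"], row["county_fips"])
        match = by_key.get(key)
        if match is not None:
            joined.append(
                {"fips": key, "avg_temp_f": row["avg_temp_f"], "deaths": match["deaths"]}
            )
    return joined


combined = join_on_fips(weather, mortality)
```

In PowerPivot the equivalent step would be cleaning the key column in each table so that a relationship can be created on it.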
Notes About the Demo The intent of this demo is to demonstrate that disparate sources of free data can be integrated in fun and interesting ways. My demos today are not intended to establish causation, correlation, or to draw any conclusions. When comparing disparate data sources, there is much more information needed to establish statistical conclusions.
Example #1: Corn Commodity Data and the Weather Data Source #1: Historical Weather Data from the US Department of Energy Data Source #2: Feed Grains Database from the USDA Data Source #3: Date table generated with a custom SQL script
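The custom SQL script behind the date table is not reproduced in these slides. As a rough sketch of what such a table contains, here is the same idea in Python (a swapped-in language, not the script used in the demo). The column names, the integer `date_key` format, and the 2011 date range are all assumptions chosen for illustration:

```python
from datetime import date, timedelta


def build_date_table(start, end):
    """Return one row per calendar day from start to end, inclusive."""
    rows = []
    current = start
    while current <= end:
        rows.append(
            {
                "date_key": int(current.strftime("%Y%m%d")),  # e.g. 20110101
                "date": current.isoformat(),
                "year": current.year,
                "quarter": (current.month - 1) // 3 + 1,
                "month": current.month,
                "month_name": current.strftime("%B"),  # locale-dependent
                "day_of_week": current.strftime("%A"),  # locale-dependent
            }
        )
        current += timedelta(days=1)
    return rows


# Hypothetical range; pick whatever span covers the data being related.
date_table = build_date_table(date(2011, 1, 1), date(2011, 12, 31))
```

Because every source in the example has a date, a table like this can act as the shared dimension that the weather and feed grains data both relate to.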
Example #2: CDC Mortality Healthcare and Economic Data Data Source #1: CDC Wonder Database Data Source #2: Bureau of Labor Statistics Data Source #3: Data.gov.sg – Singapore Labor Statistics Data Source #4: Date table generated with a custom SQL script
Demo Enjoy the Demo!
Problem with Free Data Today Bad – Multiple file and storage formats. Worse – Lack of standardized, conformed dimensions and defined, expected levels of granularity. Because of this there is demand for our work in BI! Worst – Introduction of GUIs that limit access to data.
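When two sources are published at different levels of granularity (for example, daily readings versus monthly figures), one common workaround is to roll the finer-grained source up to the coarser grain before relating them. A minimal sketch in Python, assuming hypothetical daily readings keyed by ISO date strings; the values are invented for illustration:

```python
from collections import defaultdict

# Hypothetical daily readings as (ISO date, value) pairs; values are invented.
daily_temps = [
    ("2011-07-01", 80.0),
    ("2011-07-02", 84.0),
    ("2011-08-01", 78.0),
]


def monthly_average(daily_rows):
    """Average daily values into one figure per 'YYYY-MM' month key."""
    buckets = defaultdict(list)
    for iso_date, value in daily_rows:
        buckets[iso_date[:7]].append(value)  # first 7 characters are 'YYYY-MM'
    return {month: sum(values) / len(values) for month, values in buckets.items()}


monthly_temps = monthly_average(daily_temps)
```

The monthly keys can then be matched against a source that only reports by month. Whether an average, a sum, or some other aggregate is appropriate depends on the measure.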
Final Thoughts The vast data that is currently out there is just the very beginning of what is to come Terabytes, Exabytes, Zettabytes Hive and Hadoop Tabular Model
Final Thoughts I envision a day, in our time, where global big data will seamlessly integrate to allow the future descendants of BI tools to: integrate statistical tools to red-flag anomalies and changes for integrated multi-dimensional big data; predict droughts, food shortages, political upheaval; find new causes and cures for disease; detect health and safety hazards; decode the building blocks of life (many-to-many); understand the Universe.
Follow-Up I will post an entry on www.gnetgroup.wordpress.com for follow-up. Post your own suggestions for sources of great data. Have a follow-up conversation: @GRBeaumont Greg.Beaumont@gnetgroup.com