Greg R Beaumont October 25, 2014 A New Frontier of Government & Open Data: What it Means to You & Microsoft BI Tools
About Me I have worked on Microsoft BI stack projects since 2007 Using Excel since 1998 (Undergraduate Thesis) Consulting for GNet Group since 2009: http://www.gnetgroup.com Education: St. Mary’s University of Minnesota – B.A. in Biology/Pre-Med Carlson School of Management (University of Minnesota) – MBA Self-Directed Learning – Business Intelligence @GRBeaumont linkedin.com/in/gregbeaumont
Agenda Open Data history, status, and examples Microsoft BI tools and Open Data Demos of Open Data with MSBI tools OpenFDA API (Excel, Power Query) Seattle Real Time Fire 911 Calls API (Excel, Power Query, PowerPivot, Power View, Power Map) Chicago Crime & Weather mashup (Excel, Power Pivot, Power View, Azure Machine Learning) The Future of Open Data Recommended Links Q & A
What is Open Data? “Open Data and content can be freely used, modified, and shared by anyone for any purpose” Opendefinition.org
The History & Evolution of Open Data Integrated Open Data Stored Data Open Data Scientific Data Government Data Organizational Data Governments Organizations Corporations BI Tools Linked Open Data Future Opportunities?
Open Data Goes Global
Open Data Site Examples United States Government – Data.gov UK Government – Data.gov.uk Australian Government – Data.gov.au City of Chicago – Data.cityofchicago.org Other Cities on Socrata platform – https://opendata.socrata.com/dataset/Socrata-Customer-Spotlights-Cities-Map/ivbs-benj World Bank Data - http://data.worldbank.org/ United Nations Data - http://data.un.org/ cBioPortal for Cancer Genomics (Memorial Sloan Kettering Cancer Center) - http://www.cbioportal.org/public-portal/ Many, many more…
Minneapolis Open Data The City of Minneapolis passed an Open Data policy in July of 2014: http://www.ci.minneapolis.mn.us/www/groups/public/@clerk/documents/webcontent/wcms1p-128978.pdf Open Twin Cities is a group of citizens involved with Open Data in Minnesota, and you can learn more about their group and meetings via Meetup.com: http://opentwincities.org/
Open Data & MSBI Tools Chicago Crime Open Data Chicago Crime Open Data Report NOAA Weather Open Data NOAA Weather Open Data Report
Open Data & MSBI Tools Chicago Crime Open Data Chicago Crime Open Data Report Chicago Crime & NOAA Weather Open Data Mashup Report NOAA Weather Open Data NOAA Weather Open Data Report
MSBI Tool Spectrum Level of complexity, effort, usage difficulty, know-how required: Least (Business User) to Most (BI Dev) Power View Reporting Services Excel & Excel Services PerformancePoint PowerPivot
Team BI Personal BI Organizational BI PowerPivot for SharePoint PowerPivot for Excel Organizational BI Analysis Services (OLAP) + SharePoint
Excel Family of Tools Excel Power Query Power Map Power Pivot Power View
Power Query ETL for Personal BI Connect to Numerous File Types and Connections CSV, Text, XLS, XLSX, etc. SQL, Azure, Access, Oracle, IBM, MySQL, etc. OData APIs HDFS, Azure HDInsight (Big Data) Transform, Filter, Unpivot, Concatenate Data Data Imported into an Excel PivotTable
Power Query Demo for an API Data in the demo is from an OpenFDA Adverse Drug Event API: https://open.fda.gov/ The URL used for the API in the demo is: https://api.fda.gov/drug/event.json?search=receivedate:[20040101+TO+20160101]&count=receivedate The URL specifies a count of events, by Receive Date, between Jan 1, 2004 and Jan 1, 2016 Other parameters for the API can be explored at: https://open.fda.gov/drug/event/
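For readers who want to try the same query outside Excel, here is a minimal Python sketch that rebuilds the demo URL and parses the count response. The function names are hypothetical, and the assumed response shape (a "results" list whose items carry "time" and "count" keys) should be checked against the OpenFDA documentation before relying on it. The sample payload is made up so the parsing step runs offline.

```python
import json
from urllib.request import urlopen

BASE_URL = "https://api.fda.gov/drug/event.json"


def build_count_url(start, end, field="receivedate"):
    """Build the count-by-date URL from the slide; dates are YYYYMMDD strings."""
    # Brackets and '+TO+' are left unencoded so the URL matches the slide.
    return f"{BASE_URL}?search={field}:[{start}+TO+{end}]&count={field}"


def parse_counts(payload):
    """Return (date, count) pairs from a decoded JSON response.

    Assumption: the payload has a "results" list whose items have
    "time" and "count" keys; adjust the names if the payload differs.
    """
    return [(row["time"], int(row["count"])) for row in payload.get("results", [])]


def fetch_counts(url):
    """Download and parse the counts (needs network access; not called here)."""
    with urlopen(url) as response:
        return parse_counts(json.load(response))


url = build_count_url("20040101", "20160101")

# Hypothetical sample payload, used only to show the parsing step offline.
sample = {"results": [{"time": "20040101", "count": 12},
                      {"time": "20040102", "count": 7}]}
counts = parse_counts(sample)
```

Calling fetch_counts(url) would run the live query; the (date, count) pairs it returns are the same rows Power Query loads into the worksheet in the demo.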
Power Query Demo for an API Go To Demo (Excel, Power Query)
Demo for a Real Time API Data in the demo is from the Seattle Real Time Fire 911 Calls API (Socrata SODA API): https://data.seattle.gov/Public-Safety/Seattle-Real-Time-Fire-911-Calls/kzjm-xkqj The URL used for the API in the demo is: http://data.seattle.gov/resource/kzjm-xkqj.csv?$order=datetime%20DESC&$limit=300&$where=datetime%20%3E%20'2014-07-01' The URL specifies the most recent 300 events since July 1, 2014, on rows with Address, Latitude, Longitude, Type, Date, Time, and Incident Number Other parameters for the API can be explored at: http://dev.socrata.com/
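The query string in the demo URL can also be assembled from its three parameters, which makes the order, limit, and filter easier to change. The sketch below is an illustration only: the function names are hypothetical, and the CSV column names in the sample are assumptions rather than the dataset's confirmed headers.

```python
import csv
import io
from urllib.parse import quote, urlencode

RESOURCE_URL = "http://data.seattle.gov/resource/kzjm-xkqj.csv"


def build_soda_url(limit=300, since="2014-07-01"):
    """Build the demo URL: newest rows first, capped at `limit`, after `since`."""
    params = {
        "$order": "datetime DESC",
        "$limit": limit,
        "$where": f"datetime > '{since}'",
    }
    # Keep '$' and the single quotes literal, and encode spaces as %20,
    # so the result matches the URL shown on the slide.
    query = urlencode(params, quote_via=quote, safe="$'")
    return f"{RESOURCE_URL}?{query}"


def parse_rows(csv_text):
    """Turn the CSV response body into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(csv_text)))


url = build_soda_url()

# Hypothetical sample response; the real column names may differ.
sample_csv = (
    "Address,Type,Datetime,Latitude,Longitude,Incident Number\n"
    "123 Example St,Aid Response,2014-07-02T08:15:00,47.60,-122.33,F000000001\n"
)
rows = parse_rows(sample_csv)
```

Changing the limit or the since date regenerates the URL without hand-editing the percent-encoding.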
Demo for a Real Time API Go To Demo (Excel, Power Query, Power View, Power Map)
Thoughts on APIs APIs are fantastic for organizations providing the data Limit the amount of data per query Efficient Track API usage APIs are challenging for BI professionals and Data Scientists “Give me all of the data!” Multiple queries required to get all of the data for cubes, data mining, etc.
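The "multiple queries" point can be sketched as a paging loop: request one page at a time and stop when a page comes back short. This is a generic illustration, not a specific API client. The fetch_page callable is hypothetical, and it stands in for a real request that passes a limit and an offset (SODA-style APIs expose these as $limit and $offset). The fake data source exists only so the loop can run offline.

```python
from typing import Callable, Iterator, List


def fetch_all(fetch_page: Callable[[int, int], List[dict]],
              page_size: int = 1000) -> Iterator[dict]:
    """Yield every row by calling fetch_page(limit, offset) until a short page."""
    offset = 0
    while True:
        page = fetch_page(page_size, offset)
        yield from page
        if len(page) < page_size:
            # A short (or empty) page means there is nothing left to request.
            break
        offset += page_size


# Hypothetical in-memory data source standing in for a rate-limited API.
FAKE_ROWS = [{"id": i} for i in range(2500)]
calls = []


def fake_fetch_page(limit, offset):
    """Record each call and return one slice of the fake data."""
    calls.append((limit, offset))
    return FAKE_ROWS[offset:offset + limit]


all_rows = list(fetch_all(fake_fetch_page, page_size=1000))
```

With 2,500 rows and a page size of 1,000, the loop issues three requests, which is the extra round-tripping a BI developer has to plan for when loading a full history.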
Demo for a Data Mashup Open Data Set #1: City of Chicago Crimes 2001 – Present: https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2 Open Data Set #2: Weather Data for Chicago From NOAA: http://www.ncdc.noaa.gov/cdo-web/ OR From CDC: http://wonder.cdc.gov/nasa-nldas.html Open Data Set #3: Census Data - Selected socioeconomic indicators in Chicago, 2008 – 2012 https://data.cityofchicago.org/Health-Human-Services/Census-Data-Selected-socioeconomic-indicators-in-C/kn9c-c2s2 Dimension Data Sets: Date Dimension – Homemade in Excel Time Dimension – Homemade in Excel
A Few Idiosyncrasies in the Data I only used Weather data for Chicago Midway Airport. A formal solution should have all of the weather stations mapped to community areas. Community Area Socioeconomic Indicator Census Data is for 2008-2012. The Crime Data is for 2001-2014. Population data for Community Areas is provided as two columns, for 2000 & 2010 Every attempt was made in the reports to account for the slightly offset years of some of the data For Q & A…How do we determine whether data with idiosyncrasies should be compared?
Chicago Crime & Weather Mashup Find, define, and analyze the data Pull Data into Excel using Power Pivot and/or Power Query Architect the conformed model in Power Pivot Explore the data in Excel Create dashboards and reports using Excel, Power View, Power Map Look for trends and relationships in the data by slicing and dicing BI reports and dashboards Evaluate the data for additional analysis using distribution charts, control charts, and Pareto charts in Excel Create a Machine Learning Model in Azure ML. Train and test the Model. Browse the results of the Azure ML model in Excel and review its accuracy in predicting future Chicago Crime trends from a weather forecast
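The core of the mashup (aggregate crimes by day, join to weather on the shared date key, then look for a relationship) can be sketched in a few lines of Python. This is an illustration of the idea and not the Power Pivot model used in the demo. All rows below are made-up sample values, the field names are hypothetical, and the coefficient describes only these fake numbers.

```python
from collections import Counter
from math import sqrt

# Made-up sample rows: 2, 3, and 5 incidents on three consecutive days.
crimes = [
    {"date": day, "type": "THEFT"}
    for day, n in [("2014-07-01", 2), ("2014-07-02", 3), ("2014-07-03", 5)]
    for _ in range(n)
]
# Made-up daily high temperatures keyed by the same date format.
weather = {"2014-07-01": 70.0, "2014-07-02": 80.0, "2014-07-03": 90.0}


def daily_counts(rows):
    """Aggregate incident rows to one count per date (the fact-table grain)."""
    return Counter(row["date"] for row in rows)


def join_on_date(counts, temps):
    """Join the two data sets on the date key; days with no crimes count as 0."""
    return [(day, counts.get(day, 0), temps[day]) for day in sorted(temps)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


joined = join_on_date(daily_counts(crimes), weather)
r = pearson([count for _, count, _ in joined], [temp for _, _, temp in joined])
```

The join only works because both sides share the same date key, which is why the Date dimension sits at the center of the model.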
Before We Start the Demo… Please note that while this demo has some entertaining implications, it is only a demo and conclusions should not be drawn from it: I do not know the accuracy of the Open Data sets for Chicago Crime or Weather I created this demo in a few days, and there has been no peer review Correlation Does Not Equal Causation If weather and crime rates correlate, it does not mean that one causes the other Weather might not cause crime rates to change, just as crime rates may not cause the weather to change. However, it is possible that they correlate and move together. Example: If Beer Sales are up at Minneapolis restaurants whenever Bratwurst Sales are up at Target Field in October, there is a correlation. The cause might be something totally separate, such as the Twins making the playoffs.
Demo for Chicago Crime & Weather Mashup Go To Demo (Excel, Power Query, Power View, Azure Machine Learning)
Chicago Crime and Weather Mashup Architecture (Quick Mashup for Demo) Community Areas (with Socioeconomic Indicator Data) Weather Chicago Crime Time Date
Community Areas (with Socioeconomic Indicator Data) Chicago Crime and Weather Mashup Architecture (for a Production Architecture) Community Areas (with Socioeconomic Indicator Data) Weather Date Time Chicago Crime
Video: Police Predict Crimes as in 'Minority Report' http://youtu.be/BVqqCckjH84
A Proposed Future Architecture for Open Data TODAY THE FUTURE? Data.gov Operational Data Store Conformed Data Warehouse or Architecture Data.gov.uk Scientific Data Minneapolis Open Data APIs Flat files OData Feeds Etc. Store History Data in One Place Physical Database Nonprofit Opportunity? Open Data Wiki? Conformed Keys Conformed Dimensions Standard Access Methods Logic Only or Physical? Direct Query Technology? Integrated Reports Self Service Reporting Reports by Non-IT folk Usable Data for the Masses
Where to start? Open Data Wiki-style approach ODS (Operational Data Store) with standardized keys and agreed-upon, standardized common dimensions (Date, Time, Zip Codes, Country-Region-City) Not every data set will conform, and many will have architectural differences to resolve (granularity, snowflake vs star schema, many-to-many, etc.) Integration into an Organization’s Data Warehouse Org Dim Date Table Org Dim Table Open Data Dim Table Open Data Fact Table #1 Org Fact Table #1 Org Fact Table #2 Org Dim Zip Code 5 Table Org Dim Table
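A standardized Date key is the simplest of these conformed dimensions to illustrate. The sketch below maps a few date formats that different open data feeds might use onto one integer key of the form YYYYMMDD, so that an open data fact table and an organization's own Date dimension can join on it. The list of formats and the function name are assumptions for illustration, not an agreed standard.

```python
from datetime import datetime

# Assumed input formats: compact (20040101), ISO date, and ISO date-time.
KNOWN_FORMATS = ("%Y%m%d", "%Y-%m-%d", "%Y-%m-%dT%H:%M:%S")


def to_date_key(raw):
    """Convert a date string in any known format to an integer YYYYMMDD key."""
    text = raw.strip()
    for fmt in KNOWN_FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue  # Not this format; try the next one.
        return parsed.year * 10000 + parsed.month * 100 + parsed.day
    raise ValueError(f"Unrecognized date format: {raw!r}")


# Three differently formatted source values conformed to one key style.
keys = [to_date_key(value) for value in
        ("20040101", "2014-07-01", "2014-07-01T13:45:00")]
```

Values that match none of the formats raise an error instead of being guessed at, which fits the point that not every data set will conform.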
Recommended Data Sets for Organizational Use US Bureau of Labor Statistics Open Data (Inflation, CPI, Unemployment, etc.): http://www.bls.gov/data/ Weather Data: From NOAA: http://www.ncdc.noaa.gov/cdo-web/ OR From CDC: http://wonder.cdc.gov/nasa-nldas.html USDA Food Price Outlook: https://catalog.data.gov/dataset/food-price-outlook USDA Feed and Grain Data: http://www.ers.usda.gov/data-products/feed-grains-database/feed-grains-custom-query.aspx Data.Medicare.Gov https://data.medicare.gov/ Gas Price Data https://catalog.data.gov/dataset/retail-gasoline-prices-all-grades-areas-and-formulations-4338e I could go on for dozens of slides!
Some Use Case Examples Weather – Almost every type of organization can be impacted Economic Data – Unemployment by geography, CPI, GDP, etc., are relevant to most for-profit companies Crime Data – Could be used for Government, Retail, Security, Fraud Detection Commodities Data – Food and Feed manufacturing, consumer spending habit projections, agricultural services Gas Prices – What organization doesn’t rely on logistics? Medicare Data – Claims and reimbursement by category, hospital, provider
Q & A @GRBeaumont linkedin.com/in/gregbeaumont