Data.gov Review of New and Existing Applications
Brand K. Niemann, Rich W. LaValley, Dr. W. Chris Hardy
Presentation to the Data Architecture Subcommittee (DAS)
September 10, 2009
Advanced Concepts and Integrated Systems (ACIS), SAIC
© 2008 Science Applications International Corporation. All rights reserved. SAIC and the SAIC logo are registered trademarks of Science Applications International Corporation in the U.S. and/or other countries.
Overview
- Review Data Sets
- Review and Demonstrate New and Existing Applications
- Feedback and Comments
Summary
- Tools, Data
- COTS & GOTS, Desktop & Web
- Review a variety of applications from a variety of sources
Review Datasets and Application Sources
Data Sources (June 2009)
Application Sources, By the Numbers
- Data Sources
- Data-gov.tw.rpi.edu: Applications: 11, Converted to RDF: 16
- Apps for America
- Gov2.0 Expo lic/schedule/presentations: 35 (5 Categories)
- Other: Palantir Government
See also Data.Gov dashboard The Giant Warehouse of Data
File Type Contributed
Influence the kind of applications that are developed.
Review the Challenge (Open Government)
Give us the tools and we can do it ourselves.
Lend your hand and your coding skills (Tim O’Reilly)
1. Be an Organizer
2. Volunteer skills, developers – parse a state – 50 states
3. Provide Specific Results, Work together
4. Visualize Data (Clay Johnson, Sunlight Labs)
5. Visually explore and interact with data to facilitate sense making (DAS, 9/10/2009)
Age of Visualization and Analysis
Emerging Trends in Data Visualization, July 30, 2009, DM Radio
- Heat Maps, Tag Clouds, Concepts Layers
- Widgets, Dashboards, Sliders, Filters
View of data over time is a story
- Heat Maps, Tag Clouds, Concepts Layers
- Widgets, Dashboards, Sliders, Filters
View of data over time is a story The Yield Curve
View of data over time is a story (temporal and geospatial characteristics) using-palantir
Efficient Access of Data Sources
Data Imaging
- Direct, ad hoc extraction of selected data elements from a native file
- Representation of the content of the extracted data as an integer matrix
  - Dates become integers in YYYYMMDD format
  - Times become the number of seconds after midnight
  - Character names/descriptions are assigned index values in a table
  - Numerical values are expressed as integers with an understood base
Benefits
- Minimal overhead in configuration for data handling
- Significant compression of working files without loss of content
- Substantial acceleration of data retrieval and analysis capabilities, achieved by:
  - Reduction of tests to integer (= 1 word) compares
  - Exploiting matrix-based processing efficiencies
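The encoding rules listed for Data Imaging can be illustrated with a minimal Python sketch. It is not the presenters' implementation: the function names, the `IndexTable` class, the sample records, and the scale factor used for numeric values are all hypothetical, chosen only to show how each field type could be reduced to an integer.

```python
from datetime import datetime


def encode_date(dt):
    """Date as an integer in YYYYMMDD format."""
    return dt.year * 10000 + dt.month * 100 + dt.day


def encode_time(dt):
    """Time as the number of seconds after midnight."""
    return dt.hour * 3600 + dt.minute * 60 + dt.second


def encode_scaled(value, scale):
    """Numeric value as an integer with an understood base (assumed: a fixed scale)."""
    return round(value * scale)


class IndexTable:
    """Conversion table that assigns an integer index to each character value."""

    def __init__(self):
        self.index_of = {}
        self.values = []

    def encode(self, value):
        if value not in self.index_of:
            self.index_of[value] = len(self.values)
            self.values.append(value)
        return self.index_of[value]

    def decode(self, index):
        return self.values[index]


def build_image(records, fault_codes):
    """Turn (timestamp, router serial, fault code) records into rows of integers."""
    image = []
    for timestamp, router_serial, fault_code in records:
        dt = datetime.strptime(timestamp, "%m/%d/%Y %H:%M:%S")
        image.append([
            encode_date(dt),
            encode_time(dt),
            router_serial,
            fault_codes.encode(fault_code),
        ])
    return image


# Hypothetical sample records, not taken from the presentation's data set.
records = [
    ("1/7/2007 07:00:15", 1001, "S26"),
    ("1/1/2007 04:00:02", 1002, "Z55"),
    ("1/7/2007 00:01:30", 1001, "Z55"),
]
fault_codes = IndexTable()
image = build_image(records, fault_codes)
```

Because every cell is an integer, a filter such as "fault code equals Z55" becomes a comparison against a single index value, and the conversion table lets the original text be recovered without loss.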
Efficient Access of Data Sources
Example: 479,921 records / 69,588,548 bytes
Columns: Date/Time of X-mission, Router Serial Number, Fault Code
1/7/2007 7:00: /7/2007 6:29: S26
1/1/2007 4:00: /1/2007 3:36: Z55
1/2/ :01: /2/ :01: J89
1/7/2007 0:01: /7/2007 0:01: Q66
1/7/ :00: /7/ :01: X44
1/5/2007 8:01: /5/2007 8:01: G49
1/7/ :00: /7/ :31: Z55
1/7/ :00: /7/ :36: Z55
1/5/ :01: /5/ :01: D39
1/5/2007 6:01: /5/2007 6:01: G29
...
Data Elements of Interest: 21,596,488 bytes
Image Generation: 44.5 secs
Image Size:
- 7,678,736 bytes plus 12,592 bytes in conversion tables
- 9:1 compression over total data set
- 2.8:1 compression of data sought
Query for Error Counts by Router:
- Direct: more than 1 minute
- Matrix-Based: 9.7 secs
- Image-Based: 1.2 secs
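As a rough illustration of the "Error Counts by Router" query and of the compression figures quoted above, the following Python sketch counts rows per router over a small integer matrix and recomputes the ratios from the slide's byte counts. The matrix contents, column layout, and function names are hypothetical. The sketch shows only a count over integer rows and does not reproduce the presenters' matrix-based or image-based implementations or their timings.

```python
from collections import Counter

# Hypothetical integer image: [date YYYYMMDD, seconds after midnight,
# router serial number, fault code index]. Not the presentation's data.
image = [
    [20070107, 25215, 1001, 0],
    [20070101, 14402, 1002, 1],
    [20070107, 90, 1001, 1],
    [20070105, 28860, 1003, 2],
]

ROUTER_COL = 2  # assumed column position of the router serial number


def error_counts_by_router(rows, router_col=ROUTER_COL):
    """Count fault records per router using integer values only."""
    return Counter(row[router_col] for row in rows)


def compression_ratio(original_bytes, image_bytes):
    """Ratio of the original size to the image size."""
    return original_bytes / image_bytes


counts = error_counts_by_router(image)

# Byte counts as reported on the slide (conversion tables not included).
total_ratio = compression_ratio(69_588_548, 7_678_736)
sought_ratio = compression_ratio(21_596_488, 7_678_736)

print(dict(counts))
print(f"total data set: {total_ratio:.2f}:1, data sought: {sought_ratio:.2f}:1")
```

The recomputed ratios come out at roughly 9.06:1 and 2.81:1, consistent with the 9:1 and 2.8:1 figures stated on the slide.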
-- Neighborhood to Live
Name, Source
- Crime in the US, FBI: Tableau 5 Application, Data.gov, New Application, State
- Related: Are you Safe?, Existing Application, City
- Related: Every Block, Existing Application, City
- Related: Density of firearms/Death Rate, Existing Application, State
-- Purchasing a Car, Planning a Vacation
Name, Source
- Fuel Efficient Cars: Heat Map Explorer (COTS), New Application, Federal
- Hurricane data (1990 – 2006) (Related): Tableau 5 (COTS), New Application, Federal
See other examples
Discussion and Feedback
-- Other Backup
Name, Source
- World Copper Smelters er-fLD.kml, Data.gov, Existing Application, World Copper Smelters.bmp
- USGS Oil and Gas Assessment Database, Data.gov, Existing Application, World Petroleum Assessment.bmp
-- Emerging Technologies Backup