The BOP (Billion Object Platform) and WorldMap / Dataverse Integration
Harvard Center for Geographic Analysis
Tuesday, July 12, 2016
Ben Lewis, Mercè Crosas, Raman Prasad
Billion Object Platform - funded by Sloan
–General purpose, open source, streaming, big spatio-temporal data exploration and extraction
–Performs basic sentiment analysis
–Runs on commodity hardware and software
–Built on Spatial Lucene and Solr
–Exposes all functions through an API
Other geospatial visualization work (funded by the Boston Area Research Initiative)
1. Spatial stamping in Billion Object Platform
2. Table visualization
–Tables with well-defined area columns (Census codes)
–Tables with lat/longs
3. Geospatial data visualization
–Shapefiles
The “Billion Streaming Geo-tweets” dataset
A new dataset type in Dataverse which supports real-time streaming and visual, interactive exploration.
The content is geo-tweets (tweets containing GPS coordinates from the originating device).
Currently 1-2% of tweets are geo-tweets, about 8 million per day.
The CGA has been harvesting geo-tweets since
Main components:
–1) Geo-tweet harvesting and archiving system.
–2) Software and hardware platform to support interactive exploration of a billion spatio-temporal objects.
–3) API to provide query access to the archive from Dataverse.
–4) Client-side tools for querying and visualizing the contents of the archive, extracting subsets, and pushing them to Dataverse.
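As a rough illustration of component 1, the sketch below filters a tweet stream down to geo-tweets. It assumes tweets arrive as dictionaries in the classic Twitter JSON layout, where `coordinates` holds a GeoJSON Point in [longitude, latitude] order. The function names and the shape of the archive record are hypothetical and are not taken from the CGA harvester.

```python
from typing import Iterable, Iterator, Optional, Tuple


def geo_point(tweet: dict) -> Optional[Tuple[float, float]]:
    """Return (lon, lat) if the tweet carries a GeoJSON Point, else None."""
    coords = tweet.get("coordinates")
    if not isinstance(coords, dict) or coords.get("type") != "Point":
        return None
    pair = coords.get("coordinates")
    if not isinstance(pair, (list, tuple)) or len(pair) != 2:
        return None
    lon, lat = pair
    # Discard coordinates outside the valid longitude/latitude range.
    if not (-180 <= lon <= 180 and -90 <= lat <= 90):
        return None
    return float(lon), float(lat)


def harvest(stream: Iterable[dict]) -> Iterator[dict]:
    """Yield one archive record per tweet that has a device coordinate."""
    for tweet in stream:
        point = geo_point(tweet)
        if point is None:
            continue  # not a geo-tweet, so it is not archived
        yield {
            "id": tweet.get("id_str"),
            "created_at": tweet.get("created_at"),
            "text": tweet.get("text", ""),
            "lon": point[0],
            "lat": point[1],
        }
```

Tweets without a point coordinate are dropped, which matches the slide's definition of a geo-tweet.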
The “Billion Streaming Geo-tweets” dataset
What does a landing page look like when…
–The data source is external to Dataverse
–The data source is continuously being updated
–The data does not consist of “files” in the traditional Dataverse sense
The BOP: streaming big data… A closer look at the Billion Streaming Geotweets
API to streaming geo-tweets
Built on Solr
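One way an API built on Solr could request a heatmap of matching tweets is through Solr's heatmap faceting parameters (facet.heatmap, facet.heatmap.geom, facet.heatmap.gridLevel). The sketch below only builds the request URL. The core URL, the field names `geo`, `text` and `created_at`, and the helper names are assumptions for illustration and do not describe the actual BOP API.

```python
from urllib.parse import urlencode


def heatmap_params(field, bbox, q="*:*", time_field=None, time_range=None,
                   grid_level=None):
    """Build Solr query parameters for a heatmap facet over a bounding box.

    bbox is (west, south, east, north) in degrees.
    """
    west, south, east, north = bbox
    params = [
        ("q", q),
        ("rows", "0"),  # only the facet is needed, not the documents
        ("wt", "json"),
        ("facet", "true"),
        ("facet.heatmap", field),
        # Solr rectangle-range syntax: ["minX minY" TO "maxX maxY"]
        ("facet.heatmap.geom", f'["{west} {south}" TO "{east} {north}"]'),
    ]
    if grid_level is not None:
        params.append(("facet.heatmap.gridLevel", str(grid_level)))
    if time_field and time_range:
        start, end = time_range
        # Standard Solr range filter on a date field (hypothetical field name).
        params.append(("fq", f"{time_field}:[{start} TO {end}]"))
    return params


def heatmap_url(select_url, **kwargs):
    """Return the full request URL for a Solr /select handler."""
    return select_url + "?" + urlencode(heatmap_params(**kwargs))


# Example: world-wide heatmap of tweets mentioning a term during one month.
url = heatmap_url(
    "http://localhost:8983/solr/tweets/select",  # hypothetical core
    field="geo",
    bbox=(-180, -90, 180, 90),
    q="text:brexit",
    time_field="created_at",
    time_range=("2016-06-01T00:00:00Z", "2016-07-01T00:00:00Z"),
)
```

Setting rows to 0 keeps the response small, since a heatmap client needs only the grid counts.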
A dataset landing page which enables data exploration and extraction
A client which enables interactive exploration in multiple dimensions
Demo of Big Data exploration using predecessors of BOP: Japan Data Archive and HHypermap
–Japan Data Archive
–HHypermap Distributed Archive
2) Table Geocoding
Work funded by NSF. The goal is to enable Dataverse tables with well-known geographic encodings to be easily visualized as maps.
Pick the “Geospatial Data Type”
Choose (a) WorldMap “Join Layer” & (b) File column to join
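The join step can be pictured as matching a column of the uploaded table against a key attribute of the chosen join layer. The sketch below is a minimal in-memory version of that idea. It assumes the layer is available as GeoJSON-style features, and the function and parameter names are hypothetical; it is not the WorldMap implementation.

```python
def join_table_to_layer(rows, join_column, layer_features, layer_key):
    """Attach each table row to the layer feature whose key matches.

    rows: list of dicts (one per table row)
    layer_features: list of GeoJSON-style features with "properties" and "geometry"
    Returns (joined_features, unmatched_rows).
    """
    # Codes are compared as trimmed strings so leading zeros are preserved.
    index = {str(f["properties"][layer_key]).strip(): f for f in layer_features}
    joined, unmatched = [], []
    for row in rows:
        feature = index.get(str(row[join_column]).strip())
        if feature is None:
            unmatched.append(row)  # no geography found for this code
            continue
        joined.append({
            "type": "Feature",
            "geometry": feature["geometry"],
            "properties": {**feature["properties"], **row},
        })
    return joined, unmatched
```

Returning the unmatched rows separately makes it easy to report how much of the table could not be mapped.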
Table visualized
Apply cartographic classification
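Cartographic classification groups a numeric attribute into a small number of classes, each drawn with its own symbol. The sketch below shows two common schemes, equal interval and quantile, as plain functions. Which schemes WorldMap offers is not stated in the slides, so these are generic illustrations with hypothetical names.

```python
import bisect


def equal_interval_breaks(values, n_classes):
    """Class boundaries that split the value range into equal-width bins."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_classes
    return [lo + step * i for i in range(1, n_classes)]


def quantile_breaks(values, n_classes):
    """Class boundaries that put roughly the same number of values in each bin."""
    ordered = sorted(values)
    return [ordered[(len(ordered) * i) // n_classes] for i in range(1, n_classes)]


def classify(value, breaks):
    """Return the zero-based class index of a value given sorted breaks."""
    return bisect.bisect_right(breaks, value)
```

Equal interval is easy to read in a legend, while quantile classes avoid nearly empty classes when the data are skewed.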
Map symbolized
Map saved back to Dataverse
Thank You Ben Lewis
Phase II? Use Polygons to Symbolize Big Data
Perform big data query. Find 10 million tweets mentioning Brexit.
(Geographic region and sentiment stamping)
Geographic stamping: As tweets stream in, they will be stamped with census block, census tract, and Admin 2 codes.
–To support aggregations by census or admin unit as well as by heatmap grid.
Sentiment stamping: As tweets stream in, a basic attempt will be made to determine sentiment.
–To support heatmaps representing average sentiment values as well as count values.
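The stamping idea can be sketched in a few lines: each incoming tweet gets a region code from a point-in-polygon test and a sentiment score, and the stamps are then aggregated per region. Everything below is a toy illustration under stated assumptions. Regions are simple rings of (lon, lat) pairs, the word lists are a hypothetical lexicon, and a production system would likely use a spatial index and a proper sentiment model instead.

```python
from collections import defaultdict

# Hypothetical toy lexicon, for illustration only.
POSITIVE = {"good", "great", "love", "happy"}
NEGATIVE = {"bad", "awful", "hate", "sad"}


def point_in_ring(lon, lat, ring):
    """Ray-casting test: is the point inside the polygon ring?"""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        if (yi > lat) != (yj > lat):  # edge crosses the horizontal ray
            x_cross = xi + (lat - yi) * (xj - xi) / (yj - yi)
            if lon < x_cross:
                inside = not inside
        j = i
    return inside


def sentiment(text):
    """Mean of +1/-1 lexicon hits, or 0.0 when no word matches."""
    words = [w.strip(".,!?#@").lower() for w in text.split()]
    hits = [1 for w in words if w in POSITIVE] + [-1 for w in words if w in NEGATIVE]
    return sum(hits) / len(hits) if hits else 0.0


def stamp(tweet, regions):
    """Return a copy of the tweet with 'region' and 'sentiment' added."""
    stamped = dict(tweet)
    stamped["region"] = next(
        (code for code, ring in regions.items()
         if point_in_ring(tweet["lon"], tweet["lat"], ring)),
        None,
    )
    stamped["sentiment"] = sentiment(tweet["text"])
    return stamped


def aggregate(stamped_tweets):
    """Count and average sentiment per region code."""
    totals = defaultdict(lambda: [0, 0.0])
    for t in stamped_tweets:
        if t["region"] is not None:
            totals[t["region"]][0] += 1
            totals[t["region"]][1] += t["sentiment"]
    return {code: {"count": n, "avg_sentiment": s / n}
            for code, (n, s) in totals.items()}
```

Because the stamps are computed once at ingest time, later aggregations by region reduce to a group-by on the stored code rather than a repeated spatial test.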
Geo-tweet Dataverse