Published by Claire French. Modified over 9 years ago.
1
Streaming GIS using PostGIS & SQLstream » Julian Hyde, Chief Architect » Sunil Mujumdar, Founding Engineer
2
The Data Crunch » Data volumes rising fast » Human-originated data (e.g. e-commerce purchases) rising fast » Machine-generated data (e.g. e-commerce events and network packets) rising faster » Sensor data (e.g. GIS-enabled mobile phone, road sensors) faster still » Every business needs answers with lower latency » Every significant problem is massively parallel & distributed: » Geographically distributed organizations » Multiple boxes for scale » Exploit multiple cores
3
The world is no longer flat » In a data warehouse, all records are equally important » In many real-world applications, recent and nearby events are much more important [Diagram: Time and Space axes, centered on Now and Here]
4
Case study: Mozilla
5
Data management is hard » If you make a mistake, the system won’t be fast enough » Can’t afford to lose data » New technologies are very difficult to use » MapReduce » NoSQL » Multi-threaded programming in Java, C++, Erlang, Scala, … » Collaborate, interoperate, evolve
6
SQL – life in the old dinosaur yet » Widely spoken » Rich » Orthogonal » Declarative » Tune your system without changing your logical schema » Apps don’t interfere with each other » Adaptive » Route around failure » Exploit available resources » Make tradeoffs to meet QoS goals
7
Streaming SQL: example #1 Tweets about this conference: » SELECT STREAM ROWTIME, author, text FROM Tweets WHERE text LIKE '%#PGWest%'
8
Streaming SQL basics » Streams: » CREATE STREAM Tweets ( author VARCHAR(20), text VARCHAR(140)); » Relational operators have streaming counterparts: » Project (SELECT) » Filter (WHERE) » Union » Join » Aggregation (GROUP BY) » Windowed aggregation (e.g. SUM(x) OVER window) » Sort (ORDER BY)
9
Streaming SQL: example #2 » Each minute, return the number of clicks on each web page: » SELECT STREAM ROWTIME, uri, COUNT(*) FROM PageRequests GROUP BY FLOOR(ROWTIME TO MINUTE), uri
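The per-minute GROUP BY above can be pictured as a tumbling window: rows arrive in ROWTIME order, and a minute's counts are emitted once a row from a later minute shows up. The following is a minimal Java sketch of that idea, not SQLstream's implementation. The class and field names (MinuteClickCounter, PageRequest, MinuteCount) are hypothetical, and the sketch assumes rowtimes are epoch milliseconds that never decrease.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MinuteClickCounter {
    private static final long MINUTE_MS = 60_000L;

    /** Hypothetical input row: event time (epoch millis) and requested URI. */
    public static final class PageRequest {
        public final long rowtimeMillis;
        public final String uri;
        public PageRequest(long rowtimeMillis, String uri) {
            this.rowtimeMillis = rowtimeMillis;
            this.uri = uri;
        }
    }

    /** Hypothetical output row: start of the minute, URI, number of clicks. */
    public static final class MinuteCount {
        public final long minuteStartMillis;
        public final String uri;
        public final long clicks;
        public MinuteCount(long minuteStartMillis, String uri, long clicks) {
            this.minuteStartMillis = minuteStartMillis;
            this.uri = uri;
            this.clicks = clicks;
        }
    }

    /** Counts clicks per (minute, uri); input must be sorted by rowtime. */
    public static List<MinuteCount> countPerMinute(List<PageRequest> requests) {
        List<MinuteCount> out = new ArrayList<>();
        Map<String, Long> counts = new TreeMap<>(); // sorted by URI for stable output
        long currentMinute = Long.MIN_VALUE;
        for (PageRequest r : requests) {
            // Equivalent of FLOOR(ROWTIME TO MINUTE).
            long minute = Math.floorDiv(r.rowtimeMillis, MINUTE_MS) * MINUTE_MS;
            if (minute < currentMinute) {
                throw new IllegalArgumentException("rowtimes must be non-decreasing");
            }
            if (minute != currentMinute) {
                // A later minute has started, so the previous one is complete.
                flush(currentMinute, counts, out);
                currentMinute = minute;
            }
            counts.merge(r.uri, 1L, Long::sum);
        }
        flush(currentMinute, counts, out); // emit the final, still-open minute
        return out;
    }

    private static void flush(long minute, Map<String, Long> counts, List<MinuteCount> out) {
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            out.add(new MinuteCount(minute, e.getKey(), e.getValue()));
        }
        counts.clear();
    }
}
```

The sketch relies on sorted input to know when a minute is finished, which is why monotonic rowtimes matter for streaming aggregation.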
10
Streaming SQL: Time » ROWTIME pseudo-column » Provided by source application or generated by system » WINDOW » Present in regular SQL (e.g. SQL:2003) but more important in streaming SQL » Defines a ‘working set’ for streaming JOIN, GROUP BY, windowed aggregation » Monotonicity (“sortedness”) » Prerequisite for certain streaming operations
11
Streaming SQL: example #3 Find all orders from New York that shipped within an hour: » CREATE VIEW compliant_orders AS SELECT STREAM * FROM orders OVER sla JOIN shipments ON orders.id = shipments.orderid WHERE city = 'New York' WINDOW sla AS (RANGE INTERVAL '1' HOUR PRECEDING)
12
Streaming SQL: more » Usual advanced SQL stuff: » Schemas, views, tables » Ability to nest queries » User-defined functions and transforms » Interoperate with 3rd-party systems » Adapters make external systems look like read/write streams » Push/pull » Active/passive » Interact with databases: » As source (change-data capture) » Lookup (e.g. GIS lookup; normalizing current data using historic norms) » As sink (populating the data warehouse)
13
Real-time road traffic monitoring » 1. Map vehicle positions to road segments » 2. Compute average speed of each road segment » 3. Detect traffic incidents » Road network through New South Wales, Australia: line segments representing sections of freeway » Vehicle position: vehicle id, latitude, longitude, speed, timestamp » 15,000 vehicles with sensors » Each vehicle transmits once per minute
15
Road traffic analytics architecture [Diagram labels: position log files (POSDATA_n.txt ... POSDATA_nnn.txt), Position Log Stream, Parse, RoadInfo Lookup, PostGIS, SQLstream, Traffic Analytics, Dashboard, Google Earth] Copyright © 2010 SQLstream, Inc.
16
Gathering input data
-- Define the Foreign Stream for reading log data
CREATE OR REPLACE FOREIGN STREAM "PositionLogStream" (
    MESSAGE VARCHAR(132))
SERVER "PositionLogReader"
OPTIONS (file_pattern 'POSDATA.*\.txt')
DESCRIPTION 'Raw Vehicle Position Log Stream';
17
Problem 1: Map vehicle positions to road segments
SELECT STREAM segmentId, roadElementId,
    vehiclePositionX, vehiclePositionY, velocityX, velocityY
FROM (TABLE RoadInfoLookup(
    CURSOR (SELECT STREAM * FROM VehiclePositions),
    'postgis_source.properties',  -- data source properties
    'road_segment',               -- table name
    'v_latitude',                 -- latitude column name
    'v_longitude'))               -- longitude column name
18
SQLstream user-defined transform (UDX)
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class RoadInfoLookupUdx {
    // Reads vehicle positions from the input cursor and writes one output row per position.
    public static void RoadInfoLookup(
            ResultSet trafficInfoIn,
            PreparedStatement roadSegmentInfoOut) throws SQLException {
        while (trafficInfoIn.next()) {
            double latitude = trafficInfoIn.getDouble(1);
            double longitude = trafficInfoIn.getDouble(2);
            // Look up the road element for this position in PostGIS.
            int roadElementId = getInfo(latitude, longitude);
            roadSegmentInfoOut.setDouble(1, latitude);
            roadSegmentInfoOut.setDouble(2, longitude);
            roadSegmentInfoOut.setInt(3, roadElementId);
            // etc.
            roadSegmentInfoOut.executeUpdate();
        }
    }
}
19
Helper method to access PostGIS
private static int getInfo(double latitude, double longitude) throws SQLException {
    // First time through, prepare query.
    if (pstmt == null) {
        pstmt = connection.prepareStatement(
            "select … from road_segments where ST_Distance(uts_geom, ST_GeomFromText(?, srid)) < width");
    }
    // Bind the vehicle position as WKT text. Assumption: POINT(x y) means longitude, then latitude.
    pstmt.setString(1, "POINT(" + longitude + " " + latitude + ")");
    try (ResultSet rset = pstmt.executeQuery()) {
        rset.next();
        return rset.getInt(1);
    }
}
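The lookup above asks PostGIS for a road segment whose geometry lies within the segment's width of the vehicle position. To show what that predicate is doing geometrically, here is a rough Java sketch that computes the planar distance from a point to a line segment and picks the first segment within its width. It is only an illustration under stated assumptions: coordinates are treated as planar, the array layout {id, ax, ay, bx, by, width} is hypothetical, and a real deployment would leave this work to PostGIS and its spatial indexes.

```java
public class NearestSegmentSketch {
    /** Planar distance from point (px, py) to the segment from (ax, ay) to (bx, by). */
    public static double distanceToSegment(double px, double py,
                                           double ax, double ay,
                                           double bx, double by) {
        double dx = bx - ax;
        double dy = by - ay;
        double len2 = dx * dx + dy * dy;
        // Projection parameter of the point onto the segment's line (0 for a degenerate segment).
        double t = (len2 == 0.0) ? 0.0 : ((px - ax) * dx + (py - ay) * dy) / len2;
        // Clamp so the closest point stays on the segment itself.
        t = Math.max(0.0, Math.min(1.0, t));
        double cx = ax + t * dx;
        double cy = ay + t * dy;
        return Math.hypot(px - cx, py - cy);
    }

    /**
     * Returns the id of the first segment whose distance to the point is less than
     * its width, or -1 if none matches. Each row is {id, ax, ay, bx, by, width}
     * (a hypothetical layout used only for this sketch).
     */
    public static int findSegment(double px, double py, double[][] segments) {
        for (double[] s : segments) {
            double d = distanceToSegment(px, py, s[1], s[2], s[3], s[4]);
            if (d < s[5]) {
                return (int) s[0];
            }
        }
        return -1;
    }
}
```

A linear scan like this costs one distance computation per segment per vehicle report, which is one reason to delegate the lookup to a spatial database.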
20
Problem 2: Compute average speed
» Streaming query computes average over 15 minute sliding window
» Results are written to Google Earth file (and elsewhere)
-- Average road element Speeds
CREATE OR REPLACE VIEW "EstimatedReSpeeds"
DESCRIPTION 'Estimated RE Speeds' AS
SELECT STREAM "roadElementID",
    AVG("vSpeed") OVER "last15" AS "reSpeed",
    "reSpeedLimit"
FROM "Stage3"
WINDOW "last15" AS (
    PARTITION BY "roadElementID"
    RANGE INTERVAL '15' MINUTE PRECEDING);
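To make the sliding window concrete, the following Java sketch keeps, for each road element, the readings from the preceding 15 minutes and returns their running average as each new reading arrives. It is an illustration of the windowing idea rather than how SQLstream evaluates the view. SlidingSpeedAverage and its method names are hypothetical, rowtimes are assumed to be non-decreasing epoch milliseconds, and a row exactly 15 minutes old is assumed to still be inside the window.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

public class SlidingSpeedAverage {
    private static final long WINDOW_MS = 15 * 60_000L;

    /** One speed reading retained in a window. */
    private static final class Sample {
        final long rowtimeMillis;
        final double speed;
        Sample(long rowtimeMillis, double speed) {
            this.rowtimeMillis = rowtimeMillis;
            this.speed = speed;
        }
    }

    /** Per-road-element state: the readings in the window and their sum. */
    private static final class Window {
        final Deque<Sample> samples = new ArrayDeque<>();
        double sum;
    }

    // One window per road element, mirroring PARTITION BY "roadElementID".
    private final Map<Integer, Window> windows = new HashMap<>();

    /**
     * Adds a reading and returns the average speed for that road element over
     * the preceding 15 minutes, including the new reading.
     */
    public double add(int roadElementId, long rowtimeMillis, double speed) {
        Window w = windows.computeIfAbsent(roadElementId, k -> new Window());
        w.samples.addLast(new Sample(rowtimeMillis, speed));
        w.sum += speed;
        // Evict readings that have fallen out of the window. The new reading
        // itself is never evicted, so the deque cannot become empty here.
        while (w.samples.peekFirst().rowtimeMillis < rowtimeMillis - WINDOW_MS) {
            w.sum -= w.samples.removeFirst().speed;
        }
        return w.sum / w.samples.size();
    }
}
```

Unlike the per-minute GROUP BY, this emits one result per input row, which matches how a windowed aggregate behaves in the view above.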
21
Problem 3: Incident detection
» Use Bollinger bands to detect outliers (3 standard deviations = 99.7%)
CREATE OR REPLACE VIEW "Incidents"
DESCRIPTION 'Detect incidents' AS
SELECT STREAM...
FROM (
    SELECT STREAM "roadElementID",
        AVG("vSpeed") OVER "lastMinute" AS "avgSpeedLastMinute",
        AVG("vSpeed") OVER "last15" AS "avgSpeedLast15",
        STDDEV("vSpeed") OVER "last15" AS "stddevSpeedLast15",
        "reSpeedLimit",...
    FROM "Stage3"
    WINDOW "last15" AS (PARTITION BY "roadElementID"
            RANGE INTERVAL '15' MINUTE PRECEDING),
        "lastMinute" AS (PARTITION BY "roadElementID"
            RANGE INTERVAL '1' MINUTE PRECEDING)
)
WHERE "avgSpeedLastMinute" < "avgSpeedLast15" - 3 * "stddevSpeedLast15";
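The WHERE clause above flags a road element when its last-minute average speed drops below the lower Bollinger band, that is, the 15-minute average minus three standard deviations. The Java sketch below applies the same test to an array of readings for one road element. It is a simplified illustration: BollingerIncidentCheck is a hypothetical name, the last-minute readings are taken to be the trailing entries of the 15-minute array, and the sample standard deviation (n - 1 denominator) is an assumption, since the slides do not say which variant STDDEV computes.

```java
public class BollingerIncidentCheck {
    /** Arithmetic mean of xs[from..to). */
    public static double mean(double[] xs, int from, int to) {
        double sum = 0.0;
        for (int i = from; i < to; i++) {
            sum += xs[i];
        }
        return sum / (to - from);
    }

    /** Sample standard deviation (n - 1 denominator); an assumption of this sketch. */
    public static double sampleStddev(double[] xs) {
        double m = mean(xs, 0, xs.length);
        double sumSquares = 0.0;
        for (double x : xs) {
            sumSquares += (x - m) * (x - m);
        }
        return Math.sqrt(sumSquares / (xs.length - 1));
    }

    /**
     * last15 holds the speeds seen in the last 15 minutes in arrival order;
     * the final lastMinuteCount entries are the ones from the last minute.
     * Returns true when the last-minute average is below the lower band.
     */
    public static boolean isIncident(double[] last15, int lastMinuteCount) {
        if (last15.length < 2 || lastMinuteCount < 1 || lastMinuteCount > last15.length) {
            return false; // not enough data to form a band
        }
        double avg15 = mean(last15, 0, last15.length);
        double lowerBand = avg15 - 3.0 * sampleStddev(last15);
        double avgLastMinute = mean(last15, last15.length - lastMinuteCount, last15.length);
        return avgLastMinute < lowerBand;
    }
}
```

Because the slow readings are themselves part of the 15-minute window, they widen the band as well as pulling the recent average down, so a window with very few readings may never trigger.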
22
Summary Emergence of data problems that are: – Real-time – Geospatial – High throughput In particular, Intelligent Transport Systems (ITS) analytics Need to combine streaming, GIS and relational (SQL) Technology synergy: – PostGIS is a mature GIS implementation, integrates SQL with GIS – SQLstream integrates SQL with streaming
23
Any questions?
24
Thank you for attending! Further reading: » “Data in Flight” by Julian Hyde (Communications of the ACM, Vol. 53 No. 1, Pages 48-52) Blogs: » http://www.sqlstream.com/blog » http://julianhyde.blogspot.com Twitter: » @julianhyde » @sunil_mujumdar