Telegraph Status
Joe Hellerstein
Overview
Telegraph Design Goals, Current Status
First Application: FFF (Deep Web)
Budding Application: Traffic Sensor Data
Moving Forward
Telegraph: Adaptive Dataflow
Dataflow
  – Siphon data from the “deep web”
  – Harness data streaming from sensors/traces
  – Flow through code
  – The API and Architecture for ubiquitous computing
Why adaptive?
  – Sensor nets & wide area internet: volatile!
  – Like Telegraph Avenue, need to roll w/the changes
  – Adaptive techniques for routing data to machines & code
Demos Delivered!
The big push: FFF Election 2000 demo 10/2000
  – Got Telegraph off the ground and live
  – Shows power of analysis & integration on web
      It’s not just search any more!
  – Served thousands of live, long-running queries
Initial Sensor Demo
  – UCB Institute for Transportation Studies data
  – Various web cams
  – Project for SIMS InfoVis class
      A harness for more sensor-oriented work in Telegraph
Telegraph v1 (alpha) infrastructure
Single-site (multi-source) dataflow engine
  – All Java: some lessons here (paper in preparation)
Numerous dataflow operators built
  – TeSS (Telegraph Screen Scraper)
  – File reader
  – Relational ops (filters, joins, grouping, aggregation)
  – Some simple sequence analysis ops
  – Eddy: adaptive flow ordering operator
Key architectural theme: gain adaptivity via new operators
  Not changes to dataflow infrastructure!
  This is our upgrade strategy to parallelism/distribution
SQL-to-Dataflow parser
  – SQL is a fine dataflow language for many tasks
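The "adaptive flow ordering" idea behind the Eddy can be illustrated with a small sketch. This is not Telegraph's Java implementation: the names `FilterOp` and `eddy` are hypothetical, only filter operators are handled, and the routing policy (send each tuple to the pending filter with the lowest observed pass rate) is a simplifying assumption standing in for whatever policy the real operator uses.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, List


@dataclass(eq=False)
class FilterOp:
    """Hypothetical selection operator that tracks how many tuples it passes."""
    name: str
    predicate: Callable[[Dict], bool]
    seen: int = 0
    passed: int = 0

    def pass_rate(self) -> float:
        # Smoothed estimate so an operator with no history starts at 0.5.
        return (self.passed + 1) / (self.seen + 2)

    def apply(self, row: Dict) -> bool:
        self.seen += 1
        ok = self.predicate(row)
        self.passed += int(ok)
        return ok


def eddy(rows: Iterable[Dict], ops: List[FilterOp]) -> Iterator[Dict]:
    """Route each tuple through all filters, choosing the order per tuple.

    Assumed policy: visit the not-yet-applied filter that has dropped the
    largest share of tuples so far, so doomed tuples are discarded early.
    """
    for row in rows:
        pending = list(ops)
        alive = True
        while alive and pending:
            op = min(pending, key=FilterOp.pass_rate)
            pending.remove(op)
            alive = op.apply(row)
        if alive:
            yield row
```

Because the order is re-decided for every tuple from running statistics, no static plan is fixed up front; if a filter's selectivity drifts during a long-running query, the routing follows it. All adaptivity lives inside the operator, which matches the slide's point that the dataflow infrastructure itself does not change.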
Upcoming Telegraph Operators
Goal: Further adaptivity through competition
  – Multiple mirrored sources
      Handle rate changes, failures, parallelism
  – Multiple alternate operators
  – STeM operator manages tradeoffs
      STate Module, unifies caches, rendezvous buffers, join state
      Competitive sources/operators share building/using STeMs
Vijayshankar Raman
[Figure labels: static dataflow, eddy + stem]
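One way to picture a STeM holding "join state" is as a shared index that arriving tuples are built into and probed against. The sketch below is an illustration under assumptions, not the Telegraph operator: the class name `SteM`, its `build`/`probe` methods, and the `symmetric_join` driver are hypothetical, and only an equality join on a single key is shown.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


class SteM:
    """Hypothetical state module: an in-memory index over one input's tuples."""

    def __init__(self, key: str) -> None:
        self.key = key
        self.index: Dict[object, List[Dict]] = defaultdict(list)

    def build(self, row: Dict) -> None:
        # Store the tuple so that later arrivals from the other input can find it.
        self.index[row[self.key]].append(row)

    def probe(self, value: object) -> List[Dict]:
        # Return stored tuples matching the value; .get avoids creating entries.
        return list(self.index.get(value, []))


def symmetric_join(
    stream: Iterable[Tuple[str, Dict]], left_key: str, right_key: str
) -> Iterator[Tuple[Dict, Dict]]:
    """Join an interleaved stream of ("L", row) / ("R", row) items.

    Each tuple is built into its own side's SteM, then probes the other
    side's SteM, so results appear as soon as both halves have arrived.
    """
    stems = {"L": SteM(left_key), "R": SteM(right_key)}
    for side, row in stream:
        own = stems[side]
        other = stems["R" if side == "L" else "L"]
        own.build(row)
        for match in other.probe(row[own.key]):
            yield (row, match) if side == "L" else (match, row)
```

Keeping the state in a separate module rather than inside a join operator is what would let several competing sources or operators build into and probe the same store, as the slide describes.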
Telegraph Nuts and Bolts 2
Parallelism & Fault Tolerance
  – Continuous/long-running flows need fault-tolerance
  – Big flows need parallelism
      Adaptive Load-Balancing req’d
  – FLUX operator: Exchange plus…
      Adaptive flow partitioning
        – River
      Mobile operator state for full Load Balancing
      Replicated flows & redundant state (RAID for operators)
      Load rebalancing vs. vulnerability
Mehul Shah & Sirish Chandrasekaran
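A rough sketch of "Exchange plus adaptive flow partitioning with mobile operator state" follows. It is an assumption-laden illustration, not the FLUX design: the class `AdaptivePartitioner`, the bucket scheme, and the greedy rebalancing rule are all hypothetical. Keys are hashed into many small buckets, each bucket (with its per-key state) is owned by one worker, and rebalancing reassigns a bucket from the most loaded worker to the least loaded one.

```python
import zlib
from collections import Counter
from typing import List, Optional, Tuple


class AdaptivePartitioner:
    """Hypothetical hash partitioner whose buckets can migrate between workers."""

    def __init__(self, n_workers: int, n_buckets: int = 16) -> None:
        self.n_workers = n_workers
        self.n_buckets = n_buckets
        # bucket -> owning worker; starts as a plain static Exchange-style split
        self.owner: List[int] = [b % n_workers for b in range(n_buckets)]
        # per-bucket operator state (here: a running count per key)
        self.state: List[Counter] = [Counter() for _ in range(n_buckets)]
        self.bucket_load: List[int] = [0] * n_buckets

    def bucket_of(self, key: object) -> int:
        # crc32 is deterministic across runs, unlike Python's built-in hash().
        return zlib.crc32(str(key).encode("utf-8")) % self.n_buckets

    def route(self, key: object) -> int:
        """Record the tuple in its bucket's state and return the owning worker."""
        b = self.bucket_of(key)
        self.state[b][key] += 1
        self.bucket_load[b] += 1
        return self.owner[b]

    def worker_load(self) -> List[int]:
        load = [0] * self.n_workers
        for b, w in enumerate(self.owner):
            load[w] += self.bucket_load[b]
        return load

    def rebalance(self) -> Optional[Tuple[int, int, int]]:
        """Move one bucket from the hottest to the coldest worker, if that helps.

        Only buckets smaller than the load gap are moved, so each move
        strictly reduces the imbalance between the two workers involved.
        Returns (bucket, from_worker, to_worker), or None if nothing moved.
        """
        load = self.worker_load()
        hot = max(range(self.n_workers), key=load.__getitem__)
        cold = min(range(self.n_workers), key=load.__getitem__)
        gap = load[hot] - load[cold]
        movable = [
            b for b, w in enumerate(self.owner)
            if w == hot and 0 < self.bucket_load[b] < gap
        ]
        if not movable:
            return None
        b = max(movable, key=self.bucket_load.__getitem__)
        # In a real system the bucket's state would be shipped to the new owner;
        # here it is keyed by bucket, so changing the owner is the whole move.
        self.owner[b] = cold
        return b, hot, cold
```

Fault tolerance (replicated flows and redundant state) is deliberately left out of this sketch; it covers only the load-balancing half of the slide.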
Further Directions & Goals
Deep Web Trawling & Privacy Issues
  – We’re about to crawl web DBs (What? How much?)
  – Can do some fascinating/creepy things
  – Consider privacy & accuracy: countermeasures, incentives, etc
      Mehul Shah (W/Varian, Papadimitriou, L. Hellerstein & T. Suel)
Data Dissemination & Continuous Queries
  – Franklin’s XFILTER: XML pub/sub
  – New automata-based techniques from CS262
  – Extend/integrate for pub/sub on general Telegraph flows
      Yanlei Diao/Asha Tarachandani
Sensor/Trace Data Apps
  – Bay Area traffic. Would like to do TinyOS (nobody on it yet)
  – Software traces? OceanStore?
      Sam Madden