Telegraph Status
Joe Hellerstein

Overview
Telegraph Design Goals, Current Status
First Application: FFF (Deep Web)
Budding Application: Traffic Sensor Data
Moving Forward

Telegraph: Adaptive Dataflow
Dataflow
  – Siphon data from the “deep web”
  – Harness data streaming from sensors/traces
  – Flow through code
  – The API and Architecture for ubiquitous computing
Why adaptive?
  – Sensor nets & wide area internet: volatile!
  – Like Telegraph Avenue, need to roll w/ the changes
  – Adaptive techniques for routing data to machines & code

Demos Delivered!
The big push: FFF Election 2000 demo 10/2000
  – Got Telegraph off the ground and live
  – Shows power of analysis & integration on web
      It’s not just search any more!
  – Served thousands of live, long-running queries
Initial Sensor Demo
  – UCB Institute for Transportation Studies data
  – Various web cams
  – Project for SIMS InfoVis class
A harness for more sensor-oriented work in Telegraph

Telegraph v1 (alpha) infrastructure
Single-site (multi-source) dataflow engine
  – All Java: some lessons here (paper in preparation)
Numerous dataflow operators built
  – TeSS (Telegraph Screen Scraper)
  – File reader
  – Relational ops (filters, joins, grouping, aggregation)
  – Some simple sequence analysis ops
  – Eddy: adaptive flow ordering operator
Key architectural theme: gain adaptivity via new operators
      Not changes to dataflow infrastructure!
      This is our upgrade strategy to parallelism/distribution
SQL-to-Dataflow parser
  – SQL is a fine dataflow language for many tasks
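
The "adaptive flow ordering" idea behind the Eddy can be illustrated with a minimal sketch. Telegraph v1 itself was written in Java; Python is used here only for brevity. Everything below is an assumption for illustration: the `FilterOp` class, the `eddy` function, and the greedy "lowest observed pass rate first" routing policy are hypothetical and are not taken from the Telegraph code base, which may route tuples differently.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List


@dataclass
class FilterOp:
    """Hypothetical filter operator: a predicate plus running statistics."""
    name: str
    predicate: Callable[[dict], bool]
    seen: int = 0    # tuples routed to this operator so far
    passed: int = 0  # tuples that satisfied the predicate

    def pass_rate(self) -> float:
        # Smoothed estimate, so an unobserved operator starts at 0.5.
        return (self.passed + 1) / (self.seen + 2)

    def apply(self, tup: dict) -> bool:
        self.seen += 1
        ok = self.predicate(tup)
        self.passed += int(ok)
        return ok


def eddy(tuples: Iterable[dict], ops: List[FilterOp]) -> Iterator[dict]:
    """Route each tuple through every operator, choosing the order per tuple.

    Assumed policy: visit next the operator most likely to drop the tuple
    (lowest observed pass rate), so doomed tuples are discarded early.
    """
    for tup in tuples:
        remaining = list(ops)
        alive = True
        while remaining and alive:
            op = min(remaining, key=FilterOp.pass_rate)
            remaining.remove(op)
            alive = op.apply(tup)
        if alive:
            yield tup


# Made-up workload: one filter that passes everything, one that is selective.
ops = [
    FilterOp("unselective", lambda t: t["x"] >= 0),
    FilterOp("selective", lambda t: t["x"] % 10 == 0),
]
out = list(eddy(({"x": i} for i in range(100)), ops))
```

No plan is fixed in advance here: the ordering is re-decided for every tuple from the statistics gathered so far, so after a couple of tuples the selective filter is tried first and the unselective one sees only the survivors.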

Upcoming Telegraph Operators
Goal: Further adaptivity through competition
  – Multiple mirrored sources
      Handle rate changes, failures, parallelism
  – Multiple alternate operators
  – STeM operator manages tradeoffs
      STate Module, unifies caches, rendezvous buffers, join state
      Competitive sources/operators share building/using STeMs
Vijayshankar Raman
[Figure labels: static dataflow; eddy + stem]
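
One way to picture a STeM as shared join state is the sketch below. It is an assumption, not the Telegraph implementation: the `SteM` class, its `build`/`probe` methods, the duplicate-dropping rule for mirrored sources, and the sample tuples are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


class SteM:
    """Hypothetical State Module: a keyed store that is built into and probed."""

    def __init__(self, key: str) -> None:
        self.key = key
        self._index: Dict[object, List[dict]] = defaultdict(list)
        self._seen = set()

    def build(self, tup: dict) -> bool:
        """Insert a tuple; return False if an identical tuple is already held."""
        frozen = tuple(sorted(tup.items()))
        if frozen in self._seen:
            return False  # e.g. the same tuple arriving from a mirrored source
        self._seen.add(frozen)
        self._index[tup[self.key]].append(tup)
        return True

    def probe(self, value: object) -> List[dict]:
        """Return all stored tuples whose key equals `value`."""
        return list(self._index.get(value, ()))


def symmetric_join(
    arrivals: Iterable[Tuple[str, dict]], left: SteM, right: SteM
) -> Iterator[Tuple[dict, dict]]:
    """Join two interleaved streams; `arrivals` yields ("L" | "R", tuple)."""
    for side, tup in arrivals:
        mine, other = (left, right) if side == "L" else (right, left)
        if not mine.build(tup):
            continue  # duplicate: whichever mirror delivered first has won
        for match in other.probe(tup[mine.key]):
            yield (tup, match) if side == "L" else (match, tup)


# Made-up arrivals; the third one repeats the first, as a slower mirror might.
arrivals = [
    ("L", {"county": "Alameda", "votes": 10}),
    ("R", {"county": "Alameda", "donor": "a"}),
    ("L", {"county": "Alameda", "votes": 10}),
    ("R", {"county": "Marin", "donor": "b"}),
    ("L", {"county": "Marin", "votes": 7}),
]
results = list(symmetric_join(arrivals, SteM("county"), SteM("county")))
```

In this reading, the same structure serves as join state (the index), as a rendezvous buffer (tuples wait there for their partners), and as a cache (later probes reuse earlier builds), which is one interpretation of "unifies" on the slide.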

Telegraph Nuts and Bolts 2
Parallelism & Fault Tolerance
  – Continuous/long-running flows need fault-tolerance
  – Big flows need parallelism
      Adaptive Load-Balancing req’d
  – FLUX operator: Exchange plus…
      Adaptive flow partitioning – River
      Mobile operator state for full Load Balancing
      Replicated flows & redundant state (RAID for operators)
      Load rebalancing vs. vulnerability
Mehul Shah & Sirish Chandrasekaran
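
"Exchange plus adaptive flow partitioning" could look roughly like the sketch below, offered under stated assumptions rather than as the FLUX design. The `AdaptivePartitioner` class, the bucket count, and the rebalancing rule (move one bucket from the most loaded worker to the least loaded one when that narrows the gap) are all hypothetical. A real operator would also have to ship the moved bucket's operator state to its new owner, which is omitted here.

```python
import zlib
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


class AdaptivePartitioner:
    """Hypothetical exchange-style router with movable hash buckets."""

    def __init__(self, workers: Iterable[str], buckets: int = 16) -> None:
        self.workers = list(workers)
        self.buckets = buckets
        # More buckets than workers, so load can be shifted in small units.
        self.owner: Dict[int, str] = {
            b: self.workers[b % len(self.workers)] for b in range(buckets)
        }
        self.bucket_load: Counter = Counter()

    def bucket_of(self, key: object) -> int:
        # crc32 gives a hash that is stable across processes.
        return zlib.crc32(repr(key).encode("utf-8")) % self.buckets

    def route(self, key: object) -> str:
        """Record one tuple for `key` and return the worker that receives it."""
        b = self.bucket_of(key)
        self.bucket_load[b] += 1
        return self.owner[b]

    def worker_load(self) -> Dict[str, int]:
        load = {w: 0 for w in self.workers}
        for b, n in self.bucket_load.items():
            load[self.owner[b]] += n
        return load

    def rebalance(self) -> Optional[Tuple[int, str, str]]:
        """Move one bucket from the hottest to the coldest worker, if it helps."""
        load = self.worker_load()
        hot = max(load, key=load.get)
        cold = min(load, key=load.get)
        gap = load[hot] - load[cold]
        # Only a bucket with 0 < load < gap narrows the gap when moved.
        candidates = [
            b for b, w in self.owner.items()
            if w == hot and 0 < self.bucket_load[b] < gap
        ]
        if not candidates:
            return None
        best = min(candidates, key=lambda b: abs(self.bucket_load[b] - gap / 2))
        self.owner[best] = cold
        return best, hot, cold


# Made-up skewed stream: key 0 is far more frequent than the others.
part = AdaptivePartitioner(["w0", "w1"], buckets=8)
for i in range(1000):
    part.route(0 if i % 2 == 0 else i % 50)
move = part.rebalance()  # None if no single bucket move would help
```

The slide's "load rebalancing vs. vulnerability" tradeoff is not modeled: this sketch keeps a single copy of each bucket, so it shows only the partitioning half of the design, not replicated flows or redundant state.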

Further Directions & Goals
Deep Web Trawling & Privacy Issues
  – We’re about to crawl web DBs (What? How much?)
  – Can do some fascinating/creepy things
  – Consider privacy & accuracy: countermeasures, incentives, etc.
  Mehul Shah (w/ Varian, Papadimitriou, L. Hellerstein & T. Suel)
Data Dissemination & Continuous Queries
  – Franklin’s XFILTER: XML pub/sub
  – New automata-based techniques from CS262
  – Extend/integrate for pub/sub on general Telegraph flows
  Yanlei Diao/Asha Tarachandani
Sensor/Trace Data Apps
  – Bay Area traffic. Would like to do TinyOS (nobody on it yet)
  – Software traces? OceanStore?
  Sam Madden