Telegraph: An Adaptive Global-Scale Query Engine Joe Hellerstein.

Presentation transcript:

Telegraph: An Adaptive Global-Scale Query Engine
Joe Hellerstein

Scenarios
Ubiquitous computing: more than clients!
–sensors and their data feeds are key
  smart dust, biomedical (MEMS sensors)
  each consumer good records (mis)use
–disposable computing
  video from surveillance cameras, broadcasts, etc.
Global Data Federation
–all the data is online – what are we waiting for?
–The plumbing is coming
  XML/HTTP, etc. give LCD communication
  but how do you query robustly over many sites in the wide area?

There’s a Data Flood Coming

What does it look like?
–Never ends: interactivity required
–Big: data reduction/aggregation is key
–Unpredictable: this scale of devices and nets will not behave nicely

The Telegraph Query Engine
Key technologies
–Interactive Control
  interactivity with early answers
  online aggregation for data reduction
–Continuously adaptive flow optimization
  massively parallel, adaptive dataflow via Rivers and Eddies

CONTROL
Continuous Output, Navigation & Transformation with Refinement On Line
Data-intensive jobs are long-running. How to give early answers and interactivity?
–online interactivity over feeds: data “juggle”
–online query processing algs: ripple joins
–statistical estimators, and their performance implications
Appreciate interplay of massive data processing, stats, and UIs
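To make the "early answers" idea concrete, here is a minimal sketch of online aggregation for a running average. It is not code from CONTROL or Telegraph. The function name `online_average`, the reporting interval, and the normal-approximation 95% interval are assumptions chosen for illustration, and the interval is only meaningful if rows arrive in random order.

```python
import math
import random


def online_average(stream, z=1.96, report_every=1000):
    """Yield (rows_seen, running_mean, half_width) while the stream is read.

    Hypothetical helper: the half-width is a normal-approximation
    confidence interval and assumes rows arrive in random order.
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations (Welford's method)
    for x in stream:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n > 1 and n % report_every == 0:
            variance = m2 / (n - 1)
            yield n, mean, z * math.sqrt(variance / n)


if __name__ == "__main__":
    rng = random.Random(42)
    rows = (rng.gauss(50.0, 10.0) for _ in range(5000))
    for seen, estimate, half_width in online_average(rows):
        print(f"{seen} rows: {estimate:.2f} +/- {half_width:.2f}")
```

Each report refines the previous one, so a user can stop the query once the interval is tight enough rather than waiting for the full scan.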

CONTROL Continuous Output and Navigation Technology with Refinement On Line

River
We built the world’s fastest sorting machine
–On the “NOW”: 100 Sun workstations + SAN
–But it only beat the record under ideal conditions!
River: performance adaptivity for data flows on clusters
–simplifies management and programming
–perfect for sensor-based streams
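One way to picture performance adaptivity in a cluster dataflow is to compare a fixed, equal split of records with a pull-based queue in which each consumer takes the next record when it becomes free. The simulation below is only an illustration of that general idea, not River's actual mechanism. The function names and the per-record costs are hypothetical.

```python
import heapq


def static_partition_time(num_records, costs):
    """Finish time when records are split evenly up front.

    costs[i] is the (assumed) time consumer i needs per record; the job
    finishes when the slowest consumer is done with its share.
    """
    share, extra = divmod(num_records, len(costs))
    return max((share + (1 if i < extra else 0)) * c
               for i, c in enumerate(costs))


def pull_based_time(num_records, costs):
    """Finish time when each consumer pulls a record whenever it is free."""
    free_at = [(0.0, i) for i in range(len(costs))]  # (time free, consumer)
    heapq.heapify(free_at)
    handled = [0] * len(costs)
    finish = 0.0
    for _ in range(num_records):
        t, i = heapq.heappop(free_at)  # earliest-available consumer
        t += costs[i]
        handled[i] += 1
        finish = max(finish, t)
        heapq.heappush(free_at, (t, i))
    return finish, handled


if __name__ == "__main__":
    costs = [1.0, 1.0, 1.0, 4.0]  # one consumer is 4x slower
    print("static:", static_partition_time(1200, costs))
    print("pull-based:", pull_based_time(1200, costs))
```

With one slow consumer, the static split is held back by that consumer, while the pull-based version shifts most of the work to the faster ones. This is the "ideal conditions" problem the slide refers to.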

Eddy
How to order and reorder operators over time
–based on performance, economic/admin feedback
Vs. River:
–River optimizes each operator “horizontally”
–Eddies optimize a pipeline “vertically”
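The reordering idea can be sketched for the simplest case, a pipeline of filters. The class below routes each tuple to whichever remaining filter has the lowest observed pass rate so far. It is a simplified, deterministic illustration. The names `Eddy` and `fixed_order_calls` and the smoothed pass-rate estimate are assumptions made for this sketch, not the routing policy of the actual Eddy operator.

```python
class Eddy:
    """Routes each tuple through filters in an order chosen per tuple.

    Hypothetical sketch: the filter with the lowest observed pass rate
    is tried first, so selective filters drift to the front over time.
    """

    def __init__(self, predicates):
        self.predicates = list(predicates)
        self.seen = [0] * len(self.predicates)
        self.passed = [0] * len(self.predicates)
        self.calls = 0  # total filter invocations, for comparison

    def _pass_rate(self, i):
        # Smoothed estimate so unseen filters start at 0.5.
        return (self.passed[i] + 1) / (self.seen[i] + 2)

    def process(self, tup):
        remaining = set(range(len(self.predicates)))
        while remaining:
            i = min(remaining, key=lambda j: (self._pass_rate(j), j))
            remaining.remove(i)
            self.seen[i] += 1
            self.calls += 1
            if not self.predicates[i](tup):
                return False  # tuple dropped; skip the other filters
            self.passed[i] += 1
        return True


def fixed_order_calls(predicates, tuples):
    """Filter invocations when the operator order never changes."""
    calls = 0
    for t in tuples:
        for p in predicates:
            calls += 1
            if not p(t):
                break
    return calls


if __name__ == "__main__":
    filters = [lambda x: x % 2 == 0, lambda x: x % 10 == 0]
    eddy = Eddy(filters)
    kept = [x for x in range(1000) if eddy.process(x)]
    print(len(kept), eddy.calls, fixed_order_calls(filters, range(1000)))
```

The output is the same as with a fixed order. Only the number of filter invocations changes as the routing adapts to the observed selectivities.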

Telegraph: Putting it Together
Scalable, adaptive dataflow infrastructure. Apps include…
–sensor nets
–massively parallel and wide-area query engines
–net appliances: chaining transformation/aggregation/etc. proxies
–any unpredictable dataflow scenario
Technology: a marriage of…
–CONTROL, River & Eddy
Many research questions here
  E.g. how to combine River and Eddy adaptivity
  E.g. how to tune Eddies for statistical performance goals
–Combinations of browse/query/mine at UI
–Storage management to handle new hardware realities

Integration with Endeavour
Give
–Be data-intensive backbone to diverse clients
–Be replication dataflow engine for OceanStore
–Telegraph Storage Manager provides storage (transactional/otherwise) for OceanStore
–Provide platform for data-intensive “tacit info mining”
Take
–Leverage OceanStore to manage distributed metadata, security
–Leverage protocols out of TinyOS for sensors

Additional Slides
For use in questions, etc.

Connectivity & Heterogeneity
Lots of folks working on data format translation, parsing
–we will borrow, not build
–currently using JDBC & Cohera Net Query
  commercial tool, donated by Cohera Corp.
  gateways XML/HTML (via HTTP) to ODBC/JDBC
–we may write “Teletalk” gateways from sensors
Heterogeneity
–never a simple problem
–CONTROL project developed interactive, online data transformation tool: Potter’s Wheel

Potter’s Wheel Anomaly Detection