Monitoring Big, Distributed, Streaming Data
Daniel Keren, Haifa U; Tsachi Sharfman, Technion; Assaf Schuster, Technion
8/5/2015

Distributed Stream Networks
- Large-scale, widespread networked systems
- Continuous production of data
- High volume
- Dynamic nature
- Required to detect a global property
- Often in (near) real time

Web Page Frequency Counts
- A web site is mirrored across several servers.
- Each mirror records the frequency of requests for its pages.
- Goal: detect when the global frequency of requests for a page exceeds a predetermined threshold.
[Figure: requests Req #1, Req #2, Req #3]
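A global request count is a linear function (a sum) of the per-mirror counts, which is what makes it comparatively easy to monitor. The sketch below is illustrative only and assumes a simple rule that is not taken from the slides: each of the n mirrors gets a local budget of threshold / n. If every mirror stays within its budget, the sum cannot exceed the threshold, so no communication is needed.

```python
"""Illustrative sketch: monitoring a global request count (a linear function).

Assumption (not from the slides): the global count is the plain sum of the
per-mirror counts, and each mirror is given a local budget of threshold / n.
"""


def global_exceeds(local_counts, threshold):
    """Centralized check: sum every mirror's count and compare to the threshold."""
    return sum(local_counts) > threshold


def local_budgets_hold(local_counts, threshold):
    """Local check: True if every mirror stays within threshold / n.

    If it holds, the sum cannot exceed the threshold, so no mirror needs to
    communicate. If it fails, the global sum may or may not exceed it.
    """
    n = len(local_counts)
    return all(c <= threshold / n for c in local_counts)


if __name__ == "__main__":
    counts = [30, 25, 40]  # requests for one page at three mirrors
    T = 120
    print(local_budgets_hold(counts, T))  # True: each count <= 40
    print(global_exceeds(counts, T))      # False: 95 <= 120
```

A failed local check (for example counts [50, 10, 10] with T = 120) only means the mirrors must look further; the global sum of 70 is still below the threshold.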

Air Quality Monitoring
- Sensors monitor the concentration of air pollutants.
- Each sensor holds a data vector of the measured concentrations of various pollutants (CO₂, SO₂, O₃, etc.).
- A function of the average readings determines the Air Quality Index (AQI).
- Issue an alert if the AQI exceeds a given threshold.
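A minimal sketch of the setup above, assuming a simplified stand-in for the AQI: the largest pollutant concentration relative to a reference limit, scaled to 100. The limits and readings below are hypothetical numbers chosen for illustration, not official AQI breakpoints. Averaging the vectors is linear, but taking the maximum is not.

```python
"""Sketch of AQI-style threshold monitoring over averaged sensor vectors.

Assumptions (hypothetical, not from the slides): the AQI is approximated as
the largest pollutant concentration relative to a reference limit, scaled to
100; the reference limits below are made-up illustrative numbers.
"""

# Hypothetical reference limits, one per pollutant, in the sensors' units.
REFERENCE_LIMITS = {"CO2": 1000.0, "SO2": 75.0, "O3": 70.0}


def average_vector(readings):
    """Component-wise average of the sensors' data vectors (dicts)."""
    n = len(readings)
    return {p: sum(r[p] for r in readings) / n for p in REFERENCE_LIMITS}


def simplified_aqi(avg):
    """Hypothetical non-linear AQI: worst pollutant relative to its limit."""
    return max(100.0 * avg[p] / REFERENCE_LIMITS[p] for p in REFERENCE_LIMITS)


def should_alert(readings, threshold=100.0):
    """Alert if the simplified AQI of the average vector exceeds the threshold."""
    return simplified_aqi(average_vector(readings)) > threshold


readings = [
    {"CO2": 800.0, "SO2": 30.0, "O3": 100.0},
    {"CO2": 600.0, "SO2": 60.0, "O3": 60.0},
]
alert = should_alert(readings)  # average O3 = 80 -> index 800/7, about 114.3
```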

Sensor Networks
- Sensors monitor the temperature in a server room (machine room, conference room, etc.).
- To ensure a uniform temperature, monitor the variance of the readings, and alert if the variance exceeds a threshold.
- Temperature readings by n sensors: $x_1, \ldots, x_n$.
- Each sensor holds a data vector $v_i = (x_i^2, x_i)^T$.
- The average data vector is $v = \frac{1}{n}\sum_{i=1}^{n} v_i = \left(\frac{1}{n}\sum_{i=1}^{n} x_i^2,\ \frac{1}{n}\sum_{i=1}^{n} x_i\right)^T$.
- $\mathrm{Var}(\text{all sensors}) = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \left(\frac{1}{n}\sum_{i=1}^{n} x_i\right)^2$, i.e., the first component of $v$ minus the square of its second component.
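The variance construction can be sketched directly: each sensor contributes its vector (x², x), the vectors are averaged, and the variance is a non-linear function of that average. The readings and the alert threshold of 1.0 are hypothetical values for illustration.

```python
"""Variance of distributed temperature readings via averaged local vectors.

Each sensor keeps v_i = (x_i**2, x_i); the variance is a non-linear function
of the average vector v: Var = v[0] - v[1]**2. The threshold value in the
example is a hypothetical choice.
"""


def local_vector(x):
    """Data vector held by one sensor with reading x."""
    return (x * x, x)


def average_vector(vectors):
    """Component-wise average of the sensors' vectors."""
    n = len(vectors)
    return (sum(v[0] for v in vectors) / n, sum(v[1] for v in vectors) / n)


def variance_from_average(v):
    """Var = mean of squares minus square of mean."""
    return v[0] - v[1] ** 2


readings = [1.0, 2.0, 3.0, 4.0]
v = average_vector([local_vector(x) for x in readings])
var = variance_from_average(v)  # 7.5 - 2.5**2 = 1.25
alert = var > 1.0               # hypothetical threshold
```

The averaging step is linear, so it could be computed by summing contributions; the threshold test is applied to a quadratic function of the result, which is what makes the monitoring problem non-linear.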

Search Engine
- Distributed datacenter/warehouse
- Tens of thousands of horizontal partitions
- "Our logs are larger than any other data by orders of magnitude. They are our source of truth." (Sridhar Ramaswamy, SIGMOD'08 keynote on "Extreme Data Mining")
- Mining the logs: compute pairs of keywords for which the correlation index is high
- Thousands of simultaneous tasks
- "Network bandwidth is a relatively scarce resource in our computing environment." (Dean and Ghemawat, MapReduce paper, OSDI '04)
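The slide does not define the correlation index. The sketch below assumes, for illustration only, that each partition summarizes a keyword pair as a 2x2 co-occurrence table and that the index is the phi coefficient of the merged table. Merging is a cell-wise sum (linear), while phi is a non-linear function of the merged counts.

```python
"""Sketch: correlation of a keyword pair from per-partition log counts.

Assumptions (not from the slides): each partition summarizes its queries as a
2x2 co-occurrence table [[n11, n10], [n01, n00]] for keywords A and B, and the
"correlation index" is taken to be the phi coefficient of the merged table.
"""
import math


def merge_tables(tables):
    """Global table = cell-wise sum of the partitions' tables (a linear step)."""
    return [[sum(t[r][c] for t in tables) for c in range(2)] for r in range(2)]


def phi_coefficient(table):
    """Phi coefficient of a 2x2 table (a non-linear function of the counts)."""
    (n11, n10), (n01, n00) = table
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return 0.0 if denom == 0 else (n11 * n00 - n10 * n01) / denom


# Hypothetical counts from two partitions.
partitions = [[[5, 1], [1, 3]], [[3, 1], [1, 5]]]
merged = merge_tables(partitions)  # [[8, 2], [2, 8]]
phi = phi_coefficient(merged)      # (64 - 4) / sqrt(10**4) = 0.6
```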

Cloud Health Monitoring
Amazon Web Services » Service Health Dashboard
Amazon S3 Availability Event: July 20, 2008
"At 8:40am PDT, error rates in all Amazon S3 datacenters began to quickly climb and our alarms went off. By 8:50am PDT, error rates were significantly elevated and very few requests were completing successfully. By 8:55am PDT, we had multiple engineers engaged and investigating the issue. Our alarms pointed at problems processing customer requests in multiple places within the system and across multiple data centers. While we began investigating several possible causes, we tried to restore system health... At 9:41am PDT, we determined that servers within Amazon S3 were having problems… By 11:05am PDT, all server-to-server communication was stopped, request processing components shut down, and the system's state cleared…."

Ad-Hoc Mobile P2P Networks
Peer-to-peer network invites drivers to get connected
CarTorrent could smarten up our daily commute, reducing accidents and bringing multimedia journey data to our fingertips
Laura Parker, The Guardian, Thursday January
"The name BitTorrent has become part of most people's day-to-day vernacular, synonymous with downloading every kind of content via the internet's peer-to-peer networks. But if a team of US researchers have their way, we may all be talking about CarTorrent in the not too distant future….. Researchers from the University of California Los Angeles are working on a wireless communication network that will allow cars to talk to each other, simultaneously downloading information in the shape of road safety warnings, entertainment content and navigational tools…."

Distributed Monitoring: State of the Art
- Periodically send all data to a central location
  - High communication
  - High latency
  - A tradeoff between communication and latency
  - Expensive central resources
  - Power inefficient
- Can we do better?
  - Linear systems ☺
  - Non-linear systems ☹
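The communication/latency tradeoff of periodic central collection can be made concrete with a back-of-the-envelope model. The model is a hypothetical simplification, not from the slides: n nodes each send one message per collection, time runs in discrete rounds, and a threshold crossing can happen in any round.

```python
"""Back-of-the-envelope cost of periodic centralized collection.

Assumptions (hypothetical model, not from the slides): n nodes each send one
message per collection, time is measured in discrete rounds, and a global
threshold crossing may happen in any round.
"""
import math


def periodic_collection_cost(n_nodes, n_rounds, period):
    """Return (messages sent, worst-case detection delay in rounds)."""
    collections = math.ceil(n_rounds / period)
    messages = n_nodes * collections
    # An event occurring just after a collection waits until the next one.
    worst_delay = period - 1
    return messages, worst_delay


print(periodic_collection_cost(1000, 3600, 1))   # (3600000, 0)
print(periodic_collection_cost(1000, 3600, 60))  # (60000, 59)
```

Lengthening the period divides the message count but stretches the worst-case delay by the same factor, so periodic collection alone cannot make both small.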

Monitoring Distributed Non-Linear Functions
[Figure: threshold]
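The figure for this slide did not survive extraction. As a minimal illustration of the difficulty, assume the monitored function is the parabola f(x) = x² evaluated on the average of scalar local values: every node can see its own value above the threshold while the global value is below it.

```python
"""Why local threshold checks can mislead for a non-linear function.

Assumption (illustrative only): the monitored function is f(x) = x**2,
evaluated on the average of the nodes' local values.
"""


def f(x):
    return x * x


def local_indications(values, threshold):
    """What each node would conclude by evaluating f on its own value."""
    return [f(v) > threshold for v in values]


def global_indication(values, threshold):
    """The true answer: f evaluated on the global average."""
    return f(sum(values) / len(values)) > threshold


values, T = [-2.0, 2.0], 1.0
print(local_indications(values, T))  # [True, True]: every node sees f = 4 > 1
print(global_indication(values, T))  # False: f(0) = 0 <= 1
```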

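Mutual Information
Given a 2×2 table of joint probabilities $p_{ij}$, with row marginals $p_{i\cdot}$ and column marginals $p_{\cdot j}$, the mutual information is defined as
$$\mathrm{MI} = \sum_{i=1}^{2}\sum_{j=1}^{2} p_{ij} \log \frac{p_{ij}}{p_{i\cdot}\, p_{\cdot j}}$$
The mutual information of the global table is much larger than the local values. As in the parabola case, there is no way to infer the global MI from the local ones.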
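A small worked instance of the claim, using hypothetical counts: two nodes each hold a table whose local MI is zero, yet the cell-wise sum of their tables has an MI of one bit. The sketch assumes the global table is the sum of the local count tables, measures MI in bits, and uses the convention 0 · log 0 = 0.

```python
"""Local vs. global mutual information of 2x2 contingency tables.

Assumptions (illustrative counts, not from the slides): each node holds a 2x2
table of counts, the global table is the cell-wise sum, and MI is measured in
bits with the convention 0 * log 0 = 0.
"""
import math


def mutual_information(counts):
    """MI (in bits) of the joint distribution given by a 2x2 count table."""
    total = sum(sum(row) for row in counts)
    rows = [sum(row) / total for row in counts]
    cols = [sum(counts[i][j] for i in range(2)) / total for j in range(2)]
    mi = 0.0
    for i in range(2):
        for j in range(2):
            p = counts[i][j] / total
            if p > 0:  # skip empty cells (0 * log 0 = 0)
                mi += p * math.log2(p / (rows[i] * cols[j]))
    return mi


node_a = [[10, 0], [0, 0]]
node_b = [[0, 0], [0, 10]]
global_table = [[node_a[i][j] + node_b[i][j] for j in range(2)] for i in range(2)]

print(mutual_information(node_a))        # 0.0
print(mutual_information(node_b))        # 0.0
print(mutual_information(global_table))  # 1.0
```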

Non-Linear Functions
"… The link function is, of course, nonlinear. So we agonize over trading off optimization performance with ability to use the massive infrastructure. …" (Sridhar Ramaswamy, SIGMOD'08 keynote talk on "Extreme Data Mining")
Slide title: "10 top reasons why googlers do not sleep at night" (Coffee is reason #5)

Geometric Method: Idea
- The behavior of a general function over distributed data may be hard to see.
  - Local indications may be misleading.
  - The function is non-linear.
- Looking at the *domain* of the function may be easier.
- For long periods, the local inputs are stationary, or do not change much.
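One simple way to see why reasoning in the domain can help, offered as an illustration under a stated assumption rather than as the geometric method itself: if the set of inputs on which the function stays below the threshold is convex, then whenever every local vector lies in that set, their average does too, and no node needs to communicate. The sketch uses a hypothetical disk as the safe set.

```python
"""Domain view: a convex 'safe' set is closed under averaging.

Assumptions (illustrative only, not the full geometric method): the safe set
is a hypothetical disk of radius r around a center c in R^2; because a disk
is convex, the average of points inside it is also inside it.
"""
import math


def in_disk(point, center, radius):
    """True if the point lies in the closed disk."""
    return math.dist(point, center) <= radius


def average(points):
    """Component-wise average of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(len(points[0])))


center, radius = (0.0, 0.0), 1.0
local_vectors = [(0.9, 0.1), (-0.5, 0.7), (0.2, -0.95)]

all_local_safe = all(in_disk(v, center, radius) for v in local_vectors)
global_safe = in_disk(average(local_vectors), center, radius)
# Convexity guarantees: all_local_safe implies global_safe.
```

The guarantee runs in one direction only. For f(x) = x² with threshold 1, the safe set [-1, 1] is convex but its complement is not, so every node can be in the alarm region while the average is safe.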