Why Cloudberry? “Eat your own dog food”

Presentation transcript:

Cloudberry: Interactive Analytics and Visualization on Large-Scale Data
The Cloudberry Team

Why Cloudberry?
“Eat your own dog food”
A general-purpose middleware system
Supports analytics and visualization
Interactive: sub-second response time

Presidential election 2012

First attempt: “Cherry demo”

Our TweetMap in 2017

Cloudberry: architecture

A use case: Zika analysis
“Use of Twitter Data to Track the 2016 Zika Virus Epidemic in the United States”
Shahir Masri, Jianfeng Jia, Chen Li, Guofa Zhou, Ming-Chieh Lee, Guiyun Yan, Jun Wu

Many front-end tools can be used

API Example: # of “Zika Virus” tweets per state
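
The transcript does not preserve the request shown on this slide, so the following is only a minimal sketch of what the query computes: a keyword filter followed by a group-by on state with a count. It runs over a small in-memory list and does not use the Cloudberry API; the records, the field names (`state`, `text`), and the helper `count_by_state` are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical sample records; field names are assumptions, not Cloudberry's schema.
TWEETS = [
    {"id": 1, "state": "FL", "text": "New Zika virus case reported in Miami"},
    {"id": 2, "state": "TX", "text": "Mosquito season is here"},
    {"id": 3, "state": "FL", "text": "zika virus travel advisory updated"},
    {"id": 4, "state": "CA", "text": "Researchers study the Zika Virus"},
]


def count_by_state(tweets, phrase):
    """Count tweets whose text contains `phrase` (case-insensitive), grouped by state."""
    needle = phrase.lower()
    return Counter(t["state"] for t in tweets if needle in t["text"].lower())


counts = count_by_state(TWEETS, "Zika Virus")
print(dict(counts))  # {'FL': 2, 'CA': 1}
```

A middleware such as the one described here would translate a request of this shape into a query against the backend database rather than scanning records in application code.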

View caching and incremental computation

View caching and incremental computation
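
One way to picture view caching with incremental computation is a cached per-day count for a keyword that remembers how far in time it has been computed. A later query with a newer end date only scans the days the view does not yet cover. The sketch below is a toy model under that assumption; `BASE`, `KeywordView`, and `rows_scanned` are hypothetical names, and the in-memory loop stands in for a time-range query against the base dataset.

```python
from datetime import date, timedelta

# Hypothetical base dataset: (day, text) pairs.
BASE = [
    (date(2016, 8, 1), "zika outbreak"),
    (date(2016, 8, 1), "hello world"),
    (date(2016, 8, 2), "zika vaccine news"),
    (date(2016, 8, 3), "zika in florida"),
    (date(2016, 8, 3), "more zika updates"),
]


class KeywordView:
    """Cached per-day counts for one keyword, covering [start, covered_until]."""

    def __init__(self, keyword, start):
        self.keyword = keyword
        self.start = start
        self.covered_until = start - timedelta(days=1)  # nothing covered yet
        self.daily = {}        # day -> number of matching tweets
        self.rows_scanned = 0  # base rows touched, to make the saving visible

    def count_until(self, end):
        """Return the count for [start, end], scanning only days not yet covered."""
        if end > self.covered_until:
            lo = self.covered_until + timedelta(days=1)
            # Stand-in for a base-dataset query with the predicate lo <= day <= end.
            for day, text in BASE:
                if lo <= day <= end:
                    self.rows_scanned += 1
                    if self.keyword in text.lower():
                        self.daily[day] = self.daily.get(day, 0) + 1
            self.covered_until = end
        return sum(c for d, c in self.daily.items() if self.start <= d <= end)


view = KeywordView("zika", date(2016, 8, 1))
first = view.count_until(date(2016, 8, 2))   # scans Aug 1-2
second = view.count_until(date(2016, 8, 3))  # scans only Aug 3
print(first, second, view.rows_scanned)
```

The second call touches only the rows for the newly requested day; everything earlier is answered from the cached counts.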

What if no views? Query slicing
Response-time savings come from:
The view is small
The time predicate on the base dataset is small

Query slicing
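
As a rough illustration of the idea, a query over a long time range can be split into mini-queries over shorter sub-ranges, with a running result reported after each one. The sketch below assumes a fixed slice width and processes the most recent slice first; both choices, along with the names `slice_range`, `progressive_count`, and `ROWS`, are assumptions for illustration rather than details taken from the slides.

```python
from datetime import date, timedelta


def slice_range(start, end, days_per_slice):
    """Split [start, end] into consecutive sub-ranges, most recent first."""
    slices = []
    hi = end
    while hi >= start:
        lo = max(start, hi - timedelta(days=days_per_slice - 1))
        slices.append((lo, hi))
        hi = lo - timedelta(days=1)
    return slices


def progressive_count(rows, keyword, start, end, days_per_slice):
    """Run one mini-query per slice and yield the running total after each."""
    total = 0
    for lo, hi in slice_range(start, end, days_per_slice):
        total += sum(
            1 for day, text in rows if lo <= day <= hi and keyword in text.lower()
        )
        yield lo, hi, total


# Hypothetical sample data: (day, text) pairs.
ROWS = [
    (date(2016, 8, 1), "zika a"),
    (date(2016, 8, 3), "zika b"),
    (date(2016, 8, 4), "nothing"),
    (date(2016, 8, 6), "Zika c"),
    (date(2016, 8, 7), "zika d"),
]

for lo, hi, total in progressive_count(
    ROWS, "zika", date(2016, 8, 1), date(2016, 8, 7), days_per_slice=3
):
    print(lo, hi, total)
```

Each mini-query carries a narrow time predicate, so a frontend can show a partial answer quickly and refine it as later slices arrive.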

Open challenges
Other frontend solutions
Advanced techniques for answering queries using views
Middleware caching
Visualizing a large number of records on the frontend
Other data domains

Open source

Dynamically updating the slicing interval
Dataset: Twitter(id: int, day: date, text: string)
Query: count the number of tweets talking about “zika” from the last week
Deadline: 2s
Slicing on “day”
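
The slide does not spell out the update rule, so the following is a sketch under a simple assumption: scale the next interval in proportion to how the previous mini-query's runtime compared with the deadline. The functions `next_interval` and `plan_slices`, the clamping bounds, and the simulated cost of 0.5 seconds per day are all hypothetical; a real system would measure actual query times.

```python
def next_interval(prev_days, prev_seconds, deadline_seconds, min_days=1, max_days=30):
    """Scale the previous interval by deadline / observed runtime, then clamp."""
    if prev_seconds <= 0:
        return max_days
    scaled = int(prev_days * deadline_seconds / prev_seconds)
    return max(min_days, min(max_days, scaled))


def plan_slices(total_days, first_days, deadline_seconds, run_slice):
    """Cover `total_days` with mini-queries.

    `run_slice(days)` executes one mini-query over `days` days and returns
    the seconds it took. Returns the list of (days, seconds) actually run.
    """
    plan = []
    remaining, days = total_days, first_days
    while remaining > 0:
        days = min(days, remaining)
        seconds = run_slice(days)
        plan.append((days, seconds))
        remaining -= days
        days = next_interval(days, seconds, deadline_seconds)
    return plan


# Simulated backend: each day of data costs 0.5 s to query (an assumption).
plan = plan_slices(
    total_days=7, first_days=1, deadline_seconds=2, run_slice=lambda d: 0.5 * d
)
print(plan)  # [(1, 0.5), (4, 2.0), (2, 1.0)]
```

With the 2s deadline from the slide, the first one-day probe finishes early, so the interval grows to four days; the final slice is cut to the two days that remain in the week.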