MIDDLEWARE SYSTEMS RESEARCH GROUP MSRG.ORG Grand Challenge: The BlueBay Soccer Monitoring Engine Hans-Arno Jacobsen Kianoosh Mokhtarian Tilmann Rabl Mohammad.

Presentation transcript:

MIDDLEWARE SYSTEMS RESEARCH GROUP MSRG.ORG
Grand Challenge: The BlueBay Soccer Monitoring Engine
Hans-Arno Jacobsen, Kianoosh Mokhtarian, Tilmann Rabl, Mohammad Sadoghi, Reza Sherafat Kazemzadeh, Young Yoon, Kaiwen Zhang
DEBS '13, July
Middleware Systems Research Group, University of Toronto

Slide 2: GUI Client

Slide 3: Agenda
Multi-stage monitoring pipeline
World Cup: survey of existing solutions
– Esper
– Storm
BlueBay: custom monitoring engine
Evaluation

Slide 4: Monitoring architecture
[Diagram. Labels: Event processing stage; Data collection and dispatching; Distribution stage]

Slide 5: World cup of event processing
Baseline solutions for comparison
– Our own custom solution: BlueBay
Representative range of solutions
– CEP Engine+Language: Esper
– MapReduce-like: Storm
– CEP Language+Compiler: StreamIt
– DSMS: Stream
Library support for common languages such as Java

Slide 6: Esper Query #2

    insert into b_possession_percent
    select *,
      sum(b_ts - prev(b_ts, 1)) as time_total,
      sum((b_ts - prev(b_ts, 1)) * msrg.GameSetting.equalStr(owner, prev(owner, 1), 'teamA')) as time_teamA,
      sum((b_ts - prev(b_ts, 1)) * msrg.GameSetting.equalStr(owner, prev(owner, 1), 'teamB')) as time_teamB
    from b_possession.win:time(10 seconds)

Annotations:
– Use of Esper stream primitives to process the stream
– Use of user-defined helper functions
– Use of Esper window semantics to extract the stream at the correct granularity
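For comparison, the same windowed possession sums can be sketched in plain C++. This is only an illustration of what the query computes, not the authors' code: the types PossessionEvent, PossessionTotals and PossessionWindow are hypothetical, timestamps are assumed to be integers in non-decreasing order, and equalStr is assumed to return 1 only when both the current and the previous owner match the given team (the slide does not show its body).

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <string>

// Hypothetical event type: one ball-possession update.
struct PossessionEvent {
    std::int64_t ts;    // timestamp in arbitrary integer time units
    std::string owner;  // "teamA", "teamB", or any other value for "nobody"
};

// Sums over the current window, mirroring time_total / time_teamA / time_teamB.
struct PossessionTotals {
    std::int64_t total = 0;
    std::int64_t teamA = 0;
    std::int64_t teamB = 0;
};

// ASSUMPTION: 1 when both the current and the previous owner equal `team`, else 0.
inline int equalStr(const std::string& cur, const std::string& prev,
                    const std::string& team) {
    return (cur == team && prev == team) ? 1 : 0;
}

class PossessionWindow {
public:
    // `length` is the window length in the same units as the timestamps.
    explicit PossessionWindow(std::int64_t length) : length_(length) {}

    // Adds an event, drops events older than the window, and recomputes the sums
    // over consecutive event pairs that remain in the window.
    PossessionTotals push(const PossessionEvent& e) {
        events_.push_back(e);
        while (events_.front().ts < e.ts - length_) {
            events_.pop_front();
        }
        PossessionTotals t;
        for (std::size_t i = 1; i < events_.size(); ++i) {
            const std::int64_t dt = events_[i].ts - events_[i - 1].ts;
            t.total += dt;
            t.teamA += dt * equalStr(events_[i].owner, events_[i - 1].owner, "teamA");
            t.teamB += dt * equalStr(events_[i].owner, events_[i - 1].owner, "teamB");
        }
        return t;
    }

private:
    std::int64_t length_;
    std::deque<PossessionEvent> events_;
};
```

The sums are recomputed on every event to keep the sketch short; an engine tuned for throughput would more likely update them incrementally as events enter and leave the window.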

Slide 7: Storm Query #4
[Topology diagram. Nodes: Sensor Events; Player ID; Player Position (repeated); Player Heatmap; Output. Callouts: Project right attributes; Split the stream (Map); User-defined functions; Merge output (Reduce)]

Slide 8: BlueBay Origins

Slide 9: Implementation details
Written in C++
Modular design
– Reusable and shared components
Other possible queries:
– Passing success (already implemented)
– Offside detection
– Man-marking success

Slide 10: BlueBay Architecture
[Architecture diagram. Callouts:]
– Stream components per object, with noise reduction
– Circular list of timestamp range buckets
– O(1) time and memory
– Trajectory Estimator
– Custom-built
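One way a circular list of timestamp range buckets could work is sketched below. The slide gives no implementation details, so everything here is an assumption: the class name BucketRing, the integer timestamps, the fixed bucket width, and the lazy reset of a bucket when its slot is reused for a newer time range. Timestamps are assumed to be non-negative and non-decreasing.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical ring of N buckets, each covering `width` consecutive time units.
// The ring as a whole covers the most recent N * width time units.
template <std::size_t N>
class BucketRing {
public:
    explicit BucketRing(std::int64_t width) : width_(width) {}

    // O(1): locate the slot for this timestamp; if the slot still holds data
    // from an older time range, reset it before accumulating.
    void add(std::int64_t ts, std::int64_t value) {
        const std::int64_t range = ts / width_;
        Bucket& b = buckets_[static_cast<std::size_t>(range) % N];
        if (b.range != range) {
            b.range = range;
            b.sum = 0;
        }
        b.sum += value;
    }

    // Sums the buckets whose range lies within the last N ranges ending at `now`.
    // The loop visits N buckets, a constant that does not grow with the stream.
    std::int64_t windowSum(std::int64_t now) const {
        const std::int64_t newest = now / width_;
        const std::int64_t oldest = newest - static_cast<std::int64_t>(N) + 1;
        std::int64_t total = 0;
        for (const Bucket& b : buckets_) {
            if (b.range >= oldest && b.range <= newest) {
                total += b.sum;
            }
        }
        return total;
    }

private:
    struct Bucket {
        std::int64_t range = -1;  // index of the time range stored here; -1 = empty
        std::int64_t sum = 0;
    };

    std::int64_t width_;
    std::array<Bucket, N> buckets_{};
};
```

Memory stays fixed at N buckets regardless of how many events arrive, and each update touches a single bucket, which is one plausible reading of the O(1) time and memory callout.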

Slide 11: Optimizations
Efficient implementation:
– ID-to-index array-based maps
– Avoid floating-point arithmetic
– …
Parallelization:
– Inter-query and intra-query
– Q3 (heatmaps) is the most time-consuming query
» Parallelize per player and per grid resolution
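The first two optimizations can be illustrated with a short C++ sketch. It is written under stated assumptions rather than taken from BlueBay: raw sensor IDs are assumed to be small non-negative integers below a hypothetical bound kMaxSensorId, positions are assumed to arrive as integer millimetres, and the names IdIndexMap and gridCell are invented for this example.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kMaxSensorId = 256;  // ASSUMED upper bound on raw sensor IDs
constexpr std::int16_t kUnknown = -1;      // marker for "ID not registered"

// Maps sparse raw IDs to dense indices with a flat array instead of a hash map,
// so a lookup is a single bounds check plus one array read.
class IdIndexMap {
public:
    IdIndexMap() { table_.fill(kUnknown); }

    // Registers `id` if needed and returns its dense index (kUnknown if out of range).
    std::int16_t registerId(std::uint32_t id) {
        if (id >= kMaxSensorId) return kUnknown;
        if (table_[id] == kUnknown) table_[id] = next_++;
        return table_[id];
    }

    // Returns the dense index of `id`, or kUnknown if it was never registered.
    std::int16_t indexOf(std::uint32_t id) const {
        return id < kMaxSensorId ? table_[id] : kUnknown;
    }

private:
    std::array<std::int16_t, kMaxSensorId> table_;
    std::int16_t next_ = 0;
};

// Integer-only mapping of a coordinate to one of `cells` grid cells over the
// inclusive range [min_mm, max_mm]; out-of-range positions are clamped.
inline std::int32_t gridCell(std::int64_t pos_mm, std::int64_t min_mm,
                             std::int64_t max_mm, std::int32_t cells) {
    if (pos_mm < min_mm) pos_mm = min_mm;
    if (pos_mm > max_mm) pos_mm = max_mm;
    return static_cast<std::int32_t>((pos_mm - min_mm) * cells /
                                     (max_mm - min_mm + 1));
}
```

The dense index can then address per-player state arrays directly, and the grid cell is obtained with one integer multiplication and one integer division, with no floating-point conversion.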

Slide 12: Evaluation
Setup
– Intel Xeon 3.20 GHz 4-core, 6 GB RAM
Single-threaded benchmarking:

Engine    Q1      Q2      Q3      Q4
BlueBay   141x    165x    30x     187x
Esper     7.5x    2.4x    6.3x    2.3x
Storm     9.7x    8.6x    9.8x    8.6x

Average speedup 30x

Slide 13: Impact of multi-threading
[Chart. Callouts:]
– Queue size to synchronize the queries
– Throughput bounded by slowest query
– Highest sustained average: 790k e/s
– Missing ball data in workload
– BlueBay: 60x speedup!

Slide 14: Throughput-latency tradeoff
[Chart. Callouts:]
– Latency: time required to fully process one input event for all queries
– Throughput: combined throughput of output events for all queries
– Lower throughput corresponds to a smaller queue size
– A smaller queue blocks the faster threads
– This gives full CPU usage to the slower threads, which clear older events
– A large queue allows fast threads to process and emit events quickly, ahead of the slower threads
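The role of the queue size can be illustrated with a bounded blocking queue, sketched here in C++. The slides do not show how BlueBay connects its threads, so this is an assumed design: the class name BoundedQueue and its methods are hypothetical, and the sketch only shows the general mechanism by which a capacity limit holds a fast producer back until a slower consumer catches up.

```cpp
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <utility>

// Hypothetical bounded queue between a dispatcher and one query thread.
template <typename T>
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) {}

    // Blocks while the queue is full, which throttles a fast producer.
    void push(T item) {
        std::unique_lock<std::mutex> lock(mutex_);
        notFull_.wait(lock, [this] { return items_.size() < capacity_; });
        items_.push_back(std::move(item));
        notEmpty_.notify_one();
    }

    // Non-blocking variant: returns false instead of waiting when the queue is full.
    bool tryPush(T item) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (items_.size() >= capacity_) return false;
        items_.push_back(std::move(item));
        notEmpty_.notify_one();
        return true;
    }

    // Blocks while the queue is empty, then removes and returns the oldest item.
    T pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        notEmpty_.wait(lock, [this] { return !items_.empty(); });
        T item = std::move(items_.front());
        items_.pop_front();
        notFull_.notify_one();
        return item;
    }

    // Number of items currently waiting.
    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return items_.size();
    }

private:
    const std::size_t capacity_;
    mutable std::mutex mutex_;
    std::condition_variable notFull_;
    std::condition_variable notEmpty_;
    std::deque<T> items_;
};
```

With a small capacity, events spend little time waiting in the queue, so latency stays low but the producer stalls often. With a large capacity, the producer runs ahead and throughput rises, at the cost of events waiting longer before the slowest consumer reaches them.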

Slide 15: Conclusion
BlueBay engine
– Modular design to support other queries
– Optimized for the soccer use case
World cup of event processing
– BlueBay: 790k e/s and 60x speedup
– Esper and Storm finalists
– Ability to define custom functions essential for rapid implementation