Ricardo Jimenez-Peris Universidad Politecnica de Madrid


Scalable Autonomic Streaming Middleware for Real-Time Processing of Massive Data Flows Ricardo Jimenez-Peris Universidad Politecnica de Madrid Project Coordinator

Project Data Start: February 2008. Duration: 3 years. Partners: UPM – Spain (coord.). FORTH - Greece. TU Dresden - Germany. Telefonica - Spain. Exodus - Greece. Epsilon - Italy.

Background Data streaming is a paradigm developed in the database community for processing large data flows in memory, in an online fashion. It supports continuous queries over flowing data. Most existing platforms are centralized, a few are distributed, and they perform 1-2 orders of magnitude better than relational DBs.
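To make the idea of a continuous query concrete, here is a minimal sketch in Python. It is not taken from the project: the names `SlidingWindowAverage` and `continuous_query`, the window size and the threshold are all illustrative assumptions. The query runs over each tuple as it arrives, keeps only a bounded window in memory, and emits results continuously instead of once at the end.

```python
from collections import deque


class SlidingWindowAverage:
    """Hypothetical streaming operator: average over the last N values."""

    def __init__(self, window_size):
        self.window = deque(maxlen=window_size)
        self.total = 0.0

    def push(self, value):
        # Subtract the value that is about to be evicted from the window.
        if len(self.window) == self.window.maxlen:
            self.total -= self.window[0]
        self.window.append(value)
        self.total += value
        return self.total / len(self.window)


def continuous_query(stream, window_size, threshold):
    """Emit (value, window average) whenever the average exceeds threshold."""
    op = SlidingWindowAverage(window_size)
    for value in stream:
        avg = op.push(value)
        if avg > threshold:
            yield value, avg


if __name__ == "__main__":
    # Any iterable works as a stream; a finite list is used for illustration.
    for value, avg in continuous_query([1, 2, 3, 10, 20], window_size=2, threshold=5):
        print(value, avg)
```

Only the current window is held in memory, which is what lets this style of processing keep up with flows too large to store first and query later.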

Background: Data Streaming Operators

Background: Data Streaming Query

Scope Many potential Internet applications today need to process huge amounts of information in an online fashion: Mitigation of DDoS attacks. Spam filtering. Processing the output of sensor networks. Detecting fraud in cellular telephony. Financial applications. QoS monitoring for enforcing SLAs. Real-time data mining. Etc.

Objectives Stream aims to develop a highly scalable middleware infrastructure to process massive data flows in real time. The innovation lies in the sheer scale targeted by the project: 1-2 orders of magnitude higher than current technology.

Innovation Parallelizing data streaming operators: Currently a query operator can be deployed only on a single site, where it has to process the full data flow and thus becomes the bottleneck. Stream is developing distributed versions of query operators so that an individual query operator can run on a cluster of sites.
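One common way to spread a single operator over several sites is to hash-partition the input by key, so that each instance sees only a share of the flow. The sketch below illustrates that general technique and is not the project's actual design. The function names, the use of CRC32 as the routing hash, and the per-key count operator are assumptions. The instances are simulated in one process.

```python
import zlib
from collections import Counter


def route(key, num_instances):
    """Pick the operator instance for a key (stable hash, assumed scheme)."""
    return zlib.crc32(key.encode("utf-8")) % num_instances


def parallel_count(stream, num_instances):
    """Per-key count split across num_instances simulated operator instances.

    Every tuple with the same key reaches the same instance, so each
    instance can count its keys locally without coordination.
    """
    partials = [Counter() for _ in range(num_instances)]
    for key in stream:
        partials[route(key, num_instances)][key] += 1

    # A downstream step merges the disjoint partial results.
    merged = Counter()
    for partial in partials:
        merged.update(partial)
    return partials, merged


if __name__ == "__main__":
    calls = ["alice", "bob", "alice", "carol", "bob", "alice"]
    partials, merged = parallel_count(calls, num_instances=3)
    print(merged)
```

Because the partitions are disjoint by key, no single instance has to process the full data flow, which is the bottleneck the slide describes.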

Innovation: Parallel Data Streaming [Figure: operators Op1, Op2 and Op3 with upstream and downstream data flows.]

Innovation Exploiting leading-edge high-performance networks and IO systems: Reaching 40 Gbps for both networking and IO. This results in high-throughput communication among sites with very low latency. Low-cost storage system: 1 PC controlling 40 disks.
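As a rough sanity check on these numbers, the arithmetic below rests on two assumptions that the slides do not state: that the 40 figure means gigabits per second, and that the IO target is spread evenly over the 40 disks of one storage PC.

```latex
\[
  \frac{40\ \text{Gb/s}}{8\ \text{bits/byte}} = 5\ \text{GB/s},
  \qquad
  \frac{5\ \text{GB/s}}{40\ \text{disks}} = 125\ \text{MB/s per disk}
\]
```

Under those assumptions, each disk would need to sustain about 125 MB/s of sequential throughput.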

Architecture Data Mining Layer. Autonomic Controller Layer. Parallel Data Streaming Layer. Data Streaming Layer. High Performance IO & Storage Layer.

Innovation Self-healing: Able to tolerate failures (novel approach). Able to recover new nodes online. Self-configuring: Dynamic load balancing. Self-provisioning: Nodes are added and removed as needed depending on the load.
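The self-provisioning rule can be pictured as a simple threshold policy. The sketch below is an illustration only. The slides do not describe the controller's actual policy, so the function name, the utilisation thresholds and the one-node-at-a-time step are all assumptions.

```python
def target_node_count(current_nodes, load_per_node,
                      low=0.3, high=0.8, min_nodes=1):
    """Return how many nodes the cluster should have next.

    load_per_node is the average utilisation in [0, 1]. The thresholds
    `low` and `high` are hypothetical; the gap between them avoids adding
    and removing nodes repeatedly when the load hovers around one value.
    """
    if load_per_node > high:
        return current_nodes + 1      # overloaded: provision one more node
    if load_per_node < low and current_nodes > min_nodes:
        return current_nodes - 1      # underused: release one node
    return current_nodes              # within the target band: no change


if __name__ == "__main__":
    nodes = 4
    for load in [0.9, 0.85, 0.5, 0.2, 0.1]:
        nodes = target_node_count(nodes, load)
        print(f"load={load:.2f} -> nodes={nodes}")
```

A real controller would also have to move operator state when nodes join or leave, which ties this policy to the self-healing and load-balancing features listed above.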

Expected Outcome Highly scalable and autonomic infrastructure to process massive data flows. 2 orders of magnitude more scalable than current distributed data streaming platforms. Application to 3 different markets: Telco: Fighting fraud in cellular telephony. Services: Real-time checking of SLA fulfillment. Financial/banking: Detection of money laundering operations / fraud detection in credit card payments / real-time data warehousing.

Current Status Month 8 of the project. Prototypes of all layers (except the autonomic controller, foreseen for the 2nd year). A 50-node cluster interconnected with Myrinet 10G has been set up. First tests of parallel data streaming exhibit high scalability. Prototypes of the IO and storage tiers are at an advanced stage.

Questions?