Presentation transcript:

Business Discovery, Monitoring & Reporting – Data Flow
[Architecture diagram] Components: Operator Systems (OCS, IN, CDR, PCC, CRM); Integration Layer; RT Complex Event Processing; Decisioning Engine; Visual Rules; Subscriber Profile; Channels; Subscriber Data Store (HBase); Big Data Analytics (Hive); DWH; Business Discovery, Monitoring & Reporting (iCLM UI); users: Marketing, Operations, CSR, Monitoring.

Legacy Architecture

Legacy System analysis
In 2011 we reached the system's glass ceiling: 10M subscribers, 120M events per day. We analyzed the architecture bottlenecks and identified the following issues:
- Real-time sub-system:
  - Queue management in Oracle: incoming/outgoing data streams, application logs.
  - Subscriber-state BLOB updates in Oracle (random access).
- Analytical sub-system:
  - Large joins across dozens of fact/dimension/entity tables.
  - ETL from the OLTP DB.

Architecture blueprint
To overcome the problems raised in the analysis phase, we made several architectural decisions:
- Queue management: move to a distributed file system supporting tens of millions of small files (tens of MB or less).
- Subscriber OLTP state management: use a NoSQL key-value store.
- Analytical workflows: operate on files holding subscriber aggregates as BLOBs, thus avoiding large joins.
We examined several candidate technologies and concluded that Big Data would provide the best TCO, but was lacking in enterprise readiness. We identified the extra-functional requirements needed to support the system's quality attributes and conducted an RFP to select the Big-Data vendor.

We conducted an RFP to select the most Telco-grade platform. The RFP focused on non-functional capabilities such as sustainable performance, high availability, and manageability.

The approach
Each step should increase scalability and reduce TCO.
Runtime (OLTP) processing:
- We replace the underlying plumbing with minimal changes to the business logic.
- All changes can be turned on/off via GUI configuration: a modular, hybrid architecture.
- Ability to work in dual mode - good for QA, but also for falling back to legacy in production.
- The upgrade path from legacy is preserved in all phases.
Analytics processing:
- Calculate the profile in MapReduce (Java): scalable, and we have the best Java developers (a skeleton is sketched below).
- Wrap it with a DSL (Domain-Specific Language): that is how we have worked for years (see the ModelTalk paper), so non-Java programmers can do the job.
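A minimal sketch of what such a profile job could look like, assuming plain Hadoop MapReduce over delimited text; the class names and the subscriberId|payload input format are illustrative assumptions, not the actual product code.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical skeleton: group raw events by subscriber id, then fold them
// into a per-subscriber profile aggregate (avoiding the large SQL joins).
public class SubscriberProfileJob {

    public static class EventMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable offset, Text line, Context ctx)
                throws IOException, InterruptedException {
            // Assumed input layout: subscriberId|eventPayload
            String[] parts = line.toString().split("\\|", 2);
            if (parts.length == 2) {
                ctx.write(new Text(parts[0]), new Text(parts[1]));
            }
        }
    }

    public static class ProfileReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text subscriberId, Iterable<Text> events, Context ctx)
                throws IOException, InterruptedException {
            long count = 0;
            for (Text ignored : events) {
                count++; // the real job would merge each event into the profile aggregate
            }
            ctx.write(subscriberId, new Text("events=" + count));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "subscriber-profile");
        job.setJarByClass(SubscriberProfileJob.class);
        job.setMapperClass(EventMapper.class);
        job.setReducerClass(ProfileReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```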

Phase 1

Phase      # Customers    # Events
Legacy     10M            120M
Phase 1                   200M

Phase 1 – File queues in NFS
Resulting context:
- A pure plumbing change - no changes to business-logic code (the claim-by-rename idea is sketched below).
- Offloads Oracle: a 2x performance boost.
- No Big-Data technology involved yet.
- Windows NFS client performance is a bottleneck.

Phase      # Customers    # Events
Legacy     10M            120M
Phase 1                   200M
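A rough sketch of the file-queue idea, assuming producers drop one small file per event batch into an NFS-mounted spool directory and consumers claim files by atomic rename; the directory layout and naming scheme are invented for illustration, and rename atomicity over NFS deserves testing in practice.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.UUID;
import java.util.stream.Stream;

// Hypothetical file-based queue over an NFS mount: producers drop small
// files, consumers claim them via rename, process, then delete.
public class NfsFileQueue {
    private final Path spoolDir;

    public NfsFileQueue(Path spoolDir) {
        this.spoolDir = spoolDir;
    }

    // Producer side: write to a temp name first, then rename, so consumers
    // never observe a partially written file.
    public void enqueue(byte[] payload) throws IOException {
        Path tmp = spoolDir.resolve(UUID.randomUUID() + ".tmp");
        Files.write(tmp, payload);
        Path ready = spoolDir.resolve(tmp.getFileName().toString().replace(".tmp", ".evt"));
        Files.move(tmp, ready, StandardCopyOption.ATOMIC_MOVE);
    }

    // Consumer side: rename acts as a cheap lock - only one consumer wins
    // the race for each file.
    public byte[] dequeue() throws IOException {
        try (Stream<Path> files = Files.list(spoolDir)) {
            for (Path p : (Iterable<Path>) files.filter(f -> f.toString().endsWith(".evt"))::iterator) {
                Path claimed = Paths.get(p.toString() + ".claimed");
                try {
                    Files.move(p, claimed, StandardCopyOption.ATOMIC_MOVE);
                } catch (IOException anotherConsumerWon) {
                    continue;
                }
                byte[] payload = Files.readAllBytes(claimed);
                Files.delete(claimed);
                return payload;
            }
        }
        return null; // queue is empty
    }
}
```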

Reverse engineering of the SQL code

Phase 2

Phase      # Customers    # Events
Legacy     10M            120M
Phase 1                   200M
Phase 2    unlimited

Phase 2 – Introducing MapR Hadoop Cluster
Resulting context:
- MapR-FS + NFS: horizontally scalable; cheap compared to high-end NFS solutions; fast and highly available (using VIPs).
- Avoids an extra hop into HDFS (Flume, Kafka).
- Hundreds of millions of small files are stored in HDFS - no need to merge files.

Phase      # Customers    # Events
Legacy     10M            120M
Phase 1                   200M
Phase 2    unlimited

Phase 2 – Introducing MapR Hadoop Cluster (cont.)
Resulting context:
- Avro files:
  - Capture a complex object graph.
  - Troubleshooting with Pig.
  - Out-of-the-box schema upgrades, e.g. adding a field (sketched below).
  - MapReduce is incremental - the Avro record captures the subscriber state.
  - MapReduce efficiency - huge joins are avoided.
- Subscriber Profile calculation:
  - Performance: 2-3 hours.
  - Linear scalability: no limit on the number of subscribers or on raw data volume (buy more nodes).
  - Fast runs over historical data allow for an early launch.
- Sqoop: very fast insertion into MS-SQL (tens of millions of records in minutes).
- Data analysts started working in the Hive environment.
- Caveats: no HA for Oozie yet; Hue is premature; MS-SQL and ODBC over Hive are slow and limited.
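A small sketch of the "out-of-the-box upgrade" point: with Avro, a field added with a default value can still be read from records written with the old schema. The SubscriberState schema here is invented for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

// Hypothetical subscriber-state schemas: V2 adds a field with a default,
// so records written with V1 remain readable after the upgrade.
public class AvroUpgradeSketch {
    static final Schema V1 = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"SubscriberState\",\"fields\":["
      + "{\"name\":\"id\",\"type\":\"string\"},"
      + "{\"name\":\"balance\",\"type\":\"long\"}]}");

    static final Schema V2 = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"SubscriberState\",\"fields\":["
      + "{\"name\":\"id\",\"type\":\"string\"},"
      + "{\"name\":\"balance\",\"type\":\"long\"},"
      + "{\"name\":\"segment\",\"type\":\"string\",\"default\":\"unknown\"}]}");

    public static void main(String[] args) throws IOException {
        GenericRecord old = new GenericData.Record(V1);
        old.put("id", "sub-42");
        old.put("balance", 100L);

        // Serialize with the old (writer) schema.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder enc = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(V1).write(old, enc);
        enc.flush();

        // Deserialize with old writer schema + new reader schema:
        // the added field is filled in from its default.
        BinaryDecoder dec = DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
        GenericRecord upgraded = new GenericDatumReader<GenericRecord>(V1, V2).read(null, dec);
        System.out.println(upgraded.get("segment")); // -> unknown
    }
}
```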

Phase 3

Phase      # Customers    # Events
Legacy     10M            120M
Phase 1                   200M
Phase 2    unlimited
Phase 3                   300M

Phase 3 – Introducing MapR M7 Table
We ran extensive YCSB load tests to find the best table structure and read/update granularity. Main conclusions:
- M7 knows how to handle a very big heap - 90GB.
- Update granularity: small updates (using columns) = fast reads, whereas other KV stores must update the entire BLOB.
CSR tables migrated from Oracle to an M7 table (write path sketched below):
- Tens of billions of records.
- Sub-second random access per subscriber is required.
- 99.9% writes - by the runtime machines (almost every event-processing operation produces an update).
- 0.1% reads - by the customer's CSR representatives.
- Rows: one per subscriber key; tens of millions of rows.
- 2 column families - TTL 365 days, 1 version.
- Qualifier: key [date_class_event_id], value: the record; up to thousands per row.
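Since M7 tables expose the HBase API, the per-subscriber row layout described above could be written roughly as follows; the table name, column-family name, and event values are assumptions made for the example.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// Hypothetical write path for the CSR table: one row per subscriber,
// one qualifier per event (date_class_event_id), the record as the value.
public class CsrEventWriter {
    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("csr_events"))) {

            byte[] row = Bytes.toBytes("sub-42");                       // subscriber key
            byte[] family = Bytes.toBytes("e");                         // events CF (TTL 365 days)
            byte[] qualifier = Bytes.toBytes("20140115_RECHARGE_9871"); // date_class_event_id
            byte[] record = Bytes.toBytes("{...event record...}");

            // Column-level update: only this cell is written, which is what
            // keeps reads fast compared to rewriting a whole BLOB.
            table.put(new Put(row).addColumn(family, qualifier, record));
        }
    }
}
```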

Phase 3 – Introducing MapR M7 Table (cont.)
Resulting context:
- Choose the right features - nothing too demanding performance-wise.
- Tables are easy to create and manage - though some tweaking is still needed.
- No cross-table ACID - we had to develop a solution for keeping consistency across the M7 table, Oracle, and the file system.
- Hard for QA compared to an RDBMS - no easy way to query; we had to develop tools.

Phase      # Customers    # Events
Legacy     10M            120M
Phase 1                   200M
Phase 2    unlimited
Phase 3                   300M

Phase 4

Phase      # Customers    # Events
Legacy     10M            120M
Phase 1                   200M
Phase 2    unlimited
Phase 3                   300M
Phase 4

Phase 4 – Migrating OLTP features to M7 tables
Subscriber State table migrated from Oracle to an M7 table:
- 25% writes - by the runtime machines updating the state.
- 100% reads - by the runtime.
- Rows: one per subscriber key; tens of millions of rows.
- 1 column family - TTL -1 (no expiry), 1 version.
- YCSB used to validate the solution and feed the sizing model (a workload sketch follows below).
- Qualifier: key state_name, value: the state value. Dozens per row, but only ~10% are updated per event.
Subscriber Profile table migrated from MS-SQL to an M7 table; bulk-inserted once a day.
Outbound Queue table migrated from MS-SQL to an M7 table.
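The YCSB validation step might be approximated with a core workload along these lines; all numbers are illustrative, derived only from the 100% read / 25% update ratio above (4 reads per update).

```
# Hypothetical YCSB core workload approximating the Subscriber State pattern:
# every event reads the state, and roughly 1 in 4 events also updates it
# (100 reads : 25 updates -> 0.8 : 0.2).
workload=com.yahoo.ycsb.workloads.CoreWorkload
recordcount=10000000
operationcount=100000000
readproportion=0.8
updateproportion=0.2
insertproportion=0
scanproportion=0
# a minority of hot subscribers dominates traffic
requestdistribution=zipfian
```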

Phase 4 – Migrating OLTP features to M7 tables (cont.)
Resulting context:
- No longer dependent on Oracle for OLTP.
- Real-time processing can handle billions of events per day.
- Sizing is linear and easy to calculate: number of subscribers * state size * 80% should reside in cache (worked through below).
- HW spec per node: 128GB RAM, 12 SAS drives.
- Consistency management is very complicated.

Phase      # Customers    # Events
Legacy     10M            120M
Phase 1                   200M
Phase 2    unlimited
Phase 3                   300M
Phase 4
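A worked instance of the sizing rule (number of subscribers * state size * 80% in cache), mapped against the 128GB-RAM node spec from the slide; the subscriber count and per-subscriber state size are assumed figures, chosen only to show the arithmetic.

```java
// Worked example of the linear sizing rule:
//   cache needed = subscribers * state size * 80%
public class SizingModel {
    public static void main(String[] args) {
        long subscribers = 100_000_000L; // assumed: 100M subscribers
        long stateSizeBytes = 2 * 1024;  // assumed: ~2 KB of state per subscriber
        double cachedFraction = 0.8;     // 80% should reside in cache

        double cacheBytes = subscribers * (double) stateSizeBytes * cachedFraction;
        double cacheGiB = cacheBytes / (1024.0 * 1024 * 1024);

        // With the slide's HW spec of 128GB RAM per node:
        double nodeRamGiB = 128;
        System.out.printf("Cache needed: %.0f GiB -> ~%d nodes (by RAM alone)%n",
                cacheGiB, (long) Math.ceil(cacheGiB / nodeRamGiB));
    }
}
```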

Phase 5

Summary
- We migrated to Big Data over 5 product versions spanning 1.5 years.
- The software architects were dominant in defining the product roadmap.
- The software architect has a paramount role in a Big-Data architecture.
- Having a well-defined architecture allows for controlled, well-planned architecture changes with minimal to no rework.

Atzmon Hen-Tov, Lior Schachter