Building a High-Volume Reporting System on Amazon AWS with MySQL, Tungsten, and Vertica GAMIFIED REWARDS


What I’ll cover: our reporting/analytics growth stages, their pitfalls, and what we’ve learned:
1. Custom MySQL ETL via shell scripts, visualizations in Tableau
2. ETL via a custom Tungsten applier into Vertica
3. New Tungsten Vertica applier, built by Continuent
4. Sharded transactional system, multiple Tungsten Vertica appliers

Stage 1 : Custom MySQL ETL via shell scripts, visualizations in Tableau
1. On the slave, dump an hour’s worth of new rows via SELECT INTO OUTFILE.
2. Ship the data file to the aggregations host, discard the old hourly snapshot, and load the new one.
3. Run aggregation queries against the temporary snapshot and FEDERATED tables.
4. Tableau refreshes its extracts after the aggregated rows are inserted.
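
The first step of the Stage 1 flow can be sketched as follows. This is a minimal illustration, not the original shell scripts: the table name, timestamp column, and output path are hypothetical, and the statement is only built as a string here rather than run against a server.

```python
from datetime import datetime, timedelta
from typing import Tuple


def hourly_window(now: datetime) -> Tuple[datetime, datetime]:
    """Return the [start, end) bounds of the last complete hour before `now`."""
    end = now.replace(minute=0, second=0, microsecond=0)
    return end - timedelta(hours=1), end


def build_dump_sql(table: str, ts_column: str, outfile: str, now: datetime) -> str:
    """Build a SELECT ... INTO OUTFILE statement covering one hour of new rows.

    `table`, `ts_column`, and `outfile` are illustrative placeholders; they
    are assumed to be trusted values, since they are interpolated directly.
    """
    start, end = hourly_window(now)
    fmt = "%Y-%m-%d %H:%M:%S"
    return (
        f"SELECT * FROM {table} "
        f"WHERE {ts_column} >= '{start.strftime(fmt)}' "
        f"AND {ts_column} < '{end.strftime(fmt)}' "
        f"INTO OUTFILE '{outfile}' "
        "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' "
        "LINES TERMINATED BY '\\n'"
    )


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(build_dump_sql("events", "created_at", "/tmp/events_hourly.csv",
                         datetime(2012, 4, 11, 10, 25)))
```

Using a half-open window (start inclusive, end exclusive) means consecutive hourly runs neither skip nor double-count rows that land exactly on the hour boundary.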

Detour : RAID for the Win
Big drop in API endpoint latency (writes)

Stage 2 : ETL via a custom Tungsten applier into Vertica

Stage 2 : Customized Tungsten Replication Setup (slide dated 4/11/12)
[Diagram] Extract from master to log, then extract from log:
1. MySQL
2. Master Replicator: extract binlog to Tungsten log
3. Slave Replicator: filter (DDL & unwanted tables), then custom Vertica JDBC applier
4. Vertica

Stage 2 : Issues with the Custom Tungsten Filter
1. OLTP transactions on Vertica are very slow (10 transactions per second vs. around 1000 per second for a MySQL slave), so the slave applier could not keep up with the MySQL master.
2. The person who created the applier was no longer with the company.
3. The Tungsten setup, including the custom applier, was difficult to maintain and hard to move to other hosts.

Detour : flexible APIs and baseball schedules

Stage 3 : New Tungsten Vertica Applier

Stage 3: A Template-Driven Batch Apply Process
[Diagram] Tungsten Replicator pipeline:
1. MySQL
2. Extract-Filter-Apply stages
3. CSV files
4. COPY (template) into the staging table
5. DELETE, then INSERT (template) into the base tables

Staging table (sample rows):
233, d, 64, …, 1
233, i, 64, …, 2
239, I, 76, …, 3

Base tables (sample rows):
63, ‘bob’, 23, …
64, ‘sue’, 76, …
67, ‘jim’, 1, …
76, ‘dan’, 25, …
98, ‘joe’, 66, …
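
The staging-table pattern on this slide can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the Tungsten applier itself: it uses SQLite as a stand-in for Vertica, a plain bulk insert as a stand-in for COPY, hypothetical table and column names (`users`, `stage_users`), made-up data, and it reads the staging columns as sequence number, operation code, key, data, and row id, which is an interpretation of the sample rows rather than something the slides state.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical base and staging table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE stage_users "
        "(seqno INTEGER, opcode TEXT, id INTEGER, name TEXT, row_id INTEGER)"
    )
    conn.executemany("INSERT INTO users VALUES (?, ?)",
                     [(63, "bob"), (64, "sue"), (76, "dan")])
    conn.commit()
    return conn


def apply_batch(conn: sqlite3.Connection, staged_rows) -> None:
    """Load one batch into staging, then DELETE and INSERT against the base table."""
    cur = conn.cursor()
    cur.execute("DELETE FROM stage_users")
    # Stand-in for the bulk COPY of the CSV file into the staging table.
    cur.executemany("INSERT INTO stage_users VALUES (?, ?, ?, ?, ?)", staged_rows)
    # DELETE: remove every base row that has a 'd' entry in this batch.
    cur.execute(
        "DELETE FROM users WHERE id IN "
        "(SELECT id FROM stage_users WHERE opcode = 'd')"
    )
    # INSERT: add a key only if its latest staged change is an insert.
    # Assumes row_id increases in commit order within the batch.
    cur.execute(
        "INSERT INTO users (id, name) "
        "SELECT s.id, s.name FROM stage_users AS s "
        "WHERE s.opcode = 'i' AND s.row_id = "
        "(SELECT MAX(s2.row_id) FROM stage_users AS s2 WHERE s2.id = s.id)"
    )
    conn.commit()


if __name__ == "__main__":
    db = make_demo_db()
    # An update arrives as a delete plus an insert for the same key.
    apply_batch(db, [(233, "d", 64, "sue", 1),
                     (233, "i", 64, "susan", 2),
                     (239, "i", 77, "amy", 3)])
    print(db.execute("SELECT id, name FROM users ORDER BY id").fetchall())
```

The point of the pattern is that the analytic store sees two set-based statements per batch instead of one small transaction per source row.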

Stage 3 : Batch Applier Replication Setup
[Diagram] Extract from master to log, then extract from log:
1. MySQL
2. Master Replicator: extract binlog to Tungsten log
3. Slave Replicator: filter (use built-in filters; DDL ignored), then batch applier using SQL template commands
4. CSV: write data to disk files
5. COPY / INSERT into Vertica

Stage 3 : Solving Problems to Get the New Applier to Work
1. Testing: developed a lightweight testing mechanism for heterogeneous replication.
2. Batch applier implementation: two tries to get it right, including SQL templates and full datatype support.
3. Character sets: ensuring consistent UTF-8 handling throughout the replication chain, including CSV files.
4. Time zones: ensuring the Java VM handled time values correctly.
5. Performance: tweaked SQL templates to get a 50x boost over the old applier.
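
The character-set and time-zone items above come down to making both choices explicit wherever rows are written to CSV. The sketch below illustrates that idea in Python rather than in the replicator's Java code; the function names and the UTC normalization are assumptions made for the example, not details taken from the slides.

```python
import csv
import io
from datetime import datetime, timedelta, timezone


def to_utc_string(value: datetime) -> str:
    """Render a timezone-aware datetime as a UTC timestamp string."""
    if value.tzinfo is None:
        # A naive datetime carries no zone, so its meaning is ambiguous.
        raise ValueError("refusing naive datetime: time zone is ambiguous")
    return value.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


def write_rows_utf8(rows, stream) -> None:
    """Write rows as CSV to a binary stream, always encoding as UTF-8."""
    text = io.TextIOWrapper(stream, encoding="utf-8", newline="")
    writer = csv.writer(text)
    for row in rows:
        writer.writerow(
            [to_utc_string(v) if isinstance(v, datetime) else v for v in row]
        )
    text.flush()
    text.detach()  # leave the underlying binary stream open for the caller


if __name__ == "__main__":
    buf = io.BytesIO()
    local = timezone(timedelta(hours=-7))  # hypothetical source time zone
    write_rows_utf8([[1, "José", datetime(2012, 4, 11, 12, 0, tzinfo=local)]], buf)
    print(buf.getvalue())
```

Writing to a binary stream with a fixed encoding avoids depending on the platform default, which is one common way a CSV hand-off ends up with mixed encodings.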

Detour : Sharding or Learning How To Sleep In Any Position

Stage 4 : Sharded transactional system, multiple Tungsten Vertica appliers

Solving Problems to Scale Up the Replication Configuration
1. Implement remote batch apply so Tungsten can run off-board from Vertica.
2. Convert replication to a direct pipeline with a single service between MySQL and Vertica.
3. Create a script to deploy the replicator in a single command.
4. Create staging tables on the Vertica server.

Remaining Challenges to Complete Replication Setup
1. Configure replication for global and local DBShards data.
2. Ensure performance is up to snuff - currently at transactions per second.
3. Introduce intermediate staging servers to reduce the number of replication streams into Vertica.

Thank You!
In summary:
1. Tungsten is a great tool when it comes to MySQL ETL automation, so check it out as an alternative to custom in-house scripts or other options.
2. Vertica is a high-performance, scalable BI platform that now pairs well with Tungsten. Full360 offers a cloud-based solution.
3. If you’re just getting started on the BI front, hire a BI developer to focus on this stuff, if you can.
4. I see no reason why this framework couldn’t scale to easily handle whatever our business needs in the future.