Alibaba and PostgreSQL Guangzhou Zhang Practices on providing PG as a cloud service in Alibaba.

Slides:



Advertisements
Similar presentations
Refeng Wu CQ5 WCM System Administrator
Advertisements

The Architecture of Oracle
Acknowledgments Byron Bush, Scott S. Hilpert and Lee, JeongKyu
Overview of Database Administrator (DBA) Tools
Oracle9i Database Administrator: Implementation and Administration 1 Chapter 2 Overview of Database Administrator (DBA) Tools.
Page Footer Keed Education Oracle Database Administration Basic Copyright 2009 Keed Education BV Version Concept.
The Zebra Striped Network Filesystem. Approach Increase throughput, reliability by striping file data across multiple servers Data from each client is.
High Availability Group 08: Võ Đức Vĩnh Nguyễn Quang Vũ
1 Cheriton School of Computer Science 2 Department of Computer Science RemusDB: Transparent High Availability for Database Systems Umar Farooq Minhas 1,
Toolbox Mirror -Overview Effective Distributed Learning.
ManageEngine TM Applications Manager 8 Monitoring Custom Applications.
Chapter 9 Auditing Database Activities
1 - Oracle Server Architecture Overview
Backup and Recovery Part 1.
Module 14: Scalability and High Availability. Overview Key high availability features available in Oracle and SQL Server Key scalability features available.
Oracle Architecture. Database instance When a database is started the current state of the database is given by the data files, a set of background (BG)
NovaBACKUP 10 xSP Technical Training By: Nathan Fouarge
Backup & Recovery Concepts for Oracle Database
1 Copyright © 2005, Oracle. All rights reserved. Introduction.
© 2011 IBM Corporation 11 April 2011 IDS Architecture.
Distributed Deadlocks and Transaction Recovery.
SANPoint Foundation Suite HA Robert Soderbery Sr. Director, Product Management VERITAS Software Corporation.
Copyright © 2013 NetEase 马进 app DDB introduce.
CSC 456 Operating Systems Seminar Presentation (11/13/2012) Leon Weingard, Liang Xin The Google File System.
2 Copyright © 2006, Oracle. All rights reserved. Performance Tuning: Overview.
Chapter Oracle Server An Oracle Server consists of an Oracle database (stored data, control and log files.) The Server will support SQL to define.
Oracle10g RAC Service Architecture Overview of Real Application Cluster Ready Services, Nodeapps, and User Defined Services.
Basic Oracle Architecture
Sofia, Bulgaria | 9-10 October SQL Server 2005 High Availability for developers Vladimir Tchalkov Crossroad Ltd. Vladimir Tchalkov Crossroad Ltd.
CSE 781 – DATABASE MANAGEMENT SYSTEMS Introduction To Oracle 10g Rajika Tandon.
Ideas to Improve SharePoint Usage 4. What are these 4 Ideas? 1. 7 Steps to check SharePoint Health 2. Avoid common Deployment Mistakes 3. Analyze SharePoint.
7202ICT – Database Administration
Oracle Tuning Considerations. Agenda Why Tune ? Why Tune ? Ways to Improve Performance Ways to Improve Performance Hardware Hardware Software Software.
1 Oracle Architectural Components. 1-2 Objectives Listing the structures involved in connecting a user to an Oracle server Listing the stages in processing.
Copyright © Oracle Corporation, All rights reserved. 1 Oracle Architectural Components.
An Oracle server:  Is a database management system that provides an open, comprehensive, integrated approach to information management.  Consists.
UNIX File and Directory Caching How UNIX Optimizes File System Performance and Presents Data to User Processes Using a Virtual File System.
Microsoft ® Business Solutions–Navision ® 4.0 Development II - C/SIDE Solution Development Day 2.
A Guide to Oracle9i1 Database Instance startup and shutdown.
ArcGIS Server for Administrators
1 Chapter Overview Preparing to Upgrade Performing a Version Upgrade from Microsoft SQL Server 7.0 Performing an Online Database Upgrade from SQL Server.
Buffers Let’s go for a swim. Buffers A buffer is simply a collection of bytes A buffer is simply a collection of bytes – a char[] if you will. Any information.
1 Chapter Overview Performing Configuration Tasks Setting Up Additional Features Performing Maintenance Tasks.
1 MONGODB: CH ADMIN CSSE 533 Week 4, Spring, 2015.
Database Security and Auditing: Protecting Data Integrity and Accessibility Chapter 9 Auditing Database Activities.
IT Database Administration Section 09. Backup and Recovery Backup: The available options Full Consistent (cold) Backup Database shutdown, all files.
D Copyright © Oracle Corporation, All rights reserved. Loading Data into a Database.
Process Architecture Process Architecture - A portion of a program that can run independently of and concurrently with other portions of the program. Some.
GLOBAL EDGE SOFTWERE LTD1 R EMOTE F ILE S HARING - Ardhanareesh Aradhyamath.
Connect with life Vinod Kumar Technology Evangelist - Microsoft
Alwayson Availability Groups
4P13 Week 9 Talking Points
3 Copyright © 2007, Oracle. All rights reserved. Using the RMAN Recovery Catalog.
Log Shipping, Mirroring, Replication and Clustering Which should I use? That depends on a few questions we must ask the user. We will go over these questions.
Oracle Database Architectural Components
MySQL HA An overview Kris Buytaert. ● Senior Linux and Open Source ● „Infrastructure Architect“ ● I don't remember when I started.
 Database Administration Oracle Database Instance Management Starting Up and Shutting Down أ. ندى الغامدي, أ. ندى الطوالة.
Oracle Database High Availability
Jean-Philippe Baud, IT-GD, CERN November 2007
Physics validation database
Data Bridge Solving diverse data access in scientific applications
Oracle 11g Real Application Clusters Advanced Administration
Maximum Availability Architecture Enterprise Technology Centre.
A Technical Overview of Microsoft® SQL Server™ 2005 High Availability Beta 2 Matthew Stephen IT Pro Evangelist (SQL Server)
Farida Fassi, Damien Mercie
Oracle Database High Availability
Test Upgrade Name Title Company 9/18/2018 Microsoft SharePoint
Upgrading Your Private Cloud with Windows Server 2012 R2
Presentation transcript:

Alibaba and PostgreSQL Guangzhou Zhang Practices on providing PG as a cloud service in Alibaba

About this talk Firstly some background about PG in Alibaba Then issues and our solutions with providing PostgreSQL as a cloud service Finally future directions of our database cloud service Focuses on technical details and NOT from business-level or application development perspectives

PostgreSQL and Alibaba Alibaba Founded in 1999 , now one of the world largest internet companies Its cloud services (alicloud.com) cover both private and public cloud, which have gained the largest market share in China and being growing fast (revenue up 126% in 2015 Q4) Alibaba loves PostgreSQL Uses PG in its own internet businesses Government and state-owned companies are moving out of Oracle solutions. PG is the most perfect replacement for Oracle Counts on PG to attract Oracle users to its cloud services

PostgreSQL and Alibaba Alibaba uses PG for its own internet businesses Map service Online to Offline services CRM

PostgreSQL and Alibaba Provided PG as a cloud service at alicloud.com AliCloudDB database service for PG went online at alicloud.com in Cooperated with EDB to provide EDB’s Postgres Advanced Server as a cloud service

Architecture of cloud service for PG Master Slave HA Proxy layer LVS (Linux Virtual Server) PG Instances User requests thru VIP

Issues on enabling PG as a cloud service Transparent switch-over with Proxy Transparent connection pool with Proxy Handling OOM Handling IO hang Privilege Management

Needs for transparent switch-over There are quite a few cases requiring instance restart. And we implement a restart with switch-over Upgrade or degrade an instance’s class (e.g. from 2G mem to 4G) Move an instance from one machine to another PG kernel version upgrades Transparent switch-over is the process that does a switch-over without breaking existing user connections. This is key to SLA and customer satisfactory

A normal switch-over Master Slave HA Proxy layer PG Instances

A transparent switch-over Master Slave HA Proxy layer PG Instances

Transparent switch-over with Proxy Proxy does the connection re-establishing out of transaction boundaries Only for connections that have not done anything not re-creatable Temporary table usage Statement preparation Change the kernel to support a command like “set connection_user to xxx”for proxy to re-establish connection without an explicit authentication process

A transparent switch-over Master Slave HA Proxy layer PG Instances Change user to app user Application user

Needs for transparent connection pool Connection establishment is costly for PG Performance is bad for short-connection applications Backend process Proxy PG Instance

Transparent connection pool with Proxy Backend process Proxy PG Instance Conn pool

Transparent connection pool with Proxy Proxy needs to restore all the resources held by this connection before putting it into the connection pool DISCARD ALL is issued for this purpose Authentication needs to be performed against the incoming user when connection is reused Change the kernel to support “SET AUTH_REQ = ‘user/database/md5password/salt’” for authentication

Handling OOM OOM (out of memory) cases were frequently observed Cloud users tend to buy small class instance firstly, then regularly upgrade to higher class PostGIS large objects or big json objects are widely used in some instances Concurrent connections are not suitably configured by application When an instance goes OOM, one of its processes is picked up and killed by the Linux kernel. Postmaster process then detects the kill and restart the whole instance. All connections are then lost. How can we minimize OOM impact for users?

Handling OOM Linux Box CGroup backend CGroup backend

Handling OOM Linux Box CGroup backend CGroup backend CGroup Public CGroup Kill –USR2

Handling IO Hang Symbols of IO hangs (ext4 filesystem with mount –data=order on Linux) Message in PG error log:“WARNING: using stale statistics instead of current ones because stats collector is not responding” Long checkpoint fsync time (observed from error log when log_checkpoints set to on) Nearly all operations including setup of a new connection hang for > 10s. All instances on the same machine are affected. But IO usage is low (< 10%) for the problematic disk.

Handling IO Hang We observed such hangs are quite frequent. We found it gets worse when: Many instances shares one same disk and ext4 file system in our environment “create database template ”which will cause large file copying and fsyncing if template db is large

Why IO Hang? fsync can be slow Linux Box CGroup backend checkpointer fsync Ext4 journaling buffer for metadata Dirty data in IO queue File 2 metadata File 1 metadata

Why IO Hang? write() can be blocked by fsync() Linux Box CGroup backend checkpointer fsync Ext4 journaling buffer for metadata Dirty data in IO queue File 2 metadata File 1 metadata write

Handling IO Hang Mount ext4 with option data=writeback Do this only when your PG full_page_image has been set to on Change the kernel to call sync_file_range() before calling fsync in checkpointer process

Privilege Management Superuser privileges are kept from the cloud service users for security reason Users keep on asking for more “superuser”privileges for managing roles and data of the whole instance We have to loose privilege check for some cases to make users happy

Our solution for Privilege Management Creating a new role – rds_superuser Allow this role to manage all other non-superuser’s data Allow it to set specific configuration parameters In order not to break the catalog compatibility, we manage to reuse a flag in pg_authid to identify rds_superuser roles

Our solution for Privilege Management Table "pg_catalog.pg_authid" Column | Type | Modifiers rolname | name | not null rolsuper | boolean | not null rolinherit | boolean | not null rolcreaterole | boolean | not null rolcreatedb | boolean | not null rolcatupdate | boolean | not null rolcanlogin | boolean | not null

Future Directions Support data sharding across machines with proxy Master Slave Proxy layer LVS (Linux Virtual Server) User requests thru VIP Master Slave Master Slave Master Slave

Future Directions Support shared-data architecture MASTER SLAVE WRITE READ WRITE READ

Future Directions Providing Greenplum as a cloud service Master Segment Primary Proxy layer LVS (Linux Virtual Server) Master node Primary nodes Segment Mirror Mirror nodes

THANKS! Q&A