MySQL Web Reference Architectures Best Practices for Innovating on the Web
Safe Harbour Statement The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
Program Agenda Requirements for Web Innovation Reference Architectures Sizing & Topologies Enabling Technology Best Practices Hardware Sizing Operational Best Practices Resources Program Agenda
$1.7TR GLOBAL CONTRIBUTION 70+ NEW DOMAINS EVERY 60 SECONDS 2.4BN USERS $1.7TR GLOBAL CONTRIBUTION 70+ NEW DOMAINS EVERY 60 SECONDS 1BN+ USERS 20M APPS PER DAY 2x DATA GROWTH EVERY 14 MONTHS 72 HOURS UPLOADED EVERY MINUTE 350m TWEETS PER DAY 875k TPM DURING US PRESIDENTIAL ELECTION $1TR BY 2014 $700BN IN 2011 Growth in internet pentration rates, mobile – bandwidth and devices enables users consume web and mobile services at faster rate than ever before - communicate, socialize and engage, create and share information, entertained, or shop for products and services. In turn, driving huge growth in data volumes – double every months 6.7BN MOBILE SUBS IN 2012 1.2 BILLION iOS & ANDROID APPS DOWNLOADED PER WEEK 85% HANDSETS SHIPPED WITH A BROWSER
Infrastructure Requirements Agile, open & adaptable Simple & repeatable Low latency Scalable and Available Rock-solid security Low TCO To support this growth – core requirements in underlying infrastructure for web and mobile services Fast response times and continuous availability for exceptional user-experience Simplicity in design to achieve fast time to market Seamless scalability to meet rapidly growing demands Highly adaptable and agile to support rapidly evolving service requirements Cloud and DevOps friendly to automate deployment and management Open and extensible to support integration with big data and analytical systems Rock-solid security to protect customer data Low total cost of ownership for fast ROI
MySQL: Powering the Web & the Cloud MySQL is at the heart of the largest and most successful web, mobile and cloud services 9 of the top 10 most traficked properties Most deployed in the cloud Gives MySQL unique insight into how to build and operate databases behind these services: developers, consultants and support Packaged that insight into a series of Web Ref Archs
MySQL Web Reference Architectures Best practices for innovating in web & mobile services Fast Time to Market Open, Agile and Highly Adaptable Reduced Cost, Risk & Complexity On-Premise or in the Cloud What are they – best practices for building HA, highly scalable inf: Fast TTM Open, agile and highly adaptable, ie scale, changing user requirement Reduce cost risk and complexity Whether deployed on-prem or in cloud
Reference Architectures: Design Patterns On-Line Retail Small, Medium & Large: Database load & size User Authentication & Session Management Content Management eCommerce Analytics & Big Data Integration Social Networking Extra Large Recommended hardware Operational Best Practices Set of design patterns – on-line retail and social networking. Decomposed into core services underlying those apps: user authentication, etc
Reference Architecture Sizing Social Network Small Medium Large Extra Large Queries/Second <500 <5,000 10,000+ 25,000+ Transactions/Second <100 <1,000 Concurrent Read Users Write Users <10 1,000+ 2,500+ Database Size Sessions <2 GB <10 GB 20+ GB 40+ GB eCommerce <50 GB 50+ GB 200+ GB Analytics (Multi-Structured Data) <1TB 10+ TB 100+ TB Content Management (Meta-Data) <500 GB 1+ TB 2+ TB Ref Archs are defined by sizing of the workload MySQL needs to support – this chart shows actual sizing used: Size in 2 –dimensions - by the load on the db (number of queries, number of tx, number of users 2nd dimension is db size, split this out by the workload
Reference Architectures
Small: Web Reference Architecture Social Network Small Medium Large Extra Large Queries/Second <500 <5,000 10,000+ 25,000+ Transactions/Second <100 <1,000 Concurrent Read Users Write Users <10 1,000+ 2,500+ Database Size Sessions <2 GB <10 GB 20+ GB 40+ GB eCommerce <50 GB 50+ GB 200+ GB Analytics (Multi-Structured Data) <1TB 10+ TB 100+ TB Content Management (Meta-Data) <500 GB 1+ TB 2+ TB Read thru metrics
Small: Web Reference Architecture Single server supporting: Users & sessions eCommerce Content management MySQL replication Analytics and Backups If traffic volumes increase, scale session management first on a separate server Diagramatic representation of the arch Single MySQL server supporting all workloads Users and sessions Content Management Search Data replicated to slaves for back-up & analysis. Replication is a standard part of the MySQL database, so a replicated set of MySQL Master / Slave servers can be deployed immediately at installation. If service grows, it is recommended that the Session Management application be managed by MySQL and migrated to its own MySQL server. Complex to tune multiple applications on shared hardware running a single storage engine – config params for one workload (ie cont mgmt) often different from ideal parameters for another (ie ecomm) if you expect your service to scale, then start with Medium Ref Arch Limited Scalability Recommended to start with the Medium Architecture
MySQL Default Storage Engine InnoDB Data integrity: Fully ACID-compliant, crash-safe, Foreign Keys High concurrency: Row level locking, MVCC Scale up: 48+ threads, Scale-Out: MySQL replication Flexible: On-Line DDL, Full-text search, SQL & NoSQL APIs, Backup and restore buffer pool Instrumented & Monitored: Performance_Schema, MySQL Enterprise Monitor MySQL Server powered by InnoDB as default storage engine – gives users highly flex, general purpose database engine – list properties
HIPAA, Sarbanes-Oxley, PCI, etc. New! MySQL Enterprise Audit Policy-based Auditing for MySQL Applications MySQL Audit API– enables users to add auditing MySQL Enterprise Audit provides ready-made policy auditing Out-of-the-box logging of connections, logins, query activity across all or specific MySQL servers User defined policies, filtering and log rotation Dynamically enabled, disabled: no server restart XML-based audit stream per Oracle audit specification Data security on the web is critical and increasingly mandated – no matter the size of user-base for your application. PCI compliance guidelines ensure credit card data is secure within e-commerce apps. From a corporate standpoint, Sarbanes-Oxley, HIPAA, etc. guard medical, financial, public sector and other personal data with required logging, archiving and "upon request" access to audit trails that reveal the eyes and hands that have viewed and acted upon the most sensitive of data. In all use cases, requirements for capturing application level user activity are most commonly implemented on the back-end database. To meet these requirements, MySQL provides an open pluggable audit interface that enables all MySQL users to write their own auditing solutions based on application-specific requirements. To help users quickly and seamlessly add auditing compliance to new and existing services MySQL Enterprise Edition includes MySQL Enterprise Audit an easy to use policy-based auditing solution http://www.mysql.com/products/enterprise/audit.html Adds regulatory compliance to MySQL applications. HIPAA, Sarbanes-Oxley, PCI, etc.
MySQL Enterprise Audit Flow 3. Joe’s connection, query logged Server1 1. DBA Enables on Server1 MySQL Enterprise Audit Set Up and Use Case DBA enables auditing on MySQL Server When a user connects, all of their activities logged – as shown in section 3 2. User Joe Connects and Queries Server1
MySQL Enterprise Security External Authentication MySQL Authentication API – enables addition of user authentication MySQL Enterprise Security provides ready-made authentication modules PAM (Pluggable Authentication Modules) Access external authentication methods Standard interface (Unix, LDAP, Kerberos, others) proxied and non-proxied users Windows Access native Windows services Authenticate users already logged into Windows (Windows Active Directory) In addition to auditing, MySQL also supports an open, pluggable authentication interface that enables users to develop plug-ins to authenticate MySQL client connections against external resource such as LDAP, Linux PAM, Windows Active Directory, etc. This enables MySQL to easily integrate with existing security standards and infrastructure that have been established to protect data from your web and mobile applications. As with auditing, MySQL Enterprise Edition provides ready to use external authentication modules for applications that authenticate users via Pluggable Authentication Modules (“PAM”) or native Windows OS services. The modules are developed and fully supported by Oracle. Security and auditing important in all of the ref archs Integrates MySQL with existing security infrastructures and SOPs.
MySQL Enterprise Backup Online, high performance backup for InnoDB (scriptable interface) Full, Incremental, Partial Backups (with compression) Point in Time, Full, Partial Recovery options Enterprise Advisor Monitoring and Alerts on Backup Operations Metadata on status, progress, history Unlimited Database Size Cross-Platform Windows, Linux, Unix mysqlbackup MySQL Database Files MEB Backup Files Implementing proper database backup and disaster recovery plans to protect against accidental loss of data, database corruption, hardware/operating system crashes or any natural disasters is one of the most important responsibilities of the Database Administrator (DBAs). MySQL is distributed with the mysqldump client backup tool. Alternatively MySQL Enterprise Backup provides DBAs with a higher performance, online “hot” backup solution with data compression technology. MySQL Enterprise Backup performs online "Hot", non-blocking backups of MySQL databases. Full backups can be performed on all InnoDB data while MySQL is online, without interrupting queries or updates. In addition, incremental backups are supported so that only data that has changed from a previous backup are backed up. Also partial backups are supported when only certain tables or tablespaces need to be backed up. MySQL Enterprise Backup restores data from a full backup with full backward compatibility. Consistent Point-in-Time Recovery (PITR) enables restoration to a specific point in time. Using MySQL backups and binlog, you can also perform fine-grained roll forward recovery to a specific transaction. A partial restore allows recovery of targeted tables or tablespaces. In addition, you can restore backups to a separate location, or create clones for fast replication setup or administration. MySQL Enterprise Backup supports creating compressed backup files, typically reducing backup size from 70% to over 90% when compared to the size of actual database files, reducing storage and other costs. MySQL Enterprise Backup is provided as part of MySQL Enterprise Edition. Ensures quick, online backup and recovery of your on premise and Cloud based MySQL applications. 17
Medium: Web Reference Architecture Social Network Small Medium Large Extra Large Queries/Second <500 <5,000 10,000+ 25,000+ Transactions/Second <100 <1,000 Concurrent Read Users Write Users <10 1,000+ 2,500+ Database Size Sessions <2 GB <10 GB 20+ GB 40+ GB eCommerce <50 GB 50+ GB 200+ GB Analytics (Multi-Structured Data) <1TB 10+ TB 100+ TB Content Management (Meta-Data) <500 GB 1+ TB 2+ TB Setting design pattern for all additional architectures
Medium: Web Reference Architecture Each workload is deployed onto dedicated server & storage infrastructure, makes it easier to scale, also choose HA tech right for the workload This approach provides much simpler evolution – the tech are architecture are basis for the large and social n/working apps, so much easier to scale without rearchitecting Intro new components – see Caching tier deployed in session & content management components For content mgmt, we using MySQL Repl for scale and HA Session mgmt, we are using MySQL replication with GTID and failover utilities eComme, can configure semi-sync repl or use DRBD for sync repl Look at some of the design choices Green lines represent data flow to analytics system
Best Practices - Overview Medium Web Reference Architecture Server ratio: 10 application servers to each MySQL Server More for PHP applications, fewer for Java Add more slaves as the application tier scales Caching layer deployed in session & content management components Memcached or Redis are common Reads fulfilled from cache, relieving load on the source database servers NoSQL Memcached APIs can enable compression of caching and data layers To look at best practices for medium To size the application server to MySQL server ratio, a good rule-of-thumb is that each MySQL server will support 10 application servers, with more slaves added as application servers scale out. For PHP applications, more application servers are generally required, and for Java, it is less. Caching layer deployed in session and content mgmt workloads – Memcached been widely used with MySQL in some of the largest web properties, Redis also growing in popularity. these allow offload read queries from the source db, fulfilled from cache – enables higher scalability and faster responsiveness
Best Practices – Content Management Medium Web Reference Architecture Each slave can handle around 5,000 concurrent users Each master handles 20 slaves More slaves are often used in larger environments MySQL Replication for high availability & scale-out Meta data of content assets managed by MySQL Distributed File System / CDN / Cloud (i.e. S3) for physical storage of assets on High quality SAN (redundancy for HA) Across commodity nodes XOR Best practices for content mgmt This is a core part of the web service, so scalability is critical here. Using MySQL replication, built in to MySQL, for read scale – typically find 1 master driving 20-30 slaves, each slave supporting around 5k concurrent users These are guides only, and are highly dependent on a number of factors, including: Read and Write volumes Distribution and load balancing of web traffic Caching mechanisms used Distributed file systems are often used to store the physical assets of the content management system, with the metadata for each asset stored within InnoDB tables managed by MySQL. The content assets themselves – images, videos, documents, etc. – are not stored in the database, but rather within the file system.
Best Practices – Sessions & eCommerce Medium Web Reference Architecture Session Management & eCommerce Session data maintained for up to 1 hour in a dedicated partition, rolling partitions used to delete aged data NoSQL Memcached API for InnoDB can be used to scale session management eCommerce HA Semi-Synchronous replication or clustering (DRBD, shared storage, etc.) If web traffic grows, move Session Management to MySQL Cluster Persist session data for real-time personalization of user experience 99.999% availability and in-memory data management can reduce need for DRBD & caching Data is captured in Analytics Database, increasingly replicated to Big Data system for analysis Best practices for session mgmt In typical web properties, session state is typically maintained for 45 minutes to 1 hour in a dedicated database partition, use rolling partitions to quickly delete aged session data by dropping the relevant table. For eComm HA use shared storage or sync repl to protect data integrity – talk more later For data mining and business intelligence applications, session and ecommerce data is captured in an analytics database for off-line report generation – may also be replicated to Big Data system, ie Hadoop. If web traffic grows, recommendation is to scale by moving Session Management to MySQL Cluster Highly write-scalable, 99.999% availability and in-memory data management can reduce need memcached
MySQL HA Solutions In terms of HA – range of certified and supported solutions users can select – depending on app requirements, operational preferences From Replication – native in database Through to solutions bassed around DRBD and Oracle VM in clustering world, through to MySQL Cluster which is a shared nothing distributed implementation of MySQL Talk more about these now
MySQL Replication Duplicates database from a “master” to a “slave” Redundant copies of the data provide foundation for High Availability Scale out by distributing queries across the replication cluster Web / App Servers Writes & Reads Reads Master Core features of MySQL, native to the database, so no 3rd party components or adds to get the benefit MySQL Replication enables users to cost-effectively deliver application scalability and high availability. Many of the world's most trafficked web properties rely on MySQL Replication to rapidly scale beyond the capacity constraints of a single system, enabling them to serve hundreds of millions of users and handle exponential growth. MySQL Replication is easy to provision, allowing for simple Master/Slave topologies through to complex chained clusters achieving massive scalability. Slaves
High Availability & Scalability MySQL Replication Scale-out within and across data centers Self-healing and crash-safe Multiple topologies supported Master/Slave, Cascading, Circular Asynchronous as default with Semi- Synchronous as an option Replication utilities for fast provisioning Monitoring & best practices with MySQL Enterprise Monitor MySQL Replication is implemented by configuring one instance as a master, with one or more additional instances configured as slaves. The master will log the changes to the database, which are then sent and applied to the slave(s). Gives scalability and foundation for HA. Replication is self-healing, with introduction of GTIDs, very easy to track progress of repl through a cluster, and failover to most up todate slave By default, MySQL replication is asynchronous. The master does not wait for the slave to receive the update, and so is able to continue processing further write operations without being blocked as it waits for acknowledgement from the slave. This gives high performance, but if the master crashes before the event is replicated, that data will be lost. Alternatively, MySQL replication can be configured to be semi-synchronous. In this mode, a commit is returned to the master (and then the client) only when a slave has received the update. As a result, the risk of data loss from failure of the master failure is reduced. Using MySQL Replication, organizations are able to scale out their applications on commodity hardware by distributing their data across MySQL server farms, within and across data centers and regions. This gives users the flexibility to elastically add or remove instances to dynamically adjust capacity based on demand. In addition, long running jobs such as analytics workloads can be offloaded to slaves, allowing masters to be dedicated to serving real-time transactional workloads. Monitor and manage with GUI based MEM Relay Log
MySQL 5.6: Evolving Replication Huge advances made in 5.6 – focused on perf, mt slaves, apply updates across schemas HA with GTIDs and crash-safeness Data integrity with checksums TDR and utils to enhance agility Collectively, allow MySQL to scale out and ensure service continuity with much more perforance and less admin overhead
Shared-Disk Clustering for HA Clients VIP Stricter data durability, integrity constraints Shared storage persists commits across instances Clustering software manages data access Auto-failover of applications and database Deploy with MySQL replication for slave scale-out MySQL Certified & Supported Solutions Oracle VM Template Windows Failover Clustering Oracle Solaris Cluster Looking beyond repl for HA When handling use-cases such as eCommerce, some users prefer the stricter HA and data durability constraints afforded by Operating System (OS)-level clustering, which deploy systems in more tightly coupled failover clusters using heartbeating mechanisms and cluster resource managers. Multiple OS-Clustering solutions are certified and supported for MySQL, comprising both shared (i.e. SAN-based) and distributed (i.e. local storage). When using shared storage, writing data to storage which is independent of the MySQL instance, therefore if that instance fails, don’t lose commited data – so stricter gurantees that repl We can use shared disk clustering to protect masters, and repl to scale out slaves – so not trading away scalability Number of certified solutions: Oracle VM Template for MySQL Enterprise Edition: Packaged as a single downloadable image, the Oracle VM Template provides a pre-installed and pre-configured virtualized MySQL software image running on Oracle Linux and Oracle VM. Users can rapidly provision multiple, virtualized MySQL instances to an Oracle VM Server pool, which integrates HA capabilities to provide resource monitoring, failover, recovery and load balancing of MySQL. Oracle Solaris Cluster and Windows Failover Clustering with agents enabling failover and recovery of MySQL.
Shared Nothing Clustering for HA DRBD + Clustering Based on distributed storage, not a SAN Synchronous replication eliminates risk of data loss Open source, mature & proven Certified and fully supported by Oracle DRBD integrated into Oracle Linux Unbreakable Enterprise Kernel R2 Pacemaker and Corosync for clustering / failover Updates to stack via ULN channel Shared storage ensures the consistency of data across different MySQL instances. DRBD (Distributed Replicated Block Device) provides an alternative approach using storage which is local to each instance. Using synchronous replication, all updates are propagated to a standby node before commits are returned to the application. Like shared storage, this enables “lossless failover”, ensuring no committed transactions are lost in the event of an outage to the master. DRBD is complemented by Corosync and Pacemaker to provide clustering and resource management. All components are open source, enabling users to build high availability clusters from low cost, commodity infrastructure. In addition, the entire stack is certified and supported by Oracle.
MySQL Enterprise Scalability MySQL Thread Pool MySQL default thread-handling – excellent performance, can limit scalability as user connections grow Connections assigned to 1 thread for the life of the connection, same thread used for all statements Thread Pool API, enables users to build their own thread pool MySQL Thread Pool improves sustained performance/scale as user connections grow Thread Pool contains configurable number of thread groups, each manages up to 4096 re-usable threads Threads are prioritized, statements queued to limit concurrent executions, load on server, improve scalability as connections grow User connections are mapped to execution threads on a one-to-one basis with each connection/thread assignment remaining intact until the client terminates the connection. While this model serves and scales most web deployment use cases very well it does have the potential to limit scalability as connection and query loads rapidly increase. For the most highly trafficked applications when concurrent connections grow from hundreds to thousands and associated query executions grow proportionally In these use cases the MySQL Thread Pool addresses the limitations to scalability by: Managing/controlling query execution until the MySQL server has the resources to execute it. Splitting threads into managed Thread Groups. Inbound connections are assigned to a group via a round-robin algorithm and the number of concurrent connections/threads per group is limited based on queue prioritization and nature of queries awaiting execution. Transactional queries are given a higher priority in queue than non-transactional, but queue prioritization can be overridden at the user level as needed.
MySQL Enterprise Edition MySQL Community Server With Thread Pool Enabled MySQL Enterprise Edition With Thread Pool MySQL Community Server Without Thread Pool The result is sustained performance and scalability as concurrent user connections and workloads grow as shown here: MySQL 5.6.11 Oracle Linux 6.3, Unbreakable Kernel 2.6.32 4 sockets, 24 cores, 48 Threads Intel(R) Xeon® E7540 2GHz CPUs 512GB DDR RAM 60x Better Scalability with Thread Pool
Large: Web Reference Architecture Social Network Small Medium Large Extra Large Queries/Second <500 <5,000 10,000+ 25,000+ Transactions/Second <100 <1,000 Concurrent Read Users Write Users <10 1,000+ 2,500+ Database Size Sessions <2 GB <10 GB 20+ GB 40+ GB eCommerce <50 GB 50+ GB 200+ GB Analytics (Multi-Structured Data) <1TB 10+ TB 100+ TB Content Management (Meta-Data) <500 GB 1+ TB 2+ TB Move onto large web ref arch
Large: Web Reference Architecture The Large Reference Architecture builds upon the best practices defined for the Medium Reference Architecture. Master / Slave replication is used to scale-out the Content Management and Analytics components of the web property. The content management architecture would be able to scale to 100k+ concurrent users with around 20 slaves – again dependent on traffic patterns of the workload. Changes here are that we are using MySQL Cluster for the Session Mgmt and we’ve integrated Hadoop for processing of both structured data from MySQL and unstructured data from other sources to give more insight into our users, with results of analysis replicated back to MySQL to deliver targeted offers back at them. Also use MySQL replication to mirror this infastructure to another region or DC, for DR and data locality
MySQL Cluster: Overview Auto-Sharding, Multi-Master ACID Compliant, OLTP + Real-Time Analytics HIGH SCALE, READS + WRITES Shared nothing, no Single Point of Failure Self Healing + On-Line Operations 99.999% AVAILABILITY In-Memory Optimization + Disk-Data Predictable Low-Latency, Bounded Access Time REAL-TIME Key/Value + Complex, Relational Queries SQL + Memcached + JavaScript + Java + JPA + HTTP/REST & C++ SQL + NoSQL In terms of ref arch To handle the higher performance and availability demands of the “Large” web architecture, the session management and ecommerce databases are deployed on MySQL Cluster. With 2 x data nodes, it is possible to support 6,000 sessions (page hits) a second, with each page hit generating 8 – 12 database operations. By using the scale-out capabilities of MySQL Cluster, it is possible to combine the session management and ecommerce databases onto one larger cluster. Scale Writes as well as reads – does this via auto-sharding with multi-master arch. Can shard other databases, needs to be done at app layer – complex and intrusive. Cluster shards (spreads the dbase out across nodes) automatically at databae layer, so devs don’t have to worry about it 99.999 Avail – failures + online operations – scale online, modify schema online. Low latency for Real time responsiveness Flex of working with more than just SQL – makes it simple for developers to manage the db in their preferred lang Low TCO – commodity h/w with affordable subs and licensing Open Source + Commercial Editions Commodity hardware + Management, Monitoring Tools LOW TCO
MySQL Cluster Architecture Clients Application Layer Arch behind cluster. Apears as 1 logical database to app and developers, connect to any node, query is routed Under covers, 3 types of nodes – intelligence in data nodes. Tables auto-sharded across commodity h/w. Each data node manages a shard or fragment of the db. Creates a replica – all updates sync replicated between replicas. If failure, failover less than 1 sec. Data nodes handle all cluster membership, failover and recovery Connect to data nodes via Application layer, via MySQL Server, Could use one of NoSQL APIs, so these are cluster libraries embedded into your app All multi master, so update from any node in app layer instantly available to any other node Management Data Layer MySQL Cluster Data Nodes
MySQL & Hadoop Use-Case in Web Architecture Users Browsing Profile, Purchase History Recommendations Recommendations Web Logs: Pages Viewed Comments Posted Social media updates Preferences Brands “Liked” Look at why so many online properties integrate MySQL with Hadoop Users browsing online retail property – MySQL storing profile and purcahse history. This data loaded into Hadoop, where combined with web logs + social media updates, CDR – processed to find product preferences, interests – loaded back to MySQL, so when the user visits again, they get a whole series of recommendations based on their preferences Look at some of the technologies that support this integration Telephony Stream
MySQL in the Big Data Lifecycle ACQUIRE DECIDE BI Solutions Applier ANALYZE Data source for many Hadoop deployments Use tools such as Apache Sqoop to batch replicate data from MySQL into Hadoop. Also have Hadoop Applier for real time streaming of new data from MySQL to Hadoop Oracle BDA, includes range of tools such as Hadoop + R Use Sqoop to load data back into MySQL ORGANIZE
New: MySQL Applier for Hadoop Real-time streaming of events from MySQL to Hadoop Supports move towards “Speed of Thought” analytics Connects to the binary log, writes events to HDFS via libhdfs library Each database table mapped to a Hive data warehouse directory Enables eco-system of Hadoop tools to integrate with MySQL data See dev.mysql.com for articles Available for download now labs.mysql.com The MySQL Hadoop Applier is designed to enable real-time replication of events between MySQL and Hadoop. Replication via the Hadoop Applier is implemented by connecting to the MySQL master reading events from the binary log events as soon as they are commited on the MySQL master, and writing them into a file in HDFS. “Events” describe database changes such as table creation operations or changes to table data. The Hadoop Applier uses an API provided by libhdfs, a C library to manipulate files in HDFS. The library comes precompiled with Hadoop distributions. It connects to the MySQL master to read the binary log and then: Fetches the row insert events occurring on the master Decodes these events, extracts data inserted into each field of the row, and uses content handlers to get it in the format required Appends it to a text file in HDFS. http://dev.mysql.com/doc/refman/5.6/en/binary-log.html
NoSQL Interfaces for MySQL 10x faster read/write access to InnoDB or MySQL Cluster Bypasses SQL parsing Fast look-ups and data ingestion Maintain ACID guarantees Maintain SQL for complex queries Combine with On-Line DDL for schema evolution 38
Extra Large: Social Network Reference Architecture Small Medium Large Extra Large Queries/Second <500 <5,000 10,000+ 25,000+ Transactions/Second <100 <1,000 Concurrent Read Users Write Users <10 1,000+ 2,500+ Database Size Sessions <2 GB <10 GB 20+ GB 40+ GB eCommerce <50 GB 50+ GB 200+ GB Analytics (Multi-Structured Data) <1TB 10+ TB 100+ TB Content Management (Meta-Data) <500 GB 1+ TB 2+ TB
Extra Large: Social Network The Social Networking reference architecture re-uses many concepts defined in the “Medium” and “Large” reference architectures, including dedicated infrastructure for each of the workloads that make up the service. MySQL Cluster is used for authentication of users and to provide the shard catalog - used by applications to direct queries. User data is stored in database shards with high availability for each shard delivered by MySQL replication. Unlike the Web reference architectures, the Social Networking architecture manages session state on the shards themselves.
Best Practices Social Networking Reference Architecture Introduces Sharding Implemented at the application layer for scaling very high volume of writes Data divided into smaller sets, distributed across low-cost hardware Shards based on Hash of a single column – ie. User ID “Functional” sharding also an option Sharding only needed in a small percentage of workloads MySQL scale-up, coupled with replication will get users a long way! Most Web and mobile workloads are still read-intensive, ie record is read before updates applied MySQL Cluster supports Auto-Sharding + Cross-Shard JOINs Sharding is commonly used by large web properties to increase the scalability of writes across their database. Sharding – also called application partitioning - involves the application dividing the database into smaller data sets and distributing them across multiple servers. This approach enables very cost effective scaling as low-cost commodity servers can be easily deployed when capacity requirements grow. In addition, query processing affects only a smaller sub-set of the data, thereby boosting performance. It must be remembered that reads will still predominate in these environments as typically each record will be read before an update is applied. It is also important to emphasize that millions of users can still be serviced without requiring any database sharding at all, especially with the latest MySQL developments in scaling on multi-core commodity hardware and replication. Hashing against a single column (key) often proves the most effective means to shard data. In the Social Networking Reference Architecture we have sharded the user database by User ID, so Shard One handles the first third of users, the second shard handles the second third of users, and the remainder are stored in the third shard. An alternative approach used in some social networks is "functional sharding". In this scheme, each shard manages a different service or data type, with each shard storing the entire user base. The approaches described above initially provide the most logical sharding mechanism and make it simple to scale-out as the service grows. However, it is likely that some shards will contain more active users than other shards; meaning re-balancing of shards may be required in the future. Both the application and the database need to be sufficiently flexible to accommodate change in the architecture. Implementing a sharding catalogue is a common way to achieve this. The central catalogue keeps track of data distribution and the application bases its sharding on this catalogue. This way it is easy to rebalance even individual data objects.
Sharding Implementation Master Clients Slave Reads Writes Partitioning Logic 1 2 3 4 5 Shards Sharding is complex Recommend the Architecture and Design Consulting Engagement When using sharding, applications need to become “shard-aware” so that writes can be directed to the correct shard, and JOIN operations are minimized. If best practices are applied, then sharding becomes an effective way to scale relational databases supporting web-based write-intensive workloads. Slaves
Recommended Hardware – lots of innovation especially with multi-core, multi-thread CPU designs - Lots of question about the best h/w to run MySQL on
The Perfect MySQL Server Up to 64 x86-64 bit CPU threads (MySQL 5.6 and above). Recommended RAM at least equal to or larger than “hottest” (most regularly accessed) data set. Linux, Oracle Solaris or Microsoft Windows operating systems. Minimum of 4 x SSDs or HDDs. 8 – 16 drives will increase performance for I/O intensive applications. Hardware RAID with battery-backed cache. RAID 10 recommended. RAID 5 is suitable if the workload is read- intensive. 2 x Network Interface Cards and 2 x Power Supply Units for redundancy. With latest 5.5 release, big enhancements in both MySQL Server and InnoDB to scale on latest multi-core h/w. Seen linear scalability on up to 30 cores. On 5.1, up to 8 cores Recommend 64 bit x86 h/w, as address more memory Memory sizing 3-10x more memory that data Linux or Solaris best, Windows and Unix also fine. RAID 10 for most, RAID 5 OK if very read intensive Hardware RAID battery backed up cache critical! More disks are always better! 4+ recommended, 8-16 can increase IO performance if needed At least 2 x NICs for redundancy Slaves should be as powerful as the Master Oracle Sun X4170 for example
MySQL Cluster Hardware Selection - SQL Layer 4 - 24 x86-64 bit CPU threads Minimum 4GB of RAM Memory not as critical at this layer Requirements influenced by connections and buffers. 2 x Network Interface Cards & 2 x PSUs Linux, Solaris or Windows operating systems. Due to the distributed nature of MySQL Cluster, the recommended server configurations are slightly different than a MySQL server running the MyISAM or InnoDB storage engines. 45
MySQL Cluster Hardware Selection - Data Layer Up to 64 x86-64 bit CPU threads Use high frequency: enables faster processing of messages Calculating RAM per Data Node (in-memory database) Database Size * # Replicas * 1.25 / # data node 50GB database * 2 replicas * 1.25 / 2 data nodes = 64GB of RAM Non-indexed columns can be stored on disk, reduces RAM requirements 2 x NICs and 2 x PSUs Deep-dive into network best-practices in Ref Archs Guide 46
MySQL Cluster Hardware Selection: Disk Subsystem for Checkpointing Entry-Level Mid-Range High-End 1 x SATA 7200 RPM For read-mostly No redundancy (but other data node is the mirror) 1 x SAS 10K RPM or SSD Heavy duty (many MB/s) No redundancy (but other data node is the mirror) 4 x SAS 10-15K RPM or SSDs Disk redundancy (RAID1+0), hot swap REDO, LCP, BACKUP – written sequentually in small chunks (256KB) If possible, use Odirect = 1 LCP REDOLOG LCP / REDOLOG 47
MySQL Cluster Hardware Selection: Disk Data Storage Recommended Minimum High-End Recommendation 2 x SAS 10K RPM or 2 x SSD Use High-End Recommendation for heavy read / write workloads (1000's of 10KB records per sec) of data (i.e. Content Delivery platforms) Having TABLESPACE on separate disk is good for read performance Enable WRITE_CACHE on devices TABLESPACE LCP REDOLOG UNDOLOG UNDOLOG LCP (REDO LOG / UNDO LOG) TABLESPACE 1 TABLESPACE 2 4 x SAS 10-15K RPM or SSD (REDO LOG) 48
Operational Best Practices These aren’t mandatory in any way, but can make the environment more efficient, more scalable and increase HA
MySQL Enterprise Edition Highest Levels of Security, Performance and Availability Oracle Premier Lifetime Support MySQL Enterprise Security Oracle Product Certifications/Integrations MySQL Enterprise Audit MySQL Enterprise Monitor/Query Analyzer MySQL Enterprise Scalability MySQL Enterprise Backup For business critical applications that are part of web and mobile services, the support, advanced features and tooling that are included in MySQL Enterprise Edition and MySQL Cluster CGE will help you achieve the highest levels of performance, scalability and data security. MySQL Enterprise High Availability MySQL Workbench
MySQL Enterprise Monitor Web-based, global view of MySQL/Cluster applications (on- premise and Cloud deployments) Automated, rules-based monitoring and alerts (SMTP, SNMP enabled) Query capture, monitoring, analysis and tuning, correlated with Monitor graphs Real-time Replication Monitor with auto-discovery of master-slave topologies Integrated with Oracle Support MySQL Enterprise Monitor, included with MySQL Enterprise Edition, is a distributed web application that continually monitors your MySQL servers and proactively sends SNMP/SMTP alerts on potential problems and tuning opportunities, before they become costly outages Enterprise Dashboard for Monitoring all MySQL Servers Using the Enterprise Dashboard, you can monitor MySQL and OS specific metrics for single or groups of servers, and can stay on top of all their replication topologies. The Enterprise Dashboard is designed so you can easily understand the complete security, availability, and performance landscape of all MySQL servers in one place, all from a thin, browser-based console. Availability and Performance Diagnosis The Enterprise Dashboard includes a color-coded Heat Chart that provides an at-a-glance view into the availability and performance of all of the MySQL servers across the enterprise. A Virtual MySQL DBA Assistant!
Automated Advisors and Alerts Administration Monitors and Advises on Optimal Start up and Run time Configuration MySQL Cluster Monitors and Advises on status/ performance of MySQL Cluster Data Nodes Performance Monitors and Advises on Optimal Performance Variable Settings Built by DBA to Enforce Organization specific best practices Custom Replication Monitors and Advises on Master/Slave Latency Security Monitors and Advises on Unplanned Security Changes/Loopholes Upgrade Monitors and Advises on Bugs/Upgrades that affect current installation Schema Monitors and Advises on Unplanned Schema Change Memory Usage Monitors and advises on optimal memory/cache settings Backup Monitors and Advises on Backup/Recovery processes The MySQL Enterprise Monitor differs from traditional third-party database monitors in that it supplies a complete set of MySQL Advisors that are designed to automatically examine a MySQL server’s configuration, security, and performance levels; identify problems and tuning opportunities; and provide the you with specific corrective actions. 10 Advisors, 160+ Rules, 60+ MySQL, OS specific Graphs Saves time writing, deploying, versioning, maintaining custom scripts.
MySQL Query Analyzer Centralized monitoring of queries without Slow Query Log, SHOW PROCESSLIST; Aggregated view of query execution counts, time, and rows Visual “grab and go” correlation with Monitor graphs Enabled via Connectors (PHP, JDBC, .Net) or MySQL Proxy MySQL Query Analyzer: Included in the MySQL Enterprise Monitor, the Query Analyzer helps developers and DBAs improve complex query performance by accurately pinpointing SQL code that can be optimized. Queries are presented in an aggregated view across all MySQL servers so developers can filter for specific query problems and identify the code that consumes the most resources. Queries can be analyzed during active development and continuously monitored and tuned when in production. Saves you time parsing atomic executions from logs. Finds problems you cannot find yourself.
MySQL Cluster Manager Enhancing DevOps Agility, Reducing Downtime Automated Management Self-Healing HA Operations Start / Stop node or whole cluster On-Line Scaling On-Line Reconfiguration On-Line Upgrades On-Line Backup & Restore Node monitoring Auto-recovery extended to SQL + mgmt nodes Cluster-wide configuration consistency Persistent configurations HA Agents To further simplify administration, MySQL Cluster Manager 1.2 automates the integrated on-line back-up and restore functionality of MySQL Cluster. DBAs no longer need to log-in to each individual data node to view and / or restore a backup. Using MySQL Cluster Manager 1.2, restore operations are reduced to a single, simple command.
Reference Architecture - Summary Designed as a springboard to innovating on the web Based on insight from most successful web properties Best practices & repeatable technologies for Scale & HA To support this growth – core requirements in underlying infrastructure for web and mobile services Fast response times and continuous availability for exceptional user-experience Simplicity in design to achieve fast time to market Seamless scalability to meet rapidly growing demands Highly adaptable and agile to support rapidly evolving service requirements Cloud and DevOps friendly to automate deployment and management Open and extensible to support integration with big data and analytical systems Rock-solid security to protect customer data Low total cost of ownership for fast ROI
Next Steps MySQL Web Reference Architectures Whitepaper http://www.mysql.com/why-mysql/white-papers/mysql_wp_high-availability_webrefarchs.php MySQL 5.6: Developer and DBA Guide http://www.mysql.com/why-mysql/white-papers/whats-new-mysql-5-6/ Engage MySQL Consulting
MySQL Web Reference Architectures Building Massively Scalable Web Infrastructure 57