Mark Holliman, Wide Field Astronomy Unit, Institute for Astronomy, University of Edinburgh


Summary
Some astronomical survey databases are becoming so large that some curation tasks are approaching unreasonable timeframes. This is evident in surveys such as the UKIDSS Galactic Plane Survey (GPS) and the VISTA Variables in the Via Lactea (VVV). The VVV alone contains tens of TB of data and includes a detection table with >10^10 rows. While RDBMSs are capable of handling this much data, execution times for curation activities can stretch into weeks.
To address this issue, there are two main approaches:
1. Switch from an RDBMS to a key-value model or a column-oriented DB (e.g. Hadoop, MonetDB, etc.)
2. Throw hardware at it (e.g. SSDs, SAN, Lustre, GPFS, etc.)
As you can probably guess, this talk is about #2.

The Big Database Problem
In an RDBMS the main bottleneck on almost all operations is disk I/O. While parallel processing and increased RAM can address some performance issues, ultimate performance is dictated by how fast data can be moved to and from the storage medium.
For large databases (>1TB) the most cost-efficient storage medium is a RAID5/6/10 array of spinning disks (HDDs). These disks currently range in size from 1TB to 4TB. Such arrays provide redundancy in case of disk failure, while also speeding up I/O by spreading disk operations across multiple devices simultaneously.
Rotational speed is the main factor in determining an HDD's performance. The faster the disk spins, the faster read/write operations can occur. Most enterprise disks run at 7200RPM, though 10000RPM and 15000RPM disks are available (at a serious jump in price).
SSDs are just beginning to approach the size/price ratio necessary for large DBs.
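The link between spin speed and access time can be made concrete with the average rotational latency, i.e. the time for half a revolution. The sketch below is a standard rule-of-thumb calculation rather than a figure from the talk, and the function name is an assumption made for illustration. It ignores seek time and transfer rate, which also contribute to real HDD performance.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency in milliseconds: the time for half a revolution."""
    if rpm <= 0:
        raise ValueError("rpm must be positive")
    seconds_per_revolution = 60.0 / rpm
    return 0.5 * seconds_per_revolution * 1000.0


if __name__ == "__main__":
    # The three spindle speeds mentioned on the slide.
    for rpm in (7200, 10000, 15000):
        print(f"{rpm:>6} RPM: {avg_rotational_latency_ms(rpm):.2f} ms")
```

On this measure a 15000RPM disk waits roughly half as long as a 7200RPM disk for the platter to come round, which is one reason the faster disks command a higher price.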

Test Server Details
CPU: Intel Xeon 2.8GHz, 24 cores
RAM: 16GB
Disk subsystem (6Gbps SAS RAID):
  HDD1,2: 7 x 1TB HDD RAID5 arrays
  SSD1: 1 x 512GB Crucial SSD
  SSD2: 6 x 512GB Crucial SSD RAID5 array
OS: Windows Server 2008 R2
DBMS: SQL Server 2008 R2
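To put the array sizes in context, the usable capacity of each RAID level can be estimated from the disk count. The helper below is a minimal sketch using the usual parity and mirroring rules on nominal disk sizes; the function name is hypothetical, it ignores formatting overhead, and it assumes that HDD1 and HDD2 are each a separate 7-disk array.

```python
def usable_capacity(level: str, n_disks: int, disk_size: float) -> float:
    """Nominal usable capacity of an array, in the same unit as disk_size."""
    if level == "RAID5":
        if n_disks < 3:
            raise ValueError("RAID5 needs at least 3 disks")
        return (n_disks - 1) * disk_size  # one disk's worth of parity
    if level == "RAID6":
        if n_disks < 4:
            raise ValueError("RAID6 needs at least 4 disks")
        return (n_disks - 2) * disk_size  # two disks' worth of parity
    if level == "RAID10":
        if n_disks < 4 or n_disks % 2:
            raise ValueError("RAID10 needs an even number of disks, at least 4")
        return (n_disks // 2) * disk_size  # every disk is mirrored
    raise ValueError(f"unsupported RAID level: {level}")


if __name__ == "__main__":
    # Assumed reading of the slide: each HDD array is 7 x 1TB in RAID5.
    print("HDD array:", usable_capacity("RAID5", 7, 1.0), "TB")
    print("SSD2 array:", usable_capacity("RAID5", 6, 512), "GB")
```

Under these assumptions each HDD array offers about 6TB and the SSD array about 2.5TB, so the SSD array is the tighter constraint on which databases or files can be tested.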

Full DB Tests
The first tests involved placing an entire database on each particular disk subsystem and running a set of queries to measure performance. The queries were constructed to represent 3 specific use cases in order to identify exactly where the SSDs provide performance gains.
Database: 2MASS
Query 1, All Indexed Columns:
Select * From twomass_psc Where ((ra> 240) AND (ra 120) AND (ra -47) AND (dec 120) AND (dec<123))
Query 2, All Nonindexed Columns:
Select * From twomass_psc Where (j_m>16) AND (h_m>15.6 AND h_m 15.1 AND k_m<15.3)
Query 3, Mixed Index Columns:
Select * From twomass_psc Where (j_m>16) AND (h_m>15.6 AND h_m 15.1 AND k_m 230) AND (ra 110) AND (ra -47) AND (dec 120) AND (dec 1) OR (k_psfchi<0.4)) AND (dist_opt<.3) AND (k_msig_stdap<1)
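The three use cases (all indexed, all non-indexed, mixed) can be illustrated with a small timing harness. This is only a sketch under loud assumptions: it uses an in-memory SQLite table with synthetic rows instead of SQL Server and the real 2MASS data, the index on (ra, dec) and the numeric ranges are made up for illustration, and the helper names are hypothetical. It shows how such queries might be timed, not how the talk's measurements were taken, and it cannot show any HDD versus SSD difference.

```python
import random
import sqlite3
import time

# Hypothetical stand-ins for the three use cases; ranges are illustrative only.
QUERIES = {
    "all indexed": 'SELECT * FROM twomass_psc '
                   'WHERE ra > 120 AND ra < 123 AND "dec" > -47 AND "dec" < -44',
    "all non-indexed": "SELECT * FROM twomass_psc "
                       "WHERE j_m > 16 AND h_m > 15.6 AND k_m < 15.3",
    "mixed": "SELECT * FROM twomass_psc WHERE j_m > 16 AND ra > 120 AND ra < 123",
}


def build_demo_db(n_rows=20000, seed=0):
    """Create an in-memory table of synthetic rows, indexed on (ra, dec) only."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE twomass_psc (ra REAL, "dec" REAL, j_m REAL, h_m REAL, k_m REAL)'
    )
    rows = [
        (rng.uniform(0, 360), rng.uniform(-90, 90),
         rng.uniform(10, 18), rng.uniform(10, 18), rng.uniform(10, 18))
        for _ in range(n_rows)
    ]
    conn.executemany("INSERT INTO twomass_psc VALUES (?, ?, ?, ?, ?)", rows)
    conn.execute('CREATE INDEX idx_position ON twomass_psc (ra, "dec")')
    conn.commit()
    return conn


def time_query(conn, sql, repeats=3):
    """Return (row_count, best_wall_clock_seconds) over several runs."""
    best = float("inf")
    row_count = 0
    for _ in range(repeats):
        start = time.perf_counter()
        row_count = len(conn.execute(sql).fetchall())
        best = min(best, time.perf_counter() - start)
    return row_count, best


if __name__ == "__main__":
    conn = build_demo_db()
    for label, sql in QUERIES.items():
        rows, seconds = time_query(conn, sql)
        print(f"{label:>16}: {rows} rows in {seconds * 1000:.2f} ms")
```

To compare storage devices with a harness like this, the database would need to live in a file on each device under test, with caches cleared between runs so that the timings reflect disk I/O rather than memory.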

Full DB Test Results

SSD Cost Issue
While putting an entire database on SSDs is certainly preferable, it is unfortunately cost-prohibitive for very large archive databases (where 10s of TB are required).
At recent pricing: 1TB on SSD = ~£500 vs 1TB on HDD = ~£87
So with this in mind we ran a second battery of tests to assess whether sufficient performance gains can be achieved with a hybrid system, whereby some database files are placed on SSDs while the rest remain on standard HDD arrays.
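The cost argument for a hybrid layout can be worked through with the two per-TB prices quoted above. The sketch below is illustrative arithmetic only: the 20TB archive size and the 2TB SSD share are hypothetical, the function name is an assumption, and the figures cover raw media price without RAID parity overhead, controllers, or enclosures.

```python
SSD_GBP_PER_TB = 500.0  # approximate price quoted on the slide
HDD_GBP_PER_TB = 87.0   # approximate price quoted on the slide


def storage_cost(total_tb: float, ssd_tb: float = 0.0) -> float:
    """Raw media cost in GBP when ssd_tb of the total sits on SSD and the rest on HDD."""
    if not 0 <= ssd_tb <= total_tb:
        raise ValueError("ssd_tb must be between 0 and total_tb")
    return ssd_tb * SSD_GBP_PER_TB + (total_tb - ssd_tb) * HDD_GBP_PER_TB


if __name__ == "__main__":
    archive_tb = 20.0  # hypothetical archive size
    print("All HDD:", storage_cost(archive_tb))
    print("All SSD:", storage_cost(archive_tb, ssd_tb=archive_tb))
    print("Hybrid, 2TB on SSD:", storage_cost(archive_tb, ssd_tb=2.0))
```

For this hypothetical archive the all-SSD layout costs nearly six times the all-HDD one, while moving a small share of files to SSD adds comparatively little, which is the motivation for testing which files benefit most.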

SSD File Location Tests
Database: UKIDSS DR5
4 different representative queries were run (contact me if you want to see the actual SQL):
  #1: produce light curves
  #2: produce variable light curves
  #3: JOIN Source and Detection tables (single index)
  #4: JOIN Source and Detection tables (multiple indices)
4 different file/disk configurations:
  All files on HDD
  Indices on SSD (RAID)
  Source tables on SSD (RAID)
  Detection tables on SSD (RAID)

SSD File Location test results

Conclusions
The biggest performance gains from SSDs are seen on queries involving non-indexed columns.
RAID arrays of SSDs do provide performance gains over individual disks, though these gains are not universal or consistently scaled.
Placing either Source or Detection tables on SSDs results in large performance gains.
  Performance gains are heavily dependent on the query being run, and on how much of the query uses the tables on the SSDs.
  Due to SSD size constraints, Source tables are more likely to fit, as they tend to be smaller.
Curation activities could be optimized by moving database files from HDD to SSD for intensive operations, then moving them back to HDD once complete.
  This only works if file movement is fast enough that the transfer time is less than the extra time the operations would take on HDD-stored files.
Index file operations are actually slower on SSD than on HDD.
  This is a serious surprise.
  Further testing is necessary to determine whether this is due to the inherent sequential access method of HDDs, or to some quirk of the MS SQL index file format.
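The condition for temporarily staging files on SSD can be written as a simple break-even check: the SSD run time plus the time to copy the files there and back must be less than the HDD run time. The sketch below assumes a constant copy rate in both directions; the function name and all example numbers are hypothetical and are not measurements from the tests.

```python
def staging_pays_off(hdd_runtime_s: float, ssd_runtime_s: float,
                     file_size_gb: float, copy_rate_gb_per_s: float):
    """Return (worthwhile, total_seconds_with_staging) for a copy-run-copy-back plan."""
    if copy_rate_gb_per_s <= 0:
        raise ValueError("copy rate must be positive")
    copy_time_s = 2 * file_size_gb / copy_rate_gb_per_s  # HDD to SSD, then back
    total_s = ssd_runtime_s + copy_time_s
    return total_s < hdd_runtime_s, total_s


if __name__ == "__main__":
    # Hypothetical task: 10 hours on HDD, 3.3 hours on SSD, 500GB of files.
    worthwhile, total = staging_pays_off(
        hdd_runtime_s=36000, ssd_runtime_s=12000,
        file_size_gb=500, copy_rate_gb_per_s=0.25,
    )
    print("Staging worthwhile:", worthwhile, "total seconds:", total)
```

With these made-up numbers the copies add 4000 seconds and staging still wins comfortably, but a slower copy rate or a shorter task can easily reverse the outcome.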