1 PennySort Award Ceremony Beijing China 23 October 2006.

Presentation transcript:

1 PennySort Award Ceremony Beijing China 23 October 2006

2 Outline Penny Sort history and Award What I have been doing.

3 Benchmark History Wisconsin Bitton Boral DeWitt Turbyfill IBM TP 1-7 CA and Tony Lukes Debit Credit Gray Datamation Anon et al TPC-A MCC Boral &... TPC-B TPC-C TPC-W ? Teradata Bollinger &... TPC-D Sort PennySort MinuteSort TPC-H 2010

4 A Short History of Sort April Fools 1985: Datamation Sort –Sort 1M 100 B records –An IO benchmark: 15-min to 1 hr! 1993: {Minute | Penny} x {Daytona | Indy} 1998: TeraByte Sort Web site:

5 Ground Rules How much can you sort for a penny (or in a minute). –Hardware cost –Depreciated over 3 years –1M$ system gets about 1 second, –1K$ system gets about 1,000 seconds. –Time (seconds) = 946,080 / SystemPrice ($) Input and output are disk resident. Input is –100-byte records (random data) –key is first 10 bytes. Must create output file and fill with sorted version of input file. Daytona (product) and Indy (special) categories
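
The time budget in the ground rules follows from the 3-year depreciation: a penny buys 0.01 / SystemPrice of the machine's lifetime. A minimal Python sketch of that arithmetic is below. The function name and the 365-day year are assumptions chosen for illustration; the example prices are the ones quoted in the slides.

```python
# Seconds in the 3-year depreciation period (assuming 365-day years).
DEPRECIATION_SECONDS = 3 * 365 * 24 * 60 * 60  # 94,608,000


def penny_time_budget(system_price_dollars, budget_dollars=0.01):
    """Seconds of machine time that budget_dollars buys on a system of the given price."""
    return budget_dollars * DEPRECIATION_SECONDS / system_price_dollars


if __name__ == "__main__":
    # 1M$ and 1K$ are the rule-of-thumb cases; 1,469$ is the GpuTeraSort system price.
    for price in (1_000_000, 1_000, 1_469):
        print(f"${price:>9,}: {penny_time_budget(price):8.1f} s")
```

With these inputs the 1M$ system gets just under 1 second, the 1K$ system about 946 seconds, and the 1,469$ system about 644 seconds, which agrees with the elapsed time reported for GpuTeraSort.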

6 PennySort Hardware –266 MHz Intel PPro –64 MB SDRAM (10ns) –Dual Fujitsu DMA 3.2GB EIDE disks Software –NT workstation 4.3 –NT 5 sort Performance –sort 15 M 100-byte records (~1.5 GB) –Disk to disk –elapsed time 820 sec, cpu time = 404 sec

7 Daytona Terabyte Sort NEC Express/5800/1320Xd, 32x Itanium2 1.5 GHz, 128GB, 900 disk TPC-C machine. Striped across 20 HBA –Read and write at 3.5 GBps –Sort 34GB in 60 seconds. –Sort 1 TB in 33 minutes. Figure: Input Phase of 1 TB nSort

8 Sort Records 2006
Penny, Daytona: 344 million records (32 GB) in 1,679 seconds. Bytes-Split-Index Sort (BSIS). $760 system: 1.8 GHz AMD, 1 GB RAM, 4x80GB SATA disks, WindowsXP. Xing Huang and BinHeng Song, School of Software, Tsinghua U., Beijing, China; Bo Huang, Math&CS, Hunan U. of Technology, Zhuzhou, China.
Penny, Indy: 590 M records (55 GB) in 644 seconds. GpuTeraSort. 1,469$ system: 3 GHz Pentium IV, 2 GB RAM, 7800GT Nvidia graphics card, 9x80GB SATA disks (4 data and 5 “runs”), WindowsXP. Naga Govindaraju, Ritesh Kumar, Dinesh Manocha, Jim Gray; U. North Carolina at Chapel Hill, USA.
Minute, Daytona: 40 GB (400 million records). NeoSort. Windows, Fujitsu 32 Itanium2, 128 SAN disks. Chris Nyberg, Charles Koester, Ordinal Technology.
Minute, Indy (2005): 116GB (125 M records) in 58.7 seconds. SCS. Linux, 80 Itanium2, 2,520 SAN disks. Jim Wyllie, IBM Almaden Research.
TeraByte, Daytona (2004): 33 minutes. Nsort. Windows, 32 Itanium2, 2,350 SAN disks. Chris Nyberg, Charles Koester, Ordinal Technology.
TeraByte, Indy (2005): 435 seconds (7.25 minutes). SCS. Linux, 80 Itanium2, 2,520 SAN disks. Jim Wyllie, IBM Almaden Research.

9 Bytes Split Index Sort (BSIS) Xing Huang & BinHeng Song, Tsinghua; Bo Huang, Hunan U. of Technology. A radix-partition sort. Then merge the partitions. 344 million records (32 GB) in 1,679 seconds. $760 system: 1.8 GHz AMD, 1 GB RAM, 4x80GB SATA disks, WindowsXP. Phase 1: 66 MB/s, Phase 2: 28 MB/s See

10 Sort 100 byte records (minute / penny) Shows We Hit Memory Ceiling in Sort recs/s/cpu plateaued in 1995

11 Technology Trends: CPU and GPU [Figure: log of relative processing power for CPU and GPU. Clock-rate marks: 0.8 GHz, 1.6 GHz, 2.2 GHz, 4.4 GHz, 31 GHz. CPU: Moore’s Law trajectory, with cooling (cost) limitations. GPU: Moore’s Law 3 for 18 mo, then Moore’s Law trajectory. Other labels: Corporate DT SW Requirements; Value; Leading Edge; Mobile; Mainstream Desktop; DT ‘Replacement’; Enthusiast / Specialty; Graphics Req’mts (enhanced experience); Value / UMA.]

12 Moore’s Wall: Chip Heat Death Processor power density going to infinity. Solution: stabilize clock at ~5GHz Multi-core (aka MTA) (1,000 core?)

13 GPU TeraSort Naga Govindaraju, Ritesh Kumar, Dinesh Manocha, U. North Carolina at Chapel Hill. Use GPU for Phase 1 bitonic sort. 590 M records (55 GB) in 644 seconds. 1,469$ system: 3 GHz Pentium IV, 2 GB RAM, 7800GT Nvidia graphics card, 9x80GB SATA disks (4 data and 5 “runs”), WindowsXP. Phase 1: 185 MB/s, Phase 2: 150 MB/s See

14 Sort 100 byte records (minute / penny) Shows We Hit Memory Ceiling in Sort recs/s/cpu plateaued in 1995 Had to get GPU to get better Memory bandwidth SIGMOD 2006 GpuTeraSort GPU better memory architecture, so finally more records/second

15 2006 PennySort Price Breakdown. BSIS: $760. GpuTeraSort: $1470. Motherboard 16%, CPU 12%, GPU 18%, RAM 10%, Disk controller 6%, Disks 33%, Case, power, fan 3%, Assembly 2%.

16 Sort Performance/Price improved Based on parallelism and “commodity” not per-cpu performance.

17 Musings: PennySort=TBsort 2 pass so 3TB of disk = 8 disks if 400GB/disk = 0.5GBps (if each disk = 65 MBps) So, 6000 seconds (3TB/0.5GBps) So, node can cost 200$ Costs 10x that today maybe in 5 years?
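
The chain of estimates on this slide can be checked with a few lines of Python. This is a rough sketch, not the author's calculation: it assumes decimal units, reads the per-disk figure as 65 megabytes per second, and reuses the 3-year depreciation from the ground rules. The function and parameter names are made up for illustration.

```python
DEPRECIATION_SECONDS = 3 * 365 * 24 * 60 * 60  # 3-year depreciation, 365-day years


def pennysort_tb_node(disks=8, disk_mb_per_s=65, io_tb=3.0):
    """Return (aggregate GB/s, elapsed seconds, node price in $ that a penny covers)."""
    aggregate_gb_per_s = disks * disk_mb_per_s / 1000
    elapsed_s = io_tb * 1000 / aggregate_gb_per_s
    # A penny must pay for elapsed_s seconds out of the depreciation period.
    max_price = 0.01 * DEPRECIATION_SECONDS / elapsed_s
    return aggregate_gb_per_s, elapsed_s, max_price


if __name__ == "__main__":
    bw, secs, price = pennysort_tb_node()
    print(f"{bw:.2f} GB/s, {secs:.0f} s, node may cost about ${price:.0f}")
```

Unrounded, the sketch gives 0.52 GB/s, roughly 5,800 seconds, and a node price in the 160$ range, which the slide rounds to 0.5 GBps, 6000 seconds, and 200$.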

18 Musings: MinuteSort=TBsort Sorts 1TB in 1Minute 1 pass so 1TB of ram 266Gbps bisection bandwidth 1 pass so 2TB of IO in 60 sec => 600 disks => ~80 nodes: 8 disks 2GB ram => interconnect with 10Gbps Ethernet or 300 nodes at 1Gbps Ethernet. doable today
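
The sizing above can be approximated the same way. The sketch below assumes decimal units and borrows 65 MB/s per disk from the PennySort musing, since this slide does not state a per-disk rate. Treating the bisection bandwidth as the full 2TB of IO expressed in bits per second is also an assumption, although it happens to reproduce the slide's 266Gbps. All names are illustrative.

```python
import math


def minutesort_tb_cluster(io_tb=2.0, seconds=60, disk_mb_per_s=65, disks_per_node=8):
    """Estimate disks, nodes, and aggregate network rate for a 1 TB one-pass MinuteSort."""
    required_mb_per_s = io_tb * 1_000_000 / seconds
    disks = math.ceil(required_mb_per_s / disk_mb_per_s)
    nodes = math.ceil(disks / disks_per_node)
    aggregate_gbit_per_s = io_tb * 8_000 / seconds
    return {"disks": disks, "nodes": nodes, "aggregate_gbit_per_s": aggregate_gbit_per_s}


def per_node_gbit_per_s(aggregate_gbit_per_s, nodes):
    """Share of the aggregate rate each node's link must carry."""
    return aggregate_gbit_per_s / nodes


if __name__ == "__main__":
    est = minutesort_tb_cluster()
    print(est)
    for n in (80, 300):
        print(n, "nodes:", round(per_node_gbit_per_s(est["aggregate_gbit_per_s"], n), 2), "Gbps each")
```

The bare minimum comes out near 513 disks and 65 nodes, so the slide's 600 disks and ~80 nodes leave some headroom. At 80 nodes each link carries about 3.3 Gbps, which needs 10Gbps Ethernet; at 300 nodes it drops below 1 Gbps.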

19 What I Have Been Doing Traveling & Talking Helping Build the SkyServer and the Virtual Observatory Doing spatial geometry in SQL (no kidding)! Trying to get all science literature and data online and interlinked. and… –to blob or not to blob –disk reliability

20 To Blob or Not To Blob For objects X smaller than 1MB, Select X into x from T where key = 123 is faster than h = open(X); read(h,x,n); close(h). So, blob beats file for objects < 1MB (on SQL Server – what about other DBs?) Because DB is CISC and FS is RISC. Most things are less than 1MB. DB should work to make this 10MB. File system should borrow ideas from DB. “To BLOB or Not To BLOB: Large Object Storage in a Database or a Filesystem?” Rusty Sears, Catharine Van Ingen, Jim Gray, MSR-TR , April 2006
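
The two access paths being compared can be tried side by side with a small harness. The sketch below is only an illustration of the shape of the comparison: it swaps in SQLite and the local file system through Python's standard library, so its timings say nothing about the SQL Server result quoted on the slide. The table and column names follow the slide; the function names, object size, and repeat count are assumptions.

```python
import os
import sqlite3
import tempfile
import time


def read_blob(conn, key):
    """Database path: select X from T where key = ?"""
    return conn.execute("SELECT X FROM T WHERE key = ?", (key,)).fetchone()[0]


def read_file(path):
    """File-system path: open, read, close."""
    with open(path, "rb") as h:
        return h.read()


def compare(size_bytes=64 * 1024, repeats=200):
    """Time repeated reads of one object through each path; return (db_s, fs_s)."""
    payload = os.urandom(size_bytes)
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "object.bin")
        with open(path, "wb") as f:
            f.write(payload)
        conn = sqlite3.connect(os.path.join(tmp, "objects.db"))
        conn.execute("CREATE TABLE T (key INTEGER PRIMARY KEY, X BLOB)")
        conn.execute("INSERT INTO T (key, X) VALUES (?, ?)", (123, payload))
        conn.commit()

        start = time.perf_counter()
        for _ in range(repeats):
            blob = read_blob(conn, 123)
        db_seconds = time.perf_counter() - start

        start = time.perf_counter()
        for _ in range(repeats):
            data = read_file(path)
        fs_seconds = time.perf_counter() - start
        conn.close()
    # Both paths must return the same bytes that were stored.
    assert blob == data == payload
    return db_seconds, fs_seconds


if __name__ == "__main__":
    db_s, fs_s = compare()
    print(f"blob: {db_s:.4f} s, file: {fs_s:.4f} s")
```

Both paths here mostly hit warm caches, so a fair measurement would also need to vary object size around 1MB and account for fragmentation over time.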

21 How Often do Disks Fail? Observed failure rates.
Columns: System, Source, Type, Part Years, Fails, Fails/Year
TerraServer SAN, Barclay: SCSI 10krpm %; controllers %; san switch %
TerraServer Brick, Barclay: SATA 7krpm %
Web Property 1, anon: SCSI 10krpm 15, %; controllers %
Web Property 2, anon: PATA 7krpm 22, %; motherboard 3, %

22 What About Bit Error Rates Uncorrectable Errors on Read (UERs) –Quoted uncorrectable bit error rates to –That’s 1 error in 1TB to 1 error in 100TB –WOW!!! We moved 1.5 PB looking for errors Saw 5 UER events –3 real, 3 of them were masked by retry Many controller fails and system security reboots Conclusion: –UER not a useful metric – want mean time to data loss –UER better than advertised. Empirical Measurements of Disk Failure Rates and Error Rates Jim Gray, Catharine van Ingen, Microsoft Technical Report MSR-TR
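
The "better than advertised" conclusion can be made concrete by asking how many events the quoted rates would predict for 1.5 PB. The sketch below uses only the slide's own figures (1 error per 1TB to 1 error per 100TB, 1.5 PB moved, 5 events seen) and assumes decimal units. The function name is illustrative.

```python
TB = 10**12
PB = 10**15


def expected_uer_events(bytes_moved, bytes_per_error):
    """Expected number of uncorrectable read errors at a given quoted rate."""
    return bytes_moved / bytes_per_error


if __name__ == "__main__":
    moved = 1500 * TB  # 1.5 PB
    pessimistic = expected_uer_events(moved, 1 * TB)
    optimistic = expected_uer_events(moved, 100 * TB)
    observed = 5
    print(f"quoted rates predict {optimistic:.0f} to {pessimistic:.0f} events; observed {observed}")
```

The quoted range predicts between 15 and 1,500 events, so the 5 observed events fall below even the optimistic end.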

23 So, You Want to Copy a Petabyte? Today, that’s 4,000 disks (read 2k write 2k) Takes ~4 hours if they run in parallel, but… Probably not one file. You will see a few UERs. What’s the best strategy? How fast can you move a Petabyte from CERN to Pasadena? Is sneaker-net fastest and cheapest?
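
A quick calculation shows what the 4,000-disk, ~4-hour estimate implies per disk. This is a sketch under assumed decimal units with 2,000 disks on each side. The 65 MB/s comparison rate is borrowed from the PennySort musing and is not stated on this slide. Function names are illustrative.

```python
PB = 10**15


def implied_disk_rate_mb_per_s(total_bytes=PB, disks_per_side=2000, hours=4.0):
    """Per-disk transfer rate needed to finish the copy in the given time."""
    bytes_per_disk = total_bytes / disks_per_side
    return bytes_per_disk / (hours * 3600) / 1e6


def copy_hours(disk_mb_per_s, total_bytes=PB, disks_per_side=2000):
    """Copy time if every disk streams in parallel at disk_mb_per_s."""
    bytes_per_disk = total_bytes / disks_per_side
    return bytes_per_disk / (disk_mb_per_s * 1e6) / 3600


if __name__ == "__main__":
    print(f"implied rate: {implied_disk_rate_mb_per_s():.1f} MB/s per disk")
    print(f"at 65 MB/s: {copy_hours(65):.1f} hours")
```

Each disk holds 500 GB of the petabyte, so a 4-hour copy implies about 35 MB/s sustained per disk. At 65 MB/s the same copy would take a little over 2 hours, before any retries for UERs.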

24 UER things I wish I knew Better statistics from larger farms, and more diversity. What is the UER on a LAN, WAN? What is the UER over time: for a file on disk for a disk What’s the best replication strategy? –Symmetric (1+1)+(1+1) or triplex (1+1) + 1