A Case Study of Parallel I/O for Biological Sequence Search on Linux Clusters
Hong Jiang (Yifeng Zhu, Xiao Qin, and David Swanson)
Department of Computer Science and Engineering, University of Nebraska – Lincoln
April 21, 2004

Motivation (1): Cluster Computing Trends
• Cluster computing is the fastest-growing computing platform.
• 208 clusters were listed in the Top 500 (as of Dec. 2003).
• The power of cluster computing tends to be bounded by I/O.

Motivation (2)
• Biological sequence search is bounded by I/O
  – Databases are exploding in size, growing at an exponential rate
  – Databases exceed memory capacity, causing thrashing
  – The speed gap between CPU and I/O keeps widening
  [charts: database size over time; CPU utilization (%) drops when memory is low]
• Goals
  – I/O characterization of biological sequence search
  – Faster search by using parallel I/O
  – Using inexpensive Linux clusters
  – An inexpensive alternative to SAN or high-end RAID

Talk Overview
• Background: biological sequence search
• Parallel I/O implementation
• I/O characterizations
• Performance evaluation
• Conclusions

Biological Sequence Search
• Retrieving similar sequences from existing databases is important
  – Given a new sequence, biologists are eager to find similar known sequences
  – Used in AIDS and cancer studies
• Sequence similarity → structural similarity → functional similarity
• Unraveling the mysteries of life by screening databases
  [figure: unknown sequences are screened against databases to find homologous sequences]

BLAST: Basic Local Alignment Search Tool
• BLAST
  – The most widely used similarity search tool
  – A family of sequence database search algorithms
  – Query and database can each be DNA or protein
• Parallel BLAST
  – Sequence comparison is embarrassingly parallel: divide and conquer
  – Strategy 1: replicate the database but split the query sequence
    (examples: HT-BLAST, cBLAST, U. Iowa BLAST, TurboBLAST)
  – Strategy 2: split the database but replicate the query sequence
    (examples: mpiBLAST, TurboBLAST, wuBLAST)
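
To make the "split the database" strategy concrete, here is a minimal sketch that cuts a FASTA-format database into fragments by distributing sequences round-robin. The fragment count and file names are illustrative, and a real formatter would also build the binary BLAST database files for each fragment:

```c
#include <stdio.h>
#include <stdlib.h>

#define NUM_FRAGMENTS 8          /* assumed fragment count, for illustration */

int main(int argc, char **argv)
{
    if (argc != 2) { fprintf(stderr, "usage: %s database.fasta\n", argv[0]); return 1; }
    FILE *in = fopen(argv[1], "r");
    if (!in) { perror("fopen"); return 1; }

    FILE *out[NUM_FRAGMENTS];
    char name[64], line[4096];
    for (int i = 0; i < NUM_FRAGMENTS; i++) {
        snprintf(name, sizeof name, "frag_%d.fasta", i);
        out[i] = fopen(name, "w");
    }

    long nseq = -1;              /* index of the current sequence */
    while (fgets(line, sizeof line, in)) {
        if (line[0] == '>')      /* a FASTA header line starts a new sequence */
            nseq++;
        if (nseq >= 0)           /* copy the line into that sequence's fragment */
            fputs(line, out[nseq % NUM_FRAGMENTS]);
    }

    for (int i = 0; i < NUM_FRAGMENTS; i++) fclose(out[i]);
    fclose(in);
    return 0;
}
```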

Large Biology Databases
• Challenge: databases are large, doubling every 14 months.
• Public domain
  – NCBI GenBank: 26 million sequences; largest database 8 GBytes
  – EMBL: 32 million sequences; 120 databases
• Commercial domain
  – Celera Genomics, DoubleTwist, LabBook, Incyte Genomics

Parallel I/O: Alleviating the I/O Bottleneck in Clusters
The following bandwidths were measured on the PrairieFire cluster at the University of Nebraska:
  – PCI bus: 264 MB/s = 33 MHz × 64 bits; W: 209 MB/s, R: 236 MB/s
  – Disk: W: 32 MB/s, R: 26 MB/s
  – Memory: W: 592 MB/s, R: 464 MB/s, copy: 316 MB/s
A single disk is an order of magnitude slower than memory and the PCI bus, so striping data across the disks of many nodes helps close the gap.

Overview of CEFT-PVFS
• Motivations
  – High throughput: exploit parallelism
  – High reliability: assuming the MTTF of one disk is 5 years, the MTTF of a 128-node PVFS is MTTF of one node ÷ 128 ≈ 2 weeks (worked out below)
• Key features
  – Extension of PVFS
  – RAID-10-style fault tolerance
  – Free storage service, without any additional hardware cost
  – Pipelined duplication
  – “Leases” used to ensure consistency
  – Four write protocols that strike different balances between reliability and write throughput
  – Doubled parallelism for read operations
  – Hot spots skipped for reads
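
A quick back-of-the-envelope check of that reliability estimate, assuming node failures are independent and that the failure of any one of the 128 non-redundant servers makes the file system unavailable:

\[
\mathrm{MTTF}_{\mathrm{PVFS}} \;\approx\; \frac{\mathrm{MTTF}_{\mathrm{node}}}{128}
 \;=\; \frac{5\ \text{years}}{128}
 \;\approx\; \frac{260\ \text{weeks}}{128}
 \;\approx\; 2\ \text{weeks}
\]

Mirroring every data server RAID-10 style, as CEFT-PVFS does, removes this single-node point of failure at the cost of half the usable capacity.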

Double Read Parallelism in CEFT-PVFS
[figure: clients reach both the primary and mirror storage groups through a Myrinet switch]
Interleaved data from both groups are read simultaneously to boost peak read performance.
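
A minimal sketch of the interleaving idea (not CEFT-PVFS's actual client code): for each stripe of a large read, the client alternates between the primary group and the mirror group, so each group serves roughly half of the request in parallel. The stripe size and the issue_read helper are illustrative assumptions.

```c
#include <stddef.h>

#define STRIPE_SIZE (64 * 1024)   /* assumed stripe size, for illustration */

/* issue_read() stands in for whatever RPC the client uses to fetch `len`
 * bytes at `offset` from one server group; it is hypothetical. */
void issue_read(int group, size_t offset, size_t len, char *dst);

/* Read `len` bytes starting at `offset`, alternating stripes between the
 * primary group (0) and the mirror group (1) to double read parallelism. */
void interleaved_read(size_t offset, size_t len, char *buf)
{
    size_t done = 0;
    while (done < len) {
        size_t stripe_idx = (offset + done) / STRIPE_SIZE;
        size_t in_stripe  = (offset + done) % STRIPE_SIZE;
        size_t chunk      = STRIPE_SIZE - in_stripe;
        if (chunk > len - done)
            chunk = len - done;
        issue_read((int)(stripe_idx % 2),   /* even stripes -> primary, odd -> mirror */
                   offset + done, chunk, buf + done);
        done += chunk;
    }
}
```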

Skip Heavily Loaded Servers
[figure: a heavily loaded node behind the Myrinet switch is skipped; its mirror serves the reads instead]

Parallel BLAST over Parallel I/O
Software stack: mpiBLAST → NCBI BLAST library → parallel I/O libraries
• Avoid changing the search algorithms
• Only minor modifications to the I/O interfaces of NCBI BLAST, for example:
  read(myfile, buffer, 10) → pvfs_read(myfile, buffer, 10) or ceft_pvfs_read(myfile, buffer, 10)
• mpiBLAST was chosen arbitrarily
• The scheme is general to any parallel BLAST built on NCBI BLAST (true in most cases ☺)
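
A sketch of how such an I/O redirection might look, assuming the reads go through one thin wrapper. The pvfs_read/ceft_pvfs_read names come from the slide above; the compile-time switches and the wrapper itself are illustrative, not the NCBI toolkit's actual plumbing:

```c
/* blast_io.h -- hypothetical wrapper around the read path used by NCBI BLAST.
 * The backing file system is selected at compile time. */
#include <unistd.h>
#include <sys/types.h>

#if defined(USE_CEFT_PVFS)
ssize_t ceft_pvfs_read(int fd, void *buf, size_t count);   /* provided by CEFT-PVFS */
#  define blast_read(fd, buf, n)  ceft_pvfs_read((fd), (buf), (n))
#elif defined(USE_PVFS)
ssize_t pvfs_read(int fd, void *buf, size_t count);        /* provided by PVFS */
#  define blast_read(fd, buf, n)  pvfs_read((fd), (buf), (n))
#else
#  define blast_read(fd, buf, n)  read((fd), (buf), (n))   /* plain POSIX read */
#endif
```

With a header like this, the BLAST read calls would be routed through blast_read(), so switching between local disk, PVFS, and CEFT-PVFS becomes a recompile rather than an algorithm change.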

Original mpiBLAST
[figure: an mpiBLAST master and four mpiBLAST workers, each node with its own CPU and local disk, connected by a switch]
• The master splits the database into segments.
• The master assigns and copies segments to the workers.
• Each worker searches its own local disk using the entire query.
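
A bare-bones sketch of the master/worker pattern described above, written against standard MPI. This is not mpiBLAST's actual protocol; the fragment count and the run_blast_on_fragment helper are illustrative:

```c
#include <mpi.h>
#include <stdio.h>

#define NUM_FRAGMENTS 8                 /* assumed number of database fragments */

void run_blast_on_fragment(int frag);   /* hypothetical: search one local fragment */

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size < 2) { MPI_Finalize(); return 1; }   /* need at least one worker */

    if (rank == 0) {                    /* master: hand out fragment ids round-robin */
        for (int f = 0; f < NUM_FRAGMENTS; f++) {
            int worker = 1 + f % (size - 1);
            MPI_Send(&f, 1, MPI_INT, worker, 0, MPI_COMM_WORLD);
        }
        int stop = -1;                  /* then tell every worker it is done */
        for (int w = 1; w < size; w++)
            MPI_Send(&stop, 1, MPI_INT, w, 0, MPI_COMM_WORLD);
    } else {                            /* worker: search each assigned fragment */
        for (;;) {
            int frag;
            MPI_Recv(&frag, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            if (frag < 0) break;
            run_blast_on_fragment(frag);
        }
    }
    MPI_Finalize();
    return 0;
}
```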

mpiBLAST over PVFS / CEFT-PVFS
[figure: the mpiBLAST master and four workers run on the same nodes that serve as the metadata server and four data servers, connected by a switch]
• The database is striped over the data servers.
• Each mpiBLAST worker receives parallel I/O service.
• Fair comparison: servers and mpiBLAST processes overlap on the same nodes, and the segment copy time of the original mpiBLAST is not counted.
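
A sketch of the round-robin striping arithmetic a PVFS-style client performs once the database is striped across data servers. The 64 KB stripe size and four servers are assumptions for illustration, not the experiment's actual configuration:

```c
#include <stdio.h>

/* Map a logical file offset to (data server, offset within that server's file)
 * under simple round-robin striping. */
static void map_offset(long long off, long long stripe, int nservers,
                       int *server, long long *local_off)
{
    long long stripe_idx = off / stripe;
    *server    = (int)(stripe_idx % nservers);
    *local_off = (stripe_idx / nservers) * stripe + off % stripe;
}

int main(void)
{
    const long long stripe = 64 * 1024;   /* assumed 64 KB stripe size */
    const int nservers = 4;               /* assumed number of data servers */
    long long offsets[] = { 0, 100000, 1000000 };
    for (int i = 0; i < 3; i++) {
        int s; long long lo;
        map_offset(offsets[i], stripe, nservers, &s, &lo);
        printf("logical offset %lld -> server %d, local offset %lld\n",
               offsets[i], s, lo);
    }
    return 0;
}
```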

mpiBLAST over CEFT-PVFS with one hot spot skipped
[figure: mpiBLAST master and workers over a metadata server and mirrored data server pairs (1/1’, 2/2’) behind a switch; one data server is a hot spot]
• Reads get double the degree of parallelism.
• A hot spot may exist since the servers are not dedicated.
• Redundancy is exploited to avoid performance degradation: reads aimed at the hot server are served by its mirror instead (see the sketch below).
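
A minimal sketch of the load-aware replica choice. The load metric, threshold, and helper function are illustrative; CEFT-PVFS's real heuristic may differ:

```c
/* Pick which copy of a stripe to read: prefer the group this stripe would
 * normally come from, but fall back to the mirror when the preferred server
 * reports a load above a threshold. current_load() is a hypothetical query
 * of per-server load that the client is assumed to track. */
#define LOAD_THRESHOLD 0.8

double current_load(int group, int server);   /* hypothetical load probe */

int choose_group(long long stripe_idx, int server)
{
    int preferred = (int)(stripe_idx % 2);    /* interleave groups by default */
    int mirror    = 1 - preferred;
    if (current_load(preferred, server) > LOAD_THRESHOLD &&
        current_load(mirror, server) <= LOAD_THRESHOLD)
        return mirror;                        /* skip the heavily loaded copy */
    return preferred;
}
```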

Performance Measurement Environment
• PrairieFire cluster at UNL
  – Peak performance: 716 Gflops
  – 128 IDE disks, 2.6 TeraBytes total capacity
  – 128 nodes, 256 processors
  – Myrinet and Fast Ethernet
  – Linux 2.4 and GM
• Measured performance
  – TCP/IP: 112 MB/s
  – Disk writes: 32 MB/s
  – Disk reads: 26 MB/s

I/O Access Patterns
• Database: nt, the largest at the NCBI web site
  – 1.76 million sequences, 2.7 GBytes
• Query sequence: a subset of ecoli.nt (568 bytes)
  – Statistics show that typical query lengths are a few hundred bytes
• I/O patterns of a specific case: 8 mpiBLAST workers on 8 fragments
  – 90% of operations are reads
  – Large reads and small writes
  – I/Os gradually drift out of phase across workers; this asynchrony is beneficial

I/O Access Patterns (cont.)
[figure: read request size and count as the number of database fragments varies from 5 to 30]
• We focus on read operations because of their dominance.
• I/O patterns with 8 workers but different numbers of fragments:
  – Dominated by large reads.
  – 90% of all data read from the databases is delivered by only 40% of the requests.

mpiBLAST vs. mpiBLAST-over-PVFS
Comparison assuming the SAME amount of resources (chart annotations: -8%, 28%, 27%, 19%):
• With 1 node, mpiBLAST-over-PVFS performs worse.
• Parallel I/O benefits performance.
• The gains of PVFS become smaller as the number of nodes increases.

mpiBLAST vs. mpiBLAST-over-PVFS (cont.)
Comparison assuming DIFFERENT amounts of resources (e.g., 4 CPUs with 4 disks vs. 4 CPUs with 2 disks):
• mpiBLAST-over-PVFS outperforms due to parallel I/O, and is better even with fewer resources.
• The gain from parallel I/O saturates eventually (Amdahl’s Law).
• As the database scales up, the gain becomes more significant.

mpiBLAST-over-PVFS vs. mpiBLAST-over-CEFT-PVFS
Comparison assuming different numbers of mpiBLAST workers. PVFS uses 8 data servers; CEFT-PVFS mirrors 4 servers onto 4; both cases use 8 disks (measured differences: -1.5%, -10%, -3.6%, -2.6%):
• mpiBLAST is read-dominated.
• PVFS and CEFT-PVFS offer the same degree of parallelism for reads.
• The overhead of CEFT-PVFS is negligible.

mpiBLAST-over-PVFS vs. mpiBLAST-over-CEFT-PVFS when one disk is stressed
• Configuration: PVFS with 8 servers; CEFT-PVFS with 4 servers mirrored onto 4.
• The disk of one data server is stressed by repeatedly appending data to a file, using the stress program below.
• While the original mpiBLAST and the one over PVFS degraded by factors of 10 and 21, our approach degraded by only a much smaller factor.

Disk stress program:

```c
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    char *m = malloc(1 << 20);                        /* M = 1 MByte buffer          */
    memset(m, 0, 1 << 20);
    int fd = open("F", O_WRONLY | O_CREAT | O_APPEND | O_SYNC, 0644);  /* file F     */
    struct stat st;
    while (1) {
        fstat(fd, &st);
        if (st.st_size > 2LL * 1024 * 1024 * 1024)    /* if size(F) > 2 GB           */
            ftruncate(fd, 0);                         /*   truncate F to zero bytes  */
        else
            write(fd, m, 1 << 20);                    /* else synchronously append M */
    }
}
```

Conclusions
• Clusters are effective computing platforms for bioinformatics applications.
• A higher degree of I/O parallelism may not lead to better performance unless the I/O demand remains high (larger databases).
• Doubling the degree of I/O parallelism for read operations gives CEFT-PVFS read performance comparable to PVFS.
• Load balancing in CEFT-PVFS is important and effective.

Thank you. Questions?