Design and Performance Evaluation of Networked Storage Architectures. Xubin He, July 25, 2002, Dept. of Electrical and Computer Engineering.

Presentation transcript:

Design and Performance Evaluation of Networked Storage Architectures
Xubin He
July 25, 2002
Dept. of Electrical and Computer Engineering, University of Rhode Island

July 25, 2002 | High Performance Computing Lab (HPCL), URI

Outline
- Introduction
- STICS: SCSI-To-IP Cache for Storage Area Networks
- DRALIC: Distributed RAID & Location Independence Cache
- vcRAID: Large Virtual NVRAM Cache for Software RAID
- Performance Evaluation of Distributed Web Server Architectures
- Conclusions

Background
- Data storage plays an essential role in today's fast-growing, data-intensive network services.
- Online data storage doubles every 9 months.
- Storage is approaching more than 50% of IT spending. The storage cost will be up to 75% of the total IT cost in 2003.

A Server-to-Storage Bottleneck (figure; source: Brocade)

How to deploy data over the network efficiently and reliably?
Motivations (figure labels: STICS, DRALIC, vcRAID):
- Disparities between SCSI and IP
- SCSI remote handshaking over IP
- Processor-disk gap growing
- High-speed network
- Large client memories
- Cheap disk and RAM, expensive NVRAM
- RAID5 is reliable, but has low performance
- E-commerce over the Internet, distributed web servers

Outline
- Introduction
- STICS: SCSI-To-IP Cache for Storage Area Networks
- DRALIC: Distributed RAID & Location Independence Cache
- vcRAID: Large Virtual NVRAM Cache for Software RAID
- Performance Evaluation of Distributed Web Server Architectures
- Conclusions

Introducing a New Device: STICS
Whenever there is a disparity, a cache helps.
Features of STICS:
- Smooth out disparities between SCSI and IP
- Localize the SCSI protocol and filter out unnecessary traffic, reducing the bandwidth requirement
- Nonvolatile data caching
- Improve performance, reliability, manageability, and scalability over current iSCSI systems

System Overview (figure). A STICS connects to the host via a SCSI interface and connects to other STICS units or NAS via the Internet.
Figure labels: Host 1, Host 2 or Storage, Host M or Storage; STICS 1, STICS 2, STICS 3, STICS N; NAS; SCSI disks or SAN; links labeled SCSI and TCP/IP; Internet.

STICS Architecture (figure). Components: SCSI interface, processor, RAM, log disk, storage device, network interface.

Internal Cache Structure (figure). Labels: log disk, metadata, memory cache, data cache.

Basic Operations
Write
- Write requests from the host via SCSI
- Write requests from another STICS via NIC
Read
- Read requests from the host via SCSI
- Read requests from another STICS via NIC
Destage
- RAM -> log disk
- Log disk -> storage device
Prefetch
- Storage device -> RAM
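The four operations above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the STICS implementation: the class name `SticsCacheSketch`, the dictionary-based tiers, the block-count capacity, and the policy of destaging all of RAM when it overflows are hypothetical choices made for illustration.

```python
class SticsCacheSketch:
    """Toy model of the RAM -> log disk -> storage device hierarchy (hypothetical)."""

    def __init__(self, ram_capacity=2):
        self.ram_capacity = ram_capacity  # assumed capacity, counted in blocks
        self.ram = {}       # block number -> data, fastest tier
        self.log_disk = {}  # nonvolatile staging tier
        self.storage = {}   # backing storage device

    def write(self, block, data):
        """Accept a write (from the host or a peer) into RAM; destage on overflow."""
        self.ram[block] = data
        if len(self.ram) > self.ram_capacity:
            self.destage_ram_to_log()

    def destage_ram_to_log(self):
        """Destage step 1: RAM -> log disk."""
        self.log_disk.update(self.ram)
        self.ram.clear()

    def destage_log_to_storage(self):
        """Destage step 2: log disk -> storage device."""
        self.storage.update(self.log_disk)
        self.log_disk.clear()

    def read(self, block):
        """Look for the newest copy first: RAM, then log disk, then storage."""
        for tier in (self.ram, self.log_disk, self.storage):
            if block in tier:
                return tier[block]
        return None

    def prefetch(self, block):
        """Prefetch: storage device -> RAM, unless a newer copy is already cached."""
        if block in self.storage and block not in self.ram and block not in self.log_disk:
            self.ram[block] = self.storage[block]
```

In this sketch a read never needs to know which tier holds the data, which is the property that lets the cache sit transparently between the SCSI side and the network side.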

Web-based Network Management (figure). Labels: web browser-based manager, HTTP, servlet, management application, TCP/IP, local management application.

Implementation Platform
- A STICS block is a PC running Linux
- OS: Linux with kernel [version missing]
- Compiler: gcc
- Interfaces: SCSI, IP

Performance Evaluations
Methodology
- iSCSI implementation on Linux by Intel (iSCSI)
- Initial STICS implementation on Linux, in two modes:
  - Immediate report (STICS-Imm)
  - Report after complete (STICS)
Workloads
- PostMark by Network Appliance: throughput
  - Two configurations:
    - Small: 1000/50k/436MB
    - Large: 20k/100k/740MB
- EMC trace: response time
  - More than 230,000 I/O requests
  - Data set size: >900MB

Experimental Settings (two figures)
- iSCSI configuration. The host Trout establishes a connection to the target, and the target Squid responds and connects. Squid then exports its hard drive, and Trout sees the disks as local. Figure labels: Host (Trout), NIC, Switch, Target (Squid), SCSI, Disks, Cod; link carries iSCSI commands and data.
- STICS configuration. The STICS units cache data from both SCSI and the network. Figure labels: Host (Trout), STICS 1, Switch, STICS 2, Target (Squid), SCSI, Disks, Cod; link carries block data.

PostMark Results: Throughput

Ave. Improvement    STICS-Imm    STICS
Small set           226%         64%
Large set           318%         97%

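Where does the benefit come from? Network traffic analysis (two tables; cell values only partially preserved)
- Number of packets with different sizes (bytes), with columns ranging up to >1024. Surviving fragments: iSCSI 71,937, ,415,912; STICS 4431, ,827.
- Total packets, small packets (%), bytes transferred, bytes per packet. Surviving fragments: iSCSI 3,353, %1,914,566,; STICS 1039, %980,963,.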

EMC Trace Results: Response Time
Histograms of I/O response times for trace EMC-tel:
a) STICS with immediate report (2.7 ms)
b) STICS with report after complete (5.71 ms)
c) iSCSI (16.73 ms)
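The three response times above can be turned into relative speedups with simple arithmetic. The sketch below assumes the figures in parentheses are mean response times for the whole trace, which the slide does not state explicitly; the dictionary and function names are illustrative only.

```python
# Response times in milliseconds, as listed on the slide.
RESPONSE_MS = {"STICS-Imm": 2.7, "STICS": 5.71, "iSCSI": 16.73}


def speedup_over(baseline, candidate, times=RESPONSE_MS):
    """Return how many times faster `candidate` responds than `baseline`."""
    return times[baseline] / times[candidate]


if __name__ == "__main__":
    for name in ("STICS-Imm", "STICS"):
        print(f"{name} vs iSCSI: {speedup_over('iSCSI', name):.2f}x")
```

Under that assumption, immediate report responds roughly six times faster than iSCSI, and report after complete roughly three times faster.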

Summary
- A novel cache storage device that adds a new dimension to networked storage
- Significantly improves the performance of iSCSI
- A cost-effective solution for building an efficient SAN over IP
- Allows easy manageability, maintainability, and scalability

Outline
- Introduction
- STICS: SCSI-To-IP Cache for Storage Area Networks
- DRALIC: Distributed RAID and Location Independence Cache
- vcRAID: Large Virtual NVRAM Cache for Software RAID
- Performance Evaluation of Distributed Web Server Architectures
- Conclusions

Web Servers
- Overhead caused by the file system (FS) is high
- An enterprise web server is expensive
  - A Fujitsu server: more than $5 million
- PCs are cheap: $1000
- Disks: $160/120GB (IBM

My Solution
- Combine or bridge the disk controller and network controller of existing PCs interconnected by a high-speed switch
- Share memory and storage among peers

Performance analysis
- B: data block size (8KB)
- N: number of nodes
- H_lm: local memory hit ratio
- H_rm: remote memory hit ratio
- T_lm: local memory access time
- T_rm: remote memory access time
- T_raid: access time from the distributed RAID
- T_dralic: average response time of the DRALIC system
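The slide's own equation did not survive in this transcript. As a hedged illustration of how the symbols above could combine, the sketch below uses a generic weighted-average model for a three-level hierarchy: a request is served from local memory, else from remote memory, else from the distributed RAID. Treating H_lm and H_rm as fractions of all requests is an assumption, the function name is hypothetical, and B and N are left out because their role in the original formula is unknown.

```python
def dralic_avg_response_time(h_lm, h_rm, t_lm, t_rm, t_raid):
    """Weighted-average response time (assumed model, not the slide's formula).

    h_lm, h_rm: fractions of all requests served by local / remote memory.
    t_lm, t_rm, t_raid: access times of the three levels, in the same unit.
    """
    if h_lm < 0 or h_rm < 0 or h_lm + h_rm > 1:
        raise ValueError("hit ratios must be non-negative and sum to at most 1")
    miss_to_raid = 1.0 - h_lm - h_rm  # requests that fall through to the RAID
    return h_lm * t_lm + h_rm * t_rm + miss_to_raid * t_raid


if __name__ == "__main__":
    # Illustrative numbers only; they are not measurements from the talk.
    print(dralic_avg_response_time(0.5, 0.3, 0.1, 1.0, 10.0))
```

Even in this simple model, raising the remote memory hit ratio lowers the average whenever remote memory is faster than the RAID, which is the intuition behind sharing memory among peers.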

Preliminary Performance Analysis

Simulation Results
- DRALICSim: a simulator based on socket communication
- Benchmark: PostMark, provided by Network Appliance Inc., which measures performance in terms of transaction rates
- Configurations: 1000 initial files and transactions (small), 20000/50000 (medium), and 20000/100000 (large)
- 4 nodes running Windows NT

Simulation Results (figure)

Summary
- Combining HBAs and NICs will reduce the overhead
- Share memory and storage among peers
- Make use of existing resources
- Our simulator shows a performance gain of up to 4.2 with 4 nodes

Outline
- Introduction
- STICS: SCSI-To-IP Cache for Storage Area Networks
- DRALIC: Distributed RAID & Location Independence Cache
- vcRAID: Large Virtual NVRAM Cache for Software RAID
- Performance Evaluation of Distributed Web Server Architectures
- Conclusions

VC-RAID
- Hides the small-write penalty of RAID5 by buffering small writes and destaging data back to the RAID, with parity computation, when disk activity is low.
- Combines a small portion of the system RAM and a log disk to form a hierarchical cache.
- This hierarchical cache appears to the host as a large nonvolatile RAM.
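The small-write penalty mentioned above comes from the standard RAID5 read-modify-write parity update: changing one data block requires reading the old data and old parity, then writing the new data and new parity. The sketch below shows that update using XOR on byte strings. It illustrates the general RAID5 rule rather than VC-RAID's code, and the function names are hypothetical.

```python
from functools import reduce


def xor_blocks(*blocks):
    """Bytewise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def small_write(data_blocks, parity, index, new_data):
    """Update one data block of a stripe and return (new stripe, new parity).

    Models the four disk I/Os of a RAID5 small write.
    """
    old_data = data_blocks[index]   # I/O 1: read old data
    old_parity = parity             # I/O 2: read old parity
    # New parity = old parity XOR old data XOR new data.
    new_parity = xor_blocks(old_parity, old_data, new_data)
    updated = list(data_blocks)
    updated[index] = new_data       # I/O 3: write new data
    return updated, new_parity      # I/O 4: write new parity


if __name__ == "__main__":
    stripe = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
    parity = xor_blocks(*stripe)
    stripe, parity = small_write(stripe, parity, 1, b"\xaa\xbb")
    # The incrementally updated parity matches a full recomputation.
    assert parity == xor_blocks(*stripe)
```

Buffering many small writes and destaging them later lets these four I/Os happen when the disks are otherwise idle, instead of on the critical path of each write.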

Architecture (figure). Labels: buffer cache, main memory, cache disk, OS kernel, RAID5.

Approaches (figure)

Performance Results
- Test environment: Gateway G6-400, 64MB RAM, 4MB RAM buffer, 200MB cache disk; 4 SCSI disks form a disk array
- Benchmarks:
  - PostMark by Network Appliance
  - Untar/copy/remove
- Compared to built-in RAID0 and RAID5

Throughput (figure). Series: RAID 0, VC-RAID, RAID 5. Workloads: Small (1k+50k), Medium (20k+50k), Large (20k+100k).

Response time (second)

Summary
- Reliable: based on RAID5; a hard drive is more reliable than RAM
- Cost-effective: hard drives are much cheaper than RAM; implemented in software, so no extra hardware is needed
- Fast: increases the cache size

Outline
- Introduction
- STICS: SCSI-To-IP Cache for Storage Area Networks
- DRALIC: Distributed RAID & Location Independence Cache
- vcRAID: Large Virtual NVRAM Cache for Software RAID
- Performance Evaluation of Distributed Web Server Architectures
- Conclusions

Observations
- E-commerce has grown explosively.
- Static web pages stored as files are no longer the dominant web accesses; about 70% of accesses start CGI, ASP, or servlet calls to generate dynamic pages.
- Web server behaviors and the interaction between web servers and database servers

Benchmark and workloads
Workloads:
- Static pages
- Light CGI: 20% / 80%
- Heavy CGI: 90% / 10%
- Heavy servlet: 90% / 10%
- Heavy database access: 90% / 10%
- Mixed workload: 7% / 8% / 30% / 55%
Benchmark: WebBench 3.5 (6010 static pages, 300 CGI, 300 simple servlets, 400 DB servlets using JDBC, 2 databases with 15 and 18 tables)

Outline
- Introduction
- STICS: SCSI-To-IP Cache for Storage Area Networks
- DRALIC: Distributed RAID & Location Independence Cache
- vcRAID: Large Virtual NVRAM Cache for Software RAID
- Performance Evaluation of Distributed Web Server Architectures
- Conclusions

Summary
- STICS couples reliable, high-speed data caching with low-overhead conversion between SCSI and IP.
- DRALIC boosts web server performance by combining the disk controller and NIC to reduce FS overhead.
- vcRAID presents a reliable and inexpensive solution for data storage.
- We carried out an extensive performance study on distributed web server architectures under realistic workloads.

Patents (with Dr. Yang)
- STICS: SCSI-To-IP Cache Storage, filed, pending, Serial Number 60/312,471, August 2001
- DRALIC: Distributed RAID and Location Independence Cache, filed, pending, May 2001

Publications (Journal)
1. Xubin He, Qing Yang, and Ming Zhang, "STICS: SCSI-To-IP Cache for Storage Area Networks," submitted to IEEE Transactions on Parallel and Distributed Systems.
2. Xubin He, Qing Yang, "Performance Evaluation of Distributed Web Server Architectures under E-Commerce Workloads," submitted to Journal of Parallel and Distributed Computing.
3. Xubin He, Qing Yang, "On Design and Implementation of a Large Virtual NVRAM Cache for Software RAID," Special Issue of Journal on Parallel I/O for Cluster Computing, 2002.

Publications (Conference)
1. Xubin He, Qing Yang, and Ming Zhang, "A Caching Strategy to Improve iSCSI Performance," to appear in IEEE Annual Conference on Local Computer Networks, Nov. 6-8.
2. Xubin He, Qing Yang, and Ming Zhang, "Introducing SCSI-To-IP Cache for Storage Area Networks," ICPP'2002, Vancouver, Canada, August.
3. Xubin He, Ming Zhang, Qing Yang, "DRALIC: A Peer-to-Peer Storage Architecture," Proc. of the International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA'2001).
4. Xubin He, Qing Yang, "Characterizing the Home Pages," Proc. of the 2nd International Conference on Internet Computing (IC'2001).
5. Xubin He, Qing Yang, "VC-RAID: A Large Virtual NVRAM Cache for Software Do-it-yourself RAID," Proc. of the International Symposium on Information Systems and Engineering (ISE'2001).
6. Xubin He, Qing Yang, "Performance Evaluation of Distributed Web Server Architectures under E-Commerce Workloads," Proc. of the 1st International Conference on Internet Computing (IC'2000), 2000.

Thank You! Dr. Qing Dr. Jien-Chung Dr. Joan Dr. Peter Dr. Lisa And more…

Special thanks to my daughter, Rachel!
