1
Design and Performance Evaluation of Networked Storage Architectures
Xubin He (Hexb@ele.uri.edu)
July 25, 2002
Dept. of Electrical and Computer Engineering, University of Rhode Island
2
July 25, 2002, High Performance Computing Lab (HPCL), URI
Outline
- Introduction
- STICS: SCSI-To-IP Cache for Storage Area Networks
- DRALIC: Distributed RAID & Location Independence Cache
- vcRAID: Large Virtual NVRAM Cache for Software RAID
- Performance Eval. on Distributed Web Server Architectures
- Conclusions
3
Background
- Data storage plays an essential role in today's fast-growing, data-intensive network services.
- Online data storage doubles every 9 months.
- Storage is approaching 50% or more of IT spending; the storage cost is projected to reach up to 75% of total IT cost in 2003.
4
A Server-to-Storage Bottleneck
[Figure. Source: Brocade]
5
Motivations
How to deploy data over the network efficiently and reliably?
- Disparities between SCSI & IP
- SCSI remote handshaking over IP
- Processor-disk gap growing
- High-speed network
- Large client memories
- Cheap disk & RAM, expensive NVRAM
- RAID5 is reliable, but has low performance
- E-commerce over the Internet, distributed web servers
Addressed by: STICS, DRALIC, vcRAID
6
Outline (repeated). Next section: STICS: SCSI-To-IP Cache for Storage Area Networks
7
Introducing a New Device: STICS
Whenever there is a disparity, a cache helps.
Features of STICS:
- Smooths out disparities between SCSI and IP
- Localizes the SCSI protocol and filters out unnecessary traffic, reducing the bandwidth requirement
- Provides nonvolatile data caching
- Improves performance, reliability, manageability, and scalability over current iSCSI systems
8
System Overview
[Figure labels: Host 1, Host 2 or Storage, Host M or Storage; STICS 1, STICS 2, STICS 3, STICS N; NAS; SCSI disks or SAN; links marked SCSI and TCP/IP; Internet]
Caption: A STICS connects to the host via a SCSI interface and connects to other STICS units or NAS via the Internet.
9
STICS Architecture
[Figure labels: SCSI interface, processor, RAM, log disk, storage device, network interface]
10
Internal Cache Structure
[Figure labels: log disk, metadata, memory cache, data cache]
11
Basic Operations
- Write
  - Write requests from the host via SCSI
  - Write requests from another STICS via NIC
- Read
  - Read requests from the host via SCSI
  - Read requests from another STICS via NIC
- Destage
  - RAM -> log disk
  - Log disk -> storage device
- Prefetch
  - Storage device -> RAM
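The four operations can be illustrated with a toy model of the RAM, log disk, and storage device levels. This is only a sketch under stated assumptions: the class name `SticsCacheSketch`, the dictionary-based levels, and the `ram_capacity` trigger for destaging are all hypothetical and are not taken from the STICS implementation.

```python
class SticsCacheSketch:
    """Toy model of the RAM -> log disk -> storage device hierarchy (hypothetical)."""

    def __init__(self, ram_capacity=2):
        self.ram = {}        # block number -> data, fastest level
        self.log_disk = {}   # block number -> data, nonvolatile log
        self.storage = {}    # block number -> data, backing storage device
        self.ram_capacity = ram_capacity  # assumed limit, counted in blocks

    def write(self, block, data):
        # A write from the host (SCSI) or from a peer STICS (NIC) lands in RAM.
        self.ram[block] = data
        if len(self.ram) > self.ram_capacity:
            self.destage_ram_to_log()

    def read(self, block):
        # Search the levels from fastest to slowest; None means "not cached here".
        for level in (self.ram, self.log_disk, self.storage):
            if block in level:
                return level[block]
        return None

    def destage_ram_to_log(self):
        # Destage step 1: RAM -> log disk.
        self.log_disk.update(self.ram)
        self.ram.clear()

    def destage_log_to_storage(self):
        # Destage step 2: log disk -> storage device.
        self.storage.update(self.log_disk)
        self.log_disk.clear()

    def prefetch(self, block):
        # Prefetch: storage device -> RAM, unless a newer copy is already cached.
        if block in self.storage and block not in self.ram and block not in self.log_disk:
            self.ram[block] = self.storage[block]


cache = SticsCacheSketch(ram_capacity=2)
cache.write(1, b"a")
cache.write(2, b"b")
cache.write(3, b"c")            # exceeds the assumed RAM limit, triggers a destage
cache.destage_log_to_storage()
cache.prefetch(1)
```

A real device would destage in the background and bound the log disk as well; the sketch only shows the direction of data movement for each operation.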
12
Web-based Network Management
[Figure labels: web browser-based manager, HTTP, servlet, management app., TCP/IP, local management app.]
13
Implementation Platform
- A STICS block is a PC running Linux (kernel 2.4.2)
- Compiler: gcc
- Interfaces: STICS with SCSI and IP
14
Performance Evaluations
Methodology
- iSCSI implementation on Linux by Intel (iSCSI)
- Initial STICS implementation on Linux, in two modes:
  - Immediate report (STICS-Imm)
  - Report after complete (STICS)
Workloads
- PostMark from Network Appliance: throughput
  - Two configurations
  - Small: 1000/50k/436MB
  - Large: 20k/100k/740MB
- EMC trace: response time
  - More than 230,000 I/O requests
  - Data set size: >900MB
15
Experimental Settings
[Figure labels: Host (Trout), Target (Squid), Cod, NIC, SCSI, disks, switch, STICS 1, STICS 2; links marked "iSCSI commands and data" and "Block Data"]
iSCSI configuration: the host Trout establishes a connection to the target, and the target Squid responds and connects. Squid then exports its hard drive, and Trout sees the disks as local.
STICS configuration: the STICS units cache data from both SCSI and the network.
16
PostMark Results: Throughput
Ave. improvement   STICS-imm   STICS
Small set          226%        64%
Large set          318%        97%
17
Where does the benefit come from?
Network traffic analysis

Number of packets with different sizes (bytes):
        <64   65-127      128-255   255-511   511-1023   >1024
iSCSI   7     1,937,724   91        60        27         1,415,912
STICS   4     431,216     16        30        7          607,827

        Total packets   Small packets (%)   Bytes transferred   Bytes per packet
iSCSI   3,353,821       57.8%               1,914,566,504       571
STICS   1,039,100       41.5%               980,963,821         944
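The summary columns of the traffic analysis can be cross-checked from the per-size packet counts. The sketch below reads each row as six size buckets and assumes that "small packets" means packets under 128 bytes (the first two buckets); that definition is an assumption, chosen because it reproduces the percentages on the slide. The variable and function names are illustrative.

```python
# Packet counts per size bucket: <64, 65-127, 128-255, 255-511, 511-1023, >1024
traffic = {
    "iSCSI": {"buckets": [7, 1_937_724, 91, 60, 27, 1_415_912], "bytes": 1_914_566_504},
    "STICS": {"buckets": [4, 431_216, 16, 30, 7, 607_827], "bytes": 980_963_821},
}

def summarize(row):
    """Return (total packets, small-packet percentage, bytes per packet)."""
    total = sum(row["buckets"])
    small = sum(row["buckets"][:2])  # assumed definition: packets under 128 bytes
    return total, round(100 * small / total, 1), round(row["bytes"] / total)

for name, row in traffic.items():
    total, small_pct, bytes_per_packet = summarize(row)
    print(f"{name}: {total:,} packets, {small_pct}% small, {bytes_per_packet} bytes/packet")

# Ratio of iSCSI traffic to STICS traffic, in packets and in bytes.
packet_ratio = sum(traffic["iSCSI"]["buckets"]) / sum(traffic["STICS"]["buckets"])
byte_ratio = traffic["iSCSI"]["bytes"] / traffic["STICS"]["bytes"]
```

Under this reading, STICS sends roughly a third as many packets and about half as many bytes as iSCSI for the same workload, with larger packets on average.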
18
EMC Trace Results: Response Time
Histograms of I/O response times for trace EMC-tel:
a) STICS with immediate report (2.7 ms)
b) STICS with report after complete (5.71 ms)
c) iSCSI (16.73 ms)
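For a quick sense of scale, the three response times quoted for trace EMC-tel can be turned into ratios relative to iSCSI. This is plain arithmetic on the slide's numbers; the dictionary and variable names are illustrative.

```python
# Response times in milliseconds, as quoted on the slide.
response_ms = {"STICS-Imm": 2.7, "STICS": 5.71, "iSCSI": 16.73}

# How many times shorter each STICS response time is compared with iSCSI.
speedup = {
    mode: round(response_ms["iSCSI"] / t, 1)
    for mode, t in response_ms.items()
    if mode != "iSCSI"
}
print(speedup)
```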
19
Summary
- A novel cache storage device that adds a new dimension to networked storage
- Significantly improves the performance of iSCSI
- A cost-effective solution for building an efficient SAN over IP
- Allows easy manageability, maintainability, and scalability
20
Outline (repeated). Next section: DRALIC: Distributed RAID and Location Independence Cache
21
Web Servers
- Overhead caused by the file system (FS) is high
- Enterprise web servers are expensive
  - A Fujitsu server: more than $5 million
- PCs are cheap: $1000
  - Disks: $160/120GB (IBM Deskstar @ CompUSA)
  - DRAM: $100/256MB (@ Crucial.com)
22
My Solution
- Combine or bridge the disk controller and network controller of existing PCs interconnected by a high-speed switch.
- Share memory and storage among peers.
24
Performance analysis
- B: data block size (8KB)
- N: number of nodes
- H_lm: local memory hit ratio
- H_rm: remote memory hit ratio
- T_lm: local memory access time
- T_rm: remote memory access time
- T_raid: access time from the distributed RAID
- T_dralic: average response time of the DRALIC system
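The transcript does not preserve the formula that ties these symbols together. One plausible reading, given here only as an assumption, is a weighted average in which requests that miss both local and remote memory are served by the distributed RAID: T_dralic = H_lm * T_lm + H_rm * T_rm + (1 - H_lm - H_rm) * T_raid. The sketch below evaluates that model; the function name and the sample numbers are hypothetical, and H_rm is assumed to be measured over all requests rather than over local misses only.

```python
def average_response_time(h_lm, h_rm, t_lm, t_rm, t_raid):
    """Assumed model: weighted average over local hits, remote hits, and RAID accesses."""
    if not (0.0 <= h_lm and 0.0 <= h_rm and h_lm + h_rm <= 1.0):
        raise ValueError("hit ratios must be non-negative and sum to at most 1")
    miss_ratio = 1.0 - h_lm - h_rm  # requests served by the distributed RAID
    return h_lm * t_lm + h_rm * t_rm + miss_ratio * t_raid


# Hypothetical inputs: times in milliseconds, not measurements from the talk.
t_dralic = average_response_time(h_lm=0.5, h_rm=0.3, t_lm=0.1, t_rm=1.0, t_raid=10.0)
print(f"T_dralic = {t_dralic:.2f} ms")
```

With such a model, raising either hit ratio shifts weight away from the slow RAID term, which is the intuition behind sharing memory among peers.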
25
Preliminary Performance Analysis
26
Simulation Results
- DRALICSim: a simulator based on socket communication
- Benchmark: PostMark, provided by Network Appliance Inc., which measures performance in terms of transaction rates
- Configurations: 1000 initial files and 50000 transactions (small), 20000/50000 (medium), and 20000/100000 (large)
- 4 nodes running Windows NT
27
Simulation Results
28
Summary
- Combining HBAs and NICs reduces the overhead.
- Memory and storage are shared among peers.
- Existing resources are put to use.
- Our simulation shows a performance gain of up to 4.2 with 4 nodes.
29
Outline (repeated). Next section: vcRAID: Large Virtual NVRAM Cache for Software RAID
30
VC-RAID
- Hides the small write penalty of RAID5 by buffering small writes and destaging data back to the RAID, with parity computation, when disk activity is low.
- Combines a small portion of the system RAM and a log disk to form a hierarchical cache.
- This hierarchical cache appears to the host as a large nonvolatile RAM.
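The small write penalty arises because updating one data block in a RAID5 stripe also requires updating the stripe's parity, typically by reading the old data and old parity and then writing the new data and new parity. The sketch below shows the standard XOR parity update and checks it against recomputing parity over the whole stripe. It illustrates general RAID5 behavior, not VC-RAID's code; the function names and block contents are made up for the example.

```python
from functools import reduce

def xor_blocks(a, b):
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

def full_parity(blocks):
    """Parity of a stripe: XOR of all its data blocks."""
    return reduce(xor_blocks, blocks)

def small_write_parity(old_parity, old_data, new_data):
    """Read-modify-write update: new parity = old parity XOR old data XOR new data."""
    return xor_blocks(xor_blocks(old_parity, old_data), new_data)

# A three-block stripe with illustrative contents.
stripe = [b"\x01\x02", b"\x10\x20", b"\x0f\xf0"]
parity = full_parity(stripe)

# Overwrite one block: two reads (old data, old parity) and two writes (new data, new parity).
new_block = b"\xaa\x55"
updated_parity = small_write_parity(parity, stripe[1], new_block)
stripe[1] = new_block

assert updated_parity == full_parity(stripe)
```

Buffering small writes in a cache defers these extra disk accesses until the destage, which is where the parity computation is then done.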
31
Architecture
[Figure labels: OS kernel, buffer cache, main memory, cache disk, RAID5]
32
Approaches
33
Performance Results
- Test environment: Gateway G6-400, 64MB RAM, 4MB RAM buffer, 200MB cache disk, 4 SCSI disks forming a disk array
- Benchmarks
  - PostMark by Network Appliance
  - Untar/copy/remove
- Compared to built-in RAID0 and RAID5
34
Throughput
Series             RAID 0   VC-RAID   RAID 5
Small (1k+50k)     1111     941       561
Medium (20k+50k)   68       63        30
Large (20k+100k)   31       28        16
35
Response time (second)
36
Summary
- Reliable: based on RAID5; a hard drive is more reliable than RAM
- Cost effective: hard drives are much cheaper than RAM; implemented in software, so no extra hardware is needed
- Fast: achieved by increasing the cache size
37
Outline (repeated). Next section: Performance Eval. on Distributed Web Server Architectures
38
Observations
- E-commerce has grown explosively.
- Static web pages stored as files are no longer the dominant form of web access: about 70% of accesses start CGI, ASP, or servlet calls to generate dynamic pages.
- Web server behavior and the interaction between web servers and database servers
40
Benchmark and workloads
Workloads
- Static pages
- Light CGI: 20% / 80%
- Heavy CGI: 90% / 10%
- Heavy servlet: 90% / 10%
- Heavy database access: 90% / 10%
- Mixed workload: 7% / 8% / 30% / 55%
Benchmark
- WebBench 3.5 (6010 static pages, 300 CGI, 300 simple servlets, 400 DB servlets using JDBC, 2 databases with 15 and 18 tables)
43
Outline (repeated). Next section: Conclusions
44
Summary
- STICS couples reliable and high-speed data caching with low-overhead conversion between SCSI and IP.
- DRALIC boosts web server performance by combining the disk controller and NIC to reduce FS overhead.
- vcRAID presents a reliable and inexpensive solution for data storage.
- We carried out an extensive performance study on distributed web server architectures under realistic workloads.
45
Patents (with Dr. Yang)
- STICS: SCSI-To-IP Cache Storage, filed, pending, Serial Number 60/312,471, August 2001
- DRALIC: Distributed RAID and Location Independence Cache, filed, pending, May 2001
46
Publications (Journal)
1. Xubin He, Qing Yang, and Ming Zhang, “STICS: SCSI-To-IP Cache for Storage Area Networks,” Submitted to IEEE Transactions on Parallel and Distributed Systems.
2. Xubin He, Qing Yang, “Performance Evaluation of Distributed Web Server Architectures under E-Commerce Workloads,” Submitted to Journal of Parallel and Distributed Computing.
3. Xubin He, Qing Yang, “On Design and Implementation of a Large Virtual NVRAM Cache for Software RAID,” Special Issue of Journal on Parallel I/O for Cluster Computing, 2002.
47
Publications (Conference)
1. Xubin He, Qing Yang, and Ming Zhang, “A Caching Strategy to Improve iSCSI Performance,” To appear in IEEE Annual Conference on Local Computer Networks, Nov. 6-8, 2002.
2. Xubin He, Qing Yang, and Ming Zhang, “Introducing SCSI-To-IP Cache for Storage Area Networks,” ICPP’2002, Vancouver, Canada, August 2002.
3. Xubin He, Ming Zhang, Qing Yang, “DRALIC: A Peer-to-Peer Storage Architecture,” Proc. of the International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA'2001), 2001.
4. Xubin He, Qing Yang, “Characterizing the Home Pages,” Proc. of the 2nd International Conference on Internet Computing (IC’2001), 2001.
5. Xubin He, Qing Yang, “VC-RAID: A Large Virtual NVRAM Cache for Software Do-it-yourself RAID,” Proc. of the International Symposium on Information Systems and Engineering (ISE'2001), 2001.
6. Xubin He, Qing Yang, “Performance Evaluation of Distributed Web Server Architectures under E-Commerce Workloads,” Proc. of the 1st International Conference on Internet Computing (IC’2000), 2000.
48
Thank You!
- Dr. Qing Yang @ELE
- Dr. Jien-Chung Lo @ELE
- Dr. Joan Peckham @CS
- Dr. Peter Swaszek @ELE
- Dr. Lisa DiPippo @CS
- And more…
49
Special thanks to my daughter, Rachel!