Clusters & Grids: The CC-GRID? Era
CCGSC 2002
Gordon Bell, Bay Area Research Center, Microsoft Corporation
Copyright Gordon Bell



Observations from a mostly-Grid workshop
– Clusters. Let's finish the job!
– Grids generally.
– Grids as arbitrary cluster platforms... why?
– Examples of Grid types, especially web services
– Summary...

Blades, aka a "cluster in a cabinet"
– 366 servers per 44U cabinet
  – Single processor
  – GB/computer (24 TBytes)
  – Mbps Ethernets
– ~10x perf*, power, disk, I/O per cabinet
– ~3x price/perf
– Network services...
– Linux based
*versus 42 servers, 2 processors, 84 Ethernet, 3 TBytes

Clusters aren't as bad as programs make them out to be, but we need to make them work better and be more transparent.
– Everything is becoming a cluster. Certainly all of the Top500!
– 64-bit addressing will cause more change!
– Future nodes should bet on CLMP smPs (p = 4-32). Utilize existing and emerging smP nodes versus assuming lowest-common-denominator processor-memory pairs & MPI.
– Massive gains are possible from the compiler and runtime.
– The Earth Simulator (ES) has set a new standard of efficiency and system transparency for "clusters".
– Expand the MPI programming model:
  – Full transparency of MPI needs to be the goal
  – Objectify for greater flexibility and greater insulation from latency

Grids: if they are the solution, what's the problem?
– Economics... thief, scavenger, power, efficiency, or resource sharing?
– Research funding... that's where the money is. Are they where the problems lie?
– Does the massive collaboration that Grids enable create massive overhead and generally less output? Unless the output is for a community!
– Are funding and middleware a good investment?

Same observations as 2000: GRID was/is an exciting concept...
– They can/must work within a community, organization, or project. Apps need to drive. "Necessity is the mother of invention."
Taxonomy... interesting vs. necessity:
– Cycle scavenging and object evaluation (e.g.
– File distribution/sharing for IP theft, e.g. Napster
– Databases &/or programs for a community (astronomy, bioinformatics, CERN, NCAR)
– Workbenches: web workflow chem, bio...
– Exchanges... many sites operating together
– Single, large objectified pipeline... e.g. NASA
– Grid as a cluster platform! Transparent & arbitrary access, including load balancing (Web SVCs X)

Grid, n. An arbitrary distributed cluster platform: a geographical and multi-organizational collection of diverse computers, dynamically configured as cluster platforms responding to arbitrary, ill-defined jobs "thrown" at it.
– Costs are not necessarily favorable, e.g. disks are less expensive than the cost to transfer the data.
– Latency and bandwidth are non-deterministic, thereby changing cluster characteristics.
– Once a large body of data exists for a job, it is inherently bound to (set into) fixed resources. Large datasets & I/O-bound programs need to be with their data or be database accesses...
– But are the resources there to share? Bound to cost more?

Bright spots... near term, user focus, a lesson for Grid suppliers
– Tony Hey: apps-based funding. Web-services-based Grid & data orientation.
– David Abramson: Nimrod.
  – Parameter scans... other low-hanging fruit
  – Encapsulate apps! "Excel" language/control mgmt.
  – "Legacy apps are programs that users just want, and there's no time or resources to modify code... independent of age, author, or language, e.g. Java."
– Andrew Grimshaw: Avaki. Making the Legion vision real. A reality check.
– Lip: 4 pairs of "web services" based apps
– Gray et al.: SkyService and TerraService
Goal: providing a web service must be as easy as publishing a web page... and it will occur!!!
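A minimal sketch of why Nimrod-style parameter scans are "low hanging fruit": every parameter combination is an independent job, so they parallelize with no coordination. The `simulate` function and its parameters are hypothetical stand-ins, not Nimrod's actual API.

```python
from itertools import product

def simulate(temp, pressure):
    # Hypothetical stand-in for one run of an encapsulated legacy application.
    return temp * pressure

def parameter_scan(temps, pressures):
    # Every (temp, pressure) combination is an independent job; a tool like
    # Nimrod can farm each run out to a different grid node.
    return {(t, p): simulate(t, p) for t, p in product(temps, pressures)}

results = parameter_scan([100, 200, 300], [1.0, 2.5])
print(len(results))          # 6 independent jobs
print(results[(200, 2.5)])   # 500.0
```

A real scan would replace the dictionary comprehension with job submission to a scheduler; the structure of the work is unchanged.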

SkyServer: delivering a web service to the astronomy community. A prototype for other sciences? (Gray, Szalay, et al.)
– First paper on the SkyServer: TR_2001_77_Virtual_Observatory.pdf / TR_2001_77_Virtual_Observatory.doc
– Later, more detailed paper for the database community: TR_01_104_SkyServer_V1.pdf / TR_01_104_SkyServer_V1.doc

What can be learned from SkyServer? It's about data, not about harvesting flops.
– 1-2 hr. query programs versus 1-week programs based on grep
– 10-minute runs versus 3-day compute & searches
– The database viewpoint: 100x speed-ups
  – Avoid costly re-computation and searches
  – Use indices and PARALLEL I/O. Read/Write >> 1.
  – Parallelism is automatic, transparent, and just depends on the number of computers/disks.
– Limited experience and talent to use databases.
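The index-versus-scan point generalizes beyond SkyServer. A toy SQLite illustration (hypothetical columns, not the SkyServer schema) shows the planner answering a selective query through an index instead of re-scanning, grep-style, every row:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE objects (id INTEGER, ra REAL, dec REAL, mag REAL)")
db.executemany("INSERT INTO objects VALUES (?, ?, ?, ?)",
               [(i, i * 0.01, i * 0.005, 15 + i % 10) for i in range(10_000)])
db.execute("CREATE INDEX idx_mag ON objects (mag)")

# With the index, a selective query touches only the matching rows.
plan = db.execute(
    "EXPLAIN QUERY PLAN SELECT count(*) FROM objects WHERE mag < 16"
).fetchone()
print(plan)  # the plan names idx_mag rather than a full table scan

count = db.execute("SELECT count(*) FROM objects WHERE mag < 16").fetchone()[0]
print(count)  # 1000
```

The exact plan text varies by SQLite version, but the shape of the win is the slide's point: indexed access cost grows with the result size, not the dataset size.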

Heuristics for building communities that need to share data & programs
– Always go from working to working.
– Do it by induction in time and space. (Why version 3 is pretty good.)
– Put ONE database in place that's useful by itself in terms of UI, content, & queries.
– Invent and demo instances of use.
– Get two working in a single location.
– Extend to include a second community, with an appropriate superset capability.

Some science is hitting a wall: FTP and GREP are not adequate (Jim Gray)
– You can GREP 1 GB in a minute. You can GREP 1 TB in 2 days. You can GREP 1 PB in 3 years.
– 1 PB ~10,000 >> 1,000 disks
– At some point you need indices to limit search: parallel data search and analysis.
– Goal using databases: make it easy to
  – Publish: record structured data
  – Find data anywhere in the network; get the subset you need!
  – Explore datasets interactively
– The database becomes the file system!!!
– You can FTP 1 MB in 1 sec. You can FTP 1 GB / min. ... 2 days and 1K$ ... 3 years and 1M$.
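The "parallel data search" the slide calls for can be sketched in a few lines: partition the data and scan the partitions concurrently. Here the partitions are slices of an in-memory list scanned by threads; on a real system each partition would live on its own disk or node, so scan time drops roughly as 1/N. The record format is invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

records = [f"object-{i} flux={i % 97}" for i in range(100_000)]

def scan(partition, needle):
    # grep-style sequential scan of one partition
    return [r for r in partition if needle in r]

def parallel_scan(data, needle, workers=4):
    # Split into roughly equal partitions and scan them concurrently.
    size = (len(data) + workers - 1) // workers
    parts = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        hits = pool.map(scan, parts, [needle] * len(parts))
    return [r for chunk in hits for r in chunk]

print(len(parallel_scan(records, "flux=0")))  # 1031 matching records
```

This is still a brute-force scan; the slide's deeper point is that indices avoid touching most partitions at all.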

Network concerns
– Very high cost: $(1 + 1) / GByte to send on the net; FedEx and 160-GByte shipments are cheaper. DSL at home is $ $0.30.
– Disks cost less than $2/GByte to purchase.
– Low availability of fast links (the last-mile problem):
  – Labs & universities have DS3 links at most, and they are very expensive.
  – Traffic: instant messaging, music stealing.
– Performance at the desktop is poor: Mbps; very poor communication links.
– Manage: trade in fast links for cheap links!!
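The cost claim can be sanity-checked with the deck's own 2002 figures (the $260/TB-shipped number comes from the later DiskBrick table; none of these are current prices):

```python
# ~$1 to send plus ~$1 to receive, per GByte, over the net (slide figure)
net_dollars_per_gb = 1 + 1
gb_per_tb = 1000                      # decimal units

net_cost_per_tb = net_dollars_per_gb * gb_per_tb
sneakernet_cost_per_tb = 260          # $/TB shipped (DiskBrick slide)

print(net_cost_per_tb)                # 2000: moving 1 TB over the net
print(round(net_cost_per_tb / sneakernet_cost_per_tb, 1))  # ~7.7x shipping cost
```

At these prices the network costs as much per terabyte as buying the disks outright, which is the slide's argument for sneakernet.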

Gray's $2.4 K, 1-TByte Sneakernet, aka the Disk Brick (courtesy of Jim Gray, Microsoft Bay Area Research)
– Cost, time, and speed to move a terabyte; the cost of a "sneaker-net" TB.
– We now ship NTFS/SQL disks. Not a good format for Linux.
– Ship NFS/CIFS/ODBC servers (not disks): plug the "disk" into the LAN, DHCP, then file- or DB-serve... a web service in the long term.

Cost to move a Terabyte

Cost and time of sneaker-net vs. alternatives (courtesy of Jim Gray, Microsoft Bay Area Research; "?" marks cells unreadable in the source):

Media (qty)     Robot $   Media $   TB    Read+write   Ship time   Total    Mbps   Cost (10 TB)   $/TB shipped
CD (1,500)         ?         ?      2x       ? hrs       24 hrs    6 days    28       $2 K           $208
DVD (200)         $8K       $400    2x      60 hrs       24 hrs    6 days    28       $20 K          $2,000
Tape (25)         $15K       ?      2x       ? hrs       24 hrs    5 days    18       $31 K          $3,100
DiskBrick (7)     $1K      $1,400    ?      19 hrs       24 hrs    2 days    52       $2.6 K         $260
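The table's Mbps column is just payload divided by total elapsed time; reproducing the DiskBrick row as a check (decimal terabytes assumed):

```python
def effective_mbps(tbytes, total_hours):
    # effective throughput of shipping media: bits moved / wall-clock time
    bits = tbytes * 1e12 * 8
    return bits / (total_hours * 3600) / 1e6

# DiskBrick: 19 hrs to read+write the disks plus 24 hrs in transit for 1 TB
print(round(effective_mbps(1, 19 + 24)))  # 52 Mbps, matching the table
```

That sustained 52 Mbps beats the DS3-at-best links the network-concerns slide describes, which is the whole case for the brick.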

Grids: real and "personal". Two carrots, one downside. A bet.
– Bell will match any Gordon Bell Prize (parallelism, performance, or performance/cost) winner's prize that is based on "Grid Platform Technology".
– I will bet any individual or set of individuals of the Grid research community up to $5,000 that a Grid application will not win the above by SC2005.

The End
– How can GRIDs become a real, useful computer structure? Get a life. Adopt an application community!
– Success if CCGSC2004 is the last... by making Grids ubiquitous.