
1 Computing Platform
Andrew A. Chien
Mohsen Saneei, University of Tehran

2 Outline
Basic Elements
– Computing Elements
– Communication Elements
– Storage Elements
Simple Composite Elements (SCE): Local Grids
– High Throughput SCEs
– High Reliability SCEs
– Dedicated High Performance SCEs
– Shared Controllable Performance SCEs
Illinois HPVM Project and Similar Efforts

3 Basic Elements – Computing Elements
Gordon Moore's law: the number of transistors on a chip, and with it chip performance, doubles roughly every two years.
Transistors ≈ 20 × 2^((year − 1965)/1.5)  (checked numerically below)
Example:
– 1975: Intel 8080 with about 4,500 transistors
– 1998: Pentium II with about 7,500,000 transistors
Performance: increases by about 1.5× each year
– 1975: 0.1 MIPS
– 1998: 1,000 MIPS
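As a quick, hedged illustration (not part of the original slides), the transistor formula above can be evaluated in a few lines of Python and compared with the two example chips quoted on the slide; the function name and the printout format are my own.

```python
# Evaluate the slide's Moore's-law fit: transistors ~= 20 * 2**((year - 1965) / 1.5)
def transistors(year: int) -> float:
    """Predicted transistor count for a leading chip in a given year (slide's fit)."""
    return 20 * 2 ** ((year - 1965) / 1.5)

# Compare the fit with the counts quoted on the slide (Intel 8080, Pentium II).
for year, quoted in [(1975, 4_500), (1998, 7_500_000)]:
    predicted = transistors(year)
    print(f"{year}: fit predicts ~{predicted:,.0f} transistors (slide quotes {quoted:,})")
```

The output lets the reader judge how closely this simple exponential fit tracks the quoted counts; it is an order-of-magnitude model, not an exact law.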

4 Basic Elements – Communication Elements
Networking elements with terabit performance are available today.
But networks have advanced more slowly, because of:
– Cost
– Their fundamental nature
Over the past few years we have seen a rapid advance from 10 Mb/s networks to 100 Mb/s.
Here we focus on local networks (cluster networks).

5 Basic Elements – Communication Elements (cluster networks)
Cluster networks are:
– Physically localized
– High speed
– Generally low-volume products
Some cluster networks:
– Myricom's Myrinet
– Compaq/Tandem's ServerNet (ServerNet I and ServerNet II)

6 Basic Elements – Communication Elements (cluster networks)
Myrinet:
– High-speed local network
– Full-duplex 1.28 Gb/s links
– Derived from multicomputer routers
– Wormhole routing
– Switch latency below 1 µs
– Myrinet opened new horizons for research and spawned lightweight messaging layers such as Fast Messages and Active Messages (see the latency/bandwidth sketch below)
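As a hedged illustration (not from the slides), a simple time = overhead + size/bandwidth model shows why per-message software overhead dominates small transfers on a link like Myrinet's. The 1.28 Gb/s link rate comes from the slide; the 10 µs per-message overhead is an assumed, purely illustrative figure.

```python
# Simple transfer-time model: time = overhead + size / bandwidth.
# Link rate (1.28 Gb/s) is from the slide; the 10 us per-message software
# overhead is an illustrative assumption, not a measured value.
LINK_BYTES_PER_SEC = 1.28e9 / 8      # 1.28 Gb/s expressed in bytes/s
OVERHEAD_SEC = 10e-6                 # assumed per-message overhead (hypothetical)

def transfer_time(size_bytes: float) -> float:
    """Time to move one message of the given size under the simple model."""
    return OVERHEAD_SEC + size_bytes / LINK_BYTES_PER_SEC

for size in [16, 256, 4096, 65536]:
    t = transfer_time(size)
    effective_mb_s = size / t / 1e6  # effective bandwidth in MB/s
    print(f"{size:>6} B: {t * 1e6:7.1f} us, effective bandwidth ~{effective_mb_s:6.1f} MB/s")
```

Small messages see only a tiny fraction of the link bandwidth under this model, which is exactly the kind of per-message overhead that lightweight layers such as Fast Messages were designed to reduce.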

7 Basic Elements – Communication Elements (cluster networks)
[Figure: bandwidth versus message size (bytes) for Fast Messages on Myrinet]

8 Basic Elements – Communication Elements (cluster networks)
Compaq/Tandem's ServerNet I:
– Full-duplex 50 MB/s links
– Wormhole routing
– Reliable communication
– 64-byte packets
– Latency of a few microseconds
Compaq/Tandem's ServerNet II:
– Full-duplex 125 MB/s links
– 512-byte packets
– 64-bit network addressing

9 Basic Elements – Storage Elements
Capacity and cost-performance have improved at an exponential rate.
Density:
– 1970 to 1988: 29% per year
– After 1988: 60% per year
Cost-per-byte:
– 1970 to 1988: 40% per year
– After 1988: 100% per year
But seek times are improving very slowly (average seek times of 7-10 ms remain typical); a quick compounding example follows below.
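As a hedged aside (not from the slides), compounding the post-1988 rates quoted above shows how quickly capacity and cost curves pull away from the nearly flat seek-time curve; the ten-year horizon is an arbitrary illustrative choice.

```python
# Compound the slide's post-1988 improvement rates over an illustrative 10 years.
YEARS = 10
density_growth = 1.60 ** YEARS        # 60% improvement per year
cost_per_byte_growth = 2.00 ** YEARS  # 100% per year, i.e. doubling annually

print(f"Density improvement over {YEARS} years:       ~{density_growth:,.0f}x")
print(f"Cost-per-byte improvement over {YEARS} years: ~{cost_per_byte_growth:,.0f}x")
# Seek time, by contrast, stays in the 7-10 ms range over the same period.
```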

10 Basic Elements – The Future
Year   Machine          Computing   Memory   Disk        Network
2003   PC               8 GIPS      1 GB     128 GB      1 Gb/s
       Supercomputer    80 TIPS     10 TB    1,280 TB    10 Tb/s
2008   PC               64 GIPS     16 GB    2 TB        10 Gb/s
       Supercomputer    640 TIPS    160 TB   20,000 TB   100 Tb/s
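As a hedged aside (not in the slides), the growth factors implied by the PC row of the table above can be read off directly; the only assumption is treating 2 TB as 2,048 GB.

```python
# Growth factors implied by the PC projections in the table above (2003 -> 2008).
pc_2003 = {"compute_GIPS": 8, "memory_GB": 1, "disk_GB": 128, "network_Gbps": 1}
pc_2008 = {"compute_GIPS": 64, "memory_GB": 16, "disk_GB": 2048, "network_Gbps": 10}

for key in pc_2003:
    factor = pc_2008[key] / pc_2003[key]
    yearly = factor ** (1 / 5)  # five-year span, 2003 to 2008
    print(f"{key}: {factor:.0f}x over 5 years (~{yearly:.2f}x per year)")
```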

11 Simple Composite Elements (SCE): Local Grids
SCEs are collections of basic elements, aggregated with software and special hardware. They are often single administrative domains.
SCEs are studied for these reasons:
– They can reduce the number of problems higher-level grids must solve
– SCEs use resources and software to implement the external properties
– SCEs form the basis for the larger computational grid

12 Simple Composite Elements (SCE): Local Grids (cont.)
In a national computational grid:
– Reliable composite elements provide management (access control and scheduling) and basic services (naming and routing)
– Other composite elements provide resource pools (data caching, storage, …)

13 Simple Composite Elements (SCE): Local Grids (cont.)
SCEs are defined by:
– Their external interface
– Their internal hardware requirements
– Their ability to deliver efficient and flexible use of the hardware to applications
The external interface covers:
– Capacity
– Aggregate performance
– Reliability
– Predictability
– Sharability

14 Simple Composite Elements (SCE): Local Grids (cont.)
Hardware requirements:
– Heterogeneity
– Network requirements (special hardware, limited link length, bandwidth, …)
– Distributed resources (links of tens of meters or thousands of kilometers)
– Changes in constituent systems
– Scalability (number of nodes)

15 SCE: Local Grids – High Throughput SCEs
Pooled resources are used to achieve high throughput on a set of sequential compute jobs.
– Examples: Condor, Utopia, Symbio
External interface:
– High capacity for computation, offered as a sharable resource
– Interfaces for some parallel computing systems, such as PVM, are available

16 SCE: Local Grids – High Throughput SCEs (cont.)
Hardware requirements:
– Run on a wide range of processor and network environments
– Tolerate both processor and network heterogeneity in type and speed
– Scale to large numbers of processors (hundreds to thousands)

17 SCE: Local Grids – High Throughput SCEs (cont.)
High Throughput SCEs in grids:
– Flexible and powerful systems for achieving high throughput on large numbers of sequential jobs
– Thus, they are well-matched grid elements for such tasks
Not supported:
– Aggregate performance
– Reliability (only partially supported)
– Predictability

18 SCE: Local Grids – High Reliability SCEs (reliable clusters)
– Provide computational resources with extremely low probability of service interruption and data loss
– Limited scalability; scaling is used to increase system capacity
– Offered as a sharable resource
– Prefer compatible hardware to enable failover and data sharing

19 SCE: Local Grids – High Reliability SCEs (reliable clusters) (cont.)
– Can use lower-performance standby systems to reduce cost
– Can be physically localized or distributed over a wide area network
– Traditionally use special operating systems

20 SCE: Local Grids – Dedicated High Performance SCEs
– Merge basic elements into a single resource, to be applied to a single computation
– Built from collections of microprocessors or entire systems (scalable networks of workstations)
– Initially applied to supercomputing tasks
– Scalable to connect hundreds or thousands of nodes within a limited physical extent (tens of meters)

21 SCE: Local Grids – Dedicated High Performance SCEs (cont.)
– Predominant programming model: message passing, such as MPI (a minimal sketch follows below)
– Support both sequential jobs and parallel computation, but focus on highest single-job performance
Not supported:
– Reliability
– Predictability
– Sharability
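As a hedged illustration of the message-passing style mentioned above (not part of the original slides), here is a minimal two-process exchange written with mpi4py; it assumes an MPI runtime and the mpi4py package are installed, and it sketches the programming model rather than any particular machine discussed here.

```python
# Minimal MPI message-passing sketch (assumes mpi4py and an MPI runtime).
# Run with, e.g.: mpiexec -n 2 python ping.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    # Rank 0 sends a small message and waits for a reply.
    comm.send({"payload": list(range(8))}, dest=1, tag=0)
    reply = comm.recv(source=1, tag=1)
    print(f"rank 0 got reply: {reply}")
elif rank == 1:
    # Rank 1 receives the message and acknowledges it.
    msg = comm.recv(source=0, tag=0)
    comm.send({"ack": len(msg["payload"])}, dest=0, tag=1)
```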

22 SCE: Local Grids – Dedicated High Performance SCEs (cont.)
The Berkeley Network of Workstations (NOW) is one project in this class.
The IBM SP-2 and the Intel/Sandia machine are two examples that:
– Use high-volume microprocessors as their basic computation engines
– Use custom high-performance interconnects delivering 5-100 MB/s of network bandwidth to each node, with latencies of 20-100 µs

23 SCE: Local Grids – Dedicated High Performance SCEs (cont.)
IBM SP-2:
– Employs entire workstations as the basic building block
– Standard AIX workstation operating system
– Allows a single job on each node
Intel/Sandia:
– Employs special system boards and packaging as the basic building block
– A custom operating system, PUMA
– Multitasking and virtual memory are not provided on the compute nodes

24 SCE: Local Grids – Shared Controllable-Performance SCEs
– Aim: deliver predictable high performance in a shared-resource, heterogeneous, distributed environment
– These SCEs combine the capabilities of all the other SCEs except reliability
– These SCEs are called High-Performance Virtual Machines (HPVMs)

25 SCE: Local Grids – Shared Controllable-Performance SCEs (cont.)
HPVM simplifies the programming task by allowing programmers to focus on the complexity of the application.
Constructing an effective HPVM requires meeting a number of research challenges in:
– High-performance, predictable communication
– Management of heterogeneity
– Performance models
– Adaptive resource management

26 SCE: Local Grids – Shared Controllable-Performance SCEs (cont.)
– To achieve efficient tight coupling, the network hardware must support both low latency and high bandwidth
– It must be scalable to thousands of nodes, because HPVMs execute on distributed resources
– Geographic distribution is physically limited

27 SCE: Local Grids – Illinois HPVM Project and Similar Efforts
Aim: develop shared, controllable high-performance SCEs.
Basic parameters:
– Computing nodes: x86 and PCI computing systems
– Operating systems: Windows NT and Linux
– Networks: Myrinet, ServerNet, …
The Real World Computing Project (RWCP) and Berkeley Network of Workstations II (NOW II) are other projects in the HPVM spirit.

28 Summary
Element type                         Scalable   Aggregatable   Reliable   Predictable   Sharable
Basic elements
  Basic compute                      No         --             No         No            Yes
  Basic storage                      No         --             No         No            Yes
  Basic network                      No         --             No         No            Yes
Local Grids (SCE)
  High Throughput                    Yes        No             Partial    No            Yes
  High Reliability                   Limited    No             Yes        No            Yes
  Dedicated High Performance         Yes        Yes            No         No            No
  Shared Controllable Performance    Yes        Yes            No         Yes           Yes

