
1 Introduction to Parallel Processing, Dr. Guy Tel-Zur, Lecture Slides No. 1

2 Introduction to Parallel Processing, Course Number 361-1-3621. Course website: http://www.ee.bgu.ac.il/~tel-zur/2003/pp.html

3 Course Objectives: The goal of this course is to provide an in-depth introduction to modern parallel processing. The course covers both theoretical and practical aspects of parallel processing.

4 Course structure: introduction; parallelization techniques; parallel applications in scientific and engineering computing; practice; additional topics (enrichment, future trends...)

5 Outline of the first lecture: an introduction to "Introduction to Parallel Computing"; basic concepts; a brief description of the parallel cluster on which the exercises will run next week and throughout the course.

6 Let's begin…

7 What is parallel {computing, processing}? Parallel Computing; Parallel Processing; Cluster Computing; Beowulf Clusters; HPC – High Performance Computing

8 Oxford Dictionary of Science: A technique that allows more than one process – stream of activity – to be running at any given moment in a computer system, hence processes can be executed in parallel. This means that two or more processors are active among a group of processes at any instant.

9 Is a parallel computer the same thing as a supercomputer?

10 A Supercomputer An extremely high power computer that has a large amount of main memory and very fast processors… Often the processors run in parallel.

11 A Supercomputer A definition from: http://www.cray.com/supercomputing

12 What is a Supercomputer? A supercomputer is defined simply as the most powerful class of computers at any point in time. Supercomputers are used to solve large and complex problems which would be insurmountable by smaller, less powerful computers. Since the pioneering Cray-1 ® system arrived in 1976, supercomputers have contributed enormously to the advancement of knowledge and the quality of human life. Problems of major economic, scientific and strategic importance typically are addressed by supercomputers years before becoming tractable on less-capable systems.

13 Why Study Parallel Architecture? Parallelism: Provides alternative to faster clock for performance Applies at all levels of system design (H/W – S/W Integration) Is a fascinating topic Is increasingly central in information processing, science and engineering

14 The Demand for Computational Speed There is a continual demand for greater computational speed from a computer system than is currently possible. Areas requiring great computational speed include numerical modeling and simulation of scientific and engineering problems. Computations must be completed within a “reasonable” time period.

15 Large Memory Requirements Use parallel computing for executing larger problems which require more memory than exists on a single computer.

16 Grand Challenge Problems A grand challenge problem is one that cannot be solved in a reasonable amount of time with today’s computers. Obviously, an execution time of 10 years is always unreasonable. Examples: modeling large DNA structures, global weather forecasting, modeling the motion of astronomical bodies.

17 Scientific Computing Demand

18 Cluster Computing – An Example

19 Cluster Computing – Cont’ Linux NetworX 11.2 Tflops Linux cluster 4.6 TB of aggregate memory 138.2 TB of aggregate local disk space 1152 total nodes plus separate hot spare cluster and development cluster 2,304 Intel 2.4 GHz Xeon processors http://www.llnl.gov/linux/mcr/

20 Exercise: Assume a galaxy contains 10^11 stars. Estimate the time needed to compute 100 iterations of an O(N^2) calculation on a computer with a computing power of 1 GFLOPS.

21 Solution: For 10^11 stars there are 10^22 interactions per iteration. The total number of operations for 100 iterations is 10^24, so the computation time is 10^24 / 10^9 = 10^15 seconds, roughly 3×10^7 years.

22 Solution – continued: Using an N·log2(N) calculation instead, each iteration takes about 10^11 × log2(10^11) ≈ 3.7×10^12 operations, i.e. about 3.7×10^14 operations for 100 iterations, or roughly 3.7×10^5 seconds (about 4 days) at 1 GFLOPS. Conclusion: improving the algorithm is usually far more important than adding processors!
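A tiny C program, as an illustrative check of the arithmetic above (the constants simply restate the numbers from the exercise):

/* Illustrative check of the O(N^2) vs. N*log2(N) estimates (compile with -lm). */
#include <stdio.h>
#include <math.h>

int main(void)
{
    const double N = 1e11, iterations = 100, flops = 1e9;
    const double year = 3.15e7;                    /* seconds per year   */

    double ops_n2   = N * N * iterations;          /* ~1e24 operations   */
    double ops_nlog = N * log2(N) * iterations;    /* ~3.7e14 operations */

    printf("O(N^2):    %.3g s (~%.3g years)\n", ops_n2 / flops, ops_n2 / flops / year);
    printf("N*log2(N): %.3g s (~%.3g days)\n",  ops_nlog / flops, ops_nlog / flops / 86400.0);
    return 0;
}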

23 Technology Trends

24 Clock Frequency Growth Rate

25 Parallelization is good, but it has a price! Not every problem can be parallelized; parallelizing software is not easy; hardware availability; development time versus other alternatives (future technology); cost and manpower.

26 Parallel Architecture Considerations Resource Allocation: – how large a collection? – how powerful are the elements? – how much memory? Data access, Communication and Synchronization – how do the elements cooperate and communicate? – how are data transmitted between processors? – what are the abstractions and primitives for cooperation? Performance and Scalability – how does it all translate into performance? – how does it scale?

27 Conventional Computer

28 Shared Memory System

29 Message-Passing Multi-computer

30 The approach: Divide the problem into pieces that can be run in parallel. Each piece of the problem is a process that will run on a single processor. To pass data and results between the processors, messages must be sent between them – Message Passing (other methods exist as well, and we will discuss them later in the course).

31 Distributed Shared Memory

32 Flynn (1966) Taxonomy SISD - a single instruction stream-single data stream computer. SIMD - a single instruction stream-multiple data stream computer. MIMD - a multiple instruction stream- multiple data stream computer.

33 Multiple Program Multiple Data (MPMD)

34 Single Program Multiple Data (SPMD) A single source program. Each processor executes its own copy of this program, independently and not in synchronism.
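A minimal SPMD sketch, assuming MPI (the message-passing library used later in the course): every processor runs the same source program, and each copy decides what to do from its rank. The placeholder computation and the message exchange below are illustrative, not taken from the slides.

/* spmd.c: compile with mpicc, run e.g. with "mpirun -np 4 ./spmd" */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* which copy am I?       */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* how many copies exist? */

    int my_result = rank * rank;            /* placeholder "work" on local data */

    if (rank != 0) {
        /* every copy except 0 sends its result to process 0 */
        MPI_Send(&my_result, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    } else {
        /* process 0 collects and prints the results */
        printf("process 0 of %d computed %d\n", size, my_result);
        for (int src = 1; src < size; src++) {
            int r;
            MPI_Recv(&r, 1, MPI_INT, src, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("received %d from process %d\n", r, src);
        }
    }

    MPI_Finalize();
    return 0;
}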

35 Message-Passing Multi-computers

36 The communication network plays a significant role in a parallel computer cluster! The following slides review the parameters that characterize the communication network.

37 Network Criteria – 1/6 Bandwidth Network Latency Communication Latency (H/W+S/W) Message Latency (see next slide)

38 Network Criteria – 2/6 [Figure: time to send a message as a function of message size; the intercept is the latency and the slope is 1/bandwidth.] Bandwidth is the inverse of the slope of the line: time = latency + message_size / bandwidth. Latency is sometimes described as “the time to send a message of zero bytes”. This is true only for the simple model; the number quoted is sometimes misleading.
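As an illustration of this simple model, the sketch below plugs in assumed numbers: a Fast Ethernet-like bandwidth of 100 Mb/s and a latency of 70 microseconds, both illustrative values rather than measurements.

/* message_time.c: time = latency + message_size / bandwidth */
#include <stdio.h>

int main(void)
{
    const double latency   = 70e-6;                  /* assumed: 70 us            */
    const double bandwidth = 100e6 / 8.0;            /* 100 Mb/s = 12.5e6 bytes/s */
    const double sizes[]   = {0.0, 1e2, 1e4, 1e6};   /* message sizes in bytes    */
    const int n = (int)(sizeof sizes / sizeof sizes[0]);

    for (int i = 0; i < n; i++) {
        double t = latency + sizes[i] / bandwidth;
        printf("%8.0f bytes -> %g s\n", sizes[i], t);
    }
    return 0;
}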

39 Network Criteria – 3/6 Bisection Width – the number of links that must be cut in order to divide the network into two equal parts (value for the example network in the figure: 2)

40 Network Criteria – 4/6 Diameter – the maximum distance between any two nodes (example in the figure: P/2)

41 Network Criteria – 5/6 Connectivity – the multiplicity of paths between any two nodes (example in the figure: 2)

42 Network Criteria – 6/6 Cost – the number of links (example in the figure: P)

43 Exercise: compute the properties of a network of P processors that is Fully Connected, as in the figure:

44 Solution: Diameter = 1; Bisection = p^2/4; Connectivity = p-1; Cost = p(p-1)/2

45 Solution for the Bisection – continued: Number of links: p(p-1)/2. Internal links in each half: (p/2)(p/2-1)/2. Internal links in both halves: (p/2)(p/2-1). Number of links being cut: p(p-1)/2 – (p/2)(p/2-1) = p^2/4
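A small illustrative check of these formulas for an even number of processors p:

/* fully_connected.c: verifies diameter, connectivity, cost and bisection width */
#include <stdio.h>

int main(void)
{
    for (int p = 4; p <= 16; p += 4) {
        int links         = p * (p - 1) / 2;              /* cost: total number of links */
        int half_internal = (p / 2) * (p / 2 - 1) / 2;    /* links inside one half       */
        int bisection     = links - 2 * half_internal;    /* links crossing the cut      */
        printf("p=%2d  diameter=1  connectivity=%2d  cost=%3d  bisection=%3d  (p^2/4=%d)\n",
               p, p - 1, links, bisection, p * p / 4);
    }
    return 0;
}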

46 2D Mesh

47 Example: Intel Paragon

48 A Binary Tree – 1/2

49 A Binary Tree – 2/2 Fat tree: Thinking Machines CM-5, 1993

50 3D Hypercube Network

51 4D Hypercube Network

52 Embedding – 1/2

53 Embedding – 2/2

54 Deadlock

55 Ethernet

56 Ethernet Frame Format

57 Point-to-Point Communication

58 Performance Computation/Communication ratio Speedup Factor Overhead Efficiency Cost Scalability Gustafson’s Law

59 Computation/Communication Ratio

60 Speedup Factor S(n) = (execution time using one processor) / (execution time using n processors) = t_s / t_p. The maximum speedup is n (linear speedup).

61 Speedup and Comp/Comm Ratio Speedup ≤ Sequential Work / Max (Work + Synch Wait Time + Comm Cost)

62 Overhead Things that limit the speedup: –Serial parts of the computation –Some processors compute while others are idle –Communication time for sending messages –Extra computation in the parallel version not appearing in the serial version

63 Amdahl’s Law (1967) If a fraction f of the computation is serial, the speedup with n processors is S(n) = n / (1 + (n-1)f), which approaches 1/f as n grows.

64 Amdahl’s Law – continued With only 5% of the computation being serial, the maximum speedup is 20, no matter how many processors are used (1/0.05 = 20).

65 Speedup

66 Efficiency E = S(n)/n is the fraction of time that the processors are being used. If E = 100% then S(n) = n.

67 Cost Cost = (parallel execution time) × (number of processors). A cost-optimal algorithm is one whose cost is proportional to the single-processor cost (i.e. the single-processor execution time).

68 Scalability An imprecise term. Reflects H/W and S/W scalability. How do we get increased performance when the H/W is increased? What H/W is needed when the problem size (e.g. # cells) is increased? Problem dependent!

69 Gustafson’s Law (1988) – 1/3 Gives an argument against the pessimistic conclusion of Amdahl’s Law. Rather than assume that the problem size is fixed, we should assume that the parallel execution time is fixed. Define a Scaled Speedup for the case of increasing the number of processors as well as the problem size.

70 Gustafson’s Law – 2/3 Scaled speedup: S_scaled(n) = s + (1 - s)·n, where s is the serial fraction of the workload measured on the parallel system.

71 Gustafson’s Law – 3/3 An Example: Assume we have n=20 and a serial fraction of s=0.05 S(scaled)=0.05+0.95*20=19.05, while the Speedup according to Amdahl’s Law is: S=20/(0.05(20-1)+1)=10.26
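A short sketch that reproduces the numbers above and contrasts the two laws; the additional values of n are illustrative.

/* speedup_laws.c: Amdahl's fixed-size speedup vs. Gustafson's scaled speedup */
#include <stdio.h>

static double amdahl(double s, double n)    { return n / (1.0 + (n - 1.0) * s); }
static double gustafson(double s, double n) { return s + (1.0 - s) * n; }

int main(void)
{
    const double s = 0.05;                 /* 5% serial fraction */
    const double ns[] = {20, 100, 1000};
    const int k = (int)(sizeof ns / sizeof ns[0]);

    for (int i = 0; i < k; i++)
        printf("n=%5.0f  Amdahl=%6.2f  Gustafson(scaled)=%7.2f\n",
               ns[i], amdahl(s, ns[i]), gustafson(s, ns[i]));
    /* Amdahl's speedup saturates near 1/s = 20; the scaled speedup keeps growing. */
    return 0;
}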

72 Exercise: A computer cluster contains 10 processors, each with a computing power of 200 MFLOPS. What is the performance of the cluster, in MFLOPS, if 10% of the code is serial and 90% of the code is parallel?

73 Solution: If all of the code were parallel, the computing power would be 10 × 200 = 2000 MFLOPS. In our case 10% of the code is executed by a single processor and the remaining 90% by 10 processors, so the relative execution time is 0.1 + 0.9/10 = 0.19, the speedup is 1/0.19 ≈ 5.26, and the cluster delivers about 5.26 × 200 ≈ 1050 MFLOPS.

74 Domain Decomposition Mapping the problem onto the topology of the parallel cluster; dividing the problem into separate computational units in an optimal way: Load Balance, Granularity.

75 Load Balance – 1/2 All processors must be kept busy! The parallel cluster may not be homogeneous (CPUs, memory, users/jobs, network…)

76 Load Balance 2/2 Static versus Dynamic techniques Static: Algorithmic assignment based on input; won’t change Low runtime overhead Computation must be predictable Preferable when applicable (except in multiprogrammed/heterogeneous environment) Dynamic: Adapt at runtime to balance load Can increase communication and reduce locality Can increase task management overheads
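As an illustration of the dynamic approach, here is a minimal master/worker sketch, assuming MPI; the task count, the tags and the placeholder "work" are invented for the example. Rank 0 hands out task indices on demand, so faster workers automatically receive more tasks.

/* master_worker.c: dynamic load balancing by self-scheduling */
#include <stdio.h>
#include <mpi.h>

#define NTASKS   20
#define TAG_WORK 1
#define TAG_STOP 2

int main(int argc, char *argv[])
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {                        /* master */
        int next = 0, result, active = size - 1;
        MPI_Status st;

        /* hand one task (or a stop message) to every worker to start with */
        for (int w = 1; w < size; w++) {
            if (next < NTASKS) {
                MPI_Send(&next, 1, MPI_INT, w, TAG_WORK, MPI_COMM_WORLD);
                next++;
            } else {
                MPI_Send(&next, 1, MPI_INT, w, TAG_STOP, MPI_COMM_WORLD);
                active--;
            }
        }
        /* whenever a result arrives, send that worker the next task or a stop */
        while (active > 0) {
            MPI_Recv(&result, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                     MPI_COMM_WORLD, &st);
            if (next < NTASKS) {
                MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE, TAG_WORK, MPI_COMM_WORLD);
                next++;
            } else {
                MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE, TAG_STOP, MPI_COMM_WORLD);
                active--;
            }
        }
        printf("master: all %d tasks done\n", NTASKS);
    } else {                                /* worker */
        int task, result;
        MPI_Status st;
        while (1) {
            MPI_Recv(&task, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
            if (st.MPI_TAG == TAG_STOP) break;
            result = task * task;           /* placeholder "work" */
            MPI_Send(&result, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        }
    }

    MPI_Finalize();
    return 0;
}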

77 Determining Task Granularity Task granularity: the amount of work associated with a task. General rule: – Coarse-grained => often less load balance – Fine-grained => more overhead; often more communication and contention

78 Algorithms: Adding 8 Numbers
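The figure for this slide is not reproduced in the transcript. As a rough sketch of the idea it illustrates, here is a pairwise (tree) summation, assuming MPI and 8 processes that each hold one of the numbers; the data values are placeholders.

/* tree_sum.c: run e.g. with "mpirun -np 8 ./tree_sum" */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double partial = rank + 1.0;            /* each process holds one number: 1..size */

    /* log2(size) combining steps: in each step, half of the still-active
       processes send their partial sum to a partner and drop out.        */
    for (int stride = 1; stride < size; stride *= 2) {
        if (rank % (2 * stride) == 0) {
            if (rank + stride < size) {
                double other;
                MPI_Recv(&other, 1, MPI_DOUBLE, rank + stride, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                partial += other;
            }
        } else {
            MPI_Send(&partial, 1, MPI_DOUBLE, rank - stride, 0, MPI_COMM_WORLD);
            break;                          /* this process is done */
        }
    }

    if (rank == 0)
        printf("sum = %g (expected %g)\n", partial, size * (size + 1) / 2.0);

    MPI_Finalize();
    return 0;
}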

79 Summary – Terms Defined – 1 Flynn Taxonomy Message Passing Shared Memory Bandwidth Latency Bisection Width Diameter Connectivity Cost Meshes, Trees, Hypercubes… Deadlock

80 Summary – Terms Defined - 2 Embedding Process Amdahl’s Law Speedup Factor Efficiency Cost Scalability Gustafson’s Law Load Balance

81 Next Week's Class… The next class will take place in the computer laboratory on the 3rd floor of the Electrical and Computer Engineering building, room 330. Do not forget to open an account on the parallel cluster and on the classroom computers (Email)!!! A student who does not open a computer account will not be able to do the exercise!!!

82 Task #2 http://www.lam-mpi.org/tutorials/ – download and print the file “MPI quick reference sheet”. Linux tutorial: http://www.ctssn.com/ – learn at least lessons 1, 2 and 3.

83 Cluster Computing COTS – Commodity Off-The-Shelf Free O/S, e.g. Linux LOBOS – Lots Of Boxes On the Shelf PCs connected by a fast network

84 The Dwarves 1/5 12 (+2) PCs of several types Red Hat Linux 6.0-6.2 Fast Ethernet – 100Mbps Myrinet Network 1.28+1.28Gbps, SAN

85 The Dwarves – 2/5 There are 12 computers running the Linux operating system: dwarf[1-12] or dwarf[1-12]m. dwarf1[m], dwarf3[m]-dwarf7[m] – Pentium II 300 MHz; dwarf9[m]-dwarf12[m] – Pentium III 450 MHz (dual CPU); dwarf2[m], dwarf8[m] – Pentium III 733 MHz (dual CPU).

86 The Dwarves – 3/5 6 PII at 300MHz processors 8 PIII at 450MHz processors 4 PIII at 733MHz processors Total: 18 processors, ~8GFlops

87 The Dwarves 4/5 dwarf1..dwarf12 – node names for the Fast Ethernet link; dwarf1m..dwarf12m – node names for the Myrinet network

88 The Dwarves 5/5 GNU FORTRAN / C Compilers PVM / MPI

89 Cluster Computing - 1

90 Cluster Computing - 2

91 Cluster Computing - 3

92 Cluster Computing - 4

93 Linux http://www.ee.bgu.ac.il/~tel-zur/linux.html

94 Linux In Google: Linux: 38,600,000 Microsoft: 21,500,000 Bible: 7,590,000

95 BGU Cray “Negev”

