11. Non-Blocking Communication: execute other instructions while waiting for a communication to complete.

Presentation transcript:

Today's Topic
Non-blocking communication: execute other instructions while waiting for a communication to complete.
Implementation of collective communications
Measuring the execution time of MPI programs
Deadlock

Non-blocking communication functions
Non-blocking means that the program does not wait for an instruction to complete before proceeding to the next instruction.
Example: MPI_Irecv and MPI_Wait.
Blocking: MPI_Recv waits for the data to arrive, and only then do the next instructions run.
Non-blocking: MPI_Irecv proceeds to the next instructions without waiting for the data, and a later MPI_Wait waits for the data.

MPI_Irecv: non-blocking receive
Usage:
    int MPI_Irecv(void *b, int c, MPI_Datatype d, int src, int t, MPI_Comm comm, MPI_Request *r);
Parameters: start address for storing the received data, number of elements, data type, rank of the source, tag (0 in most cases), communicator (MPI_COMM_WORLD in most cases), request.
request: the communication request, used later to wait for the completion of this communication.
Example:
    MPI_Request req;
    MPI_Status status;
    ...
    MPI_Irecv(a, 100, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);
    ...
    MPI_Wait(&req, &status);

MPI_Isend: non-blocking send
Usage:
    int MPI_Isend(void *b, int c, MPI_Datatype d, int dest, int t, MPI_Comm comm, MPI_Request *r);
Parameters: start address of the data to send, number of elements, data type, rank of the destination, tag (0 in most cases), communicator (MPI_COMM_WORLD in most cases), request.
Example:
    MPI_Request req;
    MPI_Status status;
    ...
    MPI_Isend(a, 100, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
    ...
    MPI_Wait(&req, &status);

Non-blocking send?
Blocking send (MPI_Send): waits until the data to be sent has been copied somewhere else, that is, until the data has been handed over to the network or copied to a temporary buffer.
Non-blocking send (MPI_Isend): does not wait.

Notice: data is undefined during non-blocking communication
MPI_Irecv: the value of the variable specified as the receive buffer is undefined until MPI_Wait.
Example: A holds 10 and the arriving data is 50.
    MPI_Irecv to A
    ...
    ~ = A        (the value of A here can be 10 or 50)
    ...
    MPI_Wait
    ~ = A        (the value of A is 50)

Notice: data is undefined during non-blocking communication
MPI_Isend: if the variable holding the data to be sent is modified before MPI_Wait, the value actually sent is unpredictable.
Example: A holds 10 when MPI_Isend is called.
    MPI_Isend A
    ...
    A = 50       (modifying A here causes incorrect communication: the data sent may be 10 or 50)
    ...
    MPI_Wait
    A = 100      (A can be modified here without any problem)

MPI_Wait
Waits for the completion of a non-blocking communication (MPI_Isend or MPI_Irecv). Once it returns, the send data can be modified and the received data can be read.
Usage:
    int MPI_Wait(MPI_Request *req, MPI_Status *stat);
Parameters: request, status.
status: when an MPI_Irecv completes, the status of the received data is stored here.

MPI_Waitall
Waits for the completion of the specified number of non-blocking communications.
Usage:
    int MPI_Waitall(int c, MPI_Request *requests, MPI_Status *statuses);
Parameters: count, requests, statuses.
count: the number of non-blocking communications.
requests, statuses: arrays of MPI_Request and MPI_Status, each with at least count elements.

Inside the collective communication functions
Collective communication functions are usually implemented with point-to-point communication such as MPI_Send, MPI_Recv, MPI_Isend, and MPI_Irecv.

Inside of MPI_Bcast
One of the simplest implementations:
    int MPI_Bcast(void *a, int c, MPI_Datatype d, int root, MPI_Comm comm)
    {
        int i, myid, procs;
        MPI_Status st;
        MPI_Comm_rank(comm, &myid);
        MPI_Comm_size(comm, &procs);
        if (myid == root) {
            /* The root sends the data to every other rank, one after another. */
            for (i = 0; i < procs; i++)
                if (i != root)
                    MPI_Send(a, c, d, i, 0, comm);
        } else {
            /* Every other rank receives the data from the root. */
            MPI_Recv(a, c, d, root, 0, comm, &st);
        }
        return 0;
    }

Another implementation: with MPI_Isend
    /* Requires <stdlib.h> for malloc and free. */
    int MPI_Bcast(void *a, int c, MPI_Datatype d, int root, MPI_Comm comm)
    {
        int i, myid, procs, cntr;
        MPI_Status st, *stats;
        MPI_Request *reqs;
        MPI_Comm_rank(comm, &myid);
        MPI_Comm_size(comm, &procs);
        if (myid == root) {
            stats = (MPI_Status *)malloc(sizeof(MPI_Status) * procs);
            reqs = (MPI_Request *)malloc(sizeof(MPI_Request) * procs);
            cntr = 0;
            /* Start all sends without waiting, then wait for all of them at once. */
            for (i = 0; i < procs; i++)
                if (i != root)
                    MPI_Isend(a, c, d, i, 0, comm, &(reqs[cntr++]));
            MPI_Waitall(procs - 1, reqs, stats);
            free(stats);
            free(reqs);
        } else {
            MPI_Recv(a, c, d, root, 0, comm, &st);
        }
        return 0;
    }

Another implementation: binomial tree
    int MPI_Bcast(void *a, int c, MPI_Datatype d, int root, MPI_Comm comm)
    {
        int myid, procs;
        MPI_Status st;
        int mask, relative_rank, src, dst;
        MPI_Comm_rank(comm, &myid);
        MPI_Comm_size(comm, &procs);
        /* Renumber the ranks so that the root becomes relative rank 0. */
        relative_rank = myid - root;
        if (relative_rank < 0)
            relative_rank += procs;
        /* Receive phase: the lowest set bit of relative_rank selects the sender. */
        mask = 1;
        while (mask < procs) {
            if (relative_rank & mask) {
                src = myid - mask;
                if (src < 0)
                    src += procs;
                MPI_Recv(a, c, d, src, 0, comm, &st);
                break;
            }
            mask <<= 1;
        }
        /* Send phase: forward the data to the ranks selected by each lower bit. */
        mask >>= 1;
        while (mask > 0) {
            if (relative_rank + mask < procs) {
                dst = myid + mask;
                if (dst >= procs)
                    dst -= procs;
                MPI_Send(a, c, d, dst, 0, comm);
            }
            mask >>= 1;
        }
        return 0;
    }

Flow of the binomial tree (8 ranks, root = rank 0)
'mask' determines when and with whom each rank sends or receives.
Rank 0: sends to 4 (mask = 4), sends to 2 (mask = 2), sends to 1 (mask = 1)
Rank 1: receives from 0 (mask = 1)
Rank 2: receives from 0 (mask = 2), sends to 3 (mask = 1)
Rank 3: receives from 2 (mask = 1)
Rank 4: receives from 0 (mask = 4), sends to 6 (mask = 2), sends to 5 (mask = 1)
Rank 5: receives from 4 (mask = 1)
Rank 6: receives from 4 (mask = 2), sends to 7 (mask = 1)
Rank 7: receives from 6 (mask = 1)
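
To check a binomial-tree schedule without running MPI, the mask logic can be isolated into plain C. The sketch below is an illustration only: the function names binomial_parent and binomial_children are hypothetical, and the code assumes 0 <= rank < procs and 0 <= root < procs.

```c
/* Rank that `rank` receives from in a binomial-tree broadcast,
 * or -1 if `rank` is the root. */
int binomial_parent(int rank, int root, int procs)
{
    int rel = (rank - root + procs) % procs;   /* rank relative to the root */
    int mask;

    for (mask = 1; mask < procs; mask <<= 1)
        if (rel & mask)
            return (rank - mask + procs) % procs;
    return -1;
}

/* Stores the ranks that `rank` sends to, in sending order, in dst[]
 * (32 entries are enough for any int-sized procs). Returns how many. */
int binomial_children(int rank, int root, int procs, int *dst)
{
    int rel = (rank - root + procs) % procs;
    int mask = 1, n = 0;

    /* Stop at the lowest set bit of rel; the root runs past procs. */
    while (mask < procs && !(rel & mask))
        mask <<= 1;
    /* Each lower bit names one destination, largest distance first. */
    for (mask >>= 1; mask > 0; mask >>= 1)
        if (rel + mask < procs)
            dst[n++] = (rank + mask) % procs;
    return n;
}
```

With procs = 8 and root = 0, binomial_children for rank 0 yields 4, 2, 1 and for rank 4 yields 6, 5, so the broadcast finishes in three rounds rather than the seven sequential sends of the simple loop.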

Measuring the time of MPI programs
MPI_Wtime returns the current time in seconds as a floating-point number.
Example:
    double t1, t2;
    ...
    t1 = MPI_Wtime();
    (the processing to be measured)
    t2 = MPI_Wtime();
    printf("Elapsed time: %e sec.\n", t2 - t1);

Problem with measuring time in parallel programs
Each process measures a different time. Which of these is the time we actually want?
(Figure: timeline in which Rank 0 repeats Read and Send while Rank 1 and Rank 2 Receive; t1 = MPI_Wtime() marks where the measurement starts.)

A solution using the collective communication MPI_Barrier
Synchronize the processes with MPI_Barrier before each time measurement. This is suited to measuring the total execution time.
(Figure: the same timeline of Rank 0, Rank 1, and Rank 2, with MPI_Barrier placed before t1 = MPI_Wtime().)

Detailed analysis
Average: MPI_Reduce can be used to compute the average.
Max and min: use MPI_Gather to gather all of the results on rank 0, and let rank 0 find the maximum and minimum.
    double t1, t2, t, total;
    t1 = MPI_Wtime();
    ...
    t2 = MPI_Wtime();
    t = t2 - t1;
    /* Sum the per-rank times on rank 0, then divide by the number of processes. */
    MPI_Reduce(&t, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (myrank == 0)
        printf("Ave. elapsed: %e sec.\n", total / procs);

Relationships among Max, Ave, and Min
These values can be used to check the load balance, that is, how much the amount of work varies from process to process. The measured time includes both computation time and communication time.
                      Max - Ave is large    Max - Ave is small
Ave - Min is large    NG                    Mostly OK
Ave - Min is small    NG                    OK
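
As a sketch of how rank 0 might apply this check to gathered per-rank times, the plain C below computes Max, Ave, and Min and flags the "NG" column. The TimeStats type, the function names, and the tolerance parameter are assumptions for illustration; the slides do not define what counts as "large".

```c
/* Hypothetical summary of per-rank elapsed times. */
typedef struct {
    double max, ave, min;
} TimeStats;

/* Summarizes n per-rank times (n >= 1), e.g. an array gathered on rank 0. */
TimeStats summarize_times(const double *t, int n)
{
    TimeStats s;
    double total = 0.0;
    int i;

    s.max = t[0];
    s.min = t[0];
    for (i = 0; i < n; i++) {
        if (t[i] > s.max) s.max = t[i];
        if (t[i] < s.min) s.min = t[i];
        total += t[i];
    }
    s.ave = total / n;
    return s;
}

/* Returns 1 when Max - Ave is "large", taken here (as an assumption) to
 * mean larger than tolerance * Ave. This corresponds to the NG column. */
int is_imbalanced(TimeStats s, double tolerance)
{
    return (s.max - s.ave) > tolerance * s.ave;
}
```

For times of 1, 2, 3, and 6 seconds the summary is Max = 6, Ave = 3, Min = 1, and a tolerance of 0.1 flags the run as imbalanced because one rank takes twice the average.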

Measuring time for communications
    double t1, t2, t3, t4, comm = 0;
    t3 = MPI_Wtime();
    for (i = 0; i < N; i++) {
        (computation)
        t1 = MPI_Wtime();
        (communication)
        t2 = MPI_Wtime();
        comm += t2 - t1;
        (computation)
        t1 = MPI_Wtime();
        (communication)
        t2 = MPI_Wtime();
        comm += t2 - t1;
    }
    t4 = MPI_Wtime();
    /* Total time is t4 - t3; accumulated communication time is comm. */

Analyzing computation time
Computation time = total time - communication time. Alternatively, measure the computation time directly.
The variation in computation time across processes shows the degree of load imbalance.
Note: communication time includes the waiting time caused by load imbalance, so it is difficult to evaluate on its own. Balance the computation first.

Deadlock
A state in which a program cannot make progress for some reason.
Places where deadlocks tend to occur in MPI programs:
1. MPI_Recv, MPI_Wait, MPI_Waitall
2. Collective communications: a program cannot proceed until all processes call the same collective communication function.
Wrong case (both ranks block in MPI_Recv, so neither ever reaches its MPI_Send):
    if (myid == 0) {
        MPI_Recv from rank 1
        MPI_Send to rank 1
    }
    if (myid == 1) {
        MPI_Recv from rank 0
        MPI_Send to rank 0
    }
One solution: use MPI_Irecv
    if (myid == 0) {
        MPI_Irecv from rank 1
        MPI_Send to rank 1
        MPI_Wait
    }
    if (myid == 1) {
        MPI_Irecv from rank 0
        MPI_Send to rank 0
        MPI_Wait
    }

Summary
Writing a parallel program requires distributing the computation, distributing the data, and communication.
Parallelization does not always speed up a program.
Some programs cannot be parallelized.
Be careful about deadlocks in parallel programs.

Report: write a Reduce function yourself
Complete the program shown in the next slide by filling in the body of the my_reduce function.
my_reduce is a simplified version of MPI_Reduce: it computes only the total sum of integers, the root rank is always 0, and the communicator is always MPI_COMM_WORLD.
Any algorithm is acceptable.

    #include <stdio.h>
    #include "mpi.h"
    #define N 20
    int my_reduce(int *a, int *b, int c)
    {
        /* complete here by yourself */
        return 0;
    }
    int main(int argc, char *argv[])
    {
        int i, myid, procs;
        int a[N], b[N];
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &myid);
        MPI_Comm_size(MPI_COMM_WORLD, &procs);
        /* Every rank contributes a[i] = i, so the correct sum is i * procs. */
        for (i = 0; i < N; i++) {
            a[i] = i;
            b[i] = 0;
        }
        my_reduce(a, b, N);
        if (myid == 0)
            for (i = 0; i < N; i++)
                printf("b[%d] = %d, correct answer = %d\n", i, b[i], i * procs);
        MPI_Finalize();
        return 0;
    }