HPC - High Performance Computation
DAG = Directed acyclic graph
PRAM = Parallel Random Access Machine)模型是多指令流多数据流(MIMD)- 内存被分配到各个格子中
Basic Knowledge
-
Cost model - 3 assumptions
- same speed
- 1 operation = 1 unit of time
- no edge cost
-
Sequential DAG vs Tree-based DAG
- Sequential - O(n) time
- Tree - O(logn) time
-
Work: W(n) = work (n nodes)
-
Span: D(n) = depth (vertices on the longest path - critical path)
-
Average available parallelism: \(\frac{W(n)}{D(n)}\)
-
Span Law: \(T_{p}(n) \ge D(n) \\p \ is\ the\ blocks\ in\ PRAM\ Model\\)
-
Work Law: \(T_{p}(n) \ge \left \lceil \frac{W(n)}{p} \right \rceil\)
-
(Compined) Work - Span Law: \(T_{p}(n) \ge max \left\{ D(n), \left \lceil \frac{W(n)}{p} \right \rceil \right\}\)
Brent’s theorem (get the upper-bound of the time consumption):
\[T_{p}(n) \ge \frac{W-D}{p}+D\]-
Setup: Break execution into phases:
-
Each phase has 1 c.p (critical point) vertex
-
Non-c.p vertices in each phase are independent
-
Every vertex must appear in some phase \(t_{k}=\left \lceil \frac{W(n)}{p} \right \rceil \Rightarrow T_{p}=\sum_{k=1}^D t_{k} \\ t_{k}\ is\ the\ time\ to\ exeute\ phase\ k\)
-
Aside: Floor & Ceiling Identities \(\left \lceil \frac{a}{b} \right \rceil = \left \lfloor \frac{a+b-1}{b} \right \rfloor \ ; \ \left \lfloor \frac{a}{b} \right \rfloor = \left \lceil \frac{a-b+1}{b} \right \rceil\)
\[\left \lceil \frac{a}{b} \right \rceil = \left \lfloor \frac{a-1}{b} \right \rfloor +1 \ ;\ \left \lfloor \frac{a}{b} \right \rfloor = \left \lceil \frac{a+1}{b} \right \rceil -1\]
-
-
Finish: Break execution into D phases: \(T_{p}=\sum_{k=1}^D \left \lceil \frac{W_{k}}{p} \right \rceil = \sum_{k=1}^D \left \lfloor \frac{W_{k}}{p} \right \rfloor +1 \leq \sum_{k=1}^D \frac{W_{k}-1}{p} +1\)
\[max \left\{ D(n), \left \lceil \frac{W(n)}{p} \right \rceil \right\} \ge T_{p} \leq \frac{W-D}{p}+D \ (Brent's\ theorem\ is\ the\ upper bound\ part)\]
Desiderata: Speedup, work-optimality, and weak-scalability
-
Speedup: \(\frac{best\ sequential\ time}{parallel\ time}\)
\[S_{p}(n) \equiv \frac{T_{*}(n)}{T_{p}(n)}\] \[T_{*}(n)=W_{*}(n)\] \[T_{p}(n)=f(W,D;n,p)\] -
Ideal Speedup: Linear in P
-
Work-optimality:
\(W(n)=O(W_{*}(n))\) -
Weak-scalability
-
Use Brent’s theorem: \(S_{p}(n)= \frac{W_{*}(n)}{T_{p}(n)} \ge \frac{W_{*}}{\frac {W-D}{p}+D} \\ \frac{W_{*}}{\frac {W-D}{p}+D} = \frac{p}{\frac {W}{W_{*}}+\frac{p-1}{W_{*} / D}}\)
-
Ideal: \(\frac{p-1}{W_{*} / D}=O(1)\)
-
Reality: \(p = O(\frac{W_{*}}{D}) \ or \frac{W_{*}}{p}=\Omega(D)\)
-
-
Spawn & Sync
- Spawn: keyword that is to mark either the function call or procedure call is independent.
- Sync: Sync matches any Spawn in the same frame. (note: There is always an implicit Sync before returning to the caller)