HPC - High Performance Computation

Cost model - 3 assumptions
1. same speed
2. 1 operation = 1 unit of time
3. no edge cost
Sequential DAG vs Tree-based DAG
- Sequential - O(n) time
- Tree - O(logn) time
Work: W(n) = work (n nodes)
Span: D(n) = depth (vertices on the longest path - critical path)
Average available parallelism: \(\frac{W(n)}{D(n)}\)
Span Law: \(T_{p}(n) \ge D(n) \\p \ is\ the\ blocks\ in\ PRAM\ Model\\)
Work Law: \(T_{p}(n) \ge \left \lceil \frac{W(n)}{p} \right \rceil\)
(Compined) Work - Span Law: \(T_{p}(n) \ge max \left\{ D(n), \left \lceil \frac{W(n)}{p} \right \rceil \right\}\)

\[T_{p}(n) \ge \frac{W-D}{p}+D\]

Setup: Break execution into phases:
- Each phase has 1 c.p (critical point) vertex
- Non-c.p vertices in each phase are independent
- Every vertex must appear in some phase \(t_{k}=\left \lceil \frac{W(n)}{p} \right \rceil \Rightarrow T_{p}=\sum_{k=1}^D t_{k} \\ t_{k}\ is\ the\ time\ to\ exeute\ phase\ k\)
- Aside: Floor & Ceiling Identities \(\left \lceil \frac{a}{b} \right \rceil = \left \lfloor \frac{a+b-1}{b} \right \rfloor \ ; \ \left \lfloor \frac{a}{b} \right \rfloor = \left \lceil \frac{a-b+1}{b} \right \rceil\)
  \[\left \lceil \frac{a}{b} \right \rceil = \left \lfloor \frac{a-1}{b} \right \rfloor +1 \ ;\ \left \lfloor \frac{a}{b} \right \rfloor = \left \lceil \frac{a+1}{b} \right \rceil -1\]
Finish: Break execution into D phases: \(T_{p}=\sum_{k=1}^D \left \lceil \frac{W_{k}}{p} \right \rceil = \sum_{k=1}^D \left \lfloor \frac{W_{k}}{p} \right \rfloor +1 \leq \sum_{k=1}^D \frac{W_{k}-1}{p} +1\)
\[max \left\{ D(n), \left \lceil \frac{W(n)}{p} \right \rceil \right\} \ge T_{p} \leq \frac{W-D}{p}+D \ (Brent's\ theorem\ is\ the\ upper bound\ part)\]

Speedup: \(\frac{best\ sequential\ time}{parallel\ time}\)
\[S_{p}(n) \equiv \frac{T_{*}(n)}{T_{p}(n)}\] \[T_{*}(n)=W_{*}(n)\] \[T_{p}(n)=f(W,D;n,p)\]
Ideal Speedup: Linear in P
- Work-optimality:
  \(W(n)=O(W_{*}(n))\)
- Weak-scalability
  - Use Brent’s theorem: \(S_{p}(n)= \frac{W_{*}(n)}{T_{p}(n)} \ge \frac{W_{*}}{\frac {W-D}{p}+D} \\ \frac{W_{*}}{\frac {W-D}{p}+D} = \frac{p}{\frac {W}{W_{*}}+\frac{p-1}{W_{*} / D}}\)
  - Ideal: \(\frac{p-1}{W_{*} / D}=O(1)\)
  - Reality: \(p = O(\frac{W_{*}}{D}) \ or \frac{W_{*}}{p}=\Omega(D)\)

Spawn: keyword that is to mark either the function call or procedure call is independent.
Sync: Sync matches any Spawn in the same frame. (note: There is always an implicit Sync before returning to the caller)

"并行计算基础"