HPC - High Performance Computation

"并行计算基础"

Posted by xxc on October 31, 2018

HPC - High Performance Computation

DAG = Directed acyclic graph

PRAM = Parallel Random Access Machine)模型是多指令流多数据流(MIMD)- 内存被分配到各个格子中


Basic Knowledge

  • Cost model - 3 assumptions

    1. same speed
    2. 1 operation = 1 unit of time
    3. no edge cost
  • Sequential DAG vs Tree-based DAG

    • Sequential - O(n) time
    • Tree - O(logn) time
  • Work: W(n) = work (n nodes)

  • Span: D(n) = depth (vertices on the longest path - critical path)

  • Average available parallelism: \(\frac{W(n)}{D(n)}\)

  • Span Law: \(T_{p}(n) \ge D(n) \\p \ is\ the\ blocks\ in\ PRAM\ Model\\)

  • Work Law: \(T_{p}(n) \ge \left \lceil \frac{W(n)}{p} \right \rceil\)

  • (Compined) Work - Span Law: \(T_{p}(n) \ge max \left\{ D(n), \left \lceil \frac{W(n)}{p} \right \rceil \right\}\)

Brent’s theorem (get the upper-bound of the time consumption):
\[T_{p}(n) \ge \frac{W-D}{p}+D\]
  1. Setup: Break execution into phases:

    • Each phase has 1 c.p (critical point) vertex

    • Non-c.p vertices in each phase are independent

    • Every vertex must appear in some phase \(t_{k}=\left \lceil \frac{W(n)}{p} \right \rceil \Rightarrow T_{p}=\sum_{k=1}^D t_{k} \\ t_{k}\ is\ the\ time\ to\ exeute\ phase\ k\)

    • Aside: Floor & Ceiling Identities \(\left \lceil \frac{a}{b} \right \rceil = \left \lfloor \frac{a+b-1}{b} \right \rfloor \ ; \ \left \lfloor \frac{a}{b} \right \rfloor = \left \lceil \frac{a-b+1}{b} \right \rceil\)

      \[\left \lceil \frac{a}{b} \right \rceil = \left \lfloor \frac{a-1}{b} \right \rfloor +1 \ ;\ \left \lfloor \frac{a}{b} \right \rfloor = \left \lceil \frac{a+1}{b} \right \rceil -1\]
  2. Finish: Break execution into D phases: \(T_{p}=\sum_{k=1}^D \left \lceil \frac{W_{k}}{p} \right \rceil = \sum_{k=1}^D \left \lfloor \frac{W_{k}}{p} \right \rfloor +1 \leq \sum_{k=1}^D \frac{W_{k}-1}{p} +1\)

    \[max \left\{ D(n), \left \lceil \frac{W(n)}{p} \right \rceil \right\} \ge T_{p} \leq \frac{W-D}{p}+D \ (Brent's\ theorem\ is\ the\ upper bound\ part)\]
Desiderata: Speedup, work-optimality, and weak-scalability
  • Speedup: \(\frac{best\ sequential\ time}{parallel\ time}\)

    \[S_{p}(n) \equiv \frac{T_{*}(n)}{T_{p}(n)}\] \[T_{*}(n)=W_{*}(n)\] \[T_{p}(n)=f(W,D;n,p)\]
  • Ideal Speedup: Linear in P

    • Work-optimality:
      \(W(n)=O(W_{*}(n))\)

    • Weak-scalability

      • Use Brent’s theorem: \(S_{p}(n)= \frac{W_{*}(n)}{T_{p}(n)} \ge \frac{W_{*}}{\frac {W-D}{p}+D} \\ \frac{W_{*}}{\frac {W-D}{p}+D} = \frac{p}{\frac {W}{W_{*}}+\frac{p-1}{W_{*} / D}}\)

      • Ideal: \(\frac{p-1}{W_{*} / D}=O(1)\)

      • Reality: \(p = O(\frac{W_{*}}{D}) \ or \frac{W_{*}}{p}=\Omega(D)\)

Spawn & Sync
  • Spawn: keyword that is to mark either the function call or procedure call is independent.
  • Sync: Sync matches any Spawn in the same frame. (note: There is always an implicit Sync before returning to the caller)