Problem E
Parallel Analysis
Today’s computer architectures are moving away from single processing cores toward multiple cores. Due to this shift, many computer programmers are honing their parallel programming skills to take advantage of new hardware capabilities.
Writing efficient parallel programs can be difficult, but software tools can make the job easier for programmers. You have decided to write a program to help with the analysis of existing serial programs. Your program will take as input a trace of the executed instructions in a serial program and identify how fast it might run on a parallel platform with many cores.
Your program will measure the time to execute the trace on a
shared-memory parallel architecture. Assume that every
instruction executes in one clock tick, and that all cores
share a single clock. The parallel execution time is measured
from the beginning of execution until all instructions have
finished. Thus, if a serial trace has
Two instructions can be executed simultaneously if they do
not have a data dependency. Suppose instruction
Execution traces are described in a simplified format that includes just the memory references of each instruction. This format ignores the semantics of the instructions executed (e.g. add, subtract) and the data values. Each instruction reads zero or more values from memory, and produces exactly 1 value and writes it to memory.
Here is an example trace. The middle column, which would be the input to your program, shows the number of memory references followed by the references for each instruction. The read addresses are first, followed by the write address. The right column gives pseudocode to help explain what these references mean.
instruction memory references pseudocode ------------------------------------------------------- A 2 1 0 max <- x B 3 1 2 3 if (y > x) C 3 3 2 0 max <- y D 3 1 4 4 a <- a + x E 3 2 4 4 a <- a + y F 3 0 4 4 a <- a + max
In the trace given above, instruction B is not dependent on instruction A, so they can be executed in parallel (or out of order). Thus, if two cores were available, in one clock tick they could complete the first two instructions. Instructions D, E, and F must execute sequentially (due to dependencies involving memory address 4), but instruction D may start at any time. However, instruction F may not execute until instruction C has completed. Further investigation would yield other insights about data dependencies in this trace.
Input
Input contains multiple traces. The first line of input
contains an integer
Output
For each trace, print out the trace number and the minimum
parallel run time for that trace running on
Sample Input 1 | Sample Output 1 |
---|---|
4 6 2 1 0 3 1 2 3 3 3 2 0 2 1 4 2 2 4 2 0 4 7 2 0 1 2 1 2 2 2 3 2 3 4 2 4 5 2 5 6 2 6 7 8 4 0 1 2 3 3 4 5 6 2 7 8 1 9 3 10 11 12 1 13 3 14 15 16 4 17 18 19 20 6 1 0 1 1 1 2 2 0 3 2 0 4 4 0 1 2 5 |
1 3 2 7 3 1 4 2 |