Final answer:
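The global CPIs for P1 and P2 are 3.5 and 2.7, respectively. P2 is faster: its average time per instruction is 0.75 ns, versus 0.875 ns for P1. After improving P1's load/store CPI to 1, its global CPI falls to 1.9, so a clock rate of about 4.34 GHz is enough to improve performance by 100%. Doubling the clock rate to 8 GHz would overshoot the target, giving a speedup of about 3.68×.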
Step-by-step explanation:
The global CPI of each implementation is the weighted average of its per-class CPIs, where each weight is the fraction of executed instructions that fall in that class (30% ALU, 40% load/store, 30% branch).
For P1, the global CPI can be calculated as follows:
- (30% ALU × 1 CPI) + (40% load/store × 5 CPI) + (30% branch × 4 CPI)
- =(0.3 × 1) + (0.4 × 5) + (0.3 × 4)
- =0.3 + 2 + 1.2
- =3.5 CPI
For P2, the global CPI is:
- (30% ALU × 2 CPI) + (40% load/store × 3 CPI) + (30% branch × 3 CPI)
- =(0.3 × 2) + (0.4 × 3) + (0.3 × 3)
- =0.6 + 1.2 + 0.9
- =2.7 CPI
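The two weighted averages above can be reproduced with a short script. This is a minimal sketch: the names `MIX`, `CPI_P1`, `CPI_P2`, and `global_cpi` are illustrative choices, and the numbers are the ones used in the calculation above.

```python
# Instruction mix and per-class CPIs, taken from the worked calculation above.
MIX = {"alu": 0.30, "load_store": 0.40, "branch": 0.30}
CPI_P1 = {"alu": 1, "load_store": 5, "branch": 4}
CPI_P2 = {"alu": 2, "load_store": 3, "branch": 3}


def global_cpi(mix, cpi):
    """Weighted average CPI: sum over classes of (fraction x class CPI)."""
    return sum(mix[c] * cpi[c] for c in mix)


print(round(global_cpi(MIX, CPI_P1), 2))  # 3.5
print(round(global_cpi(MIX, CPI_P2), 2))  # 2.7
```

The results are rounded only to hide floating-point noise in the sums.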
To compare the two implementations, we calculate the average time per instruction on each. Both run the same program, so the instruction count is identical and cancels out of the comparison:
- For P1: Time per Instruction P1 = (Global CPI P1 / Clock Rate P1) = (3.5 / 4.0 GHz) = 0.875 ns
- For P2: Time per Instruction P2 = (Global CPI P2 / Clock Rate P2) = (2.7 / 3.6 GHz) = 0.75 ns
P2 has the lower time per instruction, so it is the faster implementation, by a factor of 0.875 / 0.75 ≈ 1.17.
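The comparison can be checked numerically. This sketch assumes both implementations execute the same number of instructions, so only the ratio of global CPI to clock rate matters. The helper name `time_per_instruction_ns` is illustrative.

```python
def time_per_instruction_ns(cpi, clock_ghz):
    """Average time per instruction in ns: cycles divided by cycles per ns."""
    return cpi / clock_ghz


t_p1 = time_per_instruction_ns(3.5, 4.0)  # P1: global CPI 3.5 at 4.0 GHz
t_p2 = time_per_instruction_ns(2.7, 3.6)  # P2: global CPI 2.7 at 3.6 GHz

print(round(t_p1, 3), round(t_p2, 3))
print("P2 is faster" if t_p2 < t_p1 else "P1 is faster")
print("speedup of P2 over P1:", round(t_p1 / t_p2, 3))
```

Multiplying either value by the program's instruction count would give its total execution time.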
If we want to improve the performance of P1 by 100%, thereby halving the execution time, we first calculate the new global CPI after improving load/store to 1 CPI:
- New Global CPI P1 = (0.3 × 1) + (0.4 × 1) + (0.3 × 4) = 0.3 + 0.4 + 1.2 = 1.9 CPI
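With the instruction count unchanged, Performance ≈ 1 / (CPI × Clock Cycle Time) = Clock Rate / CPI. Doubling performance therefore requires New Clock Rate / New CPI = 2 × (Old Clock Rate / Old CPI). Because the CPI has already dropped from 3.5 to 1.9, the clock rate does not need to double:
- New Clock Rate P1 = 2 × Old Clock Rate P1 × (New Global CPI P1 / Old Global CPI P1) = 2 × 4.0 GHz × (1.9 / 3.5) ≈ 4.34 GHz
- For comparison, doubling the clock rate to 8.0 GHz with the new CPI would give a speedup of (8.0 / 1.9) / (4.0 / 3.5) ≈ 3.68×, well beyond the 100% target.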
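As a cross-check on the clock-rate step, the sketch below solves Performance ≈ Clock Rate / CPI for the clock rate that gives a 2× speedup once the load/store CPI is 1. It assumes the instruction count does not change. The function names `required_clock_ghz` and `speedup` are illustrative.

```python
def required_clock_ghz(old_cpi, old_clock_ghz, new_cpi, target_speedup):
    """Clock rate at which new_cpi / clock = (old_cpi / old_clock) / target_speedup."""
    target_time_ns = (old_cpi / old_clock_ghz) / target_speedup
    return new_cpi / target_time_ns


def speedup(old_cpi, old_clock_ghz, new_cpi, new_clock_ghz):
    """Ratio of old to new time per instruction."""
    return (old_cpi / old_clock_ghz) / (new_cpi / new_clock_ghz)


# P1: global CPI falls from 3.5 to 1.9 when load/store CPI drops from 5 to 1.
needed = required_clock_ghz(3.5, 4.0, 1.9, 2.0)
print(round(needed, 3))                        # 4.343
print(round(speedup(3.5, 4.0, 1.9, 8.0), 3))   # 3.684
```

Under these assumptions the CPI reduction alone gives a speedup of about 1.84×, so only a modest clock increase is needed to reach 2×. An 8.0 GHz clock would yield roughly 3.68×.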