4 votes
Consider two different implementations of the RISC-V instruction set architecture. P1 has a clock rate of 4.0 GHz and CPIs of 1, 5, and 4 for ALU, load/store and branch instructions. P2 has a clock rate of 3.6 GHz and CPIs of 2, 3, and 3 for the three classes of instructions. Given a program with a dynamic instruction count of 0E6 instructions divided into classes as follows: 30% ALU class, 40% load/store class, 30% branch class.

1) (5%) What is the global CPI for each implementation? Which implementation is faster?
2) (5%) Suppose we improve the design of P1 so that load/store instructions have a CPI of only 1. The CPIs of the other two classes stay the same (1 for ALU and 4 for branch). What must the new clock rate of P1 be if we want to improve the performance of the program by 100% (half of the execution time of P1 before the improvement)?

by User Hkariti (8.3k points)

1 Answer

2 votes

Final answer:

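The global CPIs for P1 and P2 are 3.5 and 2.7, respectively. P2 is faster: it needs 2.7 / 3.6 GHz = 0.75 ns per instruction on average, versus 3.5 / 4.0 GHz = 0.875 ns for P1. After improving P1's load/store CPI to 1, its global CPI drops to 1.9, so a clock rate of about 4.34 GHz (not 8 GHz) is enough to halve the original execution time, i.e. to improve performance by 100%.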

Step-by-step explanation:

The global CPI of each implementation is the weighted average of its per-class CPIs, where each weight is that class's share of the program's dynamic instruction count.

For P1, the global CPI can be calculated as follows:

  • (30% ALU × 1 CPI) + (40% load/store × 5 CPI) + (30% branch × 4 CPI)
  • =(0.3 × 1) + (0.4 × 5) + (0.3 × 4)
  • =0.3 + 2 + 1.2
  • =3.5 CPI

For P2, the global CPI is:

  • (30% ALU × 2 CPI) + (40% load/store × 3 CPI) + (30% branch × 3 CPI)
  • =(0.3 × 2) + (0.4 × 3) + (0.3 × 3)
  • =0.6 + 1.2 + 0.9
  • =2.7 CPI
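The two weighted averages above can be reproduced with a minimal Python sketch. The dictionary keys and the helper name `global_cpi` are illustrative choices, not part of the original question; the numbers are taken directly from it.

```python
# Instruction mix from the question (fractions of the dynamic instruction count).
MIX = {"alu": 0.30, "load_store": 0.40, "branch": 0.30}

# Per-class CPIs for each implementation, as given in the question.
CPI_P1 = {"alu": 1, "load_store": 5, "branch": 4}
CPI_P2 = {"alu": 2, "load_store": 3, "branch": 3}

def global_cpi(mix, cpi):
    """Weighted-average CPI: sum over classes of (class fraction x class CPI)."""
    return sum(mix[c] * cpi[c] for c in mix)

print(round(global_cpi(MIX, CPI_P1), 2))  # 3.5
print(round(global_cpi(MIX, CPI_P2), 2))  # 2.7
```

The rounding only hides floating-point noise; the underlying sums are the same ones written out in the bullets.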

To compare which implementation is faster, we can calculate the execution time for the program on each:

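  • For P1: Execution Time P1 = Instruction Count × Global CPI P1 / Clock Rate P1 = Instruction Count × (3.5 / 4.0 GHz) = Instruction Count × 0.875 ns
  • For P2: Execution Time P2 = Instruction Count × Global CPI P2 / Clock Rate P2 = Instruction Count × (2.7 / 3.6 GHz) = Instruction Count × 0.75 ns

Both implementations execute the same number of instructions, so the one with the lower time per instruction is faster. That is P2, by a factor of 0.875 / 0.75 ≈ 1.17.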
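As a numerical check of the comparison, here is a small sketch that works in time per instruction, since the instruction count is the same for both implementations and cancels out of the ratio. The function name `time_per_instruction_ns` is hypothetical; it relies only on the fact that 1 GHz is one cycle per nanosecond.

```python
def time_per_instruction_ns(cpi, clock_ghz):
    """Average time per instruction in ns: CPI divided by clock rate in GHz."""
    return cpi / clock_ghz

t_p1 = time_per_instruction_ns(3.5, 4.0)  # P1: global CPI 3.5 at 4.0 GHz
t_p2 = time_per_instruction_ns(2.7, 3.6)  # P2: global CPI 2.7 at 3.6 GHz
speedup = t_p1 / t_p2                     # how much faster P2 is than P1

print(f"P1: {t_p1:.3f} ns, P2: {t_p2:.3f} ns, P2 is {speedup:.2f}x faster")
# P1: 0.875 ns, P2: 0.750 ns, P2 is 1.17x faster
```

Multiplying either value by the program's instruction count gives the total execution time on that implementation.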

If we want to improve the performance of P1 by 100%, thereby halving the execution time, we first calculate the new global CPI after improving load/store to 1 CPI:

  • New Global CPI P1 = (0.3 × 1) + (0.4 × 1) + (0.3 × 4) = 0.3 + 0.4 + 1.2 = 1.9 CPI

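Halving the execution time does not require doubling the clock rate, because the lower CPI already provides part of the speedup. Since Execution Time = Instruction Count × CPI / Clock Rate and the instruction count is unchanged, we need New CPI P1 / New Clock Rate P1 = (1/2) × (Old CPI P1 / Old Clock Rate P1). Solving for the new clock rate:

  • New Clock Rate P1 = 2 × Old Clock Rate P1 × (New CPI P1 / Old CPI P1) = 2 × 4.0 GHz × (1.9 / 3.5) ≈ 4.34 GHz

As a check, 1.9 / 4.34 GHz ≈ 0.4375 ns per instruction, which is half of the original 3.5 / 4.0 GHz = 0.875 ns. Doubling the clock to 8.0 GHz would instead give a speedup of 2 × 3.5 / 1.9 ≈ 3.68, well beyond the 100% improvement asked for.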
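The required clock rate can also be found by solving CPI_new / f_new = (CPI_old / f_old) / speedup for f_new. The sketch below does this for a target speedup of 2 (execution time halved); the function name `required_clock_ghz` and its parameter names are illustrative assumptions, and the instruction count is assumed unchanged by the redesign.

```python
def required_clock_ghz(old_cpi, new_cpi, old_clock_ghz, speedup):
    """Clock rate (GHz) at which new execution time = old execution time / speedup.

    Derived from new_cpi / f_new = (old_cpi / old_clock_ghz) / speedup,
    with the instruction count held constant.
    """
    return speedup * old_clock_ghz * new_cpi / old_cpi

f_new = required_clock_ghz(old_cpi=3.5, new_cpi=1.9, old_clock_ghz=4.0, speedup=2.0)
print(f"{f_new:.2f} GHz")  # 4.34 GHz
```

With these inputs the result is 15.2 / 3.5 GHz, so the CPI reduction alone covers most of the speedup and the clock needs to rise only from 4.0 GHz to roughly 4.34 GHz.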

by User Dmodulus (8.2k points)