1 vote
The largest configuration of a Cray T90 (Cray T932) has 32 processors, each capable of generating 4 loads and 2 stores per clock cycle. The processor clock cycle is 2.167 ns, while the cycle time of the SRAMs used for the memory system is 15 ns. Calculate the minimum number of memory banks required to allow all processors to run at the full memory bandwidth. Suppose we have 8 memory banks with a bank busy time of 6 clocks and a total memory latency of 12 cycles. How long will it take to complete a 64-element vector load with a stride of 1? With a stride of 32?

1 Answer

2 votes

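To let all 32 processors of a Cray T932 run at full memory bandwidth, at least 1344 memory banks are required (32 processors × 6 memory references per clock × 7 clocks of bank busy time). With 8 banks, a 6-clock bank busy time, and a 12-clock latency, a 64-element vector load takes 76 clock cycles with a stride of 1 and 391 clock cycles with a stride of 32.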

To calculate the minimum number of memory banks required to allow all processors to run at the full memory bandwidth, note that a bank that has accepted a reference stays busy for one SRAM cycle. There must be enough banks to absorb every reference issued during that time, so we can use the formula:


\[ \text{Number of Memory Banks} \geq \text{Number of Processors} * \text{Memory References per Clock Cycle} * \left\lceil \frac{\text{Cycle Time of SRAMs}}{\text{Processor Clock Cycle}} \right\rceil \]

Given values:

- Number of Processors = 32

- Memory References per Clock Cycle = 4 loads + 2 stores = 6

- Processor Clock Cycle = 2.167 ns

- Cycle Time of SRAMs = 15 ns

Substitute these values into the formula to find the minimum number of memory banks.


\[ \text{Number of Memory Banks} \geq 32 * 6 * \left\lceil \frac{15}{2.167} \right\rceil \]
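As a quick check of the bank count, here is a minimal Python sketch. It assumes that loads and stores both count as memory references and that each bank stays busy for the SRAM cycle time rounded up to whole processor clocks. The function name `min_memory_banks` is made up for illustration.

```python
import math


def min_memory_banks(processors, loads_per_clock, stores_per_clock,
                     clock_ns, sram_cycle_ns):
    """Banks needed so that every memory reference can find a free bank.

    Assumption: a bank is busy for the SRAM cycle time, rounded up to a
    whole number of processor clocks.
    """
    # References issued by all processors in a single clock cycle.
    refs_per_clock = processors * (loads_per_clock + stores_per_clock)
    # Processor clocks for which one bank stays busy.
    busy_clocks = math.ceil(sram_cycle_ns / clock_ns)
    return refs_per_clock * busy_clocks


print(min_memory_banks(32, 4, 2, 2.167, 15.0))  # 1344
```

With the values from the question, the bank busy time rounds up from about 6.92 to 7 clocks, and 192 references per clock times 7 clocks gives 1344 banks.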

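Now, for the second part of the question, there are 8 memory banks. The load pays the total memory latency once and then delivers one element per clock, unless an access reaches a bank that is still busy from an earlier access. When no access finds its bank busy:


\[ \text{Time to Complete Load} = \text{Total Memory Latency} + \text{Number of Elements} \]

When every access goes to the same bank, each access after the first waits out the bank busy time:


\[ \text{Time to Complete Load} = \text{Total Memory Latency} + 1 + \text{Bank Busy Time} * (\text{Number of Elements} - 1) \]

Given values:

- Number of Memory Banks = 8

- Bank Busy Time = 6 clocks

- Total Memory Latency = 12 clocks

- Number of Elements = 64

For a stride of 1, consecutive elements fall in consecutive banks, so a bank is revisited only every 8 clocks, which is longer than the 6-clock bank busy time, and no access stalls:


\[ \text{Time to Complete Load (Stride 1)} = 12 + 64 \]

For a stride of 32, which is a multiple of the 8 banks, every element falls in the same bank:


\[ \text{Time to Complete Load (Stride 32)} = 12 + 1 + 6 * 63 \]

You can plug in these values to calculate the respective times in clock cycles.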


Let's perform the calculations:

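1. Minimum Number of Memory Banks: each bank is busy for the SRAM cycle time rounded up to whole processor clocks, and all 32 processors issue 4 loads and 2 stores every clock.


\[ \text{Bank Busy Time} = \left\lceil \frac{15}{2.167} \right\rceil = \lceil 6.92 \rceil = 7 \text{ clocks} \]
\[ \text{Memory References per Clock Cycle} = 32 * (4 + 2) = 192 \]
\[ \text{Number of Memory Banks} = 192 * 7 = 1344 \]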

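2. Time to Complete Load (Stride 1): consecutive elements go to consecutive banks, so each of the 8 banks is revisited every 8 clocks, after its 6-clock busy time has expired, and no access stalls.


\[ \text{Time to Complete Load (Stride 1)} = 12 + 64 \]
\[ \text{Time to Complete Load (Stride 1)} = 76 \text{ clock cycles} \]

3. Time to Complete Load (Stride 32): 32 is a multiple of 8, so every element maps to the same bank and each access after the first waits 6 clocks.


\[ \text{Time to Complete Load (Stride 32)} = 12 + 1 + 6 * 63 \]
\[ \text{Time to Complete Load (Stride 32)} = 391 \text{ clock cycles} \]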

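Therefore, the minimum number of memory banks required is 1344. With 8 banks, the time to complete a 64-element vector load is 76 clock cycles with a stride of 1 and 391 clock cycles with a stride of 32. If the 2.167 ns processor clock from the first part also applies here, that is about 165 ns and 847 ns, respectively.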
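
The vector load times can be cross-checked with a small simulation. This is a sketch under stated assumptions, not a model of the real T90 memory system: element i goes to bank (i × stride) mod 8, at most one access issues per clock, a bank accepts no new access until its busy time has passed, and the load finishes latency + 1 clocks after the last access issues (so a conflict-free load of n elements takes latency + n clocks). The function name `vector_load_clocks` is hypothetical.

```python
def vector_load_clocks(n_elements, stride, n_banks, busy, latency):
    """Clock cycles to load n_elements from word-interleaved banks.

    Assumptions (illustrative only): in-order issue, at most one access
    per clock, bank index = (element index * stride) mod n_banks.
    """
    # Earliest clock at which each bank can accept a new access.
    bank_free = [0] * n_banks
    issue = -1  # clock at which the previous access issued
    for i in range(n_elements):
        bank = (i * stride) % n_banks
        # Issue one clock after the previous access, or later if the
        # target bank is still busy.
        issue = max(issue + 1, bank_free[bank])
        bank_free[bank] = issue + busy
    # The last element arrives latency + 1 clocks after it issues, so a
    # conflict-free load costs latency + n_elements clocks in total.
    return issue + latency + 1


print(vector_load_clocks(64, 1, 8, 6, 12))   # 76
print(vector_load_clocks(64, 32, 8, 6, 12))  # 391
```

With a stride of 1 the accesses issue on clocks 0 through 63 without stalling. With a stride of 32 they all hit bank 0 and issue 6 clocks apart, so the last one issues on clock 378.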

by User Ziga Petek (8.7k points)