5.14 ◆ Write a version of the inner product procedure described in Problem 5.13 that uses 6 × 1 loop unrolling. For x86-64, our measurements of the unrolled version give a CPE o…

Question

asked Sep 13, 2021 97.7k views

1 Answer

Dtrunk · Answer 1 · 2021-09-17T09:24:53+0000

Answer:

(a) The update is executed four times, for i = 0 through 3, and the index i is evaluated up to five times. (b) The floating-point version can have a CPE of 3.00 even though the multiplication requires 4 or 5 clock cycles.

Step-by-step explanation:

Solution

The two floating-point versions can have a CPE of 3.00 even though each multiplication takes either 4 or 5 clock cycles. Latency is the total number of clock cycles needed to complete one operation, while issue time is the minimum number of cycles between the starts of successive operations. Because the multiplications are pipelined and do not depend on one another, they can overlap; only the additions into sum form a sequential chain.

Now,

sum = sum + udata[i] * vdata[i]

Here, i runs from 0 to 3.

Thus,

The value of sum can be written as:

sum = ((((sum + udata[0] * vdata[0]) + (udata[1] * vdata[1])) + (udata[2] * vdata[2])) + (udata[3] * vdata[3]))
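The question itself asks for 6 × 1 unrolling, so the same pattern extends to six products per iteration, still feeding one accumulator. The following is a minimal sketch, not the textbook's code: the name inner6x1, the plain-array interface (pointers plus a length instead of the textbook's vector abstraction), and the choice of double for data_t are all assumptions made for illustration.

```c
/* Assumed element type; the problem also considers integer data. */
typedef double data_t;

/* Inner product with 6 x 1 loop unrolling (hypothetical name and interface):
   six elements are combined per iteration into a single accumulator. */
data_t inner6x1(const data_t *udata, const data_t *vdata, long length)
{
    long i;
    long limit = length - 5;   /* largest bound at which a full group of six fits */
    data_t sum = (data_t) 0;

    /* Main loop: six multiply-adds per iteration.
       The additions associate left to right, so every add still
       depends on the previous value of sum. */
    for (i = 0; i < limit; i += 6) {
        sum = sum + udata[i]     * vdata[i]
                  + udata[i + 1] * vdata[i + 1]
                  + udata[i + 2] * vdata[i + 2]
                  + udata[i + 3] * vdata[i + 3]
                  + udata[i + 4] * vdata[i + 4]
                  + udata[i + 5] * vdata[i + 5];
    }

    /* Finish any remaining elements one at a time. */
    for (; i < length; i++)
        sum = sum + udata[i] * vdata[i];

    return sum;
}
```

Unrolling reduces loop overhead (fewer index updates and comparisons per element), but because there is still only one accumulator, the chain of additions into sum is as long as before.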

Thus,

(A) The update is executed four times, and the index i is evaluated up to five times.

Thus,

(B) The floating-point version can have a CPE of 3.00 even though the multiplication requires either 4 or 5 clock cycles.
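As a rough worked estimate of where the 3.00 comes from: assume the floating-point addition has a latency of 3 cycles (an assumption here, consistent with the measured CPE) and that the pipelined multiplications stay off the critical path. With one addition per element on the sequential chain, the running time for n elements and the resulting CPE are approximately:

```latex
T(n) \approx L_{\text{add}} \cdot n + \text{overhead}
\quad\Longrightarrow\quad
\text{CPE} \approx L_{\text{add}} = 3
```

The symbols T(n) and L_add are introduced only for this sketch. Under this estimate, unrolling with a single accumulator changes the overhead term but not the per-element term.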
