## Bengal Engineering and Science University, Shibpur B. E. Part-III (CST) 5th Semester Examinations, 2011

Subject: Computer Architecture

Time: 3 hours

Paper: CS-502 Full marks: 70

## Answer any four

1 a) Describe in brief the static model of Data Flow machine architecture with 4 PEs (processing elements) in processing section and 32 instruction cell blocks in memory section. How a Dynamic Data Flow machine differs from a Static Data Flow machine?

b) Draw a data flow graph and the activity template showing the computations of c = (a+b)(a-b) \* 5

7.5

2 a) What do you mean by loop unrolling? Consider the following loop for i=0 to 1003 with increment 1 do

f();

unroll the loop by inserting 4 copies of the body f(). Comment on the provision of loop unrolling when f() is

$$A\{i+1\} = A\{i\} + B\{i\}$$

7.5

- b) (i) Explain the superscalar processor instruction execution with issue rate m = 3. What is the importance of trace scheduling in superscalar processing?
- (ii) Consider the simple code motions (reorder) noted below in (A) and (B)



Show the compensation codes for both the cases.

10

- 3. Define 4-state MESI protocol. Consider a bus based shared memory (M) system consisting of three processors P1, P2, and P3. Each processor is having a cache C1, C2, and C3 that can fit only two blocks at any given time. The shared memory M is divided into three blocks A, B, C and contain shared variables X in A, Y in B, Z in C respectively. Each block can be in any one of the MESI protocol states. Initially, X = 10, Y = 20, and Z = 40 in M. Assuming caches are initially empty and the following events are occurred
  - (1) P1 reads X,
- (2) P2 reads Z,
- (3) P2 updates X=X+50,

- (4) P1 reads Y,
- (5) P2 reads X.
- (6) P1 updates X=X-20, (7) P3 reads X
- i) show the contents of the caches and memory,
- ii) show the states of blocks A and B at different caches (C1, C2, C3), after each event. Mention clearly the assumptions taken.

17.5

- 4 a) Define the function of a 2x2 crossbar switch. Construct
  - (i) an 1 of 8 demultiplexer using a 2x2 crossbar switch,
  - (ii) a 23 x 23 delta network.

Find out the number of 2x2 switches in 1 of 2<sup>n</sup> demultiplexer and in 2<sup>n</sup>x2<sup>n</sup> delta network. 12

b) Show interconnection of an SIMD machine with 8 PEs (PE<sub>0</sub> to PE<sub>7</sub>), using shuffle-exchange interconnection network. Show how PE<sub>2</sub> and PE<sub>3</sub> can transfer the data items to PE<sub>1</sub>. 5.5

- 5 a) What do you mean by compulsory miss, conflict miss and capacity miss? If the cache size
- (volume of cache interfaced) is increased, block size remaining constant, then clarify the impact on compulsory/conflict/capacity miss.
  - (i) pseudo-associative cache can reduce the cache miss rate,
  - (ii) loop fusion can reduce the cache miss rate for the following program segment

for i=1 to 100 with increment 1 do C[i] = B[i] + A[i];

for i=1 to 100 with increment 1 do  $D[i] = B[j][1] * 2 \times A[i];$ 

Mention the assumptions taken.

10

7.5

7.5

- 6 a) Differentiate RAW, WAR and WAW data hazards. Identify the cases of possible data hazards (RAW/WAR/WAW) for execution of the following program segment in a 5-stage
  - instruction pipeline: i1: R2 <- R1 + R3

b) Show how

- i2: R4 <-- R2 + R5
- i3: R5 <- -R1 + R2
- i4: R6 <- R1 + R6
- i5: R6 <-- R7

b) Define "predict taken" and "predict-not-taken" schemes. Consider the program segment:

L1: Load R1, M(R2) Sub R3, R3, R1

> Bnez R3 L1 Store R4, M(R3)

Identify the branch delay slot. Modify the original program segment by moving an instruction Ix to the delay slot. Evaluate the performance of modified program segment considering

- (i) "predict taken" scheme
- (ii) "predict-not-taken scheme.

10

- 7. Write short notes on the following
  - n the following 10+7.5
  - i) Data speculation
  - ii) In-order issue out-of-order completion policy