Advanced Computer Architecture
Advanced Computer Architecture
3.1 What would be the baseline performance (in cycles, per loop iteration) of the code sequence in Figure 3.48 if no new instruction’s execution could be initiated until the previous instruction’s execution had completed? Ignore front-end fetch and decode. Assume for now that execution does not stall for lack of the next instruction, but only one instruction/cycle can be issued. Assume the branch is taken, and that there is a one-cycle branch delay slot.
3.2 Think about what latency numbers really mean—they indicate the number of cycles a given function requires to produce its output, nothing more. If the overall pipeline stalls for the latency cycles of each functional unit, then you are at least guaranteed that any pair of back-to-back instructions (a “producer” followed by a “consumer”) will execute correctly. But not all instruction pairs have a producer/consumer relationship. Sometimes two adjacent instructions have nothing to do with each other. How many cycles would the loop body in the code sequence in Figure 3.48 require if the pipeline detected true data dependencies and only stalled on those, rather than blindly stalling everything just because one functional unit is busy? Show the code with
inserted where necessary to accommodate stated latencies. (Hint: An instruction with latency +2 requires two
cycles to be inserted into the code sequence. Think of it this way: A one-cycle instruction has latency 1 + 0, meaning zero extra wait states. So, latency 1 + 1 implies one stall cycle; latency 1 + N has N extra stall cycles.
"You need a similar assignment done from scratch? Our qualified writers will help you with a guaranteed AI-free & plagiarism-free A+ quality paper, Confidentiality, Timely delivery & Livechat/phone Support.
Discount Code: CIPD30
Click ORDER NOW..


