Java程序辅导

C C++ Java Python Processing编程在线培训 程序编写 软件开发 视频讲解

客服在线QQ:2653320439 微信:ittutor Email:itutor@qq.com
wx: cjtutor
QQ: 2653320439
Loop Unrolling window.onload = function () { Start.start("13em"); }; Loop unrolling is a compiler optimization applied to certain kinds of loops to reduce the frequency of branches and loop maintenance instructions. It is easily applied to sequential array processing loops where the number of iterations is known prior to execution of the loop. This web presentation first looks at how loop unrolling works. Then it examines loop unrolling applied to three example loops: an array sum loop a dot product loop a row operation loop All of these examples occur in various types of programs. The row operation loop is often a major component of scientific computations. Loop Unrolling Explained Array Sum Dot Product Row Operation The general idea of loop unrolling is to replicate the code inside a loop body a number of times. The number of copies is called the loop unrolling factor. The number of iterations is divided by the loop unrolling factor. Loop unrolling is most effective when computations involving the loop control variable can be simulated by the compiler. For example, when a loop is stepping sequentially through an array, increments to a register that points to array entries can be simulated by the compiler with changes in the displacements in load and store instructions. Loop unrolling can reduce the number of loop maintenance instruction executions by the loop unrolling factor. In effect, the computations are done by the compiler rather than being done during program execution. The loop unrolling factor does not have to exactly divide the number of iterations of the original loop. If the number of iterations is known at compile time then the compiler can add extra iterations after the unrolled loop. Otherwise it can just add a copy of the original loop. The following C code will compute the sum of the entries in a 100-entry vector A. double arraySum = 0; for (int i = 0; i < 100; i++) { arraySum += A[i]; } MIPS assembly language code initializations The code below omits the loop initializations: Initialize loop count ($7) to 100. Initialize arraySum ($f10) to 0. Initialize A[i] pointer ($5) to the base address of A. MIPS assembly language code loop3: l.d $f10, 0($5) ; $f10 ← A[i] add.d $f8, $f8, $f10 ; $f8 ← $f8 + A[i] addi $5, $5, 8 ; increment pointer for A[i] addi $7, $7, -1 ; decrement loop count test: bgtz $7, loop3 ; Continue if loop count > 0 The addi instructions in this code are loop maintenance: advancing addresses and counting iterations. Unrolled Loop This is the same code with loop unrolling with a loop unrolling factor of 4. Notice that the displacements in the loads are incremented by the compiler in the second, third, and fourth copies of the original loop body instructions. The immediate values in the addi instructions have been multiplied by 4 so that the effect of one iteration in the unrolled loop is the same as 4 iterations of the original loop. Notice that branches and loop maintenance instructions have been reduced by a factor of 4. loop3: l.d $f10, 0($5) ; iteration with displacement 0 add.d $f8, $f8, $f10 l.d $f10, 8($5) ; iteration with displacement 8 add.d $f8, $f8, $f10 l.d $f10, 16($5) ; iteration with displacement 16 add.d $f8, $f8, $f10 l.d $f10, 24($5) ; iteration with displacement 24 add.d $f8, $f8, $f10 addi $5, $5, 32 addi $7, $7, -4 test: bgtz $7, loop3 ; Continue loop if $7 > 0 The following C code will compute the dot product of two 100-entry vectors A and B. double dotProduct = 0; for (int i = 0; i < 100; i++) { dotProduct += A[i]*B[k]; } MIPS assembly language code initializations The code below omits the loop initializations: Initialize loop count ($7) to 100. Initialize dotProduct ($f10) to 0. Initialize A[i] pointer ($5) to the base address of A. Initialize B[i] pointer ($6) to the base address of B. MIPS assembly language code loop3: l.d $f10, 0($5) ; $f10 ← A[i] l.d $f12, 0($6) ; $f12 ← B[i] mul.d $f10, $f10, $f12 ; $f10 ← A[i]*B[i] add.d $f8, $f8, $f10 ; $f8 ← $f8 + A[i]*B[ki] addi $5, $5, 8 ; increment pointer for A[i] addi $6, $6, 8 ; increment pointer for B[i] addi $7, $7, -1 ; decrement loop count test: bgtz $7, loop3 ; Continue if loop count > 0 The addi instructions in this code are loop maintenance: advancing addresses and counting iterations. Unrolled Loop This is the same code with loop unrolling with a loop unrolling factor of 4. loop3: l.d $f10, 0($5) ; iteration with displacement 0 l.d $f12, 0($6) mul.d $f10, $f10, $f12 add.d $f8, $f8, $f10 l.d $f10, 8($5) ; iteration with displacement 8 l.d $f12, 8($6) mul.d $f10, $f10, $f12 add.d $f8, $f8, $f10 l.d $f10, 16($5) ; iteration with displacement 16 l.d $f12, 16($6) mul.d $f10, $f10, $f12 add.d $f8, $f8, $f10 l.d $f10, 24($5) ; iteration with displacement 24 l.d $f12, 24($6) mul.d $f10, $f10, $f12 add.d $f8, $f8, $f10 addi $5, $5, 32 addi $6, $6, 32 addi $7, $7, -4 test: bgtz $7, loop3 ; Continue loop if $7 > 0 The following C code will compute the dot product of two 100-entry vectors A and B. double dotProduct = 0; for (int i = 0; i < 100; i++) { dotProduct += A[i]*B[k]; } MIPS assembly language code initializations The code below omits the loop initializations: Initialize loop count ($7) to 100. Initialize A[i] pointer ($5) to the base address of A. Initialize B[i] pointer ($6) to the base address of B. Initialize $f0 to the constant C. MIPS assembly language code loop3: l.d $f10, 0($5) ; $f10 ← A[i] mul.d $f10, $f10, $f0 ; $f10 ← A[i]*C l.d $f12, 0($6) ; $f12 ← B[i] add.d $f12, $f12, $f10 ; $f12 ← $f12 + A[i]*C s.d $f12, 0($6) ; B[i] ← B[i] + A[i]*C addi $5, $5, 8 ; increment pointer for A[i] addi $6, $6, 8 ; increment pointer for B[i] addi $7, $7, -1 ; decrement loop count test: bgtz $7, loop3 ; Continue if loop count > 0 Unrolled Loop This is the same code with loop unrolling with a loop unrolling factor of 4. loop3: l.d $f10, 0($5) ; iteration with displacement 0 mul.d $f10, $f10, $f12 l.d $f12, 0($6) add.d $f12, $f12, $f10 s.d $f12, 0($6) l.d $f10, 8($5) ; iteration with displacement 8 mul.d $f10, $f10, $f12 l.d $f12, 8($6) add.d $f12, $f12, $f10 s.d $f12, 8($6) l.d $f10, 16($5) ; iteration with displacement 16 mul.d $f10, $f10, $f12 l.d $f12, 16($6) add.d $f12, $f12, $f10 s.d $f12, 16($6) l.d $f10, 24($5) ; iteration with displacement 24 mul.d $f10, $f10, $f12 l.d $f12, 24($6) add.d $f12, $f12, $f10 s.d $f12, 24($6) addi $5, $5, 32 addi $6, $6, 32 addi $7, $7, -4 test: bgtz $7, loop3 ; Continue loop if $7 > 0