CS2504, Spring'2007
©Dimitris Nikolopoulos
61
Large constants
MIPS I-format instructions have only a 16-bit immediate field
For larger constants:
lui loads its operand into the upper 16 bits of the register and clears the lower 16 bits
Combined with ori, this composes a full 32-bit constant
Example:
#goal = 0000 0000 0011 1101 0000 1001 0000 0000 (4,000,000 decimal)
#upper half = 0000 0000 0011 1101
#or 61 decimal
lui $s0, 61
#lower half = 0000 1001 0000 0000
#or 2304 decimal
ori $s0, $s0, 2304
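The composition above can be checked with a small model. This is a minimal Python sketch, not MIPS code: `lui` and `ori` here are hypothetical helper functions that mimic the two instructions on a 32-bit value.

```python
# Model of "lui $s0, 61" followed by "ori $s0, $s0, 2304".
MASK32 = 0xFFFFFFFF

def lui(imm16):
    """Put a 16-bit immediate in the upper half; the lower half becomes zero."""
    return ((imm16 & 0xFFFF) << 16) & MASK32

def ori(reg, imm16):
    """OR a zero-extended 16-bit immediate into the register value."""
    return (reg | (imm16 & 0xFFFF)) & MASK32

s0 = lui(61)        # 0x003D0000
s0 = ori(s0, 2304)  # 0x003D0900
print(f"{s0:#010x} = {s0}")  # prints 0x003d0900 = 4000000
```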
Large constants
Can it be done otherwise?
If we used addi instead, addi would copy the most significant bit of the 16-bit immediate (the sign bit) into all upper 16 bits before adding
This is called sign extension
A negative operand (bit 15 set) would propagate 1's to the upper bits and corrupt the half set by lui
Watch for automatic sign extension in arithmetic operations in MIPS
ori zero-extends its immediate operand (the upper 16 bits are treated as zeros), therefore it does the job
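The difference only shows when bit 15 of the lower half is set. A minimal Python sketch follows; the helper names and the lower-half value 0x8900 are assumptions chosen for illustration, not values from the slides.

```python
# Compare zero extension (ori) with sign extension (addi) on a 32-bit register.
MASK32 = 0xFFFFFFFF

def sign_extend16(imm16):
    """Copy bit 15 of the immediate into the upper 16 bits."""
    imm16 &= 0xFFFF
    return (imm16 | 0xFFFF0000) if imm16 & 0x8000 else imm16

def ori(reg, imm16):
    return (reg | (imm16 & 0xFFFF)) & MASK32

def addi(reg, imm16):
    return (reg + sign_extend16(imm16)) & MASK32

upper = 61 << 16              # register after "lui $s0, 61"
low = 0x8900                  # hypothetical lower half with bit 15 set
with_ori = ori(upper, low)    # 0x003D8900, the intended constant
with_addi = addi(upper, low)  # 0x003C8900, the upper half is off by one
```

With the slide's own lower half (2304, bit 15 clear) both instructions give the same result, which is why the problem is easy to miss.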
Memory addressing
j instruction has a single constant operand
and no variants:
26-bit operand to specify memory location
6 bits still needed for opcode
Branch instructions have two register operands (10 bits)
That leaves a 16-bit memory operand, which alone can only address 2^16 bytes
Range -32,768 to +32,767 bytes (signed offset)
Insufficient for 32-bit architectures
Need solution to address more memory
Key idea: base + offset
Register holds a base, offset indicates distance (positive or negative) from the base
PC-relative addressing
Base register is the program counter
MIPS-specific:
MIPS program counter actually points to the next instruction (PC+4), for efficiency purposes to be clarified later
Offset encoded in 16 bits is actually a number of words, not bytes (effectively extending the range to about ±2^17 bytes)
Offset is added to PC+4
Direct jump instruction (j) can address 2^28 bytes
jr uses the full 32-bit address stored in a register
PC relative addressing
Long jumps
j instruction has a 26-bit argument (a word address), representing 28-bit byte addresses
Missing 4 bits for complete address:
4 leftmost bits of the PC are left untouched by branches and jumps
Program loader and linker are aware of this while
placing programs in memory
If jump has to cross boundaries set by the loader,
program must use jr with a register operand
Long-range branches?
bne $t0,$t1,L1 #L1 is far far away...
is rewritten by the assembler as:
beq $t0,$t1,L2 #inverted test, L2 is nearby
j L1
L2:
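The address arithmetic for branches and jumps can be sketched as follows. This is a minimal Python model under the rules stated above (word offsets, base PC+4, upper 4 bits kept from the PC); the function names are hypothetical.

```python
def sign_extend16(imm16):
    """Interpret a 16-bit field as a signed integer."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def branch_target(pc, offset16):
    """PC-relative: the offset counts words and is added to PC+4."""
    return (pc + 4 + 4 * sign_extend16(offset16)) & 0xFFFFFFFF

def jump_target(pc, addr26):
    """Direct jump: 26-bit word address, top 4 bits taken from PC+4."""
    upper4 = (pc + 4) & 0xF0000000
    return upper4 | ((addr26 & 0x03FFFFFF) << 2)

farthest_forward = 4 * 0x7FFF    # +131068 bytes from PC+4
farthest_backward = 4 * -0x8000  # -131072 bytes from PC+4
```

A target outside these limits needs the inverted-branch-plus-jump pattern, and a target in a different 256 MB region needs jr.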
MIPS Pop Quiz
What are the values of the offset fields of
the bne and the j instructions in this loop, if
the loop starts at 80000hex?
Loop: sll $t1, $s3, 2
add $t1, $t1, $s6
lw $t0, 0($t1)
bne $t0, $s5, Exit
addi $s3, $s3, 1
j Loop
Exit:
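One way to work the quiz is to lay the instructions out at 4-byte intervals and apply the addressing rules. The sketch below is in Python and assumes the loop address is 80000 hex as stated, with one 4-byte instruction per source line; the variable names are illustrative only.

```python
LOOP = 0x80000                 # address of the sll at label Loop

# Instruction index within the loop: sll=0, add=1, lw=2, bne=3, addi=4, j=5
bne_pc = LOOP + 3 * 4          # address of the bne
exit_addr = LOOP + 6 * 4       # Exit is the first address after the j

# Branch offset: words from PC+4 to the target
bne_offset = (exit_addr - (bne_pc + 4)) // 4

# Jump field: target byte address divided by 4, kept to 26 bits
j_field = (LOOP >> 2) & 0x03FFFFFF

print(bne_offset, hex(j_field))  # prints 2 0x20000
```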
Addressing modes
Recap: MIPS Instruction formats
R-format: op (6 bits) | rs (5) | rt (5) | rd (5) | shamt (5) | funct (6)
I-format: op (6 bits) | rs (5) | rt (5) | immediate (16)
J-format: op (6 bits) | address (26)
Homework
Learn how to decode MIPS instructions
Use the table on page 103
Decoding instruction example
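00af8030hex #as in SPIM
opcode = 000000 (R-format, bits 31-26)
rs = 00101 (bits 25-21), register $a1
rt = 01111 (bits 20-16), register $t7
rd = 10000 (bits 15-11), register $s0
shamt = 00000 (bits 10-6)
funct = 110000 (bits 5-0)
R-format operands print as rd, rs, rt: $s0,$a1,$t7
Check funct against the encoding table: the standard MIPS table lists mult as funct 011000, with only rs and rt as operands (the result goes to HI/LO)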
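The field extraction can be done mechanically with shifts and masks. This is a minimal Python sketch; `decode_r` is a hypothetical helper, and the register-name list follows the conventional MIPS numbering.

```python
REG_NAMES = [
    "$zero", "$at", "$v0", "$v1", "$a0", "$a1", "$a2", "$a3",
    "$t0", "$t1", "$t2", "$t3", "$t4", "$t5", "$t6", "$t7",
    "$s0", "$s1", "$s2", "$s3", "$s4", "$s5", "$s6", "$s7",
    "$t8", "$t9", "$k0", "$k1", "$gp", "$sp", "$fp", "$ra",
]

def decode_r(word):
    """Split a 32-bit instruction word into the six R-format fields."""
    return {
        "op": (word >> 26) & 0x3F,     # bits 31-26
        "rs": (word >> 21) & 0x1F,     # bits 25-21
        "rt": (word >> 16) & 0x1F,     # bits 20-16
        "rd": (word >> 11) & 0x1F,     # bits 15-11
        "shamt": (word >> 6) & 0x1F,   # bits 10-6
        "funct": word & 0x3F,          # bits 5-0
    }

fields = decode_r(0x00AF8030)
operands = [REG_NAMES[fields[name]] for name in ("rd", "rs", "rt")]
```

The mnemonic is then looked up from `op` and `funct` in the encoding table; that lookup is left out here.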
Translating a Program
Assemble to simplify your life
Pseudoinstructions
Instructions that the assembler composes from other (real) MIPS assembly instructions
move $t0, $t1 = add $t0,$zero,$t1
blt, bge, ble are all built from slt followed by beq or bne
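The expansions can be written down as a small rewriting table. This is a Python sketch of one plausible set of expansions using the assembler temporary register $at; a real assembler such as SPIM may choose different sequences, and `expand` is a hypothetical helper.

```python
def expand(pseudo):
    """Rewrite one pseudoinstruction into real MIPS instructions."""
    op, args = pseudo.split(None, 1)
    a = [x.strip() for x in args.split(",")]
    if op == "move":   # copy a register by adding zero
        return [f"add {a[0]},$zero,{a[1]}"]
    if op == "blt":    # branch if a0 < a1
        return [f"slt $at,{a[0]},{a[1]}", f"bne $at,$zero,{a[2]}"]
    if op == "bge":    # branch if not (a0 < a1)
        return [f"slt $at,{a[0]},{a[1]}", f"beq $at,$zero,{a[2]}"]
    if op == "ble":    # branch if not (a1 < a0)
        return [f"slt $at,{a[1]},{a[0]}", f"beq $at,$zero,{a[2]}"]
    raise ValueError(f"no expansion known for {op}")

for line in expand("blt $s0, $s1, Less"):
    print(line)
```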
Assembler
Translate assembly to binary code
Binary code augmented with meta-information
object file header, size of pieces of object file
text segment
static data segment
relocation information
symbol table (labels defined in the program)
debugging information (associates assembly
instructions with high-level language instructions)
Linker
Primary motivation: libraries and reusable
code
Stitches together code and data modules
symbolically
Still no absolute addresses
Linker figures out new addresses of labels
Relocation information is used to figure out
positions of labels in libraries
Linker patches (does not recompile) the binary
Resolves all internal and external
references, complains otherwise
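The patching step can be illustrated with a toy model. This Python sketch is not a real object-file format: the dictionary layout, the base address 0x400000, and the `link` function are assumptions made for illustration.

```python
def link(modules, base=0x400000):
    """Place modules one after another, then patch symbolic references."""
    addresses = {}
    next_addr = base
    for m in modules:
        # Symbol table: label -> byte offset inside this module
        for name, offset in m["symbols"].items():
            addresses[name] = next_addr + offset
        next_addr += 4 * len(m["text"])   # 4 bytes per instruction

    image = []
    for m in modules:
        text = list(m["text"])
        # Relocation info: which instruction refers to which label
        for index, name in m["relocs"]:
            if name not in addresses:
                raise KeyError(f"undefined reference to {name}")
            text[index] = text[index].replace(name, hex(addresses[name]))
        image.extend(text)
    return image, addresses

main_obj = {"text": ["jal helper", "jr $ra"],
            "symbols": {"main": 0}, "relocs": [(0, "helper")]}
lib_obj = {"text": ["addi $v0,$zero,1", "jr $ra"],
           "symbols": {"helper": 0}, "relocs": []}

image, addresses = link([main_obj, lib_obj])
```

Nothing is recompiled: only the instruction named by a relocation entry is rewritten, and an unresolved label raises an error.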
Address resolution in MIPS
Loader
Read executable, find size of text and data
segments
Create address space large enough to hold
text and data
Copy text and data into memory
Copy program parameters to stack
Initialize machine registers, stack pointer
Jump to start-up routine, which copies
parameters to registers and calls main
Dynamic Linking
Static linking: library is part of executable
Not good if library changes
May produce large executable although small
fraction of library is used
Dynamic linking
Link the library code at runtime, when we need it. Furthermore, link only the code we need, no less, no more
Concept of DLLs (dynamically linked libraries)
Lazy Linking
Lazy linking explained
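Program keeps a dummy routine and, in the data segment, a pointer to that dummy routine
To call a library routine: load the pointer, jump to the dummy
Dummy jumps to dynamic linker/loader code
Dynamic linker/loader locates the target library code, maps it into memory, and overwrites the pointer with the routine's real address
Later calls load the patched pointer and jump straight to the library code
Voila!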
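The pointer-patching idea can be mimicked in a few lines. This Python sketch is only an analogy for the mechanism described above: `LIBRARY`, `LazySlot`, and `lookups` are hypothetical names, and a dictionary stands in for the library on disk.

```python
LIBRARY = {"square": lambda x: x * x}   # stands in for library code
lookups = []                            # records each trip through the linker

class LazySlot:
    """A pointer in the data segment that starts out aimed at a dummy routine."""

    def __init__(self, name):
        self.name = name
        self.target = self._dummy

    def _dummy(self, *args):
        # Dummy routine: ask the "dynamic linker" for the real code,
        # overwrite the pointer, then complete the original call.
        lookups.append(self.name)
        self.target = LIBRARY[self.name]
        return self.target(*args)

    def __call__(self, *args):
        # Load pointer, jump to whatever it currently points at.
        return self.target(*args)

square = LazySlot("square")
first = square(3)    # goes through the dummy and patches the pointer
second = square(4)   # jumps directly to the library routine
```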
Execution in Java
Java-specific
Java uses an interpreter (JVM)
JVM is equivalent to a hardware simulator
Interpreter helps portability
Java runs everywhere
Interpreter harms performance
Java uses JIT compilers to remedy this
Compiler Primer
Compilers translate and optimize programs
Program representation changes
High-level language (the source, machine-independent, possibly with some additional information sprinkled in)
One or more intermediate compiler representations (IR)
High-level IR, close to source, mostly machine-independent, good for source-to-source transformations
Low-level IR, close to assembly, mostly machine-dependent, good for architecture-specific optimizations
Some transformations
Procedure inlining
Reduce procedure call and argument passing
overhead
Increase code size
Loop unrolling
Reduce loop branching overhead
Increase code size
Enables pipelining and other optimizations
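Loop unrolling can be shown by hand on a simple summation. This Python sketch only illustrates the shape of the transformation (a compiler would do it on the IR); the function names and the `branches` counter, which counts loop iterations as a stand-in for executed loop branches, are assumptions.

```python
def sum_rolled(values):
    total, i, branches = 0, 0, 0
    while i < len(values):
        total += values[i]
        i += 1
        branches += 1          # one loop test per element
    return total, branches

def sum_unrolled4(values):
    total, i, branches = 0, 0, 0
    n = len(values)
    while i + 4 <= n:          # body repeated 4 times: more code, fewer tests
        total += values[i] + values[i + 1] + values[i + 2] + values[i + 3]
        i += 4
        branches += 1
    while i < n:               # clean-up loop for the leftover elements
        total += values[i]
        i += 1
        branches += 1
    return total, branches

data = list(range(10))
rolled = sum_rolled(data)
unrolled = sum_unrolled4(data)
```

Both versions compute the same sum; the unrolled one is longer but passes through the loop test fewer times.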
Optimization example
x[i] = x[i] + 4
Address of x[i] is used twice
Naive interpretation (using virtual registers):
li R100, x
lw R101, i
sll R102,R101,2 #i offset from x base
add R103,R100,R102 #address of x[i]
lw R104,0(R103) #x[i] in R104
add R105,R104,4 # result in R105
li R106,x
lw R107,i
sll R108,R107,2 #i offset from x base, recomputed
add R109,R106,R108 #address of x[i], recomputed
sw R105,0(R109) #store result in x[i]
Optimization example
x[i] = x[i] + 4
Address of x[i] is used twice
Common subexpression elimination:
li R100, x
lw R101, i
sll R102,R101,2 #i offset from x base
add R103,R100,R102 #address of x[i]
lw R104,0(R103) #x[i] in R104
add R105,R104,4 # result in R105
sw R105,0(R103)
Other optimizations
Constant propagation
Copy propagation
Dead code elimination
Dead store elimination
Compiler primer
Compilers are conservative
Can be extremely hard to verify correctness of an optimization. If the compiler can't verify it, it won't apply it
What is easy to humans may be difficult for the
compiler in some cases
The opposite is true too (try staring at optimized
assembly code and figure it out)
Pointers, dynamic allocation, other high-
level language features make optimization
difficult
Although they do increase programmer's
productivity
Putting it all together
void swap (int v[], int k)
{
    int temp;
    temp = v[k];
    v[k] = v[k+1];
    v[k+1] = temp;
}
Putting it all together
swap:
#a0,a1 hold pointer to v[] and k
sll $t1,$a1,2 #multiply k*4 to find offset
add $t1,$a0,$t1 #find address of v[k]
lw $t0,0($t1) #load v[k], t0 is our temp
lw $t2,4($t1) #load v[k+1]
sw $t2,0($t1) #store old v[k+1] in v[k]
sw $t0,4($t1) #store temp (old v[k]) in v[k+1]
jr $ra
Putting it all together
void sort (int v[], int n)
{
    int i, j;
    for (i=0; i<n; i++) {
        for (j=i-1; j>=0 && v[j]>v[j+1]; j--) {
            swap(v,j);
        }
    }
}
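Before translating to assembly it helps to have a reference model to compare against. This is a Python transcription of swap and sort, assuming the outer loop runs i from 0 to n-1 and the inner loop runs j from i-1 down to 0 while v[j] > v[j+1]; it is a checking aid, not part of the MIPS solution.

```python
def swap(v, k):
    """Exchange v[k] and v[k+1], as in the C procedure."""
    temp = v[k]
    v[k] = v[k + 1]
    v[k + 1] = temp

def sort(v, n):
    """Sort the first n elements of v in place."""
    for i in range(n):
        j = i - 1
        # Inner loop: move v[i] left until its left neighbor is not larger
        while j >= 0 and v[j] > v[j + 1]:
            swap(v, j)
            j -= 1

data = [5, 2, 4, 1, 3]
sort(data, len(data))
print(data)  # prints [1, 2, 3, 4, 5]
```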
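Putting it all together
Deconstruct procedure
#for (i=0;i<n;i++)
move $s0,$zero #i = 0
for1tst:
slt $t0,$s0,$a1 #$t0 = 0 if i >= n
beq $t0,$zero,exit1 #exit if i >= n
#loop body...
addi $s0,$s0,1 #i = i + 1
j for1tst
exit1: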
Deconstruct procedure
#for (j=i-1;j>=0 && v[j] > v[j+1];j--)
addi $s1,$s0,-1 #initialize j = i - 1
for2tst:
slti $t0,$s1,0 #$t0 = 1 if j < 0
bne $t0,$zero,exit2 #exit if j < 0
sll $t1,$s1,2 #find j offset in v
add $t2,$t1,$a0 #find address v[j]
lw $t3,0($t2) #load v[j]
lw $t4,4($t2) #load v[j+1]
slt $t0,$t4,$t3 #$t0 = 1 if v[j+1] < v[j]
Putting it all together - Sort
Inner loop
addi $s1,$s0,-1 #initialize j = i - 1
for2tst:
slti $t0,$s1,0 #$t0 = 1 if j < 0
bne $t0,$zero,exit #exit if j < 0
sll $t1,$s1,2 #find j offset in v
add $t2,$t1,$a0 #find address v[j]
lw $t3,0($t2) #load v[j]
lw $t4,4($t2) #load v[j+1]
slt $t0,$t4,$t3 #if v[j+1]