5.4. MAL: MIPS Assembly Language

5.4.1. A Simple Example Program
#########################################################################
# Description:
# Simple example program
#
# Modification history:
# Date Name Modification
# 2010-12-16 Jason Bacon Begin
#########################################################################
#########################################################################
# System call constants
#########################################################################
SYS_PRINT_STRING = 4
#########################################################################
# Main program
#########################################################################
# Variables for main
.data
main_hello: .asciiz "Hello, world!\n"
# Main body
.text
main:
li $v0, SYS_PRINT_STRING
la $a0, main_hello
syscall
# Return to calling program
jr $ra
5.4.2. Memory Segments

In Von Neumann architectures, such as the MIPS, both the program (machine code) and data reside in the same memory while the program is running. In MAL, we must mark each part of the program as text (code) or data using the .text and .data directives.
# Main program
.data
# Variables for main
.text
# Main body
jr $ra
The .text segments can contain only instructions, while the .data segments contain only variable definitions.

5.4.3. MAL Instruction Format

An instruction in MAL source code has the following structure:
label: opcode/directive operand[, operand[, operand]] # comment
MAL, like most assembly languages, is line-oriented. With the exception of the label, all components of an instruction must be on the same line, and the end of the line marks the end of the instruction.

5.4.4. Comments

A comment is anything from a "#" to the end of the line.
# This is a comment
Block comments are simply formed from multiple line comments.
###############################################################
# This is a block comment
###############################################################
5.4.5. Labels

Note: The label portion of a statement must begin in column 1 and must end with a ':'. The ':' is not part of the label. It only serves to visually distinguish a new label definition from other program elements.

Each label represents a memory address in assembly language. It could be the address of data or the address of an instruction (i.e. labels can appear in both .text and .data sections). A label represents the address of the instruction or data element that immediately follows it, whether it follows on the same line or a subsequent line.

Labels defined in a .data section are like variable names in HLLs, and follow the same naming rules. They must begin with a letter or underscore, and can contain letters, underscores, and digits.

Caution: Variable names cannot be keywords such as MAL opcodes.

Labels defined in a .text section represent the address of an instruction, and are used as arguments by jump and branch instructions to "go to" that instruction.

5.4.6. Directives

See Appendix A for a full listing.

Directives do not represent machine instructions, hence they are not executed at run-time. They direct the assembler to do something while translating the program to machine language, such as allocate space for a variable and give it an initial value, which it will have when the program begins executing.

Directives in MAL can be distinguished from instructions by the fact that they begin with a '.'. Directives must be indented (they cannot begin in column 1).

Data allocation directives are somewhat like variable definitions in high-level languages, but are not quite as meaningful. They usually, but not necessarily, follow a newly defined label (variable) and must be followed by one or more initial values.
.data
age: .word 30 # 32-bit word initialized with decimal
gpa: .half 0x10 # 16-bit halfword initialized with hex
nl: .byte 012 # 8-bit byte initialized with octal
.align 2 # Aligns next element to multiple of 2^2
height: .word 70
avg: .float 3.65 # 32-bit floating point (renamed: gpa is already defined above)
# Lookup-table for factorials
fact: .word 1, 1, 2, 6, 24, 120
# Uninitialized space for a keyboard buffer
keyb: .space 1024 # 1 KB buffer
# Null-terminated string
hello: .asciiz "Hello, world!\n"
# Non-terminated string. Not very useful unless followed
# by a null-terminated string.
bug: .ascii "This string is a time bomb without a null byte"
bigmsg: .ascii "This string is part of a longer message that "
.asciiz "was too big to fit on a single line.\n"
.text
Table 5.4. Data Allocation Directives

Directive   Type
.word       32-bit integer
.half       16-bit integer
.byte       8-bit integer
.float      32-bit IEEE floating point
.double     64-bit IEEE floating point
.space      Uninitialized memory block
.ascii      ASCII string
.asciiz     Null-terminated ASCII string

These directives define the type of the value stored there, but only for the sake of converting the initial value(s) to the proper format. The data type is used only during processing of the initial value, and is ignored by the rest of the program.

Variables are added to the data segment in the order they appear in the assembly source. If the variable "age" above has address 4000, then gpa will be at 4004, and nl at 4006.

The two strings following the label bigmsg are contiguous in memory, and therefore can be treated as one string. This method works for any data type. If you want to define an array, and the initializer doesn't fit on one line, simply repeat the directive below without another label.
# Array spread across multiple lines
bigarray: .word 4, 3, 1, 7, 9, 3, 6, 7, 2, 6
.word 8, 2, 5, 6, 0, 1, 1, 5, 8, 2
Exercise: in the earlier data example, where age has address 4000, what will the address of height be?

Note: Variable definition directives normally go on the same line as the label, regardless of how much indentation is required. All variable definitions within a block should use the same indentation.
a_very_long_variable_name: .word 0
age: .word 0
See Chapter 7, Code Quality Standards for more information. A string defined with .ascii or .asciiz is an array of characters initialized using character literals and escape sequences such as '\n'. The .asciiz directive adds a null byte to the end of the string. For example, the following two directives are equivalent:
string: .asciiz "Hello!\n"
string: .byte 72,101,108,108,111,33,10,0
The .asciiz directive is purely a convenience, since the following are equivalent:
string: .asciiz "Hello!\n"
string: .ascii "Hello!\n\0"
The .align num directive aligns the next variable on an address which is a multiple of 2^num. Hence, .align 2 aligns the next variable on an address which is a multiple of 4, which is known as a word boundary.

In the MIPS architecture, variables that occupy multiple bytes should not cross a word boundary. Doing so causes an unaligned memory reference, which may be an error, or may slow down memory access, depending on the details of the access. Hence, .word variables should always start on a word boundary, and .half variables should start on an address that is a multiple of 2.

.double variables require 8 bytes, and therefore must cross a word boundary. However, they should always start on a word boundary so that they don't cross multiple word boundaries.

5.4.7. Constants

Numeric constants in MAL use the same syntax as C, C++, and Java.
0x100 Hexadecimal integer
0100 Octal integer
100 Decimal integer
1.0 Decimal float or double
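As a small sketch of these notations in use, the three li instructions below load the same value written in each integer base. The register choices are arbitrary, and the sketch assumes the assembler accepts all three notations as described above.

```
        .text
main:
        li      $t0, 0x40       # Hexadecimal 40 = decimal 64
        li      $t1, 0100       # Octal 100 = decimal 64
        li      $t2, 64         # Decimal 64
        jr      $ra             # Return to calling program
```

After these instructions run, $t0, $t1, and $t2 should all hold the same bit pattern, since the base only affects how the assembler reads the source text.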
Labels can also be used as numeric constants, in which case their value is the memory address they represent. Suppose again that the variable age resides at memory address 4000:
.data
age: .word 30 # 32-bit word initialized with decimal
gpa: .half 0x10 # 16-bit halfword initialized with hex
nl: .byte 012 # 8-bit byte initialized with octal
ptr: .word age # Same as .word 4000
The variable ptr initially contains the address of age. SPIM has the ability to define integer constants, which can be used as immediate operands:
ISO_LF = 10
li $v0, ISO_LF # Same as li $v0, 10
With very few exceptions, every constant used in a program should be given a name. This has two effects. First, it makes the program easier to read. Consider the following two statements, which are equivalent:
li $v0, ISO_LF
li $v0, 10
The name ISO_LF tells the reader the purpose of loading this value into $v0. The constant 10 leaves the reader wondering, unless they study the surrounding code and decipher what it's doing. A programmer will look at hundreds or thousands of such statements in a typical day, so the ability to understand them quickly and easily is paramount. Even 10 seconds examining code is too long. Make it obvious at a glance whenever possible.

Naming constants also makes it easy and safe to change them. Suppose we use the constant 5.25, which represents the state sales tax rate, in several places in the program. If a law is passed changing the tax rate, we will have to carefully go through the code and change all of them. We cannot simply change every occurrence of 5.25, because some of them might not represent the tax rate. We must carefully examine each one and decide whether to change it. This is a lot of work, and a great opportunity to introduce bugs into the code. If, instead, we define a constant called STATE_TAX and use this throughout the program instead, then our code is more readable, and only ONE change to the code is necessary if the tax rate changes.

5.4.8. Instructions

The complete MAL instruction set is shown in Appendix C and Appendix D.

Note: All instructions must be indented. A single tab (column 8) is typical for opcodes and directives. These are normally placed below labels to avoid excessive indentation of code, and leave room for comments to the right of the instruction.
a_long_label:
add $t0, $t5, $s2 # $t0 = $t5 + $s2
By convention, the first operand is the destination for MAL instructions. This makes the operands appear in the same order they would in a HLL assignment statement. A small number of instructions, such as the store instructions, violate this convention.

Integer Arithmetic and Logic Instructions

Below are some sample integer arithmetic instructions. Most of these instructions can only use CPU registers for operands (source and destination). A few instructions take one immediate value as a source operand. In any case, arithmetic instructions such as add, sub, mul, etc. assume that the register contains a binary integer (either unsigned or two's complement). Logic instructions such as and, or, xor, etc. treat the register contents as independent bits, so the binary encoding is irrelevant to the instruction.
add $t0, $t1, $t2 # $t0 = $t1 + $t2 Exception for signed overflow
addu $t0, $t1, $t2 # $t0 = $t1 + $t2 No exception
sub $s0, $s0, $t4 # $s0 = $s0 - $t4
mul $t0, $a0, $s3 # $t0 = $a0 * $s3
move $t0, $a0 # $t0 = $a0
addi $t0, $t2, 4 # $t0 = $t2 + 4
ori $t0, $t0, 0x000000ff # Set rightmost 8 bits to 1
andi $t0, $t0, 0xffff0000 # Clear rightmost 16 bits
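The following sketch strings a few of these instructions together to evaluate (6 + 4) * 3 - 1 and then mask off the low four bits of the result. The input values and register choices are illustrative assumptions only.

```
        .text
main:
        li      $a0, 6
        li      $a1, 4
        li      $a2, 3
        addu    $t0, $a0, $a1   # $t0 = 6 + 4 = 10 (no overflow exception)
        mul     $t0, $t0, $a2   # $t0 = 10 * 3 = 30 (low 32 bits only)
        addi    $v0, $t0, -1    # $v0 = 30 - 1 = 29
        andi    $v1, $v0, 0x0f  # $v1 = low 4 bits of 29 = 13
        jr      $ra             # Return to calling program
```

Note that every operand is either a register or an immediate value. None of these instructions touches memory.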
Integer Load and Store Instructions

Load and store instructions are the only instructions that can access memory. Their purpose is to move data between a memory location and a CPU register. They are not interchangeable with the move instruction. Load and store instructions take one register operand and one memory address or immediate operand.
lw $t0, label # Load 32-bit word at label to $t0
lb $t0, label # Load sign-extended byte at label to $t0
lbu $t0, label # Load 0-extended byte at label to $t0
lh $t0, label # Load sign-extended halfword (16 bits)
lhu $t0, label # Load 0-extended halfword (16 bits)
li $t0, value # Load 32-bit constant to $t0
# Pseudo-instruction for lui + ori
# Immediate values are 16 bits!
# Same syntax as C/C++/Java for bases.
la $t0, label # Load ADDRESS of label
sw $t0, label # Store $t0 at label. Stores have the destination last!
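Because arithmetic instructions work only on registers, updating a variable in memory takes a load, an operation, and a store. The sketch below shows that pattern. The variable name count and its initial value are hypothetical.

```
        .data
count:  .word   41
        .text
main:
        lw      $t0, count      # Load: memory -> register
        addi    $t0, $t0, 1     # Arithmetic operates on the register copy
        sw      $t0, count      # Store: register -> memory (count is now 42)
        jr      $ra             # Return to calling program
```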
Jump and Branch Instructions

Jump and branch instructions take an address within a .text segment as the target address. Conditional branch instructions also take two register operands to compare.
j label # Unconditional jump
beq $t0, $t4, label # Branch if $t0 == $t4
blt $s4, $a0, label # Branch if $s4 < $a0
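As an illustration of how these instructions combine into a loop, the sketch below sums the integers 1 through 10. The labels loop and done and the register assignments are assumptions made for this example.

```
        .text
main:
        li      $t0, 0          # sum = 0
        li      $t1, 1          # i = 1
        li      $t2, 10         # limit = 10
loop:
        blt     $t2, $t1, done  # Leave the loop when limit < i
        add     $t0, $t0, $t1   # sum = sum + i
        addi    $t1, $t1, 1     # i = i + 1
        j       loop            # Go back and test again
done:
        move    $v0, $t0        # $v0 = 55
        jr      $ra             # Return to calling program
```

The conditional branch at the top of the loop plays the role of a while condition in a HLL, and the unconditional jump at the bottom closes the loop.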
Floating Point Instructions

Caution: Britton's book lists mdc1 as the instruction to move a value from coprocessor 1, while the correct instruction in SPIM is mfc1. It also shows both single and double store instructions as s.d, whereas the single should be s.s.

Floating point operations are carried out by a separate set of instructions and operate on a separate register file, all of which are part of the floating point coprocessor (an extension to the base CPU). There are single precision instructions, which end in ".s", and double precision instructions which end in ".d".

The MIPS floating point coprocessor has 32 32-bit registers, called $f0 through $f31. Double precision instructions use pairs of adjacent registers such as $f0 and $f1, $f2 and $f3, etc. Therefore, the arguments to double precision instructions must be even-numbered registers. The odd-numbered register that follows is assumed to be the other half of the operand.
.data
x: .double 1.0
y: .float 2.5
.text
l.d $f0, x # $f0,$f1 ← x
l.d $f1, x # Error! Must use even registers with double
l.s $f1, y # $f1 ← y
add.s $f0, $f1, $f7 # $f0 ← $f1 + $f7
add.d $f0, $f2, $f6 # $f0,$f1 ← $f2,$f3 + $f6,$f7
c.lt.d $f0, $f4 # if $f0,$f1 < $f4,$f5 then
# condition flag ← true
# else
# condition flag ← false
bc1t label # Branch if condition flag is true
bc1f label # Branch if condition flag is false
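The compare and branch instructions above can be combined as in the following sketch, which leaves the larger of two double precision values in $f0,$f1. The variable names val_a and val_b and their values are hypothetical, and the sketch assumes SPIM's default setting in which branches are not delayed.

```
        .data
val_a:  .double 1.5
val_b:  .double 2.25
        .text
main:
        l.d     $f2, val_a      # $f2,$f3 = val_a
        l.d     $f4, val_b      # $f4,$f5 = val_b
        mov.d   $f0, $f2        # Assume val_a is the larger value
        c.lt.d  $f2, $f4        # condition flag = (val_a < val_b)
        bc1f    done            # If val_a >= val_b, keep val_a
        mov.d   $f0, $f4        # Otherwise the larger value is val_b
done:
        jr      $ra             # $f0,$f1 now holds 2.25
```

All double precision operands use even-numbered registers, as required.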
Data type conversions involve two kinds of instructions:

Conversion instructions change the binary format of the data within the floating point coprocessor. Conversion instructions support converting between word (32-bit two's complement), single (32-bit IEEE floating point), and double (64-bit IEEE floating point). Hence, a floating point register, at any given moment, may contain an integer, a single precision floating point value, or half of a double precision floating point value. It is the programmer's responsibility to keep track!

Transfer instructions move data between the floating point coprocessor registers ($f0, $f1, ...) and either CPU registers ($t0, $s3, $a1, ...) or memory.
# Convert integer to double
mtc1 $t0, $f0 # $f0 ← $t0 (no format change!)
# $f0 now contains an integer!
cvt.d.w $f0, $f0 # $f0,$f1 ← (double)$f0
# Another way to convert integer to double
lwc1 $f5, intvar # Load integer into $f5 (no format change!)
cvt.d.w $f0, $f5 # $f0,$f1 ← (double)$f5
# Another way to convert integer to double
# Note that load and store instructions are oblivious to
# the binary data format (they only need to know how many bits
# to transfer). Hence, l.s (load single) does the same
# thing as lwc1 (load word to coprocessor 1). The following
# will work, but is misleading. Using lwc1 is better
# self-documentation.
l.s $f5, intvar # Load integer into $f5 (no format change!)
cvt.d.w $f0, $f5 # $f0,$f1 ← (double)$f5
# Convert double to integer
cvt.w.d $f3, $f4 # $f3 ← (int)$f4,$f5
# $f3 now contains an integer!
mfc1 $t0, $f3 # $t0 ← $f3 (no format change!)
# Another way to convert double to integer
cvt.w.d $f3, $f4 # $f3 ← (int)$f4,$f5
# $f3 now contains an integer!
swc1 $f3, intvar # Store integer from $f3 to mem
# Convert single precision to double precision
cvt.d.s $f0, $f3 # $f0,$f1 ← (double)$f3
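Putting transfer and conversion instructions together, the sketch below converts an integer to single precision, scales it by 0.5, and converts the result back to an integer. The variable names count and scale are hypothetical, and the values were chosen so that the result is exact regardless of rounding behavior.

```
        .data
count:  .word   8
scale:  .float  0.5
        .text
main:
        lw      $t0, count      # $t0 = 8 (integer)
        mtc1    $t0, $f4        # $f4 = integer bits (no format change!)
        cvt.s.w $f4, $f4        # $f4 = 8.0 (single precision)
        l.s     $f6, scale      # $f6 = 0.5
        mul.s   $f0, $f4, $f6   # $f0 = 8.0 * 0.5 = 4.0
        cvt.w.s $f8, $f0        # $f8 = 4 (integer format)
        mfc1    $v0, $f8        # $v0 = 4 (no format change!)
        jr      $ra             # Return to calling program
```

At each step the programmer, not the hardware, keeps track of which format a given floating point register currently holds.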
Data Types

Assembly language is typeless. This means data type is determined by the instruction, not the variable. The type (.word, .half, .byte) used to define the variable only tells the assembler how much memory to allocate, and to what binary format to convert the initial values in the directive. It has no impact on any other part of the program.

A machine instruction knows only where its operands are located, as encoded in the instruction code, and it assumes their binary data format. For example, an "add" instruction assumes all operands are binary integers in either unsigned or two's complement format, and it is the programmer's responsibility to ensure that integer data is stored in them. Likewise, an "add.s" instruction assumes that operands are 32-bit floating point values.
.data
# Allocate 32 bits and store 3.5 there in floating point format.
gpa: .float 3.5
.text
lw $a0, gpa # Copies 32 bits, format is irrelevant
addi $a0, $a0, 1 # Treats $a0 as an integer!
The value 3.5 is stored in IEEE floating point format, which contains a sign bit, an 8-bit bias-127 exponent, and a 23-bit fractional binary mantissa. The IEEE binary encoding of 3.5 looks like this:
01000000011000000000000000000000
Naturally, taking this 32-bit package and interpreting it as an integer is going to produce garbage. In fact, the decimal value when viewed as an integer is 1080033280. Adding one to it yields 01000000011000000000000000000001, which is 1080033281 if viewed as an integer, or 3.500000238 if viewed as IEEE floating point.

Pseudo-instructions

Some MAL instructions do not correspond to real machine instructions. This is one of the minor benefits of assembly language over machine language. These instructions are known as pseudo instructions or macro instructions.
move $t0, $t1 # addu $t0, $zero, $t1
li $t0, 50 # ori $t0, $zero, 50
la $t0, label # lui $at, upper 16 bits of address
# ori $t0, $at, lower 16 bits of address
mul $s0, $a0, $t4 # mult $a0, $t4
# mflo $s0
# High word is discarded!
The mul and div instructions in MAL are pseudo-instructions with limited capability. The product of two 32-bit values can require up to 64 bits, so a mul instruction cannot always capture the complete result. The MIPS uses special registers for multiplication and division.
mult $a1, $s1 # Stores results in "high" and "low" registers
mfhi $v0 # Move high 32 bits to $v0
mflo $v1 # Move low 32 bits to $v1
div $a1, $s1 # Stores quotient in "low" and remainder in "high"
mflo $v0 # Quotient
mfhi $v1 # Remainder
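To see why the high register matters, consider the sketch below. It multiplies 0x10000 by itself, giving 2^32, which does not fit in 32 bits, and then divides 17 by 5. The operand values and register choices are assumptions made for illustration.

```
        .text
main:
        li      $t2, 0x10000
        mult    $t2, $t2        # high:low = 0x10000 * 0x10000 = 2^32
        mflo    $t0             # $t0 = 0 (low 32 bits)
        mfhi    $t1             # $t1 = 1 (high 32 bits)
        li      $t3, 17
        li      $t4, 5
        div     $t3, $t4        # low = 17 / 5, high = 17 % 5
        mflo    $v0             # $v0 = 3 (quotient)
        mfhi    $v1             # $v1 = 2 (remainder)
        jr      $ra             # Return to calling program
```

A mul pseudo-instruction with the same operands would deliver only the low word, 0, and silently lose the rest of the product.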
5.4.9. Input/Output: System Calls

Input and output in SPIM are performed by system calls, which are calls to operating system kernel subprograms.

To initiate a system call, we load the system call function code into $v0. If the function requires arguments, they go into the argument registers, starting with $a0. The system call function codes are as follows:

Table 5.5. Syscall Codes

Function              Code in $v0   Argument or Return Value
PRINT_INT             1             $a0 = value
PRINT_FLOAT           2             $f12 = value
PRINT_DOUBLE          3             $f12 = value
PRINT_STRING          4             $a0 = address of string
READ_INT              5             Result placed in $v0
READ_FLOAT            6             Result placed in $f0
READ_DOUBLE           7             Result placed in $f0
READ_STRING           8             $a0 = address, $a1 = maximum length
SBRK (Mem allocate)   9             $a0 = number of bytes
EXIT                  10            None
PRINT_CHAR            11            $a0 low byte = character
READ_CHAR             12            Character returned in low byte of $v0
OPEN_FILE             13            $a0 = address of filename, $a1 = flags, $a2 = mode.
                                    File descriptor returned in $v0 (negative if error occurred).
READ                  14            $a0 = file descriptor, $a1 = address of buffer, $a2 = buffer length.
                                    Number of characters actually read returned in $v0.
WRITE                 15            $a0 = file descriptor, $a1 = address of buffer, $a2 = number of bytes to write.
                                    Number of bytes actually written returned in $v0.
CLOSE                 16            $a0 = file descriptor
EXIT2                 17            $a0 = exit code

Note: The system call facility in SPIM is poorly implemented. The calls do not provide any error indication for bad input, so there is no way of knowing whether a user entered valid data. In addition, many of the system calls should not be system calls to begin with, but would be better handled as user-level subprograms.

Caution: Note that the syscall instruction calls a subprogram, and subprograms are not obligated to restore the values of temporary registers. Hence, you cannot assume that the contents of temporary registers will be the same following a syscall.

5.4.10. A Complete Example Program
#########################################################################
# Description:
#
# Modification history:
# Date Name Modification
# 2010-12-16 Jason Bacon Begin
#########################################################################
#########################################################################
# System call constants
#########################################################################
SYS_PRINT_INT = 1
SYS_PRINT_FLOAT = 2
SYS_PRINT_DOUBLE = 3
SYS_PRINT_STRING = 4
SYS_READ_INT = 5
SYS_READ_FLOAT = 6
SYS_READ_DOUBLE = 7
SYS_READ_STRING = 8
SYS_SBRK = 9
SYS_EXIT = 10
SYS_PRINT_CHAR = 11
SYS_READ_CHAR = 12
#########################################################################
# Main program
#########################################################################
# Variables for main
.data
.align 2 # Put next label on a word boundary
length: .word 0
width: .word 0
area: .word 0
length_prompt: .asciiz "Please enter the length of the rectangle: "
width_prompt: .asciiz "Please enter the width of the rectangle: "
area_msg: .asciiz "The area of the rectangle is "
period_nl: .asciiz ".\n"
# Main body
.text
main:
# Input length
li $v0, SYS_PRINT_STRING
la $a0, length_prompt
syscall
li $v0, SYS_READ_INT
syscall
sw $v0, length
# Input width
li $v0, SYS_PRINT_STRING
la $a0, width_prompt
syscall
li $v0, SYS_READ_INT
syscall
sw $v0, width
# Compute area
lw $t0, length
lw $t1, width
mul $t0, $t0, $t1
sw $t0, area
# Print area
li $v0, SYS_PRINT_STRING
la $a0, area_msg
syscall
li $v0, SYS_PRINT_INT
lw $a0, area
syscall
li $v0, SYS_PRINT_STRING
la $a0, period_nl
syscall
# Return to calling program
jr $ra
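The program above uses only the integer and string printing calls and the integer read call. As a further sketch of the same calling pattern, the program below uses READ_STRING from Table 5.5 to read a line of text into a buffer and echo it back. The names NAME_MAX, name_prompt, greeting, and name_buf are hypothetical, as is the 64-byte buffer size.

```
SYS_PRINT_STRING = 4
SYS_READ_STRING = 8
NAME_MAX = 64

        .data
name_prompt:    .asciiz "Please enter your name: "
greeting:       .asciiz "Hello, "
name_buf:       .space  64      # Must be at least NAME_MAX bytes

        .text
main:
        # Prompt for the name
        li      $v0, SYS_PRINT_STRING
        la      $a0, name_prompt
        syscall

        # Read up to NAME_MAX bytes into name_buf
        li      $v0, SYS_READ_STRING
        la      $a0, name_buf
        li      $a1, NAME_MAX
        syscall

        # Print the greeting followed by the name
        li      $v0, SYS_PRINT_STRING
        la      $a0, greeting
        syscall
        li      $v0, SYS_PRINT_STRING
        la      $a0, name_buf
        syscall

        # Return to calling program
        jr      $ra
```

Both $v0 and $a0 are reloaded before every syscall rather than assumed to survive the previous one, in keeping with the caution in Section 5.4.9.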
5.4.11. Addressing Modes

An addressing mode is simply a way of specifying a memory address. MAL offers several addressing modes for use in load and store instructions.

The effective address is the actual address of the data. Understanding addressing modes is a matter of understanding how effective addresses are calculated.

Table 5.6. Addressing Modes

Mode                        Example               Effective address
Memory Direct               lw $t0, age           Address represented by label
Register Indirect           lw $t0, ($s3)         Contents of register in ()
Immediate Offset            lw $t0, 4($s3)        Contents of register in () + offset
Symbol Offset               lw $t0, list($s3)     Contents of register in () + address of label
Symbol + Immediate          lw $t0, list+4        Address of label + offset
Symbol + Immediate Offset   lw $t0, list+4($s3)   Address of label + offset + contents of register in ()

Immediate addressing can also be considered an addressing mode, but it is special in that it doesn't specify an address, but instead embeds the operand in the instruction code itself.

Examples:
Assume $t0 contains 1000
Address of "list" is 2000
lw $a0, ($t0) $a0 ← contents of address 1000
lw $a0, 4($t0) $a0 ← contents of address 1004
lw $a0, list $a0 ← contents of address 2000
la $a0, list $a0 ← 2000
lw $a0, list+4 $a0 ← contents of address 2004
lw $a0, list($t0) $a0 ← contents of address 3000
lw $a0, list+4($t0) $a0 ← contents of address 3004
li $a0, 5 $a0 ← 5
Interestingly, only "base address plus displacement" (a.k.a. offset) mode is actually implemented in the MIPS hardware. The assembler (or simulator) translates all other modes into offset mode. For example, since the assembler knows the address of every label, it can easily perform additions such as "list+4". Direct addressing such as "age" can be represented as age($zero). This illustrates some of the advantages of assembly language over machine language, besides just being mnemonic.
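As an illustration of the Symbol Offset mode from Table 5.6, the sketch below sums a four-element array by stepping a byte offset through it. The array contents, labels, and register choices are assumptions made for this example.

```
        .data
list:   .word   10, 20, 30, 40
        .text
main:
        li      $t0, 0          # Byte offset into list
        li      $t1, 16         # 4 elements * 4 bytes each
        li      $v0, 0          # sum = 0
loop:
        beq     $t0, $t1, done  # Stop once the offset reaches the end
        lw      $t2, list($t0)  # Effective address = address of list + $t0
        add     $v0, $v0, $t2   # sum = sum + element
        addi    $t0, $t0, 4     # Advance to the next word
        j       loop
done:
        jr      $ra             # $v0 = 100
```

The offset advances by 4 rather than 1 because each .word element occupies four bytes.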