Lab 8
The Guest Lecture in Week Twelve will introduce the concept of a virtual machine, of which the Java Virtual Machine (JVM) is one of the most prominent examples. What challenges do developers face when trying to improve the performance of a virtual machine? What is a garbage collector, and what role does it play in the operation of the JVM? What is µVM (developed in the Research School of Computer Sciences, ANU)? These and other questions will be the subject of the lecture. Learning the basics of JVM code execution will help you prepare for these discussions.
To get an idea of how a Java Virtual Machine works, one has to understand the structure of the bytecode classes which are generated by the compiler, then loaded and executed by a JVM. Creating a new JVM language or developing a bytecode processing tool (like JRebel) requires good knowledge of how a JVM operates and what its fundamental constraints are.
Note: the exercises in this lab are based on talks and other presentations given in recent years by developers from the company "Zero Turnaround", the maker of the well-known productivity tool JRebel. Particular credit goes to one of their leading engineers, Anton Arhipov. Anton himself acknowledges the influence of an article written for the IBM developerWorks online magazine in 2001 by Peter Haggar, the author of the undeservedly forgotten book Practical Java.
Note: the following is a rather long set of exploratory exercises which may appear a bit dry and mechanical. The reward will be a better grasp of important concepts which are to be discussed at the Guest Lecture. Additional references for studying bytecode are the Wikipedia article and the ("old but gold") 2001 article by Peter Haggar in IBM's online journal developerWorks.
Type in (or copy-and-paste) the following simple Java code for the class Foo.java:
public class Foo {
    private String bar;

    public String getBar() {
        return bar;
    }

    public void setBar(String bar) {
        this.bar = bar;
    }
}
Compile it, and then use the JDK disassembler command javap to examine the generated bytecode:
% javap -c Foo
Compiled from "Foo.java"
public class Foo {
  public Foo();
    Code:
       0: aload_0
       1: invokespecial #1    // Method java/lang/Object."<init>":()V
       4: return

  public java.lang.String getBar();
    Code:
       0: aload_0
       1: getfield      #2    // Field bar:Ljava/lang/String;
       4: areturn

  public void setBar(java.lang.String);
    Code:
       0: aload_0
       1: aload_1
       2: putfield      #2    // Field bar:Ljava/lang/String;
       5: return
}
Observe the correspondence between the source code and the bytecode. Notice how the code representing the constructor and the two methods is described in the bytecode: it consists of a number of specially named operations, the Java bytecode instructions.
Ponder the following questions:
Why is the number of aload commands (they load a reference onto the stack from a local variable) one bigger than the number of arguments the corresponding constructor or method takes?
Why aload? (To answer, experiment: edit the source file and change the type of the bar field to int or another primitive type, together with the signatures of the setBar and getBar methods; recompile, display the bytecode again, and observe the difference.)
In the bytecode listing above, the instructions are represented by mnemonics (mnemonic names); when they are written as hexadecimal (or binary) numbers they are called opcodes. You can see the opcodes if you display the .class file in a hex editor, or simply use the hexdump command (the first column isn't the bytecode, it's the byte counter, in hex, of course):
% hexdump Foo.class
0000000 ca fe ba be 00 00 00 34 00 15 0a 00 04 00 11 09
0000010 00 03 00 12 07 00 13 07 00 14 01 00 03 62 61 72
0000020 01 00 12 4c 6a 61 76 61 2f 6c 61 6e 67 2f 53 74
0000030 72 69 6e 67 3b 01 00 06 3c 69 6e 69 74 3e 01 00
0000040 03 28 29 56 01 00 04 43 6f 64 65 01 00 0f 4c 69
0000050 6e 65 4e 75 6d 62 65 72 54 61 62 6c 65 01 00 06
0000060 67 65 74 42 61 72 01 00 14 28 29 4c 6a 61 76 61
0000070 2f 6c 61 6e 67 2f 53 74 72 69 6e 67 3b 01 00 06
0000080 73 65 74 42 61 72 01 00 15 28 4c 6a 61 76 61 2f
0000090 6c 61 6e 67 2f 53 74 72 69 6e 67 3b 29 56 01 00
00000a0 0a 53 6f 75 72 63 65 46 69 6c 65 01 00 08 46 6f
00000b0 6f 2e 6a 61 76 61 0c 00 07 00 08 0c 00 05 00 06
00000c0 01 00 03 46 6f 6f 01 00 10 6a 61 76 61 2f 6c 61
00000d0 6e 67 2f 4f 62 6a 65 63 74 00 21 00 03 00 04 00
00000e0 00 00 01 00 02 00 05 00 06 00 00 00 03 00 01 00
00000f0 07 00 08 00 01 00 09 00 00 00 1d 00 01 00 01 00
0000100 00 00 05 2a b7 00 01 b1 00 00 00 01 00 0a 00 00
0000110 00 06 00 01 00 00 00 01 00 01 00 0b 00 0c 00 01
0000120 00 09 00 00 00 1d 00 01 00 01 00 00 00 05 2a b4
0000130 00 02 b0 00 00 00 01 00 0a 00 00 00 06 00 01 00
0000140 00 00 05 00 01 00 0d 00 0e 00 01 00 09 00 00 00
0000150 22 00 02 00 02 00 00 00 06 2a 2b b5 00 02 b1 00
0000160 00 00 01 00 0a 00 00 00 0a 00 02 00 00 00 09 00
0000170 05 00 0a 00 01 00 0f 00 00 00 02 00 10
where a pair of hex digits (a byte) is translatable to an opcode. Consult the nomenclature of the opcodes on the Wikipedia page and find all places in the hexdump which represent a load operation.
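Not every byte in the dump is an instruction: the file begins with a fixed header, and most of the remainder is constant pool data and tables. As a minimal sketch (the class name ClassHeader and the method readHeader are made up for this illustration), the first eight bytes of the hexdump above can be decoded as the magic number followed by the minor and major version numbers:

```java
import java.nio.ByteBuffer;

final class ClassHeader {
    /** Returns {magic, minor, major} decoded from the first eight bytes of a class file. */
    static long[] readHeader(byte[] classBytes) {
        ByteBuffer buf = ByteBuffer.wrap(classBytes);       // big-endian by default, like class files
        long magic = Integer.toUnsignedLong(buf.getInt());  // u4 magic number
        int minor = Short.toUnsignedInt(buf.getShort());    // u2 minor version
        int major = Short.toUnsignedInt(buf.getShort());    // u2 major version
        return new long[] {magic, minor, major};
    }

    public static void main(String[] args) {
        // First eight bytes of the hexdump: ca fe ba be 00 00 00 34
        byte[] header = {(byte) 0xca, (byte) 0xfe, (byte) 0xba, (byte) 0xbe, 0x00, 0x00, 0x00, 0x34};
        long[] h = readHeader(header);
        // prints: magic=cafebabe minor=0 major=52
        System.out.printf("magic=%x minor=%d major=%d%n", h[0], h[1], h[2]);
    }
}
```

Major version 52 (hex 34) corresponds to a class compiled for Java 8. To try it on a real file, the bytes could be read with java.nio.file.Files.readAllBytes instead of being typed in.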
Three instructions above take operands with the hash prefix (#1, #2). They refer to values from the constant pool of the class. To display this information, run the javap command with additional options such as -verbose (to see all the details):
% javap -verbose -s Foo
....
(the output is not included because it's a bit long, but you do get it!)
The meaning of the #-prefixed operands becomes clear:
#2 = Fieldref #3.#18 // Foo.bar:Ljava/lang/String;
which refers to two other entries:
#3 = Class #19 // Foo
#18 = NameAndType #5:#6 // bar:Ljava/lang/String;
which, in turn, refer to others and so on.
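The chain of references can be followed mechanically. The sketch below is a simplified model, not the real class-file parser: the Entry class and the resolve method are invented for illustration, and only the entries needed for #2 are included. The Utf8 entries #5, #6 and #19 are read off the hexdump above.

```java
import java.util.HashMap;
import java.util.Map;

final class ConstantPoolDemo {
    /** Simplified constant pool entry: a tag plus either text (Utf8) or indices of other entries. */
    static final class Entry {
        final String tag;
        final String text;
        final int[] refs;

        Entry(String tag, String text, int... refs) {
            this.tag = tag;
            this.text = text;
            this.refs = refs;
        }
    }

    /** Follows references until only Utf8 text remains, mimicking the comments javap prints. */
    static String resolve(Map<Integer, Entry> pool, int index) {
        Entry e = pool.get(index);
        switch (e.tag) {
            case "Utf8":        return e.text;
            case "Class":       return resolve(pool, e.refs[0]);
            case "NameAndType": return resolve(pool, e.refs[0]) + ":" + resolve(pool, e.refs[1]);
            case "Fieldref":    return resolve(pool, e.refs[0]) + "." + resolve(pool, e.refs[1]);
            default: throw new IllegalArgumentException("tag not covered by this sketch: " + e.tag);
        }
    }

    /** The subset of Foo's constant pool that entry #2 depends on. */
    static Map<Integer, Entry> fooPool() {
        Map<Integer, Entry> pool = new HashMap<>();
        pool.put(2, new Entry("Fieldref", null, 3, 18));
        pool.put(3, new Entry("Class", null, 19));
        pool.put(5, new Entry("Utf8", "bar"));
        pool.put(6, new Entry("Utf8", "Ljava/lang/String;"));
        pool.put(18, new Entry("NameAndType", null, 5, 6));
        pool.put(19, new Entry("Utf8", "Foo"));
        return pool;
    }

    public static void main(String[] args) {
        // prints: #2 = Foo.bar:Ljava/lang/String;
        System.out.println("#2 = " + resolve(fooPool(), 2));
    }
}
```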
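Finally, observe how every instruction is marked with a number (for example, 0: aload_0 inside the Foo block). These numbers are the positions (byte offsets) of the instructions within the method's code array, which the JVM executes inside a frame.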
The JVM is a stack-based machine: it is simple, efficient and easy to program, but it has certain fundamental limitations which affect what languages compilable to Java bytecode can and cannot do (e.g., the limit on the number of recursive calls is directly related to the stack architecture of the JVM, as is the inability to perform the so-called tail-call optimisation of recursion). Each thread has a JVM stack which stores frames. Each time a method is invoked, a new stack frame is created. The frame consists of an operand stack, an array of local variables, and a reference to the runtime constant pool of the class of the current method.
The size of the array of local variables is determined at compile time; it depends on the number (and size) of the method's local variables and formal parameters. The operand stack is (as the name suggests) a Last-In-First-Out (LIFO) data structure, controlled by the operations push and pop. The size of the operand stack is also fixed at compile time. Some instructions push values onto the operand stack; others take operands from the stack, manipulate them, and push the result. The operand stack is also used to receive return values from methods.
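Because every invocation adds a frame to the thread's JVM stack, unbounded recursion eventually exhausts it. The following sketch (the class StackDepthDemo is made up for this illustration) counts how many frames are created before the JVM throws StackOverflowError. The number is not fixed: it depends on the JVM, the configured thread stack size and the size of each frame.

```java
final class StackDepthDemo {
    private static int depth;

    /** Recurses without a base case; each call creates one more frame. */
    private static void recurse() {
        depth++;
        recurse();
    }

    /** Returns the number of nested calls reached before the stack overflowed. */
    static int measureDepth() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // The frames have been unwound by now; depth records how deep we got.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("Frames created before overflow: " + measureDepth());
    }
}
```

Running it with a different thread stack size (the -Xss option of the java launcher) should change the reported depth.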
public String getBar() {
    return bar;
}

public java.lang.String getBar();
  Code:
   0: aload_0
   1: getfield #2; //Field bar:Ljava/lang/String;
   4: areturn
There are three instructions here:
aload_0 pushes the value at index 0 of the local variable table onto the operand stack; for constructors and instance methods, index 0 always holds the this reference
getfield fetches a field from an object
areturn (you've guessed it) returns a reference from a method
One can also see the bytecode array representing a particular method. In the hexdump above, the getBar() code is found on the lines:
....
0000120 00 09 00 00 00 1d 00 01 00 01 00 00 00 05 2a b4
0000130 00 02 b0 00 00 00 01 00 0a 00 00 00 06 00 01 00
....
Consult the opcode table again to make sense of this. The bytecode has three instructions, but the byte array has five elements (2a b4 00 02 b0), because getfield (b4) is followed by a two-byte operand (00 02), the constant pool index #2. Those two bytes occupy positions 2 and 3 in the array, hence the array size is 5 and the areturn instruction is shifted to position 4.
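The same decoding can be done in a few lines of code. This is a minimal sketch of a disassembler (the class TinyDisassembler is invented for this illustration) which covers only the seven opcodes that appear in Foo; a real tool such as javap handles the full instruction set.

```java
import java.util.ArrayList;
import java.util.List;

final class TinyDisassembler {
    /** Decodes a method's code array; only the opcodes used by Foo are covered. */
    static List<String> disassemble(byte[] code) {
        List<String> lines = new ArrayList<>();
        int pc = 0;
        while (pc < code.length) {
            int opcode = code[pc] & 0xFF;
            String name;
            boolean hasIndex = false; // true when a two-byte constant pool index follows
            switch (opcode) {
                case 0x2a: name = "aload_0"; break;
                case 0x2b: name = "aload_1"; break;
                case 0xb0: name = "areturn"; break;
                case 0xb1: name = "return"; break;
                case 0xb4: name = "getfield"; hasIndex = true; break;
                case 0xb5: name = "putfield"; hasIndex = true; break;
                case 0xb7: name = "invokespecial"; hasIndex = true; break;
                default:
                    throw new IllegalArgumentException("opcode not covered: 0x" + Integer.toHexString(opcode));
            }
            if (hasIndex) {
                // The operand is a big-endian 16-bit index into the constant pool.
                int index = ((code[pc + 1] & 0xFF) << 8) | (code[pc + 2] & 0xFF);
                lines.add(pc + ": " + name + " #" + index);
                pc += 3;
            } else {
                lines.add(pc + ": " + name);
                pc += 1;
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // The five bytes of getBar() taken from the hexdump: 2a b4 00 02 b0
        byte[] getBar = {0x2a, (byte) 0xb4, 0x00, 0x02, (byte) 0xb0};
        // prints "0: aload_0", "1: getfield #2" and "4: areturn" on separate lines
        disassemble(getBar).forEach(System.out::println);
    }
}
```

The offsets it prints (0, 1, 4) match the numbers in the javap listing, which shows that those numbers are positions in the byte array.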
To observe handling of local variables, consider another example:
public class Example {
    public int plus(int a) {
        int b = 1;
        return a + b;
    }
}
There are two local variables, the method parameter int a and the local int b (Locals=3 in the listing also counts this in slot 0). The bytecode looks as follows:
public int plus(int);
  Code:
   Stack=2, Locals=3, Args_size=2
   0: iconst_1
   1: istore_2
   2: iload_1
   3: iload_2
   4: iadd
   5: ireturn
  LineNumberTable:
   line 5: 0
   line 6: 2
  LocalVariableTable:
   Start  Length  Slot  Name  Signature
   0      6       0     this  LExample;
   0      6       1     a     I
   2      4       2     b     I
As one can see, the method loads the constant 1 with iconst_1 and stores it in local variable number 2 with istore_2. The local variable table shows that slot number 2 is occupied by the variable named b. After that, iload_1 pushes the value of a onto the stack and iload_2 pushes the value of b; iadd pops the two operands from the stack, adds them, and pushes the sum back, from where ireturn returns it from the method.
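To see the interplay of the local variable array and the operand stack, the six instructions can be replayed by hand. The following is a toy interpreter, written only for this listing (the class PlusFrame and its run method are illustrative, not part of any JVM API); slot 0, which holds this in the real frame, is left unused.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class PlusFrame {
    /** The instruction sequence of plus(int) as shown by javap. */
    static final String[] PLUS = {"iconst_1", "istore_2", "iload_1", "iload_2", "iadd", "ireturn"};

    /** Executes the instructions in a simulated frame and returns the method result. */
    static int run(String[] code, int a) {
        int[] locals = new int[3];                  // Locals=3: slot 0 is `this` (unused here), 1 is a, 2 is b
        locals[1] = a;                              // the argument arrives in slot 1
        Deque<Integer> stack = new ArrayDeque<>();  // operand stack, never deeper than Stack=2
        for (String insn : code) {
            switch (insn) {
                case "iconst_1": stack.push(1); break;
                case "istore_2": locals[2] = stack.pop(); break;
                case "iload_1":  stack.push(locals[1]); break;
                case "iload_2":  stack.push(locals[2]); break;
                case "iadd":     stack.push(stack.pop() + stack.pop()); break;
                case "ireturn":  return stack.pop();
                default: throw new IllegalArgumentException("instruction not covered: " + insn);
            }
        }
        throw new IllegalStateException("fell off the end of the method");
    }

    public static void main(String[] args) {
        System.out.println(run(PLUS, 41)); // prints 42
    }
}
```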
Exception handling in bytecode has some peculiarities. A simple piece of code with try-catch-finally blocks:
public class ExceptionExample {
    public void foo() {
        try {
            tryMethod();
        } catch (Exception e) {
            catchMethod();
        } finally {
            finallyMethod();
        }
    }

    private void tryMethod() throws Exception { ; }
    private void catchMethod() { ; }
    private void finallyMethod() { ; }
}
will generate the following bytecode (only foo() is shown):
public void foo();
  Code:
   0: aload_0
   1: invokespecial #2; //Method tryMethod:()V
   4: aload_0
   5: invokespecial #3; //Method finallyMethod:()V
   8: goto 30
   11: astore_1
   12: aload_0
   13: invokespecial #5; //Method catchMethod:()V
   16: aload_0
   17: invokespecial #3; //Method finallyMethod:()V
   20: goto 30
   23: astore_2
   24: aload_0
   25: invokespecial #3; //Method finallyMethod:()V
   28: aload_2
   29: athrow
   30: return
  Exception table:
   from  to  target  type
   0     4   11      Class java/lang/Exception
   0     4   23      any
   11    16  23      any
   23    24  23      any
We see that the compiler has generated code for every scenario possible within the try-catch-finally execution: the call to finallyMethod() is emitted three times. The try block is compiled just as it would be if the try were not present, and is merged with the finally block:
0: aload_0
1: invokespecial #2; //Method tryMethod:()V
4: aload_0
5: invokespecial #3; //Method finallyMethod:()V
If the block executes successfully, the goto instruction will lead the execution to position 30, which is the return opcode.
If tryMethod throws an instance of Exception, the first (innermost) applicable exception handler in the exception table is chosen to handle the exception. From the exception table one can see that the position at which exception handling proceeds is 11:
0 4 11 Class java/lang/Exception
which leads to the execution of catchMethod() and finallyMethod():
11: astore_1
12: aload_0
13: invokespecial #5; //Method catchMethod:()V
16: aload_0
17: invokespecial #3; //Method finallyMethod:()V
The instructions which follow position 23:
23: astore_2
24: aload_0
25: invokespecial #3; //Method finallyMethod:()V
28: aload_2
29: athrow
30: return
cover the rows of the exception table marked any. The method finallyMethod() is always executed; here it is followed by aload_2 and athrow, which rethrow the exception saved by astore_2, since it remains unhandled.
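How the JVM picks a handler can be sketched as a linear search of the exception table: the first row whose range contains the position of the throwing instruction and whose type matches wins. In the sketch below the from position is inclusive and the to position exclusive, as in the class file format; the Row class, the findHandler method and the use of null for "any" are conventions made up for this illustration.

```java
import java.util.Arrays;
import java.util.List;

final class ExceptionTableDemo {
    /** One row of an exception table; type == null stands for "any". */
    static final class Row {
        final int from, to, target;
        final Class<? extends Throwable> type;

        Row(int from, int to, int target, Class<? extends Throwable> type) {
            this.from = from;
            this.to = to;
            this.target = target;
            this.type = type;
        }
    }

    /** The exception table of foo() from the listing above. */
    static final List<Row> FOO_TABLE = Arrays.asList(
            new Row(0, 4, 11, Exception.class),
            new Row(0, 4, 23, null),
            new Row(11, 16, 23, null),
            new Row(23, 24, 23, null));

    /** Returns the handler position, or -1 if the exception propagates to the caller. */
    static int findHandler(List<Row> table, int pc, Throwable thrown) {
        for (Row row : table) {
            boolean inRange = pc >= row.from && pc < row.to; // from inclusive, to exclusive
            boolean typeMatches = row.type == null || row.type.isInstance(thrown);
            if (inRange && typeMatches) {
                return row.target; // rows are tried in order, the first match wins
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(findHandler(FOO_TABLE, 1, new Exception()));         // 11: the catch block
        System.out.println(findHandler(FOO_TABLE, 1, new Error()));             // 23: finally, then rethrow
        System.out.println(findHandler(FOO_TABLE, 13, new RuntimeException())); // 23: catchMethod() failed
        System.out.println(findHandler(FOO_TABLE, 25, new Error()));            // -1: not covered by any row
    }
}
```

The last case shows that an exception thrown by finallyMethod() at position 25 is not caught inside foo() at all and propagates to the caller.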
Knowledge of the JVM bytecode is essential for optimisation and for creating (and using) various bytecode manipulation tools and libraries.
One of the simplest libraries for manipulating bytecode is Javassist. If you are interested, try using it to manipulate the bytecode of a simple factorial program which, at the source level, is written recursively (see the lecture for details). It could be converted to an ordinary loop if Java were able to perform tail-call optimisation (which it cannot). Download (git clone) the library (a small jar file), study the three (relatively short) tutorials (they are in the tutorial directory of the main repository), and then use the Javassist API to perform a tail-call optimisation at the bytecode level (i.e., after compiling Factorial.java, operate only on its bytecode). Although it may sound arduous, the exercise in fact does not demand much time or effort. Your reward will be the removal of the call stack limitation for a program for which you may not have the source code (really: after compilation, delete the .java file, or hide it safely somewhere, and work only with the .class bytecode).
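To make the goal of the exercise concrete, here is what the transformation amounts to when written at the source level. This is only an illustration under assumptions: the lecture's Factorial.java is not reproduced here, so the class FactorialShapes, the accumulator-style method factRec and the loop factLoop are hypothetical stand-ins. The exercise itself asks for the equivalent change to be made on the bytecode with Javassist.

```java
import java.math.BigInteger;

final class FactorialShapes {
    /** Tail-recursive form: the recursive call is the very last action of the method. */
    static BigInteger factRec(int n, BigInteger acc) {
        if (n <= 1) {
            return acc;
        }
        return factRec(n - 1, acc.multiply(BigInteger.valueOf(n)));
    }

    /** What tail-call optimisation would produce: parameters are updated and the body repeats. */
    static BigInteger factLoop(int n) {
        BigInteger acc = BigInteger.ONE;
        while (n > 1) {
            acc = acc.multiply(BigInteger.valueOf(n));
            n = n - 1;
        }
        return acc;
    }

    public static void main(String[] args) {
        System.out.println(factRec(20, BigInteger.ONE)); // prints 2432902008176640000
        System.out.println(factLoop(20));                // prints 2432902008176640000
    }
}
```

The recursive form creates one frame per call and will overflow the stack for a sufficiently large n, whereas the loop runs in a single frame.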
Updated: Sun 12 Jun 2016 17:27:37 AEST
The Australian National University, Canberra