Friday 15 March 2013

Biting into Bytecode

As part of my programming goals, I am trying to gain a better insight into how my Java code translates to bytecode. In this post I will document my findings and figure out what "warming up" your Java code actually means!

Please note that much of what I have learnt has been distilled from the excellent book: "The Well-Grounded Java Developer" by Ben Evans and Martijn Verburg. I highly recommend the book to anyone looking to better understand Java, the JVM and the Java ecosystem.

Java to CPU - A Journey


Here are the basic steps for how Java code ends up running on the CPU:
  1. The Java compiler (javac) takes Java files (.java) as input.
  2. Class files (.class) are produced which contain bytecode.
  3. The JVM will interpret this bytecode, executing your program.
  4. Bytecode may be translated into native machine code to run directly on the CPU.
Although that is a very simplistic overview, it highlights the key points. An interesting thing to note is that Java is both a compiled and an interpreted language! In fact, Java is compiled twice: once into bytecode, and potentially a second time into machine code.
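To follow the steps above by hand, here is a minimal sketch. The class name Hello and its greet method are purely illustrative. The comments give the command for each step and assume a standard JDK with javac, javap and java on the path.

```java
// Hello.java: a tiny program for walking through the Java-to-CPU steps.
//   javac Hello.java   -> steps 1-2: compiles the source into Hello.class (bytecode)
//   javap -c Hello     -> optional: disassembles Hello.class so you can read the bytecode
//   java Hello         -> steps 3-4: the JVM interprets the bytecode (and may JIT-compile hot parts)
public class Hello {
    // A small method whose bytecode is worth inspecting with javap -c.
    static String greet(String name) {
        return "Hello " + name + "!";
    }

    public static void main(String[] args) {
        System.out.println(greet("bytecode"));
    }
}
```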

So Java code is translated into bytecode before it can be executed... but what exactly is bytecode?

Class Files & Bytecode - Under the Covers


Java is designed to "run anywhere", which means it must be abstracted from the underlying hardware and operating system it is running on. To achieve this, Java uses bytecode!
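A class file uses the same platform-independent format wherever it runs. You can see this by reading the first few bytes of one. The sketch below is a hypothetical helper, not part of any library. It assumes the class is a top-level class loaded from an ordinary .class file, so getResourceAsStream can find it. It reads the fixed header defined by the JVM spec: the magic number 0xCAFEBABE followed by the minor and major class file versions.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ClassFileHeader {
    // Hypothetical helper: returns {magic, minorVersion, majorVersion} for a top-level class.
    static int[] readHeader(Class<?> cls) throws IOException {
        String resource = cls.getSimpleName() + ".class"; // resolved relative to the class's package
        InputStream raw = cls.getResourceAsStream(resource);
        if (raw == null) {
            throw new IOException("Cannot find " + resource);
        }
        try (DataInputStream in = new DataInputStream(raw)) {
            int magic = in.readInt();           // every class file starts with 0xCAFEBABE
            int minor = in.readUnsignedShort(); // minor class file version
            int major = in.readUnsignedShort(); // major version, e.g. 50 = Java 6, 51 = Java 7
            return new int[] {magic, minor, major};
        }
    }

    public static void main(String[] args) throws IOException {
        int[] header = readHeader(ClassFileHeader.class);
        System.out.printf("magic=%X minor=%d major=%d%n", header[0], header[1], header[2]);
    }
}
```

The major version changes with the javac release that produced the file. The format itself does not depend on the operating system or CPU.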

The most important thing to remember is that bytecode is not machine code but an abstraction of machine code. For me, bytecode reads a lot like assembly code, and in essence it plays the same role, except that it is not specific to any machine architecture.

This means that bytecode deals with fundamental operations such as pushing values onto (and popping them off) the operand stack and jumping to specific instructions. Together, these operations represent the Java code you wrote (for a reference to all the JVM bytecode operations, see the JVM spec).
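As a small illustration of these stack operations, here is a trivial method together with a rough sketch of what javap -c prints for it. The class name Adder is made up for this example. The exact layout of javap's output can vary between JDK versions. The explanations after each instruction are annotations, not part of javap's output.

```java
// Adder.java: compile with `javac Adder.java`, then disassemble with `javap -c Adder`.
public class Adder {
    // javap -c should show something close to:
    //   static int add(int, int);
    //     Code:
    //        0: iload_0   <- push local variable 0 (a) onto the operand stack
    //        1: iload_1   <- push local variable 1 (b) onto the operand stack
    //        2: iadd      <- pop both ints and push their sum
    //        3: ireturn   <- pop the sum and return it to the caller
    static int add(int a, int b) {
        return a + b;
    }

    public static void main(String[] args) {
        System.out.println(add(2, 3)); // prints 5
    }
}
```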

However, the bytecode is not always a direct translation of what was in the code. Optimisations may be made, such as when String concatenation occurs. Under the covers, javac (as of Java 7) won't produce bytecode that builds each intermediate String directly, but will instead use a StringBuilder!

The implication of this is that when building simple Strings outside of loops, you don't need to create a StringBuilder manually! So instead of thinking you have to write something like this:
String name = getName();
StringBuilder messageBuilder = new StringBuilder()
    .append("Hello ")
    .append(name)
    .append('!');
System.out.println(messageBuilder.toString());
You can instead be happy in the knowledge that javac has your back and you can write this:
String name = getName();
System.out.println("Hello " + name + "!");
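The "outside of loops" caveat matters because javac only rewrites each individual concatenation expression. It does not rewrite the loop. In the sketch below (class and method names are hypothetical), each += in the loop builds a brand new String and copies everything accumulated so far. The explicit version reuses one StringBuilder for the whole loop. Both return the same result. The difference is in how much copying they do as the input grows.

```java
import java.util.Arrays;
import java.util.List;

public class JoinDemo {
    // Each += builds a new String from the old one plus the new part,
    // so the work grows with the total length on every iteration.
    static String joinWithPlus(List<String> parts) {
        String result = "";
        for (String part : parts) {
            result += part + ",";
        }
        return result;
    }

    // A single StringBuilder is reused, so each part is appended once.
    static String joinWithBuilder(List<String> parts) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            sb.append(part).append(',');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> parts = Arrays.asList("a", "b", "c");
        System.out.println(joinWithPlus(parts));    // prints a,b,c,
        System.out.println(joinWithBuilder(parts)); // prints a,b,c,
    }
}
```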

Warming Up Code - Just In Time Compilation


As we now know, the JVM takes in bytecode and executes it for us. However, what isn't clear is when, how or why that bytecode is turned into machine code. Is this part of the mysterious process of "warming up" Java code?

I recall that when I was first tasked with profiling real low-latency code, I was told that you had to "warm up" the Java code before taking measurements. At the time I understood this had to do with runtime optimisations, but I never truly knew what was happening... until now!

When executing a method, the JVM begins by reading and running its bytecode one instruction at a time. This is typically a lot slower than running the same method after it has been translated from bytecode into native machine code. However, the JVM won't compile a method until it judges it worthwhile to do so, or rather it will do it "just in time" ;) [Thanks to StackOverflow for patching up my understanding!]
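One way to watch this happen is with two HotSpot command-line flags. -XX:+PrintCompilation logs methods as the JIT compiles them. -Xint forces interpreter-only mode. The program below is a hypothetical workload written only to make one method "hot". The exact log format and the time taken will depend on your JVM.

```java
// HotMethod.java: a made-up workload for observing JIT compilation on HotSpot.
// Try:
//   java -XX:+PrintCompilation HotMethod   (look for a line mentioning HotMethod::sumTo)
//   java -Xint HotMethod                   (JIT disabled; expect it to run noticeably slower)
public class HotMethod {
    // Called many times, so it becomes a candidate for JIT compilation.
    static long sumTo(int n) {
        long total = 0;
        for (int i = 1; i <= n; i++) {
            total += i;
        }
        return total;
    }

    public static void main(String[] args) {
        long checksum = 0;
        for (int call = 0; call < 100_000; call++) {
            checksum += sumTo(1_000);
        }
        // Printing the result keeps the work from being optimised away entirely.
        System.out.println(checksum);
    }
}
```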

Part of what qualifies a method for compilation is the number of times it has run. Hence we warm up code by calling its methods repeatedly: this triggers JIT compilation, which can make those methods run significantly faster, in some cases up to 100 times faster!
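Warming up in practice means running the code for a while before you start timing it. The sketch below is a deliberately naive harness built around System.nanoTime. The work method is a stand-in for whatever you actually want to measure. The batch sizes are arbitrary assumptions, not recommended values. You would usually expect the first batches to be slower than the later ones, but the numbers vary from run to run and machine to machine. For serious measurements, a dedicated harness such as OpenJDK's JMH handles warm-up and dead-code elimination far more carefully.

```java
public class WarmUp {
    // Hypothetical method under test: any CPU-bound method would do.
    static double work(int n) {
        double acc = 0;
        for (int i = 1; i <= n; i++) {
            acc += Math.sqrt(i);
        }
        return acc;
    }

    // Calls work(n) `calls` times and returns the elapsed wall-clock time in nanoseconds.
    static long timeBatch(int calls, int n) {
        double sink = 0;
        long start = System.nanoTime();
        for (int c = 0; c < calls; c++) {
            sink += work(n);
        }
        long elapsed = System.nanoTime() - start;
        // Practically never true; it only keeps `sink` in use so the loop can't be discarded.
        if (sink == -1) {
            System.out.println(sink);
        }
        return elapsed;
    }

    public static void main(String[] args) {
        // Warm-up phase: early batches may run partly in the interpreter.
        for (int batch = 0; batch < 5; batch++) {
            System.out.printf("warm-up batch %d: %d us%n", batch, timeBatch(1_000, 1_000) / 1_000);
        }
        // Measurement phase: by now work() has most likely been JIT-compiled.
        System.out.printf("measured: %d us%n", timeBatch(1_000, 1_000) / 1_000);
    }
}
```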

Conclusion


To summarise, we have found that Java is a language which is first compiled into bytecode, an abstraction over machine code. This bytecode is interpreted by the JVM to run the underlying program, with the JIT compiler kicking in to translate it into native machine code after a "while".

Probably the most interesting question to come out of this, for me, is how exactly the JVM decides to trigger JIT compilation, and the mechanics behind it all. Expect that, along with a dive into garbage collection, in future posts!
