Java VM tuning

11Apr12

JVM tuning is not a new topic of discussion. However, considering that many tasks in computational linguistics/NLP are extremely expensive computationally, and that there are a number of excellent NLP tools written in Java, the importance of JVM tuning for NLP tasks should not be overlooked. One task I did took over a week of computing time on a cluster of 16 nodes plus a server: each node had an Intel Xeon X5355 4-core 2.66GHz CPU and 16GB RAM, and the server was a Sun Fire X4600 M2 with 8 AMD Opteron 8220 2-core 2.8GHz CPUs and 128GB RAM. Of course, I wasn’t the only person using the cluster at the time, but my point is that computational linguists, like other people involved in machine learning, do some serious number-crunching.

In order to run this task, I had to increase the size of the memory pool allocated to the JVM, a very blasé adjustment that just about everyone makes. However, I realised only fairly recently, while trying (once again) to optimise the performance of Eclipse, that I’d completely forgotten about other tweaks which might have sped up my job. For example, I managed to roughly halve Eclipse’s startup time of more than 12 seconds on my Lenovo Thinkpad T410i through some simple changes in my eclipse.ini file. This is by no means a rigorous measurement, but the point of this post isn’t to measure quantitatively the performance improvements gained through JVM tuning.

Of course, the best way to improve performance is to find a better algorithm, but, given you’ve already tried that and you’re stuck using Java for one reason or another, you can still tweak a bit more performance out of your JVM.

Upgrade your JVM

This may not always be possible given your computing set-up (e.g. when using a cluster), but if at all possible, upgrade: the best improvements in performance are to be found in the newest JVMs. At the moment, Oracle’s (formerly Sun’s) JDK 7 is starting to take off, and it seems to be significantly faster than JDK 6 in e.g. arithmetic operations and operations on arrays, two things critical to performance in machine-learning tasks.
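
If you want a quick impression of whether a newer JVM actually helps on your own machine, a rough timing loop over array arithmetic is enough to get started. The following is only a sketch and not a rigorous benchmark: the class name ArrayArithmeticTiming, the dot-product workload and the array size and run counts are all my own illustrative assumptions. Run the same class under each JVM and compare the numbers.

```java
class ArrayArithmeticTiming {
    // Plain dot product: a stand-in for the array arithmetic common in machine learning.
    static double dot(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Returns the average time in nanoseconds per dot product, measured after a warm-up phase
    // so that the JIT compiler has had a chance to compile the hot loop.
    static long averageNanos(int size, int warmups, int runs) {
        double[] a = new double[size];
        double[] b = new double[size];
        for (int i = 0; i < size; i++) {
            a[i] = i * 0.5;
            b[i] = size - i;
        }
        double sink = 0.0;
        for (int i = 0; i < warmups; i++) {
            sink += dot(a, b);
        }
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            sink += dot(a, b);
        }
        long elapsed = System.nanoTime() - start;
        // Use the accumulated result so the timed loops cannot be discarded as dead code.
        if (Double.isNaN(sink)) {
            System.out.println(sink);
        }
        return elapsed / runs;
    }

    public static void main(String[] args) {
        // Report which JVM produced the measurement.
        System.out.println(System.getProperty("java.vm.name") + " " + System.getProperty("java.version"));
        System.out.println(averageNanos(1000000, 200, 200) + " ns per dot product");
    }
}
```

The numbers will vary from run to run and with whatever else the machine is doing, so treat them as a hint rather than as a measurement.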

JVM Options

Not only are newer JVMs simply faster than older ones, but they also offer many new tuning options which can improve their performance even further. However, even when using e.g. JDK 6, there are a few quick and “safe” options you can specify on the command line (e.g. java [options]) which usually (but not always) improve performance:

  • -server — By far one of the most stable and effective tweaks, this enables use of the “server JVM”, which is optimised for actual application performance rather than for JVM startup/shutdown speed (as the default “client JVM” is).
  • -XX:+UnlockExperimentalVMOptions -XX:+UseG1GC — Perhaps the single largest performance bottleneck for any JVM-based application is garbage collection. Properly tuning a GC can get very serious and methodical, and if you want to get the absolute best performance for your system you will likely have to do some profiling. However, lazy people like me can just try using the new G1 garbage collector set to be standard in JDK 7. If you’re running at least Java SE 6 Update 17, you can try out the new G1GC instead of the older Concurrent Mark Sweep collector which most people use.
  • -XX:+G1ParallelRSetUpdatingEnabled -XX:+G1ParallelRSetScanningEnabled — According to the G1GC documentation, “to run G1 at its full potential, try setting these two parameters which are currently disabled by default because they may uncover a rare race condition”.
  • -XX:+UseFastAccessorMethods — Also requiring -XX:+UnlockExperimentalVMOptions like the previous options, this enables optimisations of getter/accessor methods which simply return the value of a member variable.
  • -XX:+AggressiveOpts — Turns on point performance compiler optimisations that are expected to become the default in upcoming releases.
  • -XX:+UseCompressedOops — Enables compression of pointers when using a 64-bit JVM; can save some memory from the extra size requirements of 64-bit pointers over 32-bit ones.
  • -XX:+OptimizeStringConcat — Does exactly what it says on the tin: it (tries to) optimise string concatenations to some degree. This could give you a performance boost if you deal with strings a lot and don’t know how to deal with strings efficiently, but I haven’t played around with this option much, so I can’t say anything about its effectiveness.
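
To check that the options you passed actually reached the JVM, and to see which garbage collectors ended up active, you can ask the JVM itself through the standard java.lang.management API. This is a minimal sketch; the class name JvmOptionsReport is my own invention, and the collector names it prints differ between JVM vendors and versions.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class JvmOptionsReport {
    // The options the JVM was started with (excluding the arguments passed to main).
    static List<String> inputArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    // The names of the garbage collectors the running JVM reports.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<String>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("JVM options: " + inputArguments());
        System.out.println("Collectors:  " + collectorNames());
    }
}
```

For example, starting it with java -server -XX:+UseCompressedOops JvmOptionsReport should list both options in the first line of output, provided your JVM accepts them.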

Although I’ve never had any serious problems using these options (only speed differences), the Java HotSpot JVM documentation warns that “Options that are specified with -XX are not stable and are not recommended for casual use. These options are subject to change without notice”. Likewise, not all of these options are available for every JVM vendor and version, and their effectiveness may vary. Additionally, there are yet more options which can be used to tune the JVM specifically to your machine and your task, but the ones listed above are those I have tried out, and I keep most of them enabled for nearly all my applications (the exceptions being -XX:+G1ParallelRSetUpdatingEnabled, -XX:+G1ParallelRSetScanningEnabled and -XX:+OptimizeStringConcat).
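
Since these options come and go between versions, it can be worth asking a running JVM whether it knows a given flag before relying on it. The sketch below assumes a HotSpot-based JVM on Java 7 or later, because it uses the HotSpot-specific com.sun.management.HotSpotDiagnosticMXBean; on other vendors’ JVMs that class may not exist at all. The class name FlagProbe and the return strings “unknown” and “unavailable” are my own illustrative choices.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

class FlagProbe {
    // Returns the current value of a -XX flag (given without the "-XX:" prefix and the +/- sign),
    // "unknown" if the JVM rejects the request (e.g. it does not recognise the flag),
    // or "unavailable" if the JVM does not provide the diagnostic bean.
    static String probe(String flagName) {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            if (bean == null) {
                return "unavailable";
            }
            return bean.getVMOption(flagName).getValue();
        } catch (IllegalArgumentException e) {
            return "unknown";
        }
    }

    public static void main(String[] args) {
        String[] flags = {"UseG1GC", "UseCompressedOops", "OptimizeStringConcat"};
        for (String flag : flags) {
            System.out.println(flag + " = " + probe(flag));
        }
    }
}
```

A flag reported as “unknown” here would most likely also make the JVM refuse to start if you passed it on the command line, so this is a cheap way to test a set of options on a cluster node before submitting a long job.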

Conclusion

By keeping your machine as up-to-date as possible and by enabling features which are normally left disabled, it is possible to optimise your JVM for long-running, computationally- and memory-intensive tasks of the type seen in computational linguistics/NLP. Although there are ways to accurately profile your program to squeeze every last bit of performance out of your JVM, for many experimental purposes the time it takes to do such profiling is counter-productive, and it can suffice to optimise the JVM simply “well enough” so that you can perhaps shave off a few days from the time it takes to train a model, etc.


