JMeter Guide - Java Tuning White Paper (translation)
This is a loose, heavily paraphrased translation.

Java Tuning White Paper

Java™ Enterprise Platforms and Developer Organization
Sun Microsystems, Inc.

Revised: December 20, 2005

http://java.sun.com/performance/reference/whitepap...



4 Tuning Ideas

By now you have taken the easy steps in the Best Practices section and have prepared for tuning by understanding the right ways of Making Decisions from Data. This section on Tuning Ideas contains suggestions on various tuning options that you should try with your Java application. All comparisons made between different sets of options should be performed using the statistical techniques discussed above.

By this point you have worked through the easy steps in Best Practices.
You have also prepared for tuning by learning the right way to make decisions from data.

4.1 General Tuning Guidelines


Here are some general tuning guidelines to help you categorize the kinds of Java tuning you will perform.

Here are some general tuning guidelines;
they help you categorize the kinds of Java tuning you are about to do.

4.1.1 Be Aware of Ergonomics Settings

Before you start to tune the command line arguments for Java, be aware that Sun's HotSpot™ Java Virtual Machine has incorporated technology to begin to tune itself. This smart tuning is referred to as Ergonomics. Most computers that have at least 2 CPUs and at least 2 GB of physical memory are considered server-class machines, which means that by default the settings are:


Most computers with at least 2 CPUs and at least 2 GB of physical memory are treated as server-class machines (see the linked page), and by default they get the following settings.
  • The -server compiler
  • The -XX:+UseParallelGC parallel (throughput) garbage collector
  • The -Xms initial heap size is 1/64th of the machine's physical memory
  • The -Xmx maximum heap size is 1/4th of the machine's physical memory (up to 1 GB max).
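  • -server compiler (does this mean javac?)
  • -XX:+UseParallelGC parallel GC
  • -Xms initial heap size is 1/64 of the machine's physical memory (32 MB with 2 GB installed)
  • -Xmx maximum heap size is 1/4 of the machine's physical memory (capped at 1 GB; 512 MB with 2 GB installed)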
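
The arithmetic behind these defaults can be sketched in a few lines of Java. This is only an illustration of the rules as stated above, not the JVM's actual detection code: the class and method names are made up for this example, and the 32-bit Windows exception is not modeled.

```java
/** Illustrative sketch of the J2SE 5.0 ergonomics rules quoted above (hypothetical names). */
final class ErgonomicsDefaults {
    static final long MB = 1024L * 1024L;
    static final long GB = 1024L * MB;

    /** Server-class: at least 2 CPUs and at least 2 GB of physical memory. */
    static boolean isServerClass(int cpus, long physicalBytes) {
        return cpus >= 2 && physicalBytes >= 2 * GB;
    }

    /** Default initial heap (-Xms): 1/64 of physical memory. */
    static long defaultInitialHeap(long physicalBytes) {
        return physicalBytes / 64;
    }

    /** Default maximum heap (-Xmx): 1/4 of physical memory, capped at 1 GB. */
    static long defaultMaxHeap(long physicalBytes) {
        return Math.min(physicalBytes / 4, GB);
    }

    public static void main(String[] args) {
        long physical = 2 * GB; // assumed machine: 2 CPUs, 2 GB of RAM
        System.out.println("server-class: " + isServerClass(2, physical));
        System.out.println("-Xms default: " + defaultInitialHeap(physical) / MB + " MB"); // 32 MB
        System.out.println("-Xmx default: " + defaultMaxHeap(physical) / MB + " MB");     // 512 MB
    }
}
```

With 8 GB of RAM the same rule gives 2 GB for one quarter, which the cap reduces to 1 GB.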


Please note that 32-bit Windows systems all use the -client compiler by default, and 64-bit Windows systems which meet the criteria above will be treated as server-class machines.

On 32-bit Windows the -client compiler is used by default. (Is this talking about javac?)
On 64-bit Windows a machine that meets the criteria above is treated as server-class, so the -server compiler is used.


(Linked page)
Server-Class Machine Detection
Automatic detection of server-class machines

http://java.sun.com/j2se/1.5.0/docs/guide/vm/serve...

Starting with J2SE 5.0, when an application starts up, the launcher can attempt to detect whether the application is running on a "server-class" machine and, if so, use the Java HotSpot Server Virtual Machine (server VM) instead of the Java HotSpot Client Virtual Machine (client VM). The aim is to improve performance even if no one configures the VM to reflect the application it's running. In general, the server VM starts up more slowly than the client VM, but over time runs more quickly.

Starting with J2SE 5.0, when an application starts, the launcher (the java command) tries to detect whether it is running on a server-class machine. If so, the server VM is used instead of the client VM.
In general the server VM starts more slowly than the client VM but runs faster over time.


Note: For J2SE 5.0, the definition of a server-class machine is one with at least 2 CPUs and at least 2GB of physical memory.

J2SE 5.0 treats a machine with at least 2 CPUs and at least 2 GB of physical memory as server-class.


In J2SE 5.0, server-class detection occurs if neither -server nor -client is specified when launching the application on an i586 or Sparc 32-bit machine running Solaris or Linux. As the following table shows, the i586 Microsoft Windows platform is not subject to the server-class test (i.e., is never treated as a server-class machine by default) and uses the client VM by default. The remaining Sun-supported platforms use only the server VM.

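In J2SE 5.0, server-class detection runs when both of the following hold:
  • Neither -server nor -client is given as an option to the java command.
  • The machine is i586 or Sparc (32-bit) running Solaris or Linux.

The following table summarizes each platform and its default VM.
On i586 Windows the machine is never treated as server-class by default, and the client VM is used.

(Translator's note)
Judging from this table, on Windows the java command does not switch to the server VM automatically based on machine specs.
In other words, you presumably have to pass -server explicitly.
On an Intel PC running Linux, by contrast, the server VM is selected automatically once the specs meet the conditions above.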

4.1.2 Heap Sizing
Sizing the heap

Even though Ergonomics significantly improves the "out of the box" experience for many applications, optimal tuning often requires more attention to the sizing of the Java memory regions.

Tuning often requires paying closer attention to how the Java memory regions are sized.


The maximum heap size of a Java application is limited by three factors: the process data model (32-bit or 64-bit) and the associated operating system limitations, the amount of virtual memory available on the system, and the amount of physical memory available on the system.

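The maximum heap size of a Java application is determined by three factors:
  • The process data model (32-bit or 64-bit) and the OS limits that come with it
  • The amount of virtual memory available on the system
  • The amount of physical memory available on the system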

The size of the Java heap for a particular application can never exceed or even reach the maximum virtual address space of the process data model. For a 32-bit process model, the maximum virtual address size of the process is typically 4 GB, though some operating systems limit this to 2 GB or 3 GB. The maximum heap size is typically -Xmx3800m (1600m for 2 GB limits), though the actual limitation is application dependent. For 64-bit process models, the maximum is essentially unlimited.

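The heap available to a single Java application can never exceed, or even reach, the maximum virtual address space of the process data model.
For a 32-bit process the virtual address space is typically capped at 4 GB (some operating systems cap it at 2 GB or 3 GB).
The maximum heap is typically about 3800 MB (-Xmx3800m as an argument), or 1600 MB when the cap is 2 GB, though the real limit depends on the application.
For 64-bit processes there is effectively no upper limit.

Translator's note:
With a 2 GB OS limit the maximum Java heap appears to be 1600 MB. Will a later section explain where the remaining 400 MB or so goes?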


For a single Java application on a dedicated system, the size of the Java heap should never be set to the amount of physical RAM on the system, as additional RAM is needed for the operating system, other system processes, and even for other JVM operations. Committing too much of a system's physical memory is likely to result in paging of virtual memory to disk, quite likely during garbage collection operations, leading to significant performance issues. On systems with multiple Java processes, or multiple processes in general, the sum of the Java heaps for those processes should also not exceed the size of the physical RAM in the system.

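For a single Java application on a dedicated system, the heap size must not be set equal to the amount of physical RAM, because the OS, other system processes, and other JVM operations also need RAM.
Committing too much of the system's physical memory to the Java application tends to cause heavy paging of virtual memory to disk during GC, which leads to a serious performance drop.
On systems running several Java processes, keep the combined size of the Java heaps within the size of physical memory.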


The next most important Java memory tunable is the size of the young generation (also known as the NewSize). Generally speaking the largest recommended value for the young generation is 3/8 of the maximum heap size. Note that with the throughput and low pause time collectors it may be possible to exceed this ratio. For more information please see the discussions of the Young Generation Guarantee in the Tuning Garbage Collection with the 5.0 Java Virtual Machine document.

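The next most important Java memory tunable is the size of the young generation (also called NewSize).
Generally, the largest recommended size for the young generation is 3/8 of the maximum heap size. (For a 1 GB heap that is 384 MB. More than I expected.)

For more detail, see the discussion of the Young Generation Guarantee in the "Tuning Garbage Collection with the 5.0 Java Virtual Machine" document.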
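
The two sizing rules in this section (keep the heaps below physical RAM, and treat 3/8 of the maximum heap as the usual ceiling for the young generation) can be checked with a little arithmetic. The sketch below is a hypothetical helper written for this page, not part of any JDK API, and it works in whole megabytes for simplicity.

```java
/** Hypothetical helper that applies the heap sizing rules of thumb from this section. */
final class HeapPlanCheck {

    /** Largest generally recommended young generation: 3/8 of the maximum heap. */
    static long maxRecommendedYoungMb(long maxHeapMb) {
        return maxHeapMb * 3 / 8;
    }

    /** True if the heaps of all JVMs on the machine together stay below physical RAM. */
    static boolean fitsInPhysicalMemory(long physicalMb, long... heapMbPerJvm) {
        long total = 0;
        for (long heapMb : heapMbPerJvm) {
            total += heapMb;
        }
        return total < physicalMb;
    }

    public static void main(String[] args) {
        // A 1 GB heap gives a recommended ceiling of 384 MB for the young generation.
        System.out.println("1024 MB heap -> young gen up to "
                + maxRecommendedYoungMb(1024) + " MB");
        // One JVM with a 3800 MB heap on a 4 GB machine leaves some RAM for the OS.
        System.out.println("fits: " + fitsInPhysicalMemory(4096, 3800));
        // Two JVMs with 2 GB heaps each would commit all of the 4 GB.
        System.out.println("fits: " + fitsInPhysicalMemory(4096, 2048, 2048));
    }
}
```

Passing this check is a necessary condition only; the non-heap memory of each JVM and the needs of the operating system still have to be measured on the real system.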

Additional memory settings, such as the stack size, will be covered in greater detail below.

Other memory settings (such as the stack size) are covered in detail in later sections.

4.1.3 Garbage Collector Policy

The Java™ Platform offers a choice of garbage collection algorithms, each with various policy tunables. Rather than repeat the details of the Tuning Garbage Collection document here, suffice it to say that the first two choices below are the most common for large server applications:
  • The -XX:+UseParallelGC parallel (throughput) garbage collector
  • The -XX:+UseConcMarkSweepGC concurrent (low pause time) garbage collector (also known as CMS)
  • The -XX:+UseSerialGC serial garbage collector (for smaller applications and systems)

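Java lets you choose among GC algorithms, and each algorithm has its own tunables.
Instead of going into the details of GC tuning here, the list covers three collectors; the first two are the usual picks for large server applications.
  • -XX:+UseParallelGC parallel GC (throughput first)
  • -XX:+UseConcMarkSweepGC concurrent GC (short GC pauses first; also called CMS)
  • -XX:+UseSerialGC serial GC (for small applications and small systems)

Translator's note: which one is used when none is specified?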

4.1.4 Other Tuning Parameters

Certain other Java tuning parameters that have a high impact on performance will be mentioned here. Please see the Pointers section for a comprehensive reference of Java tuning parameters.

This section covers other tuning parameters that have a large impact on performance.
For a comprehensive reference of tuning parameters, see the Pointers section.


The VM Options page discusses Java support for Large Memory Pages. By appropriately configuring the operating system and then using the command line options -XX:+UseLargePages (on by default for Solaris) and -XX:LargePageSizeInBytes you can get the best efficiency out of the memory management system of your server. Note that with larger page sizes we can make better use of virtual memory hardware resources (TLBs), but that may cause larger space sizes for the Permanent Generation and the Code Cache, which in turn can force you to reduce the size of your Java heap. This is a small concern with 2 MB or 4 MB page sizes but a more interesting concern with 256 MB page sizes.



An example of a Solaris-specific tunable is selecting the libumem alternative heap allocator. To experiment with libumem on Solaris you can use the following LD_PRELOAD environment variable directive:
To set libumem for all child processes of a given shell, set and export the environment variable
LD_PRELOAD=/usr/lib/libumem.so
To launch a Java application with libumem from sh:
LD_PRELOAD=/usr/lib/libumem.so java java-settings application-args
To launch a Java application with libumem from csh:
env LD_PRELOAD=/usr/lib/libumem.so java java-settings application-args
You can confirm that libumem is in use by checking the process with pmap(1) or pldd(1).

4.2 Tuning Examples

Tuning examples

Here are some specific tuning examples for your experimentation. Please understand that these are only examples and that the optimal heap sizes and tuning parameters for your application on your hardware may differ.

Here are the tuning examples.
The best heap sizes and tuning parameters depend on your application and hardware, so keep that in mind.

4.2.1 Tuning Example 1: Tuning for Throughput
Example 1: Throughput first

Here is an example of specific command line tuning for a server application running on a system with 4 GB of memory and capable of running 32 threads simultaneously (CPUs and cores or contexts).

The following java command line is for a server application on a machine with 4 GB of physical memory that can run 32 threads simultaneously.

java -Xmx3800m -Xms3800m -Xmn2g -Xss128k -XX:+UseParallelGC -XX:ParallelGCThreads=20

Comments:
  • -Xmx3800m -Xms3800m
Configures a large Java heap to take advantage of the large memory system.
The Java heap is made large to take advantage of the machine's large physical memory.
  • -Xmn2g
Configures a large heap for the young generation (which can be collected in parallel), again taking advantage of the large memory system. It helps prevent short lived objects from being prematurely promoted to the old generation, where garbage collection is more expensive.

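The young generation (which can be collected in parallel) is made large, again taking advantage of the large physical memory.
This helps keep short-lived objects from being promoted to the old generation too early; GC in the old generation costs more than GC in the young generation.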
  • -Xss128k
Reduces the default maximum thread stack size, which allows more of the process' virtual memory address space to be used by the Java heap.

Reduces the default maximum thread stack size, which leaves more of the process's virtual address space available to the Java heap.
  • -XX:+UseParallelGC
Selects the parallel garbage collector for the new generation of the Java heap (note: this is generally the default on server-class machines)

Uses the parallel GC for the new generation of the Java heap. This is generally the default on server-class machines.
  • -XX:ParallelGCThreads=20
Reduces the number of garbage collection threads. The default would be equal to the processor count, which would probably be unnecessarily high on a 32 thread capable system.

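Reduces the number of GC threads. The default equals the number of processors (CPUs? cores?).
A system that can run 32 threads simultaneously probably does not need that many (32) GC threads.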
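
When experimenting with a command line like the one in this example, it helps to confirm from inside the JVM which options it was actually started with. The following sketch uses the standard java.lang.management API (available since J2SE 5.0); the class name ShowJvmSettings is made up for this page. Note that Runtime.maxMemory() reports an approximation of the usable heap, which can be somewhat smaller than the -Xmx value.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

/** Prints the options and limits the running JVM was started with (illustrative sketch). */
final class ShowJvmSettings {

    /** JVM options such as -Xmx or -XX:ParallelGCThreads, as passed on the command line. */
    static List<String> jvmArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    /** Approximate maximum heap the JVM will try to use, in MB. */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("JVM arguments:        " + jvmArguments());
        System.out.println("Max heap (approx.):   " + maxHeapMb() + " MB");
        // The processor count is what the comment on -XX:ParallelGCThreads refers to.
        System.out.println("Available processors: "
                + Runtime.getRuntime().availableProcessors());
    }
}
```

For example, running `java -Xmx3800m -Xmn2g ShowJvmSettings` should list both options in the first line of output.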

4.2.2 Tuning Example 2: Try the parallel old generation collector
Example 2: Try parallel GC in the old generation

This example is similar to Example 1, but here we want to test the impact of the parallel old generation collector.

In the same environment as Example 1, let's measure the impact of selecting parallel GC for the old generation.

java -Xmx3550m -Xms3550m -Xmn2g -Xss128k -XX:+UseParallelGC -XX:ParallelGCThreads=20 -XX:+UseParallelOldGC

Comments:
  • -Xmx3550m -Xms3550m
Sizes have been reduced. The ParallelOldGC collector has additional native, non-Java heap memory requirements and so the Java heap sizes may need to be reduced when running a 32-bit JVM.

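The sizes were reduced. Parallel GC for the old generation needs additional native memory outside the Java heap, so on a 32-bit JVM the Java heap may need to be smaller.
  • -XX:+UseParallelOldGC
Use the parallel old generation collector. Certain phases of an old generation collection can be performed in parallel, speeding up an old generation collection.

Uses parallel GC in the old generation. Certain phases of an old generation collection run in parallel, so the collection should be faster.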

4.2.3 Tuning Example 3: Try 256 MB pages
Tuning example 3: Try 256 MB pages (?)

This tuning example is specific to those Solaris-based systems that would support the huge page size of 256 MB.

This tuning example is specific to Solaris systems that support the huge 256 MB page size.

java -Xmx2506m -Xms2506m -Xmn1536m -Xss128k -XX:+UseParallelGC -XX:ParallelGCThreads=20 -XX:+UseParallelOldGC -XX:LargePageSizeInBytes=256m


Comments:
  • -Xmx2506m -Xms2506m
Sizes have been reduced because using the large page setting causes the permanent generation and code cache sizes to be 256 MB and this reduces memory available for the Java heap.

The heap sizes were reduced because using huge pages makes the permanent generation and the code cache 256 MB, which leaves less memory for the Java heap.
  • -Xmn1536m
The young generation heap is often sized as a fraction of the overall Java heap size. Typically we suggest you start tuning with a young generation size of 1/4th the overall heap size. The young generation was reduced in this case to maintain a ratio between young generation and old generation sizes similar to the one used in the previous example.

The young generation is often sized as a fraction of the overall Java heap.
We typically suggest starting with a young generation of 1/4 of the overall heap.
Here the young generation was reduced to keep the young-to-old ratio similar to the previous example (4.2.2).
  • -XX:LargePageSizeInBytes=256m
Causes the Java heap, including the permanent generation, and the compiled code cache to use as a minimum size one 256 MB page (for those platforms which support it).

The Java heap (including the permanent generation) and the compiled code cache use at least 256 MB, because one page is 256 MB.

4.2.4 Tuning Example 4: Try -XX:+AggressiveOpts
Tuning example 4: Try AggressiveOpts

This tuning example is similar to Example 2, but adds the AggressiveOpts option.

This tuning example is similar to Example 2; the difference is the added AggressiveOpts option.

java -Xmx3550m -Xms3550m -Xmn2g -Xss128k -XX:+UseParallelGC -XX:ParallelGCThreads=20 -XX:+UseParallelOldGC -XX:+AggressiveOpts

Comments:
  • -Xmx3550m -Xms3550m
Sizes have been increased back to the level of Example 2 since we are no longer using huge pages.

Same values as Example 2.
  • -Xmn2g
Sizes have been increased back to the level of Example 2 since we are no longer using huge pages.

Same value as Example 2.
  • -XX:+AggressiveOpts
Turns on point performance optimizations that are expected to be on by default in upcoming releases.
The changes grouped by this flag are minor changes to JVM runtime compiled code and not distinct performance features (such as BiasedLocking and ParallelOldGC).
This is a good flag to try the JVM engineering team's latest performance tweaks for upcoming releases.
Note: this option is experimental!
The specific optimizations enabled by this option can change from release to release and even build to build.
You should reevaluate the effects of this option prior to deploying a new release of Java.




4.2.5 Tuning Example 5: Try Biased Locking

This tuning example builds on Example 4 and adds the Biased Locking option.

java -Xmx3550m -Xms3550m -Xmn2g -Xss128k -XX:+UseParallelGC -XX:ParallelGCThreads=20 -XX:+UseParallelOldGC -XX:+AggressiveOpts -XX:+UseBiasedLocking

Comments:
  • -XX:+UseBiasedLocking
Enables a technique for improving the performance of uncontended synchronization. An object is "biased" toward the thread which first acquires its monitor via a monitorenter bytecode or synchronized method invocation; subsequent monitor-related operations performed by that thread are relatively much faster on multiprocessor machines. Some applications with significant amounts of uncontended synchronization may attain significant speedups with this flag enabled; some applications with certain patterns of locking may see slowdowns, though attempts have been made to minimize the negative impact.

4.2.6 Tuning Example 6: Tuning for low pause times and high throughput

This tuning example is similar to Example 2, but uses the concurrent garbage collector (instead of the parallel throughput collector).

java -Xmx3550m -Xms3550m -Xmn2g -Xss128k -XX:ParallelGCThreads=20 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:SurvivorRatio=8 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=31

Comments:
  • -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
Selects the Concurrent Mark Sweep collector. This collector may deliver better response time properties for the application (i.e., low application pause time). It is a parallel and mostly-concurrent collector and can be a good match for the threading ability of large multi-processor systems.
  • -XX:SurvivorRatio=8
Sets the survivor space ratio to 1:8, resulting in larger survivor spaces (the smaller the ratio, the larger the space). Larger survivor spaces allow short lived objects a longer time period to die in the young generation.
  • -XX:TargetSurvivorRatio=90
Allows 90% of the survivor spaces to be occupied instead of the default 50%, allowing better utilization of the survivor space memory.
  • -XX:MaxTenuringThreshold=31
Allows short lived objects a longer time period to die in the young generation (and hence, avoid promotion). A consequence of this setting is that minor GC times can increase due to additional objects to copy. This value and survivor space sizes may need to be adjusted so as to balance overheads of copying between survivor spaces versus tenuring objects that are going to live for a long time. The default settings for CMS are SurvivorRatio=1024 and MaxTenuringThreshold=0 which cause all survivors of a scavenge to be promoted. This can place a lot of pressure on the single concurrent thread collecting the tenured generation. Note: when used with -XX:+UseBiasedLocking, this setting should be 15.
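
To see what these survivor settings mean in megabytes, the arithmetic can be sketched as follows. SurvivorRatio=N means eden is N times the size of one survivor space, and there are two survivor spaces, so each one is young / (N + 2). The class below is a hypothetical calculator written for this page; the real JVM aligns these sizes, so treat the results as approximations.

```java
/** Rough survivor-space arithmetic for -Xmn, SurvivorRatio and TargetSurvivorRatio (sketch). */
final class SurvivorSizing {

    /** Size of each of the two survivor spaces: young / (ratio + 2). */
    static long survivorMb(long youngMb, int survivorRatio) {
        return youngMb / (survivorRatio + 2);
    }

    /** Eden is whatever remains of the young generation after both survivor spaces. */
    static long edenMb(long youngMb, int survivorRatio) {
        return youngMb - 2 * survivorMb(youngMb, survivorRatio);
    }

    /** Portion of one survivor space that may fill up, given TargetSurvivorRatio in percent. */
    static long targetSurvivorUseMb(long survivorMb, int targetPercent) {
        return survivorMb * targetPercent / 100;
    }

    public static void main(String[] args) {
        long youngMb = 2048; // -Xmn2g from the example command line
        long survivor = survivorMb(youngMb, 8);
        System.out.println("survivor (ratio 8):    " + survivor + " MB each");      // 204
        System.out.println("eden (ratio 8):        " + edenMb(youngMb, 8) + " MB"); // 1640
        System.out.println("usable at 90%:         "
                + targetSurvivorUseMb(survivor, 90) + " MB");                       // 183
        // The CMS default quoted above leaves almost no survivor space at all.
        System.out.println("survivor (ratio 1024): " + survivorMb(youngMb, 1024) + " MB each");
    }
}
```

With the quoted CMS default of SurvivorRatio=1024 each survivor space is about 1 MB, which is consistent with the statement that all survivors of a scavenge get promoted.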
4.2.7 Tuning Example 7: Try AggressiveOpts for low pause times and high throughput

This tuning example builds on Example 6 and adds the AggressiveOpts option.

java -Xmx3550m -Xms3550m -Xmn2g -Xss128k -XX:ParallelGCThreads=20 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:SurvivorRatio=8 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=31 -XX:+AggressiveOpts

Comments:
  • -XX:+AggressiveOpts
Turns on point performance optimizations that are expected to be on by default in upcoming releases. The changes grouped by this flag are minor changes to JVM runtime compiled code and not distinct performance features (such as BiasedLocking and ParallelOldGC). This is a good flag to try the JVM engineering team's latest performance tweaks for upcoming releases. Note: this option is experimental! The specific optimizations enabled by this option can change from release to release and even build to build. You should reevaluate the effects of this option prior to deploying a new release of Java.

5 Monitoring and Profiling


Monitoring (extracting high-level statistics from a running application) and profiling (instrumenting an application to provide detailed performance statistics) are subjects worthy of White Papers in their own right. For the purposes of this Java Tuning White Paper, these subjects are introduced using example tools that can be used on a permanent basis without charge.

5.1 Monitoring


The Java™ Platform comes with many monitoring facilities built in. Please see the document Monitoring and Management for the Java™ Platform for more information.

The most popular of these "built-in" tools are JConsole and the jvmstat technologies.
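
The same kind of high-level statistics can also be read programmatically through the java.lang.management API that ships with the platform. The sketch below is a minimal illustration, not a replacement for JConsole or jvmstat; the class name GcSnapshot is made up for this page, and the collector names it prints depend on which collector the JVM is running.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Prints a one-off snapshot of heap usage and GC activity (illustrative sketch). */
final class GcSnapshot {

    /** Bytes of Java heap currently in use. */
    static long heapUsedBytes() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
    }

    /** Total collections so far across all collectors that report a count. */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 means "undefined" for this collector
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("Heap used: " + heapUsedBytes() / (1024L * 1024L) + " MB");
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms total");
        }
        System.out.println("Total collections: " + totalCollections());
    }
}
```

Calling these methods periodically from inside the application gives a cheap way to compare GC activity between two sets of tuning options.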

5.2 Profiling


The Java™ Platform also includes some profiling facilities. The most popular of these "built-in" profiling tools are the -Xprof profiler and the HPROF profiler (for use with HPROF, see also the Heap Analysis Tool).

A profiler based on JFluid Technology has been incorporated into the popular NetBeans development tool.