If you want to build and run faster Java applications on the IBM Developer Kit for Linux, version 1.3, roll up your sleeves and prepare to get dirty. This article provides hands-on instruction for profiling, monitoring, and performance tuning not only your IBM Developer Kit, but your hardware capacity, the Linux 2.2.x kernel, and your Java applications.
The IBM Developer Kit for Linux, Java 2 Technology Edition, version 1.3 (IBM Developer Kit 1.3) is a development kit and runtime environment that contains IBM's just-in-time (JIT) compiler, a high-performance interpreter, and a reengineered Java 2 virtual machine. The IBM Developer Kit 1.3 package provides all the tools you need to develop and run Java applications and applets on Linux:
- The Java Runtime Environment (JRE) -- consisting of the Java virtual machine (JVM), its supporting files, and the system memory available to the JVM -- allows you to run Java applets and applications.
- The Java interpreter executes programs written in the Java programming language.
- The JIT compiler converts Java byte code (.class files) into native machine code.
- Java class libraries define all the standard Java classes, allowing your Java applications to create and extend existing Java objects.
- Library and header files are used to build native applications that interface with Java and make use of the Java Native Interface (JNI).
- The applet viewer lets you run an applet from the command prompt.
- Demonstration applets demonstrate the proper use of Java constructs.
In this article we provide step-by-step performance tuning tips that can help your Java applications run faster on the IBM Developer Kit 1.3. For optimum application performance, you need to tune each of the following:
- Your hardware capacity
- The Linux 2.2.x kernel
- The IBM Developer Kit 1.3
- Your Java applications
Before we get started on performance tuning your hardware capacity, you'll want to get to know your Linux system. For that, we'll show you how to use the handy little filesystem called /proc.
Getting to know your Linux system
The Linux process filesystem, commonly known as /proc, can provide you with a wealth of information about the Linux kernel and the processes currently running on your system. /proc is used as an interface to access kernel data structures, and allows you to change select kernel variables. Most of the files contained in /proc are readable, so you can use the command cat or more to view their contents. Three /proc files are essential to getting to know your Linux system:
/proc/version lists information about the Linux kernel running on your system.
# cat /proc/version
Linux version 2.2.12-20 (root@porky.devel.redhat.com) (gcc version egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)) #1 Mon Sep 27 10:40:35 EDT 1999
/proc/meminfo lets you view the layout of your memory resources.
# cat /proc/meminfo
total: used: free: shared: buffers: cached:
Mem: 262938624 92098560 170840064 54906880 5857280 41869312
Swap: 682655744 0 682655744
MemTotal: 256776 kB
MemFree: 166836 kB
MemShared: 53620 kB
Buffers: 5720 kB
Cached: 40888 kB
SwapTotal: 666656 kB
SwapFree: 666656 kB
/proc/swaps tells you what you need to know about your swap file.
# cat /proc/swaps
Filename     Type        Size    Used  Priority
/dev/hda8    partition   666656  0     -1
You may also want to take a moment to explore the following files under /proc, although they are secondary to our purposes here:
- /proc/partitions lists all disk drive partitions on your system.
- /proc/cpuinfo lists characteristics of your system processor.
- /proc/pci lists information about the PCI devices on your system.
- /proc/interrupts lists information about the IRQs being used.
- /proc/dma lists information about the DMA channels being used.
- /proc/ioports lists the I/O port address ranges being used.
See the /proc man page to learn more about /proc.
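Because these files are plain text, their fields are easy to pull into a script. The following is a minimal sketch, assuming the "Name: value kB" layout shown in the /proc/meminfo listing above. The function name meminfo_kb and its optional file argument are our own inventions for illustration.

```shell
# meminfo_kb FIELD [FILE]
# Print the value (in kB) of one "Name: value kB" line from a
# /proc/meminfo-style file. FILE defaults to /proc/meminfo; the
# optional argument is a hypothetical convenience for testing.
meminfo_kb() {
    awk -v key="$1:" '$1 == key { print $2; exit }' "${2:-/proc/meminfo}"
}

# Example: report total and free physical memory on this system.
# The guard skips the report where /proc/meminfo does not exist.
if [ -r /proc/meminfo ]; then
    echo "MemTotal: $(meminfo_kb MemTotal) kB"
    echo "MemFree:  $(meminfo_kb MemFree) kB"
fi
```

The same approach works for any of the "Name: value" lines in the file, such as SwapTotal or Cached.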
Once you know what's going on under the hood of your Linux system, you'll be ready to begin tuning it for maximum performance. We'll begin by tuning your hardware capacity.
Tuning your hardware capacity
In this section we show you how to configure the two system elements that have the biggest impact on the performance of your hardware: memory/swap space and your hard drive. But first, you should know the general rules for achieving maximum hardware capacity, as follows:
- Use only the latest version of the firmware for your system, BIOS, disks, adapters, and controllers.
- Be sure you have sufficient memory and swap space to handle the workload.
- Use high-performance disks, adapters, and controllers.
- Balance file I/O by using different filesystems and different disks.
The right memory configuration for the job
If your application is not I/O- or network-bound, the simplest way to improve performance is to increase memory and processor speed. If your application is network-bound, check your network topology and network adapter card settings, since they can affect overall performance. In a client-server scenario, remember that outgoing response packets from a server are usually much larger than incoming request packets. Check the number of network adapters and the network utilization rate to balance network traffic.
In a server environment, check connection memory requirements such as kernel socket structure, TCP control block structure, and any socket buffer space for handling communications packets. Multiply the memory resources consumed by each connection by the total number of connections to estimate your server capacity for connection support.
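The capacity estimate is simple multiplication. As a rough sketch, the function below multiplies an assumed per-connection memory cost by a connection count. The 16 KB-per-connection figure in the example is a placeholder, not a measured value, so substitute numbers taken from your own system.

```shell
# connection_memory_kb PER_CONNECTION_BYTES CONNECTIONS
# Estimate the memory (in kB) needed to hold the given number of
# connections open. Both inputs are assumptions you must measure.
connection_memory_kb() {
    echo $(( $1 * $2 / 1024 ))
}

# Hypothetical example: 16384 bytes per connection, 5000 connections.
echo "Estimated connection memory: $(connection_memory_kb 16384 5000) kB"
```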
It can also be very helpful to create swap partitions on different disk drives. Doing so distributes I/O activities to different drives, which averts drive bottlenecks.
Configuring system memory and swap space
Swap space is disk space, either a partition or a file, that Linux uses as an extension of its virtual memory. Insufficient memory resources and/or swap space will cause severe performance problems, especially in a server environment. To configure memory and swap space you must determine how much physical memory your workload requires, especially at peak times; determine how much swap space you need; and configure the swap space in order to efficiently distribute the disk I/O.
The guideline is that swap space should be at least twice as big as physical memory. A swap space can be either a file or a disk (swap) partition. A swap partition is much more efficient and offers better performance than a swap file. If a swap partition was not created during installation, or if you want to create another swap partition, you can use fdisk or cfdisk to create one.
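To check a system against this guideline, compare twice the physical memory size with the configured swap size. The sketch below does so with shell arithmetic; the function name is hypothetical, and the sample figures are the memory and swap sizes reported by the /proc listings earlier in this article.

```shell
# swap_shortfall_kb MEM_KB SWAP_KB
# Print how many more kB of swap are needed to reach twice the
# physical memory size, or 0 if the guideline is already met.
swap_shortfall_kb() {
    want=$(( $1 * 2 ))
    if [ "$2" -ge "$want" ]; then
        echo 0
    else
        echo $(( want - $2 ))
    fi
}

# 256776 kB of memory and a 666656 kB swap partition.
swap_shortfall_kb 256776 666656
```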
How to create a swap partition
To create a partition, run the fdisk command. Choose option m (for menu) to display all options, followed by option n to add a new partition. Once a partition has been created, choose option w to write the changes to the hard drive and exit. Below is a list of all options available under the fdisk command.
Listing 1. Options available under fdisk
# fdisk /dev/hda
Command (m for help): m
Command action
a toggle a bootable flag
b edit bsd disklabel
c toggle the dos compatibility flag
d delete a partition
l list known partition types
m print this menu
n add a new partition
o create a new empty DOS partition table
p print the partition table
q quit without saving changes
s create a new empty Sun disklabel
t change a partition's system id
u change display/entry units
v verify the partition table
w write table to disk and exit
x extra functionality (experts only)
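To make your new partition a swap partition, use option t to set its system id to Linux swap (type 82), write the table, and then initialize the partition with the mkswap command as described below.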
How to create a swap file
To create a swap file, you need to do three things: set aside the space for the file, create the actual file, and enable it. We'll also show you how to disable the file should you need to.
In the example below, we use dd to fill a file called /swap from /dev/zero, writing 10,240 blocks of 1,024 bytes each for a total of 10 MB.
# dd if=/dev/zero of=/swap bs=1024 count=10240
10240+0 records in
10240+0 records out
mkswap is the command that sets up the swap area on the /swap file:
# mkswap -c /swap 10240
Setting up swapspace version 0, size = 10481664 bytes
swapon will turn it on:
# swapon /swap
And swapoff will turn it off:
# swapoff /swap
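The steps above can be collected into one script. The sketch below only prints the commands for review rather than running them, since they require root and modify the system. The function name and the megabyte-based size argument are assumptions made for illustration.

```shell
# swapfile_commands PATH SIZE_MB
# Print (do not run) the commands that create and enable a swap
# file of SIZE_MB megabytes, using 1024-byte blocks as in the text.
swapfile_commands() {
    blocks=$(( $2 * 1024 ))
    echo "dd if=/dev/zero of=$1 bs=1024 count=$blocks"
    echo "mkswap -c $1 $blocks"
    echo "swapon $1"
}

# A 10 MB swap file at /swap, matching the example above.
swapfile_commands /swap 10
```

Once you are satisfied with the printed commands, run them one at a time as root.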
Now, let's have a look at that hard drive.
How to configure your hard drive
Linux provides a tool called hdparm that allows you to get and set hard drive parameters such as filesystem read ahead count, drive look ahead, drive write cache, drive multisector access, and drive look ahead prefetch count. Below are all the options that can affect the performance of your hard drive.
Listing 2. hdparm options that can affect the performance of your hard drive
-a get/set fs readahead
-A * set drive read-lookahead flag (0/1)
-c * get/set IDE 32-bit IO setting
-d * get/set using_dma flag
-m * get/set multiple sector count
-n * get/set ignore-write-errors flag (0/1)
-p * set PIO mode on IDE interface chipset (0,1,2,3,4,...)
-P * set drive prefetch count
-q change next setting quietly
-r get/set readonly flag (DANGEROUS to set)
-S * set standby (spindown) timeout
-t perform device read timings
-T perform cache read timings
-u * get/set unmaskirq flag (0/1)
-v default; same as -acdgkmnru (-gr for SCSI)
-W * set drive write-caching flag (0/1) (DANGEROUS)
-X * set IDE xfer mode (DANGEROUS)
* = IDE drives only
For our own hard-drive tuning exercise, we timed the device read and cache read operations before and after setting the 32-bit I/O option. In this way, we could determine the option's performance impact on our IDE drive.
Before making the change, hdparm -tT yielded the following results:
# hdparm -tT /dev/hda
/dev/hda:
Timing buffer-cache reads: 128 MB in 1.20 seconds =106.67 MB/sec
Timing buffered disk reads: 64 MB in 20.54 seconds = 3.12 MB/sec
Setting 32-bit I/O support resulted in the following output:
# hdparm -c1 /dev/hda
/dev/hda:
setting 32-bit I/O support flag to 1
I/O support = 1 (32-bit)
And here are the results obtained from the same test after setting 32-bit I/O:
# hdparm -tT /dev/hda
/dev/hda:
Timing buffer-cache reads: 128 MB in 1.18 seconds =108.47 MB/sec
Timing buffered disk reads: 64 MB in 13.89 seconds = 4.61 MB/sec
Not bad! Now try setting a couple of options for yourself.
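To put a number on the gain, compare the buffered disk read rates from the two runs. The helper below is a small sketch of that calculation; its name is our own, and it relies on awk only because the shell's built-in arithmetic handles integers.

```shell
# percent_change BEFORE AFTER
# Print the relative change from BEFORE to AFTER as a percentage
# with one decimal place.
percent_change() {
    awk -v b="$1" -v a="$2" 'BEGIN { printf "%.1f\n", (a - b) / b * 100 }'
}

# Buffered disk reads (MB/sec) before and after enabling 32-bit I/O.
percent_change 3.12 4.61
```

For the figures above this reports an improvement of roughly 48 percent in buffered disk reads.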
To use 32-bit I/O over the PCI bus, enter the command:
# hdparm -c 1 /dev/hda
To enable DMA, enter:
# hdparm -d 1 /dev/hda
Test the results of your changes by using option -t (device read timings) and/or -T (cache read timings):
# hdparm -t /dev/hda
If you wish to keep the new settings across an IDE hard drive reset, enter:
# hdparm -k 1 /dev/hda
Once you've set your system swap and memory and configured your hard drive, you need to begin monitoring and diagnosing the results. We'll show you how in the next section.
Monitoring system load and hardware performance
Linux provides several utilities for monitoring and diagnosing performance problems. We've found three of them indispensable:
- vmstat displays information about processes, virtual memory usage, I/O, and CPU usage activity.
- top provides process monitoring in real time.
- netstat displays the status of the system's network connections.
vmstat for virtual memory statistics
The vmstat command provides data that can help you look for unusual system activity such as high page faults or excessive context switches, which can degrade your system performance. We ran vmstat on our test system and got the following output:
Listing 3. Output of vmstat
# vmstat
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 0 0 940 88724 76504 41628 0 0 2 0 137 222 2 4 93
0 0 0 940 88656 76504 41628 0 0 0 3 228 421 3 5 93
0 0 0 940 88656 76504 41628 0 0 205 1 543 105 1 32 67
Here's how the information breaks down:
- procs shows the number of processes that are ready and running (r), blocked (b), or swapped out (w).
- memory shows the amounts of swap in use (swpd), free (free), buffered (buff), and cached (cache) memory in kilobytes.
- swap shows in kilobytes per second the amount of memory swapped in (si) from disk and swapped out (so) to disk.
- io shows the number of blocks per second read in from (bi) and written out to (bo) block devices.
- system shows the number of interrupts (in) and context switches (cs) per second.
- cpu shows the percentage of total CPU time in terms of user (us), system (sy), and idle (id) time.
Use vmstat to monitor and evaluate system activity. For example, if the value for free is small and accompanied by high values for swap (si and so), you have excessive paging and swapping due to physical memory shortage.
If the value of so is consistently high, you may have insufficient physical memory or swap space. Use the free command to see your memory and swap space configurations. Use the swapon -s command to display your swap device configuration. Use the iostat command to see which disk is being used the most.
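When vmstat runs for a long time, it helps to filter its output down to the samples that show swap activity. The filter below is one way to do that, offered as a sketch rather than a standard tool. The function name is ours, and it finds the si and so columns by their header names so that it does not depend on the exact column layout.

```shell
# swapping_rows [LIMIT]
# Read vmstat output on standard input and print only the data rows
# whose combined swap-in and swap-out rate exceeds LIMIT (default 0).
swapping_rows() {
    awk -v limit="${1:-0}" '
        # Skip lines until the header row naming si and so is seen.
        !found {
            for (i = 1; i <= NF; i++) {
                if ($i == "si") si = i
                if ($i == "so") so = i
            }
            if (si && so) found = 1
            next
        }
        ($si + $so) > limit { print }
    '
}

# Typical use: sample every 5 seconds, 10 times.
# vmstat 5 10 | swapping_rows
```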
top for process statistics
The top command provides an updating overview of all running processes and the system load. Here's what top told us about our system:
Listing 4. Output of top
# top
4:04pm up 1 day, 18 min, 1 user, load average: 0.06, 0.10, 0.15
49 processes: 48 sleeping, 1 running, 0 zombie, 0 stopped
CPU states: 0.3% user, 1.7% system, 0.0% nice, 97.8% idle
Mem: 256776K av, 117792K used, 138984K free, 49796K shrd, 30516K buff
Swap: 666656K av, 940K used, 665716K free 41620K cached
PID USER PRI NI SIZE RSS SHARE STAT LIB %CPU %MEM TIME COMMAND
2130 root 10 0 2904 2904 2248 S 0 1.1 1.1 0:03 magicdev
2188 root 17 0 1020 1020 820 R 0 0.9 0.3 0:00 top
1 root 0 0 460 460 388 S 0 0.0 0.1 0:04 init
2 root 0 0 0 0 0 SW 0 0.0 0.0 0:00 kflushd
. . . . .
Here's how top's output breaks down, line by line:
- Line 1 shows the system uptime: the current time, how long the system has been up since the last reboot, the current number of users, and three load average numbers which represent the average numbers of processes ready to run during the previous 1, 5, and 15 minutes.
- Line 2 shows the process statistics, which include the total number of processes running at the time of the last top screen update. This line also shows the number of sleeping, running, zombie, and stopped processes.
- Line 3 displays CPU statistics, which contain percentage of CPU time used by the user, system, niced, and idle processes.
- Line 4 provides memory statistics, which include total available memory, free memory, used memory, shared memory, and memory used for buffers.
- Line 5 shows swap statistics, which include total available swap space, used swap space, and free swap space. The cached figure at the end of the line is the amount of memory used for the page cache.
Use top to identify which processes are using the most resources. In the above example, it is magicdev. The process that consumes the second-most resources is the top utility.
To change the update intervals under top, use option s and enter the desired number of seconds between each update. To sort the processes by memory utilization, use option M. To exit top, use option q.
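The load averages on top's first line are also available to scripts. On Linux they are the first three fields of /proc/loadavg, a file this article has not otherwise covered. The sketch below reads them; the function name and the optional file argument are hypothetical.

```shell
# load_averages [FILE]
# Print the 1-, 5-, and 15-minute load averages, one per line.
# FILE defaults to /proc/loadavg, whose first three fields hold
# the same figures that top shows on its first line.
load_averages() {
    awk '{ printf "1 min: %s\n5 min: %s\n15 min: %s\n", $1, $2, $3 }' \
        "${1:-/proc/loadavg}"
}

# Report the current values where /proc/loadavg is available.
if [ -r /proc/loadavg ]; then
    load_averages
fi
```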
netstat for network statistics
You can use netstat to view information about active network connections and diagnose your TCP/IP networking problems. netstat displays a list of active sockets for each network protocol: TCP, UDP, RAW, or UNIX. It also provides information about network routes, and cumulative statistics for network interfaces, including the number of incoming and outgoing packets and the number of packet collisions.
Running netstat on our system returned the following results:
Listing 5. Output of netstat
# netstat
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address       Foreign Address          State
tcp        1      0 xxx.austin.:48912   vip-flfc04-sj.flyca:www  CLOSE_WAIT
tcp        0      0 xxx.austin.:48913   204.50.109.96:www        CLOSE_WAIT
Active UNIX domain sockets (w/o servers)
Proto RefCnt Flags  Type    State      I-Node   Path
unix  9      [ ]    DGRAM              947      /dev/log
unix  3      [ ]    STREAM  CONNECTED  1811690  /tmp/.X11-unix/X0
unix  3      [ ]    STREAM  CONNECTED  1811689
unix  3      [ ]    STREAM  CONNECTED  1810757
unix  3      [ ]    STREAM  CONNECTED  1810756
unix  3      [ ]    STREAM  CONNECTED  1810755
unix  3      [ ]    STREAM  CONNECTED  1810754
unix  3      [ ]    STREAM  CONNECTED  1810742  /tmp/.ICE-unix/825
unix  3      [ ]    STREAM  CONNECTED  1810741
unix  3      [ ]    STREAM  CONNECTED  1810739  /tmp/.X11-unix/X0
. . . . .
Each line in the Internet connections portion of the netstat output contains the following fields:
- Proto lists the protocol in use.
- Recv-Q and Send-Q display the number of bytes in the receive and transmit queues.
- Local Address displays the local host and port.
- Foreign Address displays the remote host and port.
- State displays the state of the connection.
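On a busy server the connection list is too long to read by eye, so a summary by state is often more useful. The following sketch counts TCP connections per state; the function name is our own, and it assumes the state is the last field on each tcp line, as in the listing above.

```shell
# tcp_state_counts
# Read netstat output on standard input and print the number of
# TCP connections in each state, one "STATE count" pair per line.
tcp_state_counts() {
    awk '$1 ~ /^tcp/ { count[$NF]++ }
         END { for (s in count) print s, count[s] }' | sort
}

# Typical use (-t selects TCP sockets, -n skips name lookups):
# netstat -t -n | tcp_state_counts
```

A large and growing CLOSE_WAIT count, for example, suggests an application that is not closing its sockets.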
We have shown you three Linux commands you can use to view the working state of your system hardware at any given moment. By understanding your hardware capacity, you can knowledgeably tweak its parameters to create optimum system behavior, which will in turn benefit your application performance.
Tuning your Linux 2.2.x kernel
In this section we'll introduce five tuning tricks to maximize the performance of your Linux kernel, as follows:
- Monitor and set process priority
- Disable unused system services
- Optimize runtime
- Suppress the last-access timestamp
- Increase the number of allowed processes
Monitoring and setting process priority
Linux daemon processes are started by the init process at boot time. In this way, init is a parent to other processes. These processes can in turn launch further processes, called child processes. When a process launches another process, it uses the fork system call to create the new process. In a heavily loaded system with multiuser support, thousands of processes may be running at a given time. As a result, you need to efficiently control and allocate resources to achieve optimum performance.
You can control which processes are run at boot time by configuring /etc/inittab. To start, run the ps command, which will display all processes running on your system. Listing 6 shows a truncated version of the output generated for our system using option -axf, which yields a tree view of all system processes.
Listing 6. Output of ps -axf (truncated)
# ps -axf
PID TTY STAT TIME COMMAND
1 ? S 0:04 init
2 ? SW 0:00 [kflushd]
3 ? SW 0:01 [kupdate]
4 ? SW 0:00 [kpiod]
5 ? SW 0:00 [kswapd]
6 ? SW < 0:00 [mdrecoveryd]
287 ? S 0:00 portmap
303 ? S 0:00 /usr/sbin/apmd -p 10 -w 5 -W
356 ? S 0:00 syslogd -m 0
367 ? S 0:00 klogd
383 ? S 0:00 /usr/sbin/atd
399 ? S 0:00 crond
419 ? S 0:00 inetd
435 ? S 0:00 lpd
473 ? S 0:00 sendmail: accepting connections on port 25
490 ? S 0:00 gpm -t ps/2
507 ? S 0:00 xfs -droppriv -daemon -port -1
. . . . .
781 pts/1 S 0:00 \_ bash
822 pts/1 S 0:00 \_ /usr/java/jre/bin/exe/java ExceptionTimes 10000
827 pts/1 S 0:00 \_ /usr/java/jre/bin/exe/java ExceptionTimes 10000
828 pts/1 S 0:00 \_ /usr/java/jre/bin/exe/java ExceptionTimes 1000
829 pts/1 S 0:00 \_ /usr/java/jre/bin/exe/java ExceptionTimes 1000
830 pts/1 S 0:00 \_ /usr/java/jre/bin/exe/java ExceptionTimes 1000
You can lower the priorities of noncritical tasks and daemons using the command nice or snice. Linux has two priority numbers associated with each process: PRI (priority) and NI (nice). PRI is the actual process priority, which is dynamically computed by Linux. Nice is the requested process execution priority number, which can be set by you or root to influence the PRI number. The valid range of process priority is from -20 (highest priority) to 19 (lowest). Only root can increase a process priority.
To raise the priority of process process_id by setting its nice value to -5, you would enter:
# snice -5 process_id
To lower its priority by setting its nice value to 5, you would enter:
# snice +5 process_id
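snice acts on processes that are already running. To start a noncritical job at reduced priority in the first place, you can use the standard nice command instead. The sketch below wraps it in a function whose name is our own invention.

```shell
# run_low_priority INCREMENT COMMAND [ARGS...]
# Start COMMAND with its nice value raised by INCREMENT, that is,
# at a lower scheduling priority than the calling shell.
run_low_priority() {
    inc=$1
    shift
    nice -n "$inc" "$@"
}

# With no command of its own, nice prints the current nice value,
# so this shows the value a child process would run at.
run_low_priority 5 nice
```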
Disabling unused services
Every open service uses system resources, so it's a good idea to disable all unused services and daemons, particularly those that are network related. To disable a service, simply comment out the service name in the /etc/inetd.conf file. You must restart inetd services after modifying /etc/inetd.conf.
The following services are installed and enabled on your Linux system by default:
- FTP
- Telnet
- Gopher
- rsh (remote shell)
- rlogin (remote login)
- rexec (remote exec)
- talk
- ntalk
- POP2
- POP3
- IMAP
- finger
- tcp (time)
- udp (time)
- ident (auth)
- Web services for linuxconf
You should remove all unused applications from your system, including multimedia applications, scripting languages, file editors, and server software such as Samba, NFS, and NIS.
Use rpm with option -qa to obtain a list of all installed packages:
# rpm -qa > /tmp/all.packages
Use option -qi to find out more about a specific package:
# rpm -qi package_name
Use option -e to remove a package:
# rpm -e package_name
Optimizing runtime
The Linux /proc/sys filesystem offers many settings that can be easily configured or disabled without recompiling the kernel. Try configuring the following options for runtime optimization:
Increase the maximum number of open file handles and the inode count. For example, to set the file-handle limit to 4096 and the inode limit to 12288, use the commands
# echo 4096 > /proc/sys/fs/file-max
# echo 12288 > /proc/sys/fs/inode-max
Disable TCP timestamps. They are enabled by default (value 1); to turn them off, use the command
# echo 0 > /proc/sys/net/ipv4/tcp_timestamps
Disable TCP selective acknowledgments using the command
# echo 0 > /proc/sys/net/ipv4/tcp_sack
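When you change several of these settings at once, it is useful to record the old value of each and to skip files that are absent on your kernel. The helper below is a sketch of that idea, not part of any standard tool; its name and output format are assumptions.

```shell
# set_kernel_value FILE VALUE
# Write VALUE to a /proc/sys-style file and report the change.
# Skips files that are missing or not writable (for example when
# not running as root) instead of failing.
set_kernel_value() {
    if [ -w "$1" ]; then
        old=$(cat "$1")
        echo "$2" > "$1"
        echo "$1: $old -> $2"
    else
        echo "$1: not writable, skipped"
    fi
}

# Typical use, as root:
# set_kernel_value /proc/sys/net/ipv4/tcp_timestamps 0
# set_kernel_value /proc/sys/net/ipv4/tcp_sack 0
```

Keep in mind that values written under /proc/sys are lost at reboot, so the commands must be rerun from a startup script if you want them to persist.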
Suppressing the last-access timestamp
In addition to timestamps that let you see when files were created and last modified, Linux also provides a timestamp of the last access time (reading and writing) for a file. This information is generally irrelevant, and suppressing it can result in significant performance improvement.
The ext2 filesystem allows root to mark individual files, directories, or an entire filesystem such that the last access time is not recorded. Use the chattr command to set the noatime attribute on a file.
# chattr +A file_name
You can set the noatime attribute recursively on all files below a given directory.
# chattr -R +A directory_name
Additionally, entire partitions can be mounted with the noatime option. Simply edit /etc/fstab, adding noatime to the options column, separated from the other options by a comma with no surrounding spaces.
# device    mount point  type  options           dump frequency  fsck pass number
/dev/hda8   /            ext2  defaults,noatime  1               2
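After remounting, you can confirm which filesystems actually carry the option by inspecting /proc/mounts, which uses the same column order as /etc/fstab. The function below is a small sketch of such a check; its name and optional file argument are hypothetical.

```shell
# noatime_mounts [FILE]
# Print the mount point of every filesystem mounted with the
# noatime option. FILE defaults to /proc/mounts.
noatime_mounts() {
    awk '{
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] == "noatime") { print $2; break }
    }' "${1:-/proc/mounts}"
}

# List the noatime filesystems on this system, if any.
if [ -r /proc/mounts ]; then
    noatime_mounts
fi
```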
Increasing the allowed number of running processes
The file /usr/src/linux/include/linux/tasks.h lets you define the number of processes that can run on your Linux system. The default value for the Red Hat Linux 6.1 on which we ran our configurations is 2560 (#define NR_TASKS 2560). You can change this value to run up to 4096 processes. To do so, however, you would need to recompile the kernel, an explanation of which is well beyond the scope of this article.
With your hardware working at full capacity and your Linux kernel configured to your liking, you're ready to get started on the IBM Developer Kit 1.3. In the next section we'll show you how to get the most out of your IBM Developer Kit 1.3 by tuning the JVM. We show you how to access all configurable parameters of the JVM, but focus on the one that can most affect performance: the Java heap. We also show you how to disable class garbage collection and discuss the pros and cons of using the JIT compiler.
Tuning your IBM Developer Kit 1.3
The performance of your Java application can be improved by tuning several parameters of the JVM. You can set JVM parameters with the Java program launcher, java. The command-line arguments for java are as follows:
# java
Usage: java [-options] class [args...]
(to execute a class)
or java -jar [-options] jarfile [args...]
(to execute a jar file)
Listing 7 shows all the options available under java.
Listing 7. Options available under java
The following performance-related JVM parameters are considered nonstandard under the Java 2 specification and are subject to change without notice. Use option -X under java to view these parameters:
Listing 8. Parameters considered nonstandard under Java 2
# java -X
-Xbootclasspath:<directories and zip/jar files separated by ;>
set search path for bootstrap classes and resources
-Xbootclasspath/a:<directories and zip/jar files separated by ;>
append to end of bootstrap class path
-Xbootclasspath/p:<directories and zip/jar files separated by ;>
prepend in front of bootstrap class path
-Xnoclassgc disable class garbage collection
-Xms<size> set initial Java heap size
-Xmx<size> set maximum Java heap size
-Xrs reduce the use of OS signals
-Xcheck:jni perform additional checks for JNI functions
-Xcheck:nabounds perform additional checks for JNI array operations
-Xrunhprof[:help]|[:<option>=<value>, ...]
perform heap, cpu, or monitor profiling
-Xdebug enable remote debugging
-Xfuture enable strictest checks, anticipating future default
Note that a number of options that were available under the IBM Developer Kit version 1.1.8 are not available under version 1.3. The option -noasyncgc and options to deal with the Java stack size and the native thread stack size (such as -oss and -ss) are unavailable under IBM Developer Kit 1.3.
The Java heap
In the latest version of the IBM Developer Kit, the Java heap is managed differently than it has been in previous releases. Rather than simply manage heap size to yield a fixed fraction of free space, the JVM actually measures garbage collection overhead throughout operation. This allows free space to be increased beyond normal targets when allocation rates are especially high. In this way the JVM can respond to the actual needs of the program, rather than a simpler but less accurate estimate. Care has also been taken to ensure the heap size is expanded by a useful fraction, avoiding the need for frequent expansion events during periods of increasing memory pressure. When heap utilization is low, memory is returned to the operating system by shrinking the heap.
Setting the heap size
If you are running many processes (including JVMs) on your system, you must analyze your application's heap size requirement and then use the -Xms and -Xmx parameters to set smaller heaps. -Xms sets the initial heap size and -Xmx sets the maximum heap size. The memory you allocate to one application will not be available to other applications, so make your calculations with care.
To set the initial heap size of HelloWorld.class to 64 megabytes and the maximum heap to 128 megabytes, enter the command:
# java -Xms64m -Xmx128m HelloWorld
Once you've set the heap size for your application, use the -verbosegc option to test the results. Listing 9 is an example of the output of -verbosegc.
Listing 9. Output of -verbosegc
The output in Listing 9 shows the three phases of garbage collection: mark, sweep, and compaction. It also shows the reference objects and their types: soft, weak, final, and phantom. Each type corresponds to a different level of reachability and is used for implementing different types of objects. The first garbage-collection action shows it moved 3130 objects, freeing 190424 bytes; it also shows the time spent for each of the three phases: mark (3 milliseconds), sweep (1 millisecond), and compaction (6 milliseconds).
In general, compaction takes the most time. The last line shows the total number of bytes moved during garbage collection (33138928 bytes) and the total elapsed running time of garbage collection (2322 milliseconds). For a detailed explanation of garbage collection, see Resources.
Heap size and physical memory
Make sure that you never allow the heap size to be larger than physical memory. The maximum heap size that you set with the option -Xmx should always be less than the total physical memory size minus the working set size of the other processes on the system.
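The rule amounts to a subtraction. The sketch below spells it out with hypothetical figures: 256 MB of physical memory and an assumed 96 MB working set for everything else on the machine. Measure the real working set on your own system with the monitoring tools described earlier.

```shell
# heap_ceiling_mb PHYSICAL_MB OTHERS_MB
# Print the upper bound for -Xmx: physical memory minus the
# working set of everything else running on the machine.
heap_ceiling_mb() {
    echo $(( $1 - $2 ))
}

# Hypothetical example: 256 MB of RAM, 96 MB used by other processes.
ceiling=$(heap_ceiling_mb 256 96)
echo "Keep -Xmx below ${ceiling}m"
```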
An overly large setting for the Java heap will cause more harm than good. In the following example, we ran an application with the settings -Xms128m -Xmx196m. Since our test system has only 256 megabytes of memory, only 128 megabytes were left for the system to share with other applications after Java had been initialized. The vmstat data in Listing 10 clearly indicates that memory was a constraint and heavy paging occurred (the so column shows 840 KB per second being swapped out).
Listing 10. Output of vmstat, indicating an overlarge heap setting
# vmstat 5 10
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
1 0 0 940 23296 76508 41800 0 0 4 0 140 222 2 4 93
1 0 0 4472 3152 50508 22440 0 840 0 215 1841 39 69 23 9
0 0 0 4472 3144 50508 22440 0 0 0 0 122 19 91 2 6
0 0 0 4472 3144 50508 22440 0 0 0 0 121 18 0 3 97
Heap size and garbage collection
Heap size is dynamically grown and shrunk based on current system memory resources and workload demand. The JVM will expand the heap when less than a certain percentage of the heap is free after a garbage collection. In the IBM Developer Kit 1.3, this value is set at 30 percent. When necessary, the JVM grows the heap to reach this required free-space threshold. This heap management solution yields three important benefits: fast object allocation, low contention on heap lock, and minimal heap fragmentation.
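To see what the 30 percent target means in practice, suppose the live data after a collection occupies U megabytes. For at least 30 percent of a heap of size H to be free, H must be at least U / 0.70. The sketch below does that arithmetic in whole megabytes. It illustrates the threshold only; it is not a description of how the JVM itself computes each expansion, and the function name is our own.

```shell
# heap_for_free_target USED_MB FREE_PERCENT
# Print the smallest heap size (in MB, rounded up) at which
# USED_MB of live data leaves at least FREE_PERCENT of the heap free.
heap_for_free_target() {
    keep=$(( 100 - $2 ))
    echo $(( ($1 * 100 + keep - 1) / keep ))
}

# 70 MB of live data with the 30 percent target.
heap_for_free_target 70 30
```

With 70 MB of live data, the heap must reach 100 MB before 30 percent of it is free.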
Manually controlling heap expansion after GC
Manually controlling heap expansion after garbage collection can improve application performance. The following four options, introduced in the IBM Developer Kit, version 1.1.8, allow manual control of garbage-collected heap expansion and shrinkage.
- -mine sets the minimum size of each expansion of the heap. This option also sets the minimum size of free object space in the heap. The heap will be returned to the OS only when the amount of free object space in the heap space is more than "mine" bytes.
- -maxe sets the maximum size of each expansion of the heap when more memory is required.
- -minf sets the approximate percentage of minimum free space in the heap. This controls the rate of heap expansion. After expansion, approximately "minf" percent of the heap should be free. The actual amount of free space after expansion is dependent on "minf" and fragmentation of free spaces within the heap.
- -maxf sets the approximate percentage of maximum free heap. If the percentage of free object space in the heap is more than "maxf" and the amount of free object space in the heap is more than "mine", and free object space is located at the top of the heap, the garbage collector will attempt to shrink the heap by returning a portion of free object space back to the OS. The heap size will not shrink below its initial size.
The default heap values are set as follows for the IBM Developer Kit 1.3:
- -Xmine: 1 MB
- -Xmaxe: 4 MB
- -Xminf: 0.30
- -Xmaxf: 0.60
As discussed above, IBM Developer Kit 1.3 aims to keep a minimum of 30 percent of the heap free, so it tends to expand more aggressively than IBM Developer Kit 1.1.8. Also note that the growth ratio is not set.
Try the following settings:
- Use -Xmaxf if you have a long-running application and your working set size is variable. Set -Xmaxf to 0.5 for more aggressive shrinking.
- Set -Xminf to 0.1 and -Xmaxf to 0.3 for conservative memory use.
- Set -Xminf to 0.15 and -Xmaxf to 0.25 to save more space (at the expense of throughput).
- Set -Xminf above 0.35 and -Xmaxf to 1.0 for aggressive heap growth and no shrinkage.
Avoid these two combinations:
- -Xmaxf = -Xminf
- -Xmaxf + -Xminf = 1 or more
If you mistakenly set java -Xminf0.5 -Xmaxf0.3 HelloWorld, for example, you will receive the following warning:
Incompatible minimum and maximum heap free percents specified:
minimum: 0.500000, maximum: 0.300000
The minimum must be at least .05 less than the maximum.
The default initial and maximum are 0.300000 and 0.600000.
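You can catch this mistake before launching the JVM. The following sketch checks only the rule quoted in the warning, that the minimum must be at least .05 less than the maximum; the JVM may apply further checks that are not covered here. The function name is hypothetical.

```shell
# check_free_fractions MINF MAXF
# Succeed only if MINF is at least 0.05 below MAXF. Values are
# compared in thousandths to avoid floating-point rounding problems.
check_free_fractions() {
    awk -v lo="$1" -v hi="$2" 'BEGIN {
        exit !(int(hi * 1000 + 0.5) - int(lo * 1000 + 0.5) >= 50)
    }'
}

# The mistaken pair from the example above.
if check_free_fractions 0.5 0.3; then
    echo "0.5/0.3: accepted"
else
    echo "0.5/0.3: rejected"
fi
```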
Disabling class garbage collection
If you want to enable more class reuse, disable class garbage collection. To turn off class garbage collection, use the option -Xnoclassgc. By default, the Java interpreter reclaims space for unused Java classes during garbage collection.
Pros and cons of using the JIT compiler
Applications written in the Java programming language are translated by the Java compiler (javac) into an architecture-independent distribution format, called Java byte code, which is interpreted by a JVM during execution on a hardware platform. While the use of byte code and JVM allows the "write-once, run-anywhere" capability that makes Java so attractive, the execution by interpretation imposes a substantial performance penalty. One way to alleviate this performance penalty is to use the JIT compiler, which converts Java byte code sequences on the fly, rendering equivalent sequences of the underlying machine's native code.
The downside of using the JIT compiler is that compilation overhead adds to the Java program's execution time. To alleviate this, the JIT compiler is invoked only for compute-intensive methods, so the time spent compiling them is recovered through faster runtime execution.
Despite the program execution overhead, we recommend you use the JIT compiler. On the IBM Developer Kit 1.3, the JIT compiler is turned on by default. If you want to disable the JIT compiler, use the argument -Djava.compiler=NONE or -Djava.compiler=" " upon launching the java command, as in the example below.
java -Djava.compiler=NONE HelloWorld
or java -Djava.compiler=" " HelloWorld
To learn more about the performance tradeoffs of the JIT compiler, see Resources.
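A quick way to see the tradeoff for yourself is to time a small compute-bound loop with and without the JIT compiler. The class below is a hypothetical test program (JitTiming and sumOfSquares are names invented here), and the loop counts are arbitrary; it is a rough sketch, not a rigorous benchmark.

```java
class JitTiming {
    // A compute-intensive method of the kind the JIT compiler is meant to speed up.
    static long sumOfSquares(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += (long) i * i;
        }
        return sum;
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        long result = 0;
        // Repeat the call so the method becomes hot enough to be worth compiling.
        for (int round = 0; round < 200; round++) {
            result = sumOfSquares(1000000);
        }
        long elapsed = System.currentTimeMillis() - start;
        // Shows the value of the property used above to switch the JIT off.
        System.out.println("java.compiler=" + System.getProperty("java.compiler"));
        System.out.println("result=" + result + " elapsed ms=" + elapsed);
    }
}
```

Run it once as java JitTiming and once as java -Djava.compiler=NONE JitTiming, then compare the elapsed times.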
Profiling application performance
You can learn a lot about a Java application by running an execution profile of its CPU usage, heap, and monitor contention. For this purpose, the Java 2 Platform introduced the JVM Profiling Interface (JVMPI), a set of hooks that enables you to capture an application's execution profile. Once captured, the data can be processed by a profiling tool called hprof. hprof can also report complete heap dumps and apprise you of the state of every monitor and thread in the JVM. In this section we show you how to capture and process an application's execution profile.
To begin, run java with the option -Xrunhprof:help. This will give you a list of suboptions available under hprof and their default values.
Listing 11. Output of java -Xrunhprof:help
# java -Xrunhprof:help
Hprof usage: -Xrunhprof[:help]|[<option>=<value>, ...]
Option Name and Value   Description               Default
---------------------   -----------               -------
heap=dump|sites|all     heap profiling            all
cpu=samples|times|old   CPU usage                 off
monitor=y|n             monitor contention        n
format=a|b              ascii or binary output    a
file=<file>             write data to file        java.hprof(.txt for ascii)
net=<host>:<port>       send data over a socket   write to file
depth=<size>            stack trace depth         4
cutoff=<value>          output cutoff point       0.0001
lineno=y|n              line number in traces?    y
thread=y|n              thread in traces?         n
doe=y|n                 dump on exit?             y
Example: java -Xrunhprof:cpu=samples,file=log.txt,depth=3 FooClass
hprof's output contains the following types of records:
- THREAD START/THREAD END: Marks the lifetime of Java threads.
- TRACE: Represents a Java stack trace. Each trace consists of a series of stack frames. Other records refer to TRACE to identify where object allocations have taken place, the frames in which garbage-collected roots were found, and frequently executed methods.
- HEAP DUMP: A complete snapshot of all live objects in the Java heap. The following distinctions apply:
  - ROOT: root set as determined by GC
  - CLS: classes
  - OBJ: instances
  - ARR: arrays
- SITES: A sorted list of allocation sites. This identifies the most heavily allocated object types and the trace at which those allocations occurred.
- CPU SAMPLES: A statistical profile of program execution. The JVM periodically samples all running threads and assigns a quantum to active traces in those threads. Entries in this record are traces ranked by the percentage of total quanta they consumed; top-ranked traces are typically hot spots in the program.
- CPU TIME: A profile of program execution obtained by measuring the time spent in individual methods (excluding the time spent in callees), as well as by counting the number of times each method is called. Entries in this record are traces ranked by the percentage of total CPU time. The "count" field indicates the number of times each trace is invoked.
- MONITOR TIME: A profile of monitor contention obtained by measuring the time spent by a thread waiting to enter a monitor. Entries in this record are traces ranked by the percentage of total monitor contention time and a brief description of the monitor. The "count" field indicates the number of times the monitor was contended at a given trace.
- MONITOR DUMP: A complete snapshot of all the monitors and threads in the system.
A detailed explanation of performance analysis using records available in hprof is beyond the scope of this article, so we will simply get you started with a couple of introductory examples. See Resources for a more detailed explanation of performance analysis using hprof.
Example 1. Using hprof to track down a memory hog
In this example, we are trying to find the memory leak in the application MemoryBenchmark. We begin by taking a snapshot of the Java heap for the application in question (MemoryBenchmark) and saving the output into a text file called log1.txt. To do so, we use the following command:
java -Xrunhprof:heap=all,file=log1.txt MemoryBenchmark
The saved output in log1.txt contains thousands of lines detailing the Java stack trace, the heap dump, and a sorted list of allocated objects. The heap dump output begins with the statement "HEAP DUMP BEGIN". Here is the saved heap dump output:
Listing 12. Output of HEAP DUMP
HEAP DUMP BEGIN (3417 objects, 191340 bytes) Fri Oct 6 11:22:14 2000
ROOT 0 (kind=<thread>, id=0, trace=1)
ROOT 812b370 (kind=<thread>, id=3, trace=1)
ROOT 8125e68 (kind=<thread>, id=2, trace=1)
ROOT 8121658 (kind=<thread>, id=1, trace=1)
ROOT 8258c80 (kind=<native stack>, thread=0)
. . .
The first line of the output above reports the number of live objects in the Java heap (3417) and their total size (191340 bytes). As discussed in the previous section, you can use this total to choose your initial Java heap size.
The SITES record is an ordered list of allocated objects. The top 10 objects are shown in Listing 13.
Listing 13. Output of SITES
SITES BEGIN (ordered by live bytes) Fri Oct 6 11:22:14 2000
            percent          live     alloc'ed stack class
 rank   self  accum    bytes objs   bytes objs trace name
    1 17.13% 17.13%    32776    2   32776    2  1161 [C
    2  8.57% 25.70%    16392    2   16392    2  1159 [B
    3  8.30% 33.99%    15872   92   16148  105     1 [C
    4  4.22% 38.21%     8068    1    8068    1    29 [S
    5  3.81% 42.02%     7288    4    7288    4  1040 [C
    6  2.94% 44.95%     5616   36    5616   36     1 <Unknown>
    7  2.36% 47.32%     4524   29    4524   29     1 java/lang/Class
    8  1.93% 49.25%     3692    1    3692    1   784 [L<Unknown>;
    9  1.32% 50.57%     2532    1    2532    1    30 [I
   10  1.07% 51.64%     2052    1    2052    1  1145 [B
The first line shows that two live objects of class [C (character array) were found on the heap, consuming a total of 32776 bytes. The reference key for these objects, listed under the column "stack trace", is 1161. With this data we ran a search for trace 1161, which pointed to the source file BufferedWriter.java, where the objects were allocated.
Listing 14. Results of a trace search
TRACE 1161:
java/io/BufferedWriter.<init>(BufferedWriter.java:96)
java/io/BufferedWriter.<init>(BufferedWriter.java:79)
java/io/PrintStream.<init>(PrintStream.java:87)
Continuing the search for trace 1161 in the HEAP DUMP record, we found two objects allocated, each with a size of 16388 bytes; the objects were identified as shown in Listing 15.
Listing 15. Results of a trace search in HEAP DUMP
ARR 8191bd8 (sz=16388, trace=1161, nelems=8192, elem type=1char)
ARR 8192338 (sz=16388, trace=1161, nelems=8192, elem type=1char)
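Searching a log of several thousand lines by hand gets tedious, so the two lookups above can be scripted. The following is a rough sketch under stated assumptions: TraceSearch and its methods are hypothetical helpers, not part of hprof; the parser assumes that a TRACE record ends at a blank line or at the next record header, and the list of record headers is our guess based on the record types named earlier. The code uses generics and so targets a current JDK rather than the 1.3 kit.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class TraceSearch {
    // Assumed prefixes that start a new record in hprof ASCII output.
    private static final String[] RECORD_HEADERS = {
        "TRACE ", "THREAD ", "HEAP DUMP", "SITES ", "CPU ", "MONITOR "
    };

    static boolean isRecordHeader(String line) {
        for (String header : RECORD_HEADERS) {
            if (line.startsWith(header)) {
                return true;
            }
        }
        return false;
    }

    // Returns the stack frames listed under "TRACE <traceId>:".
    static List<String> framesForTrace(List<String> lines, int traceId) {
        List<String> frames = new ArrayList<String>();
        String wanted = "TRACE " + traceId + ":";
        boolean inTrace = false;
        for (String raw : lines) {
            String line = raw.trim();
            if (inTrace) {
                if (line.length() == 0 || isRecordHeader(line)) {
                    break; // end of the record we were collecting
                }
                frames.add(line);
            } else if (line.equals(wanted)) {
                inTrace = true;
            }
        }
        return frames;
    }

    // Returns heap dump lines (ROOT, OBJ, ARR, ...) that carry trace=<traceId>.
    static List<String> heapEntriesForTrace(List<String> lines, int traceId) {
        List<String> matches = new ArrayList<String>();
        String key = "trace=" + traceId;
        for (String raw : lines) {
            String line = raw.trim();
            int at = line.indexOf(key);
            if (at < 0) {
                continue;
            }
            int end = at + key.length();
            // Reject longer ids such as trace=11610 when looking for 1161.
            if (end == line.length() || !Character.isDigit(line.charAt(end))) {
                matches.add(line);
            }
        }
        return matches;
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = new ArrayList<String>();
        BufferedReader in = new BufferedReader(new FileReader(args[0]));
        try {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        } finally {
            in.close();
        }
        int traceId = Integer.parseInt(args[1]);
        System.out.println("Frames: " + framesForTrace(lines, traceId));
        System.out.println("Heap entries: " + heapEntriesForTrace(lines, traceId));
    }
}
```

For the log in this example you would run java TraceSearch log1.txt 1161.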
As you can see from our first example, you can use hprof to not only identify which objects are using the most memory, but also to trace back to the original source code of a memory-consuming object. If memory reduction were your performance goal, as in the example above, you would reduce the number of elements of type "1char" from 8192 to a lower number to meet your requirements.
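In code you control, the buffer size is set when the writer is constructed. The sketch below uses the standard BufferedWriter(Writer, int) constructor; the class name SmallBufferWriter and the size of 1024 characters are assumptions chosen for illustration. Note that the allocation in trace 1161 happens inside the PrintStream constructor in the class library, which application code cannot resize, so this applies only to writers your own code creates.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

class SmallBufferWriter {
    // Hypothetical size for illustration; Listing 15 shows 8192 elements per buffer.
    static final int BUFFER_CHARS = 1024;

    // Wraps any Writer in a BufferedWriter with the smaller buffer.
    static BufferedWriter wrap(Writer out) {
        return new BufferedWriter(out, BUFFER_CHARS);
    }

    public static void main(String[] args) throws IOException {
        StringWriter sink = new StringWriter();
        BufferedWriter writer = wrap(sink);
        writer.write("hello");
        writer.flush(); // push buffered characters through to the sink
        System.out.println(sink.toString());
        writer.close();
    }
}
```

A smaller buffer saves memory at the cost of more frequent writes to the underlying stream, so it is worth re-profiling after the change.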
In addition to memory utilization profiling, hprof also provides information on CPU usage via two options. The first option, cpu=samples, periodically samples the running threads and reports how often each trace was observed. The second option, cpu=times, reports the execution time of methods, ranking the heaviest users first, as shown in our second example.
Example 2. Monitoring CPU time
CPU TIME tells where execution time is spent in your application, helping you to detect and tune out resource hogs. A typical CPU TIME listing is shown in Listing 16. Note that CPU utilization is ranked to help you quickly identify the resource hog.
Listing 16. Output of CPU TIME
# java -Xrunhprof:cpu=times myclass
CPU TIME (ms) BEGIN (total = 290) Mon Oct 09 14:32:36 2000
rank   self  accum   count trace method
   1  6.90%  6.90%       1    22 sun/io/ByteToCharISO8859_1.convert
   2  6.90% 13.79%     106    19 java/io/BufferedReader.readLine
   3  3.45% 17.24%     105     5 java/lang/String.
   4  3.45% 20.69%       1    17 java/util/Collections.
   5  3.45% 24.14%     286    16 java/util/jar/Attributes$Name.isValid
   6  3.45% 27.59%       1    12 sun/misc/URLClassPath.access$100
   7  3.45% 31.03%       1    10 java/lang/System.initializeSystemClass
   8  3.45% 34.48%       1     9 java/io/Win32FileSystem.normalize
   9  3.45% 37.93%       1     8 sun/misc/Launcher$ExtClassLoader.getExtClassLoader
  10  3.45% 41.38%       1    21 java/security/Security.loadProviders
A profiling tool like hprof merely offers a window into your application's resource usage. It is up to you to analyze the results and tune your application for maximum performance. We'll get you started on application performance tuning in the next section.
Tuning the application
Tuning your hardware capacity, the Linux kernel, and the IBM Developer Kit 1.3 will go a long way toward improving your application performance. But ultimately the performance potential of your applications rests on whether you employ good design and programming techniques in your development effort. Before you even begin testing and tuning your Java application, you must be sure you've written clean, working, and well-documented code.
In this section, we'll focus on basic Java coding tips, provide some information about code optimization and IBM Developer Kit 1.3, and close with a discussion of the performance benefits of using java archive (jar) files.
Coding tips
Please note that the coding tips below function as a general checklist for writing clean, high-performance code. See Resources for a listing of more in-depth articles on Java programming techniques.
- Use local variables whenever possible, since the scope of variables can impact performance.
- Use int (32-bit) instead of long (64-bit).
- Use arrays instead of vectors.
- Use local variables in loops.
- Use primitive types such as int and double instead of objects. Using primitive types for variables sidesteps the costs of object creation and manipulation, saving significant amounts of memory and processor bandwidth.
- For thread switching, you can use either wait()/notify() or wait()/notifyAll() according to the needs of your algorithm. The performance cost for each is the same under IBM Developer Kit 1.3.
- Use exceptions only when necessary, since exception handling exacts a high performance cost.
- Implement object reuse as much as possible. Try to create few new objects.
- Avoid writing to the console. In production code, only write system-critical information to the console. Although helpful in debugging, writing to the console involves a great deal of string manipulation, text formatting, and serial output. These are typically slow operations even when the console is not displayed.
- Cache frequently used objects whenever possible. Caching objects eliminates the time needed to allocate new objects, and also reduces the frequency of garbage collection. In applications, cache frequently used objects for reuse. In libraries, keep pools of new objects available.
- Before bundling all of your code into a jar file (discussed below), profile its execution to identify which code is being executed. Bundle only the most frequently executed code.
- Declare methods as final. Classes and methods that aren't going to be redefined should be declared as final. The JVM, compilers, and JITs can all optimize final classes and methods better by removing dynamic method invocation.
- Declare constants as static final. The JVM, compilers, and JITs can all optimize static final variables for speed by reducing stack manipulation, and JVMs can use immediate operations.
- Limit synchronized methods. Use synchronized methods only when necessary and at an appropriate level of granularity. Try to use algorithms that don't use locks or hold locks for short periods of time. Locks can have a high performance overhead, limit the effectiveness of threading, and cause priority inversions.
- Null old object references. When an object is no longer needed, set all references to the object to null and clear all internal system references, instead of waiting to reassign or cache the referencing variable. The efficiency of gc() is roughly the time spent collecting abandoned objects divided by the sum of that time and the time spent examining objects that are still referenced. Also, if you discover memory leaks, you can find their cause more easily once unneeded references have been removed.
- Cache with soft references. Using the correct reference type (soft, weak, phantom), along with an optional reference queue, allows the application to maintain extra data and the garbage collector to reclaim these weakly held objects when memory gets tight.
- Cache information that will be reused and is expensive to generate, such as INI files and network information. Some basic operations such as reading files and retrieving information from a network are expensive. In such cases, the added memory requirement of caching is more than made up for.
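Several of these tips can be combined in a small cache. The sketch below holds expensive-to-generate values through java.lang.ref.SoftReference so the garbage collector may reclaim them under memory pressure; the class is declared final and keeps its constant static final, in line with the tips above. SoftCache is a hypothetical class written for this article's checklist, it is not thread-safe, and it uses generics, so it targets a current JDK rather than the 1.3 kit.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

final class SoftCache {
    // Illustrative starting capacity; declared static final as the tips suggest.
    static final int INITIAL_CAPACITY = 16;

    private final Map<String, SoftReference<Object>> entries =
            new HashMap<String, SoftReference<Object>>(INITIAL_CAPACITY);

    // Stores the value behind a soft reference so the collector may reclaim it.
    void put(String key, Object value) {
        entries.put(key, new SoftReference<Object>(value));
    }

    // Returns the cached value, or null if it was never stored or has been collected.
    Object get(String key) {
        SoftReference<Object> ref = entries.get(key);
        if (ref == null) {
            return null;
        }
        Object value = ref.get();
        if (value == null) {
            entries.remove(key); // the referent was collected; drop the stale entry
        }
        return value;
    }

    public static void main(String[] args) {
        SoftCache cache = new SoftCache();
        String settings = "timeout=30"; // stands in for data that is expensive to load
        cache.put("app.ini", settings);
        System.out.println("cached: " + cache.get("app.ini"));
        System.out.println("missing: " + cache.get("other.ini"));
    }
}
```

Callers must always be prepared for get to return null and regenerate the value, since a soft reference can be cleared at any collection when memory is short.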
A note about code optimization
In previous implementations of the JDK, code performance could be optimized by using the javac -O (optimization) option. Unfortunately, this option no longer works under IBM Developer Kit 1.3. The good news is that the JIT compiler automatically implements several architecture-specific optimizations, such as code scheduling and register allocation. As a result, looping and arithmetic operations are much faster under IBM Developer Kit 1.3 than they were under version 1.1.8. See Resources for more information about code optimization and the JIT compiler.
Using jar files for faster loading
When your applet is executed, all of its classes must be loaded, linked, and initialized. The more classes you have, the slower the files will load -- especially if the classes have to be loaded one file at a time. A simple way to reduce load time is to compress all the classes into a java archive format file, commonly known as a jar. The jar format allows a Java applet and all of its associated class files to be compressed as a single unit. As a result, they can be downloaded by a browser in a single HTTP transaction (rather than by establishing a new connection as each file is needed).
During applet execution, members of a jar file are downloaded in their compressed format and then dynamically decompressed. You create jar files using the jar tool that comes with the IBM Developer Kit 1.3. Enter command jar to receive the following list of options:
Listing 17. Options under jar
# jar
Usage: jar {ctxu}[vfm0M] [jar-file] [manifest-file] [-C dir] files ...
Options:
-c create new archive
-t list table of contents for archive
-x extract named (or all) files from archive
-u update existing archive
-v generate verbose output on standard output
-f specify archive file name
-m include manifest information from specified manifest file
-0 store only; use no ZIP compression
-M do not create a manifest file for the entries
-i generate index information for the specified jar files
-C change to the specified directory and include the following file
Directory files will be processed recursively. The manifest file name and the archive file name must be specified in the same order that the m and f flags are specified. Creating jar files is fairly simple; just follow the examples below.
To archive two class files into a jar called classes.jar, enter the command
jar cvf classes.jar Foo.class Bar.class
To use an existing manifest file (mymanifest) and archive all the files in a directory (foo/) into the archive classes.jar, enter the command
jar cvfm classes.jar mymanifest -C foo/ .
To bundle a number of existing files (file1.html, file2.html, etc.) into a jar, enter
# jar cvf file.jar file1.html file2.html file3.html
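If you need to build archives from inside a program rather than from the command line, the java.util.jar package offers the same capability. The following is a minimal sketch, assuming the entries are already in memory; JarSketch, createJar, and listEntries are names invented for this example, and the archive is written to a byte array instead of a file to keep the sketch self-contained. It uses generics, so it targets a current JDK rather than the 1.3 kit.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.JarEntry;
import java.util.jar.JarInputStream;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

class JarSketch {
    // Builds a jar in memory from parallel arrays of entry names and contents.
    static byte[] createJar(String[] names, byte[][] contents) throws IOException {
        Manifest manifest = new Manifest();
        manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        JarOutputStream jar = new JarOutputStream(bytes, manifest);
        try {
            for (int i = 0; i < names.length; i++) {
                jar.putNextEntry(new JarEntry(names[i]));
                jar.write(contents[i]);
                jar.closeEntry();
            }
        } finally {
            jar.close(); // finishes the archive and flushes it to the byte array
        }
        return bytes.toByteArray();
    }

    // Lists the entry names found in a jar, much like the t option of the jar tool.
    static List<String> listEntries(byte[] jarBytes) throws IOException {
        List<String> names = new ArrayList<String>();
        JarInputStream in = new JarInputStream(new ByteArrayInputStream(jarBytes));
        try {
            JarEntry entry;
            while ((entry = in.getNextJarEntry()) != null) {
                names.add(entry.getName());
            }
        } finally {
            in.close();
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        byte[] jar = createJar(
                new String[] {"file1.html", "file2.html"},
                new byte[][] {"<p>one</p>".getBytes(), "<p>two</p>".getBytes()});
        System.out.println(jar.length + " bytes, entries: " + listEntries(jar));
    }
}
```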
At runtime, the jar file is downloaded through the specification of the ARCHIVE= attribute in the HTML code:
<APPLET ARCHIVE = "file.jar" CODE = "myclass" WIDTH= . . . HEIGHT= . . . >
<PARAM NAME = "parm1" VALUE = "value1">
</APPLET>
Conclusion
This article provides basic start-to-finish tuning techniques that you can use to take the first steps toward faster Java application performance. For optimum application performance, employ sound object-based design from the beginning of your software development cycle and measure and tune application performance at regular intervals. As you improve your understanding of how each element of the Java platform works together, you will naturally develop increasingly robust, high-performance Java applications.