WebLogic Server Performance Tuning Guide
Guidelines for Optimizing the Performance of Your WebLogic Server
This document contains information on how to tune the WebLogic Server to match your application needs. Specifically, tips and instructions are provided that will allow you to configure different operating parameters of the WebLogic Server. Information on how to tune your application running on the WebLogic Server is also provided.
This section defines terms related to performance and tuning that are used throughout this document.
Platform: The platform consists of all of the components that come together to compose the environment on which the WebLogic Server will run. This includes:
Response Time: Response time is the time that a single task takes to complete. The factors that determine response time for a given request are:
Throughput: Throughput refers to the amount of work completed in a fixed time period. There are many ways to think about throughput. For example, you might want to measure the throughput of a given server, the throughput of an entire cluster of servers operating in parallel, or even the throughput of a given client.
For the purposes of this document, we will commonly refer to throughput as the total number of requests handled by a given server or cluster of servers in a given time period. This number is commonly measured in transactions per second (tps).
Scalability: Scalability can be defined as the ability of a system to grow in one or more dimensions as more resources are added to the system. Typically, these dimensions include (among other things), the number of concurrent users that can be supported and the number of transactions that can be processed in a given unit of time.
Given a well-designed application, it is entirely possible to increase performance by simply adding more resources (for instance, an extra server). In this regard, the WebLogic Server excels. To increase your load-handling capabilities, simply add another WebLogic Server to your cluster -- without changing your application.
The steps for achieving the best performance for your WebLogic Server and your application are:
We will now go through each of these steps.
The word platform in this document refers to all of the components that are used to construct the environment in which the WebLogic Server and your application reside.
At the base level is the hardware and network configuration surrounding your WebLogic Server.
We see similar performance between clusters of smaller systems and a single WebLogic Server running on large-scale multiprocessor systems. However, clusters provide other benefits as well:
Where possible, you should use a Just-in-Time (JIT) compiler when running the WebLogic Server. JVMs with JITs include those from Sun Microsystems and Symantec. We have seen good performance improvements from the use of JITs.
NOTE: JITs offer greater performance, but also pose greater risks. These compilers are often not as error-free as standard JVMs. Be sure to test your application thoroughly before deploying on a JIT in a production environment.
The Java Virtual Machine is a key component determining performance of the WebLogic Server and customer applications. The Java Virtual Machine (JVM) version, vendor, and execution parameters all affect performance.
You should use only production JVMs on which the WebLogic Server has been certified. Also, we have seen that Java 2 exhibits better performance than Java 1.1.
The parameters that are set when running the JVM are very important. Basically, the factors that affect performance relate to the heap size specified when you execute your JVM.
The heap size determines how often, and for how long, you will spend doing garbage collection (de-allocating unused Java Objects). The following sections cover exactly how you should set these parameters:
Heap Size: The Java heap is a repository for live objects, dead objects and free memory. When the JVM runs out of memory in the heap, all execution in the JVM stops while a Garbage Collection (GC) algorithm goes through memory and frees space that is no longer required by an application. This is an obvious performance hit because your users must wait while GC happens. No server-side work can be done during GC.
To keep this performance degradation at a minimum, you should use the command line option -noclassgc. This will inhibit a thread that would normally clear out unused classes (thus saving the load incurred by that thread).
The goals of tuning your heap size are twofold: minimize the amount of time that you spend doing GC while maximizing the number of clients that you can handle at a given time.
Java 2 Heap Size: If you are using Java 2, these JDKs have much better garbage collecting facilities. This means that you should use as large a heap size as possible without causing your system to "swap" pages to disk (some implementations actually allocate as much as twice the heap size specified to support the copying half space). If you are booting your WebLogic Server and you see that your allocated "virtual" memory is more than your RAM can handle, you should lower your heap size under Java 2. Typically, use 80% of available RAM not taken by the operating system or other processes for your JVM running WebLogic Server.
Another approach is to set no initial heap size, so that the heap grows incrementally as it is needed. This theoretically leads to more cycles being spent on garbage collection but often increases the perceived speed of the application. Also, some customers have seen clients being dropped when the garbage collection cycle is extremely long because the heap size was set to 2 GB (leading to GC cycles taking longer than 30 seconds). For benchmarking purposes you might set the heap size to a high value to ensure that GC does not occur during the entire run of the benchmark, thus ensuring maximum performance. The garbage collectors in HotSpot, JView, and JDK 1.2 manage their heap size more efficiently than previous versions.
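To set the heap explicitly, pass the heap options on the java command line when booting the WebLogic Server. A sketch for a machine with roughly 512 MB available to the server follows; the 400m value is an illustrative assumption, and the exact flag spellings (-ms/-mx and -noclassgc under JDK 1.1, -Xms/-Xmx and -Xnoclassgc under most Java 2 VMs) vary by JVM vendor and version:

```shell
# Java 2 (JDK 1.2-style flags): fixed 400 MB heap, class GC disabled
java -Xms400m -Xmx400m -Xnoclassgc weblogic.Server

# JDK 1.1-style flags for the same settings
java -ms400m -mx400m -noclassgc weblogic.Server
```

Setting the initial and maximum sizes equal avoids heap-growth pauses during operation, at the cost of the incremental-growth behavior described above.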
Tip: To avoid the difficulty of tuning a single WebLogic Server under a 1.1 JVM, use a cluster of WebLogic Servers on a single computer. For example, if you have 2 GB of physical memory, place a cluster of eight WebLogic Servers on the box and assign each one 128 MB of RAM. That leaves about ½ GB for OS overhead. Because each WebLogic Server will GC at different times, your response blackouts due to GC will be less severe.
Java 1.1 Heap Size: JDK 1.1 has very poor garbage collecting facilities. This makes it more difficult to determine the heap size that you should use.
NOTE: The process detailed here for heap size tuning for JDK 1.1 is time-consuming and difficult. If you do not want to test your WebLogic Server running under load to optimize your heap size, you should set your heap size as large as possible without causing page swapping to disk during startup. Typically, this value is 80% of existing physical memory. Efficient WebLogic installations typically use between 128 MB and 384 MB of heap per processor.
In fact, under Java 1.1 you can only optimize your heap size by observing a running WebLogic Server under expected maximum load. Choosing the most effective heap size under Java 1.1 involves first deciding the longest pause that your deployment situation can stand. In other words, determine the maximum period of time that your clients can acceptably wait after making a request if garbage collection (GC) ensues on the server side. You need to know this because the server can do no work for your clients while it is collecting garbage.
For example, if your clients can wait no longer than 1 second for the server to do GC and respond to them, then you should tune your server so that the GC period lasts no more than 800 ms, leaving time to service the request itself.
So, you will need to monitor the performance of the WebLogic Server under maximum load while running your application. Measure the amount of memory allocated at any one time using the -verbosegc flag included with the JVM; this shows exactly how much time and how many resources are spent on garbage collection. If you are spending unacceptable periods of time in garbage collection, lower your heap size until the GC pauses become acceptable. At this point, you will have the maximum possible capacity for your WebLogic Server while keeping GC pauses within your limit. If you find that you have a large amount of RAM remaining, you should run more WebLogic Servers on your machine.
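One way to gather these measurements is to boot the server with -verbosegc and capture the GC messages, which most JVMs of this era print to stderr; the heap values here are illustrative:

```shell
# Boot WebLogic with GC logging; save the GC messages for later analysis
java -verbosegc -ms64m -mx64m weblogic.Server 2> gc.log
```

Each GC line reports roughly how much memory was freed and how long the collection took, which lets you correlate pause lengths with your chosen heap size.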
You should tune your operating system according to its operating procedures. Values that should be of interest include file descriptors and the maximum memory available for a user process.
In the case of file descriptors, each socket for the server consumes a file descriptor. As such, you need to configure your operating system to have as many file descriptors as you will need.
In the case of the maximum memory available for a user process, you should check your operating system documentation to see what the maximum allowed size is. In some operating systems, this value is as low as 128 MB.
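On most Unix systems you can inspect and raise the per-process descriptor limit from the shell before starting the server; the example value and the Solaris /etc/system parameters mentioned below are illustrative and vary by operating system:

```shell
# Show the current soft limit on open file descriptors for this shell
ulimit -n

# Raise the soft limit before booting the server (example value):
#   ulimit -n 1024
# On Solaris, system-wide limits can instead be raised via the
# rlim_fd_cur / rlim_fd_max entries in /etc/system (requires a reboot).
```

Remember that each socket the server opens consumes one descriptor, so size the limit to your expected peak connection count.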
Be sure that you have an appropriate amount of network bandwidth available for your WebLogic Server and its connections to other tiers in your architecture. Specifically, you should focus on obtaining a large enough "pipe" to support all of the connections that your WebLogic Server needs to make to clients and back-end resources such as databases. Be sure to consider your network bandwidth when optimizing the performance of the WebLogic server.
The weblogic.system.executeThreadCount value in the weblogic.properties file equals the number of simultaneous operations that can be performed by the WebLogic Server. As work enters a WebLogic Server, it is placed on an execute queue while waiting to be performed. The work is then assigned to a thread that performs it.
The default value is 15. For most applications, you should leave this value unchanged; increasing it usually yields only marginal benefit. If you are in doubt about this parameter, leave it at the default.
Adding more threads does not necessarily mean that you will be able to process more work. Even with more threads, you are still limited by the power of your processors. As such, you can degrade performance by increasing this value unnecessarily. Because threads are resources that consume memory, a very high execute thread count causes more memory to be used and more context switches. This will degrade your performance, as the following explanation illustrates:
Setting the executeThreadCount too high will cause too much context switching. The executeThreadCount value is more CPU related than WebLogic related, so the general rules of thumb regarding threads and CPUs apply. Assume that:
n = executeThreadCount (number of threads) and k = number of CPUs
The following scenarios are possible:
- If n is less than k, some processors sit idle and you are wasting CPU capacity.
- If n equals k, the threads exactly match the processors, but any thread that blocks leaves a CPU idle.
- If n is greater than k, the extra threads can cover for blocked threads, but too many extra threads cause excessive context switching.
For example, if you have 4 processors, then 4 threads can be running concurrently. So, you want the execute threads to be 4 + (the number of blocked threads).
This is very dependent upon the application. For instance, the length of time the application blocks its threads can invalidate the formula above. The value of executeThreadCount depends very much on the type of work the application does. For example, if your client application is thin and does most of its work through remote invocation, the time it spends connected will be greater than for a client application that does a lot of client-side processing.
If your application makes database calls that take a long time to return, you will need more execute threads than an application that makes short calls that turn over rapidly. For the latter, you can use a small number of execute threads and improve performance.
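Putting the n = k + (number of blocked threads) rule into practice, the entry in weblogic.properties might look like the following sketch; the 4-CPU machine and the estimate of 11 blocked threads are illustrative assumptions (15 also happens to be the documented default):

```properties
# weblogic.properties
# n = k + blocked threads: 4 CPUs + ~11 threads blocked on I/O at peak
weblogic.system.executeThreadCount=15
```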
It is also important to note that when the native performance packs are not being used, some of the execute threads will be used to read from the sockets (see weblogic.system.percentSocketReaders).
If your executeThreadCount is too low, you will see the following symptoms under maximum load on your server:
If your executeThreadCount is too high, you will see the following symptoms when running the WebLogic Server under maximum load:
You should use the performance packs for your platform. The performance packs use platform-optimized native I/O. Benchmarks show performance improvements of up to a factor of 3 for most workloads. For a list of currently available performance packs, please see: http://www.weblogic.com/docs/admindocs/tuning.html#perfpacks
NOTE: Only applicable if performance packs are NOT being used. If possible, use the Performance Pack for your platform. This parameter sets the maximum percentage of execute threads that are set to read messages from a socket. Allocating execute threads to act as socket reader threads increases the speed and the ability of the server to accept client requests. Again, an optimal value for this property is very application specific. It is essential to have a good balance between the number of execute threads that are devoted to reading messages from a socket and those that perform the actual execution of tasks in the server.
The default is 33, and the valid range is 1-99.
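As a sketch, the property is set in weblogic.properties alongside the execute thread count; the value 33 simply restates the default here:

```properties
# weblogic.properties -- only meaningful when performance packs are NOT in use
# Up to a third of the execute threads may act as socket readers (the default)
weblogic.system.percentSocketReaders=33
```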
During operations, if many connections are dropped or refused at the client and there are no other error messages at the server, the problem could be with the TCP backlog parameter. This parameter specifies how many TCP connections can be buffered in a wait queue -- a fixed-size queue, sized by this parameter, that holds connection requests the TCP stack has received but the application has not yet accepted.
If you are getting "connection refused" messages when you access the WebLogic Server, raise this number from the default by 25%. Continue increasing the value of weblogic.system.acceptBacklog by 25% until the messages cease to appear.
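For example, assuming a starting value of 50 (check your release's documentation for the actual default), successive 25% increases would round to 63, 79, 99, and so on:

```properties
# weblogic.properties -- raised one 25% step from an assumed default of 50
weblogic.system.acceptBacklog=63
```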
When EJBs are created, the bean instance is created and given an identity. When the client removes a bean, the bean instance will be placed in the free pool. A subsequent bean creation can avoid an object allocation by reusing the previous instance that is in the free pool. This property improves performance if there are frequent creation/deletions of EJBs.
Do not change this value unless you frequently create beans, perform a quick operation, and then throw them away. In that case, enlarge your free pool by 25-50% and see if performance improves. If object creation represents a small fraction of your workload, increasing this parameter will not gain you much. For applications where EJBs are very database intensive, do not change the value of this parameter.
Caution: do not go overboard. Tuning this parameter too high uses extra memory. Tuning it too low will cause unnecessary object creation. If you are in doubt about changing this parameter, leave it unchanged.
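As a sketch, the free pool size appears in the bean's WebLogic deployment descriptor; the exact descriptor syntax varies by WebLogic release, and the value here is illustrative:

```properties
# WebLogic EJB deployment descriptor fragment (syntax varies by release)
maxBeansInFreePool=100
```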
The WebLogic Server allows you to configure the number of active beans (with an identity) which are present in the EJB cache. This cache is the in-memory space where beans exist.
When a bean is brought into the cache, ejbActivate() is called; when it is removed, ejbPassivate() is called. This is analogous to virtual memory pages being kept in memory or swapped to disk. Tuning the cache too high will use up memory unnecessarily.
In general, the main idea behind setting an optimal value for maxBeansInCache is to avoid excessive passivation (the transfer of an EJB instance from memory to secondary storage) and activation (the transfer of an EJB instance from secondary storage to memory). As mentioned earlier, the EJB container performs passivation when it invokes the ejbPassivate method and when the EJB session object is needed again, it is recalled with the ejbActivate method. When the ejbPassivate() call is made, the EJB object is serialized using the Java serialization API or other similar methods and stored in secondary memory (disk). The ejbActivate() method causes just the opposite.
The container automatically manages this working set of session objects in the EJB cache without the client's or server's direct intervention. There are specific callback methods in each EJB that describe how to passivate (move out of the cache to secondary storage) or activate (bring back into the cache) these objects. Excessive activation and passivation will nullify the performance benefits of caching the working set of session objects in the EJB cache -- especially when the application has to handle a large number of session objects.
Activation and passivation of EJBs is analogous to virtual memory on a computer. You want to minimize the number of times that your beans are activated and passivated, and the cache size can help minimize this activity. To determine if your cache size should be bigger, take a number of execution snapshots. Look at these snapshots and see if there is a lot of passivation and activation going on. If so, increase the size of your cache and see if performance improves. Otherwise, leave this value alone.
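As with the free pool, the cache size is declared in the bean's WebLogic deployment descriptor; the syntax varies by release and the value below is illustrative:

```properties
# WebLogic EJB deployment descriptor fragment (syntax varies by release)
maxBeansInCache=500
```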
You cannot set this value to "false" if you are running in a cluster or if another process will modify or use the same database.
It is safe to have the number of database connections equal the execute thread count. Additionally, it might be worthwhile to have the number of database connections be one or two more than the number of execute threads. This way, the remaining threads can do work while the others are blocked waiting for the database.
Also, since connections in the pool are allocated on a one-per-transaction basis, long-lived transactions can tie up connections for a long time, meaning that you may need to increase your pool size. This has nothing to do with the execute thread count, but it must be taken into consideration. When the native performance packs are not being used, you should also take into account the number of threads dedicated to reading from the sockets.
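Tying these guidelines together, a connection pool definition in weblogic.properties might be sketched as follows; the pool name, driver class, credentials, and capacities are illustrative placeholders (here sized one or two above a 15-thread execute thread count):

```properties
# weblogic.properties -- illustrative pool sized slightly above the
# execute thread count (15) so threads rarely wait for a connection
weblogic.jdbc.connectionPool.demoPool=\
        url=jdbc:weblogic:oracle,\
        driver=weblogic.jdbc.oci.Driver,\
        initialCapacity=16,\
        maxCapacity=17,\
        props=user=scott;password=tiger;server=demo
```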
This section covers the settings and configuration that should be used for tuning clients to the WebLogic Server.
If you are using WebLogic RMI clients, there are more than two WebLogic Servers in a cluster, and the data being transferred is greater than 80 KB, you may encounter significant performance degradation (very long round-trip times for stateless session beans, for instance). The fix is to make a couple of property changes on the client side, as explained below.
The solution to this problem is to ensure that there are at least as many socket reader threads as there are connections to the server, while also allowing some extra threads for processing other tasks. This is accomplished by starting the client with the command-line argument -Dweblogic.system.percentSocketReaders set to a sufficiently high percentage (say 50) and by ensuring that there is a sufficient number of execute threads for other processing on the client. A rule of thumb of twice as many execute threads as there are servers in the cluster should work fine if the above percentage is 50. The command-line argument affecting the number of execute threads is -Dweblogic.system.executeThreadCount.
For instance, we could use:
"-Dweblogic.system.executeThreadCount=10" and "-Dweblogic.system.percentSocketReaders=50" when testing with 3 or 4 servers in a cluster.
The WebLogic Server only performs as well as the applications running inside of it. It is important to determine the bottlenecks that impede performance.
1. OptimizeIt (http://www.optimizeit.com): a good performance debugging tool for Solaris and NT.
2. JProbe (http://www.klg.com): a family of products that provide the capability to detect performance bottlenecks, perform code coverage analysis, and gather other metrics.
In addition, Java 2 has some excellent profiling tools. For more information please see http://java.sun.com/products/jdk/1.2/docs
These profilers can show you where you are spending the majority of time during your application's execution.
Use sessions sparingly. Sessions should only be used for state that cannot realistically be kept on the client or when URL-rewriting support is required. Use of sessions involves a scalability trade-off. Simple bits of state, such as a user's name, should be kept in cookies directly. If desired, you can write a wrapper class to do the getting and setting of these cookies in order to make life easier for other servlet developers working on the same project. The fewer accesses made to a session object the better; each is a costly operation. Keep frequently used values in local variables. Put aggregate objects, rather than multiple single objects, into the session where possible.
Because of the way the type-4 MS SQL driver is written, it may be much faster to create and execute an SQL statement without parameters, or with parameter values converted to their string counterparts and added as appropriate to the string, than to do a long series of setXXX() calls, followed by execute().
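A minimal sketch of the string-building approach follows; the table, column names, and escaping rule are illustrative assumptions, and in real code every string value must be escaped carefully to keep the generated SQL well-formed and safe:

```java
// Sketch: splice parameter values into the SQL text instead of using a
// long series of PreparedStatement setXXX() calls.
public class SqlStringExample {

    // Convert each value to its string form and append it to the statement.
    static String buildInsert(int id, String name) {
        // Double any single quotes so the string literal stays well-formed.
        return "INSERT INTO users (id, name) VALUES (" + id + ", '"
                + name.replace("'", "''") + "')";
    }

    public static void main(String[] args) {
        // The resulting text can then be run with Statement.execute(sql).
        String sql = buildInsert(42, "O'Brien");
        System.out.println(sql);
        // -> INSERT INTO users (id, name) VALUES (42, 'O''Brien')
    }
}
```

When you use this technique, remember that the driver is no longer converting values for you, so each parameter must be formatted and escaped by hand.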
Decide how many simultaneous connections you expect (there is never any reason to have more HTTP threads than HTTP connections), how many simultaneous requests you are likely to have, and how long the HTTP threads are likely to block while servicing requests. Arrange parameters for the minimum possible number of threads (but equal to or greater than the number of CPUs) such that there is never a situation where all threads are blocked while requests are pending. Assuming a constant number of connections and a constant request rate from those connections, this means that the more your servlets block, the more HTTP threads you will need.
Data accessibility is controlled through the transaction-isolation level mechanism, which determines the degree to which multiple interleaved transactions are prevented from interfering with each other in a multi-user database system. Transaction isolation is achieved through locking protocols that guide the reading and writing of transaction data. The strictest level, under which interleaved transactions behave as if they had executed one at a time, is known as "serializable." Lower isolation levels give you better database concurrency at the cost of less transaction isolation.
You should optimize your application so that it does as little work as possible when handling session persistence. In the case of the WebLogic Server, several options are available for Session Persistence including: in-memory replication and JDBC-based persistence.
In-memory replication is up to ten times faster than JDBC-based persistence for session state. You should use in-memory replication if possible.
If you are using JDBC-based persistence, you should optimize your code so that session state is persisted at as coarse a granularity as possible. This means reducing the number of "puts" that you perform during your HTTP session, because with JDBC-based persistence every session "put" results in a database write. You want to minimize how often information is persisted during a given session. Look at your "puts" and see if you can combine them into one large put instead.
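The "one large put" idea can be sketched as follows; a plain Map stands in for the HTTP session here, and the attribute names and the CartState class are hypothetical:

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Sketch: collapsing several session "puts" into a single aggregate put.
// A plain Map stands in for HttpSession; attribute names are hypothetical.
public class SessionPutExample {

    // One serializable aggregate instead of three separate attributes.
    static class CartState implements Serializable {
        final String userId;
        final int itemCount;
        final double total;
        CartState(String userId, int itemCount, double total) {
            this.userId = userId;
            this.itemCount = itemCount;
            this.total = total;
        }
    }

    // Under JDBC-based persistence each put() is a database write, so one
    // aggregate put costs one write instead of three.
    static Map<String, Object> storeCart(Map<String, Object> session) {
        session.put("cartState", new CartState("u42", 3, 29.97));
        return session;
    }

    public static void main(String[] args) {
        Map<String, Object> session = storeCart(new HashMap<String, Object>());
        System.out.println(session.size()); // prints 1 -- a single persisted attribute
    }
}
```

The same pattern applies with a real HttpSession: gather the values into one serializable object and call setAttribute once per request instead of once per value.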
If you followed the steps described here, you should now have a well-tuned WebLogic Server and application. You should remember, however, that performance tuning is an iterative science and requires time and effort. Experimentation with your given application and configuration is the best way to develop the best performing system possible on the WebLogic Server.
Copyright © 2000 BEA Systems, Inc. All rights reserved.