Performance tuning your JDBC application
Overview

The concepts involved with Java, JDBC, and DBMS processing are new to many programmers. As Java becomes more widely used, database access and database applications will become increasingly easy to implement. This document provides some tips on how to obtain the best performance from JDBC applications.
WebLogic performance-enhancing features

WebLogic has several features that enhance performance for JDBC applications.
Connection Pools

Establishing a JDBC connection with a DBMS can be very slow. If your application requires database connections that are repeatedly opened and closed, this can become a significant performance issue; creating a database connection is often one of the most expensive operations an application performs. WebLogic connection pools offer an efficient solution to this problem.
When the WebLogic Server is started, a connection pool is created containing connections that are independent of any single client. Unlike applications that open and close a new connection for each query, a connection from a connection pool may be used by any client; and its lifetime is not tied in any way to the lifetime of a client. When a client closes a connection from a connection pool, the connection is returned to the pool and becomes available again for other clients, but the connection itself is not closed. Since the connections already exist, there is no overhead involved with opening and closing the connection, which improves overall performance.
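The reuse that makes pooling fast can be sketched in plain Java. This is only an illustration of the idea, not WebLogic's implementation; a real WebLogic pool is configured in weblogic.properties and hands out java.sql.Connection objects, while the Conn class here is a hypothetical stand-in for a connection that is expensive to open.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Minimal illustration of the connection-reuse idea behind a pool.
public class PoolSketch {
    static class Conn {
        final int id;
        Conn(int id) { this.id = id; }   // imagine an expensive DBMS handshake here
    }

    private final BlockingQueue<Conn> idle;

    public PoolSketch(int size) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(new Conn(i));       // connections are opened once, up front
        }
    }

    // A client "opens" a connection by borrowing one that already exists,
    // waiting if all connections are currently in use.
    public Conn getConnection() {
        try {
            return idle.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a connection", e);
        }
    }

    // "Closing" returns the connection to the pool instead of tearing it down.
    public void release(Conn c) {
        idle.add(c);
    }
}
```

The key point is that getConnection and release move an already-open connection between clients; no open/close overhead is paid per request.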
How many connections should you create in the pool? A connection pool can grow and shrink, at a configured interval, up to a maximum number of connections. Performance is best when the connection pool is just large enough to service users without waits.
You can set properties in the weblogic.properties file that tune how the WebLogic Server handles client requests. You should make sure that WebLogic is configured with enough execute threads to service both incoming client requests and connections in the connection pool. A good rule to follow is:
ExecuteThreads >= UserConnections + ConnPoolSize
where ExecuteThreads is the value of the property weblogic.system.executeThreadCount, UserConnections is the number of concurrent users you expect (which should correspond to the limit of connections you have configured with the weblogic.system.maxConnections property), and ConnPoolSize is the number of connections in the connection pool. You can reduce the number of execute threads to suit less powerful hardware, but some of your clients may then have to wait for service.
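The rule of thumb above is simple arithmetic; the numbers below are hypothetical values for illustration.

```java
// Sanity check for the execute-thread rule of thumb described above.
// Substitute your own configured values for the hypothetical numbers.
public class ThreadSizing {
    // Minimum weblogic.system.executeThreadCount suggested by the rule
    // ExecuteThreads >= UserConnections + ConnPoolSize.
    public static int minExecuteThreads(int userConnections, int connPoolSize) {
        return userConnections + connPoolSize;
    }

    public static void main(String[] args) {
        int users = 40;      // expected concurrent users (see maxConnections)
        int poolSize = 10;   // connections in the connection pool
        System.out.println("executeThreadCount should be at least "
                + minExecuteThreads(users, poolSize));
    }
}
```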
For more information read Using the WebLogic JDBC t3 Driver: Using connection pools.
Caching data in WebLogic Workspaces
Accessing a DBMS is an operation that uses considerable resources. If your program accesses frequently used data that can be shared among applications or can persist between connections, you can store the data in a WebLogic Workspace. Data in a Workspace can be obtained and altered quickly by any application, and either kept private or shared, without repeatedly querying the DBMS.
WebLogic provides bidirectional asynchronous service to any client application. You can write a server-side class that queries the DBMS and maintains Workspace contents for applications. Your server-side class can generate events that notify client applications of changes in Workspace contents.
Speeding up applets with archives
The JDBC protocol is also resource-intensive. An applet that uses JDBC directly will need to download many classes, which takes time. WebLogic has utilities that scan an HTTP server log and create a zip file of the classes an applet uses. Referencing this zip file in the ARCHIVE attribute of your applet tag will improve performance when the applet first starts.
An even faster alternative is to avoid JDBC in the applet when possible, and obtain DBMS data in HTML form from servlets instead. Servlets can run queries for the applet, and/or retrieve data from Workspaces and supply it as HTML. When these servlets work in concert with WebLogic processes that asynchronously maintain the data, the performance of your applications will improve.
Row caching: WebLogic JDBC and Oracle
WebLogic JDBC is not just a passive proxy for JDBC calls. It will, by default, pre-fetch up to 25 rows for any query sent by a WebLogic client, and supply them in an array to the WebLogic JDBC driver. The driver can then make next() calls without each next() call requiring a round-trip to WebLogic and to the DBMS and back. This 25-row default can be altered via the property weblogic.t3.cacheRows, and can also be changed by the client via a call to the Connection.cacheRows(int) method.
Oracle also provides array fetching to its clients, and jDriver for Oracle supports this feature. By default, jDriver for Oracle will array-fetch up to 100 rows from the DBMS. This number can be altered via the property weblogic.oci.cacheRows.
By using the above methods, a WebLogic JDBC query for 100 rows will make only 4 calls from the client to WebLogic, and for only one of those will WebLogic actually go all the way to the DBMS for data.
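The round-trip counts above follow from ceiling division of the row count by each cache size; a quick sketch:

```java
// Round-trip arithmetic for the row-caching example above: the client driver
// fetches in chunks of weblogic.t3.cacheRows (default 25), while jDriver for
// Oracle array-fetches weblogic.oci.cacheRows rows (default 100) per DBMS trip.
public class RowCacheMath {
    public static int roundTrips(int rows, int chunkSize) {
        return (rows + chunkSize - 1) / chunkSize;   // ceiling division
    }

    public static void main(String[] args) {
        int rows = 100;
        System.out.println("client-to-WebLogic calls: " + roundTrips(rows, 25));   // 4
        System.out.println("WebLogic-to-DBMS fetches: " + roundTrips(rows, 100));  // 1
    }
}
```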
Designing your application for best performance
The large majority of the performance to be gained or lost in a DBMS application is not in the application language, but in how the application is designed. The number and location of clients, size and structure of DBMS tables and indexes, and the number and types of queries all affect application performance.
1. Streamline data before the result crosses the network
Most serious performance problems in DBMS applications come from moving raw data around needlessly, whether it is across the network or just in and out of cache in the DBMS. A good method for minimizing this waste is to put your logic where the data is -- in the DBMS, not in the client -- even if the client is running on the same box as the DBMS. In fact, for some DBMSes a fat client and a fat DBMS sharing one CPU is a performance disaster.
Most DBMSes provide stored procedures, an ideal tool for putting your logic where your data is. There is a significant difference in performance between a client that calls a stored procedure to update 10 rows, and another client that fetches those rows, alters them, and sends update statements to save the changes to the DBMS.
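As a sketch of the stored-procedure approach, the following uses the standard JDBC {call ...} escape syntax. The procedure name give_raise and its parameters are hypothetical, and running giveRaise of course requires a live Connection to a DBMS where such a procedure exists.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;

// Calling a stored procedure through JDBC: the update logic runs inside the
// DBMS, next to the data, instead of fetching rows to the client and back.
public class ProcCall {
    // Builds the JDBC escape-syntax call string for a procedure.
    public static String callSql(String procName, int paramCount) {
        StringBuilder sb = new StringBuilder("{call ").append(procName).append("(");
        for (int i = 0; i < paramCount; i++) {
            sb.append(i == 0 ? "?" : ", ?");
        }
        return sb.append(")}").toString();
    }

    // One round-trip updates all matching rows inside the DBMS, rather than
    // fetching them, altering them in the client, and sending updates back.
    public static void giveRaise(Connection conn, String dept, double pct)
            throws SQLException {
        try (CallableStatement cs = conn.prepareCall(callSql("give_raise", 2))) {
            cs.setString(1, dept);
            cs.setDouble(2, pct);
            cs.executeUpdate();
        }
    }
}
```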
You should also review the DBMS documentation on managing cache memory in the DBMS. Some DBMSes (Sybase, for example) provide the means to partition the virtual memory allotted to the DBMS, and to guarantee certain objects exclusive use of some fixed areas of cache. This means that an important table or index can be read once from disk and remain available to all clients without having to access the disk again.
2. Use built-in DBMS set-based processing
SQL is a set-processing language. DBMSes are designed from the ground up to do set-based processing. Accessing a database one row at a time is, without exception, slower than set-based processing, and on some DBMSes row-at-a-time access is poorly implemented. For example, it will always be faster to issue one set-based update per table for all 100 employees represented in four tables than to alter each table 100 times, once for each employee.
Understanding set-based methodology can be very useful. Many complicated processes that were originally thought too complex to do any other way but row-at-a-time have been rewritten using set-based processing, resulting in improved performance. For example, a major payroll application was converted from a huge slow COBOL application to four stored procedures running in series, and what took hours on a multi-CPU machine now takes fifteen minutes with many fewer resources used.
3. Make your queries as smart as possible
Frequently customers ask how to tell how many rows will be coming back in a given result set. This is a valid question, but there is no easy answer. The only way to find out without fetching all the rows is to issue the same query using the COUNT aggregate function:
SELECT count(*) from myTable, yourTable where ...
This will return the number of rows the original query would have returned. The actual count may change when the query is issued if there has been any other DBMS activity which alters the relevant data.
You should be aware, however, that this is a resource-intensive operation. Depending on the original query, the DBMS will have to perform nearly as much work to count the rows as it will to send them.
Your application should tailor its queries to be as specific as possible about what data it actually wants. Tricks include first selecting into temporary tables, returning only the count, and then sending a refined second query to return only a subset of the rows in the temporary table.
Learning to select only the data you really want at the client is crucial. Some applications ported from ISAM will unnecessarily send a query selecting all the rows in a table when only the first few rows are really wanted. Some applications use an 'order by' clause to get the rows they want to come back first. Database queries like this cause unnecessary degradation of performance.
Proper use of SQL can avoid these performance problems. For example, if you only want data about the top 3 earners on the payroll, the proper way to make this query is with a correlated subquery, rather than fetching the whole table ordered by salary and discarding all but the first few rows:

SELECT p.name, p.salary FROM payroll p
WHERE 3 > (SELECT count(*) FROM payroll q WHERE q.salary > p.salary)

This query will return only 3 rows, with the name and salary of the top 3 earners (assuming no ties). It scans through the payroll table, and for every row, it goes through the whole payroll table again in an inner loop to see how many salaries are higher than the current row of the outer scan. This may look complicated, but DBMSes are designed to use SQL efficiently for this type of operation.
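The evaluation described above can be simulated in plain Java with made-up payroll data: each outer row is kept when fewer than three salaries are higher than its own. (For brevity this sketch returns only the names.)

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// In-memory illustration of how the correlated subquery evaluates.
public class TopEarners {
    public static List<String> top3(Map<String, Integer> payroll) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> outer : payroll.entrySet()) {  // outer scan
            int higher = 0;
            for (int salary : payroll.values()) {                      // inner loop
                if (salary > outer.getValue()) {
                    higher++;
                }
            }
            if (higher < 3) {              // WHERE 3 > (SELECT count(*) ...)
                result.add(outer.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> payroll = new LinkedHashMap<>();
        payroll.put("Ann", 90000);
        payroll.put("Bob", 50000);
        payroll.put("Carl", 70000);
        payroll.put("Dee", 80000);
        payroll.put("Ed", 60000);
        System.out.println(top3(payroll));  // [Ann, Carl, Dee]
    }
}
```

The DBMS performs this same nested scan internally, typically far more efficiently than a client could by fetching every row.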
4. Make transactions single-batch wherever possible
Whenever possible, collect a set of data operations and submit an update transaction in one statement, in the form:

BEGIN TRANSACTION
  update table1 ...
  insert into table2 ...
  delete from table3 ...
COMMIT
This approach results in better performance than using separate statements and commits. Even with conditional logic and temporary tables in the batch, it is preferable because the DBMS will obtain all the locks necessary on the various rows and tables, and will use them and release them in one step. Using separate statements and commits results in many more client-to-DBMS transmissions and holds the locks in the DBMS for much longer. These locks will block out other clients from accessing this data, and, depending on whether different updates can alter tables in different orders, may cause deadlocks.
Warning: If any individual statement in the above transaction might fail, due, for instance, to violating a unique key constraint, you should put in conditional SQL logic to detect any statement failure and roll back the transaction rather than commit. If, in the above example, the insert failed, most DBMSes will send back an error message about the failed insert, but will behave as if you got the message between the second and third statements, and decided to commit anyway! MS SQLServer has a nice connection option, enabled by executing the SQL set xact_abort on, which automatically rolls back the transaction if any statement fails.
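A minimal sketch of submitting the single-batch form through JDBC, with hypothetical table and column names. Whether a driver accepts several statements in one string is DBMS-specific, and per the warning above, production code would add conditional rollback logic (or enable set xact_abort on where available).

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

// Submitting a multi-statement transaction as one batch: one round-trip,
// and the DBMS acquires and releases all the locks in one step.
public class SingleBatchTxn {
    public static String batchSql() {
        return String.join("\n",
                "BEGIN TRANSACTION",
                "UPDATE account SET balance = balance - 100 WHERE id = 1",
                "INSERT INTO ledger (account_id, amount) VALUES (1, -100)",
                "COMMIT");
    }

    public static void run(Connection conn) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            stmt.executeUpdate(batchSql());
        }
    }
}
```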
5. Never have a DBMS transaction span user input
If an application sends a 'BEGIN TRAN' and some SQL which locks rows or tables for an update, do not write your application so that it must wait on the user to press a key before committing the transaction. That user may go to lunch first and lock up a whole DBMS table until he comes back.
If user input is needed to form or complete a transaction, use optimistic locking. Briefly, optimistic locking employs timestamps and triggers (some DBMSes will generate these automatically with tables set up for it) in queries and updates. Queries select data with timestamp values and prepare a transaction based on that data, without locking the data in a transaction.
When an update transaction is finally defined by the user input, it is sent as a single submission that includes timestamped safeguards to make sure the data is the same as originally fetched. A successful transaction will automatically update the relevant timestamps for changed data. If any interceding update from another client has altered any of the data on which the current transaction is based, the timestamps will have changed, and the current transaction will be rejected. Most of the time, no relevant data has been changed, so transactions usually succeed. When a transaction fails, the application can refetch the updated data to present to the user and re-form the transaction if desired.
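The scheme can be illustrated with an in-memory table whose rows carry a version stamp (a counter standing in for the timestamps described above). No lock is held while the user thinks; a stale stamp simply causes the update to be rejected.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory simulation of optimistic locking: an update succeeds only if
// the version stamp still matches the one read when the data was fetched.
public class OptimisticLock {
    static class Row {
        String value;
        long version;
        Row(String value, long version) { this.value = value; this.version = version; }
    }

    private final Map<Integer, Row> table = new HashMap<>();

    public void insert(int id, String value) {
        table.put(id, new Row(value, 1));
    }

    public Row read(int id) {
        Row r = table.get(id);
        return new Row(r.value, r.version);   // snapshot; no lock held
    }

    // Analogous to: UPDATE t SET value = ?, version = version + 1
    //               WHERE id = ? AND version = ?
    public boolean update(int id, String newValue, long expectedVersion) {
        Row r = table.get(id);
        if (r.version != expectedVersion) {
            return false;                     // interceding update; re-fetch and retry
        }
        r.value = newValue;
        r.version++;
        return true;
    }
}
```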
6. Use in-place updates
Changing a data row in place is much faster than moving a row, which may be required if the update requires more space than the table design can accommodate. If you design your rows to have the space they need initially, updates will be faster. The trade-off is that your table may require more disk space but may run faster. Since disk space is cheap, using a little more of it can be a worthwhile investment to improve performance.
7. Keep your operational data set as small as possible
Some applications store operational data in the same table as historical data. Over time and with accumulation of this historical data, all operational queries have to read through lots of useless (on a day-to-day basis) data to get to the more current data. Move non-current data to other tables and do joins to these tables for the rarer historical queries. If this can't be done, index and cluster your table so that the most frequently used data is logically and physically localized.
8. Use pipelining and parallelism
DBMSes are designed to work best when very busy with lots of different things to do. The worst way to use a DBMS is as dumb file storage for one big single-threaded application. If you can design your application and data to support lots of parallel processes working on easily distinguished subsets of the work, your application will be much faster. If there are multiple steps to processing, try to design your application so that subsequent steps can start working on the portion of data that any prior process has finished, instead of having to wait until the prior process is complete. This may not always be possible, but you can dramatically improve performance by designing your program with this in mind.
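The hand-off pattern can be sketched in plain Java with two threads and a queue; the arithmetic "work" here is a hypothetical stand-in for real processing stages, and a real application would partition its DBMS work the same way.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Pipelining sketch: stage 2 starts consuming each item as soon as stage 1
// finishes it, instead of waiting for stage 1 to finish everything.
public class Pipeline {
    private static final Integer DONE = Integer.MIN_VALUE;  // end-of-stream marker

    public static List<Integer> run(List<Integer> input) {
        BlockingQueue<Integer> handoff = new LinkedBlockingQueue<>();
        List<Integer> output = new ArrayList<>();

        Thread stage1 = new Thread(() -> {
            for (int n : input) {
                handoff.add(n * 2);          // hand each result off immediately
            }
            handoff.add(DONE);
        });

        Thread stage2 = new Thread(() -> {
            try {
                for (Integer n = handoff.take(); !n.equals(DONE); n = handoff.take()) {
                    output.add(n + 1);       // second processing step
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        stage1.start();
        stage2.start();
        try {
            stage1.join();
            stage2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return output;
    }
}
```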
9. Choose the right driver for your application
There are several possible choices for a JDBC driver. The Type 2 WebLogic jDriver for Oracle uses a native layer and is usually faster than a pure-Java Type 4 driver, although this difference is narrowing as JITs continue to increase their performance. A Type 3 driver (also pure-Java) uses no native methods, and requires no client-side installation of vendor libraries. For an overview see WebLogic's whitepaper Choosing a Java Database Connectivity driver.
Copyright © 2000 BEA Systems, Inc. All rights reserved.