Whether you realize it or not, most Java web applications use thread pools to handle requests. You can ignore the implementation details of thread pools for a while, but sooner or later you will need to understand how they are used and tuned. This article introduces how to use Java thread pools and how to configure them correctly.
Single-threaded
Let's start with the basics. No matter which application server or framework you use (Tomcat, Jetty, and so on), they all share the same basic implementation. A web service is built on a socket, which listens on a port, waits for TCP connections, and accepts them. Once a TCP connection is accepted, data can be read from and written to the newly created connection.
To understand this process, rather than using an application server, we will build a simple web service from scratch. This service is a microcosm of most application servers. A simple single-threaded web service looks like this:
ServerSocket listener = new ServerSocket(8080);
try {
    while (true) {
        Socket socket = listener.accept();
        try {
            handleRequest(socket);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
} finally {
    listener.close();
}

The code above creates a ServerSocket that listens on port 8080, then loops, accepting new connections. Each accepted socket is passed to the handleRequest method, which parses the data stream into an HTTP request and writes back a response. In this simple example, handleRequest merely reads the input stream and returns a fixed response. In a real implementation, the method would be much more complex, reading data from a database and so on.
static final String response = "HTTP/1.0 200 OK\r\n" +
        "Content-type: text/plain\r\n" +
        "\r\n" +
        "Hello World\r\n";

public static void handleRequest(Socket socket) throws IOException {
    // Read the input stream, and return "200 OK"
    try {
        BufferedReader in = new BufferedReader(
                new InputStreamReader(socket.getInputStream()));
        log.info(in.readLine());

        OutputStream out = socket.getOutputStream();
        out.write(response.getBytes(StandardCharsets.UTF_8));
    } finally {
        socket.close();
    }
}

Since a single thread processes all requests, each request must wait for the previous one to finish before it can be answered. If handling one request takes 100 milliseconds, this server can serve at most 10 requests per second.
Multi-threaded
Although the handleRequest method may block on I/O, the CPU could be handling other requests in the meantime. In a single-threaded design it cannot. We can therefore improve the server's parallelism by creating more threads.
public static class HandleRequestRunnable implements Runnable {

    final Socket socket;

    public HandleRequestRunnable(Socket socket) {
        this.socket = socket;
    }

    public void run() {
        try {
            handleRequest(socket);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

ServerSocket listener = new ServerSocket(8080);
try {
    while (true) {
        Socket socket = listener.accept();
        new Thread(new HandleRequestRunnable(socket)).start();
    }
} finally {
    listener.close();
}

Here, accept() is still called on the main thread, but once a TCP connection is established, a new thread is created to run the handleRequest method shown earlier.
Because a new thread is created for each connection, the main thread can keep accepting TCP connections while requests are processed in parallel. This approach is called "one thread per request". There are other ways to improve throughput, such as the asynchronous, event-driven model used by NGINX and Node.js, but they do not use thread pools and are beyond the scope of this article.
In a thread-per-request implementation, creating (and later destroying) a thread is expensive, because both the JVM and the operating system must allocate resources for it. Worse, the number of threads created above is unbounded, which can quickly exhaust system resources.
Resource exhaustion
Each thread requires a certain amount of stack memory. On recent 64-bit JVMs, the default thread stack size is 1024KB. If the server receives a flood of requests, or the handleRequest method is slow, the server may crash under the weight of a huge number of threads. For example, 1000 concurrent requests mean 1000 threads, which consume 1GB of JVM memory for stacks alone. On top of that, the objects created while each thread runs live on the heap. If the situation worsens, the JVM will exceed its heap memory, generate large amounts of garbage collection activity, and eventually fail with OutOfMemoryError.
Threads consume not only memory but also other limited resources, such as file handles and database connections. Uncontrolled thread creation can therefore cause other kinds of errors and crashes. An important way to avoid resource exhaustion is to avoid unbounded data structures.
As an aside, memory pressure from thread stacks can be eased by adjusting the stack size with the -Xss flag. Shrinking the stack reduces the per-thread overhead, but may cause StackOverflowError. For typical applications the default 1024KB is overly generous, and 256KB or 512KB may be more appropriate. The minimum value Java allows is 160KB.
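For example, the JVM could be launched with a smaller stack like this (the jar name is just a placeholder):

java -Xss256k -jar server.jar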
Thread pool
To avoid continuously creating new threads, we can use a simple thread pool with a fixed upper bound. The pool manages its threads: it creates new threads up to the limit, and reuses idle threads whenever possible.
ServerSocket listener = new ServerSocket(8080);
ExecutorService executor = Executors.newFixedThreadPool(4);
try {
    while (true) {
        Socket socket = listener.accept();
        executor.submit(new HandleRequestRunnable(socket));
    }
} finally {
    listener.close();
}

In this example, instead of creating threads directly, we use an ExecutorService, which submits tasks (implementations of the Runnable interface) to be executed on a pool of threads. Here, a fixed-size pool of four threads handles all requests, which bounds both the number of request-handling threads and their resource usage.
In addition to the fixed-size pool created by newFixedThreadPool, the Executors class also provides a newCachedThreadPool method. A cached pool can still lead to an unbounded number of threads, but it reuses previously created idle threads whenever possible. This kind of pool is typically suitable for short tasks that do not block on external resources.
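For example (a minimal sketch; the 60-second idle timeout mentioned below is the JDK's documented default):

ExecutorService cached = Executors.newCachedThreadPool();
// Reuses idle threads where possible and reclaims threads that have been
// idle for 60 seconds, but puts no upper bound on the total thread count.
cached.submit(new HandleRequestRunnable(socket)); // socket from listener.accept()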
Work queue
With a fixed-size thread pool, what happens when a new request arrives while all threads are busy? ThreadPoolExecutor holds pending tasks in a queue, and the fixed-size pool uses an unbounded LinkedBlockingQueue by default. Note that this reintroduces the resource exhaustion problem: the queue only stays small as long as the threads process tasks faster than new tasks arrive. Moreover, in the example above, every queued request holds an open socket, which on some operating systems consumes a file handle. Since the operating system limits the number of file handles a process may open, it is best to bound the size of the work queue.
public static ExecutorService newBoundedFixedThreadPool(int nThreads, int capacity) {
    return new ThreadPoolExecutor(nThreads, nThreads,
            0L, TimeUnit.MILLISECONDS,
            new LinkedBlockingQueue<Runnable>(capacity),
            new ThreadPoolExecutor.DiscardPolicy());
}

public static void boundedThreadPoolServerSocket() throws IOException {
    ServerSocket listener = new ServerSocket(8080);
    ExecutorService executor = newBoundedFixedThreadPool(4, 16);
    try {
        while (true) {
            Socket socket = listener.accept();
            executor.submit(new HandleRequestRunnable(socket));
        }
    } finally {
        listener.close();
    }
}

Here, instead of calling Executors.newFixedThreadPool, we construct the ThreadPoolExecutor ourselves and cap the work queue at 16 elements.
If all threads are busy, new tasks are placed in the queue. Because the queue is capped at 16 elements, any overflow is handled by the last argument to the ThreadPoolExecutor constructor. This example uses a discard policy (DiscardPolicy): when the queue is full, new tasks are silently dropped. Besides this, there are also an abort policy (AbortPolicy) and a caller-runs policy (CallerRunsPolicy): the former throws an exception, while the latter runs the task on the caller's thread.
For web applications, the best default is usually the discard or abort policy, combined with returning an error to the client (such as HTTP 503). Enlarging the work queue could avoid dropping client requests, but users are rarely willing to wait long, and long queues consume more server resources. The purpose of the work queue is not to absorb requests without limit, but to smooth out bursts. Under normal operation the queue should be empty.
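As an illustration (this handler is my own sketch, not part of the article's examples), a custom RejectedExecutionHandler could answer rejected connections with a 503 instead of silently dropping them. It builds on the HandleRequestRunnable class defined earlier:

static final String tooBusy = "HTTP/1.0 503 Service Unavailable\r\n\r\n";

// Sends "503 Service Unavailable" to the client whose task was rejected,
// then closes the socket.
static class Send503Policy implements RejectedExecutionHandler {
    public void rejectedExecution(Runnable r, ThreadPoolExecutor executor) {
        if (r instanceof HandleRequestRunnable) {
            Socket socket = ((HandleRequestRunnable) r).socket;
            try {
                socket.getOutputStream().write(tooBusy.getBytes(StandardCharsets.UTF_8));
            } catch (IOException e) {
                // Best effort only; the connection is closed below either way.
            } finally {
                try { socket.close(); } catch (IOException ignored) {}
            }
        }
    }
}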
Thread count tuning
The previous examples show how to create and use a thread pool, but the central question remains: how many threads should the pool have? First, the thread limit must be low enough that resources are not exhausted when it is reached. Resources here include memory (heap and stack), open file handles, TCP connections, remote database connections, and any other limited resource. In particular, if the tasks are computation-heavy, the number of CPU cores is itself a limit, and in that case the thread count generally should not exceed the core count.
Since the right thread count depends on the type of application, finding the optimum may take considerable performance testing. You can also raise the resource limits themselves, for example by enlarging the JVM heap or raising the operating system's file handle limit. However, such adjustments will eventually run into a theoretical upper bound.
Little's Law
Little's law describes the relationship between three variables in a stable system: L = λW.
Here L is the average number of requests in the system, λ is the rate at which requests arrive, and W is the average time to handle a request. For example, if 10 requests arrive per second and each takes 1 second to handle, then at any moment 10 requests are in flight, and, back to our topic, 10 threads are needed to process them. If the handling time of a single request doubles, the number of requests in flight also doubles to 20, and so does the number of threads required.
Once we understand the impact of processing time, we can see that the theoretical resource limit may not be the best thread pool size. The pool size also depends on the task processing time and the throughput you need.
Suppose the JVM can process 1000 tasks in parallel. If each request takes up to 30 seconds to handle, then in the worst case the application can process at most 33.3 requests per second. If each request takes only 500 milliseconds, however, it can process 2000 requests per second.
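Rearranging Little's law to λ = L / W makes those two figures explicit:

λ = L / W = 1000 / 30s  ≈ 33.3 requests per second
λ = L / W = 1000 / 0.5s = 2000 requests per second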
Splitting thread pools
In microservice or service-oriented architectures (SOA), an application usually calls several backend services. If one of those services degrades, the pool can run out of threads, stalling requests to the other, healthy services.
An effective defense against backend failures is to give each backend service its own thread pool. In this pattern, a dispatcher thread pool hands tasks off to per-backend request pools. A slow backend then places no load on the dispatcher pool; the pressure is confined to the pool dedicated to that slow backend.
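A minimal sketch of this pattern follows; the service names, pool sizes, and timeouts are illustrative, not taken from the article:

import java.util.concurrent.*;

class IsolatedBackends {
    static final ExecutorService dispatcher  = Executors.newFixedThreadPool(4);
    static final ExecutorService usersPool   = Executors.newFixedThreadPool(8);
    static final ExecutorService reportsPool = Executors.newFixedThreadPool(8);

    static void handle() {
        dispatcher.submit(() -> {
            // Each backend call runs on its own bounded pool, so a slow
            // reports backend cannot starve calls to the users backend.
            Future<String> user   = usersPool.submit(() -> callUsers());
            Future<String> report = reportsPool.submit(() -> callReports());
            try {
                // Timeouts keep the dispatcher thread from waiting forever.
                System.out.println(user.get(1, TimeUnit.SECONDS)
                        + " " + report.get(5, TimeUnit.SECONDS));
            } catch (InterruptedException | ExecutionException | TimeoutException e) {
                // handle or report the failure
            }
        });
    }

    static String callUsers()   { return "user"; }   // stand-in for a real call
    static String callReports() { return "report"; } // stand-in for a real call
}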
The multi-pool pattern also needs to guard against deadlock. A deadlock occurs when every thread is blocked waiting on the result of a task that has not yet run, and now never can. So with multiple pools you must understand what each pool executes and the dependencies between them, in order to avoid deadlock as much as possible.
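The simplest case of this deadlock can be reproduced with a single pool; the sketch below (my own example) hangs forever:

import java.util.concurrent.*;

public class PoolDeadlock {
    public static void main(String[] args) {
        // With one thread, the outer task occupies the only worker
        // and blocks on an inner task that can never be scheduled.
        ExecutorService pool = Executors.newFixedThreadPool(1);
        pool.submit(() -> {
            Future<?> inner = pool.submit(() -> System.out.println("inner"));
            try {
                inner.get(); // blocks forever: deadlock
            } catch (InterruptedException | ExecutionException e) {
                Thread.currentThread().interrupt();
            }
        });
    }
}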
Summary
Even if your application does not use thread pools directly, it very likely uses them indirectly through an application server or framework. Tomcat, JBoss, Undertow, Dropwizard, and others all expose options for tuning their thread pools (the pools that run your servlets).
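For instance, Tomcat's server.xml lets you define a shared executor and attach it to a connector (the values here are only illustrative):

<Executor name="tomcatThreadPool" namePrefix="catalina-exec-"
          maxThreads="200" minSpareThreads="10"/>
<Connector port="8080" protocol="HTTP/1.1" executor="tomcatThreadPool"/>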
I hope this article has improved your understanding of thread pools and helps you put them to work.