Abstract
This best-practice article is about analyzing and fixing potential concurrency issues in the Wildfly Java Application Server.
Concurrency problems can cause headaches, but multi-threading and process synchronisation are neither rocket science nor witchcraft. In a deterministic machine, everything has a cause.
Symptoms
- Requests to a WebApp crash the whole Wildfly instance
- A WebApp is unresponsive to requests, even minutes after a high-load scenario
- High CPU load on the server machine for no apparent reason
Apply analysis techniques
Step 1) Gather information / Connect a monitoring tool (e.g. VisualVM) to the Wildfly instance
The first step is to find the reason behind the potential deadlock. This can be done using a monitoring tool like VisualVM.
In this scenario I will make a remote connection from my Windows 10 Workstation to a Wildfly instance that runs on another machine [1].
A) On the Remote Wildfly machine do the following steps:
- For jstatd you need to create a file “security.policy” with the following contents:
grant codebase "file:${java.home}/../lib/tools.jar" {
    permission java.security.AllPermission;
};
Then you should execute jstatd on the remote machine like this:
{PATH_TO_JDK}/bin/jstatd -J-Djava.security.policy=security.policy
B) On your workstation you need to do the following steps:
- Download VisualVM to your workstation
- Download the corresponding Wildfly version to your workstation to have access to the JBoss client file called “jboss-client.jar” inside the bin/client directory.
- (Windows) Open Windows PowerShell and type the following:
{PATH_TO_VISUALVM}\visualvm.exe -cp:a {PATH_TO_LOCALWILDFLY}\bin\client\jboss-client.jar
In VisualVM you need to make the JMX connection to the remote Wildfly instance.
- Hit “Add Remote Host” and enter the Hostname/IP of the Wildfly Instance
- Hit “Add JMX connection” and enter the following connection URL:
service:jmx:http-remoting-jmx://<hostname>:9990
where 9990 is the management port of the Wildfly instance. (On recent Wildfly versions the scheme is remote+http, i.e. service:jmx:remote+http://<hostname>:9990.)
After you hit “OK” there will be a new connection entry in the connection tree on the left. Double-click on the entry and the connection will be made. If everything went well, the Overview panel with the JVM metrics of the Wildfly instance will be presented to you.
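If you want to verify the JMX connectivity outside of VisualVM, the same connection can also be made programmatically. Below is a minimal sketch using the standard JMX remote API; it assumes the same service URL as above, that jboss-client.jar is on the classpath, and the class name is purely illustrative:

import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class JmxConnectionCheck {
    public static void main(String[] args) throws Exception {
        // The same URL that is entered in VisualVM; jboss-client.jar must be on the classpath
        JMXServiceURL url = new JMXServiceURL("service:jmx:http-remoting-jmx://<hostname>:9990");
        // If the management interface is secured, pass a String[]{user, password}
        // in the environment map under the key "jmx.remote.credentials"
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection connection = connector.getMBeanServerConnection();
            // A successful call proves that the management interface is reachable
            System.out.println("Registered MBeans: " + connection.getMBeanCount());
        } finally {
            connector.close();
        }
    }
}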
Step 2) Analyse the basic JVM metrics
The Overview panel of VisualVM will give you metrics about CPU load, heap consumption and running live threads. You’ll be able to see whether the Wildfly instance is currently under load or idling.
Step 3) Make a Thread Dump
In VisualVM you can easily take a snapshot of all running threads by opening the “Threads” panel and hitting the “Thread Dump” button.
After the snapshot is taken, the thread dump is saved for later investigation and is also immediately presented to you for further analysis.
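If you cannot attach VisualVM, a thread dump can also be captured from inside the JVM itself. The following is a minimal sketch using the standard ThreadMXBean API (class name purely illustrative); it prints all live threads and additionally asks the JVM whether it has already detected a deadlock:

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadDumpSketch {
    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();

        // Dump all live threads including locked monitors and synchronizers;
        // note that ThreadInfo.toString() only prints the first few stack frames
        for (ThreadInfo info : threads.dumpAllThreads(true, true)) {
            System.out.print(info);
        }

        // Ask the JVM directly whether any threads are deadlocked on monitors or locks
        long[] deadlocked = threads.findDeadlockedThreads();
        if (deadlocked != null) {
            for (ThreadInfo info : threads.getThreadInfo(deadlocked, true, true)) {
                System.err.println("Deadlocked thread: " + info.getThreadName());
            }
        }
    }
}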
Step 4) Analyse the Thread Dump
In Java, the stack trace of a thread is read from the bottom upwards: the bottommost frame is the entry point that led to all the method calls above it, and the topmost frame is where the thread is currently executing.
Threads have different kinds of states:
WAITING: The thread is asleep and does nothing until it gets woken up. These kinds of threads are usually not the cause of the problem.
TIMED_WAITING: The thread is waiting with a timeout (for example Thread.sleep() or Object.wait() with a timeout) and will wake up on its own once the timeout expires. These kinds of threads are usually not the cause of the problem either.
RUNNABLE: The runnable threads are the bad boys we need to analyse further. They are actively executing (or ready to execute); if the same RUNNABLE thread shows up at the same code location in several consecutive dumps, it is most likely spinning in a busy loop or stuck in a long-running operation. Threads in the BLOCKED state, which are waiting to acquire a monitor held by another thread, are also prime suspects when you are hunting a deadlock.
Find the cause
Each frame of the thread stack trace also shows the source file and line number at which the next method was called.
With the thread dump at hand you can narrow down the problem and find issues you can solve yourself by analysing the source code of your WebApp at the indicated lines.
Solve the problem
Typical causes to consider
- C-1) JAX-RS Response objects must either be consumed (e.g. via readEntity()) so that they are closed automatically, or be closed manually (by calling close()); otherwise they keep the client connection alive and fill up the HTTP thread pool. This is usually a problem to consider if you retrieve data from another web service as part of a Proxy Pattern: a) consume or close any proxy response object used for synchronous transfers, b) register a CompletionCallback on the AsyncResponse object if you use streams and put the cleanup logic into that CompletionCallback (see the first sketch after this list).
- C-2) Unmanaged and unsynchronised access of threads to a commonly shared resource (memory/network/files/video/etc.) is the typical cause of general IO issues that lead to hard crashes. Imagine your family members fighting over the TV remote. In a multi-threading environment the access of threads to a common resource must always be managed. This can be done by using managed pools for threads, JDBC connections or other kinds of connections, as well as by applying process-synchronisation techniques to critical sections in your code (see the second sketch after this list). HTTP requests are handled by a Wildfly thread pool by default.
- C-3) Mutually blocking threads due to wrongly implemented or unconditioned process synchronisation are the typical cause of deadlocks. If you experience such issues, consider prioritising threads and/or applying a fair scheduling policy to your critical sections using Java’s ReentrantLock (new ReentrantLock(true)), as shown in the third sketch after this list.
- C-4) There is a known bug in OpenJDK’s SSLSocketImpl that might lead to deadlocks under high load if internal inter-service communication is done RESTfully via HTTPS. A much more robust technique for internal service-to-service communication is a distributed messaging architecture style, such as an event-driven architecture (see Apache Kafka, RabbitMQ, Spring Cloud Stream).
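For C-1, here is a minimal sketch of both variants, assuming the JAX-RS 2.x API; the target URL, class and method names are purely illustrative:

import javax.ws.rs.client.Client;
import javax.ws.rs.container.AsyncResponse;
import javax.ws.rs.container.CompletionCallback;
import javax.ws.rs.core.Response;

public class ProxyResourceSketch {

    // a) Synchronous proxy call: consume or close the Response, otherwise the
    //    underlying connection stays open and the HTTP thread pool fills up
    public String fetchSynchronously(Client client) {
        Response response = client.target("http://backend.example/api/data").request().get();
        try {
            return response.readEntity(String.class); // consuming the entity closes the stream
        } finally {
            response.close(); // safe to call even after readEntity()
        }
    }

    // b) Asynchronous/streaming case: register a CompletionCallback and put the cleanup there
    public void fetchAsynchronously(AsyncResponse asyncResponse, Response upstreamResponse) {
        asyncResponse.register(new CompletionCallback() {
            @Override
            public void onComplete(Throwable throwable) {
                upstreamResponse.close(); // runs when the async response completes, even on failure
            }
        });
        // ... stream the data to the client and resume asyncResponse here ...
    }
}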
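For C-2, instead of spawning raw threads, long-running work can be handed to a container-managed pool. A minimal sketch assuming Wildfly’s default ManagedExecutorService (Java EE 7+); class and method names are purely illustrative:

import javax.annotation.Resource;
import javax.enterprise.concurrent.ManagedExecutorService;

// This class must itself be a container-managed component (e.g. a CDI bean or EJB)
public class ReportServiceSketch {

    // Injects Wildfly's default managed executor; the threads are pooled,
    // monitored and shut down by the container instead of by your code
    @Resource
    private ManagedExecutorService executor;

    public void generateReportAsync() {
        executor.submit(() -> {
            // long-running work runs on the container-managed pool
            // instead of an unmanaged new Thread()
        });
    }
}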
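For C-3, a minimal sketch of a critical section guarded by a fair ReentrantLock; the counter stands in for any shared resource:

import java.util.concurrent.locks.ReentrantLock;

public class FairCriticalSectionSketch {

    // true = fair scheduling: waiting threads acquire the lock roughly in arrival order,
    // which prevents a single thread from starving all the others
    private final ReentrantLock lock = new ReentrantLock(true);
    private long counter;

    public void increment() {
        lock.lock();
        try {
            counter++; // critical section: only one thread at a time
        } finally {
            lock.unlock(); // always release in finally, otherwise an exception leaves the lock held
        }
    }
}

Note that fairness reduces starvation but does not by itself resolve a lock-ordering deadlock; acquiring multiple locks in a consistent order across all threads remains essential.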
References
[1] http://www.mastertheboss.com/jboss-server/wildfly-8/monitoring-wildfly-using-visualvm