Java concurrency: the hidden thread deadlocks ~ Java EE Support Patterns

1.25.2013

Java concurrency: the hidden thread deadlocks

Most Java programmers are familiar with the Java thread deadlock concept. It essentially involves 2 threads waiting forever for each other. This condition is often the result of lock-ordering problems involving flat (synchronized) locks or ReentrantReadWriteLock (read or write) locks.

Found one Java-level deadlock:
=============================
"pool-1-thread-2":
  waiting to lock monitor 0x0237ada4 (object 0x272200e8, a java.lang.Object),
  which is held by "pool-1-thread-1"
"pool-1-thread-1":
  waiting to lock monitor 0x0237aa64 (object 0x272200f0, a java.lang.Object),
  which is held by "pool-1-thread-2"

The good news is that the HotSpot JVM is always able to detect this condition for you…or is it?

A recent thread deadlock problem affecting an Oracle Service Bus production environment has forced us to revisit this classic problem and identify the existence of “hidden” deadlock situations.

This article will demonstrate and replicate, via a simple Java program, a very special lock-ordering deadlock condition which is not detected by the latest HotSpot JVM 1.7. You will also find a video at the end of the article explaining the sample Java program and the troubleshooting approach used.

The crime scene

I usually like to compare major Java concurrency problems to a crime scene where you play the lead investigator role. In this context, the “crime” is an actual production outage of your client's IT environment. Your job is to:

  • Collect all the evidence, hints & facts (thread dump, logs, business impact, load figures…)
  • Interrogate the witnesses & domain experts (support team, delivery team, vendor, client…)

The next step of your investigation is to analyze the collected information and establish a list of one or more “suspects”, each backed by clear evidence. Eventually, you want to narrow it down to a primary suspect or root cause. Obviously the principle of “innocent until proven guilty” does not apply here; it is exactly the opposite.

Lack of evidence can prevent you from achieving the above goal. What you will see next is that the lack of deadlock detection by the HotSpot JVM does not necessarily prove that you are not dealing with this problem.

The suspect

In this troubleshooting context, the “suspect” is defined as the application or middleware code with the following problematic execution pattern.

  • Usage of ReentrantReadWriteLock READ lock followed by the usage of FLAT lock (execution path #1)
  • Usage of FLAT lock followed by the usage of ReentrantReadWriteLock WRITE lock (execution path #2)
  • Concurrent execution performed by 2 Java threads, each acquiring the 2 locks in the reverse order of the other

The above lock-ordering deadlock criteria can be visualized as follows:

[Diagram: lock-ordering deadlock between execution paths #1 and #2]


Now let’s replicate this problem via our sample Java program and look at the JVM thread dump output.

Sample Java program

The above deadlock condition was first identified from our Oracle OSB problem case. We then re-created it via a simple Java program. You can download the entire source code of our program here.

The program simply creates and fires 2 worker threads. Each of them executes a different execution path and attempts to acquire locks on shared objects, but in a different order. We also created a deadlock detector thread for monitoring and logging purposes.
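The detector thread itself is not listed in this article, so what follows is only a minimal sketch of what such a monitor could look like. It assumes Java 8 or later (for the lambdas) and relies on the standard ThreadMXBean.findDeadlockedThreads() call. The class name DeadlockDetectorSketch, the thread names and the 50 ms polling interval are all hypothetical. It is exercised here against a classic flat-lock deadlock, which is the kind the JVM does report.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Arrays;
import java.util.concurrent.CountDownLatch;

final class DeadlockDetectorSketch {

    private static final ThreadMXBean THREADS = ManagementFactory.getThreadMXBean();

    /**
     * Polls the JVM deadlock detector until it reports something or the
     * timeout expires. Returns the names of the threads involved, or an
     * empty array when nothing was detected in time.
     */
    static String[] awaitDeadlock(long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            long[] ids = THREADS.findDeadlockedThreads(); // null when nothing is detected
            if (ids != null) {
                ThreadInfo[] infos = THREADS.getThreadInfo(ids);
                String[] names = new String[infos.length];
                for (int i = 0; i < infos.length; i++) {
                    names[i] = infos[i].getThreadName();
                }
                return names;
            }
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return new String[0];
    }

    /** Starts two daemon threads that take two monitors in opposite order. */
    static void startClassicDeadlock() {
        final Object a = new Object();
        final Object b = new Object();
        final CountDownLatch bothHoldFirst = new CountDownLatch(2);
        startDaemon("classic-1", () -> lockInOrder(a, b, bothHoldFirst));
        startDaemon("classic-2", () -> lockInOrder(b, a, bothHoldFirst));
    }

    private static void lockInOrder(Object first, Object second, CountDownLatch latch) {
        synchronized (first) {
            latch.countDown();
            try {
                latch.await(); // wait until the other thread holds its first monitor
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            synchronized (second) {
                // never reached: the other thread holds this monitor
            }
        }
    }

    private static void startDaemon(String name, Runnable body) {
        Thread t = new Thread(body, name);
        t.setDaemon(true); // lets the JVM exit even though the threads stay stuck
        t.start();
    }

    public static void main(String[] args) {
        startClassicDeadlock();
        System.out.println(Arrays.toString(awaitDeadlock(5000)));
    }
}
```

A real monitor would more likely run awaitDeadlock() style polling on a scheduled background thread and write the full ThreadInfo to a log rather than return names.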

For now, find below the Java class implementing the 2 different execution paths.

package org.ph.javaee.training8;

import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * A simple thread task representation
 * @author Pierre-Hugues Charbonneau
 *
 */
public class Task {
      
       // Object used for FLAT lock
       private final Object sharedObject = new Object();
       // ReentrantReadWriteLock used for WRITE & READ locks
       private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
      
       /**
        *  Execution pattern #1
        */
       public void executeTask1() {
            
             // 1. Attempt to acquire a ReentrantReadWriteLock READ lock
             lock.readLock().lock();
            
             // Wait 2 seconds to simulate some work...
             try { Thread.sleep(2000); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            
             try {              
                    // 2. Attempt to acquire a Flat lock...
                    synchronized (sharedObject) {}
             }
             // Remove the READ lock
             finally {
                    lock.readLock().unlock();
             }           
            
             System.out.println("executeTask1() :: Work Done!");
       }
      
       /**
        *  Execution pattern #2
        */
       public void executeTask2() {
            
             // 1. Attempt to acquire a Flat lock
             synchronized (sharedObject) {                 
                   
                    // Wait 2 seconds to simulate some work...
                    try { Thread.sleep(2000); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
                   
                    // 2. Attempt to acquire a WRITE lock                   
                    lock.writeLock().lock();
                   
                    try {
                           // Do nothing
                    }
                   
                    // Remove the WRITE lock
                    finally {
                           lock.writeLock().unlock();
                    }
             }
            
             System.out.println("executeTask2() :: Work Done!");
       }
      
       public ReentrantReadWriteLock getReentrantReadWriteLock() {
             return lock;
       }
}

As soon as the deadlock situation was triggered, a JVM thread dump was generated using JVisualVM.


As you can see from the Java thread dump sample, the JVM did not detect this deadlock condition (i.e. there is no "Found one Java-level deadlock" entry), but it is clear that these 2 threads are in a deadlock state.
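The same observation can be made programmatically. The sketch below is a condensed, hypothetical variant of the scenario, not the program from the download: it swaps the 2 second sleeps for a CountDownLatch so that both threads are guaranteed to hold their first lock before reaching for the second, and it assumes Java 8 or later. The class name HiddenDeadlockRepro and the thread names are made up for illustration. It reports whether both workers are stuck and what ThreadMXBean.findDeadlockedThreads() returned at that moment.

```java
import java.lang.management.ManagementFactory;
import java.util.Arrays;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class HiddenDeadlockRepro {

    /** Outcome of one run. */
    static final class Result {
        final boolean bothStuck;  // true when both workers are blocked on their second lock
        final long[] detectedIds; // what the JVM detector returned; null means "nothing found"

        Result(boolean bothStuck, long[] detectedIds) {
            this.bothStuck = bothStuck;
            this.detectedIds = detectedIds;
        }
    }

    static Result run(long timeoutMillis) {
        final Object sharedObject = new Object();
        final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        final CountDownLatch bothHoldFirst = new CountDownLatch(2);

        // Execution pattern #1: READ lock first, then the flat lock
        Thread reader = new Thread(() -> {
            lock.readLock().lock();
            try {
                if (meet(bothHoldFirst)) {
                    synchronized (sharedObject) { /* never reached */ }
                }
            } finally {
                lock.readLock().unlock();
            }
        }, "hidden-reader");

        // Execution pattern #2: flat lock first, then the WRITE lock
        Thread writer = new Thread(() -> {
            synchronized (sharedObject) {
                if (meet(bothHoldFirst)) {
                    lock.writeLock().lock(); // parks: the reader still holds the READ lock
                    lock.writeLock().unlock();
                }
            }
        }, "hidden-writer");

        reader.setDaemon(true); // daemon threads so the JVM can still exit
        writer.setDaemon(true);
        reader.start();
        writer.start();

        boolean stuck = false;
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (!stuck && System.currentTimeMillis() < deadline) {
            pause(50);
            stuck = isStuck(reader, writer);
        }
        pause(100); // re-check after a short delay to rule out a transient state
        stuck = stuck && isStuck(reader, writer);

        long[] ids = ManagementFactory.getThreadMXBean().findDeadlockedThreads();
        return new Result(stuck, ids);
    }

    // The reader waits on a monitor (BLOCKED); the writer is parked on the lock (WAITING)
    private static boolean isStuck(Thread reader, Thread writer) {
        return reader.getState() == Thread.State.BLOCKED
                && writer.getState() == Thread.State.WAITING;
    }

    // Signals "I hold my first lock" and waits for the other thread to do the same
    private static boolean meet(CountDownLatch latch) {
        latch.countDown();
        try {
            latch.await();
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        Result r = run(5000);
        System.out.println("both stuck: " + r.bothStuck
                + ", detector returned: " + Arrays.toString(r.detectedIds));
    }
}
```

On a HotSpot JVM behaving as described in this article, you would expect the workers to be stuck while the detector still returns nothing. Other JVM implementations or versions may behave differently.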

Root cause: ReentrantReadWriteLock READ lock behavior

The main explanation we found at this point is associated with the usage of the ReentrantReadWriteLock READ lock. Read locks are normally not designed to have a notion of ownership. Since there is no record of which thread holds a read lock, this appears to prevent the HotSpot JVM deadlock detector logic from detecting deadlocks involving read locks.

Some improvements have been implemented over time, but we can see that the JVM still cannot detect this special deadlock scenario.

Now, if we replace the read lock (execution pattern #1) in our program with a write lock, the JVM finally detects the deadlock condition. But why?

Found one Java-level deadlock:
=============================
"pool-1-thread-2":
  waiting for ownable synchronizer 0x272239c0, (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),
  which is held by "pool-1-thread-1"
"pool-1-thread-1":
  waiting to lock monitor 0x025cad3c (object 0x272236d0, a java.lang.Object),
  which is held by "pool-1-thread-2"

Java stack information for the threads listed above:
===================================================
"pool-1-thread-2":
       at sun.misc.Unsafe.park(Native Method)
       - parking to wait for  <0x272239c0> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
       at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
       at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
       at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
       at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
       at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:945)
       at org.ph.javaee.training8.Task.executeTask2(Task.java:54)
       - locked <0x272236d0> (a java.lang.Object)
       at org.ph.javaee.training8.WorkerThread2.run(WorkerThread2.java:29)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
       at java.lang.Thread.run(Thread.java:722)
"pool-1-thread-1":
       at org.ph.javaee.training8.Task.executeTask1(Task.java:31)
       - waiting to lock <0x272236d0> (a java.lang.Object)
       at org.ph.javaee.training8.WorkerThread1.run(WorkerThread1.java:29)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
       at java.lang.Thread.run(Thread.java:722)

This is because write locks are tracked by the JVM in a similar way to flat locks. This means the HotSpot JVM deadlock detector appears to be currently designed to detect:

  • Deadlock on Object monitors involving FLAT locks
  • Deadlock involving Locked ownable synchronizers associated with WRITE locks

The lack of per-thread read lock tracking appears to prevent deadlock detection for this scenario and significantly increases the troubleshooting complexity.
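One way to observe this difference from inside a program is ThreadInfo.getLockedSynchronizers(), which lists the ownable synchronizers the JVM attributes to a thread. The following is a small sketch under the assumption that the JVM supports synchronizer usage monitoring (HotSpot does). The class and method names are hypothetical. It takes either the read or the write side of a ReentrantReadWriteLock on the current thread and counts how many ReentrantReadWriteLock synchronizers the JVM reports as locked by that thread.

```java
import java.lang.management.LockInfo;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class OwnableSynchronizerView {

    /**
     * Acquires the given lock on the current thread, asks the JVM which
     * ownable synchronizers this thread holds, and returns how many of
     * them belong to a ReentrantReadWriteLock.
     */
    static int visibleWhileHolding(Lock heldLock) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        heldLock.lock();
        try {
            long id = Thread.currentThread().getId();
            // lockedMonitors = false, lockedSynchronizers = true
            ThreadInfo info = mx.getThreadInfo(new long[] { id }, false, true)[0];
            int count = 0;
            for (LockInfo li : info.getLockedSynchronizers()) {
                if (li.getClassName().startsWith(ReentrantReadWriteLock.class.getName())) {
                    count++;
                }
            }
            return count;
        } finally {
            heldLock.unlock();
        }
    }

    public static void main(String[] args) {
        ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
        System.out.println("while holding READ lock:  " + visibleWhileHolding(rw.readLock()));
        System.out.println("while holding WRITE lock: " + visibleWhileHolding(rw.writeLock()));
    }
}
```

The write lock shows up because it sets an exclusive owner thread on the underlying synchronizer. A read hold leaves no such owner, which is consistent with the detector behavior described above.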

I suggest that you read Doug Lea’s comments on this whole issue: concerns were raised back in 2005 about adding per-thread read-hold tracking because of the potential lock overhead.

Find below my troubleshooting recommendations if you suspect a hidden deadlock condition involving read locks:

  • Closely analyze the thread call stack traces. They may reveal code that acquires read locks and prevents other threads from acquiring write locks.
  • If you are the owner of the code, keep track of the read lock count via the lock.getReadLockCount() method
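As an illustration of the second recommendation, here is a minimal sketch of a wrapper that records which threads currently hold the read lock, next to the counters that ReentrantReadWriteLock already exposes. The TrackedReadWriteLock class, its method names and the bookkeeping map are hypothetical and not part of the JDK. Keying the map by thread name is a simplification that assumes thread names are unique.

```java
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class TrackedReadWriteLock {

    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Thread name -> number of read holds. This bookkeeping is ours, not the JDK's.
    private final ConcurrentMap<String, Integer> readHolders = new ConcurrentHashMap<>();

    void lockRead() {
        lock.readLock().lock();
        readHolders.merge(Thread.currentThread().getName(), 1, Integer::sum);
    }

    void unlockRead() {
        // Unlock first: if this thread holds no read lock, unlock() throws
        // and the bookkeeping below is left untouched.
        lock.readLock().unlock();
        readHolders.computeIfPresent(Thread.currentThread().getName(),
                (name, holds) -> holds == 1 ? null : holds - 1);
    }

    /** Gives access to the underlying lock, e.g. for the write side. */
    ReentrantReadWriteLock delegate() {
        return lock;
    }

    /** One-line snapshot suitable for a log statement or a monitoring thread. */
    String describe() {
        return "readLockCount=" + lock.getReadLockCount()
                + " writeLocked=" + lock.isWriteLocked()
                + " queuedThreads=" + lock.getQueueLength()
                + " readHolders=" + new TreeMap<>(readHolders);
    }

    public static void main(String[] args) {
        TrackedReadWriteLock tracked = new TrackedReadWriteLock();
        tracked.lockRead();
        System.out.println(tracked.describe());
        tracked.unlockRead();
        System.out.println(tracked.describe());
    }
}
```

Logging describe() periodically, or when a write lock attempt takes too long, gives you the read lock owners that the thread dump alone does not show. The extra map update on every read lock is exactly the kind of overhead that was debated for the JDK itself, so this is better suited to troubleshooting builds than to hot paths.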

I’m looking forward to your feedback, especially from individuals with experience of this type of deadlock involving read locks.

Finally, find below a video explaining these findings via the execution and monitoring of our sample Java program.




2 comments:

"Now if we replace the read lock (execution pattern #2) in our program by a write lock" wasn't this supposed to be "[...] (execution pattern #1) [...]"?

Thanks anonymous for pointing that out, clear typo. I meant to replace the READ lock by a WRITE lock from execution pattern #1.

I just updated the article. Glad to see that you were following it closely.

Regards,
P-H
