Sunday, January 25, 2009

Classloaders Keeping Jar Files Open

If you write code that creates classloaders, you need to know about this bug:

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=5041014

It is very insidious, and I just ran into it myself in some code.

You normally only have to worry about this if you write code that creates and destroys classloaders. For example, you may have a pluggable architecture where each pluggable component lives in a jar file and gets its own classloader, and you want that component to be hot-deployable; that is, you want to be able to overwrite or modify its jar file with updated code. In Jopr's case, this happens on the agent: each "product plugin" (e.g. the JBossAS plugin, the Postgres plugin, etc.) has its own classloader, managed separately and kept independent of the other plugin classloaders (there is a dependency model in place, but ignore that for this discussion).
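To make that architecture concrete, here is a minimal sketch of one URLClassLoader per plugin, where redeploying a plugin swaps in a fresh loader. PluginRegistry and its methods are hypothetical names, not Jopr's actual code. The comment in deploy() marks the spot where, on a Java 6 VM, the old loader's jar files stay open.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Jopr's actual code: one URLClassLoader per plugin.
class PluginRegistry {
    private final Map<String, URLClassLoader> loaders = new HashMap<String, URLClassLoader>();
    private final ClassLoader parent;

    /** Every plugin loader shares this parent but never sees another plugin's classes. */
    PluginRegistry(ClassLoader parent) {
        this.parent = parent;
    }

    /**
     * Deploys a plugin, or hot-redeploys it if the name is already registered.
     * pluginUrl would normally point at the plugin's jar file.
     */
    synchronized URLClassLoader deploy(String pluginName, URL pluginUrl) {
        URLClassLoader fresh = new URLClassLoader(new URL[] { pluginUrl }, parent);
        // Any previous loader for this plugin is simply dropped here. On a Java 6
        // VM there is no API to make it release the jar files it opened, so the
        // old jar stays open (and, on Windows, locked) even though nothing
        // references that loader any more.
        loaders.put(pluginName, fresh);
        return fresh;
    }

    /** Returns the current loader for a plugin, or null if it was never deployed. */
    synchronized URLClassLoader loaderFor(String pluginName) {
        return loaders.get(pluginName);
    }
}
```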

Well, this VM bug is bad: it seems that any time a classloader loads a jar file, that jar file's file descriptor remains open for the lifetime of the VM (in other words, the classloader never calls JarFile.close() on any of the jar files it previously streamed content from). At least, that's what the bug report implies and what I saw while debugging this. Timothy Quinn has a nifty tool that he used to track issues in Glassfish, but it is useful for tracking this kind of problem in any application, not just Glassfish; in fact, I used it to debug the issue in the Jopr agent.

This bug manifested itself in the Jopr agent when hot-deploying agent plugins on Windows (Windows has the "feature" of not letting you manipulate files that are locked by another process). I suspect similar issues will occur on UNIX: even though UNIX doesn't do the file locking that Windows does, the file descriptors are still open, and copying a file with the same name over the opened file will probably just create a second file descriptor.
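Timothy Quinn's tool is not reproduced here. As a rough stand-in that works on Linux only (an assumption: it depends on the /proc filesystem and reports nothing on Windows or macOS), a JVM can list its own open file descriptors through /proc/self/fd and pick out the ones pointing at .jar files. Linux tags an open file that has since been deleted or replaced with a " (deleted)" suffix, which gives one way to check the UNIX speculation above. OpenJarLister is a hypothetical name.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Linux-only sketch (assumes a /proc filesystem); not the tool mentioned above.
class OpenJarLister {

    /** Returns the targets of this JVM's open file descriptors that look like jar files. */
    static List<String> openJars() throws IOException {
        List<String> jars = new ArrayList<String>();
        Path fdDir = Paths.get("/proc/self/fd");
        if (!Files.isDirectory(fdDir)) {
            return jars; // no /proc (e.g. Windows, macOS): nothing to report
        }
        try (DirectoryStream<Path> fds = Files.newDirectoryStream(fdDir)) {
            for (Path fd : fds) {
                String target;
                try {
                    // Each entry is a symlink naming the file the descriptor refers to.
                    target = Files.readSymbolicLink(fd).toString();
                } catch (IOException e) {
                    continue; // descriptor was closed while we were listing
                }
                // Linux appends " (deleted)" when the open file has since been unlinked.
                if (target.endsWith(".jar") || target.endsWith(".jar (deleted)")) {
                    jars.add(target);
                }
            }
        }
        return jars;
    }

    public static void main(String[] args) throws IOException {
        for (String jar : openJars()) {
            System.out.println(jar);
        }
    }
}
```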

The worst part about this is that there is no real workaround. The Jopr agent has its own classloader implementation; it is very basic and extends java.net.URLClassLoader to reuse most of its functionality. But the Java classloader API has no public, protected or package-scoped method or data field that you can override or access within URLClassLoader to help work around the problem.

The actual fix is simple: when you know you are done with a classloader, have that classloader close all the .jar files it previously opened. Alas, there is no "close" type method on the classloader object; there is absolutely no way to tell a classloader "I am done with you, clean up any resources you have open".
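For readers on a later JDK: Java 7 added exactly this. URLClassLoader implements java.io.Closeable, and its close() method closes the files the loader opened for its jar: and file: URLs (this does not help on the Java 6 VMs this post is about). The sketch below uses hypothetical class and resource names: it builds a throwaway jar, reads a resource through a loader, closes the loader, and then deletes the jar, which is the step that fails on Windows while a loader still holds the jar open.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;

// Requires Java 7 or later (URLClassLoader.close(), try-with-resources, NIO.2).
class CloseableLoaderDemo {

    /** Demo helper (hypothetical): writes a jar holding one text entry. */
    static void writeJar(Path jar, String entryName, String content) throws IOException {
        try (OutputStream file = Files.newOutputStream(jar);
             JarOutputStream out = new JarOutputStream(file)) {
            out.putNextEntry(new JarEntry(entryName));
            out.write(content.getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
        }
    }

    /** Reads a text resource through a fresh loader, closes the loader, then deletes the jar. */
    static String readThenRelease(Path jar, String entryName) throws IOException {
        String content;
        // try-with-resources closes the stream first, then the loader;
        // URLClassLoader.close() releases the jar files the loader opened.
        try (URLClassLoader loader = new URLClassLoader(new URL[] { jar.toUri().toURL() }, null);
             InputStream in = loader.getResourceAsStream(entryName)) {
            if (in == null) {
                throw new IOException("missing resource " + entryName);
            }
            content = new String(readAll(in), StandardCharsets.UTF_8);
        }
        // Nothing holds the jar open any more, so this succeeds on Windows too.
        Files.delete(jar);
        return content;
    }

    private static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Path jar = Files.createTempFile("plugin", ".jar");
        writeJar(jar, "hello.txt", "hello from the plugin");
        System.out.println(readThenRelease(jar, "hello.txt"));
        System.out.println("jar still on disk: " + Files.exists(jar));
    }
}
```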

Once a classloader opens a jar file, that jar file's file descriptor remains open for the lifetime of the VM. I find this completely unacceptable; this is clearly a design flaw that slipped through the cracks when the Java API was conceived and implemented. To support hot-deployable Java code, you need to destroy and recreate classloaders, and the current Java implementation does not make that easy (requiring people to write their own classloader implementations from scratch hardly counts as "easy to do", and doesn't that defeat the purpose of OO and code reuse anyway?).

So, how do you support hot-deployable code and not see this bug? There are two main ways to do this as I see it:

1) write your own classloader implementation that allows you to close the open file descriptors when the classloader is no longer needed
2) copy the jar files that a classloader needs to a temporary location and put the temporary jars in the classloader (NOT the original jar files). When you need to hot-deploy an updated jar file, simply copy that new jar to a new temporary location, throw away the old classloader (which still has a file descriptor open, but it's on the old temporary jar file) and create a new classloader that opens the new temporary jar file. This sucks because if you hot-deploy frequently, you may run into the limit on the number of open file descriptors (plus, on Windows, you can't delete the old temporary jar files until your VM exits).
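Workaround #2 can be sketched as follows (ShadowCopyLoaders and its helpers are hypothetical names, and the sketch assumes Java 7+ for NIO.2). Each deployment copies the jar to a new, uniquely named temporary file and builds the URLClassLoader over that copy, so the original jar is never opened by a classloader and can be overwritten by the next hot-deploy.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;

// Sketch of workaround #2; all names are hypothetical.
class ShadowCopyLoaders {

    /** Copies the jar to a new, uniquely named temp file and returns a loader over the copy only. */
    static URLClassLoader loaderForCopyOf(Path originalJar, ClassLoader parent) throws IOException {
        Path copy = Files.createTempFile("shadow-", ".jar");
        Files.copy(originalJar, copy, StandardCopyOption.REPLACE_EXISTING);
        // Best effort: deletion happens at VM shutdown and can still fail on
        // Windows if the copy is open at that point.
        copy.toFile().deleteOnExit();
        return new URLClassLoader(new URL[] { copy.toUri().toURL() }, parent);
    }

    /** Demo helper: writes (or overwrites) a jar holding one text entry. */
    static void writeJar(Path jar, String entryName, String content) throws IOException {
        try (OutputStream file = Files.newOutputStream(jar);
             JarOutputStream out = new JarOutputStream(file)) {
            out.putNextEntry(new JarEntry(entryName));
            out.write(content.getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
        }
    }

    /** Demo helper: reads a text resource through the given loader. */
    static String readEntry(ClassLoader loader, String name) throws IOException {
        try (InputStream in = loader.getResourceAsStream(name)) {
            if (in == null) {
                throw new IOException("missing resource " + name);
            }
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return new String(buf.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        Path plugin = Files.createTempFile("plugin", ".jar");
        writeJar(plugin, "version.txt", "v1");
        URLClassLoader oldLoader = loaderForCopyOf(plugin, null);
        System.out.println(readEntry(oldLoader, "version.txt"));

        // Hot-deploy: overwrite the original jar. Only the copy is held open,
        // so this works even on Windows.
        writeJar(plugin, "version.txt", "v2");
        URLClassLoader newLoader = loaderForCopyOf(plugin, null);
        System.out.println(readEntry(newLoader, "version.txt"));
        // oldLoader is now discarded; its copy may stay open, as noted above.
    }
}
```

Because createTempFile picks a unique name every time, a new copy never collides with an older copy that a discarded loader may still hold open; the cost, as described above, is one more open descriptor per deployment.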

Anyway, here is some code you can use to "work around" this issue. It is a major hack: it only works if you are running on a SUN VM, and because it relies on the implementation of internal SUN classes, it may break in the future should SUN decide to change how those classes are implemented. (The good thing about this code is that it has no compile-time dependencies on any SUN-specific classes.) I tested this code on SUN's Java 6 JRE.

This method belongs in your own classloader that extends URLClassLoader. It uses reflection to walk the currently opened jar files, found through a private data member (URLClassLoader.ucp.loaders) of the classloader you want to discard, and closes each one. After running this code, I verified that no jar files were left open.


public void close() {
    try {
        // URLClassLoader keeps its class path in the private field "ucp"
        // (a sun.misc.URLClassPath on SUN's Java 6 VM).
        Class<?> clazz = java.net.URLClassLoader.class;
        java.lang.reflect.Field ucp = clazz.getDeclaredField("ucp");
        ucp.setAccessible(true);
        Object sun_misc_URLClassPath = ucp.get(this);

        // URLClassPath.loaders holds one loader per class path entry; the
        // ones for jar files are sun.misc.URLClassPath$JarLoader instances.
        java.lang.reflect.Field loaders =
            sun_misc_URLClassPath.getClass().getDeclaredField("loaders");
        loaders.setAccessible(true);
        Object java_util_Collection = loaders.get(sun_misc_URLClassPath);

        for (Object sun_misc_URLClassPath_JarLoader :
                ((java.util.Collection<?>) java_util_Collection).toArray()) {
            try {
                // Each JarLoader keeps its open java.util.jar.JarFile in the field "jar".
                java.lang.reflect.Field loader =
                    sun_misc_URLClassPath_JarLoader.getClass().getDeclaredField("jar");
                loader.setAccessible(true);
                Object java_util_jar_JarFile =
                    loader.get(sun_misc_URLClassPath_JarLoader);
                ((java.util.jar.JarFile) java_util_jar_JarFile).close();
            } catch (Throwable t) {
                // if we got this far, this is probably not a JAR loader so skip it
            }
        }
    } catch (Throwable t) {
        // probably not a SUN VM, or a newer JDK whose internals differ or
        // refuse reflective access; in that case this method does nothing
    }
}



If you happen to be using JNI (native libraries), you might also have to play games like the above to unload the native libraries too (the same caveats apply, since this also relies on SUN's implementation code). You can add this code to the close() method above, inside its outer try block after the jar-closing loop, so that the reflection exceptions it can throw are caught:



// now do native libraries
// ClassLoader keeps the native libraries it loaded in the private field
// "nativeLibraries" (a Vector of ClassLoader$NativeLibrary on SUN's Java 6 VM);
// calling each NativeLibrary's finalize() method unloads that library.
clazz = ClassLoader.class;
java.lang.reflect.Field nativeLibraries = clazz.getDeclaredField("nativeLibraries");
nativeLibraries.setAccessible(true);
java.util.Vector<?> java_lang_ClassLoader_NativeLibrary =
    (java.util.Vector<?>) nativeLibraries.get(this);
for (Object lib : java_lang_ClassLoader_NativeLibrary) {
    java.lang.reflect.Method finalize =
        lib.getClass().getDeclaredMethod("finalize", new Class[0]);
    finalize.setAccessible(true);
    finalize.invoke(lib, new Object[0]);
}



But even if you do this, I'm still not sure everything will work due to yet more SUN VM bugs (well, I think these are all basically the same bug):

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4299094
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4642062
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4286309

In the end, the Jopr agent didn't really need to do the above. I found that the Jopr agent code was unnecessarily creating temporary classloaders, which were locking the plugin jars. Once I stopped creating those unnecessary classloaders, agent hot-deployment worked just fine because the plugin jars no longer got locked. For the record, the Jopr agent uses method #2 as described above to do its hot-deployment.

8 comments:

  1. Is this something we should be worried about with our current JBoss MC CL impl?

  2. I don't think so. Scott and Adrian knew about this bug 3 years ago (see https://jira.jboss.org/jira/browse/JBAS-766 ), so I have to believe they designed the new deployers with this problem in mind. But it might not be a bad idea to ping them anyway :)

  3. Actually, it's not the deployers. ;-)
    It's VFS and CL; the deployers just use/delegate to them (see http://www.jboss.org/community/docs/DOC-13267).

    I'm asking since it's mine to worry about (I'm leading the MC ;-), and I only ping them in 'emergency' cases.

  4. Ales - I'm not familiar with the new JBoss MC code base, but it should be very easy to know if this problem will affect it - see if there are any custom classloaders that extend URLClassLoader OR if there is any place in the code that instantiates and uses an instance of URLClassLoader (or one of its subclasses). If there is anyplace that does this, you will have a problem.

  5. "if there are any custom classloaders that extend URLClassLoader OR if there is any place in the code that instantiates and uses an instance of URLClassLoader (or one of its subclasses). If there is anyplace that does this, you will have a problem"

    That's exactly what we fixed. ;-)

  6. Way to finally get to the bottom of the plugin reload issue! I like the hack...I mean workaround :-)

  7. I have observed this problem while working with EAP 6.1 alpha and trying to deploy/undeploy a JBoss module through the CLI. If I load any class that is part of my JBoss module and later try to undeploy the module, the undeploy doesn't work.
