Monday, October 17, 2005

java NIO allocate vs allocateDirect

Java NIO (new IO) came out with JDK 1.4.

As of now there is only one book out there that deals with it :-
http://www.amazon.com/exec/obidos/tg/detail/-/0596002882/104-4162257-7590320?v=glance

I wanted to have a look at some of the new stuff, and looked at some material on the net.

Buffers are to primitive types what collections are to Objects.

What interests me is that the new java.nio ByteBuffer class has two ways to create a new ByteBuffer, i.e. the static methods allocate and allocateDirect.

As I understand it, allocateDirect allocates the buffer directly in OS memory and not on the JVM heap. Its primary advantage is that large files or other huge primitive-type data do not use up JVM heap memory.
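To make the two factory methods concrete, here is a minimal sketch. The class name BufferKinds and the helper kind() are my own illustration; only allocate, allocateDirect and isDirect come from the ByteBuffer API.

```java
import java.nio.ByteBuffer;

final class BufferKinds {
    /** Describes where a buffer lives: "direct" (outside the Java heap) or "heap". */
    static String kind(ByteBuffer buf) {
        return buf.isDirect() ? "direct" : "heap";
    }

    public static void main(String[] args) {
        ByteBuffer heap = ByteBuffer.allocate(1024);         // non-direct buffer
        ByteBuffer direct = ByteBuffer.allocateDirect(1024); // direct buffer
        System.out.println("allocate       -> " + kind(heap));
        System.out.println("allocateDirect -> " + kind(direct));
    }
}
```

Both buffers are used through the same ByteBuffer methods afterwards; only the place where the bytes live differs.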

Here are the common methods used by the code below:-
// needs: import java.nio.Buffer; import java.nio.ByteBuffer; import java.util.Date;

// Prints a sequence label followed by the current time.
static void logTime(String seq){
    System.out.println(seq + " : " + new Date());
}

// Prints the capacity, limit and position of a buffer.
static void showBufferProperties(Buffer buf, String name){
    System.out.println("Buffer Properties for " + name + "\n capacity=" + buf.capacity() + " limit=" + buf.limit() + " position=" + buf.position());
}



I tried out the following :-


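logTime("1");
ByteBuffer buf5 = ByteBuffer.allocate(300000000);
logTime("1a");
// The loop bound was lost from this listing; 3000000 is inferred from the
// position reported in the direct-buffer run (60000000 / 20 bytes per pass).
for(int i = 0; i < 3000000; i++){
    buf5.putDouble(1.0/3.0).
         putFloat((float)(1.0/6.0)).
         putLong(Long.MAX_VALUE);
}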
showBufferProperties(buf5,"buf5");
logTime("2");

And got this result.


1 : Mon Oct 17 10:20:19 BST 2005
java.lang.OutOfMemoryError
Exception in thread "main"



Then I tried this :-


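logTime("3");
ByteBuffer buf6 = ByteBuffer.allocateDirect(300000000);
logTime("3a");
// The loop bound was lost from this listing; 3000000 is inferred from the
// position reported below (60000000 / 20 bytes per pass).
for(int i = 0; i < 3000000; i++){
    buf6.putDouble(1.0/3.0).
         putFloat((float)(1.0/6.0)).
         putLong(Long.MAX_VALUE);
}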
showBufferProperties(buf6,"buf6");
logTime("4");


And got this result:-


3 : Mon Oct 17 10:22:53 BST 2005
3a : Mon Oct 17 10:23:00 BST 2005
Buffer Properties for buf6
capacity=300000000 limit=300000000 position=60000000
4 : Mon Oct 17 10:23:09 BST 2005
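The reported position follows from the sizes of the values written: each pass puts a double (8 bytes), a float (4 bytes) and a long (8 bytes), so the position moves 20 bytes per pass. A scaled-down sketch of the same loop, where FillDemo and fill() are names I made up for the illustration:

```java
import java.nio.ByteBuffer;

final class FillDemo {
    /** Writes `rounds` triples of (double, float, long) and returns the resulting position. */
    static int fill(ByteBuffer buf, int rounds) {
        for (int i = 0; i < rounds; i++) {
            buf.putDouble(1.0 / 3.0)
               .putFloat((float) (1.0 / 6.0))
               .putLong(Long.MAX_VALUE);
        }
        return buf.position();
    }

    public static void main(String[] args) {
        // 3 passes * 20 bytes = position 60, in a 100-byte direct buffer
        System.out.println("position=" + fill(ByteBuffer.allocateDirect(100), 3));
    }
}
```

Writing past the capacity would throw a BufferOverflowException, so the number of passes has to stay at or below capacity / 20.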




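So clearly for huge files allocateDirect works well compared to allocate, which gives an out of memory error. The 300 MB requested does not fit in the JVM heap as configured (the heap limit can be raised with -Xmx), whereas the direct buffer is allocated outside the heap.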


Here is what the Javadoc says :-


Direct vs. non-direct buffers


A byte buffer is either direct or non-direct. Given a direct byte buffer, the Java virtual machine will make a best effort to perform native I/O operations directly upon it. That is, it will attempt to avoid copying the buffer's content to (or from) an intermediate buffer before (or after) each invocation of one of the underlying operating system's native I/O operations.
A direct byte buffer may be created by invoking the allocateDirect factory method of this class. The buffers returned by this method typically have somewhat higher allocation and deallocation costs than non-direct buffers. It is therefore recommended that such buffers be used primarily for large, long-lived buffers that are subject to the underlying system's native I/O operations.

A direct byte buffer may also be created by mapping a region of a file directly into memory. An implementation of the Java platform may optionally support the creation of direct byte buffers from native code via JNI. If an instance of one of these kinds of buffers refers to an inaccessible region of memory then an attempt to access that region will either return an arbitrary value, have no visible effect, or cause an unspecified exception to be thrown.

Whether a byte buffer is direct or non-direct may be determined by invoking its isDirect method.
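The Javadoc also mentions creating a direct buffer by mapping a file into memory. A minimal sketch of that route, using FileChannel.map; the class MapDemo, the helper map() and the scratch file are assumptions of mine for the illustration:

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

final class MapDemo {
    /** Maps the first `size` bytes of `file` in read-write mode. */
    static MappedByteBuffer map(File file, int size) throws IOException {
        RandomAccessFile raf = new RandomAccessFile(file, "rw");
        try {
            FileChannel channel = raf.getChannel();
            // A mapping remains valid after the channel that created it is closed.
            return channel.map(FileChannel.MapMode.READ_WRITE, 0, size);
        } finally {
            raf.close();
        }
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("mapdemo", ".bin"); // hypothetical scratch file
        tmp.deleteOnExit();
        MappedByteBuffer buf = map(tmp, 64);
        buf.putLong(0, 42L);
        System.out.println("isDirect=" + buf.isDirect() + " value=" + buf.getLong(0));
    }
}
```

The mapped buffer is read and written like any other ByteBuffer, with the OS taking care of moving the bytes to and from the file.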


So it is as good as reading or writing system memory directly using JNI and then wrapping it with Java code. The additional advantage is that it does not use the JVM heap.



I can also see the advantage of using this when an application has many smaller files open at the same time, which could use up the JVM heap very fast. The JVM heap is meant for objects; using it up for reading or writing files would not be so good.

Also I guess it is better if we use the OS for what it does best, i.e. handling files.



I can see that this new ByteBuffer class will go a long way to help with the above.

Also the other advantage v/s doing the same with JNI is that this is platform neutral. Just say allocateDirect and the java code will use the OS memory. JNI code might have to be coded for each OS on which you will deploy the application.

Note:-

It seems that for non-direct buffers the hasArray method returns true, meaning that the buffer does have an accessible array behind it, and for direct buffers hasArray returns false, that is, there is no backing array. Where there is one, the array can be retrieved by invoking the array() method of the ByteBuffer class.
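A small check makes it easy to see which kind of buffer exposes a backing array on a given JVM. This is only a sketch; BackingArrayDemo and backingArrayLength() are names I invented, and the -1 return value is just my convention for "no array".

```java
import java.nio.ByteBuffer;

final class BackingArrayDemo {
    /** Returns the backing array's length, or -1 when the buffer exposes no array. */
    static int backingArrayLength(ByteBuffer buf) {
        // Calling array() without checking hasArray() first can throw
        // UnsupportedOperationException or ReadOnlyBufferException.
        return buf.hasArray() ? buf.array().length : -1;
    }

    public static void main(String[] args) {
        System.out.println("allocate:       " + backingArrayLength(ByteBuffer.allocate(8)));
        System.out.println("allocateDirect: " + backingArrayLength(ByteBuffer.allocateDirect(8)));
    }
}
```

On the Sun JVM the heap buffer reports its array and the direct buffer does not. A read-only view of a heap buffer also reports no accessible array, so hasArray is worth checking before every call to array().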

Thursday, October 13, 2005

JDK 1.4 VS Log4J Logging

JDK 1.4 has a java.util.logging package. It is based on JSR047.

http://www.swzoo.org/documents/miscellaneous/jsr047/


I just tried to have a look and see how it is different from Log4j, which I have used over the years.

As per ..\j2sdk1.4.2_03\docs\api\java\util\logging\Level.html it has the following levels :-

"
The levels in descending order are:
SEVERE (highest value)
WARNING
INFO
CONFIG
FINE
FINER
FINEST (lowest value)
In addition there is a level OFF that can be used to turn off logging, and a level ALL that can be used to enable logging of all messages.

"
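To see how these levels act as thresholds, including ALL and OFF, here is a minimal sketch. LevelDemo and wouldLog() are hypothetical names of mine; the Logger and Level calls are from java.util.logging.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class LevelDemo {
    /** Returns true if a logger set to `threshold` would accept a message at `message` level. */
    static boolean wouldLog(Level threshold, Level message) {
        Logger logger = Logger.getAnonymousLogger();
        logger.setLevel(threshold);
        return logger.isLoggable(message);
    }

    public static void main(String[] args) {
        System.out.println("INFO logger, CONFIG message: " + wouldLog(Level.INFO, Level.CONFIG));
        System.out.println("ALL logger, FINEST message:  " + wouldLog(Level.ALL, Level.FINEST));
        System.out.println("OFF logger, SEVERE message:  " + wouldLog(Level.OFF, Level.SEVERE));
    }
}
```

A message passes when its level is at or above the logger's threshold, so ALL lets everything through down to FINEST and OFF silences even SEVERE.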

I guess ALL and OFF can be used as provided, but the question is why one would logically use them.

Also FINE, FINER and FINEST seem a bit ... odd, I would say. What is wrong with log4j's FATAL, ERROR, WARN, INFO and DEBUG?

What is CONFIG anyway?

From the same API page

"
CONFIG
public static final Level CONFIG
CONFIG is a message level for static configuration messages.
CONFIG messages are intended to provide a variety of static configuration information, to assist in debugging problems that may be associated with particular configurations. For example, CONFIG message might include the CPU type, the graphics depth, the GUI look-and-feel, etc. This level is initialized to 700.

"

So there is a level just for configuration-type log messages? Interesting!

Then I found out what a developer of Log4j has to say here :-
http://www.ingrid.org/jajakarta/log4j/jakarta-log4j-1.1.3/docs/critique.html

and also a discussion on theserverside here
http://www.theserverside.com/news/thread.tss?thread_id=7177

What they seem to be saying, in gist, is :-

At first glance, this might look like a mundane implementation detail but it is actually quite fundamental.

Configuration order matters
In JSR47, when you set the level of a logger, say wombat, JSR47 traverses the tree below wombat. In other words, the levels for all the loggers descending from wombat are overwritten. This can be a very expensive operation for large trees. In particular, for the most common case where one sets the level of the root logger. However, performance is not the point I am trying to make.

In log4j, changing the priority of a category involves the change of a single field. Children categories dynamically inherit the priority of their parent by traversing the hierarchy tree upwards.

It follows that with JSR47 if you configure the level for logger "foo.bar1" before configuring the level for "foo", then the latter instruction will overwrite the first exactly as if the first instruction for configuring "foo.bar1" had never existed. Configuration order dependence is not a show stopper but it is something that will bite you time and again.

In contrast, in log4j categories can be configured in any order. You never have to worry about configuration order.
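The order dependence described above is easy to test against whichever java.util.logging your JDK ships. A minimal sketch, reusing the logger names from the example (foo.bar2 is an extra name I added for comparison): it sets the child's level first and the parent's second. The Logger.setLevel documentation says a null level means the logger inherits from its nearest ancestor, so I would expect only loggers without a level of their own to follow the parent, but run it rather than take my word for it.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class OrderCheck {
    // Keep strong references so the loggers are not garbage collected mid-check.
    static final Logger CHILD = Logger.getLogger("foo.bar1");
    static final Logger SIBLING = Logger.getLogger("foo.bar2"); // never given its own level
    static final Logger PARENT = Logger.getLogger("foo");

    /** Configures the child first and the parent second, as in the critique's example. */
    static void configureChildThenParent() {
        CHILD.setLevel(Level.FINE);
        PARENT.setLevel(Level.SEVERE);
    }

    public static void main(String[] args) {
        configureChildThenParent();
        System.out.println("foo.bar1 logs FINE: " + CHILD.isLoggable(Level.FINE));
        System.out.println("foo.bar2 logs FINE: " + SIBLING.isLoggable(Level.FINE));
    }
}
```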

Limited inheritance
In JSR47, a logger does not walk the hierarchy to inherit its level but possesses a copy of it.
Unfortunately, in the JSR47 API, handlers cannot be inherited because it would be prohibitively expensive to let each logger contain a distinct Vector of all inherited handlers, especially in large trees.

To circumvent this problem, JSR47 defines global handlers. A logger logs to global handlers and to the handlers attached to itself directly. It does not inherit any handlers from the hierarchy.

In log4j, appenders are inherited additively from the hierarchy. A category will log to the appenders attached to itself as well as the appenders attached to its ancestors. This might not seem like much until the day you need handler inheritance; probably a week after you decide to adopt a logging API.

Similarly, in log4j resource bundles are inherited from the hierarchy. In JSR47, a resource bundle must be attached to each logger individually. There is no resource bundle inheritance in JSR47. In practice, this means that you have to choose between internationalization and the benefits of the named logger hierarchy. It's one or the other. This limitation is particularly surprising because support for internationalization is advocated as one of the primary advantages of the JSR47 API.

Limited functionality
Log4j has appenders capable of logging to the console, to files, to Unix Syslog daemons, to Microsoft NT EventLoggers, remote servers, to JMS channels, automatically generate email etc. It can roll log files by size or date and log asynchronously.

JSR47 can log to the console, to files, to sockets and to a memory buffer.

Log4j has an extensible and powerful layout called the PatternLayout. JSR47 offers the much weaker SimpleFormatter as an alternative.
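For comparison, JSR47 does let you plug in your own Formatter subclass. A minimal sketch, assuming a one-line format of my own choosing; OneLineFormatter is a hypothetical name, and this is in no way an equivalent of PatternLayout.

```java
import java.util.logging.Formatter;
import java.util.logging.Level;
import java.util.logging.LogRecord;

final class OneLineFormatter extends Formatter {
    /** Formats a record as "LEVEL [logger] message" on a single line. */
    public String format(LogRecord record) {
        // formatMessage() applies localization and {0}-style parameters.
        return record.getLevel() + " [" + record.getLoggerName() + "] "
                + formatMessage(record) + "\n";
    }

    public static void main(String[] args) {
        LogRecord record = new LogRecord(Level.INFO, "hello {0}");
        record.setLoggerName("demo");
        record.setParameters(new Object[] { "world" });
        System.out.print(new OneLineFormatter().format(record));
    }
}
```

It would be attached to a handler with handler.setFormatter(new OneLineFormatter()). The layout is fixed in code, whereas PatternLayout takes its pattern from configuration.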

Log4j supports configuration through property files as well as XML documents. JSR47 currently admits only property files. Moreover, the language of JSR47 configuration files is very weak. In particular, you can only configure one instance of a given handler class. This means that you can log to just one file at a time.

Error Handling
In JSR 47 when an error occurs then either a RuntimeException is thrown to the user or (in handlers only) an internal field is set. In the first case, the RuntimeException will cause your application to crash. In the latter case, you can retrieve the last caught exception in the handler by querying the getException method of that handler. The former is totally unacceptable while the latter is impractical.

In log4j, under no circumstances are exceptions thrown at the user. However, all appenders have an associated ErrorHandler. This ErrorHandler is invoked by the appender whenever a handler-specific error occurs. By default, log4j appenders are associated with an OnlyOnceErrorHandler which emits a message on the console for the first error in an appender and ignores all following errors.

An ErrorHandler can implement an arbitrary error handling policy. For example, after a failure to write to a database a JDBCAppender can be redirected to fall back on a FileAppender. This functionality is supported in XML configuration files. You do not need to change a single line of client code.

But again who cares about errors, right?

Performance
Logging performance must be studied in three distinct cases: when logging is turned off, when turned on but due to priority comparison logic not enabled, and when actually logging. Please refer to the log4j manual for a more detailed discussion of logging performance.

When logging is turned on, log4j will be about three times slower to decide whether a log statement is enabled or not. This is due to the dynamic nature of log4j which requires it to walk the hierarchy. To give you an idea about the figures involved, we are talking about 90 nanoseconds instead of 30 nanoseconds on an 800 MHz Intel processor. In other words, one million disabled logging requests will cost under a second in both environments.

In a shipped binary, you can turn off logging entirely and both APIs will perform identically. Note that if one is not careful, the cost of parameter construction before invoking a disabled log statement will overwhelm any other performance consideration. Regardless of the API you decide to use, logging statements should never be placed in tight loops, for example, before or after an element swap instruction in a sort algorithm.
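The point about parameter construction applies to either API. A minimal sketch with java.util.logging, where GuardedLogging, expensiveDump() and the call counter are made up for the illustration:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class GuardedLogging {
    static int expensiveCalls = 0; // counts how often the costly message is built

    /** Stands in for any costly message construction. */
    static String expensiveDump() {
        expensiveCalls++;
        return "large state dump";
    }

    /** Builds the message only when FINE is actually enabled for this logger. */
    static void logState(Logger logger) {
        if (logger.isLoggable(Level.FINE)) {
            logger.fine("Current state: " + expensiveDump());
        }
    }

    public static void main(String[] args) {
        Logger logger = Logger.getAnonymousLogger();
        logger.setLevel(Level.INFO);
        logState(logger);
        System.out.println("expensive calls with FINE disabled: " + expensiveCalls);
    }
}
```

Without the isLoggable guard the string concatenation and expensiveDump() would run on every call, even though the message is then thrown away. log4j offers the same kind of guard through isDebugEnabled().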

In log4j, caller localization information is optional whereas in JSR47 it is always extracted. Since the extraction of caller localization is a very slow operation, in the common case where caller information is not needed, log4j will log the same information 4 to 100 times faster.

Lastly as of now the new logging mechanism will work only with JDK 1.4 and above.

Then I looked at the 1.5 API here :-
http://java.sun.com/j2se/1.5.0/docs/api/index.html

The logging levels at least seem to be the same. They seem to have added an interface "LoggingMXBean" http://java.sun.com/j2se/1.5.0/docs/api/java/util/logging/LoggingMXBean.html .

It is described as
"
The management interface for the logging facility.
There is a single global instance of the LoggingMXBean
"

The LogManager class in the java.util.logging package now has a LogManager.getLoggingMXBean() method which returns this interface, used for managing the logging facility.
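A minimal sketch of what managing a logger through this interface could look like. MXBeanDemo, changeLevel() and the logger name demo.mx are my own; getLoggingMXBean, setLoggerLevel and getLoggerLevel are from the 1.5 API.

```java
import java.util.logging.LogManager;
import java.util.logging.Logger;
import java.util.logging.LoggingMXBean;

final class MXBeanDemo {
    /** Sets a logger's level by name through the MXBean and returns the level it then reports. */
    static String changeLevel(Logger logger, String levelName) {
        LoggingMXBean bean = LogManager.getLoggingMXBean();
        // The logger must already exist, otherwise setLoggerLevel throws IllegalArgumentException.
        bean.setLoggerLevel(logger.getName(), levelName);
        return bean.getLoggerLevel(logger.getName());
    }

    public static void main(String[] args) {
        Logger logger = Logger.getLogger("demo.mx"); // hypothetical logger name
        System.out.println("level is now " + changeLevel(logger, "FINE"));
    }
}
```

Because the bean works with plain strings, the same operations can be driven from a JMX console without touching the application code. Newer JDKs mark getLoggingMXBean() as deprecated, so expect a compiler warning there.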

Anyway, as of now I will stick with Log4j.