Dec 11

Solving Problems with International (UTF-8) Data using Hibernate, Oracle, and Tomcat

Tag: hibernate, java, oracle, tomcatpmularien @ 6:19 am

Ran into a very interesting issue the other day relating to storing NLS character data in Oracle via Hibernate, running in the Tomcat servlet container. It turns out the solution is not entirely obvious (or well-documented), so I thought I’d jot the important bits down.

NVARCHAR2 is a data type assigned to a column that is used to store NLS data, most generally multi-lingual (UTF-8 or equivalent). This differs from the typical VARCHAR2 data type, which is usually intended for storing data in the local character set (see this for a weak explanation).

In theory you should not have to do anything different in your JDBC-based application to write to an NVARCHAR2 column vs a VARCHAR2 column, since String type in Java is UTF-8.

However, the evil Oracle JDBC driver rears its ugly head again (if you’ve been working with Oracle JDBC for any length of time, you know what I’m talking about ;) )!

It turns out that the default behavior of the driver is to ignore NVARCHAR2 column metadata and write every *VARCHAR2 column as though it were a VARCHAR2. Oracle’s helpful readme on the use of NCHAR data with their JDBC driver indicates that you simply (1) cast the PreparedStatement you are using to an OraclePreparedStatement, and (2) call a proprietary Oracle API — setFormOfUse — on said object to convert a particular column to NCHAR form, as seen in this code snippet excerpt:

((OraclePreparedStatement) pstmt).setFormOfUse(4,
OraclePreparedStatement.FORM_NCHAR);

That’s great, except that we’re not using JDBC, we’re instead using the magic of Hibernate. Interestingly, the Hibernate forum has a post describing this issue way back from 2003 (along with the always humorous responses from Gavin and Christian). Various suggestions are given, but none is one that I really want to implement (ideally, the application code shouldn’t care about what database it’s talking to, and I don’t particularly care for putting in shims to work around bugs).

Fortunately, the Oracle JDBC NLS readme mentions that you can modify your JDBC connection properties to make all String columns write as NVARCHAR2. For some applications, this would probably not be optimal (the readme mentions potential performance concerns), but for my purposes this was perfect.

The only problem was that I was using Tomcat’s [Apache DBCP] connection pooling. For reasons unknown to me, the Tomcat documentation does not include instructions for passing arbitrary JDBC connection parameters to the DBCP provider. Neither does the DBCP page on the Apache Commons Wiki (and the page is immutable, so I can’t add instructions!).

So… in order to actually get this to work, you need to (simply) add the following to your context configuration file:

<Context path="/blah" reloadable="true" >
<Resource name="blah"
  auth="Container"
  type="javax.sql.DataSource"
  maxActive="20"
  maxIdle="2"
  maxWait="1000"
  username="test"
  password="test"
  driverClassName="oracle.jdbc.driver.OracleDriver"
  url="jdbc:oracle:thin:@127.0.0.1:1521:TEST"
  connectionProperties="defaultNChar=true"
/>
</Context>

Note the “connectionProperties” option enabling default treatment of NCHAR data - this is the magic parameter.

Once you redeploy, the NVARCHAR2 problem will be solved (whew!).

Thanks also to Kieran Shaw for the excellent blog entry on NVARCHAR2, Hibernate, and Oracle issues - between the two of us, I think we have the problem documented enough to help some folks out :)

Leave a Reply

XHTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <code> <em> <i> <strike> <strong>