Dec 11

Solving Problems with International (UTF-8) Data using Hibernate, Oracle, and Tomcat

Tag: hibernate,java,oracle,tomcatpmularien @ 6:19 am

Ran into a very interesting issue the other day relating to storing NLS character data in Oracle via Hibernate, running in the Tomcat servlet container. It turns out the solution is not entirely obvious (or well-documented), so I thought I’d jot the important bits down.

NVARCHAR2 is a data type assigned to a column that is used to store NLS data, most generally multi-lingual (UTF-8 or equivalent). This differs from the typical VARCHAR2 data type, which is usually intended for storing data in the local character set (see this for a weak explanation).

In theory you should not have to do anything different in your JDBC-based application to write to an NVARCHAR2 column vs a VARCHAR2 column, since String type in Java is UTF-8.

However, the evil Oracle JDBC driver rears its ugly head again (if you’ve been working with Oracle JDBC for any length of time, you know what I’m talking about ;) )!

It turns out that the default behavior of the driver is to ignore NVARCHAR2 column metadata and write every *VARCHAR2 column as though it were a VARCHAR2. Oracle’s helpful readme on the use of NCHAR data with their JDBC driver indicates that you simply (1) cast the PreparedStatement you are using to an OraclePreparedStatement, and (2) call a proprietary Oracle API — setFormOfUse — on said object to convert a particular column to NCHAR form, as seen in this code snippet excerpt:

((OraclePreparedStatement) pstmt).setFormOfUse(4,
OraclePreparedStatement.FORM_NCHAR);

That’s great, except that we’re not using JDBC, we’re instead using the magic of Hibernate. Interestingly, the Hibernate forum has a post describing this issue way back from 2003 (along with the always humorous responses from Gavin and Christian). Various suggestions are given, but none is one that I really want to implement (ideally, the application code shouldn’t care about what database it’s talking to, and I don’t particularly care for putting in shims to work around bugs).

Fortunately, the Oracle JDBC NLS readme mentions that you can modify your JDBC connection properties to make all String columns write as NVARCHAR2. For some applications, this would probably not be optimal (the readme mentions potential performance concerns), but for my purposes this was perfect.

The only problem was that I was using Tomcat’s [Apache DBCP] connection pooling. For reasons unknown to me, the Tomcat documentation does not include instructions for passing arbitrary JDBC connection parameters to the DBCP provider. Neither does the DBCP page on the Apache Commons Wiki (and the page is immutable, so I can’t add instructions!).

So… in order to actually get this to work, you need to (simply) add the following to your context configuration file:

<Context path="/blah" reloadable="true" >
<Resource name="blah"
  auth="Container"
  type="javax.sql.DataSource"
  maxActive="20"
  maxIdle="2"
  maxWait="1000"
  username="test"
  password="test"
  driverClassName="oracle.jdbc.driver.OracleDriver"
  url="jdbc:oracle:thin:@127.0.0.1:1521:TEST"
  connectionProperties="defaultNChar=true"
/>
</Context>

Note the “connectionProperties” option enabling default treatment of NCHAR data – this is the magic parameter.

Once you redeploy, the NVARCHAR2 problem will be solved (whew!).

Thanks also to Kieran Shaw for the excellent blog entry on NVARCHAR2, Hibernate, and Oracle issues – between the two of us, I think we have the problem documented enough to help some folks out :)

11 Responses to “Solving Problems with International (UTF-8) Data using Hibernate, Oracle, and Tomcat”

  1. Laxmi says:

    Information helped me . Thanks

  2. Laxmi says:

    hi

    I got small problem. When i have used connectionProperties=”defaultNChar=true” and to view kannada font no changes made at browser then data is getting stored in ುಪ in this format.

    But when i have used connectionProperties=”defaultNChar=true” and to view kannada font i have setencoding of the browser to utf-8 then the data is getting stored in ಿುಮಾರ.

    Can u please help me its urgent

  3. Laxmi says:

    hi

    I got small problem. When i have used connectionProperties=”defaultNChar=true” and to view kannada font no changes made at browser then data is getting stored in ?? (“ುಪ”)in this format.

    But when i have used connectionProperties=”defaultNChar=true” and to view kannada font i have setencoding of the browser to utf-8 then the data is getting stored in ಿುಮಾರ.

    Can u please help me its urgen

  4. Duc Trinh says:

    Minor correction to one of your statements above… Java actually uses UCS2 (not UTF-8) as its native encoding. Starting with JDK 5, Java added UTF-16 supplementary characters support.

  5. Mark says:

    Thank you so much for this blog entry, I have been struggling with this issue for a few days now and, its a little creepy that what you detail is exactly what happened to me. From the hibernate post, to the issues with the Oracle documentation. Its almost like you are describing my last two days of work and research on the issue…

    I would like to confirm that this indeed does work, but what I needed to do was to re-deploy my app in tomcat, simply stopping and starting it did not work as we have external context descriptor files for our app.

    Thanks for the post.

  6. Tino Nitze says:

    Big thanks to you. I promise I soon will be continuing my blog to help others as well :)

  7. Ram says:

    Hi,
    The article seems to hold good for writing the double-byte characters to Oracle database but how about reading? Although it seems no brainer that the concept holds good for reading as well, it doesn’t work for me. I may be wrong somewhere but can anyone help out here?

    This is the code I’m using to set in hibernate.cfg.xml file

    true

    Just setting this alone doesn’t seem to be working. Is there anything else I need to change, perhaps in the Java code when I read the chineese characters from database?

  8. Ram says:

    Hi,
    The article seems to hold good for writing the double-byte characters to Oracle database but how about reading? Although it seems no brainer that the concept holds good for reading as well, it doesn’t work for me. I may be wrong somewhere but can anyone help out here?

    This is the code I’m using to set in hibernate.cfg.xml file

    Set the property tag with name as hibernate.connection.defaultNChar and value as true

    Just setting this alone doesn’t seem to be working. Is there anything else I need to change, perhaps in the Java code when I read the chineese characters from database?

  9. Abhishek says:

    Very nice article , this solved my problem!

  10. Jesper Steen Møller says:

    A word of warning here: If you use the defaultNChar=true setting on the connectionProperties, Oracle will widen all bound parameters to NCHAR / NVARCHAR, which will impact how the query optimizer works.
    In short, if you have a VARCHAR column which is properly index, and you try to look up a value in it, Oracle will not use the index (at least not on 10g, and likely 11). Assume that you have this statement
    select * from SOMETABLE where KEY = :1

    An explain plan will fool you easily, since it will assume that :1 will be cartable to a string, so it should give you the same as explain plan on:

    select * from SOMETABLE where KEY = ‘some key’

    However, with defaultNChar=true, you’ll actually be sending N’some key’ as :1, so you should be explaining this plan instead:

    select * from SOMETABLE where KEY = N’some key’

    That one is likely to miss the index. In many cases, you will not notice this problem before SOMETABLE is of substantial size.

    More info at: http://docs.oracle.com/cd/B28359_01/server.111/b28298/ch7progrunicode.htm#i1008044

  11. Tiago Mesquita de Araujo Cunha says:

    In some cases u should just put directly in a filter the char set to be used, like this: request.setCharacterEncoding(“UTF-8″); response.setCharacterEncoding(“UTF-8″);

    Worked perfectly for me!!!

Leave a Reply

XHTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>