Archive for November, 2008

Note to self about class methods

Don't use them unless you know what you are doing. I recently ran into something strange and spent quite a while fixing the resulting problem with a lot of database mojo.

For the calculation of some MD5 sums I was using Digest::MD5. And for some reason I used it this way:

  fh$ perl -MDigest::MD5 -le 'print Digest::MD5->md5_hex("foo")'
  3200a4cda22a4a935412da8113b4139b

Looks good, doesn't it? But the MD5 digest is incorrect. Just try it yourself: enter 'foo' into any web-based MD5 generator out there.

The right way to use Digest::MD5 (and Digest::SHA1 too) is to import the function and call it as a plain function:

  fh$ perl -M'Digest::MD5 "md5_hex"' -le 'print md5_hex("foo")'
  acbd18db4cc2f85cedef654fccc4a4d8
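The likely reason the first form goes wrong: Perl's method-call syntax passes the class name as an extra first argument, and md5_hex hashes all of its arguments concatenated. So the digest is computed over "Digest::MD5" followed by "foo" rather than over "foo" alone. A minimal sketch of that effect, written in Python with hashlib standing in for Digest::MD5 (the helper name md5_hex_concat is made up for illustration):

```python
import hashlib


def md5_hex_concat(*parts: str) -> str:
    """Hash all arguments concatenated, mimicking how md5_hex treats its argument list."""
    return hashlib.md5("".join(parts).encode("utf-8")).hexdigest()


# Function-style call: only the data is hashed.
plain = md5_hex_concat("foo")

# Method-style call: the class name sneaks in as the first argument.
as_class_method = md5_hex_concat("Digest::MD5", "foo")

print(plain)            # acbd18db4cc2f85cedef654fccc4a4d8
print(as_class_method)  # a different digest, because the input is "Digest::MD5foo"
```

Both calls return a perfectly valid-looking 32-character hex string, which is why the mistake is so easy to miss.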

Okay, my fault. But this annoys me a bit:

  fh$ perl -w -MDigest::MD5 -le 'print Digest::MD5->md5_hex("foo")'
  &Digest::MD5::md5_hex function probably called as class method at -e line 1.
  3200a4cda22a4a935412da8113b4139b
  fh$ perl -Mstrict -Mwarnings -MDigest::MD5 -le \
      'print Digest::MD5->md5_hex("foo")'
  3200a4cda22a4a935412da8113b4139b

To be honest, I have no idea why -w produces a warning (which is a good thing) and use warnings doesn't.


While fixing our code for an upcoming database upgrade in one of our $-projects, I encountered some strange behaviour. Initial situation:

  • we're moving from PostgreSQL 8.1.3 to 8.3.5
  • we're moving from database encoding LATIN1 to UTF8
  • in our code we're using the TO_ASCII function a few times.

And this combination produces some headaches.

But first, a gentle introduction to the TO_ASCII function. It converts any given text into its ASCII representation. Folks working with languages that use German umlauts or other diacritics run into many problems. For example: what should you do if you have to build some kind of index based on the first character of the last name? You certainly don't want an extra entry for 'Ü'; instead, you want to put those names into the 'U' list. Grand entrance, TO_ASCII:

  SELECT to_ascii('Übermeier');
  Ubermeier

Works like a charm. Caveat: TO_ASCII only supports the LATIN1, LATIN2, LATIN9 and WIN1250 encodings, but not UTF8.

Okay, the first guess would be to do something like this:

  SELECT to_ascii(convert_to('Übermeier', 'latin1'));
  ERROR:  function to_ascii(bytea) does not exist

Bummer. CONVERT_TO returns BYTEA, but TO_ASCII only accepts TEXT.

There has been some discussion about this on the pgsql-hackers mailing list, and frankly I can follow both parties' points of view. But thanks to Pavel Stehule we have a hack of sorts to sidestep the issue:

  CREATE FUNCTION to_ascii(bytea, name)
  RETURNS text STRICT AS 'to_ascii_encname' LANGUAGE internal;

This version gladly accepts the BYTEA data returned by CONVERT_TO so we can just use it in this way:

  SELECT to_ascii(convert_to('Übermeier', 'latin1'), 'latin1');
  Ubermeier

Problem solved.
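For the index use case mentioned earlier (filing 'Ü' names under 'U'), the same folding can also be sketched on the application side. This is only an illustration in Python, not part of the database fix. It uses Unicode decomposition via unicodedata.normalize instead of TO_ASCII, the helper names fold_to_ascii and index_letter are hypothetical, and every sample name other than 'Übermeier' is made up:

```python
import unicodedata
from collections import defaultdict


def fold_to_ascii(text: str) -> str:
    """Strip diacritics by decomposing characters and dropping the non-ASCII parts."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def index_letter(last_name: str) -> str:
    """Return the upper-case ASCII letter under which a name is filed."""
    return fold_to_ascii(last_name)[:1].upper()


# Made-up sample data, apart from 'Übermeier'.
names = ["Übermeier", "Ulrich", "Özdemir", "Otto"]

index = defaultdict(list)
for name in names:
    index[index_letter(name)].append(name)

print(dict(index))  # {'U': ['Übermeier', 'Ulrich'], 'O': ['Özdemir', 'Otto']}
```

Characters without a decomposition, such as 'ß', are simply dropped by this approach, so it is not a drop-in replacement for TO_ASCII.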

Edit: Added fix by eMerzh. Thanks!