Archiv für PostgreSQL

A better CONCAT for PostgreSQL

Concatination is one thing you need quite often when working with databases and especially when you have to generate some reports. And even as it seems to be a simple problem looking for an easy solution it's a pithole. The natural way would be to do something like this:

  1. CREATE TABLE persons (
  2. id        SERIAL      PRIMARY KEY,
  3. firstname TEXT        NOT NULL,
  4. surname   TEXT        NOT NULL,
  5. title     TEXT
  6. );
  7.  
  8. INSERT INTO persons (firstname, surname, title)
  9. VALUES ('Edward', 'Hyde', NULL);
  10. INSERT INTO persons (firstname, surname, title)
  11. VALUES ('Henry', 'Jekyll', 'Dr.');
  12.  
  13. SELECT * FROM persons;
  14.  
  15. id | firstname | surname | title
  16. ----+-----------+---------+-------
  17. 1 | Edward    | Hyde    |
  18. 2 | Henry     | Jekyll  | Dr.
  19. (2 rows)
  20.  
  21. SELECT title || ' ' || firstname || ' ' || surname FROM persons;
  22.  
  23. ?COLUMN?
  24. ------------------
  25.  
  26. Dr. Henry Jekyll
  27. (2 rows)

As you see, the row containing the NULL value is omitted. So how to fix that? Of course we might do something like using COALESCE to return at least an empty string instead of a NULL value, but this way we'd get a whitespace we might not want (on the web not that bad because normaly the browser renders multiple whitespace as one).

So, if you research this issue a bit via google you'll find many solutions. From PL/pgSQL CONCAT functions with might handle two, three or four arguments (CONCAT2, CONCAT3, you get the picture) to solutions using custom aggregate functions. But most of them don't care about NULL values and will break.

Let's try this with a simple CONCAT function:

  1. CREATE OR REPLACE FUNCTION concat(text, text, text) RETURNS text AS $$
  2. SELECT $1 || ' ' || $2 || ' ' || $3;
  3. $$ LANGUAGE 'sql';
  4.  
  5. SELECT CONCAT(title, firstname, surname) FROM persons;
  6. concat
  7. ----------------
  8.  
  9. Dr. Henry Jekyll
  10. (2 rows)

Still, the same problem with the row containing the NULL value.

While looking into this, I ran over the buildin function ARRAY_TO_STRING which leads the path to a very straight forward solution:

  1. SELECT ARRAY_TO_STRING(ARRAY[title, firstname, surname], ' ') FROM persons;
  2.  
  3. array_to_string
  4. ------------------
  5. Edward Hyde
  6. Dr. Henry Jekyll
  7. (2 rows)

Problem solved!

PostgreSQL: TO_ASCII & UTF8

In the process of fixing our code for an upcoming upgrade of one database version for one of our $-projects I encountered a strange behaviour. Initiual situation:

  • we're moving from PostgreSQL 8.1.3 to the 8.3.5
  • we're moving from database encoding LATIN1 to UTF8
  • in our code we're using the TO_ASCII function a few times.

And this combination produces some headaches.

But first a gentle introduction to the TO_ASCII function. It converts any given text into it's ASCII representation. Folks bound to languages with german umlauts or some kind of apostrophes encounter many problems. For example: what should you do if you have to build some kind of index based on the first character of the lastname. Certainly you don't want to have an extra entry with 'Ü', instead you want to but them into the 'U' list. Grand entrance TO_ASCII:

  1. SELECT to_ascii('Übermeier');
  2. Ubermeie

Works like a charm. Caveat: TO_ASCII only supports LATIN1, LATIN2, LATIN9 and WIN1250 encodings but no UTF8.

Okay, the first guess would be to do something like this:

  1. SELECT to_ascii(convert_to('Übermeier', 'latin1'));
  2. ERROR:  FUNCTION to_ascii(bytea) does NOT exist

Bummer. CONVERT_TO returnes BYTEA, TO_ASCII only wants TEXT.

There has been some discussion going on on the pgsql.hackers mailinglist and frankly I can follow both parties in their point of view. But thanks to Pavel Stehule we have some kind of a hack to sidestep this issue:

  1. CREATE FUNCTION to_ascii(bytea, name)
  2. RETURNS text STRICT AS 'to_ascii_encname' LANGUAGE internal;

This version gladly accepts the BYTEA data returned by CONVERT_TO so we can just use it in this way:

  1. SELECT to_ascii(convert_to('Übermeier', 'latin1'), 'latin1');
  2. Ubermeie

Problem solved.

Edit: Added fix by eMerzh. Thanks!