<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Florian Helmberger's blog &#187; UTF8</title>
	<atom:link href="http://www.laudatio.com/wordpress/tag/utf8/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.laudatio.com/wordpress</link>
	<description>laudatio.com</description>
	<lastBuildDate>Mon, 23 Nov 2009 21:07:16 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=abc</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>PostgreSQL: TO_ASCII &amp; UTF8</title>
		<link>http://www.laudatio.com/wordpress/2008/11/05/postgresql-83-to_ascii-utf8/</link>
		<comments>http://www.laudatio.com/wordpress/2008/11/05/postgresql-83-to_ascii-utf8/#comments</comments>
		<pubDate>Wed, 05 Nov 2008 19:11:29 +0000</pubDate>
		<dc:creator>fh</dc:creator>
				<category><![CDATA[PostgreSQL]]></category>
		<category><![CDATA[UTF8]]></category>

		<guid isPermaLink="false">http://www.laudatio.com/wordpress/?p=62</guid>
		<description><![CDATA[While fixing our code for an upcoming database upgrade in one of our $-projects, I encountered some strange behaviour. Initial situation:

we're moving from PostgreSQL 8.1.3 to 8.3.5
we're moving from database encoding LATIN1 to UTF8
in our code we're using the TO_ASCII function a few times.

And this combination produces some [...]]]></description>
			<content:encoded><![CDATA[<p>While fixing our code for an upcoming database upgrade in one of our $-projects, I encountered some strange behaviour. Initial situation:</p>
<ul>
<li>we're moving from PostgreSQL 8.1.3 to 8.3.5</li>
<li>we're moving from database encoding LATIN1 to UTF8</li>
<li>in our code we're using the TO_ASCII function a few times.</li>
</ul>
<p>And this combination produces some headaches.</p>
<p>But first, a gentle introduction to the TO_ASCII function. It converts a given text to its ASCII representation. Folks working with languages that use German umlauts or other diacritical marks run into many problems. For example: what should you do if you have to build an index based on the first character of the last name? You certainly don't want an extra entry for 'Ü'; instead you want to put those names into the 'U' list. Grand entrance, TO_ASCII:</p>
<pre class="sql"><span style="color: #993333; font-weight: bold;">SELECT</span> to_ascii<span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">'Übermeier'</span><span style="color: #66cc66;">&#41;</span>;
Ubermeier</pre>
<p>Works like a charm. Caveat: TO_ASCII only supports the LATIN1, LATIN2, LATIN9 and WIN1250 encodings, but not UTF8.</p>
<p>Okay, the first guess would be to do something like this:</p>
<pre class="sql"><span style="color: #993333; font-weight: bold;">SELECT</span> to_ascii<span style="color: #66cc66;">&#40;</span>convert_to<span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">'Übermeier'</span>, <span style="color: #ff0000;">'latin1'</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span>;
ERROR:  <span style="color: #993333; font-weight: bold;">FUNCTION</span> to_ascii<span style="color: #66cc66;">&#40;</span>bytea<span style="color: #66cc66;">&#41;</span> does <span style="color: #993333; font-weight: bold;">NOT</span> exist</pre>
<p>Bummer. CONVERT_TO returns <a href="http://www.postgresql.org/docs/8.3/interactive/datatype-binary.html">BYTEA</a>, but TO_ASCII only accepts TEXT.</p>
<p>There has been some <a href="http://www.archivum.info/pgsql.hackers/2008-08/msg00346.html">discussion</a> on the pgsql.hackers mailing list, and frankly I can follow both parties' points of view. But thanks to <a href="http://okbob.blogspot.com/">Pavel Stehule</a> we have a hack to sidestep this issue:</p>
<pre class="sql"><span style="color: #993333; font-weight: bold;">CREATE</span> <span style="color: #993333; font-weight: bold;">FUNCTION</span> to_ascii<span style="color: #66cc66;">&#40;</span>bytea, name<span style="color: #66cc66;">&#41;</span>
RETURNS text STRICT <span style="color: #993333; font-weight: bold;">AS</span> <span style="color: #ff0000;">'to_ascii_encname'</span> <span style="color: #993333; font-weight: bold;">LANGUAGE</span> internal;</pre>
<p>This version gladly accepts the BYTEA data returned by CONVERT_TO, so we can use it like this:</p>
<pre class="sql"><span style="color: #993333; font-weight: bold;">SELECT</span> to_ascii<span style="color: #66cc66;">&#40;</span>convert_to<span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">'Übermeier'</span>, <span style="color: #ff0000;">'latin1'</span><span style="color: #66cc66;">&#41;</span>, <span style="color: #ff0000;">'latin1'</span><span style="color: #66cc66;">&#41;</span>;
Ubermeier</pre>
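<p>Coming back to the index example from the introduction, here is a minimal sketch of how the workaround might be used. The table <code>persons</code> and its column <code>lastname</code> are hypothetical names chosen for illustration, and the query assumes the two-argument TO_ASCII wrapper defined above has already been created. Keep in mind that CONVERT_TO raises an error for characters that have no LATIN1 equivalent, so this only suits data that fits into LATIN1.</p>
<pre class="sql">-- persons and lastname are hypothetical names, not part of the original setup
SELECT upper(substr(to_ascii(convert_to(lastname, 'latin1'), 'latin1'), 1, 1)) AS initial,
       count(*) AS entries
FROM persons
GROUP BY 1
ORDER BY 1;</pre>
<p>With this, a name like 'Übermeier' should be counted under 'U' instead of getting its own 'Ü' entry.</p>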
<p>Problem solved.</p>
<p>Edit: Added fix by eMerzh. Thanks!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.laudatio.com/wordpress/2008/11/05/postgresql-83-to_ascii-utf8/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
	</channel>
</rss>
