I have just spent a while altering my content management system editor to cope with multiple languages, including Russian, Chinese and Arabic. I have a project that requires four languages, and it seemed like a good time to go multibyte.
Having done a similar job in the past, I knew there would be some difficult bits in converting the system to use UTF-8; single-byte to multibyte conversions always have some nasty surprises. Last time it was a Perl and SQL Server system using the MSHTML ‘editor’; this time it's PHP with MySQL using the TinyMCE editor.
Getting the Russian text to round-trip starting from a TinyMCE input box took a while. I had to trace the data path through the system in a fair amount of detail, and I was almost tripped up by an htmlentities call.
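For anyone who hasn't hit it, here is a minimal sketch of the kind of htmlentities trap I mean. It assumes the call ends up treating UTF-8 input as ISO-8859-1, which was the default in older PHP versions; the helper names are made up for illustration and are not taken from my CMS.

```php
<?php
// Hypothetical helpers, for illustration only.

// Treats the input as ISO-8859-1, so every byte of a multibyte UTF-8
// character is handled as a separate Latin-1 character.
function escape_as_latin1(string $text): string
{
    return htmlentities($text, ENT_QUOTES, 'ISO-8859-1');
}

// Names the encoding explicitly, so multibyte characters are left intact.
function escape_as_utf8(string $text): string
{
    return htmlentities($text, ENT_QUOTES, 'UTF-8');
}

// Escapes only the HTML-special characters and leaves everything else alone.
function escape_markup_only(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

$russian = 'Привет';

$mangled = escape_as_latin1($russian); // lead bytes become entities such as &ETH;
$intact  = escape_as_utf8($russian);   // Cyrillic has no HTML 4.01 named entities
$markup  = escape_markup_only('<b>' . $russian . '</b>');
```

Passing the charset explicitly on every call is probably the safest habit, since the default has changed between PHP versions.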
What made this task so much easier than last time was that I was using PHP and MySQL, both of which have a great deal of community support. Two pages helped me no end: PHP UTF-8 Cheatsheet and MySQL and UTF-8. Thanks for shortening a chore!