How to Convert Character Set and Collation of WordPress Database

Home»Cloud & Internet»Web Publishing»How to Convert Character Set and Collation of WordPress Database
Since WordPress 2.2, WordPress supports feature that allows the user to define both the WordPress database character set and collation with DB_CHARSET and DB_COLLATE values in wp-config.php file. With these values defined, WordPress will use the designated database character-set (charset) and database collation (i.e. sort order of the letters, numbers, and symbols of a character set) when connecting a database tables.

However, in existing WordPress installations that are upgraded from earlier version of WordPress or not explicitly set a Unicode UTF-8 charset collation, the default database character set is normally set up as Latin1 (default on almost all MySQL installation) with latin1_swedish_ci collation. If you run a bilingual or multilingual blog with WordPress, you may face problem on character encoding when your blog posts are written in other foreign languages, or when you export and backup the database and later attempt to re-import the database dump in the event of database failure or server migration and moving. The symptom is obvious, your WordPress posts or pages will contains garbled, weird and funny characters, sometime just lots of ????? (question marks), rendering the WordPress database with your hard work useless and output unreadable. (May cause also by wrong charset collation)

The best solution to the character encoding problem in WordPress is to convert the charset or database and collation to UTF-8 or Unicode. However, you CANNOT simply connect to MySQL via shell or phpMyAdmin and hoping all your scripts will convert nicely. As explain by WordPress database conversion guide, convert character sets requires using the the MySQL ALTER TABLE command. When converting the character sets, all TEXT (and similar) fields are converted to UTF-8, but that conversion will BREAK existing TEXT because the conversion expects the data to be in latin1, but WordPress may have stored Unicode characters in a latin1 database, and as a result, data could end up as garbage after a conversion!

The guide provides a very rough and vague guide as a solution on how to actually convert WordPress MySQL database tables from one character set to another, usually UTF-8. However, the guide actually works, although the process can be lengthy. To convert, the steps involved generally are to alter each and every TEXT and related fields inside every WP tables to BLOB, then alter the character set of database and finally change the BLOB fields back to TEXT. Looks easy, but how long it would take to convert so many fields on so many tables? Furthermore, you will also need to remember the original type and length or values of all fields.

andersapt has posted a conversion script named convert_to_utf8_sql_generator.txt (no longer available) which automatically generates a list of SQL statements and commands need to fully convert your WordPress database to UTF8 based on the guide. However, there seems to be a minor bug with the script, although the author claimed it worked, where in my case, it simply won’t generate the list of SQL commands to run due to the error “PHP Fatal error: Call to a member function get_results() on a non-object in convert.php on line 37”. Once fixed, with this script in hand we can easily and quickly convert the database, tables and fields to use utf8_general_ci collation.

Note: I have tried out UTF-8 Database Converter plugin (no longer available), but it’s a failure. It seemed like the author change to character set directly.

Guide to Convert WordPress Database Character Set to UTF8 (Unicode)

  1. Take the WordPress blog offline by placing a out-of-service or maintenance notice.
  2. Backup database – this is very important, nothing is guaranteed to work. If you’re using cPanel or other control panel, it’s best to perform a database backup from the control panel itself, where you can restore the database in one piece instead of by SQL statements, in the case of normal dump.
  3. Download the fixed convert_to_utf8_sql_generator script and save it with a PHP extension.
  4. Modify the script to input the database name your WordPress blog is using. Locate the following text:

    Tables_in_DATABASENAME

    The DATABASENAME in red is the only thing that you need to change to match your WordPress database name. It should looks like this after change, for instance,

    Tables_in_wp_techjourney
  5. Upload the convert_to_utf8_sql_generator.php (or you can rename to a shorter name such as convert.php) to the base root WordPress installation directory, where wp-config.php is also located.
  6. Now, call and browse the script from any web browser. To do this, simply add convert_to_utf8_sql_generator.php (or any name you give to the script) to the end of your blog URL (i.e http://www.mywebsite.com/convert_to_utf8_sql_generator.php) and press Enter. A long list of SQL statements will be generated on the web page.
  7. Ensure that your post_content and post title fields on wp_posts table DOES NOT belongs to any indexes or FULLTEXT indexes. Else the type of the fields may not be converted to BLOB with one of the errors list below. Some plugins, such as related posts tend to add indexes to these fields. In this case, temporarily drop the indexes.
    ERROR 1170 (42000): BLOB/TEXT column 'post_content' used in key specification without a key length
    ERROR 1283 (HY000): Column 'post_content' cannot be part of FULLTEXT index
  8. Login to your server shell by telnet or SSH. You can skip this part of using Unix shell if you intend to use phpMyAdmin to do the dirty work, but I have not tried it. So if you do, do feedback on whether it can be done.
  9. Connect to MySQL server from the shell.
  10. Issue the following command first in MySQL prompt:
    use DATABASENAME;

    Again, replace DATABASENAME in red to the actual WordPress database name.

  11. Then copy and paste the whole list of SQL statements auto generated by the conversion script, and paste them into the MySQL prompt. Each and every SQL command should now be processed and executed by MySQL one by one. You may need to press Enter key to finish off the last one.
  12. During the processing, the similar error messages related to key length as mentioned may appears. In my case, the conversion to BLOB failed with such message in the following fields:

    wp_categories.category_nicename
    wp_comments.comment_approved
    wp_links.link_visible
    wp_options.option_name
    wp_postmeta.meta_key
    wp_posts.post_status
    wp_posts.post_name
    wp_posts.post_type
    wp_usermeta.meta_key
    wp_users.user_login

    All these fields are unlikely to contains non-ASCII characters. And fields such as category_nicename (category slug) and post_name (post slug) have been URL encoded (where your URL with unsafe non-alphanumeric characters will be replaced with a percent (%) sign followed by two hex digits and spaces encoded as plus (+) signs). Initial encoding of byte codes and character assignments for UTF-8 is consistent with ASCII, so direct conversion of these fields to UTF8 should not bring too much problem.

  13. Edit the wp-config.php file to add in DB_CHARSET and DB_COLLATE definitions. Add the following two lines, preferably under the section of MySQL Settings:
    define('DB_CHARSET', 'utf8');
    define('DB_COLLATE', '');

    As explained in WordPress Codex, DB_COLLATE is left blank (null) so that the database collation will be automatically assigned by MySQL based on the database character set.

  14. Recreate the indexes and/or FULLTEXT indexes been dropped, if any.
  15. Activate the blog back into production mode.
  16. Check your blog to see if everything and every characters is okay.
  17. Delete the PHP script.

By | 2016-12-09T08:39:52+00:00 December 9th, 2016|Categories: Web Publishing|Tags: |21 Comments

About the Author:

LK is a technology writer for Tech Journey with background of system and network administrator. He has be documenting his experiences in digital and technology world for over 15 years.Connect with LK through Tech Journey on Facebook, Twitter or Google+.
  • Hi…

    I have to say (in my favor) that m not changing directly the database character set, please review my plugin and you will see that first it will convert your database binary and them to utf8 and finally it will put everything like before encoding but obviously with the UTF-8 character set…

    I prefer to do and a..

    ALTER TABLE table_name CONVERT TO CHARACTER SET binary

    Than just make a mysql sentence for every text/string field, because as you can see the problems comes with key and some index value and the BLOB field type change.

    Converting truth the sentence that my plugin use will transform all char to binary, all varchar to varbinary, all text (tynitext, text, mediumtext, longtext) to his binary representations (tinyblob, blob, mediumblob, longblob) and finally just set enum and set types with the collation binary.

    I just came here to say this in favor and by the way the majority of the problems of my plugins are related to host configurations and other things that obviously i can control.

    BTW its a better to solution to avoid problems of key and index by just setting the correct binary representation rather than using blob in all the way.

    Greetings from mexico and sorry for all mistyping problems.

  • Chris

    Why don't you just link the file directly here?

    It is pain to register just to download the file.

  • admin

    Hi Chris, sorry for inconveniences, it's so that users can discuss any bugs on the script on the forum when needed.

  • Pingback: Isotipo()

  • Hi, it worked fine for me using MyPHPAdmin.

    Thanks for the article.

  • I'm using PHPmyAdmin and try to convert my database to utf-8, but…PHPmA interrupts the process with the first error message (concering key length as mentioned in ur guide).

    Is there a way to make PHPmA ignore errors / to proceed the other tables?

    Thx in advance!

  • Larry Hirscg

    I must applaud you for publishing this. After hours of searching for an easy solution, when my provider moved my database to an “upgraded” platform, I was able to follow your instructions and and get the character set changed. I used the phpmyadmin to execute the code and deleted the lines that had errors. It ran perfectly and I only needed to edit a few slugs that had odd character in them. Fabulous!

  • something2

    I've uploaded an extended script to the thread. It handles cases where the tables and columns are utf8 but the strings inside are latin1. This happens when the tables are created as utf8 but the wp-config.php is missing DB_CHARSET and DB_COLLATE variables.

  • I agree with Chris, it's annoying to having to register for your forum just to download the file. If we have a problem, we can go to the forum then.

  • Morten

    Thanks a lot – that was a great help!

  • Thanks for the guide! I found that I had to remove some columns' indexes before I could run some of the queries, and I had to add "text" to the ALTER TABLE statements, like so:

    ALTER TABLE wp_comments MODIFY comment_author TEXT CHARACTER SET latin1 COLLATE latin1_swedish_ci;

    However, after all the queries had finished, I ended up with the same result as I did using g30rg3_x's plugin: all posts and pages were truncated, usually at the first curly quote. Any suggestions for getting around this?

  • Guy

    I have the exact same problem as Stephanie. My old posts will come up fixed, but all my new posts (made after the upgrade which caused the encoding problems in the first place) which have characters like é or curly quotes will be truncated. It's like the conversion back doesn't know what to do with these characters and just ends the process there. Any idea what might cause that?

  • Pingback: Fixing mixed-encoding MySQL dumpfiles with WordPress()

  • daniel

    How does one "temporarily drop the indexes"? I'm getting the error referenced in Step 7 above. Thanks.

  • Pingback: blog from OUR kitchen » I think I have finally gotten rid of the stray Âs()

  • Paul Sanduleac

    Is there an easier way?

  • squarecandy

    I had a hard time wrapping my brain around the ALTER TABLE mySQL stuff. I was having trouble and ended up going with a combo of 2 plugins:
    UTF8 Sanitize
    Search & Replace

    UTF8 Sanitize does a lot of work to help fix things without making any changes to your database at all – just to the display, which is nice. You can also use it to make the changes it outputs permanent.

    Then I used Search & Replace to pick up some of the stray characters that the Sanitize plugin didn't get.

    This method might not work so well if you have a very large number of posts and comments, but it worked well for a small site.

    Both plugins worked well w/ wordpress 3.0.1 for me.

  • Bob Ray

    I've done a more complex script that should, in theory, convert any database to any character set. It converts to UTF-8 by default. It hasn't been tested with the WordPress DB, but the readme.txt file describes how to create a before and after SQL file so you can compare and see what it's done. The script is here:
    http://bobsguides.com/assets/cdc/cdc.zip

    • Francis

      Thank you Bob for this nice script !

      Very useful and work perfectly in my case 🙂

  • Bob Ray

    I should mention that the script automatically drops and recreates any indexes on text fields and preserves comments and index types (including compound indexes).

  • Sinco.pe

    well, at the end, im not really sure what to look for, utf8_bin or probably unicode_ci, thanks anyway

Pin It on Pinterest

Share This

Share This

Share this post with your friends!