WordPress:Converting Database Character Sets
This article addresses, in general, the process of converting your WordPress MySQL database tables from one character set to another. Warning: character set conversion is not a simple process. Please complete a backup of your database before attempting any conversion.
The History
Up to and including WordPress Version 2.1.3, most WordPress databases were created using the latin1 character set and the latin1_swedish_ci collation.
Character set and collation can now be defined
Beginning with WordPress Version 2.2, users can define both the database character set and the collation in their wp-config.php file. Setting the DB_CHARSET and DB_COLLATE values in wp-config.php causes WordPress to create the database with the appropriate settings. However, these settings take effect only for new installations, not for copies of WordPress that are already installed. The rest of this article explains how to convert the character set and collation for existing WordPress installations.
Converting your database
Before beginning any conversion, please back up your database. Backing Up Your Database has easy-to-follow instructions.
For discussion purposes, assume you have a database in the latin1 character set that needs to be converted to the utf8 character set.
The Problem
Converting character sets requires the MySQL ALTER TABLE command. When the character set is converted, all TEXT (and similar) fields are converted to UTF-8. That conversion can BREAK existing text: it expects the stored data to be latin1, but WordPress may have stored Unicode characters in a latin1 database, so the data could end up as garbage after the conversion!
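The failure can be reproduced outside MySQL. The following is a minimal sketch, assuming the text was written as UTF-8 bytes into a column labelled latin1; the sample word is hypothetical, and Python's latin-1 codec stands in for MySQL's latin1, which is not byte-for-byte identical for every value.

```python
# Hypothetical sample: WordPress wrote "café" as UTF-8 bytes into a column
# that MySQL believes holds latin1 text.
original = "café"
stored_bytes = original.encode("utf-8")      # bytes actually on disk

# A direct latin1 -> utf8 conversion first reads every byte as one latin1
# character...
misread = stored_bytes.decode("latin-1")
# ...and then re-encodes those characters as UTF-8.
converted_bytes = misread.encode("utf-8")

print(misread)                                       # the garbled text a reader would see
print(converted_bytes.decode("utf-8") == original)   # False: the text no longer matches
```

The two-byte UTF-8 sequence for "é" is treated as two separate latin1 characters, and each of them is encoded again, which is the garbage the paragraph above describes.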
The Solution
The solution is to ALTER all TEXT and related fields to their binary counterparts, then alter the character set and finally change the binary data type fields back to TEXT.
Example steps:
1. Place a notice that the blog is out of service
2. Back up the database
3. ALTER TABLE wp_users MODIFY display_name BLOB;
4. ...ALTER TABLE commands for all other tables/columns...
5. ALTER DATABASE wordpress charset=utf8;
6. ALTER TABLE wp_users charset=utf8;
7. ...ALTER TABLE command for all other tables...
8. ALTER TABLE wp_users MODIFY display_name TEXT CHARACTER SET utf8;
9. ...ALTER TABLE for all other tables/columns...
10. Add DB_CHARSET and DB_COLLATE definitions to wp-config.php
11. Place the blog back online
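The ordering of the ALTER statements in the example steps can be sketched as a small generator. This is only an illustration: the function name `conversion_statements` and its tuple layout are hypothetical, and the sketch ignores column attributes such as NOT NULL or DEFAULT, which a MODIFY clause would need to restate.

```python
def conversion_statements(database, columns, charset="utf8"):
    """Return ALTER statements in the order used by the example steps.

    columns is a list of (table, column, text_type, binary_type) tuples;
    this layout is an assumption made for the sketch.
    """
    # Collect each table once, preserving first-seen order.
    tables = []
    for table, _, _, _ in columns:
        if table not in tables:
            tables.append(table)

    statements = []
    # First phase: switch every text column to its binary counterpart.
    for table, column, _, binary_type in columns:
        statements.append(f"ALTER TABLE {table} MODIFY {column} {binary_type};")
    # Second phase: change the default character set of the database and tables.
    statements.append(f"ALTER DATABASE {database} charset={charset};")
    for table in tables:
        statements.append(f"ALTER TABLE {table} charset={charset};")
    # Third phase: switch the binary columns back to text types in the new charset.
    for table, column, text_type, _ in columns:
        statements.append(
            f"ALTER TABLE {table} MODIFY {column} {text_type} CHARACTER SET {charset};"
        )
    return statements


example = conversion_statements(
    "wordpress", [("wp_users", "display_name", "TEXT", "BLOB")]
)
for statement in example:
    print(statement)
```

Printing the statements for review, rather than executing them, leaves room to check them against the real schema and a fresh backup before anything is run.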
The string field types need to be converted to their binary counterparts. The list is as follows:
- CHAR -> BINARY
- VARCHAR -> VARBINARY
- TINYTEXT -> TINYBLOB
- TEXT -> BLOB
- MEDIUMTEXT -> MEDIUMBLOB
- LONGTEXT -> LONGBLOB
This information was originally posted by member g30rg3x in Forum Thread 117955.
ENUM and SET have more specific conversion rules:
Set the character set to binary, or to UTF8 if you are sure that no ENUM or SET field has special characters that might get garbled during conversion.
The SQL for this is:
- ALTER TABLE wp_links CHANGE link_visible link_visible ENUM('Y','N') CHARACTER SET utf8;
Note that the field name does need to be repeated (CHANGE takes the old name followed by the new name), and the full ENUM specification must be restated as well.
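One way to act on the "if you are sure" condition is to check the ENUM values before building the statement. The sketch below is an assumption-laden illustration: the helper name `enum_to_utf8_statement` is hypothetical, and it treats "special characters" as anything outside ASCII, since ASCII bytes are identical in latin1 and UTF-8.

```python
def enum_to_utf8_statement(table, column, values, charset="utf8"):
    """Build the CHANGE statement for an ENUM column whose values are all ASCII."""
    if not all(value.isascii() for value in values):
        # Non-ASCII values could be garbled; use the binary character set instead.
        raise ValueError("ENUM contains non-ASCII values")
    # Quote each value, doubling any embedded single quote.
    quoted = ",".join("'" + value.replace("'", "''") + "'" for value in values)
    # CHANGE needs the column name twice: old name, then new name.
    return (
        f"ALTER TABLE {table} CHANGE {column} {column} "
        f"ENUM({quoted}) CHARACTER SET {charset};"
    )


print(enum_to_utf8_statement("wp_links", "link_visible", ["Y", "N"]))
```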
When specifying BINARY and VARBINARY, the field length also needs to be specified, and needs to be the same value as the original CHAR and VARCHAR field length. In other words, VARCHAR(200) becomes VARBINARY(200).
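The type mapping and the length rule can be combined in one small helper. This is a sketch only; `to_binary_type` and `BINARY_COUNTERPART` are hypothetical names, and ENUM and SET are deliberately rejected because they follow the separate rules described above.

```python
import re

# String types and their binary counterparts, as listed in this article.
BINARY_COUNTERPART = {
    "CHAR": "BINARY",
    "VARCHAR": "VARBINARY",
    "TINYTEXT": "TINYBLOB",
    "TEXT": "BLOB",
    "MEDIUMTEXT": "MEDIUMBLOB",
    "LONGTEXT": "LONGBLOB",
}


def to_binary_type(column_type):
    """Map e.g. 'VARCHAR(200)' to 'VARBINARY(200)', keeping the length."""
    match = re.fullmatch(r"\s*([A-Za-z]+)\s*(?:\(\s*(\d+)\s*\))?\s*", column_type)
    if match is None or match.group(1).upper() not in BINARY_COUNTERPART:
        raise ValueError(f"no binary counterpart known for {column_type!r}")
    name = BINARY_COUNTERPART[match.group(1).upper()]
    length = match.group(2)
    # The binary type carries the same length as the original type.
    return f"{name}({length})" if length else name


print(to_binary_type("VARCHAR(200)"))
print(to_binary_type("longtext"))
```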
So, in Steps 3 and 4 change CHAR, VARCHAR, TEXT, ENUM, and SET fields to their binary counterparts (BLOB, VARBINARY, etc), in Step 5 switch the database to utf8, in Steps 6 and 7 switch all the tables to utf8, and finally, in Steps 8 and 9 return the binary fields back to the respective CHAR, VARCHAR, TEXT, ENUM, and SET data types with the utf8 character set.
The key to the conversion is that a field with a binary data type, unlike CHAR, VARCHAR, TEXT, ENUM, and SET fields, will not be converted to garbage when the database and tables are switched to utf8.
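The reason the binary detour works can be shown with plain bytes. A minimal sketch, assuming the stored bytes really are UTF-8 (the sample word is hypothetical):

```python
# UTF-8 bytes sitting in a column that was labelled latin1.
stored_bytes = "café".encode("utf-8")

# As a binary field the value is just bytes; no character set is attached,
# so switching the database and tables to utf8 leaves it untouched.
as_blob = bytes(stored_bytes)

# Returning the field to a text type with the utf8 character set only
# relabels those same bytes as UTF-8.
relabelled = as_blob.decode("utf-8")

print(relabelled == "café")   # True: the text survives intact
```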
Conversion Scripts and Plugins
In the WordPress Forums, member andersapt submitted a conversion script, Convert UTF8 SQL Generator, in Forum Thread 117955 to automatically convert a WordPress database. (This link is currently dead.)
A plugin, UTF-8 Database Converter, is available from g30rg3_x. Carefully review the readme file included with the plugin. Warning: this plugin corrupts data in modern versions of WordPress.
Discussions on character sets
- http://trac.wordpress.org/ticket/2828
- http://trac.wordpress.org/ticket/2942
- http://trac.wordpress.org/ticket/3184
- http://trac.wordpress.org/ticket/3517
- http://trac.wordpress.org/ticket/4219
- http://comox.textdrive.com/pipermail/wp-testers/2007-May/004510.html
- http://jonkenpon.com/2007/02/20/making-your-wordpress-database-portable-because-it-probably-isnt-right-now/
- http://wordpress.org/support/topic/101135
- http://wordpress.org/support/topic/116746
- http://wordpress.org/support/topic/117865
- http://wordpress.org/support/topic/117955
- http://wordpress.org/support/topic/117999
- http://wordpress.org/support/topic/118781
- http://wordpress.org/support/topic/119611
- http://wordpress.org/support/topic/119750
- http://wordpress.org/support/topic/119858
- http://wordpress.org/support/topic/119998
- http://wordpress.org/support/topic/119999
- http://wordpress.org/support/topic/120029
- http://wordpress.org/support/topic/120065
- http://wordpress.org/support/topic/120135
- http://wordpress.org/support/topic/120352
- http://wordpress.org/support/topic/120397
- http://wordpress.org/support/topic/120414
- http://wordpress.org/support/topic/120466
- http://wordpress.org/support/topic/120562
- http://wordpress.org/support/topic/120687
- http://wordpress.org/support/topic/144884
Resources
- Character set at Wikipedia
- Unicode at Wikipedia
- UTF-8 at Wikipedia
- Character sets and collation at MySQL
- Character Sets and Collations That MySQL Supports
- Gentoo tip on converting latin1 to utf8 in MySQL
- Alex King's blog about latin1 to utf8 conversion