编辑“WordPress:Converting Database Character Sets”
该编辑可以被撤销。 请检查下面的对比以核实您想要撤销的内容,然后发布下面的更改以完成撤销。
最后版本 | 您的文本 | ||
第1行: | 第1行: | ||
This article addresses, in general, the process of converting your WordPress [[WordPress:Glossary#MySQL|MySQL]] [[WordPress:Database Description|database tables]] from one [[WordPress:Glossary#Character set|character set]] to another. '''Warning: character set conversion is not a simple process. Please complete a backup of your database before attempting any conversion.''' | This article addresses, in general, the process of converting your WordPress [[WordPress:Glossary#MySQL|MySQL]] [[WordPress:Database Description|database tables]] from one [[WordPress:Glossary#Character set|character set]] to another. '''Warning: character set conversion is not a simple process. Please complete a backup of your database before attempting any conversion.''' | ||
==The History== | ==The History== | ||
Up to and including WordPress [[WordPress:Version 2.1.3]], most WordPress databases were created using the ''latin1'' character set and the ''latin1_swedish_ci'' [[WordPress:Glossary#Collation|collation]]. | Up to and including WordPress [[WordPress:Version 2.1.3]], most WordPress databases were created using the ''latin1'' character set and the ''latin1_swedish_ci'' [[WordPress:Glossary#Collation|collation]]. | ||
==Character set and collation can now be defined== | ==Character set and collation can now be defined== | ||
Beginning with [[WordPress:Version 2.2]], WordPress allows the user to define both the database character set and the collation in their [[WordPress:Editing wp-config.php|wp-config.php]] file. Setting the [[WordPress:Editing wp-config.php#Database character set|DB_CHARSET]] and [[WordPress:Editing wp-config.php#Database collation|DB_COLLATE]] values in ''wp-config.php'' causes WordPress to create the database with the appropriate settings. But, the setting can only be designated for new installations, not for 'already installed' copies of WordPress. The rest of this article will explain how to convert the character set and collation for existing WordPress installations. | Beginning with [[WordPress:Version 2.2]], WordPress allows the user to define both the database character set and the collation in their [[WordPress:Editing wp-config.php|wp-config.php]] file. Setting the [[WordPress:Editing wp-config.php#Database character set|DB_CHARSET]] and [[WordPress:Editing wp-config.php#Database collation|DB_COLLATE]] values in ''wp-config.php'' causes WordPress to create the database with the appropriate settings. But, the setting can only be designated for new installations, not for 'already installed' copies of WordPress. The rest of this article will explain how to convert the character set and collation for existing WordPress installations. | ||
==Converting your database== | ==Converting your database== | ||
Before beginning any conversion, please backup your database. [[WordPress:Backing Up Your Database]] has easy-to-follow instructions. | Before beginning any conversion, please backup your database. [[WordPress:Backing Up Your Database]] has easy-to-follow instructions. | ||
For discussion purposes, it is assumed you have a database in the '''latin1''' character set that needs converting to a '''utf8''' character set. | For discussion purposes, it is assumed you have a database in the '''latin1''' character set that needs converting to a '''utf8''' character set. | ||
===The Problem=== | ===The Problem=== | ||
To convert character sets requires using the the MySQL ALTER TABLE command. When converting the character sets, all TEXT (and similar) fields are converted to UTF-8, but that conversion will BREAK existing TEXT because the conversion expects the data to be in latin1, but WordPress may have stored unicode characters in a latin1 database, and as a result, data could end up as garbage after a conversion! | To convert character sets requires using the the MySQL ALTER TABLE command. When converting the character sets, all TEXT (and similar) fields are converted to UTF-8, but that conversion will BREAK existing TEXT because the conversion expects the data to be in latin1, but WordPress may have stored unicode characters in a latin1 database, and as a result, data could end up as garbage after a conversion! | ||
===The Solution=== | ===The Solution=== | ||
The solution is to ALTER all TEXT and related fields to their binary counterparts, then alter the character set and finally change the binary data type fields back to TEXT. | The solution is to ALTER all TEXT and related fields to their binary counterparts, then alter the character set and finally change the binary data type fields back to TEXT. | ||
Example steps: | Example steps: | ||
# Place notice that blog is out of service | # Place notice that blog is out of service | ||
# Backup database | # Backup database | ||
第60行: | 第30行: | ||
# Add DB_CHARSET and DB_COLLATE definitions to [[WordPress:Editing_wp-config.php|wp-config.php]] | # Add DB_CHARSET and DB_COLLATE definitions to [[WordPress:Editing_wp-config.php|wp-config.php]] | ||
# Place blog back on-line | # Place blog back on-line | ||
The string field types need to be converted to their binary field types counterparts. The list is as follows: | The string field types need to be converted to their binary field types counterparts. The list is as follows: | ||
* CHAR -> BINARY | * CHAR -> BINARY | ||
* VARCHAR -> VARBINARY | * VARCHAR -> VARBINARY | ||
第97行: | 第42行: | ||
ENUM and SET have more specific conversion rules: | ENUM and SET have more specific conversion rules: | ||
Set the character set to binary, or to UTF8 if you are sure that no ENUM or SET field has special characters that might get garbled during conversion. | Set the character set to binary, or to UTF8 if you are sure that no ENUM or SET field has special characters that might get garbled during conversion. | ||
The SQL for this is: | The SQL for this is: | ||
* ALTER TABLE wp_links CHANGE link_visible link_visible ENUM('Y','N') CHARACTER SET utf8; | * ALTER TABLE wp_links CHANGE link_visible link_visible ENUM('Y','N') CHARACTER SET utf8; | ||
The field name does need to be repeated, as well as the ENUM specification. | The field name does need to be repeated, as well as the ENUM specification. | ||
When specifying BINARY and VARBINARY, the field length also needs to be specified, and needs to be the same value as the original CHAR and VARCHAR field length. In other words, VARCHAR(200) becomes VARBINARY(200). | When specifying BINARY and VARBINARY, the field length also needs to be specified, and needs to be the same value as the original CHAR and VARCHAR field length. In other words, VARCHAR(200) becomes VARBINARY(200). | ||
So, in Steps 3 and 4 change CHAR, VARCHAR, TEXT, ENUM, and SET fields to their binary counterparts (BLOB, VARBINARY, etc), in Step 5 switch the database to utf8, in Steps 6 and 7 switch all the tables to utf8, and finally, in Steps 8 and 9 return the binary fields back to the respective CHAR, VARCHAR, TEXT, ENUM, and SET data types with the utf8 character set. | So, in Steps 3 and 4 change CHAR, VARCHAR, TEXT, ENUM, and SET fields to their binary counterparts (BLOB, VARBINARY, etc), in Step 5 switch the database to utf8, in Steps 6 and 7 switch all the tables to utf8, and finally, in Steps 8 and 9 return the binary fields back to the respective CHAR, VARCHAR, TEXT, ENUM, and SET data types with the utf8 character set. | ||
The key to the conversion is that a field with a binary data type, unlike CHAR, VARCHAR, TEXT, ENUM, and SET fields, will not be converted to garbage when the database and tables are switched to utf8. | The key to the conversion is that a field with a binary data type, unlike CHAR, VARCHAR, TEXT, ENUM, and SET fields, will not be converted to garbage when the database and tables are switched to utf8. | ||
====Conversion Scripts and Plugins==== | ====Conversion Scripts and Plugins==== | ||
In the WordPress Forums, Member andersapt, in [http://wordpress.org/support/topic/117955 Forum Thread 117955] submitted a conversion script, [http://kunde.apt.no/aso/wordpress/convert_to_utf8_sql_generator.txt Convert UTF8 SQL Generator], to automatically convert a WordPress database. (This link is currently dead.) | In the WordPress Forums, Member andersapt, in [http://wordpress.org/support/topic/117955 Forum Thread 117955] submitted a conversion script, [http://kunde.apt.no/aso/wordpress/convert_to_utf8_sql_generator.txt Convert UTF8 SQL Generator], to automatically convert a WordPress database. (This link is currently dead.) | ||
A plugin, [http://g30rg3x.com/utf8-database-converter/ UTF-8 Database Converter], is available from g30rg3_x. Carefully review the readme file included with the plugin. This plugin corrupts data in modern versions of Wordpress. | A plugin, [http://g30rg3x.com/utf8-database-converter/ UTF-8 Database Converter], is available from g30rg3_x. Carefully review the readme file included with the plugin. This plugin corrupts data in modern versions of Wordpress. | ||
==Discussions on character sets== | ==Discussions on character sets== | ||
*http://trac.wordpress.org/ticket/2828 | *http://trac.wordpress.org/ticket/2828 | ||
*http://trac.wordpress.org/ticket/2942 | *http://trac.wordpress.org/ticket/2942 | ||
第169行: | 第87行: | ||
*http://wordpress.org/support/topic/120687 | *http://wordpress.org/support/topic/120687 | ||
*http://wordpress.org/support/topic/144884 | *http://wordpress.org/support/topic/144884 | ||
==Resources== | ==Resources== | ||
*[http://en.wikipedia.org/wiki/Character_set Character set at Wikipedia] | *[http://en.wikipedia.org/wiki/Character_set Character set at Wikipedia] | ||
*[http://en.wikipedia.org/wiki/Unicode Unicode at Wikipedia] | *[http://en.wikipedia.org/wiki/Unicode Unicode at Wikipedia] | ||
第229行: | 第96行: | ||
*[http://gentoo-wiki.com/TIP_Convert_latin1_to_UTF-8_in_MySQL Gentoo tip on converting latin1 to utf8 in MySQL] | *[http://gentoo-wiki.com/TIP_Convert_latin1_to_UTF-8_in_MySQL Gentoo tip on converting latin1 to utf8 in MySQL] | ||
*[http://alexking.org/blog/2008/03/06/mysql-latin1-utf8-conversion Alex King's blog about latin1 to utf8 conversion] | *[http://alexking.org/blog/2008/03/06/mysql-latin1-utf8-conversion Alex King's blog about latin1 to utf8 conversion] | ||