This entry was posted on Tuesday, November 11th, 2008 at 2:35 pm and is filed under IT Info, Tips & Tricks.


SIMBE Blog
SIMBE Official Weblog
How To Use Multibyte String Functions
Here are some tips and tricks for using the multibyte string functions (mbstring) in PHP.
I hope they are useful to all of you ^^
[Introduction]
In many languages, every necessary character can be represented by a one-to-one mapping to an 8-bit value. A byte is made up of eight bits, and each bit holds one of two values (one or zero), so a byte can represent only 256 unique values (two to the power of eight). Several languages require far more characters for written communication than a single byte can encode. Multibyte character encoding schemes were developed to express more than 256 characters within the regular bytewise coding system. The multibyte string extension (mbstring) was originally developed to handle Japanese characters, and it handles other languages as well.
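To make the difference between bytes and characters concrete, here is a minimal sketch. It assumes the mbstring extension is loaded and that the script is saved as UTF-8; the variable names are only for illustration.

```php
<?php
// Assumption: mbstring is loaded and this file is saved as UTF-8.
$text = "日本語";                     // three Japanese characters

$bytes = strlen($text);               // strlen() counts bytes
$chars = mb_strlen($text, 'UTF-8');   // mb_strlen() counts characters

echo "bytes: $bytes, characters: $chars\n";   // bytes: 9, characters: 3
```

Each of these characters takes three bytes in UTF-8, which is why the two counts differ.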
Encodings of the following types can be used safely as the encoding of PHP scripts:
A singlebyte encoding
- which has ASCII-compatible (ISO646-compatible) mappings for the characters in the range 00h to 7fh.
A multibyte encoding
- which has ASCII-compatible mappings for the characters in the range 00h to 7fh,
- which does not use ISO2022 escape sequences,
- which does not use a value from 00h to 7fh in any of the bytes that together represent a single character.
These are examples of character encodings that are unlikely to work as PHP script encodings, even though mbstring can convert to and from them:
JIS, SJIS, ISO-2022-JP, BIG-5
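The rule about bytes in the range 00h to 7fh can be seen with one well-known character. The sketch below assumes mbstring is loaded and the script is saved as UTF-8; the helper $hasAsciiByte is a hypothetical name introduced only for this example.

```php
<?php
// Assumption: mbstring is loaded and this file is saved as UTF-8.
$utf8 = "表";
$sjis = mb_convert_encoding($utf8, 'SJIS', 'UTF-8');

echo bin2hex($utf8), "\n";   // e8a1a8: every byte is 80h or above
echo bin2hex($sjis), "\n";   // 955c: the second byte, 5Ch, is an ASCII backslash

// Hypothetical helper: true if any byte of the string lies in 00h-7fh.
$hasAsciiByte = static function (string $s): bool {
    return preg_match('/[\x00-\x7F]/', $s) === 1;
};

var_dump($hasAsciiByte($utf8));   // bool(false)
var_dump($hasAsciiByte($sjis));   // bool(true)
```

A byte-oriented parser reads that 5Ch as a backslash, which is why a string literal written in SJIS can break.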
[Installation]
To install mbstring, you must explicitly enable the module with the --enable-mbstring configure option when building PHP. Then open php.ini and set the mbstring configuration. If you are using XAMPP, you can find the file at …\XAMPP\PHP\PHP.INI
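; PHP output settings
[php]
; Buffer output so that mb_output_handler can convert it
output_buffering = On
output_handler = mb_output_handler
; mbstring settings
[mbstring]
mbstring.language = Neutral
mbstring.internal_encoding = UTF-8
mbstring.http_input = auto
mbstring.http_output = UTF-8
; Convert incoming HTTP input to the internal encoding
mbstring.encoding_translation = On
mbstring.detect_order = auto
; Code point (U+3013) output in place of characters that cannot be converted
mbstring.substitute_character = 12307
; 2 = overload the string functions (strlen(), substr(), ...)
mbstring.func_overload = 2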
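If you cannot edit php.ini, some of these values can also be set from a script. The sketch below assumes mbstring is loaded and covers only two of the settings above, the internal encoding and the substitute character.

```php
<?php
// Assumption: mbstring is loaded. Runtime counterparts of two php.ini settings.
mb_internal_encoding('UTF-8');    // same role as mbstring.internal_encoding
mb_substitute_character(12307);   // same role as mbstring.substitute_character (U+3013)

// Called without arguments, both functions return the current value.
echo mb_internal_encoding(), "\n";      // UTF-8
echo mb_substitute_character(), "\n";   // 12307
```

Values set this way apply only to the current request.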
[Runtime Configuration]
The behaviour of these functions is affected by settings in php.ini.
| Name | Default | Changeable | Changelog |
|---|---|---|---|
| mbstring.language | "neutral" | PHP_INI_PERDIR | Available since PHP 4.3.0. |
| mbstring.detect_order | NULL | PHP_INI_ALL | Available since PHP 4.0.6. |
| mbstring.http_input | "pass" | PHP_INI_ALL | Available since PHP 4.0.6. |
| mbstring.http_output | "pass" | PHP_INI_ALL | Available since PHP 4.0.6. |
| mbstring.internal_encoding | NULL | PHP_INI_ALL | Available since PHP 4.0.6. |
| mbstring.script_encoding | NULL | PHP_INI_ALL | Available since PHP 4.3.0. |
| mbstring.substitute_character | NULL | PHP_INI_ALL | Available since PHP 4.0.6. |
| mbstring.func_overload | "0" | PHP_INI_PERDIR | PHP_INI_SYSTEM in PHP <= 4.2.3. Available since PHP 4.2.0. |
| mbstring.encoding_translation | "0" | PHP_INI_PERDIR | Available since PHP 4.3.0. |
| mbstring.strict_detection | "0" | PHP_INI_ALL | Available since PHP 5.1.2. |
[Supported Character Encodings]
The mbstring module currently supports the following character encodings. Any of them can be specified in the encoding parameter of the mbstring functions:
- UCS-4
- UCS-4BE
- UCS-4LE
- UCS-2
- UCS-2BE
- UCS-2LE
- UTF-32
- UTF-32BE
- UTF-32LE
- UTF-16
- UTF-16BE
- UTF-16LE
- UTF-7
- UTF7-IMAP
- UTF-8
- ASCII
- EUC-JP
- SJIS
- eucJP-win
- SJIS-win
- ISO-2022-JP
- JIS
- ISO-8859-1
- ISO-8859-2
- ISO-8859-3
- ISO-8859-4
- ISO-8859-5
- ISO-8859-6
- ISO-8859-7
- ISO-8859-8
- ISO-8859-9
- ISO-8859-10
- ISO-8859-13
- ISO-8859-14
- ISO-8859-15
- byte2be
- byte2le
- byte4be
- byte4le
- BASE64
- HTML-ENTITIES
- 7bit
- 8bit
- EUC-CN
- CP936
- HZ
- EUC-TW
- CP950
- BIG-5
- EUC-KR
- UHC (CP949)
- ISO-2022-KR
- Windows-1251 (CP1251)
- Windows-1252 (CP1252)
- CP866 (IBM866)
- KOI8-R
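As a quick illustration of the encoding parameter, the sketch below lists the encodings known to the running PHP build and converts one character between two of them. It assumes mbstring is loaded; the variable names are only for illustration.

```php
<?php
// Assumption: mbstring is loaded.
$supported = mb_list_encodings();                   // encoding names known to this build
$hasUtf8   = in_array('UTF-8', $supported, true);

$utf8   = "\xC3\xA9";                               // "é" as two bytes in UTF-8
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');

echo bin2hex($utf8), " -> ", bin2hex($latin1), "\n";   // c3a9 -> e9
```

The arguments to mb_convert_encoding() are the string, the target encoding, and then the source encoding.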
[Function Overloading Feature]
Most mbstring functions are multibyte counterparts of the standard PHP string functions. Function overloading lets you add multibyte awareness to an existing application without modifying its code, by replacing the standard string functions with their multibyte counterparts.
To use function overloading, set mbstring.func_overload in php.ini to a positive value that is a combination of bitmasks specifying the categories of functions to overload: 1 for the mail() function, 2 for the string functions, and 4 for the regular expression functions. For example, if it is set to 7, the mail, string and regular expression functions are all overloaded. Note that mbstring.func_overload was deprecated in PHP 7.2.0 and removed in PHP 8.0.0, so on newer PHP versions the mb_* functions must be called directly. The overloaded functions are listed below.
| value of mbstring.func_overload | original function | overloaded function |
|---|---|---|
| 1 | mail() | mb_send_mail() |
| 2 | strlen() | mb_strlen() |
| 2 | strpos() | mb_strpos() |
| 2 | strrpos() | mb_strrpos() |
| 2 | substr() | mb_substr() |
| 2 | strtolower() | mb_strtolower() |
| 2 | strtoupper() | mb_strtoupper() |
| 2 | substr_count() | mb_substr_count() |
| 4 | ereg() | mb_ereg() |
| 4 | eregi() | mb_eregi() |
| 4 | ereg_replace() | mb_ereg_replace() |
| 4 | eregi_replace() | mb_eregi_replace() |
| 4 | split() | mb_split() |
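Instead of relying on overloading, you can call the multibyte counterparts from the table directly and pass the encoding explicitly. This is a minimal sketch, assuming mbstring is loaded, the script is saved as UTF-8, and overloading is off; the sample string is arbitrary.

```php
<?php
// Assumption: mbstring is loaded, this file is UTF-8, func_overload is not enabled.
$s = "Привет, мир";

$upper = mb_strtoupper($s, 'UTF-8');         // "ПРИВЕТ, МИР"
$first = mb_substr($s, 0, 6, 'UTF-8');       // "Привет"
$pos   = mb_strpos($s, 'мир', 0, 'UTF-8');   // 8, counted in characters
$count = mb_substr_count($s, 'и', 'UTF-8');  // 2

$bytePos = strpos($s, 'мир');                // 14, counted in bytes

echo "$upper\n$first\n$pos\n$count\n$bytePos\n";
```

The plain strpos() result is larger because each Cyrillic letter takes two bytes in UTF-8.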
[References]
For more information about mbstring, visit http://id2.php.net/mbstring