Using Unicode in Windows Programming

Data Types
Code That Works with Both ANSI and Unicode
Converting Between ANSI and Unicode
- MultiByteToWideChar
- WideCharToMultiByte
References

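As mentioned in the introduction to Unicode, Windows stores Unicode text in 16-bit units. This was UCS-2 originally and has been UTF-16 since Windows 2000, so a character occupies two bytes (or four, for characters outside the Basic Multilingual Plane). A char is therefore too small to hold such a character, and we need other types. This article introduces how Unicode is handled in Windows programming.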

Data Types

Portable Type

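First, C and C++ provide the wide-character type wchar_t, which you may not have noticed before (it is a built-in type in C++ and a typedef from <stddef.h> in C). Its size is implementation-defined: 2 bytes on Windows and usually 4 bytes on Linux, so it is large enough to hold a UCS-2 character.
Note that wchar_t is not tied to any platform: it is available on Linux as well as on Windows.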
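
As a quick check of the size claim, the sketch below reports the width of wchar_t on the current platform and measures a wide string. The helper names wide_char_bytes, wide_length and wide_byte_count are made up for this illustration; only sizeof and std::wcslen come from the language and the standard library.

```cpp
#include <cstddef>
#include <cwchar>

// Hypothetical helper: size of one wchar_t in bytes on this platform
// (2 with Visual C++ on Windows, typically 4 with GCC or Clang on Linux).
std::size_t wide_char_bytes()
{
    return sizeof(wchar_t);
}

// Hypothetical helper: number of wchar_t elements before the terminating null.
std::size_t wide_length(const wchar_t* s)
{
    return std::wcslen(s);
}

// Hypothetical helper: bytes occupied by the characters of s, terminator excluded.
std::size_t wide_byte_count(const wchar_t* s)
{
    return wide_length(s) * sizeof(wchar_t);
}
```

Calling wide_byte_count(L"abc") gives 6 where wchar_t is 2 bytes and 12 where it is 4 bytes, which is why buffer sizes for wide strings should always be computed with sizeof(wchar_t) rather than a hard-coded 2.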

Windows-Specific Types

Windows also defines its own Unicode data types:

Type	Meaning
WCHAR	A Unicode character
PWSTR	WCHAR*
PCWSTR	CONST WCHAR*

Unicode Constants

To make a string constant use Unicode encoding, put an L prefix in front of the string:

#include <Windows.h>
const WCHAR *wcs = L"This string is stored as Unicode";
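
To make the effect of the L prefix concrete, here is a small sketch that compares a narrow literal with a wide one. The constant and function names are illustrative only. The Chinese text is written with \u escapes so that the result does not depend on the encoding of the source file.

```cpp
#include <cstddef>
#include <cwchar>

// A plain literal is an array of char; a literal with the L prefix is an array of wchar_t.
const char    kNarrow[] = "abc";
const wchar_t kWide[]   = L"abc";

// Number of array elements, including the terminating null.
std::size_t narrow_elements() { return sizeof(kNarrow) / sizeof(kNarrow[0]); }
std::size_t wide_elements()   { return sizeof(kWide) / sizeof(kWide[0]); }

// L"\u4e2d\u6587" is a two-character Chinese string. Both characters are in the
// Basic Multilingual Plane, so each one takes exactly one wchar_t.
std::size_t chinese_length() { return std::wcslen(L"\u4e2d\u6587"); }
```

Both arrays have four elements, but kWide occupies 4 * sizeof(wchar_t) bytes while kNarrow occupies only 4.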

Code That Works with Both ANSI and Unicode

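Sometimes we do not know in advance whether a program will be built for ANSI or for Unicode. In that case we can write the code in a generic form. It takes slightly more effort, but the code no longer has to care which encoding is in use.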

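The generic form is to use TCHAR for every character and to wrap string constants in the _TEXT macro (declared in <tchar.h>). Depending on whether the UNICODE and _UNICODE macros are defined, TCHAR is either WCHAR or char, and _TEXT either adds the L prefix or leaves the literal unchanged.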

const TCHAR *strs = _TEXT("I do not need to care which encoding is used here");

_TEXT works on character constants as well as on strings. For example:

if (strs[0] == _TEXT('中'))
{
    ...
}
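
To show roughly how this switch works, here is a simplified imitation of the mechanism. It is not the real header: DEMO_UNICODE, DemoTChar, DEMO_TEXT and demo_tcslen are hypothetical stand-ins for UNICODE/_UNICODE, TCHAR, _TEXT and _tcslen, and the real _TEXT goes through an extra macro level that is omitted here.

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>

#ifdef DEMO_UNICODE
// Wide build: characters are wchar_t and literals get the L prefix pasted on.
typedef wchar_t DemoTChar;
#define DEMO_TEXT(s) L##s
inline std::size_t demo_tcslen(const DemoTChar* s) { return std::wcslen(s); }
#else
// Narrow build: characters are char and literals are left untouched.
typedef char DemoTChar;
#define DEMO_TEXT(s) s
inline std::size_t demo_tcslen(const DemoTChar* s) { return std::strlen(s); }
#endif

// The same source line compiles in both builds.
const DemoTChar* kGreeting = DEMO_TEXT("hello");
```

Compiling the same file with and without -DDEMO_UNICODE (or /DDEMO_UNICODE) changes the type behind kGreeting, while the code that uses it stays the same.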

Converting Between ANSI and Unicode

To convert between ANSI and Unicode, use the functions MultiByteToWideChar and WideCharToMultiByte.

MultiByteToWideChar

MultiByteToWideChar converts a multibyte string to a wide-character string. Its prototype is:

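int MultiByteToWideChar(
    UINT uCodePage,         // code page used for the conversion
    DWORD dwFlags,          // usually not needed; pass 0
    PCSTR pMultiByteStr,    // string to convert
    int cchMultiByte,       // length of the source string in bytes; if -1, the source is treated as
                            // null-terminated and its length is determined automatically
    PWSTR pWideCharStr,     // buffer that receives the result
    int cchWideChar         // capacity of the result buffer in wide characters, not the length of the
                            // string; if 0, no conversion is done and the required size is returned
);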
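
The two length parameters are easy to mix up, so here is a minimal sketch of the usual two-call pattern with an explicit source length. The helper name Utf8ToWide is hypothetical. When a byte count is passed instead of -1, the size returned by the first call does not include a terminating null, so the result fits a std::wstring exactly. The #ifdef guard is only there because the code builds on Windows alone.

```cpp
#ifdef _WIN32   // this sketch only builds on Windows
#include <Windows.h>
#include <string>

// Hypothetical helper: convert UTF-8 text to UTF-16 using an explicit byte count.
std::wstring Utf8ToWide(const std::string& utf8)
{
    if (utf8.empty())
        return std::wstring();   // a source length of 0 would make the API fail

    const int srcBytes = static_cast<int>(utf8.size());

    // First call: a destination size of 0 means "only report the required size".
    const int wideLen = ::MultiByteToWideChar(CP_UTF8, 0, utf8.data(), srcBytes, NULL, 0);
    if (wideLen <= 0)
        return std::wstring();   // conversion failed; GetLastError() has the reason

    // Second call: convert into a buffer of exactly that many wide characters.
    std::wstring wide(static_cast<std::size_t>(wideLen), L'\0');
    if (::MultiByteToWideChar(CP_UTF8, 0, utf8.data(), srcBytes, &wide[0], wideLen) == 0)
        return std::wstring();
    return wide;
}
#endif
```

With -1 as the source length, the reported size would be one larger because it counts the terminating null as well.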

WideCharToMultiByte

WideCharToMultiByte converts a wide-character string to a multibyte string. Again, here is its prototype:

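int WideCharToMultiByte(
    UINT CodePage,              // code page used for the conversion
    DWORD dwFlags,              // usually not needed; pass 0
    LPCWSTR lpWideCharStr,      // wide-character string to convert
    int cchWideChar,            // number of wide characters to convert; if -1, the source is treated
                                // as null-terminated and its length is determined automatically
    LPSTR lpMultiByteStr,       // buffer that receives the result
    int cchMultiByte,           // capacity of the result buffer in bytes; if 0, no conversion is done
                                // and the required size is returned
    LPCSTR lpDefaultChar,       // character substituted for a wide character that cannot be converted;
                                // if NULL, the system default is used, usually '?', which is dangerous
                                // in file names because '?' is a wildcard
    LPBOOL pfUsedDefaultChar    // set to TRUE if at least one character could not be converted to its
                                // multibyte form; must be NULL (as must lpDefaultChar) for CP_UTF8
);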
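
Since a silent '?' substitution can be harmful, it may be worth checking the last parameter. The sketch below is one possible way to do that; WideToAnsiChecked is a hypothetical helper, and it assumes the system ANSI code page is not UTF-8, because the used-default flag is not supported for CP_UTF8. The #ifdef guard is only there because the code builds on Windows alone.

```cpp
#ifdef _WIN32   // this sketch only builds on Windows
#include <Windows.h>
#include <string>

// Hypothetical helper: convert UTF-16 text to the system ANSI code page and report
// through 'lossy' whether any character was replaced by the default character.
// Assumption: the ANSI code page is not UTF-8 (the flag must be NULL in that case).
std::string WideToAnsiChecked(const std::wstring& wide, bool& lossy)
{
    lossy = false;
    if (wide.empty())
        return std::string();

    const int srcLen = static_cast<int>(wide.size());

    // First call: ask for the required buffer size in bytes.
    const int bytes = ::WideCharToMultiByte(CP_ACP, 0, wide.data(), srcLen, NULL, 0, NULL, NULL);
    if (bytes <= 0)
        return std::string();

    // Second call: convert, and let the API tell us whether it had to substitute.
    std::string ansi(static_cast<std::size_t>(bytes), '\0');
    BOOL usedDefault = FALSE;
    if (::WideCharToMultiByte(CP_ACP, 0, wide.data(), srcLen, &ansi[0], bytes, NULL, &usedDefault) == 0)
        return std::string();

    lossy = (usedDefault != FALSE);
    return ansi;
}
#endif
```

A caller that builds file names from the result can refuse to continue when lossy is true instead of creating a name that contains '?'.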

The following example converts between ANSI, UTF-8, and Unicode:

#include <Windows.h>
#include <string>
using std::string;
using std::wstring;

// ANSI (system code page) -> Unicode
wstring ANSIToUnicode( const string& str )
{
    // With -1 as the source length, the returned size includes the terminating null.
    int unicodeLen = ::MultiByteToWideChar( CP_ACP, 0, str.c_str(), -1, NULL, 0 );
    if ( unicodeLen <= 0 )
        return wstring();                           // conversion failed
    wchar_t* pUnicode = new wchar_t[unicodeLen]();  // () zero-initializes the buffer
    ::MultiByteToWideChar( CP_ACP, 0, str.c_str(), -1, pUnicode, unicodeLen );
    wstring rt( pUnicode );
    delete[] pUnicode;                              // new[] must be paired with delete[]
    return rt;
}

// Unicode -> ANSI (system code page)
string UnicodeToANSI( const wstring& str )
{
    int iTextLen = ::WideCharToMultiByte( CP_ACP, 0, str.c_str(), -1, NULL, 0, NULL, NULL );
    if ( iTextLen <= 0 )
        return string();
    char* pElementText = new char[iTextLen]();
    ::WideCharToMultiByte( CP_ACP, 0, str.c_str(), -1, pElementText, iTextLen, NULL, NULL );
    string strText( pElementText );
    delete[] pElementText;
    return strText;
}

// UTF-8 -> Unicode
wstring UTF8ToUnicode( const string& str )
{
    int unicodeLen = ::MultiByteToWideChar( CP_UTF8, 0, str.c_str(), -1, NULL, 0 );
    if ( unicodeLen <= 0 )
        return wstring();
    wchar_t* pUnicode = new wchar_t[unicodeLen]();
    ::MultiByteToWideChar( CP_UTF8, 0, str.c_str(), -1, pUnicode, unicodeLen );
    wstring rt( pUnicode );
    delete[] pUnicode;
    return rt;
}

// Unicode -> UTF-8
string UnicodeToUTF8( const wstring& str )
{
    int iTextLen = ::WideCharToMultiByte( CP_UTF8, 0, str.c_str(), -1, NULL, 0, NULL, NULL );
    if ( iTextLen <= 0 )
        return string();
    char* pElementText = new char[iTextLen]();
    ::WideCharToMultiByte( CP_UTF8, 0, str.c_str(), -1, pElementText, iTextLen, NULL, NULL );
    string strText( pElementText );
    delete[] pElementText;
    return strText;
}

References

Usage of the WideCharToMultiByte and MultiByteToWideChar functions
