C++ wchar_t type and wstring type
In C++, wchar_t (the "wide character" type) is a built-in type, and wstring (the "wide string" type) is a standard library type. Both can be used to represent Unicode characters and strings from languages other than English in a program. Here we will discuss their basic concepts.
wchar_t type
The wide character type can represent characters that the 'char' type cannot. The 'char' type has a size of 1 byte, so it can hold only 256 distinct values, which cover the English alphabet, special characters (@, ^, #, &, %, *) and some other characters. But there are many other languages and scripts in the world, for instance Japanese (Kanji, Hiragana or Katakana), Russian (Cyrillic alphabet), Hindi (Devanagari script), etc.
These scripts and alphabets can be represented by the wide character type, whose keyword is wchar_t. The wchar_t type is larger than the char type, and its size differs according to the machine and platform: on Windows it is 2 bytes, and on Linux it is typically 4 bytes. To assign a character to a wchar_t variable, the letter 'L' is added in front of the character literal to specify that it is a wide character. To output a wchar_t, 'wcout' is used instead of 'cout'; wcout is the wide-character counterpart of cout. An example program is given below.
#include <iostream>
using namespace std;

int main()
{
    wchar_t wc1 = L'c';  // note the 'L' before the initializer

    wcout << wc1 << endl;             // prints the character
    cout << wc1 << endl;              // prints the integer value (compilers before C++20)
    cout << sizeof(wchar_t) << endl;  // size of wchar_t in bytes

    /*** For Linux ***/
    wcout << sizeof(wchar_t);
    /***/

    cin.get();
    return 0;
}
The output is
c
99
2
On Linux the last value may be 2 or 4 (usually 4); check your system.
Note: if we use cout instead of wcout, we get the integer value of the character as the output. Every character, whether from the English alphabet or not, has a corresponding integer value. If cout is used with a wide character variable, that integer value is printed; if wcout is used, the stream maps the integer value to its character and prints the character. (Since C++20, passing a wchar_t directly to cout no longer compiles, so this behaviour applies to earlier standards.)
Since the size of wchar_t is 2 bytes (16 bits) or more, it can represent 65,536 (2^16) or more distinct values, which is big enough to represent the scripts of almost all known languages in the world.
wchar_t can be used to represent Unicode characters, and ASCII is part of the Unicode system. The first 128 characters of Unicode (0 to 127) are the same as the ASCII characters, and the next 128 (128 to 255) are other characters whose values also fit in the char data type. This is the reason why wchar_t can output the English alphabet and the special characters that the normal char type can.
#include <iostream>
using namespace std;

int main()
{
    int i = 80, i1 = 157;  // i < 127, i1 > 127

    wchar_t wc = i, wc1 = i1;
    unsigned char us = i, us1 = i1;

    wcout << wc << " " << wc1 << endl;
    cout << us << " " << us1 << endl;

    cin.get();
    return 0;
}
The output (which, for the value 157, depends on the code page used by the console) is
P ¥
P ¥
See also: char data type (unsigned char type and signed char type).
wstring type
wstring is a type that can represent a string of wide characters. Its function is analogous to the string type, but it holds a collection of wchar_t data. Note that sizeof(wstring) reports the size of the string object itself, which depends on the implementation and not on the length of the text. As with the wchar_t type, the letter 'L' is added in front of a string literal assigned to a wstring variable, and wcout is used to output the string. A simple wstring program is given below.
#include <iostream>
#include <string>
using namespace std;

int main()
{
    wstring ws = L"Where is my Japanese string?";

    wcout << ws << endl;
    cout << sizeof(wstring) << endl;  // size of the wstring object, not of the text

    cin.get();
    return 0;
}
Note that double quotation marks "" are used (as with the string type) to assign a wide string to its variable.
Why can't wchar_t print Unicode characters?
Suppose I try to print a Unicode character using the wchar_t type: will I get the character as output? Let us see. Consider the program below, in which we try to print a Hindi letter using wchar_t.
wchar_t wc = L'ऐ';
wcout << wc << endl;  // using wcout
cout << wc;           // using cout
If you run the program you will get the output as,
(blank)
2320
Unfortunately, the first output is blank, meaning wcout does not print anything. The second output is 2320, which is the integer value of the character 'ऐ' in Unicode. Why does wcout give blank output? When the program tries to print the character 'ऐ', the console takes the integer value 2320 and looks for the character in the font it uses. If the font has a glyph for the value 2320, the character is printed on the screen; if the font used by the console does not support it, nothing is printed. In our case the font used by the console does not support the Hindi alphabet, so it cannot print the character. Hence we get the blank output.
I am sure you are now thinking, "So much for trusting the wchar_t type to support Unicode when it can't even print the character." Well my friend, you are not betrayed: the wchar_t type does support Unicode and it can print the script of any known language in the world, but with a little tweaking. What the tweaks are and how to get the console to print any script is not shown here; we will discuss that in another post, because it is better to dedicate an entire post to it. In the meanwhile, happy programming!