I'm trying to get a UTF codepoint number to pass to QuickGUI. This code works for most characters, but I get an exception from Ogre when I try to run this statement, where newChar == '£' (and no doubt this is true for a few other characters as-yet unexamined):
//! estimates the number of UTF-8 code points in the sequence starting with \a cp
static size_t _utf8_char_length( unsigned char cp ) {
    if ( !( cp & 0x80 ) ) return 1;
    if (( cp & ~_lead1_mask ) == _lead1 ) return 2;
    if (( cp & ~_lead2_mask ) == _lead2 ) return 3;
    if (( cp & ~_lead3_mask ) == _lead3 ) return 4;
    if (( cp & ~_lead4_mask ) == _lead4 ) return 5;
    if (( cp & ~_lead5_mask ) == _lead5 ) return 6;
    throw invalid_data( "invalid UTF-8 sequence header value" );
}
The call stack from my statement looks like:
Client.exe!Ogre::UTFString::_utf8_char_length(unsigned char cp='£') Line 2001 + 0x44 bytes C++
Client.exe!Ogre::UTFString::assign(const std::basic_string<char,std::char_traits<char>,std::allocator<char> > & str="£") Line 1237 + 0x10 bytes C++
Client.exe!Ogre::UTFString::UTFString(const std::basic_string<char,std::char_traits<char>,std::allocator<char> > & str="£") Line 869 C++
'£' isn't a valid UTF code point; it's an (extended) ASCII character. Only the 7-bit ASCII table is encoded in Unicode at the same code points, and '£' is not in that range; it's in the extended code pages. So you can't use it directly as a number, because it's not a UTF code point.
If you look up the Unicode tables, you'll find that the code point for '£' is 0x00A3.
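To make the failure in the call stack concrete, here is a minimal sketch of a lead-byte check. It is not Ogre's code: utf8SequenceLength is a hypothetical helper, it only classifies by bit pattern, and it stops at 4-byte sequences, whereas the function quoted above also accepts 5- and 6-byte ones. It assumes the std::string holds '£' as the single byte 0xA3, as it would in Latin-1 or Windows-1252.

```cpp
#include <cstddef>

// Hypothetical helper (not part of Ogre): returns the number of bytes in the
// UTF-8 sequence that starts with `lead`, or 0 if `lead` cannot start one.
// Classifies by bit pattern only; it does not reject overlong encodings.
std::size_t utf8SequenceLength(unsigned char lead) {
    if ((lead & 0x80) == 0x00) return 1; // 0xxxxxxx: 7-bit ASCII
    if ((lead & 0xE0) == 0xC0) return 2; // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3; // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4; // 11110xxx
    return 0;                            // 10xxxxxx (continuation byte) or invalid
}
```

Under that assumption, 0xA3 is 10100011 in binary, which is the pattern of a continuation byte, so it can never start a sequence. In UTF-8 the code point 0x00A3 is the two bytes 0xC2 0xA3, and 0xC2 is what a decoder expects to see first.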
Ouch. So I guess this means I need to tell kungfoomasta that QuickGUI needs to support more than UTFStrings? -- Or, at least, can't use the codepoints to identify characters in that instance? I'm assuming he'll want to support as many characters in his textboxes as possible. The weird thing is, I was hitting shift-3, which should be #, but OIS + UTFString was reading it as a GBP sign. And I'm on an American-bought laptop, with American keyboard settings... I guess maybe OIS just isn't.
Is there a better unicode alternative than UTFString to support more characters?
All I do is grab the texture created by the font class, and get the UV rect boundaries of a particular glyph via its code point. The only way I would be able to print '£' to screen would be if a .ttf file supported this glyph, and had an associated code point. Even if '£' is not mapped to a particular code point, you could have/make a .ttf with it and give it a code point, and inject this code point into QuickGUI.
What is the relationship between ASCII characters and Ogre::Font? (How can I get texture coords from an ascii character?)
If I remember correctly, ASCII is incorporated unchanged into Unicode, so the ASCII code would be identical to the code point. But please remember that there is no £ in ASCII.
I just looked it up and it seems there is an ASCII variant where the £ is placed at the code where you would usually expect the #. That would explain Jekteir's shift-3 experiment. Something is definitely messing up the character encoding on the way from OIS to the FreeType library. Now the only thing left to do is to identify the culprit.
KungFooMasta wrote: What is the relationship between ASCII characters and Ogre::Font? (How can I get texture coords from an ascii character?)
Outside code point 127, there is no direct relationship, which is exactly the same as Unicode. Ogre::Font is now completely Unicode based and as such you must use Unicode code points, which means you cannot take ASCII values directly except within the 7-bit ASCII range which is shared with Unicode.
'£' as an ASCII value is completely non-portable because it's specific to an ASCII code page - if you use that in a different country it's likely not to work. Unicode 0x00A3 in contrast will work everywhere. Nothing is 'messing up' the '£' value - it was wrong to begin with. You just need to adjust your thinking to Unicode. Here's a useful tool: http://www.mikezilla.com/exp0012.html
Ogre::Font will reference Unicode fonts correctly. Most TTF fonts are Unicode these days anyway.
I should have known that. Fanatic RISC OS user for one and a half decades (for some reason those British RISC OS boxes are getting delivered in Germany with a British keyboard most of the time). The only question left is why Jekteir didn't know that he had a British keyboard mapping on his system.
That was my whole point. If I hit Shift-3 in Notepad, I get #. If I do it with QuickGUI, it tries to write the GBP sign. So something is wrong in the code, it seems to me.
OIS just gives you key codes, identifiers for physical keys - it's up to the receiving application to turn that into a character based on the current regional keyboard layout.
My guess is that something in your app (perhaps QuickGUI) is using a fixed lookup table to turn the key ID into a character, which, like assuming extended ASCII characters are the same everywhere, is not portable. Unicode is the way to store characters reliably across regions, but to arrive at the character code you also need to have a region-aware mapping from key code to Unicode characters. Notepad is doing that and your app isn't; that's why they're different.
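As a rough illustration of what "region-aware" means here, the sketch below maps the same physical key to different code points depending on the layout. Everything in it is hypothetical (Layout, Key and shiftedCodePoint are made-up names, not OIS or QuickGUI types), and a real application would normally ask the operating system for the active layout rather than hard-code a table like this.

```cpp
// All names here are hypothetical; they are not OIS or QuickGUI types.
enum class Layout { US, UK };
enum class Key { Digit3 };

// Returns the Unicode code point produced by `key` with Shift held down
// under `layout`, or 0 if this toy table has no entry for it.
char32_t shiftedCodePoint(Layout layout, Key key) {
    if (key == Key::Digit3) {
        switch (layout) {
            case Layout::US: return 0x0023; // '#'
            case Layout::UK: return 0x00A3; // '£'
        }
    }
    return 0;
}
```

A fixed table that only knows one of these rows gives the right answer on one keyboard layout and the wrong one on the other, which matches the Shift-3 symptom described above.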
I have copied the OIS KeyCode enum table into QuickGUI, so that OIS::KC_ESCAPE maps to QuickGUI::ScanCode::KC_ESCAPE, etc. For injecting characters, I use OIS::KeyEvent::text, I forget the type, but I believe it's a Code Point. Aside from the static cast to inject an OIS::KeyCode to a QuickGUI::ScanCode (or something similarly named), I don't do any changing of the injected data.
Ok, I've never really looked at this aspect of OIS since CEGui has always handled the Unicode conversion for me, but having checked, OIS does indeed default to returning Unicode, according to the OIS::Keyboard::setTranslationMode default. That's pretty cool, well done pjcast.
It comes back as an unsigned int. The key thing then is what you're doing with that. If you're using push_back directly on UTFString, passing the unsigned int, that should be fine. If you're going through a char* or std::string in the middle, that could be problematic.
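To show why the detour through char is the risky part, here is a small standalone sketch. It does not use UTFString; encodeUtf8 and narrowToChar are hypothetical helpers written for this post, and encodeUtf8 assumes it is handed a valid code point.

```cpp
#include <string>

// Hypothetical helper: encodes one Unicode code point as UTF-8 bytes.
// Assumes cp is a valid scalar value (<= 0x10FFFF and not a surrogate).
std::string encodeUtf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// The lossy path: squeezing the code point into a single char.
std::string narrowToChar(char32_t cp) {
    return std::string(1, static_cast<char>(cp));
}
```

For 0x00A3, encodeUtf8 yields the two bytes 0xC2 0xA3, while narrowToChar yields the lone byte 0xA3. A std::string built the second way and then handed to something that expects UTF-8 would start with a continuation byte, which looks like the situation in the call stack at the top of the thread.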
Yah, I don't type cast the data, and I use the operations of the UTFString class, like push_back and insert, etc.
@Jekteir:
What data does OIS give you when you press Shift+3? (KeyCode and the text) What font are you using, and what symbol is represented by the code point you receive?
If you're using Windows, you can check out the Program Files -> Accessories -> System Tools -> Character Map application, and find out what symbol is represented by the code point you're given. (I think..)