Ogre::UTFString .getChar exception

Jekteir
Halfling
Posts: 80
Joined: Mon Jun 19, 2006 9:13 pm

Post by Jekteir »

I'm trying to get a UTF codepoint number to pass to QuickGUI. This code works for most characters, but I get an exception from Ogre when I try to run this statement, where newChar == '£' (and no doubt this is true for a few other characters as-yet unexamined):

Code:

Ogre::UTFString::UTFString(newChar).getChar(0)
It throws inside this function:

Code:

		//! estimates the number of UTF-8 code points in the sequence starting with \a cp
		static size_t _utf8_char_length( unsigned char cp ) {
			if ( !( cp & 0x80 ) ) return 1;
			if (( cp & ~_lead1_mask ) == _lead1 ) return 2;
			if (( cp & ~_lead2_mask ) == _lead2 ) return 3;
			if (( cp & ~_lead3_mask ) == _lead3 ) return 4;
			if (( cp & ~_lead4_mask ) == _lead4 ) return 5;
			if (( cp & ~_lead5_mask ) == _lead5 ) return 6;
			throw invalid_data( "invalid UTF-8 sequence header value" );
		}
The call stack from my statement looks like:
Client.exe!Ogre::UTFString::_utf8_char_length(unsigned char cp='£') Line 2001 + 0x44 bytes C++
Client.exe!Ogre::UTFString::assign(const std::basic_string<char,std::char_traits<char>,std::allocator<char> > & str="£") Line 1237 + 0x10 bytes C++
Client.exe!Ogre::UTFString::UTFString(const std::basic_string<char,std::char_traits<char>,std::allocator<char> > & str="£") Line 869 C++
Can anyone help me out with this?

Thanks,

Jek
sinbad
OGRE Retired Team Member
Posts: 19269
Joined: Sun Oct 06, 2002 11:19 pm
Location: Guernsey, Channel Islands
x 67

Post by sinbad »

'£' isn't a valid UTF code point; it's an (extended) ASCII character. Only the 7-bit ASCII table shares its code points with Unicode, and '£' is not in that range: it's in the extended code pages. So you can't use it directly as a number, because it's not a UTF code point.

If you look up the unicode tables you'll find that the code point for '£' is 0x00A3.
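
For readers following along, here is a minimal stand-alone sketch of why the single byte fails. It assumes newChar reached UTFString as the lone byte 0xA3 (which is what '£' is in Latin-1 and Windows-1252); the thread does not confirm that. The function names are made up for this sketch, and unlike the Ogre function quoted in the first post it stops at 4-byte sequences. The byte 0xA3 has the bit pattern 10100011, which marks a UTF-8 continuation byte, so it matches none of the lead-byte tests.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Stand-alone re-creation of a UTF-8 lead-byte test (not Ogre's code).
// Returns how many bytes the sequence starting with `lead` occupies.
std::size_t utf8SequenceLength(unsigned char lead) {
    if ((lead & 0x80) == 0x00) return 1; // 0xxxxxxx: 7-bit ASCII
    if ((lead & 0xE0) == 0xC0) return 2; // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3; // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4; // 11110xxx
    // 10xxxxxx is a continuation byte and can never start a sequence.
    throw std::invalid_argument("invalid UTF-8 lead byte");
}

// True if `lead` can start a UTF-8 sequence.
bool isValidLeadByte(unsigned char lead) {
    try {
        utf8SequenceLength(lead);
        return true;
    } catch (const std::invalid_argument&) {
        return false;
    }
}

// Usage check: '#' is plain ASCII, 0xC2 starts a 2-byte sequence,
// and 0xA3 on its own cannot start anything.
void checkLeadBytes() {
    assert(utf8SequenceLength('#') == 1);
    assert(utf8SequenceLength(0xC2) == 2);
    assert(!isValidLeadByte(0xA3));
}
```

In UTF-8 the code point 0x00A3 is stored as the two bytes 0xC2 0xA3, so the byte 0xA3 is only ever valid as the second byte of a sequence.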
Jekteir
Halfling
Posts: 80
Joined: Mon Jun 19, 2006 9:13 pm

Post by Jekteir »

Ouch. So I guess this means I need to tell kungfoomasta that QuickGUI needs to support more than UTFStrings? -- Or, at least, can't use the codepoints to identify characters in that instance? I'm assuming he'll want to support as many characters in his textboxes as possible. The weird thing is, I was hitting shift-3, which should be #, but OIS + UTFString was reading it as a GBP sign. And I'm on an American-bought laptop, with American keyboard settings... I guess maybe OIS just isn't.

Is there a better unicode alternative than UTFString to support more characters?

Jek
KungFooMasta
OGRE Contributor
Posts: 2087
Joined: Thu Mar 03, 2005 7:11 am
Location: WA, USA
x 16

Post by KungFooMasta »

I use the Ogre::Font class for QGUI, via the function:

Code:

const UVRect &  getGlyphTexCoords (CodePoint id) const
All I do is grab the texture created by the font class and get the UV rect boundaries of a particular glyph via its code point. The only way I would be able to print '£' to screen would be if a .ttf file supported this glyph and had an associated code point. Even if '£' is not mapped to a particular code point, you could have/make a .ttf with it, give it a code point, and inject this code point into QuickGUI.

What is the relationship between ASCII characters and Ogre::Font? (How can I get texture coords from an ascii character?)
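
To make the lookup-by-code-point idea concrete, here is a rough sketch using a stand-in class. GlyphAtlas, addGlyph and hasGlyph are hypothetical names invented for this example, and the UVRect here is a plain struct rather than Ogre's type. Returning an empty rectangle for a missing glyph is a choice made for the sketch, not a claim about what Ogre::Font does.

```cpp
#include <cassert>
#include <unordered_map>

// Plain stand-in for a texture-coordinate rectangle (not Ogre's UVRect).
struct UVRect { float u1, v1, u2, v2; };

// Hypothetical glyph table keyed by Unicode code point.
class GlyphAtlas {
public:
    // Registers the texture rectangle for one code point.
    void addGlyph(char32_t cp, const UVRect& rect) {
        assert(rect.u1 <= rect.u2 && rect.v1 <= rect.v2);
        glyphs_[cp] = rect;
    }

    // True if the font behind this atlas supplied a glyph for cp.
    bool hasGlyph(char32_t cp) const { return glyphs_.count(cp) != 0; }

    // Returns the rectangle for cp, or an empty one if the glyph is missing.
    const UVRect& getGlyphTexCoords(char32_t cp) const {
        static const UVRect empty{0.f, 0.f, 0.f, 0.f};
        auto it = glyphs_.find(cp);
        return it != glyphs_.end() ? it->second : empty;
    }

private:
    std::unordered_map<char32_t, UVRect> glyphs_;
};
```

With a table like this, an ASCII character such as '#' is looked up by its value 0x23 directly, while '£' has to be looked up as 0x00A3 and is only found if the font actually contains that glyph.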
Creator of QuickGUI!
Zini
Goblin
Posts: 254
Joined: Fri Nov 18, 2005 7:30 pm

Post by Zini »

If I remember correctly ASCII is incorporated unchanged into Unicode, so the ASCII code would be identical to the code point. But please remember that there is no £ in ASCII.

http://en.wikipedia.org/wiki/ASCII

Edit: Oops, that is pretty much what sinbad wrote a few posts above. I should have read them first.
KungFooMasta
OGRE Contributor
Posts: 2087
Joined: Thu Mar 03, 2005 7:11 am
Location: WA, USA
x 16

Post by KungFooMasta »

Does this mean £ is not found in any .ttf files?
Zini
Goblin
Posts: 254
Joined: Fri Nov 18, 2005 7:30 pm

Post by Zini »

No. Some/most .ttf files have more than the ASCII glyphs.
Zini
Goblin
Posts: 254
Joined: Fri Nov 18, 2005 7:30 pm

Post by Zini »

I just looked it up, and it seems there is an ASCII variant in which £ occupies the code where you would usually expect #. That would explain Jekteir's shift-3 experiment. Something is definitely messing up the character encoding on the way from OIS to the FreeType library. Now the only thing left to do is to identify the culprit.
sinbad
OGRE Retired Team Member
Posts: 19269
Joined: Sun Oct 06, 2002 11:19 pm
Location: Guernsey, Channel Islands
x 67

Post by sinbad »

KungFooMasta wrote: What is the relationship between ASCII characters and Ogre::Font? (How can I get texture coords from an ascii character?)
Outside code point 127, there is no direct relationship, which is exactly the same as unicode. Ogre::Font is now completely unicode based and as such you must use unicode code points, which means you cannot take ASCII values directly except within the 7-bit ASCII range which is shared with unicode.

'£' as an ASCII value is completely non-portable because it's specific to an ASCII code page - if you use that in a different country it's likely not to work. Unicode 0x00A3 in contrast will work everywhere. Nothing is 'messing up' the '£' value - it was wrong to begin with. You just need to adjust your thinking to unicode :) Here's a useful tool: http://www.mikezilla.com/exp0012.html

Ogre::Font will reference unicode fonts correctly. Most TTF fonts are unicode these days anyway.
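
As a rough illustration of going from a code point to bytes that a UTF-8 parser will accept, here is a small encoder sketch. encodeUtf8 is a hypothetical helper written for this thread, not part of Ogre, and it assumes the input is a valid Unicode scalar value.

```cpp
#include <cassert>
#include <string>

// Encodes one Unicode code point as UTF-8 (hypothetical helper, not an Ogre API).
std::string encodeUtf8(char32_t cp) {
    // Precondition: a valid scalar value (in range, not a surrogate).
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::string out;
    if (cp < 0x80) {
        // 7-bit ASCII: the byte value equals the code point.
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

Here encodeUtf8(0x23) gives the single byte '#', while encodeUtf8(0x00A3) gives the two bytes 0xC2 0xA3. Only the first branch leaves the value unchanged, which is the 7-bit range described above.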
Zini
Goblin
Posts: 254
Joined: Fri Nov 18, 2005 7:30 pm

Post by Zini »

With "being messed up", I meant this here:
Jekteir wrote: The weird thing is, I was hitting shift-3, which should be #, but OIS + UTFString was reading it as a GBP sign
sinbad
OGRE Retired Team Member
Posts: 19269
Joined: Sun Oct 06, 2002 11:19 pm
Location: Guernsey, Channel Islands
x 67

Post by sinbad »

Ok, that's just down to keyboard mapping. SHIFT-3 is GBP on a British keyboard.
Zini
Goblin
Posts: 254
Joined: Fri Nov 18, 2005 7:30 pm

Post by Zini »

I should have known that. I've been a fanatic RISC OS user for a decade and a half (for some reason those British RISC OS boxes are delivered in Germany with a British keyboard most of the time). The only question left is why Jekteir didn't know that he had a British keyboard mapping on his system ;)
Jekteir
Halfling
Posts: 80
Joined: Mon Jun 19, 2006 9:13 pm

Post by Jekteir »

That was my whole point. If I hit Shift-3 in Notepad, I get #. If I do it with QuickGUI, it tries to write the GBP sign. So something is wrong in the code, it seems to me.

Jek
sinbad
OGRE Retired Team Member
Posts: 19269
Joined: Sun Oct 06, 2002 11:19 pm
Location: Guernsey, Channel Islands
x 67

Post by sinbad »

OIS just gives you key codes, identifiers for physical keys - it's up to the receiving application to turn that into a character based on the current regional keyboard layout.

My guess is that something in your app (perhaps QuickGUI) is using a fixed lookup table to turn the key ID into a character, which, like assuming extended ASCII characters are the same everywhere, is not portable. Unicode is the way to store characters reliably across regions, but to arrive at the character code you also need a region-aware mapping from key code to Unicode characters. Notepad is doing that and your app isn't, which is why they differ.
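
A toy sketch of the difference, with entirely hypothetical names (Layout, Key and keyToCodePoint are invented here and are not OIS or QuickGUI types). A real application would ask the operating system for the active layout rather than hard-code two of them. The point is only that the layout has to be an input to the mapping.

```cpp
#include <cassert>

// Hypothetical identifiers for illustration only.
enum class Layout { US, UK };
enum class Key { Digit3 };

// Returns the Unicode code point produced by `key` under `layout`,
// or 0 if this sketch has no character for the combination.
char32_t keyToCodePoint(Key key, bool shift, Layout layout) {
    if (key == Key::Digit3) {
        if (!shift) return char32_t{0x0033};            // DIGIT THREE
        return layout == Layout::UK ? char32_t{0x00A3}  // POUND SIGN
                                    : char32_t{0x0023}; // NUMBER SIGN
    }
    return 0;
}

// Usage check: the same physical key gives different characters per layout.
void checkShiftThree() {
    assert(keyToCodePoint(Key::Digit3, true, Layout::US) == 0x0023);
    assert(keyToCodePoint(Key::Digit3, true, Layout::UK) == 0x00A3);
}
```

A fixed table is equivalent to calling this with one layout baked in, which matches the symptom of getting a GBP sign on a US keyboard.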
KungFooMasta
OGRE Contributor
Posts: 2087
Joined: Thu Mar 03, 2005 7:11 am
Location: WA, USA
x 16

Post by KungFooMasta »

I have copied the OIS KeyCode enum table into QuickGUI, so that OIS::KC_ESCAPE maps to QuickGUI::ScanCode::KC_ESCAPE, etc. For injecting characters, I use OIS::KeyEvent::text; I forget the type, but I believe it's a code point. Aside from the static cast that turns an OIS::KeyCode into a QuickGUI::ScanCode (or something similarly named), I don't change the injected data at all.
sinbad
OGRE Retired Team Member
Posts: 19269
Joined: Sun Oct 06, 2002 11:19 pm
Location: Guernsey, Channel Islands
x 67

Post by sinbad »

Ok, I've never really looked at this aspect of OIS since CEGui has always handled the Unicode conversion for me, but having checked, OIS does indeed default to returning Unicode, according to the OIS::Keyboard::setTranslationMode default. That's pretty cool, well done pjcast :)

It comes back as an unsigned int. The key thing then is what you're doing with that. If you're using push_back directly on UTFString, passing the unsigned int, that should be fine. If you're going through a char* or std::string in the middle, that could be problematic.
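
One way the intermediate step can go wrong is plain narrowing. The sketch below uses std::u32string as a stand-in for a Unicode string class so that it compiles without Ogre; appendDirect and appendViaByte are hypothetical names, and the unsigned int parameter mirrors the type mentioned above.

```cpp
#include <cassert>
#include <string>

// Appends the code point unchanged: the path described as fine above.
void appendDirect(std::u32string& text, unsigned int keyText) {
    text.push_back(static_cast<char32_t>(keyText));
}

// Squeezes the value through a single byte first: the lossy path.
void appendViaByte(std::u32string& text, unsigned int keyText) {
    // Only the low 8 bits survive this conversion.
    unsigned char narrowed = static_cast<unsigned char>(keyText);
    text.push_back(static_cast<char32_t>(narrowed));
}

// Usage check with the euro sign, U+20AC.
void checkNarrowing() {
    std::u32string direct, viaByte;
    appendDirect(direct, 0x20AC);
    appendViaByte(viaByte, 0x20AC);
    assert(direct[0] == 0x20AC); // intact
    assert(viaByte[0] == 0x00AC); // high byte lost
}
```

Even when the value fits in one byte, as 0xA3 does, handing that byte to code that expects UTF-8 is a separate problem: it is then parsed as an encoded sequence rather than taken as a code point.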
KungFooMasta
OGRE Contributor
Posts: 2087
Joined: Thu Mar 03, 2005 7:11 am
Location: WA, USA
x 16

Post by KungFooMasta »

Yah, I don't type cast the data, and I use the operations of the UTFString class, like push_back and insert, etc.

@Jekteir:

What data does OIS give you when you press Shift+3? (KeyCode and the text) What font are you using, and what symbol is represented by the code point you receive?

If you're using Windows, you can check out the Character Map application (Program Files -> Accessories -> System Tools -> Character Map) and find out what symbol is represented by the code point you're given. (I think..)
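
If it helps with the lookup, a small helper like the one below prints a value in the U+XXXX form that character tables use. formatCodePoint is a hypothetical name; the unsigned int parameter matches the type sinbad mentions for the OIS text value.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Formats a code point as "U+XXXX" (at least four hex digits, upper case).
std::string formatCodePoint(unsigned int cp) {
    assert(cp <= 0x10FFFF); // largest valid Unicode code point
    char buf[16];
    std::snprintf(buf, sizeof buf, "U+%04X", cp);
    return buf;
}
```

Logging formatCodePoint(text) in the key handler would show U+0023 for '#' and U+00A3 for '£', which can then be matched against the character table.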