Unicode
-
- Goblin
- Posts: 260
- Joined: Tue Oct 25, 2011 1:07 am
- x 36
Unicode
Ogre::String isn't UTF aware, Resource manager does not accept non-english paths, resource names can't contain non-ascii characters.
On top of all that Ogre doesn't compile with wstring.
I am very confused about it.
How do you guys handle this?
-
- Orc
- Posts: 438
- Joined: Tue Sep 18, 2007 5:28 pm
- Location: Seattle, USA
- x 13
Re: Unicode
Ogre::String can be considered equivalent to a typedef of std::string or std::wstring, depending on how you configured Ogre3D to be built in CMake. The default is a typedef equivalent to std::string. It's not a question of Ogre not being UTF aware. I use Ogre::String in my game and I make sure to convert all my resource strings (which happen to be Korean) to UTF-8 before I assign them to Ogre::String. You will have to do the same. E.g., you can use the Boost libraries to convert a Korean string to a UTF-8 string and store it in Ogre::String:
Code: Select all
Ogre::String fromEncoding(const std::string& theString, const std::string& encodingName)
{
return boost::locale::conv::utf<char>(theString, encodingName);
}
// Usage
Ogre::String encodedString = fromEncoding(eucEncodedString, "euckr");
-
- Goblin
- Posts: 260
- Joined: Tue Oct 25, 2011 1:07 am
- x 36
Re: Unicode
But what if user puts the game in a directory like "c:/привет мир".
Ogre isn't able to open that directory and parse the contents.
For instance: Ogre uses _findfirst windows function. Which is char* ascii. Should be _wfindfirst to open that directory.
It seems to me my 2 solutions are:
1. Use wstring on Windows. (Which doesn't compile: Ogre has lots of "" string literals, and plugins are too liberal with string usage.)
2. Edit ArchiveManager to use Unicode: wstring on Windows, UTF-8 char on Unix.
I'd also need to limit resource names to ASCII characters to avoid replacing all calls to the ResourceManagers.
Both of them are serious hurdles for such an elementary functionality.
It makes me feel like i am missing the big picture.
-
- Orc
- Posts: 438
- Joined: Tue Sep 18, 2007 5:28 pm
- Location: Seattle, USA
- x 13
Re: Unicode
Are you sure Ogre isn't able to handle that? I haven't handled Russian encoding yet, but it works on EUC-KR encoded paths from my experience. Windows by default will convert a Korean path like 'data\sprite\몬스터\spritename.spr' to 'data\sprite\¸ó½ºÅÍ\spritename.spr', which can still be stored by Ogre::String. Problem creeps in when searching for the path though. You'll have to search it as an ANSI string or UTF-8 string. I prefer the latter because that way, the string can be read from a database if required and I don't need to worry about the encoding of the string.
-
- Goblin
- Posts: 260
- Joined: Tue Oct 25, 2011 1:07 am
- x 36
Re: Unicode
I have the path in both UTF-16 and UTF-8.
Which I need to convert to the correct ANSI codepage, since Ogre accepts only char*.
Problem is the path is in Russian, while the system locale is en-US.
Code: Select all
WideCharToMultiByte(CP_ACP, 0, &ws[0], (int)ws.size(), NULL, 0, NULL, NULL);
Does not help, because I'm at a US computer but the file path is RU. So it gives a string filled with ????.
I can manually detect the codepage and convert it, but what if English and Russian characters are mixed in the same path?
Is there a way to automatically convert paths to valid char arrays?
(btw. I'm ok with using boost::filesystem and boost::locale. I just don't know how to use them to get out of this nasty situation.)
-
- Orc
- Posts: 438
- Joined: Tue Sep 18, 2007 5:28 pm
- Location: Seattle, USA
- x 13
Re: Unicode
In my case, English letters are mixed in the path. The boost locale function I posted handles that very well. So in your case, it's a matter of finding out what the encoding of your Russian strings is. It doesn't matter if there are English characters in between. Can you check whether using the string "ru_ru" to convert the string (in my example above) fixes your issue? Remember that printing out the string in the console will do no good. It will come up as all ???... What you need to do is set a breakpoint and check the contents of Ogre::String to see it in Russian letters.
-
- Goblin
- Posts: 260
- Joined: Tue Oct 25, 2011 1:07 am
- x 36
Re: Unicode
There isn't a function named boost::locale::conv::utf.
Did you mean this?
Code: Select all
boost::locale::conv::from_utf<wchar_t>(boost::path(L"c:/привет мир").wstring(), "koi8-r");
Since we convert from UTF to the Russian codepage.
It converts it to "c:/ÐÒÉ×ÅÔ ÍÉÒ", which is correct I think. Well, it's not c:/??? ??? at least.
Still, the _findfirst function can't open the directory. I suspect that's because the system locale is en-US, and the Windows API can't understand it because of that.
-
- Orc
- Posts: 438
- Joined: Tue Sep 18, 2007 5:28 pm
- Location: Seattle, USA
- x 13
Re: Unicode
saejox wrote:There isn't a function named boost::locale::conv::utf.
There is, and I'm using it in my game client. That code was literally copy-pasted from my client's code. I'm using Boost 1.50. What version of Boost are you using?
I'm not sure if boost::path is doing something funky with the string. Can you try doing it like this?
Code: Select all
boost::locale::conv::from_utf<wchar_t>(L"c:/привет мир", L"koi8-r");
-
- Goblin
- Posts: 260
- Joined: Tue Oct 25, 2011 1:07 am
- x 36
Re: Unicode
You must have typedef'd or changed its name by mistake. Even Google doesn't know about it.
Anyway, I am testing in an empty project (no Ogre).
This time I have used your string. Created a C:/몬스터 directory with a bunch of resources in it.
Code: Select all
int main()
{
    std::string utf8name = boost::locale::conv::from_utf<wchar_t>(L"C:/몬스터/*", "euckr");
    struct _finddata_t tagData;
    int g = _findfirst(utf8name.c_str(), &tagData);
    return 0;
}
utf8name is "C:/¸ó½ºÅÍ/*". Which is the same as you reported.
_findfirst still can't find the directory.
I used _findfirst because it's the one Ogre uses.
I am not going to pursue the ANSI route anymore. It depends on the user's Windows codepage and fails miserably when characters from different codepages are mixed.
-
- Orc
- Posts: 438
- Joined: Tue Sep 18, 2007 5:28 pm
- Location: Seattle, USA
- x 13
Re: Unicode
saejox wrote:You must have typedef'd or changed its name by mistake. Even Google doesn't know about it.
I'm sorry.. seems like it was my mistake after all... it's boost::locale::conv::to_utf<char>(). So in your case you should be doing this:
Code: Select all
std::string utf8name = boost::locale::conv::to_utf<wchar_t>(L"C:/몬스터/*", "euckr");
-
- OGRE Expert User
- Posts: 1920
- Joined: Sun Feb 19, 2012 9:24 pm
- Location: Russia
- x 201
Re: Unicode
My way of handling Unicode in projects targeting Windows is sticking to the TCHAR paradigm. Unfortunately Ogre doesn't use it and one would have to go through all its codebase to fix all string literals (and maybe char/wchar_t capable function calls) to make that work.
I also think that taking the charset-based route is a bad idea. Unicode is the right way. You have two shortcut options here:
- create a simple resource system that will use the correct functions to communicate with the file system to avoid any Unicode related issues in file paths
- cast a search/replace on the entire Ogre codebase to make all string literals become wchar_t by prefixing them with "L", then fix whatever functions need to be changed to their wchar_t equivalents
-
- OGRE Moderator
- Posts: 2819
- Joined: Mon Mar 05, 2007 11:17 pm
- Location: Canada
- x 218
Re: Unicode
saejox wrote:But what if user puts the game in a directory like "c:/привет мир".
Ogre isn't able to open that directory and parse the contents.
Just to confirm - yes, this is a problem.
Ogre cannot handle game directories like that, at least up to and including 1.7 (not sure about 1.8).
In my project, for the windows build, I threaded through a wstring for the config paths in the Ogre::Root constructor:
Code: Select all
Root(const String& pluginFileName = "plugins.cfg",
// We need to be able to write the config file into user directories with usernames that include
// unicode characters.
// - note: this isn't necessary for plugins.cfg because we install our game into ProgramFiles,
// and the plugins.cfg file is in there, so there's no chance of unicode chars in the path.
// const String& configFileName = "ogre.cfg",
const std::wstring& mConfigFileName = L"ogre.cfg",
// Ditto
// const String& logFileName = "Ogre.log");
const std::wstring& logFileName = L"Ogre.log");

-
- Goblin
- Posts: 260
- Joined: Tue Oct 25, 2011 1:07 am
- x 36
Re: Unicode
I did it 
This is how i did it: (for the interested parties)
Added two static functions UTF8toUTF16 and UTF16toUTF8
I have replaced all Windows file operations with their wide-string UTF-16 versions.
All Ogre resources are still std::string and assumed to be UTF-8, which was already the case for Unix. They are converted to UTF-16 right before a file operation is performed.
Works for both paths and file names. So now Ogre can read Unicode filenames and paths like "C:\몬스터\몬" as it should. Unicode resource names are also possible and work great.
Unfortunately this would break any application that used ANSI codepages, because it now assumes every Ogre::String is UTF-8 encoded Unicode.
Thank you.
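For readers who want to see the shape of the two helpers, here is a minimal portable sketch. It is not the poster's implementation: it is a hand-written stand-in in standard C++, it uses std::u16string rather than std::wstring, and the function names are only borrowed from the post above. Treat it as an illustration of the UTF-8 to UTF-16 round trip under those assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for the UTF8toUTF16 / UTF16toUTF8 helpers named above.
// std::u16string is used so the sketch behaves the same on every platform
// (wchar_t is 16 bits on Windows but 32 bits on most Unix systems).
std::u16string UTF8toUTF16(const std::string& in)
{
    // Smallest code point that legitimately needs a sequence of each length.
    static const std::uint32_t minForLen[5] = {0, 0, 0x80, 0x800, 0x10000};
    std::u16string out;
    for (std::size_t i = 0; i < in.size();) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;
        std::size_t len = 0;
        if (lead < 0x80)                { cp = lead;        len = 1; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");
        if (i + len > in.size()) throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80) throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (b & 0x3F);
        }
        if (cp < minForLen[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("overlong or out-of-range UTF-8 sequence");
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            cp -= 0x10000;  // encode as a surrogate pair
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i += len;
    }
    return out;
}

std::string UTF16toUTF8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: must be followed by a low surrogate.
            if (i + 1 >= in.size() || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                throw std::invalid_argument("unpaired high surrogate");
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            throw std::invalid_argument("unpaired low surrogate");
        }
        if (cp < 0x80) {
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}
```

On Windows the UTF-16 result would presumably be handed to the wide file functions (for example _wfindfirst instead of _findfirst), while on Unix the UTF-8 std::string can be passed through unchanged.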

-
- Orc
- Posts: 438
- Joined: Tue Sep 18, 2007 5:28 pm
- Location: Seattle, USA
- x 13
Re: Unicode
I'm not clear with what you did behind the scenes, which is what anyone looking at this solving the same issue would look at. What was your implementation of UTF8toUTF16() and UTF16toUTF8()?
Are you sure that applications using ANSI codepages would break? Have you tested it? In theory, it shouldn't. UTF-8 is a multi-byte encoding, which means ANSI characters are represented with 1 byte each, while characters that cannot be represented with 1 byte take up more than 1 byte in the string. If it does break, you have a problem with your implementation of the UTF16toUTF8() function (and probably the other one too).
-
- Goblin
- Posts: 260
- Joined: Tue Oct 25, 2011 1:07 am
- x 36
Re: Unicode
Functions use MultiByteToWideChar and WideCharToMultiByte.
For the path you give:
euckr codepage produces: C:/¸ó½ºÅÍ
utf8 produces: C:/몬스터
To convert these to same utf16:
- UTF8 doesn't need to know any additional info. MultiByteToWideChar does it with CP_UTF8.
- To convert ansi code to UTF16, it needs to know which code page is used. MultiByteToWideChar can do this, but it needs code page integer code.
Problem is system code page might not be same as code page used in this string. My computer is en_US but i'm using Korean and russian mixed resource names for testing. ANSI can't show both korean and russian characters mixed in the same string. It's one or the other.
UTF8toUTF16 is just what it says. It takes UTF-8 (which is multiple bytes per character) and outputs UTF-16 (at least 2 bytes per character, which wastes space if the chars are ASCII).
It is impossible to write an ANSItoUTF16 function without knowing the codepage.
Modding Ogre to use UTF-8 is easy because
1. Ogre already uses UTF8 for unix
2. Ogre uses FileSystemArchive class to write. There aren't any other file operations done anywhere in the core.
Honestly, ANSI is ancient. No OS but windows uses it and even Microsoft calls it deprecated.
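To make the breakage concrete, here is a small sketch of a structural UTF-8 check. The helper name isWellFormedUtf8 is hypothetical, and the EUC-KR byte values in the usage note are an assumption: they are read off the mojibake quoted above, taking it to be the Windows-1252 rendering of the raw bytes. Pure ASCII passes the check unchanged, while codepage-encoded non-ASCII bytes generally do not, which is why code that assumes UTF-8 trips over ANSI strings.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if `s` is well-formed UTF-8.
// It checks lead/continuation byte structure, overlong forms,
// surrogate code points and the U+10FFFF upper limit.
bool isWellFormedUtf8(const std::string& s)
{
    for (std::size_t i = 0; i < s.size();) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t len = 0;
        unsigned long cp = 0;
        unsigned long minCp = 0;  // smallest code point allowed for this length
        if (lead < 0x80)                { len = 1; cp = lead;        minCp = 0; }
        else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; minCp = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; minCp = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; minCp = 0x10000; }
        else return false;  // a continuation byte or 0xF8..0xFF cannot start a sequence
        if (i + len > s.size()) return false;  // sequence cut off by end of string
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(s[i + k]);
            if ((b & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (b & 0x3F);
        }
        if (cp < minCp || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return false;
        i += len;
    }
    return true;
}
```

Under the assumption above, the EUC-KR form of the path is "C:/\xB8\xF3\xBD\xBA\xC5\xCD" and is rejected, the UTF-8 form "C:/\xEB\xAA\xAC\xEC\x8A\xA4\xED\x84\xB0" is accepted, and an ASCII-only path such as "C:/data/sprite" is accepted as well. A check like this can only tell you that a string is not UTF-8. It cannot tell you which codepage it was written in.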
-
- Orc
- Posts: 438
- Joined: Tue Sep 18, 2007 5:28 pm
- Location: Seattle, USA
- x 13
Re: Unicode
So why didn't you opt for the cross-platform version instead? Which was to use boost::locale::conv::to_utf()? Was it because you wouldn't know the locale in advance? I'm just trying to understand why you opted for what you went for.
-
- Goblin
- Posts: 260
- Joined: Tue Oct 25, 2011 1:07 am
- x 36
Re: Unicode
vitefalcon wrote:So why didn't you opt for the cross-platform version instead? Which was to use boost::locale::conv::to_utf()? Was it because you wouldn't know the locale in advance? I'm just trying to understand why you opted for what you went for.
Well, I can't possibly know the locale. I can assume that users' file paths are encoded the same as the system code page, which is true 99% of the time.
I don't think there are many Russians using Korean file paths. But I'd rather not risk it.
Even today you still see some applications that can't work out the code page and show a bunch of ???????s. Unicode fixes all that, so why would I not use it?
Also, there is no reason to be cross-platform here. Unix is already UTF-8 everywhere.
I don't convert anything at all on Linux. Paths just work. (lesson: Microsoft is such a great company)