Crash in libGL with fglrx and OpenGL Rendering System
-
- Gnoblar
- Posts: 4
- Joined: Tue May 25, 2021 11:18 pm
Crash in libGL with fglrx and OpenGL Rendering System
Ogre Version: 1.12.9 (with minor changes)
Operating System: Linux 3.18
Render System: OpenGL Rendering System
For two weeks now I have been dealing with issues in our application, which embeds the Ogre engine. Everything works nicely until we unload the engine (by destroying the Root object). Once the render system is unloaded, many OpenGL calls behave erroneously or outright segfault our application. Unfortunately, this mostly happens when Ogre calls glXMakeCurrent after the engine has been reloaded (typically only when ASLR is enabled). It also happens when we call that function in our own code after Ogre has been loaded and unloaded, but not before that. Additionally, unloading libGL when the application exits causes a segfault.
I have confirmed that loading the render system with RTLD_NODELETE works around the issue entirely, even with ASLR enabled. I have also confirmed that turning ASLR off fixes at least the crashes after reloading the engine. Explicitly linking against all the X, GL, etc. libraries does not solve the issue.
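A minimal C sketch of the RTLD_NODELETE workaround, assuming glibc's dlopen() and assuming the plugin file is called RenderSystem_GL.so (use whatever path your plugins.cfg actually loads). The idea is to open the render system once with RTLD_NODELETE before Ogre::Root is created; glibc then keeps the object mapped across every later dlclose(), so when Ogre loads the plugin again it gets the existing mapping at the same address. `pin_library` is a hypothetical helper, not an Ogre API.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: load a shared object once with RTLD_NODELETE so that
 * glibc never unmaps it, even after Ogre's own dlclose() during shutdown.
 * A later dlopen() of the same file then reuses the mapping (same address)
 * instead of mapping a fresh copy somewhere else under ASLR. */
int pin_library(const char *path)
{
    void *handle = dlopen(path, RTLD_NOW | RTLD_GLOBAL | RTLD_NODELETE);
    if (handle == NULL) {
        fprintf(stderr, "pin_library: %s\n", dlerror());
        return -1;
    }
    /* Dropping our reference is fine: RTLD_NODELETE keeps the object mapped. */
    dlclose(handle);
    return 0;
}
```

Usage would be a single call such as `pin_library("./RenderSystem_GL.so")` at startup, before constructing Ogre::Root. Note that this only sidesteps the problem: static state inside the pinned library is not reinitialized when Ogre "reloads" it.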
I would like to know whether anyone has experienced similar issues or has any idea what could cause this. There must be some reference in libGL to something inside the render system library. My guess is that AMD's libGL implementation does something odd here, but maybe something is off in Ogre.
P.S.:
I am aware that this issue very likely stems from the render system being loaded at a different address than the first time it was loaded.
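To check this hypothesis directly, one could log where the render system and libGL are mapped before destroying Root and again after recreating it. The sketch below is Linux-only and assumes /proc is mounted; `library_base` is a hypothetical helper. It returns the start address of the first line in /proc/self/maps whose path contains a given substring; since that file is sorted by address, this is the object's load base.

```c
#include <stdio.h>
#include <string.h>

/* Return the lowest mapped address of the first entry in /proc/self/maps
 * whose line contains `needle` (e.g. "RenderSystem_GL" or "libGL"),
 * or 0 if nothing matching is currently mapped. */
unsigned long library_base(const char *needle)
{
    FILE *maps = fopen("/proc/self/maps", "r");
    if (maps == NULL)
        return 0;

    char line[512];
    unsigned long base = 0;
    while (fgets(line, sizeof line, maps) != NULL) {
        /* Each line looks like "start-end perms offset dev inode path". */
        if (strstr(line, needle) != NULL && sscanf(line, "%lx-", &base) == 1)
            break;
        base = 0;
    }
    fclose(maps);
    return base;
}
```

For example, printing `library_base("RenderSystem_GL")` and `library_base("libGL")` around the reload should show whether the render system really comes back at a different address with ASLR on, and a 0 for libGL in between would show that libGL itself was unloaded.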
- sercero
- Bronze Sponsor
- Posts: 450
- Joined: Sun Jan 18, 2015 4:20 pm
- Location: Buenos Aires, Argentina
- x 156
Re: Crash in libGL with fglrx and OpenGL Rendering System
Why do you unload the engine?
What errors do you get?
-
- OGRE Team Member
- Posts: 1995
- Joined: Sun Mar 30, 2014 2:51 pm
- x 1075
Re: Crash in libGL with fglrx and OpenGL Rendering System
glXMakeCurrent is resolved by the linker, while other GL functions are resolved at runtime by dlsym (via GLEW/gl3w).
Try comparing the glXMakeCurrent function pointer to what dlsym would give you.
For further reading:
https://github.com/NVIDIA/libglvnd#architecture
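A minimal C sketch of that comparison, assuming glibc: `compare_symbol` (a hypothetical name) takes a handle to an already-loaded library with dlopen(RTLD_NOLOAD), looks the symbol up with dlsym(), and compares the result with the address the linker bound. For this thread it would be called from code that includes <GL/glx.h> and links against libGL, e.g. `compare_symbol("libGL.so.1", "glXMakeCurrent", (void *)&glXMakeCurrent)`, once before and once after an engine reload.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Compare the address the linker bound a function to (`linked`) with what
 * dlsym() returns for the same name from an already-loaded library.
 * Returns 1 if they match, 0 if they differ, -1 if the lookup fails. */
int compare_symbol(const char *lib, const char *name, void *linked)
{
    /* RTLD_NOLOAD: only succeed if `lib` is already mapped; never load a new copy. */
    void *handle = dlopen(lib, RTLD_LAZY | RTLD_NOLOAD);
    if (handle == NULL) {
        fprintf(stderr, "%s is not loaded\n", lib);
        return -1;
    }

    dlerror(); /* clear any stale error state before dlsym() */
    void *looked_up = dlsym(handle, name);
    const char *err = dlerror();

    int result;
    if (err != NULL) {
        fprintf(stderr, "dlsym(%s): %s\n", name, err);
        result = -1;
    } else {
        printf("%s: linked %p, dlsym %p\n", name, linked, looked_up);
        result = (linked == looked_up);
    }
    dlclose(handle); /* only drops the extra reference taken above */
    return result;
}
```

In a non-PIE executable the linked address may be a PLT stub inside the executable itself, so a mismatch there is not conclusive on its own; a value that changes across the reload would be more telling.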
-
- Gnoblar
- Posts: 4
- Joined: Tue May 25, 2021 11:18 pm
Re: Crash in libGL with fglrx and OpenGL Rendering System
> Why do you unload the engine?
Because that's how this application was built. It receives commands from another application, and one of those is resetting the engine. Whoever built this thought it would be a neat idea to completely reload the engine at that point.
> What errors do you get?
I'm getting a segmentation fault. When creating the window, Ogre calls glXMakeCurrent, libGL calls into fglrx_dri, many calls happen within that library, then it calls back into libGL, followed by a segmentation fault.
> glXMakeCurrent is resolved by the linker, while other GL functions are resolved at runtime by dlsym (via GLEW/gl3w).
> Try comparing the glXMakeCurrent function pointer to what dlsym would give you.
I might look into that, but I fail to see how it is relevant: we don't actually call glXMakeCurrent or anything like it outside the engine. I only tried that to see whether I could somehow get an external context into Ogre, because I thought that might solve the issue. BTW, that does work with my RTLD_NODELETE workaround, but with the workaround in place it is also completely unnecessary. I was told we're going with that solution, but I'd still like to figure out what exactly is going on here.
Again, the core problem is that after reloading the render system, most GL calls either fail in weird ways (e.g. GL_FALSE is returned, but the error handler is not called) or cause a segmentation fault.
- dark_sylinc
- OGRE Team Member
- Posts: 5298
- Joined: Sat Jul 21, 2007 4:55 pm
- Location: Buenos Aires, Argentina
- x 1279
Re: Crash in libGL with fglrx and OpenGL Rendering System
I suspect whatever you're doing to reload the engine has to do with dlopen of libGL.
Or perhaps GL3PlusRenderSystem::initialiseContext is not called correctly the 2nd time, or too late.
Btw, why fglrx? That driver was really buggy, and modern Mesa is far superior to it, supporting older cards all the way back to the Radeon HD 2000.
-
- Gnoblar
- Posts: 4
- Joined: Tue May 25, 2021 11:18 pm
Re: Crash in libGL with fglrx and OpenGL Rendering System
> I suspect whatever you're doing to reload the engine has to do with dlopen of libGL.
That could be it. I'll probably investigate it once I'm back from vacation.
> Btw, why fglrx? That driver was really buggy, and modern Mesa is far superior to it, supporting older cards all the way back to the Radeon HD 2000.
I suggested trying the radeon driver instead, but we don't have access to a module for our kernel, we would have to redo the display configuration, and we don't want to deal with potentially worse performance. Basically, I was told it's too much work (we'd have to get our kernel guy to build the module) and that the workaround seems good enough.
-
- OGRE Team Member
- Posts: 1995
- Joined: Sun Mar 30, 2014 2:51 pm
- x 1075
Re: Crash in libGL with fglrx and OpenGL Rendering System
For reference, I have also observed strange behavior with the proprietary NVIDIA driver: when switching from GL3+ to GLES2 using EGL within the same process, some GLES functions behave as if they were still in the GL3+ context, which makes the GLES2 render system crash. This does not occur when using GLX.
Maybe EGL is worth a try for you, if only to get another data point.
-
- Gnoblar
- Posts: 4
- Joined: Tue May 25, 2021 11:18 pm
Re: Crash in libGL with fglrx and OpenGL Rendering System
I'm not sure that's even supported with fglrx. The Wikipedia page has a list of EGL implementations, but fglrx is absent from it. Curiously, the closed-source NVIDIA driver has supported EGL since 2013.
As it stands, it's either "make sure libGL is unloaded before the render system (or one of the libraries that get pulled in) is unloaded" or "suffer weird behavior and segmentation faults on engine reload or application exit."
- dark_sylinc
- OGRE Team Member
- Posts: 5298
- Joined: Sat Jul 21, 2007 4:55 pm
- Location: Buenos Aires, Argentina
- x 1279
Re: Crash in libGL with fglrx and OpenGL Rendering System
Another recommendation I can think of: try reproducing the problem on a modern Linux machine to see whether it persists, or whether Mesa complains about something that gives you a hint.
If you're lucky, you'll get a crash. If so, build Mesa from source in full debug (that's easy on Ubuntu); you'll then get a full call stack you can debug, walking your way backwards through dangling pointers to see what's wrong. You can even run valgrind on it.
Once that is fixed, it will hopefully fix fglrx as well.