Crash in libGL with fglrx and OpenGL Rendering System

oold
Gnoblar
Posts: 4
Joined: Tue May 25, 2021 11:18 pm

Crash in libGL with fglrx and OpenGL Rendering System

Post by oold »

Ogre Version: 1.12.9 (with minor changes)
Operating System: Linux 3.18
Render System: OpenGL Rendering System

For two weeks I have been dealing with issues in our application, which embeds the Ogre engine. Everything works fine until we unload the engine (by destroying the Root object). Once the render system is unloaded, many OpenGL calls misbehave or outright segfault our application. Unfortunately, this mostly happens when Ogre calls glXMakeCurrent after the engine has been reloaded (typically only when ASLR is enabled). It also happens when our own code calls that function after Ogre has been loaded and unloaded, but not before that. Additionally, unloading libGL when the application exits also causes a segfault.

I have confirmed that loading the render system with RTLD_NODELETE while ASLR is enabled works around the issue entirely. I have also confirmed that turning ASLR off solves at least the crashes after reloading the engine. Explicitly linking against all the X, GL, etc. libraries does not solve the issue.
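For anyone wanting to try the same workaround, here is a minimal C sketch, assuming glibc's dlopen(). The helper names load_pinned() and is_resident() are hypothetical, and the RTLD_LAZY | RTLD_GLOBAL part of the flags is an assumption about how a plugin would normally be loaded. A pinned object survives dlclose(), so a later reload gets the same mapping at the same address instead of a fresh, ASLR-randomized one.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: load a shared object and mark it RTLD_NODELETE.
 * dlclose() still drops the reference count, but the object is never
 * unmapped, so loading it again returns the same copy at the same address.
 * RTLD_LAZY | RTLD_GLOBAL is an assumed set of "normal" plugin flags. */
static void *load_pinned(const char *path)
{
    void *handle = dlopen(path, RTLD_LAZY | RTLD_GLOBAL | RTLD_NODELETE);
    if (handle == NULL)
        fprintf(stderr, "dlopen(%s) failed: %s\n", path, dlerror());
    return handle;
}

/* Hypothetical helper: returns 1 if the object is still mapped in this
 * process, 0 otherwise. RTLD_NOLOAD never loads anything; it only returns
 * a handle to an object that is already resident. */
static int is_resident(const char *path)
{
    void *handle = dlopen(path, RTLD_LAZY | RTLD_NOLOAD);
    if (handle == NULL)
        return 0;
    dlclose(handle); /* release the reference the probe itself took */
    return 1;
}
```

Calling load_pinned() on the render system plugin (a path such as "RenderSystem_GL.so" is only a placeholder) before creating the Root object should make Ogre's own dlopen() of the same file return the already-pinned copy, and is_resident() on that path should still report 1 after the Root object has been destroyed.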

Now, I would like to know whether anyone has experienced similar issues or has any idea what could cause this. There must be some kind of reference in libGL to something within the render system library. My guess is that AMD's libGL implementation does something weird here, but maybe something is off with Ogre.

P.S.:
I am aware that this issue very likely stems from the render system being loaded at a different address than the first time it was loaded.
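One way to confirm the different-address theory is to record the load base of the render system library before tearing the engine down and again after reloading it. A sketch, assuming glibc's dl_iterate_phdr() from <link.h>; load_base_of() and its substring matching on the object's path are hypothetical choices for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <link.h>
#include <stdio.h>
#include <string.h>

/* State for the hypothetical lookup: the first loaded object whose path
 * contains `needle`, and its load base. */
struct base_query {
    const char *needle;
    ElfW(Addr) base;
    int found;
};

static int find_base_cb(struct dl_phdr_info *info, size_t size, void *data)
{
    (void)size;
    struct base_query *q = data;
    if (info->dlpi_name != NULL && strstr(info->dlpi_name, q->needle) != NULL) {
        q->base = info->dlpi_addr; /* address the object was mapped at */
        q->found = 1;
        return 1; /* non-zero stops the iteration */
    }
    return 0;
}

/* Returns 1 and stores the load base in *base if a loaded object's path
 * contains `needle`; returns 0 if no such object is currently loaded. */
static int load_base_of(const char *needle, ElfW(Addr) *base)
{
    struct base_query q = { needle, 0, 0 };
    dl_iterate_phdr(find_base_cb, &q);
    if (q.found)
        *base = q.base;
    return q.found;
}
```

Calling load_base_of("RenderSystem_GL", &base) (placeholder name) once before destroying the Root object and once after recreating it should show two different bases with ASLR enabled, and the same base when the library is pinned with RTLD_NODELETE or ASLR is off.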
sercero
Bronze Sponsor
Posts: 449
Joined: Sun Jan 18, 2015 4:20 pm
Location: Buenos Aires, Argentina

Re: Crash in libGL with fglrx and OpenGL Rendering System

Post by sercero »

Why do you unload the engine?

What errors do you get?
paroj
OGRE Team Member
Posts: 1993
Joined: Sun Mar 30, 2014 2:51 pm

Re: Crash in libGL with fglrx and OpenGL Rendering System

Post by paroj »

glXMakeCurrent is resolved by the linker, while other GL functions are resolved at runtime by dlsym (via GLEW/gl3w).
Try comparing the glXMakeCurrent function pointer to what dlsym would give you.

For further reading:
https://github.com/NVIDIA/libglvnd#architecture
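A small C sketch of that comparison, assuming glibc (dladdr(), RTLD_DEFAULT and RTLD_NOLOAD). compare_binding() and describe() are hypothetical helpers: they take the pointer your code was linked against plus the library's soname (libGL.so.1 is an assumption), look the same symbol up with dlsym() in the copy of the library already in the process, and print which object each address belongs to.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/* Print an address and the shared object it lives in, via dladdr(). */
static void describe(const char *label, void *addr)
{
    Dl_info info;
    if (addr != NULL && dladdr(addr, &info) != 0 && info.dli_fname != NULL)
        printf("%-8s %p in %s\n", label, addr, info.dli_fname);
    else
        printf("%-8s %p (no owning object found)\n", label, addr);
}

/* Compare a linker-bound pointer (`linked`) with what dlsym() returns for
 * `name` in the already-loaded library `lib`.
 * Returns 1 if they match, 0 if they differ, -1 if the lookup failed. */
static int compare_binding(void *linked, const char *lib, const char *name)
{
    /* RTLD_NOLOAD: use the copy already mapped, never load a second one. */
    void *handle = dlopen(lib, RTLD_NOW | RTLD_NOLOAD);
    if (handle == NULL) {
        fprintf(stderr, "%s is not loaded: %s\n", lib, dlerror());
        return -1;
    }
    void *looked_up = dlsym(handle, name);
    describe("linked", linked);
    describe("dlsym", looked_up);
    dlclose(handle); /* drop the reference taken above */
    if (looked_up == NULL)
        return -1;
    return linked == looked_up;
}
```

From code linked against libGL this would be called as compare_binding((void *)glXMakeCurrent, "libGL.so.1", "glXMakeCurrent"), once before the engine is first loaded and once after it has been reloaded, to see whether the two ever point into different objects.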
oold
Gnoblar
Posts: 4
Joined: Tue May 25, 2021 11:18 pm

Re: Crash in libGL with fglrx and OpenGL Rendering System

Post by oold »

sercero wrote: "Why do you unload the engine?"
Because that's how this application was built. It receives commands from another application, and one of those commands resets the engine. Whoever built this thought it would be a neat idea to completely reload the engine at that point.
sercero wrote: "What errors do you get?"
I'm getting a segmentation fault. When creating the window, Ogre calls glXMakeCurrent, libGL calls into fglrx_dri, which makes many calls internally and then calls back into libGL, and then the segmentation fault occurs.
paroj wrote: "glXMakeCurrent is resolved by the linker, while other GL functions are resolved at runtime by dlsym (via GLEW/gl3w). Try comparing the glXMakeCurrent function pointer to what dlsym would give you."
I might take a look at that, but I don't see how it would be relevant. We aren't really calling glXMakeCurrent or anything like it outside the engine. I only tried that to see whether I could somehow get an external context into Ogre, because I thought that might solve the issue. By the way, that actually works with my RTLD_NODELETE workaround, but with the workaround it is also completely unnecessary. I was told we're going with that solution, but I'd still like to figure out what exactly is going on here.
Again, the problem is that after the render system has been reloaded, most GL calls either fail in weird ways (e.g. GL_FALSE is returned, but the error handler is not called) or cause a segmentation fault.
dark_sylinc
OGRE Team Member
Posts: 5292
Joined: Sat Jul 21, 2007 4:55 pm
Location: Buenos Aires, Argentina

Re: Crash in libGL with fglrx and OpenGL Rendering System

Post by dark_sylinc »

I suspect whatever you're doing to reload the engine has to do with dlopen of libGL.

Or perhaps GL3PlusRenderSystem::initialiseContext is not called correctly the second time, or is called too late.

Btw, why fglrx? That driver was really buggy, and modern Mesa is far superior to fglrx, supporting older cards all the way back to the Radeon HD 2000.
oold
Gnoblar
Posts: 4
Joined: Tue May 25, 2021 11:18 pm

Re: Crash in libGL with fglrx and OpenGL Rendering System

Post by oold »

dark_sylinc wrote: "I suspect whatever you're doing to reload the engine has to do with dlopen of libGL."
That could be it. I'll probably investigate that once I'm back from vacation.
dark_sylinc wrote: "Btw why fglrx? That driver was really buggy and modern Mesa is much superior to fglrx, supporting older cards all the way to the Radeon HD 2000"
I suggested trying the radeon driver instead, but we don't have access to a module for our kernel, we would have to redo the display configuration, and we don't want to deal with potentially worse performance. Basically, I was told it's too much work: we'd have to get our kernel guy to build the module, and the workaround seems good enough.
paroj
OGRE Team Member
OGRE Team Member
Posts: 1993
Joined: Sun Mar 30, 2014 2:51 pm

Re: Crash in libGL with fglrx and OpenGL Rendering System

Post by paroj »

For reference, I have also observed strange behavior with the proprietary NVIDIA driver: when switching from GL3+ to GLES2 using EGL within the same process, some GLES functions behave as if they were still in the GL3+ context, which makes the GLES2 render system crash. This does not occur when using GLX.

Maybe EGL is worth a try in your case, at least to get another data point.
oold
Gnoblar
Posts: 4
Joined: Tue May 25, 2021 11:18 pm

Re: Crash in libGL with fglrx and OpenGL Rendering System

Post by oold »

I'm not sure that's even supported with fglrx. The Wikipedia article on EGL has a list of implementations, and fglrx is absent from it. Curiously, the Nvidia closed-source driver has supported EGL since 2013.
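Whether the installed stack exposes EGL at all can be probed at runtime. A rough C sketch, assuming the usual soname libEGL.so.1; has_symbol() is a hypothetical helper. Finding eglGetDisplay only shows that some libEGL is installed (it could be Mesa's or a glvnd dispatcher), not that fglrx can actually back an EGL context.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if `lib` can be loaded and exports `sym`,
 * 0 if the symbol is missing, -1 if the library cannot be loaded at all.
 * RTLD_LOCAL keeps the probe's symbols out of the global namespace. */
static int has_symbol(const char *lib, const char *sym)
{
    void *handle = dlopen(lib, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        fprintf(stderr, "cannot load %s: %s\n", lib, dlerror());
        return -1;
    }
    int found = dlsym(handle, sym) != NULL;
    dlclose(handle);
    return found;
}
```

Something like has_symbol("libEGL.so.1", "eglGetDisplay") is best run from a separate small test program rather than inside the application, so the probe's own dlopen()/dlclose() cannot interfere with the libGL state being debugged.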

As it stands, it's either "make sure libGL is unloaded before the render system (or one of the libraries that get pulled in) is unloaded" or "suffer weird behavior and segmentation faults on engine reload or application exit."
dark_sylinc
OGRE Team Member
OGRE Team Member
Posts: 5292
Joined: Sat Jul 21, 2007 4:55 pm
Location: Buenos Aires, Argentina

Re: Crash in libGL with fglrx and OpenGL Rendering System

Post by dark_sylinc »

Another recommendation I can think of is that you try reproducing the problem on a modern Linux machine to see if the problem persists or if Mesa complains about something that gives you a hint.

If you're lucky, you'll get a crash. If so, build Mesa from source in full debug mode (that's easy on Ubuntu). You'll then get a full call stack that you can debug, walking your way backwards through dangling pointers to see what's wrong. You can even run valgrind on it.

Once that is fixed, it will hopefully fix fglrx as well.