Thank you for your answers!
It has helped a lot in trying to figure out how to do this.
Now, after a while of coding, I think I have found a possible solution that avoids multiple materials or a special load order, and that supports default parameters in the materials.
just a simple std::vector+std::mutex+std::threads via .join()
Do I need to use mutex in order to compile shaders? Can't many shaders compile at the same time?
Is it because of D3D9HLSLProgram::prepareImpl adding/getting it from/to the microcode?
Is D3D9HLSLProgram::prepareImpl/D3D9HLSLProgram::compileMicrocode not thread safe with multiple threads using it?
In that case, how would I handle this? If they are not thread safe for multiple threads at once, how can I ever load them faster than a synchronous compiler would?
With my current solution, if I load the shaders synchronously and exit the game, the resulting shader cache is identical to the one I get when I do the same thing with multiple threads. I guess that is an indicator that compiling from multiple threads at the same time is thread safe?
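To make the question above more concrete, here is a minimal sketch of the pattern I would expect if the microcode cache were the only shared state: the expensive compile runs without any lock, and only the cache lookup and insert are guarded by a std::mutex. MicrocodeCache, fakeCompile and compileWithCache are hypothetical stand-ins I made up for illustration, not Ogre classes or functions, and I have not verified that this is how D3D9HLSLProgram behaves.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a shared microcode cache (NOT Ogre's real class).
class MicrocodeCache
{
public:
    // Returns true and fills 'out' if this source was compiled before.
    bool find(const std::string& source, std::string& out) const
    {
        std::lock_guard<std::mutex> lock(mMutex);
        std::map<std::string, std::string>::const_iterator it = mEntries.find(source);
        if (it == mEntries.end())
            return false;
        out = it->second;
        return true;
    }
    // Stores the compiled result; only this short section is serialized.
    void add(const std::string& source, const std::string& code)
    {
        std::lock_guard<std::mutex> lock(mMutex);
        mEntries[source] = code;
    }
    std::size_t size() const
    {
        std::lock_guard<std::mutex> lock(mMutex);
        return mEntries.size();
    }
private:
    mutable std::mutex mMutex;
    std::map<std::string, std::string> mEntries;
};

// Placeholder for the expensive compile step; it holds no lock while running.
inline std::string fakeCompile(const std::string& source)
{
    return "bin(" + source + ")";
}

// Looks in the cache first, compiles outside the lock on a miss, then stores the result.
inline std::string compileWithCache(MicrocodeCache& cache, const std::string& source)
{
    std::string code;
    if (cache.find(source, code))
        return code;
    code = fakeCompile(source);
    cache.add(source, code);
    return code;
}

// Compiles every source on its own thread and returns the number of cache entries.
inline std::size_t compileAllInParallel(MicrocodeCache& cache, const std::vector<std::string>& sources)
{
    assert(!sources.empty());
    std::vector<std::thread> workers;
    for (const std::string& s : sources)
        workers.emplace_back([&cache, s] { compileWithCache(cache, s); });
    for (std::thread& t : workers)
        t.join();
    return cache.size();
}
```

With this layout two threads may compile the same source twice if they both miss the cache, but the cache itself never gets corrupted, and the compiles still overlap.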
My current approach is to add a new event to the Ogre source, fired just before it tries to set the default parameters. I listen for it with ScriptCompilerListener::handleEvent, where I basically copy what I need (the compiler, the program and its params) and then return true, so the Ogre source does not continue to set the default parameters itself (more details below).
That means that I get the program and its parameters in my user-code to be able to set them later.
In ScriptCompilerListener::handleEvent I also queue the program's prepare call on a background thread (either with Ogre WorkQueue or my own thread manager).
Later, when the shader has been compiled (signalled by ResourceBackgroundQueue::Listener::operationCompleted, or by a thread finishing in my own thread manager), I add the parameter list back by running the same code Ogre would have run earlier, using the "compiler" and "params" values I saved (via GpuProgramTranslator::translateProgramParameters).
That lets me load shader materials that have parameters in them without a special loading scheme involving multiple materials or a certain order.
I only do the above if I want it on a resource group, and that resource group must only be using .hlsl files and .material files to load those shaders (nothing else, not any textures, normal materials, compositors or anything).
That way I know for sure that it cannot be messed up by any load order.
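The capture-now, apply-later idea described above can be sketched without any Ogre types. This is only an illustration under my own assumptions: FakeProgram and DeferredDefaults are hypothetical names, and the map of floats stands in for the real parameter nodes.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a GPU program with default parameters (not an Ogre type).
struct FakeProgram
{
    std::string name;
    bool prepared = false;                 // set once the background compile has finished
    std::map<std::string, float> defaults; // plays the role of the default parameter list
};

class DeferredDefaults
{
public:
    // Called at the point where the defaults would normally be set.
    // Returning true means "handled, skip the normal code path".
    bool capture(FakeProgram* program, std::map<std::string, float> params)
    {
        if (!program)
            return false;
        mPending.push_back({program, std::move(params)});
        return true;
    }

    // Called on the main thread after compilation; applies the saved parameters
    // to every prepared program and returns how many programs were updated.
    std::size_t applyAll()
    {
        std::size_t applied = 0;
        for (Entry& entry : mPending)
        {
            assert(entry.program != nullptr);
            if (!entry.program->prepared)
                continue; // a failed compile keeps its defaults untouched
            entry.program->defaults = entry.params;
            ++applied;
        }
        mPending.clear();
        return applied;
    }

private:
    struct Entry
    {
        FakeProgram* program;
        std::map<std::string, float> params;
    };
    std::vector<Entry> mPending;
};
```

The important property is that both capture and applyAll run on the main thread, so the saved parameters are never touched by a worker thread.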
Here are my results when I profile the loading of that resource group full of shaders with different methods:
Using no threading: 20.300716
Using Ogre WorkQueue with 6 threads: 20.101999
Using Ogre WorkQueue with 12 threads: 19.973398
Using Ogre WorkQueue with 50 threads: 19.875628
Using my own thread manager with 6 threads: 6.532257
Using my own thread manager with 12 threads: 6.384386
Using my own thread manager with 50 threads: 6.079521
Updating or not updating the loading screen has no significant effect on the timings above in any test I tried.
For Ogre WorkQueue, I add each shader (HighLevelGpuProgramPtr) to a new "tmpResourceBackgroundQueue.prepare" call.
I find it kind of strange that the Ogre WorkQueue way is so slow... If I set it to use 12 threads it still only seems to compile one shader at a time, and it even seems to compile them in the order I added them (but I might be wrong), when I would expect some of them to compile slower than others, which would make them finish in a different order.
I really don't understand how the WorkQueue works, because how can it almost be as slow with 6 threads compared to 50 threads? Is it really only on one thread even if I set it to more?
And how can it almost always be just as slow as not using any threading at all?
Also, calling "prog->setBackgroundLoaded(true);" before "tmpResourceBackgroundQueue.prepare" seems to make no difference when it comes to the speed.
For my own thread manager, I add each shader (HighLevelGpuProgramPtr) to a new thread and call resource->prepare, and make sure that at most X threads are running at a time.
So in short, my own thread manager is 3 times faster than the WorkQueue, and I don't really understand why.
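For reference, here is a minimal sketch of how "at most X threads at a time" can be done with only the standard library. Note that it is not my CThreadManager: instead of one thread per shader it starts a fixed number of workers that pull job indices from an atomic counter, which gives the same cap on concurrency. runBounded is a hypothetical helper name.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <vector>

// Runs job(0) .. job(jobCount - 1) using at most maxThreads worker threads.
// Returns the number of jobs that were executed.
inline std::size_t runBounded(std::size_t maxThreads, std::size_t jobCount,
                              const std::function<void(std::size_t)>& job)
{
    assert(maxThreads > 0);
    std::atomic<std::size_t> next(0); // index of the next job to hand out
    std::atomic<std::size_t> done(0); // number of finished jobs

    // Each worker keeps claiming the next free index until none are left.
    auto worker = [&next, &done, &job, jobCount]
    {
        for (;;)
        {
            const std::size_t index = next.fetch_add(1);
            if (index >= jobCount)
                return;
            job(index);
            done.fetch_add(1);
        }
    };

    // Never start more workers than there are jobs.
    const std::size_t workerCount = std::min(maxThreads, jobCount);
    std::vector<std::future<void>> workers;
    for (std::size_t i = 0; i < workerCount; ++i)
        workers.push_back(std::async(std::launch::async, worker));

    // Block until every worker has run out of jobs.
    for (std::future<void>& w : workers)
        w.get();
    return done.load();
}
```

In my case the job would be the call to resource->prepare for the shader at that index; each job must only touch its own element so that no extra locking is needed.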
Here is an explanation of my current code in detail:
Here is how to use my code on a resource group (with only shaders in it), pretty simple:
Code: Select all
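// Registers itself as the script compiler listener, so setting shader defaults is deferred
CShaderBackgroundLoader* tmpShaderBackgroundLoader = new CShaderBackgroundLoader();
// Parses the scripts; every shader found is handed to the loader via handleEvent
tmpResourceGroupManager.initialiseResourceGroup(m_groupName);
// The destructor waits until all shaders are compiled and removes the listener
delete tmpShaderBackgroundLoader;
tmpResourceGroupManager.loadResourceGroup(m_groupName, true, true);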
I had to alter a few things in the Ogre Source to be able to do this, here are the changes:
OgreScriptCompiler.h addition:
Code: Select all
class _OgreExport SetDefaultParametersGpuProgramScriptCompilerEvent : public ScriptCompilerEvent
{
public:
    GpuProgram* mProgram;
    AbstractNodePtr mParams;
    static String eventType;

    SetDefaultParametersGpuProgramScriptCompilerEvent(GpuProgram* program, AbstractNodePtr params)
        : ScriptCompilerEvent(eventType), mProgram(program), mParams(params) {}
};
OgreScriptCompiler.cpp addition:
Code: Select all
String SetDefaultParametersGpuProgramScriptCompilerEvent::eventType = "setDefaultParametersGpuProgram";
GpuProgramTranslator::translateGpuProgram change (so we can choose to not set the default parameters on the shader):
Code: Select all
SetDefaultParametersGpuProgramScriptCompilerEvent evtSDP(prog, params);
if (!compiler->_fireEvent(&evtSDP, 0))
{
    // Set up default parameters
    if (prog->isSupported() && params)
    {
        GpuProgramParametersSharedPtr ptr = prog->getDefaultParameters();
        GpuProgramTranslator::translateProgramParameters(compiler, ptr, static_cast<ObjectAbstractNode*>(params.get()));
    }
}
This means we need to reach GpuProgramTranslator::translateProgramParameters in our code, but that is impossible as it cannot be reached by the user (that class is not exported or something), so I had to alter it as well:
OgreBuiltinScriptTranslators.h GpuProgramTranslator translateProgramParameters removal/commented out, since we cannot reach it from our user-code anyway:
Code: Select all
//static void translateProgramParameters(ScriptCompiler *compiler, GpuProgramParametersSharedPtr params, ObjectAbstractNode *obj);
OgreScriptTranslator.h ScriptTranslator:
Code: Select all
static void translateProgramParameters(ScriptCompiler *compiler, GpuProgramParametersSharedPtr params, ObjectAbstractNode *obj);
I also then replaced "GpuProgramTranslator::translateProgramParameters" with "ScriptTranslator::translateProgramParameters" everywhere in OgreScriptTranslator.cpp.
Here is my user code:
I have a define that I can remove (MY_OWN_THREADED_STUFF) to see the speed of the normal WorkQueue as well.
CShaderBackgroundLoader.cpp
Code: Select all
#define MY_OWN_THREADED_STUFF
// Constructor for the CShaderBackgroundLoader
CShaderBackgroundLoader::CShaderBackgroundLoader()
{
    // Reset the resources count
    m_resourcesCount = 0;
    // Set the script compiler listener
    ScriptCompilerManager::getSingleton().setListener(this);
}

// Destructor for the CShaderBackgroundLoader
CShaderBackgroundLoader::~CShaderBackgroundLoader()
{
    // Wait until all threaded resources are loaded
    WaitUntilAllCompleted();
    // Remove the script compiler listener
    ScriptCompilerManager::getSingleton().setListener(NULL);
    // Clear the list of resources
#ifdef MY_OWN_THREADED_STUFF
    m_resources.Clear();
#else
    m_resources.clear();
#endif
}
// This gets called when a shader has compiled in the background, and this is always called on the main thread
#ifndef MY_OWN_THREADED_STUFF
void CShaderBackgroundLoader::operationCompleted(Ogre::WorkQueue::RequestID id, const Ogre::BackgroundProcessResult& result)
{
    // Get the current resource
    CContent& tmpContent = m_resources[id];
    // Set the error to it (if any)
    if (result.error ||
        result.message != "")
        tmpContent.m_errorMessage = result.message == "" ? "Unknown error" : result.message;
    if (app->m_GUIManager->m_screen_loadingScreen)
        app->m_GUIManager->m_screen_loadingScreen->SetStartupLoadingPercentage();
    // There is no need to handle the error message as it already gets into the log automatically
    //if (tmpContent.m_errorMessage != "")
    //    CGeneric::ShowMessage(tmpContent.m_resource->getName() + ", compile error: " + tmpContent.m_errorMessage);
    // Set the text of the loading screen
    if (app->m_GUIManager->m_screen_loadingScreen)
        app->m_GUIManager->m_screen_loadingScreen->AddLogText("Compiled shader: " + tmpContent.m_resource->getName());
    // Update loading screen
    if (app->m_GUIManager->m_screen_loadingScreen)
        app->m_GUIManager->m_screen_loadingScreen->UpdateLoading();
    // This has no effect on speed for some reason when I profile it, it should, but it does not
    // Load the resource (this is faster to do here while other threads are running than to call loadResourceGroup at the end)
    //if (tmpContent.m_errorMessage != "")
    //    tmpContent.m_resource->load();
    // Set up default parameters
    if (tmpContent.m_resource->isSupported() && tmpContent.m_params)
    {
        GpuProgramParametersSharedPtr ptr = tmpContent.m_resource->getDefaultParameters();
        ScriptTranslator::translateProgramParameters(tmpContent.m_compiler, ptr, static_cast<ObjectAbstractNode*>(tmpContent.m_params.get()));
    }
    // Remove the resource from the list
    m_resources.erase(id);
}
#endif
class CResourceManager_ThreadClass
{
public:
    static CList<CShaderBackgroundLoader::CContent>* list;

    static void Finish(int index)
    {
        CShaderBackgroundLoader::CContent& tmpObj = (*list)[index];
        if (app->m_GUIManager->m_screen_loadingScreen)
            app->m_GUIManager->m_screen_loadingScreen->SetStartupLoadingPercentage();
        // There is no need to handle the error message as it already gets into the log automatically
        //if (content.m_errorMessage != "")
        //    CGeneric::ShowMessage(content.m_resource->getName() + ", compile error: " + content.m_errorMessage);
        // Set the text of the loading screen
        if (app->m_GUIManager->m_screen_loadingScreen)
            app->m_GUIManager->m_screen_loadingScreen->AddLogText("Compiled shader: " + tmpObj.m_resource->getName());
        /* app->m_GUIManager->m_screen_loadingScreen->AddLogText("Compiled shader (" +
#ifdef MY_OWN_THREADED_STUFF
        CGeneric::ToString(m_resourcesCount--) + "/" + CGeneric::ToString(int(m_resources.Size())) + "): " + content.m_resource->getName());
#else
        CGeneric::ToString(int(m_resources.size() - 1)) + "/" + CGeneric::ToString(m_resourcesCount) + "): " + content.m_resource->getName());
#endif*/
        // Update loading screen
        if (app->m_GUIManager->m_screen_loadingScreen)
            app->m_GUIManager->m_screen_loadingScreen->UpdateLoading();
        // This has no effect on speed for some reason when I profile it, it should, but it does not
        // Load the resource (this is faster to do here while other threads are running than to call loadResourceGroup at the end)
        //if (content.m_errorMessage != "")
        //    content.m_resource->load();
        // Set up default parameters
        if (tmpObj.m_resource->isSupported() && tmpObj.m_params)
        {
            GpuProgramParametersSharedPtr ptr = tmpObj.m_resource->getDefaultParameters();
            ScriptTranslator::translateProgramParameters(tmpObj.m_compiler, ptr, static_cast<ObjectAbstractNode*>(tmpObj.m_params.get()));
        }
    }

    static void CreateThread(int index, std::list<std::future<void>>& answers)
    {
        answers.push_back(std::async(std::launch::async, &DoThreadedAction, index));
    }

    static void DoThreadedAction(int index)
    {
        // Compile the shader
        CShaderBackgroundLoader::CContent& tmpContent = (*list)[index];
        try
        {
            tmpContent.m_resource->prepare(true);
        }
        catch (Exception& e)
        {
            tmpContent.m_errorMessage = e.getFullDescription();
        }
    }
};
CList<CShaderBackgroundLoader::CContent>* CResourceManager_ThreadClass::list = NULL;
// Waits until all threads are done
void CShaderBackgroundLoader::WaitUntilAllCompleted()
{
#ifdef MY_OWN_THREADED_STUFF
    // Compile all shaders using threads
    // Do the threaded actions until they are done
    CResourceManager_ThreadClass::list = &m_resources;
    CThreadManager::SetupAndRunUntilDone(12, CResourceManager_ThreadClass::list->Size(), &CResourceManager_ThreadClass::Finish, &CResourceManager_ThreadClass::CreateThread);
    CResourceManager_ThreadClass::list = NULL;
#else
    // Start looping
    ResourceBackgroundQueue& tmpResourceBackgroundQueue = ResourceBackgroundQueue::getSingleton();
    while (true)
    {
        // Loop through all resources and check if we have compiled all
        bool tmpAllDone = true;
        for (std::unordered_map<Ogre::WorkQueue::RequestID, CContent>::const_iterator tmpItr = m_resources.begin(); tmpItr != m_resources.end(); ++tmpItr)
        {
            Ogre::WorkQueue::RequestID tmpVar = tmpItr->first;
            if (!tmpResourceBackgroundQueue.isProcessComplete(tmpVar))
            {
                tmpAllDone = false;
                break;
            }
        }
        if (tmpAllDone)
            // Break out of the loop, we are done
            break;
        // Continue compiling the shaders in the background
        Ogre::Root::getSingleton().getWorkQueue()->processResponses();
        // Update loading screen
        //if (app->m_GUIManager->m_screen_loadingScreen)
        //    app->m_GUIManager->m_screen_loadingScreen->UpdateLoading();
    }
#endif
}
bool CShaderBackgroundLoader::handleEvent(ScriptCompiler *compiler, ScriptCompilerEvent *_evt, void *retval)
{
    // Check if this event is trying to set default parameters
    if (_evt->mType == "setDefaultParametersGpuProgram")
    {
        // Get the event (the type string was checked above, so the downcast is safe)
        SetDefaultParametersGpuProgramScriptCompilerEvent* tmpEvent = static_cast<SetDefaultParametersGpuProgramScriptCompilerEvent*>(_evt);
        // Check if the event has a valid program
        if (tmpEvent->mProgram)
        {
            // Compile the shader in a background thread
            Ogre::HighLevelGpuProgramPtr tmpProgram = HighLevelGpuProgramManager::getSingleton().getByName(tmpEvent->mProgram->getName());
#ifdef MY_OWN_THREADED_STUFF
            m_resources.Add(CContent(tmpProgram, tmpEvent->mParams, compiler));
#else
            ResourceBackgroundQueue& tmpResourceBackgroundQueue = ResourceBackgroundQueue::getSingleton();
            //prog->setBackgroundLoaded(true);
            BackgroundProcessTicket tmpTicket = tmpResourceBackgroundQueue.prepare("HighLevelGpuProgram", tmpProgram->getName(), tmpProgram->getGroup(), false, NULL, NULL, this);
            CContent& tmpContent = m_resources[tmpTicket];
            tmpContent.m_resource = tmpProgram;
            tmpContent.m_compiler = compiler;
            tmpContent.m_params = tmpEvent->mParams;
#endif
            m_resourcesCount++;
        }
        // Return that we should skip to set the default parameters.
        // We now have all of its content, so we can set them when we feel like it instead.
        return true;
    }
    // Return that we should not skip this event
    return false;
}
CShaderBackgroundLoader.h
Code: Select all
#define MY_OWN_THREADED_STUFF
class CShaderBackgroundLoader
    : public Ogre::ScriptCompilerListener
#ifndef MY_OWN_THREADED_STUFF
    , public Ogre::ResourceBackgroundQueue::Listener
#endif
{
public:
    class CContent
    {
    public:
#ifdef MY_OWN_THREADED_STUFF
        CContent(Ogre::HighLevelGpuProgramPtr resource, AbstractNodePtr params, ScriptCompiler* compiler)
        {
            m_resource = resource;
            m_params = params;
            m_compiler = compiler;
            m_errorMessage = "";
        }
#endif
        ~CContent()
        {
            m_resource.reset();
            m_params.reset();
            m_compiler = NULL;
        }
        Ogre::HighLevelGpuProgramPtr m_resource;
        AbstractNodePtr m_params;
        ScriptCompiler* m_compiler;
        CString m_errorMessage;
    };

#ifdef MY_OWN_THREADED_STUFF
    CList<CContent> m_resources;
#else
    std::unordered_map<Ogre::WorkQueue::RequestID, CContent> m_resources;
#endif
    int m_resourcesCount;
    CString m_type;

    CShaderBackgroundLoader();
    ~CShaderBackgroundLoader();
#ifndef MY_OWN_THREADED_STUFF
    void operationCompleted(Ogre::WorkQueue::RequestID id, const Ogre::BackgroundProcessResult& result);
#endif
    void WaitUntilAllCompleted();
    bool handleEvent(ScriptCompiler *compiler, ScriptCompilerEvent *_evt, void *retval);
};
Since the script compiler is only created once (in ScriptCompilerManager::ScriptCompilerManager), and never destroyed until shutting down Ogre (it seems like), we can store it in a pointer.
Its "mGroup" is also only set once, since we are doing this thing for the entire resource group until it is done, which means we can use it however we like as long as we don't create another resource group and try to load/initialize it while we are compiling these shaders.
That means we must load the entire resource group before continuing (which is fine, since we want to compile all shaders before we try to use/render them anyway).
It seems like the params will not get destroyed while I keep them saved, since they are only destroyed when the last reference to them is removed (at least I think that is correct, since they get destroyed in _Decref when I call "m_params.reset()" later on).
Since we use initialiseResourceGroup synchronously and set the params synchronously when the shader has been compiled, we never use "params" or "compiler" in a thread outside of the main thread.
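The lifetime reasoning above is the usual reference-counting behaviour. Here is a small sketch of it using std::shared_ptr as a stand-in; I am assuming AbstractNodePtr behaves the same way, and FakeNode, SavedParams and ownersAfterSaving are hypothetical names for illustration only.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for a script AST node (not Ogre's AbstractNode).
struct FakeNode
{
    std::string text;
};

// Plays the role of CContent: it keeps its own reference to the params.
struct SavedParams
{
    std::shared_ptr<FakeNode> params;
};

// Simulates the compiler creating a node, the listener saving it, and the
// compiler then dropping its own reference. Returns the remaining owner count.
inline long ownersAfterSaving(std::weak_ptr<FakeNode>& observer, SavedParams& saved)
{
    {
        std::shared_ptr<FakeNode> compilerOwned = std::make_shared<FakeNode>();
        compilerOwned->text = "param_named speed float 2";
        observer = compilerOwned;     // a weak observer does not keep the node alive
        saved.params = compilerOwned; // the saved copy is a second owner
        assert(compilerOwned.use_count() == 2);
    } // the compiler's reference goes away here, the saved one remains
    return saved.params.use_count();
}
```

The node stays alive for as long as the saved copy exists and is destroyed when that last reference is reset, which matches what I see with "m_params.reset()".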
Now I am just worried that there is something that I have missed that makes it not thread safe.
Is there something I might have done wrong?
I mean, everything in the game seems to work, but I am still worried about bugs that might appear in the future because of this threading.
I also wonder why Ogre WorkQueue is so slow compared to my own version of it.